At the beginning of the year I was helping readwise.io get some of their observability up to snuff. A few weeks later, “What should I monitor?” came up on another call, so I decided to list the metrics I expect from a great dashboard:
[Dashboard metrics graphic. The metric names were not preserved; the aggregations that remain, in order:]
p50, p90, p99, sum, avg
/min
p50, p90, p99
%
/min
p50, p90, p99, sum, avg
{error, success, retry}
p50, p90, p99, count, by type
/min
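To make the aggregations above concrete, here is a minimal Python sketch of how p50/p90/p99, sum, avg, a per-minute rate, and a percentage could be computed from raw samples. The function names and sample data are hypothetical, and the nearest-rank percentile is just one simple definition; a real monitoring system will typically estimate percentiles from histograms or sketches rather than sorting every sample.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """The p50/p90/p99/sum/avg set, e.g. for response times in milliseconds."""
    return {
        "p50": percentile(samples, 50),
        "p90": percentile(samples, 90),
        "p99": percentile(samples, 99),
        "sum": sum(samples),
        "avg": mean(samples),
    }


def per_minute(count, window_seconds):
    """Normalize an event count observed over a window to a per-minute rate."""
    return count * 60 / window_seconds


def percent(part, total):
    """Share of `part` in `total` as a percentage (0.0 when total is 0)."""
    return 100 * part / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds.
    latencies_ms = [12, 15, 11, 240, 18, 14, 13, 16, 19, 950]
    print(summarize(latencies_ms))
    print(per_minute(count=120, window_seconds=30))
    print(percent(part=5, total=200))
```

The percentiles matter because the average hides the tail: in the hypothetical sample, two slow requests dominate the avg while the p50 stays low.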
More details about what these all mean in the latest napkin post!
Any favourites of yours missing? Let me know.
P.S. On Thursday night, Eastern time, I’ll be giving a short talk about napkin math on memory bandwidth.