Response times and what to make of their percentile values

Once you integrate a performance monitoring tool with your application, you will see p95 and p99 response times on its dashboards. If you are wondering what these terms and their values mean, you have come to the right place.

What does percentile even mean?

A percentile is a value on a scale of 100 that indicates the percentage of a distribution that is equal to or below it. For example, if you score in the 25th percentile, then 25% of test takers have a score equal to or below yours.

What does p95 response time mean in performance monitoring?

It means that 95 percent of the requests have a response time less than or equal to the p95 value. Let's say that the p95 is 170 ms. This means that the response times of 95 percent of the requests your application receives are less than or equal to 170 ms. The remaining 5% of the requests have a response time greater than 170 ms. Those could take 180 ms or 2 s; the p95 value does not tell you which.

Similarly, p99 response time means the response time of 99% of the requests is less than or equal to the p99 value.

p99 - 99% of the requests will be equal to or faster than the p99 value.
p90 - 90% of the requests will be equal to or faster than the p90 value.
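
To make the definition concrete, here is a minimal sketch of how such a value can be computed. It assumes the nearest-rank definition of a percentile (monitoring tools may interpolate or approximate instead), and the helper name percentile and the sample data are hypothetical.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value that has at least
    p percent of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

# Hypothetical sample: 100 requests that took 1 ms, 2 ms, ..., 100 ms.
response_times_ms = list(range(1, 101))

p90 = percentile(response_times_ms, 90)
p95 = percentile(response_times_ms, 95)
p99 = percentile(response_times_ms, 99)
print(p90, p95, p99)  # 90 95 99
```

With this sample, 95 of the 100 requests took 95 ms or less, which is exactly what a p95 of 95 ms says.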

Why are we not looking at average response time?

Assume these are your application's response times for the past hour:

Response times
100 ms
110 ms
105 ms
115 ms
120 ms
100 ms
102 ms
20 s

If you calculate the average of the above values, the result is 2.594 seconds. But if you look at the values closely, 7 out of the 8 requests average about 107 ms. A single request with a response time of 20 seconds is skewing the average response time of the whole app.
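
The arithmetic can be checked with a few lines of Python. This is only a sketch: the 1,000 ms cutoff used to separate the outlier is an arbitrary assumption made for illustration.

```python
from statistics import mean

# The eight response times listed above, in milliseconds.
response_times_ms = [100, 110, 105, 115, 120, 100, 102, 20_000]

overall_mean_ms = mean(response_times_ms)

# Assumed cutoff: treat anything slower than 1,000 ms as the outlier.
fast_requests_ms = [t for t in response_times_ms if t < 1_000]
fast_mean_ms = mean(fast_requests_ms)

print(overall_mean_ms)         # 2594
print(round(fast_mean_ms, 1))  # 107.4
```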

If you were to use the average response time as your performance metric, you would be worried by a value of 2.594 seconds. But now we know it does not depict the true performance of the app.

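Instead, if you were to look at percentiles for this data, you would see that 7 of the 8 requests (87.5%) have response times less than or equal to 120 ms, and that the median (p50) is about 105 ms. That would be a much more accurate reflection of the performance of the app. Note that with only 8 samples, the single 20 s request is itself the p99. In a realistic hour with thousands of requests, one such outlier would sit above the p99, and the p99 would stay close to the typical response times.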
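
The effect of sample size on the p99 can be checked with a small sketch. It assumes the nearest-rank definition of a percentile, and it builds a hypothetical busier hour by repeating the seven fast response times 999 times in total and adding the single 20 s request.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

# The eight response times from the example, in milliseconds.
sample_ms = [100, 110, 105, 115, 120, 100, 102, 20_000]
p50_small = percentile(sample_ms, 50)
p99_small = percentile(sample_ms, 99)

# Hypothetical busier hour: 999 fast requests plus the one 20 s outlier.
fast_ms = sample_ms[:-1]
busy_hour_ms = [fast_ms[i % len(fast_ms)] for i in range(999)] + [20_000]
p99_busy = percentile(busy_hour_ms, 99)

print(p50_small, p99_small, p99_busy)  # 105 20000 120
```

With 8 samples, each request is 12.5% of the data, so any percentile above 87.5 lands on the outlier. With 1,000 samples, the same outlier is only 0.1% of the data and no longer moves the p99.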

How about looking at minimum and maximum response times?

Consider the same response time data. If you were to look for the minimum and maximum response times, you would get 100 ms and 20 s, respectively. This, however, does not give you any information about how your application is performing in general. It only tells you the best and the worst response times.

Now that we know why we should not rely on average response times, should we look at p50, p95, p99, or all of them?

p50 shows the response time that 50% of the requests meet or beat, which is the typical experience.
p95 shows the response time that 95% of the requests meet or beat.
p99 shows the response time that 99% of the requests meet or beat.

If you are looking for places to improve the performance of your application, it makes more sense to look at p95 response times than at p99 response times.

When you are looking at p99, you are looking to improve the 1% of the requests with unacceptable response times. But that 1% can also contain outliers: requests that took so long to respond for reasons outside the scope of the application. For example, a request could time out at the ELB (load balancer) that forwards requests to your app server, and the ELB is outside the control of the application. You don't want to spend a lot of time trying to improve performance by chasing such outliers.

For this reason, it makes more sense to look at p95 values. Now you will be looking to improve the 5% of the requests with higher response times. Those 5% of the requests would include the outliers, but would also include some genuinely slow requests.
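
As a rough sketch of what this looks like in practice, the snippet below picks out the requests slower than the p95 and counts them per endpoint. Everything in it is an assumption made for illustration: the endpoints, the response times, and the nearest-rank percentile helper.

```python
import math
from collections import Counter

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

# Hypothetical request log: (endpoint, response time in ms), 100 entries.
requests = [("/home", 90 + i) for i in range(95)]  # 90 ms .. 184 ms
requests += [
    ("/search", 400),
    ("/search", 450),
    ("/report", 900),
    ("/report", 1200),
    ("/home", 30_000),  # likely an outlier, e.g. a timeout
]

p95_ms = percentile([duration for _, duration in requests], 95)

# The slowest 5%: the requests worth investigating first.
slow_requests = [(endpoint, d) for endpoint, d in requests if d > p95_ms]
slow_by_endpoint = Counter(endpoint for endpoint, _ in slow_requests)

print(p95_ms)            # 184
print(slow_by_endpoint)  # Counter({'/search': 2, '/report': 2, '/home': 1})
```

In this made-up data, the 30 s request is the kind of outlier that would dominate a p99 investigation, while the slow /search and /report requests are the genuinely slow ones that a p95 investigation also surfaces.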

These metrics are used not only for performance improvement but also for performance monitoring. You can add alarms based on threshold values assigned to each of the p99, p95, and p50 values. There is no preference for any specific metric when it comes to setting alarms. Ideally, you should set alarms for all 3 values. Depending on the nature of your business and the type of traffic your application serves, it might also make sense to monitor and add alarms for p99.99 response times.
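
A threshold-based alarm check can be sketched as follows. The threshold values, the function names, and the sample window are all hypothetical, and the percentile is again computed with the nearest-rank method. A real monitoring tool provides its own alarm configuration.

```python
import math

# Hypothetical alarm thresholds in milliseconds, keyed by percentile.
THRESHOLDS_MS = {50: 200, 95: 500, 99: 1000}

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

def breached_alarms(response_times_ms, thresholds_ms=THRESHOLDS_MS):
    """Return {percentile: observed value} for each percentile above its threshold."""
    breaches = {}
    for p, limit_ms in thresholds_ms.items():
        observed_ms = percentile(response_times_ms, p)
        if observed_ms > limit_ms:
            breaches[p] = observed_ms
    return breaches

# Hypothetical monitoring window of 100 requests.
window_ms = [120] * 90 + [450] * 6 + [800] * 2 + [2500] * 2

print(breached_alarms(window_ms))  # {99: 2500}
```

Here the p50 (120 ms) and p95 (450 ms) stay under their thresholds, so only the p99 alarm fires.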

Why do we even need to measure response times?

For one reason: to measure the performance of your application. If someone were to ask you "how fast is your application?", how would you respond if not with a metric?

Response times in the form of p95 and p99 are not the only metrics that need to be tracked for performance monitoring. Others include throughput, request queuing, memory usage, CPU utilization, and many more.