Unfortunately, performance testing tools do not emulate exactly the behavior a real user shows when accessing an application. You need to calibrate them first to get accurate results.

-Roland Van Leusden

Frankly, I have yet to see an organization do performance testing properly. Maybe I am a little unlucky, a little pessimistic, or even exaggerating a little, but that has been my experience! In some organizations we used inappropriate tools; in others we lacked deliverables, processes, and reporting; and in still others we had no performance requirements.

But most importantly, in all of them we were unaware that performance testing tools need calibration. In other words, we relied 100 percent on the tools' behavior and output. Having realized this, I can say without hesitation that processes, deliverables, reporting, requirements, and everything else you have successfully adopted in performance testing are worthless if you neglect tool calibration. I told you: the truth is bitter!

Does all this mean that the performance tests run so far (at least the ones I have executed!) do not reflect real results and can be thrown away? No, I cannot say that, but I do have serious doubts. Hopefully my former supervisors and managers won't read this blog post!

Performance test tools need calibration

At the Romania Testing Community Conference, Roland Van Leusden, a colleague and good old friend from the Netherlands, gave an interesting session that explained this in great detail. I have great respect for Roland's research and know-how on the subject, so I asked him for more details. Here is what he had to say:

“When we buy a ruler, we expect it to be correct within a certain tolerance. When we buy a more expensive ruler, it comes with a calibration certificate stating the conditions under which the ruler is exactly one meter long. Performance test tooling has no such calibration or certification; you just have to believe that when you run with fifty users, the tool will behave like fifty real users and produce the same load.

After investigating, I found that this is not the case when you use the tools with their default settings. You need to analyze what the user is doing with your application at the client, network, and server levels; this will be your reference. The good news is that most tools can be adjusted to behave like the real user; however, this requires in-depth knowledge of both your application at those various levels and the tooling. To get accurate, production-like results, you need to calibrate the tooling and validate its output, so that you can make the right decisions for the best user experience.”
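To make Roland's ruler analogy concrete, calibration can be pictured as checking each measured characteristic of one virtual user against the same characteristic measured for a real user (the reference), and flagging anything outside a chosen tolerance. The Python sketch below is only an illustration under assumptions: the Metric class, the calibration_report function, the metric names, the 10 percent tolerance, and the numbers are all hypothetical, and in practice the reference values would come from your own client, network, and server observations of real users.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str         # hypothetical metric name, e.g. "think_time_s"
    level: str        # "client", "network" or "server"
    reference: float  # value measured for one real user (the reference)
    tool: float       # value measured for one virtual user from the tool


def calibration_report(metrics, tolerance=0.10):
    """Return (level, name, relative deviation) for every metric whose
    tool value deviates from the reference by more than `tolerance`.
    These are the settings that still need adjusting in the tool."""
    out_of_tolerance = []
    for m in metrics:
        if m.reference == 0:
            # No meaningful relative error: any nonzero tool value is "infinitely" off.
            deviation = 0.0 if m.tool == 0 else float("inf")
        else:
            deviation = abs(m.tool - m.reference) / abs(m.reference)
        if deviation > tolerance:
            out_of_tolerance.append((m.level, m.name, round(deviation, 3)))
    return out_of_tolerance


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements of any tool.
    metrics = [
        Metric("parallel_connections", "client", 6, 1),
        Metric("bytes_per_page_kb", "network", 850, 420),
        Metric("think_time_s", "client", 8.0, 0.0),
        Metric("requests_per_page", "server", 42, 41),
    ]
    for level, name, dev in calibration_report(metrics):
        print(f"{level:8} {name:22} off by {dev:.0%}")
```

Only the metrics printed by the report would need further tuning; once the list comes back empty, one virtual user matches the reference within the chosen tolerance, which is the "calibration certificate" the tools do not ship with.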

Any objections?