In God we trust. All others must bring data.
W. Edwards Deming

Software testing is mostly a data-driven activity, and testers spend at least half of their effort managing test data (creating, obtaining, using, storing, refreshing, masking, subsetting, and tracking it). That is why I want to write about this subject and give you some practical ideas and information.

Let me start by telling you a little bit about the “big data” phenomenon. Several investigations indicate that more than half of global companies store more than 100 terabytes (TB) of data, and this volume is expected to double within the next two years. By 2020, it is assumed that there will be 20 billion connected devices. At the same time, companies still consider themselves immature in handling their data (which is quite correct), and almost all of them are willing to invest in this area for better governance.

While companies make use of these information oceans and derive profits from the data they store, they also suffer from it. Clearly, no company can cope with data growth just by increasing its hardware capacity. Companies need to find smart solutions for this inevitable growth.

When we narrow the subject to testing, we observe that IT organizations are focusing heavily on the collection and organization of data for their testing processes. The ability to control this process and use test data has become a key competitive advantage for these organizations because the benefits of such mechanisms outweigh their disadvantages. Ultimately, test data management plays a vital role in any software development project. Unstructured processes may lead organizations to:

  • Do inadequate testing (poor quality of product)
  • Be unresponsive (increased time to market)
  • Perform redundant operations and rework (increased costs)
  • Be noncompliant with regulatory norms, especially on data confidentiality and usage

We all know that testing is a very critical part of good software development; nevertheless, test data management gets only minimal attention from us. Why is that? I really do not know.

Maybe it is because we are more focused on execution than on preparation, or more focused on the result of the game than on how we are playing. Whatever the root cause is, we need to accept that many test failures are caused by inconsistencies in the test data. Therefore, we need to be sure that we have constructed an efficient test data management process.

If we do so, then we can be quite certain that our test metrics are not biased and that we do not cause interruptions or lose time during test execution. Below, I list the most essential activities and processes for achieving a complete test data management process. If you follow them in a structured manner, then you can talk about test efficiency, cost reduction, and acceleration. Here they are:


  • Initiate a demand tracking process for managing the test data demands and their status
    • Including creation of the workflow and utilization of an activity-tracking tool
  • Include test data analysis activities in the test planning phase
    • Specify all the necessary data parameters, like depth (amount of data), breadth (variation of data), scope (relevancy of test data to the test objectives), sensitivity, and architecture (physical structure of the test data)
  • Set and frequently measure your test data objectives
    • Reliability, accessibility, completeness, consistency, integrity, timeliness, security, and so on
  • Include test data preparation activities in the project plan and test development phase
    • Estimate efforts for test data analysis and preparation
  • Follow a step-by-step approach (below you can see the high-level picture)
    • Extracting (from the source database; you can do subsetting if necessary)
    • Masking (desensitizing the test data and ensuring that it contains no confidential customer information whose use would violate legislation or regulations)
    • Loading (into the target database)
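
To make the extract, mask, and load steps a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `customers` table, its columns, the sample rows, the salt, and the helper names are hypothetical, and the in-memory SQLite databases merely stand in for your real source and target systems. The salted hash is only one simple masking technique; real projects often need stronger, policy-driven masking.

```python
import hashlib
import sqlite3


def build_demo_source():
    """Create a hypothetical source database with a few made-up customers."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, balance REAL)"
    )
    source.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        [
            (1, "Alice Smith", "alice@example.com", 120.5),
            (2, "Bob Jones", "bob@example.com", 0.0),
            (3, "Carol White", "carol@example.com", 9800.0),
        ],
    )
    source.commit()
    return source


def extract(source, limit):
    """Extract a subset of rows (simple subsetting by row count)."""
    cursor = source.execute(
        "SELECT id, name, email, balance FROM customers ORDER BY id LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


def mask_value(value, salt="demo-salt"):
    """Turn a sensitive value into a deterministic, hard-to-reverse token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask(rows):
    """Desensitize name and email; keep id and balance so tests stay useful."""
    return [
        (row_id, "user_" + mask_value(name),
         mask_value(email) + "@example.test", balance)
        for row_id, name, email, balance in rows
    ]


def load(target, rows):
    """Load the masked rows into the target (test) database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, balance REAL)"
    )
    target.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    target.commit()


def refresh_test_data(source, target, limit):
    """Run the three steps in order and report how many rows were loaded."""
    rows = mask(extract(source, limit))
    load(target, rows)
    return len(rows)
```

Calling `refresh_test_data(build_demo_source(), sqlite3.connect(":memory:"), 2)` would copy two masked rows into the empty target. Because the masking is deterministic, the same input always yields the same token, which helps keep related records consistent across tables.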

When we go even deeper, we shall observe that every different testing activity (e.g., test type or test level) requires different test data. The following chart hopefully will help you in determining what volume and variation of data you need while you are executing different levels and/or types of testing.

To clarify your test data requirements, there are questions you should ask yourself, including:

  • What kind of data is needed?
  • How much data is needed?
  • When is the data needed?
  • Who will use the data?
  • Where will it be extracted from, and where will it be loaded?
  • Does it have any dependencies?
  • Is it sensitive, and how will the data be secured?
  • How will the governance be done?
  • How will the data be refreshed?

Once you have gathered all the answers and you are satisfied, then you can be sure that you are on the right track.
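
One way to keep track of which of the questions above are still unanswered is to record them in a small structure. The sketch below is only an illustration: the class name `DataRequirement` and its field names are my own hypothetical choices, with one field per question.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataRequirement:
    """One field per question; None means the question is still open."""
    kind: Optional[str] = None               # What kind of data is needed?
    volume: Optional[str] = None             # How much data is needed?
    needed_by: Optional[str] = None          # When is the data needed?
    users: Optional[str] = None              # Who will use the data?
    source_and_target: Optional[str] = None  # Extracted from where, loaded where?
    dependencies: Optional[str] = None       # Does it have any dependencies?
    security: Optional[str] = None           # Is it sensitive, and how is it secured?
    governance: Optional[str] = None         # How will the governance be done?
    refresh: Optional[str] = None            # How will the data be refreshed?

    def open_questions(self) -> List[str]:
        """Return the names of the fields that have not been answered yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    requirement = DataRequirement(kind="customer accounts", volume="500 rows")
    print(requirement.open_questions())
```

A requirement whose `open_questions()` list is empty has an answer recorded for every question; whether those answers are good enough remains a judgment for you and your team.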

No matter which approach you choose to handle the challenges of test data management, the basic requirements for success are good test cases, good test data, and the proper use of tools that help you automate the extraction, transformation, and governance of that data.

If you want to see results, you need to play well in the game. You will have either excuses or results; the choice is yours…