Building new solutions instantly transforms existing data into “Legacy Data.” What does this mean?
Software developers use the term “Legacy Code” to summarize the challenges that existing code poses to evolving or changing systems. In Working Effectively with Legacy Code, author Michael Feathers defines Legacy Code as “simply code without tests.” This definition highlights:
What about Legacy Data?
Legacy Data seldom has tests. Builders creating new solutions from this data face the same problems that software developers face with Legacy Code. Today’s complex landscape of data lakes, data warehouses, and data meshes compounds these problems.
The best new systems have:
One system we built re-ran all of its unit and integration tests against a sanitized version of the previous day’s production data. We changed the existing tests to be data-aware and to handle the new scale. This approach uncovered subtle, hidden assumptions about the data. We used this insight to correct quality problems in both the new and legacy code and data.
We are all building more and more innovative data products. As we do, we must get better at embracing the realities of Legacy Data. GistLabsAI can help. Please see our Data Strategy and Data Engineering services for more information.