Reproducibility, Testability, Traceability: A Framework for Ensuring Data Correctness at Scale

Blog Post

Matt Gordon, Principal Software Engineer, Kairos Aerospace

Prior to working at Kairos, I was an experimental physicist for many years.  One of the most exciting things about science is the moment when you realize you have just learned something that nobody else in the world knew before because, until that moment, nobody else had the tools to observe it.  At Kairos, we get to feel that excitement every day, because every day we collect data that was previously impossible to collect: wide-area, on-demand, high-resolution images of natural gas leaks.  But the hardest part about selling a new kind of data is credibility: convincing customers that you really can do the things you say you can do.  

________________________________________________________________________________________________________

Any sufficiently advanced technology is indistinguishable from magic. –Arthur C. Clarke

________________________________________________________________________________________________________

In enterprise software, your most valuable asset is your reputation.  Our customers trust us because we give them a 96% true positive rate for emission detection, and in Stanford University’s recent controlled-release tests of our technology, we had a 0% false positive rate.  The key to generating credible, actionable information is the same in both physics and software engineering: reproducibility, testability, and traceability.

In science, an empirically untestable statement is one that cannot be said to be true or false, because no observation could ever confirm or refute it.  For instance, the statement “If a frog had a glass butt, he wouldn’t hop” is empirically untestable, due to the lack of glass-butted frogs in the world.  In a similar way, software cannot be said to be correct or incorrect if it is not tested, because testing provides the constraints against which correctness is measured.  Testability is critical because tests are where we record the things which must be true in order to say the system is producing correct answers.

________________________________________________________________________________________________________

This isn’t right.  This isn’t even wrong. — Wolfgang Pauli

________________________________________________________________________________________________________

Practically speaking, testability means that every function in your code base should be testable in as close to a production environment as possible, via a combination of unit tests and end-to-end tests of the pipeline.  This ensures that changes to your code, or to any of its dependencies, will not introduce unintended consequences.  In addition to unit tests and end-to-end tests, we also employ a series of data integrity tests on our incoming flight data, to ensure that the airplane and instruments performed as expected.  And every change to our code is run through a set of acceptance tests on historical data, whose results are compared against the same data run through the production branch.  These results have to be approved by our Chief Science Officer before merging into production, so that we never accidentally introduce sources of error into measurements.
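To make the data-integrity idea concrete, here is a minimal sketch of what such a check might look like.  The field names (`frame_count`, `gps_fix_count`, `altitude_ft`) and thresholds are illustrative assumptions, not Kairos's actual schema:

```python
# Hypothetical integrity check on an incoming flight-data record.
# Field names and the expected survey-altitude range are assumptions
# for illustration only.

def check_flight_record(record):
    """Return a list of integrity errors; an empty list means the record passes."""
    errors = []
    if record.get("frame_count", 0) <= 0:
        errors.append("no image frames recorded")
    if record.get("gps_fix_count", 0) < record.get("frame_count", 0):
        errors.append("missing GPS fixes for some frames")
    alt = record.get("altitude_ft")
    if alt is None or not (1000 <= alt <= 10000):
        errors.append(f"altitude outside expected survey range: {alt}")
    return errors

good = {"frame_count": 1200, "gps_fix_count": 1200, "altitude_ft": 3000}
bad = {"frame_count": 1200, "gps_fix_count": 900, "altitude_ft": 150}
```

Checks like these run before any science code touches the data, so a flight where the instruments misbehaved is flagged at the door rather than discovered in a customer report.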

Reproducibility requires that every time we provide the system with the same set of inputs, we get the same outputs, and that the process be idempotent: calling it multiple times should produce the same results as calling it once.  There are lots of sources of stochasticity in data pipelines: external data sources, for instance, don’t always promise idempotency, and can undergo changes in their API or underlying data without warning.  But, even more importantly, every time your data pipeline crashes and requires human intervention to unbork it, you risk introducing irreproducibility.  That’s one of the reasons why testability is so important: end-to-end testing ensures that new code doesn’t accidentally crash the pipeline.  When you get that Slack notification at 11PM on Taco Tuesday, you’re not always in top form, and the clock is ticking, which can lead to mistakes.
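One common way to get idempotency in a pipeline step is to derive the output's storage key deterministically from its inputs, so a retry after a crash overwrites the same slot instead of appending a duplicate.  This is a generic sketch under that assumption, not Kairos's actual pipeline code:

```python
# Sketch of an idempotent pipeline step: the output key is a hash of
# the inputs, so re-running the step (e.g. after a crash) upserts the
# same record rather than creating a second one.  Names are illustrative.

import hashlib
import json

def output_key(inputs):
    """Deterministic key: identical inputs always map to the same output slot."""
    canonical = json.dumps(inputs, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def run_step(store, inputs, compute):
    key = output_key(inputs)
    store[key] = compute(inputs)  # upsert: repeated runs are harmless
    return key

store = {}
inputs = {"flight_id": "F123", "sensor": "A"}
k1 = run_step(store, inputs, lambda x: {"leaks_detected": 2})
k2 = run_step(store, inputs, lambda x: {"leaks_detected": 2})  # retry
```

With this structure, the 11PM human intervention becomes "re-run the step" rather than "carefully reconstruct what state the pipeline was in."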

Traceability means that we know, end to end, the provenance of every piece of data and the code that generated it.  Whereas reproducibility tells us that the same inputs will produce the same outputs, traceability tells us which inputs produced which outputs.  As my high school geometry teacher used to say, “Show your work!”  If a data product is flawed, you have to be able to find out why, otherwise you can never fix it.  And, it strongly reduces the temptation to “just fix it manually” by editing the outputs directly, which violates the principles of reproducibility and testability.
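In practice, traceability often amounts to stamping every output record with the IDs of the inputs it came from and the version of the code that produced it.  Here is a minimal sketch of that pattern; the field names and the idea of injecting a git commit hash at build time are assumptions for illustration:

```python
# Sketch of provenance stamping: each result carries its input IDs and
# the code version that produced it, so a flawed data product can be
# traced back to its sources.  All names here are illustrative.

CODE_VERSION = "deadbeef"  # e.g. a git commit hash injected at build time

def with_provenance(result, input_ids):
    """Wrap a computed result with the metadata needed to trace it later."""
    return {
        "result": result,
        "provenance": {
            "inputs": sorted(input_ids),
            "code_version": CODE_VERSION,
        },
    }

record = with_provenance(
    {"leak_rate_kg_hr": 4.2},
    ["flight-F123", "weather-0817"],
)
```

Because the provenance travels with the record, "show your work" becomes a query rather than an archaeology project, and the temptation to hand-edit outputs loses its appeal: any fix has to flow through the traced inputs and code.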

Oftentimes, hardware companies start as “experiments,” and code and hardware undergo coordinated rapid iteration to “just get it to work”.  The transition from science experiment to a polished, scalable product requires a disciplined approach to building an end-to-end system that does more than “just work”: it needs to reliably produce accurate results every time, while still allowing for rapid iteration.  When you have customers and deadlines, this often amounts to fixing the engine on the airplane while it’s flying at 3000’.  The principles of testability, reproducibility, and traceability provide a framework for prioritizing the huge mass of technical debt that gets built up during the early development phases, allowing you to iterate rapidly without fear of delivering incorrect results.

But in the end, this is all just plumbing.  The reason we spend so much time on plumbing is so that our customers can find out about natural gas leaks quickly and accurately, and so that they can dispatch work crews to fix those leaks with confidence, knowing that the information is correct.  For enterprise software, every customer matters, and for customers to trust you, every piece of information has to be accurate.  A testable and reliable data pipeline ensures that we deliver results on time, and correctly.

And, for me and my team, the stakes are just as high: it means I can push new code to the production pipeline and then go play with my kids.  I don’t have to worry about whether it will break during a mission-critical reporting cycle, or whether the results might be wrong, and instead of watching for Slack notifications I can focus on thinking up silly voices for reading I Want My Hat Back.  A happy engineering team, with good work-life balance, is a team that produces high quality results consistently, which is a win-win-win for me, for Kairos, and for our customers.
