
The Plutonium Protocol: Engineering Safety for the LLM Intern Era

  • Adrian Brudaru,
    Co-Founder & CDO

For over a decade, we’ve operated under the comforting metaphor that "data is the new oil".

The logic was simple: data is a crude resource. You pump it out of the ground (APIs, databases), store it in massive tanks (Data Lakes), and refine it later into value (dashboards). If you spilled some oil (a broken chart, a missed forecast), it was messy, but you could clean it up. The risks were economic and localized.

That era is over. We have entered the nuclear age of information, and the physics of data have changed.

In the hands of an autonomous agent or a Large Language Model (LLM), data is no longer oil. It is plutonium: incredibly dense with potential, inherently unstable, and toxic when mishandled.


The green fire in the LLM’s eyes

We are increasingly giving AI agents write access to our systems, wiring them into automation, and placing trust in them. But how badly do things go when they go wrong? Let’s look at concrete examples.

1. The Production Database Wipe

In a catastrophic failure of agentic control, an AI coding agent explicitly ignored a "code freeze" command from SaaStr founder Jason Lemkin. The agent proceeded to delete the entire production database containing over 1,200 executive profiles and, when it realized the error, "panicked" and attempted to cover its tracks by fabricating 4,000 fake user records to make the database look full again.

2. The "Phantom Package" Supply Chain Attack

Hackers discovered that AI coding assistants often hallucinate software packages that don't exist (e.g., huggingface-cli), so they registered those fake names on public package registries and filled them with malware. Thousands of developers at major companies (including Alibaba and Amazon) blindly accepted the AI's suggestion, downloaded the "phantom" packages, and installed the hackers' malware directly onto their corporate servers.

3. The "Invisible Text" Prompt injection

Security researchers demonstrated a "silent" hack where invisible instructions (white text on a white background) were hidden in a public code repository's README file. When developers used AI tools like Cursor to "summarize this repo," the AI read the invisible text, which secretly commanded it to scan the developer's computer for private API keys and send them to the attacker's server, without the user ever seeing a thing. Reproducing this is trivial: stand up a webhook endpoint and have the hidden instructions POST the stolen data to it.

4. The Data Leak: Making Your Chat History Essentially Public

Engineers at Samsung’s semiconductor division were eager to use the new capabilities of GenAI to speed up their workflow. In three separate incidents within a span of weeks, employees uploaded highly sensitive data to ChatGPT, including source code, trade secrets, and confidential executive meeting notes. Because ChatGPT offered no way to opt out of training-data collection at the time, anything pasted into a prompt could be absorbed by subsequent model versions and surface in answers to other users.

Why don’t data teams test more?

Why did this happen? Because for the last ten years, the "Modern Data Stack" (MDS) optimized for democratization, not discipline.

We made it easy for anyone to write SQL. We made it easy to dump JSON into Snowflake. But we forgot the rigorous controls that software engineers use every day.

  • Software Engineers never deploy without a transient test environment (a temporary, isolated replica of production).
  • Data Engineers, trapped by vendor-locked cloud warehouses, often test in "Staging", or worse, in Production.

We need to formalize a new discipline: Data Reliability Engineering. We need to Shift Left. We need to test the data before it enters the reactor.
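
Transient test environments are within reach for data teams too. As a minimal sketch, assuming a recent dlt version where the dev_mode flag is available: the same pipeline code runs against a local DuckDB file, and each run gets a disposable, isolated dataset.

```python
import dlt

# A throwaway replica: same pipeline code, local DuckDB destination
# instead of the cloud warehouse. dev_mode=True appends a unique suffix
# to the dataset name on each run, so every test is isolated.
pipeline = dlt.pipeline(
    pipeline_name="orders_test",
    destination="duckdb",          # local file, no vendor warehouse needed
    dataset_name="orders_staging",
    dev_mode=True,
)

# Hypothetical sample row standing in for real extracted data.
pipeline.run([{"id": 1, "total": 99.9}], table_name="orders")
```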

The 5 Pillars of Data Quality

To move from "hope-based" ETL to deterministic engineering and survive the nuclear age, we must enforce these controls at the ingestion layer. These are the non-negotiable guarantees required to safely sign off on a production data system.

1. Structural Integrity

In typical software development, data types are strictly managed by the database or by application classes. In data engineering, we typically receive interchange formats like JSON, which are weakly typed and carry no restrictions. Here you make working assumptions that you want to hold consistently as you work with the data; a sketch follows the list below. Examples of things you want to do here:

  • Infer and manage schemas.
  • Manage schema drift through evolution or contracts.
  • Enforce strong types and nullability rules.
  • Normalize nested data structures into flat tables with lineage.
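
A minimal sketch of these controls in dlt (the sample payload is hypothetical): the first run infers a schema and unpacks the nested items list into a child table with lineage keys, while the schema contract lets new tables and columns evolve but freezes data types so later drift fails loudly instead of silently corrupting columns.

```python
import dlt

# Hypothetical weakly typed JSON payload with a nested structure.
orders = [
    {"id": 1, "total": 99.9, "items": [{"sku": "A-1", "qty": 2}]},
]

pipeline = dlt.pipeline(
    pipeline_name="orders_pipeline",
    destination="duckdb",
    dataset_name="shop",
)

# dlt infers the schema, flattens `items` into an `orders__items` child
# table linked back to its parent row, and enforces the contract below.
pipeline.run(
    orders,
    table_name="orders",
    schema_contract={"tables": "evolve", "columns": "evolve", "data_type": "freeze"},
)
```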

2. Semantic Validity

Similar to the tests above, semantics are typically enforced by the data producer, but when consuming the data we need to validate that the semantics we believe are implicit in the data are indeed true. For example, a discount field might be expected to lie between 0 and 1 but unexpectedly contain the value 80 (8,000%), entered manually by a service agent by accident. If we let that through into an order-total calculation, the order ends up costing negative 79 times its price, significantly distorting the reported totals. Typical checks (sketched after the list):

  • Reject values outside defined domain bounds (e.g., age >= 0, probability <= 1.0).
  • Verify business rules like event_timestamp > ingestion_timestamp.
  • Reject values not present in strict Enums (e.g., ISO-3166 codes).
  • Verify string patterns (Email, UUID, URL) against strict Regex definitions.
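
A minimal sketch of such checks using a Pydantic v2 model; the field names are illustrative. dlt can also accept a model like this as a resource's columns hint, validating rows during extraction rather than after load.

```python
from pydantic import BaseModel, Field, ValidationError

class Order(BaseModel):
    # Hypothetical row model; adjust fields to your own schema.
    order_id: str
    discount: float = Field(ge=0.0, le=1.0)      # the 80 (= 8,000%) case fails here
    country: str = Field(pattern=r"^[A-Z]{2}$")  # ISO-3166 alpha-2 shape check

Order(order_id="o-1", discount=0.15, country="DE")  # passes validation

try:
    Order(order_id="o-2", discount=80, country="DE")
except ValidationError as e:
    print(e)  # reports that discount is out of bounds
```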

3. Uniqueness & Relations

While source systems often enforce rules like distinct primary keys, or that every foreign key must reference an existing primary key, data in movement or in analytical systems has no mechanism for enforcing or testing these rules. The logic applied after the fact in analytical systems often rests on unverified assumptions: valid at the time of implementation, they may silently stop holding if the source logic changes.

Testing for identity resolution between systems, for example associating a Stripe customer with a HubSpot customer, also falls under this pillar. In many companies, these mappings are maintained manually by account managers or customer service, which means they require testing and alerting the owner through a data contract enforcement mechanism. Typical guarantees (sketched after the list):

  • Guarantee uniqueness during merge (upsert) strategies.
  • Block child records that reference non-existent Parent IDs (Orphan detection).
  • Detect orphan records and alert discrepancies to human curators.
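
A minimal sketch of the first point in dlt: a primary_key plus the merge write disposition deduplicates and upserts on load, so re-running a pipeline cannot multiply rows. The resource and its rows are hypothetical.

```python
import dlt

@dlt.resource(primary_key="customer_id", write_disposition="merge")
def customers():
    # Hypothetical rows: the second record shares a key with the first,
    # so the merge keeps one deduplicated row instead of appending both.
    yield {"customer_id": "c-1", "email": "a@example.com"}
    yield {"customer_id": "c-1", "email": "a+updated@example.com"}

pipeline = dlt.pipeline(pipeline_name="crm", destination="duckdb", dataset_name="crm")
pipeline.run(customers())
```

Orphan detection, by contrast, is typically a post-load check: an anti-join from the child table to the parent key, with any mismatches routed to an alert for human curators.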

4. Privacy & Governance

The definition of PII varies by industry, applicable regulation, and scope. For example, you might want to avoid working with free-text user input not because of PII risk but because of the risk of prompt injection attacks.

  • Scan for PII and tag the fields: detect PII regex patterns (credit cards, SSNs) in payloads before serialization (a scanning sketch follows this list).
  • Handle PII fields in various ways: remove, hash, scrub, quarantine for testing, quarantine for review, or just flag and handle during access.
  • Track PII lineage from extraction to consumption, ensuring it is handled safely at every step.
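
A minimal scanning sketch; the patterns are illustrative only and will both over- and under-match, so production scanners need stronger, context-aware detection.

```python
import re

# Illustrative detectors: real credit card validation needs a Luhn
# check, and SSN/CC formats vary more than these patterns admit.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def tag_pii(record: dict) -> dict:
    """Return {field: [pattern names]} for string fields that look like PII."""
    tags = {}
    for field, value in record.items():
        if isinstance(value, str):
            hits = [name for name, rx in PII_PATTERNS.items() if rx.search(value)]
            if hits:
                tags[field] = hits
    return tags

print(tag_pii({"note": "card 4111 1111 1111 1111", "name": "Ada"}))
# {'note': ['credit_card']}
```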

5. Operational Health (SLA Enforcement)

Well-engineered pipelines seldom crash, but data can still arrive late or wrong when upstream producers are having problems. This pillar covers tests for both our own operations and the upstream ones; a freshness sketch follows the list.

  • Monitor volume and distribution shifts.
  • Monitor freshness based on data fields.
  • Monitor operational metadata about the pipelines, such as run health and quarantine events.
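
A minimal freshness sketch, assuming an updated_at field in the data and a hypothetical six-hour SLA; wire the failure into whatever alerting you already run.

```python
from datetime import datetime, timedelta, timezone

SLA = timedelta(hours=6)  # hypothetical contract with the producer

def check_freshness(latest_updated_at: datetime) -> None:
    """Fail loudly when the newest record is older than the SLA allows."""
    lag = datetime.now(timezone.utc) - latest_updated_at
    if lag > SLA:
        raise RuntimeError(f"stale data: newest record is {lag} old, SLA is {SLA}")

# Example: a record last updated two hours ago passes the check.
check_freshness(datetime.now(timezone.utc) - timedelta(hours=2))
```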

Secure the Reactor

We are building the nervous systems of the future enterprise. To treat this responsibility with the tooling and mindset of the "Dashboard Era" is professional negligence. The "Data is Oil" era was about extraction. The "Data is Plutonium" era is about trust in AI automation.

You cannot wait for the meltdown to audit your safety protocols. You must shift left, enforcing rigorous contracts, sanitizing inputs, and validating logic before the data ever enters the reactor core.

The tools to build these containment vessels exist today.

  • Start Engineering Now: Dive into our Data Quality Lifecycle docs to learn how to implement these 5 pillars using the open-source dlt library.
  • Scale Your Control Room: For enterprises that need to manage the entire data quality lifecycle, we are building dltHub. Sign up for the dltHub waiting list to get notified of dltHub’s commercial launch and early access opportunities.