Blog | MAR 16, 2026

Why the EU AI Act Makes Cryptographic Data Provenance Mandatory

EU AI Act, Data Notarization

The EU AI Act, which entered into force in August 2024, contains two provisions that companies building high-risk AI systems need to address: Article 10 on data governance and Article 15(5) on cybersecurity. Together, they set a compliance bar that perimeter security and access controls alone do not meet. This post looks at what those requirements actually say, and why cryptographic verification of training data is the practical path to satisfying them.

Patrick Lamplmair

CTO, Tributech

~ 6 min

The Double Mandate for Data Provenance

Most coverage of the AI Act focuses on prohibited practices or conformity assessment procedures. What gets less attention are the specific technical requirements buried in Articles 10 and 15. These aren't recommendations or best practices. They use "shall" - the language of legal obligation.

Article 10(2)(b) requires providers to document "data collection processes and the origin of data". Not approximate origins. Not general categories. The actual origin.

Article 10(3) goes further: training data "shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose."

Then Article 15(5) adds the second requirement: "technical solutions to address AI specific vulnerabilities shall include, where appropriate, measures to prevent, detect, respond to, resolve and control for attacks trying to manipulate the training data set (data poisoning)."

This isn't about having good security practices. It's about proving - to a regulator who asks - that your training data hasn't been manipulated.

The table below lists the specific provisions and their exact text.

Article/Annex	Requirement	Citation
Article 10(2)(b)	Data origin documentation	"data collection processes and the origin of data"
Article 10(3)	Error-free data	"to the best extent possible, free of errors and complete"
Article 15(5)	Data poisoning protection	"measures to prevent, detect, respond to, resolve and control for attacks trying to manipulate the training data set (data poisoning)"
Annex IV, Section 2(d)	Provenance documentation	"information about their provenance, scope and main characteristics; how the data was obtained and selected"
Article 72(2)	Continuous compliance	"actively and systematically collect, document and analyse relevant data... which allow the provider to evaluate the continuous compliance"
Article 21(1)	Regulatory cooperation	"provide that authority all the information and documentation necessary to demonstrate the conformity"
Article 20(1)	Corrective action	"immediately take the necessary corrective actions to bring that system into conformity, to withdraw it, to disable it, or to recall it"
Article 99(4)	Penalties	"up to EUR 15 000 000 or... 3 % of its total worldwide annual turnover"

Source: Regulation (EU) 2024/1689 (EU AI Act)

Why Perimeter Security Doesn't Suffice

Traditional security focuses on preventing unauthorized access. Firewalls, access controls, network segmentation. These are necessary but they don't create proof.

A regulator investigating under Article 79 can ask: "Demonstrate that your training data originated from the sources you claim and hasn't been altered since collection."

With perimeter security alone, you respond with process documentation. We have firewalls. We use access controls. Only authorized personnel can modify data. This describes what should happen, not what did happen.

The regulator asks again: "Can you prove this specific dataset used to train your model on March 15th came from Sensor 5346 and wasn't modified between collection and training?"

The audit logs show who accessed what systems. But there's no mathematical proof linking the data you collected to the data you trained on.

Article 10 requires documentation of data origin. Article 15(5) requires detecting manipulation. Process documentation describes controls. It doesn't prove integrity.
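The missing link between collected data and training data can be illustrated with a plain content hash. This is a minimal sketch, not a complete provenance system: the record layout, field names, and sample payload are assumptions made for illustration, and only the sensor ID comes from the regulator's question above.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


# At collection time: store the digest next to the claimed source.
# The payload and record layout are hypothetical examples.
collected = b'{"sensor": "5346", "t": "2026-03-15T08:00:00Z", "value": 21.7}'
collection_record = {"source": "Sensor 5346", "sha256": fingerprint(collected)}


def matches_collection(record: dict, training_payload: bytes) -> bool:
    """True only if the bytes used for training equal the bytes collected."""
    return fingerprint(training_payload) == record["sha256"]
```

A hash on its own only shows that the bytes match the stored record. The record itself still has to be protected, for example by signing it at the source or keeping it in a tamper-evident store, otherwise an attacker could replace both the data and the digest.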

What Detection Actually Requires

Article 15(5) lists five mandates: prevent, detect, respond to, resolve, and control for data poisoning attacks. Three of these (detect, respond, resolve) are impossible without a verifiable baseline.

Detection requires knowing the original state. Response requires identifying which data was affected. Resolution requires access to verified clean data.

This leads to a specific technical requirement: tamper-evident cryptographic proofs of data from collection through training.

Without this, you're claiming data integrity based on the security of your infrastructure. That's risk management, not proof.
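The dependency of detection, response, and resolution on a baseline can be sketched in a few lines. The function names and the record-ID scheme below are hypothetical, and the sketch assumes the baseline manifest was captured at collection time and has itself been kept out of an attacker's reach.

```python
import hashlib


def digest(payload: bytes) -> str:
    """SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def build_baseline(records: dict[str, bytes]) -> dict[str, str]:
    """Baseline manifest: record id -> digest, captured at collection time."""
    return {rid: digest(data) for rid, data in records.items()}


def find_affected(baseline: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Detect and respond: name exactly which records differ from the baseline."""
    return {
        "modified": sorted(
            rid for rid in baseline
            if rid in current and digest(current[rid]) != baseline[rid]
        ),
        "missing": sorted(rid for rid in baseline if rid not in current),
        "injected": sorted(rid for rid in current if rid not in baseline),
    }


def verified_clean(baseline: dict[str, str], current: dict[str, bytes]) -> dict[str, bytes]:
    """Resolve: keep only records whose digest still matches the baseline."""
    return {rid: data for rid, data in current.items() if baseline.get(rid) == digest(data)}
```

Without `baseline`, none of the three functions has anything to compare against, which is the point made above.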

The Documentation Burden

Annex IV specifies what technical documentation must include: "information about their provenance, scope and main characteristics; how the data was obtained and selected."

Provenance means lineage. Where did this data come from? Who collected it? When? Through what process? Was it modified? By whom? Why?

You need records for every dataset used in training, validation, and testing. These records must be "kept up-to-date" per Article 11(1). As data sources change, documentation changes.

For industrial systems, this gets complex. Data comes from multiple sensors and multiple sites. It crosses from OT to IT systems through network and application layers. Each source must be traceable through these boundaries.

A spreadsheet listing data sources doesn't meet the standard. You need an audit trail showing data lineage from sensor to model. When a regulator asks about dataset integrity, you produce cryptographic proof, not procedural assertions.
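One common way to make an audit trail tamper-evident is a hash chain, where every lineage entry commits to the one before it. The sketch below is an illustration under assumptions: the entry fields (`dataset`, `step`, `actor`, `at`, `data_sha256`) are hypothetical names, not a schema taken from the regulation or from any product.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(body: dict) -> str:
    """Hash an entry's fields in a canonical JSON form (sorted keys)."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def append_event(log: list[dict], *, dataset: str, step: str, actor: str,
                 at: str, data_sha256: str) -> None:
    """Append a lineage event that commits to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"dataset": dataset, "step": step, "actor": actor,
            "at": at, "data_sha256": data_sha256, "prev": prev}
    log.append({**body, "hash": entry_hash(body)})


def chain_is_intact(log: list[dict]) -> bool:
    """Recompute every hash and check each entry points at its predecessor."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Editing or deleting any earlier entry breaks every later link. The chain does not protect against someone rewriting the whole log from scratch, so in practice the latest hash would also need to be anchored somewhere the data owner cannot silently change.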

Post-Market Monitoring Extends the Requirement

Article 72(2) requires the post-market monitoring system to "actively and systematically collect, document and analyse relevant data... which allow the provider to evaluate the continuous compliance" of the system.

Continuous compliance includes Article 10's data requirements. You prove data integrity once for initial conformity assessment. Then you keep proving it throughout the system's operational life.

This matters for systems that retrain on new data. Each time you update the model, you need verified data. The monitoring plan required by Article 72(3) must document how you maintain data integrity over time.

Perimeter security doesn't scale to this requirement. You need automated verification that data flowing into your system maintains integrity without manual checks every time.
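Automated verification of this kind can take the form of a gate that runs before every retraining job and leaves a report behind. The following is a sketch under assumptions: `VerificationReport`, `retrain_if_verified`, and the manifest format (record ID mapped to SHA-256 hex digest) are hypothetical names chosen for the example, and `train` stands in for whatever training entry point a pipeline actually uses.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


class IntegrityError(Exception):
    """Raised when a batch contains records that fail verification."""


@dataclass(frozen=True)
class VerificationReport:
    run_id: str
    checked: int
    failed: tuple[str, ...]

    @property
    def passed(self) -> bool:
        return not self.failed


def verify_batch(run_id: str, manifest: dict[str, str],
                 batch: dict[str, bytes]) -> VerificationReport:
    """Compare every record in the batch against its recorded digest."""
    failed = tuple(sorted(
        rid for rid, data in batch.items()
        if manifest.get(rid) != hashlib.sha256(data).hexdigest()
    ))
    return VerificationReport(run_id=run_id, checked=len(batch), failed=failed)


def retrain_if_verified(run_id: str, manifest: dict[str, str], batch: dict[str, bytes],
                        train: Callable[[dict[str, bytes]], None]) -> VerificationReport:
    """Run training only when the whole batch verifies; return the report."""
    report = verify_batch(run_id, manifest, batch)
    if not report.passed:
        raise IntegrityError(f"{run_id}: unverified records {list(report.failed)}")
    train(batch)
    return report
```

Archiving the returned report for each run would give the monitoring plan a per-retraining record of what was checked, without a manual review step.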

Enforcement Makes This Practical

Article 99(4) sets penalties for non-compliance with Article 10 or Article 15 at up to €15 million or 3% of global annual turnover, whichever is higher.
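The "whichever is higher" rule is simple arithmetic. The sketch below encodes only the ceiling as stated in this post, using whole euros to avoid rounding issues; the function name is hypothetical and it does not model any other provision of Article 99.

```python
FIXED_CEILING_EUR = 15_000_000
TURNOVER_PERCENT = 3


def penalty_ceiling_eur(worldwide_annual_turnover_eur: int) -> int:
    """Upper bound of the fine: the higher of EUR 15 million or 3% of turnover."""
    turnover_based = worldwide_annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CEILING_EUR, turnover_based)
```

For a company with €200 million in turnover, 3% is €6 million, so the €15 million figure applies. The two amounts are equal at €500 million in turnover, and above that the percentage dominates: at €1 billion the ceiling is €30 million.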

Article 21(1) requires providers to give competent authorities "all the information and documentation necessary to demonstrate the conformity of the high-risk AI system with the requirements."

This isn't theoretical. Market surveillance authorities can request data governance documentation. If you cannot produce it, Article 20(1) requires you to "immediately take the necessary corrective actions to bring that system into conformity, to withdraw it, to disable it, or to recall it."

The practical question becomes: what do you build now to satisfy these requirements when asked?

Process documentation and access controls describe your security posture. They don't prove data hasn't been poisoned. Cryptographic verification does: it creates a mathematical proof of data integrity that survives regulatory scrutiny.

Data Provenance Verification: A Global Requirement

The EU AI Act isn't an outlier. Data integrity verification for AI systems is emerging as a global standard:

• US NIST AI Risk Management Framework (AI 100-1) identifies data integrity as a core risk category requiring mitigation

• US NIST AI RMF Generative AI Profile (AI 600-1) provides specific controls for training data verification throughout the ML pipeline

• UAE Information Assurance Standard V2 mandates data provenance controls for AI deployments

• CISA Secure Integration of AI in OT explicitly addresses protecting against manipulated training data in industrial environments

These frameworks converge on the same technical requirement: prove your training data wasn't tampered with or poisoned. Industrial AI systems face this across markets. Predictive maintenance, forecasting, process automation and optimization pull data from operational technology environments: sensors, edge devices, SCADA systems. This creates a new threat and risk landscape on infrastructure not designed for it.

For systems under the EU AI Act, cryptographic verification is mandatory. For systems in other markets following NIST, CISA, or UAE standards, it's required as well. For commercial deployments, it's risk reduction and competitive differentiation.

The requirement is global. The question is the implementation.

What Implementation Looks Like

The regulation doesn't prescribe specific technologies. It sets requirements. How you meet them is up to you. But the requirements narrow the viable approaches. You need:

1. Tamper-evident records of data collection (Article 10(2)(b))
2. Verification that data is "free of errors" (Article 10(3))
3. Detection of data manipulation (Article 15(5))
4. Documentation of data provenance (Annex IV)
5. Continuous compliance monitoring (Article 72)

Access controls and process documentation address points 1 and 4 partially. They don't address points 2, 3, or 5 in a way that produces evidence for regulators.

Cryptographic methods address all five. Create cryptographic proofs of the data at source. Record in a secure proof storage. Verify before training. Document the verification process. Monitor integrity continuously.

The technical implementation varies by environment. But the core requirement is consistent: verifiable proof that training data hasn't been manipulated between collection and use.
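The first three steps (prove at source, record, verify before training) can be sketched with a keyed hash. This is an illustration under assumptions, not a description of any particular product: HMAC-SHA256 from the Python standard library stands in for a real digital signature scheme, `ProofStore` is a hypothetical in-memory stand-in for a secure proof store, and key management is left out entirely.

```python
import hashlib
import hmac
from typing import Optional


def prove_at_source(device_key: bytes, payload: bytes) -> str:
    """Step 1: keyed proof computed where the data is produced."""
    return hmac.new(device_key, payload, hashlib.sha256).hexdigest()


class ProofStore:
    """Step 2: write-once, in-memory stand-in for a secure proof store."""

    def __init__(self) -> None:
        self._proofs: dict[str, str] = {}

    def record(self, record_id: str, proof: str) -> None:
        if record_id in self._proofs:
            raise ValueError(f"proof for {record_id} already recorded")
        self._proofs[record_id] = proof

    def get(self, record_id: str) -> Optional[str]:
        return self._proofs.get(record_id)


def verify_before_training(store: ProofStore, device_key: bytes,
                           record_id: str, payload: bytes) -> bool:
    """Step 3: recompute the proof and compare it in constant time."""
    stored = store.get(record_id)
    if stored is None:
        return False  # no proof on record means the data is unverified
    return hmac.compare_digest(stored, prove_at_source(device_key, payload))
```

Steps four and five would then amount to logging each verification result and re-running the check on a schedule. A production design would more likely use asymmetric signatures, so that the party verifying the data does not hold a key that could also forge proofs.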

Building the Infrastructure

The EU AI Act requires cryptographic proof of training data integrity. Traditional security (audit logs, process documentation) records activity but doesn't provide cryptographic proof.

Tributech built the data-centric zero-trust infrastructure for this. Our data notarization provides cryptographic verification from OT to IT systems, from collection through training. Each data point is notarized at source, proofs are stored securely, and integrity is verified mathematically before training.

When Article 21 requires demonstrating data provenance to authorities, you produce cryptographic proof instead of procedural documentation.

For high-risk AI systems, this capability is mandatory. For commercial industrial AI, it's risk reduction and competitive differentiation. The question is when to build it, not whether.

Want to turn a compliance requirement into a competitive advantage? Email a Tributech expert about your data architecture.