GDPR Compliance for Software Development: Integrating Privacy into the SDLC

Software engineers have always been responsible for the security of the systems they build. Under GDPR, they are now the primary implementers of privacy as well. While the Regulation sets the legal framework, the actual obligations – how data is collected, stored, and deleted – are realized through technical decisions in the data model, API design, and system architecture. This makes privacy a fundamental engineering requirement that must be integrated into the design phase, ideally through a collaborative partnership between engineering, product, and legal teams.

When you look at the enforcement record, it is largely a record of engineering failures. Nine of the ten largest GDPR penalties have fallen on technology companies, for violations rooted in how systems were built: how user data fed into advertising algorithms, how it moved across borders, and how consent was handled at the application layer. Regulators have become sophisticated enough to follow that trail, and fines now reflect it.

The other pressure is structural. Privacy features such as consent flows, data minimisation, and the right to erasure reach into the data model, the API layer, and the logging infrastructure. Developers who leave these considerations until late in the development cycle routinely find that compliance requires unpicking foundational decisions. The earlier privacy is treated as a design constraint, the less disruptive — and more robust — the outcome.

This article covers how GDPR requirements translate into concrete engineering practices across each phase of the Software Development Lifecycle, from requirements and design through to deployment and ongoing operations.


What Developers Actually Need to Understand About GDPR

GDPR is 99 articles long. Most of it doesn’t affect how you write code. What follows is the subset that does.

1.1 The Three Roles and Why They Matter to Your Architecture

GDPR organises data relationships around three roles: the Data Subject (the person whose data it is), the Data Controller (the entity that decides why and how data is processed), and the Data Processor (the entity that processes data on the controller’s behalf).

The distinction matters architecturally because each role carries different obligations, and most engineering teams occupy more than one simultaneously. When you build a product that collects user data directly, your organisation is the controller. When the product processes data on behalf of your customers, such as a platform where businesses manage their own end users, your organisation is the processor. The role is determined by the specific processing activity, not by the company as a whole. A single system can have your team acting as controller in one data flow and processor in another, and the compliance requirements follow accordingly.

Third-party dependencies extend this further. Every analytics SDK, cloud provider, logging service, or payment processor your system integrates with becomes part of your data processing chain. Controllers are responsible for ensuring those vendors offer sufficient guarantees of compliance, which means your choice of vendors is itself an architectural decision with GDPR implications. Selecting an observability tool, choosing a cloud region, or integrating a third-party authentication provider are all decisions with privacy consequences. A Data Processing Agreement (DPA) with each vendor is the minimum contractual baseline, but the more important discipline is understanding what data each integration actually receives and what it does with it.

1.2 What Counts as Personal Data in Code

GDPR defines personal data as any information that relates to an identified or identifiable natural person. In practice, this reaches well beyond names and email addresses.

IP addresses are personal data. Device IDs are personal data. Behavioural logs – click streams, session recordings, search queries – are personal data. Inferred attributes derived from behavioural data, such as predicted preferences or risk scores, are personal data. If a value in your database can be used, alone or in combination with other data, to single out an individual, it falls under the Regulation.

The practical implication is that data minimisation starts at the schema level, before a single line of application code is written. If your age verification flow only needs to confirm that a user is over 18, storing their date of birth is unnecessary. If your monitoring stack only needs HTTP status codes, logging full request bodies creates liability with no operational benefit. Every additional field you persist is a field you’ll need to protect, account for in subject access requests, and eventually delete.

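There is also a subset of personal data that demands stricter treatment: special category data. Under Article 9, this covers health information, genetic data, biometric data used to identify a person, racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, and data about a person’s sex life or sexual orientation. Processing it is prohibited unless explicit consent or one of the other narrow conditions in Article 9(2) applies, and that stricter standard has direct schema implications. Precise geolocation is not itself on the list, but location histories can reveal special category data (regular visits to a clinic or a place of worship), so they deserve similar care. Special category fields need tighter access controls, stricter retention limits, and, in many cases, should be stored separately from general profile data to limit exposure.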

1.3 The 7 Principles Translated Into Engineering Requirements

GDPR’s seven data protection principles, set out in Article 5, read like policy language, but they function as engineering requirements.

Lawfulness, Fairness, and Transparency mean your system must have a documented legal basis for every processing activity before data is collected — not as an afterthought. Practically speaking, this means your user registration flow, your analytics pipeline, and your marketing automation each need a mapped legal basis (consent, legitimate interest, contractual necessity, etc.), and your privacy notice must accurately reflect what the system actually does. When the code changes, the notice needs to change with it.

Purpose Limitation means data collected for one reason cannot quietly be repurposed for another. If you collect email addresses to send transactional notifications, feeding them into a behavioural profiling model requires a separate legal basis. Architecturally, this argues for keeping data pipelines intentionally separated – not letting raw user data flow freely into analytics or ML systems without explicit purpose mapping.
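The first two principles can be enforced in code rather than left in a policy document. Below is a minimal sketch, assuming a hypothetical in-process registry that processing code must consult before touching personal data. The activity names, `authorise`, and `ProcessingNotRegistered` are illustrative, not part of any library; only the six `LegalBasis` values come from the Regulation (Article 6(1)).

```python
from dataclasses import dataclass
from enum import Enum


class LegalBasis(Enum):
    # The six lawful bases listed in Article 6(1) GDPR.
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal_obligation"
    VITAL_INTERESTS = "vital_interests"
    PUBLIC_TASK = "public_task"
    LEGITIMATE_INTERESTS = "legitimate_interests"


@dataclass(frozen=True)
class ProcessingActivity:
    name: str
    purpose: str
    legal_basis: LegalBasis
    data_categories: frozenset[str]


class ProcessingNotRegistered(Exception):
    """Raised when code attempts processing with no documented basis or purpose."""


# Hypothetical registry; in practice it would be generated from, or reconciled
# against, your Records of Processing Activities.
REGISTRY = {
    activity.name: activity
    for activity in [
        ProcessingActivity("send_order_emails", "transactional notifications",
                           LegalBasis.CONTRACT, frozenset({"email", "order_id"})),
        ProcessingActivity("product_analytics", "aggregate usage analysis",
                           LegalBasis.CONSENT, frozenset({"user_id", "page_views"})),
    ]
}


def authorise(activity: str, fields: set[str]) -> ProcessingActivity:
    """Return the registered activity, or raise if it is unknown or reaches
    for data outside its declared categories (purpose limitation)."""
    entry = REGISTRY.get(activity)
    if entry is None:
        raise ProcessingNotRegistered(f"no legal basis recorded for {activity!r}")
    undeclared = set(fields) - entry.data_categories
    if undeclared:
        raise ProcessingNotRegistered(
            f"{activity!r} is not registered for {sorted(undeclared)}")
    return entry
```

Calling `authorise("send_order_emails", {"email"})` succeeds, while reusing the same email for an unregistered "ad_profiling" activity raises. The registry only proves a basis and purpose were documented; a consent-based activity would still need a per-user consent check.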

Data Minimisation means collecting only what you actually need. Run a schema review before writing any migration that adds a new column of personal data. Ask whether the field is necessary for the feature, or just convenient to have. Convenience is not a legal basis.

Accuracy means building processes to keep personal data current and correctable. This has product implications – users need a way to update their data – and operational ones: stale data in your system isn’t just a quality problem, it’s a compliance one.

Storage Limitation means data should be retained only as long as necessary. In engineering terms: define retention periods for every personal data category, then build the deletion jobs to enforce them. TTLs on cache entries, automated purge scripts, and archiving pipelines all need to be first-class features, not manual cleanup tasks someone runs occasionally.
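As an illustration of a retention period enforced by code rather than by memory, here is a minimal in-memory sketch. The `RETENTION` schedule, category names, and `purge_expired` are hypothetical; in a real system the same logic would drive a scheduled DELETE against each store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical retention schedule: one entry per personal data category.
RETENTION = {
    "session_logs": timedelta(days=30),
    "support_tickets": timedelta(days=365),
}


@dataclass
class Record:
    category: str
    created_at: datetime  # timezone-aware
    payload: dict


def purge_expired(records: list[Record],
                  now: Optional[datetime] = None) -> tuple[list[Record], int]:
    """Return the records still within retention and how many were purged.

    A category with no declared retention period raises instead of being kept
    indefinitely by default.
    """
    now = now or datetime.now(timezone.utc)
    kept: list[Record] = []
    purged = 0
    for record in records:
        limit = RETENTION.get(record.category)
        if limit is None:
            raise KeyError(f"no retention period defined for {record.category!r}")
        if now - record.created_at > limit:
            purged += 1
        else:
            kept.append(record)
    return kept, purged
```

Raising on an unknown category is a deliberate choice: it forces every new category of personal data to get a retention period before it can be stored alongside the others.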

Integrity and confidentiality map directly onto security engineering — encryption at rest and in transit, access controls, audit logging, vulnerability management. This principle is where GDPR and secure software development most obviously overlap.

Accountability is the one that tends to surprise teams. It requires you to be able to demonstrate compliance, not just achieve it. That means maintaining Records of Processing Activities (RoPA), documenting design decisions that have privacy implications, and building systems that can produce audit trails on demand. The ability to answer “what data do you hold about this user, and why?” needs to be a feature of the system, not a manual investigation.

What Privacy by Design Actually Means

2.1 Privacy by Design Is Article 25 of GDPR

Privacy by Design predates GDPR by decades – it originated as a framework in the 1990s. Article 25 of the Regulation gave it legal force. For engineers, the implication is direct: data protection must be built into systems by default, and “by default” has a specific technical meaning.

Article 25(2) requires that, without any action by the user, only the personal data necessary for each specific purpose is processed. That translates into concrete defaults: consent for non-essential processing must be opt-in, features that collect additional data should be disabled by default, and the minimum viable data set should be the starting point, not something you trim back later.

It is in these defaults that you will find the gap between compliance and non-compliance. A pre-ticked consent checkbox fails Article 25. An analytics integration that activates on first load, before any consent signal, fails Article 25. A user profile that stores a dozen fields because the form was designed that way – rather than because the product needs them – fails Article 25. All these are common patterns in production systems that regulators have repeatedly scrutinised.

2.2 The Privacy Threat Model

Security engineers routinely use threat modelling to map attack surfaces before writing code. The same approach applies to privacy. A privacy threat model asks: where does personal data enter the system, where does it move, and where does it rest – and what can go wrong at each point.

The practical starting point is a Data Flow Diagram (DFD). In a security context, DFDs help identify attack vectors. In a GDPR context, they serve an additional purpose by becoming a compliance artefact. A well-maintained DFD lets you answer – quickly and accurately – where personal data is processed, which third parties receive it, and whether transfers cross jurisdictional boundaries. Regulators increasingly expect this kind of documentation as evidence of accountability.

For structured privacy threat analysis, LINDDUN is the most mature framework available. Developed at KU Leuven and referenced by NIST, it works as STRIDE does for security – mapping threat categories onto system components – but focused on privacy.

The seven threat types it covers are:

  • Linking (combining data to identify individuals),
  • Identifying (singling out a person from a dataset),
  • Non-repudiation (preventing a user from denying an action),
  • Detecting (inferring sensitive information from observable behaviour),
  • Data Disclosure,
  • Unawareness (users lacking meaningful insight into processing),
  • and Non-compliance.

Teams already familiar with STRIDE will recognise the approach. LINDDUN GO, the lightweight version, is designed for use in sprint ceremonies and design reviews, making it practical for teams that need to move quickly without skipping privacy analysis entirely.

2.3 Architecture Patterns That Enable Compliance

Pseudonymisation and Anonymisation

These two terms are often used interchangeably, but they have distinct legal meanings under GDPR and different technical implications.

Pseudonymised data has had direct identifiers replaced with a reference — a user ID instead of a name, a token instead of an email address. The data is still personal data under GDPR, because re-identification is possible if the reference key is available. Pseudonymisation reduces risk and is explicitly encouraged by the Regulation as a technical safeguard, but it does not remove data from GDPR’s scope.
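A token vault is one common way to implement pseudonymisation. The sketch below is a minimal illustration under stated assumptions: the `TokenVault` class and its in-memory dicts are hypothetical, and in production the mapping would live in a separate, access-controlled store, because that mapping is exactly the additional information that keeps the data personal.

```python
import secrets
from typing import Optional


class TokenVault:
    """Replace direct identifiers with random tokens.

    The token-to-identifier mapping is what makes re-identification possible,
    so it must be stored and access-controlled separately from the data that
    carries the tokens. Plain dicts are used here for illustration only.
    """

    def __init__(self) -> None:
        self._by_value: dict[str, str] = {}
        self._by_token: dict[str, str] = {}

    def pseudonymise(self, identifier: str) -> str:
        # Reuse an existing token so one person always maps to one reference.
        if identifier in self._by_value:
            return self._by_value[identifier]
        token = "tok_" + secrets.token_hex(16)
        self._by_value[identifier] = token
        self._by_token[token] = identifier
        return token

    def reidentify(self, token: str) -> Optional[str]:
        return self._by_token.get(token)

    def forget(self, identifier: str) -> None:
        # Removing the mapping breaks the link between token and person.
        token = self._by_value.pop(identifier, None)
        if token is not None:
            del self._by_token[token]
```

Dropping the mapping does not by itself make the remaining records anonymous: other fields in them may still single a person out, which is why anonymisation is treated as a separate, harder claim.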

Anonymised data, by contrast, falls outside GDPR entirely — because the individual can no longer be identified from it, even indirectly. True anonymisation is technically difficult. Aggregation, noise injection, and k-anonymity techniques can achieve it, but datasets that appear anonymised have been re-identified in published research when combined with external data sources. If your system relies on anonymisation as a compliance strategy, that claim needs to be technically defensible, not assumed.
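Part of making an anonymisation claim defensible is measuring it. A minimal sketch of a k-anonymity check follows; the column names and the choice of quasi-identifiers are assumptions for the example. It reports k, the size of the smallest group of rows that share the same quasi-identifier values.

```python
from collections import Counter


def k_anonymity(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Return k: the size of the smallest group of rows sharing identical
    values for all quasi-identifiers. k == 1 means someone is unique."""
    if not rows:
        raise ValueError("empty dataset")
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


rows = [
    {"postcode": "100**", "age_band": "30-39", "diagnosis": "flu"},
    {"postcode": "100**", "age_band": "30-39", "diagnosis": "asthma"},
    {"postcode": "101**", "age_band": "40-49", "diagnosis": "flu"},
]
print(k_anonymity(rows, ["postcode", "age_band"]))  # 1: the third row is unique
```

A high k is necessary but not sufficient: if everyone in a group shares the same sensitive value, the group still discloses it, and linkage with external data sources can defeat generalisation that looked adequate in isolation.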

Separating Identity from Behavioural Data

A practical architectural pattern is to store identity data, such as name, email, and contact details, separately from behavioural and transactional data. Behavioural records reference a pseudonymous user ID rather than personally identifiable fields. This has two compliance benefits: it limits the blast radius of a breach (behavioural logs are far less sensitive without the identity layer attached), and it makes fulfilling subject access requests and erasure requests significantly simpler since you’ll be operating on a clean boundary rather than hunting through interleaved records.

Event-Sourced Architectures and the Right to Erasure

Event sourcing creates a specific tension with GDPR’s Article 17 right to erasure. The pattern is built on an immutable append-only log – every state change is a persisted event, and the current state is derived by replaying that history. Deleting an event breaks the log. If personal data is embedded in those events, a deletion request becomes architecturally problematic.

There are three established approaches to resolving this:

The first is to keep personal data out of the event store entirely. Events reference a pseudonymous identifier, and the identity data lives in a separate, mutable store. Erasure requests are fulfilled there, without touching the event log. This is the cleanest solution architecturally, though it requires deliberate event design from the start.

The second is cryptographic erasure. Personal data within events is encrypted with a per-user key. When an erasure request is received, the key is destroyed. The events remain in the log, but the personal data within them is unreadable and effectively inaccessible. This approach preserves full replayability while satisfying the practical intent of Article 17, though key management at scale requires careful infrastructure design.

The third is direct event mutation — overwriting or nullifying personal data fields in existing events. This works but sacrifices the immutability guarantees that make event sourcing valuable, and it requires all downstream consumers of those events to handle the change gracefully.

The right approach depends on the system’s replayability requirements and the sensitivity of the data involved. What matters from a compliance perspective is that a decision is made deliberately, before the event schema is defined — retrofitting any of these solutions onto an existing event store is substantially more complex than designing for it upfront.
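The cryptographic erasure approach can be sketched as follows: each user gets their own data key, personal fields in events are encrypted with it, and destroying the key makes those fields unrecoverable while the log itself stays intact. This is a minimal sketch under loud assumptions: `KeyStore` stands in for a KMS or vault, and the SHAKE-256 keystream is used only so the example runs on the Python standard library. It provides no integrity protection; a real system should use an authenticated cipher such as AES-GCM from a vetted library.

```python
import hashlib
import secrets
from typing import Optional


class KeyStore:
    """Per-user data keys. Stand-in for a KMS or vault, not a real one."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def key_for(self, user_id: str) -> bytes:
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def get(self, user_id: str) -> Optional[bytes]:
        return self._keys.get(user_id)

    def destroy(self, user_id: str) -> None:
        self._keys.pop(user_id, None)


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # ILLUSTRATION ONLY: SHAKE-256 as a keystream, no authentication.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_field(keys: KeyStore, user_id: str, value: str) -> tuple[bytes, bytes]:
    nonce = secrets.token_bytes(16)  # fresh nonce per encrypted value
    return nonce, _xor_keystream(keys.key_for(user_id), nonce, value.encode())


def decrypt_field(keys: KeyStore, user_id: str, nonce: bytes,
                  ciphertext: bytes) -> Optional[str]:
    key = keys.get(user_id)
    if key is None:
        return None  # key destroyed: the field is effectively erased
    return _xor_keystream(key, nonce, ciphertext).decode()
```

An event would then carry something like `{"type": "EmailChanged", "user_id": "u1", "email": (nonce, ciphertext)}` (a hypothetical shape); replaying it after `keys.destroy("u1")` yields `None` for the email instead of the address.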

GDPR Across the SDLC — Phase by Phase

3.1 Requirements & Planning Phase

Privacy requirements need to go into the backlog like any other requirement – and they need to be specific enough to build against. “Handle user data in compliance with GDPR” tells a developer nothing. “Users must be able to download all their personal data in JSON format within 30 days of requesting it” is something you can build, test, and ship. Write privacy requirements the same way you write functional ones: concrete, testable, and assigned to someone.

The other thing to sort out early is whether your project needs a Data Protection Impact Assessment (DPIA). A DPIA is a formal risk assessment required by Article 35 when a feature or system is likely to result in a high risk to people’s rights and freedoms. Typical triggers include systematic profiling of users, large-scale processing of health or biometric data, and automated decisions that significantly affect people. The decision on whether a DPIA is needed should be made at project kick-off, not after the architecture is already built.

In an Agile environment, a DPIA doesn’t have to be a heavyweight process. At its core, it’s a structured conversation around a few questions: what data are we collecting, why, what could go wrong, and how are we reducing that risk? For most features, a few hours of focused discussion is enough. Document the outcome, get sign-off from your data protection officer or legal team, and move on. That documentation becomes genuinely useful later when an auditor asks how a decision was made; a dated DPIA with named contributors is a direct, credible answer.

What to record at this stage: the legal basis for each type of data processing, any DPIAs you’ve completed, and your privacy requirements as user stories. This forms the start of your Records of Processing Activities (RoPA) and shows that privacy was considered before anyone wrote a line of code.
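Keeping RoPA entries as structured data makes them reviewable and checkable in code. Article 30(1) lists what a controller’s record must contain; the sketch below mirrors most of those items (it omits the controller and DPO contact details) and adds the legal basis and a DPIA reference, as suggested above. The field names and the completeness rule in `missing_fields` are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RopaEntry:
    # Loosely follows the contents listed in Article 30(1) for controllers.
    activity: str
    purposes: list[str]
    legal_basis: str
    data_subject_categories: list[str]
    personal_data_categories: list[str]
    recipients: list[str]
    third_country_transfers: list[str]
    retention: str
    security_measures: list[str]
    dpia_reference: Optional[str] = None  # hypothetical link to a DPIA record


def missing_fields(entry: RopaEntry) -> list[str]:
    """Names of required fields left empty. An empty transfers list is
    legitimate (no transfers), and the DPIA reference is optional."""
    optional = {"third_country_transfers", "dpia_reference"}
    return [f.name for f in fields(entry)
            if f.name not in optional and not getattr(entry, f.name)]
```

A check like this could run in CI against the repository where the records are kept, so an entry with no retention period fails review instead of waiting for an audit.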

3.2 Design & Architecture Phase

Schema Design for Data Minimisation

The simplest way to think about data minimisation at the schema level is: only add a column if you can clearly answer why you need it, how long you’ll keep it, and who should be able to read it. If you can’t answer all three, the column shouldn’t exist yet.

In practice, schemas drift. Tables start lean and accumulate fields over time because adding a column is easy and nobody questions it. A 'metadata' JSONB field is the most common version of this problem: it becomes a catch-all for whatever was convenient to store, and it’s impossible to apply retention rules or access controls to data you haven’t explicitly defined. Name your PII fields clearly, type them properly, and keep them out of generic containers.

A simple way to enforce this is to add a comment to every migration that touches personal data, declaring what the field is for and when it should be deleted. It costs almost nothing and gives you an auditable record of intent.
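The migration comment can be made enforceable with a small lint step. The sketch below assumes a hypothetical convention: the line before every `ADD COLUMN` must read either `-- pii: none` or `-- pii: purpose=...; retention=...; readers=...`, answering the three questions above. The convention and `check_migration` are illustrative, not an existing tool.

```python
import re

ADD_COLUMN = re.compile(r"\bADD\s+COLUMN\s+(\w+)", re.IGNORECASE)
PII_COMMENT = re.compile(r"--\s*pii:\s*(.+)$", re.IGNORECASE)
REQUIRED_KEYS = {"purpose", "retention", "readers"}


def check_migration(sql: str) -> list[str]:
    """Return problems found in a migration under the '-- pii:' convention."""
    problems = []
    lines = sql.splitlines()
    for i, line in enumerate(lines):
        match = ADD_COLUMN.search(line)
        if not match:
            continue
        column = match.group(1)
        previous = lines[i - 1].strip() if i > 0 else ""
        comment = PII_COMMENT.match(previous)
        if comment is None:
            problems.append(f"{column}: missing '-- pii:' declaration")
            continue
        body = comment.group(1).strip()
        if body.lower() == "none":
            continue  # explicitly declared as not personal data
        keys = {part.split("=", 1)[0].strip().lower()
                for part in body.split(";") if "=" in part}
        for key in sorted(REQUIRED_KEYS - keys):
            problems.append(f"{column}: '-- pii:' declaration lacks {key}")
    return problems
```

Requiring a declaration on every new column, including `pii: none`, makes the decision explicit and reviewable instead of letting unannotated columns through by default.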

Designing for User Rights

Five of the rights GDPR gives users translate most directly into engineering requirements: the right to access their data, correct it, delete it, export it, and restrict how it’s processed. Designing for all five from the start is much cheaper than adding them later.

Access and portability mean your system needs to produce a complete export of everything held about a specific user, across every service and data store. In a single application with one database, this is a matter of writing the right queries, even if they are complex. In a microservices architecture, it requires a coordination layer that knows where personal data lives. That layer is far easier to build before you have a dozen services than after.

Rectification sounds simple, but it has real distributed systems implications. If a user corrects their email address in their profile, that correction needs to reach every place that email address lives – the database, the cache, the search index, the analytics warehouse. Without deliberate design, it often doesn’t.

Erasure means actually deleting the data, not just setting a 'deleted_at' flag. Soft deletes are fine for operational purposes, but they don’t satisfy Article 17 unless the personal data is genuinely removed or rendered unreadable at some point. Decide upfront what “deleted” means across every system that touches a user’s data, including third-party integrations.

Restriction of processing means the system can pause all processing of a user’s data without deleting it, like when a dispute is being resolved. This needs to be a state your system understands and that downstream services respect.

Consent Data Model

Consent is not a boolean. A proper consent record needs to capture:

  • what specifically the user agreed to,
  • when they agreed,
  • which version of your privacy notice was live at the time, and
  • whether they’ve since withdrawn consent.

It also needs to be versioned – when your privacy notice changes, you need to know which users consented under which version.

More importantly, consent needs to be tied to specific processing activities. A user who consents to receiving order confirmation emails hasn’t consented to being profiled for advertising. If your data model doesn’t make that distinction, you’ll eventually process data without a valid legal basis for it.
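One way to model consent along these lines is an append-only log of decisions per user and purpose, from which the current state is derived. The sketch below is a minimal illustration; `ConsentEvent`, the purpose strings, and the helper functions are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str            # a specific processing activity, never "all"
    granted: bool           # False records a withdrawal
    notice_version: str     # privacy notice version shown at the time
    recorded_at: datetime


def has_consent(events: list[ConsentEvent], user_id: str, purpose: str,
                at: datetime) -> bool:
    """True if the user's latest decision for this purpose, as of `at`, was a grant."""
    relevant = [e for e in events
                if e.user_id == user_id and e.purpose == purpose
                and e.recorded_at <= at]
    if not relevant:
        return False  # no record means no consent
    return max(relevant, key=lambda e: e.recorded_at).granted


def users_granted_under(events: list[ConsentEvent], notice_version: str) -> set[str]:
    """Users who gave consent while a given privacy notice version was live."""
    return {e.user_id for e in events if e.granted and e.notice_version == notice_version}
```

Keeping events rather than a mutable flag preserves the history an auditor may ask for, and passing an explicit timestamp lets you answer whether consent was valid when a given piece of processing happened.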

3.3 Development Phase

Encryption

TLS 1.3 for data moving between systems and AES-256 for data stored on disk are today’s usual baselines. GDPR does not name algorithms, but Article 32 asks for security appropriate to the risk, taking the state of the art into account, and these are what that currently points to. The detail that actually matters is key management. Storing your encryption keys in the same database as the encrypted data defeats the purpose: if someone gets access to the database, they have everything. Keys should live in a dedicated secrets manager (AWS KMS, HashiCorp Vault, GCP Cloud KMS) that is separate from your data layer, with its own access controls and audit log.

Hardcoded secrets in source code are both a security failure and a GDPR problem. If credentials end up in a repository and that repository is ever exposed, so is access to the personal data those credentials protect. Secret scanning tools (Gitleaks, TruffleHog, GitHub secret scanning) catch this at push or CI time, before it ships. They’re simple to set up and catch a class of mistakes that code review regularly misses.

Logging Pitfalls

Logging is one of the most common ways developers accidentally create GDPR violations. Request and response bodies logged for debugging often contain email addresses or user-supplied content. Error stack traces sometimes include function arguments with PII in them. Analytics events fire with user identifiers attached that were never meant to be there.

The fix is to treat logging as a privacy-sensitive decision, not an afterthought. Define what can and cannot be logged, and enforce it in code review. PII should be masked or excluded at the point of log generation — not filtered later downstream, where it’s already been written to disk and possibly shipped to a log aggregation service outside your control.

Practical rules: never log full request bodies from authenticated endpoints, never log raw auth tokens, and treat any user-supplied string as PII unless you know otherwise.
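As an example of masking at the point of log generation, the sketch below uses Python’s standard `logging.Filter`. The two regexes are simplified assumptions and will not catch every format of email address or token; exception tracebacks attached via `exc_info` are formatted separately and are not masked by this filter.

```python
import logging
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
BEARER = re.compile(r"(?i)bearer\s+[\w\-.~+/]+=*")


class PiiMaskingFilter(logging.Filter):
    """Mask email addresses and bearer tokens before a record is emitted.

    Attach it to handlers: handler filters see every record the handler
    processes, whereas logger filters skip records propagated from child loggers.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg % args into one string
        message = EMAIL.sub("[email]", message)
        message = BEARER.sub("Bearer [token]", message)
        record.msg, record.args = message, ()
        return True  # keep the record, now masked


handler = logging.StreamHandler()
handler.addFilter(PiiMaskingFilter())
logging.getLogger("app").addHandler(handler)
```

Because masking happens before the handler formats and writes the record, the raw value never reaches disk or a log shipper, which is the point made above about not relying on downstream filtering.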

Access Control on PII Fields

Not every service or team member needs access to every field containing personal data. During code review, it’s worth asking: does this query, API response, or service call actually need this field to do its job? If the answer is no, the field shouldn’t be there. Column-level permissions, row-level security, and scoped service-to-service authentication all help enforce this at the infrastructure level, rather than relying on developers to remember.
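At the application layer, the same idea can be expressed as a deny-by-default projection. This is a minimal sketch; the caller names and the `ALLOWED_FIELDS` mapping are hypothetical, and it complements database-level controls such as column privileges or row-level security rather than replacing them.

```python
# Hypothetical mapping from caller (role or service) to the fields it may read.
ALLOWED_FIELDS: dict[str, set[str]] = {
    "support_agent": {"user_id", "name", "email"},
    "billing_service": {"user_id", "billing_address"},
    "analytics_service": {"user_id"},
}


def project(record: dict, caller: str) -> dict:
    """Return only the fields the caller may see; unknown callers get nothing."""
    allowed = ALLOWED_FIELDS.get(caller, set())
    return {key: value for key, value in record.items() if key in allowed}


user = {"user_id": "u1", "name": "Alice", "email": "alice@example.com",
        "billing_address": "1 Example Street"}
print(project(user, "analytics_service"))  # {'user_id': 'u1'}
```

Defaulting to an empty set means a new service has to be granted fields explicitly, which gives code review a concrete change to question.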

3.4 Testing Phase

Using real user data in test and staging environments is very difficult to justify under GDPR: the data was rarely collected for that purpose, so copying it there runs into purpose limitation and data minimisation. Data protection authorities have noted it remains common for organisations to use actual personal data in development and testing environments, and these environments typically have weaker access controls, broader team access, and no formal breach process, making them the worst possible place for real personal data to live.

The practical alternatives are masked data and synthetic data. Masked data takes production records and replaces PII fields with realistic but fake values – preserving the structure and relationships of real data without the compliance risk. Synthetic data is generated from scratch, which is cleaner but requires more work to replicate complex data relationships accurately. Tools worth knowing: Presidio (Microsoft’s open-source PII detection and anonymisation library), Faker and Mimesis for generating synthetic records, and cloud-native options like AWS Glue DataBrew or Google Cloud DLP for masking pipelines that integrate with your existing infrastructure.
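Masking usually needs to be deterministic so that relationships survive: the same email in two tables should become the same fake email. A minimal standard-library sketch follows, using a keyed hash (HMAC-SHA256). The field names and name list are assumptions, and the `.invalid` domain is reserved (RFC 2606) so masked addresses can never be delivered. Generate the secret per masking run and discard it afterwards; if it is kept, the output is pseudonymised rather than anonymised.

```python
import hashlib
import hmac

FAKE_FIRST_NAMES = ["Alex", "Sam", "Jordan", "Robin", "Kim", "Charlie"]


def _digest(secret: bytes, value: str) -> bytes:
    return hmac.new(secret, value.encode(), hashlib.sha256).digest()


def mask_email(secret: bytes, email: str) -> str:
    # Same input and secret give the same fake address, so joins still work.
    return f"user_{_digest(secret, email.lower()).hex()[:12]}@example.invalid"


def mask_name(secret: bytes, name: str) -> str:
    # Realistic-looking but fake; collisions are acceptable for test data.
    return FAKE_FIRST_NAMES[_digest(secret, name)[0] % len(FAKE_FIRST_NAMES)]


def mask_row(secret: bytes, row: dict) -> dict:
    """Copy a row, replacing the PII fields this sketch knows about."""
    masked = dict(row)
    if "email" in masked:
        masked["email"] = mask_email(secret, masked["email"])
    if "name" in masked:
        masked["name"] = mask_name(secret, masked["name"])
    return masked
```

Only fields the function knows about are masked, so in practice it would be paired with a PII detector such as Presidio to find fields nobody listed.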

The testing gap that matters most, though, is privacy-specific test cases. Most test suites verify that features work correctly. They don’t verify that data subject rights are honoured correctly. Does the delete endpoint cascade to every related record, including soft-deleted rows, audit logs, and third-party systems? Does the data export actually include everything – not just the primary profile, but activity history, preferences, and inferred attributes? Does withdrawing consent actually stop the relevant processing, or does it only update a flag that nothing downstream reads?

These tests belong in your regression suite and should run on every build. Adding automated PII detection to your CI pipeline also helps stop production data from sneaking into the test suite through shortcuts taken under deadline pressure.

3.5 Deployment & CI/CD Phase

Every deployment should leave a record: what changed, when, who approved it, and what data processing activities are affected. GDPR’s accountability principle requires you to be able to demonstrate compliance at any point, and manual deployment notes in a shared doc don’t hold up under audit scrutiny. Immutable deployment logs generated by your CI/CD pipeline, stored separately from the systems being deployed, give you the evidence trail you need without extra manual effort.

PII scanning in the build pipeline catches something code review regularly misses: personal data that’s been committed directly to source code or config files. Database seed files with real email addresses, test fixtures captured from live environments, API response mocks containing actual user data — all of these appear in codebases more often than they should. Running a PII scanner as a CI gate means the issue is flagged before it ships.
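A deliberately small sketch of such a gate is shown below. It only looks for email addresses and ignores domains reserved for documentation (RFC 2606); the function names are hypothetical, and dedicated tools detect far more PII types than this regex does.

```python
import re
from pathlib import Path

EMAIL = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")
# Domains and TLDs reserved for examples and testing (RFC 2606).
RESERVED = ("example.com", "example.net", "example.org",
            "example", "test", "invalid", "localhost")


def find_emails(text: str) -> list[str]:
    """Return email addresses whose domain is not reserved for examples."""
    hits = []
    for match in EMAIL.finditer(text):
        domain = match.group(1).lower()
        if not any(domain == d or domain.endswith("." + d) for d in RESERVED):
            hits.append(match.group(0))
    return hits


def scan(paths: list[Path]) -> dict[str, list[str]]:
    findings = {}
    for path in paths:
        hits = find_emails(path.read_text(errors="ignore"))
        if hits:
            findings[str(path)] = hits
    return findings


def main(argv: list[str]) -> int:
    """CI entry point: print findings, return non-zero if any were found."""
    findings = scan([Path(arg) for arg in argv])
    for file, hits in findings.items():
        print(f"{file}: {len(hits)} possible email address(es)")
    return 1 if findings else 0
```

Wired into CI through a small wrapper that calls `sys.exit(main(sys.argv[1:]))` on the files in a change set, a non-zero return fails the build. It also nudges fixtures toward reserved domains such as `example.com`.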

Infrastructure-as-Code has its own compliance surface. A Terraform or CloudFormation template that provisions an S3 bucket without encryption, a database in the wrong region, or an IAM role with access to more data than it needs creates a GDPR problem at the infrastructure level. Compliance scanning tools — Checkov, tfsec, AWS Config rules — let you enforce encryption requirements, region restrictions, and access controls as part of the pipeline. Non-compliant infrastructure simply doesn’t get provisioned.

3.6 Operations & Maintenance Phase

The 72-Hour Breach Notification Rule

Article 33 requires you to notify your supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of a personal data breach, unless the breach is unlikely to result in a risk to individuals. The clock starts at awareness, which EDPB guidance describes as having a reasonable degree of certainty that a security incident has compromised personal data. It does not wait for the investigation to finish or for the breach to be fully understood. According to IBM’s 2025 breach report, the average organisation takes 181 days to detect a breach and another 60 days to contain it, a total of 241 days. That gap between when a breach actually starts and when it’s detected is where GDPR fines happen.

Meeting the 72-hour requirement operationally means having the right pieces in place before anything goes wrong: detection and alerting on anomalous data access, and a documented incident response runbook that defines what counts as a notifiable breach, who makes that call, what the notification to the supervisory authority must include, and when affected users also need to be told. None of this should be worked out for the first time during an active incident.
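Computing the deadline is trivial, but encoding it in the incident tooling avoids arguments mid-incident about when the clock started. A minimal sketch, assuming the awareness time is recorded as a timezone-aware timestamp; the function names are illustrative.

```python
from datetime import datetime, timedelta

# Calendar hours: weekends and public holidays are not excluded.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest time for the Article 33 notification, counted from awareness."""
    if aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp")
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    return (notification_deadline(aware_at) - now).total_seconds() / 3600
```

For example, awareness at 16:00 UTC on Friday 7 March 2025 gives a deadline of 16:00 UTC on Monday 10 March, which is why the runbook needs weekend cover.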

Retention Policy Enforcement

Defining retention periods is straightforward. Actually enforcing them consistently across every data store is where most teams fall short. Automated deletion jobs and TTLs need to be treated as production features — monitored, tested, and alerted on when they fail. A deletion job that silently stops running because a schema changed is a compliance failure, and it won’t surface until an audit or a data subject request reveals data that should have been deleted months ago.
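Alerting on deletion jobs can be as simple as checking when each one last succeeded. The sketch below assumes a hypothetical run history (`JobRun`) and a per-job expected interval; jobs that have never succeeded are reported too, so a job that was never deployed cannot go unnoticed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRun:
    job: str
    finished_at: datetime
    succeeded: bool


def stale_jobs(runs: list[JobRun], expected_every: dict[str, timedelta],
               now: datetime) -> list[str]:
    """Names of deletion jobs with no successful run within their expected interval."""
    last_success: dict[str, datetime] = {}
    for run in runs:
        if run.succeeded:
            previous = last_success.get(run.job)
            if previous is None or run.finished_at > previous:
                last_success[run.job] = run.finished_at
    return sorted(job for job, interval in expected_every.items()
                  if job not in last_success or now - last_success[job] > interval)
```

Checking only for success will not catch a job that runs cleanly but matches nothing after a schema change; recording the number of rows deleted per run and alerting on an unexpected run of zeros covers that case.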

Retention enforcement also needs to cover derived data. If you delete a user’s profile but their email address is still sitting in a marketing tool, a data warehouse, or an archived log file, the deletion is incomplete. A complete retention policy maps every place personal data lives — not just the primary database — and enforces deletion across all of them.

Handling Data Subject Requests at Scale

GDPR gives you one month to respond to most data subject requests — access, erasure, correction, and portability. At low user volumes, handling these manually is manageable. At scale, it breaks down: requests get missed, deadlines slip, and producing a complete subject access response across multiple services becomes expensive and slow.

The solution is to treat DSR handling as a first-class operational capability. That means a tracked intake process, a workflow that routes requests to the right teams and systems, and enough automation that fulfilling a deletion or producing a data export doesn’t require manually contacting every team that owns a data store. Tools like DataGrail, Transcend, and OneTrust are built for this. But the tooling only works if you know where personal data lives — if your data map is incomplete, no tool can reliably retrieve or delete it.
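Whatever tool handles intake, each request needs a due date the tracker can alert on. A minimal sketch, assuming the common reading that "one month" means the corresponding calendar date in the following month, or that month’s last day if no such date exists; Article 12(3) allows a two-month extension for complex or numerous requests, provided the requester is told within the first month. `SubjectRequest` and the helpers are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def add_months(start: date, months: int) -> date:
    """Same calendar day `months` later, clamped to the end of shorter months."""
    index = start.month - 1 + months
    year, month = start.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


@dataclass
class SubjectRequest:
    request_id: str
    kind: str               # "access", "erasure", "rectification", "portability"
    received: date
    extended: bool = False  # Article 12(3) extension, notified within month one

    @property
    def due(self) -> date:
        return add_months(self.received, 3 if self.extended else 1)


def overdue(requests: list[SubjectRequest], today: date) -> list[str]:
    return [r.request_id for r in requests if today > r.due]
```

A request received on 31 January 2025 is due on 28 February 2025, which is exactly the kind of edge case a manual process gets wrong.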

The User Rights APIs — What the Law Actually Requires

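GDPR gives individuals a number of rights over their personal data. The four covered here (access, erasure, rectification, and portability) share a one-month response deadline, which Article 12(3) allows to be extended by two further months for complex or numerous requests. And each one, when you look at it closely, is a set of engineering requirements. This section walks through what each right actually means in code.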

Right of Access

The right of access means a user can ask: “What data do you hold about me?” And you have to be able to answer that question completely, within one month.

In a simple application with a single database, this is a query, even if an involved one. In a system with multiple services, including a user profile service, an activity service, a notifications service, and a billing service, it requires a coordination layer that can reach out to every service, collect what each one holds about a specific user, and stitch it together into a single coherent response.

The endpoint itself is straightforward to define: GET /me/data-export. What’s harder is making sure it’s complete. Common gaps include data sitting in caches that never gets returned, analytics events stored in a separate warehouse that nobody thinks to include, and third-party integrations that hold a copy of user data you’ve sent them. A useful exercise is to work backwards: starting from a new user account, walk through every action a user can take in your product and ask where data gets written at each step. That map tells you what the export endpoint needs to cover.

The response format should be machine-readable. JSON is the most common choice. Structure it so a non-technical user can understand what they’re looking at — labelled fields, human-readable timestamps, no internal IDs without context.
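The coordination layer behind such an endpoint can be sketched as a fan-out over per-service exporters. This is a minimal illustration under stated assumptions: each service is represented by a hypothetical callable returning that user’s records, and a real implementation would call service APIs asynchronously and handle pagination.

```python
import json
from datetime import datetime, timezone
from typing import Callable

# Each data-holding service registers an exporter: user_id -> list of records.
Exporter = Callable[[str], list[dict]]


def build_export(user_id: str, exporters: dict[str, Exporter]) -> str:
    """Collect every service's records for one user into one JSON document.

    A failing service is listed under "incomplete" rather than silently
    dropped, so a partial answer is visible and can be retried.
    """
    sections: dict[str, list[dict]] = {}
    incomplete: list[str] = []
    for service, export in exporters.items():
        try:
            sections[service] = export(user_id)
        except Exception:  # broad on purpose: any failure makes the export incomplete
            incomplete.append(service)
    document = {
        "generated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "data": sections,
        "incomplete": incomplete,
    }
    # default=str renders datetimes and similar values as readable strings.
    return json.dumps(document, indent=2, default=str)
```

Whether an incomplete export is retried before anything is sent to the user, or sent with an explanation, is a policy decision; what matters is that the gap is recorded rather than hidden.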

Right to Erasure

The right to erasure (Article 17) means a user can ask you to delete their data, and in most cases, you have to do it. The one-month deadline applies here, too.

The hardest part is the scope. Most teams think of erasure in terms of the primary database — delete the user row, done. But personal data tends to spread. It lives in:

  • The main application database
  • Redis or Memcached caches
  • Search indexes (Elasticsearch, Algolia)
  • Analytics warehouses (BigQuery, Redshift, Snowflake)
  • Log files and log aggregation tools
  • Email marketing and CRM tools
  • Any third-party integration you’ve sent user data to
  • Backups

Backups are a particular challenge. GDPR does not specifically address backups in the context of the right to erasure, yet users who request deletion reasonably expect backup copies to be removed too. The practical approach most teams adopt is to delete from all live systems immediately, then either restore and re-snapshot backups with the data removed or, more commonly, document a policy that backup snapshots containing the data will be purged on their normal rotation schedule. If you take this approach, make sure your backup retention periods are short enough for this to be credible, and that any restore from backup re-applies pending deletions before the data goes back into use.

The European Data Protection Board’s 2026 coordinated enforcement report identified seven recurring challenges that regulators observed in how organisations implement erasure, including reliance on inefficient anonymisation techniques as a substitute for actual deletion, and a lack of internal procedures for handling requests consistently. Both of these are engineering problems, not just process ones.

A few implementation details worth getting right: erasure should be a hard delete on personal data fields, not just a 'deleted_at' flag. If you need to retain a record for operational reasons (audit trail, accounting), you can keep the row, but the personal data fields within it should be nulled or overwritten. And when you send an erasure request to a third-party processor, log that you did it and when. That’s your evidence of compliance if it’s ever challenged.
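The three rules above (hard delete, null PII in retained rows, log third-party requests) can be combined into one orchestration step that reports on every store. A minimal sketch follows; the hook callables are hypothetical stand-ins for your own per-system implementations, and a real version would also persist the report and retry failures.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

Hook = Callable[[str], None]


@dataclass
class ErasureReport:
    user_id: str
    deleted_from: list[str] = field(default_factory=list)
    pii_nulled_in: list[str] = field(default_factory=list)
    third_party_requests: list[tuple[str, str]] = field(default_factory=list)  # (vendor, UTC time)
    failures: list[str] = field(default_factory=list)


def erase_user(user_id: str, deleters: dict[str, Hook],
               retained: dict[str, Hook], vendors: dict[str, Hook]) -> ErasureReport:
    """Run every erasure step and record the outcome of each.

    deleters: hard-delete the user's rows (database, cache, search index).
    retained: null or overwrite PII fields in rows kept for audit or accounting.
    vendors:  send an erasure request to a third-party processor.
    """
    report = ErasureReport(user_id)
    for hooks, done in ((deleters, report.deleted_from), (retained, report.pii_nulled_in)):
        for name, hook in hooks.items():
            try:
                hook(user_id)
                done.append(name)
            except Exception:
                report.failures.append(name)
    for vendor, send in vendors.items():
        try:
            send(user_id)
            sent_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
            report.third_party_requests.append((vendor, sent_at))
        except Exception:
            report.failures.append(vendor)
    return report
```

A non-empty `failures` list means the request is not yet fulfilled, which is the signal a DSR workflow needs before it can close the ticket.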

Right to Rectification

The right to rectification (Article 16) means users can have inaccurate data about themselves corrected, and you have one month to make the update.

The engineering challenge is consistency. If a user changes their email address, that change needs to reach everywhere the old email address lives. In a monolith with a single database, this is a query. In a distributed system, it’s a propagation problem — and if you haven’t designed for it, data gets out of sync.

The places a corrected field typically needs to reach:

  • The primary database (straightforward)
  • Application-level caches — Redis keys built from the old value need invalidating
  • Search indexes — documents need re-indexing with the updated value
  • Analytics and data warehouses — historical records referencing the old value
  • Downstream services that received the data via event or API call

The cleanest architectural solution is to treat identity data as a single source of truth — one service owns it, everything else references it by a user ID rather than copying the raw field. When the value changes in the source of truth, consumers re-fetch it. This is easier to design in from the start than to retrofit into a system where personal data has been duplicated freely across services.
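A minimal in-process sketch of the single-source-of-truth pattern. `IdentityService`, `EventBus` and `ProfileCache` are hypothetical stand-ins for an owning service, a message broker and a Redis-style cache. The event carries only the user ID and the field name, so neither the old nor the new value gets copied into event logs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class UserFieldChanged:
    user_id: str
    field: str  # which field changed; the value itself is deliberately not included


class EventBus:
    """Stand-in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self._subscribers[type(event)]:
            handler(event)


class IdentityService:
    """The only component that stores raw identity fields."""

    def __init__(self, bus: EventBus) -> None:
        self._users: Dict[str, Dict[str, str]] = {}
        self._bus = bus

    def get(self, user_id: str) -> Dict[str, str]:
        return dict(self._users[user_id])

    def update(self, user_id: str, field: str, value: str) -> None:
        self._users.setdefault(user_id, {})[field] = value
        self._bus.publish(UserFieldChanged(user_id, field))


class ProfileCache:
    """Consumer cache keyed by user ID, never by email or another personal field."""

    def __init__(self, identity: IdentityService, bus: EventBus) -> None:
        self._identity = identity
        self._entries: Dict[str, Dict[str, str]] = {}
        bus.subscribe(UserFieldChanged, self._invalidate)

    def get(self, user_id: str) -> Dict[str, str]:
        if user_id not in self._entries:
            self._entries[user_id] = self._identity.get(user_id)  # re-fetch from the owner
        return self._entries[user_id]

    def _invalidate(self, event: UserFieldChanged) -> None:
        self._entries.pop(event.user_id, None)
```

Search indexers and downstream services would subscribe the same way and re-index or re-fetch, rather than each holding its own copy of the corrected value.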

Right to Portability

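The right to portability (Article 20) is related to the right of access but distinct. It applies where processing is based on consent or contract and carried out by automated means, and it covers personal data the user has provided to you, which regulatory guidance reads as including both what they actively submitted and what you observed from their use of the service, but not data you have inferred or derived about them. The requirement is to export that data in a structured, commonly used, machine-readable format so they can take it somewhere else if they choose.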

In practice: JSON or CSV, cleanly structured, covering the data the user actually gave you — profile information, content they’ve created, preferences they’ve set. Inferred attributes (like a predicted interest category your ML model generated) are not covered by portability, though they may be covered by access.

The design considerations: the export should be downloadable as a file, not just displayed in the UI. It should be generated on demand and made available securely — either via a download link with a short expiry or a direct file download from an authenticated endpoint. For large data sets, generating the export asynchronously and notifying the user when it’s ready is the right pattern.
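A minimal sketch of the export and the short-lived link, assuming a per-user record held as a dictionary and a hypothetical `PORTABLE_FIELDS` classification. The expiring link uses an HMAC over the export ID and expiry time; the parameter names and the 15-minute default are illustrative choices, not a standard.

```python
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional

# Hypothetical classification of stored fields that count as provided by the user.
# Anything not listed (for example a model-inferred segment) is left out.
PORTABLE_FIELDS = {"name", "email", "preferences", "posts"}


def build_export(record: Dict[str, Any]) -> str:
    """Serialise only the portable fields as a structured JSON document."""
    portable = {key: value for key, value in record.items() if key in PORTABLE_FIELDS}
    return json.dumps({"format_version": 1, "data": portable}, indent=2, sort_keys=True)


def _signature(secret: bytes, export_id: str, expires: int) -> str:
    message = f"{export_id}:{expires}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_download(secret: bytes, export_id: str, ttl_seconds: int = 900,
                  now: Optional[float] = None) -> str:
    """Build query parameters for a download link that expires after ttl_seconds."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    return f"export_id={export_id}&expires={expires}&sig={_signature(secret, export_id, expires)}"


def verify_download(secret: bytes, export_id: str, expires: int, sig: str,
                    now: Optional[float] = None) -> bool:
    """Accept the link only if it has not expired and the signature matches."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(_signature(secret, export_id, expires), sig)
```

In an asynchronous setup, a background job would call `build_export`, store the file, and email the user a link built with `sign_download`; the download endpoint calls `verify_download` and still checks that the requester is authenticated as the same user.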

Right to Object and Right to Restriction

These two rights are less commonly discussed but have real implementation requirements.

The right to object (Article 21) lets a user say: “Stop using my data for this specific purpose.” The most common trigger is objecting to processing based on legitimate interest, or objecting to direct marketing. When a user exercises this right, you stop that processing — and you have to be able to demonstrate that you did.

The right to restriction (Article 18) is narrower. It lets a user freeze processing of their data without requesting deletion — for example, while they dispute its accuracy. The data stays, but it can’t be actively processed until the restriction is lifted.

Both of these require a processing state attached to the user record. A simple boolean flag per processing activity, such as “marketing_enabled”, “profiling_enabled” and “processing_restricted”, covers the most common cases. The harder part is making sure every system that processes that user’s data actually reads and respects those flags. A flag that sits in the user profile service but never gets checked by the email marketing pipeline or the recommendation engine isn’t compliance; it’s just a record that you tried.

The practical approach is to build flag checks into the service layer that handles each processing activity, and to include processing state in the data you emit to downstream consumers via events. That way, when a restriction is applied, it propagates naturally through the system rather than requiring every service to poll for updates.
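A minimal sketch of both halves: a guard called inside the service that does the processing, and an event payload that carries the flags downstream. The flag names follow the ones above; `ACTIVITY_FLAGS`, the activity names and the `send` callback are hypothetical. Unmapped activities fail closed, and the defaults are conservative.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ProcessingState:
    marketing_enabled: bool = False
    profiling_enabled: bool = False
    processing_restricted: bool = False


class ProcessingBlocked(Exception):
    """Raised when a processing activity is not permitted for this user."""


# Hypothetical mapping from each processing activity to the flag that governs it.
ACTIVITY_FLAGS = {
    "marketing_email": "marketing_enabled",
    "recommendations": "profiling_enabled",
}


def require_allowed(state: ProcessingState, activity: str) -> None:
    if activity not in ACTIVITY_FLAGS:
        raise ProcessingBlocked(f"unmapped activity {activity!r}")  # fail closed
    if state.processing_restricted or not getattr(state, ACTIVITY_FLAGS[activity]):
        raise ProcessingBlocked(f"{activity} is not permitted for this user")


def send_marketing_email(user_id: str, state: ProcessingState,
                         send: Callable[[str], None]) -> None:
    require_allowed(state, "marketing_email")  # checked where the processing happens
    send(user_id)


def state_changed_event(user_id: str, state: ProcessingState) -> Dict[str, Any]:
    """Event payload so downstream consumers receive the flags with the change."""
    return {"type": "processing_state_changed", "user_id": user_id, "state": asdict(state)}
```

Raising an exception rather than returning a boolean makes a forgotten check harder to ignore, and the exception can be counted as evidence that the restriction was actually honoured.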

A Note on the One-Month Deadline

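Every right covered in this section carries a one-month response deadline. For simple requests in a well-designed system, that’s plenty of time. For complex requests in a system where personal data is spread across many services, that deadline can become tight, especially if fulfilling the request requires manual steps or coordination across multiple teams. Article 12(3) allows an extension of up to two further months for complex or numerous requests, but you must tell the user about the extension, with reasons, within the first month.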

The deadline runs from the date the request is received, not from when your team gets around to reviewing it. Building a tracked intake process, even a simple ticket queue, ensures requests don’t get lost, deadlines are visible, and you have an audit trail showing when each request was received and completed. At scale, dedicated DSR (data subject request) tooling makes this more manageable, but the process matters more than the tool.
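A minimal sketch of the due-date calculation for a tracked request. It assumes one common reading of “one month” (the approach the UK ICO describes, for instance): the corresponding calendar date in the following month, or that month’s last day when no such date exists. It ignores weekends, public holidays and the possible two-month extension, and `DataSubjectRequest` is a hypothetical record type.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def one_month_after(received: date) -> date:
    """Same day number in the next month, clamped to that month's last day."""
    year = received.year + received.month // 12  # December rolls into next year
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


@dataclass
class DataSubjectRequest:
    request_id: str
    kind: str                        # "access", "erasure", "rectification", ...
    received: date                   # the clock starts here, not at triage
    completed: Optional[date] = None

    @property
    def due(self) -> date:
        return one_month_after(self.received)

    def is_overdue(self, today: date) -> bool:
        return self.completed is None and today > self.due
```

A request received on 31 January is due on the last day of February, which is the edge case a naive “add 30 days” calculation gets wrong.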

Third-Party Dependencies

Every library you install, every API you call, and every cloud service you deploy to extends your compliance surface. Under GDPR, you remain responsible for what happens to personal data even after it leaves your system. If a vendor causes a breach or mishandles data, the regulatory consequence, in many cases, falls on you as the data controller.

This is the core asymmetry in third-party risk: your vendor’s failure largely becomes your compliance liability. Regulators don’t distinguish between breaches that originated in your network and those that originated in a vendor’s system; what matters is that personal data was compromised while under your stewardship. The Snowflake incident in 2024 made this concrete: organisations whose data was exposed through their Snowflake-hosted environments faced their own GDPR breach notification obligations, regardless of where the breach started.

Auditing Your Dependencies for Data Collection Behaviour

Most developers review third-party packages for functionality and security vulnerabilities. Fewer review them for data collection behaviour — what personal data they collect, where they send it, and under what terms.

For npm, pip, or Maven dependencies, the starting point is understanding what network calls a package makes. Tools like Socket (socket.dev) flag npm packages that make unexpected network requests. For browser-facing JavaScript, browser developer tools and intercepting proxies like mitmproxy let you inspect what data an SDK sends on initialisation. An analytics SDK that fires on page load before any consent signal has been received is a GDPR violation, regardless of what the vendor’s documentation says it does.

Questions worth asking before integrating any third-party tool:

  • What personal data does this tool collect, and can I control what it receives?
  • Where is that data processed and stored geographically?
  • Does the vendor offer a Data Processing Agreement?
  • Does the tool respect consent signals — specifically, does it have a mode that disables data collection until the user has opted in?
  • What happens to data I’ve sent them if I terminate the contract?
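When a tool lacks a reliable consent mode, one option is to gate it yourself. A minimal sketch, assuming you can wrap the vendor client’s send function; `ConsentState` and `ConsentGatedClient` are hypothetical names. In a browser the equivalent is not loading the vendor script at all until consent for its purpose exists.

```python
from typing import Any, Callable, Dict, Set


class ConsentState:
    """Purposes a user has opted into. Empty by default: nothing runs until opt-in."""

    def __init__(self) -> None:
        self._granted: Set[str] = set()

    def grant(self, purpose: str) -> None:
        self._granted.add(purpose)

    def withdraw(self, purpose: str) -> None:
        self._granted.discard(purpose)

    def allows(self, purpose: str) -> bool:
        return purpose in self._granted


class ConsentGatedClient:
    """Wraps a third-party client's send function so it only runs with consent."""

    def __init__(self, consent: ConsentState, purpose: str,
                 send: Callable[[Dict[str, Any]], None]) -> None:
        self._consent = consent
        self._purpose = purpose
        self._send = send

    def track(self, event: Dict[str, Any]) -> bool:
        if not self._consent.allows(self._purpose):
            return False  # dropped, not queued for later
        self._send(event)
        return True
```

Events raised before consent are discarded rather than buffered; replaying them after opt-in would mean collecting data before the user agreed.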

That last question matters. In December 2025, CNIL fined a marketing technology processor €1 million for retaining personal data belonging to 46.9 million users after its contract ended — and for processing data outside the controller’s instructions. The vendor’s contract had expired. The data hadn’t been deleted. The fine went to the processor, but the controller’s failure to enforce data return and deletion obligations was part of the investigative record.

What a Data Processing Agreement Must Cover

A Data Processing Agreement (DPA) is the contract that governs what a processor can do with personal data you share with them. Under GDPR Article 28, you are required to have one in place with every vendor that processes personal data on your behalf. Most major SaaS tools and cloud providers offer standard DPAs — but signing one isn’t the same as reading it.

A DPA that actually protects you needs to cover:

  • The subject matter and duration of the processing — what data, for what purpose, for how long
  • Instructions for processing — the processor must only act on your documented instructions, not use the data for their own purposes
  • Confidentiality obligations — people with access to the data must be bound by confidentiality
  • Security measures — what technical and organisational measures the processor has in place
  • Sub-processor rules — whether they can engage further sub-processors, and your right to object
  • Data subject rights — the processor must help you fulfil access, erasure, and portability requests
  • Deletion or return of data at the end of the contract
  • Audit rights — your right to verify compliance

The sub-processor clause deserves particular attention. When you sign a DPA with a vendor, you’re also inheriting their sub-processors — the cloud infrastructure, logging tools, and analytics platforms they use. Foundation model providers often rely on cloud infrastructure from other vendors, creating chains of sub-processing that may span multiple countries. Documenting these relationships and ensuring adequate safeguards across the entire chain requires vendor transparency that contracts don’t always provide. Ask for the sub-processor list, and check whether it includes any transfers outside the EU/EEA.

Cloud Provider Data Residency

When you deploy to a cloud provider, the region you choose determines where personal data is physically stored and processed. Under GDPR, transferring personal data outside the EU/EEA requires either an adequacy decision, Standard Contractual Clauses (SCCs), or another recognised transfer mechanism. Choosing the wrong region — or letting auto-scaling spin up instances in an unintended region — can create a transfer compliance problem without anyone noticing.

Enforcing data residency in practice means making it a constraint in your infrastructure configuration, not a convention that relies on people remembering. In Terraform or CloudFormation, hard-code the region. Use cloud provider policies, such as AWS Service Control Policies or Google Cloud Organization Policies, to prevent resources from being created outside approved regions. Some cloud providers offer region-locked services specifically for EU data; use them where they’re available.
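As an illustration, the sketch below generates an AWS Service Control Policy that denies requests outside approved regions, using the `aws:RequestedRegion` condition key. Global services have to be exempted via `NotAction`; the short exemption list here is illustrative and incomplete, so take the current list from AWS’s own example before attaching anything to an organisation.

```python
import json
from typing import Iterable, List

# Global services exempted from the region deny. Illustrative only, not complete.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "sts:*", "route53:*",
                          "cloudfront:*", "support:*"]


def region_deny_scp(allowed_regions: Iterable[str]) -> dict:
    """Build an SCP document denying regional actions outside the allowed regions."""
    regions: List[str] = sorted(set(allowed_regions))
    if not regions:
        raise ValueError("at least one allowed region is required")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideApprovedRegions",
            "Effect": "Deny",
            "NotAction": GLOBAL_SERVICE_ACTIONS,
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:RequestedRegion": regions}},
        }],
    }


if __name__ == "__main__":
    print(json.dumps(region_deny_scp(["eu-west-1", "eu-central-1"]), indent=2))
```

Generating the document from a single list of approved regions keeps the policy, the Terraform provider configuration and any review checklist in agreement.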

The EU-US Data Privacy Framework (adopted in 2023) provides a legal basis for transfers to certified US organisations. The General Court’s September 2025 judgment upheld its adequacy decision, a firmer position than either of its predecessors ever reached. But Safe Harbor and Privacy Shield were both invalidated by the Court of Justice, and relying on the framework as your sole safeguard carries that historical risk. SCCs remain the belt-and-suspenders approach for US transfers: implement them alongside the framework, and document a Transfer Impact Assessment for sensitive data categories.

The AI/ML Vendor Problem

Sending user data to an external model API is a data processing activity under GDPR, and most teams don’t treat it that way. Every API call that routes personal data to an LLM provider is a potential compliance event: a cross-border transfer, a sub-processing relationship, and a question of what happens to that data after the call completes.

Using a third-party model API, whether OpenAI, Anthropic, Google, or any other provider, means sharing user data with a processor. You need a DPA with each vendor. You remain liable for their compliance failures.

The specific questions to resolve before sending user data to an external model API:

Data retention by the provider. How long does the provider retain prompt data? Can you configure zero-retention mode? If a user exercises their right to erasure, can you guarantee their data is deleted from the provider’s systems too? Several major providers exclude API data from training by default and offer zero- or reduced-retention options, but those retention controls are usually opt-in rather than the default, and the terms differ by tier.

Where inference happens. A real-time API call is a data transfer to wherever the model runs. If that’s a US data centre and you’re processing EU personal data, you need a valid transfer mechanism in place. Some providers offer EU-region endpoints — use them where available, and verify rather than assume.

What goes into the prompt? The most controllable variable is what you actually send. Strip or pseudonymise personal data before it reaches the API, where possible. If a user’s support ticket contains their name and account number, you often don’t need to include both to get a useful model response. Minimising what enters the prompt reduces the blast radius of any future compliance question about that call.
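A minimal sketch of pseudonymising a prompt before the call and restoring values in the response locally. The two regular expressions (email addresses and account numbers shaped like `ACC-123456`) are hypothetical stand-ins; names and other free-text identifiers need a proper PII detector. The placeholder mapping never leaves your system.

```python
import re
from typing import Dict, Tuple

# Hypothetical patterns for this example only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ACCOUNT": re.compile(r"\bACC-\d{6}\b"),
}


def pseudonymise(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap matches for stable placeholders; return the text and the placeholder map."""
    mapping: Dict[str, str] = {}    # placeholder -> original value
    assigned: Dict[str, str] = {}   # original value -> placeholder
    counters: Dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        def substitute(match: "re.Match[str]", label: str = label) -> str:
            value = match.group(0)
            if value not in assigned:
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"<{label}_{counters[label]}>"
                assigned[value] = placeholder
                mapping[placeholder] = value
            return assigned[value]
        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back into the model's response, locally."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

Repeated values get the same placeholder, so the model can still reason about “the same account” without ever seeing the account number.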

Training on your data. Check explicitly whether the provider uses API inputs to train or fine-tune their models. Most enterprise tiers exclude this by default, but default terms for developer-tier access sometimes don’t. If a user’s data was ever used to fine-tune or train a production model, their personal data becomes embedded in model weights — and the right to erasure under Article 17 becomes architecturally impossible to fulfil.

The EDPB’s April 2025 guidance clarifies that large language models rarely achieve true anonymisation standards, and controllers deploying third-party LLMs must conduct comprehensive legitimate interests assessments before doing so. For most use cases involving personal data, a DPIA is also required before integrating an external model API into a production system. This isn’t a formality — it’s the documented evidence that you assessed the risk before the data started flowing.

GDPR and AI Features For Developers

AI features create GDPR obligations that most teams haven’t mapped yet. Recommendation engines, scoring systems, LLM integrations, and ML models trained on user data each carry specific requirements under the Regulation, and regulators are actively enforcing them. The EU AI Act, which entered into force in August 2024 and applies in phases through 2027, adds a second layer of obligations on top of GDPR for higher-risk AI systems. For developers building AI features today, both frameworks are live and relevant.

Article 22: When Your Algorithm Triggers Special Obligations

Article 22 of GDPR restricts decisions based solely on automated processing, including profiling, that produce legal or similarly significant effects on a person. If your system automatically determines, with no meaningful human involvement, whether someone gets a loan, is shortlisted for a job, is flagged as high-risk, or is served or denied a service based on algorithmic scoring, Article 22 applies.

The obligations when Article 22 is triggered are specific: you must give the person meaningful information about the logic involved, allow them to request human review of the decision, and give them the ability to contest the outcome. “Meaningful information about the logic” is not defined in the text, but regulators have interpreted it to mean that a user should be able to understand, in plain terms, what factors drove the decision: not a full technical explanation of the model, but enough to know why they were declined or flagged.

The engineering implication is that systems making Article 22 decisions need explainability built in. A black-box model that produces a score with no interpretable feature attribution doesn’t meet this standard. Tools like SHAP or LIME, which produce feature-importance explanations for individual predictions, are common practical approaches. You also need a human review pathway (an actual mechanism, not a mailto link buried in a privacy policy) and a way to record and respond to challenges.
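To show what a per-decision explanation contains, here is a minimal sketch for a hypothetical linear scoring model. For a linear model with independent features, each feature’s contribution relative to the average applicant, w_i * (x_i - E[x_i]), is its SHAP value; for non-linear models you would use the SHAP or LIME libraries instead of this hand calculation. The features, weights and means are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical linear credit-scoring model.
WEIGHTS = {"missed_payments": -0.8, "income_k": 0.02, "account_age_years": 0.15}
MEANS = {"missed_payments": 1.0, "income_k": 40.0, "account_age_years": 5.0}
INTERCEPT = 0.5


def score(applicant: Dict[str, float]) -> float:
    return INTERCEPT + sum(WEIGHTS[f] * applicant[f] for f in WEIGHTS)


def contributions(applicant: Dict[str, float]) -> Dict[str, float]:
    """Each feature's push on the score relative to the average applicant."""
    return {f: WEIGHTS[f] * (applicant[f] - MEANS[f]) for f in WEIGHTS}


def reason_codes(applicant: Dict[str, float], top_n: int = 2) -> List[Tuple[str, float]]:
    """The features that lowered this applicant's score the most, worst first."""
    negative = [(f, c) for f, c in contributions(applicant).items() if c < 0]
    return sorted(negative, key=lambda item: item[1])[:top_n]
```

For an applicant with three missed payments and below-average income, the reason codes name those two factors, which can be mapped to plain-language sentences in the decision notice and stored alongside the decision for any later human review.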

Recommendation engines and ad personalisation systems sit in a grey area. They don’t usually produce legally significant decisions, but if they involve profiling at scale based on inferred sensitive characteristics, such as health interests, political views, or sexuality inferred from behaviour, regulators treat them with significant scrutiny. LinkedIn’s €310 million fine in 2024 centred on using behavioural data for targeted advertising and analytics without a valid legal basis: the Irish DPC found that processing first- and third-party data for ad personalisation didn’t meet the threshold for legitimate interest or contractual necessity. The root problem was how the recommendation and advertising system was designed to use data, not only how that use was disclosed.

Training ML Models on User Data

Using user data to train or fine-tune a machine learning model is a processing activity under GDPR and needs a legal basis like any other. The legal basis you used to collect the data originally doesn’t automatically extend to using it for model training — purpose limitation means the new use needs its own justification.

For most commercial applications, the realistic options are consent or legitimate interest. Consent is the cleanest but hardest to operationalise at scale, as it needs to be specific, informed, and freely given, which means users need to understand that their data will be used to train a model, not just to provide them with a service.

Legitimate interest is more commonly relied upon, but it requires a documented balancing test demonstrating that the interest in training the model outweighs the impact on users’ rights, and regulators have become more sceptical of this basis being applied to AI training without rigorous assessment.

A DPIA is required before training any model that involves large-scale processing of personal data, systematic profiling, or processing of special category data. DPIA triggers specific to LLMs include processing personal data for AI training without explicit consent, automated profiling through generative AI responses, and cross-border transfers to jurisdictions without adequacy decisions. The DPIA is basically the document that captures your legal basis, your risk assessment, and your mitigation measures. Without it, you have no documented evidence that the training activity was assessed before it started.

A real and costly example of what happens without this groundwork: CNIL fined Clearview AI €20 million after finding it had processed biometric data of French residents — building facial recognition templates from billions of scraped images — without any valid legal basis and without honouring data subjects’ rights to access or erasure. The root failure was that the data collection and model training activity had no legal foundation at all. The same logic applies to any team training models on user data without first establishing and documenting a lawful basis.

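Special category data (Article 9: health, racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used to identify a person, and data about sex life or sexual orientation) requires explicit consent or one of a narrow set of other exceptions. If your model training pipeline touches any of these data types, the bar is significantly higher than for general personal data, and the DPIA requirement is not optional.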
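One way to make the legal basis enforceable in the pipeline itself is a gate that builds training rows. A minimal sketch, assuming consent is the chosen basis and is recorded per user for the training purpose; `training_consent` and the `SPECIAL_CATEGORY_FIELDS` set are hypothetical. Under legitimate interest the gate would instead exclude users who have objected.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator

# Hypothetical fields in this schema treated as special category data.
SPECIAL_CATEGORY_FIELDS = {"health_notes", "religion", "ethnicity"}


@dataclass(frozen=True)
class UserRecord:
    user_id: str
    training_consent: bool  # hypothetical flag recorded for the model-training purpose
    fields: Dict[str, str]


def training_rows(records: Iterable[UserRecord]) -> Iterator[Dict[str, str]]:
    """Yield only consented records, with special category fields removed."""
    for record in records:
        if not record.training_consent:
            continue  # no recorded basis for this purpose: exclude the whole record
        yield {k: v for k, v in record.fields.items() if k not in SPECIAL_CATEGORY_FIELDS}
```

Because the gate runs on every pipeline execution, a withdrawn consent keeps that user out of the next training run without anyone remembering to filter by hand.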

LLM Integrations: What Happens to Data After the API Call

We covered this earlier in the context of third-party vendor risk, but it’s worth addressing specifically from the AI feature development angle, because the decisions developers make when building LLM-powered features determine most of the compliance exposure.

Every time your application sends user data to a third-party model API, you’re executing a data processing operation, even though most developers think of it as just an API call. That distinction, between a routine API call and a regulated data processing event, is what regulators focus on, and getting it wrong can carry fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher.

The design decisions that matter most:

Minimise what you send. Before any user data reaches an external model API, ask whether all of it is necessary for the task. A support ticket summarisation feature doesn’t need to send the user’s account number, billing history, or email address if the ticket text itself is sufficient. Strip or pseudonymise fields that aren’t needed for the model to do its job.

Know your provider’s data retention terms. The default terms for API access vary significantly between providers and tiers. Enterprise tiers typically offer zero-retention options — prompt data is not stored after the response is returned. Developer or free tiers often don’t. If you’re processing EU personal data, you need to know exactly how long the provider retains it and under what conditions, because a user’s Article 17 erasure right extends to data held by your processors.

Check where inference happens. An API call to a US-based provider is a data transfer under GDPR. You need a valid transfer mechanism — Standard Contractual Clauses are the most common — and a DPA in place. Some providers offer EU-region endpoints; use them where available.

Think carefully about fine-tuning. If user data is used to fine-tune or train a production model, the personal data becomes embedded in model weights, and the right to erasure under Article 17 becomes architecturally impossible to fulfil. Before fine-tuning any model on user data, confirm that you have a valid legal basis for that specific processing activity, and understand that you may be creating an erasure obligation you cannot technically satisfy.

Logging and Observability in AI Features

Logging is how engineering teams debug and monitor systems. In AI features, it creates a specific and very common GDPR problem: prompts contain personal data, and those prompts get logged.

A user’s question to a customer support chatbot about their account contains identifying information by nature. A user querying a health-related AI feature may include symptoms or diagnoses. A code assistant prompt from a developer may include company data or customer identifiers from a codebase. When these prompts are logged in full, as they typically are in standard observability setups, you’ve created a store of personal data that nobody designed, nobody assigned a retention period to, and that probably isn’t covered by your privacy notice.

The fix requires deliberate decisions at the instrumentation layer:

Don’t log full prompts by default. Log metadata, including latency, token count, model version, and error codes, without the prompt content. If you need prompt content for debugging, log a hash or a truncated, sanitised version.
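A minimal sketch of a metadata-only log line for a model call. It uses a keyed HMAC rather than a plain hash as the prompt fingerprint, because short prompts can be guessed by hashing likely candidates; the fingerprint is still pseudonymous data, not anonymous. The key constant and field names are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical: load this from your secret store, never hard-code it.
LOG_HMAC_KEY = b"replace-with-a-secret-from-your-secret-store"


def llm_call_log_record(prompt: str, model: str, tokens_in: int, tokens_out: int,
                        latency_ms: float, error: Optional[str] = None) -> str:
    """Build a JSON log line with call metadata and a keyed fingerprint, never the prompt."""
    fingerprint = hmac.new(LOG_HMAC_KEY, prompt.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({
        "ts": time.time(),
        "model": model,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "latency_ms": round(latency_ms, 1),
        "error": error,
        "prompt_fingerprint": fingerprint[:16],  # correlate repeats without storing content
    })
```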

Apply PII detection before logging. Tools like Microsoft Presidio, AWS Comprehend, and Google Cloud DLP can detect and redact personal data in text before it’s written to a log store. Integrating this into your logging pipeline means you’re never writing raw personal data to disk unless you’ve explicitly decided to.
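A minimal sketch of where that hook sits, using Python’s standard `logging.Filter`. The two regular expressions are crude stand-ins for a real detector such as Presidio: they catch email addresses and phone-like digit runs only, and will over-match things like dates. Attach the filter to handlers rather than loggers, because logger-level filters are not consulted for records propagated from child loggers; exception tracebacks are not covered by this sketch.

```python
import logging
import re

# Crude stand-ins for a real PII detector.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
_PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact(text: str) -> str:
    text = _EMAIL.sub("[EMAIL]", text)
    return _PHONE.sub("[PHONE]", text)


class RedactingFilter(logging.Filter):
    """Rewrites each record's message before the handler formats or writes it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # merge args into the message first
        record.args = None
        return True
```

Usage: `handler.addFilter(RedactingFilter())` on every handler that ships logs off the host, so a call like `log.info("signup from %s", email)` is written as `signup from [EMAIL]`.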

Set retention limits on any logs that contain personal data. If you must retain prompts for quality assurance or model evaluation, treat that log store as a personal data store with access controls, retention limits, and inclusion in your RoPA.

Tell users what you log. If prompts are retained for any purpose, whether for safety monitoring, quality review, or model improvement, your privacy notice needs to say so, in plain language, before the user sends their first message.

The EU AI Act adds a transparency obligation on top of this: under Article 50, users must be told when they’re interacting with an AI system. That means a clear disclosure in the UI, not one buried in a terms of service page. The obligation applies from August 2026 and is not limited to newly launched features, so AI features already in production need the disclosure too.

Common Mistakes That Lead to Real Fines

Every mistake on this list has contributed to documented GDPR enforcement actions. Some have cost companies hundreds of millions of euros.

1. Using Production Data in Test and Staging Environments

This is one of the most widespread compliance gaps in software development, and one of the least discussed. Test and staging environments typically have weaker access controls, broader team access, and no formal incident response process. When real user data lives there, everyone with access to that environment is processing it, usually for a purpose (testing) the data was never collected for, and often without a legal basis for that processing.

The fix is to make synthetic or masked data the default for all non-production environments, and to treat any exception as requiring explicit sign-off. Automated checks in your CI pipeline that scan for real-looking personal data in test fixtures and seed files catch the shortcuts that happen under deadline pressure.
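A minimal sketch of such a CI check for one kind of personal data, email addresses. It flags any address whose domain is not reserved for testing by RFC 2606 (`example.com`, `example.org`, `example.net`, and the `.test`, `.example`, `.invalid` and `.localhost` TLDs). A real check would cover more data types; the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

EMAIL = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")
# Names reserved for documentation and testing by RFC 2606.
RESERVED_DOMAINS = ("example.com", "example.org", "example.net")
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}


def is_reserved(domain: str) -> bool:
    domain = domain.lower()
    if domain.rsplit(".", 1)[-1] in RESERVED_TLDS:
        return True
    return any(domain == d or domain.endswith("." + d) for d in RESERVED_DOMAINS)


def find_suspect_emails(text: str) -> List[str]:
    """Email addresses whose domain is not reserved for testing."""
    return [m.group(0) for m in EMAIL.finditer(text) if not is_reserved(m.group(1))]


def scan(paths: Iterable[Path]) -> List[Tuple[Path, str]]:
    findings: List[Tuple[Path, str]] = []
    for path in paths:
        for email in find_suspect_emails(path.read_text(encoding="utf-8", errors="ignore")):
            findings.append((path, email))
    return findings


def main(argv: List[str]) -> int:
    """Exit code for CI: non-zero if any fixture file contains a suspect address."""
    findings = scan(Path(arg) for arg in argv)
    for path, email in findings:
        print(f"{path}: possible real email address {email}")
    return 1 if findings else 0
```

A CI step can run it over fixtures and seed files through a small wrapper that calls `raise SystemExit(main(sys.argv[1:]))`, failing the build on any hit and forcing the explicit sign-off conversation.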

2. Logging PII in Application or Error Logs

Italy’s Garante fined Luka — the company behind the Replika AI chatbot — €5 million after finding it collected and processed personal data without proper consent and with opaque privacy practices. Part of the investigation focused on how personal data flowed through the system in ways users weren’t aware of. Logging is one of the most common channels through which this happens — request bodies, error traces, and analytics events that contain personal data nobody intended to store.

The mistake usually happens gradually. A developer logs a full request body to debug an issue. A stack trace captures a function argument that contains a user’s email address. An analytics event fires with a user identifier attached. None of it is malicious, but collectively it creates a store of personal data that isn’t in your privacy notice, has no retention policy, and may be sitting in a log aggregation service in a different country.

Treat logging as a privacy decision. Define what can and cannot be written to logs, enforce it in code review, and use PII detection tools to scan log output before it reaches your aggregation stack.

3. No Automated Data Deletion — Manual Processes That Never Run

Defining a data retention policy is the easy part. Actually enforcing it consistently, at scale, over time, is where most teams fall short. Manual deletion processes — a script someone runs occasionally, a quarterly task nobody owns — stop running. Schemas change, scripts break, and the task gets deprioritised until an audit or a subject access request reveals data that should have been deleted months or years ago.

CNIL’s January 2026 fine against Free Mobile — €42 million — included a finding that the company had retained millions of subscriber records beyond their permitted retention period, without implementing any automated mechanism to enforce deletion at the end of that period. The regulator found that Free Mobile had no process to sort and delete data once it was no longer needed. That’s an engineering failure as much as a policy one.

Automated deletion jobs, TTLs, and archiving pipelines need to be first-class production features: monitored, tested, and alerted on when they fail. And retention enforcement needs to cover every place personal data lives — not just the primary database, but caches, warehouses, log stores, and third-party tools.
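A minimal sketch of a retention sweep as a production job rather than an occasional script, assuming SQLite tables that store UTC timestamps as ISO 8601 strings in one consistent format (so string comparison orders them correctly). The `RETENTION` policy is hypothetical. It returns per-table counts for metrics, sweeps every table even if one fails, and then fails loudly so monitoring can alert.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical policy: table -> (timestamp column, retention period).
# Names come from trusted config, never from user input.
RETENTION = {
    "support_tickets": ("closed_at", timedelta(days=365)),
    "login_events": ("created_at", timedelta(days=90)),
}


class RetentionJobFailed(Exception):
    pass


def run_retention_sweep(conn: sqlite3.Connection, now: datetime) -> Dict[str, int]:
    """Delete expired rows per table; return deleted counts for monitoring."""
    deleted: Dict[str, int] = {}
    failures: List[str] = []
    for table, (column, period) in RETENTION.items():
        cutoff = (now - period).isoformat()
        try:
            with conn:  # commit per table so one failure doesn't roll back the others
                cursor = conn.execute(f"DELETE FROM {table} WHERE {column} < ?", (cutoff,))
            deleted[table] = cursor.rowcount
        except sqlite3.Error as exc:
            failures.append(f"{table}: {exc}")
    if failures:
        # A job that stops silently is the failure mode to avoid, so raise for alerting.
        raise RetentionJobFailed("; ".join(failures))
    return deleted
```

The same shape extends to caches, warehouses and third-party tools by adding one sweep step per store, each reporting its own count.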

4. Consent That Isn’t Granular, Auditable, or Reversible

CNIL fined SHEIN €150 million after finding that advertising cookies were placed before users gave consent — a direct violation of the requirement that consent precede processing, not follow it. Google Consent Mode v2 became mandatory in March 2024, yet 67% of implementations still contain violations, with the most common being defaulting to granted consent before any user action. These are product and engineering failures, not legal ones.

Valid consent under GDPR has four properties: it must be freely given, specific to a purpose, informed, and unambiguous. In practice, this means no pre-ticked boxes, no consent bundled across multiple purposes, and no dark patterns that make declining harder than accepting.

Beyond collection, consent needs to be stored in a way that’s auditable. If a regulator asks whether a specific user consented to a specific processing activity on a specific date, and which version of your privacy notice was in effect, you need to be able to answer that from your data. A single boolean flag per user isn’t enough. And withdrawal needs to be as easy as giving consent in the first place, with downstream systems that actually stop processing when the flag changes.
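A minimal sketch of consent stored as an append-only ledger instead of a single flag, so the question “did this user consent to this purpose on this date, under which notice version?” can be answered from the data. The field names and the in-memory list are hypothetical; in production this would be an append-only table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str          # e.g. "marketing_email", "analytics"
    granted: bool         # True = consent given, False = withdrawn
    notice_version: str   # privacy notice version shown at the time
    at: datetime
    source: str           # hypothetical: "signup_form", "settings_page", ...


class ConsentLedger:
    """Append-only history; current state is derived, never overwritten."""

    def __init__(self) -> None:
        self._events: List[ConsentEvent] = []

    def record(self, event: ConsentEvent) -> None:
        self._events.append(event)

    def state_at(self, user_id: str, purpose: str, when: datetime) -> Optional[ConsentEvent]:
        """The latest event for this user and purpose at or before `when`."""
        matching = [e for e in self._events
                    if e.user_id == user_id and e.purpose == purpose and e.at <= when]
        return max(matching, key=lambda e: e.at, default=None)

    def has_consent(self, user_id: str, purpose: str, when: datetime) -> bool:
        event = self.state_at(user_id, purpose, when)
        return event is not None and event.granted
```

Withdrawal is just another event, recorded the same way as the grant, which keeps withdrawing as easy as consenting and preserves the evidence of both.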

5. Third-Party Integrations Without DPAs

Vodafone GmbH was fined €15 million in 2025 specifically because it had failed to properly oversee contracts drawn up by third-party agencies, which is a direct failure of processor oversight. Uber was fined €290 million by the Dutch DPA for transferring European drivers’ personal data to the US without implementing Standard Contractual Clauses or any other adequate transfer safeguard. In both cases, the liability sat with the data controller.

Every service your system sends personal data to needs a Data Processing Agreement in place. That includes analytics tools, error monitoring services, customer support platforms, cloud infrastructure, and AI model APIs. The DPA needs to cover what the vendor can do with the data, what sub-processors they use, and what happens to the data when the contract ends.

The sub-processor chain deserves specific attention. When you sign a DPA with a vendor, you’re also inheriting their sub-processors. A vendor whose infrastructure runs on a US cloud provider, whose logs go to a third-party aggregator, is a chain of data transfers — each of which needs a valid legal basis. Asking vendors for their current sub-processor list, and checking that list when it changes, is basic processor due diligence that most teams skip entirely.

6. No Incident Response Plan for the 72-Hour Window

Article 33 requires notification to your supervisory authority within 72 hours of becoming aware of a personal data breach. The fine for missing this deadline is separate from any fine for the underlying breach — it’s a standalone violation.

Meta was fined €251 million by the Irish DPC for a 2018 breach affecting 29 million Facebook users — a case that dragged through years of investigation and concluded in 2024. One of the findings was inadequate technical and organisational measures to prevent and respond to the breach. Having detection without a response plan is the same failure in practice: you know something happened, but you can’t act on it within the required window.

An incident response runbook for a data breach needs to answer, in advance: what qualifies as a notifiable breach, who makes that call, what information the Article 33 notification must contain, who submits it, and under what conditions Article 34 notification to affected individuals is also required. None of this should be worked out under pressure during an active incident. The runbook should be tested — at least as a tabletop exercise — before you need it.
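Part of the runbook can live in code so the clock is never computed by hand mid-incident. A minimal sketch: the 72 hours run from when the controller became aware of the breach, and Article 33(5) requires breaches to be documented, which a timestamped timeline supports. `BreachIncident` is a hypothetical record type, not a reporting tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

NOTIFICATION_WINDOW = timedelta(hours=72)  # Article 33(1)


@dataclass
class BreachIncident:
    incident_id: str
    became_aware_at: datetime  # when the controller became aware, not when the breach began
    timeline: List[str] = field(default_factory=list)

    @property
    def authority_deadline(self) -> datetime:
        return self.became_aware_at + NOTIFICATION_WINDOW

    def hours_remaining(self, now: datetime) -> float:
        return (self.authority_deadline - now).total_seconds() / 3600

    def log(self, now: datetime, entry: str) -> None:
        """Append a timestamped entry to the incident record (facts, effects, actions)."""
        self.timeline.append(f"{now.isoformat()} {entry}")
```

Surfacing `hours_remaining` in the incident channel keeps the deadline visible to whoever owns the notification decision.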

7. Assuming GDPR Only Applies If You’re a European Company

GDPR can apply to any organisation that processes personal data of people in the EU, regardless of where the organisation is based. Clearview AI, a US-based company with no EU offices, has been fined seven times by European regulators since 2020, with penalties totalling over €100 million, including a €30.5 million fine from the Dutch DPA in 2024. The company’s position that it isn’t subject to GDPR has not been accepted by any European regulator.

If your product has users in the EU, or could have users in the EU, GDPR applies to how you handle their data. Article 3 makes this explicit: the Regulation covers processing that relates to offering goods or services to people in the EU, or to monitoring their behaviour as far as it takes place within the EU. The fact that your servers are in the US, or that your company is incorporated outside Europe, doesn’t change that.

For companies outside the EU that do fall under GDPR’s scope, Article 27 also requires appointing an EU representative — a named point of contact for regulators and individuals. Clearview’s failure to appoint an EU representative was cited as a separate violation in the French enforcement action — a relatively easy compliance step that compounded an already serious case.

Final thought

GDPR compliance in software development comes down to a simple principle: privacy decisions made early are cheap, and privacy decisions made late are expensive — technically, operationally, and financially. The enforcement record makes clear that regulators follow the code, not just the policy documents. The violations generating the largest fines trace back to schema design choices, logging configurations, consent implementations, and data transfer decisions — the kind of work that happens in sprints, in pull requests, and in architecture reviews.

The regulatory environment will continue to evolve — the EU AI Act is still rolling out, enforcement is broadening across sectors, and regulators are becoming more technically sophisticated in their scrutiny. But the foundation doesn’t change. Build systems that collect only what they need, protect what they hold, and can account for what they do. Everything else follows from that.
