
PII in Your DataLayer and the Form Listener Problem

A common Custom HTML pattern pushes raw form data into the dataLayer, making every field value accessible to every tag in the container. Here's how to fix it.


The dataLayer is a JavaScript array that GTM uses to pass information between the page and the container. When something happens on the page (a form submission, a purchase, a button click), the site pushes an object to the dataLayer, and GTM reads it. Tags use those values to send conversion data to Google Ads, record events in GA4, pass revenue to analytics, and trigger remarketing pixels.

The dataLayer is also a shared namespace. Every tag in the container can read every value pushed to the dataLayer, with no access control, no scoping, and no permission model; because the array is a global (window.dataLayer), so can any other script running on the page. A value pushed for GA4 is equally accessible to every advertising pixel, every Custom HTML tag, and every third-party script operating through GTM. Whatever goes in becomes available to everything else.
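
A minimal sketch of the missing scoping, assuming a plain array in place of window.dataLayer and two hypothetical consumer functions (analyticsTag and vendorPixel) standing in for tags; the hosts are placeholders. Real GTM tags usually read values through Data Layer Variables rather than indexing the array, but in either case nothing sits between the push and the reader.

```javascript
// A plain array stands in for window.dataLayer
// (in a browser: window.dataLayer = window.dataLayer || []).
const dataLayer = [];

// The page pushes one event object; nothing marks who may read it.
dataLayer.push({
  event: "form_submit",
  form_id: "contact",
  email: "jane@example.com", // PII that should never be here
});

// Hypothetical consumers: both read the same array, neither is restricted.
function analyticsTag(layer) {
  const last = layer[layer.length - 1];
  return { sentTo: "analytics.example", params: last };
}

function vendorPixel(layer) {
  const last = layer[layer.length - 1];
  return { sentTo: "pixel.vendor.example", params: last };
}

const received = [analyticsTag(dataLayer), vendorPixel(dataLayer)];
for (const r of received) {
  console.log(r.sentTo, "can read email:", r.params.email);
}
```

The lookup logic is beside the point; what matters is that there is no check anywhere between the push and either read.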

This becomes a problem when form data enters the dataLayer unsanitized.

The form listener pattern

A common Custom HTML tag pattern captures form submissions and pushes the data to the dataLayer for event tracking. The implementation typically looks like this: the tag listens for form submit events, calls something like form.vals() or iterates over formValues, and pushes the result as a dataLayer object. The intent is to track which forms are being submitted and capture identifying information for lead attribution.

The problem is what form.vals() returns. It returns every field value in the form: email address, phone number, full name, company name, job title, message content. Whatever the form collects, the function returns, and all of it gets pushed to the dataLayer as key-value pairs.
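
A sketch of the blanket-capture shape, useful as a reference when reading existing tags. form.vals() is not a standard DOM method, so the hypothetical collectAllFields below stands in for it; it takes { name, value } objects shaped like the entries of a form's elements collection so the example runs outside a browser.

```javascript
// ANTI-PATTERN: blanket capture of every named field.
// collectAllFields is a hypothetical stand-in for helpers like form.vals().
function collectAllFields(elements) {
  const values = {};
  for (const el of elements) {
    if (el.name) values[el.name] = el.value; // every field, no filtering
  }
  return values;
}

function onSubmitBlanket(formId, elements, dataLayer) {
  // The whole field map lands in the shared dataLayer.
  dataLayer.push(Object.assign(
    { event: "form_submit", form_id: formId },
    collectAllFields(elements)
  ));
}

const dataLayer = [];
onSubmitBlanket("contact", [
  { name: "email", value: "jane@example.com" },
  { name: "phone", value: "+1 555 0100" },
  { name: "message", value: "Please call me about pricing" },
  { name: "", value: "Submit" }, // unnamed button: skipped
], dataLayer);

console.log(dataLayer[0]);
```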

Once in the dataLayer, those values are accessible to every tag. GA4 can log them as event parameters. Google Ads can send them as conversion data. Third-party remarketing pixels can read them through dataLayer variables and append them to their tracking calls. A LinkedIn Insight tag configured to capture form events could ingest the raw field values along with the event data.

This isn't hypothetical. A study of 718 GTM tag configurations found widespread PII transmission to external domains through exactly this mechanism. The tags weren't malicious; they were doing what they were configured to do. The dataLayer made the PII available, and the tags consumed it.

DataLayer versus server-side data handling

When a form submits data to a server endpoint, the data flows through a pipeline the organization controls. The server processes it, the database stores it, and access controls determine who sees it. A developer who handles form data server-side can enforce field-level access, encrypt sensitive values, and control which downstream systems receive what data. There's a perimeter.

The dataLayer has no perimeter. It exists in the browser, in the user's session, accessible to client-side JavaScript. GTM's architecture treats the dataLayer as a communication bus between the page and the tag management system, with the design assumption that the values pushed there are intended for tag consumption. There's no mechanism to say "this value is for GA4 only" or "this value should not be sent to third-party domains." No access control lists, no field-level permissions, no encryption layer. Every tag with a trigger that fires on dataLayer events gets the same data.

This means a PII exposure in the dataLayer isn't a traditional data leak where data escapes a controlled system. It's a broadcast. The data is pushed to a shared space and every consumer in that space receives it simultaneously. The number of external endpoints that receive the PII depends entirely on how many tags are configured to read form-related dataLayer events. A container with 5 advertising tags, 2 analytics tags, and 3 Custom HTML scripts that fire on form submissions sends the raw PII to up to 10 external domains in a single page interaction.
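
The endpoint count can be sketched as a filter over a tag inventory: keep the tags whose trigger matches the form event, then collect the distinct hosts they send to. The inventory below is hypothetical (invented tag names, placeholder hosts, a simplified trigger field) and is not GTM's export format.

```javascript
// Hypothetical tag inventory: each entry names the event that fires it
// and the host it sends data to.
const tags = [
  { name: "Analytics event", trigger: "form_submit", host: "analytics.example" },
  { name: "Ads conversion", trigger: "form_submit", host: "ads.example" },
  { name: "Vendor pixel", trigger: "form_submit", host: "pixel.vendor-a.example" },
  { name: "Vendor pixel (second)", trigger: "form_submit", host: "pixel.vendor-a.example" },
  { name: "Page view", trigger: "page_view", host: "analytics.example" },
];

// Distinct hosts that receive whatever the form event carries.
function recipientsOf(eventName, inventory) {
  const hosts = new Set();
  for (const tag of inventory) {
    if (tag.trigger === eventName) hosts.add(tag.host);
  }
  return [...hosts];
}

const recipients = recipientsOf("form_submit", tags);
console.log(recipients.length, recipients);
```

Counting distinct hosts rather than tags is why a figure like 10 is an upper bound: two tags for the same vendor reach one domain, and a Custom HTML script may not send to an external domain at all.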

The compliance implications compound quickly. Under GDPR, every vendor behind those domains becomes a recipient of personal data, acting either as a processor or as an independent controller depending on the arrangement. The organization needs a legal basis for sharing the data, a record of the processing activity, and, for vendors acting as processors, a data processing agreement. Most organizations don't know the form listener exists, let alone that it's distributing PII to their entire tag stack.

Safer form tracking approaches

The fix isn't removing the form listener, because form submission tracking is operationally useful and marketing teams need to know which forms are converting and where. The fix is controlling what gets pushed to the shared namespace.

Push the form identifier, not the form contents. A form's ID attribute, its name, the page URL where the submission occurred, and optionally the number of fields completed are enough for most tracking purposes. "Contact form submitted on /pricing" tells the analytics team what they need to know without exposing the user's email address to every tag in the container. If the form has a hidden field with a lead source or campaign parameter, that's safe to push because it's marketing metadata, not user PII.
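
A minimal sketch of an identifier-only payload. The parameter names (form_id, form_name, page_path, fields_completed, lead_source) are illustrative choices, not names GTM or GA4 require. In a browser the path would come from window.location.pathname, which leaves out the query string, so any user data carried in URL parameters stays out of the push as well.

```javascript
// Builds a form-submit event from identifiers only; no field values.
function buildFormEvent({ formId, formName, pagePath, fieldsCompleted, leadSource }) {
  const payload = {
    event: "form_submit",
    form_id: formId,
    form_name: formName,
    page_path: pagePath,
  };
  // Optional: how many fields were filled in, not what they contained.
  if (typeof fieldsCompleted === "number") payload.fields_completed = fieldsCompleted;
  // Hidden marketing metadata such as a lead source is not user PII.
  if (leadSource) payload.lead_source = leadSource;
  return payload;
}

const dataLayer = [];
dataLayer.push(buildFormEvent({
  formId: "contact",
  formName: "Contact us",
  pagePath: "/pricing",
  fieldsCompleted: 6,
  leadSource: "spring_campaign",
}));
console.log(dataLayer[0]);
```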

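If specific field data is needed for lead scoring or attribution, hash it before it hits the dataLayer. A SHA-256 hash of a normalized email address (trimmed and lowercased) can be matched against hashed customer lists for advertising purposes without exposing the raw address; both sides must normalize the same way for the hashes to match. Google Ads (enhanced conversions) and Meta (advanced matching) both accept hashed identifiers, and hashing in the Custom HTML tag before the dataLayer push means the raw value never enters the shared namespace. The hash is one-way: tags downstream can use it for matching but can't reverse it to recover the original email. It is still a stable identifier, though, since anyone who already holds the address can compute the same hash, so hashing reduces exposure rather than making the data anonymous.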

For fields that have no tracking purpose (message content, company description, free-text responses), don't push them at all. The form listener should whitelist which fields to capture, not default to capturing everything. This is a code change in the Custom HTML tag, usually replacing a blanket form.vals() call with explicit field selection: push form_id, form_name, page_url, and optionally a hashed email field. Everything else stays out of the dataLayer.
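
A sketch of the whitelist approach, over the same kind of { name, value } objects a form's elements collection yields. The field names in ALLOWED_FIELDS are hypothetical. The design choice is the default: a field is dropped unless it is named, so a free-text field added to the form later stays out of the dataLayer without anyone editing the tag.

```javascript
// Whitelist of field names that may leave the form. Everything else,
// including free-text fields, is dropped by default.
// These names are hypothetical; use the site's own.
const ALLOWED_FIELDS = new Set(["lead_source", "utm_campaign", "plan_interest"]);

function pickAllowedFields(elements, allowed = ALLOWED_FIELDS) {
  const picked = {};
  for (const el of elements) {
    if (el.name && allowed.has(el.name)) picked[el.name] = el.value;
  }
  return picked;
}

const fields = [
  { name: "email", value: "jane@example.com" },
  { name: "message", value: "Call me" },
  { name: "lead_source", value: "webinar" },
  { name: "plan_interest", value: "enterprise" },
];

const dataLayer = [];
dataLayer.push(Object.assign(
  { event: "form_submit", form_id: "contact" },
  pickAllowedFields(fields)
));
console.log(dataLayer[0]);
```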

Detecting form listeners in a container audit

The pattern to detect is straightforward in static analysis. A Custom HTML tag that references form.vals(), formValues, or serialize(), or that iterates over form inputs using querySelectorAll('input'), and then passes the result to dataLayer.push() is a candidate. Variations exist: some implementations use jQuery's .serializeArray(), others loop through HTMLFormElement.elements or build a new FormData(form) and convert it to an object. The common thread is blanket field capture followed by a dataLayer push.
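
A heuristic scanner for this pattern, assuming the layout of a GTM container export (containerVersion.tag entries, Custom HTML tags with type "html", and the code in the parameter whose key is "html"); verify that layout against your own export before relying on it. The capture patterns are the ones named above plus new FormData(...). A regex match only flags a candidate: confirming that field values actually reach the push still means reading the code.

```javascript
// Blanket-capture signals and a dataLayer push signal. Heuristics only.
const CAPTURE_PATTERNS = [
  /\.vals\s*\(/,
  /\bformValues\b/,
  /\.serialize(Array)?\s*\(/,
  /querySelectorAll\s*\(\s*['"]input/,
  /\.elements\b/,
  /new\s+FormData\s*\(/,
];
const PUSH_PATTERN = /dataLayer\.push\s*\(/;

// Assumed export layout: containerVersion.tag[] with type "html" for
// Custom HTML and the code in the parameter whose key is "html".
function findFormListenerCandidates(exportJson) {
  const version = exportJson.containerVersion || {};
  const hits = [];
  for (const tag of version.tag || []) {
    if (tag.type !== "html") continue;
    const htmlParam = (tag.parameter || []).find((p) => p.key === "html");
    const code = htmlParam ? String(htmlParam.value) : "";
    const matched = CAPTURE_PATTERNS.filter((re) => re.test(code)).map(String);
    if (matched.length > 0 && PUSH_PATTERN.test(code)) {
      hits.push({ name: tag.name, patterns: matched });
    }
  }
  return hits;
}

const sample = {
  containerVersion: {
    tag: [
      { name: "Form listener", type: "html", parameter: [{
        type: "TEMPLATE", key: "html",
        value: "<script>document.addEventListener('submit',function(e){" +
               "dataLayer.push({event:'form_submit',fields:$(e.target).serializeArray()});" +
               "});</script>",
      }] },
      { name: "Chat widget", type: "html", parameter: [{
        type: "TEMPLATE", key: "html",
        value: "<script src='https://chat.example/widget.js'></script>",
      }] },
      { name: "Non-HTML tag", type: "not_html", parameter: [] }, // skipped
    ],
  },
};
console.log(findFormListenerCandidates(sample));
```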

The risk scales with what the forms on the site collect. A newsletter signup form with one email field is lower risk than an enterprise contact form with 8 fields including phone, company, job title, and message. A checkout form with payment information is the highest risk, though these are rarer in GTM form listeners because payment fields are typically handled by iframe-isolated payment processors.
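
One way to turn that scaling into a triage step is to classify a form's field names and rank forms by their highest tier and their count of sensitive fields. The tiers, labels and regular expressions below are illustrative assumptions, not a standard, and name-based matching will miss oddly named fields; the output only orders forms for manual review.

```javascript
// Hypothetical rules, ordered from highest tier down; first match wins.
const RISK_RULES = [
  { tier: 3, label: "payment", re: /card|cvv|cvc|iban/i },
  { tier: 2, label: "direct identifier", re: /mail|phone|tel|first_?name|last_?name|full_?name|address/i },
  { tier: 1, label: "indirect or free text", re: /company|job|title|message|comment/i },
];

function assessForm(fieldNames) {
  let maxTier = 0;
  const sensitiveFields = [];
  for (const name of fieldNames) {
    const rule = RISK_RULES.find((r) => r.re.test(name));
    if (!rule) continue; // no rule matched: treated as non-sensitive
    sensitiveFields.push({ name, label: rule.label });
    maxTier = Math.max(maxTier, rule.tier);
  }
  return { maxTier, sensitiveCount: sensitiveFields.length, sensitiveFields };
}

const newsletter = assessForm(["email"]);
const enterprise = assessForm(["first_name", "last_name", "email", "phone",
                               "company", "job_title", "message", "consent"]);
const checkout = assessForm(["email", "card_number", "cvv"]);
console.log(newsletter.maxTier, newsletter.sensitiveCount);
console.log(enterprise.maxTier, enterprise.sensitiveCount);
console.log(checkout.maxTier, checkout.sensitiveCount);
```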

The detection is static analysis: reading the Custom HTML code and identifying the data flow from form element to dataLayer push. Runtime detection would require intercepting dataLayer events and inspecting the values, which is a different kind of audit entirely. Static analysis catches the pattern regardless of whether a specific form exists on the current page, which matters because the form listener tag might fire site-wide while the problematic form only exists on one page.
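
For contrast, a sketch of the runtime approach: wrap push on the dataLayer array and scan each pushed object for email-like strings. It runs against a plain array; on a live page, where the wrapper sits relative to GTM's own handling of dataLayer.push matters and is not addressed here. The email regex is deliberately loose, and monitorDataLayer and findEmailLikeValues are hypothetical names.

```javascript
// Loose email-like check: something@something.something
const EMAIL_LIKE = /[^\s@]+@[^\s@]+\.[^\s@]+/;

// Recursively collects dotted paths of string values that look like emails.
function findEmailLikeValues(obj, path = "", found = []) {
  for (const [key, value] of Object.entries(obj)) {
    const here = path ? `${path}.${key}` : key;
    if (typeof value === "string" && EMAIL_LIKE.test(value)) found.push(here);
    else if (value && typeof value === "object") findEmailLikeValues(value, here, found);
  }
  return found;
}

// Wraps push to report suspicious values; pushes still go through unchanged.
function monitorDataLayer(layer, report) {
  const originalPush = layer.push;
  layer.push = function (...items) {
    for (const item of items) {
      if (item && typeof item === "object") {
        const hits = findEmailLikeValues(item);
        if (hits.length > 0) report(hits, item);
      }
    }
    return originalPush.apply(layer, items);
  };
  return layer;
}

const reports = [];
const dataLayer = monitorDataLayer([], (paths) => reports.push(paths));
dataLayer.push({ event: "form_submit", form_id: "contact" });
dataLayer.push({ event: "form_submit", fields: [{ name: "email", value: "jane@example.com" }] });
console.log(reports);
```

A monitor like this only sees the pages and forms exercised while it runs, which is exactly the gap static analysis closes.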

For containers where this pattern exists, the remediation is specific: replace the blanket form capture with a field-level whitelist that pushes only non-PII identifiers, or hash sensitive fields before they reach the dataLayer. Both approaches preserve the form tracking capability while removing the broadcast risk. The change is typically 10-15 lines of code in a single Custom HTML tag, and the compliance improvement covers every form on the site.

The dataLayer as shared infrastructure

The dataLayer is a communication mechanism, not a data store. It has no retention policy, no access controls, and no concept of sensitivity. Treating it as a temporary holding area for whatever data the page produces misses the fact that every tag configured to fire on that event receives the data the moment it's pushed.

Any value pushed to the dataLayer should pass a simple test: would you be comfortable if every vendor tag in the container sent this value to their servers? If the answer is no, the value shouldn't be in the dataLayer. The form listener problem is the most common violation of this principle, but it applies equally to Custom HTML tags that read URL parameters containing user IDs, tags that capture search queries from the page's search bar, or scripts that push session replay data with form field contents visible.
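
The same test applies to URLs. A minimal sketch that drops identifying query parameters before a URL is pushed, using the standard URL and URLSearchParams APIs; the parameter names in SENSITIVE_PARAMS are hypothetical and should be replaced with the ones the site actually uses.

```javascript
// Hypothetical list of query parameters that can carry user data.
const SENSITIVE_PARAMS = ["email", "user_id", "uid", "q", "search"];

// Returns the URL with the listed parameters removed; others are kept in order.
function sanitizeUrl(href, sensitive = SENSITIVE_PARAMS) {
  const url = new URL(href);
  for (const name of sensitive) url.searchParams.delete(name);
  return url.toString();
}

const cleaned = sanitizeUrl(
  "https://shop.example/results?q=jane%40example.com&utm_source=news&page=2&user_id=123"
);
console.log(cleaned);
```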

The dataLayer is shared infrastructure. Treat it like a public channel, not a private one.

Audit your GTM container

TagManifest gives you an instant health score and prioritized fixes.
