I've refined this process across a few dozen container audits. The core of it is simple: break the audit into distinct steps where each one produces something concrete that feeds the next. The whole thing takes an afternoon for a container with 100 to 200 tags.
One thing to get out of the way early: throwing raw container JSON into ChatGPT and asking "what's wrong" doesn't work well. The JSON for a mid-size container runs hundreds of kilobytes. An LLM processing that in one pass will skim, not audit. You'll get observations like "you have many tags" and "consider reviewing consent." Technically true. Not useful.
Here's the process that actually works.
Step 1: Export the container JSON
In Google Tag Manager, go to Admin, then Export Container. Choose your current workspace (or the latest published version). Download the JSON file. This is your container's complete configuration: every tag, trigger, variable, folder, and consent setting in one file.
This export is also your safety net. Before you change anything, you can restore this exact state from the same Admin panel. Keep the file. I name mine with the date and version number so I can track what the container looked like at each stage of the audit. It's saved me more than once when I needed to compare current state against pre-audit state.
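If you script the archiving, you can pull the version number straight out of the export instead of typing it by hand. A minimal sketch, assuming the standard GTM export layout (`containerVersion.containerVersionId` and `containerVersion.container.publicId`); verify those field names against your own export:

```python
import json
import shutil
from datetime import date
from pathlib import Path

def archive_export(export_path: str) -> Path:
    """Copy a GTM export to a dated, versioned filename so the
    pre-audit state is easy to find and restore later."""
    data = json.loads(Path(export_path).read_text())
    cv = data.get("containerVersion", {})
    version = cv.get("containerVersionId", "unknown")
    public_id = cv.get("container", {}).get("publicId", "GTM-UNKNOWN")
    dest = Path(export_path).with_name(
        f"{public_id}_v{version}_{date.today().isoformat()}.json")
    shutil.copy2(export_path, dest)  # preserves the original untouched
    return dest
```

Running it against a fresh export yields something like `GTM-ABC123_v42_2025-06-01.json`, which makes before/after comparisons trivial.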
Step 2: Build a structured dataset
This is the step that makes everything else work, and it's the step most people skip.
The raw JSON is a configuration file. It tells you what exists, not what's wrong. The goal here is to transform that configuration into categorized, scored findings that you can work through systematically, one issue at a time.
Two ways to do this:
Use a tool. TagManifest takes your container JSON, runs 85 diagnostic rules across consent, analytics, advertising, security, performance, ecommerce, organization, infrastructure, and naming, and returns a structured report. Findings scored by impact, organized by effort, specific tags identified. Free, browser-based, nothing leaves your machine. About two minutes to get your structured dataset.
Build your own. Create a Claude project or a ChatGPT custom GPT with instructions to parse the container JSON and produce a structured audit. Define the categories, the rules, the output format. More upfront work, but you control what gets flagged and how. Consultants managing dozens of containers tend to prefer this because they can standardize the process across clients.
Either way, the output should give you: findings categorized by type, an indication of impact, the specific tags affected, and some organization by effort (quick fix versus project).
The reason this matters: it separates the audit step (deterministic, rule-based) from the interpretation step (where AI adds real value). Combining those into one prompt is where things go sideways. Separating them gives you a solid dataset to work from when you start making decisions.
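If you build your own, the core of the deterministic audit step is just rules over the export's tag list. A hedged sketch of that shape: the type codes here (`"ua"` for Universal Analytics, `"html"` for Custom HTML) match what I've seen in GTM exports, but treat them as assumptions and check them against your own container before relying on the output:

```python
import json
from collections import defaultdict

# Each rule: (category, impact, predicate over a tag dict, note).
# Extend this list with your own consent, naming, and ecommerce checks.
RULES = [
    ("dead_code", "high", lambda t: t.get("type") == "ua",
     "Universal Analytics tag; UA stopped processing data in 2023"),
    ("security", "medium", lambda t: t.get("type") == "html",
     "Custom HTML tag; review for injected scripts"),
    ("organization", "low", lambda t: t.get("paused") is True,
     "Paused tag left in the container"),
]

def audit(export_path: str) -> dict:
    """Turn a raw GTM export into categorized, scored findings."""
    tags = json.loads(open(export_path).read())["containerVersion"].get("tag", [])
    findings = defaultdict(list)
    for tag in tags:
        for category, impact, matches, note in RULES:
            if matches(tag):
                findings[category].append(
                    {"tag": tag["name"], "impact": impact, "note": note})
    return dict(findings)
```

The point of the structure: every finding names a specific tag and carries a category and impact, which is exactly what the copilot needs in step 3.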
Step 3: Load the structured data into your copilot
Open your AI tool of choice and load two things: the structured dataset from step 2, and the original container JSON as a reference. The structured data is what the copilot works from. The raw JSON is what you consult when you need to dig deeper into a specific tag's code.
If you're using Claude Code, you can load both files into the project and work through findings conversationally. In a chat interface, paste the structured data first, then attach the JSON as a reference file.
Worth noting: the copilot doesn't have your organizational context. It doesn't know your company migrated payment processors six months ago, or that the marketing team uses HubSpot, or that the developer who built the data layer left last year. You provide that context as it becomes relevant. The structured data handles the technical "what's wrong." You handle the "what to do about it."
Step 4: Work through findings by category
Don't try to address everything at once. Pick a category and work through it completely before moving to the next.
Start with consent. Highest compliance risk, often the fastest to fix. Have the copilot walk through each finding, explain the correct configuration for each tag type, and flag anything that needs team input.
Then measurement accuracy. GA4 event configuration, duplicate tracking, event model hygiene. The copilot can cross-reference your setup against Google's GA4 documentation to catch parameters that won't map correctly.
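Some of that cross-referencing is mechanical enough to script. A small validator against Google's documented limits for standard GA4 properties (event names capped at 40 characters, starting with a letter; at most 25 parameters per event); the function name and finding format are mine, not from any library:

```python
import re

# GA4 event names: letters, digits, underscores; must start with a letter.
EVENT_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def check_ga4_event(name: str, params: dict) -> list:
    """Flag GA4 event configurations that will be dropped or truncated."""
    problems = []
    if len(name) > 40:
        problems.append(f"event name '{name}' exceeds 40 characters")
    if not EVENT_NAME_RE.match(name):
        problems.append(f"event name '{name}' must start with a letter "
                        "and use only letters, digits, and underscores")
    if len(params) > 25:
        problems.append(f"{len(params)} parameters; GA4 keeps at most 25 per event")
    for key in params:
        if len(key) > 40:
            problems.append(f"parameter name '{key}' exceeds 40 characters")
    return problems
```

A clean event returns an empty list; anything else is a finding worth confirming against DebugView in step 5.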
Then dead code and quick wins. UA tags still firing into retired properties. Custom HTML loading jQuery when the site already has it. Orphaned triggers attached to nothing. Low-risk, immediate improvement.
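Orphaned triggers in particular are easy to find deterministically: a trigger is dead weight if no tag references it. A sketch assuming the standard export field names (`firingTriggerId`, `blockingTriggerId`, `triggerId`), which you should confirm against your own file:

```python
import json

def orphaned_triggers(export_path: str) -> list:
    """List triggers that no tag fires on or is blocked by."""
    cv = json.loads(open(export_path).read())["containerVersion"]
    referenced = set()
    for tag in cv.get("tag", []):
        referenced.update(tag.get("firingTriggerId", []))
        referenced.update(tag.get("blockingTriggerId", []))
    return [t["name"] for t in cv.get("trigger", [])
            if t["triggerId"] not in referenced]
```

Note this won't catch GTM's built-in triggers (they aren't in the `trigger` list), and a trigger flagged here may still be intentionally parked, so verify before deleting.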
Then structural work. Naming conventions, folder organization, trigger architecture. Real improvements, but they're projects. The copilot helps scope the work and produce documentation. The implementation needs time and usually coordination.
For each finding: understand it, verify it, research the fix if you're unsure, check dependencies in the container, then act or defer with a note about why. The "defer with a note" part matters more than it sounds. A clear deferral with a blocking reason ("needs client input on attribution ownership") is more useful than a fix done without enough context. The deferred items become your backlog for the next session.
Step 5: Verify in Preview mode
Nothing replaces GTM Preview mode. After making changes, open Preview, load your site, verify that tags fire correctly, consent is respected, and events show up in GA4 DebugView.
AI can't do this for you. It can help you interpret what you're seeing ("why is this tag firing twice?" or "what does this consent state mean?"), but verification requires a browser, a real page load, and your judgment about whether behavior matches intent.
Check consent behavior specifically: load the page, deny consent in the CMP dialog, and confirm that the tags you just configured actually respect that decision. Then accept consent and verify they fire. Watch GA4 DebugView for the events you expect to see and make sure the parameters are correct. This is the part that catches configuration mistakes that look right in the GTM interface but behave wrong on a live page.
There are MCP servers emerging that connect AI directly to the GTM API, enabling the copilot to read container state and push changes programmatically. If you're heading in that direction, having a structured audit as the starting point matters even more. You want the AI working from a clear understanding of what needs changing, not improvising.
Step 6: Document as you go
The audit findings are a starting point. As you work through fixes, the copilot can produce documentation for different audiences:
- Technical changelogs for whoever maintains the container next
- Summary communications for stakeholders who need the what-happened without the technical detail
- Decision logs for judgment calls, so the reasoning is recorded when someone asks six months later
- Deferred items with clear notes about what's blocked and what needs to happen next
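Even without a copilot, the mechanical part of this is simple: one findings list, several renderings. A sketch with a hypothetical finding shape (dicts with `tag`, `action`, `reason`, and an optional `deferred` flag; the shape is mine, adapt it to whatever your step 2 dataset uses):

```python
def changelog(findings: list) -> str:
    """Render a technical changelog from one set of audit findings.
    Deferred items are kept in the same document as the backlog."""
    lines = ["Container changelog"]
    for f in findings:
        status = "DEFERRED" if f.get("deferred") else "CHANGED"
        lines.append(f"- [{status}] {f['tag']}: {f['action']} ({f['reason']})")
    return "\n".join(lines)
```

A stakeholder summary or decision log is the same loop with different fields and less jargon, which is why generating all of them in one session is cheap.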
This is where copilots quietly earn a lot of their value. Producing different document types from one set of findings is tedious work for a human and fast work for an AI. Do it during the session while context is fresh, not after.
What this gets you
A container audit done this way surfaces quick wins in the first hour (dead tags, consent misconfigurations, obvious duplicates), focused work for the afternoon (consent corrections across tag groups, event model cleanup), and structural projects you can scope and schedule (naming overhaul, trigger refactoring, data layer improvements).
The AI accelerates research, analysis, and documentation at every stage. It does not replace your judgment about what's safe to change, who needs to be consulted, and what needs testing in Preview mode first. The process is what makes this work. The specific tools matter less than the order you use them in. If you want a solid overview of GTM auditing fundamentals without the AI layer, MeasureMinds has a thorough walkthrough of the manual process.