There are prompt libraries out there for GTM auditing with AI. Some of them are quite good. Analytics Mates published 25 prompts that are genuinely useful. But I've noticed a structural problem with most of these libraries: each prompt asks the AI to do too many things at once.
Take a typical example: "Check for each conversion tag that it fires once and only once. Assess if the firing conditions could allow multiple fires per user action. If yes, propose a fix." That reads like one prompt. It's actually three tasks: inventory all conversion tags, evaluate every trigger condition, then design a fix for each problem. Against a container with 40 conversion tags and 80 triggers, that's a lot of state to hold simultaneously.
The AI can do it. The output will be reasonable. But "reasonable" isn't what you want when you're making changes to a production container. You want specific. You want verifiable. You want to be able to check the answer before you act on it.
Data quality vs. prompt quality
When you run prompts against raw container JSON, the AI is doing two jobs at once: parsing the structure to understand what exists, and analyzing what it finds. Both are expensive, and the analysis quality suffers because the parsing consumes so much context.
The result is directionally correct but vague. "Several tags appear to have broad firing conditions." Which tags? What's broad about them? The AI skimmed because the raw JSON gave it too much surface area.
I've found that better prompts don't fix this. Better data does. When my copilot has access to structured audit data (findings already categorized, scored, mapped to specific tags), the prompts get simpler because the data has already done the hard work.
Instead of: "Check every tag for consent issues and tell me what's misconfigured."
I can say: "The audit flagged these three advertising tags for firing without consent. They're Google Ads conversion pixels. What should the consent configuration be?"
First prompt: parse, identify, evaluate, propose. Second prompt: research one thing. The answer comes back more specific, more verifiable, and faster.
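What "structured audit data" means concretely: each finding arrives already categorized, scored, and mapped to tag IDs, so the prompt only has to point at it. A minimal sketch; the `Finding` schema, field names, and severity scale here are illustrative assumptions, not any real tool's format:

```python
from dataclasses import dataclass, field

# Hypothetical shape for one pre-computed audit finding. The point is
# that category, severity, and affected tag IDs are resolved before any
# prompt is written, so the prompt never asks the AI to parse anything.
@dataclass
class Finding:
    finding_id: str
    category: str          # e.g. "consent", "duplicate", "dead-reference"
    severity: int          # 1 (cosmetic) to 5 (data loss / compliance)
    tag_ids: list[str] = field(default_factory=list)
    summary: str = ""

consent_finding = Finding(
    finding_id="F-012",
    category="consent",
    severity=4,
    tag_ids=["23", "31", "47"],
    summary="Advertising tags fire without checking ad_storage consent.",
)

# The prompt references one finding instead of the whole container:
prompt = (
    f"The audit flagged tags {', '.join(consent_finding.tag_ids)} "
    f"for: {consent_finding.summary} What should the consent "
    "configuration be for each?"
)
print(prompt)
```

The prompt text becomes almost trivial because the dataset already carries the hard-won specifics.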
Real prompts from a real audit session
I use voice transcription (Wispr Flow) to talk to the copilot rather than type. This changes the interaction pattern more than you'd expect. Typed prompts tend toward formal, multi-step instructions. You compose them carefully. Spoken prompts tend toward the question you'd ask a colleague sitting next to you: less structured, more conversational, often more honest about what you don't know.
With structured data already loaded, the spoken approach works well because the parsing step is done. I can be direct about what I need without writing a paragraph of setup. I can also convey urgency or confusion in a way that typed prompts don't capture. "I don't understand what this tag does and I need to before I delete it" is a perfectly good prompt when the copilot already has the structured data showing why that tag was flagged.
Here's what those conversations sound like in practice:
Understanding a finding:
- "What does tag 65 actually do? It's flagged for a jQuery dependency and a dead UA reference. Walk me through the code."
- "Six advertising tags bypass consent. The audit says they should be respecting it. What's the correct configuration for each tag type?"
- "This Custom HTML tag is 318 lines. Is it doing anything that GA4 or GTM handles natively now?"
Researching a fix:
- "This GA4 event tag sends a custom parameter called 'transaction_id' but it reads from a data layer key called 'orderID'. Is that the right field name or does it need to match GA4's spec?"
- "The audit flagged duplicate Facebook pixels. One's Custom HTML, one's the native template. Which should I keep?"
- "This Custom HTML tag injects a script from a third-party domain. Is that vendor still active? What does the script do?"
Checking dependencies:
- "Tag 65 pushes values to the data layer. Are any other tags reading from those keys?"
- "If I change consent types on these advertising tags, what happens to conversion volume in Google Ads during the transition?"
- "This trigger fires on All Pages. Which tags use it? Any of them consent-sensitive?"
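Dependency questions like the last one are answerable deterministically from the container export before the AI gets involved. A minimal lookup sketch, assuming the GTM v2 export layout (`containerVersion` holding `tag` and `trigger` arrays, tags referencing triggers via `firingTriggerId`); the sample container is trimmed to just the fields the lookup needs:

```python
import json

# Trimmed GTM-style container for illustration; real v2 exports use the
# same nesting (containerVersion -> tag/trigger) with more fields.
container = json.loads("""
{
  "containerVersion": {
    "trigger": [
      {"triggerId": "10", "name": "All Pages", "type": "PAGEVIEW"}
    ],
    "tag": [
      {"tagId": "1", "name": "GA4 Config",  "firingTriggerId": ["10"]},
      {"tagId": "2", "name": "FB Pixel",    "firingTriggerId": ["10"]},
      {"tagId": "3", "name": "Scroll Evt",  "firingTriggerId": ["22"]}
    ]
  }
}
""")

def tags_using_trigger(container: dict, trigger_name: str) -> list[str]:
    """Return names of tags wired to the named trigger."""
    version = container["containerVersion"]
    ids = {t["triggerId"] for t in version.get("trigger", [])
           if t["name"] == trigger_name}
    return [tag["name"] for tag in version.get("tag", [])
            if ids & set(tag.get("firingTriggerId", []))]

print(tags_using_trigger(container, "All Pages"))  # ['GA4 Config', 'FB Pixel']
```

With the trigger-to-tag map precomputed like this, the spoken prompt only has to ask the judgment question: which of those tags are consent-sensitive.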
Documenting decisions:
- "Write a changelog entry for the consent timing fix. Audience is the client's technical team. Plain language, before and after values, why we chose 500ms."
- "Summarize the three UTM attribution options for the marketing ops lead. Include tradeoffs."
- "List everything we deferred today with the reason. Format it as a backlog."
None of these are reusable in the traditional "prompt library" sense. They're specific to the container, the finding, and the moment. That's the point. The structured data provides the context. The prompt just needs to be clear about what you need right now.
Where prompt libraries do help
Prompt libraries have real value at two moments.
Before the audit, if you're building your own structured dataset. Prompts like "parse this container JSON and list all tags with their consent configuration, trigger type, and firing conditions" are data assembly prompts. They shape the dataset you'll work from, so both the prompt quality and the output format matter here. "For each tag, return: tag name, tag type, consent configuration, trigger names, firing condition, and any Custom HTML dependencies" gives you a dataset you can actually work from. "Tell me about the tags" gives you a summary you'll have to re-parse.
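The per-tag inventory that prompt asks for can also be assembled programmatically. A sketch assuming the GTM v2 container export format (`containerVersion`, `tag`, `firingTriggerId`, `consentSettings`); `consentSettings` can be absent on older exports, and the sample container below is a made-up minimal example:

```python
import json

def tag_inventory(container: dict) -> list[dict]:
    """Flatten a GTM container export into one row per tag, with
    trigger IDs resolved to human-readable names."""
    version = container["containerVersion"]
    trigger_names = {t["triggerId"]: t["name"]
                     for t in version.get("trigger", [])}
    rows = []
    for tag in version.get("tag", []):
        rows.append({
            "name": tag["name"],
            "type": tag["type"],
            # consentSettings may be missing; treat it as optional
            "consent": tag.get("consentSettings", {}).get(
                "consentStatus", "NOT_SET"),
            "triggers": [trigger_names.get(tid, f"unknown:{tid}")
                         for tid in tag.get("firingTriggerId", [])],
        })
    return rows

# Made-up one-tag container to show the output shape.
sample = {
    "containerVersion": {
        "trigger": [{"triggerId": "7", "name": "All Pages"}],
        "tag": [{
            "tagId": "2",
            "name": "Google Ads Conversion",
            "type": "awct",
            "firingTriggerId": ["7"],
            "consentSettings": {"consentStatus": "NOT_SET"},
        }],
    }
}
for row in tag_inventory(sample):
    print(row)
```

Either route works; the point is that the output is row-shaped and machine-checkable, not a prose summary you'd have to re-parse.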
After the audit, for pattern recognition beyond what rules catch. "Looking at event naming across all GA4 tags, are there inconsistencies that suggest different people configured different tags?" That kind of prompt works best when the AI already has the structured overview and is looking for patterns within it. You can also use post-audit prompts to look for systemic issues: "Are there trigger patterns that suggest tags were added for one-time campaigns and never cleaned up?" or "Which Custom HTML tags could be replaced with native GTM templates?"
Between those two moments, during the actual audit work, the prompts should be conversational. The data is doing the heavy lifting. You're just talking through the work with someone who can look things up fast.
Prompt engineering vs. data engineering
The Analytics Mates prompts I mentioned are solid. Bidmark also wrote up their agency approach to AI-assisted GTM auditing, which covers similar ground from a different angle. If you're starting from raw JSON and don't have a structured audit step, those prompts will get you further than winging it. I'd recommend them for anyone who wants a quick audit pass without building a structured dataset first.
Where I'd push back is on the emphasis on prompt engineering as the skill that matters. In my experience, a mediocre prompt against structured, categorized audit data produces better output than a carefully engineered prompt against raw JSON. The data quality determines the output quality. The prompt just points the AI at what you need right now.
The skill that actually makes a difference is knowing how to break the audit into small enough steps. Parse first, then categorize, then score, then work through one category at a time. Each step is simple. They add up to a thorough audit.
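Those steps compose into a small pipeline. A hedged sketch: the row shape, category rules, and severity weights below are illustrative assumptions, not a standard:

```python
# Step 1 (parse) would produce rows like these from the container export;
# the row shape here is a hypothetical example.
rows = [
    {"name": "Google Ads Conversion", "consent": "NOT_SET",
     "triggers": ["All Pages"]},
    {"name": "Old UA Pageview", "consent": "NOT_NEEDED", "triggers": []},
    {"name": "GA4 Config", "consent": "NOT_NEEDED",
     "triggers": ["All Pages"]},
]

# Step 2: categorize. Simple rules, one concern per rule.
def categorize(rows):
    buckets = {}
    for row in rows:
        if row["consent"] == "NOT_SET":
            buckets.setdefault("consent", []).append(row["name"])
        if not row["triggers"]:
            buckets.setdefault("orphaned", []).append(row["name"])
    return buckets

# Step 3: score. Weights are illustrative, not calibrated.
WEIGHTS = {"consent": 4, "orphaned": 2}

def score(buckets):
    return sorted(((WEIGHTS.get(cat, 1), cat, names)
                   for cat, names in buckets.items()), reverse=True)

# Step 4: work through one category at a time, highest weight first.
for weight, cat, names in score(categorize(rows)):
    print(f"[{weight}] {cat}: {names}")
```

Each step stays simple enough to verify by eye, which is exactly the property the audit needs.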
I've done this workflow enough times now that I rarely think about prompts at all. The structured data is loaded, I talk through findings conversationally, and the copilot responds with specific, verifiable answers because it has specific, structured data to work from. Getting the data right is what makes everything else work.