How TagManifest Scores Your Container

When you scan a GTM container with TagManifest, the summary panel shows two results instead of one: a functional health score with a letter grade, and a container hygiene profile with five independent dimensions. If you're looking at your results wondering what a C+ with "Mixed" hygiene actually means, this is the explanation.

Most audit tools produce a single number. The problem with that approach is straightforward: a container with perfect naming conventions, organized folders, and zero orphaned triggers scores well even if it has consent violations on every advertising tag. And a container with messy names and no folders scores poorly even if its tracking is flawless. Those are fundamentally different states. One is a compliance risk. The other is a maintenance inconvenience. A single score makes it impossible to tell the difference, and that difference is the thing you actually need to know.

TagManifest separates them. Functional health tells you whether the container is doing its job. Container hygiene tells you whether someone else could maintain it. They're measured differently because they matter differently.

Two dimensions, not one score

Functional health gets a score out of 100 and a letter grade because the stakes justify precision. If your consent configuration is wrong, advertising tags are firing without valid consent for European visitors. If your GA4 events are misconfigured, you're making business decisions on bad data. These are operational problems with measurable consequences.

Container hygiene gets a profile, not a score. Five dimensions, each rated independently as Clean, Mixed, or Cluttered. There's no combined hygiene number because "72% organized" doesn't mean anything actionable. Knowing that your naming is Clean but your triggers are Cluttered tells you exactly where to focus. A composite would hide that.

The practical result: a container can score a B on functional health (solid tracking, a few consent gaps) while showing a Cluttered hygiene profile (inconsistent naming, no folders, orphaned variables everywhere). That's a container that works but would be painful to hand off. It's a different kind of problem than a container with Clean hygiene and an F on functional health, which looks tidy but isn't actually tracking correctly.

The functional health score

The score starts at 100 and goes down based on findings from the scan. Every finding is classified by dimension (functional or hygiene) and by tier (error, optimization, or info). Only functional findings affect the health score. Hygiene findings appear in the hygiene profile but don't touch the number.

The deductions are deliberately asymmetric:

Errors deduct 8 points each. These are findings where something is materially wrong: no CMP detected, consent mode not wired, PII in GA4 parameters, duplicate conversion tracking.
Optimizations deduct 2 points each. These are findings that affect data quality or compliance but don't indicate a fundamental misconfiguration: missing Consent Mode v2 signals, ad tags on the wrong consent type, event names exceeding GA4 limits.
Info findings deduct nothing. They're observations, not problems: Custom HTML tags detected, multiple GA4 measurement IDs present, server-side tagging indicators found.

The asymmetry reflects how the findings translate to real-world impact. A container with no CMP and unconsented advertising tags is in a materially different position than one where a few event names are slightly too long. The scoring should make that difference visible at a glance.
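To make the asymmetry concrete, here is a minimal sketch of the tier weights in Python. The Finding class and the raw_deduction function are hypothetical names used for illustration, not TagManifest's actual code, and the sketch ignores the per-category caps.

```python
from dataclasses import dataclass

# Per-finding deductions by tier, as described above.
TIER_POINTS = {"error": 8, "optimization": 2, "info": 0}


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of a finding, for illustration only."""
    dimension: str  # "functional" or "hygiene"
    tier: str       # "error", "optimization", or "info"


def raw_deduction(findings):
    """Sum tier deductions for functional findings only, before any caps."""
    return sum(
        TIER_POINTS[f.tier] for f in findings if f.dimension == "functional"
    )


findings = [
    Finding("functional", "error"),         # e.g. no CMP detected
    Finding("functional", "optimization"),  # e.g. event name too long
    Finding("functional", "info"),          # e.g. Custom HTML tags detected
    Finding("hygiene", "error"),            # never touches the health score
]
print(raw_deduction(findings))  # 10
```

One error costs as much as four optimizations, and the hygiene finding contributes nothing regardless of its tier.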

Per-category deduction caps

This is the design decision that prevents a single broken area from dominating the score.

If a container has 12 consent findings (all functional), uncapped deductions would subtract somewhere between 24 points (if every finding were an optimization) and 96 points (if every finding were an error) based on consent alone. That would make the grade meaningless for understanding anything else about the container. So each category has a deduction cap:

Category	Maximum deduction
Consent	16
GA4 Data Quality	16
Advertising	8
Ecommerce	8
Security & PII	8
Performance	8
Dead Code	8

Once a category hits its cap, additional findings in that category still appear in the findings list and work plan, but they don't push the score lower. This means the health score reflects breadth of issues, not depth in a single area.
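The capping logic can be sketched in a few lines of Python. This is an illustration built from the table above, not TagManifest's implementation: the health_score function and the (category, tier) tuple format are assumptions.

```python
TIER_POINTS = {"error": 8, "optimization": 2, "info": 0}

# Caps copied from the table above.
CATEGORY_CAPS = {
    "Consent": 16,
    "GA4 Data Quality": 16,
    "Advertising": 8,
    "Ecommerce": 8,
    "Security & PII": 8,
    "Performance": 8,
    "Dead Code": 8,
}


def health_score(findings):
    """findings: iterable of (category, tier) pairs for functional findings."""
    per_category = {}
    for category, tier in findings:
        per_category[category] = per_category.get(category, 0) + TIER_POINTS[tier]
    # Each category contributes at most its cap to the total deduction.
    total = sum(
        min(points, CATEGORY_CAPS[category])
        for category, points in per_category.items()
    )
    return 100 - total


# 12 consent findings: 3 errors + 9 optimizations = 42 raw points, capped at 16.
consent = [("Consent", "error")] * 3 + [("Consent", "optimization")] * 9
others = [("Ecommerce", "optimization"), ("Dead Code", "error")]
print(health_score(consent + others))  # 100 - (16 + 2 + 8) = 74
```

Without the cap, the same container would score 48, and nearly all of that drop would come from consent.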

Consent and GA4 get higher caps (16) because they're the categories most likely to have many independent findings, and because problems in those areas tend to have the most direct operational impact. A container with consent violations across 12 tags should score lower than one with a single consent misconfiguration, but not so much lower that the score becomes a proxy for "consent problems" and nothing else.

The practical ceiling on total deductions is the sum of all category caps. With the seven caps listed above, that is 72 points, which means the lowest achievable score is 28. In practice, most containers land between 50 and 90.

Grade thresholds

The letter grade maps from the score:

Score	Grade	What it means
90-100	A	Well-configured. Findings are minor optimizations, not functional problems.
75-89	B	Healthy. Tracking works, but there are configuration improvements worth making.
60-74	C	Needs attention. Enough functional issues that data quality or compliance is affected.
45-59	D	Issues found. Multiple areas have problems that likely affect business decisions.
0-44	F	Action required. Significant functional problems across several categories.
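The threshold table translates directly into a lookup. A minimal Python sketch, with letter_grade as a hypothetical function name; plus and minus modifiers such as C+ are left out because the table above does not give thresholds for them.

```python
def letter_grade(score):
    """Map a 0-100 health score to a letter using the thresholds above."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    if score >= 45:
        return "D"
    return "F"


for score in (95, 82, 74, 45, 44):
    print(score, letter_grade(score))
```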

A D doesn't mean the container is broken. Most D-grade containers are tracking something. They're sending data to GA4, firing conversion tags, loading the consent banner. The D means the gap between what the container is supposed to be doing and what it's actually doing is wide enough to affect decisions. Maybe advertising tags are gated behind the wrong consent type and EEA conversion data is being silently suppressed. Maybe ecommerce events are missing the revenue parameter and GA4 shows $0 for transactions that are clearly happening. The tracking runs, but the data it produces isn't reliable.

The container hygiene profile

Hygiene is measured across five dimensions, each computed independently. There's no combined grade because the dimensions measure different things and have different remediation paths.

Naming

How consistently your tags, triggers, and variables follow a naming convention. TagManifest detects the dominant pattern in your container (separator character, prefix structure) and measures what percentage of items follow it. It also counts default names: tags still called "Tag 1" or "Untitled Variable," which indicate items that were created and never properly named.

Clean: 80%+ consistency and two or fewer default names
Mixed: 50-79% consistency, or three to five default names
Cluttered: Below 50% consistency or more than five default names
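Here is one way to read those thresholds as code. It is a sketch, not the tool's logic: rate_naming is a hypothetical name, and checking the Cluttered conditions first is an assumption about precedence that the thresholds imply but don't state.

```python
def rate_naming(consistency_pct, default_names):
    """Rate naming hygiene from convention consistency and default-name count."""
    # Assumption: either Cluttered condition wins over everything else.
    if consistency_pct < 50 or default_names > 5:
        return "Cluttered"
    if consistency_pct >= 80 and default_names <= 2:
        return "Clean"
    return "Mixed"


print(rate_naming(92, 1))  # Clean
print(rate_naming(92, 4))  # Mixed: consistent, but too many default names
print(rate_naming(65, 0))  # Mixed
print(rate_naming(92, 6))  # Cluttered: more than five default names
```

Note that a highly consistent container can still land in Mixed or Cluttered on default names alone.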

Organization

Whether your container uses GTM's folder system to group related tags. TagManifest measures folder utilization (what percentage of tags are in a folder) and checks for empty folders, which usually indicate abandoned organizational attempts.

Clean: 80%+ of tags in folders and no empty folders
Mixed: 50-79% folder utilization
Cluttered: Below 50% folder utilization
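A rough sketch of the same thresholds in Python. The rate_organization function is hypothetical, and it makes one assumption the thresholds above leave open: a container with 80%+ folder utilization but at least one empty folder is treated as Mixed.

```python
def rate_organization(tags_in_folders, total_tags, empty_folders):
    """Rate folder hygiene from folder utilization and empty-folder count."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    utilization = 100 * tags_in_folders / total_tags
    if utilization < 50:
        return "Cluttered"
    if utilization >= 80 and empty_folders == 0:
        return "Clean"
    # Assumption: high utilization with empty folders falls back to Mixed.
    return "Mixed"


print(rate_organization(45, 50, 0))  # 90% in folders, no empty folders: Clean
print(rate_organization(45, 50, 2))  # 90%, two empty folders: Mixed (assumed)
print(rate_organization(30, 50, 0))  # 60%: Mixed
print(rate_organization(10, 50, 0))  # 20%: Cluttered
```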

Variables

The ratio of orphaned variables to total variables. An orphaned variable is one that isn't referenced by any tag or trigger in the container. Some orphans are intentional (variables used by Custom HTML JavaScript, for example), but a high orphan ratio usually means variables were created for tags that no longer exist.

Clean: 20% or fewer orphaned
Mixed: 21-50% orphaned
Cluttered: More than 50% orphaned

Triggers

Same logic as variables. Orphaned triggers are triggers not connected to any tag. In containers that have gone through multiple iterations, orphaned triggers accumulate when tags get deleted but their triggers don't.

Clean: 20% or fewer orphaned
Mixed: 21-50% orphaned
Cluttered: More than 50% orphaned
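Because variables and triggers share the same thresholds, one function can rate both. The sketch below is illustrative: orphan_rating and its inputs are hypothetical, the item names are made up, and rating an empty set as Clean is an assumption.

```python
def orphan_rating(all_names, referenced_names):
    """Rate a set of variables or triggers by the share nothing references."""
    all_names = set(all_names)
    if not all_names:
        return "Clean"  # Assumption: nothing to orphan means nothing to clean up.
    orphaned = all_names - set(referenced_names)
    orphan_pct = 100 * len(orphaned) / len(all_names)
    if orphan_pct <= 20:
        return "Clean"
    if orphan_pct <= 50:
        return "Mixed"
    return "Cluttered"


triggers = {"All Pages", "Purchase", "Form Submit", "Old Promo Click", "Legacy Scroll"}
used_by_tags = {"All Pages", "Purchase", "Form Submit"}
print(orphan_rating(triggers, used_by_tags))  # 2 of 5 orphaned (40%): Mixed
```

A real check would also need to account for references the container JSON doesn't make explicit, such as variables read inside Custom HTML JavaScript.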

Ownership

How many distinct contributor patterns TagManifest detects in your naming conventions. If some tags start with "GA4 -" and others with "gtm_" and others with "Agency_Facebook_", that's three contributor patterns suggesting three different people or teams have worked in the container at different times. More patterns means more naming debt and less coherence.

Clean: Two or fewer contributor patterns
Mixed: Three to four patterns
Cluttered: Five or more patterns
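Once the patterns have been counted, the rating itself is a simple threshold check. This sketch covers only that last step: rate_ownership is a hypothetical name, and how TagManifest actually detects contributor patterns is not shown here.

```python
def rate_ownership(pattern_count):
    """Rate ownership hygiene from the number of distinct contributor patterns."""
    if pattern_count <= 2:
        return "Clean"
    if pattern_count <= 4:
        return "Mixed"
    return "Cluttered"


# The three prefixes from the example above count as three patterns.
patterns = ["GA4 -", "gtm_", "Agency_Facebook_"]
print(rate_ownership(len(patterns)))  # Mixed
```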

Reading the two together

The value of the two-dimension model shows up in the combinations.

High health, Clean hygiene is the container you hope for. Tracking works, and someone else could take it over without a week of archaeology. These are rare, and they're usually containers that have had consistent ownership from a single person or team.

High health, Cluttered hygiene is the most common state for containers that have been actively used for several years. The tracking works because someone kept it working, but the naming is inconsistent, the folders are a mess, and there are orphaned triggers from three agencies ago. If you've inherited one of these, the good news is that the data is reliable. The work ahead is organizational, not functional.

Low health, Clean hygiene is the alarming combination. Everything is neatly named and well-organized, but the consent configuration is wrong, the ecommerce events are missing parameters, or the GA4 tags are sending data to the wrong measurement ID. These containers look professional but aren't doing their job correctly. They're the reason functional health and hygiene are scored separately.

Low health, Cluttered hygiene is the container that needs the most work, but it's also the most straightforward to prioritize. Start with the functional health findings (they affect live data), then use the hygiene profile to guide cleanup once the tracking is reliable.

In practice, the hygiene profile also helps explain the functional health score. A container with five contributor patterns and inconsistent naming almost always has consent or tracking misconfigurations, because the conditions that produce naming debt are the same conditions that produce configuration drift. When you see Cluttered ownership alongside a D on functional health, the root cause is usually the same: too many people making changes without a shared framework for how the container should work.

What to do about your score

The score is a diagnostic, not a judgment. A D+ isn't a failure. It's a container that's been actively used by multiple people over several years, which is exactly how containers get messy. TagManifest describes the state without moralizing about it.

The more useful output is the work plan, which takes the same findings that produced your score and organizes them by effort: quick wins you can ship in an hour, focused remediation for a morning's work, structural changes that need planning, and strategic improvements for the roadmap. The consent dashboard is one example of how a specific category's findings translate to that effort framing.

Your score will change as you address findings. Fix the consent configuration and the consent deduction might drop from the 16-point cap to 4. Clean up dead UA tags and the dead code deductions disappear entirely. The score reflects the container's current state, and it updates every time you scan a new version.
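As a worked example of that arithmetic, here is a before-and-after comparison in Python. The per-category deductions are hypothetical numbers chosen for illustration, and score is an assumed helper, not part of TagManifest.

```python
def score(deductions):
    """Health score from already-capped per-category deductions."""
    return 100 - sum(deductions.values())


# Hypothetical deductions from two scans of the same container.
before = {"Consent": 16, "GA4 Data Quality": 6, "Dead Code": 8}
after = {"Consent": 4, "GA4 Data Quality": 6, "Dead Code": 0}

print(score(before), score(after))  # 70 90
```

Under the grade thresholds, that moves the container from a C to an A without touching the GA4 findings at all.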