The locator tax nobody puts in the budget

Two or three senior engineers on test maintenance cost $75,000 to $120,000 a year. That is a staffing decision wearing a process costume, and it rarely shows up as a line item.

The cost hides in the sprint

Bug0 put numbers to what most engineering managers already feel: broken-test triage is high-cost, low-leverage work. Industry data puts the flaky-test tax around $2,200 per developer per month in large enterprises, and Functionize traces most of it to selectors that break when the UI shifts.

You see the bill as slow sprints, senior engineers pulled off feature work, and QA leads triaging red builds instead of reviewing coverage. None of that lands in a maintenance budget, so the spend stays invisible until you measure it.

What changed: agents read intent, not selectors

A class rename breaks a CSS selector. It does not change what the element is. Agents that read the accessibility tree repair the test against intent and flag the cases where the product actually regressed. The difference is which layer the test binds to:

// Brittle: binds to markup that refactors break
page.locator('.btn-primary.checkout-cta');

// Durable: binds to what the user sees, what an agent can re-derive
page.getByRole('button', { name: 'Pay now' });

Forrester estimates self-healing tooling cuts those maintenance costs by more than half, and the 2026 Playwright ecosystem survey reports the same direction across teams.

Model the token cost before you scale

Self-healing is not free. Plan for roughly 114K tokens per AI-assisted test run at CI scale. On a 500-test suite run daily, that adds up to a real bill, so treat it like any other infra cost. Pilot on a stable subset, measure the triage hours you recover, then expand to the suites that churn most.

The engineers you pay senior rates should review genuine regressions. If their weeks go to CSS-rename fallout, the case for self-healing closes fast.

The decision

Frame it as headcount, not tooling. Two senior engineers freed from selector triage is the return; the token spend and a pilot are the cost. Qadence and TestDino both land on the same split: let agents handle the mechanical repair, keep humans on the judgment calls. Run the pilot, recover the hours, and put the recovered time where it earns more than green builds.

$ echo "EOF · thanks for reading"

Written by

Tim Stacey

Lead quality engineer. Writes about testing strategy.

Playwright agents and the new QA skills gap

The locator tax nobody puts in the budget.

The cost hides in the sprint

What changed: agents read intent, not selectors

Model the token cost before you scale

The decision