// blog.md
Field notes from someone who tests for a living.
- 15
- when it matters
- markdown + vim
// post 015
featured · latest
Where Test Health Belongs: CI Logs or an Observability Backend
Your suite emits pass rate and flake count every run, then buries them in a CI log nobody scrolls; export them over OTLP and a dashboard catches the rot.
#TestAutomation
#OpenTelemetry
#TestObservability
#CICD
#DevOps
$ cat test-health-ci-logs-vs-observability.md
# Where test health belongs
Every run, your suite reports its pass
rate and flake count. The number lands
in a CI log and nobody scrolls back.
Export it over OTLP and pass rate flows
into the same Grafana board as prod...
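
A minimal sketch of what that export could look like, assuming the OTLP/HTTP JSON encoding and a collector that accepts metrics on its `/v1/metrics` path. The metric names (`test.pass_rate`, `test.flake_count`), the scope name, the service name, and the sample counts are hypothetical and not taken from the post. The block only builds the request body; it does not send anything.

```python
import json
import time


def build_otlp_metrics_payload(passed, failed, flaky, service_name="ci-test-suite"):
    """Build an OTLP/HTTP JSON body with pass rate and flake count as gauges."""
    total = passed + failed
    pass_rate = passed / total if total else 0.0
    # OTLP's JSON encoding represents 64-bit integers as decimal strings.
    now_ns = str(time.time_ns())

    def gauge(name, unit, point):
        # One gauge metric with a single data point stamped with the run time.
        return {
            "name": name,
            "unit": unit,
            "gauge": {"dataPoints": [dict(point, timeUnixNano=now_ns)]},
        }

    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "test-health-sketch"},  # hypothetical scope name
                "metrics": [
                    gauge("test.pass_rate", "1", {"asDouble": pass_rate}),
                    gauge("test.flake_count", "1", {"asInt": str(flaky)}),
                ],
            }],
        }]
    }


# Hypothetical run: 194 passed, 6 failed, 3 tests needed a retry to pass.
payload = build_otlp_metrics_payload(passed=194, failed=6, flaky=3)
body = json.dumps(payload)
```

In a real pipeline, `body` would be POSTed with `Content-Type: application/json` at the end of the CI job, or the OpenTelemetry SDK would take over this step entirely.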
// published
- Retrying a flaky test deletes the evidence of a real bug · Strategy
  A bug that fails one run in four passes CI 99.6 percent of the time under three retries. Quarantine the test instead and keep the signal.
- Three caching changes that take 80% off a GitHub Actions build · Practice
  A cached ~/.npm drops a cold Node install from four minutes to thirty seconds, and two more cache changes take the rest of the pipeline down with it.
- My resume site ships behind 460 tests · Meta
  I set the direction and Claude Code wrote the code and the tests; 247 unit tests and 213 browser tests are how I trust a site I never hand-wrote.
- GitHub Actions parallel steps and the matrix jobs you can retire · Tools
  Three matrix jobs for lint, type-check, and unit tests pay three runner boots and an artifact handoff for concurrency that parallel steps fold back into one job.
- Contract Testing vs End-to-End: Where Integration Bugs Belong · Strategy
  A contract test catches a renamed field in seconds; a 20-minute E2E suite catches it after booting six services. Put each test where it earns its minutes.
- k6 Script Authoring calibrates load tests to live traffic · Tools
  Grafana Assistant reads your telemetry, finds endpoints by real RPS and p95, and generates a k6 script that inherits that profile.
- When AI can write every test, what ships to CI is the job · Strategy
  AI-generated Playwright tests flake under 1.5%. The new problem is test explosion, and coverage intent is still yours to define.
- One click to fix a failing GitHub Actions run · Tools
  Fix with Copilot puts a cloud agent on the failure: it investigates, pushes a fix, reruns CI, and tags you for review.
- 90% use AI in the IDE; the pipeline is another story · Strategy
  JetBrains data: daily AI in the editor, almost none in CI/CD. The trust gap closes when AI reduces noise instead of adding it.
- Bitbucket Agentic Pipelines automates the chores · Tools
  Define an agent block in bitbucket-pipelines.yml, scope it, tie it to an event. It drafts the docs and the coverage gaps; you review.
- Playwright 1.59 turns failures into reviewable evidence · Tools
  The 1.59 agents plus screencast and browser.bind shift your job from chasing selectors to reviewing what the Healer did.
- k6 2.0 moves load-test authoring into the CLI · Tools
  Grafana previewed k6 2.0 at GrafanaCON 2026: AI authoring in the CLI, an MCP server, and a Playwright-to-k6 converter.
- The locator tax nobody puts in the budget · Strategy
  Broken-test triage is a staffing decision disguised as a process one. Here is the cost, and where AI self-healing pays it back.
- Playwright agents and the new QA skills gap · Tools
  Playwright v1.56 put a Planner, Generator, and Healer in the test runner. The interesting part is what it asks of the engineers who own the suite.
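
The 99.6 percent figure in the flaky-test post can be checked with a few lines of arithmetic. This is a sketch under one assumption the listing does not state: each attempt fails independently with the same probability, so a job with three retries only goes red when all four attempts fail. The function name `pass_probability` is illustrative.

```python
def pass_probability(fail_rate: float, retries: int) -> float:
    """Chance that at least one attempt passes, assuming independent attempts."""
    attempts = retries + 1  # the first run plus each retry
    return 1 - fail_rate ** attempts


# A bug that fails one run in four, retried up to three times.
p = pass_probability(fail_rate=0.25, retries=3)
print(f"{p:.1%}")  # 99.6%
```

Under that assumption, the job goes red roughly once in 256 runs for a bug that is present in every one of them, which is the signal a quarantine keeps and a retry hides.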