<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Tim Stacey — Field notes</title>
  <subtitle>Field notes on testing and quality engineering by Tim Stacey.</subtitle>
  <link href="https://tim.sillysamoyed.com/atom.xml" rel="self"/>
  <link href="https://tim.sillysamoyed.com/blog"/>
  <id>https://tim.sillysamoyed.com/blog</id>
  <updated>2026-06-09T00:00:00.000Z</updated>
  <author><name>Tim Stacey</name></author>
  <entry>
    <title>Where Test Health Belongs: CI Logs or an Observability Backend</title>
    <link href="https://tim.sillysamoyed.com/blog/test-health-ci-logs-vs-observability"/>
    <id>https://tim.sillysamoyed.com/blog/test-health-ci-logs-vs-observability</id>
    <updated>2026-06-09T00:00:00.000Z</updated>
    <summary>Your suite emits pass rate and flake count every run, then buries them in a CI log nobody scrolls; export them over OTLP and a dashboard catches the rot.</summary>
    <category term="TestAutomation"/>
    <category term="OpenTelemetry"/>
    <category term="TestObservability"/>
    <category term="CICD"/>
    <category term="DevOps"/>
  </entry>
  <entry>
    <title>Retrying a flaky test deletes the evidence of a real bug</title>
    <link href="https://tim.sillysamoyed.com/blog/retries-hide-real-bugs"/>
    <id>https://tim.sillysamoyed.com/blog/retries-hide-real-bugs</id>
    <updated>2026-06-07T00:00:00.000Z</updated>
    <summary>A bug that fails one run in four passes CI 99.6 percent of the time under three retries. Quarantine the test instead and keep the signal.</summary>
    <category term="TestAutomation"/>
    <category term="FlakyTests"/>
    <category term="CICD"/>
    <category term="SoftwareTesting"/>
    <category term="DevOps"/>
  </entry>
  <entry>
    <title>Three caching changes that take 80% off a GitHub Actions build</title>
    <link href="https://tim.sillysamoyed.com/blog/github-actions-cache-strategy"/>
    <id>https://tim.sillysamoyed.com/blog/github-actions-cache-strategy</id>
    <updated>2026-06-04T00:00:00.000Z</updated>
    <summary>A cached ~/.npm drops a cold Node install from four minutes to thirty seconds, and two more cache changes take the rest of the pipeline down with it.</summary>
    <category term="GitHubActions"/>
    <category term="CICD"/>
    <category term="DevOps"/>
    <category term="SoftwareDevelopment"/>
    <category term="TestAutomation"/>
  </entry>
  <entry>
    <title>My resume site ships behind 460 tests</title>
    <link href="https://tim.sillysamoyed.com/blog/resume-site-behind-460-tests"/>
    <id>https://tim.sillysamoyed.com/blog/resume-site-behind-460-tests</id>
    <updated>2026-06-04T00:00:00.000Z</updated>
    <summary>I set the direction and Claude Code wrote the code and the tests; 247 unit tests and 213 browser tests are how I trust a site I never hand-wrote.</summary>
    <category term="Astro"/>
    <category term="StaticSite"/>
    <category term="Playwright"/>
    <category term="ContinuousIntegration"/>
    <category term="TestAutomation"/>
  </entry>
  <entry>
    <title>GitHub Actions parallel steps and the matrix jobs you can retire</title>
    <link href="https://tim.sillysamoyed.com/blog/github-actions-parallel-steps"/>
    <id>https://tim.sillysamoyed.com/blog/github-actions-parallel-steps</id>
    <updated>2026-06-02T00:00:00.000Z</updated>
    <summary>Three matrix jobs for lint, type-check, and unit tests pay three runner boots and an artifact handoff for concurrency that parallel steps fold back into one job.</summary>
    <category term="GitHubActions"/>
    <category term="CICD"/>
    <category term="DevOps"/>
    <category term="TestAutomation"/>
    <category term="SoftwareDevelopment"/>
  </entry>
  <entry>
    <title>Contract Testing vs End-to-End: Where Integration Bugs Belong</title>
    <link href="https://tim.sillysamoyed.com/blog/contract-testing-vs-e2e"/>
    <id>https://tim.sillysamoyed.com/blog/contract-testing-vs-e2e</id>
    <updated>2026-06-01T00:00:00.000Z</updated>
    <summary>A contract test catches a renamed field in seconds; a 20-minute E2E suite catches it after booting six services. Put each test where it earns its minutes.</summary>
    <category term="ContractTesting"/>
    <category term="Microservices"/>
    <category term="APITesting"/>
    <category term="TestAutomation"/>
    <category term="CICD"/>
  </entry>
  <entry>
    <title>k6 Script Authoring calibrates load tests to live traffic</title>
    <link href="https://tim.sillysamoyed.com/blog/k6-script-authoring-live-telemetry"/>
    <id>https://tim.sillysamoyed.com/blog/k6-script-authoring-live-telemetry</id>
    <updated>2026-05-26T00:00:00.000Z</updated>
    <summary>Grafana Assistant reads your telemetry, finds endpoints by real RPS and p95, and generates a k6 script that inherits that profile.</summary>
    <category term="PerformanceTesting"/>
    <category term="k6"/>
    <category term="Grafana"/>
    <category term="TestAutomation"/>
    <category term="DevOps"/>
  </entry>
  <entry>
    <title>When AI can write every test, what ships to CI is the job</title>
    <link href="https://tim.sillysamoyed.com/blog/playwright-ai-test-explosion"/>
    <id>https://tim.sillysamoyed.com/blog/playwright-ai-test-explosion</id>
    <updated>2026-05-24T00:00:00.000Z</updated>
    <summary>AI-generated Playwright tests flake at a rate under 1.5%. The new problem is test explosion, and coverage intent is still yours to define.</summary>
    <category term="Playwright"/>
    <category term="TestAutomation"/>
    <category term="SoftwareTesting"/>
    <category term="AI"/>
    <category term="CICD"/>
  </entry>
  <entry>
    <title>One click to fix a failing GitHub Actions run</title>
    <link href="https://tim.sillysamoyed.com/blog/github-copilot-fixes-failing-ci"/>
    <id>https://tim.sillysamoyed.com/blog/github-copilot-fixes-failing-ci</id>
    <updated>2026-05-21T00:00:00.000Z</updated>
    <summary>Fix with Copilot puts a cloud agent on the failure: it investigates, pushes a fix, reruns CI, and tags you for review.</summary>
    <category term="GitHubActions"/>
    <category term="CICD"/>
    <category term="TestAutomation"/>
    <category term="DevOps"/>
    <category term="SoftwareDevelopment"/>
    <category term="Playwright"/>
  </entry>
  <entry>
    <title>90% use AI in the IDE; the pipeline is another story</title>
    <link href="https://tim.sillysamoyed.com/blog/ai-cicd-adoption-gap"/>
    <id>https://tim.sillysamoyed.com/blog/ai-cicd-adoption-gap</id>
    <updated>2026-05-19T00:00:00.000Z</updated>
    <summary>JetBrains data: daily AI in the editor, almost none in CI/CD. The trust gap closes when AI reduces noise instead of adding it.</summary>
    <category term="CICD"/>
    <category term="DevOps"/>
    <category term="TestAutomation"/>
    <category term="SoftwareDevelopment"/>
    <category term="AITesting"/>
  </entry>
  <entry>
    <title>Bitbucket Agentic Pipelines automates the chores</title>
    <link href="https://tim.sillysamoyed.com/blog/bitbucket-agentic-pipelines"/>
    <id>https://tim.sillysamoyed.com/blog/bitbucket-agentic-pipelines</id>
    <updated>2026-05-17T00:00:00.000Z</updated>
    <summary>Define an agent block in bitbucket-pipelines.yml, scope it, tie it to an event. It drafts the docs and fills the coverage gaps; you review.</summary>
    <category term="Bitbucket"/>
    <category term="DevOps"/>
    <category term="CICD"/>
    <category term="TestAutomation"/>
    <category term="SoftwareDevelopment"/>
    <category term="Playwright"/>
  </entry>
  <entry>
    <title>Playwright 1.59 turns failures into reviewable evidence</title>
    <link href="https://tim.sillysamoyed.com/blog/playwright-1-59-healer-agent-ci"/>
    <id>https://tim.sillysamoyed.com/blog/playwright-1-59-healer-agent-ci</id>
    <updated>2026-05-14T00:00:00.000Z</updated>
    <summary>The 1.59 agents plus screencast and browser.bind shift your job from chasing selectors to reviewing what the Healer did.</summary>
    <category term="Playwright"/>
    <category term="TestAutomation"/>
    <category term="AITesting"/>
    <category term="QA"/>
    <category term="CI"/>
  </entry>
  <entry>
    <title>k6 2.0 moves load-test authoring into the CLI</title>
    <link href="https://tim.sillysamoyed.com/blog/k6-2-ai-performance-testing"/>
    <id>https://tim.sillysamoyed.com/blog/k6-2-ai-performance-testing</id>
    <updated>2026-05-12T00:00:00.000Z</updated>
    <summary>Grafana previewed k6 2.0 at GrafanaCON 2026: AI authoring in the CLI, an MCP server, and a Playwright-to-k6 converter.</summary>
    <category term="PerformanceTesting"/>
    <category term="TestAutomation"/>
    <category term="k6"/>
    <category term="Grafana"/>
    <category term="AI"/>
  </entry>
  <entry>
    <title>The locator tax nobody puts in the budget</title>
    <link href="https://tim.sillysamoyed.com/blog/playwright-mcp-locator-tax"/>
    <id>https://tim.sillysamoyed.com/blog/playwright-mcp-locator-tax</id>
    <updated>2026-05-12T00:00:00.000Z</updated>
    <summary>Broken-test triage is a staffing decision disguised as a process one. Here is the cost, and where AI self-healing pays it back.</summary>
    <category term="EngineeringLeadership"/>
    <category term="SoftwareEngineering"/>
    <category term="TestAutomation"/>
    <category term="DevProductivity"/>
    <category term="QualityAssurance"/>
  </entry>
  <entry>
    <title>Playwright agents and the new QA skills gap</title>
    <link href="https://tim.sillysamoyed.com/blog/playwright-ai-agents"/>
    <id>https://tim.sillysamoyed.com/blog/playwright-ai-agents</id>
    <updated>2026-05-10T00:00:00.000Z</updated>
    <summary>Playwright v1.56 put a Planner, Generator, and Healer in the test runner. The interesting part is what it asks of the engineers who own the suite.</summary>
    <category term="Playwright"/>
    <category term="SoftwareTesting"/>
    <category term="AI"/>
    <category term="TestAutomation"/>
    <category term="QualityAssurance"/>
  </entry>
</feed>
