You can now identify your flakiest tests through the API and have an AI agent fix them, at a cost of around $1 to $1.50 per fix.
What Shipped
Flaky Test Data in the API (Generally Available)
Flaky test data is now available programmatically through the sem-ai API. The API returns tests ranked by disruption count, along with metadata such as the last failure timestamp and relevant logs. Find it on GitHub.
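To make the ranking concrete, here is a minimal Python sketch of how a client might pick the most disruptive tests from such data. The record shape (`name`, `disruption_count`, `last_failure_at`), the `top_disruptors` helper, and the sample values are all hypothetical and used only for illustration; the actual sem-ai response format may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FlakyTest:
    # Hypothetical record shape; the real sem-ai fields may be named differently.
    name: str
    disruption_count: int
    last_failure_at: datetime


def top_disruptors(tests, limit=3):
    """Return up to `limit` tests, highest disruption count first."""
    return sorted(tests, key=lambda t: t.disruption_count, reverse=True)[:limit]


# Made-up sample data, not real API output.
sample = [
    FlakyTest("test_checkout_timeout", 4, datetime(2025, 1, 10, tzinfo=timezone.utc)),
    FlakyTest("test_login_redirect", 17, datetime(2025, 1, 12, tzinfo=timezone.utc)),
    FlakyTest("test_cache_eviction", 9, datetime(2025, 1, 11, tzinfo=timezone.utc)),
]

for test in top_disruptors(sample, limit=2):
    print(test.name, test.disruption_count)
```

Sorting on the client side is shown only to illustrate the idea of ranking by disruption; since the API already returns tests ranked, a real client would likely just take the first few entries.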
Flaky Test Fix Skill
A new sem-ai skill lets an agent fix flaky tests end-to-end. The agent pulls the highest-disruption tests from the API, gathers context around each failure, identifies the root cause, and implements a fix. It then attempts to verify the fix, first by running the tests locally and, if that is not possible, by spinning up Semaphore test boxes to run the test repeatedly across multiple machines. Because a single passing run is rarely enough to confirm that a flaky test is resolved, the multi-machine approach is especially useful for high-confidence validation.
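A rough calculation shows why one passing run proves little. Assuming, for illustration, that an unfixed flaky test fails independently on each run with a fixed probability, the chance that it passes n runs in a row is (1 - p)^n. The sketch below computes that chance and the number of runs needed to push it under a chosen threshold. The function names and the 5% threshold are assumptions made for this example, not part of the sem-ai skill.

```python
import math


def false_pass_probability(failure_rate, runs):
    """Chance that a still-flaky test passes `runs` times in a row.

    Assumes each run fails independently with probability `failure_rate`.
    """
    return (1 - failure_rate) ** runs


def runs_needed(failure_rate, max_false_pass=0.05):
    """Smallest number of consecutive passing runs such that a still-flaky
    test would have slipped through with probability at most `max_false_pass`."""
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be between 0 and 1 (exclusive)")
    if not 0 < max_false_pass < 1:
        raise ValueError("max_false_pass must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(max_false_pass) / math.log(1 - failure_rate))


# A test that fails 10% of the time still passes a single run 90% of the time.
print(false_pass_probability(0.10, 1))
# Consecutive passes needed before a 10%-flaky test is unlikely (<= 5%) to hide.
print(runs_needed(0.10))
```

Under these assumptions, a test that fails one run in ten needs 29 consecutive passes, and rarer failures need many more, which is where spreading runs across several machines pays off.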
In benchmarks with Claude Opus 4.8 on high effort, a fix typically costs $1 to $1.50, covering analysis and solution generation.
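For planning purposes, the per-fix figure scales linearly with the number of tests. The sketch below is a back-of-the-envelope estimate only: the `fix_budget` helper is hypothetical, and it assumes every fix lands inside the benchmarked $1 to $1.50 range, which covers analysis and solution generation.

```python
# Per-fix cost range in USD, taken from the benchmark figure above.
COST_PER_FIX_USD = (1.00, 1.50)


def fix_budget(num_tests, cost_range=COST_PER_FIX_USD):
    """Return the (low, high) estimated spend for fixing `num_tests` tests."""
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    low, high = cost_range
    return (num_tests * low, num_tests * high)


low, high = fix_budget(20)
print(f"${low:.2f} to ${high:.2f}")  # prints "$20.00 to $30.00"
```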
Skill Quality Improvements
Four existing sem-ai skills were updated this week with additional examples. Agents were occasionally skipping skill instructions because the definitions lacked concrete examples to follow. Adding examples directly to the skill definitions improves agent adherence and makes sem-ai’s guidance more reliable in practice.
What’s Coming
User and organization management will be covered in an upcoming release, closing another gap in sem-ai’s API surface. The team is also continuing to improve existing skills and commands based on usage feedback.