As projects grow, test suites grow with them.
What starts as a 2-minute CI run becomes 15 minutes. Then 30. Eventually, developers wait longer for feedback than they spend writing code.
The problem isn't too many tests. The problem is running all tests for every change.
This is where AI-assisted test selection can help.
Instead of executing the entire test suite on every commit, AI can predict which tests are most relevant for a given change, reducing CI build times without sacrificing confidence.
The Core Problem: Most Tests Aren't Relevant to Most Changes
In a large repository:
- A frontend change rarely affects backend database tests.
- A documentation update shouldn't trigger full integration suites.
- A small utility function change probably doesn't require every end-to-end test.
Traditional CI pipelines often use simple rules:
- Run everything on every pull request.
- Or use path-based filters.
Path-based rules work initially, but they break down when dependencies become complex.
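Path-based selection reduces to a prefix or glob lookup. A minimal sketch (the rule table and suite names are hypothetical), which also shows exactly where such rules degrade:

```python
from fnmatch import fnmatch

# Hypothetical path-based rules: glob pattern -> test suites to run.
PATH_RULES = [
    ("docs/*", []),                               # docs changes run nothing
    ("frontend/*", ["frontend-unit"]),
    ("backend/*", ["backend-unit", "integration"]),
]

def suites_for(changed_files):
    """Union of suites triggered by the changed files.

    Any path not covered by a rule forces a full run."""
    suites = set()
    for path in changed_files:
        for pattern, rule_suites in PATH_RULES:
            if fnmatch(path, pattern):
                suites.update(rule_suites)
                break
        else:
            return {"full-suite"}  # no rule matched: fall back to everything
    return suites
```

The fallback branch is the weak point: every file outside a hand-written pattern triggers a full run, and cross-directory dependencies (a shared client, a schema change) are invisible to glob rules entirely.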
AI-based test selection tries to solve this by learning from history.
Example: Running Everything vs Running What Matters
Imagine a repository with:
- 2,000 unit tests
- 400 integration tests
- 50 end-to-end tests
Total CI runtime: 28 minutes.
A developer changes a small formatting function in utils/date.js.
❌ Traditional CI Behavior
CI runs:
- All unit tests
- All integration tests
- All end-to-end tests
Result: 28 minutes of runtime.
Most of those tests were unrelated.
How AI Test Selection Works
AI-based test selection uses historical CI data:
- Which files changed
- Which tests failed
- How long tests take
- Dependency relationships
- Flaky behavior patterns
Over time, the system learns patterns such as:
Changes in utils/date.js have only ever affected 12 unit tests.
Instead of running 2,450 tests, it runs 12.
Example: AI-Based Test Selection
✅ AI-Optimized CI Behavior
For the same change:
- Run 12 related unit tests
- Skip unrelated integration tests
- Skip unrelated end-to-end tests
Result: 90 seconds instead of 28 minutes.
Confidence is maintained because:
- AI uses historical failure patterns
- Full test suite still runs on main branch or nightly
- Critical smoke tests always run
Step 1: Collect Historical CI Data
AI cannot optimize test selection without data.
Your CI system must capture:
- File diffs per commit
- Test results
- Test runtime
- Failure patterns
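A lightweight way to capture that signal is one JSON record per test, per run, appended to a log the selection model can later train on. A sketch (the field names are illustrative, not a standard schema):

```python
import json

def make_record(commit, changed_files, test_name, passed, duration_s):
    """Build one training record for a single test result in a single CI run."""
    return {
        "commit": commit,
        "changed_files": changed_files,
        "test": test_name,
        "passed": passed,
        "duration_s": duration_s,
    }

def append_record(log_path, record):
    """Append the record as one JSON line; JSONL streams easily into training jobs."""
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```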
CI platforms like Semaphore already track test reports and execution history.
This data becomes the training signal for AI models.
Step 2: Define Safety Guardrails
AI test selection should never blindly skip everything.
Safe implementations include:
- Always run smoke tests
- Always run security checks
- Always run full suite on main branch
- Run full suite periodically (nightly)
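These guardrails are easy to enforce in code as a wrapper around whatever the model predicts. A sketch (the branch names and suite labels are assumptions):

```python
# Suites the model is never allowed to skip.
ALWAYS_RUN = {"smoke", "security"}

def apply_guardrails(predicted_tests, all_tests, branch, nightly=False):
    """Never let the model's selection bypass baseline validation."""
    if branch == "main" or nightly:
        return set(all_tests)                     # full suite on main and scheduled runs
    return set(predicted_tests) | ALWAYS_RUN      # AI selection plus mandatory checks
```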
AI optimizes feedback loops: it does not eliminate validation.
Step 3: Integrate AI Into the CI Workflow
Conceptually, your pipeline might look like this:
blocks:
  - name: Determine Tests
    task:
      jobs:
        - name: AI Selection
          commands:
            - run-ai-test-selection
  - name: Run Selected Tests
    task:
      jobs:
        - name: Execute
          commands:
            - run-selected-tests
  - name: Smoke Tests
    task:
      jobs:
        - name: Smoke
          commands:
            - npm run smoke
AI determines which tests to execute. CI enforces execution and validation.
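The glue between the two blocks can be as small as a one-test-per-line file handed from the selection job to the execution job. A sketch of that handoff (the runner command and file format are assumptions):

```python
def build_test_command(selection_file_contents, runner="npx jest"):
    """Turn one-test-per-line selection output into a runner invocation.

    An empty selection falls back to the full suite rather than running nothing."""
    tests = [line.strip() for line in selection_file_contents.splitlines() if line.strip()]
    if not tests:
        return runner  # no selection available: run everything
    return runner + " " + " ".join(tests)
```

Falling back to the full suite on an empty or missing selection file keeps a broken selection step from silently passing a pull request with zero tests run.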
What About Flaky Tests?
AI can also detect patterns like:
- Test fails intermittently
- Test fails unrelated to code changes
- Test failures correlate with resource constraints
Instead of randomly retrying, AI can:
- Flag tests as flaky
- Adjust selection confidence
- Recommend investigation
This reduces noise and improves signal quality.
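A first-pass flakiness signal needs no model at all: a test that both passes and fails on the same commit is flaky by definition. A sketch (the tuple shape of the results is an assumption):

```python
from collections import defaultdict

def flaky_tests(results):
    """results: iterable of (commit, test, passed) tuples.

    Flags any test that has both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for commit, test, passed in results:
        outcomes[(commit, test)].add(passed)
    return {test for (commit, test), seen in outcomes.items() if seen == {True, False}}
```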
Risks of AI-Based Test Selection
AI optimization introduces new risks:
- False negatives: skipping a test that should have run
- Cold start problem: insufficient historical data
- Trust issues: developers unsure what was skipped
Mitigations include:
- Confidence thresholds
- Periodic full-suite runs
- Transparent reporting of selected tests
- Audit logs
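The confidence-threshold mitigation reduces to one comparison: below a cutoff, fall back to the full suite and report why. A sketch (the threshold value is an assumption to tune per team):

```python
def select_with_fallback(predicted_tests, confidence, all_tests, threshold=0.9):
    """Return (tests_to_run, reason).

    Low-confidence predictions trigger a full run, and the reason string
    gives developers a transparent audit trail of what was selected and why."""
    if confidence < threshold:
        return set(all_tests), f"confidence {confidence:.2f} below {threshold}: full suite"
    return set(predicted_tests), f"selected {len(predicted_tests)} tests at confidence {confidence:.2f}"
```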
Optimization should increase speed without reducing trust.
When AI Test Selection Makes Sense
AI-based test selection is most valuable when:
- Test suites exceed 10β15 minutes
- Repositories are large or monorepos
- Teams want faster pull request feedback
- Infrastructure costs are high
For small projects, deterministic rules may be enough.
Summary
AI can help optimize which tests to run by learning from historical CI data and predicting which tests are most relevant to a given code change.
Done correctly, this reduces CI runtime, lowers infrastructure costs, and improves developer feedback speed, without sacrificing confidence.
AI should act as an optimization layer, not a replacement for validation.
Smoke tests, security checks, and full-suite runs must remain part of your CI/CD strategy.
Frequently Asked Questions
What is AI test selection?
AI test selection uses machine learning to determine which tests are relevant to a specific code change.
Is AI test selection safe?
Yes, when combined with guardrails like smoke tests and periodic full test suite runs.
Can AI detect flaky tests?
Yes. By analyzing failure patterns over time, AI can identify tests that fail intermittently.
Does AI replace full test suites?
No. Full test suites should still run on main branches or scheduled intervals.
What data is required?
Historical test results, code change data, and execution patterns from your CI system.