As projects grow, test suites grow with them.
What starts as a 2-minute CI run becomes 15 minutes. Then 30. Eventually, developers wait longer for feedback than they spend writing code.
The problem isn't too many tests. The problem is running all tests for every change.
This is where AI-assisted test selection can help.
Instead of executing the entire test suite on every commit, AI can predict which tests are most relevant for a given change, reducing CI build times without sacrificing confidence.
The Core Problem: Most Tests Aren't Relevant to Most Changes
In a large repository:
- A frontend change rarely affects backend database tests.
- A documentation update shouldn't trigger full integration suites.
- A small utility function change probably doesn't require every end-to-end test.
Traditional CI pipelines often use simple rules:
- Run everything on every pull request.
- Or use path-based filters.
Path-based rules work initially, but they break down when dependencies become complex.
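Path-based selection reduces to a prefix or glob lookup. A minimal sketch (the rule table and suite names are hypothetical), which also shows exactly where such rules degrade:

```python
from fnmatch import fnmatch

# Hypothetical path-based rules: glob pattern -> test suites to run.
PATH_RULES = [
    ("docs/*", []),                               # docs changes run nothing
    ("frontend/*", ["frontend-unit"]),
    ("backend/*", ["backend-unit", "integration"]),
]

def suites_for(changed_files):
    """Union of suites triggered by the changed files.

    Any path not covered by a rule forces a full run."""
    suites = set()
    for path in changed_files:
        for pattern, rule_suites in PATH_RULES:
            if fnmatch(path, pattern):
                suites.update(rule_suites)
                break
        else:
            return {"full-suite"}  # no rule matched: fall back to everything
    return suites
```

The fallback branch is the weak point: every file outside a hand-written pattern triggers a full run, and cross-directory dependencies (a shared client, a schema change) are invisible to glob rules entirely.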
AI-based test selection tries to solve this by learning from history.
Example: Running Everything vs Running What Matters
Imagine a repository with:
- 2,000 unit tests
- 400 integration tests
- 50 end-to-end tests
Total CI runtime: 28 minutes.
A developer changes a small formatting function in utils/date.js.
❌ Traditional CI Behavior
CI runs:
- All unit tests
- All integration tests
- All end-to-end tests
Result: 28 minutes of runtime.
Most of those tests were unrelated.
How AI Test Selection Works
AI-based test selection uses historical CI data:
- Which files changed
- Which tests failed
- How long tests take
- Dependency relationships
- Flaky behavior patterns
Over time, the system learns patterns such as:
Changes in utils/date.js have only ever affected 12 unit tests.
Instead of running 2,450 tests, it runs 12.
Example: AI-Based Test Selection
✅ AI-Optimized CI Behavior
For the same change:
- Run 12 related unit tests
- Skip unrelated integration tests
- Skip unrelated end-to-end tests
Result: 90 seconds instead of 28 minutes.
Confidence is maintained because:
- AI uses historical failure patterns
- Full test suite still runs on main branch or nightly
- Critical smoke tests always run
Step 1: Collect Historical CI Data
AI cannot optimize test selection without data.
Your CI system must capture:
- File diffs per commit
- Test results
- Test runtime
- Failure patterns
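A lightweight way to capture that signal is one JSON record per test, per run, appended to a log the selection model can later train on. A sketch (the field names are illustrative, not a standard schema):

```python
import json

def make_record(commit, changed_files, test_name, passed, duration_s):
    """Build one training record for a single test result in a single CI run."""
    return {
        "commit": commit,
        "changed_files": changed_files,
        "test": test_name,
        "passed": passed,
        "duration_s": duration_s,
    }

def append_record(log_path, record):
    """Append the record as one JSON line; JSONL streams easily into training jobs."""
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```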
CI platforms like Semaphore already track test reports and execution history.
This data becomes the training signal for AI models.
Step 2: Define Safety Guardrails
AI test selection should never blindly skip everything.
Safe implementations include:
- Always run smoke tests
- Always run security checks
- Always run full suite on main branch
- Run full suite periodically (nightly)
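These guardrails are easy to enforce in code as a wrapper around whatever the model predicts. A sketch (the branch names and suite labels are assumptions):

```python
# Suites the model is never allowed to skip.
ALWAYS_RUN = {"smoke", "security"}

def apply_guardrails(predicted_tests, all_tests, branch, nightly=False):
    """Never let the model's selection bypass baseline validation."""
    if branch == "main" or nightly:
        return set(all_tests)                     # full suite on main and scheduled runs
    return set(predicted_tests) | ALWAYS_RUN      # AI selection plus mandatory checks
```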
AI optimizes feedback loops: it does not eliminate validation.
Step 3: Integrate AI Into the CI Workflow
Conceptually, your pipeline might look like this:
blocks:
  - name: Determine Tests
    task:
      jobs:
        - name: AI Selection
          commands:
            - run-ai-test-selection
  - name: Run Selected Tests
    task:
      jobs:
        - name: Execute
          commands:
            - run-selected-tests
  - name: Smoke Tests
    task:
      jobs:
        - name: Smoke
          commands:
            - npm run smoke
AI determines which tests to execute. CI enforces execution and validation.
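The glue between the two blocks can be as small as a one-test-per-line file handed from the selection job to the execution job. A sketch of that handoff (the runner command and file format are assumptions):

```python
def build_test_command(selection_file_contents, runner="npx jest"):
    """Turn one-test-per-line selection output into a runner invocation.

    An empty selection falls back to the full suite rather than running nothing."""
    tests = [line.strip() for line in selection_file_contents.splitlines() if line.strip()]
    if not tests:
        return runner  # no selection available: run everything
    return runner + " " + " ".join(tests)
```

Falling back to the full suite on an empty or missing selection file keeps a broken selection step from silently passing a pull request with zero tests run.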
What About Flaky Tests?
AI can also detect patterns like:
- Test fails intermittently
- Test fails unrelated to code changes
- Test failures correlate with resource constraints
Instead of randomly retrying, AI can:
- Flag tests as flaky
- Adjust selection confidence
- Recommend investigation
This reduces noise and improves signal quality.
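A first-pass flakiness signal needs no model at all: a test that both passes and fails on the same commit is flaky by definition. A sketch (the tuple shape of the results is an assumption):

```python
from collections import defaultdict

def flaky_tests(results):
    """results: iterable of (commit, test, passed) tuples.

    Flags any test that has both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for commit, test, passed in results:
        outcomes[(commit, test)].add(passed)
    return {test for (commit, test), seen in outcomes.items() if seen == {True, False}}
```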
Risks of AI-Based Test Selection
AI optimization introduces new risks:
- False negatives: skipping a test that should have run
- Cold start problem: insufficient historical data
- Trust issues: developers unsure what was skipped
Mitigations include:
- Confidence thresholds
- Periodic full-suite runs
- Transparent reporting of selected tests
- Audit logs
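The confidence-threshold mitigation reduces to one comparison: below a cutoff, fall back to the full suite and report why. A sketch (the threshold value is an assumption to tune per team):

```python
def select_with_fallback(predicted_tests, confidence, all_tests, threshold=0.9):
    """Return (tests_to_run, reason).

    Low-confidence predictions trigger a full run, and the reason string
    gives developers a transparent audit trail of what was selected and why."""
    if confidence < threshold:
        return set(all_tests), f"confidence {confidence:.2f} below {threshold}: full suite"
    return set(predicted_tests), f"selected {len(predicted_tests)} tests at confidence {confidence:.2f}"
```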
Optimization should increase speed without reducing trust.
When AI Test Selection Makes Sense
AI-based test selection is most valuable when:
- Test suites exceed 10β15 minutes
- Repositories are large or monorepos
- Teams want faster pull request feedback
- Infrastructure costs are high
For small projects, deterministic rules may be enough.
Summary
AI can help optimize which tests to run by learning from historical CI data and predicting which tests are most relevant to a given code change.
Done correctly, this reduces CI runtime, lowers infrastructure costs, and improves developer feedback speed, without sacrificing confidence.
AI should act as an optimization layer, not a replacement for validation.
Smoke tests, security checks, and full-suite runs must remain part of your CI/CD strategy.
Frequently Asked Questions
What is AI test selection?
AI test selection uses machine learning to determine which tests are relevant to a specific code change.
Is AI test selection safe?
Yes, when combined with guardrails like smoke tests and periodic full test suite runs.
Can AI detect flaky tests?
Yes. By analyzing failure patterns over time, AI can identify tests that fail intermittently.
Does AI replace full test suites?
No. Full test suites should still run on main branches or scheduled intervals.
What data is required?
Historical test results, code change data, and execution patterns from your CI system.