What it is
Pending failure depth lets a pull request that fails testing remain in the queue while a configurable number of pull requests behind it finish testing. By default, a PR that fails testing is evicted from the queue immediately. With a nonzero pending failure depth, eviction is deferred until that many PRs behind the failed one have completed testing, so their runs can finish instead of being cut short.
Why use it
- Prevent queue stalls - When a PR fails, the queue doesn’t grind to a halt. Other PRs continue testing, assuming the failure was isolated. Keeps merge velocity high even during issues.
- Faster failure recovery - If PR #3 fails but PR #4 fixes the issue, both can be processed quickly because they tested in parallel. Without pending failure depth, you’d wait for #3 to fail, then wait for #4 to test sequentially.
- Optimize for your team size - Small teams benefit from lower values (fewer wasted tests), large teams benefit from higher values (maintain throughput despite occasional failures).
- Balance risk vs. throughput - Tune the setting to match your team’s tolerance for wasted CI resources vs. need for high queue velocity.
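The eviction behavior described above can be sketched in Python. This is an illustrative model only, not the actual queue implementation; the function name and parameters are assumptions for the sketch.

```python
# Illustrative model of pending failure depth (hypothetical, not the real queue).
# A failed PR stays in the queue until `depth` PRs behind it finish testing;
# only then is eviction assessed.

def process_failure(queue, failed_pr, depth, finished_behind):
    """Return True if failed_pr should be evicted now.

    queue: ordered list of PR ids, front of queue first
    failed_pr: the PR that failed testing
    depth: configured pending failure depth
    finished_behind: number of PRs behind failed_pr that completed testing
    """
    if depth == 0:
        return True  # default behavior: evict immediately on failure
    behind = len(queue) - queue.index(failed_pr) - 1
    # Wait for up to `depth` PRs behind it (or all of them, if fewer exist).
    return finished_behind >= min(depth, behind)

queue = [101, 102, 103, 104, 105, 106]
# Depth 2: PR 101 fails; only 1 PR behind it has finished, so it stays queued.
print(process_failure(queue, 101, depth=2, finished_behind=1))  # False
# Once a second PR behind it finishes, eviction is assessed.
print(process_failure(queue, 101, depth=2, finished_behind=2))  # True
```

With `depth=0` the function returns `True` unconditionally, matching the default evict-on-failure behavior.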
How to enable
Pending failure depth defaults to zero (disabled). Enable it only after you're confident in your basic queue setup.
Configuration options
Just getting started with tuning Pending Failure Depth? Try a value of 2, and work from there with your team to find the right balance.
- If your queue frequently stalls when PRs fail → Increase value
- If you see lots of wasted test runs (many PRs test then all fail) → Decrease value
- If your CI infrastructure is constrained → Use lower value (3-4)
- If you have abundant CI capacity → Use higher value (7-10)
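The tuning rules above can be expressed as a small decision helper. This is purely illustrative; the function name, parameters, and thresholds are assumptions, not part of the product.

```python
def suggest_depth(current_depth, stall_rate, retest_rate, ci_constrained):
    """Suggest a pending failure depth adjustment from observed queue metrics.

    stall_rate: fraction of failures that halted the queue
    retest_rate: fraction of test runs that were retests after cascading failures
    ci_constrained: whether CI capacity is limited
    Thresholds here are illustrative, not official guidance.
    """
    if retest_rate > 0.15:            # lots of wasted runs -> decrease value
        return max(0, current_depth - 1)
    if stall_rate > 0.5:              # queue frequently stalls -> increase value
        return current_depth + 1
    if ci_constrained:
        return min(current_depth, 4)  # keep within the lower band (3-4)
    return current_depth

# Excessive retests at depth 5 -> nudge the value down.
print(suggest_depth(5, stall_rate=0.1, retest_rate=0.2, ci_constrained=False))  # 4
```

The point of the sketch is the shape of the loop: observe stall and retest rates, then make small adjustments in the indicated direction.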
Verify it’s working
When a PR fails, watch for:
- ✅ Multiple other PRs continue testing (up to your configured depth)
- ✅ Queue doesn’t stop entirely
- ✅ Failed PR is removed, but others keep going
Tradeoffs and considerations
What you gain
- Queue never fully stops - Failures don’t block all subsequent PRs
- Faster recovery - Independent PRs can merge while others fail
- Tunable throughput - Adjust for your team’s needs
- Better CI utilization - Tests keep running instead of stopping
What you give up or risk
- Wasted CI resources - PRs may test against a state that includes failing PRs, then need to retest
- Cascading failures - If one PR breaks something, multiple subsequent PRs might fail before the issue is caught
- Complexity - More PRs testing simultaneously = harder to understand queue state
When to decrease pending failure depth
Lower the value (3-4) if:
- Your tests are flaky (>5% flake rate) - Flaky tests cause false failures, leading to wasted retests
- CI resources are expensive/limited - Lower parallelism reduces waste
- PRs frequently conflict - Related changes often fail together, so testing them in parallel wastes resources
- You’re seeing excessive retests - Many PRs testing, failing, retesting pattern
When to increase pending failure depth
Raise the value (7-10) if:
- Your queue stalls frequently when PRs fail - Low depth is blocking throughput
- PRs are mostly independent - Failures are isolated, not cascading
- You have abundant CI capacity - Waste isn’t a concern
- Large team, high PR volume - Need parallelism to maintain velocity
Understanding the cost
Example cost calculation: Pending failure depth = 5, and PR #101 fails testing.
- PRs #102, #103, #104, #105, #106 all test against a state including #101
- All 5 fail because #101 broke something
- All 5 retest after #101 is removed
- Wasted: 5 test runs
- Without pending failure depth, PRs would test sequentially (much slower)
- In most cases, failures ARE independent, so PRs merge successfully
- Occasional waste is preferable to frequent queue stalls
Common misconceptions
- Misconception: “Higher pending failure depth always means faster queue”
- Reality: Too high = wasted CI resources and cascading failures. Too low = queue stalls. The sweet spot depends on your team size and test stability.
- Misconception: “Pending failure depth should be set to 1 to avoid waste”
- Reality: A value of 1 means the queue waits for only a single PR behind the failure, which largely defeats the purpose of predictive testing. Start at a small value (such as 2) and adjust.
- Misconception: “This setting isn’t important”
- Reality: Poorly tuned pending failure depth can either waste significant CI resources or cause frequent queue stalls. It’s worth monitoring and adjusting.
Next Steps
Initial setup:
- Start with a small value (2)
- Monitor queue behavior
- Check metrics for wasted test runs
- Adjust based on observations
- Queue stalls frequently? → Increase depth
- Excessive retests (>15%)? → Decrease depth
- Make small adjustments and observe impact
- Metrics and monitoring - Track retest rate and queue throughput
- Watch for patterns: Do failures cascade? Are they independent?
- Adjust pending failure depth based on data
- Anti-flake protection - Reduce false failures first
- Batching - Understand how pending failure depth affects batch splitting
- Predictive testing - Read the full explanation of how these work together
- Too many wasted tests → Lower pending failure depth
- Queue stops on every failure → Increase pending failure depth
- Unclear which value to use → Start at 2, monitor for a week