Feature flag platforms are excellent at the first half of the flag lifecycle. Creation, targeting, rollout, experimentation -- LaunchDarkly, Split.io, Unleash, and others handle these phases with genuine sophistication. SDKs evaluate flags in milliseconds, targeting rules segment users with precision, and experimentation engines measure statistical significance automatically.
But every one of these platforms shares a blind spot: cleaning up flag code from your codebase once a flag's job is done. The if/else branches, the dead code paths, the test cases for permanently unreachable logic -- all of it stays in your codebase long after the management platform shows a clean dashboard. This is not a bug. It is a structural incentive problem.
TL;DR: No feature flag platform automatically removes stale flag code from your codebase. LaunchDarkly tracks code references but cannot generate cleanup PRs. Split.io tracks flag usage but has no code-level detection. Unleash marks lifecycle states but relies on manual removal. Custom implementations have zero automated tooling. The cleanup gap exists because flag management (a runtime concern) and flag removal (a code transformation concern) are fundamentally different problems requiring different tools.
Why don't feature flag platforms clean up their own flags?
Two forces explain why every flag platform stops at the same point.
Misaligned incentives. Platforms bill by seats, monthly active users, or monthly tracked keys. More flags in production means more SDK evaluations, more targeting rules to manage, and more reasons to stay on the dashboard. A platform that aggressively removed flags would be reducing its own usage metrics. This is not cynical -- it is simply how SaaS incentive structures work.
Different technical domains. Flag management is a runtime evaluation problem -- evaluate targeting rules, consult user segments, return the correct value in single-digit milliseconds. This requires distributed systems expertise and highly optimized SDKs.
Flag cleanup is a code transformation problem -- parse syntax trees across multiple languages, identify which branch of a conditional to keep, eliminate dead code paths, remove unused imports, and update test files. This requires compiler-level tooling and multi-language AST parsing.
Asking a flag management platform to also do automated code refactoring is like asking your GPS to also fix the potholes it routes you around. Both involve roads, but the skills and tools required are entirely different.
What flag debt does LaunchDarkly miss?
LaunchDarkly has invested more in code-awareness than any other commercial platform. Its Code References feature (`ld-find-code-refs`) scans repositories, identifies where flag keys appear in source code, and surfaces those references in the dashboard. It can also mark flags as stale and archive them. This combination gives teams real visibility into flag state at both the platform and code level.
But here is where LaunchDarkly's capabilities end:
- It cannot generate cleanup PRs. Code References tells you that `checkout_service.go:47` contains a reference to `new-checkout-flow`. It does not generate the pull request that removes lines 45-55, eliminates the `processLegacyCheckout` function, and deletes the test cases for the disabled path.
- It cannot remove `if/else` branches. Knowing where a flag is referenced is different from knowing how to safely remove the conditional logic surrounding it. The evaluation call is one line. The dead code it controls might span hundreds of lines across multiple functions.
- It cannot delete associated test code. Stale flags typically have test cases covering both branches. Code References does not identify or remove test code for the dead path.
- It cannot coordinate cross-repo removal. For organizations where a single flag spans a backend service, a frontend application, and a mobile client, Code References shows per-repository references but does not orchestrate cross-repo cleanup.
- It cannot handle cross-language removal complexity. A flag evaluated in Go, TypeScript, and Python requires three different AST parsers to safely remove. Code References identifies references in all three but has no transformation capability in any of them.
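To make the gap concrete, here is a minimal sketch in Python of why removal is harder than locating a reference. The `is_enabled` helper, the flag key, and the function names are hypothetical stand-ins, not LaunchDarkly's API: the evaluation is one line, but the dead code it controls is a whole function plus its tests.

```python
# Hypothetical example: the flag evaluation is one line, but the code it
# controls spans a helper function (and, in practice, its test cases).

def is_enabled(flag_key: str) -> bool:
    """Stand-in for an SDK evaluation call."""
    return True  # imagine "new-checkout-flow" is now permanently on

def process_legacy_checkout(cart: list[float]) -> float:
    # Dead once the flag is permanently on -- but a reference scanner
    # only points at the is_enabled() line below, not this function.
    return sum(cart) * 1.05

def checkout(cart: list[float]) -> float:
    if is_enabled("new-checkout-flow"):
        return sum(cart)                      # branch to keep
    return process_legacy_checkout(cart)      # dead path

# What a cleanup PR must actually produce: the conditional collapsed to
# the live branch, with process_legacy_checkout and its tests deleted.
def checkout_cleaned(cart: list[float]) -> float:
    return sum(cart)

print(checkout([10.0, 5.0]) == checkout_cleaned([10.0, 5.0]))  # True
```

The two versions behave identically for every user, which is exactly what makes the dead path safe to delete and exactly what a location-only scanner cannot prove.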
The honest assessment: LaunchDarkly has the best code-level visibility in the market. But visibility and cleanup are different problems, and LaunchDarkly handles the first but not the second.
What flag debt does Split.io miss?
Split.io approaches the flag lifecycle from a data-driven perspective. The platform tracks flag evaluations in real time, identifies flags that have stopped receiving evaluations, and knows when experiments have concluded. This evaluation-level telemetry is valuable for understanding flag usage patterns.
But Split has no equivalent to Code References -- no code-level awareness at all:
- No code-level detection. Split cannot tell you which files in your codebase reference a specific flag. It knows the flag is being evaluated but not where in the source code that evaluation happens.
- Cannot quantify code impact. Without code scanning, Split cannot tell you that removing `experiment-pricing-v2` would eliminate 340 lines of dead code across 8 files.
- Cannot identify code-only references. Flag references in configuration files, test fixtures, or documentation exist outside the evaluation path. Split's telemetry cannot see them.
- No view of flag-to-code relationships. A flag might control a branch calling three functions, two used nowhere else. Split cannot identify that removing the flag would let you delete those orphaned functions too.
Split knows what is happening at runtime but is blind to what exists in the codebase. For cleanup, the code-level view is the one that matters.
What flag debt does Unleash miss?
Unleash provides the most explicit lifecycle model of the three platforms. Its markers -- created, pre-live, live, completed, archived -- map directly to flag lifecycle stages. The platform marks flags as "potentially stale" when they exceed their expected lifetime. This is well-designed and conceptually sound.
But like Split, Unleash has no code-level detection:
- Knows management state but not code state. A flag can be "archived" in Unleash while still referenced in 14 files across 3 services. The dashboard shows a completed lifecycle. The codebase tells a different story.
- No repository scanning. Unleash cannot tell you where a flag is used, how many lines of code are associated with it, or the blast radius of removing it.
- Relies on manual removal. When Unleash marks a flag as stale, the implicit next step is manual developer action. In practice, the cleanup ticket gets deprioritized behind feature work, context is lost as time passes, and the stale flag persists.
- Lifecycle markers create false completion. When a flag moves to "archived," the team feels the job is done. But the code-level lifecycle -- the part that creates technical debt -- remains unaddressed.
Unleash has the right conceptual framework. The gap is in execution: the lifecycle ends at "archived in the dashboard" rather than "removed from the codebase."
What flag debt do custom implementations miss?
Teams managing flags through environment variables, configuration files, database tables, or custom systems have the weakest tooling story:
- No centralized dashboard. Flag awareness is distributed across config files, database entries, and tribal knowledge.
- No lifecycle tracking. No system records when a flag was introduced, who created it, or when it was last changed.
- No staleness detection. No automated way to distinguish flags actively gating behavior from those permanently on or off.
- No code references or cleanup automation. Removing a flag means manually searching the codebase, understanding the conditional logic, and making changes by hand.
This is where we see the highest flag debt accumulation. Teams with custom implementations often do not know how many flags they have, let alone which ones are stale. Discovery happens only when someone touches code and asks "is this flag still used?" -- by which point the flag may have been stale for years.
How do you close the flag cleanup gap regardless of platform?
The solution is to complement your management platform with dedicated lifecycle tooling. Think of it as a two-layer stack:
Layer 1: Flag management (your existing platform). Handles creation, targeting, rollout, experimentation, and operational controls. LaunchDarkly, Split, Unleash, or a custom implementation -- whatever you use today.
Layer 2: Flag cleanup (dedicated lifecycle tooling). Handles staleness detection, code-level tracking, and automated removal. Works alongside your management platform, not instead of it.
A cleanup tool needs to do several things well:
Monitor repositories via AST parsing. Regex is not sufficient -- it finds string matches but cannot distinguish between a flag evaluation in an if/else branch, a flag key in a comment, and a flag key in a test fixture. AST parsing understands code structure, which is essential for safe removal.
Track the full lifecycle across pull requests. The tool should know when a flag first appears in a PR, how it evolves, and when it stops changing. This provides the timeline that neither management platforms nor point-in-time code scans can offer.
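A lifecycle tracker boils down to a small per-flag record built by watching pull requests. This is a minimal sketch under assumed inputs -- the field names, the 90-day threshold, and the example dates are all illustrative, not any platform's schema.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class FlagTimeline:
    """Hypothetical per-flag record assembled from pull request history."""
    key: str
    first_seen: date                # PR that introduced the flag
    last_changed: date              # most recent PR touching its code
    references: dict[str, int] = field(default_factory=dict)  # file -> count

    def is_stale(self, today: date, max_age_days: int = 90) -> bool:
        # A flag whose code has not changed in months but is still
        # referenced is a cleanup candidate -- a signal neither a
        # dashboard nor a one-off code scan can produce on its own.
        quiet_for = today - self.last_changed
        return bool(self.references) and quiet_for > timedelta(days=max_age_days)

flag = FlagTimeline(
    key="new-checkout-flow",
    first_seen=date(2024, 1, 10),
    last_changed=date(2024, 2, 1),
    references={"checkout_service.go": 2, "checkout.test.ts": 3},
)
print(flag.is_stale(today=date(2024, 9, 1)))  # True: untouched for ~7 months
```

The timeline is what distinguishes "recently shipped and still settling" from "forgotten since February" -- two states a point-in-time scan cannot tell apart.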
Generate safe cleanup PRs. The tool must determine which branch to keep, remove the dead branch, eliminate orphaned functions and imports, and update test files -- producing a clean, reviewable pull request.
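The branch-collapsing step can be sketched with Python's built-in `ast` module. The `is_enabled` helper and flag key are hypothetical stand-ins, and this covers only the core transformation -- a production tool would also prune orphaned helpers, unused imports, and test files across languages.

```python
import ast

class FlagRemover(ast.NodeTransformer):
    """Minimal sketch: collapse `if is_enabled("<key>"):` to the live branch."""

    def __init__(self, key: str, resolved: bool):
        self.key = key            # flag being removed
        self.resolved = resolved  # the flag's now-permanent value

    def _is_flag_check(self, test: ast.expr) -> bool:
        return (
            isinstance(test, ast.Call)
            and isinstance(test.func, ast.Name)
            and test.func.id == "is_enabled"
            and len(test.args) == 1
            and isinstance(test.args[0], ast.Constant)
            and test.args[0].value == self.key
        )

    def visit_If(self, node: ast.If):
        self.generic_visit(node)
        if self._is_flag_check(node.test):
            # Keep the live branch's statements; the dead branch disappears.
            kept = node.body if self.resolved else node.orelse
            return kept or ast.Pass()
        return node

source = '''
def checkout(cart):
    if is_enabled("new-checkout-flow"):
        total = new_total(cart)
    else:
        total = legacy_total(cart)
    return total
'''
tree = FlagRemover("new-checkout-flow", resolved=True).visit(ast.parse(source))
cleaned = ast.unparse(ast.fix_missing_locations(tree))
print(cleaned)  # the conditional and the legacy branch are gone
```

The output is the function with the conditional gone and only the live assignment kept -- the shape of the diff a reviewable cleanup PR should contain.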
Work across multiple languages. A flag evaluated in Go on the backend, TypeScript on the frontend, and Python in a data pipeline needs AST parsing support for all three.
FlagShark is purpose-built for this second layer. It uses tree-sitter for multi-language AST parsing, tracks flag lifecycles from the moment they appear in pull requests, and generates cleanup PRs when flags become stale. It works alongside any provider -- the management platform handles creation through rollout, and FlagShark handles staleness through removal.
For trade-offs between management providers, see our platform comparison. For how FlagShark compares to native provider capabilities, see our provider comparison and LaunchDarkly comparison.
The point is not that your management platform is failing -- it is doing what it was built to do. Cleanup is a separate problem requiring separate tooling, and the teams that accumulate the least flag debt invest in both layers from the start.
Key Takeaways
- No feature flag platform removes stale flag code from your codebase. The dead `if/else` branches, unreachable code paths, and obsolete test cases remain after a flag is archived.
- LaunchDarkly has the best code-level visibility via Code References, but visibility is not cleanup -- it cannot generate the pull request that removes a stale flag.
- Split.io has strong evaluation telemetry but no code-level awareness. It knows which flags are evaluated but cannot tell you which files reference them or how much dead code they create.
- Unleash has the most explicit lifecycle model but the lifecycle ends at the dashboard, not the codebase. Manual removal is where the process breaks down.
- Custom implementations accumulate the highest flag debt because they lack visibility, lifecycle tracking, and cleanup automation entirely.
- The solution is a two-layer stack: your management platform for creation through rollout, and a dedicated cleanup tool like FlagShark for staleness through removal.
People Also Ask
Does LaunchDarkly remove stale flag code?
No. LaunchDarkly's Code References feature identifies where flags appear in your codebase by scanning repositories for flag key strings, giving you code-level visibility in the dashboard. However, Code References cannot generate removal pull requests, eliminate dead if/else branches, delete unreachable code paths, or clean up associated test files. LaunchDarkly can archive a flag in the dashboard, but the corresponding code remains in your repositories until a developer manually removes it or a dedicated cleanup tool generates the removal PR.
How do you find stale feature flags in your codebase?
Finding stale flags requires two data sources: your flag management platform (which knows each flag's state) and your codebase (which contains the actual references). The most reliable approach is AST-level parsing across all languages your codebase uses, cross-referenced with flag provider state. AST parsing understands code structure rather than just matching strings, so it distinguishes between flag evaluations in production code, flag keys in comments, and flag keys in test fixtures. Regex-based search produces false positives and misses references that do not exactly match the flag key string.
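The cross-reference step itself is simple once both inputs exist. This sketch assumes you have already exported flag state from your provider's API and built `code_refs` from an AST scan; the flag keys, states, and file names are all hypothetical.

```python
# Two assumed inputs: provider_state from your flag platform's API,
# and code_refs from an AST scan of your repositories.
provider_state = {   # flag key -> rollout state (hypothetical export)
    "new-checkout-flow": "100% on",
    "dark-mode": "active experiment",
    "old-pricing": "archived",
}
code_refs = {        # flag key -> files still referencing it
    "new-checkout-flow": ["checkout.py", "checkout_test.py"],
    "old-pricing": ["pricing.py"],
    "beta-search": ["search.py"],  # in code, unknown to the provider
}

# Stale: fully resolved or archived on the platform, yet still in code.
stale = sorted(
    key for key, state in provider_state.items()
    if state in ("100% on", "archived") and code_refs.get(key)
)
# Orphaned: referenced in code but absent from the provider entirely.
orphaned = sorted(set(code_refs) - set(provider_state))

print(stale)     # ['new-checkout-flow', 'old-pricing']
print(orphaned)  # ['beta-search']
```

Neither data source alone finds either category: the provider cannot see `beta-search`, and the code scan alone cannot know that `old-pricing` is archived.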
What is the difference between flag management and flag cleanup?
Flag management is a runtime concern -- it answers "what value should this flag return for this user right now?" by evaluating targeting rules, consulting segments, and returning values in milliseconds. Flag cleanup is a code transformation concern -- it answers "how do I safely remove this flag's conditional logic from the codebase?" by parsing syntax trees, determining which branch to keep, eliminating dead code, and updating tests. The two require different capabilities: distributed systems for management, compiler-level tooling for cleanup.
Can you automate feature flag cleanup?
Yes. Automated cleanup requires tree-sitter or similar AST parsing that understands syntactic structure across multiple languages. The automation identifies flag evaluation calls in the AST, determines the resolved value (which branch to keep), removes the conditional logic and dead branch, and generates a pull request. Tools like FlagShark use tree-sitter to support 11 programming languages and generate cleanup PRs automatically when flags become stale. The generated PRs are reviewed and merged by developers -- the automation handles the tedious transformation while humans retain the final review step.