Skip to content

Conversation

@TimBeyer
Copy link

⚠️ Review Notice

This fix was developed using Codex and Claude. Please review with extra care.

Problem

There is a race condition in segment PATCH updates that can cause incorrect flag evaluations. When a segment is updated via PATCH, the drf_writable_nested library processes nested rules and conditions in multiple steps within the serializer. Meanwhile, the segment read path (get_segments_from_cache) uses Django's prefetch_related which executes multiple sequential queries - one for segments, one for rules, one for conditions, etc.

The Race Condition

If a PATCH request commits between the read's prefetch queries, the reader can observe an inconsistent state:

  1. Reader queries rules → sees old rule IDs (before PATCH commits)
  2. PATCH commits → old rules and conditions are cascade-deleted, new rules created
  3. Reader queries conditions for the old rule IDs → returns empty (conditions were deleted)

The flag engine evaluates rules with empty conditions as True:

# flag_engine/segments/evaluator.py
matches_conditions = (
    rule.matching_function([...])
    if (conditions := rule.conditions)
    else True  # <-- Empty conditions = matches ALL identities!
)

This causes any identity to incorrectly match the segment during the race window, potentially enabling features for users who should not have them.

Reproduction

An integration test was added that reproduces this bug reliably:

  • api/tests/integration/environments/identities/test_integration_segment_patch_atomic.py

The test runs two concurrent loops for ~10 seconds:

  • PATCH loop: Continuously PATCHes a segment with the same ruleset (no rule IDs, forcing create+delete cycles)
  • Poll loop: Fetches identity feature states and records any unexpected enabled: true responses

Before the fix, the test fails within seconds as the poll loop catches the race condition.

Solution

Use REPEATABLE READ transaction isolation level for PostgreSQL when fetching segments with prefetch_related. This ensures all prefetch queries see the same database snapshot, preventing the race condition.

Changes

  1. environments/models.py - Environment.get_segments_from_cache():

    • Wrap queries in transaction.atomic()
    • Set REPEATABLE READ isolation level for PostgreSQL
    • Only attempt to set isolation level if not already in an atomic block (to avoid errors in nested transactions)
  2. projects/services.py - get_project_segments_from_cache():

    • Same fix applied
    • Return type changed from QuerySet[Segment] to list[Segment] to ensure queries execute within the transaction
  3. tests/unit/projects/test_unit_projects_models.py:

    • Updated test to work with list return type (len() instead of .count())

Why This Works

With REPEATABLE READ isolation, all queries within the transaction see data as it existed at the start of the transaction. Even if a PATCH commits during the read:

  • The reader sees either the complete old state OR the complete new state
  • Never an inconsistent mix of old rule IDs with deleted conditions

PostgreSQL-Specific

The fix only applies REPEATABLE READ for PostgreSQL (connection.vendor == "postgresql"). Other databases continue using their default isolation level. This is acceptable because:

  • Flagsmith primarily targets PostgreSQL in production
  • The race condition is less likely with SQLite's single-writer model

Testing

# Run the integration test (requires PostgreSQL)
DATABASE_URL=postgres://postgres:password@localhost:5432/flagsmith \
  pytest tests/integration/environments/identities/test_integration_segment_patch_atomic.py -v

# Run related unit tests
DATABASE_URL=postgres://postgres:password@localhost:5432/flagsmith \
  pytest tests/unit/environments/test_unit_environments_models.py -k "get_segments" \
         tests/unit/projects/test_unit_projects_models.py -k "get_segments_from_cache" -v

Test Plan

  • Integration test passes (previously failing)
  • Unit tests for get_segments_from_cache pass
  • Linting passes
  • Manual testing in staging environment recommended

🤖 Generated with Claude Code

TimBeyer and others added 4 commits January 13, 2026 18:03
Use REPEATABLE READ transaction isolation level for PostgreSQL when
fetching segments with prefetch_related. This prevents a race condition
where concurrent PATCH requests could cause readers to see old rule IDs
but find their conditions deleted, causing rules to evaluate with empty
conditions and incorrectly match all identities.

The fix wraps segment cache queries in an atomic block and sets
REPEATABLE READ isolation (PostgreSQL only) to ensure all prefetch
queries see the same database snapshot.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@TimBeyer TimBeyer requested a review from a team as a code owner January 13, 2026 18:23
@TimBeyer TimBeyer requested review from khvn26 and removed request for a team January 13, 2026 18:23
@vercel
Copy link

vercel bot commented Jan 13, 2026

@TimBeyer is attempting to deploy a commit to the Flagsmith Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions bot added the api Issue related to the REST API label Jan 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Issue related to the REST API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant