feat: add rowset planning and pruning #574

Merged
ethe merged 6 commits into main from issue-551-two-phase-read-step1 on Feb 6, 2026

@belveryin
Collaborator

Why

  • RFC 0010 two-phase read needs planner-side RowSet handling and pruning to avoid unnecessary IO.
  • Scan planning needs early key-bounds and commit_ts pruning for memtables and SSTs.
  • Scan schema extension and missing-column errors must be preserved while the RowSet flow is introduced.

What

  • Added RowSet abstraction (total_rows + RowSelection) and wired it through ScanPlan for mutable, immutable, and SST sources.
  • Implemented key-bounds and commit_ts pruning for mutable and immutable memtables and for SSTs (SSTs are skipped based on stats, before metadata is read).
  • Preserved scan schema extension and missing-column error behavior.
  • Added tests for RowSet behavior, memtable pruning, SST pre-metadata pruning, and missing-column errors.
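
For readers skimming the description, the key-bounds/commit_ts pruning idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `SourceStats`, `ScanBounds`, `can_skip`, and `prune` are hypothetical names, keys are simplified to `u64`, and the visibility rule (a row is visible only if its commit_ts is at or below the read timestamp) is an assumption.

```rust
/// Hypothetical per-source statistics (memtable or SST). Field names are
/// assumptions made for this sketch, not the PR's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SourceStats {
    min_key: u64,
    max_key: u64,
    min_commit_ts: u64,
}

/// Hypothetical scan request: an inclusive key range plus a read timestamp.
#[derive(Debug, Clone, Copy)]
struct ScanBounds {
    lo: u64,
    hi: u64,
    read_ts: u64,
}

/// Returns true when the source cannot contain a visible row for the scan:
/// either its key range does not overlap the scan range, or every row in it
/// was committed after the read timestamp (assumed visibility rule).
fn can_skip(stats: &SourceStats, scan: &ScanBounds) -> bool {
    let disjoint_keys = stats.max_key < scan.lo || stats.min_key > scan.hi;
    let too_new = stats.min_commit_ts > scan.read_ts;
    disjoint_keys || too_new
}

/// Keeps only the sources that may still contribute rows to the scan.
fn prune<'a>(sources: &'a [SourceStats], scan: &ScanBounds) -> Vec<&'a SourceStats> {
    sources.iter().filter(|s| !can_skip(s, scan)).collect()
}
```

Because the check touches only summary statistics, it can run before any source data or file metadata is read, which is where the IO saving would come from.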

Tests

  • cargo test
  • cargo clippy -- -D warnings

@belveryin belveryin marked this pull request as ready for review February 3, 2026 09:18
src/db/scan.rs Outdated
    };
    let row_set = match prune_result.row_selection().cloned() {
        Some(selection) => {
            let total_rows = if row_groups.is_empty() {
Member

This uses file_total_rows when row_groups.is_empty(), but an empty list means "all pruned", not "all kept". As a result, it creates RowSet::all(file_total_rows) for fully pruned SSTs.

Collaborator Author

Yes, you’re right. row_groups.is_empty() means fully pruned, so using file_total_rows was wrong and created RowSet::all(file_total_rows) for skipped SSTs. I’ve fixed it to size the row set using only kept row-group rows, and I now skip SSTs with empty row sets at plan time. Added a regression test plan_scan_skips_fully_pruned_sst.
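
A rough sketch of the behaviour described in this reply, under stated assumptions: `Run`, the simplified `RowSet`, and `plan_sst` are hypothetical stand-ins (the PR's RowSet wraps a RowSelection, which is not modelled here), and the input is assumed to be the row counts of the row groups that survived pruning.

```rust
/// One run of a selection: keep or skip `n` consecutive rows.
/// Simplified stand-in for a row selection; not the PR's type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Run {
    Select(usize),
    Skip(usize),
}

/// Simplified stand-in for RowSet (total_rows plus a selection).
#[derive(Debug, Clone, PartialEq, Eq)]
struct RowSet {
    total_rows: usize,
    runs: Vec<Run>,
}

impl RowSet {
    /// Selects every one of `total_rows` rows.
    fn all(total_rows: usize) -> Self {
        RowSet { total_rows, runs: vec![Run::Select(total_rows)] }
    }

    /// Number of rows the selection actually keeps.
    fn selected_rows(&self) -> usize {
        self.runs
            .iter()
            .map(|r| match r {
                Run::Select(n) => *n,
                Run::Skip(_) => 0,
            })
            .sum()
    }

    fn is_empty(&self) -> bool {
        self.selected_rows() == 0
    }
}

/// Sizes the row set from kept row groups only and returns None when nothing
/// is left, so the caller can drop the SST from the plan. An empty slice
/// means "all pruned", never "all kept".
fn plan_sst(kept_row_group_rows: &[usize], selection: Option<Vec<Run>>) -> Option<RowSet> {
    let total_rows: usize = kept_row_group_rows.iter().sum();
    let row_set = match selection {
        Some(runs) => RowSet { total_rows, runs },
        None => RowSet::all(total_rows),
    };
    if row_set.is_empty() { None } else { Some(row_set) }
}
```

With this shape, a fully pruned SST yields `None` instead of a row set covering the whole file, which is the case the regression test is meant to guard.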

@belveryin belveryin requested a review from ethe February 4, 2026 09:42
* feat: wire parquet pushdown execution

* fix: keep residual predicate for schema-mismatched ssts

* refactor: avoid predicate columns in sst-only scans
@ethe ethe merged commit 029593c into main Feb 6, 2026
6 checks passed
@ethe ethe deleted the issue-551-two-phase-read-step1 branch February 6, 2026 09:06