feat(vortex-array): add Interleave array encoding #8277
Draft
joseph-isaacs wants to merge 3 commits into
Adds an `Interleave` array: a lazy, random-access gather of `N` value arrays into one array, taking output row `i` from `values[array_indices[i]][row_indices[i]]`. It is the random-access analog of `Merge`: instead of consuming each branch under a cursor, `row_indices` names an explicit position, so rows may be reordered, skipped, or repeated. The layout mirrors `Merge`: an array encoding with `N` value children plus two non-nullable selector children (`array_indices`, `row_indices`), a single `check` source of truth for invariants, value-type-dispatched execution with an optimized boolean kernel, oracle-backed tests, and an `ArrayBuiltins::interleave` constructor. As with the merge skeleton, only the boolean (two-value) selector form is wired into execution; integer selectors construct but panic on execute.

Signed-off-by: Claude <noreply@anthropic.com>
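The row mapping in this commit message can be pinned down with a small reference model. The sketch below works on plain `Vec`s rather than vortex arrays: `interleave_ref` and `merge_row_indices` are hypothetical helpers, not part of the `vortex-array` API, and `u32` is an assumed stand-in for whatever unsigned selector width the encoding accepts. The second helper illustrates the `Merge` relationship: if each branch keeps its own front-to-back cursor, the row indices that `Merge` uses implicitly can be derived from the branch selector alone.

```rust
/// Reference semantics (sketch): output row `i` is `values[array_indices[i]][row_indices[i]]`.
/// Operates on plain `Vec`s for illustration only; panics on out-of-bounds selectors.
fn interleave_ref<T: Clone>(values: &[Vec<T>], array_indices: &[u32], row_indices: &[u32]) -> Vec<T> {
    assert_eq!(array_indices.len(), row_indices.len(), "selectors must have equal length");
    array_indices
        .iter()
        .zip(row_indices)
        .map(|(&a, &r)| values[a as usize][r as usize].clone())
        .collect()
}

/// Hypothetical helper: derive the row indices a `Merge` would use implicitly,
/// by keeping one front-to-back cursor per value.
fn merge_row_indices(num_values: usize, array_indices: &[u32]) -> Vec<u32> {
    let mut cursors = vec![0u32; num_values];
    array_indices
        .iter()
        .map(|&a| {
            let cursor = &mut cursors[a as usize];
            let row = *cursor;
            *cursor += 1;
            row
        })
        .collect()
}

fn main() {
    let values = vec![vec![10, 11, 12], vec![20, 21]];

    // Random access: rows are reordered, skipped (11 and 21 are never read),
    // and repeated (20 is read twice).
    let out = interleave_ref(&values, &[1, 0, 1, 0], &[0, 2, 0, 0]);
    assert_eq!(out, vec![20, 12, 20, 10]);

    // Merge as a special case: cursor-derived row indices read each value in order.
    let sel = [0, 1, 0, 1, 0];
    let rows = merge_row_indices(values.len(), &sel);
    assert_eq!(rows, vec![0, 0, 1, 1, 2]);
    assert_eq!(interleave_ref(&values, &sel, &rows), vec![10, 20, 11, 21, 12]);
}
```

Any `row_indices` that is not cursor-derived, like the first call in `main`, is expressible with `Interleave` but not with `Merge`.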
…gned

Removes the boolean two-value `array_indices` special case from the `Interleave` encoding. `array_indices` is now always a non-nullable unsigned integer indexing into `values`, unifying selector validation in `check` (which remains the single source of truth used by both `try_new` and `validate`). With the boolean selector gone, the boolean-value execute kernel now implements the (previously panicking) integer-selector path directly: it gathers `N` boolean values routed by unsigned `array_indices` / `row_indices`, so multi-value interleaves execute end to end. Tests are updated to build unsigned selectors and now cover a three-value random-access gather.

Signed-off-by: Claude <noreply@anthropic.com>
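The split this commit describes (structural selector checks at construction, per-row bounds checks in the kernel) can be sketched in plain Rust. `check_selectors` and `gather_bool` are hypothetical names over `Vec<bool>` buffers, not the actual `Interleave::check` or kernel signatures; the real kernel works on vortex arrays and may well use packed bit buffers. Using `u32` here makes "unsigned selector" a type-level fact, whereas the encoding validates the selectors' dtype in `check`.

```rust
/// Hypothetical stand-in for the structural (construction-time) checks:
/// selectors are unsigned by type (`u32`) and must have equal length.
/// Returns the output length.
fn check_selectors(array_indices: &[u32], row_indices: &[u32]) -> Result<usize, String> {
    if array_indices.len() != row_indices.len() {
        return Err(format!(
            "array_indices has length {} but row_indices has length {}",
            array_indices.len(),
            row_indices.len()
        ));
    }
    Ok(array_indices.len())
}

/// Hypothetical boolean gather kernel over `N` values. Per-row bounds depend on
/// selector values, so they are checked here, at execution time.
fn gather_bool(values: &[Vec<bool>], array_indices: &[u32], row_indices: &[u32]) -> Result<Vec<bool>, String> {
    let len = check_selectors(array_indices, row_indices)?;
    let mut out = Vec::with_capacity(len);
    for (i, (&a, &r)) in array_indices.iter().zip(row_indices).enumerate() {
        let value = values
            .get(a as usize)
            .ok_or_else(|| format!("row {i}: array index {a} >= {} values", values.len()))?;
        let bit = value
            .get(r as usize)
            .ok_or_else(|| format!("row {i}: row index {r} >= value length {}", value.len()))?;
        out.push(*bit);
    }
    Ok(out)
}

fn main() {
    // Three boolean values gathered by unsigned selectors.
    let values = vec![vec![true, false], vec![false], vec![true, true, false]];
    let out = gather_bool(&values, &[2, 0, 1, 2], &[1, 0, 0, 2]).unwrap();
    assert_eq!(out, vec![true, true, false, false]);

    // Mismatched selector lengths fail the structural check.
    assert!(check_selectors(&[0, 1], &[0]).is_err());

    // An out-of-range row index passes the structural check but fails at execution.
    assert!(check_selectors(&[1], &[5]).is_ok());
    assert!(gather_bool(&values, &[1], &[5]).is_err());
}
```

Because the PR describes the output validity as a non-nullable interleave of the values' validities, applying the same gather to each value's validity mask, with the same selectors, would produce the output validity in this model.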
The Interleave module docs linked to `Merge`, which does not exist on `develop` (it lives in a separate, not-yet-merged PR). Under `-D rustdoc::broken-intra-doc-links` this failed the docs build. Demote the references to plain code spans so the docs build standalone.

Signed-off-by: Claude <noreply@anthropic.com>
Merging this PR will improve performance by 14.5%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | `encode_varbin[(1000, 2)]` | 164.1 µs | 143.3 µs | +14.5% |
Comparing claude/interleave-method-6BEge (65b6a4c) with develop (e06d80b)
Adds an `Interleave` array: a lazy, random-access gather of `N` value arrays into one array, taking output row `i` from `values[array_indices[i]][row_indices[i]]`.

It is the random-access analog of the `Merge` skeleton: instead of consuming each branch in cursor order under a selector, `row_indices` names an explicit position within the selected value, so rows may be reordered, skipped, or repeated. `Merge` is the special case where each value is consumed front-to-back exactly once.

**Layout (mirrors `Merge`)**

- `vortex-array/src/arrays/interleave/mod.rs`: the encoding (`Interleave` / `InterleaveArray` / `InterleaveData`), `InterleaveArrayExt` accessors (`num_values`, `value`, `array_indices`, `row_indices`), the `Interleave::check` invariant source of truth, `try_new`, the `VTable`, `OperationsVTable::scalar_at` (direct gather), `ValidityVTable::validity` (an inner, non-nullable interleave of the values' validities), and oracle-backed tests.
- `vortex-array/src/arrays/interleave/execute/{mod.rs,bool.rs}`: value-type dispatch plus the boolean gather kernel.
- `vortex-array/src/arrays/mod.rs`: module registration.
- `vortex-array/src/builtins.rs`: `ArrayBuiltins::interleave` constructor (plus an `ExprBuiltins` TODO), mirroring `ArrayBuiltins::merge`.

**Spec / invariants**
- `N` value children plus two selector children, `array_indices` and `row_indices`, both non-nullable unsigned integers of equal length (the output length).
- `array_indices[i] < values.len()` and `row_indices[i] < values[array_indices[i]].len()`. These per-row bounds depend on selector values, so they are a runtime precondition checked in the execution kernel rather than at construction.
- `Interleave::check` is the single source of truth for the construction-time invariants and is used by both `try_new` and the `VTable::validate` hook.

**Selector**
`array_indices` is always a non-nullable unsigned integer indexing into `values` (the boolean two-value special case was intentionally dropped). The boolean-value execute kernel handles the integer-selector path directly, so multi-value interleaves execute end to end. Only non-boolean value types remain unimplemented: they construct but panic on execute, since execution dispatches on value type and only the boolean kernel exists.

**Checks**
Run with `RUSTC_WRAPPER=` (per the sandbox `sccache` note in `AGENTS.md`):

- `cargo build -p vortex-array` ✅
- `cargo test -p vortex-array --lib interleave` → 13 passed ✅
- `cargo +nightly fmt -p vortex-array` (+ `--check`) ✅
- `cargo clippy -p vortex-array --all-targets --all-features` ✅

**Follow-ups**
- `interleave` expression builtin (TODO left in `ExprBuiltins`).

Generated by Claude Code