feat(vortex-array): add Interleave array encoding#8277

Draft
joseph-isaacs wants to merge 3 commits into develop from claude/interleave-method-6BEge


@joseph-isaacs
Contributor

Adds an Interleave array: a lazy, random-access gather of N value arrays into one array, taking output row i from values[array_indices[i]][row_indices[i]].

It is the random-access analog of the Merge skeleton: instead of consuming each branch in cursor order under a selector, row_indices names an explicit position within the selected value, so rows may be reordered, skipped, or repeated. Merge is the special case where each value is consumed front-to-back exactly once.
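The gather rule above can be sketched on plain slices. This is a minimal illustration, not the PR's code: `interleave` and `merge_row_indices` are hypothetical helpers operating on `Vec`s rather than Vortex arrays, and returning `None` on bad input is an assumption made for the sketch.

```rust
/// Gather one output row per selector pair:
/// out[i] = values[array_indices[i]][row_indices[i]].
/// Returns None if the selectors differ in length or any index is out of bounds.
fn interleave<T: Clone>(
    values: &[Vec<T>],
    array_indices: &[usize],
    row_indices: &[usize],
) -> Option<Vec<T>> {
    if array_indices.len() != row_indices.len() {
        return None;
    }
    array_indices
        .iter()
        .zip(row_indices)
        .map(|(&a, &r)| values.get(a)?.get(r).cloned())
        .collect()
}

/// Merge as the special case: derive row_indices by advancing one cursor
/// per value, so each value is consumed front-to-back.
fn merge_row_indices(num_values: usize, array_indices: &[usize]) -> Option<Vec<usize>> {
    let mut cursors = vec![0usize; num_values];
    array_indices
        .iter()
        .map(|&a| {
            let cursor = cursors.get_mut(a)?;
            let row = *cursor;
            *cursor += 1;
            Some(row)
        })
        .collect()
}
```

With `values = [[10, 11, 12], [20, 21]]`, the selectors `array_indices = [1, 0, 0, 1]` and `row_indices = [1, 2, 2, 0]` repeat row 2 of the first value and skip its rows 0 and 1, which a cursor-driven Merge cannot express.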

Layout (mirrors Merge)

  • vortex-array/src/arrays/interleave/mod.rs — the encoding (Interleave / InterleaveArray / InterleaveData), InterleaveArrayExt accessors (num_values, value, array_indices, row_indices), the Interleave::check invariant source of truth, try_new, the VTable, OperationsVTable::scalar_at (direct gather), ValidityVTable::validity (inner non-nullable interleave of the values' validities), and oracle-backed tests.
  • vortex-array/src/arrays/interleave/execute/{mod.rs,bool.rs} — value-type dispatch plus the boolean gather kernel.
  • vortex-array/src/arrays/mod.rs — module registration.
  • vortex-array/src/builtins.rs — ArrayBuiltins::interleave constructor (+ an ExprBuiltins TODO), mirroring ArrayBuiltins::merge.
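The validity rule named in the layout (an inner interleave of the values' validities) can be illustrated as follows. This is a sketch under assumptions: nullable values are modeled as `Vec<Option<bool>>`, and `gather` and `interleave_validity` are hypothetical names, not the `ValidityVTable` API.

```rust
/// Unchecked gather used by the sketch; panics on out-of-bounds indices.
fn gather<T: Copy>(values: &[Vec<T>], array_indices: &[usize], row_indices: &[usize]) -> Vec<T> {
    array_indices
        .iter()
        .zip(row_indices)
        .map(|(&a, &r)| values[a][r])
        .collect()
}

/// Output validity = the same gather applied to each value's validity mask.
/// The masks are plain (non-nullable) booleans, so the inner gather needs
/// no validity of its own.
fn interleave_validity(
    values: &[Vec<Option<bool>>],
    array_indices: &[usize],
    row_indices: &[usize],
) -> Vec<bool> {
    let masks: Vec<Vec<bool>> = values
        .iter()
        .map(|v| v.iter().map(Option::is_some).collect())
        .collect();
    gather(&masks, array_indices, row_indices)
}
```

Reusing the same two selectors for the masks keeps validity aligned row-for-row with the gathered data.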

Spec / invariants

  • N value children + two selector children: array_indices and row_indices, both non-nullable unsigned integers of equal length (the output length).
  • array_indices[i] < values.len() and row_indices[i] < values[array_indices[i]].len() — per-row bounds depend on selector values, so they are a runtime precondition checked in the execution kernel rather than at construction.
  • All values share a logical type up to nullability; the output type is that shared type with the union of the values' nullabilities.
  • Interleave::check is the single source of truth for these invariants and is used by both try_new and the VTable::validate hook.
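The construction-time invariants listed above might be checked along these lines. The types here (`Logical`, `Ty`, `Selector`) are simplified stand-ins invented for the sketch, not Vortex's `DType`, and rejecting an empty value list is an assumption the PR text does not state. Per-row bounds are deliberately left out, since the description defers them to the execution kernel.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Logical {
    Bool,
    U32,
    I64,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ty {
    logical: Logical,
    nullable: bool,
}

/// Type and length of a selector child (hypothetical summary struct).
struct Selector {
    ty: Ty,
    len: usize,
}

/// Validates the selector and value types and returns the output type.
fn check(values: &[Ty], array_indices: &Selector, row_indices: &Selector) -> Result<Ty, String> {
    for (name, sel) in [("array_indices", array_indices), ("row_indices", row_indices)] {
        if sel.ty.logical != Logical::U32 || sel.ty.nullable {
            return Err(format!("{name} must be a non-nullable unsigned integer"));
        }
    }
    if array_indices.len != row_indices.len {
        return Err("selectors must have equal length".to_string());
    }
    // Assumption for this sketch: at least one value is required.
    let first = values
        .first()
        .ok_or_else(|| "at least one value is required".to_string())?;
    if values.iter().any(|v| v.logical != first.logical) {
        return Err("values must share a logical type".to_string());
    }
    // Output nullability is the union: nullable if any value is nullable.
    Ok(Ty {
        logical: first.logical,
        nullable: values.iter().any(|v| v.nullable),
    })
}
```

Routing both the constructor and the validate hook through one function like this is what keeps the two paths from drifting apart.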

Selector

array_indices is always a non-nullable unsigned integer indexing into values (the boolean two-value special case was intentionally dropped). The boolean-value execute kernel handles the integer-selector path directly, so multi-value interleaves execute end to end; only non-boolean value types remain unimplemented (they construct but panic on execute, dispatched on value type).
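The dispatch described above could look roughly like this. It is a hedged sketch: `Values`, `ExecError`, `gather_bool`, and `execute` are hypothetical names, and the unsupported branch returns an error where the PR says the real code panics, so that the example stays testable.

```rust
/// Value children grouped by type (stand-in for value-type dispatch).
enum Values {
    Bool(Vec<Vec<bool>>),
    Int64(Vec<Vec<i64>>),
}

#[derive(Debug, PartialEq)]
enum ExecError {
    Unsupported(&'static str),
    OutOfBounds { row: usize },
}

/// Boolean kernel: gathers N boolean values routed by unsigned selectors.
/// Per-row bounds are checked here, at execution time.
fn gather_bool(
    values: &[Vec<bool>],
    array_indices: &[u32],
    row_indices: &[u32],
) -> Result<Vec<bool>, ExecError> {
    array_indices
        .iter()
        .zip(row_indices)
        .enumerate()
        .map(|(row, (&a, &r))| {
            values
                .get(a as usize)
                .and_then(|v| v.get(r as usize))
                .copied()
                .ok_or(ExecError::OutOfBounds { row })
        })
        .collect()
}

/// Dispatch on value type; only the boolean kernel exists in this sketch.
fn execute(
    values: &Values,
    array_indices: &[u32],
    row_indices: &[u32],
) -> Result<Vec<bool>, ExecError> {
    match values {
        Values::Bool(v) => gather_bool(v, array_indices, row_indices),
        Values::Int64(_) => Err(ExecError::Unsupported("int64")),
    }
}
```

A three-value call such as `execute(&Values::Bool(...), &[2, 0, 1, 0], &[1, 1, 0, 0])` exercises the multi-value path that the boolean selector could not address.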

Checks

Run with RUSTC_WRAPPER= (sandbox sccache note in AGENTS.md):

  • cargo build -p vortex-array
  • cargo test -p vortex-array --lib interleave → 13 passed ✅
  • cargo +nightly fmt -p vortex-array (+ --check) ✅
  • cargo clippy -p vortex-array --all-targets --all-features

Follow-ups

  • Non-boolean value kernels (primitive/varbin/etc.).
  • An interleave expression builtin (TODO left in ExprBuiltins).

Generated by Claude Code

claude added 2 commits June 5, 2026 22:30
Adds an `Interleave` array: a lazy, random-access gather of `N` value
arrays into one array, taking output row `i` from
`values[array_indices[i]][row_indices[i]]`. It is the random-access analog
of `Merge` — instead of consuming each branch under a cursor, `row_indices`
names an explicit position, so rows may be reordered, skipped, or repeated.

The layout mirrors `Merge`: an array encoding with `N` value children plus
two non-nullable selector children (`array_indices`, `row_indices`), a
single `check` source of truth for invariants, value-type-dispatched
execution with an optimized boolean kernel, oracle-backed tests, and an
`ArrayBuiltins::interleave` constructor. As with the merge skeleton, only
the boolean (two-value) selector form is wired into execution; integer
selectors construct but panic on execute.

Signed-off-by: Claude <noreply@anthropic.com>
…gned

Removes the boolean two-value `array_indices` special case from the
`Interleave` encoding. `array_indices` is now always a non-nullable unsigned
integer indexing into `values`, unifying selector validation in `check`
(which remains the single source of truth used by both `try_new` and
`validate`).

With the boolean selector gone, the boolean-value execute kernel now
implements the (previously panicking) integer-selector path directly: it
gathers `N` boolean values routed by unsigned `array_indices` / `row_indices`,
so multi-value interleaves execute end to end. Tests are updated to build
unsigned selectors and now cover a three-value random-access gather.

Signed-off-by: Claude <noreply@anthropic.com>
joseph-isaacs added the changelog/feature (A new feature) label on Jun 5, 2026, with Claude
The Interleave module docs linked to `Merge`, which does not exist on
`develop` (it lives in a separate, not-yet-merged PR). Under
`-D rustdoc::broken-intra-doc-links` this failed the docs build. Demote the
references to plain code spans so the docs build standalone.

Signed-off-by: Claude <noreply@anthropic.com>
@codspeed-hq

codspeed-hq Bot commented Jun 5, 2026

Merging this PR will improve performance by 14.5%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
✅ 1512 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Simulation | encode_varbin[(1000, 2)] | 164.1 µs | 143.3 µs | +14.5% |



Comparing claude/interleave-method-6BEge (65b6a4c) with develop (e06d80b)


Labels

changelog/feature A new feature


2 participants