Skip to content

implement merge skeleton#8276

Draft
joseph-isaacs wants to merge 7 commits into
developfrom
claude/vortex-merge-function-type-CriOI
Draft

implement merge skeleton#8276
joseph-isaacs wants to merge 7 commits into
developfrom
claude/vortex-merge-function-type-CriOI

Conversation

@joseph-isaacs
Copy link
Copy Markdown
Contributor

Adds a Merge array that interleaves N compact branches into one array, taking output row i from branches[selector[i]] and consuming each branch in order — spec: a non-nullable selector (boolean for exactly 2 branches, unsigned integer for 2+), all branches equal up to nullability with the output nullability their union, count(selector == j) == branches[j].len(), and output length selector.len().

https://claude.ai/code/session_01FvDYsL7AtQxTY3uYqdevMY


Generated by Claude Code

claude added 5 commits June 5, 2026 21:18
Add a lazy, order-preserving MergeN array that merges N compact branches
into one array, routing output row i to branches[selector[i]]. This is the
"mux" inverse of partitioning rows into branches by a predicate.

- Spec documented in the module docs (invariants on selector, compact
  branch lengths, and union-of-nullabilities output type).
- Optimized execute path for the two-branch boolean-selector case operating
  directly on bit buffers, with branch validity only materialized when the
  output is nullable.
- scalar_at and validity implemented so the un-executed array compares
  equal to its canonical form.
- try_new rejects more than two branches or a non-boolean selector with a
  TODO to extend it.
- A private, test-only reference implementation built on execute_scalar
  acts as the oracle the fast path is checked against.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Make the MergeN selector non-nullable rather than silently mapping null to
branch 0. The selector only records where each output row goes, which is
always a definite decision; predicate nullability (a null WHEN falling
through to "not matched") is resolved upstream by the caller, exactly as
case_when does via to_mask_fill_null_false. Output nullability remains the
union of the branches' value nullabilities, independent of the selector.

Also drop the cursor mechanics from the spec doc in favor of a semantic
description, and add a test that a nullable selector is rejected.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Rename the MergeN array encoding to Merge (MergeNArray -> MergeArray,
MergeNArrayExt -> MergeArrayExt, id vortex.merge). It lives in
crate::arrays and does not collide with the struct-field-merge scalar
function, which is only referenced by fully-qualified path.

Allow the selector to be a non-nullable boolean (exactly 2 branches) or a
non-nullable unsigned integer (2 or more branches). All invariant checks
now live in a single Merge::check, used by both the constructor and the
VTable::validate hook.

Move execution into a merge/execute module: execute/mod.rs dispatches on
the selector type and panics for non-boolean selectors, and execute/bool.rs
holds the optimized two-branch boolean implementation.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Execution dispatches on the first branch's value type (all branches share
it, validated in Merge::check) rather than the selector type, which is an
orthogonal, invariant concern. Non-boolean branches panic until their
kernels exist. The boolean kernel still only implements the boolean
two-branch selector form and panics otherwise, leaving generic-T and
integer-selector routing for later.

Also restore two test names mangled by an earlier global rename.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Expose the merge operation via ArrayBuiltins::merge, where `self` is the
selector (mirroring zip, whose `self` is the mask). Also reword the module
docs to drop the "mux" terminology.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs added the changelog/feature A new feature label Jun 5, 2026 — with Claude
claude added 2 commits June 5, 2026 22:02
Drop the intra-doc link to the private `execute` module from the public
Merge module docs (rustdoc -D private-intra-doc-links). Take the branches
in ArrayBuiltins::merge as an `impl IntoIterator<Item = ArrayRef>` instead
of a Vec.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Note in ExprBuiltins that a `merge` expression builtin still needs to be
added to mirror ArrayBuiltins::merge.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Jun 5, 2026

Merging this PR will degrade performance by 22.84%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

❌ 3 regressed benchmarks
✅ 1510 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_bool_canonical_into[(1000, 10)] 31.6 µs 46.7 µs -32.24%
Simulation chunked_varbinview_canonical_into[(1000, 10)] 161.9 µs 198.2 µs -18.34%
Simulation chunked_varbinview_into_canonical[(1000, 10)] 177.2 µs 213.4 µs -16.98%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/vortex-merge-function-type-CriOI (23626a5) with develop (e06d80b)

Open in CodSpeed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants