implement merge skeleton#8276
Draft
joseph-isaacs wants to merge 7 commits into
Draft
Conversation
Add a lazy, order-preserving MergeN array that merges N compact branches into one array, routing output row i to branches[selector[i]]. This is the "mux" inverse of partitioning rows into branches by a predicate. - Spec documented in the module docs (invariants on selector, compact branch lengths, and union-of-nullabilities output type). - Optimized execute path for the two-branch boolean-selector case operating directly on bit buffers, with branch validity only materialized when the output is nullable. - scalar_at and validity implemented so the un-executed array compares equal to its canonical form. - try_new rejects more than two branches or a non-boolean selector with a TODO to extend it. - A private, test-only reference implementation built on execute_scalar acts as the oracle the fast path is checked against. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Make the MergeN selector non-nullable rather than silently mapping null to branch 0. The selector only records where each output row goes, which is always a definite decision; predicate nullability (a null WHEN falling through to "not matched") is resolved upstream by the caller, exactly as case_when does via to_mask_fill_null_false. Output nullability remains the union of the branches' value nullabilities, independent of the selector. Also drop the cursor mechanics from the spec doc in favor of a semantic description, and add a test that a nullable selector is rejected. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Rename the MergeN array encoding to Merge (MergeNArray -> MergeArray, MergeNArrayExt -> MergeArrayExt, id vortex.merge). It lives in crate::arrays and does not collide with the struct-field-merge scalar function, which is only referenced by fully-qualified path. Allow the selector to be a non-nullable boolean (exactly 2 branches) or a non-nullable unsigned integer (2 or more branches). All invariant checks now live in a single Merge::check, used by both the constructor and the VTable::validate hook. Move execution into a merge/execute module: execute/mod.rs dispatches on the selector type and panics for non-boolean selectors, and execute/bool.rs holds the optimized two-branch boolean implementation. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Execution dispatches on the first branch's value type (all branches share it, validated in Merge::check) rather than the selector type, which is an orthogonal, invariant concern. Non-boolean branches panic until their kernels exist. The boolean kernel still only implements the boolean two-branch selector form and panics otherwise, leaving generic-T and integer-selector routing for later. Also restore two test names mangled by an earlier global rename. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Expose the merge operation via ArrayBuiltins::merge, where `self` is the selector (mirroring zip, whose `self` is the mask). Also reword the module docs to drop the "mux" terminology. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Drop the intra-doc link to the private `execute` module from the public Merge module docs (rustdoc -D private-intra-doc-links). Take the branches in ArrayBuiltins::merge as an `impl IntoIterator<Item = ArrayRef>` instead of a Vec. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Note in ExprBuiltins that a `merge` expression builtin still needs to be added to mirror ArrayBuiltins::merge. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Merging this PR will degrade performance by 22.84%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
31.6 µs | 46.7 µs | -32.24% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] |
161.9 µs | 198.2 µs | -18.34% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
177.2 µs | 213.4 µs | -16.98% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing claude/vortex-merge-function-type-CriOI (23626a5) with develop (e06d80b)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds a
Mergearray that interleavesNcompact branches into one array, taking output rowifrombranches[selector[i]]and consuming each branch in order — spec: a non-nullable selector (boolean for exactly 2 branches, unsigned integer for 2+), all branches equal up to nullability with the output nullability their union,count(selector == j) == branches[j].len(), and output lengthselector.len().https://claude.ai/code/session_01FvDYsL7AtQxTY3uYqdevMY
Generated by Claude Code