[codex] add row encoder order fuzzer by joseph-isaacs · Pull Request #8268 · vortex-data/vortex

joseph-isaacs · 2026-06-05T12:59:56Z

Summary

Adds a native row_order fuzz target in the regular vortex-fuzz crate. The target generates pairs of same-schema column batches over row-encoder-supported dtypes, encodes both sides with vortex-row, and checks every left/right row comparison against a recursive Vortex scalar-space ordering oracle.

The generated schemas cover null, bool, primitive, decimal up to Decimal128, UTF-8, binary, struct, and fixed-size-list dtypes. Unsupported row encoder types such as variable-size list, variant, union, extension, and Decimal256 are intentionally excluded.

Validation

cargo check -p vortex-fuzz --bin row_order
cargo +nightly fmt --all
cargo clippy -p vortex-fuzz --all-targets --all-features -- -D warnings
cargo run -p vortex-fuzz --bin row_order -- -runs=1000
git diff --check

Adds `vortex-row`, which encodes N columnar arrays into a single byte-comparable `ListView<u8>` (the Vortex analogue of arrow-row) for use as sort/row keys. Encoding runs as two scalar functions behind the `RowEncoder` API: a `RowSize` sizing/classification pass and a `RowEncode` pass that allocates one contiguous buffer and writes each column left-to-right into its per-row slot. Per-column ordering (`RowSortField`) controls ascending/ descending and null placement. Includes the byte codec for fixed-width, varlen, and nested canonical types, the `convert_columns`/`compute_row_sizes` helpers, round-trip + invariant tests, and arrow-row-baselined throughput benches. The crate is marked `publish = false` for now, so no public-api.lock is tracked. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Add a CodSpeed shard for `vortex-row` so the `row_encode` divan benchmarks (vortex vs arrow-row) build and run in CI alongside the other crates. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

The row encoder builds the output `(elements, offsets, sizes)` triple itself, so the invariants `ListViewArray::try_new` validates (monotone offsets, per-row slices within bounds and disjoint) already hold by construction. Skip the revalidation walk via `new_unchecked`. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Introduce `ValidityKind`/`resolve_validity`: resolve a column's validity once, materializing the per-row mask only when the column may actually contain nulls. The size pass for varbinview and the bool and primitive encoders now branch once on validity, so the all-valid path drops the per-row `mask.value(i)` check (and mask allocation) entirely. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Every byte of the output range is written by some encoder: fixed-width null rows write sentinel + explicit zero-fill, varlen encoders zero-pad their final partial block, and struct/FSL null parent bodies are overwritten with the canonical null encoding. The pre-zero-init memset is therefore redundant, so replace it with `set_len`, saving a `total_len`-byte memset per call. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Materialize the listview offsets buffer with `set_len` + a slice write instead of per-row `push`. For the pure-fixed path, `iter_mut().enumerate()` lets LLVM auto-vectorize `offsets[i] = i * fixed_per_row` (no per-element bounds or capacity checks). `nrows` is validated to fit u32 at function entry, so the cast is exact. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Write the mixed (fixed + varlen) offsets through `iter_mut().zip` with wrapping arithmetic, mirroring the pure-fixed path: this elides per-element bounds checks so the `i * fixed_per_row` multiply auto-vectorizes while the varlen prefix sum stays a cheap sequential accumulator. The total is validated to fit u32 upstream, so the wrapping operations never actually wrap. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

…ping The varlen body writer was a per-byte XOR loop. Split it into an ascending fast path (`copy_nonoverlapping` of each 32-byte block plus a single stamped continuation byte, then a partial final block) and a descending path that XORs a u64 at a time via `xor_copy_block` for a vectorizable inner loop. The emitted bytes are identical to the previous implementation for every length and direction (full-block counts and final length byte match exactly); only the write strategy changes. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Replace the `with_iterator` traversal in `encode_varbinview` with a direct walk over the view array: cache the data-buffer slices once, then for each row read the bytes straight from the inlined view slot or the referenced buffer at `offset..offset+len`. This drops the iterator's per-row option/bounds machinery. Validity is resolved once via `resolve_validity`, keeping the no-nulls path branch-free on validity. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

The auto-vectorized offset loops and the varlen block writer used raw `as` casts that trip this crate's `cast_possible_truncation` lint. Iterate a `u32` counter instead of casting `usize` per element, and use `u8`/`u32` `try_from` for the varlen final-block length byte and total byte count. No behavior change. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Classify each column in the size pass (`ColKind` + `first_varlen_idx`): a fixed-width column with no varlen column before it has a constant within-row offset, so its write position is pure arithmetic (`i * fixed_per_row + prefix + var_prefix[i]`) with no per-row cursor. Route those columns through `field_encode_fixed_arithmetic`; the cursor path is seeded to start at the first varlen column. Primitive columns in the pure-fixed case use a `chunks_exact_mut` hot loop (matching arrow-row's not-null path); all other fixed types reuse the cursor encoder at the computed offsets, so output is byte-identical. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Run the vortex-row row_encode benchmarks as part of the existing 'Storage formats' shard rather than adding a dedicated ninth shard. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

FSST is not order-preserving, so row keys must be the decompressed bytes; the only strategy today is decompress to a canonical VarBinView then row-encode it. This bench measures that path and its two phases (decompress-only, and row-encode of an already-decompressed column) on compressible multi-block strings, to quantify the opportunity for a future fused FSST row-encode kernel: the phases are additive (decompress ~46%, row-encode ~54%), and the row-encode phase re-reads/re-writes the decompressed bytes a fused kernel could emit once. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Apply nightly rustfmt formatting to the FSST benchmark added in the previous commit. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Adds `fsst_fast_fused`: bulk-decompresses the FSST code heap straight into a contiguous buffer (no intermediate VarBinViewArray) and block-encodes rows directly into the row-key ListView using the stored uncompressed_lengths (free size pass), with the same no-zero-init / no-extra-copy techniques as the row encoder. Lets us compare the fused path head-to-head against decode-then-convert. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Adds `fsst_fast_scatter`: keeps FSST's fast contiguous bulk decompressor but runs it into a cache-resident scratch one row-batch at a time, scattering each row into block form from cache so the decompressed bytes never round-trip through main memory. A one-time assert_arrays_eq! check confirms it produces byte-identical row keys to the straightforward fused path. Result: fast_scatter is on par with fast_fused (no speedup) — the decompressed buffer is already consumed cache-warm in the simple fused path, so avoiding the round-trip saves nothing; the workload is CPU-bound on FSST symbol decode plus block-copy. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

codspeed-hq · 2026-06-05T13:06:14Z

Merging this PR will degrade performance by 10.78%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 3 regressed benchmarks
✅ 1503 untouched benchmarks
🆕 6 new benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`compare[15]`	119.9 µs	145.9 µs	-17.84%
❌	Simulation	`compare[14]`	117.5 µs	141.6 µs	-17.04%
❌	Simulation	`compare[13]`	115.5 µs	137.8 µs	-16.17%
⚡	Simulation	`compare[5]`	76.9 µs	69.4 µs	+10.89%
🆕	Simulation	`primitive_i64_arrow_row`	N/A	2.4 ms	N/A
🆕	Simulation	`primitive_i64_vortex`	N/A	1.5 ms	N/A
🆕	Simulation	`struct_mixed_arrow_row`	N/A	18.7 ms	N/A
🆕	Simulation	`struct_mixed_vortex`	N/A	22.9 ms	N/A
🆕	Simulation	`utf8_arrow_row`	N/A	8.6 ms	N/A
🆕	Simulation	`utf8_vortex`	N/A	9.3 ms	N/A

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing codex/row-encoder-order-fuzzer (e1085e9) with develop (17bee53)¹}

No successful run was found on develop (7482863) during the generation of this report, so 17bee53 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

joseph-isaacs added 19 commits June 4, 2026 14:28

ci(vortex-row): run row_encode benchmarks on CodSpeed

083c7f3

Add a CodSpeed shard for `vortex-row` so the `row_encode` divan benchmarks (vortex vs arrow-row) build and run in CI alongside the other crates. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

ci(vortex-row): fold row_encode benchmarks into CodSpeed shard 8

2fc07fa

Run the vortex-row row_encode benchmarks as part of the existing 'Storage formats' shard rather than adding a dedicated ninth shard. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

vortex-row: rustfmt the fsst row-encode benchmark

d6f1f4e

Apply nightly rustfmt formatting to the FSST benchmark added in the previous commit. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

fix

b3411f1

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

fix

a213cdd

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

add row encoder order fuzzer

9e2f143

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Base automatically changed from claude/nice-archimedes-yjGyO to develop June 5, 2026 14:09

joseph-isaacs added the changelog/skip Do not list PR in the changelog label Jun 5, 2026

Merge branch 'develop' into codex/row-encoder-order-fuzzer

e1085e9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] add row encoder order fuzzer#8268

[codex] add row encoder order fuzzer#8268
joseph-isaacs wants to merge 20 commits into
developfrom
codex/row-encoder-order-fuzzer

joseph-isaacs commented Jun 5, 2026

Uh oh!

codspeed-hq Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

joseph-isaacs commented Jun 5, 2026

Summary

Validation

Uh oh!

codspeed-hq Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will degrade performance by 10.78%

Performance Changes

Footnotes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codspeed-hq Bot commented Jun 5, 2026 •

edited

Loading