prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion #2140

Open
spkrka wants to merge 2 commits into gitgitgadget:master from spkrka:lazy-prio-queue-pr

@spkrka spkrka commented Jun 6, 2026

René's lazy_queue wrapper in describe.c was a clever
optimization -- by deferring the get, a following put becomes
a simple replace, avoiding a full remove-rebalance-insert
cycle.

It turns out this pattern is so common in git's traversal code
that it makes sense to fold it into prio_queue itself. Gets
and puts are interleaved in virtually every commit walk, so
the fusion is essentially always a win.

This is mostly a code simplification -- three callers had
independently reimplemented the same optimization, and they
all collapse to plain get+put now. The 3-6% speedup on
traversal-heavy workloads is a nice bonus.

More details and benchmark numbers in the commit message.
Benchmarks were run on next which includes
kk/commit-reach-optim -- those results represent the more
realistic end state.

Related to but independent of the cascade sift-down work in
kk/prio-queue-cascade-sift -- the two can land in either
order.

Changes since v1:

  • Added a second commit that renames .nr to .nr_internal so
    that direct access from outside prio-queue.c is a compile
    error. Verified that after the rename, only prio-queue.c
    references nr_internal.

  • Added prio_queue_for_each() macro for callers that need to
    walk all elements (describe.c, show-branch.c, commit-reach.c,
    revision.c, negotiator/skipping.c).

  • Converted remaining .nr loop conditions to use
    prio_queue_get()/prio_queue_peek() as the loop condition,
    or prio_queue_size() where get/peek isn't suitable.

  • Fixed several callers missed in v1 (object-name.c,
    fetch-pack.c, path-walk.c, pack-bitmap-write.c,
    negotiator/default.c, negotiator/skipping.c, revision.c,
    builtin/last-modified.c).

cc: René Scharfe <l.s.r@web.de>
cc: Kristofer Karlsson <krka@spotify.com>

@spkrka spkrka force-pushed the lazy-prio-queue-pr branch from 518e1cc to f2d0e8c Compare June 6, 2026 13:02
gitgitgadget Bot commented Jun 6, 2026

The pull request has 687 commits. The max allowed is 30. Please split the patch series into multiple pull requests. Also consider squashing related commits.

gitgitgadget Bot commented Jun 6, 2026

There are merge commits in this Pull Request:

1dd8236dfb4dcb23156d6449e99909f69a780e72
e36937bb5eddeffd9f06c3ad4db0ede6ac791eb7
04adce37b9e1e2265200ef4aa7a327c042a42416
8723880fc08b97899c10c847a91c2b69dfbd0b21
5766caf01e1fda2cd21c177fa4056f150dca619c
196ecde80b64e5f09cc15d3d7758e370beaba7cb
3ed53e1e2898928d13721f5b094ea77610025f28
c0dfe4a67d05886e16da79e00b9eaf1bec31092e
1d18160b915fa9f048bfb2fd9d0fd8b964730232
598a273b03e748a60d635ef2c55649333cf4a470
9eebe11338dbfd1aa697b04e33e9e2f10fe34d5a
8a42c5b2bca6c482cc906cf4309d6dfe334e569e
c47ef7d0f1ceaff3cfe9bbb7f54f022b38688da9
d3d6ef5197f72271e7baac02a8a46856f6d761ac
fb95f091ccdd9122a9dcb7831393204563ddede7
7a70817efa3735665c323d2a970584bff018b33d
7339897c7cd7bc48e037247725cf7a460ceeae71
eaeac8ef83491e77f95ff77760b5e67993f37d8d
66d5323bbc44eb35dc937110f5cd0e9539e24623
ad8abe7a5a178e340f8d36535f9667a4af368033

Please rebase the branch and force-push.

@spkrka spkrka force-pushed the lazy-prio-queue-pr branch 2 times, most recently from bbb2457 to 137dd0a Compare June 6, 2026 13:26
Defer the actual removal in prio_queue_get() until the next
operation.  If that next operation is a prio_queue_put(), the
removal and insertion are fused into a single replace — writing
the new element at the root and sifting it down — which avoids
a full remove-rebalance-insert cycle.

This matches the dominant usage pattern in git's commit traversal:
get a commit, then put its parents.  The first parent insertion
after each get is now a replace operation automatically.

This generalizes the lazy_queue pattern from builtin/describe.c
(introduced in 08bb69d) into prio_queue itself.  Three callers
independently implemented the same get+put fusion:

  - builtin/describe.c had a full lazy_queue wrapper
  - commit.c:pop_most_recent_commit() reimplemented the same
    get_pending flag with peek+replace
  - builtin/show-branch.c:join_revs() used the same peek+replace
    pattern

All three now collapse to plain _get() and _put(),
with the data structure handling the fusion internally.

Remove prio_queue_replace() since no external callers remain.
Add prio_queue_size() for callers that need the logical element
count, since the physical nr may temporarily include a
pending-removal element.

Benchmarked on a large monorepo (10-15 interleaved runs, 1 warmup):

  Command                       base    patched  speedup
  merge-base --all A A~1000     3.88s   3.77s    1.03x
  rev-list --count A~1000..A    3.57s   3.43s    1.04x
  log --oneline A~1000..A       3.70s   3.49s    1.06x
  rev-parse :/pattern           365ms   364ms    1.00x
  describe HEAD (linux.git)     184ms   190ms    1.00x

No regressions in any scenario.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
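
The deferred removal described in the commit message above can be illustrated with a minimal sketch. This is not the code from the patch: it is a standalone min-heap of ints, and the names `lazy_heap`, `heap_get`, `heap_put`, `heap_size`, and the fixed capacity `CAP` are hypothetical, chosen only for illustration. It shows a get that merely marks the root as pending, a put that overwrites a pending root and sifts down once, and a size helper that subtracts the pending element from the physical count.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 64 /* hypothetical fixed capacity, for illustration only */

struct lazy_heap {
	int data[CAP];
	size_t nr;       /* physical count; may include a pending root */
	int get_pending; /* 1 if the root was returned but not yet removed */
};

static void swap_ints(int *a, int *b)
{
	int t = *a;
	*a = *b;
	*b = t;
}

static void sift_down(struct lazy_heap *h, size_t i)
{
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, min = i;
		if (l < h->nr && h->data[l] < h->data[min])
			min = l;
		if (r < h->nr && h->data[r] < h->data[min])
			min = r;
		if (min == i)
			return;
		swap_ints(&h->data[i], &h->data[min]);
		i = min;
	}
}

static void sift_up(struct lazy_heap *h, size_t i)
{
	while (i) {
		size_t p = (i - 1) / 2;
		if (h->data[p] <= h->data[i])
			return;
		swap_ints(&h->data[p], &h->data[i]);
		i = p;
	}
}

/* Carry out the removal that an earlier get deferred. */
static void flush_pending(struct lazy_heap *h)
{
	if (!h->get_pending)
		return;
	h->get_pending = 0;
	h->data[0] = h->data[--h->nr];
	sift_down(h, 0);
}

static void heap_put(struct lazy_heap *h, int v)
{
	if (h->get_pending) {
		/* Fused get+put: overwrite the stale root, sift down once. */
		h->get_pending = 0;
		h->data[0] = v;
		sift_down(h, 0);
		return;
	}
	assert(h->nr < CAP);
	h->data[h->nr] = v;
	sift_up(h, h->nr);
	h->nr++;
}

/* Returns 1 and stores the minimum in *out, or 0 if the heap is empty. */
static int heap_get(struct lazy_heap *h, int *out)
{
	flush_pending(h);
	if (!h->nr)
		return 0;
	*out = h->data[0];
	h->get_pending = 1; /* leave the root in place for now */
	return 1;
}

/* Logical element count: physical count minus the pending root. */
static size_t heap_size(const struct lazy_heap *h)
{
	return h->nr - (size_t)h->get_pending;
}
```

In this sketch, a `heap_get()` followed directly by a `heap_put()` never runs the remove-and-rebalance step, which is the fusion the commit message refers to. Two consecutive gets fall back to the ordinary removal via `flush_pending()`.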
@spkrka spkrka force-pushed the lazy-prio-queue-pr branch from 137dd0a to 29af244 Compare June 6, 2026 13:27
@spkrka spkrka marked this pull request as ready for review June 6, 2026 14:55
spkrka commented Jun 6, 2026

/submit

gitgitgadget Bot commented Jun 6, 2026

Submitted as pull.2140.git.1780757885582.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2140/spkrka/lazy-prio-queue-pr-v1

To fetch this version to local tag pr-2140/spkrka/lazy-prio-queue-pr-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2140/spkrka/lazy-prio-queue-pr-v1

gitgitgadget Bot commented Jun 6, 2026

Junio C Hamano wrote on the Git mailing list:

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Add prio_queue_size() for callers that need the logical element
> count, since the physical nr may temporarily include a
> pending-removal element.

Many code paths used to learn how many elements the queue logically
has by directly peeking into the .nr member of the prio_queue struct.
Now they should call this new helper function, and you converted some
in this patch.

How can we be sure that all such users of prio_queue have been
converted?  Are direct references to the .nr member, outside of the
prio-queue.c implementation, all now suspect?

For example, object-name.c:get_oid_oneline() uses a prio-queue
"copy", and loops "while (copy.nr)".  In the loop, it calls
pop_most_recent_commit(), which does a get followed by put of its
parents.  If the get is left hanging (e.g., root commit, causing no
_put() to be performed in pop_most_recent_commit()), wouldn't copy.nr
still remain 1 even though logically no elements remain in the queue?

There seems to be other direct peeking of the .nr member remaining in the
code.  Perhaps the member should be renamed to catch in-flight
topics that add more users of prio-queue that peek into the .nr
member, or something like that.
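
The hanging-get case raised above can be modelled with a small counting sketch. It tracks only the physical count and the pending flag, not actual elements, and `model_queue`, `model_get`, and `model_size` are hypothetical names rather than anything from prio-queue.c. The point is just that after a deferred get on a one-element queue, the physical count is still 1 while the logical size is 0, so a loop conditioned on the physical count would enter one more iteration than intended.

```c
#include <stddef.h>

struct model_queue {
	size_t nr;       /* physical slots in use */
	int get_pending; /* root handed out, removal deferred */
};

/*
 * Model of a deferred get: finish any earlier deferred removal,
 * then mark the root as taken without touching nr.
 * Returns 1 if an element was handed out, 0 if the queue is empty.
 */
static int model_get(struct model_queue *q)
{
	if (q->get_pending) {
		q->nr--;
		q->get_pending = 0;
	}
	if (!q->nr)
		return 0;
	q->get_pending = 1;
	return 1;
}

/* Logical size: what a caller should test instead of nr. */
static size_t model_size(const struct model_queue *q)
{
	return q->nr - (size_t)q->get_pending;
}
```

Under these assumptions, a caller that loops on `model_size(&q)` (or on the return value of the get itself) stops at the right time, whereas one that loops on `q.nr` sees a phantom element.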

@spkrka spkrka force-pushed the lazy-prio-queue-pr branch 2 times, most recently from 1524cf1 to 50a332a Compare June 6, 2026 18:28
Rename the .nr member to .nr_internal so that callers outside
prio-queue.c that directly reference .nr get a compilation error.
This catches both existing misuse and future in-flight topics.

Add prio_queue_for_each() macro for callers that need to walk all
elements in the queue, accounting for the get_pending offset.

Convert all external .nr users:
 - Loop conditions: use prio_queue_size(), prio_queue_get(), or
   prio_queue_peek() as the loop condition
 - Array iterations: use prio_queue_for_each()

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
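
As a rough sketch of what an iteration macro that accounts for the get_pending offset might look like: the struct layout, the macro name `demo_queue_for_each`, and its two-argument form are assumptions made for this illustration and may differ from the actual `prio_queue_for_each()` in the patch. The sketch assumes the pending element sits in slot 0, so iteration starts at index `get_pending` and skips it.

```c
#include <stddef.h>

struct demo_queue {
	int *array;
	size_t nr_internal; /* physical count; not for use by callers */
	int get_pending;    /* 1: array[0] was already returned by a get */
};

/* Walk the logically present elements, skipping a pending root. */
#define demo_queue_for_each(q, i) \
	for ((i) = (size_t)(q)->get_pending; (i) < (q)->nr_internal; (i)++)

/* Example caller: sum every element still logically in the queue. */
static long demo_queue_sum(const struct demo_queue *q)
{
	size_t i;
	long sum = 0;

	demo_queue_for_each(q, i)
		sum += q->array[i];
	return sum;
}
```

With a macro of this shape, callers never name the count member themselves, so a later rename of that member only touches the macro definition.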
@spkrka spkrka force-pushed the lazy-prio-queue-pr branch from 50a332a to bb8b0f7 Compare June 6, 2026 18:31
gitgitgadget Bot commented Jun 6, 2026

Kristofer Karlsson wrote on the Git mailing list:

Junio C Hamano <gitster@pobox.com> writes:

> How can we be sure that all such users of prio_queue have been
> converted?  Are direct references to .nr member, outside of the
> prio-queue.c implementation, all now suspect?

You're right, and the patch is thus broken in its current state.

I did a rename of .nr to ._nr on the branch and rebuilt -- that
immediately found several callers I missed:

 - object-name.c: get_oid_oneline()
   (like you also found)
 - fetch-pack.c: mark_recent_complete_commits()
 - builtin/last-modified.c: last_modified_run()
 - path-walk.c: walk_objects_by_path()
 - commit-reach.c: queue_has_nonstale()

The describe.c and show-branch.c callers already compensate for
get_pending in their iteration bounds, but they still reach into
.nr directly.

> Perhaps the member should be renamed to catch in-flight topics
> that add more users of prio-queue that peek into the .nr member,
> or something like that.

Agreed, that's the right fix. I looked for existing ways of marking
fields as private, internal or hidden but the only thing I found was
the convention of using a code comment: /* for internal use only */

I will apply a rename and submit a v2. Perhaps something like
nr_internal to make it look less like a public API.

gitgitgadget Bot commented Jun 6, 2026

User Kristofer Karlsson <krka@spotify.com> has been added to the cc: list.

spkrka commented Jun 6, 2026

/submit

gitgitgadget Bot commented Jun 6, 2026

Submitted as pull.2140.v2.git.1780772477.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2140/spkrka/lazy-prio-queue-pr-v2

To fetch this version to local tag pr-2140/spkrka/lazy-prio-queue-pr-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2140/spkrka/lazy-prio-queue-pr-v2
