
Stage interface redesign: pipe type negotiation, pooled buffer copies #51

Open: znull wants to merge 40 commits into main from znull/stage-v2-clean

@znull (Contributor) commented Apr 9, 2026

Rebases and modernizes @mhagger's Stage interface redesign (#21) onto current main, bumps the module version to v2, and takes advantage of the bump to make a couple of other API changes.

What this is

go-pipe is the library used to wire up process pipelines (cat foo | sed | filter-fn | writer): it spawns the subprocesses, connects stdin/stdout between stages, and propagates errors. This PR reworks the core Stage interface so that the pipeline, not the stages, owns connecting adjacent stages. That allows for optimizations (like the copy-elimination below) and lets us delete the old synthetic ioCopier entirely.

The interface now hands each stage both its stdin and stdout, plus a new StartOptions, and asks each stage to declare its I/O preferences so the pipeline can pick the cheapest pipe type between neighbors:

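type Stage interface {
    // Name returns the stage's name.
    Name() string
    // Preferences declares the stage's stdin/stdout I/O preferences so the
    // pipeline can choose the pipe type between neighbors.
    Preferences() StagePreferences
    // Start launches the stage with pipeline-provided stdin and stdout.
    Start(ctx context.Context, env Env, stdin io.ReadCloser, stdout io.WriteCloser, opts StartOptions) error
    // Wait blocks until the stage finishes and returns its error, if any.
    Wait() error
}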

Because this breaks the interface, the module path moves to /v2.

Perf optimizations

We want to minimize data copies, either by letting the kernel move the bytes instead of Go code, or, when we must copy in Go, by doing it more efficiently. Because the pipeline now owns the connections, it can:

  • hand a command's stdout fd straight to the next stage (or to the final destination), so there's no Go-side goroutine copying between stages and dirtying the heap;
  • pass an *os.File destination's fd directly into the child, letting the kernel (sendfile(2)/splice(2) and friends) do the copy (so-called "zero-copy" I/O);
  • and where we do have to copy in-process, run it through a pooled 32KB buffer instead of letting exec.Cmd allocate a fresh one per pipeline.

#49 and #50 were aiming for similar optimizations, but now they fall out of the structure instead of being bolted onto ioCopier.

Panic handling

go-pipe runs user code (function stages, memory-limit event handlers) in goroutines that it spawns itself, so a panic there could take down the whole process. Panic handling already existed, but it was part of the Stage interface. That forced error-prone boilerplate onto every stage, even though most stages run no user code and need no panic handling. The panic handler is now a callback, passed into Start via StartOptions, that the pipeline threads to every stage, so a single handler covers all stages.

Supersedes

The git-systems/pooled-copies branch (which carried #49 + #50) can be deleted after this merges.

/cc @mhagger @migue @carlosmn

Copilot Summary

Interface change

Stage.Start gains stdout and a StartOptions struct (a struct so future run-scoped options don't break the interface again):

Start(ctx context.Context, env Env, stdin io.ReadCloser, stdout io.WriteCloser, opts StartOptions) error

Pipeline.Start() uses each stage's Preferences() to negotiate the pipe type between adjacent stages:

  • os.Pipe() when either neighbor is a command (needs a real *os.File fd)
  • io.Pipe() when both neighbors are Go functions (all userspace, cheaper)
  • the pipeline's own stdout passed directly to the last stage — no synthetic ioCopier

Module path bumped to github.com/github/go-pipe/v2 for the breaking change.

Fast paths preserved from #49/#50

ioCopier is deleted; the optimizations it carried now live in the stage structure:

  • fd dup-to-child / sendfile: a final commandStage writing to an *os.File destination dups that fd into the child; for a non-*os.File writer that implements io.ReaderFrom, the copy goes through ReadFrom (which can use sendfile where the kernel supports it). Pinned in pipe/command_stdout_fastpath_test.go.
  • pooled 32KB copy buffers: for a non-*os.File, non-ReaderFrom stdout, setupPooledStdout builds an os.Pipe() and copies through a sync.Pool buffer rather than letting exec.Cmd allocate one per pipeline. See pipe/copy_pool.go.

Type unwrapping (replaces transparent NopCloser forwarding)

The old NopCloser re-exposed io.WriterTo by forwarding, so a naive io.Copy kept the fast path transparently through the wrapper. That's replaced by explicit, exported helpers:

  • UnwrapReader(io.Reader) io.Reader / UnwrapWriter(io.Writer) io.Writer recover the concrete type go-pipe wrapped around stdin/stdout (nil-safe; non-wrappers pass through unchanged).
  • goStage.Start and commandStage.Start call them internally, so every pipe.Function consumer transparently regains WriterTo / ReaderFrom / *os.File identity. That is a superset of the single method the old forwarding preserved, and it also covers a case goStage previously missed.
  • The exported helpers are the public extension point for external stages that consume raw stdin/stdout and want a fast path or fd identity. No such stage exists in go-pipe or gitrpcd today.

Panic handling

  • StartOptions.PanicHandler (StagePanicHandler = func(p any) error) is set on the pipeline via WithStagePanicHandler and threaded to every stage's Start. The previous StagePanicHandlerAware opt-in interface is removed; wrapper stages just forward opts.
  • Covered: Function-stage StageFuncs and memory-limit event handlers (both run in library-spawned goroutines). If PanicHandler is nil, the panic propagates and crashes the process. This is intentional, since gitrpcd treats an unhandled pipeline panic as catastrophic.
  • A panic escaping the memory-watch goroutine is now recovered and surfaced via PanicHandler rather than crashing. The over-limit kill is guaranteed even if the event handler panics. A panic in a purely informational handler (RSS-read error / peak usage) is recovered and the stage keeps running unmonitored ("fail open"), because losing monitoring is not, on its own, a reason to stop a healthy stage. This is documented on MemoryLimit / StartOptions.PanicHandler.

Commit structure

  • Cherry-picked from @mhagger's version-2 (authorship preserved): linter shush, pipeline_test.go cleanup, pipeline benchmarks, NopCloser simplification, the Stage interface change, pipe-matching tests.
  • Reconciliation with main: port MemoryLimitWithObserver, restore the Function-stage panic handler, fix memoryWatchStage.Wait() to always stopWatching(), lint.
  • Adversarial-review fixups: restore empty-pipeline identity copy, pin the command-stdout fd-pass fast path, restore pooled-buffer copy for non-*os.File stdout, avoid leaking the pooled-stdout goroutine when cmd.Start() fails.
  • v2 API work: bump the module path to /v2, remove IOPreferenceNil, handle nil stdin/stdout cleanly, export Unwrap{Reader,Writer} and fully unwrap goStage stdin, thread the panic handler through Start (dropping StagePanicHandlerAware), and recover panics escaping the memory-watch goroutine.

@znull force-pushed the znull/stage-v2-clean branch from 409297e to fa6c12d on April 9, 2026 10:05
@znull changed the base branch from main to git-systems/pooled-copies on April 9, 2026 10:06
@znull force-pushed the znull/stage-v2-clean branch 3 times, most recently from 79f4a1a to 911ed5b on April 9, 2026 13:37
@mhagger (Member) commented Apr 27, 2026

This PR is still in draft mode. Do you want review/feedback already?

@znull (Contributor, Author) commented May 4, 2026

> This PR is still in draft mode. Do you want review/feedback already?

@mhagger I ran out of time to review the LLM output before vacation. I wanted to read through the changes more fully myself before inflicting them on anyone else so I left it in draft mode.

@znull force-pushed the znull/stage-v2-clean branch from 911ed5b to 4764d95 on May 28, 2026 17:29
mhagger added 4 commits May 28, 2026 20:23
Ported from version-2 branch commits:
- 95dc2e8 pipeline_test.go: get rid of a bunch of unnecessary tmpdirs
- 5fdc22a TestPipelineStdinThatIsNeverClosed(): create stdin more simply
- c2c9802 pipeline_test.go: use WithStdoutCloser() to close stdout pipes

Tests that don't run external commands (or whose commands don't
need a specific working directory) don't need t.TempDir().

Add some benchmarks that move MB-scale data through pipelines
consisting of alternating commands and functions, one in small writes,
and one buffered into larger writes, then processing it one line at a
time. This is not so efficient, because every transition from
`Function` → `Command` requires an extra (hidden) goroutine that
copies the data from an `io.Reader` to a `*os.File`.

We can make this faster!

* Rename
  * `newNopCloser()` → `newReaderNopCloser()`
  * `nopCloser` → `readerNopCloser`
  * `nopCloserWriterTo` → `readerWriterToNopCloser`
  * `nopWriteCloser` → `writerNopCloser`

  to help keep readers and writers straight and because only the
  `Close()` part is a NOP.

* Move `writerNopCloser` to `nop_closer.go` to be with its siblings.
@znull force-pushed the znull/stage-v2-clean branch from 4764d95 to f97fddd on May 28, 2026 18:46
@znull changed the base branch from git-systems/pooled-copies to main on May 28, 2026 18:46
@znull changed the title from "Stage interface redesign: pipe type negotiation, eliminate ioCopier" to "Stage interface redesign: pipe type negotiation, pooled buffer copies" on May 28, 2026
@znull force-pushed the znull/stage-v2-clean branch 2 times, most recently from 2ed608b to d14ef7b on May 29, 2026 11:33
@znull self-assigned this on May 29, 2026
@znull force-pushed the znull/stage-v2-clean branch from 08a9cf4 to fca1bfc on May 29, 2026 13:56
@znull marked this pull request as ready for review on May 29, 2026 14:39
@znull requested a review from a team as a code owner on May 29, 2026 14:39
Copilot AI review requested due to automatic review settings May 29, 2026 14:39
Copilot AI left a comment

Pull request overview

This PR modernizes the pipeline stage contract for v2 by letting stages declare I/O preferences and receive both stdin and stdout from the pipeline, enabling better pipe selection and removing the synthetic ioCopier stage.

Changes:

  • Redesigns Stage with Preferences() and Start(..., stdin, stdout) so Pipeline.Start() can negotiate os.Pipe vs io.Pipe.
  • Reworks command/function/memory-limit stages for the new interface, including pooled stdout copies for non-file command destinations.
  • Updates module path to /v2 and adds regression/benchmark coverage for pipe matching, empty pipelines, fast-path stdout, and start-failure cleanup.
Summary per file:

  • README.md: Updates documentation links for the v2 module path.
  • go.mod: Changes the module path to github.com/github/go-pipe/v2.
  • internal/ptree/ptree_test.go: Updates internal import path for v2.
  • pipe/stage.go: Redefines the public Stage interface and adds I/O preference types.
  • pipe/pipeline.go: Reworks pipeline startup to negotiate pipe types and pass stdout directly.
  • pipe/command.go: Adapts command stages to the new interface and adds pooled stdout copy handling.
  • pipe/function.go: Adapts function stages to receive caller-provided stdout and panic handling.
  • pipe/filter-error.go: Forwards panic handlers through error-filtering wrappers.
  • pipe/memorylimit.go: Ports memory-watching wrappers to the new stage interface.
  • pipe/nop_closer.go: Splits reader/writer nop closers and adds test unwrapping support.
  • pipe/copy_pool.go: Adds pooled-buffer copy helper with ReaderFrom fast-path support.
  • pipe/iocopier.go: Removes the old synthetic copier stage.
  • pipe/scanner.go: Simplifies scanner error return.
  • pipe/command_linux.go: Updates internal import path for v2.
  • pipe/command_test.go: Applies formatting cleanup.
  • pipe/command_nil_panic_test.go: Updates direct Start call for the new signature.
  • pipe/pipeline_test.go: Updates tests/benchmarks for v2 behavior, empty pipelines, and panic forwarding.
  • pipe/memorylimit_test.go: Reworks memory-limit tests for the new pipeline flow.
  • pipe/pipe_matching_test.go: Adds coverage for negotiated stdin/stdout pipe types.
  • pipe/export_test.go: Exposes nop-closer unwrapping for external package tests.
  • pipe/command_stdout_fastpath_test.go: Adds tests pinning direct *os.File stdout handoff.
  • pipe/command_starterror_test.go: Adds regression coverage for start-failure copy-goroutine cleanup.

Copilot's findings


  • Files reviewed: 22/22 changed files
  • Comments generated: 2

Comment thread pipe/memorylimit.go
Comment thread pipe/export_test.go Outdated
@znull force-pushed the znull/stage-v2-clean branch from d7948da to 33ca03c on May 29, 2026 14:56
@znull requested a review from Copilot on May 29, 2026 15:01

@znull (Contributor, Author) commented May 29, 2026

> @mhagger I ran out of time to review the LLM output before vacation. I wanted to read through the changes more fully myself before inflicting them on anyone else so I left it in draft mode.

I think this is actually worth taking a look at now.

@znull requested a review from mhagger on May 29, 2026 15:03
znull and others added 4 commits May 30, 2026 21:10
The internal noop-closer wrappers that go-pipe puts around the
caller-owned stdin/stdout hid the underlying object's concrete type from
stages.

These were supposed to be unwrapped, but this wasn't done consistently
(readerWriterToNopCloser), and external Stage implementations had no
blessed way to unwrap at all.

So, introduce exported UnwrapReader/UnwrapWriter as the sole way to
recover the underlying reader/writer, and delete
readerWriterToNopCloser in favor of explicit unwrap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Based on PR feedback, rewrite the use of recover() for function stages
so that the code flows in forward order. Inline recoverPanic() to keep
the recover in the first stack frame.

Of all the different types of pipe.Stage, most don't need to have a
panic handler, because most are not running user functions. Yet we were
paying the price of having panic forwarding as part of the interface,
which was awkward and error-prone for the rest of the stages to
implement cleanly.

Instead, we can just pass the panic handler through Start. This adds
a trailing StartOptions argument, carrying PanicHandler, to Stage.Start.
The StagePanicHandlerAware interface and its panic.go file are removed
(StagePanicHandler moves next to StartOptions in stage.go).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Even when a panic handler was set, a panic in the user-supplied event
handler, which memoryWatchStage runs in a library-spawned goroutine,
was not recovered.

To cover that gap, pass opts.PanicHandler into monitor() and recover
around the watch() call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@znull force-pushed the znull/stage-v2-clean branch from f39433e to 4278c36 on May 30, 2026 19:10
@znull requested a review from Copilot on May 30, 2026 19:11

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 25/25 changed files
  • Comments generated: 1

Comment thread README.md Outdated
znull and others added 2 commits May 31, 2026 16:00
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

MemoryLimit, MemoryLimitWithObserver, and MemoryObserver were three
positional-argument constructors expressing one concept: watch a stage's
RSS, optionally enforce a limit, optionally log the peak. Collapse them
into a single functional-options constructor:

    MemoryWatch(stage, eventHandler, WithMemoryLimit(n), WithPeakUsageLogging())

Calling MemoryWatch with neither option now reports an invalid-usage event
and returns the stage unwrapped, mirroring the non-LimitableStage guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
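The functional-options shape in that commit message can be sketched as follows. Only the call shape MemoryWatch(stage, eventHandler, WithMemoryLimit(n), WithPeakUsageLogging()) comes from the PR. The config struct, the uint64 limit type, the string-typed event handler, and the minimal Stage stand-in are all assumptions made for illustration. As described, calling it with neither option reports an invalid-usage event and returns the stage unwrapped.

```go
package main

import "fmt"

// Stage is a minimal stand-in for pipe.Stage.
type Stage interface{ Name() string }

type namedStage string

func (s namedStage) Name() string { return string(s) }

// memoryWatchConfig is hypothetical; the real option state may differ.
type memoryWatchConfig struct {
	limit   uint64 // 0 means "no limit"
	logPeak bool
}

type MemoryWatchOption func(*memoryWatchConfig)

func WithMemoryLimit(n uint64) MemoryWatchOption {
	return func(c *memoryWatchConfig) { c.limit = n }
}

func WithPeakUsageLogging() MemoryWatchOption {
	return func(c *memoryWatchConfig) { c.logPeak = true }
}

// memoryWatchStage wraps a stage together with its watch configuration.
type memoryWatchStage struct {
	Stage
	cfg memoryWatchConfig
}

func MemoryWatch(stage Stage, eventHandler func(msg string), opts ...MemoryWatchOption) Stage {
	var cfg memoryWatchConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.limit == 0 && !cfg.logPeak {
		// Neither option: nothing to watch, so report misuse and don't wrap.
		eventHandler("invalid usage: MemoryWatch called with no options")
		return stage
	}
	return &memoryWatchStage{Stage: stage, cfg: cfg}
}

func main() {
	var events []string
	handler := func(msg string) { events = append(events, msg) }

	s := MemoryWatch(namedStage("sort"), handler, WithMemoryLimit(1<<30), WithPeakUsageLogging())
	fmt.Printf("%T %q\n", s, s.Name())

	s = MemoryWatch(namedStage("sort"), handler)
	fmt.Printf("%T %d\n", s, len(events))
}
```

Compared with three positional constructors, each option is a small closure over one field, so new watch settings can be added without breaking existing callers.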
@znull requested a review from mhagger on May 31, 2026 16:03

@mhagger (Member) commented Jun 1, 2026

Thanks for your patience with my comments ✨

I think that I like the new style of dealing with the panic handler. I have just one question: why do we need a new StartOptions type, as opposed to adding the panic handler to Env? They seem redundant with each other.

@znull (Contributor, Author) commented Jun 1, 2026

> Thanks for your patience with my comments ✨

Not at all, thanks for making them!

> I think that I like the new style of dealing with the panic handler. I have just one question: why do we need a new StartOptions type, as opposed to adding the panic handler to Env? They seem redundant with each other.

I considered that briefly, but Env seemed more related to process specifics, and I was hesitant to make it a junk drawer of miscellaneous fields (which, admittedly, StartOptions pretty much is). In answering I'd consider not just the panic handler, but also #53's use of LeaveStdinOpen and LeaveStdoutOpen. On the whole, though, I can't say that any of those fields seems especially out of place in Env.

@mhagger EDIT: I noticed two things reviewing the code of #53:

mhagger added 7 commits June 2, 2026 10:07
For now it only has one method, which is a `memoryWatchFunc`.

Extract the following methods from `memoryWatcher.watch()`:

* `handleGetRSSError()`
* `killStage()`
* `reportPeakUsage()`

Extract method `memoryWatcher.update()` from `memoryWatcher.watch()`.

Build the stage into the `memoryWatcher`, so it doesn't need to be
passed around as an extra argument.

@mhagger (Member) left a comment

Some thoughts about the tests related to handling panics.

Comment thread pipe/memorylimit_panic_test.go Outdated
Comment thread pipe/memorylimit_panic_test.go Outdated
znull and others added 8 commits June 2, 2026 22:01
Don't leak a goroutine on panic/recover

Per mhagger's review suggestion on #51: rather than injecting a synthetic
`watch` closure that panics, make `fakeLimitableStage.GetRSSAnon()` itself
panic, and drive the test through the real `MemoryWatch` constructor and
monitor goroutine. This exercises the genuine panic-propagation path.

Co-authored-by: Michael Haggerty <mhagger@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Per mhagger's review suggestion on #51: replace the re-exec'd subprocess
test (and its child-process scaffolding) with a synchronous
assert.PanicsWithValue. A panic in the monitor goroutine can't be observed
from the test goroutine, so we drive the watch loop directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Use the `MemoryWatch()` constructor to create the `memoryWatchStage`.

Now that the tests don't need to use a custom `memoryWatchFunc`, there
is no need to make it dynamic. So instead, merge `memoryWatcher`
directly into type `memoryWatchStage`.

This commit intentionally doesn't change the method receiver names, to
make it easier to review. The next commit will make that change.

Make the new receivers (added by the previous commit) consistent with
the others.

This commit only reorders functions; no other changes are included.

Replace the MemoryLimit constructor trio with MemoryWatch + options


4 participants