The solution is that buffered accesses should almost always flush after a threshold number of bytes or after a period of time if there is at least one byte, “threshold or timeout”. This is pretty common in hardware interfaces to solve similar problems.
In this case, the library that buffers in userspace should set appropriate timers when it first buffers the data. Good choices of timeout parameter are: passed in as argument, slightly below human-scale (e.g. 1-100 ms), proportional to {bandwidth / threshold} (i.e. some multiple of the time it would take to reach the threshold at a certain access rate), proportional to target flushing overhead (e.g. spend no more than 0.1% time in syscalls).
Also note this applies for both writes and reads. If you do batched/coalesced reads then you likely want to do something similar. Though this is usually more dependent on your data channel as you need some way to query or be notified of “pending data” efficiently which your channel may not have if it was not designed for this use case. Again, pretty common in hardware to do interrupt coalescing and the like.
This is one of those things where, despite some 20+ years of dealing with NIX systems, I know* it happens, but always forget about it until I've sat puzzled why I've got no output for several moments.
> Some things I didn’t talk about in this post since these posts have been getting pretty long recently and seriously does anyone REALLY want to read 3000 words about buffering?
I'd like all buffers to be flushed whenever the systemwide CPU becomes idle.
Buffering generally is a CPU-saving technique. If we had infinite CPU, all buffers would be 1 byte. Buffers are a way of collecting together data to process in a batch for efficiency.
However, when the CPU becomes idle, we shouldn't have any work "waiting to be done". As soon as the kernel scheduler becomes idle, all processes should be sent a "flush your buffers" signal.
This article is confusing two different things: "unbuffered" vs "line-buffered".
Unbuffered will gratuitously give you worse performance, and can create incorrect output if multiple sources are writing to the same pipe. (sufficiently-long lines will intermingle anyway, but most real-world output lines are less than 4096 bytes, even including formatting/control characters and characters from supplementary planes)
Line-buffering is the default for terminals, and usually what you want for pipes. Run each command under `stdbuf -oL -eL` to get this. The rare programs that want to do in-line updates already have to do manual flushing so will work correctly here too.
You can see what `stdbuf` is actually doing by running:
Veserv ·7 days ago
In this case, the library that buffers in userspace should set appropriate timers when it first buffers the data. Good choices of timeout parameter are: passed in as argument, slightly below human-scale (e.g. 1-100 ms), proportional to {bandwidth / threshold} (i.e. some multiple of the time it would take to reach the threshold at a certain access rate), proportional to target flushing overhead (e.g. spend no more than 0.1% time in syscalls).
Also note this applies for both writes and reads. If you do batched/coalesced reads then you likely want to do something similar. Though this is usually more dependent on your data channel as you need some way to query or be notified of “pending data” efficiently which your channel may not have if it was not designed for this use case. Again, pretty common in hardware to do interrupt coalescing and the like.
Show replies
Twirrim ·8 days ago
hiatus ·8 days ago
I personally would.
Show replies
londons_explore ·7 days ago
Buffering generally is a CPU-saving technique. If we had infinite CPU, all buffers would be 1 byte. Buffers are a way of collecting together data to process in a batch for efficiency.
However, when the CPU becomes idle, we shouldn't have any work "waiting to be done". As soon as the kernel scheduler becomes idle, all processes should be sent a "flush your buffers" signal.
Show replies
o11c ·7 days ago
Unbuffered will gratuitously give you worse performance, and can create incorrect output if multiple sources are writing to the same pipe. (sufficiently-long lines will intermingle anyway, but most real-world output lines are less than 4096 bytes, even including formatting/control characters and characters from supplementary planes)
Line-buffering is the default for terminals, and usually what you want for pipes. Run each command under `stdbuf -oL -eL` to get this. The rare programs that want to do in-line updates already have to do manual flushing so will work correctly here too.
You can see what `stdbuf` is actually doing by running: