Understanding piped commands in Unix/Linux

I have two simple programs: A and B. A would run first, then B gets the “stdout” of A and uses it as its “stdin”. Assume I am using a GNU/Linux operating system; the simplest possible way to do this would be:

./A | ./B

If I had to describe this command, I would say that it is a command that takes input (i.e., reads) from a producer (A) and writes to a consumer (B). Is that a correct description? Am I missing anything?

Answers:

Method 1

The only thing about your question that stands out as wrong is that you say

“A would run first, then B gets the stdout of A”

In fact, both programs would be started at pretty much the same time. If there’s no input for B when it tries to read, it will block until there is input to read. Likewise, if there’s nobody reading the output from A, the pipe will buffer some of that output, and once the pipe’s buffer is full, A’s writes will block until some of the output is read.
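To make the timing concrete, here is a minimal sketch, assuming a POSIX system where `os.fork()` is available. The function name `consume_while_producer_runs` and the half-second pause are made up for illustration: the child plays the role of A, the parent plays the role of B, and the parent receives the first line while the child is still running.

```python
import os
import time


def consume_while_producer_runs():
    """Fork a producer (child) and act as the consumer (parent)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: plays the role of A, the producer.
        os.close(read_fd)
        os.write(write_fd, b"first\n")
        time.sleep(0.5)                 # still running, nothing to write yet
        os.write(write_fd, b"second\n")
        os.close(write_fd)
        os._exit(0)

    # Parent: plays the role of B, the consumer.
    os.close(write_fd)
    first = os.read(read_fd, 1024)      # blocks until the producer writes
    done_pid, _ = os.waitpid(pid, os.WNOHANG)
    producer_alive = done_pid == 0      # 0 means the child has not exited yet

    rest = b""
    while True:
        chunk = os.read(read_fd, 1024)  # blocks again while the pipe is empty
        if not chunk:                   # b"" is end-of-file: producer is gone
            break
        rest += chunk
    os.close(read_fd)
    if producer_alive:
        os.waitpid(pid, 0)              # reap the child
    return first, producer_alive, rest


if __name__ == "__main__":
    print(consume_while_producer_runs())
```

The consumer never waits for the producer to finish; it only waits for data, which is the point made above.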

The only thing synchronising the processes that take part in a pipeline is the I/O, i.e. the reading and writing across the pipe. If no writing or reading happens, the two processes will run totally independently of each other. If one ignores the reading or writing of the other, the ignored process will block and, when the other process terminates, will eventually either be killed by a SIGPIPE signal (if it is writing) or get an end-of-file condition on its standard input stream (if it is reading).
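The SIGPIPE case can be reproduced with a short sketch. The two inline programs below are hypothetical stand-ins for A and B: the producer writes forever, and the consumer reads one line and exits. One assumption worth flagging is that Python ignores SIGPIPE by default, so the producer restores the default action to behave the way an ordinary C program would.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for A: writes forever.
# The default SIGPIPE action is restored because Python ignores it at startup.
PRODUCER = """
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
while True:
    print("y")
"""

# Hypothetical stand-in for B: reads a single line, echoes it, and exits.
CONSUMER = """
import sys
sys.stdout.write(sys.stdin.readline())
"""


def run_pipeline():
    """Run the equivalent of `A | B`; return B's output and A's return code."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Drop our own copy of the read end so that B is the pipe's only reader.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output, producer.returncode


if __name__ == "__main__":
    output, returncode = run_pipeline()
    print(output, returncode)
```

`subprocess` reports a child killed by a signal as a negative return code, so the producer’s return code here should equal `-signal.SIGPIPE`.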

The idiomatic way to describe A | B is that it’s a pipeline containing two programs. The output produced on standard output from the first program is available to be read on the standard input by the second (“[the output of] A is piped into [the input of] B”). The shell does the required plumbing to allow this to happen.
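As a rough sketch of that plumbing (not how any particular shell is implemented), the following uses the POSIX calls `pipe`, `fork`, `dup2` and `exec` through Python’s `os` module. The helper names `spawn` and `pipeline` and the two inline programs are hypothetical.

```python
import os
import sys


def spawn(argv, pipe_fd, std_fd, other_fd):
    """Fork, attach pipe_fd to std_fd in the child, then exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            os.dup2(pipe_fd, std_fd)   # e.g. make the pipe the child's stdout
            os.close(pipe_fd)
            os.close(other_fd)         # the child does not use the other end
            os.execvp(argv[0], argv)   # replaces the child; does not return
        finally:
            os._exit(127)              # only reached if something above failed
    return pid


def pipeline(argv_a, argv_b):
    """Run the equivalent of `argv_a | argv_b`; return both wait statuses."""
    read_fd, write_fd = os.pipe()
    pid_a = spawn(argv_a, write_fd, 1, read_fd)   # A writes to the pipe
    pid_b = spawn(argv_b, read_fd, 0, write_fd)   # B reads from the pipe
    # The parent must close both ends, or B would never see end-of-file.
    os.close(read_fd)
    os.close(write_fd)
    _, status_a = os.waitpid(pid_a, 0)
    _, status_b = os.waitpid(pid_b, 0)
    return status_a, status_b


if __name__ == "__main__":
    a = [sys.executable, "-c", "print('hello')"]
    b = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"]
    print(pipeline(a, b))
```

Note that both children are started before either is waited for, and neither program needs to know that its standard stream is a pipe.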

If you want to use the words “consumer” and “producer”, I suppose that’s ok too.

The fact that these are programs written in C is not relevant. The fact that this is Linux, macOS, OpenBSD or AIX is not relevant.

Method 2

The term usually used in documentation is “pipeline”, which consists of one or more commands (see the POSIX definition). So, technically speaking, that’s two commands you have there, run as two subprocesses of the shell (either fork()+exec()’ed external commands or subshells).

As for the producer-consumer part, the pipeline can be described by that pattern, since:

Producer and Consumer share a fixed-size buffer, and at least on Linux and macOS, the pipe buffer has a fixed size.
Producer and Consumer are loosely coupled: commands in a pipeline don’t know of each other’s existence (unless they are actively checking the /proc/<pid>/fd directory).
Producers write to stdout and consumers read from stdin as if each were the only command being executed, i.e. they can exist without each other.
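The fixed-size buffer from the first point can be observed directly. This is a minimal sketch, assuming a POSIX system; the function name `measure_pipe_capacity` is made up. It switches the write end to non-blocking mode so that a full pipe raises an error instead of blocking, then counts how many bytes fit.

```python
import os


def measure_pipe_capacity():
    """Fill a pipe that nobody reads and return how many bytes it accepted."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)   # a full pipe now raises instead of blocking
    chunk = b"x" * 4096
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass                           # the buffer is full
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print(measure_pipe_capacity())
```

On current Linux systems this commonly reports 65536, but the value depends on the operating system and its configuration, so treat it as something to measure rather than assume.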

The difference I see here is that, unlike Producer-Consumer in other languages, shell commands use buffering and write to stdout once the buffer is filled, but there’s no mention that Producer-Consumer has to follow that rule; it only has to wait when the queue is full or discard data (which is something else that a pipeline doesn’t do).

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.