Reusing pipe data for different commands

I would like to use the same pipe for different applications, like in:

cat my_file | {
  cmd1
  cmd2
  cmd3
}

Cmd1 should consume part of the input. Cmd2 should consume another part and so on.

However, each command consumes more of the input than it really needs, because of buffering.

For example:

yes | nl | { 
  head -n 10 > /dev/null
  cat 
} | head -n 10

This prints lines starting at 912 instead of at line 11.

Tee is not a good option, because each command is supposed to consume part of the stdin.

Is there a simple way to get this working?

Answers:


Method 1

You can use tee to duplicate the stream, so that several commands each process the whole input:

( ( seq 1 10 | tee /dev/fd/5 | sed 's/^/line.. /' >&4 ) 5>&1 | wc -l ) 4>&1
line.. 1
line.. 2
line.. 3
line.. 4
line.. 5
line.. 6
line.. 7
line.. 8
line.. 9
line.. 10
10

Or distribute the input line by line, using bash:

while read line ;do
    echo cmd1 $line
    read line && echo cmd2 $line
    read line && echo cmd3 $line
  done < <(seq 1 10)
cmd1 1
cmd2 2
cmd3 3
cmd1 4
cmd2 5
cmd3 6
cmd1 7
cmd2 8
cmd3 9
cmd1 10

Finally, there is a way to run cmd1, cmd2 and cmd3 only once each, with every command receiving one third of the stream on STDIN:

( ( ( seq 1 10 |
         tee /dev/fd/5 /dev/fd/6 |
           sed -ne '1{:a;p;N;N;N;s/^.*\n//;ta;}' |
           cmd1 >&4
     ) 5>&1 |
       sed -ne '2{:a;p;N;N;N;s/^.*\n//;ta;}' |
       cmd2 >&4
  ) 6>&1 |
    sed -ne '3{:a;p;N;N;N;s/^.*\n//;ta;}' |
    cmd3 >&4
) 4>&1
command_1: 1
command_1: 4
command_1: 7
command_1: 10
Command_2: 2
Command_2: 5
Command_2: 8
Command_3: 3
Command_3: 6
Command_3: 9

To try this, you could define:

alias cmd1='sed -e "s/^/command_1: /"' \
      cmd2='sed -e "s/^/Command_2: /"' \
      cmd3='sed -e "s/^/Command_3: /"'
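The sed filters in the pipeline above are meant to select every third line, starting at line 1, 2 or 3. As a rough way to check what each command should receive, the same selection can be sketched with awk. The function name every_third is hypothetical and not part of the original answer, and the sketch assumes a start value between 1 and 3:

```shell
#!/bin/bash
# every_third START: print lines START, START+3, START+6, ... of stdin.
# Hypothetical helper for illustration; START is assumed to be 1, 2 or 3.
every_third() {
    awk -v start="$1" 'NR % 3 == start % 3'
}

seq 1 10 | every_third 1   # prints 1, 4, 7, 10
seq 1 10 | every_third 2   # prints 2, 5, 8
seq 1 10 | every_third 3   # prints 3, 6, 9
```

Each of the three filters still reads the whole stream, which is why the original pipeline needs tee to hand every filter its own copy.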

To share one stream between several consumers within the same script, you could do:

seq 1 10 | (
    for ((i=(RANDOM&7);i--;));do
        read line;
        echo CMD1 $line
      done
    for ((i=RANDOM&7;i--;));do
        read line
        echo CMD2 $line
      done
    while read line ;do
        echo CMD3 $line
      done
)
CMD1 1
CMD1 2
CMD1 3
CMD2 4
CMD2 5
CMD2 6
CMD2 7
CMD2 8
CMD2 9
CMD3 10

For this, you may have to turn your separate scripts into bash functions so that they can be combined into one overall script.
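As a minimal sketch of that idea, the three commands could be written as functions that each take a fixed number of lines with the read builtin. The helper take_lines, the labels and the line counts (3 and 4) are assumptions chosen for illustration, not something from the original scripts:

```shell
#!/bin/bash
# take_lines LABEL COUNT: read at most COUNT lines from stdin and print them
# with a prefix. Hypothetical helper standing in for real processing.
take_lines() {
    local label=$1 count=$2 line
    while (( count-- > 0 )) && IFS= read -r line; do
        printf '%s %s\n' "$label" "$line"
    done
}

cmd1() { take_lines CMD1 3; }   # consumes the first 3 lines
cmd2() { take_lines CMD2 4; }   # consumes the next 4 lines
cmd3() {                        # consumes everything that is left
    local line
    while IFS= read -r line; do
        printf 'CMD3 %s\n' "$line"
    done
}

# All three functions share the same stdin, one after the other.
seq 1 10 | { cmd1; cmd2; cmd3; }
```

Because read on a pipe stops right after each newline, no function takes input that belongs to the next one.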

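Another way is to make sure that no script writes anything to STDOUT, then add a cat at the end of each script so that they can be chained:

#!/bin/bash

n=3    # assumed example value: number of lines this script consumes
for ((i=0; i<n; i++)); do
    IFS= read -r line || break
    # process "$line" here; results go to a file, not to STDOUT
    echo "$line" >>output_log
done

cat    # pass the unread remainder on to the next script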

Final command could look like:

seq 1 10 | cmd1 | cmd2 | cmd3

Method 2

For head -n 10 to read 10 lines and not a single character more from the pipe on stdin, it would have to read one character at a time, to be sure not to read anything past the last newline character. That would be inefficient.

That’s what the read shell builtin does when stdin is not seekable.

{
  head -n 10 > /dev/null
  cat
} < myfile

This works because head reads a chunk of data and then lseeks back to just after the end of the 10th line. That obviously can’t be done with pipes.
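For pipes, the read builtin mentioned above can do the skipping instead of head, since it does not read past the newline. A minimal sketch, in which skip_lines is a hypothetical helper name introduced here for illustration:

```shell
#!/bin/bash
# skip_lines N: consume exactly N lines from stdin and discard them.
# Relies on the read builtin not reading past the end of a line on a pipe.
skip_lines() {
    local n=$1 line
    while (( n-- > 0 )); do
        IFS= read -r line || break   # stop early if the input ends
    done
}

seq 20 | {
    skip_lines 10   # takes lines 1-10 and nothing more
    cat             # receives lines 11-20
}
```

This is slow for large inputs, for the reason given above, but it leaves the rest of the pipe intact for the next command.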

On recent GNU or FreeBSD systems, some applications that use stdio can be told to read their input one character at a time by using stdbuf -i1 or stdbuf -i0.

However, that doesn’t work with GNU head. It works with GNU sed though, so you could do:

seq 20 | {
  stdbuf -i0 sed -n 10q
  cat
}
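GNU sed also has its own -u (--unbuffered) option, which is documented to load minimal amounts of data from the input. It is not part of the answer above. The following is a sketch that assumes GNU sed and assumes that -u makes it stop reading right after the 10th line:

```shell
#!/bin/bash
# Assumes GNU sed: -u (--unbuffered) asks sed to read as little input as possible.
seq 20 | {
    sed -u -n 10q   # consume lines 1-10 without printing them
    cat             # print whatever is left on the pipe
}
```

On a sed without -u this fails outright, so it is worth checking the local sed version before relying on it.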

Alternatively, you could control what goes into the pipe so that there is never more than one line in it at a time.

For instance, on Linux, you could do:

one_line_at_a_time() {
  perl -MTime::HiRes=usleep -pe '
    BEGIN{$|=1;open F, "<", "/dev/fd/1"; $r=""; vec($r,fileno(F),1) = 1}
    usleep(1000) while select($ro=$r,undef,undef,0)'
}
seq 20 | one_line_at_a_time | { head -n 10 > /dev/null; cat; }

That perl script opens “/dev/fd/1” in read mode which on Linux causes the other end of the pipe connected to fd 1 (stdout) to be opened. That way, with select, it can check if there’s something in the pipe (and sleep until it’s been emptied) before sending the next line.

Of course, that’s also terribly inefficient.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
