Why does the command shuf file > file leave an empty file, but similar commands do not?

I know this is sort of a duplicate of another question (Why this sort command gives me an empty file?) but I wanted to expand on the question in response to the answers given.

The command

shuf example.txt > example.txt

returns a blank file, because the shell truncates the file before shuf runs, leaving only an empty file to shuffle. However,

cat example.txt | shuf > example.txt

will produce a shuffled file as expected.

Why does the pipeline method work when the simple redirection doesn’t? If the file is truncated before the commands are run, shouldn’t the second method also leave an empty file?

Answers:

Method 1

The problem is that the shell opens > example.txt for writing, which truncates the file, before shuf example.txt starts reading it. shuf therefore reads an empty file, and since shuf produces no output for empty input, the final result stays empty.
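A minimal sketch of that ordering, run in a scratch directory so no real file is touched. The directory and the sample lines are assumptions made up for the illustration; shuf is assumed to be the GNU coreutils one.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/example.txt"

# The shell opens example.txt for writing (truncating it) first,
# and only then starts shuf, which finds nothing left to read.
shuf "$workdir/example.txt" > "$workdir/example.txt"

# -s is true only if the file exists and has a size greater than zero.
if [ -s "$workdir/example.txt" ]; then
    result="non-empty"
else
    result="empty"
fi
echo "example.txt is $result"
```

Because the redirection and the read happen in the same process, one after the other, there is no race here: the file should come out empty every time.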

Your other command may suffer from the same issue. > example.txt may truncate the file before cat example.txt starts reading it; it depends on the order in which the shell sets up the pipeline, and on how long it takes cat to actually open the file.

To avoid such issues entirely, you could use shuf example.txt > example.txt.shuf && mv example.txt.shuf example.txt.

Or you could go with shuf example.txt --output=example.txt instead.
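Both suggestions can be tried side by side. This is only a sketch: the scratch directory, the sample numbers, and the mktemp-generated temporary name are assumptions for the example, and the second variant relies on GNU shuf reading all of its input before it opens the --output file.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$workdir/example.txt"
cp "$workdir/example.txt" "$workdir/original.txt"

# Variant 1: write to a temporary file, then rename it over the original.
# The original is replaced only if shuf succeeded.
tmpfile=$(mktemp "$workdir/example.txt.XXXXXX")
shuf "$workdir/example.txt" > "$tmpfile" && mv "$tmpfile" "$workdir/example.txt"

# Variant 2: let shuf open the output file itself instead of the shell.
shuf --output="$workdir/example.txt" "$workdir/example.txt"

# Sorting the shuffled file should give back the original lines.
if sort -n "$workdir/example.txt" | cmp -s - "$workdir/original.txt"; then
    preserved=yes
else
    preserved=no
fi
echo "all lines preserved: $preserved"
```

The temporary-file variant works with any command, not just shuf, which is why it is the usual general-purpose answer.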

Method 2

The package moreutils has a command sponge:

   sponge  reads  standard input and writes it out to the specified file.
   Unlike a shell redirect, sponge soaks up all its input before  opening
   the output file. This allows constructing pipelines that read from and
   write to the same file.

That way you can do:

shuf example.txt | sponge example.txt

(Unfortunately, the moreutils package also ships a utility named parallel that is far less useful than GNU parallel. I removed the parallel installed by moreutils.)
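If moreutils is not available, the idea behind sponge can be approximated in plain shell. The function below is a hypothetical stand-in named soak, not the real sponge and not part of moreutils; it only illustrates "soak up everything first, open the output afterwards".

```shell
# soak: hypothetical stand-in for sponge, for illustration only.
soak() {
    soak_target=$1
    soak_tmp=$(mktemp) || return 1
    # Read standard input until the upstream command closes the pipe.
    cat > "$soak_tmp"
    # Only now is the target opened (and truncated) for writing.
    cat "$soak_tmp" > "$soak_target"
    rm -f "$soak_tmp"
}

# Scratch directory and sample data (hypothetical).
workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta > "$workdir/example.txt"

# shuf has finished reading example.txt before soak truncates it.
shuf "$workdir/example.txt" | soak "$workdir/example.txt"
sort "$workdir/example.txt"
```

Copying the buffered data back with a redirection, rather than renaming the temporary file, keeps the target's inode and permissions; that is a design choice of this sketch, not a statement about how sponge is implemented.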

Method 3

You were simply lucky that running

cat example.txt | shuf > example.txt

did not empty example.txt the way this command does:

shuf example.txt > example.txt

Redirections are performed by the shell before the commands are executed, and pipeline components are executed concurrently, so whether cat manages to read the file before the redirection for shuf truncates it is a race.

Using the -o / --output option would be the best solution with shuf, but if you like taking (very slight) risks, here is a non-traditional way to keep the file from being truncated before it is read:

shuf example.txt | (sleep 1;rm example.txt;cat > example.txt)

and this simpler and faster one, thanks to Ole’s suggestion:

(rm example.txt; shuf > example.txt) < example.txt

Method 4

From the GNU bash manual (see also, for the details, section 3.7 Executing Commands):

3.1.1 Shell Operation

The following is a brief description of the shell’s operation when it
reads and executes a command. Basically, the shell does the following:

  1. Reads its input from a file (see Shell Scripts), from a string
    supplied as an argument to the -c invocation option (see Invoking
    Bash), or from the user’s terminal.
  2. Breaks the input into words and operators, obeying the quoting rules
    described in Quoting. These tokens are separated by metacharacters.
    Alias expansion is performed by this step (see Aliases).
  3. Parses the tokens into simple and compound commands (see Shell
    Commands).
  4. Performs the various shell expansions (see Shell Expansions),
    breaking the expanded tokens into lists of filenames (see Filename
    Expansion) and commands and arguments.
  5. Performs any necessary redirections (see Redirections) and removes
    the redirection operators and their operands from the argument list.
  6. Executes the command (see Executing Commands).
  7. Optionally waits for the command to complete and collects its exit
    status (see Exit Status).

Consider the situation where your file does not exist. Your second example will still create the file most of the time, because the redirection is usually performed before cat opens the file. If the redirection had not yet happened and the file did not exist, cat would complain that there is no such file; most of the time it doesn’t. To reproduce the shuffle you got with that second command, I needed a great many tries. So the second command should indeed leave an empty file most of the time.
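How often the pipeline version survives can be measured rather than guessed. The loop below is a rough sketch; the trial count, the sample data, and the scratch directory are assumptions, and the split between the two outcomes will vary by machine, shell, and system load.

```shell
# Scratch directory (hypothetical) and number of trials (arbitrary).
workdir=$(mktemp -d)
trials=50
emptied=0
survived=0
i=0

while [ "$i" -lt "$trials" ]; do
    # Recreate the input, then run the racy pipeline from the question.
    printf '%s\n' 1 2 3 4 5 > "$workdir/example.txt"
    cat "$workdir/example.txt" | shuf > "$workdir/example.txt"

    # Count whether the file ended up empty or kept its lines.
    if [ -s "$workdir/example.txt" ]; then
        survived=$((survived + 1))
    else
        emptied=$((emptied + 1))
    fi
    i=$((i + 1))
done

echo "emptied: $emptied, survived: $survived (of $trials)"
```

Whatever the exact numbers, any run in which the file is emptied at least once shows that the pipeline form cannot be relied on.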

Method 5

You can use Vim in Ex mode:

ex -sc '%!shuf' -cx example.txt
  1. % select all lines
  2. ! run command
  3. x save and close

Method 6

The shortest version I have found:

(rm foo && shuf > foo) < foo

This opens the file for reading, unlinks it, and then creates a new file with the same name for writing. The original data stays reachable through the already-open file descriptor, so you avoid truncating the file before it is read, which is what you normally get when redirecting output to the same file you are reading from.
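The same sequence can be spelled out step by step with an explicit file descriptor. This is a sketch under assumptions: the scratch directory, the sample data, and the choice of descriptor 3 are made up for the illustration.

```shell
# Scratch directory and sample data (hypothetical).
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$workdir/foo"

# Step 1: open foo for reading on file descriptor 3.
exec 3< "$workdir/foo"

# Step 2: remove the name. The data is still reachable through fd 3.
rm "$workdir/foo"

# Step 3: the redirection creates a brand-new foo; shuf reads the old data.
shuf <&3 > "$workdir/foo"

# Step 4: close fd 3, which lets the system free the old file's data.
exec 3<&-

sort -n "$workdir/foo"
```

The risk mentioned in the earlier answer still applies: if shuf fails after the rm, the original name is already gone, so the old contents are lost once the descriptor is closed.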


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.