Why does sed act differently depending on the output file?

If I run:

cat messages.txt | sed -e 's/a/a/g' > messages.txt

on a large file (2500+ lines), the resulting file has only about 900 lines under Cygwin and no lines at all under Gentoo. However, if I run

cat messages.txt | sed -e 's/a/a/g' > other_messages.txt

it retains all the lines as it should.

My question is why this happens, and whether there is any way to do it other than:

cat messages.txt | sed -e 's/a/a/g' > other_messages.txt
rm messages.txt
mv other_messages.txt messages.txt

Answers:

Method 1

fschmitt’s answer (sed -i, Method 2 below) is the best when using sed; however, in a more general sense, this anti-pattern:

cat infile | filter > infile

is likely to cause you a good number of problems. For instance if I have a file called infile that looks like this:

Hello
World

and run this command:

cat infile | tr "[:upper:]" "[:lower:]"

I get

hello
world

But if I run cat infile | tr "[:upper:]" "[:lower:]" > infile I will get an empty file. Why?

Well, when you use the output redirection operator > you are saying “Put my standard output into this file, and if that file exists, overwrite it.” You might expect this to work, since your filter will return all the lines of the original file. However, the shell opens and truncates the output file while it sets up the pipeline, before the filter has produced any output and usually before cat has read anything. The filter then reads from an empty file, finds no lines, and returns none. Because the commands in a pipeline start concurrently, you might sometimes get “lucky” and have some lines read before the file gets clobbered, which is why the result can differ between systems, but it is best to avoid this pattern altogether.
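
The truncation is easiest to see with a single command instead of a pipeline, because then there is no race and the outcome is always the same. A minimal sketch, assuming a POSIX shell; the scratch directory and the variable names are only there for illustration:

```shell
#!/bin/sh
# Work in a scratch directory so no real file is harmed (paths are illustrative).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf 'Hello\nWorld\n' > infile

# The shell sets up both redirections before it starts tr:
#   < infile  opens the file for reading
#   > infile  opens the same file for writing and truncates it to zero bytes
# tr therefore starts with an input that is already empty.
tr '[:upper:]' '[:lower:]' < infile > infile

size_after=$(wc -c < infile | tr -d ' ')
echo "bytes left in infile: $size_after"
```

In the cat infile | filter > infile form, the truncation and cat opening the file happen in separate processes, so which one runs first is a matter of timing. That would be consistent with the asker seeing about 900 lines on one system and none on another.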

To get around this particular issue you have a few options. One is to simply do something like:

cat infile | filter > tmpfile && mv tmpfile infile

If you need to be sure that your temp file won’t clobber some other file or have other nasty things happen to it, you should look into mktemp. (see man mktemp and info coreutils mktemp)
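
As a rough sketch of the mktemp variant, the helper below wraps the temp-file-then-mv idea in a function. The name filter_in_place and the demo paths are made up for this example; it assumes a mktemp that accepts a template argument, as the GNU and BSD versions do.

```shell
#!/bin/sh
# filter_in_place FILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with FILE on stdin and replaces FILE
# with the output only if COMMAND succeeded.
filter_in_place() {
    target=$1
    shift
    # Create the temp file next to the target so the final mv stays on one
    # filesystem and is a rename rather than a copy.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" < "$target" > "$tmp"; then
        mv -- "$tmp" "$target"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Demo in a scratch directory (names are illustrative).
workdir=$(mktemp -d) || exit 1
printf 'Hello\nWorld\n' > "$workdir/infile"
filter_in_place "$workdir/infile" tr '[:upper:]' '[:lower:]'
cat "$workdir/infile"
```

One thing to keep in mind: mktemp creates the file with restrictive permissions, and mv replaces the original with that new file, so the mode and ownership of the result may differ from the original.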

Another option is to use sponge from moreutils.
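
sponge reads all of its standard input before it opens the output file, which is what makes filter < infile | sponge infile safe. Since moreutils may not be installed everywhere, here is a rough shell stand-in for that one behaviour. The name soak is invented for this sketch and is not part of moreutils, and it does not try to reproduce anything else sponge does.

```shell
#!/bin/sh
# soak FILE: hypothetical stand-in for `sponge FILE`.
# Buffers all of stdin in a temp file, and only then opens FILE for writing.
soak() {
    buf=$(mktemp) || return 1
    # The > "$1" redirection is not performed until the first cat has seen
    # end-of-file, i.e. until the upstream filter has closed its output.
    cat > "$buf" && cat "$buf" > "$1"
    status=$?
    rm -f "$buf"
    return "$status"
}

# Demo in a scratch directory (names are illustrative).
workdir=$(mktemp -d) || exit 1
printf 'Hello\nWorld\n' > "$workdir/infile"
tr '[:upper:]' '[:lower:]' < "$workdir/infile" | soak "$workdir/infile"
cat "$workdir/infile"
```

With moreutils installed, the pipeline line would simply end in sponge "$workdir/infile" instead of the soak call.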

Also, many of these examples are useless uses of cat: filter < infile does the same job as cat infile | filter.

Method 2

Why don’t you just write

sed -i -e 's/a/a/g' messages.txt

The -i option means “edit in place”, so no redirection onto the input file is needed.
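
Note that -i is not specified by POSIX. GNU sed takes an optional backup suffix attached directly to the option, while BSD and macOS sed expect a suffix argument. Attaching a suffix, as in -i.bak, is the form both accept as far as I know, and it leaves a backup behind as a side benefit. A small sketch with made-up file contents:

```shell
#!/bin/sh
# Work in a scratch directory (paths and contents are illustrative).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'banana\ncherry\n' > messages.txt

# -i.bak edits messages.txt in place and keeps the original as messages.txt.bak.
sed -i.bak -e 's/a/b/g' messages.txt

cat messages.txt       # edited contents
cat messages.txt.bak   # untouched original
```

Once the result has been checked, the .bak file can be deleted.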

Method 3

Yet another (portable) way to edit a file in-place is to use ed.

# cf. http://wiki.bash-hackers.org/howto/edit-ed
cat <<-'EOF' | ed -s messages.txt
H
,g/a/s//b/g
wq
EOF


# ... or read the file contents into a variable, modify it and write it back to file
# (command substitution strips trailing newlines, so printf adds one back)
file_contents="$(cat messages.txt)"
printf '%s\n' "$file_contents" | sed -e 's/a/b/g' > messages.txt


# ... and, if you want to play around with a file descriptor hack, ...
# (As long as there's a fd associated with a file, the file can be accessed via the fd.) 

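exec 3<messages.txt  # open file on fd 3 for reading
rm -f messages.txt   # unlink the name; the data stays readable through fd 3
sed -e 's/a/b/g' <&3 > messages.txt  # writes a brand-new messages.txt
exec 3<&-            # close fd 3 so the old data can be freed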

Method 4

You can use Vim in Ex mode:

ex -sc '%s/OLD/NEW/g|x' messages.txt
  1. % select all lines
  2. s substitute
  3. g global replace
  4. x save and close


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
