How to remove multiple newlines at EOF?

I have files that end in one or more newlines and should end in only one newline. How can I do that with Bash/Unix/GNU tools?

Example bad file:

1\n
\n
2\n
\n
\n
3\n
\n
\n
\n

Example corrected file:

1\n
\n
2\n
\n
\n
3\n

In other words: There should be exactly one newline between the EOF and the last non-newline character of the file.

Reference Implementation

Read the file contents, chop off one newline at a time until the text no longer ends in two newlines, then write it back:

#! /bin/python

import sys

# Read the whole input file into memory.
with open(sys.argv[1]) as infile:
    text = infile.read()

# Drop one trailing newline at a time until at most one remains.
while text.endswith("\n\n"):
    text = text[:-1]

with open(sys.argv[2], 'w') as outfile:
    outfile.write(text)

Clarification: Of course, piping is allowed, if that is more elegant.

Answers:


Method 1

From useful one-line scripts for sed.

# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file

Method 2

awk '/^$/ {nlstack=nlstack "\n";next;} {printf "%s",nlstack; nlstack=""; print;}' file

Method 3

You already have answers with the more suitable tools, sed and awk, but you could also take advantage of the fact that command substitution, $(<file), strips all trailing newline characters, and with them any trailing empty lines.

a=$(<file); printf '%s\n' "$a" > file

That cheap hack wouldn’t work to remove trailing blank lines which may contain spaces or other non-printing characters, only to remove trailing empty lines. It also won’t work if the file contains null bytes.

In shells other than bash and zsh, use $(cat file) instead of $(<file).
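
The caveat above, that this hack removes trailing empty lines but not trailing lines made of spaces or tabs, can be illustrated with a small Python sketch. Both function names are hypothetical, and the first only approximates what the shell does by stripping trailing newlines and adding one back:

```python
def strip_trailing_empty_lines(text: str) -> str:
    """Approximate printf '%s\\n' "$(cat file)": drop trailing newlines, add one back."""
    return text.rstrip("\n") + "\n"


def strip_trailing_blank_lines(text: str) -> str:
    """Also drop trailing lines that contain only spaces or tabs."""
    lines = text.split("\n")
    while lines and lines[-1].strip(" \t") == "":
        lines.pop()
    return "\n".join(lines) + "\n"


sample = "3\n \n\n"  # the second line holds a single space
print(repr(strip_trailing_empty_lines(sample)))  # → '3\n \n'
print(repr(strip_trailing_blank_lines(sample)))  # → '3\n'
```

The first result still ends in a whitespace-only line, which is the limitation described above.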

Method 4

You can use this trick with cat & printf:

$ printf '%s\n' "`cat file`"

For example

$ printf '%s\n' "`cat ifile`" > ofile
$ cat -e ofile
1$
$
2$
$
$
3$

The $ denotes the end of a line.

Method 5

Here’s a Perl solution that doesn’t require reading more than one line into memory at a time:

my $n = 0;
while (<>) {
    if (/./) {
        print "\n" x $n, $_;
        $n = 0;
    } else {
        $n++;
    }
}

or, as a one-liner:

perl -ne 'if (/./) { print "\n" x $n, $_; $n = 0 } else { $n++ }'

This reads the file a line at a time and checks each line to see if it contains a non-newline character. If it doesn’t, it increments a counter; if it does, it prints the number of newlines indicated by the counter, followed by the line itself, and then resets the counter.
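
For readers more comfortable with Python, the same counter logic might be sketched as a generator. This is an illustrative translation, not part of the original answer, and the name drop_trailing_empty_lines is hypothetical:

```python
import io
from typing import Iterable, Iterator


def drop_trailing_empty_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the input lines, holding back empty lines until a non-empty one follows."""
    pending = 0  # empty lines seen since the last non-empty line
    for line in lines:
        if line.rstrip("\n") == "":
            pending += 1  # hold the empty line back for now
        else:
            if pending:
                yield "\n" * pending  # the held-back lines were not trailing
                pending = 0
            yield line
    # Empty lines still pending at end of input are trailing, so they are dropped.


result = "".join(drop_trailing_empty_lines(io.StringIO("1\n\n2\n\n\n")))
print(repr(result))  # → '1\n\n2\n'
```

Only one line and one integer are held at a time, which matches the memory behaviour described above.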

Technically, even buffering a single line in memory is unnecessary; it would be possible to solve this problem using a constant amount of memory by reading the file in fixed-length chunks and processing it character by character using a state machine. However, I suspect that would be needlessly complicated for the typical use case.
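
A rough Python sketch of that chunked approach follows. The function name and the chunk size are assumptions made for illustration, and the end-of-input rule (emit one newline if any were pending) follows the reference implementation in the question rather than the Perl script:

```python
import io
from typing import BinaryIO

NEWLINE = 0x0A  # byte value of "\n"


def copy_without_extra_newlines(src: BinaryIO, dst: BinaryIO, chunk_size: int = 65536) -> None:
    """Copy src to dst, collapsing any run of trailing newlines to a single one."""
    pending = 0  # newlines seen since the last non-newline byte
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        for byte in chunk:  # iterating over bytes yields integers
            if byte == NEWLINE:
                pending += 1
            else:
                # The pending newlines were interior, so emit them one by one.
                for _ in range(pending):
                    dst.write(b"\n")
                pending = 0
                dst.write(bytes([byte]))
    if pending:
        dst.write(b"\n")  # keep exactly one of the trailing newlines


src = io.BytesIO(b"1\n\n2\n\n\n3\n\n\n\n")
dst = io.BytesIO()
copy_without_extra_newlines(src, dst, chunk_size=4)
print(dst.getvalue())  # → b'1\n\n2\n\n\n3\n'
```

The only state is one chunk and one counter. Writing a byte at a time is slow in Python; it is kept here only to make the state machine explicit.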

Method 6

This question is tagged with ed, but nobody has proposed an ed solution.

Here’s one:

ed -s file <<'ED_END'
a

.
?.?+1,$d
w
ED_END

or, equivalently,

printf '%s\n' a '' . '?.?+1,$d' w | ed -s file

ed will place you at the last line of the editing buffer by default upon startup.

The first command (a) adds an empty line to the end of the buffer (the empty line in the editing script is this line, and the dot (.) is just for coming back into command mode).

The address of the second command (?.?) looks for the nearest previous line that contains something (even white-space characters), and then deletes (d) everything to the end of the buffer from the next line on.

The third command (w) writes the file back to disk.

The added empty line protects the rest of the file from being deleted in the case that there aren’t any empty lines at the end of the original file.

Method 7

If your file is small enough to slurp into memory, you can use this:

perl -e 'local($/);$f=<>; $f=~s/\n*$/\n/;print $f;' file

Method 8

In Python (I know it is not what you asked for, but it is more efficient and serves as a prelude to the bash version), without rewriting the file and without reading the whole file, which helps if the file is very large:

#!/bin/python
import sys
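# Binary mode: Python 3 text-mode files do not support seeking relative to
# the end or the current position with a nonzero offset.
# Assumes the file has at least one non-newline byte and ends with a newline.
infile = open(sys.argv[1], 'rb+')
infile.seek(-1, 2)               # go to the last byte
while infile.read(1) == b'\n':   # walk backwards over the trailing newlines
  infile.seek(-2, 1)
infile.seek(1, 1)                # skip past the first trailing newline
infile.truncate()                # cut everything after it
infile.close()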

Note that it does not work on files where the EOL character is not ‘\n’.
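
A somewhat more defensive variant of the same in-place idea is sketched below. It is an illustration rather than the answerer's code: the function name and block size are hypothetical, it still treats only ‘\n’ as the EOL character, and it leaves a file untouched if it does not end in a newline:

```python
import os
import tempfile


def truncate_extra_newlines(path: str, block_size: int = 4096) -> None:
    """Truncate the file at path in place so it ends in at most one newline."""
    with open(path, "rb+") as f:
        end = f.seek(0, os.SEEK_END)  # file size in bytes
        pos = end
        # Scan backwards in blocks for the last non-newline byte.
        while pos > 0:
            start = max(0, pos - block_size)
            f.seek(start)
            stripped = f.read(pos - start).rstrip(b"\n")
            if stripped:
                pos = start + len(stripped)  # offset just past that byte
                break
            pos = start  # the block was all newlines, keep going
        if pos < end:
            f.truncate(pos + 1)  # keep exactly one trailing newline


# Usage on a temporary file:
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"1\n\n2\n\n\n3\n\n\n\n")
truncate_extra_newlines(demo_path)
with open(demo_path, "rb") as handle:
    print(handle.read())  # → b'1\n\n2\n\n\n3\n'
os.remove(demo_path)
```

Reading backwards in blocks avoids one read call per byte, and the pos < end check covers empty files and files with no trailing newline.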

Method 9

A bash version, implementing the python algorithm, but less efficient as it needs many processes:

#!/bin/bash
n=1
while test "$(tail -n $n "$1")" == ""; do
  ((n++))
done
((n--))
truncate -s $(($(stat -c "%s" "$1") - $n)) "$1"

Method 10

This one is quick to type, and, if you know sed, easy to remember:

tac < file | sed '/[^[:blank:]]/,$!d' | tac

It uses the sed script to delete leading blank lines from useful one line scripts for sed, referenced by Alexey, above, and tac (reverse cat).
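
The reverse, trim, reverse idea can also be sketched in Python. This is only an analogy to the pipeline, with a hypothetical function name, and unlike the streaming answers it needs the whole list of lines in memory:

```python
from itertools import dropwhile
from typing import List, Sequence


def trim_via_reverse(lines: Sequence[str]) -> List[str]:
    """Drop trailing lines that hold only spaces or tabs, like tac | sed | tac."""

    def is_blank(line: str) -> bool:
        # Same notion of blank as [:blank:] in the sed script: spaces and tabs.
        return line.strip(" \t\n") == ""

    kept = list(dropwhile(is_blank, reversed(lines)))  # trim the reversed head
    kept.reverse()  # restore the original order
    return kept


print(trim_via_reverse(["1\n", "\n", "2\n", " \t\n", "\n"]))  # → ['1\n', '\n', '2\n']
```

Because the sed expression tests for a non-blank character, this also removes trailing lines made of spaces or tabs, not just empty ones.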

In a quick test on an 18MB, 64,000-line file, Alexey’s approach was faster (0.036 vs. 0.046 seconds).


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
