How to delete line if longer than XY?

How can I delete a line if it is longer than, e.g., 2048 chars?

Answers:

Method 1

sed '/^.\{2048\}./d' input.txt > output.txt

Method 2

Here’s a solution which deletes lines that have 2049 or more characters:

sed '/.\{2049\}/d' <file.in >file.out

The regular expression .\{2049\} matches any line that contains a substring of 2049 characters (another way of saying “at least 2049 characters”). The d command deletes those lines, leaving only the shorter lines in the output.
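The claim that "contains a run of 2049 characters" is the same as "is at least 2049 characters long" can be checked with a small Python sketch. It uses Python's re module instead of sed, so the braces are written unescaped, and the helper name too_long is made up for illustration.

```python
import re

LIMIT = 2048  # longest line we want to keep

# Unanchored: matches any line holding LIMIT + 1 consecutive characters.
pattern = re.compile(r".{%d}" % (LIMIT + 1))

def too_long(line: str) -> bool:
    """Return True when the line (without its newline) exceeds LIMIT characters."""
    return pattern.search(line.rstrip("\n")) is not None

print(too_long("z" * 2048))  # False
print(too_long("z" * 2049))  # True
```

Because the pattern is unanchored, there is no need for ^ or $: any line with 2049 or more characters contains such a run somewhere.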

BSD sed (on e.g. macOS) can only handle repetition counts of up to 256 in the \{...\} operator (the value of RE_DUP_MAX; see getconf RE_DUP_MAX in the shell). On these systems, you may instead use awk:

awk 'length <= 2048' <file.in >file.out

Mimicking the sed solution literally with awk:

awk 'length >= 2049 { next } { print }' <file.in >file.out

Note that any awk implementation is only guaranteed to be able to handle records of lengths up to LINE_MAX bytes (see getconf LINE_MAX in the shell), but may support longer ones. On macOS, LINE_MAX is 2048.
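LINE_MAX is a byte count, and byte length can differ from character length for non-ASCII text. A short Python sketch of the difference, assuming the file is UTF-8 encoded (an assumption; the question does not say which encoding is in use):

```python
# Assumption: UTF-8 input. "\u00e9" (e with acute accent) is one character
# but two bytes in UTF-8.
line = "\u00e9" * 2048

char_count = len(line)                  # counts code points
byte_count = len(line.encode("utf-8"))  # counts encoded bytes

print(char_count)  # 2048
print(byte_count)  # 4096
```

A line like this sits at the 2048-character limit while being twice as long in bytes, so it is worth deciding which of the two limits matters for your data.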

Method 3

perl -lne "length <= 2048 && print" infile > outfile

Method 4

Something like this should work in Python.

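# Copy "orig" to "new", dropping lines longer than 2048 characters.
with open("orig") as of, open("new", "w") as nf:
    for line in of:
        # Measure the line without its trailing newline.
        if len(line.rstrip("\n")) <= 2048:
            nf.write(line)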

Method 5

The above answers do not work for me on Mac OS X 10.9.5.

The following code does work:

sed '/.\{2048\}/d'

Although not asked, but provided for reference, the reverse (keeping only lines of 2048 or more characters) can be achieved with the following code:

sed '/.\{2048\}/!d'

Method 6

With GNU sed, you can use the -r flag to avoid typing the backslashes, and a comma to define an open-ended interval:

sed -r  "/.{2049,}/d" input.txt > output.txt

with:

x{2049} meaning exactly 2049 xs
x{2049,3072} meaning from 2049 to 3072 xs
x{2049,} meaning at least 2049 xs
x{,2049} meaning at most 2049 xs

An unanchored interval also matches longer lines, because it only needs to match a substring. To match only lines whose total length falls within the interval, add line anchors:

sed -r  "/^.{32,64}$/d" input.txt > output.txt
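The effect of the anchors can be sketched in Python, whose re module uses the same interval syntax. The function names are illustrative only; re.fullmatch stands in for wrapping the pattern in ^ and $.

```python
import re

interval = r".{32,64}"

def matches_unanchored(line: str) -> bool:
    # search() finds the interval anywhere, so longer lines match too.
    return re.search(interval, line) is not None

def matches_anchored(line: str) -> bool:
    # fullmatch() requires the whole line to fit the interval.
    return re.fullmatch(interval, line) is not None

for n in (31, 32, 64, 65):
    line = "x" * n
    print(n, matches_unanchored(line), matches_anchored(line))
# 31 False False
# 32 True True
# 64 True True
# 65 True False
```

Only the 65-character line behaves differently: it contains a 32-to-64-character substring, but its full length is outside the interval.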

Method 7

The sed solutions all become very slow when lines get very long. This is the disadvantage of matching line length with regexes. (The advantage, of course, is that sed is available everywhere.)

If you like the speed of the Perl solution, but prefer using Python, the pz CLI tool makes this really easy. It brings Python to shell pipes.

With pz the solution would be:

cat input | pz 's if len(s) < 2048 else ""' > output
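If pz is not installed, a comparable filter can be sketched in plain Python. The function name keep_short and its limit parameter are illustrative and not part of pz. Note that this sketch ignores the trailing newline and keeps lines of exactly 2048 characters, which may differ slightly from the one-liner above.

```python
from typing import Iterable, Iterator

def keep_short(lines: Iterable[str], limit: int = 2048) -> Iterator[str]:
    """Yield only the lines whose content (newline excluded) fits within limit."""
    for line in lines:
        if len(line.rstrip("\n")) <= limit:
            yield line

# Small demonstration with in-memory lines instead of a real file.
sample = ["short\n", "z" * 2048 + "\n", "z" * 2049 + "\n"]
kept = list(keep_short(sample))
print(len(kept))  # 2
```

To use it in a shell pipe, import sys and pass sys.stdin to keep_short, writing the result with sys.stdout.writelines. The generator handles one line at a time, so memory use stays small even for large inputs.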

Method 8

Split each line into single-character fields by setting FS to the empty string, then compare the field count NF:

awk 'BEGIN{FS=""} NF <= 2048' file

Test with:

perl -e 'print "z"x2048' | awk 'BEGIN{FS=""} NF <= 2048'
# This prints the line

perl -e 'print "z"x2049' | awk 'BEGIN{FS=""} NF <= 2048'
# This prints nothing

Method 9

With Ruby:

ruby -ne 'print if $_.size <= 2048' input.txt > output.txt

Or to edit in place and create a backup:

ruby -i.bak -ne 'print if $_.size <= 2048' file.txt

Without a backup:

ruby -i -ne 'print if $_.size <= 2048' file.txt

Note: $_.size includes the trailing newline, if any. You can use $_.chomp.size to ignore trailing newlines.
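The trailing-newline effect is easiest to see right at the boundary. Here is a rough Python equivalent, since Python's len on a line read from a file also counts the newline. io.StringIO stands in for a real file.

```python
import io

# One line of exactly 2048 characters, terminated by a newline.
fake_file = io.StringIO("z" * 2048 + "\n")
line = next(fake_file)

print(len(line))               # 2049, newline included
print(len(line.rstrip("\n")))  # 2048, newline ignored
```

A check that counts the newline therefore treats a 2048-character line as 2049 long and drops it under a <= 2048 test.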

You could also check line size via a regex, like some of the other examples, but it will be slower:

# slow
ruby -ne 'print if /.{2048}./' input.txt > output.txt

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
