How can I delete a line if it is longer than, e.g., 2048 chars?
Answers:
Method 1
sed '/^.\{2048\}./d' input.txt > output.txt
Method 2
Here's a solution which deletes lines that have 2049 or more characters:
sed '/.\{2049\}/d' <file.in >file.out
The regular expression .\{2049\} matches any line that contains a substring of 2049 characters (another way of saying "at least 2049 characters"). The d command deletes those lines from the input, so only the shorter lines reach the output.
BSD sed (on e.g. macOS) can only handle repetition counts of up to 256 in the \{...\} operator (the value of RE_DUP_MAX; see getconf RE_DUP_MAX in the shell). On these systems, you may instead use awk:
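The equivalence between "contains a run of 2049 characters" and "is at least 2049 characters long" can be checked with a small Python sketch. The helper names (too_long_regex, too_long_len, filter_lines) are made up for illustration, and Python's re syntax is used in place of sed's, so the braces need no backslashes.

```python
import re

LIMIT = 2048  # longest line length we want to keep

# Unanchored pattern: any run of LIMIT + 1 characters (newline excluded).
PATTERN = re.compile(".{%d}" % (LIMIT + 1))

def too_long_regex(line):
    """True if the line contains a substring of LIMIT + 1 characters."""
    return PATTERN.search(line) is not None

def too_long_len(line):
    """True if the line, without its trailing newline, exceeds LIMIT."""
    return len(line.rstrip("\n")) > LIMIT

def filter_lines(lines, is_too_long=too_long_len):
    """Return only the lines that are not too long."""
    return [line for line in lines if not is_too_long(line)]
```

Both predicates should agree on every line; the length check simply avoids running the regex engine.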
awk 'length <= 2048' <file.in >file.out
Mimicking the sed solution literally with awk:
awk 'length >= 2049 { next } { print }' <file.in >file.out
Note that any awk implementation is only guaranteed to be able to handle records of lengths up to LINE_MAX bytes (see getconf LINE_MAX in the shell), but may support longer ones. On macOS, LINE_MAX is 2048.
Method 3
perl -lne "length <= 2048 && print" infile > outfile
Method 4
Something like this should work in Python.
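# Copy "orig" to "new", keeping only lines of at most 2048 characters.
with open("orig") as src, open("new", "w") as dst:
    for line in src:
        # Measure the line without its trailing newline.
        if len(line.rstrip("\n")) <= 2048:
            dst.write(line)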
Method 5
The above answers do not work for me on Mac OS X 10.9.5.
The following code does work (note that, as written, it also deletes lines of exactly 2048 characters):
sed '/.\{2048\}/d'
Although not asked, for reference, the reverse (keeping only those lines) can be achieved with the following code:
sed '/.\{2048\}/!d'
Method 6
With GNU sed, you may use the -r flag to avoid typing the backslashes, and a comma to define an open-ended interval:
sed -r "/.{2049,}/d" input.txt > output.txt
with:
- x{2049} meaning exactly 2049 xs
- x{2049,3072} meaning from 2049 to 3072 xs
- x{2049,} meaning at least 2049 xs
- x{,2049} meaning at most 2049 xs
To keep a bounded interval from also matching longer lines, you need line anchors, like:
sed -r "/^.{32,64}$/d" input.txt > output.txt
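To see the difference the anchors make, here is a small Python sketch using the re module, whose interval syntax matches the forms listed above. The short limits 3 and 5 stand in for 32 and 64, and the helper name delete_matching is hypothetical.

```python
import re

# Unanchored: matches any line that merely contains 3 to 5 characters,
# which includes every longer line as well.
UNANCHORED = re.compile(r".{3,5}")

# Anchored: matches only lines whose whole length is between 3 and 5.
ANCHORED = re.compile(r"^.{3,5}$")

def delete_matching(lines, pattern):
    """Mimic sed's /pattern/d: drop every line the pattern matches."""
    return [line for line in lines if not pattern.search(line)]

lines = ["ab", "abcd", "abcdefgh"]
kept_unanchored = delete_matching(lines, UNANCHORED)
kept_anchored = delete_matching(lines, ANCHORED)
```

With the unanchored pattern only "ab" survives; with the anchors, "abcdefgh" is kept too, because it is longer than the interval allows.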
Method 7
The sed solutions all become very slow when the lines are very long. This is the disadvantage of checking line length with a regex. (The advantage, of course, is that sed is available everywhere.)
If you like the speed of the Perl solution, but prefer using Python, the pz CLI tool makes this really easy. It brings Python to shell pipes.
With pz the solution would be:
cat input | pz 's if len(s) < 2048 else ""' > output
Method 8
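Split the row at each char by setting FS to nothing, so that NF is the number of characters (an empty FS is an extension supported by gawk and mawk, among others; POSIX leaves it unspecified):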
awk 'BEGIN{FS=""} NF <= 2048' file
Test with:
perl -e 'print "z"x2048' | awk 'BEGIN{FS=""} NF <= 2048'
# This prints the line
perl -e 'print "z"x2049' | awk 'BEGIN{FS=""} NF <= 2048'
# This prints nothing
Method 9
With Ruby:
ruby -ne 'print if $_.size <= 2048' input.txt > output.txt
Or to edit in place and create a backup:
ruby -i.bak -ne 'print if $_.size <= 2048' file.txt
Without a backup:
ruby -i -ne 'print if $_.size <= 2048' file.txt
Note: $_.size includes the trailing newline, if any. You can use $_.chomp.size to ignore trailing newlines.
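The off-by-one effect of the trailing newline can be illustrated with a short Python sketch. Python's len(line) counts the newline just as $_.size does, and rstrip("\n") plays a role similar to chomp here. The function names are hypothetical.

```python
LIMIT = 2048

def keep_counting_newline(line):
    """Length check that counts the trailing newline, if any."""
    return len(line) <= LIMIT

def keep_ignoring_newline(line):
    """Length check on the visible content only."""
    return len(line.rstrip("\n")) <= LIMIT

# 2048 visible characters plus a newline: 2049 in total.
line = "x" * 2048 + "\n"
counted = keep_counting_newline(line)
ignored = keep_ignoring_newline(line)
```

A line with exactly 2048 visible characters is dropped by the first check and kept by the second, so it is worth deciding which length you actually mean.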
You could also check line size via a regex, like some of the other examples, but it will be slower:
# slow
ruby -ne 'print unless /.{2048}./' input.txt > output.txt
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.