I’m looking for the simplest method to print the longest line in a file. I did some googling and surprisingly couldn’t seem to find an answer. I frequently print the length of the longest line in a file, but I don’t know how to actually print the longest line. Can anyone provide a solution to print the longest line in a file? Thanks in advance.
Answers:
Method 1
cat ./text | awk ' { if ( length > x ) { x = length; y = $0 } }END{ print y }'
Update: summarizing all the advice from the comments:
awk 'length > max_length { max_length = length; longest_line = $0 } END { print longest_line }' ./text
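A minimal sketch that wraps the awk one-liner in two shell functions, to show how the comparison operator decides which line wins when several lines share the maximum length. The names longest_first and longest_last are hypothetical:

```shell
# Hypothetical helpers around the awk one-liner above; each reads the
# files given as arguments, or standard input if there are none.

# ">" only replaces the saved line when a strictly longer one appears,
# so among equally long lines the FIRST one is kept.
longest_first() {
    awk 'length > max { max = length; line = $0 } END { print line }' "$@"
}

# ">=" also replaces it on a tie, so the LAST of the longest lines is kept.
longest_last() {
    awk 'length >= max { max = length; line = $0 } END { print line }' "$@"
}
```

Usage: longest_first file.txt, or some_command | longest_last.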
Method 2
This prints only the length of the longest line, not the line itself:
awk '{ print length }' filename | sort -n | tail -1
Method 3
Grep the first longest line
grep -Em1 "^.{$(wc -L <file.txt)}$" file.txt
The command is unusually hard to read without practice because it mixes shell and regexp syntax.
To explain it, I will first use simplified pseudocode; the lines starting with ## are not meant to be run in the shell.
This simplified code uses the file name F and leaves out quoting and parts of the regexps for readability.
How it works
The command has two parts, a grep invocation and a wc invocation:
## grep "^.{$( wc -L F )}$" F
The wc runs inside a command substitution, $( ... ), so it is executed before grep. It calculates the length of the longest line. Because the shell's substitution syntax is mixed with the regular expression syntax in a confusing way, I will expand the command substitution by hand:
## wc -L F
42
## grep "^.{42}$" F
Here, the command substitution has been replaced by the value it outputs, giving the grep command line that actually runs. The regular expression is now easier to read: it is anchored to the start (^) and the end ($) of the line, and between them it matches any character except newline (.), repeated exactly 42 times. Combined, it matches lines that consist of exactly 42 characters.
Now, back to real shell commands: the grep option -E (--extended-regexp) lets the braces {} be written without escaping. Option -m 1 (--max-count=1) makes grep stop after the first matching line. The < in the wc command redirects the file to wc's standard input, so that wc prints only the length and not the file name along with it.
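To make the two steps explicit, here is a minimal sketch that stores the output of wc in a variable before building the pattern. The name first_longest is hypothetical, and the sketch assumes GNU wc and plain ASCII text without tabs: wc -L reports the maximum display width, so a tab or a double-width character can make it differ from the number of characters that . counts.

```shell
# Hypothetical helper: print the first longest line of the file "$1".
first_longest() {
    # Step 1: with the file on stdin, wc -L prints just the number.
    n=$(wc -L < "$1")
    # Step 2: ^ and $ anchor the match and .{n} means exactly n characters;
    # -E avoids escaping the braces and -m 1 stops at the first match.
    grep -E -m 1 "^.{$n}\$" "$1"
}
```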
Which longest lines?
To keep the examples readable even though the file name occurs twice, I will use a variable f for the file name; each $f in the examples could be replaced by the file name.
f="file.txt"
Show the first longest line – the first line that is as long as the longest line:
grep -E -m1 "^.{$(wc -L <"$f")}$" "$f"
Show all longest lines – all lines that are as long as the longest line:
grep -E "^.{$(wc -L <"$f")}$" "$f"
Show the last longest line – the last line that is as long as the longest line:
tac "$f" | grep -E -m1 "^.{$(wc -L <"$f")}$"
Show the single longest line – the longest line longer than all other lines, or fail:
[ $(grep -E "^.{$(wc -L <"$f")}$" "$f" | wc -l) = 1 ] && grep -E "^.{$(wc -L <"$f")}$" "$f"
(The last command is even more inefficient than the others, as it repeats the complete grep command. It would be better to decompose it so that the output of wc and the lines written by grep are saved to variables.
Note that the longest lines may actually be all of the lines; for saving in a variable, only the first two matching lines need to be kept.)
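The decomposition suggested above might look like this minimal sketch (single_longest is a hypothetical name; GNU wc and GNU grep are assumed). The length is computed once, and grep is asked to count at most two matches, which is enough to tell a unique longest line from a tie:

```shell
# Hypothetical helper: print the longest line of "$1" only if no other
# line is equally long; otherwise print nothing and return 1.
single_longest() {
    n=$(wc -L < "$1")
    pattern="^.{$n}\$"
    # With -m 2, grep -c never reports more than 2, so it can stop early.
    count=$(grep -E -c -m 2 "$pattern" "$1")
    [ "$count" -eq 1 ] || return 1
    grep -E -m 1 "$pattern" "$1"
}
```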
Method 4
sed -rn "/.{$(<file expand -t1 |wc -L)}/{p;q}" file
This first reads the file inside the command substitution and outputs the length of the longest line. Before measuring, expand -t1 converts every tab to a single space, to work around the semantics of wc -L, which treats a tab as advancing to the next multiple of 8 columns rather than as 1 character. That length is then used in a sed expression meaning "find a line this many characters long, print it, then quit". Because sed quits at the first match, this is fastest when the longest line is near the top of the file, heheh (thanks fered for the awesome and constructive comments).
Another approach (in bash), which I came up with before the sed one:
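A minimal sketch of the sed idea as a function, assuming GNU coreutils and a sed that accepts -E (first_longest_sed is a hypothetical name):

```shell
# Hypothetical helper: print the first longest line of "$1",
# counting each tab as one character.
first_longest_sed() {
    # expand -t1 puts a tab stop at every column, so each tab becomes one
    # space and wc -L then measures characters (for ASCII text).
    n=$(expand -t1 < "$1" | wc -L)
    # No anchors are needed: n is the maximum, so any line with at least
    # n characters has exactly n. {p;q;} prints it and quits.
    sed -En "/.{$n}/{p;q;}" "$1"
}
```

For example, with the input printf 'a\tb\nabcd\n', plain wc -L reports 9 (the tab after "a" jumps to column 8), so a pattern built from it would match no line, while expand -t1 | wc -L gives 4 and the function prints abcd.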
#!/bin/bash
# Reads the text from standard input, e.g. ./script < file
max=0
while IFS= read -r line || [[ -n $line ]]; do
    (( ${#line} > max )) && max=${#line} && longest="$line"
done
printf '%s\n' "$longest"
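Putting IFS= in front of read matters for this kind of measurement: with the default IFS, read strips leading and trailing spaces and tabs, so indented lines are measured as shorter than they are. A tiny sketch comparing the two (the function names are hypothetical):

```shell
# Hypothetical helpers: read one line from stdin and print its length.
measure_default() { read -r line; printf '%s\n' "${#line}"; }     # default IFS trims blanks
measure_raw()     { IFS= read -r line; printf '%s\n' "${#line}"; } # keeps the line intact
```

For the input "   x  " (three spaces, x, two spaces), measure_default reports 1 and measure_raw reports 6.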
Method 5
Here’s a Perl solution:
perl -e 'while(<>){
$l=length;
$l>$m && do {$c=$_; $m=$l}
} print $c' file.txt
Or, if you want to print all the longest lines:
perl -e 'while(<>){
$l=length;
push @{$k{$l}},$_;
$m=$l if $l>$m;
} print @{$k{$m}}' file.txt
Since I had nothing better to do, I ran some benchmarks on a 625M text file. Surprisingly, my Perl solution was consistently faster than the others. Granted, the difference from the accepted awk solution is tiny, but it is there. Solutions that print all the longest lines are naturally slower, so I have grouped the results by type, fastest to slowest.
Print only one of the longest lines:
$ time perl -e 'while(<>){
$l=length;
$l>$m && do {$c=$_; $m=$l}
} print $c' file.txt
real 0m3.837s
user 0m3.724s
sys 0m0.096s
$ time awk 'length > max_length { max_length = length; longest_line = $0 }
END { print longest_line }' file.txt
real 0m5.835s
user 0m5.604s
sys 0m0.204s
$ time sed -rn "/.{$(<file.txt expand -t1 |wc -L)}/{p;q}" file.txt
real 2m37.348s
user 2m39.990s
sys 0m1.868s
Print all longest lines:
$ time perl -e 'while(<>){
$l=length;
push @{$k{$l}},$_;
$m=$l if $l>$m;
} print @{$k{$m}}' file.txt
real 0m9.263s
user 0m8.417s
sys 0m0.760s
$ time awk 'length >x { delete y; x=length }
length==x { y[NR]=$0 } END{ for (z in y) print y[z] }' file.txt
real 0m10.220s
user 0m9.925s
sys 0m0.252s
## This is Chris Down's bash solution
$ time ./a.sh < file.txt
Max line length: 254
Lines matched with that length: 2
real 8m36.975s
user 8m17.495s
sys 0m17.153s
Method 6
The following example was going to be, and should have been, a comment to dmitry.malikov’s answer, but because of the Useless Use of Visible Comment Space there, I’ve chosen to present it here, where it will at least be seen…
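This is a simple variation of dmitry's single-pass awk method.
It prints all "equal longest" lines. (Note: deleting a whole array with delete y is an extension to traditional POSIX awk, though common implementations support it. Also, for (z in y) visits the elements in an unspecified order, so the lines are not guaranteed to come out in file order.)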
awk 'length >x { delete y; x=length }
length==x { y[NR]=$0 } END{ for (z in y) print y[z] }' file
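Since for (z in y) may print the equally long lines in any order, here is a minimal sketch of the same single-pass idea that keeps them in file order and resets a counter instead of deleting the array (all_longest is a hypothetical name):

```shell
# Hypothetical helper: print every longest line of "$1" in file order.
all_longest() {
    awk '
        # A new maximum: forget the lines collected so far.
        length > max  { max = length; n = 0 }
        # As long as the current maximum: remember it, in order.
        length == max { lines[++n] = $0 }
        # Entries beyond n are stale leftovers and are never printed.
        END { for (i = 1; i <= n; i++) print lines[i] }
    ' "$1"
}
```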
Method 7
In pure bash:
#!/bin/bash
# Reads the text from standard input, e.g. ./a.sh < file.txt
_max_length=0
_max_line=()
while IFS= read -r _line || [[ -n $_line ]]; do
    _length="${#_line}"
    if (( _length > _max_length )); then
        # New maximum: start a fresh list of lines
        _max_length=${_length}
        _max_line=( "${_line}" )
    elif (( _length == _max_length )); then
        # Same length as the maximum: add to the list
        _max_line+=( "${_line}" )
    fi
done
printf 'Max line length: %d\n' "${_max_length}"
printf 'Lines matched with that length: %d\n' "${#_max_line[@]}"
(( ${#_max_line[@]} )) && printf '%s\n' '----------------' "${_max_line[@]}"
Method 8
awk '{ print length(), $0 | "sort -n" }' file.txt | tail -1
Reference: https://www.systutorials.com/how-to-sort-lines-by-length-in-linux/
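Note that this prints the length in front of the line. A minimal sketch that strips the prefix again with cut (longest_by_sort is a hypothetical name). When several lines are equally long, sort falls back to comparing the whole lines, so this picks the one that sorts last, not necessarily the first or last one in the file:

```shell
# Hypothetical helper: longest line of "$1", without the length prefix.
longest_by_sort() {
    # Prefix each line with its length, sort numerically, keep the last line,
    # then cut away everything up to and including the first space.
    awk '{ print length($0), $0 }' "$1" | sort -n | tail -n 1 | cut -d ' ' -f 2-
}
```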
Method 9
I have developed a small shell script for this. For every line whose length is at least a given size (for example 80 characters), it displays the length, the line number and the line itself, sorted longest first:
#!/bin/sh
# Author: Surinder
if test $# -lt 2
then
    echo "usage: $0 length file1 file2 ..."
    echo "example: $0 80 hello.c"
    exit 1
fi
length=$1
shift
LONGLINE=$(mktemp) || exit 1
trap 'rm -f "$LONGLINE"' EXIT
# 'EOF' is quoted so that the shell does not expand $0 inside the awk program
cat << 'EOF' > "$LONGLINE"
{
    current_length = length($0)
    if (current_length >= expected_length) {
        printf("%d at line # %d %s\n", current_length, NR, $0)
    }
}
EOF
for file in "$@"
do
    echo "$file"
    awk -v expected_length="$length" -f "$LONGLINE" "$file" | sort -nr
done
https://github.com/lordofrain/tools/blob/master/longest-line/longest-line.sh
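The temporary awk file is not strictly necessary. A minimal sketch of the same report with the awk program inline (long_lines is a hypothetical name; the output format is copied from the script):

```shell
# Hypothetical helper: long_lines LIMIT FILE...
# For each file, print its name, then every line of at least LIMIT
# characters as "LENGTH at line # N TEXT", longest first.
long_lines() {
    limit=$1
    shift
    for file in "$@"; do
        printf '%s\n' "$file"
        awk -v limit="$limit" 'length($0) >= limit {
            printf("%d at line # %d %s\n", length($0), FNR, $0)
        }' "$file" | sort -nr
    done
}
```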
Method 10
Here is a solution using Python (len counts the trailing newline, and rstrip avoids printing a blank line after the result):
python3 -c 'import sys; print(max(open(sys.argv[1]), key=len).rstrip("\n"))' file.txt
Method 11
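You can use wc, but note that this prints only the length of the longest line (followed by the file name), not the line itself: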
wc -L fileName
Method 12
(Edit of the code above from @ДМИТРИЙ МАЛИКОВ (Dmitry Malikov)'s popular post from 2011-11-13:)
This prints the line number and the length, and underlines the contents of only the first longest line:
awk 'length>len{len=length;line=FNR;long=$0}END{print"line="line" len="len" long=\033[4m\n"long"\033[0m"}' <"${filename}"
(Underlining seemed best because the text might contain color sequences or white space.)
Also, to strip out non-printing characters (if desired), instead of just <"${filename}" you could use this very concise "ansifilter" alternative found here:
<(sed "s,\x1b\[[0-9;]*[a-zA-Z],,g" "${filename}"|expand)
(or check out the real ansifilter for serious projects; thanks to whoever mentioned it before me!)
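Combining the two ideas, a minimal sketch that measures only the visible text: it strips CSI color sequences with the sed expression above (\x1b inside a regex is a GNU sed extension), expands tabs, and then runs the usual awk one-liner. The name longest_visible is hypothetical, and it prints the cleaned-up line rather than the original one:

```shell
# Hypothetical helper: longest line of "$1" after removing ANSI color
# sequences such as ESC[31m and converting tabs to spaces (GNU sed assumed).
longest_visible() {
    sed 's,\x1b\[[0-9;]*[a-zA-Z],,g' "$1" | expand |
        awk 'length > max { max = length; line = $0 } END { print line }'
}
```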
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.