How to print the longest line in a file?

I’m looking for the simplest method to print the longest line in a file. I did some googling and surprisingly couldn’t seem to find an answer. I frequently print the length of the longest line in a file, but I don’t know how to actually print the longest line. Can anyone provide a solution to print the longest line in a file? Thanks in advance.

Answers:

Method 1

cat ./text | awk ' { if ( length > x ) { x = length; y = $0 } }END{ print y }'

Update: summarizing all the advice from the comments:

awk 'length > max_length { max_length = length; longest_line = $0 } END { print longest_line }' ./text

Method 2

cat filename | awk '{ print length }' | sort -n | tail -1

(Note: this prints only the length of the longest line, not the line itself.)

Method 3

Grep the first longest line

grep -Em1 "^.{$(wc -L <file.txt)}$" file.txt

The command is unusually hard to read without practice because it mixes shell and regexp syntax.
For explanation, I will use simplified pseudocode first. The lines starting with ## do not run in the shell.
This simplified code uses the file name F, and leaves out quoting and parts of regexps for readability.

How it works

The command has two parts, a grep and a wc invocation:

## grep "^.{$( wc -L F )}$" F

The wc is used in a command substitution, $( ... ), so it is run before grep. It calculates the length of the longest line. The shell expansion syntax is mixed with the regular expression pattern syntax in a confusing way, so I will decompose the command substitution:

## wc -L F
42
## grep "^.{42}$" F

Here, the command substitution was replaced with the value it would return, creating the grep command line that is actually run. We can now read the regular expression more easily: it matches exactly from the start (^) to the end ($) of the line. The expression between them matches any character except newline, repeated 42 times. Combined, that matches lines that consist of exactly 42 characters.

Now, back to real shell commands: the grep option -E (--extended-regexp) lets us leave the {} unescaped for readability. Option -m 1 (--max-count=1) makes it stop after the first matching line is found. The < in the wc command redirects the file to its stdin, to prevent wc from printing the file name together with the length.
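
To see the two steps separately, here is a small sketch that stores the wc result in a variable before building the pattern. The function name first_longest and the variable max are my own, chosen for illustration. It assumes GNU wc (for -L) and a grep that supports -m, and a file without tabs or wide characters, so that display width and character count agree.

```shell
# Sketch only: first_longest and max are illustrative names, not from the answer.
# The ( ... ) function body runs in a subshell, so max does not leak to the caller.
first_longest() (
    max=$(wc -L < "$1")               # step 1: length of the longest line, e.g. 42
    grep -E -m1 "^.{${max}}$" "$1"    # step 2: first line with exactly that many characters
)
```

Calling first_longest file.txt should then behave like the one-line command above.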

Which longest lines?

To make the examples more readable with the filename occurring twice, I will use a variable f for the filename. Each $f in the examples could be replaced by the file name.

f="file.txt"

Show the first longest line – the first line that is as long as the longest line:

grep -E -m1 "^.{$(wc -L <"$f")}$" "$f"

Show all longest lines – all lines that are as long as the longest line:

grep -E "^.{$(wc -L <"$f")}$" "$f"

Show the last longest line – the last line that is as long as the longest line:

tac "$f" | grep -E -m1 "^.{$(wc -L <"$f")}$"

Show the single longest line – the longest line longer than all other lines, or fail:

[ $(grep -E "^.{$(wc -L <"$f")}$" "$f" | wc -l) = 1 ] && grep -E "^.{$(wc -L <"$f")}$" "$f"

(The last command is even more inefficient than the others, as it repeats the complete grep command. It should obviously be decomposed so that the output of wc and the lines written by grep are saved to variables.
Note that all longest lines may actually be all lines. For saving in a variable, only the first two lines need to be kept.)
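
The decomposition suggested in the note could look roughly like the following sketch. The function name single_longest and the variables max, matches, and n are hypothetical. It assumes GNU wc (-L) and a grep with -m, and it does not handle the corner case where the longest line is empty.

```shell
# Sketch only: single_longest is an illustrative name, not from the answer.
single_longest() (
    max=$(wc -L < "$1")                          # run wc once and keep the result
    # Keep at most two matches: enough to tell "unique" from "tied".
    matches=$(grep -E -m2 "^.{${max}}$" "$1")
    n=$(printf '%s\n' "$matches" | wc -l)        # number of lines kept (1 or 2)
    # Succeed and print only if exactly one line has the maximal length.
    [ -n "$matches" ] && [ "$n" -eq 1 ] && printf '%s\n' "$matches"
)
```

Here both wc and grep run only once, and the function's exit status tells the caller whether a unique longest line exists.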

Method 4

sed -rn "/.{$(<file expand -t1 |wc -L)}/{p;q}" file

This first reads the file inside the command substitution and outputs the length of the longest line. (Before that, expand converts each tab to a single space to work around the semantics of wc -L, which counts a tab as advancing to the next multiple of 8 columns, adding up to 8 instead of 1 to the line length.) This length is then used in a sed expression meaning “find a line this number of characters long, print it, then quit”. So the closer the longest line is to the top of the file, the sooner this finishes (thanks fered for the awesome and constructive comments).
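
The tab behaviour mentioned here is easy to check on a one-line input. This is a small sketch assuming GNU wc and expand from coreutils; the sample string is made up.

```shell
# GNU wc -L reports display width: the tab advances to the next multiple of 8.
printf 'a\tb\n' | wc -L                # 'a' (1), tab to column 8, 'b' -> prints 9
# expand -t1 turns each tab into a single space, so width equals character count.
printf 'a\tb\n' | expand -t1 | wc -L   # 'a b' -> prints 3
```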

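Another approach, which I thought of before the sed one (in bash):

#!/bin/bash
# Reads standard input and prints the first longest line.
max=0
while IFS= read -r line; do    # IFS= keeps leading and trailing whitespace
    (( ${#line} > max )) && max=${#line} && longest="$line"
done
printf '%s\n' "$longest"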

Method 5

Here’s a Perl solution:

perl -e 'while(<>){
           $l=length;  
           $l>$m && do {$c=$_; $m=$l}  
         } print $c' file.txt

Or, if you want to print all the longest lines

perl -e 'while(<>){
           $l=length;
           push @{$k{$l}},$_;
           $m=$l if $l>$m;
         } print @{$k{$m}}' file.txt

Since I had nothing better to do, I ran some benchmarks on a 625M text file. Surprisingly, my Perl solution was consistently faster than the others. Granted, the difference with the accepted awk solution is tiny, but it is there. Obviously, solutions that print multiple lines are slower so I have sorted by type, fastest to slowest.

Print only one of the longest lines:

$ time perl -e 'while(<>){
           $l=length;  
           $l>$m && do {$c=$_; $m=$l}  
         } print $c' file.txt 
real    0m3.837s
user    0m3.724s
sys     0m0.096s



$ time awk 'length > max_length { max_length = length; longest_line = $0 }
 END { print longest_line }' file.txt
real    0m5.835s
user    0m5.604s
sys     0m0.204s



$ time sed -rn "/.{$(<file.txt expand -t1 |wc -L)}/{p;q}" file.txt 
real    2m37.348s
user    2m39.990s
sys     0m1.868s

Print all longest lines:

$ time perl -e 'while(<>){
           $l=length;
           push @{$k{$l}},$_;
           $m=$l if $l>$m;
         } print @{$k{$m}}' file.txt 
real    0m9.263s
user    0m8.417s
sys     0m0.760s


$ time awk 'length >x { delete y; x=length }
     length==x { y[NR]=$0 } END{ for (z in y) print y[z] }' file.txt
real    0m10.220s
user    0m9.925s
sys     0m0.252s


## This is Chris Down's bash solution
$ time ./a.sh < file.txt 
Max line length: 254
Lines matched with that length: 2
real    8m36.975s
user    8m17.495s
sys     0m17.153s

Method 6

The following example was going to be, and should have been, a comment to dmitry.malikov’s answer, but because of the Useless Use of Visible Comment Space there, I’ve chosen to present it here, where it will at least be seen…

This is a simple variation of dmitry’s single-pass awk method.
It prints all “equal longest” lines. (Note: deleting a whole array with delete array is a gawk extension.)

awk 'length >x { delete y; x=length }
     length==x { y[NR]=$0 } END{ for (z in y) print y[z] }' file
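
Since for (z in y) visits array elements in an unspecified order, the tied lines may not come out in file order. Below is a sketch of a variation that avoids both that and the delete extension by resetting a counter instead of clearing the array. The function name all_longest and the awk variables max, n, and lines are my own, not part of the original answer.

```shell
# Sketch only: all_longest is an illustrative name, not from the original answer.
all_longest() {
    awk '
        length > max  { max = length; n = 0 }   # new maximum: forget earlier ties
        length == max { lines[++n] = $0 }       # remember every line of maximal length
        END           { for (i = 1; i <= n; i++) print lines[i] }   # print in file order
    ' "$@"
}
```

Stale entries above index n are simply never printed, so no array deletion is needed.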

Method 7

In pure bash:

#!/bin/bash

_max_length=0
while IFS= read -r _line; do
    _length="${#_line}"
    if (( _length > _max_length )); then
        _max_length=${_length}
        _max_line=( "${_line}" )
    elif (( _length == _max_length )); then
        _max_line+=( "${_line}" )
    fi
done

printf 'Max line length: %d\n' "${_max_length}"
printf 'Lines matched with that length: %d\n' "${#_max_line[@]}"
(( ${#_max_line[@]} )) && printf '%s\n' '----------------' "${_max_line[@]}"

Method 8

awk '{ print length(), $0 | "sort -n" }' file.txt | tail -1

Reference: https://www.systutorials.com/how-to-sort-lines-by-length-in-linux/

Method 9

I have developed a small shell script for this. It displays the length, line number, and text of every line whose length is at least a given size, such as 80 characters, sorted by length:

#!/bin/sh

# Author: Surinder

if test $# -lt 2
then
   echo "usage: $0 length file1 file2 ..."
   echo "usage: $0 80 hello.c"
   exit 1
fi

length=$1

shift

LONGLINE=/tmp/longest-line-$$.awk

# Quote the delimiter so the shell does not expand $0 inside the awk program.
cat << 'EOF' > "$LONGLINE"
  BEGIN {
  }

  /.*/ {
    current_length=length($0);
    if (current_length >= expected_length) {
       printf("%d at line # %d %s\n", current_length, NR, $0);
    }
  }

  END {
  }
EOF

# "$@" keeps file names containing spaces intact.
for file in "$@"
do
  echo "$file"
  awk -v expected_length="$length" -f "$LONGLINE" "$file" | sort -nr
done

rm -f "$LONGLINE"

https://github.com/lordofrain/tools/blob/master/longest-line/longest-line.sh

Method 10

This is a solution using Python:

python -c 'import sys;print(max(open(sys.argv[1],"r").readlines(), key=len))' file.txt

Method 11

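You can use wc (the -L option is a GNU extension, and it prints only the length of the longest line, not the line itself):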

wc -L fileName

Method 12

(An edit of the code in @ДМИТРИЙ МАЛИКОВ (Dmitry Malikov)’s popular post above, from 2011-11-13.)

This prints the line number and the length, and underlines the contents, of only the first longest line:

awk 'length>len{len=length;line=FNR;long=$0}END{print"line="line" len="len" long=\033[4m\n"long"\033[0m"}' <"${filename}"

(underlining seemed best because the text might contain color sequences or white space.)

Also, to strip out non-printing characters (if desired), instead of just <"${filename}", you could use this very concise “ansifilter” alternative:

<(sed "s,\x1b\[[0-9;]*[a-zA-Z],,g" "${filename}"|expand)

(or check out the real ansifilter for serious projects–thanks to whoever mentioned it before me!)
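
For seds that do not understand \x1b (it is a GNU extension), the same filter can be sketched by building the ESC byte with printf first. The function name strip_ansi and the variable esc are hypothetical names used only for this illustration.

```shell
# Sketch only: strip_ansi is an illustrative name, not from the original answer.
# The ( ... ) function body runs in a subshell, so esc does not leak to the caller.
strip_ansi() (
    esc=$(printf '\033')                       # literal ESC character
    # Remove sequences of the form ESC [ digits/semicolons letter (colors, underline, ...).
    sed "s/${esc}\[[0-9;]*[a-zA-Z]//g" "$@"
)
```

It could then feed the awk command, for example as <(strip_ansi "${filename}" | expand).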


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
