I have to grep through some JSON files in which the line lengths exceed a few thousand characters. How can I limit grep to display context up to N characters to the left and right of the match? Any tool other than grep would be fine as well, so long as it is available in common Linux packages.
This would be example output, for the imaginary grep switch Ф:
$ grep -r foo *
hello.txt: Once upon a time a big foo came out of the woods.
$ grep -Ф 10 -r foo *
hello.txt: ime a big foo came of t
Answers:
Method 1
Try to use this one:
grep -r -E -o ".{0,10}wantedText.{0,10}" *
-E tells grep to use extended regular expressions
-o tells grep to print only the matched part of each line
-r tells grep to search recursively through the folder
REGEX:
{0,10} matches between 0 and 10 of the preceding item, which sets how many context characters are printed on each side of the match
. matches any single character (which character it is doesn't matter here, only how many there are)
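As a quick check of the pattern above, here is a minimal sketch that wraps it in a shell function and runs it on the sentence from the question. The function name grep_context and the temporary file are assumptions made for illustration; they are not part of grep.

```shell
# grep_context is a hypothetical helper name, not a standard command.
# Usage: grep_context N PATTERN FILE...
grep_context() {
    n=$1
    pattern=$2
    shift 2
    # Up to N arbitrary characters, the pattern, then up to N more characters.
    grep -E -o ".{0,$n}$pattern.{0,$n}" "$@"
}

# Sample line taken from the question, written to a temporary file.
sample=$(mktemp)
printf 'Once upon a time a big foo came out of the woods.\n' > "$sample"
grep_context 10 foo "$sample"
rm -f "$sample"
```

On that sample line this should print "ime a big foo came out " (note the trailing space). If the match sits closer than N characters to either end of the line, the context is simply shorter.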
Edit: Oh, I see that Joseph recommends almost the same solution as I do 😀
Method 2
With GNU grep:
N=10; grep -roP ".{0,$N}foo.{0,$N}" .
Explanation:
-o => Print only what you matched
-P => Use Perl-style regular expressions
- The regex says match 0 to $N characters followed by foo followed by 0 to $N characters.
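To see what the recursive form prints, the following sketch runs the same command inside a throwaway directory. It assumes GNU grep built with -P support; the helper name grep_context_r and the file name hello.txt are made up for the example.

```shell
# grep_context_r is a hypothetical helper; it assumes GNU grep with -P support.
# Usage: grep_context_r N PATTERN DIR
grep_context_r() {
    n=$1
    pattern=$2
    dir=$3
    # Run from inside DIR so that reported paths are relative to it.
    ( cd "$dir" && grep -roP ".{0,$n}$pattern.{0,$n}" . )
}

# Throwaway directory holding the sample sentence from the question.
workdir=$(mktemp -d)
printf 'Once upon a time a big foo came out of the woods.\n' > "$workdir/hello.txt"
grep_context_r 10 foo "$workdir"
rm -rf "$workdir"
```

Because the search starts at ".", each result is prefixed with the relative path of the file it came from, here ./hello.txt, followed by a colon and the trimmed match.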
If you don’t have GNU grep:
find . -type f -exec \
    perl -nle '
        BEGIN{$N=10}
        print if s/^.*?(.{0,$N}foo.{0,$N}).*?$/$ARGV:$1/
    ' {} \;
Explanation:
Since we can no longer rely on grep being GNU grep, we make use of find to search for files recursively (the -r action of GNU grep). For each file found, we execute the Perl snippet.
Perl switches:
-n Read the file line by line
-l Remove the newline at the end of each line and put it back when printing
-e Treat the following string as code
The Perl snippet does essentially the same thing as grep. It starts by setting a variable $N to the number of context characters you want. The BEGIN{} block means this is executed only once at the start of execution, not once for every line in every file.
The statement executed for each line is to print the line if the regex substitution works.
The regex:
- Match any old thing lazily [1] at the start of line (^.*?) followed by .{0,$N} as in the grep case, followed by foo followed by another .{0,$N} and finally match any old thing lazily till the end of line (.*?$).
- We substitute this with $ARGV:$1. $ARGV is a magical variable that holds the name of the current file being read. $1 is what the parens matched: the context in this case.
- The lazy matches at either end are required because a greedy match would eat all characters before foo without failing to match (since .{0,$N} is allowed to match zero times).
[1] That is, prefer not to match anything unless this would cause the overall match to fail. In short, match as few characters as possible.
Method 3
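Pipe grep's output to cut with the -b flag to limit each output line to bytes 1 through 400. Note that this keeps the start of each line rather than the text around the match, so a match beyond byte 400 will be cut off.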
grep "foobar" * | cut -b 1-400
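A short sketch of how this behaves, using the sentence from the question and a limit of 20 bytes instead of 400 so the truncation is visible. The helper name truncate_matches is an assumption for the example.

```shell
# truncate_matches is a hypothetical helper: grep for PATTERN on stdin,
# then keep only the first MAX bytes of each matching line.
# Usage: truncate_matches PATTERN MAX
truncate_matches() {
    grep "$1" | cut -b "1-$2"
}

line='Once upon a time a big foo came out of the woods.'
printf '%s\n' "$line" | truncate_matches foo 20
```

Here the output should be "Once upon a time a b": the line still matched, but foo itself falls outside the first 20 bytes. This approach suits cases where you only need to keep the output readable, not to see the match in context.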
Method 4
Taken from: http://www.topbug.net/blog/2016/08/18/truncate-long-matching-lines-of-grep-a-solution-that-preserves-color/
and
https://stackoverflow.com/a/39029954/1150462
The suggested approach ".{0,10}<original pattern>.{0,10}" works well, except that the highlighting color is often messed up. I've created a script that produces similar output while preserving the color:
#!/bin/bash
# Usage:
# grepl PATTERN [FILE]
# how many characters around the searching keyword should be shown?
context_length=10
# What is the length of the control character for the color before and after the matching string?
# This is mostly determined by the environment variable GREP_COLORS.
control_length_before=$(($(echo a | grep --color=always a | cut -d a -f '1' | wc -c)-1))
control_length_after=$(($(echo a | grep --color=always a | cut -d a -f '2' | wc -c)-1))
grep -E --color=always "$1" $2 | grep --color=none -oE ".{0,$(($control_length_before + $context_length))}$1.{0,$(($control_length_after + $context_length))}"
Assuming the script is saved as grepl, then grepl pattern file_with_long_lines should display the matching lines but with only 10 characters around the matching string.
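To make the two control_length lines less opaque, this sketch isolates that measurement. It assumes a grep that supports --color=always (GNU grep does); the variable names are illustrative, and the numbers it prints depend on the grep build and on GREP_COLORS, so no fixed output is claimed.

```shell
# Highlight a one-character match so everything else on the line is escape codes.
sample=$(echo a | grep --color=always a)

# Field 1 (before the "a") is the escape sequence that turns the color on,
# field 2 (after the "a") is the one that turns it off.
# wc -c also counts the trailing newline, hence the "- 1".
before=$(( $(printf '%s\n' "$sample" | cut -d a -f 1 | wc -c) - 1 ))
after=$(( $(printf '%s\n' "$sample" | cut -d a -f 2 | wc -c) - 1 ))

echo "before=$before after=$after"
```

The script adds these byte counts to context_length so that the second grep, which sees the escape sequences as ordinary characters, still leaves 10 visible characters on each side.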
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.