I’m running into weird behavior when trying to grep a man page on macOS. For example, the Bash man page clearly has an occurrence of the string NAME:
$ man bash | head -5 | tail -1
NAME
And if I grep for name I do get results, but if I grep for NAME I don’t:
$ man bash | grep 'NAME'
$ man bash | grep NAME
I’ve tried other uppercase words that I know are in there, and searching for SHELL yields nothing whereas searching for BASH yields results.
What’s going on here?
Update: Thanks for all the answers! I thought it worth adding the context in which I ran into this. I wanted to write a Bash function that wraps man and, when I look up a shell builtin, jumps to the relevant section of the Bash man page. There might be a better way, but here’s what I’ve got so far:
man () {
    case "$(type -t "$1")" in
        builtin)
            # Builtins are documented as indented entries in bash(1).
            local pattern="^ *$1"

            if bashdoc_match "$pattern +[-[]"; then
                command man bash | less --pattern="$pattern +[-[]"
            elif bashdoc_match "$pattern\b"; then
                command man bash | less --pattern="$pattern[[:>:]]"
            else
                command man bash
            fi
            ;;
        keyword)
            # Keywords (if, for, while, ...) are covered under SHELL GRAMMAR.
            command man bash | less --hilite-search --pattern='^SHELL GRAMMAR$'
            ;;
        *)
            command man "$@"
            ;;
    esac
}

bashdoc_match() {
    # Strip the overstrike sequences with col -b so grep sees plain text.
    command man bash | col -b | grep -q -- "$1"
}
Answers:
Method 1
If you append | sed -n l to that tail command to show non-printable characters, you’ll probably see something like:
N\bNA\bAM\bME\bE$
That is, each character is written as X, Backspace, X. On modern terminals, the character simply gets written over itself (Backspace, aka BS, \b or ^H, is the character that moves the cursor one column to the left), so there is no visible difference. On old teletypewriters, though, that made the character appear in bold, as it got twice as much ink.
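To see the effect without involving man at all, the sketch below builds the bytes roff would emit for a bold NAME and shows why a plain grep misses them. It assumes a POSIX sh with printf, sed and grep; the variable name bold_name is made up for illustration.

```sh
#!/bin/sh
# Bytes roff emits for a bold "NAME": each letter, then Backspace (\b), then the letter again.
bold_name=$(printf 'N\bNA\bAM\bME\bE')

# sed's l command makes the hidden backspaces visible as \b.
printf '%s\n' "$bold_name" | sed -n l

# grep sees N, BS, N, A, BS, A, ... so the contiguous string NAME is not there.
if printf '%s\n' "$bold_name" | grep -q NAME; then
    echo "grep found NAME"
else
    echo "grep did not find NAME"
fi
```

The first line of output shows a \b between each doubled letter, and the second line reports that grep did not find NAME, which is the same thing happening to man bash | grep NAME.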
Pagers like more and less still understand that format to mean bold, so that’s still what roff does to output bold text.
Some man implementations call roff in a way that avoids those sequences, or strip them internally (man-db, for instance, runs col -b -p -x unless the MAN_KEEP_FORMATTING environment variable is set), and they don’t invoke a pager when they detect that the output is not going to a terminal, so man bash | grep NAME works there. Yours doesn’t do that.
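The "is the output going to a terminal?" check can be sketched with the POSIX test -t. This is a hypothetical helper (page_if_tty is not part of any man implementation) that only illustrates the idea: page when stdout is a terminal, otherwise pass the text straight through.

```sh
#!/bin/sh
# Hypothetical helper: page only when stdout is a terminal, otherwise pass the
# text through unchanged so a pipeline like "... | grep NAME" sees it directly.
page_if_tty() {
    if [ -t 1 ]; then
        # $PAGER is left unquoted on purpose so a value like "less -R" splits into words.
        ${PAGER:-less}
    else
        cat
    fi
}

# Here stdout of page_if_tty is a pipe, so cat is used and grep sees the text.
printf 'NAME\n       bash - GNU Bourne-Again SHell\n' | page_if_tty | grep NAME
```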
You can use col -b to remove those sequences. There are other kinds as well, such as _ BS X for underline.
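Where col isn’t available, a rough stand-in for col -b on this kind of output is to delete every character that is immediately followed by a backspace, which keeps the last character written to each column. This is a minimal sketch, not a full col replacement: it assumes each backspace only ever backs up over a single character, and the function name strip_overstrike is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: remove roff overstrike sequences from standard input.
# "X BS X" (bold) and "_ BS X" (underline) both collapse to "X", because
# deleting "any character followed by BS" keeps the last character written.
strip_overstrike() {
    bs=$(printf '\b')    # a literal Backspace character
    sed "s/.$bs//g"
}

# A bold NAME and an underlined bash, as roff would emit them.
printf 'N\bNA\bAM\bME\bE\n_\bb_\ba_\bs_\bh\n' | strip_overstrike
```

This prints NAME and bash on separate lines. In the question’s case, man bash | strip_overstrike | grep NAME should then behave much like the col -b pipeline.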
On systems using GNU roff (such as GNU systems or FreeBSD), you can avoid those sequences being used in the first place by making sure the -c, -b and -u options are passed to grotty, for instance by making sure the -P-cbu option is passed to groff.
For instance, create a wrapper script called groff containing:
#! /bin/sh -
exec /usr/bin/groff -P-cbu "$@"
and put it ahead of /usr/bin/groff in $PATH.
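One way to set up such a groff wrapper, sketched under the assumption that $HOME/bin is a directory you are happy to put first in $PATH. WRAPPER_DIR is a made-up override for this sketch, not something man or groff reads, and any existing groff in that directory would be overwritten.

```sh
#!/bin/sh
# Directory that will hold the wrapper; it must come before /usr/bin in $PATH.
# WRAPPER_DIR is a hypothetical override for this sketch; it defaults to ~/bin.
: "${HOME:?HOME must be set}"
wrapper_dir=${WRAPPER_DIR:-$HOME/bin}
mkdir -p "$wrapper_dir"

# Write the wrapper: it forwards all arguments to the real groff and adds
# -P-cbu, so grotty runs with -c -b -u and emits no overstrike sequences.
cat > "$wrapper_dir/groff" <<'EOF'
#! /bin/sh -
exec /usr/bin/groff -P-cbu "$@"
EOF
chmod +x "$wrapper_dir/groff"

# Put the wrapper directory first so a lookup of "groff" finds it.
PATH=$wrapper_dir:$PATH
export PATH
command -v groff
```

The PATH change only lasts for the current shell; to make it stick, the equivalent line would go in your shell’s startup file.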
With macOS’s man (which also uses GNU roff), you can create a man-no-overstrike.conf file containing:
NROFF /usr/bin/groff -mandoc -Tutf8 -P-cbu
And call man as:
man -C man-no-overstrike.conf bash | grep NAME
Still with GNU roff: if you set the GROFF_SGR environment variable (or don’t set GROFF_NO_SGR, depending on how the defaults were chosen at compile time), then grotty, as long as it’s not passed the -c option, will use ANSI SGR terminal escape sequences instead of those BS tricks for character attributes. less understands them when called with the -R option.
FreeBSD’s man calls grotty with the -c option unless you ask for colours by setting the MANCOLOR variable, in which case -c is not passed and grotty reverts to its default of using ANSI SGR escape sequences.
MANCOLOR=1 man bash | grep NAME
will work there.
On Debian, SGR output is not the default. If you run:
GROFF_SGR=1 man bash | grep NAME
however, because man’s stdout is not a terminal, man also passes a GROFF_NO_SGR variable to grotty, which overrides our GROFF_SGR. I suppose that is so it can use col -bpx to strip the BS sequences, since col doesn’t know how to strip SGR sequences (even though it still passes GROFF_NO_SGR when MAN_KEEP_FORMATTING is set). You can do this instead:
GROFF_SGR=1 MANPAGER='grep NAME' man bash
(in a terminal) to get the SGR escape sequences.
This time, you’ll notice that some of those NAMEs do appear in bold in the terminal (and in a less -R pager). If you feed the output to sed -n l (MANPAGER='sed -n /NAME/l'), you’ll see something like:
\033[1mNAME\033[0m$
where \e[1m (\e being the ESC character, which sed -n l shows as \033) is the sequence that enables bold on ANSI-compatible terminals, and \e[0m is the sequence that resets all SGR attributes to their defaults.
On that text, grep NAME works because the text does contain NAME, but you could still have problems when looking for text where only part of it is in bold or underlined…
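To illustrate that last point, here is a small sketch in plain sh with a fully bold NAME and a NAME where only NA is bold; the second one defeats grep NAME. One workaround, which is an addition here and not something man does for you, is to delete SGR sequences (ESC, [, digits and semicolons, m) with sed before grepping. The variable and function names are made up for illustration.

```sh
#!/bin/sh
esc=$(printf '\033')                       # ESC, which starts every SGR sequence
full_bold=$(printf '\033[1mNAME\033[0m')   # whole word in bold
part_bold=$(printf '\033[1mNA\033[0mME')   # only "NA" in bold

# Hypothetical helpers for this sketch.
has_name() {
    if printf '%s\n' "$1" | grep -q NAME; then echo "found"; else echo "not found"; fi
}
strip_sgr() {
    # Delete every "ESC [ digits-and-semicolons m" sequence.
    sed "s/$esc\[[0-9;]*m//g"
}

printf '%s\n' "$full_bold" | sed -n l                 # ESC is shown as \033
has_name "$full_bold"                                 # N, A, M, E are contiguous
has_name "$part_bold"                                 # SGR codes split "NA" from "ME"
has_name "$(printf '%s' "$part_bold" | strip_sgr)"    # plain text again
```

The three has_name calls report found, not found and found respectively.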
Method 2
If you look at any manual page, you’ll notice that the headers are in bold. This is done with control characters embedded in the formatted text, so to grep the way you want, those have to be stripped out.
The col utility may be used for this:
$ man bash | col -b | grep 'NAME'
The -b option has the following description on OpenBSD:
Do not output any backspaces, printing only the last character
written to each column position. This can be useful in
processing the output of mandoc(1).
The Linux col manual (on Ubuntu) doesn’t include that last sentence, but the option works the same way.
On Linux, unsetting the MAN_KEEP_FORMATTING environment variable (or setting it to an empty string) may also help, and will allow you to grep without passing the output of man through col -b.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.