Why does POSIX require certain shell built-ins to have an external implementation?

From this question about whether printf is a built-in for yash, comes this answer that quotes the POSIX standard.

The answer points out that the POSIX search sequence is to find an external implementation of the desired command, and then, if the shell has implemented it as a built-in, run the built-in. (For built-ins that aren’t special built-ins.)

Why does POSIX have this requirement for an external implementation to exist before allowing an internal implementation to be run?

It seems… arbitrary, so I am curious.

Answers:


Method 1

This is an “as if” rule.

Simply put: The behaviour of the shell as users see it should not change if an implementation decides to make a standard external command also available as shell built-in.

The contrast that I showed at https://unix.stackexchange.com/a/496291/5132 between the behaviours of (on the one hand) the PD Korn, MirBSD Korn, and Heirloom Bourne shells; (on the other hand) the Z, 93 Korn, Bourne Again, and Debian Almquist shells; and (on the gripping hand) the Watanabe shell highlights this.

For the shells that do not have printf as a built-in, removing /usr/bin from PATH makes an invocation of printf stop working. The POSIX conformant behaviour, exhibited by the Watanabe shell in its conformant mode, causes the same result. The behaviour of the shell that has a printf built-in is as if it were invoking an external command.

Whereas the behaviour of all of the non-conformant shells does not alter if /usr/bin is removed from PATH, and they do not behave as if they were invoking an external command.
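A quick way to see which camp a given shell falls into is to point PATH at a directory that does not exist and try printf. This is only a minimal sketch: the function name probe_printf and the directory /nonexistent are placeholders of mine, not anything taken from the standard or the linked answer.

```shell
# Hypothetical helper: report how the current shell resolves printf
# when no directory on PATH contains it.
probe_printf() {
  # Run in a subshell so the PATH change does not leak out.
  if ( PATH=/nonexistent; printf '' ) >/dev/null 2>&1; then
    echo 'built-in used without a PATH hit'
  else
    echo 'not found, as for an external command'
  fi
}

probe_result=$(probe_printf)
echo "printf with nothing useful on PATH: $probe_result"
```

Which of the two messages appears depends on the shell running the script, which is exactly the contrast described above.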

What the standard is trying to guarantee to you is that shells can build in all sorts of normally external commands (or implement them as their own shell functions), and you’ll still get the same behaviour from the built-ins as you did with the external commands if you adjust PATH to stop the commands from being found. PATH remains your tool for selecting and controlling what commands you can invoke.

(As explained at https://unix.stackexchange.com/a/448799/5132, years ago people chose the personality of their Unix by changing what was on PATH.)

One might opine that making the command always work irrespective of whether it can be found on PATH is in fact the point of making normally external commands built-in. (It’s why my nosh toolset just gained a built-in printenv command in version 1.38, in fact. Although this is not a shell.)

But the standard is giving you the guarantee that you’ll see the same behaviour for regular external commands that are not on PATH from the shell as you will see from other non-shell programs invoking the execvpe() function, and the shell will not magically be able to run (apparently) ordinary external commands that other programs cannot find with the same PATH. Everything works self-consistently from the user’s perspective, and PATH is the tool for controlling how it works.
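The comparison with non-shell programs can be checked with env, which locates its argument through the exec family of functions rather than through any shell's table of built-ins. A sketch, assuming env is installed as an ordinary external program and that /nonexistent does not exist; the variable names are mine.

```shell
# Find env while PATH is still intact, so it can be called by full path.
env_path=$(command -v env)

# env is not a shell: it has no built-in printf and must search PATH for it.
if PATH=/nonexistent "$env_path" printf '' >/dev/null 2>&1; then
  env_status=found
else
  env_status=not-found
fi
echo "env looking for printf: $env_status"
```

A shell that follows the standard's rule would give the same answer for a bare printf under that PATH as env does here.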


Method 2

That’s quite absurd and that’s why no shell is implementing it in its default mode.

The standard’s rationale and its illustrating example suggest that this was a botched attempt to have a regular built-in associated with a path, and let the user override it by having their own binary appear before it in PATH (eg. a printf built-in associated with /usr/bin/printf could be overridden by the /foo/bin/printf external command by setting PATH=/foo/bin:$PATH).
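The override described in the rationale can be tried out with a throwaway directory standing in for /foo/bin. A sketch, assuming mktemp -d is available (it is not required by POSIX); the stub script and the variable names are illustrative only.

```shell
# Create a stand-in for /foo/bin containing our own printf.
override_dir=$(mktemp -d)
cat >"$override_dir/printf" <<'EOF'
#!/bin/sh
echo 'custom printf'
EOF
chmod +x "$override_dir/printf"

# What the shell itself runs with that directory placed first on PATH.
# Shells that skip the PATH search for regular built-ins ignore the stub.
shell_out=$(PATH="$override_dir:$PATH"; printf 'shell printf\n')

# What a non-shell program (env) runs with the same PATH.
env_out=$(PATH="$override_dir:$PATH" env printf 'shell printf\n')

echo "shell: $shell_out"
echo "env:   $env_out"
rm -rf "$override_dir"
```

If the two lines differ, the shell is running its built-in regardless of what appears earlier on PATH.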

However, the standard did not end up requiring that, but something completely different (and also useless and unexpected).

You can read more about it in this bug report. Quoting from the final accepted text:

Many existing implementations execute a regular built-in without
performing a PATH search. This behavior does not match the normative
text, and it does not allow script authors to override regular
built-in utilities via a specially crafted PATH. In addition, the
rationale explains that the intention is to allow authors to override
built-ins by modifying PATH, but this is not what the normative text
says.

FWIW, I don’t think there’s any shell implementing the revised requirements from the accepted text, either.

Method 3

Follow-up vis-a-vis echo vs printf:

(Below, builtin means “special builtin”, and “regular builtin”s are not considered to be builtins by me since they are not built into the shell)

The first POSIX standardization committee could not agree on how to
standardize echo, so they compromised by specifying that if it was passed flags (-e, -n, -E, etc.) or if any arguments contained escape sequences (\n, \c, \t, etc.), the behavior was to be defined by the implementing shell rather than by POSIX. Instead, the printf command was added and given well-defined behavior.
(source: Classic Shell Scripting, by Robbins and Beebe).
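Where echo leaves the treatment of backslashes open, printf makes the choice explicit through its format string. A small sketch; the variable names are mine.

```shell
text='a\tb'

# %s prints the argument literally; backslash sequences are left alone.
literal=$(printf '%s\n' "$text")

# %b asks printf to expand backslash escapes inside the argument.
expanded=$(printf '%b\n' "$text")

printf '%s\n' "$literal"
printf '%s\n' "$expanded"
```

The first line of output keeps the backslash and the letter t; the second contains a real tab character.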

Although printf is well-defined, some shells do not have printf as a
builtin command (e.g. mksh). Instead, they use printf from /usr/bin/.
This meant all scripts run from that shell would print the same on a
given operating system (Ubuntu, Fedora, etc.), but that they wouldn’t
necessarily print the same across OSs (in fact, many users changed the
printf in their /usr/bin for this reason).

Alternatively, shells with printf as a builtin would print the same
regardless of OS, but only if used as implemented for the shell. However,
since printf behavior is defined by the POSIX standard, that isn’t necessarily
a concern for programmers. On the other hand, if PATH were overridden for shells that use printf from /usr/bin/, printf wouldn’t be found.

Though all shells have echo as a builtin, some interpret escape sequences
directly (e.g. ash) while others (most) require a -e flag: the behavior is
not defined by POSIX, but by the shell.

One of the main annoyances of echo vs. printf is that echo prints a new
line at the end of the string by default, but printf does not. printf
requires the \n escape sequence to print a new line. Conversely, to prevent
echo from printing a new line, the \c escape sequence is required
(potentially, also requiring the -e flag).
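The newline difference can be papered over with printf alone. A sketch, with say as a hypothetical helper name of my choosing:

```shell
# Hypothetical helper: print all arguments followed by one newline,
# without interpreting flags or backslash escapes in them.
say() {
  printf '%s\n' "$*"
}

say 'first line'                  # newline added by the helper
printf '%s' 'no newline here'     # printf adds nothing on its own
printf '\n'                       # so the newline is written explicitly
```

Because the text is passed as an argument to %s rather than as the format, strings that start with a dash or contain backslashes come out unchanged.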

printf is recommended for maximum portability since its behavior is defined by POSIX, but I personally find explicitly printing a new line at the end of each line quite annoying (most lines I write require a new line at the end and I very rarely need to suppress echo’s printing of new lines). On the other hand, echo is always available since it’s a builtin (no risk of not being found on $PATH), and a simple check can be performed to determine whether the -e flag is needed and a corresponding echo command stored in a variable:

#! /bin/sh -

# Determine if "builtin" command exists.
BUILTIN='builtin'
  
if ! ("${BUILTIN}" echo 123 >/dev/null 2>&1); then
  BUILTIN=''
fi
export BUILTIN

ECHO='echo -e'

if ${BUILTIN} [ "`echo -e test`" = '-e test' ]; then
  ECHO='echo'
fi
export ECHO

# Now use ${ECHO} (unquoted, so that 'echo -e' splits into command and flag) where you would normally use "echo"...

Personally, I prefer to do this and only use printf if I need special formatting.

UPDATE:
I should give proper credit where credit is due. The shell code above was taken directly from shunit2. Credit goes to Kate Ward and the shunit2 development team for that one! (Well done 😉 )

Method 4

Adding this as well (Classic Shell Scripting by Robbins and Beebe is a great book):

The shell has a number of commands that are built-in. This means that the shell itself executes the command, instead of running an external
program in a separate process. Furthermore, POSIX distinguishes
between “special” built-ins and “regular” built-ins. [Most regular
built-ins] have to be built-in for the shell to function correctly
(e.g., read). Others are typically built into the shell only for
efficiency (e.g., true and false). The standard allows other
commands to be built-in for efficiency as well, but all regular
built-ins must be accessible as separate programs that can be executed
directly by other binary programs.
The distinction between special
and regular built-in commands comes into play when the shell searches
for commands to execute. The command-search order is special built-ins
first, then shell functions, then regular built-ins, and finally
external commands found by searching the directories listed in
$PATH. This search order makes it possible to define shell functions
that extend or override regular shell builtins. This feature is used
most often in interactive shells. For example, suppose you would like
the shell’s prompt to contain the last component of the current
directory’s pathname. The easiest way to make this happen is to have
the shell change PS1 each time you change directories. You could
just write your own [cd] function [for this]. There is one small
fly in the ointment here. How does the shell function access the
functionality of the “real” cd command?…What’s needed is an
“escape hatch” that tells the shell to bypass the search for functions
and access the real command. This is the job of the command built-in
command.

[However] the command command is not a special builtin command! Woe be to the shell programmer who defines a function named command! The POSIX
standard provides the following two additional special qualities for
the special built-in commands:

  1. A syntax error in a special built-in utility may cause a shell executing that utility to abort, while a syntax error in a regular
    built-in utility shall not cause a shell executing that utility to
    abort. If a special built-in utility encountering a syntax error does
    not abort the shell, its exit value shall be nonzero.
  2. Variable assignments specified with special built-in utilities remain in effect after the built-in completes; this shall not be the
    case with a regular built-in or other utility. [That is] you can
    specify variable assignment at the front of a command and the variable
    will have that value in the environment of the executed command only,
    without affecting the variable in the current shell or subsequent
    commands. (e.g. PATH=/bin:/usr/bin: awk '...') However, when such an
    assignment is used with a special built-in command, the assignment
    stays in effect from then on, even after the special built-in
    completes.

Arnold Robbins and Nelson H. F. Beebe. Classic Shell Scripting: Hidden Commands that Unlock the Power of Unix (p. 262-5). O’Reilly Media. Kindle Edition.
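The second point in the quoted list can be observed directly. A sketch, assuming sh is on PATH; GREETING is an arbitrary variable name. Some shells only apply the special built-in rule in their POSIX mode, so no particular value is claimed here for the last line of output.

```shell
unset GREETING

# Prefix assignment on an ordinary command: visible to that command only.
seen_by_child=$(GREETING=hello sh -c 'printf "%s" "$GREETING"')
after_regular=${GREETING-unset}

# Prefix assignment on the special built-in ":" (colon).
GREETING=hello :
after_special=${GREETING-unset}

printf 'child saw: %s\n' "$seen_by_child"
printf 'after ordinary command: %s\n' "$after_regular"
printf 'after special built-in: %s\n' "$after_special"
```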

Note that the command command causes the shell to treat the specified command and arguments as a simple command, suppressing shell function lookup. From the IBM Docs:

Normally, when a / (slash) does not precede a command (indicating a
specific path), the shell locates a command by searching the following
categories:

  1. special shell built-ins
  2. shell functions
  3. regular shell built-ins
  4. PATH environment variable

For example, if there is a function with the same
name as a regular built-in, the system uses the function. The command
command allows you to call a command that has the same name as a
function and get the simple command.

The command -v and command -V commands write to standard output what
path name will be used by the shell and how the shell interprets the
command type (built-in, function, alias, and so forth). Since the -v
and -V flags produce output in relation to the current shell
environment, the command command is provided as a Korn shell or POSIX
shell regular built-in command. The /usr/bin/command command might not
produce correct results, because it is called in a subshell or
separate command execution environment. In the following example the
shell is unable to identify aliases, subroutines, or special shell
commands:

(PATH=foo command -v)
nohup command -v

Thus, in my previous example, I used the bash builtin instead of command because had I put it in a subshell, it would not have worked properly.
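The search order and the command escape hatch described in the quotes above can be sketched with the cd example the book mentions. This is my own minimal version, not the book's code; it sets PS1 to the last component of the current directory followed by "$ ".

```shell
# A function named cd is found before the regular built-in cd.
cd() {
  # "command" skips function lookup, so this reaches the real cd
  # instead of recursing into this function.
  command cd "$@" || return
  PS1="${PWD##*/}\$ "
}

cd /usr
printf 'prompt is now: %s\n' "$PS1"
```

Without the command prefix inside the function body, the call to cd would find the function again and recurse.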

I second @mosvy: it appears the standard’s rationale and its normative text don’t match (quite absurd indeed).


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
