Is it possible to “protect” an IFS character from field splitting?

In a POSIX sh, or in the Bourne shell (as in Solaris 10’s /bin/sh), is it possible to have something like:

a='some var with spaces and a special space'
printf "%s\n" $a

And, with the default IFS, get:

some
var
with
spaces
and
a
special space

That is, protect the space between "special" and "space" by some combination of quoting or escaping?

The number of words in $a isn’t known beforehand, or I’d try something like:

a='some var with spaces and a special space'
printf "%s\n" "$a" | while read field1 field2 ...

The context is this bug reported in Cassandra, where OP tried to set an environment variable specifying options for the JVM:

export JVM_EXTRA_OPTS='-XX:OnOutOfMemoryError="echo oh_no"'

In the script executing Cassandra, which has to support POSIX sh and Solaris sh:

JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS"
#...
exec $NUMACTL "$JAVA" $JVM_OPTS $cassandra_parms -cp "$CLASSPATH" $props "$class"
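
To make the failure concrete, here is a minimal sketch of what the unquoted expansion does to that value. The helper show_args is hypothetical (it is not part of the Cassandra script) and only prints the arguments it receives:

```shell
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
    printf '%d args:' "$#"
    for arg do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

JVM_EXTRA_OPTS='-XX:OnOutOfMemoryError="echo oh_no"'

# Unquoted: split at the space; the double quotes are ordinary characters here.
show_args $JVM_EXTRA_OPTS

# Quoted: a single argument, but then several options could not be passed at once.
show_args "$JVM_EXTRA_OPTS"
```

The first call reports 2 args, [-XX:OnOutOfMemoryError="echo] and [oh_no"], because quote characters that come out of an expansion are not parsed as quoting.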

IMO the only way out here is to use a script wrapping the echo oh_no command. Is there another way?

Answers:

Method 1

Not really.

One solution is to reserve a character as the field separator. Obviously it will not be possible to include that character, whatever it is, in an option. Tab and newline are obvious candidates, if the source language makes it easy to insert them. I would avoid multibyte characters if you want portability (e.g. dash and BusyBox don’t support multibyte characters).

If you rely on IFS splitting, don’t forget to turn off wildcard expansion with set -f.

tab=$(printf '\t')
IFS=$tab
set -f
exec java $JVM_EXTRA_OPTS …
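
A minimal sketch of the tab-separator idea, runnable without java. The helper show_args and the two option values are assumptions made up for illustration; the subshell keeps the IFS and set -f changes from leaking into the rest of the script:

```shell
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
    printf '%d args:' "$#"
    for arg do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

tab=$(printf '\t')

# Assumed value: two options separated by a tab; the first contains a space.
JVM_EXTRA_OPTS="-XX:OnOutOfMemoryError=echo oh_no${tab}-Xmx1g"

(
    IFS=$tab    # split on tabs only, so spaces survive
    set -f      # no wildcard expansion on the split fields
    show_args $JVM_EXTRA_OPTS
)
```

This should report 2 args, with the space inside the first option left intact. The cost is the one already mentioned: a tab can no longer appear inside an option.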

Another approach is to introduce a quoting syntax. A very common quoting syntax is that a backslash protects the next character. The downside of using backslashes is that so many different tools use them as a quoting character that it can sometimes be difficult to figure out how many backslashes you need.

set java
eval 'set -- "$@"' $(printf '%s\n' "$JVM_EXTRA_OPTS" | sed -e 's/[^ ]/\\&/g' -e 's/\\\\/\\/g') …
exec "$@"
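
As a rough sketch of how that backslash convention plays out, the following wraps the same two sed substitutions in a function and prints the resulting arguments instead of exec'ing them. The names show_args and split_quoted and the sample value are hypothetical, and the sed output is placed inside the eval string here rather than left as an unquoted command substitution:

```shell
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
    printf '%d args:' "$#"
    for arg do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

# Hypothetical wrapper: split $1 on spaces, except spaces preceded by a backslash.
split_quoted() {
    # First expression: put a backslash in front of every non-space character.
    # Second expression: collapse each doubled backslash (one that came from the
    # input) to a single one, so that it escapes the character that follows it.
    quoted=$(printf '%s\n' "$1" | sed -e 's/[^ ]/\\&/g' -e 's/\\\\/\\/g')
    set -- java
    eval "set -- \"\$@\" $quoted"
    show_args "$@"      # a real launcher would run: exec "$@"
}

split_quoted '-XX:OnOutOfMemoryError=echo\ oh_no -Xmx1g'
```

With this input the function should report 3 args: java, the OnOutOfMemoryError option with its space preserved, and -Xmx1g. Input containing newlines is not handled by this sketch.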

Method 2

If you were using Bash or similar, an array would do the trick:

a=(some var with spaces and a 'special space')

But since the POSIX shell does not have these, the best internal approach I can see is to actually use a special space. The non-breaking space (U+00A0) is well-suited to this purpose, but being outside ASCII requires agreement on the character set of the script.

a="some var with spaces and a special space"
# this is a non-breaking space ------^
echo "$a" \
| while read word; do printf '%s\n' ${word} | sed 's@ @ @g'; done
# this is a non-breaking space ----------------------^

This outputs:

some
var
with
spaces
and
a
special space

At the moment, I am unsure of how to include this in a variable expansion (it will need a subshell), but this should offer a starting point for further investigation.
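
One possible direction, offered only as a sketch: split into the positional parameters, then convert each non-breaking space back to an ordinary space one word at a time. It assumes the script and data are UTF-8, so the non-breaking space is the byte pair \302\240 (built with printf so that no invisible character has to be typed); show_args and split_nbsp are hypothetical names:

```shell
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
    printf '%d args:' "$#"
    for arg do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

# Assumption: UTF-8 encoding, where U+00A0 is the two bytes \302\240.
nbsp=$(printf '\302\240')

a="some var with spaces and a special${nbsp}space"

# Hypothetical function: split $1 with the default IFS, then restore the spaces.
split_nbsp() {
    set -f          # no wildcard expansion while splitting
    set -- $1       # the non-breaking space is not in IFS, so it is kept
    set +f
    for word do
        # Replace each protected non-breaking space with an ordinary space.
        word=$(printf '%s\n' "$word" | LC_ALL=C sed "s/$nbsp/ /g")
        set -- "$@" "$word"   # append the converted word ...
        shift                 # ... and drop the unconverted one from the front
    done
    show_args "$@"  # a real script could use "$@" in a command line here
}

split_nbsp "$a"
```

This should print 7 args, the last being [special space]. Each word does cost a command substitution, so a subshell per word is still involved.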


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
