Using a common command line tool like sed or awk, is it possible to join all lines that end with a given character, like a backslash?
For example, given the file:
foo bar \
bash \
baz
dude \
happy
I would like to get this output:
foo bar bash baz
dude happy
Answers:
Method 1
A shorter and simpler sed solution:
sed '
: again
/\\$/ {
    N
    s/\\\n//
    t again
}
' textfile
Or, as a one-liner if using GNU sed:
sed ':x; /\\$/ { N; s/\\\n//; tx }' textfile
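To try the one-liner on the question's input, here is a minimal sketch assuming GNU sed; `sample.txt` is a hypothetical file name used only for illustration. The `t` command jumps back to `:x` only when the substitution succeeded, which is what lets a run of several continued lines collapse into one:

```shell
# Recreate the question's sample input (hypothetical file name).
# Single quotes keep each trailing backslash literal.
printf '%s\n' 'foo bar \' 'bash \' 'baz' 'dude \' 'happy' > sample.txt

# While the pattern space ends in a backslash: append the next line (N),
# delete the backslash-newline pair, and branch back to :x (tx) to re-check.
sed ':x; /\\$/ { N; s/\\\n//; tx }' sample.txt
```

It should print `foo bar bash baz` and `dude happy` on two lines.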
Method 2
It is possibly easiest with perl (since perl is like sed and awk, I hope it is acceptable to you):
perl -p -e 's/\\\n//'
Method 3
Here’s an awk solution. If a line ends with a backslash, strip the backslash and print the line with no terminating newline; otherwise, print the line with a terminating newline.
awk '{if (sub(/\\$/,"")) printf "%s", $0; else print $0}'
It’s also not too bad in sed, though awk is obviously more readable.
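`sub()` returns the number of substitutions it made, so the `if` is true exactly when a trailing backslash was removed. Because the regex is anchored with `$`, a backslash elsewhere on the line is left alone. A small check with made-up sample lines (hypothetical input, not from the question):

```shell
# 'x\y \' has an inner backslash (kept) and a trailing one (line is joined).
printf '%s\n' 'x\y \' 'z' 'plain' |
  awk '{if (sub(/\\$/,"")) printf "%s", $0; else print $0}'
```

This should print `x\y z` and `plain` on two lines.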
Method 4
This is not an answer as such; it is a side issue about sed.
Specifically, I needed to take Gilles' sed command apart piece by piece to understand it. I started writing some notes on it, and then thought they might be useful to someone here.
So here it is: Gilles' sed script in documented format:
#!/bin/bash
#######################################
sed_dat="$HOME/ztest.dat"
while IFS= read -r line ;do echo "$line" ;done <<'END_DAT' >"$sed_dat"
foo bar \
bash \
baz
dude \
happy
yabba dabba
doo
END_DAT
#######################################
sedexec="$HOME/ztest.sed"
while IFS= read -r line ;do echo "$line" ;done <<'END-SED' >"$sedexec"; \
sed -nf "$sedexec" "$sed_dat"
s/\\$// # If a line has trailing '\', remove the '\'
#
t'Hold-append' # branch: Branch conditionally to the label 'Hold-append'
# The condition is that a replacement was made.
# The current pattern-space had a trailing '\' which
# was replaced, so branch to 'Hold-append' and append
# the now-truncated line to the hold-space
#
# This branching occurs for each (successive) such line.
#
# PS. The 't' command may be so named because it means 'on true'
# (I'm not sure about this, but the shoe fits)
#
# Note: Appending to the hold-space introduces a leading '\n'
# delimiter for each appended line
#
# e.g. compare the hex dumps of the following 4 example commands:
# 'x' swaps the hold and pattern spaces
#
# echo -n "a" |sed -ne 'p' |xxd -p ## 61
# echo -n "a" |sed -ne 'H;x;p' |xxd -p ## 0a61
# echo -n "a" |sed -ne 'H;H;x;p' |xxd -p ## 0a610a61
# echo -n "a" |sed -ne 'H;H;H;x;p' |xxd -p ## 0a610a610a61
# No replacement was made above, so the current pattern-space
# (input line) has a "normal" ending.
x # Swap the pattern-space (the just-read "normal" line)
# with the hold-space. The hold-space holds the accumulation
# of appended "stripped-of-backslash" lines
G # The pattern-space now holds zero to many "stripped-of-backslash" lines
# each of which has a preceding '\n'
# The 'G' command Gets the Hold-space and appends it to
# the pattern-space. This append action introduces another
# '\n' delimiter to the pattern space.
s/\n//g # Remove all '\n' newlines from the pattern-space
p # Print the pattern-space
s/.*// # Now we need to remove all data from the pattern-space
# This is done as a means to remove data from the hold-space
# (there is no way to directly remove data from the hold-space)
x # Swap the no-data pattern space with the hold-space
# This leaves the hold-space re-initialized to empty...
# The current pattern-space will be overwritten by the next line-read
b # Everything is ready for the next line-read. It is time to make
# an unconditional branch to the end of the script for this line
# ie. skip any remaining logic, read the next line and start the process again.
:'Hold-append' # The ':' (colon) indicates a label.
# A label is the target of the 2 branch commands, 'b' and 't'
# A label can be a single letter (it is often 'a')
# Note: 'b' can be used without a label as seen in the previous command
H # Append the pattern to the hold buffer
# The pattern is prefixed with a '\n' before it is appended
END-SED
#######
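The same hold-space technique can be condensed onto one command line. This is a sketch rather than Gilles' exact script: it swaps the `t` branch and label for an address block ending in a bare `b`, uses `s///` to reuse the last regex, and assumes a sed that treats each `-e` argument as a separate script line, as POSIX specifies:

```shell
# Lines ending in '\': strip it (s/// reuses /\\$/), append to hold space (H),
# and end the cycle (b). Other lines: x then G puts hold + current line in the
# pattern space, s/\n//g joins, p prints, then s/.*//;x empties the hold space.
printf '%s\n' 'foo bar \' 'bash \' 'baz' 'dude \' 'happy' |
  sed -n -e '/\\$/{' -e 's///' -e 'H' -e 'b' -e '}' \
         -e 'x' -e 'G' -e 's/\n//g' -e 'p' -e 's/.*//' -e 'x'
```

As with the documented script, if the very last line ends in a backslash it stays in the hold space and is never printed.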
Method 5
Yet another common command line tool would be ed, which by default modifies files in place and therefore leaves file permissions unmodified (for more information on ed, see Editing files with the ed text editor from scripts):
str='
foo bar \
bash 1 \
bash 2 \
bash 3 \
bash 4 \
baz
dude \
happy
xxx
vvv 1 \
vvv 2 \
CCC
'

# We are using (1,$)g/re/command-list and (.,.+1)j to join lines ending with a '\'
# ?? repeats the last regex search.
# replace ',p' with 'wq' to edit files in-place
# (using Bash and FreeBSD ed on Mac OS X)

cat <<-'EOF' | ed -s <(printf '%s' "$str")
H
,g/\\$/s///\
.,.+1j\
??s///\
.,.+1j
,p
EOF
Method 6
Using the fact that read in the shell will interpret backslashes when used without -r:
$ while IFS= read line; do printf '%s\n' "$line"; done <file
foo bar bash baz
dude happy
Note that this will also interpret any other backslash in the data.
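Without `-r`, `read` treats a backslash-newline pair as a line continuation and also drops any other backslash, keeping only the character after it. A short demonstration with made-up input (hypothetical sample lines):

```shell
# 'a\b \' ends in a backslash, so read joins it with the next line;
# the inner '\b' is unescaped to a plain 'b' as well.
printf '%s\n' 'a\b \' 'c' 'd' |
  while IFS= read line; do printf '%s\n' "$line"; done
```

This should print `ab c` and `d`, showing the inner backslash being lost.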
Method 7
The Mac (BSD sed) version, based on @Gilles' solution, would look like this:
sed ':x
/\\$/{N; s|\\'$'\\n||; tx
}' textfile
The main difference is how the newline is represented; combining the script any further onto one line breaks it.
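A sketch of a variant that avoids Bash's `$'...'` quoting, assuming a sed that follows POSIX in treating each `-e` argument as a separate script line and in reading `\n` inside a regex as a newline. Under those assumptions the label `x` and the closing `}` are terminated where sed expects them, without embedding literal newlines:

```shell
# Same loop as the Mac version, one sed command per -e argument.
printf '%s\n' 'foo bar \' 'bash \' 'baz' 'dude \' 'happy' |
  sed -e ':x' -e '/\\$/{' -e 'N' -e 's/\\\n//' -e 'tx' -e '}'
```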
Method 8
A simple(r) solution that loads the whole file in memory:
sed -z 's/\\\n//g' file # GNU sed 4.2.2+.
Or a still short one that works on one (output) line at a time (GNU syntax):
sed ':x;/\\$/{N;bx};s/\\\n//g' file
On one line (POSIX syntax):
sed -e :x -e '/\\$/{N;bx' -e '}' -e 's/\\\n//g' file
Or use awk (if the file is too big to fit in memory):
awk '{a=sub(/\\$/,"");printf("%s%s",$0,a?"":RS)}' file
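A quick way to check that the whole-file `sed -z` variant and the streaming awk variant agree, assuming GNU sed 4.2.2 or later for `-z`; `sample.txt`, `via_sed.txt` and `via_awk.txt` are hypothetical file names. In the awk program, `a` is 1 when `sub()` stripped a trailing backslash, and it decides whether the record separator `RS` (a newline by default) is printed after the line:

```shell
printf '%s\n' 'foo bar \' 'bash \' 'baz' 'dude \' 'happy' > sample.txt

# Whole file in memory: with -z the line separator is NUL, so a file with
# no NUL bytes is one pattern space and every backslash-newline is removed.
sed -z 's/\\\n//g' sample.txt > via_sed.txt

# Streaming: print RS after a line only if no trailing backslash was stripped.
awk '{a=sub(/\\$/,"");printf("%s%s",$0,a?"":RS)}' sample.txt > via_awk.txt

cmp via_sed.txt via_awk.txt && echo identical
```

The awk version keeps only the current line in memory, which is why it suits files too big for the `-z` approach.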
Method 9
You can use cpp, but it produces some empty lines where it merged lines, plus some introductory line-marker lines, which I remove with sed (maybe this can also be done with cpp flags and options):
echo 'foo bar \
bash \
baz
dude \
happy' | cpp | sed 's/# 1 .*//;/^$/d'
foo bar bash baz
dude happy
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.