How to split a string into an array in bash

I have a problem with the output of a program. I need to launch a command in bash and take its output (a string) and split it to add new lines in certain places. The string looks like this:

battery.charge: 90 battery.charge.low: 30 battery.runtime: 3690 battery.voltage: 230.0 device.mfr: MGE UPS SYSTEMS device.model: Pulsar Evolution 500

Basically, each entry has the form xxx.yy.zz: value, but the value may contain spaces.
Here’s the output I’d like to get:

battery.charge: 90
battery.charge.low: 30
battery.runtime: 3690
battery.voltage: 230.0
device.mfr: MGE UPS SYSTEMS
device.model: Pulsar Evolution 500

My idea is to search for the first dot and then look back from that position for a space, putting a new line there, but I’m not sure how to achieve that in Bash.

Answers:

Method 1

With GNU sed, you can match each contiguous string (i.e. without whitespace) terminated by : and then place a newline before all but the first one:

sed 's/[^[:space:]]\+:/\n&/g2'

If your version of sed does not support combining g with a number (the g2 form is a GNU extension), you can use a plain g modifier:

sed 's/[^[:space:]]\{1,\}:/\
&/g'

which will work the same except for printing an additional newline before the first key. You could use perl -pe 's/\S+:/\n$&/g' with the same proviso (there may be a perl equivalent of the GNU sed g2 but I don’t know it).
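
As a hedged sketch of the portable variant, the extra empty first line can be dropped with a second sed call. The function name split_keys is made up for illustration, and the approach assumes that no value contains a token ending in a colon. Every line except the last keeps the trailing space that preceded the next key.

```shell
#!/bin/bash
# split_keys is a hypothetical helper name, not part of the original answer.
split_keys() {
    # First sed: put a newline before every "key:" token (portable syntax,
    # so the replacement continues on an unindented second line).
    # Second sed: delete line 1 if it is empty.
    sed 's/[^[:space:]]\{1,\}:/\
&/g' | sed '1{/^$/d;}'
}

str='battery.charge: 90 battery.charge.low: 30 battery.runtime: 3690 battery.voltage: 230.0 device.mfr: MGE UPS SYSTEMS device.model: Pulsar Evolution 500'
printf '%s\n' "$str" | split_keys
```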

Method 2

Pure bash solution, no external tools used to process the strings, just parameter expansion:

#! /bin/bash
str='battery.charge: 90 battery.charge.low: 30 battery.runtime: 3690 battery.voltage: 230.0 device.mfr: MGE UPS SYSTEMS device.model: Pulsar Evolution 500'

IFS=: read -a fields <<< "$str"

for (( i=0 ; i < ${#fields[@]} ; i++ )) ; do
    f=${fields[i]}

    notfirst=$(( i>0 ))
    last=$(( i+1 == ${#fields[@]} ))

    (( notfirst )) && echo -n ${f% *}

    start=('' $'\n' ' ')
    colon=('' ': ')
    echo -n "${start[notfirst + last]}${f##* }${colon[!last]}"
done
echo

Explanation: $notfirst and $last are booleans. The part before the last space ${f% *} isn’t printed for the first field, as there is no such thing. $start and $colon hold various strings that separate the fields: at the first item, notfirst + last is 0, so nothing is prepended, for the rest of the lines, $notfirst is 1, so a newline is printed, and for the last line, the addition gives 2, so a space is printed. Then, the part after the last space is printed ${f##* }. Colon is printed for all lines except the last one.
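
To make the two parameter expansions concrete, here is a small sketch applied to a single field. The sample value of f is what the second array element would look like after splitting the question's string on colons; the variable names value and key are only for illustration.

```shell
#!/bin/bash
# One colon-separated field, as produced by: IFS=: read -a fields
f=' 90 battery.charge.low'

value=${f% *}    # remove the shortest suffix matching " *"
key=${f##* }     # remove the longest prefix matching "* "

# value still carries its leading space here; the script above prints
# ${f% *} unquoted, so word splitting trims that space.
printf '[%s] [%s]\n' "$value" "$key"
```

This prints [ 90] [battery.charge.low].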

Method 3

A perl solution:

$ perl -pe 's{\S+:}{$seen++ ? "\n$&" : "$&"}ge' file
battery.charge: 90 
battery.charge.low: 30 
battery.runtime: 3690 
battery.voltage: 230.0 
device.mfr: MGE UPS SYSTEMS 
device.model: Pulsar Evolution 500

Explanation

\S+: matches a run of non-whitespace characters ending with :.
For every match except the first one (tracked with $seen++), we insert a newline before it ("\n$&").

Method 4

It’s easier using a tool that supports lookarounds:

$ s="battery.charge: 90 battery.charge.low: 30 battery.runtime: 3690 battery.voltage: 230.0 device.mfr: MGE UPS SYSTEMS device.model: Pulsar Evolution 500"
$ grep -oP '\S+:\s+.*?(?=\s+\S+:|$)' <<< "$s"
battery.charge: 90
battery.charge.low: 30
battery.runtime: 3690
battery.voltage: 230.0
device.mfr: MGE UPS SYSTEMS
device.model: Pulsar Evolution 500

If you wanted the result in an array:

$ IFS=$'\n' foo=($(grep -oP '\S+:\s+.*?(?=\s+\S+:|$)' <<< "$s"))
$ for i in "${!foo[@]}"; do echo "$i<==>${foo[i]}"; done
0<==>battery.charge: 90
1<==>battery.charge.low: 30
2<==>battery.runtime: 3690
3<==>battery.voltage: 230.0
4<==>device.mfr: MGE UPS SYSTEMS
5<==>device.model: Pulsar Evolution 500
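
One caveat with the array assignment above: both IFS and foo are plain variable assignments, so IFS stays changed for the rest of the shell session, and the unquoted command substitution is still subject to glob expansion. A possible alternative, assuming bash 4 or later, is mapfile. In this sketch the variable lines is a stand-in for the grep output, so that the example does not depend on grep -P being available.

```shell
#!/bin/bash
# "lines" stands in for the output of the grep -oP command; in real use:
#   mapfile -t foo < <(grep -oP '\S+:\s+.*?(?=\s+\S+:|$)' <<< "$s")
lines=$'battery.charge: 90\ndevice.mfr: MGE UPS SYSTEMS\ndevice.model: Pulsar Evolution 500'

# -t strips the trailing newline from each line; IFS is left untouched.
mapfile -t foo <<< "$lines"

for i in "${!foo[@]}"; do
    echo "$i<==>${foo[i]}"
done
```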

EDIT: Explanation of the regex:

'\S+:\s+.*?(?=\s+\S+:|$)'

\S+ matches one or more non-whitespace characters
: matches :
\s+ matches one or more whitespace characters after the :
.*? is a non-greedy match of the value
(?=\s+\S+:|$) is a lookahead assertion that checks for either:
- one or more whitespace characters followed by a string of non-whitespace characters and a colon, or
- the end of the string

So the string is split into parts like battery.charge: 90, … device.mfr: MGE UPS SYSTEMS, …

Method 5

Here’s a naive approach that should work assuming you don’t care that tabs and newlines in the input (if any) are converted to plain spaces.

The idea is simple: split the input on whitespace and print every token, prefixing tokens that end with : with a newline and all other tokens with a space. The $count variable and the related if only prevent an initial empty line; they can be removed if that is not a problem. (The script assumes the input is in a file called input in the current directory.)

#! /bin/bash

count=0
for i in $(<input) ; do
   fmt=
   if [[ $i =~ :$ ]] ; then
       if [[ $count -gt 0 ]] ; then
           fmt="\n%s"
       else
           fmt="%s"
       fi
       ((count++))
   else
       fmt=" %s"
   fi
   printf "$fmt" "$i"
done
echo
echo "Num items: $count"

I hope someone can come up with a nicer alternative though.

$ cat input
battery.charge: 90 battery.charge.low: 30 battery.runtime: 3690 battery.voltage: 230.0 device.mfr: MGE UPS SYSTEMS device.model: Pulsar Evolution 500
$ ./t.sh
battery.charge: 90
battery.charge.low: 30
battery.runtime: 3690
battery.voltage: 230.0
device.mfr: MGE UPS SYSTEMS
device.model: Pulsar Evolution 500
Num items: 6
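
Since the title asks for an array, the same token-by-token idea can also collect the entries instead of printing them. This is only a sketch: the function name split_pairs and the global array pairs are made up for illustration, the input is assumed to be a single line, and any token ending in : is treated as a key.

```shell
#!/bin/bash
# split_pairs and pairs are hypothetical names, not from the answer above.
split_pairs() {
    local word
    local -a words
    pairs=()
    # read -a splits on whitespace without glob expansion.
    read -ra words <<< "$1"
    for word in "${words[@]}"; do
        if [[ $word == *: ]]; then
            # A token ending in ":" starts a new entry.
            pairs+=("$word")
        elif (( ${#pairs[@]} )); then
            # Anything else is appended to the most recent entry.
            pairs[${#pairs[@]}-1]+=" $word"
        fi
    done
}

split_pairs 'battery.charge: 90 device.mfr: MGE UPS SYSTEMS device.model: Pulsar Evolution 500'
printf '%s\n' "${pairs[@]}"
```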

Method 6

You can use awk(1) with the following script split.awk:

BEGIN { RS=" "; first=1; }
first { first=0; printf "%s", $1; next; }
/[a-z]+\.[^:]+:/ { printf "\n%s", $1; next; }
{ printf " %s", $1 }
END { printf "\n" }

When you run

awk -f split.awk input.dat

you will get

battery.charge: 90
battery.charge.low: 30
battery.runtime: 3690
battery.voltage: 230.0
device.mfr: MGE UPS SYSTEMS
device.model: Pulsar Evolution 500

The idea is to let awk split the input whenever it sees a space (by setting the record separator RS in line 1). Lines 2 and 3 then match the xxx.yy.zz: keys (distinguishing the very first record from subsequent matches), while line 4 handles every record that lines 2 and 3 do not match. Line 5 just prints the final newline.

Method 7

Here is a short awk script to demo the power of awk. It needs GNU awk, because patsplit is a gawk extension.

awk '
len=patsplit($0, namesArr,"[^ :]+: ", valuesArr) {
    for(i=1;i<=len;i++)   # matches are stored from index 1; starting at 0 would print an extra blank line
        print namesArr[i], valuesArr[i]
}' input.txt

input.txt

battery.charge: 90 battery.charge.low: 30 battery.runtime: 3690 battery.voltage: 230.0 device.mfr: MGE UPS SYSTEMS device.model: Pulsar Evolution 500

output:

battery.charge:  90
battery.charge.low:  30
battery.runtime:  3690
battery.voltage:  230.0
device.mfr:  MGE UPS SYSTEMS
device.model:  Pulsar Evolution 500
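
If names and values are wanted separately in bash itself, a rough equivalent is an associative array keyed by name. This sketch assumes bash 4 or later and an array of "name: value" strings that has already been split; the names pairs and ups are made up for illustration.

```shell
#!/bin/bash
# pairs is assumed to hold already-split "name: value" entries.
pairs=('battery.charge: 90' 'device.mfr: MGE UPS SYSTEMS')

declare -A ups
for pair in "${pairs[@]}"; do
    name=${pair%%: *}    # everything before the first ": "
    value=${pair#*: }    # everything after the first ": "
    ups[$name]=$value
done

printf '%s\n' "${ups[device.mfr]}"
```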

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
