In python
re.sub(r"(?<=.)(?=(?:...)+$)", ",", stroke )
To split a number by triplets, e.g.:
echo 123456789 | python -c 'import sys;import re; print re.sub(r"(?<=.)(?=(?:...)+$)", ",", sys.stdin.read());'
123,456,789
How to do the same with bash/awk?
Answers:
Method 1
bash's printf supports pretty much everything you can do in the printf C function:
type printf           # => printf is a shell builtin
printf "%'d" 123456   # => 123,456
printf from coreutils will do the same
/usr/bin/printf "%'d" 1234567 # => 1,234,567
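The `'` flag only groups digits when the active locale defines a thousands separator; in the C/POSIX locale the number comes back unchanged. The following is a minimal sketch, not part of the original answer, that tries printf under a locale that is only assumed to be installed (`en_US.UTF-8`) and falls back to a sed loop when no comma appears. The function name `group_thousands` is made up for illustration.

```shell
# Hypothetical helper, not from the original answer.
# The body runs in a subshell ( ... ) so the LC_ALL change does not leak.
group_thousands() (
    n=$1
    # Assumption: en_US.UTF-8 may be missing; errors and warnings are discarded.
    out=$( { LC_ALL=en_US.UTF-8; printf "%'d" "$n"; } 2>/dev/null ) || out=
    case $out in
        *,*) printf '%s\n' "$out" ;;   # the locale grouped the digits
        *)   # no grouping happened: use a locale-independent sed loop instead
             printf '%s\n' "$n" |
                 sed -E -e ':a' -e 's/([0-9])([0-9]{3})($|[^0-9])/\1,\2\3/' -e 'ta' ;;
    esac
)

group_thousands 1234567   # prints 1,234,567
group_thousands 123       # prints 123
```

Note that printf's `%d` is limited to the platform's integer range, so very long digit strings always need the text-based path.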
Method 2
With sed:
$ echo "123456789" | sed 's/\([[:digit:]]\{3\}\)\([[:digit:]]\{3\}\)\([[:digit:]]\{3\}\)/\1,\2,\3/g'
123,456,789
(Note that this only works for exactly 9 digits!)
or this with sed:
$ echo "123456789" | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
123,456,789
With printf:
$ LC_NUMERIC=en_US printf "%'.f\n" 123456789
123,456,789
Method 3
You can use numfmt:
$ numfmt --grouping 123456789
123,456,789
Or:
$ numfmt --g 123456789
123,456,789
Note that numfmt is not a POSIX utility, it is part of GNU coreutils.
Method 4
cat <<'EOF' |
13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084096
EOF
perl -wpe '1 while s/(\d+)(\d\d\d)/$1,$2/;'
produces:
13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096
This is accomplished by splitting the string of digits into 2 groups: the right-hand group with 3 digits, and the left-hand group with whatever remains, but at least one digit. Then everything is replaced by the 2 groups, separated by a comma. This continues until the substitution fails. The options -w, -p, and -e respectively enable warnings, wrap the statement in a loop with an automatic print, and take the next argument as the Perl "program" (see perldoc perlrun for details).
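The same "peel off the last three digits until nothing is left to peel" idea can be sketched in plain shell with parameter expansion, without perl. This is an illustration only: the function name `group_loop` is hypothetical, and the sketch assumes its argument consists of digits only (no sign, no decimal part).

```shell
# Hypothetical helper: repeatedly split off the right-hand group of three.
# The body runs in a subshell ( ... ) so its variables stay private.
group_loop() (
    rest=$1
    out=
    while [ "${#rest}" -gt 3 ]; do
        tail=${rest#"${rest%???}"}   # the last three characters of rest
        rest=${rest%???}             # everything before them
        out=",$tail$out"
    done
    printf '%s\n' "$rest$out"
)

group_loop 1234567   # prints 1,234,567
group_loop 123456    # prints 123,456
```

Because it works on text rather than on integers, the loop has no upper limit on the number of digits.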
Method 5
awk and bash have good built-in solutions, based on printf, as described in the other answers. But first, sed.
For sed, we need to do it “manually”. The general rule is that if you have four consecutive digits, followed by a non-digit (or end-of-line) then a comma should be inserted between the first and second digit.
For example,
echo 12345678 | sed -re 's/([0-9])([0-9]{3})($|[^0-9])/\1,\2\3/'
will print
12345,678
We obviously need to then keep repeating the process, in order to keep adding enough commas.
sed -re ' :restart ; s/([0-9])([0-9]{3})($|[^0-9])/\1,\2\3/ ; t restart '
In sed, the t command specifies a label that will be jumped to if the last s/// command was successful. I therefore define a label with :restart, so that it can jump back and try the substitution again.
Here is a bash demo (on ideone) that works with any number of digits:
function thousands {
sed -re ' :restart ; s/([0-9])([0-9]{3})($|[^0-9])/\1,\2\3/ ; t restart '
}
echo 12 | thousands
echo 1234 | thousands
echo 123456 | thousands
echo 1234567 | thousands
echo 123456789 | thousands
echo 1234567890 | thousands
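For reference, here is a sketch of the same loop with each sed command in its own -e, which is an assumption made for portability to seds that do not accept a label followed by a semicolon. The expected results are shown as comments. The last call illustrates a limitation of this rule: digits after a decimal point are grouped too.

```shell
# Same substitution and label as in the demo, split across -e options.
thousands() {
    sed -E -e ':restart' \
           -e 's/([0-9])([0-9]{3})($|[^0-9])/\1,\2\3/' \
           -e 't restart'
}

printf '%s\n' 12         | thousands   # 12
printf '%s\n' 1234       | thousands   # 1,234
printf '%s\n' 1234567890 | thousands   # 1,234,567,890
printf '%s\n' 1234.56789 | thousands   # 1,234.56,789 (fraction is grouped as well)
```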
Method 6
With some awk implementations:
echo "123456789" | awk '{ printf("%'"'"'d\n",$1); }'
123,456,789
"%'"'"'d\n" is: "%(single quote)(double quote)(single quote)(double quote)(single quote)d\n"
That will use the configured thousand separator for your locale (typically , in English locales, space in French, . in Spanish/German…). Same as returned by locale thousands_sep
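The quoting works because the shell joins adjacent quoted pieces into a single word. A small sketch, using a made-up variable name `fmt`, makes the resulting format string visible without depending on the locale:

```shell
# Three adjacent pieces are concatenated by the shell into one word:
#   '%'  (single-quoted)  ->  %
#   "'"  (double-quoted)  ->  '
#   'd'  (single-quoted)  ->  d
fmt='%'"'"'d'
printf '%s\n' "$fmt"                      # prints %'d

# The same string can be handed to awk as a variable, which avoids
# the nested quoting inside the awk program text altogether.
awk -v fmt="$fmt" 'BEGIN { print fmt }'   # prints %'d
```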
Method 7
A common use case for me is to modify the output of a command pipeline so that decimal numbers are printed with thousand separators. Rather than writing a function or script, I prefer to use a technique that I can customise on the fly for any output from a Unix pipeline.
I have found printf (provided by Awk) to be the most flexible and memorable way to accomplish this. The apostrophe/single-quote character is specified by POSIX as a flag for formatting decimal numbers, and it has the advantage of being locale-aware, so it is not restricted to comma characters.
When running Awk commands from a Unix shell, it can be difficult to enter a single-quote character inside a string delimited by single quotes (used to avoid shell expansion of positional variables, e.g., $1). In this case, I find the most readable and reliable way to enter the single-quote character is as an octal escape sequence (beginning with \0).
Example:
printf "first 1000\nsecond 10000000\n" |
awk '{printf "%9s: %11\047d\n", $1, $2}'
    first:       1,000
   second:  10,000,000
Simulated output of a pipeline showing which directories are using the most disk space:
printf "7654321 /home/export\n110384 /home/incoming\n" |
awk '{printf "%22s: %9\047d\n", $2, $1}'
          /home/export: 7,654,321
        /home/incoming:   110,384
Other solutions are listed in How to escape a single quote inside awk.
Note: as warned against in Print a Single Quote, it’s recommended to avoid the use of hexadecimal escape sequences as they do not work reliably across different systems.
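To see the octal escape on its own, independent of any locale, here is a small sketch that prints a literal single quote from inside a single-quoted awk program. It also shows one alternative, passing the quote in as a variable. The function names `quote_octal` and `quote_var` and the variable name `q` are made up for this example.

```shell
# \047 is the octal code of the single-quote character.
quote_octal() {
    awk 'BEGIN { printf "It\047s %d o\047clock\n", 5 }'
}

# Alternative: let the shell supply the quote through an awk variable.
quote_var() {
    awk -v q="'" 'BEGIN { printf "It%ss %d o%sclock\n", q, 5, q }'
}

quote_octal   # prints It's 5 o'clock
quote_var     # prints It's 5 o'clock
```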
Method 8
$ echo 1232323 | awk '{printf(fmt,$1)}' fmt="%'6.3f\n"
12,32,323.000
Method 9
For BIG numbers, I was unable to make the above solutions work. For example, let's get a really big number:
$ echo 2^512 |bc -l|tr -d -c [0-9]
13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084096
Note I need the tr to remove backslash newline output from bc. This number is too big to treat as a float or fixed bit number in awk, and I don’t even want to build a regexp large enough to account for all the digits in sed. Rather, I can reverse it and put commas between groups of three digits, then unreverse it:
echo 2^512 |bc -l|tr -d -c [0-9] |rev |sed -e 's/\([0-9][0-9][0-9]\)/\1,/g' |rev
13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096
Method 10
a="13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084096"
echo "$a" | rev | sed "s#[[:digit:]]\{3\}#&,#g" | rev
13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096
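One caveat with the reversal trick: when the number of digits is a multiple of three, a comma is also appended after the last group, which becomes a leading comma once the string is reversed back (123456 would come out as ,123,456). The 155-digit example avoids this by chance. A hedged sketch that strips that extra comma follows. The function name `group_by_reversal` is hypothetical, and the sketch assumes one number per line, digits only, and that `rev` is available.

```shell
# Hypothetical helper: reverse, add a comma after every three digits,
# drop the comma left dangling at the end of the line, reverse back.
group_by_reversal() {
    rev | sed -E -e 's/[0-9]{3}/&,/g' -e 's/,$//' | rev
}

printf '%s\n' 123456  | group_by_reversal   # prints 123,456
printf '%s\n' 1234567 | group_by_reversal   # prints 1,234,567
```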
Method 11
A bash/awk solution (as requested) that works regardless of the length of the number, uses , regardless of the locale's thousands_sep setting, handles numbers wherever they appear in the input, and avoids adding the thousands separator after the decimal point, as in 1.12345:
echo not number 123456789012345678901234567890 1234.56789 |
awk '{while (match($0, /(^|[^.0123456789])[0123456789]{4,}/))
$0 = substr($0, 1, RSTART+RLENGTH-4) "," substr($0, RSTART+RLENGTH-3)
print}'
Gives:
not number 123,456,789,012,345,678,901,234,567,890 1,234.56789
With awk implementations like mawk that don’t support the interval regex operators, change the regexp to /(^|[^.0123456789])[0123456789][0123456789][0123456789][0123456789]+/
Method 12
The following uses space as thousands separator, which is the practice at my place. Modifying it for using comma should be easy.
echo "1000066955"|sed -rn "s/([[:digit:]])([[:digit:]]{3})$/\1 \2/;T end;:loop s/([[:digit:]])([[:digit:]]{3})[[:space:]]/\1 \2 /;t loop;:end p;"
Method 13
I also wanted to have the part after the decimal separator correctly separated/spaced, therefore I wrote this sed-script which uses some shell variables to adjust to regional and personal preferences. It also takes into account different conventions for the number of digits grouped together:
#DECIMALSEP='.' # usa
DECIMALSEP=',' # europe
#THOUSSEP=',' # usa
#THOUSSEP='.' # europe
#THOUSSEP='_' # underscore
#THOUSSEP=' ' # space
THOUSSEP=' ' # thinspace
# group before decimal separator
#GROUPBEFDS=4 # china
GROUPBEFDS=3 # europe and usa
# group after decimal separator
#GROUPAFTDS=5 # used by many publications
GROUPAFTDS=3
function digitgrouping {
# FIXME: This is a workaround: BEGINNING has to be marked (and after
# alteration removed) for the first number to be spaced correctly (1234
# should be 1 234, and that only works if something is in front of that
# number).
sed -E -e 's%^%BEGINNING&%' \
-e '
s%([0-9'"$DECIMALSEP"']+)'"$THOUSSEP"'%\1__HIDETHOUSSEP__%g
:restartA ; s%([0-9])([0-9]{'"$GROUPBEFDS"'})(['"$DECIMALSEP$THOUSSEP"'])%\1'"$THOUSSEP"'\2\3% ; t restartA
:restartB ; s%('"$DECIMALSEP"'([0-9]{'"$GROUPAFTDS"'}'"$THOUSSEP"')*)([0-9]{'"$GROUPAFTDS"'})([0-9])%\1\3'"$THOUSSEP"'\4% ; t restartB
:restartC ; s%([^'"$DECIMALSEP"'][0-9]+)([0-9]{'"$GROUPBEFDS"'})($|[^0-9])%\1'"$THOUSSEP"'\2\3% ; t restartC
s%__HIDETHOUSSEP__%'"$THOUSSEP"'%g' \
-e 's%^BEGINNING%%'
}
Method 14
= Number grouping formatting using Perl RegEx =
[
|*| Source: https://unix.stackexchange.com/a/656655
|*| Last update: CE 2021-08-18 06:44 UTC ]
Number grouping formatting (e.g. turning "1000000" into "1,000,000"; approximation of `numfmt --grouping`) using Perl RegEx:
(Unix Shell)
[
PERLIO=':raw:utf8' exec '/usr/bin/perl' -p \
-e 'BEGIN { $^H |= 0x02800000; $^H{reflags_charset} = 4; $/ = undef(); }' \
-e '
sub f {
$x1 = $1;
$x2 = $2;
#
# [
if ( length( $x1 ) > 3 ) {
pos( $x1 ) = length( $x1 ) % 3;
$x1 =~ s/\G.{3}/ ( pos( $x1 ) != 0 ? "," : "" ).${&}; /gse;
};
# ]
#
# Would work but inefficient:
# [
# $x1 =~ s/(?<=\d)(?=(\d+))/ ( length( $1 ) % 3 != 0 ? "" : "," ); /ge;
# ]
# ,
# [
# $x1 =~ s/(?<=\d)(?=(?:\d{3})+(?!\d))/,/g;
# ]
#
"${x1}${x2}";
};
s/(?<![\w#&)*,.\/:;=-@\[-\]`{-}])([0-9]+)(\.[0-9]+)?(?![\w#\$&(*-\/<=@\[-\]`{-}]|\.[^\W0-9])/ f(); /geu;
' \
"$@";
]
[ Explanation Needed ]
Test case:
(Console Log (Unix) )
[
>
{ nf <<EOF
0.000000
10.000000
100.000000
1000.000000
10000.000000
100000.000000
1000000.000000
10000000.000000
100000000.000000
1000000000.000000
10000000000.000000
100000000000.000000
1000000000000.000000
EOF
} | nf; # Verified idempotence.
0.000000
10.000000
100.000000
1,000.000000
10,000.000000
100,000.000000
1,000,000.000000
10,000,000.000000
100,000,000.000000
1,000,000,000.000000
10,000,000,000.000000
100,000,000,000.000000
1,000,000,000,000.000000
]
[ Alternatively: Try the full text of this message. ]
See also:
|*| "perlrun" - how to execute the Perl interpreter # "-i''[extension]''": https://perldoc.perl.org/perlrun#-i%5Bextension%5D
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.