According to man perlrun:
-0[octal/hexadecimal]
specifies the input record separator ($/) as an octal or
hexadecimal number. If there are no digits, the null character is
the separator.
and
The special value 00 will cause Perl to slurp files in paragraph mode. Any value 0400 or above will cause Perl to slurp files whole, but by convention the value 0777 is the one normally used for this purpose.
However, given this input file:
This is paragraph one.

This is paragraph two.
I get some unexpected results:
$ perl -0ne 'print; exit' file ## \0 is used, so everything is printed
This is paragraph one.

This is paragraph two.
$ perl -00ne 'print; exit' file ## Paragraph mode, as expected
This is paragraph one.
So far, so good. Now, why do these two seem to also work in paragraph mode?
$ perl -000ne 'print; exit' file
This is paragraph one.
$ perl -0000ne 'print; exit' file
This is paragraph one.
And why is this one apparently slurping the entire file again?
$ perl -00000ne 'print; exit' file
This is paragraph one.

This is paragraph two.
Further testing shows that these all seem to work in paragraph mode:
perl -000
perl -0000
perl -000000
perl -0000000
perl -00000000
While these seem to slurp the file whole:
perl -00000
perl -000000000
I guess my problem is that I don’t understand octal well enough (at all, really); I am a biologist, not a programmer. Do the latter two slurp the file whole because both 0000 and 00000000 are >= 0400? Or is there something completely different going on?
Answers:
Method 1
Octal is just like decimal in that 0 == 0, 0000 == 0, 000000 == 0, etc. The fact that the switch here is -0 may make things a little confusing: I would presume the point about “the special value 00” means one 0 for the switch and one for the value, and adding more zeros does not change the latter, so you get the same thing…
Up to a point. The behavior of 000000 etc. is kind of bug-like, but keep in mind that this is supposed to refer to a single 8-bit value. The range of 8 bits is 0-255 in decimal and 0-377 in octal, so you can’t meaningfully use more than 3 digits here (the slurp values such as 0400 and 0777 are outside that range, but they are still only 3 digits + the switch). You are perhaps meant to just infer this from:
You can also specify the separator character using hexadecimal notation: -0xHHH…, where the H are valid hexadecimal digits. Unlike the octal form, this one may be used to specify any Unicode character, even those beyond 0xFF.
0xFF hex == 255 decimal == 377 octal == the maximum for 8 bits, the size of one byte and of one character in the (extended) ASCII set.
Method 2
Let’s look into the Perl source for more details. In perl.c:
case '0':
{
    I32 flags = 0;
    STRLEN numlen;
    SvREFCNT_dec(PL_rs);
    if (s[1] == 'x' && s[2]) {
        const char *e = s+=2;
        U8 *tmps;
        while (*e)
            e++;
        numlen = e - s;
        flags = PERL_SCAN_SILENT_ILLDIGIT;
        rschar = (U32)grok_hex(s, &numlen, &flags, NULL);
        if (s + numlen < e) {
            rschar = 0; /* Grandfather -0xFOO as -0 -xFOO. */
            numlen = 0;
            s--;
        }
        PL_rs = newSVpvs("");
        SvGROW(PL_rs, (STRLEN)(UNISKIP(rschar) + 1));
        tmps = (U8*)SvPVX(PL_rs);
        uvchr_to_utf8(tmps, rschar);
        SvCUR_set(PL_rs, UNISKIP(rschar));
        SvUTF8_on(PL_rs);
    }
    else {
        numlen = 4;
        rschar = (U32)grok_oct(s, &numlen, &flags, NULL);
        if (rschar & ~((U8)~0))
            PL_rs = &PL_sv_undef;
        else if (!rschar && numlen >= 2)
            PL_rs = newSVpvs("");
        else {
            char ch = (char)rschar;
            PL_rs = newSVpvn(&ch, 1);
        }
    }
    sv_setsv(get_sv("/", GV_ADD), PL_rs);
    return s + numlen;
}
grok_oct converts a string representing an octal number to numeric form. It stops as soon as it reaches a character that is not a valid octal digit, and because numlen is set to 4 before the call, it reads at most 4 characters (see the for loop in its implementation in numeric.c). Note that s still points at the switch’s own 0 here, so those 4 characters include it.
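So for -00000, Perl first parses 0000 (the switch’s 0 plus three more), which sets $/ to "" (paragraph mode). The remaining 0 is then treated as a new -0 switch with no digits, which sets $/ to "\0", the null character. A text file normally contains no null bytes, so the whole file is read as a single record, which looks like slurping. You can see this re-parsing in: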
$ perl -MO=Deparse -00000777ne 'print; exit' file
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
    print $_;
    exit;
}
-e syntax OK
$/ was set to undef because the last octal sequence Perl parsed was 0777, which is 0400 or above.
More clearly:
$ perl -MO=Deparse -00000x1FF -ne 'print; exit' file
BEGIN { $/ = "\x{1ff}"; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
    print $_;
    exit;
}
-e syntax OK
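Here the first 0000 is consumed as an octal value, and the remaining 0x1FF is parsed as a new -0 switch in hexadecimal form, so $/ ends up as the single character U+01FF.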
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.