I am parsing the /proc/pid/cmdline value for a number of processes on my Linux system (Ubuntu 16.04) and have found that while most of the entries are null-encoded, as expected, at least one uses spaces for delimiters which I find unexpected.
From the documentation for proc(5) I don’t see any indication that this should be happening. Are there any cases where I should expect spaces as delimiters instead of null values? If so, where can I find documentation that describes the behavior?
Behavior
This is what I see when I try to cat the cmdline for one of the chromium-browser processes (note the space character is used to delimit the values):
user@host:~$ cat /proc/2721/cmdline
/usr/lib/chromium-browser/chromium-browser --type=gpu-process --field-trial-handle=2073283832741738928,4790986738309707242,131072 --gpu-preferences=GAAAAAAAAAAAAQAAAQAAAAAAAAAAAGAA --gpu-vendor-id=0x15ad --gpu-device-id=0x0405 --gpu-driver-vendor=Mesa --gpu-driver-version=17.2.8 --gpu-driver-date --service-request-channel-token=3778166CAD6E96F44A7268DF1AB1DD53
I would expect to see something like this (null values as delimiter), which is what I do see from other processes on the system:
~$ cat /proc/354/cmdline
vmware-vmblock-fuse/run/vmblock-fuse-orw,subtype=vmware-vmblock,default_permissions,allow_other,dev,suid
Answers:
Method 1
at least one uses spaces for delimiters
Incorrect.
If you look at the end of the pseudo-file on FreeBSD/TrueOS, where you can encounter exactly the same behaviour with Chromium, you will find a ␀. This is ␀-terminated. It is all one single argument.
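To make the "one single argument" point concrete from the parsing side, here is a minimal C++ sketch of a reader that treats ␀ as the only terminator. The names split_cmdline() and read_cmdline() are hypothetical helpers written for this illustration; they do not come from proc(5) or from any library.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Split a raw cmdline buffer into arguments. Each argument ends at a NUL
// byte; spaces are ordinary characters and never separate arguments.
// A trailing fragment without a terminating NUL is kept as a last argument.
std::vector<std::string> split_cmdline(const std::string& raw) {
    std::vector<std::string> args;
    std::string::size_type start = 0;
    while (start < raw.size()) {
        std::string::size_type end = raw.find('\0', start);
        if (end == std::string::npos) end = raw.size();
        args.emplace_back(raw, start, end - start);
        start = end + 1;
    }
    return args;
}

// Read a whole pseudo-file in binary mode so that NUL bytes are preserved.
// (Hypothetical helper; returns an empty string if the file cannot be read.)
std::string read_cmdline(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling split_cmdline(read_cmdline("/proc/2721/cmdline")) on a process like the Chromium one in the question should, on this reading, give a single long element containing spaces rather than one element per option.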
Chromium is overwriting its arguments after a fork(), to give you something interesting to look at in the output of ps. It is using the setproctitle() library function. This is part of the BSD C libraries. It is not part of the GNU C library. On GNU C platforms, Chromium uses a setproctitle() of its own that overwrites the argv data directly.
setproctitle() is not in fact the right tool for this job, because it does not allow for setting more than one argument string. It sets the formatted “title” as the 0th argument and sets the argument count to 1. Everything is marshalled through the library function as one single argument.
This is not the only problem with setproctitle(). The FreeBSD/OpenBSD/NetBSD C library version also has an arbitrary 2KiB limit, inherited straight from the old BSD sendmail program (from which the library function was originally lifted in the FreeBSD case), which is far too short for the command lines that Chromium often sets. And both Chromium’s own version and the FreeBSD/OpenBSD/NetBSD C library version accept a null pointer as the format string, extra functionality that Chromium does not use (but, ironically, has to deal with in its own setproctitle() implementation nonetheless).
One can do a lot better with less code. The underlying system call on FreeBSD/TrueOS that the library function calls to do the work once it has constructed the argument data, is the sysctl() function, taking CTL_KERN, KERN_PROC, KERN_PROC_ARGS, and a process ID as the address. This can accept multiple ␀-terminated strings. I wrote a fairly simple setprocargv() function for my toolsets that employs this.
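#include <string>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <unistd.h>

extern
void
setprocargv (
	size_t argc,
	const char * argv[]
) {
#if defined(__FreeBSD__) || defined(__DragonFly__)
	// Concatenate the arguments, keeping a NUL terminator after each one.
	std::string s;
	for (size_t c(0); c < argc; ++c) {
		if (!argv[c]) break;
		s += argv[c];
		s += '\0';
	}
	// Hand the whole block to the kernel as this process's new argument data.
	const int oid[4] = { CTL_KERN, KERN_PROC, KERN_PROC_ARGS, getpid() };
	sysctl(oid, sizeof oid/sizeof *oid, 0, 0, s.data(), s.length());
#elif defined(__OpenBSD__) …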
(OpenBSD/NetBSD do things the old way that FreeBSD/TrueOS used to, with a ps_strings structure in application memory, but it is still sysctl() that is the underlying system call used, to find the location of that structure.)
% /package/admin/nosh/command/exec foreground pause \; true &
[1] 30318
% hexdump -C /proc/30318/cmdline
00000000  66 6f 72 65 67 72 6f 75  6e 64 00 70 61 75 73 65  |foreground.pause|
00000010  00 3b 00 74 72 75 65 00                           |.;.true.|
00000018
% hexdump -C /proc/30319/cmdline
00000000  70 61 75 73 65 00                                 |pause.|
00000006
%
Because setproctitle() is the wrong tool for the job, Chromium is taking the new argv members and constructing a single long ␠-delimited string of them, to be passed as a single argument to setproctitle().
  for (size_t i = 1; i < command_line->argv().size(); ++i) {
    if (!title.empty())
      title += " ";
    title += command_line->argv()[i];
  }
  // Disable prepending argv[0] with '-' if we prepended it ourselves above.
  setproctitle(have_argv0 ? "-%s" : "%s", title.c_str());
As you can see, Chromium itself already has the new argument vector as a series of ␀-terminated strings. It is passing it through an intermediate library layer that needs them all bunched up into one string, where the actual system call level nonetheless operates in terms of an argument vector of ␀-terminated strings.
Hence the behaviour that you are witnessing, where Chromium is presenting its altered argument vectors to the system as one single argument.
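The difference between the two layouts can be sketched in a few lines of C++. This is a simplified illustration, not Chromium's code: pack_as_vector(), pack_as_title() and count_arguments() are hypothetical names, and the sketch ignores details such as the argv[0] handling and the "-%s" format seen above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Layout where every argument keeps its own NUL terminator, which is
// what a setprocargv()-style call hands to the system.
std::string pack_as_vector(const std::vector<std::string>& argv) {
    std::string block;
    for (const std::string& arg : argv) {
        block += arg;
        block += '\0';
    }
    return block;
}

// Layout where the arguments are first joined with spaces and the result
// is stored as one NUL-terminated title, as a setproctitle()-style call does.
std::string pack_as_title(const std::vector<std::string>& argv) {
    std::string title;
    for (const std::string& arg : argv) {
        if (!title.empty()) title += ' ';
        title += arg;
    }
    title += '\0';
    return title;
}

// Number of arguments that a reader splitting on NUL would see.
std::size_t count_arguments(const std::string& block) {
    return static_cast<std::size_t>(
        std::count(block.begin(), block.end(), '\0'));
}
```

Under these assumptions, the same three-element vector counts as three arguments in the first layout and as one argument in the second, which matches what the cmdline pseudo-file shows for Chromium.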
Perhaps you could persuade the writers of Chromium to adopt something like setprocargv(). ☺
Further reading
- Peter Wemm (1995-12-16). setproctitle. FreeBSD Library Functions Manual. FreeBSD.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.