I’m writing a library for manipulating Unix path strings. That being the case, I need to understand a few obscure corners of the syntax that most people wouldn’t worry about.
For example, as best as I can tell, it seems that foo/bar and foo//bar both point to the same place.
Also, ~ usually stands for the user’s home directory, but what if it appears in the middle of a path? What happens then?
These and several dozen other obscure questions need answering if I’m going to write code which handles every possible case correctly. Does anybody know of a definitive reference which explains the exact syntax rules for this stuff?
(Unfortunately, searching for terms like “Unix path syntax” just turns up a million pages discussing the $PATH variable… Heck, I’m even struggling to find suitable tags for this question!)
Answers:
Method 1
There are three types of paths:
- relative paths like foo, foo/bar, ../a, or a lone dot (.). They don’t start with / and are relative to the current directory of the process making a system call with that path.
- absolute paths like /, /foo/bar or ///x. They start with one, or three or more, slashes. They are not relative and are looked up starting from the / root directory.
- paths starting with exactly two slashes. POSIX allows //foo to be treated specially, but doesn’t specify how. Some systems use that for special cases like network files. It has to be exactly 2 slashes.
Other than at the start, sequences of slashes act like one.
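The three categories above can be told apart just by counting leading slashes. Here is a minimal sketch of that check; the function name classify_path and the labels it returns are made up for illustration, and the empty string (which the answer does not discuss) is reported separately as an assumption of this sketch.

```python
def classify_path(path: str) -> str:
    """Classify a Unix path string by its number of leading slashes."""
    if path == "":
        # Not covered by the text above; this sketch just flags it.
        return "empty"
    leading = len(path) - len(path.lstrip("/"))
    if leading == 0:
        return "relative"                # foo, foo/bar, ../a, .
    if leading == 2:
        return "implementation-defined"  # //foo: POSIX leaves the meaning open
    return "absolute"                    # /, /foo/bar, ///x


for p in ("foo", "../a", ".", "/", "/foo/bar", "///x", "//foo"):
    print(f"{p!r:12} -> {classify_path(p)}")
```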
~ is special only to the shell: the shell expands it, and it is not special to the system at all. How it is expanded depends on the shell. Shells also perform other expansions, such as globbing (*.txt) or variable expansion (/$foo/$bar). As far as the system is concerned, ~foo is just a relative path like _foo or foo.
Things to bear in mind:
- foo/ is not the same as foo. It’s closer to foo/. than foo (especially if foo is a symlink) for most system calls on most systems (foo// is the same as foo/ though).
- a/b/../c is not necessarily the same as a/c (for instance if a/b is a symlink). Best is not to treat .. specially.
- it’s generally safe to consider a/././././b the same as a/b though.
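The symlink caveat about .. is easy to reproduce. The sketch below builds a throwaway directory tree in which a/b is a symlink to x/y (all of these names are hypothetical), then compares a purely textual cleanup (os.path.normpath) with one that follows symlinks (os.path.realpath). It assumes a system where unprivileged symlink creation works, as on typical Linux setups.

```python
import os
import tempfile


def dotdot_demo():
    """Return (lexical, physical) results for a/b/../c, relative to a scratch root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)  # the temp dir itself may sit behind a symlink
        os.makedirs(os.path.join(root, "a"))
        os.makedirs(os.path.join(root, "x", "y"))
        # a/b is a symlink to x/y, so the system sees a/b/.. as x, not a
        os.symlink(os.path.join(root, "x", "y"), os.path.join(root, "a", "b"))

        path = os.path.join(root, "a", "b", "..", "c")
        lexical = os.path.normpath(path)   # cancels b/.. textually
        physical = os.path.realpath(path)  # follows the symlink before applying ..
        return os.path.relpath(lexical, root), os.path.relpath(physical, root)


print(dotdot_demo())
```

On a typical Linux system this should print ('a/c', 'x/c'): the textual cleanup and the real lookup disagree, which is why a library that only handles strings is safer leaving .. alone.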
Method 2
For example, as best as I can tell, it seems that foo/bar and foo//bar both point to the same place.
Yes. This is common because software sometimes concatenates a path assuming the first part was not terminated with a forward slash, so one is thrown in to make sure (meaning there may end up being two or more). foo///bar and foo/////bar also point to the same place as foo/bar. A nice function for a path manipulation library would be one which reduces any number of sequential slashes to one (except at the beginning of a path, where it may be used in a URL-ish way, or, as Stephane points out, for any unspecified special purpose).
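A minimal sketch of such a function follows. The name collapse_slashes is made up for illustration. It assumes the rule from the first answer: a leading run of exactly two slashes is kept as is, while any other leading run, and every run elsewhere in the path, is reduced to a single slash.

```python
import re


def collapse_slashes(path: str) -> str:
    """Reduce runs of slashes to one, preserving a leading '//' exactly."""
    rest = path.lstrip("/")
    leading = len(path) - len(rest)
    if leading == 2:
        prefix = "//"   # exactly two: meaning is implementation-defined, keep it
    elif leading >= 1:
        prefix = "/"    # one, or three or more: plain absolute path
    else:
        prefix = ""     # relative path
    return prefix + re.sub(r"/{2,}", "/", rest)


for p in ("foo//bar", "foo/////bar", "///x//y", "//host//share", "foo//"):
    print(f"{p!r:16} -> {collapse_slashes(p)!r}")
```

A trailing slash is kept (foo// becomes foo/, not foo), since foo/ and foo are not equivalent. Python’s own os.path.normpath also collapses slashes, but it additionally strips trailing slashes and cancels .. textually, so it does more than this answer asks for.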
Also, ~ usually stands for the user’s home directory
That transformation is done by the shell through tilde expansion, which only works if the tilde is the first character in the path. Whether or not you need to deal with this depends on context. If the library is to be used with normal programs which receive, e.g., command line arguments containing a path, tilde expansion is already done when they see the path. The only situation I can see it being a concern is if you are processing paths directly from a text file.
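If a library does need to expand tildes itself (for paths read from a text file, say), Python’s os.path.expanduser follows the same leading-character rule. The sketch below temporarily points HOME at a hypothetical /home/alice so the result is predictable; the helper name tilde_demo is also made up for illustration. It assumes a POSIX system, where expanduser consults HOME for a bare ~.

```python
import os


def tilde_demo(home: str = "/home/alice") -> dict:
    """Expand a few sample paths with HOME temporarily set to `home` (hypothetical)."""
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        samples = ("~", "~/notes", "foo/~/bar", "foo~")
        return {p: os.path.expanduser(p) for p in samples}
    finally:
        # Restore the caller's environment exactly as it was.
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved


for original, expanded in tilde_demo().items():
    print(f"{original!r:12} -> {expanded!r}")
```

Only the first two samples change. A tilde in the middle or at the end of a path is left alone, as it is an ordinary filename character there.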
Beyond that, ~ is a legal character in a *nix path and should not be changed to anything else. The only characters which aren’t legal in a Unix filename are / (because it is the path separator) and “null” (a.k.a. a zero byte), because a zero byte terminates the C strings that paths are passed around as.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.