Task
I need to unambiguously and without “holistic” guessing find the peer network interface of a veth end in another network namespace.
Theory vs. Reality
Although a lot of documentation, and also answers here on SO, assume that the ifindex indices of network interfaces are globally unique per host across network namespaces, this doesn’t hold in many cases: ifindex/iflink are ambiguous. Even the loopback interface shows the contrary, having an ifindex of 1 in every network namespace. Also, depending on the container environment, ifindex numbers get reused in different namespaces. This makes tracing veth wiring a nightmare, especially with lots of containers and a host bridge with veth peers all ending in @if3 or so…
Example: link-netnsid is 0
Spin up a Docker container instance, just to get a new veth pair connecting from the host network namespace to the new container network namespace…
$ sudo docker run -it debian /bin/bash
Now, in the host network namespace list the network interfaces (I’ve left out those interfaces that are of no interest to this question):
$ ip link show
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
...
4: docker0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:34:23:81:f0 brd ff:ff:ff:ff:ff:ff
...
16: veth…@if15: mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether da:4c:f7:50:09:e2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
As you can see, the iflink is unambiguous, but the link-netnsid is 0, despite the peer end sitting in a different network namespace.
For reference, check the netnsid in the unnamed network namespace of the container:
$ sudo lsns -t net
NS TYPE NPROCS PID USER COMMAND
...
...
4026532469 net 1 29616 root /bin/bash
$ sudo nsenter -t 29616 -n ip link show
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
15: eth0@if16: mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
So, for both veth ends, ip link show (and RTNETLINK, fwiw) tells us they’re in the same network namespace with netnsid 0. This is either wrong, or correct under the assumption that link-netnsids are local as opposed to global. I could not find any documentation that makes explicit what scope link-netnsids are supposed to have.
/sys/class/net/... NOT to the Rescue?
I’ve looked into /sys/class/net/if/… but can only find the ifindex and iflink elements; these are well documented. “ip link show” also only seems to show the peer ifindex in form of the (in)famous “@if#” notation. Or did I miss some additional network namespace element?
Bottom Line/Question
Are there any syscalls that allow retrieving the missing network namespace information for the peer end of a veth pair?
Answers:
Method 1
Here’s the method I followed to understand this problem. The available tools appear usable (with some convolution) for the namespace part, and (UPDATED) /sys/ easily gives the peer’s index. It’s quite long, so bear with me. It’s in two parts (not in the logical order, but covering the namespace first helps explain the index naming), using common tools rather than any custom program:
- Network namespace
- Interface index
Network namespace
This information is available as the link-netnsid property in the output of ip link and can be matched with the id in the output of ip netns. It’s possible to “associate” a container’s network namespace with ip netns, thus using ip netns as a specialized tool. Of course, writing a specific program for this would be better (some information about syscalls is at the end of each part).
About the nsid’s description, here’s what man ip netns tells (emphasis mine):
ip netns set NAME NETNSID – assign an id to a peer network namespace
This command assigns a id to a peer network namespace. This id is valid only in the current network namespace. This id will be used by
the kernel in some netlink messages. If no id is assigned when the
kernel needs it, it will be automatically assigned by the kernel. Once
it is assigned, it’s not possible to change it.
While creating a namespace with ip netns won’t immediately create a netnsid, one will be created (in the current namespace, probably the “host”) whenever a veth half is moved to another namespace. So it’s always set for a typical container.
Here’s an example using an LXC container:
# lxc-start -n stretch-amd64
A new veth link veth9RPX4M appeared (this can be tracked with ip monitor link). Here are the details:
# ip -o link show veth9RPX4M
44: veth9RPX4M@if43: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master lxcbr0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 link/ether fe:25:13:8a:00:f8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
This link has the property link-netnsid 4, telling the other side is in the network namespace with nsid 4. How to verify it’s the LXC container? The easiest way to get this information is making ip netns believe it created the container’s network namespace, by doing the operations hinted in the manpage.
# mkdir -p /var/run/netns
# touch /var/run/netns/stretch-amd64
# mount -o bind /proc/$(lxc-info -H -p -n stretch-amd64)/ns/net /var/run/netns/stretch-amd64
UPDATE3: I hadn’t understood that finding the globally unique name (the namespace inode) was the problem. Here it is:
# ls -l /proc/$(lxc-info -H -p -n stretch-amd64)/ns/net
lrwxrwxrwx. 1 root root 0 mai 5 20:40 /proc/17855/ns/net -> net:[4026532831]
# stat -c %i /var/run/netns/stretch-amd64
4026532831
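The ls -l and stat lookups above can also be done from a program. This is only a minimal Python sketch, assuming a Linux /proc; the helper names parse_ns_link and netns_inode are made up for illustration and are not part of any library:

```python
import os
import re


def parse_ns_link(target):
    """Extract the inode number from a link target such as 'net:[4026532831]'."""
    match = re.fullmatch(r"net:\[(\d+)\]", target)
    if match is None:
        raise ValueError("not a network namespace link: {!r}".format(target))
    return int(match.group(1))


def netns_inode(pid="self"):
    """Return the network namespace inode of a process.

    Reading /proc/<pid>/ns/net of other users' processes usually needs root.
    """
    return parse_ns_link(os.readlink("/proc/{}/ns/net".format(pid)))


# The link target shown in the listing above:
print(parse_ns_link("net:[4026532831]"))
```

The inode is the only identifier here that is comparable across namespaces, which is why it is a better key than a namespace name or an nsid.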
Now the information is retrieved with:
# ip netns | grep stretch-amd64
stretch-amd64 (id: 4)
It confirms the veth’s peer is in the network namespace with the same nsid = 4 = link-netnsid.
The container/ip netns “association” can be removed (without removing the namespace as long as the container is running):
# ip netns del stretch-amd64
Note: the nsid naming is per network namespace, usually starts with 0 for the first container, and the lowest value available is recycled with new namespaces.
About using syscalls, here is some information guessed from strace:
- for the link part: it requires an AF_NETLINK socket (opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)), asking (sendmsg()) for the link’s information with a message of type RTM_GETLINK and retrieving (recvmsg()) the reply with message type RTM_NEWLINK.
- for the netns nsid part: same method, the query message is of type RTM_GETNSID with reply type RTM_NEWNSID.
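To make the link part of that exchange more concrete, here is a rough Python sketch of building an RTM_GETLINK dump request and picking the peer index and link-netnsid out of an RTM_NEWLINK reply. The numeric constants are taken from the Linux UAPI headers; the function names are invented for this illustration, and dump_links is a Linux-only outline without error handling, not a robust netlink client:

```python
import struct

NLMSG_ERROR, NLMSG_DONE = 2, 3                        # <linux/netlink.h>
NLM_F_REQUEST, NLM_F_DUMP = 0x1, 0x300                # <linux/netlink.h>
RTM_NEWLINK, RTM_GETLINK = 16, 18                     # <linux/rtnetlink.h>
IFLA_IFNAME, IFLA_LINK, IFLA_LINK_NETNSID = 3, 5, 37  # <linux/if_link.h>

NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # family, (pad), type, index, flags, change
RTATTR = struct.Struct("=HH")         # len, type


def build_getlink_dump(seq=1):
    """Build an RTM_GETLINK request dumping all links of the current namespace."""
    body = IFINFOMSG.pack(0, 0, 0, 0, 0)  # AF_UNSPEC, no filtering
    header = NLMSGHDR.pack(NLMSGHDR.size + len(body), RTM_GETLINK,
                           NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + body


def parse_newlink(payload):
    """Parse one RTM_NEWLINK payload (ifinfomsg followed by attributes)."""
    _family, _type, index, _flags, _change = IFINFOMSG.unpack_from(payload, 0)
    info = {"index": index, "name": None, "link": None, "link_netnsid": None}
    offset = IFINFOMSG.size
    while offset + RTATTR.size <= len(payload):
        rta_len, rta_type = RTATTR.unpack_from(payload, offset)
        if rta_len < RTATTR.size:
            break  # malformed attribute, stop parsing
        data = payload[offset + RTATTR.size:offset + rta_len]
        if rta_type == IFLA_IFNAME:
            info["name"] = data.rstrip(b"\0").decode()
        elif rta_type == IFLA_LINK:
            info["link"] = struct.unpack("=i", data)[0]
        elif rta_type == IFLA_LINK_NETNSID:
            info["link_netnsid"] = struct.unpack("=i", data)[0]
        offset += (rta_len + 3) & ~3  # attributes are padded to 4 bytes
    return info


def dump_links():
    """Yield parsed links from the kernel (Linux only, minimal error handling)."""
    import socket
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, 0))
        sock.send(build_getlink_dump())
        done = False
        while not done:
            buf = sock.recv(65536)
            offset = 0
            while offset + NLMSGHDR.size <= len(buf):
                length, msg_type, _f, _s, _p = NLMSGHDR.unpack_from(buf, offset)
                if length < NLMSGHDR.size or msg_type in (NLMSG_DONE, NLMSG_ERROR):
                    done = True
                    break
                if msg_type == RTM_NEWLINK:
                    yield parse_newlink(buf[offset + NLMSGHDR.size:offset + length])
                offset += (length + 3) & ~3
```

Calling list(dump_links()) inside a namespace should give one dictionary per interface; "link_netnsid" stays None when the kernel did not include that attribute.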
I think the slightly higher level libraries to handle this are there: libnl. Anyway it’s a topic for SO.
Interface index
Now it will be easier to follow why the indexes appear to behave randomly. Let’s do an experiment:
First enter a new net namespace to have a clean (index) slate:
# ip netns add test
# ip netns exec test bash
# ip netns id
test
# ip -o link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
As OP noted, lo begins with index 1.
Let’s add 5 net namespaces, create veth pairs, then put a veth end on them:
# for i in {0..4}; do ip netns add test$i; ip link add type veth peer netns test$i ; done
# ip -o link|sed 's/^/ /'
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether e2:83:4f:60:5a:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3: veth1@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 22:a7:75:8e:3c:95 brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: veth2@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 72:94:6e:e4:2c:fc brd ff:ff:ff:ff:ff:ff link-netnsid 2
5: veth3@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ee:b5:96:63:62:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
6: veth4@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether e2:7d:e2:9a:3f:6d brd ff:ff:ff:ff:ff:ff link-netnsid 4
When it displays @if2 for each of them, it becomes quite clear that this is the interface index in the peer’s namespace, and that indexes are not global but per namespace. When it displays an actual interface name, it’s a relation to an interface in the same namespace (be it the veth’s peer, a bridge, a bond …). So why doesn’t veth0 have a peer displayed? I believe it’s an ip link bug when the peer index is the same as the interface’s own index. Just moving the peer link twice “solves” it here, because it forces an index change. I’m also sure ip link sometimes gets confused in other ways and, instead of displaying @ifXX, displays an interface in the current namespace with the same index.
# ip -n test0 link set veth0 name veth0b netns test
# ip link set veth0b netns test0
# ip -o link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0@if7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether e2:83:4f:60:5a:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3: veth1@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 22:a7:75:8e:3c:95 brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: veth2@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 72:94:6e:e4:2c:fc brd ff:ff:ff:ff:ff:ff link-netnsid 2
5: veth3@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ee:b5:96:63:62:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
6: veth4@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether e2:7d:e2:9a:3f:6d brd ff:ff:ff:ff:ff:ff link-netnsid 4
UPDATE: reading the information in OP’s question again, the peer’s index (but not the nsid) is easily and unambiguously available with cat /sys/class/net/<interface>/iflink.
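The same sysfs lookup is easy to script. A small Python sketch follows; the function names and the sysfs_root parameter are my own additions for illustration (the parameter only exists so the code can be pointed at a different mount or a test directory). Keep in mind that /sys reflects the network namespace it was mounted for, which is why ip netns exec remounts it:

```python
import os


def read_link_indexes(ifname, sysfs_root="/sys/class/net"):
    """Return (ifindex, iflink) of one interface as exposed under sysfs_root."""
    def read_int(attr):
        with open(os.path.join(sysfs_root, ifname, attr)) as f:
            return int(f.read().strip())
    return read_int("ifindex"), read_int("iflink")


def list_link_indexes(sysfs_root="/sys/class/net"):
    """Return {ifname: (ifindex, iflink)} for every interface under sysfs_root."""
    result = {}
    for name in sorted(os.listdir(sysfs_root)):
        try:
            result[name] = read_link_indexes(name, sysfs_root)
        except (OSError, ValueError):
            continue  # not an interface directory (e.g. bonding_masters)
    return result
```

An interface whose iflink differs from its ifindex points at some other interface; for a veth end that is the peer’s index, still without any hint about the namespace it lives in.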
UPDATE2:
All those iflink 2 may appear ambiguous, but what is unique is the combination of nsid and iflink, not iflink alone. For the above example that is:
interface  nsid:iflink
veth0      0:7
veth1      1:2
veth2      2:2
veth3      3:2
veth4      4:2
In this namespace (namely namespace test) there will never be two identical nsid:iflink pairs.
If one was to look from each peer network the opposite information:
namespace  interface  nsid:iflink
test0      veth0      0:2
test1      veth0      0:3
test2      veth0      0:4
test3      veth0      0:5
test4      veth0      0:6
But bear in mind that each 0: there is a separate 0 that merely happens to map to the same peer namespace (namely namespace test, not even the host). They can’t be directly compared because each is tied to its own namespace. So the whole comparable and unique information should be:
test0:0:2 test1:0:3 test2:0:4 test3:0:5 test4:0:6
Once it’s confirmed that “test0:0” == “test1:0” etc. (true in this example, all map to the net namespace called test by ip netns) then they can be really compared.
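The comparison described here can be sketched in Python as follows. This is only an illustration under assumptions: namespaces are identified by the ip netns names of the example (a real tool would rather use namespace inode numbers), the per-namespace nsid maps are taken as already known, and the function name and data layout are invented for this sketch:

```python
def pair_veth_ends(links, nsid_maps):
    """Match veth ends across namespaces.

    links:     {namespace: {ifindex: (ifname, link_netnsid, iflink)}}
    nsid_maps: {namespace: {nsid: peer_namespace}}, each nsid being valid
               only inside the namespace that owns the map
    Returns a set of frozensets holding the two (namespace, ifname) ends.
    """
    pairs = set()
    for ns, table in links.items():
        for ifindex, (ifname, nsid, iflink) in table.items():
            # No nsid means the iflink refers to the same namespace.
            peer_ns = ns if nsid is None else nsid_maps.get(ns, {}).get(nsid)
            peer = links.get(peer_ns, {}).get(iflink)
            if peer is None or (peer_ns, iflink) == (ns, ifindex):
                continue  # unknown peer, or the interface points at itself
            peer_name, peer_nsid, peer_iflink = peer
            back_ns = (peer_ns if peer_nsid is None
                       else nsid_maps.get(peer_ns, {}).get(peer_nsid))
            # Accept the pair only if the peer points back at us.
            if back_ns == ns and peer_iflink == ifindex:
                pairs.add(frozenset([(ns, ifname), (peer_ns, peer_name)]))
    return pairs


# Data loosely modelled on the experiment above (only two peers shown).
links = {
    "test":  {1: ("lo", None, 1), 2: ("veth0", 0, 7), 3: ("veth1", 1, 2)},
    "test0": {1: ("lo", None, 1), 7: ("veth0", 0, 2)},
    "test1": {1: ("lo", None, 1), 2: ("veth0", 0, 3)},
}
nsid_maps = {
    "test":  {0: "test0", 1: "test1"},
    "test0": {0: "test"},
    "test1": {0: "test"},
}
for pair in sorted(sorted(p) for p in pair_veth_ends(links, nsid_maps)):
    print(pair)
```

Requiring the peer to point back guards against stale or mismatched data; lo is skipped because its iflink is its own index.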
About syscalls, still looking at strace results, the information is retrieved as above from RTM_GETLINK. Now all the information should be available:
local: interface index with SIOCGIFINDEX / if_nametoindex
peer: both nsid and interface index with RTM_GETLINK.
All this should probably be used with libnl.
Method 2
Many thanks to @A.B who filled in some missing pieces for me, especially regarding the semantics of netnsids. His PoC is very instructive. However, the crucial missing piece in his PoC is how to correlate a local netnsid to its globally unique network namespace inode number, because only then we can unambiguously connect the correct corresponding veth pairs.
To summarize and give a small Python example how to gather the information programmatically without having to rely on ip netns and its need to mount things: RTNETLINK actually returns the netnsid when querying for network interfaces. It’s the IFLA_LINK_NETNSID attribute, which only appears in a link’s info when needed. If it’s not there, then it isn’t needed — and we must assume that the peer index refers to a namespace-local network interface.
The important lesson to take home is that a netnsid/IFLA_LINK_NETNSID is only locally defined within the network namespace where you got it when asking RTNETLINK for link information. A netnsid with the same value obtained in a different network namespace might identify a different peer namespace, so be careful not to use the netnsid outside its namespace. But which uniquely identifiable network namespace (inode number) maps to which netnsid?
As it turns out, a very recent version of lsns (as of March 2018) is well capable of showing the correct netnsid next to its network namespace inode number! So there is a way to map local netnsids to namespace inodes, but it actually works backwards! And it’s more an oracle (with a lowercase ell) than a lookup: RTM_GETNSID needs a network namespace identifier, either as a PID or as an FD (to the network namespace), and then returns the netnsid. See https://stackoverflow.com/questions/50196902/retrieving-the-netnsid-of-a-network-namespace-in-python for an example of how to ask the Linux network namespace oracle.
In consequence, you need to (1) enumerate the available network namespaces (via /proc and/or /var/run/netns), then (2) for a given veth network interface, attach to the network namespace where you found it, (3) ask for the netnsids of all the network namespaces you enumerated at the beginning (because you never know beforehand which is which), and finally (4) map the netnsid of the veth peer to the namespace inode number using the local map you created in step 3 after attaching to the veth’s namespace.
import os
import psutil
import pyroute2
from pyroute2.netlink import rtnl, NLM_F_REQUEST
from pyroute2.netlink.rtnl import nsidmsg
from nsenter import Namespace

# NETNSA_NSID value meaning "no nsid assigned" (-1 read as unsigned 32 bit)
NSID_NOT_ASSIGNED = 2**32 - 1

# phase I: gather network namespaces from /proc/[0-9]*/ns/net,
# keyed by their globally unique inode number
netns = dict()
for proc in psutil.process_iter():
    netnsref = '/proc/{}/ns/net'.format(proc.pid)
    try:
        netns_inode = os.stat(netnsref).st_ino
    except OSError:
        # process vanished in the meantime, or we lack permission
        continue
    if netns_inode not in netns:
        netns[netns_inode] = netnsref

# phase II: ask kernel "oracle" about the local IDs for the
# network namespaces we've discovered in phase I, doing this
# from all discovered network namespaces
for inode, ref in netns.items():
    with Namespace(ref, 'net'):
        print('inside net:[{}]...'.format(inode))
        # the netlink socket must be opened inside the namespace
        ipr = pyroute2.IPRoute()
        for peer_inode, peer_ref in netns.items():
            with open(peer_ref, 'r') as netnsf:
                req = nsidmsg.nsidmsg()
                req['attrs'] = [('NETNSA_FD', netnsf.fileno())]
                resp = ipr.nlm_request(req, rtnl.RTM_GETNSID, NLM_F_REQUEST)
                local_nsid = dict(resp[0]['attrs'])['NETNSA_NSID']
            if local_nsid != NSID_NOT_ASSIGNED:
                print('  net:[{}] <--> nsid {}'.format(peer_inode, local_nsid))
        ipr.close()
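The script above only prints the oracle’s answers. To actually resolve a veth peer, the answers collected inside one namespace have to be turned into a lookup table. A minimal sketch of that last step, with invented helper names and made-up inode numbers in the usage example:

```python
# Sentinel returned when no nsid is assigned; this is the 2**32-1
# value the script above tests for.
NSID_NOT_ASSIGNED = 0xFFFFFFFF


def build_nsid_map(answers):
    """Turn (namespace inode, raw nsid) answers into {nsid: inode}.

    All answers must have been gathered inside ONE network namespace,
    because nsids are only meaningful there.
    """
    nsid_to_inode = {}
    for inode, raw_nsid in answers:
        if raw_nsid == NSID_NOT_ASSIGNED:
            continue  # that namespace has no nsid as seen from here
        nsid_to_inode[raw_nsid] = inode
    return nsid_to_inode


def peer_namespace(own_inode, link_netnsid, nsid_to_inode):
    """Resolve the namespace inode of a veth peer.

    A missing link-netnsid (None) means the peer is namespace-local.
    Returns None if the nsid is unknown to the map.
    """
    if link_netnsid is None:
        return own_inode
    return nsid_to_inode.get(link_netnsid)


# Hypothetical answers gathered inside namespace 4026531957:
nsid_map = build_nsid_map([(4026531957, 0xFFFFFFFF), (4026532831, 4)])
print(peer_namespace(4026531957, 4, nsid_map))
```

One such map is needed per namespace; reusing the map of one namespace for links found in another would silently resolve to the wrong peers.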
Method 3
I created a simple script that lists all containers with associated veth interface: https://github.com/samos123/docker-veth/blob/master/docker-veth.sh
Let me explain how it works:
- Find the PID of the container:
pid=$(docker inspect --format '{{.State.Pid}}' $containerID)
- Enter the network namespace using nsenter:
nsenter -t $pid -n ip a
You will notice that there is an eth0@ifX interface inside the container network namespace. The X tells you the interface index on the host network. This index can then be used to figure out which veth belongs to the container.
Run the following commands to find the veth interface:
ifindex=$(nsenter -t $pid -n ip link | sed -n -e 's/.*eth0@if\([0-9]*\):.*/\1/p')
veth=$(ip -o link | grep "^$ifindex:" | sed -n -e 's/.*\(veth[[:alnum:]]*@if[[:digit:]]*\).*/\1/p')
echo $veth
Blog post with more details: http://samos-it.com/posts/enter-namespace-of-other-containers-from-a-pod.html
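For comparison, the same index matching can be sketched in Python by parsing ip link output captured in the container and on the host. Like the shell version, it simply assumes that the peer lives in the host namespace, which only link-netnsid can really confirm. The function names and the sample interface name vethabc123 are made up for illustration:

```python
import re

# "16: vethabc123@if15: <...> ..." -> index, name, optional peer after '@'
LINK_RE = re.compile(r"^(\d+):\s+([^:@\s]+)(?:@(\S+?))?:\s")


def parse_ip_link(output):
    """Parse `ip link` / `ip -o link` output into {ifindex: (name, peer)}.

    peer is the text after '@' (e.g. 'if15' or a local interface name),
    or None when no peer is shown.
    """
    links = {}
    for line in output.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            links[int(match.group(1))] = (match.group(2), match.group(3))
    return links


def host_veth_for(container_output, host_output, container_ifname="eth0"):
    """Return the host interface whose index the container interface names."""
    host_links = parse_ip_link(host_output)
    for name, peer in parse_ip_link(container_output).values():
        peer_index = re.fullmatch(r"if(\d+)", peer or "")
        if name == container_ifname and peer_index:
            return host_links.get(int(peer_index.group(1)), (None, None))[0]
    return None


# Hypothetical captured output, shortened:
container = ("1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536\n"
             "15: eth0@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500")
host = ("4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500\n"
        "16: vethabc123@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500")
print(host_veth_for(container, host))
```

Matching on the parsed integer index avoids the prefix problem of a plain text match, where index 16 would also match 160.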
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.