How is the IFB device positioned in the packet flow of the Linux kernel?
I would like to know the exact position of the following device in the packet flow for ingress traffic shaping:
- IFB: Intermediate Functional Block
I would like to better understand how packets flow to this device, and exactly when this happens, in order to know which of the following filtering/classification methods can be used:
tc filter ... u32 ...
iptables ... -j MARK --set-mark ...
iptables ... -j CLASSIFY --set-class ...
Documentation on this topic seems hard to find, so any pointers to official documentation would be greatly appreciated as well.
Documentation as far as I know:
- tc: tldp.org HOWTO, lartc.org HOWTO
- ifb: linuxfoundation.org, tc-mirred manpage, wiki.gentoo.org
- netfilter packet flow: kernel_flow, docum.org kptd
From the known documentation I interpret the following:
Basic traffic control
figure 1
+-------+                 +------+
|ingress|   +---------+   |egress|
|qdisc  +--->netfilter+--->qdisc |
|eth0   |   +---------+   |eth0  |
+-------+                 +------+
IFB?
tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 action mirred egress redirect dev ifb0
Will this result in the following?
figure 2
+-------+   +-------+   +------+                 +------+
|ingress|   |ingress|   |egress|   +---------+   |egress|
|qdisc  +--->qdisc  +--->qdisc +--->netfilter+--->qdisc |
|eth0   |   |ifb0   |   |ifb0  |   +---------+   |eth0  |
+-------+   +-------+   +------+                 +------+
Answers:
Method 1
I think I finally understood how redirecting ingress traffic to IFB works:
+-------+   +------+                 +------+
|ingress|   |egress|   +---------+   |egress|
|qdisc  +--->qdisc +--->netfilter+--->qdisc |
|eth1   |   |ifb1  |   +---------+   |eth1  |
+-------+   +------+                 +------+
My initial assumption in figure 2, that the ifb device is inserted between the eth1 ingress and netfilter, with packets first entering the ifb1 ingress and then leaving through the ifb1 egress, was wrong. In fact, redirecting traffic from an interface's ingress or egress to the ifb's egress is done by redirecting ("stealing") the packet and placing it directly in the egress of the ifb device.
Mirroring/redirecting traffic to the ifb's ingress is currently not supported, as also stated in the documentation, at least on my version:
root@deb8:~# tc -V
tc utility, iproute2-ss140804
root@deb8:~# dpkg -l | grep iproute
ii  iproute2  3.16.0-2
root@deb8:~# uname -a
Linux deb8 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-1 x86_64 GNU/Linux
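If this model is correct, it should also show up in tc's own statistics: packets redirected from eth1 should be counted by the root (egress) qdisc of ifb1, but never by the ingress qdisc attached to ifb1. The commands below are a hedged sketch of that cross-check, assuming the eth1/ifb1 debugging setup from the settings further down is in place and some ICMP traffic has arrived on eth1; the expectations in the comments follow from the model above and were not verified here.

```shell
# Sketch: cross-check the redirect path with tc counters instead of syslog.
# Assumes the eth1/ifb1 debugging setup is loaded and ICMP has been received.
tc -s qdisc show dev eth1                  # expected: the ingress qdisc (ffff:) on eth1 counts the packets
tc -s qdisc show dev ifb1                  # expected: root htb (1:) counts them, ingress (ffff:) stays at 0
tc -s filter show dev ifb1 parent ffff:    # expected: the "tc[ifb1]ingress" action is never hit
```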
Documentation
I was able to get this information thanks to the following documentation:
- linux-ip.net Intermediate Functional Block
- dev.laptop.org ifb-README
- people.netfilter.org Linux Traffic Control Classifier-Action Subsystem Architecture Paper
Debugging
I also did some debugging using iptables -j LOG and the tc filter action simple, which I used to print messages to syslog whenever an ICMP packet flowed through the netdevs.
The result is as follows:
Jun 14 13:02:12 deb8 kernel: [ 4273.341087] simple: tc[eth1]ingress_1
Jun 14 13:02:12 deb8 kernel: [ 4273.341114] simple: tc[ifb1]egress_1
Jun 14 13:02:12 deb8 kernel: [ 4273.341229] ipt[PREROUTING]raw IN=eth1 OUT= MAC=08:00:27:ee:8f:15:08:00:27:89:16:5b:08:00 SRC=10.1.1.3 DST=10.1.1.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53979 DF PROTO=ICMP TYPE=8 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341238] ipt[PREROUTING]mangle IN=eth1 OUT= MAC=08:00:27:ee:8f:15:08:00:27:89:16:5b:08:00 SRC=10.1.1.3 DST=10.1.1.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53979 DF PROTO=ICMP TYPE=8 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341242] ipt[PREROUTING]nat IN=eth1 OUT= MAC=08:00:27:ee:8f:15:08:00:27:89:16:5b:08:00 SRC=10.1.1.3 DST=10.1.1.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53979 DF PROTO=ICMP TYPE=8 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341249] ipt[INPUT]mangle IN=eth1 OUT= MAC=08:00:27:ee:8f:15:08:00:27:89:16:5b:08:00 SRC=10.1.1.3 DST=10.1.1.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53979 DF PROTO=ICMP TYPE=8 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341252] ipt[INPUT]filter IN=eth1 OUT= MAC=08:00:27:ee:8f:15:08:00:27:89:16:5b:08:00 SRC=10.1.1.3 DST=10.1.1.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53979 DF PROTO=ICMP TYPE=8 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341255] ipt[INPUT]nat IN=eth1 OUT= MAC=08:00:27:ee:8f:15:08:00:27:89:16:5b:08:00 SRC=10.1.1.3 DST=10.1.1.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53979 DF PROTO=ICMP TYPE=8 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341267] ipt[OUTPUT]raw IN= OUT=eth1 SRC=10.1.1.2 DST=10.1.1.3 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37735 PROTO=ICMP TYPE=0 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341270] ipt[OUTPUT]mangle IN= OUT=eth1 SRC=10.1.1.2 DST=10.1.1.3 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37735 PROTO=ICMP TYPE=0 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341272] ipt[OUTPUT]filter IN= OUT=eth1 SRC=10.1.1.2 DST=10.1.1.3 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37735 PROTO=ICMP TYPE=0 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341274] ipt[POSTROUTING]mangle IN= OUT=eth1 SRC=10.1.1.2 DST=10.1.1.3 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37735 PROTO=ICMP TYPE=0 CODE=0 ID=1382 SEQ=1
Jun 14 13:02:12 deb8 kernel: [ 4273.341278] simple: tc[eth1]egress_1
Jun 14 13:02:12 deb8 kernel: [ 4273.341280] simple: tc[ifb0]egress_1
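Reading the traversal order off a log like this is easier once everything except the tags is stripped. The sketch below assumes the tag formats produced by the debugging settings (action simple prints "simple: tc[<dev>]<dir>_<n>", the iptables LOG prefixes are "ipt[<CHAIN>]<table> "); the function name hook_order is made up for illustration, and grep -o is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: reduce kernel log lines to the debugging tags, in logged order,
# so the path a packet took can be read top to bottom.
# Assumes the "tc[<dev>]<dir>" and "ipt[<CHAIN>]<table>" tag formats above.
hook_order() {
  # -o prints only the matched part, one match per line (GNU/BSD grep);
  # the "_<n>" packet counter appended by action simple is not matched.
  grep -oE '(tc|ipt)\[[A-Za-z0-9]+][a-z]+'
}
```

For example, piping dmesg into hook_order on the test machine should print tc[eth1]ingress, tc[ifb1]egress and then the ipt[...] tags in the order shown in the log above.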
The debugging was done using the following settings:
# Flush all iptables tables
iptables -F -t filter
iptables -F -t nat
iptables -F -t mangle
iptables -F -t raw
# Log ICMP echo-request (type 8) in every chain/table pair
iptables -A PREROUTING -t raw -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[PREROUTING]raw '
iptables -A PREROUTING -t mangle -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[PREROUTING]mangle '
iptables -A PREROUTING -t nat -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[PREROUTING]nat '
iptables -A INPUT -t mangle -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[INPUT]mangle '
iptables -A INPUT -t filter -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[INPUT]filter '
iptables -A INPUT -t nat -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[INPUT]nat '
iptables -A FORWARD -t mangle -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[FORWARD]mangle '
iptables -A FORWARD -t filter -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[FORWARD]filter '
iptables -A OUTPUT -t raw -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[OUTPUT]raw '
iptables -A OUTPUT -t mangle -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[OUTPUT]mangle '
iptables -A OUTPUT -t nat -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[OUTPUT]nat '
iptables -A OUTPUT -t filter -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[OUTPUT]filter '
iptables -A POSTROUTING -t mangle -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[POSTROUTING]mangle '
iptables -A POSTROUTING -t nat -p icmp --icmp-type 8 -j LOG --log-level 7 --log-prefix 'ipt[POSTROUTING]nat '
# Log ICMP echo-reply (type 0) in the same chain/table pairs
iptables -A PREROUTING -t raw -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[PREROUTING]raw '
iptables -A PREROUTING -t mangle -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[PREROUTING]mangle '
iptables -A PREROUTING -t nat -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[PREROUTING]nat '
iptables -A INPUT -t mangle -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[INPUT]mangle '
iptables -A INPUT -t filter -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[INPUT]filter '
iptables -A INPUT -t nat -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[INPUT]nat '
iptables -A FORWARD -t mangle -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[FORWARD]mangle '
iptables -A FORWARD -t filter -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[FORWARD]filter '
iptables -A OUTPUT -t raw -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[OUTPUT]raw '
iptables -A OUTPUT -t mangle -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[OUTPUT]mangle '
iptables -A OUTPUT -t nat -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[OUTPUT]nat '
iptables -A OUTPUT -t filter -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[OUTPUT]filter '
iptables -A POSTROUTING -t mangle -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[POSTROUTING]mangle '
iptables -A POSTROUTING -t nat -p icmp --icmp-type 0 -j LOG --log-level 7 --log-prefix 'ipt[POSTROUTING]nat '
# Reset tc state and reload the ifb module with two devices (ifb0, ifb1)
export TC="/sbin/tc"
$TC qdisc del dev eth1 root
$TC qdisc del dev eth1 ingress
ip link set dev ifb0 down
ip link set dev ifb1 down
$TC qdisc del dev ifb0 root
$TC qdisc del dev ifb1 root
rmmod ifb
modprobe ifb numifbs=2
# ifb0: root htb (egress) and ingress qdisc, each with an ICMP filter that logs via action simple
$TC qdisc add dev ifb0 root handle 1: htb default 2
$TC class add dev ifb0 parent 1: classid 1:1 htb rate 2Mbit
$TC class add dev ifb0 parent 1: classid 1:2 htb rate 10Mbit
$TC filter add dev ifb0 parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:1 action simple "tc[ifb0]egress"
$TC qdisc add dev ifb0 ingress
$TC filter add dev ifb0 parent ffff: protocol ip prio 1 u32 match ip protocol 1 0xff action simple "tc[ifb0]ingress"
# ifb1: same setup as ifb0
$TC qdisc add dev ifb1 root handle 1: htb default 2
$TC class add dev ifb1 parent 1: classid 1:1 htb rate 2Mbit
$TC class add dev ifb1 parent 1: classid 1:2 htb rate 10Mbit
$TC filter add dev ifb1 parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:1 action simple "tc[ifb1]egress"
$TC qdisc add dev ifb1 ingress
$TC filter add dev ifb1 parent ffff: protocol ip prio 1 u32 match ip protocol 1 0xff action simple "tc[ifb1]ingress"
ip link set dev ifb0 up
ip link set dev ifb1 up
# eth1 egress: log ICMP, then redirect it to ifb0
$TC qdisc add dev eth1 root handle 1: htb default 2
$TC class add dev eth1 parent 1: classid 1:1 htb rate 2Mbit
$TC class add dev eth1 parent 1: classid 1:2 htb rate 10Mbit
$TC filter add dev eth1 parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:1 action simple "tc[eth1]egress" pipe action mirred egress redirect dev ifb0
# eth1 ingress: log ICMP, then redirect it to ifb1
$TC qdisc add dev eth1 ingress
$TC filter add dev eth1 parent ffff: protocol ip prio 1 u32 match ip protocol 1 0xff action simple "tc[eth1]ingress" pipe action mirred egress redirect dev ifb1
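The 28 LOG rules above follow a fixed pattern (two ICMP types times fourteen chain/table pairs, in the same order), so they can also be generated with a loop. This is a hedged sketch of an equivalent form, not the author's script; the add_log_rules function and the RUN dry-run variable are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: generate the same iptables LOG rules with a loop.
# RUN is a hypothetical dry-run switch: RUN=echo prints the commands,
# an empty RUN executes them (requires root).
RUN=${RUN-}

add_log_rules() {
  for icmp_type in 8 0; do    # 8 = echo-request, 0 = echo-reply
    for hook in PREROUTING:raw PREROUTING:mangle PREROUTING:nat \
                INPUT:mangle INPUT:filter INPUT:nat \
                FORWARD:mangle FORWARD:filter \
                OUTPUT:raw OUTPUT:mangle OUTPUT:nat OUTPUT:filter \
                POSTROUTING:mangle POSTROUTING:nat; do
      chain=${hook%%:*}       # part before the colon
      table=${hook#*:}        # part after the colon
      $RUN iptables -A "$chain" -t "$table" -p icmp --icmp-type "$icmp_type" \
        -j LOG --log-level 7 --log-prefix "ipt[$chain]$table "
    done
  done
}
```

Running it once with RUN=echo is a cheap way to review the generated rules before applying them.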
Method 2
So, to go back to what you mentioned at the top: does ifb work with set-mark?
I am marking with iptables and cgroups and then adding latency to the marked traffic with tc. I can do this easily on eth0 for egress (in POSTROUTING), but the same logic does not work for ifb0. It seems that either no traffic is marked or, if it is marked, it is not captured by ifb. Any thoughts?
This is my iptables setup:
sudo /sbin/iptables -t mangle --new test_chain --wait
sudo /sbin/iptables -t mangle -I POSTROUTING 1 --match cgroup --cgroup 0x4d81b18 --jump test_chain --wait
sudo /sbin/iptables -t mangle -I INPUT 1 --match cgroup --cgroup 0x4d81b18 --jump test_chain --wait
sudo /sbin/iptables -t mangle -I OUTPUT 1 --match cgroup --cgroup 0x4d81b18 --jump test_chain --wait
sudo /sbin/iptables -t mangle -A test_chain -p tcp -j MARK --set-mark 0x4d81b18 --wait
This is my tc setup:
sudo tc qdisc add dev eth0 handle ffff: ingress
sudo tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0
sudo /sbin/tc qdisc add dev ifb0 root handle 1:0 htb default 2
sudo /sbin/tc class add dev ifb0 parent 1:0 classid 1:1 htb rate 1000Gbps
sudo /sbin/tc class add dev ifb0 parent 1:0 classid 1:2 htb rate 1000Gbps prio 1
sudo /sbin/tc class add dev ifb0 parent 1:1 classid 1:3 htb rate 1000Gbps prio 2
sudo /sbin/tc qdisc add dev ifb0 parent 1:3 handle 3:0 netem delay 20ms
sudo /sbin/tc filter add dev ifb0 parent 1:0 protocol ip prio 1 handle 0x4d81b18 fw flowid 1:3
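Following the order shown in the Method 1 log (tc[eth1]ingress, then tc[ifb1]egress, and only then ipt[PREROUTING]...), the fw filter on ifb0 runs before any iptables chain has seen an incoming packet, so a mark set in the mangle INPUT chain cannot be present yet. One commonly used workaround, sketched here as an untested assumption, is to save the mark into the conntrack entry on the way out (CONNMARK --save-mark) and restore it at eth0 ingress with tc's connmark action before the redirect. This needs act_connmark, which the 3.16 kernel shown in Method 1 does not have, and it only restores marks for packets belonging to a connection conntrack already tracks (here only TCP, because the MARK rule uses -p tcp).

```shell
# Sketch (assumption, untested on this setup): carry the mark from outgoing
# packets to their replies via conntrack so it exists when ifb0 classifies.
# Requires tc's connmark action (act_connmark) and nf_conntrack.

# 1. After MARK in test_chain, also store the packet mark in the ctmark.
sudo /sbin/iptables -t mangle -A test_chain -p tcp -j CONNMARK --save-mark --wait

# 2. On eth0 ingress, restore ctmark -> packet mark, then redirect to ifb0.
#    This replaces the plain mirred filter from the tc setup above.
sudo tc filter del dev eth0 parent ffff:
sudo tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    action connmark \
    action mirred egress redirect dev ifb0

# 3. The existing fw filter on ifb0 (handle 0x4d81b18, flowid 1:3) can then
#    match the restored mark and send the traffic through netem.
```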
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.