Last Friday I upgraded my Ubuntu server to 11.10, which now runs with a 3.0.0-12-server kernel. Since then the overall performance has dropped dramatically. Before the upgrade the system load was about 0.3, but currently it is at 22-30 on an 8 core CPU system with 16GB of RAM (10GB free, no swap used).
I was going to blame the BTRFS file system driver and the underlaying MD array, because [md1_raid1] and [btrfs-transacti] consumed a lot of resources. But all the [kworker/*:*] consume a lot more.
sar has outputted something similar to this constantly since Friday:
11:25:01 CPU %user %nice %system %iowait %steal %idle 11:35:01 all 1,55 0,00 70,98 8,99 0,00 18,48 11:45:01 all 1,51 0,00 68,29 10,67 0,00 19,53 11:55:01 all 1,40 0,00 65,52 13,53 0,00 19,55 12:05:01 all 0,95 0,00 66,23 10,73 0,00 22,10
And iostat confirms a very poor write rate:
sda 129,26 3059,12 614,31 258226022 51855269 sdb 98,78 24,28 3495,05 2049471 295023077 md1 191,96 202,63 611,95 17104003 51656068 md0 0,01 0,02 0,00 1980 109
The question is: How can I track down why the kworker threads consume so many resources (and which one)? Or better: Is this a known issue with the 3.0 kernel, and can I tweak it with kernel parameters?
Edit:
I updated the Kernel to the brand new version 3.1 as recommended by the BTRFS developers. But unfortunately this didn’t change anything.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
I found this thread on lkml that answers your question a little. (It seems even Linus himself was puzzled as to how to find out the origin of those threads.)
Basically, there are two ways of doing this:
$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt (wait a few secs)
For this you will need ftrace to be compiled in your kernel, and to enable it with:
mount -t debugfs nodev /sys/kernel/debug
More information on the function tracer facilities of Linux is available in the ftrace.txt documentation.
This will output what threads are all doing, and is useful for tracing multiple small jobs.
cat /proc/THE_OFFENDING_KWORKER/stack
This will output the stack of a single thread doing a lot of work. It may allow you to find out what caused this specific thread to hog the CPU (for example). THE_OFFENDING_KWORKER is the pid of the kworker in the process list.
Method 2
The solution is: I don’t know how to find the cause. Nobody told me so far.
But talking with the BTRFS developers revealed a bug in the btrfs drivers when writing many many small files in a very short time period. This is an issue on kernels from 3.0 upto 3.1. Maybe it gets fixed in 3.2.
In the meantime I got a patch for the current kernel that solved the problem.
Method 3
I had a similar issue; looking at the kworker’s thread stack:
while true ; do clear ; grep -n ^ /proc/24910/stack | sort -rn | cut -d: -f 2- ; sleep 1 ; done [<ffffffffffffffff>] 0xffffffffffffffff [<ffffffff810908f0>] kthread+0x0/0xe0 [<ffffffff81576432>] ret_from_fork+0x42/0x70 [<ffffffff810908f0>] kthread+0x0/0xe0 [<ffffffff810909b1>] kthread+0xc1/0xe0 [<ffffffff8108b520>] worker_thread+0x0/0x550 [<ffffffff8108b573>] worker_thread+0x53/0x550 [<ffffffff8108aa4b>] process_one_work+0x14b/0x420 [<ffffffff81405a3d>] rpm_idle+0x1ad/0x220 [<ffffffff8140509d>] __rpm_callback+0x2d/0xb0 [<ffffffffa01aef16>] usb_runtime_idle+0x26/0x30 [usbcore] [<ffffffffa01aeef0>] usb_runtime_idle+0x0/0x30 [usbcore] [<ffffffff8140686c>] __pm_runtime_suspend+0x5c/0x90 [<ffffffff81405b19>] __pm_runtime_idle+0x69/0x90 [<ffffffff81405295>] rpm_suspend+0x105/0x620 [<ffffffff8140513f>] rpm_callback+0x1f/0x70 [<ffffffff8140509d>] __rpm_callback+0x2d/0xb0 [<ffffffffa01aee50>] usb_runtime_suspend+0x0/0x80 [usbcore] [<ffffffffa01aee7e>] usb_runtime_suspend+0x2e/0x80 [usbcore] [<ffffffffa01adc4f>] usb_suspend_both+0xef/0x1f0 [usbcore] [<ffffffffa01adb06>] usb_resume_interface.isra.6+0xa6/0x100 [usbcore] [<ffffffffa01a0c63>] hub_resume+0x23/0x60 [usbcore] [<ffffffffa01a0636>] hub_activate+0xc6/0x5c0 [usbcore] [<ffffffffa01a9d3f>] usb_kill_urb+0x3f/0xa0 [usbcore] [<ffffffffa019d249>] hub_port_status+0xd9/0x120 [usbcore] [<ffffffff81088a4f>] __queue_work+0x12f/0x340 [<ffffffff810888b6>] insert_work+0x46/0xb0 [<ffffffffa01aa6d4>] usb_control_msg+0xc4/0x110 [usbcore] [<ffffffffa01aa55a>] usb_start_wait_urb+0x9a/0x150 [usbcore] [<ffffffff810a36f7>] update_curr+0xd7/0x120
I figured it’s the usb modules. I had previously on another machine blithely rmmod’d all usb and [uex]hci modules are realised that I had turned off the keyboard (not even ctrl-shift-sysrq-U !), so I ended up doing this:
MODS="uvcvideo ohci_hcd ehci_hcd xhci_hcd ohci_pci ehci_pci xhci_pci usbcore" ( echo $MODS $MODS | xargs -n 1 rmmod -v ; sleep 3 ; echo $MODS | xargs -n 1 modprobe -v ; ) <a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="90e2ffffe4d0f8e0">[email protected]</a>:~# ( echo $MODS $MODS | xargs -n 1 rmmod -v ; sleep 3 ; echo $MODS | xargs -n 1 modprobe -v ; ) rmmod: ERROR: Module ohci_hcd is in use by: ohci_pci rmmod: ERROR: Module ehci_hcd is in use by: ehci_pci rmmod: ERROR: Module xhci_hcd is in use by: xhci_pci rmmod: ERROR: Module uvcvideo is not currently loaded rmmod: ERROR: Module ohci_pci is not currently loaded rmmod: ERROR: Module ehci_pci is not currently loaded rmmod: ERROR: Module xhci_pci is not currently loaded insmod /lib/modules/4.1.0-2-amd64/kernel/drivers/media/usb/uvc/uvcvideo.ko insmod /lib/modules/4.1.0-2-amd64/kernel/drivers/usb/host/ehci-hcd.ko insmod /lib/modules/4.1.0-2-amd64/kernel/drivers/usb/host/ohci-hcd.ko insmod /lib/modules/4.1.0-2-amd64/kernel/drivers/usb/host/xhci-hcd.ko insmod /lib/modules/4.1.0-2-amd64/kernel/drivers/usb/host/ehci-pci.ko insmod /lib/modules/4.1.0-2-amd64/kernel/drivers/usb/host/ohci-pci.ko insmod /lib/modules/4.1.0-2-amd64/kernel/drivers/usb/host/xhci-pci.ko
did the trick:
grep -n ^ /proc/24910/stack | sort -rn | cut -d: -f 2- [<ffffffffffffffff>] 0xffffffffffffffff [<ffffffff810908f0>] kthread+0x0/0xe0 [<ffffffff81576432>] ret_from_fork+0x42/0x70 [<ffffffff810908f0>] kthread+0x0/0xe0 [<ffffffff810909b1>] kthread+0xc1/0xe0 [<ffffffff8108b520>] worker_thread+0x0/0x550 [<ffffffff8108b5ec>] worker_thread+0xcc/0x550
So my chief suspect is this gadget:
RTL8723B* WIFI+Bluetooth Module.
I’m wondering now if the power-management code realises it’s the same device if it tries to e.g. powerdown an unused BT adapter.
context:
<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="daa8b5b5ae9ab2aa">[email protected]</a>:~# lsusb
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 002: ID 0c45:651b Microdia
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 0bda:b001 Realtek Semiconductor Corp.
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 009 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 008 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 007 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 006 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="e2908d8d96a28a92">[email protected]</a>:~# lsmod | grep usb
btusb 45056 0
btbcm 16384 1 btusb
btintel 16384 1 btusb
bluetooth 438272 5 bnep,btbcm,btusb,btintel
usbcore 200704 8 btusb,uvcvideo,ohci_hcd,ohci_pci,ehci_hcd,ehci_pci,xhci_hcd,xhci_pci
usb_common 16384 1 usbcore
<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="e99b86869da98199">[email protected]</a>:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux stable-updates (sid)
Release: stable-updates
Codename: sid
<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="75071a1a01351d05">[email protected]</a>:~# uname -a
Linux hp 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23) x86_64 GNU/Linux
<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="4c3e2323380c243c">[email protected]</a>:~# dmesg | tail -n 20
[97865.088740] usb 2-4: SerialNumber: HP Webcam
[97865.091557] uvcvideo: Found UVC 1.00 device HP Webcam (0c45:651b)
[97865.105948] input: HP Webcam as /devices/pci0000:00/0000:00:13.2/usb2/2-4/2-4:1.0/input/input17
[97865.189817] usb 3-3: new full-speed USB device number 2 using ohci-pci
[97865.350981] usb 3-3: No LPM exit latency info found, disabling LPM.
[97865.368958] usb 3-3: New USB device found, idVendor=0bda, idProduct=b001
[97865.368969] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[97865.368976] usb 3-3: Product: Bluetooth Radio
[97865.368981] usb 3-3: Manufacturer: Realtek
[97865.368985] usb 3-3: SerialNumber: 00e04c000001
[97865.375859] Bluetooth: hci0: rtl: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
[97865.375867] Bluetooth: hci0: rtl: loading rtl_bt/rtl8723b_fw.bin
[97865.375896] usb 3-3: firmware: failed to load rtl_bt/rtl8723b_fw.bin (-2)
[97865.375902] usb 3-3: Direct firmware load for rtl_bt/rtl8723b_fw.bin failed with error -2
[97865.375907] Bluetooth: hci0: Failed to load rtl_bt/rtl8723b_fw.bin
[97865.397812] Bluetooth: hci0: rtl: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
[97865.397821] Bluetooth: hci0: rtl: loading rtl_bt/rtl8723b_fw.bin
[97865.397850] usb 3-3: firmware: failed to load rtl_bt/rtl8723b_fw.bin (-2)
[97865.397856] usb 3-3: Direct firmware load for rtl_bt/rtl8723b_fw.bin failed with error -2
[97865.397861] Bluetooth: hci0: Failed to load rtl_bt/rtl8723b_fw.bin
Method 4
echo N >/sys/module/drm_kms_helper/parameters/poll
(in root mode)
Problem with Intel graphic card
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0