path: root/net
Age         Commit message                                                        Author
2015-02-13  net: netfilter: Serialize xt_write_recseq sections on RT  (Thomas Gleixner)
The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing xt_write_recseq sections. RT breaks that and needs explicit serialization here.

Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
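For illustration only, a minimal sketch of the kind of explicit serialization meant here. The lock name is made up, and the locallock macros (DEFINE_LOCAL_IRQ_LOCK, local_lock/local_unlock) are the RT patch set's primitives of that era, not mainline API; this is not the literal patch:

    #include <linux/locallock.h>

    /* Hypothetical lock: explicit per-CPU serialization on RT. */
    static DEFINE_LOCAL_IRQ_LOCK(xt_write_lock);

    static inline void xt_write_section_enter(void)
    {
            local_lock(xt_write_lock);      /* replaces reliance on local_bh_disable() */
    }

    static inline void xt_write_section_exit(void)
    {
            local_unlock(xt_write_lock);
    }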
2015-02-13  net: ip_send_unicast_reply: add missing local serialization  (Nicholas Mc Guire)
In response to the oops in ip_output.c:ip_send_unicast_reply under high network load with CONFIG_PREEMPT_RT_FULL=y, reported by Sami Pietikainen <Sami.Pietikainen@wapice.com>, this patch adds local serialization in ip_send_unicast_reply.

From ip_output.c:

    /*
     * Generic function to send a packet as reply to another packet.
     * Used to send some TCP resets/acks so far.
     *
     * Use a fake percpu inet socket to avoid false sharing and contention.
     */
    static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = { ...

which was added in commit be9f4a44 in linux-stable. The git log which introduced the PER_CPU unicast_sock states:

<snip>
commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jul 19 07:34:03 2012 +0000

    ipv4: tcp: remove per net tcp_sock

    tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket per network namespace. This leads to bad behavior on multiqueue NICS, because many cpus contend for the socket lock and once socket lock is acquired, extra false sharing on various socket fields slow down the operations. To better resist to attacks, we use a percpu socket. Each cpu can run without contention, using appropriate memory (local node)
<snip>

The per-cpu socket is thus assumed to be used exclusively, serialized per CPU - so the use of get_cpu_light() introduced in net-use-cpu-light-in-ip-send-unicast-reply.patch, which dropped the preempt_disable in favor of a migrate_disable, is probably wrong, as it only provides referential consistency but not serialization. To avoid a preempt_disable here, a local lock is needed.

Therapy:
* add a local lock
* and re-introduce local serialization

Tested on x86 with high network load, using the testcase from Sami Pietikainen:

    while : ; do wget -O - ftp://LOCAL_SERVER/empty_file > /dev/null 2>&1; done

Link: http://www.spinics.net/lists/linux-rt-users/msg11007.html
Cc: stable-rt@vger.kernel.org
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
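A hedged sketch of the idea: take a local lock around the per-CPU unicast_sock so two tasks on the same CPU cannot interleave in the reply path on RT. The lock name is illustrative and the locallock macros are RT-patch primitives; this is not the actual diff:

    #include <linux/locallock.h>
    #include <net/inet_sock.h>

    static DEFINE_PER_CPU(struct inet_sock, unicast_sock);
    static DEFINE_LOCAL_IRQ_LOCK(unicast_lock);        /* hypothetical name */

    static void send_reply_sketch(void)
    {
            struct inet_sock *inet;

            local_lock(unicast_lock);                   /* serialize same-CPU users */
            inet = this_cpu_ptr(&unicast_sock);         /* the per-CPU reply socket */

            /* ... set fields on inet, build and transmit the reply skb ... */

            local_unlock(unicast_lock);
    }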
2015-02-13  net: Use get_cpu_light() in ip_send_unicast_reply()  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  net: Another local_irq_disable/kmalloc headache  (Thomas Gleixner)
Replace it by a local lock. Though that's pretty inefficient :(

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
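As a generic illustration of the pattern being replaced (placeholder names, not from the patch): on RT a local lock stands in for local_irq_disable() around code that ends up in kmalloc(), because a hard irq-off section must not reach a sleeping allocator there:

    #include <linux/locallock.h>
    #include <linux/slab.h>

    static DEFINE_LOCAL_IRQ_LOCK(example_lock);         /* placeholder name */

    static void *alloc_in_protected_section(size_t size)
    {
            unsigned long flags;
            void *p;

            /* !RT: behaves like local_irq_save(); RT: a per-CPU sleeping lock */
            local_lock_irqsave(example_lock, flags);
            p = kmalloc(size, GFP_ATOMIC);
            local_unlock_irqrestore(example_lock, flags);
            return p;
    }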
2015-02-13  net, RT: Remove preemption disabling in netif_rx()  (Priyanka Jain)
1) enqueue_to_backlog() (called from netif_rx) should be bound to a particular CPU. This can be achieved by disabling migration; there is no need to disable preemption.

2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backlog() is called in atomic context, and if the backlog exceeds its count, kfree_skb() is called. But on RT, kfree_skb() might get scheduled out, so it expects a non-atomic context.

3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and migrate_disable() map to preempt_enable() and preempt_disable(), so there is no change in functionality in the non-RT case.

- Replace preempt_enable()/preempt_disable() with migrate_enable()/migrate_disable() respectively.
- Replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light() respectively.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
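A minimal sketch of the resulting pattern, simplified from the real netif_rx(). enqueue_to_backlog() is the existing helper in net/core/dev.c; get_cpu_light()/put_cpu_light() are RT-patch primitives (assumed to pin the task to the CPU without disabling preemption), so treat this as illustrative:

    #include <linux/skbuff.h>

    static int netif_rx_rt_sketch(struct sk_buff *skb)
    {
            unsigned int qtail;
            int ret, cpu;

            cpu = get_cpu_light();          /* was: get_cpu(); migration off, preemption on */
            ret = enqueue_to_backlog(skb, cpu, &qtail);
            put_cpu_light();                /* was: put_cpu() */

            return ret;
    }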
2015-02-13  net: sysrq via icmp  (Carsten Emde)
There are (probably rare) situations when a system has crashed and the system console becomes unresponsive but the network ICMP layer is still alive. Wouldn't it be wonderful if we could then submit a sysrq command via ping? This patch provides this facility. Please consult the updated documentation in Documentation/sysrq.txt for details.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
2015-02-13  net: Avoid livelock in net_tx_action() on RT  (Steven Rostedt)
qdisc_lock is taken w/o disabling interrupts or bottom halves. So code holding a qdisc_lock() can be interrupted and softirqs can run on the return from interrupt in !RT.

The spin_trylock() in net_tx_action() makes sure that the softirq does not deadlock. When the lock can't be acquired, q is requeued and the NET_TX softirq is raised. That causes the softirq to run over and over.

That works in mainline as do_softirq() has a retry loop limit and leaves the softirq processing in the interrupt return path and schedules ksoftirqd. The task which holds qdisc_lock cannot be preempted, so the lock is released and either ksoftirqd or the next softirq in the return from interrupt path can proceed. Though it's a bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it decides to bail out even if it's clear in the first iteration :)

On RT all softirq processing is done in a FIFO thread and we don't have a loop limit, so ksoftirqd preempts the lock holder forever and unqueues and requeues until the reset button is hit.

Due to the forced threading of ksoftirqd on RT we actually cannot deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to replace the spin_trylock() with a spin_lock(). When contended, ksoftirqd is scheduled out and the lock holder can proceed.

[ tglx: Massaged changelog and code comments ]

Solved-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
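A simplified sketch of the change argued for above (not the literal diff from net_tx_action()). On RT, qdisc_lock() returns a sleeping lock, so blocking in the softirq thread is safe and the trylock/requeue livelock disappears:

    #include <net/pkt_sched.h>
    #include <net/sch_generic.h>

    static void tx_one_qdisc_sketch(struct Qdisc *q)
    {
            spinlock_t *root_lock = qdisc_lock(q);

    #ifdef CONFIG_PREEMPT_RT_FULL
            spin_lock(root_lock);           /* sleeping lock: ksoftirqd blocks, holder runs */
    #else
            if (!spin_trylock(root_lock)) { /* mainline: must not block in softirq context */
                    /* requeue q and re-raise NET_TX_SOFTIRQ instead */
                    return;
            }
    #endif
            qdisc_run(q);
            spin_unlock(root_lock);
    }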
2015-02-13  skbufhead-raw-lock.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  net: Use cpu_chill() instead of cpu_relax()  (Thomas Gleixner)
Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
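An illustrative retry loop (the busy flag is hypothetical, not a specific call site): cpu_chill() is the RT-patch helper that sleeps briefly so a preempted writer can make progress, where cpu_relax() would just spin:

    #include <linux/bitops.h>
    #include <linux/delay.h>

    static void wait_until_clear(unsigned long *word)
    {
            while (test_bit(0, word))       /* hypothetical busy flag */
                    cpu_chill();            /* was: cpu_relax(); spinning can livelock on RT */
    }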
2015-02-13  net-netif_rx_ni-migrate-disable.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  net-wireless-warn-nort.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  softirq-thread-do-softirq.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  net: sched: dev_deactivate_many(): use msleep(1) instead of yield() to wait for outstanding qdisc_run calls  (Marc Kleine-Budde)
On PREEMPT_RT enabled systems the interrupt handlers run as threads at prio 50 (by default). If a high priority userspace process tries to shut down a busy network interface it might spin in a yield loop waiting for the device to become idle. With the interrupt thread having a lower priority than the looping process it might never be scheduled and so result in a deadlock on UP systems.

With Magic SysRq the following backtrace can be produced:

> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)

This patch works around the problem by replacing yield() with msleep(1), giving the interrupt thread time to finish, similar to other changes contained in the rt patch set. Using wait_for_completion() instead would probably be a better solution.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
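Roughly the resulting wait loop; some_qdisc_is_busy() is the existing helper in net/sched/sch_generic.c that this loop already polls, so only the sleep primitive changes:

    #include <linux/delay.h>
    #include <linux/netdevice.h>

    static void wait_for_qdiscs_idle(struct net_device *dev)
    {
            /* Wait for outstanding qdisc_run calls to finish. */
            while (some_qdisc_is_busy(dev))
                    msleep(1);      /* was: yield(); starves the irq thread on RT + UP */
    }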
2015-02-13  net-flip-lock-dep-thingy.patch  (Thomas Gleixner)
=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.0-rc3+ #26
-------------------------------------------------------
ip/1104 is trying to acquire lock:
 (local_softirq_lock){+.+...}, at: [<ffffffff81056d12>] __local_lock+0x25/0x68

but task is already holding lock:
 (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_INET){+.+...}:
   [<ffffffff810836e5>] lock_acquire+0x103/0x12e
   [<ffffffff813e2781>] lock_sock_nested+0x82/0x92
   [<ffffffff81433308>] lock_sock+0x10/0x12
   [<ffffffff81433afa>] tcp_close+0x1b/0x355
   [<ffffffff81453c99>] inet_release+0xc3/0xcd
   [<ffffffff813dff3f>] sock_release+0x1f/0x74
   [<ffffffff813dffbb>] sock_close+0x27/0x2b
   [<ffffffff81129c63>] fput+0x11d/0x1e3
   [<ffffffff81126577>] filp_close+0x70/0x7b
   [<ffffffff8112667a>] sys_close+0xf8/0x13d
   [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

-> #0 (local_softirq_lock){+.+...}:
   [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
   [<ffffffff810836e5>] lock_acquire+0x103/0x12e
   [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
   [<ffffffff81056d12>] __local_lock+0x25/0x68
   [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
   [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
   [<ffffffff81433c38>] tcp_close+0x159/0x355
   [<ffffffff81453c99>] inet_release+0xc3/0xcd
   [<ffffffff813dff3f>] sock_release+0x1f/0x74
   [<ffffffff813dffbb>] sock_close+0x27/0x2b
   [<ffffffff81129c63>] fput+0x11d/0x1e3
   [<ffffffff81126577>] filp_close+0x70/0x7b
   [<ffffffff8112667a>] sys_close+0xf8/0x13d
   [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET);
                               lock(local_softirq_lock);
                               lock(sk_lock-AF_INET);
  lock(local_softirq_lock);

 *** DEADLOCK ***

1 lock held by ip/1104:
 #0: (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

stack backtrace:
Pid: 1104, comm: ip Not tainted 3.0.0-rc3+ #26
Call Trace:
 [<ffffffff81081649>] print_circular_bug+0x1f8/0x209
 [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff810836e5>] lock_acquire+0x103/0x12e
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c75>] ? get_parent_ip+0x11/0x41
 [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c8c>] ? get_parent_ip+0x28/0x41
 [<ffffffff81056d12>] __local_lock+0x25/0x68
 [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
 [<ffffffff81433308>] ? lock_sock+0x10/0x12
 [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
 [<ffffffff81433c38>] tcp_close+0x159/0x355
 [<ffffffff81453c99>] inet_release+0xc3/0xcd
 [<ffffffff813dff3f>] sock_release+0x1f/0x74
 [<ffffffff813dffbb>] sock_close+0x27/0x2b
 [<ffffffff81129c63>] fput+0x11d/0x1e3
 [<ffffffff81126577>] filp_close+0x70/0x7b
 [<ffffffff8112667a>] sys_close+0xf8/0x13d
 [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-02-13  Reset to 3.12.37  (Scott Wood)
2014-05-14  softirq: Check preemption after reenabling interrupts  (Thomas Gleixner)
raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the raise_softirq_irqoff() sections are the only ones which show this behaviour.

Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
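The pattern this adds at the raise_softirq call sites, roughly; preempt_check_resched_rt() is the RT-patch helper (name assumed from that patch set, expected to compile away on !RT), so this is a sketch rather than the exact diff:

    #include <linux/interrupt.h>

    static void kick_net_tx_sketch(void)
    {
            unsigned long flags;

            local_irq_save(flags);
            raise_softirq_irqoff(NET_TX_SOFTIRQ);   /* on RT this wakes the softirq thread */
            local_irq_restore(flags);
            preempt_check_resched_rt();             /* let the woken thread preempt us now */
    }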
2014-05-14  net: Add a mutex around devnet_rename_seq  (Sebastian Andrzej Siewior)
On RT, write_seqcount_begin() disables preemption, and device_rename() allocates memory with GFP_KERNEL and later grabs the sysfs_mutex mutex. Serialize with a mutex and use the non-preemption-disabling __write_seqcount_begin().

To avoid writer starvation, let the reader grab the mutex and release it when it detects a writer in progress. This keeps the normal case (no reader on the fly) fast.

[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
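A rough sketch of the scheme described above, heavily simplified. The mutex name is illustrative, and the double-underscore seqcount helpers are the non-preempt-disabling variants named in the description (availability depends on the tree), so treat this as an assumption-laden illustration:

    #include <linux/mutex.h>
    #include <linux/seqlock.h>
    #include <linux/string.h>

    static seqcount_t rename_seq;
    static DEFINE_MUTEX(rename_mutex);              /* illustrative name */

    static void writer_sketch(void)
    {
            mutex_lock(&rename_mutex);
            __write_seqcount_begin(&rename_seq);    /* does not disable preemption */
            /* ... GFP_KERNEL allocation, sysfs rename ... */
            __write_seqcount_end(&rename_seq);
            mutex_unlock(&rename_mutex);
    }

    static void reader_sketch(char *dst, const char *src, size_t len)
    {
            unsigned int seq;
    retry:
            seq = read_seqcount_begin(&rename_seq);
            strlcpy(dst, src, len);
            if (read_seqcount_retry(&rename_seq, seq)) {
                    /* A writer was active: take and drop the mutex so a
                     * preempted writer can finish, then retry. This keeps
                     * a retrying reader from starving the writer. */
                    mutex_lock(&rename_mutex);
                    mutex_unlock(&rename_mutex);
                    goto retry;
            }
    }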
2014-05-14  net: Use local_bh_disable in netif_rx_ni()  (Thomas Gleixner)
This code triggers the new WARN in __raise_softirq_irqoff(): it actually looks at the softirq pending bit and calls into the softirq code, but that does not fit well with the context-related softirq model of RT. It's correct on mainline, though, and going through local_bh_disable/enable here is not going to hurt badly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
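Roughly the substitution described above (simplified, not the literal diff):

    #include <linux/netdevice.h>

    static int netif_rx_ni_sketch(struct sk_buff *skb)
    {
            int err;

            local_bh_disable();     /* instead of preempt_disable() + manual softirq kick */
            err = netif_rx(skb);
            local_bh_enable();      /* runs any pending softirq in an RT-compatible way */

            return err;
    }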
2014-05-14  net: netfilter: Serialize xt_write_recseq sections on RT  (Thomas Gleixner)
The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing xt_write_recseq sections. RT breaks that and needs explicit serialization here.

Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
2014-05-14  net: ip_send_unicast_reply: add missing local serialization  (Nicholas Mc Guire)
In response to the oops in ip_output.c:ip_send_unicast_reply under high network load with CONFIG_PREEMPT_RT_FULL=y, reported by Sami Pietikainen <Sami.Pietikainen@wapice.com>, this patch adds local serialization in ip_send_unicast_reply.

From ip_output.c:

    /*
     * Generic function to send a packet as reply to another packet.
     * Used to send some TCP resets/acks so far.
     *
     * Use a fake percpu inet socket to avoid false sharing and contention.
     */
    static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = { ...

which was added in commit be9f4a44 in linux-stable. The git log which introduced the PER_CPU unicast_sock states:

<snip>
commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jul 19 07:34:03 2012 +0000

    ipv4: tcp: remove per net tcp_sock

    tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket per network namespace. This leads to bad behavior on multiqueue NICS, because many cpus contend for the socket lock and once socket lock is acquired, extra false sharing on various socket fields slow down the operations. To better resist to attacks, we use a percpu socket. Each cpu can run without contention, using appropriate memory (local node)
<snip>

The per-cpu socket is thus assumed to be used exclusively, serialized per CPU - so the use of get_cpu_light() introduced in net-use-cpu-light-in-ip-send-unicast-reply.patch, which dropped the preempt_disable in favor of a migrate_disable, is probably wrong, as it only provides referential consistency but not serialization. To avoid a preempt_disable here, a local lock is needed.

Therapy:
* add a local lock
* and re-introduce local serialization

Tested on x86 with high network load, using the testcase from Sami Pietikainen:

    while : ; do wget -O - ftp://LOCAL_SERVER/empty_file > /dev/null 2>&1; done

Link: http://www.spinics.net/lists/linux-rt-users/msg11007.html
Cc: stable-rt@vger.kernel.org
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-05-14  net: Use get_cpu_light() in ip_send_unicast_reply()  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  net: Another local_irq_disable/kmalloc headache  (Thomas Gleixner)
Replace it by a local lock. Though that's pretty inefficient :(

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  net, RT: Remove preemption disabling in netif_rx()  (Priyanka Jain)
1) enqueue_to_backlog() (called from netif_rx) should be bound to a particular CPU. This can be achieved by disabling migration; there is no need to disable preemption.

2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backlog() is called in atomic context, and if the backlog exceeds its count, kfree_skb() is called. But on RT, kfree_skb() might get scheduled out, so it expects a non-atomic context.

3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and migrate_disable() map to preempt_enable() and preempt_disable(), so there is no change in functionality in the non-RT case.

- Replace preempt_enable()/preempt_disable() with migrate_enable()/migrate_disable() respectively.
- Replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light() respectively.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  net: sysrq via icmp  (Carsten Emde)
There are (probably rare) situations when a system has crashed and the system console becomes unresponsive but the network ICMP layer is still alive. Wouldn't it be wonderful if we could then submit a sysrq command via ping? This patch provides this facility. Please consult the updated documentation in Documentation/sysrq.txt for details.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
2014-05-14  net: Avoid livelock in net_tx_action() on RT  (Steven Rostedt)
qdisc_lock is taken w/o disabling interrupts or bottom halves. So code holding a qdisc_lock() can be interrupted and softirqs can run on the return from interrupt in !RT.

The spin_trylock() in net_tx_action() makes sure that the softirq does not deadlock. When the lock can't be acquired, q is requeued and the NET_TX softirq is raised. That causes the softirq to run over and over.

That works in mainline as do_softirq() has a retry loop limit and leaves the softirq processing in the interrupt return path and schedules ksoftirqd. The task which holds qdisc_lock cannot be preempted, so the lock is released and either ksoftirqd or the next softirq in the return from interrupt path can proceed. Though it's a bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it decides to bail out even if it's clear in the first iteration :)

On RT all softirq processing is done in a FIFO thread and we don't have a loop limit, so ksoftirqd preempts the lock holder forever and unqueues and requeues until the reset button is hit.

Due to the forced threading of ksoftirqd on RT we actually cannot deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to replace the spin_trylock() with a spin_lock(). When contended, ksoftirqd is scheduled out and the lock holder can proceed.

[ tglx: Massaged changelog and code comments ]

Solved-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  skbufhead-raw-lock.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  net: Use cpu_chill() instead of cpu_relax()  (Thomas Gleixner)
Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
2014-05-14  net-netif_rx_ni-migrate-disable.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  net-wireless-warn-nort.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  softirq-thread-do-softirq.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  net: sched: dev_deactivate_many(): use msleep(1) instead of yield() to wait for outstanding qdisc_run calls  (Marc Kleine-Budde)
On PREEMPT_RT enabled systems the interrupt handlers run as threads at prio 50 (by default). If a high priority userspace process tries to shut down a busy network interface it might spin in a yield loop waiting for the device to become idle. With the interrupt thread having a lower priority than the looping process it might never be scheduled and so result in a deadlock on UP systems.

With Magic SysRq the following backtrace can be produced:

> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)

This patch works around the problem by replacing yield() with msleep(1), giving the interrupt thread time to finish, similar to other changes contained in the rt patch set. Using wait_for_completion() instead would probably be a better solution.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-05-14  net-flip-lock-dep-thingy.patch  (Thomas Gleixner)
=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.0-rc3+ #26
-------------------------------------------------------
ip/1104 is trying to acquire lock:
 (local_softirq_lock){+.+...}, at: [<ffffffff81056d12>] __local_lock+0x25/0x68

but task is already holding lock:
 (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_INET){+.+...}:
   [<ffffffff810836e5>] lock_acquire+0x103/0x12e
   [<ffffffff813e2781>] lock_sock_nested+0x82/0x92
   [<ffffffff81433308>] lock_sock+0x10/0x12
   [<ffffffff81433afa>] tcp_close+0x1b/0x355
   [<ffffffff81453c99>] inet_release+0xc3/0xcd
   [<ffffffff813dff3f>] sock_release+0x1f/0x74
   [<ffffffff813dffbb>] sock_close+0x27/0x2b
   [<ffffffff81129c63>] fput+0x11d/0x1e3
   [<ffffffff81126577>] filp_close+0x70/0x7b
   [<ffffffff8112667a>] sys_close+0xf8/0x13d
   [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

-> #0 (local_softirq_lock){+.+...}:
   [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
   [<ffffffff810836e5>] lock_acquire+0x103/0x12e
   [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
   [<ffffffff81056d12>] __local_lock+0x25/0x68
   [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
   [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
   [<ffffffff81433c38>] tcp_close+0x159/0x355
   [<ffffffff81453c99>] inet_release+0xc3/0xcd
   [<ffffffff813dff3f>] sock_release+0x1f/0x74
   [<ffffffff813dffbb>] sock_close+0x27/0x2b
   [<ffffffff81129c63>] fput+0x11d/0x1e3
   [<ffffffff81126577>] filp_close+0x70/0x7b
   [<ffffffff8112667a>] sys_close+0xf8/0x13d
   [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET);
                               lock(local_softirq_lock);
                               lock(sk_lock-AF_INET);
  lock(local_softirq_lock);

 *** DEADLOCK ***

1 lock held by ip/1104:
 #0: (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

stack backtrace:
Pid: 1104, comm: ip Not tainted 3.0.0-rc3+ #26
Call Trace:
 [<ffffffff81081649>] print_circular_bug+0x1f8/0x209
 [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff810836e5>] lock_acquire+0x103/0x12e
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c75>] ? get_parent_ip+0x11/0x41
 [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c8c>] ? get_parent_ip+0x28/0x41
 [<ffffffff81056d12>] __local_lock+0x25/0x68
 [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
 [<ffffffff81433308>] ? lock_sock+0x10/0x12
 [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
 [<ffffffff81433c38>] tcp_close+0x159/0x355
 [<ffffffff81453c99>] inet_release+0xc3/0xcd
 [<ffffffff813dff3f>] sock_release+0x1f/0x74
 [<ffffffff813dffbb>] sock_close+0x27/0x2b
 [<ffffffff81129c63>] fput+0x11d/0x1e3
 [<ffffffff81126577>] filp_close+0x70/0x7b
 [<ffffffff8112667a>] sys_close+0xf8/0x13d
 [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-14  Reset to 3.12.19  (Scott Wood)
2014-04-10  softirq: Check preemption after reenabling interrupts  (Thomas Gleixner)
raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the raise_softirq_irqoff() sections are the only ones which show this behaviour.

Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
2014-04-10  net: Add a mutex around devnet_rename_seq  (Sebastian Andrzej Siewior)
On RT, write_seqcount_begin() disables preemption, and device_rename() allocates memory with GFP_KERNEL and later grabs the sysfs_mutex mutex. Serialize with a mutex and use the non-preemption-disabling __write_seqcount_begin().

To avoid writer starvation, let the reader grab the mutex and release it when it detects a writer in progress. This keeps the normal case (no reader on the fly) fast.

[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  net: Use local_bh_disable in netif_rx_ni()  (Thomas Gleixner)
This code triggers the new WARN in __raise_softirq_irqoff(): it actually looks at the softirq pending bit and calls into the softirq code, but that does not fit well with the context-related softirq model of RT. It's correct on mainline, though, and going through local_bh_disable/enable here is not going to hurt badly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  net: netfilter: Serialize xt_write_recseq sections on RT  (Thomas Gleixner)
The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing xt_write_recseq sections. RT breaks that and needs explicit serialization here.

Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
2014-04-10  net: ip_send_unicast_reply: add missing local serialization  (Nicholas Mc Guire)
In response to the oops in ip_output.c:ip_send_unicast_reply under high network load with CONFIG_PREEMPT_RT_FULL=y, reported by Sami Pietikainen <Sami.Pietikainen@wapice.com>, this patch adds local serialization in ip_send_unicast_reply.

From ip_output.c:

    /*
     * Generic function to send a packet as reply to another packet.
     * Used to send some TCP resets/acks so far.
     *
     * Use a fake percpu inet socket to avoid false sharing and contention.
     */
    static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = { ...

which was added in commit be9f4a44 in linux-stable. The git log which introduced the PER_CPU unicast_sock states:

<snip>
commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jul 19 07:34:03 2012 +0000

    ipv4: tcp: remove per net tcp_sock

    tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket per network namespace. This leads to bad behavior on multiqueue NICS, because many cpus contend for the socket lock and once socket lock is acquired, extra false sharing on various socket fields slow down the operations. To better resist to attacks, we use a percpu socket. Each cpu can run without contention, using appropriate memory (local node)
<snip>

The per-cpu socket is thus assumed to be used exclusively, serialized per CPU - so the use of get_cpu_light() introduced in net-use-cpu-light-in-ip-send-unicast-reply.patch, which dropped the preempt_disable in favor of a migrate_disable, is probably wrong, as it only provides referential consistency but not serialization. To avoid a preempt_disable here, a local lock is needed.

Therapy:
* add a local lock
* and re-introduce local serialization

Tested on x86 with high network load, using the testcase from Sami Pietikainen:

    while : ; do wget -O - ftp://LOCAL_SERVER/empty_file > /dev/null 2>&1; done

Link: http://www.spinics.net/lists/linux-rt-users/msg11007.html
Cc: stable-rt@vger.kernel.org
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10  net: Use get_cpu_light() in ip_send_unicast_reply()  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  net: Another local_irq_disable/kmalloc headache  (Thomas Gleixner)
Replace it by a local lock. Though that's pretty inefficient :(

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  net, RT: Remove preemption disabling in netif_rx()  (Priyanka Jain)
1) enqueue_to_backlog() (called from netif_rx) should be bound to a particular CPU. This can be achieved by disabling migration; there is no need to disable preemption.

2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backlog() is called in atomic context, and if the backlog exceeds its count, kfree_skb() is called. But on RT, kfree_skb() might get scheduled out, so it expects a non-atomic context.

3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and migrate_disable() map to preempt_enable() and preempt_disable(), so there is no change in functionality in the non-RT case.

- Replace preempt_enable()/preempt_disable() with migrate_enable()/migrate_disable() respectively.
- Replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light() respectively.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  net: sysrq via icmp  (Carsten Emde)
There are (probably rare) situations when a system has crashed and the system console becomes unresponsive but the network ICMP layer is still alive. Wouldn't it be wonderful if we could then submit a sysrq command via ping? This patch provides this facility. Please consult the updated documentation in Documentation/sysrq.txt for details.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
2014-04-10  net: Avoid livelock in net_tx_action() on RT  (Steven Rostedt)
qdisc_lock is taken w/o disabling interrupts or bottom halves. So code holding a qdisc_lock() can be interrupted and softirqs can run on the return from interrupt in !RT.

The spin_trylock() in net_tx_action() makes sure that the softirq does not deadlock. When the lock can't be acquired, q is requeued and the NET_TX softirq is raised. That causes the softirq to run over and over.

That works in mainline as do_softirq() has a retry loop limit and leaves the softirq processing in the interrupt return path and schedules ksoftirqd. The task which holds qdisc_lock cannot be preempted, so the lock is released and either ksoftirqd or the next softirq in the return from interrupt path can proceed. Though it's a bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it decides to bail out even if it's clear in the first iteration :)

On RT all softirq processing is done in a FIFO thread and we don't have a loop limit, so ksoftirqd preempts the lock holder forever and unqueues and requeues until the reset button is hit.

Due to the forced threading of ksoftirqd on RT we actually cannot deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to replace the spin_trylock() with a spin_lock(). When contended, ksoftirqd is scheduled out and the lock holder can proceed.

[ tglx: Massaged changelog and code comments ]

Solved-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  skbufhead-raw-lock.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  net: Use cpu_chill() instead of cpu_relax()  (Thomas Gleixner)
Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
2014-04-10  net-netif_rx_ni-migrate-disable.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  net-wireless-warn-nort.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  softirq-thread-do-softirq.patch  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10  net: sched: dev_deactivate_many(): use msleep(1) instead of yield() to wait for outstanding qdisc_run calls  (Marc Kleine-Budde)
On PREEMPT_RT enabled systems the interrupt handlers run as threads at prio 50 (by default). If a high priority userspace process tries to shut down a busy network interface it might spin in a yield loop waiting for the device to become idle. With the interrupt thread having a lower priority than the looping process it might never be scheduled and so result in a deadlock on UP systems.

With Magic SysRq the following backtrace can be produced:

> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)

This patch works around the problem by replacing yield() with msleep(1), giving the interrupt thread time to finish, similar to other changes contained in the rt patch set. Using wait_for_completion() instead would probably be a better solution.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10  net-flip-lock-dep-thingy.patch  (Thomas Gleixner)
=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.0-rc3+ #26
-------------------------------------------------------
ip/1104 is trying to acquire lock:
 (local_softirq_lock){+.+...}, at: [<ffffffff81056d12>] __local_lock+0x25/0x68

but task is already holding lock:
 (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_INET){+.+...}:
   [<ffffffff810836e5>] lock_acquire+0x103/0x12e
   [<ffffffff813e2781>] lock_sock_nested+0x82/0x92
   [<ffffffff81433308>] lock_sock+0x10/0x12
   [<ffffffff81433afa>] tcp_close+0x1b/0x355
   [<ffffffff81453c99>] inet_release+0xc3/0xcd
   [<ffffffff813dff3f>] sock_release+0x1f/0x74
   [<ffffffff813dffbb>] sock_close+0x27/0x2b
   [<ffffffff81129c63>] fput+0x11d/0x1e3
   [<ffffffff81126577>] filp_close+0x70/0x7b
   [<ffffffff8112667a>] sys_close+0xf8/0x13d
   [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

-> #0 (local_softirq_lock){+.+...}:
   [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
   [<ffffffff810836e5>] lock_acquire+0x103/0x12e
   [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
   [<ffffffff81056d12>] __local_lock+0x25/0x68
   [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
   [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
   [<ffffffff81433c38>] tcp_close+0x159/0x355
   [<ffffffff81453c99>] inet_release+0xc3/0xcd
   [<ffffffff813dff3f>] sock_release+0x1f/0x74
   [<ffffffff813dffbb>] sock_close+0x27/0x2b
   [<ffffffff81129c63>] fput+0x11d/0x1e3
   [<ffffffff81126577>] filp_close+0x70/0x7b
   [<ffffffff8112667a>] sys_close+0xf8/0x13d
   [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET);
                               lock(local_softirq_lock);
                               lock(sk_lock-AF_INET);
  lock(local_softirq_lock);

 *** DEADLOCK ***

1 lock held by ip/1104:
 #0: (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12

stack backtrace:
Pid: 1104, comm: ip Not tainted 3.0.0-rc3+ #26
Call Trace:
 [<ffffffff81081649>] print_circular_bug+0x1f8/0x209
 [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff810836e5>] lock_acquire+0x103/0x12e
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c75>] ? get_parent_ip+0x11/0x41
 [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
 [<ffffffff81056d12>] ? __local_lock+0x25/0x68
 [<ffffffff81046c8c>] ? get_parent_ip+0x28/0x41
 [<ffffffff81056d12>] __local_lock+0x25/0x68
 [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
 [<ffffffff81433308>] ? lock_sock+0x10/0x12
 [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
 [<ffffffff81433c38>] tcp_close+0x159/0x355
 [<ffffffff81453c99>] inet_release+0xc3/0xcd
 [<ffffffff813dff3f>] sock_release+0x1f/0x74
 [<ffffffff813dffbb>] sock_close+0x27/0x2b
 [<ffffffff81129c63>] fput+0x11d/0x1e3
 [<ffffffff81126577>] filp_close+0x70/0x7b
 [<ffffffff8112667a>] sys_close+0xf8/0x13d
 [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>