summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-05-20Merge tag 'v3.8.13'Scott Wood
This is the 3.8.13 stable release
2013-05-11netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengthsMatthias Schiffer
commit 906b1c394d0906a154fbdc904ca506bceb515756 upstream. The bitmask used for the prefix mangling was being calculated incorrectly, leading to the wrong part of the address being replaced when the prefix length wasn't a multiple of 32. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11netfilter: xt_rpfilter: skip locally generated broadcast/multicast, tooFlorian Westphal
commit f83a7ea2075ca896f2dbf07672bac9cf3682ff74 upstream. Alex Efros reported rpfilter module doesn't match following packets: IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ] (netfilter bugzilla #814). Problem is that network stack arranges for the locally generated broadcasts to appear on the interface they were sent out, so the IFF_LOOPBACK check doesn't trigger. As -m rpfilter is restricted to PREROUTING, we can check for existing rtable instead, it catches locally-generated broad/multicast case, too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11netfilter: ctnetlink: don't permit ct creation with random tupleFlorian Westphal
commit 442fad9423b78319e0019a7f5047eddf3317afbc upstream. Userspace can cause kernel panic by not specifying orig/reply tuple: kernel will create a tuple with random stack values. Problem is that tuple.dst.dir will be random, too, which causes nf_ct_tuplehash_to_ctrack() to return garbage. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11netfilter: nf_ct_helper: don't discard helper if it is actually the sameFlorian Westphal
commit 6e2f0aa8cf8892868bf2c19349cb5d7c407f690d upstream. commit (32f5376 netfilter: nf_ct_helper: disable automatic helper re-assignment of different type) broke transparent proxy scenarios. For example, initial helper lookup might yield "ftp" (dport 21), while re-lookup after REDIRECT yields "ftp-2121". This causes the autoassign code to toss the ftp helper, even though these are just different instances of the same helper. Change the test to check for the helper function address instead of the helper address, as suggested by Pablo. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11netfilter: ipset: "Directory not empty" error messageJozsef Kadlecsik
commit dd82088dab3646ed28e4aa43d1a5b5d5ffc2afba upstream. When an entry flagged with "nomatch" was tested by ipset, it returned the error message "Kernel error received: Directory not empty" instead of "<element> is NOT in set <setname>" (reported by John Brendler). The internal error code was not properly transformed before returning to userspace, fixed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11netfilter: nf_ct_sip: don't drop packets with offsets pointing outside the ↵Patrick McHardy
packet commit 3a7b21eaf4fb3c971bdb47a98f570550ddfe4471 upstream. Some Cisco phones create huge messages that are spread over multiple packets. After calculating the offset of the SIP body, it is validated to be within the packet and the packet is dropped otherwise. This breaks operation of these phones. Since connection tracking is supposed to be passive, just let those packets pass unmodified and untracked. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11netfilter: ipset: list:set: fix reference counter updateJozsef Kadlecsik
commit 02f815cb6d3f57914228be84df9613ee5a01c2e6 upstream. The last element can be replaced or pushed off and in both cases the reference counter must be updated. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11netfilter: nf_nat: fix race when unloading protocol modulesFlorian Westphal
commit c2d421e171868586939c328dfb91bab840fe4c49 upstream. following oops was reported: RIP: 0010:[<ffffffffa03227f2>] [<ffffffffa03227f2>] nf_nat_cleanup_conntrack+0x42/0x70 [nf_nat] RSP: 0018:ffff880202c63d40 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8801ac7bec28 RCX: ffff8801d0eedbe0 RDX: dead000000200200 RSI: 0000000000000011 RDI: ffffffffa03265b8 [..] Call Trace: [..] [<ffffffffa02febed>] destroy_conntrack+0xbd/0x110 [nf_conntrack] Happens when a conntrack timeout expires right after first part of the nat cleanup has completed (bysrc hash removal), but before part 2 has completed (re-initialization of nat area). [ destroy callback tries to delete bysrc again ] Patrick suggested to just remove the affected conntracks -- the connections won't work properly anyway without nat transformation. So, lets do that. Reported-by: CAI Qian <caiqian@redhat.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11ipvs: ip_vs_sip_fill_param() BUG: bad check of return valueHans Schillstrom
commit f7a1dd6e3ad59f0cfd51da29dfdbfd54122c5916 upstream. The reason for this patch is crash in kmemdup caused by returning from get_callid with uniialized matchoff and matchlen. Removing Zero check of matchlen since it's done by ct_sip_get_header() BUG: unable to handle kernel paging request at ffff880457b5763f IP: [<ffffffff810df7fc>] kmemdup+0x2e/0x35 PGD 27f6067 PUD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: xt_state xt_helper nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle xt_connmark xt_conntrack ip6_tables nf_conntrack_ftp ip_vs_ftp nf_nat xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ip_vs_rr ip_vs_lblcr ip_vs_pe_sip ip_vs nf_conntrack_sip nf_conntrack bonding igb i2c_algo_bit i2c_core CPU 5 Pid: 0, comm: swapper/5 Not tainted 3.9.0-rc5+ #5 /S1200KP RIP: 0010:[<ffffffff810df7fc>] [<ffffffff810df7fc>] kmemdup+0x2e/0x35 RSP: 0018:ffff8803fea03648 EFLAGS: 00010282 RAX: ffff8803d61063e0 RBX: 0000000000000003 RCX: 0000000000000003 RDX: 0000000000000003 RSI: ffff880457b5763f RDI: ffff8803d61063e0 RBP: ffff8803fea03658 R08: 0000000000000008 R09: 0000000000000011 R10: 0000000000000011 R11: 00ffffffff81a8a3 R12: ffff880457b5763f R13: ffff8803d67f786a R14: ffff8803fea03730 R15: ffffffffa0098e90 FS: 0000000000000000(0000) GS:ffff8803fea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff880457b5763f CR3: 0000000001a0c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/5 (pid: 0, threadinfo ffff8803ee18c000, task ffff8803ee18a480) Stack: ffff8803d822a080 000000000000001c ffff8803fea036c8 ffffffffa000937a ffffffff81f0d8a0 000000038135fdd5 ffff880300000014 ffff880300110000 ffffffff150118ac ffff8803d7e8a000 ffff88031e0118ac 0000000000000000 Call Trace: <IRQ> [<ffffffffa000937a>] ip_vs_sip_fill_param+0x13a/0x187 [ip_vs_pe_sip] [<ffffffffa007b209>] ip_vs_sched_persist+0x2c6/0x9c3 [ip_vs] [<ffffffff8107dc53>] ? __lock_acquire+0x677/0x1697 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d [<ffffffff810649bc>] ? sched_clock_cpu+0x43/0xcf [<ffffffffa007bb1e>] ip_vs_schedule+0x181/0x4ba [ip_vs] ... Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-08wireless: regulatory: fix channel disabling race conditionJohannes Berg
commit 990de49f74e772b6db5208457b7aa712a5f4db86 upstream. When a full scan 2.4 and 5 GHz scan is scheduled, but then the 2.4 GHz part of the scan disables a 5.2 GHz channel due to, e.g. receiving country or frequency information, that 5.2 GHz channel might already be in the list of channels to scan next. Then, when the driver checks if it should do a passive scan, that will return false and attempt an active scan. This is not only wrong but can also lead to the iwlwifi device firmware crashing since it checks regulatory as well. Fix this by not setting the channel flags to just disabled but rather OR'ing in the disabled flag. That way, even if the race happens, the channel will be scanned passively which is still (mostly) correct. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-08mac80211: fix station entry leak/warning while suspendingJohannes Berg
commit b20d34c458bc2bbd0a4624f2933581e01e72d875 upstream. Since Stanislaw's patches, when suspending while connected, cfg80211 will disconnect. This causes the AP station to be removed, which uses call_rcu() to clean up. Due to needing process context, this queues a work struct on the mac80211 workqueue. This will warn and fail when already suspended, which can happen if the rcu call doesn't happen quickly. To fix this, replace the synchronize_net() which is really just synchronize_rcu_expedited() with rcu_barrier(), which unlike synchronize_rcu() waits until RCU callback have run and thus avoids this issue. In theory, this can even happen without Stanislaw's change to disconnect on suspend since userspace might disconnect just before suspending, though then it's unlikely that the call_rcu() will be delayed long enough. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01Merge branch 'rtmerge'Scott Wood
Conflicts: include/linux/preempt.h
2013-05-01net: drop dst before queueing fragmentsEric Dumazet
[ Upstream commit 97599dc792b45b1669c3cdb9a4b365aad0232f65 ] Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, as non refcounted dst could escape an RCU protected section. Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed the case of timeouts, but not the general problem. Tom Parkin noticed crashes in UDP stack and provided a patch, but further analysis permitted us to pinpoint the root cause. Before queueing a packet into a frag list, we must drop its dst, as this dst has limited lifetime (RCU protected) When/if a packet is finally reassembled, we use the dst of the very last skb, still protected by RCU and valid, as the dst of the reassembled packet. Use same logic in IPv6, as there is no need to hold dst references. Reported-by: Tom Parkin <tparkin@katalix.com> Tested-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01net: rate-limit warn-bad-offload splats.Ben Greear
[ Upstream commit c846ad9b880ece01bb4d8d07ba917734edf0324f ] If one does do something unfortunate and allow a bad offload bug into the kernel, this the skb_warn_bad_offload can effectively live-lock the system, filling the logs with the same error over and over. Add rate limitation to this so that box remains otherwise functional in this case. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01tcp: call tcp_replace_ts_recent() from tcp_ack()Eric Dumazet
[ Upstream commit 12fb3dd9dc3c64ba7d64cec977cca9b5fb7b1d4e ] commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming()) introduced a TS ecr bug in slow path processing. 1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200> 2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001> 3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200> 4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200> (ecr 200 should be ecr 300 in packets 3 & 4) Problem is tcp_ack() can trigger send of new packets (retransmits), reflecting the prior TSval, instead of the TSval contained in the currently processed incoming packet. Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the checks, but before the actions. Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01esp4: fix error return code in esp_output()Wei Yongjun
[ Upstream commit 06848c10f720cbc20e3b784c0df24930b7304b93 ] Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01tcp: Reallocate headroom if it would overflow csum_startThomas Graf
[ Upstream commit 50bceae9bd3569d56744882f3012734d48a1d413 ] If a TCP retransmission gets partially ACKed and collapsed multiple times it is possible for the headroom to grow beyond 64K which will overflow the 16bit skb->csum_start which is based on the start of the headroom. It has been observed rarely in the wild with IPoIB due to the 64K MTU. Verify if the acking and collapsing resulted in a headroom exceeding what csum_start can cover and reallocate the headroom if so. A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at LLNL for helping out with the investigation and testing. Reported-by: Jim Foraker <foraker1@llnl.gov> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01tcp: incoming connections might use wrong route under synfloodDmitry Popov
[ Upstream commit d66954a066158781ccf9c13c91d0316970fe57b6 ] There is a bug in cookie_v4_check (net/ipv4/syncookies.c): flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE, IPPROTO_TCP, inet_sk_flowi_flags(sk), (opt && opt->srr) ? opt->faddr : ireq->rmt_addr, ireq->loc_addr, th->source, th->dest); Here we do not respect sk->sk_bound_dev_if, therefore wrong dst_entry may be taken. This dst_entry is used by new socket (get_cookie_sock -> tcp_v4_syn_recv_sock), so its packets may take the wrong path. Signed-off-by: Dmitry Popov <dp@highloadlab.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01rtnetlink: Call nlmsg_parse() with correct header lengthMichael Riesch
[ Upstream commit 88c5b5ce5cb57af6ca2a7cf4d5715fa320448ff9 ] Signed-off-by: Michael Riesch <michael.riesch@omicron.at> Cc: Jiri Benc <jbenc@redhat.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01ipv6/tcp: Stop processing ICMPv6 redirect messagesChristoph Paasch
[ Upstream commit 50a75a8914539c5dcd441c5f54d237a666a426fd ] Tetja Rediske found that if the host receives an ICMPv6 redirect message after sending a SYN+ACK, the connection will be reset. He bisected it down to 093d04d (ipv6: Change skb->data before using icmpv6_notify() to propagate redirect), but the origin of the bug comes from ec18d9a26 (ipv6: Add redirect support to all protocol icmp error handlers.). The bug simply did not trigger prior to 093d04d, because skb->data did not point to the inner IP header and thus icmpv6_notify did not call the correct err_handler. This patch adds the missing "goto out;" in tcp_v6_err. After receiving an ICMPv6 Redirect, we should not continue processing the ICMP in tcp_v6_err, as this may trigger the removal of request-socks or setting sk_err(_soft). Reported-by: Tetja Rediske <tetja@tetja.de> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01netfilter: don't reset nf_trace in nf_reset()Patrick McHardy
[ Upstream commit 124dff01afbdbff251f0385beca84ba1b9adda68 ] Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code to reset nf_trace in nf_reset(). This is wrong and unnecessary. nf_reset() is used in the following cases: - when passing packets up the the socket layer, at which point we want to release all netfilter references that might keep modules pinned while the packet is queued. nf_trace doesn't matter anymore at this point. - when encapsulating or decapsulating IPsec packets. We want to continue tracing these packets after IPsec processing. - when passing packets through virtual network devices. Only devices on that encapsulate in IPv4/v6 matter since otherwise nf_trace is not used anymore. Its not entirely clear whether those packets should be traced after that, however we've always done that. - when passing packets through virtual network devices that make the packet cross network namespace boundaries. This is the only cases where we clearly want to reset nf_trace and is also what the original patch intended to fix. Add a new function nf_reset_trace() and use it in dev_forward_skb() to fix this properly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01af_unix: If we don't care about credentials coallesce all messagesEric W. Biederman
[ Upstream commit 0e82e7f6dfeec1013339612f74abc2cdd29d43d2 ] It was reported that the following LSB test case failed https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we were not coallescing unix stream messages when the application was expecting us to. The problem was that the first send was before the socket was accepted and thus sock->sk_socket was NULL in maybe_add_creds, and the second send after the socket was accepted had a non-NULL value for sk->socket and thus we could tell the credentials were not needed so we did not bother. The unnecessary credentials on the first message cause unix_stream_recvmsg to start verifying that all messages had the same credentials before coallescing and then the coallescing failed because the second message had no credentials. Ignoring credentials when we don't care in unix_stream_recvmsg fixes a long standing pessimization which would fail to coallesce messages when reading from a unix stream socket if the senders were different even if we did not care about their credentials. I have tested this and verified that the in the LSB test case mentioned above that the messages do coallesce now, while the were failing to coallesce without this change. Reported-by: Karel Srot <ksrot@redhat.com> Reported-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01net: count hw_addr syncs so that unsync works properly.Vlad Yasevich
[ Upstream commit 4543fbefe6e06a9e40d9f2b28d688393a299f079 ] A few drivers use dev_uc_sync/unsync to synchronize the address lists from master down to slave/lower devices. In some cases (bond/team) a single address list is synched down to multiple devices. At the time of unsync, we have a leak in these lower devices, because "synced" is treated as a boolean and the address will not be unsynced for anything after the first device/call. Treat "synced" as a count (same as refcount) and allow all unsync calls to work. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01net IPv6 : Fix broken IPv6 routing table after loopback down-upBalakumaran Kannan
[ Upstream commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f ] IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo) interface. After down-up, routes of other interface's IPv6 addresses through 'lo' are lost. IPv6 addresses assigned to all interfaces are routed through 'lo' for internal communication. Once 'lo' is down, those routing entries are removed from routing table. But those removed entries are not being re-created properly when 'lo' is brought up. So IPv6 addresses of other interfaces becomes unreachable from the same machine. Also this breaks communication with other machines because of NDISC packet processing failure. This patch fixes this issue by reading all interface's IPv6 addresses and adding them to IPv6 routing table while bringing up 'lo'. ==Testing== Before applying the patch: $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ sudo ifdown lo $ sudo ifup lo $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ After applying the patch: $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ sudo ifdown lo $ sudo ifup lo $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01cbq: incorrect processing of high limitsVasily Averin
[ Upstream commit f0f6ee1f70c4eaab9d52cf7d255df4bd89f8d1c2 ] currently cbq works incorrectly for limits > 10% real link bandwidth, and practically does not work for limits > 50% real link bandwidth. Below are results of experiments taken on 1 Gbit link In shaper | Actual Result -----------+--------------- 100M | 108 Mbps 200M | 244 Mbps 300M | 412 Mbps 500M | 893 Mbps This happen because of q->now changes incorrectly in cbq_dequeue(): when it is called before real end of packet transmitting, L2T is greater than real time delay, q_now gets an extra boost but never compensate it. To fix this problem we prevent change of q->now until its synchronization with real time. Signed-off-by: Vasily Averin <vvs@openvz.org> Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01tipc: fix info leaks via msg_name in recv_msg/recv_streamMathias Krause
[ Upstream commit 60085c3d009b0df252547adb336d1ccca5ce52ec ] The code in set_orig_addr() does not initialize all of the members of struct sockaddr_tipc when filling the sockaddr info -- namely the union is only partly filled. This will make recv_msg() and recv_stream() -- the only users of this function -- leak kernel stack memory as the msg_name member is a local variable in net/socket.c. Additionally to that both recv_msg() and recv_stream() fail to update the msg_namelen member to 0 while otherwise returning with 0, i.e. "success". This is the case for, e.g., non-blocking sockets. This will lead to a 128 byte kernel stack leak in net/socket.c. Fix the first issue by initializing the memory of the union with memset(0). Fix the second one by setting msg_namelen to 0 early as it will be updated later if we're going to fill the msg_name member. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01rose: fix info leak via msg_name in rose_recvmsg()Mathias Krause
[ Upstream commit 4a184233f21645cf0b719366210ed445d1024d72 ] The code in rose_recvmsg() does not initialize all of the members of struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info. Nor does it initialize the padding bytes of the structure inserted by the compiler for alignment. This will lead to leaking uninitialized kernel stack bytes in net/socket.c. Fix the issue by initializing the memory used for sockaddr info with memset(0). Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01NFC: llcp: fix info leaks via msg_name in llcp_sock_recvmsg()Mathias Krause
[ Upstream commit d26d6504f23e803824e8ebd14e52d4fc0a0b09cb ] The code in llcp_sock_recvmsg() does not initialize all the members of struct sockaddr_nfc_llcp when filling the sockaddr info. Nor does it initialize the padding bytes of the structure inserted by the compiler for alignment. Also, if the socket is in state LLCP_CLOSED or is shutting down during receive the msg_namelen member is not updated to 0 while otherwise returning with 0, i.e. "success". The msg_namelen update is also missing for stream and seqpacket sockets which don't fill the sockaddr info. Both issues lead to the fact that the code will leak uninitialized kernel stack bytes in net/socket.c. Fix the first issue by initializing the memory used for sockaddr info with memset(0). Fix the second one by setting msg_namelen to 0 early. It will be updated later if we're going to fill the msg_name member. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01netrom: fix info leak via msg_name in nr_recvmsg()Mathias Krause
[ Upstream commits 3ce5efad47b62c57a4f5c54248347085a750ce0e and c802d759623acbd6e1ee9fbdabae89159a513913 ] In case msg_name is set the sockaddr info gets filled out, as requested, but the code fails to initialize the padding bytes of struct sockaddr_ax25 inserted by the compiler for alignment. Also the sax25_ndigis member does not get assigned, leaking four more bytes. Both issues lead to the fact that the code will leak uninitialized kernel stack bytes in net/socket.c. Fix both issues by initializing the memory with memset(0). Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01llc: Fix missing msg_namelen update in llc_ui_recvmsg()Mathias Krause
[ Upstream commit c77a4b9cffb6215a15196ec499490d116dfad181 ] For stream sockets the code misses to update the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. The msg_namelen update is also missing for datagram sockets in case the socket is shutting down during receive. Fix both issues by setting msg_namelen to 0 early. It will be updated later if we're going to fill the msg_name member. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01l2tp: fix info leak in l2tp_ip6_recvmsg()Mathias Krause
[ Upstream commit b860d3cc62877fad02863e2a08efff69a19382d2 ] The L2TP code for IPv6 fails to initialize the l2tp_conn_id member of struct sockaddr_l2tpip6 and therefore leaks four bytes kernel stack in l2tp_ip6_recvmsg() in case msg_name is set. Initialize l2tp_conn_id with 0 to avoid the info leak. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()Mathias Krause
[ Upstream commit a5598bd9c087dc0efc250a5221e5d0e6f584ee88 ] The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about iucv_sock_recvmsg() not filling the msg_name in case it was set. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01irda: Fix missing msg_namelen update in irda_recvmsg_dgram()Mathias Krause
[ Upstream commit 5ae94c0d2f0bed41d6718be743985d61b7f5c47d ] The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about irda_recvmsg_dgram() not filling the msg_name in case it was set. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg()Mathias Krause
[ Upstream commit 2d6fbfe733f35c6b355c216644e08e149c61b271 ] The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about caif_seqpkt_recvmsg() not filling the msg_name in case it was set. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01Bluetooth: SCO - Fix missing msg_namelen update in sco_sock_recvmsg()Mathias Krause
[ Upstream commit c8c499175f7d295ef867335bceb9a76a2c3cdc38 ] If the socket is in state BT_CONNECT2 and BT_SK_DEFER_SETUP is set in the flags, sco_sock_recvmsg() returns early with 0 without updating the possibly set msg_namelen member. This, in turn, leads to a 128 byte kernel stack leak in net/socket.c. Fix this by updating msg_namelen in this case. For all other cases it will be handled in bt_sock_recvmsg(). Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()Mathias Krause
[ Upstream commit e11e0455c0d7d3d62276a0c55d9dfbc16779d691 ] If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns early with 0 without updating the possibly set msg_namelen member. This, in turn, leads to a 128 byte kernel stack leak in net/socket.c. Fix this by updating msg_namelen in this case. For all other cases it will be handled in bt_sock_stream_recvmsg(). Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01Bluetooth: fix possible info leak in bt_sock_recvmsg()Mathias Krause
[ Upstream commit 4683f42fde3977bdb4e8a09622788cc8b5313778 ] In case the socket is already shutting down, bt_sock_recvmsg() returns with 0 without updating msg_namelen leading to net/socket.c leaking the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix this by moving the msg_namelen assignment in front of the shutdown test. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01ax25: fix info leak via msg_name in ax25_recvmsg()Mathias Krause
[ Upstream commit ef3313e84acbf349caecae942ab3ab731471f1a1 ] When msg_namelen is non-zero the sockaddr info gets filled out, as requested, but the code fails to initialize the padding bytes of struct sockaddr_ax25 inserted by the compiler for alignment. Additionally the msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is not always filled up to this size. Both issues lead to the fact that the code will leak uninitialized kernel stack bytes in net/socket.c. Fix both issues by initializing the memory with memset(0). Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01atm: update msg_namelen in vcc_recvmsg()Mathias Krause
[ Upstream commit 9b3e617f3df53822345a8573b6d358f6b9e5ed87 ] The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about vcc_recvmsg() not filling the msg_name in case it was set. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-30net: make devnet_rename_seq a mutexSebastian Andrzej Siewior
On RT write_seqcount_begin() disables preemption and device_rename() allocates memory with GFP_KERNEL and grabs later the sysfs_mutex mutex. Since I don't see a reason why this can't be a mutex, make it one. We probably don't have that much reads at the same time in the hot path. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2013-04-30net: Use local_bh_disable in netif_rx_ni()Thomas Gleixner
This code triggers the new WARN in __raise_softirq_irqsoff() though it actually looks at the softirq pending bit and calls into the softirq code, but that fits not well with the context related softirq model of RT. It's correct on mainline though, but going through local_bh_disable/enable here is not going to hurt badly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-30net: netfilter: Serialize xt_write_recseq sections on RTThomas Gleixner
The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing wt_write_recseq sections. RT breaks that and needs explicit serialization here. Reported-by: Peter LaDow <petela@gocougs.wsu.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2013-04-30net: Use get_cpu_light() in ip_send_unicast_reply()Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-30net: Another local_irq_disable/kmalloc headacheThomas Gleixner
Replace it by a local lock. Though that's pretty inefficient :( Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-30net: Use cpu_chill() instead of cpu_relax()Thomas Gleixner
Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2013-04-30net,RT:REmove preemption disabling in netif_rx()Priyanka Jain
1)enqueue_to_backlog() (called from netif_rx) should be bind to a particluar CPU. This can be achieved by disabling migration. No need to disable preemption 2)Fixes crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backog() is called in atomic context. And if backlog exceeds its count, kfree_skb() is called. But in RT, kfree_skb() might gets scheduled out, so it expects non atomic context. 3)When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable(), migrate_disable() maps to preempt_enable() and preempt_disable(), so no change in functionality in case of non-RT. -Replace preempt_enable(), preempt_disable() with migrate_enable(), migrate_disable() respectively -Replace get_cpu(), put_cpu() with get_cpu_light(), put_cpu_light() respectively Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com> Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com> Cc: <rostedt@goodmis.orgn> Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-30softirq: Check preemption after reenabling interruptsThomas Gleixner
raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the rasie_softirq_irqoff() sections are the only ones which show this behaviour. Reported-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2013-04-30net: sysrq via icmpCarsten Emde
There are (probably rare) situations when a system crashed and the system console becomes unresponsive but the network icmp layer still is alive. Wouldn't it be wonderful, if we then could submit a sysreq command via ping? This patch provides this facility. Please consult the updated documentation Documentation/sysrq.txt for details. Signed-off-by: Carsten Emde <C.Emde@osadl.org>
2013-04-30net: Avoid livelock in net_tx_action() on RTSteven Rostedt
qdisc_lock is taken w/o disabling interrupts or bottom halfs. So code holding a qdisc_lock() can be interrupted and softirqs can run on the return of interrupt in !RT. The spin_trylock() in net_tx_action() makes sure, that the softirq does not deadlock. When the lock can't be acquired q is requeued and the NET_TX softirq is raised. That causes the softirq to run over and over. That works in mainline as do_softirq() has a retry loop limit and leaves the softirq processing in the interrupt return path and schedules ksoftirqd. The task which holds qdisc_lock cannot be preempted, so the lock is released and either ksoftirqd or the next softirq in the return from interrupt path can proceed. Though it's a bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it decides to bail out even if it's clear in the first iteration :) On RT all softirq processing is done in a FIFO thread and we don't have a loop limit, so ksoftirqd preempts the lock holder forever and unqueues and requeues until the reset button is hit. Due to the forced threading of ksoftirqd on RT we actually cannot deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to replace the spin_trylock() with a spin_lock(). When contended, ksoftirqd is scheduled out and the lock holder can proceed. [ tglx: Massaged changelog and code comments ] Solved-by: Thomas Gleixner <tglx@linuxtronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Carsten Emde <cbe@osadl.org> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>