summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-08-12tipc: avoid possible deadlock while enable and disable bearerdingtianhong
We met lockdep warning when enable and disable the bearer for commands such as: tipc-config -netid=1234 -addr=1.1.3 -be=eth:eth0 tipc-config -netid=1234 -addr=1.1.3 -bd=eth:eth0 --------------------------------------------------- [ 327.693595] ====================================================== [ 327.693994] [ INFO: possible circular locking dependency detected ] [ 327.694519] 3.11.0-rc3-wwd-default #4 Tainted: G O [ 327.694882] ------------------------------------------------------- [ 327.695385] tipc-config/5825 is trying to acquire lock: [ 327.695754] (((timer))#2){+.-...}, at: [<ffffffff8105be80>] del_timer_sync+0x0/0xd0 [ 327.696018] [ 327.696018] but task is already holding lock: [ 327.696018] (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+ 0xdd/0x120 [tipc] [ 327.696018] [ 327.696018] which lock already depends on the new lock. [ 327.696018] [ 327.696018] [ 327.696018] the existing dependency chain (in reverse order) is: [ 327.696018] [ 327.696018] -> #1 (&(&b_ptr->lock)->rlock){+.-...}: [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff814d65b1>] _raw_spin_lock_bh+0x41/0x80 [ 327.696018] [<ffffffffa02c5d48>] disc_timeout+0x18/0xd0 [tipc] [ 327.696018] [<ffffffff8105b92a>] call_timer_fn+0xda/0x1e0 [ 327.696018] [<ffffffff8105bcd7>] run_timer_softirq+0x2a7/0x2d0 [ 327.696018] [<ffffffff8105379a>] __do_softirq+0x16a/0x2e0 [ 327.696018] [<ffffffff81053a35>] irq_exit+0xd5/0xe0 [ 327.696018] [<ffffffff81033005>] smp_apic_timer_interrupt+0x45/0x60 [ 327.696018] [<ffffffff814df4af>] apic_timer_interrupt+0x6f/0x80 [ 327.696018] [<ffffffff8100b70e>] arch_cpu_idle+0x1e/0x30 [ 327.696018] [<ffffffff810a039d>] cpu_idle_loop+0x1fd/0x280 [ 327.696018] [<ffffffff810a043e>] cpu_startup_entry+0x1e/0x20 [ 327.696018] [<ffffffff81031589>] start_secondary+0x89/0x90 [ 327.696018] [ 327.696018] -> #0 (((timer))#2){+.-...}: [ 327.696018] [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0 [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0 [ 327.696018] [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc] [ 327.696018] [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc] [ 327.696018] [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc] [ 327.696018] [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc] [ 327.696018] [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc] [ 327.696018] [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340 [ 327.696018] [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0 [ 327.696018] [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0 [ 327.696018] [<ffffffff8143e617>] genl_rcv+0x27/0x40 [ 327.696018] [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0 [ 327.696018] [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400 [ 327.696018] [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80 [ 327.696018] [<ffffffff813f7957>] sock_aio_write+0x107/0x120 [ 327.696018] [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0 [ 327.696018] [<ffffffff8117fc56>] vfs_write+0x186/0x190 [ 327.696018] [<ffffffff811803e0>] SyS_write+0x60/0xb0 [ 327.696018] [<ffffffff814de852>] system_call_fastpath+0x16/0x1b [ 327.696018] [ 327.696018] other info that might help us debug this: [ 327.696018] [ 327.696018] Possible unsafe locking scenario: [ 327.696018] [ 327.696018] CPU0 CPU1 [ 327.696018] ---- ---- [ 327.696018] lock(&(&b_ptr->lock)->rlock); [ 327.696018] lock(((timer))#2); [ 327.696018] lock(&(&b_ptr->lock)->rlock); [ 327.696018] lock(((timer))#2); [ 327.696018] [ 327.696018] *** DEADLOCK *** [ 327.696018] [ 327.696018] 5 locks held by tipc-config/5825: [ 327.696018] #0: (cb_lock){++++++}, at: [<ffffffff8143e608>] genl_rcv+0x18/0x40 [ 327.696018] #1: (genl_mutex){+.+.+.}, at: [<ffffffff8143ed66>] genl_rcv_msg+0xa6/0xd0 [ 327.696018] #2: (config_mutex){+.+.+.}, at: [<ffffffffa02bf889>] tipc_cfg_do_cmd+0x39/ 0x550 [tipc] [ 327.696018] #3: (tipc_net_lock){++.-..}, at: [<ffffffffa02be738>] tipc_disable_bearer+ 0x18/0x60 [tipc] [ 327.696018] #4: (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+0xdd/0x120 [tipc] [ 327.696018] [ 327.696018] stack backtrace: [ 327.696018] CPU: 2 PID: 5825 Comm: tipc-config Tainted: G O 3.11.0-rc3-wwd- default #4 [ 327.696018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 327.696018] 00000000ffffffff ffff880037fa77a8 ffffffff814d03dd 0000000000000000 [ 327.696018] ffff880037fa7808 ffff880037fa77e8 ffffffff810b1c4f 0000000037fa77e8 [ 327.696018] ffff880037fa7808 ffff880037e4db40 0000000000000000 ffff880037e4e318 [ 327.696018] Call Trace: [ 327.696018] [<ffffffff814d03dd>] dump_stack+0x4d/0xa0 [ 327.696018] [<ffffffff810b1c4f>] print_circular_bug+0x10f/0x120 [ 327.696018] [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0 [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff81087a28>] ? sched_clock_cpu+0xd8/0x110 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70 [ 327.696018] [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0 [ 327.696018] [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70 [ 327.696018] [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc] [ 327.696018] [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc] [ 327.696018] [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc] [ 327.696018] [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc] [ 327.696018] [<ffffffff81218783>] ? security_capable+0x13/0x20 [ 327.696018] [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc] [ 327.696018] [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340 [ 327.696018] [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0 [ 327.696018] [<ffffffff8143ecc0>] ? genl_lock+0x20/0x20 [ 327.696018] [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0 [ 327.696018] [<ffffffff8143e608>] ? genl_rcv+0x18/0x40 [ 327.696018] [<ffffffff8143e617>] genl_rcv+0x27/0x40 [ 327.696018] [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0 [ 327.696018] [<ffffffff81289d7c>] ? memcpy_fromiovec+0x6c/0x90 [ 327.696018] [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400 [ 327.696018] [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80 [ 327.696018] [<ffffffff813f7957>] sock_aio_write+0x107/0x120 [ 327.696018] [<ffffffff813fe29c>] ? release_sock+0x8c/0xa0 [ 327.696018] [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0 [ 327.696018] [<ffffffff8117fa24>] ? rw_verify_area+0x54/0x100 [ 327.696018] [<ffffffff8117fc56>] vfs_write+0x186/0x190 [ 327.696018] [<ffffffff811803e0>] SyS_write+0x60/0xb0 [ 327.696018] [<ffffffff814de852>] system_call_fastpath+0x16/0x1b ----------------------------------------------------------------------- The problem is that the tipc_link_delete() will cancel the timer disc_timeout() when the b_ptr->lock is hold, but the disc_timeout() still call b_ptr->lock to finish the work, so the dead lock occurs. We should unlock the b_ptr->lock when del the disc_timeout(). Remove link_timeout() still met the same problem, the patch: http://article.gmane.org/gmane.network.tipc.general/4380 fix the problem, so no need to send patch for fix link_timeout() deadlock warming. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included change: - reassign pointers to data after skb reallocation to avoid kernel paging errors Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10batman-adv: fix potential kernel paging errors for unicast transmissionsLinus Lüssing
There are several functions which might reallocate skb data. Currently some places keep reusing their old ethhdr pointer regardless of whether they became invalid after such a reallocation or not. This potentially leads to kernel paging errors. This patch fixes these by refetching the ethdr pointer after the potential reallocations. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-08-10Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains four netfilter fixes, they are: * Fix possible invalid access and mangling of the TCPMSS option in xt_TCPMSS. This was spotted by Julian Anastasov. * Fix possible off by one access and mangling of the TCP packet in xt_TCPOPTSTRIP, also spotted by Julian Anastasov. * Fix possible information leak due to missing initialization of one padding field of several structures that are included in nfqueue and nflog netlink messages, from Dan Carpenter. * Fix TCP window tracking with Fast Open, from Yuchung Cheng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10netfilter: nf_conntrack: fix tcp_in_window for Fast OpenYuchung Cheng
Currently the conntrack checks if the ending sequence of a packet falls within the observed receive window. However it does so even if it has not observe any packet from the remote yet and uses an uninitialized receive window (td_maxwin). If a connection uses Fast Open to send a SYN-data packet which is dropped afterward in the network. The subsequent SYNs retransmits will all fail this check and be discarded, leading to a connection timeout. This is because the SYN retransmit does not contain data payload so end == initial sequence number (isn) + 1 sender->td_end == isn + syn_data_len receiver->td_maxwin == 0 The fix is to only apply this check after td_maxwin is initialized. Reported-by: Michael Chan <mcfchan@stanford.edu> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-10rtnetlink: Fix inverted check in ndo_dflt_fdb_del()Sridhar Samudrala
Fix inverted check when deleting an fdb entry. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09net: rename busy poll MIB counterEliezer Tamir
Rename mib counter from "low latency" to "busy poll" v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer) Eric Dumazet suggested that the current location is better. So v2 just renames the counter to fit the new naming convention. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09net: flow_dissector: add 802.1ad supportEric Dumazet
Same behavior than 802.1q : finds the encapsulated protocol and skip 32bit header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09ip_gre: fix ipgre_header to return correct offsetTimo Teräs
Fix ipgre_header() (header_ops->create) to return the correct amount of bytes pushed. Most callers of dev_hard_header() seem to care only if it was success, but af_packet.c uses it as offset to the skb to copy from userspace only once. In practice this fixes packet socket sendto()/sendmsg() to gre tunnels. Regression introduced in c54419321455631079c7d6e60bc732dd0c5914c5 ("GRE: Refactor GRE tunneling code.") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-08-08ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not matchHannes Frederic Sowa
In case a subtree did not match we currently stop backtracking and return NULL (root table from fib_lookup). This could yield in invalid routing table lookups when using subtrees. Instead continue to backtrack until a valid subtree or node is found and return this match. Also remove unneeded NULL check. Reported-by: Teco Boot <teco@inf-net.nl> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Lamparter <equinox@diac24.net> Cc: <boutier@pps.univ-paris-diderot.fr> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07SUNRPC: If the rpcbind channel is disconnected, fail the call to unregisterTrond Myklebust
If rpcbind causes our connection to the AF_LOCAL socket to close after we've registered a service, then we want to be careful about reconnecting since the mount namespace may have changed. By simply refusing to reconnect the AF_LOCAL socket in the case of unregister, we avoid the need to somehow save the mount namespace. While this may lead to some services not unregistering properly, it should be safe. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Nix <nix@esperi.org.uk> Cc: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.9.x
2013-08-07tcp: cubic: fix bug in bictcp_acked()Eric Dumazet
While investigating about strange increase of retransmit rates on hosts ~24 days after boot, Van found hystart was disabled if ca->epoch_start was 0, as following condition is true when tcp_time_stamp high order bit is set. (s32)(tcp_time_stamp - ca->epoch_start) < HZ Quoting Van : At initialization & after every loss ca->epoch_start is set to zero so I believe that the above line will turn off hystart as soon as the 2^31 bit is set in tcp_time_stamp & hystart will stay off for 24 days. I think we've observed that cubic's restart is too aggressive without hystart so this might account for the higher drop rate we observe. Diagnosed-by: Van Jacobson <vanj@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07bridge: correct the comment for file br_sysfs_br.cWang Sheng-Hui
br_sysfs_if.c is for sysfs attributes of bridge ports, while br_sysfs_br.c is for sysfs attributes of bridge itself. Correct the comment here. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07tcp: cubic: fix overflow error in bictcp_update()Eric Dumazet
commit 17a6e9f1aa9 ("tcp_cubic: fix clock dependency") added an overflow error in bictcp_update() in following code : /* change the unit from HZ to bictcp_HZ */ t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) - ca->epoch_start) << BICTCP_HZ) / HZ; Because msecs_to_jiffies() being unsigned long, compiler does implicit type promotion. We really want to constrain (tcp_time_stamp - ca->epoch_start) to a signed 32bit value, or else 't' has unexpected high values. This bugs triggers an increase of retransmit rates ~24 days after boot [1], as the high order bit of tcp_time_stamp flips. [1] for hosts with HZ=1000 Big thanks to Van Jacobson for spotting this problem. Diagnosed-by: Van Jacobson <vanj@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-06SUNRPC: Don't auto-disconnect from the local rpcbind socketTrond Myklebust
There is no need for the kernel to time out the AF_LOCAL connection to the rpcbind socket, and doing so is problematic because when it is time to reconnect, our process may no longer be using the same mount namespace. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.9.x
2013-08-05bridge: don't try to update timers in case of broken MLD queriesLinus Lüssing
Currently we are reading an uninitialized value for the max_delay variable when snooping an MLD query message of invalid length and would update our timers with that. Fixing this by simply ignoring such broken MLD queries (just like we do for IGMP already). This is a regression introduced by: "bridge: disable snooping if there is no querier" (b00589af3b04) Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05fib_trie: remove potential out of bound accessEric Dumazet
AddressSanitizer [1] dynamic checker pointed a potential out of bound access in leaf_walk_rcu() We could allocate one more slot in tnode_new() to leave the prefetch() in-place but it looks not worth the pain. Bug added in commit 82cfbb008572b ("[IPV4] fib_trie: iterator recode") [1] : https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05neighbour: populate neigh_parms on alloc before calling ndo_neigh_setupVeaceslav Falico
dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05net: esp{4,6}: fix potential MTU calculation overflowsDaniel Borkmann
Commit 91657eafb ("xfrm: take net hdr len into account for esp payload size calculation") introduced a possible interger overflow in esp{4,6}_get_mtu() handlers in case of x->props.mode equals XFRM_MODE_TUNNEL. Thus, the following expression will overflow unsigned int net_adj; ... <case ipv{4,6} XFRM_MODE_TUNNEL> net_adj = 0; ... return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) - net_adj) & ~(align - 1)) + (net_adj - 2); where (net_adj - 2) would be evaluated as <foo> + (0 - 2) in an unsigned context. Fix it by simply removing brackets as those operations here do not need to have special precedence. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Benjamin Poirier <bpoirier@suse.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Benjamin Poirier <bpoirier@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05net_sched: make dev_trans_start return vlan's real dev trans_startnikolay@redhat.com
Vlan devices are LLTX and don't update their own trans_start, so if dev_trans_start has to be called with a vlan device then 0 or a stale value will be returned. Currently the bonding is the only such user, and it's needed for proper arp monitoring when the slaves are vlans. Fix this by extracting the vlan's real device trans_start. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05vlan: make vlan_dev_real_dev work over stacked vlansnikolay@redhat.com
Sometimes we might have stacked vlans on top of each other, and we're interested in the first non-vlan real device on the path, so transform vlan_dev_real_dev to go over the stacked vlans and extract the first non-vlan device. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05net/vmw_vsock/af_vsock.c: drop unneeded semicolonJulia Lawall
Drop the semicolon at the end of the list_for_each_entry loop header. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05netfilter: nfnetlink_{log,queue}: fix information leaks in netlink messageDan Carpenter
These structs have a "_pad" member. Also the "phw" structs have an 8 byte "hw_addr[]" array but sometimes only the first 6 bytes are initialized. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Don't ignore user initiated wireless regulatory settings on cards with custom regulatory domains, from Arik Nemtsov. 2) Fix length check of bluetooth information responses, from Jaganath Kanakkassery. 3) Fix misuse of PTR_ERR in btusb, from Adam Lee. 4) Handle rfkill properly while iwlwifi devices are offline, from Emmanuel Grumbach. 5) Fix r815x devices DMA'ing to stack buffers, from Hayes Wang. 6) Kernel info leak in ATM packet scheduler, from Dan Carpenter. 7) 8139cp doesn't check for DMA mapping errors, from Neil Horman. 8) Fix bridge multicast code to not snoop when no querier exists, otherwise mutlicast traffic is lost. From Linus Lüssing. 9) Avoid soft lockups in fib6_run_gc(), from Michal Kubecek. 10) Fix races in automatic address asignment on ipv6, which can result in incorrect lifetime assignments. From Jiri Benc. 11) Cure build bustage when CONFIG_NET_LL_RX_POLL is not set and rename it CONFIG_NET_RX_BUSY_POLL to eliminate the last reference to the original naming of this feature. From Cong Wang. 12) Fix crash in TIPC when server socket creation fails, from Ying Xue. 13) macvlan_changelink() silently succeeds when it shouldn't, from Michael S Tsirkin. 14) HTB packet scheduler can crash due to sign extension, fix from Stephen Hemminger. 15) With the cable unplugged, r8169 prints out a message every 10 seconds, make it netif_dbg() instead of netif_warn(). From Peter Wu. 16) Fix memory leak in rtm_to_ifaddr(), from Daniel Borkmann. 17) sis900 gets spurious TX queue timeouts due to mismanagement of link carrier state, from Denis Kirjanov. 18) Validate somaxconn sysctl to make sure it fits inside of a u16. From Roman Gushchin. 19) Fix MAC address filtering on qlcnic, from Shahed Shaikh. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits) qlcnic: Fix for flash update failure on 83xx adapter qlcnic: Fix link speed and duplex display for 83xx adapter qlcnic: Fix link speed display for 82xx adapter qlcnic: Fix external loopback test. qlcnic: Removed adapter series name from warning messages. qlcnic: Free up memory in error path. qlcnic: Fix ingress MAC learning qlcnic: Fix MAC address filter issue on 82xx adapter net: ethernet: davinci_emac: drop IRQF_DISABLED netlabel: use domain based selectors when address based selectors are not available net: check net.core.somaxconn sysctl values sis900: Fix the tx queue timeout issue net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing fails r8169: remove "PHY reset until link up" log spam net: ethernet: cpsw: drop IRQF_DISABLED htb: fix sign extension bug macvlan: handle set_promiscuity failures macvlan: better mode validation tipc: fix oops when creating server socket fails net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL ...
2013-08-03Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfixes from Bruce Fields: "Most of this is due to a screwup on my part -- some gss-proxy crashes got fixed before the merge window but somehow never made it out of a temporary git repo on my laptop...." * 'for-3.11' of git://linux-nfs.org/~bfields/linux: svcrpc: set cr_gss_mech from gss-proxy as well as legacy upcall svcrpc: fix kfree oops in gss-proxy code svcrpc: fix gss-proxy xdr decoding oops svcrpc: fix gss_rpc_upcall create error NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.
2013-08-02netlabel: use domain based selectors when address based selectors are not ↵Paul Moore
available NetLabel has the ability to selectively assign network security labels to outbound traffic based on either the LSM's "domain" (different for each LSM), the network destination, or a combination of both. Depending on the type of traffic, local or forwarded, and the type of traffic selector, domain or address based, different hooks are used to label the traffic; the goal being minimal overhead. Unfortunately, there is a bug such that a system using NetLabel domain based traffic selectors does not correctly label outbound local traffic that is not assigned to a socket. The issue is that in these cases the associated NetLabel hook only looks at the address based selectors and not the domain based selectors. This patch corrects this by checking both the domain and address based selectors so that the correct labeling is applied, regardless of the configuration type. In order to acomplish this fix, this patch also simplifies some of the NetLabel domainhash structures to use a more common outbound traffic mapping type: struct netlbl_dommap_def. This simplifies some of the code in this patch and paves the way for further simplifications in the future. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02net: check net.core.somaxconn sysctl valuesRoman Gushchin
It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there is no checks at all. The sk_max_ack_backlog field of the sock structure is defined as unsigned short. Therefore, the backlog argument in inet_listen() shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall is truncated to the somaxconn value. So, the somaxconn value shouldn't exceed 65535 (USHRT_MAX). Also, negative values of somaxconn are meaningless. before: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 net.core.somaxconn = 65536 $ sysctl -w net.core.somaxconn=-100 net.core.somaxconn = -100 after: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 error: "Invalid argument" setting key "net.core.somaxconn" $ sysctl -w net.core.somaxconn=-100 error: "Invalid argument" setting key "net.core.somaxconn" Based on a prior patch from Changli Gao. Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Reported-by: Changli Gao <xiaosuo@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing failsDaniel Borkmann
Commit 5c766d642 ("ipv4: introduce address lifetime") leaves the ifa resource that was allocated via inet_alloc_ifa() unfreed when returning the function with -EINVAL. Thus, free it first via inet_free_ifa(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02htb: fix sign extension bugstephen hemminger
When userspace passes a large priority value the assignment of the unsigned value hopt->prio to signed int cl->prio causes cl->prio to become negative and the comparison is with TC_HTB_NUMPRIO is always false. The result is that HTB crashes by referencing outside the array when processing packets. With this patch the large value wraps around like other values outside the normal range. See: https://bugzilla.kernel.org/show_bug.cgi?id=60669 Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2013-08-01tipc: fix oops when creating server socket failsYing Xue
When creation of TIPC internal server socket fails, we get an oops with the following dump: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc] PGD 13719067 PUD 12008067 PMD 0 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: tipc(+) CPU: 4 PID: 4340 Comm: insmod Not tainted 3.10.0+ #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880014360000 ti: ffff88001374c000 task.ti: ffff88001374c000 RIP: 0010:[<ffffffffa0011f49>] [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc] RSP: 0018:ffff88001374dc98 EFLAGS: 00010292 RAX: 0000000000000000 RBX: ffff880012ac09d8 RCX: 0000000000000000 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffff880014360000 RBP: ffff88001374dcb8 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0016fa0 R13: ffffffffa0017010 R14: ffffffffa0017010 R15: ffff880012ac09d8 FS: 0000000000000000(0000) GS:ffff880016600000(0063) knlGS:00000000f76668d0 CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b CR2: 0000000000000020 CR3: 0000000012227000 CR4: 00000000000006e0 Stack: ffff88001374dcb8 ffffffffa0016fa0 0000000000000000 0000000000000001 ffff88001374dcf8 ffffffffa0012922 ffff88001374dce8 00000000ffffffea ffffffffa0017100 0000000000000000 ffff8800134241a8 ffffffffa0017150 Call Trace: [<ffffffffa0012922>] tipc_server_stop+0xa2/0x1b0 [tipc] [<ffffffffa0009995>] tipc_subscr_stop+0x15/0x20 [tipc] [<ffffffffa00130f5>] tipc_core_stop+0x1d/0x33 [tipc] [<ffffffffa001f0d4>] tipc_init+0xd4/0xf8 [tipc] [<ffffffffa001f000>] ? 0xffffffffa001efff [<ffffffff8100023f>] do_one_initcall+0x3f/0x150 [<ffffffff81082f4d>] ? __blocking_notifier_call_chain+0x7d/0xd0 [<ffffffff810cc58a>] load_module+0x11aa/0x19c0 [<ffffffff810c8d60>] ? show_initstate+0x50/0x50 [<ffffffff8190311c>] ? retint_restore_args+0xe/0xe [<ffffffff810cce79>] SyS_init_module+0xd9/0x110 [<ffffffff8190dc65>] sysenter_dispatch+0x7/0x1f Code: 6c 24 70 4c 89 ef e8 b7 04 8f e1 8b 73 04 4c 89 e7 e8 7c 9e 32 e1 41 83 ac 24 b8 00 00 00 01 4c 89 ef e8 eb 0a 8f e1 48 8b 43 08 <4c> 8b 68 20 4d 8d a5 48 03 00 00 4c 89 e7 e8 04 05 8f e1 4c 89 RIP [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc] RSP <ffff88001374dc98> CR2: 0000000000000020 ---[ end trace b02321f40e4269a3 ]--- We have the following call chain: tipc_core_start() ret = tipc_subscr_start() ret = tipc_server_start(){ server->enabled = 1; ret = tipc_open_listening_sock() } I.e., the server->enabled flag is unconditionally set to 1, whatever the return value of tipc_open_listening_sock(). This causes a crash when tipc_core_start() tries to clean up resources after a failed initialization: if (ret == failed) tipc_subscr_stop() tipc_server_stop(){ if (server->enabled) tipc_close_conn(){ NULL reference of con->sock-sk OOPS! } } To avoid this, tipc_server_start() should only set server->enabled to 1 in case of a succesful socket creation. In case of failure, it should release all allocated resources before returning. Problem introduced in commit c5fa7b3cf3cb22e4ac60485fc2dc187fe012910f ("tipc: introduce new TIPC server infrastructure") in v3.11-rc1. Note that it won't be seen often; it takes a module load under memory constrained conditions in order to trigger the failure condition. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLLCong Wang
Eliezer renames several *ll_poll to *busy_poll, but forgets CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too. Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01ipv6: prevent race between address creation and removalJiri Benc
There's a race in IPv6 automatic addess assignment. The address is created with zero lifetime when it's added to various address lists. Before it gets assigned the correct lifetime, there's a window where a new address may be configured. This causes the semi-initiated address to be deleted in addrconf_verify. This was discovered as a reference leak caused by concurrent run of __ipv6_ifa_notify for both RTM_NEWADDR and RTM_DELADDR with the same address. Fix this by setting the lifetime before the address is added to inet6_addr_lst. A few notes: 1. In addrconf_prefix_rcv, by setting update_lft to zero, the if (update_lft) { ... } condition is no longer executed for newly created addresses. This is okay, as the ifp fields are set in ipv6_add_addr now and ipv6_ifa_notify is called (and has been called) through addrconf_dad_start. 2. The removal of the whole block under ifp->lock in inet6_addr_add is okay, too, as tstamp is initialized to jiffies in ipv6_add_addr. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01ipv6: move peer_addr init into ipv6_add_addr()Jiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01ipv6: update ip6_rt_last_gc every time GC is runMichal Kubeček
As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should hold the last time garbage collector was run so that we should update it whenever fib6_run_gc() calls fib6_clean_all(), not only if we got there from ip6_dst_gc(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01ipv6: prevent fib6_run_gc() contentionMichal Kubeček
On a high-traffic router with many processors and many IPv6 dst entries, soft lockup in fib6_run_gc() can occur when number of entries reaches gc_thresh. This happens because fib6_run_gc() uses fib6_gc_lock to allow only one thread to run the garbage collector but ip6_dst_gc() doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc() returns. On a system with many entries, this can take some time so that in the meantime, other threads pass the tests in ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for the lock. They then have to run the garbage collector one after another which blocks them for quite long. Resolve this by replacing special value ~0UL of expire parameter to fib6_run_gc() by explicit "force" parameter to choose between spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with force=false if gc_thresh is reached but not max_size. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-08-01svcrpc: set cr_gss_mech from gss-proxy as well as legacy upcallJ. Bruce Fields
The change made to rsc_parse() in 0dc1531aca7fd1440918bd55844a054e9c29acad "svcrpc: store gss mech in svc_cred" should also have been propagated to the gss-proxy codepath. This fixes a crash in the gss-proxy case. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01svcrpc: fix kfree oops in gss-proxy codeJ. Bruce Fields
mech_oid.data is an array, not kmalloc()'d memory. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01svcrpc: fix gss-proxy xdr decoding oopsJ. Bruce Fields
Uninitialized stack data was being used as the destination for memcpy's. Longer term we'll just delete some of this code; all we're doing is skipping over xdr that we don't care about. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01svcrpc: fix gss_rpc_upcall create errorJ. Bruce Fields
Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.NeilBrown
Since we enabled auto-tuning for sunrpc TCP connections we do not guarantee that there is enough write-space on each connection to queue a reply. If memory pressure causes the window to shrink too small, the request throttling in sunrpc/svc will not accept any requests so no more requests will be handled. Even when pressure decreases the window will not grow again until data is sent on the connection. This means we get a deadlock: no requests will be handled until there is more space, and no space will be allocated until a request is handled. This can be simulated by modifying svc_tcp_has_wspace to inflate the number of byte required and removing the 'svc_sock_setbufsize' calls in svc_setup_socket. I found that multiplying by 16 was enough to make the requirement exceed the default allocation. With this modification in place: mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt would block and eventually time out because the nfs server could not accept any requests. This patch relaxes the request throttling to always allow at least one request through per connection. It does this by checking both sk_stream_min_wspace() and xprt->xpt_reserved are zero. The first is zero when the TCP transmit queue is empty. The second is zero when there are no RPC requests being processed. When both of these are zero the socket is idle and so one more request can safely be allowed through. Applying this patch allows the above mount command to succeed cleanly. Tracing shows that the allocated write buffer space quickly grows and after a few requests are handled, the extra tests are no longer needed to permit further requests to be processed. The main purpose of request throttling is to handle the case when one client is slow at collecting replies and the send queue gets full of replies that the client hasn't acknowledged (at the TCP level) yet. As we only change behaviour when the send queue is empty this main purpose is still preserved. Reported-by: Ben Myers <bpm@sgi.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01netfilter: xt_TCPOPTSTRIP: fix possible off by one accessPablo Neira Ayuso
Fix a possible off by one access since optlen() touches opt[offset+1] unsafely when i == tcp_hdrlen(skb) - 1. This patch replaces tcp_hdrlen() by the local variable tcp_hdrlen that stores the TCP header length, to save some cycles. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01netfilter: xt_TCPMSS: fix handling of malformed TCP header and optionsPablo Neira Ayuso
Make sure the packet has enough room for the TCP header and that it is not malformed. While at it, store tcph->doff*4 in a variable, as it is used several times. This patch also fixes a possible off by one in case of malformed TCP options. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01bridge: disable snooping if there is no querierLinus Lüssing
If there is no querier on a link then we won't get periodic reports and therefore won't be able to learn about multicast listeners behind ports, potentially leading to lost multicast packets, especially for multicast listeners that joined before the creation of the bridge. These lost multicast packets can appear since c5c23260594 ("bridge: Add multicast_querier toggle and disable queries by default") in particular. With this patch we are flooding multicast packets if our querier is disabled and if we didn't detect any other querier. A grace period of the Maximum Response Delay of the querier is added to give multicast responses enough time to arrive and to be learned from before disabling the flooding behaviour again. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31net_sched: info leak in atm_tc_dump_class()Dan Carpenter
The "pvc" struct has a hole after pvc.sap_family which is not cleared. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix association failures not triggering a connect-failure event in cfg80211, from Johannes Berg. 2) Eliminate a potential NULL deref with older iptables tools when configuring xt_socket rules, from Eric Dumazet. 3) Missing RTNL locking in wireless regulatory code, from Johannes Berg. 4) Fix OOPS caused by firmware loading races in ath9k_htc, from Alexey Khoroshilov. 5) Fix usb URB leak in usb_8dev CAN driver, also from Alexey Khoroshilov. 6) VXLAN namespace teardown fails to unregister devices, from Stephen Hemminger. 7) Fix multicast settings getting dropped by firmware in qlcnic driver, from Sucheta Chakraborty. 8) Add sysctl range enforcement for tcp_syn_retries, from Michal Tesar. 9) Fix a nasty bug in bridging where an active timer would get reinitialized with a setup_timer() call. From Eric Dumazet. 10) Fix use after free in new mlx5 driver, from Dan Carpenter. 11) Fix freed pointer reference in ipv6 multicast routing on namespace cleanup, from Hannes Frederic Sowa. 12) Some usbnet drivers report TSO and SG in their feature set, but the usbnet layer doesn't really support them. From Eric Dumazet. 13) Fix crash on EEH errors in tg3 driver, from Gavin Shan. 14) Drop cb_lock when requesting modules in genetlink, from Stanislaw Gruszka. 15) Kernel stack leaks in cbq scheduler and af_key pfkey messages, from Dan Carpenter. 16) FEC driver erroneously signals NETDEV_TX_BUSY on transmit leading to endless loops, from Uwe Kleine-König. 17) Fix hangs from loading mvneta driver, from Arnaud Patard. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits) mlx5: fix error return code in mlx5_alloc_uuars() mvneta: Try to fix mvneta when compiled as module mvneta: Fix hang when loading the mvneta driver atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE af_key: more info leaks in pfkey messages net/fec: Don't let ndo_start_xmit return NETDEV_TX_BUSY without link net_sched: Fix stack info leak in cbq_dump_wrr(). igb: fix vlan filtering in promisc mode when not in VT mode ixgbe: Fix Tx Hang issue with lldpad on 82598EB genetlink: release cb_lock before requesting additional module net: fec: workaround stop tx during errata ERR006358 qlcnic: Fix diagnostic interrupt test for 83xx adapters. qlcnic: Fix setting Guest VLAN qlcnic: Fix operation type and command type. qlcnic: Fix initialization of work function. Revert "atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring" atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring net/tg3: Fix warning from pci_disable_device() net/tg3: Fix kernel crash ...
2013-07-31mac80211: continue using disabled channels while connectedJohannes Berg
In case the AP has different regulatory information than we do, it can happen that we connect to an AP based on e.g. the world roaming regulatory data, and then update our database with the AP's country information disables the channel the AP is using. If this happens on an HT AP, the bandwidth tracking code will hit the WARN_ON() and disconnect. Since that's not very useful, ignore the channel-disable flag in bandwidth tracking. Cc: stable@vger.kernel.org Reported-by: Chris Wright <chrisw@sous-sol.org> Tested-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-31cfg80211: fix P2P GO interface teardownJohannes Berg
When a P2P GO interface goes down, cfg80211 doesn't properly tear it down, leading to warnings later. Add the GO interface type to the enumeration to tear it down like AP interfaces. Otherwise, we leave it pending and mac80211's state can get very confused, leading to warnings later. Cc: stable@vger.kernel.org Reported-by: Ilan Peer <ilan.peer@intel.com> Tested-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>