summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-08-20ipip: dereferencing an ERR_PTR in ip_tunnel_init_net()Dan Carpenter
We need to move the derefernce after the IS_ERR() check. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20ipv4: raise IP_MAX_MTU to theoretical limitEric Dumazet
As discussed last year [1], there is no compelling reason to limit IPv4 MTU to 0xFFF0, while real limit is 0xFFFF [1] : http://marc.info/?l=linux-netdev&m=135607247609434&w=2 Willem raised this issue again because some of our internal regression tests broke after lo mtu being set to 65536. IP_MTU reports 0xFFF0, and the test attempts to send a RAW datagram of mtu + 1 bytes, expecting the send() to fail, but it does not. Alexey raised interesting points about TCP MSS, that should be addressed in follow-up patches in TCP stack if needed, as someone could also set an odd mtu anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20tcp: trivial: Remove nocache argument from tcp_v4_send_synackChristoph Paasch
The nocache-argument was used in tcp_v4_send_synack as an argument to inet_csk_route_req. However, since ba3f7f04ef2b (ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.) this is no more used. This patch removes the unsued argument from tcp_v4_send_synack. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20ipv6: fix checkpatch errors in net/ipv6/addrconf.cdingtianhong
ERROR: code indent should use tabs where possible: fix 2. ERROR: do not use assignment in if condition: fix 5. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20ipv6: convert the uses of ADBG and remove the superfluous parenthesesdingtianhong
Just follow the Joe Perches's opinion, it is a better way to fix the style errors. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_conntrack_proto_tcp.c The conflict had to do with overlapping changes dealing with fixing the use of an "s32" to hold the value returned by NAT_OFFSET(). Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for your net-next tree. More specifically, they are: * Trivial typo fix in xt_addrtype, from Phil Oester. * Remove net_ratelimit in the conntrack logging for consistency with other logging subsystem, from Patrick McHardy. * Remove unneeded includes from the recently added xt_connlabel support, from Florian Westphal. * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for this, from Florian Westphal. * Remove tproxy core, now that we have socket early demux, from Florian Westphal. * A couple of patches to refactor conntrack event reporting to save a good bunch of lines, from Florian Westphal. * Fix missing locking in NAT sequence adjustment, it did not manifested in any known bug so far, from Patrick McHardy. * Change sequence number adjustment variable to 32 bits, to delay the possible early overflow in long standing connections, also from Patrick. * Comestic cleanups for IPVS, from Dragos Foianu. * Fix possible null dereference in IPVS in the SH scheduler, from Daniel Borkmann. * Allow to attach conntrack expectations via nfqueue. Before this patch, you had to use ctnetlink instead, thus, we save the conntrack lookup. * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: handle context based source addressAlexander Aring
Handle context based address when an unspecified address is given. For other context based address we print a warning and drop the packet because we don't support it right now. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: lowpan_uncompress_addr with address_modeAlexander Aring
This patch drops the pre and postcount calculation from the lowpan_uncompress_addr function.We use instead a switch/case over address_mode value. The original implementation has several bugs in this function and it was hard to decrypt how it works. To make it maintainable and fix these bugs this patch basically reimplements lowpan_uncompress_addr from scratch. A list of bugs we found in the current implementation: 1) Properly support uncompression of short-address based IPv6 addresses (instead of basically copying garbage) 2) Fix use and uncompression of long-addresses based IPv6 addresses 3) Add missing ff:fe00 in the case of SAM/DAM = 2 and M = 0 Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: add function to uncompress multicast addrAlexander Aring
Add function to uncompress multicast address. This function split the uncompress function for a multicast address in a seperate function. To uncompress a multicast address is different than a other non-multicasts addresses according to rfc6282. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: introduce lowpan_fetch_skb functionAlexander Aring
This patch adds a helper function to parse the ipv6 header to a 6lowpan header in stream. This function checks first if we can pull data with a specific length from a skb. If this seems to be okay, we copy skb data to a destination pointer and run skb_pull. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: Fix fragmentation with link-local compressed addressesDavid Hauweele
When a new 6lowpan fragment is received, a skbuff is allocated for the reassembled packet. However when a 6lowpan packet compresses link-local addresses based on link-layer addresses, the processing function relies on the skb mac control block to find the related link-layer address. This patch copies the control block from the first fragment into the newly allocated skb to keep a trace of the link-layer addresses in case of a link-local compressed address. Edit: small changes on comment issue Signed-off-by: David Hauweele <david@hauweele.net> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: init ipv6hdr buffer to zeroAlexander Aring
This patch simplify the handling to set fields inside of struct ipv6hdr to zero. Instead of setting some memory regions with memset to zero we initialize the whole ipv6hdr to zero. This is a simplification for parsing the 6lowpan header for the upcomming patches. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20openvswitch: Add vxlan tunneling support.Pravin B Shelar
Following patch adds vxlan vport type for openvswitch using vxlan api. So now there is vxlan dependency for openvswitch. CC: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix SKB leak in 8139cp, from Dave Jones. 2) Fix use of *_PAGES interfaces with mlx5 firmware, from Moshe Lazar. 3) RCU conversion of macvtap introduced two races, fixes by Eric Dumazet 4) Synchronize statistic flows in bnx2x driver to prevent corruption, from Dmitry Kravkov 5) Undo optimization in IP tunneling, we were using the inner IP header in some cases to inherit the IP ID, but that isn't correct in some circumstances. From Pravin B Shelar 6) Use correct struct size when parsing netlink attributes in rtnl_bridge_getlink(). From Asbjoern Sloth Toennesen 7) Length verifications in tun_get_user() are bogus, from Weiping Pan and Dan Carpenter 8) Fix bad merge resolution during 3.11 networking development in openvswitch, albeit a harmless one which added some unreachable code. From Jesse Gross 9) Wrong size used in flexible array allocation in openvswitch, from Pravin B Shelar 10) Clear out firmware capability flags the be2net driver isn't ready to handle yet, from Sarveshwar Bandi 11) Revert DMA mapping error checking addition to cxgb3 driver, it's buggy. From Alexey Kardashevskiy 12) Fix regression in packet scheduler rate limiting when working with a link layer of ATM. From Jesper Dangaard Brouer 13) Fix several errors in TCP Cubic congestion control, in particular overflow errors in timestamp calculations. From Eric Dumazet and Van Jacobson 14) In ipv6 routing lookups, we need to backtrack if subtree traversal don't result in a match. From Hannes Frederic Sowa 15) ipgre_header() returns incorrect packet offset. Fix from Timo Teräs 16) Get "low latency" out of the new MIB counter names. From Eliezer Tamir 17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar Samudrala 18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung Cheng 19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean 20) Fix dealock in TIPC, from Wang Weidong and Ding Tianhong 21) call_rcu() call to destroy SCTP transport is done too early and might result in an oops. From Daniel Borkmann 22) Fix races in genetlink family dumps, from Johannes Berg 23) Flags passed into macvlan by the user need to be validated properly, from Michael S Tsirkin 24) Fix skge build on 32-bit, from Stephen Hemminger 25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira Ayuso 26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay Aleksandrov 27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel Borkmann 28) neigh_parms need to be setup before calling the ->ndo_neigh_setup() method. From Veaceslav Falico 29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet 30) Don't dereference MLD query message if the length isn't value in the bridge multicast code, from Linus Lüssing 31) Fix VXLAN IGMP join regression due to an inverted check, from Cong Wang * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits) net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes tun: signedness bug in tun_get_user() qlcnic: Fix diagnostic interrupt test for 83xx adapters qlcnic: Fix beacon state return status handling qlcnic: Fix set driver version command net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset net_sched: restore "linklayer atm" handling drivers/net/ethernet/via/via-velocity.c: update napi implementation Revert "cxgb3: Check and handle the dma mapping errors" be2net: Clear any capability flags that driver is not interested in. openvswitch: Reset tunnel key between input and output. openvswitch: Use correct type while allocating flex array. openvswitch: Fix bad merge resolution. tun: compare with 0 instead of total_len rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header ethernet/arc/arc_emac - fix NAPI "work > weight" warning ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id. bnx2x: prevent crash in shutdown flow with CNIC bnx2x: fix PTE write access error bnx2x: fix memory leak in VF ...
2013-08-15netlink: Eliminate kmalloc in netlink dump operation.Pravin B Shelar
Following patch stores struct netlink_callback in netlink_sock to avoid allocating and freeing it on every netlink dump msg. Only one dump operation is allowed for a given socket at a time therefore we can safely convert cb pointer to cb struct inside netlink_sock. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15net: proc_fs: trivial: print UIDs as unsigned intFrancesco Fusco
UIDs are printed in the proc_fs as signed int, whereas they are unsigned int. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15net_sched: restore "linklayer atm" handlingJesper Dangaard Brouer
commit 56b765b79 ("htb: improved accuracy at high rates") broke the "linklayer atm" handling. tc class add ... htb rate X ceil Y linklayer atm The linklayer setting is implemented by modifying the rate table which is send to the kernel. No direct parameter were transferred to the kernel indicating the linklayer setting. The commit 56b765b79 ("htb: improved accuracy at high rates") removed the use of the rate table system. To keep compatible with older iproute2 utils, this patch detects the linklayer by parsing the rate table. It also supports future versions of iproute2 to send this linklayer parameter to the kernel directly. This is done by using the __reserved field in struct tc_ratespec, to convey the choosen linklayer option, but only using the lower 4 bits of this field. Linklayer detection is limited to speeds below 100Mbit/s, because at high rates the rtab is gets too inaccurate, so bad that several fields contain the same values, this resembling the ATM detect. Fields even start to contain "0" time to send, e.g. at 1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have been more broken than we first realized. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15Merge branch 'fixes' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== Three bug fixes that are fairly small either way but resolve obviously incorrect code. For net/3.11. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15rtnetlink: remove an unneeded testDan Carpenter
We know that "dev" is a valid pointer at this point, so we can remove the test and clean up a little. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15ip6tnl: add x-netns supportNicolas Dichtel
This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15ipip: add x-netns supportNicolas Dichtel
This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15ipv4 tunnels: use net_eq() helper to check netnsNicolas Dichtel
It's better to use available helpers for these tests. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15dev: move skb_scrub_packet() after eth_type_trans()Nicolas Dichtel
skb_scrub_packet() was called before eth_type_trans() to let eth_type_trans() set pkt_type. In fact, we should force pkt_type to PACKET_HOST, so move the call after eth_type_trans(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-14openvswitch: Reset tunnel key between input and output.Jesse Gross
It doesn't make sense to output a tunnel packet using the same parameters that it was received with since that will generally just result in the packet going back to us. As a result, userspace assumes that the tunnel key is cleared when transitioning through the switch. In the majority of cases this doesn't matter since a packet is either going to a tunnel port (in which the key is overwritten with new values) or to a non-tunnel port (in which case the key is ignored). However, it's theoreticaly possible that userspace could rely on the documented behavior, so this corrects it. Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-14openvswitch: Use correct type while allocating flex array.Pravin B Shelar
Flex array is used to allocate hash buckets which is type struct hlist_head, but we use `struct hlist_head *` to calculate array size. Since hlist_head is of size pointer it works fine. Following patch use correct type. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-14openvswitch: Fix bad merge resolution.Jesse Gross
git silently included an extra hunk in vport_cmd_set() during automatic merging. This code is unreachable so it does not actually introduce a problem but it is clearly incorrect. Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-14rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg headerAsbjoern Sloth Toennesen
Fix the iproute2 command `bridge vlan show`, after switching from rtgenmsg to ifinfomsg. Let's start with a little history: Feb 20: Vlad Yasevich got his VLAN-aware bridge patchset included in the 3.9 merge window. In the kernel commit 6cbdceeb, he added attribute support to bridge GETLINK requests sent with rtgenmsg. Mar 6th: Vlad got this iproute2 reference implementation of the bridge vlan netlink interface accepted (iproute2 9eff0e5c) Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca) http://patchwork.ozlabs.org/patch/239602/ http://marc.info/?t=136680900700007 Apr 28th: Linus released 3.9 Apr 30th: Stephen released iproute2 3.9.0 The `bridge vlan show` command haven't been working since the switch to ifinfomsg, or in a released version of iproute2. Since the kernel side only supports rtgenmsg, which iproute2 switched away from just prior to the iproute2 3.9.0 release. I haven't been able to find any documentation, about neither rtgenmsg nor ifinfomsg, and in which situation to use which, but kernel commit 88c5b5ce seams to suggest that ifinfomsg should be used. Fixing this in kernel will break compatibility, but I doubt that anybody have been using it due to this bug in the user space reference implementation, at least not without noticing this bug. That said the functionality is still fully functional in 3.9, when reversing iproute2 commit 63338dca. This could also be fixed in iproute2, but thats an ugly patch that would reintroduce rtgenmsg in iproute2, and from searching in netdev it seams like rtgenmsg usage is discouraged. I'm assuming that the only reason that Vlad implemented the kernel side to use rtgenmsg, was because iproute2 was using it at the time. Signed-off-by: Asbjoern Sloth Toennesen <ast@fiberby.net> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-14ipv6: make unsolicited report intervals configurable for mldHannes Frederic Sowa
Commit cab70040dfd95ee32144f02fade64f0cb94f31a0 ("net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3") and 2690048c01f32bf45d1c1e1ab3079bc10ad2aea7 ("net: igmp: Allow user-space configuration of igmp unsolicited report interval") by William Manley made igmp unsolicited report intervals configurable per interface and corrected the interval of unsolicited igmpv3 report messages resendings to 1s. Same needs to be done for IPv6: MLDv1 (RFC2710 7.10.): 10 seconds MLDv2 (RFC3810 9.11.): 1 second Both intervals are configurable via new procfs knobs mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval. (also added .force_mld_version to ipv6_devconf_dflt to bring structs in line without semantic changes) v2: a) Joined documentation update for IPv4 and IPv6 MLD/IGMP unsolicited_report_interval procfs knobs. b) incorporate stylistic feedback from William Manley v3: a) add new DEVCONF_* values to the end of the enum (thanks to David Miller) Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: William Manley <william.manley@youview.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id.Pravin B Shelar
Using inner-id for tunnel id is not safe in some rare cases. E.g. packets coming from multiple sources entering same tunnel can have same id. Therefore on tunnel packet receive we could have packets from two different stream but with same source and dst IP with same ip-id which could confuse ip packet reassembly. Following patch reverts optimization from commit 490ab08127 (IP_GRE: Fix IP-Identification.) CC: Jarno Rajahalme <jrajahalme@nicira.com> CC: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13tcp: reset reordering est. selectively on timeoutYuchung Cheng
On timeout the TCP sender unconditionally resets the estimated degree of network reordering (tp->reordering). The idea behind this is that the estimate is too large to trigger fast recovery (e.g., due to a IP path change). But for example if the sender only had 2 packets outstanding, then a timeout doesn't tell much about reordering. A sender that learns about reordering on big writes and loses packets on small writes will end up falsely retransmitting again and again, especially when reordering is more likely on big writes. Therefore the sender should only suspect that tp->reordering is too high if it could have gone into fast recovery with the (lower) default estimate. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== This is a batch of updates intended for 3.12. It is mostly driver stuff, although Johannes Berg and Simon Wunderlich make a good showing with mac80211 bits (particularly some work on 5/10 MHz channel support). The usual suspects are mostly represented. There are lots of updates to iwlwifi, ath9k, ath10k, mwifiex, rt2x00, wil6210, as usual. The bcma bus gets some love this time, as do cw1200, iwl4965, and a few other bits here and there. I don't think there is much unusual here, FWIW. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13net: sctp: Add rudimentary infrastructure to account for control chunksVlad Yasevich
This patch adds a base infrastructure that allows SCTP to do memory accounting for control chunks. Real accounting code will follow. This patch alos fixes the following triggered bug ... [ 553.109742] kernel BUG at include/linux/skbuff.h:1813! [ 553.109766] invalid opcode: 0000 [#1] SMP [ 553.109789] Modules linked in: sctp libcrc32c rfcomm [...] [ 553.110259] uinput i915 i2c_algo_bit drm_kms_helper e1000e drm ptp pps_core i2c_core wmi video sunrpc [ 553.110320] CPU: 0 PID: 1636 Comm: lt-test_1_to_1_ Not tainted 3.11.0-rc3+ #2 [ 553.110350] Hardware name: LENOVO 74597D6/74597D6, BIOS 6DET60WW (3.10 ) 09/17/2009 [ 553.110381] task: ffff88020a01dd40 ti: ffff880204ed0000 task.ti: ffff880204ed0000 [ 553.110411] RIP: 0010:[<ffffffffa0698017>] [<ffffffffa0698017>] skb_orphan.part.9+0x4/0x6 [sctp] [ 553.110459] RSP: 0018:ffff880204ed1bb8 EFLAGS: 00010286 [ 553.110483] RAX: ffff8802086f5a40 RBX: ffff880204303300 RCX: 0000000000000000 [ 553.110487] RDX: ffff880204303c28 RSI: ffff8802086f5a40 RDI: ffff880202158000 [ 553.110487] RBP: ffff880204ed1bb8 R08: 0000000000000000 R09: 0000000000000000 [ 553.110487] R10: ffff88022f2d9a04 R11: ffff880233001600 R12: 0000000000000000 [ 553.110487] R13: ffff880204303c00 R14: ffff8802293d0000 R15: ffff880202158000 [ 553.110487] FS: 00007f31b31fe740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000 [ 553.110487] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 553.110487] CR2: 000000379980e3e0 CR3: 000000020d225000 CR4: 00000000000407f0 [ 553.110487] Stack: [ 553.110487] ffff880204ed1ca8 ffffffffa068d7fc 0000000000000000 0000000000000000 [ 553.110487] 0000000000000000 ffff8802293d0000 ffff880202158000 ffffffff81cb7900 [ 553.110487] 0000000000000000 0000400000001c68 ffff8802086f5a40 000000000000000f [ 553.110487] Call Trace: [ 553.110487] [<ffffffffa068d7fc>] sctp_sendmsg+0x6bc/0xc80 [sctp] [ 553.110487] [<ffffffff8128f185>] ? sock_has_perm+0x75/0x90 [ 553.110487] [<ffffffff815a3593>] inet_sendmsg+0x63/0xb0 [ 553.110487] [<ffffffff8128f2b3>] ? selinux_socket_sendmsg+0x23/0x30 [ 553.110487] [<ffffffff8151c5d6>] sock_sendmsg+0xa6/0xd0 [ 553.110487] [<ffffffff81637b05>] ? _raw_spin_unlock_bh+0x15/0x20 [ 553.110487] [<ffffffff8151cd38>] SYSC_sendto+0x128/0x180 [ 553.110487] [<ffffffff8151ce6b>] ? SYSC_connect+0xdb/0x100 [ 553.110487] [<ffffffffa0690031>] ? sctp_inet_listen+0x71/0x1f0 [sctp] [ 553.110487] [<ffffffff8151d35e>] SyS_sendto+0xe/0x10 [ 553.110487] [<ffffffff81640202>] system_call_fastpath+0x16/0x1b [ 553.110487] Code: e0 48 c7 c7 00 22 6a a0 e8 67 a3 f0 e0 48 c7 [...] [ 553.110487] RIP [<ffffffffa0698017>] skb_orphan.part.9+0x4/0x6 [sctp] [ 553.110487] RSP <ffff880204ed1bb8> [ 553.121578] ---[ end trace 46c20c5903ef5be2 ]--- The approach taken here is to split data and control chunks creation a bit. Data chunks already have memory accounting so noting needs to happen. For control chunks, add stubs handlers. Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13netfilter: nfnetlink_queue: allow to attach expectations to conntracksPablo Neira Ayuso
This patch adds the capability to attach expectations via nfnetlink_queue. This is required by conntrack helpers that trigger expectations based on the first packet seen like the TFTP and the DHCPv6 user-space helpers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13netfilter: ctnetlink: refactor ctnetlink_create_expectPablo Neira Ayuso
This patch refactors ctnetlink_create_expect by spliting it in two chunks. As a result, we have a new function ctnetlink_alloc_expect to allocate and to setup the expectation from ctnetlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13genetlink: fix family dump raceJohannes Berg
When dumping generic netlink families, only the first dump call is locked with genl_lock(), which protects the list of families, and thus subsequent calls can access the data without locking, racing against family addition/removal. This can cause a crash. Fix it - the locking needs to be conditional because the first time around it's already locked. A similar bug was reported to me on an old kernel (3.4.47) but the exact scenario that happened there is no longer possible, on those kernels the first round wasn't locked either. Looking at the current code I found the race described above, which had also existed on the old kernel. Cc: stable@vger.kernel.org Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13net: sctp: sctp_transport_destroy{, _rcu}: fix potential pointer corruptionDaniel Borkmann
Probably this one is quite unlikely to be triggered, but it's more safe to do the call_rcu() at the end after we have dropped the reference on the asoc and freed sctp packet chunks. The reason why is because in sctp_transport_destroy_rcu() the transport is being kfree()'d, and if we're unlucky enough we could run into corrupted pointers. Probably that's more of theoretical nature, but it's safer to have this simple fix. Introduced by commit 8c98653f ("sctp: sctp_close: fix release of bindings for deferred call_rcu's"). I also did the 8c98653f regression test and it's fine that way. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13net: sctp: sctp_assoc_control_transport: fix MTU size in SCTP_PF stateDaniel Borkmann
The SCTP Quick failover draft [1] section 5.1, point 5 says that the cwnd should be 1 MTU. So, instead of 1, set it to 1 MTU. [1] https://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 Reported-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-12Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/ethernet/broadcom/Kconfig
2013-08-12af_unix: fix bug on large send()Eric Dumazet
commit e370a723632 ("af_unix: improve STREAM behavior with fragmented memory") added a bug on large send() because the skb_copy_datagram_from_iovec() call always start from the beginning of iovec. We must instead use the @sent variable to properly skip the already processed part. Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-12tipc: avoid possible deadlock while enable and disable bearerdingtianhong
We met lockdep warning when enable and disable the bearer for commands such as: tipc-config -netid=1234 -addr=1.1.3 -be=eth:eth0 tipc-config -netid=1234 -addr=1.1.3 -bd=eth:eth0 --------------------------------------------------- [ 327.693595] ====================================================== [ 327.693994] [ INFO: possible circular locking dependency detected ] [ 327.694519] 3.11.0-rc3-wwd-default #4 Tainted: G O [ 327.694882] ------------------------------------------------------- [ 327.695385] tipc-config/5825 is trying to acquire lock: [ 327.695754] (((timer))#2){+.-...}, at: [<ffffffff8105be80>] del_timer_sync+0x0/0xd0 [ 327.696018] [ 327.696018] but task is already holding lock: [ 327.696018] (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+ 0xdd/0x120 [tipc] [ 327.696018] [ 327.696018] which lock already depends on the new lock. [ 327.696018] [ 327.696018] [ 327.696018] the existing dependency chain (in reverse order) is: [ 327.696018] [ 327.696018] -> #1 (&(&b_ptr->lock)->rlock){+.-...}: [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff814d65b1>] _raw_spin_lock_bh+0x41/0x80 [ 327.696018] [<ffffffffa02c5d48>] disc_timeout+0x18/0xd0 [tipc] [ 327.696018] [<ffffffff8105b92a>] call_timer_fn+0xda/0x1e0 [ 327.696018] [<ffffffff8105bcd7>] run_timer_softirq+0x2a7/0x2d0 [ 327.696018] [<ffffffff8105379a>] __do_softirq+0x16a/0x2e0 [ 327.696018] [<ffffffff81053a35>] irq_exit+0xd5/0xe0 [ 327.696018] [<ffffffff81033005>] smp_apic_timer_interrupt+0x45/0x60 [ 327.696018] [<ffffffff814df4af>] apic_timer_interrupt+0x6f/0x80 [ 327.696018] [<ffffffff8100b70e>] arch_cpu_idle+0x1e/0x30 [ 327.696018] [<ffffffff810a039d>] cpu_idle_loop+0x1fd/0x280 [ 327.696018] [<ffffffff810a043e>] cpu_startup_entry+0x1e/0x20 [ 327.696018] [<ffffffff81031589>] start_secondary+0x89/0x90 [ 327.696018] [ 327.696018] -> #0 (((timer))#2){+.-...}: [ 327.696018] [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0 [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0 [ 327.696018] [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc] [ 327.696018] [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc] [ 327.696018] [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc] [ 327.696018] [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc] [ 327.696018] [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc] [ 327.696018] [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340 [ 327.696018] [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0 [ 327.696018] [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0 [ 327.696018] [<ffffffff8143e617>] genl_rcv+0x27/0x40 [ 327.696018] [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0 [ 327.696018] [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400 [ 327.696018] [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80 [ 327.696018] [<ffffffff813f7957>] sock_aio_write+0x107/0x120 [ 327.696018] [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0 [ 327.696018] [<ffffffff8117fc56>] vfs_write+0x186/0x190 [ 327.696018] [<ffffffff811803e0>] SyS_write+0x60/0xb0 [ 327.696018] [<ffffffff814de852>] system_call_fastpath+0x16/0x1b [ 327.696018] [ 327.696018] other info that might help us debug this: [ 327.696018] [ 327.696018] Possible unsafe locking scenario: [ 327.696018] [ 327.696018] CPU0 CPU1 [ 327.696018] ---- ---- [ 327.696018] lock(&(&b_ptr->lock)->rlock); [ 327.696018] lock(((timer))#2); [ 327.696018] lock(&(&b_ptr->lock)->rlock); [ 327.696018] lock(((timer))#2); [ 327.696018] [ 327.696018] *** DEADLOCK *** [ 327.696018] [ 327.696018] 5 locks held by tipc-config/5825: [ 327.696018] #0: (cb_lock){++++++}, at: [<ffffffff8143e608>] genl_rcv+0x18/0x40 [ 327.696018] #1: (genl_mutex){+.+.+.}, at: [<ffffffff8143ed66>] genl_rcv_msg+0xa6/0xd0 [ 327.696018] #2: (config_mutex){+.+.+.}, at: [<ffffffffa02bf889>] tipc_cfg_do_cmd+0x39/ 0x550 [tipc] [ 327.696018] #3: (tipc_net_lock){++.-..}, at: [<ffffffffa02be738>] tipc_disable_bearer+ 0x18/0x60 [tipc] [ 327.696018] #4: (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+0xdd/0x120 [tipc] [ 327.696018] [ 327.696018] stack backtrace: [ 327.696018] CPU: 2 PID: 5825 Comm: tipc-config Tainted: G O 3.11.0-rc3-wwd- default #4 [ 327.696018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 327.696018] 00000000ffffffff ffff880037fa77a8 ffffffff814d03dd 0000000000000000 [ 327.696018] ffff880037fa7808 ffff880037fa77e8 ffffffff810b1c4f 0000000037fa77e8 [ 327.696018] ffff880037fa7808 ffff880037e4db40 0000000000000000 ffff880037e4e318 [ 327.696018] Call Trace: [ 327.696018] [<ffffffff814d03dd>] dump_stack+0x4d/0xa0 [ 327.696018] [<ffffffff810b1c4f>] print_circular_bug+0x10f/0x120 [ 327.696018] [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0 [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff81087a28>] ? sched_clock_cpu+0xd8/0x110 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70 [ 327.696018] [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0 [ 327.696018] [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70 [ 327.696018] [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc] [ 327.696018] [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc] [ 327.696018] [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc] [ 327.696018] [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc] [ 327.696018] [<ffffffff81218783>] ? security_capable+0x13/0x20 [ 327.696018] [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc] [ 327.696018] [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340 [ 327.696018] [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0 [ 327.696018] [<ffffffff8143ecc0>] ? genl_lock+0x20/0x20 [ 327.696018] [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0 [ 327.696018] [<ffffffff8143e608>] ? genl_rcv+0x18/0x40 [ 327.696018] [<ffffffff8143e617>] genl_rcv+0x27/0x40 [ 327.696018] [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0 [ 327.696018] [<ffffffff81289d7c>] ? memcpy_fromiovec+0x6c/0x90 [ 327.696018] [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400 [ 327.696018] [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80 [ 327.696018] [<ffffffff813f7957>] sock_aio_write+0x107/0x120 [ 327.696018] [<ffffffff813fe29c>] ? release_sock+0x8c/0xa0 [ 327.696018] [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0 [ 327.696018] [<ffffffff8117fa24>] ? rw_verify_area+0x54/0x100 [ 327.696018] [<ffffffff8117fc56>] vfs_write+0x186/0x190 [ 327.696018] [<ffffffff811803e0>] SyS_write+0x60/0xb0 [ 327.696018] [<ffffffff814de852>] system_call_fastpath+0x16/0x1b ----------------------------------------------------------------------- The problem is that the tipc_link_delete() will cancel the timer disc_timeout() when the b_ptr->lock is hold, but the disc_timeout() still call b_ptr->lock to finish the work, so the dead lock occurs. We should unlock the b_ptr->lock when del the disc_timeout(). Remove link_timeout() still met the same problem, the patch: http://article.gmane.org/gmane.network.tipc.general/4380 fix the problem, so no need to send patch for fix link_timeout() deadlock warming. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included change: - reassign pointers to data after skb reallocation to avoid kernel paging errors Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10batman-adv: fix potential kernel paging errors for unicast transmissionsLinus Lüssing
There are several functions which might reallocate skb data. Currently some places keep reusing their old ethhdr pointer regardless of whether they became invalid after such a reallocation or not. This potentially leads to kernel paging errors. This patch fixes these by refetching the ethdr pointer after the potential reallocations. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-08-10Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains four netfilter fixes, they are: * Fix possible invalid access and mangling of the TCPMSS option in xt_TCPMSS. This was spotted by Julian Anastasov. * Fix possible off by one access and mangling of the TCP packet in xt_TCPOPTSTRIP, also spotted by Julian Anastasov. * Fix possible information leak due to missing initialization of one padding field of several structures that are included in nfqueue and nflog netlink messages, from Dan Carpenter. * Fix TCP window tracking with Fast Open, from Yuchung Cheng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10netfilter: nf_conntrack: fix tcp_in_window for Fast OpenYuchung Cheng
Currently the conntrack checks if the ending sequence of a packet falls within the observed receive window. However it does so even if it has not observe any packet from the remote yet and uses an uninitialized receive window (td_maxwin). If a connection uses Fast Open to send a SYN-data packet which is dropped afterward in the network. The subsequent SYNs retransmits will all fail this check and be discarded, leading to a connection timeout. This is because the SYN retransmit does not contain data payload so end == initial sequence number (isn) + 1 sender->td_end == isn + syn_data_len receiver->td_maxwin == 0 The fix is to only apply this check after td_maxwin is initialized. Reported-by: Michael Chan <mcfchan@stanford.edu> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-10rtnetlink: Fix inverted check in ndo_dflt_fdb_del()Sridhar Samudrala
Fix inverted check when deleting an fdb entry. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10net: attempt high order allocations in sock_alloc_send_pskb()Eric Dumazet
Adding paged frags skbs to af_unix sockets introduced a performance regression on large sends because of additional page allocations, even if each skb could carry at least 100% more payload than before. We can instruct sock_alloc_send_pskb() to attempt high order allocations. Most of the time, it does a single page allocation instead of 8. I added an additional parameter to sock_alloc_send_pskb() to let other users to opt-in for this new feature on followup patches. Tested: Before patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 46861.15 After patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 57981.11 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10af_unix: improve STREAM behavior with fragmented memoryEric Dumazet
unix_stream_sendmsg() currently uses order-2 allocations, and we had numerous reports this can fail. The __GFP_REPEAT flag present in sock_alloc_send_pskb() is not helping. This patch extends the work done in commit eb6a24816b247c ("af_unix: reduce high order page allocations) for datagram sockets. This opens the possibility of zero copy IO (splice() and friends) The trick is to not use skb_pull() anymore in recvmsg() path, and instead add a @consumed field in UNIXCB() to track amount of already read payload in the skb. There is a performance regression for large sends because of extra page allocations that will be addressed in a follow-up patch, allowing sock_alloc_send_pskb() to attempt high order page allocations. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10tcp: add server ip to encrypt cookie in fast openYuchung Cheng
Encrypt the cookie with both server and client IPv4 addresses, such that multi-homed server will grant different cookies based on both the source and destination IPs. No client change is needed since cookie is opaque to the client. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09Revert "net: sctp: convert sctp_checksum_disable module param into sctp sysctl"David S. Miller
This reverts commit cda5f98e36576596b9230483ec52bff3cc97eb21. As per Vlad's request. Signed-off-by: David S. Miller <davem@davemloft.net>