summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-08-23openvswitch: Mega flow implementationAndy Zhou
Add wildcarded flow support in kernel datapath. Wildcarded flow can improve OVS flow set up performance by avoid sending matching new flows to the user space program. The exact performance boost will largely dependent on wildcarded flow hit rate. In case all new flows hits wildcard flows, the flow set up rate is within 5% of that of linux bridge module. Pravin has made significant contributions to this patch. Including API clean ups and bug fixes. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23openvswitch: check CONFIG_OPENVSWITCH_GRE in makefileCong Wang
Cc: Jesse Gross <jesse@nicira.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23openvswitch: Fix argument descriptions in vport.c.Justin Pettit
Signed-off-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23openvswitch:: link upper device for port devicesJiri Pirko
Link upper device properly. That will make IFLA_MASTER filled up. Set the master to port 0 of the datapath under which the port belongs. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23openvswitch: Use non rcu hlist_del() flow table entry.Pravin B Shelar
Flow table destroy is done in rcu call-back context. Therefore there is no need to use rcu variant of hlist_del(). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23openvswitch: Use RCU lock for dp dump operation.Pravin B Shelar
RCUfy dp-dump operation which is already read-only. This makes all ovs dump operations lockless. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23openvswitch: Use RCU lock for flow dump operation.Pravin B Shelar
Flow dump operation is read-only operation. There is no need to take ovs-lock. Following patch use rcu-lock for dumping flows. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23net: sctp_probe: simplify code by using %pISc format specifierDaniel Borkmann
We can simply use the %pISc format specifier that was recently added and thus remove some code that distinguishes between IPv4 and IPv6. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-23ipv6: handle Redirect ICMP Message with no Redirected Header optionDuan Jiong
rfc 4861 says the Redirected Header option is optional, so the kernel should not drop the Redirect Message that has no Redirected Header option. In this patch, the function ip6_redirect_no_header() is introduced to deal with that condition. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2013-08-22net: tcp_probe: add IPv6 supportDaniel Borkmann
The tcp_probe currently only supports analysis of IPv4 connections. Therefore, it would be nice to have IPv6 supported as well. Since we have the recently added %pISpc specifier that is IPv4/IPv6 generic, build related sockaddress structures from the flow information and pass this to our format string. Tested with SSH and HTTP sessions on IPv4 and IPv6. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22net: tcp_probe: kprobes: adapt jtcp_rcv_established signatureDaniel Borkmann
This patches fixes a rather unproblematic function signature mismatch as the const specifier was missing for the th variable; and next to that it adds a build-time assertion so that future function signature mismatches for kprobes will not end badly, similarly as commit 22222997 ("net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack") did it for SCTP. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22net: tcp_probe: also include rcv_wnd next to snd_wndDaniel Borkmann
It is helpful to sometimes know the TCP window sizes of an established socket e.g. to confirm that window scaling is working or to tweak the window size to improve high-latency connections, etc etc. Currently the TCP snooper only exports the send window size, but not the receive window size. Therefore, also add the receive window size to the end of the output line. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Some constifications, from Mathias Krause. 2) Catch bugs if a hold timer is still active when xfrm_policy_destroy() is called, from Fan Du. 3) Remove a redundant address family checking, from Fan Du. 4) Make xfrm_state timer monotonic to be independent of system clock changes, from Fan Du. 5) Remove an outdated comment on returning -EREMOTE in the xfrm_lookup(), from Rami Rosen. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22tcp: increase throughput when reordering is highYuchung Cheng
The stack currently detects reordering and avoid spurious retransmission very well. However the throughput is sub-optimal under high reordering because cwnd is increased only if the data is deliverd in order. I.e., FLAG_DATA_ACKED check in tcp_ack(). The more packet are reordered the worse the throughput is. Therefore when reordering is proven high, cwnd should advance whenever the data is delivered regardless of its ordering. If reordering is low, conservatively advance cwnd only on ordered deliveries in Open state, and retain cwnd in Disordered state (RFC5681). Using netperf on a qdisc setup of 20Mbps BW and random RTT from 45ms to 55ms (for reordering effect). This change increases TCP throughput by 20 - 25% to near bottleneck BW. A special case is the stretched ACK with new SACK and/or ECE mark. For example, a receiver may receive an out of order or ECN packet with unacked data buffered because of LRO or delayed ACK. The principle on such an ACK is to advance cwnd on the cummulative acked part first, then reduce cwnd in tcp_fastretrans_alert(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22Revert "genetlink: fix family dump race"Johannes Berg
This reverts commit 58ad436fcf49810aa006016107f494c9ac9013db. It turns out that the change introduced a potential deadlock by causing a locking dependency with netlink's cb_mutex. I can't seem to find a way to resolve this without doing major changes to the locking, so revert this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21net: ipv6: mcast: minor: use defines for rfc3810/8.1 lengthsDaniel Borkmann
Instead of hard-coding length values, use a define to make it clear where those lengths come from. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21net: ipv6: minor: *_start_timer: rather use unsigned longDaniel Borkmann
For the functions mld_gq_start_timer(), mld_ifc_start_timer(), and mld_dad_start_timer(), rather use unsigned long than int as we operate only on unsigned values anyway. This seems more appropriate as there is no good reason to do type conversions to int, that could lead to future errors. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21net: ipv6: igmp6_event_query: use msecs_to_jiffiesDaniel Borkmann
Use proper API functions to calculate jiffies from milliseconds and not the crude method of dividing HZ by a value. This ensures more accurate values even in the case of strange HZ values. While at it, also simplify code in the mlh2 case by using max(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21ip6_tunnel: ensure to always have a link local addressNicolas Dichtel
When an Xin6 tunnel is set up, we check other netdevices to inherit the link- local address. If none is available, the interface will not have any link-local address. RFC4862 expects that each interface has a link local address. Now than this kind of tunnels supports x-netns, it's easy to fall in this case (by creating the tunnel in a netns where ethernet interfaces stand and then moving it to a other netns where no ethernet interface is available). RFC4291, Appendix A suggests two methods: the first is the one currently implemented, the second is to generate a unique identifier, so that we can always generate the link-local address. Let's use eth_random_addr() to generate this interface indentifier. I remove completly the previous method, hence for the whole life of the interface, the link-local address remains the same (previously, it depends on which ethernet interfaces were up when the tunnel interface was set up). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21Revert "ipv6: fix checkpatch errors in net/ipv6/addrconf.c"David S. Miller
This reverts commit df8372ca747f6da9e8590775721d9363c1dfc87e. These changes are buggy and make unintended semantic changes to ip6_tnl_add_linklocal(). Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21bridge: Use the correct bit length for bitmap functions in the VLAN codeToshiaki Makita
The VLAN code needs to know the length of the per-port VLAN bitmap to perform its most basic operations (retrieving VLAN informations, removing VLANs, forwarding database manipulation, etc). Unfortunately, in the current implementation we are using a macro that indicates the bitmap size in longs in places where the size in bits is expected, which in some cases can cause what appear to be random failures. Use the correct macro. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Regarding the iwlwifi bits, Johannes says: "We revert an rfkill bugfix that unfortunately caused more bugs, shuffle some code to avoid touching the PCIe device before it's enabled and disconnect if firmware fails to do our bidding. I also have Stanislaw's fix to not crash in some channel switch scenarios." As for the mac80211 bits, Johannes says: "This time, I have one fix from Dan Carpenter for users of nl80211hdr_put(), and one fix from myself fixing a regression with the libertas driver." Along with the above... Dan Carpenter fixes some incorrectly placed "address of" operators in hostap that caused copying of junk data. Jussi Kivilinna corrects zd1201 to use an allocated buffer rather than the stack for a URB operation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21packet: restore packet statistics tp_packets to include dropsWillem de Bruijn
getsockopt PACKET_STATISTICS returns tp_packets + tp_drops. Commit ee80fbf301 ("packet: account statistics only in tpacket_stats_u") cleaned up the getsockopt PACKET_STATISTICS code. This also changed semantics. Historically, tp_packets included tp_drops on return. The commit removed the line that adds tp_drops into tp_packets. This patch reinstates the old semantics. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included change: - Check if the skb has been correctly prepared before going on
2013-08-20ipip: dereferencing an ERR_PTR in ip_tunnel_init_net()Dan Carpenter
We need to move the derefernce after the IS_ERR() check. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20ipv4: raise IP_MAX_MTU to theoretical limitEric Dumazet
As discussed last year [1], there is no compelling reason to limit IPv4 MTU to 0xFFF0, while real limit is 0xFFFF [1] : http://marc.info/?l=linux-netdev&m=135607247609434&w=2 Willem raised this issue again because some of our internal regression tests broke after lo mtu being set to 65536. IP_MTU reports 0xFFF0, and the test attempts to send a RAW datagram of mtu + 1 bytes, expecting the send() to fail, but it does not. Alexey raised interesting points about TCP MSS, that should be addressed in follow-up patches in TCP stack if needed, as someone could also set an odd mtu anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20tcp: trivial: Remove nocache argument from tcp_v4_send_synackChristoph Paasch
The nocache-argument was used in tcp_v4_send_synack as an argument to inet_csk_route_req. However, since ba3f7f04ef2b (ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.) this is no more used. This patch removes the unsued argument from tcp_v4_send_synack. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20ipv6: fix checkpatch errors in net/ipv6/addrconf.cdingtianhong
ERROR: code indent should use tabs where possible: fix 2. ERROR: do not use assignment in if condition: fix 5. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20ipv6: convert the uses of ADBG and remove the superfluous parenthesesdingtianhong
Just follow the Joe Perches's opinion, it is a better way to fix the style errors. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_conntrack_proto_tcp.c The conflict had to do with overlapping changes dealing with fixing the use of an "s32" to hold the value returned by NAT_OFFSET(). Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for your net-next tree. More specifically, they are: * Trivial typo fix in xt_addrtype, from Phil Oester. * Remove net_ratelimit in the conntrack logging for consistency with other logging subsystem, from Patrick McHardy. * Remove unneeded includes from the recently added xt_connlabel support, from Florian Westphal. * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for this, from Florian Westphal. * Remove tproxy core, now that we have socket early demux, from Florian Westphal. * A couple of patches to refactor conntrack event reporting to save a good bunch of lines, from Florian Westphal. * Fix missing locking in NAT sequence adjustment, it did not manifested in any known bug so far, from Patrick McHardy. * Change sequence number adjustment variable to 32 bits, to delay the possible early overflow in long standing connections, also from Patrick. * Comestic cleanups for IPVS, from Dragos Foianu. * Fix possible null dereference in IPVS in the SH scheduler, from Daniel Borkmann. * Allow to attach conntrack expectations via nfqueue. Before this patch, you had to use ctnetlink instead, thus, we save the conntrack lookup. * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: handle context based source addressAlexander Aring
Handle context based address when an unspecified address is given. For other context based address we print a warning and drop the packet because we don't support it right now. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: lowpan_uncompress_addr with address_modeAlexander Aring
This patch drops the pre and postcount calculation from the lowpan_uncompress_addr function.We use instead a switch/case over address_mode value. The original implementation has several bugs in this function and it was hard to decrypt how it works. To make it maintainable and fix these bugs this patch basically reimplements lowpan_uncompress_addr from scratch. A list of bugs we found in the current implementation: 1) Properly support uncompression of short-address based IPv6 addresses (instead of basically copying garbage) 2) Fix use and uncompression of long-addresses based IPv6 addresses 3) Add missing ff:fe00 in the case of SAM/DAM = 2 and M = 0 Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: add function to uncompress multicast addrAlexander Aring
Add function to uncompress multicast address. This function split the uncompress function for a multicast address in a seperate function. To uncompress a multicast address is different than a other non-multicasts addresses according to rfc6282. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: introduce lowpan_fetch_skb functionAlexander Aring
This patch adds a helper function to parse the ipv6 header to a 6lowpan header in stream. This function checks first if we can pull data with a specific length from a skb. If this seems to be okay, we copy skb data to a destination pointer and run skb_pull. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: Fix fragmentation with link-local compressed addressesDavid Hauweele
When a new 6lowpan fragment is received, a skbuff is allocated for the reassembled packet. However when a 6lowpan packet compresses link-local addresses based on link-layer addresses, the processing function relies on the skb mac control block to find the related link-layer address. This patch copies the control block from the first fragment into the newly allocated skb to keep a trace of the link-layer addresses in case of a link-local compressed address. Edit: small changes on comment issue Signed-off-by: David Hauweele <david@hauweele.net> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-206lowpan: init ipv6hdr buffer to zeroAlexander Aring
This patch simplify the handling to set fields inside of struct ipv6hdr to zero. Instead of setting some memory regions with memset to zero we initialize the whole ipv6hdr to zero. This is a simplification for parsing the 6lowpan header for the upcomming patches. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20tcp: set timestamps for restored skb-sAndrey Vagin
When the repair mode is turned off, the write queue seqs are updated so that the whole queue is considered to be 'already sent. The "when" field must be set for such skb. It's used in tcp_rearm_rto for example. If the "when" field isn't set, the retransmit timeout can be calculated incorrectly and a tcp connected can stop for two minutes (TCP_RTO_MAX). Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20openvswitch: Add vxlan tunneling support.Pravin B Shelar
Following patch adds vxlan vport type for openvswitch using vxlan api. So now there is vxlan dependency for openvswitch. CC: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20ipv6: drop packets with multiple fragmentation headersHannes Frederic Sowa
It is not allowed for an ipv6 packet to contain multiple fragmentation headers. So discard packets which were already reassembled by fragmentation logic and send back a parameter problem icmp. The updates for RFC 6980 will come in later, I have to do a bit more research here. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20ipv6: remove max_addresses check from ipv6_create_tempaddrHannes Frederic Sowa
Because of the max_addresses check attackers were able to disable privacy extensions on an interface by creating enough autoconfigured addresses: <http://seclists.org/oss-sec/2012/q4/292> But the check is not actually needed: max_addresses protects the kernel to install too many ipv6 addresses on an interface and guards addrconf_prefix_rcv to install further addresses as soon as this limit is reached. We only generate temporary addresses in direct response of a new address showing up. As soon as we filled up the maximum number of addresses of an interface, we stop installing more addresses and thus also stop generating more temp addresses. Even if the attacker tries to generate a lot of temporary addresses by announcing a prefix and removing it again (lifetime == 0) we won't install more temp addresses, because the temporary addresses do count to the maximum number of addresses, thus we would stop installing new autoconfigured addresses when the limit is reached. This patch fixes CVE-2013-0343 (but other layer-2 attacks are still possible). Thanks to Ding Tianhong to bring this topic up again. Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: George Kargiotakis <kargig@void.gr> Cc: P J P <ppandit@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-19Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-08-19xfrm: remove irrelevant comment in xfrm_input().Rami Rosen
This patch removes a comment in xfrm_input() which became irrelevant due to commit 2774c13, "xfrm: Handle blackhole route creation via afinfo". That commit removed returning -EREMOTE in the xfrm_lookup() method when the packet should be discarded and also removed the correspoinding -EREMOTE handlers. This was replaced by calling the make_blackhole() method. Therefore the comment about -EREMOTE is not relevant anymore. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-17batman-adv: check return type of unicast packet preparationsLinus Lüssing
batadv_unicast(_4addr)_prepare_skb might reallocate the skb's data. And if it tries to do so then this can potentially fail. We shouldn't continue working on this skb in such a case. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix SKB leak in 8139cp, from Dave Jones. 2) Fix use of *_PAGES interfaces with mlx5 firmware, from Moshe Lazar. 3) RCU conversion of macvtap introduced two races, fixes by Eric Dumazet 4) Synchronize statistic flows in bnx2x driver to prevent corruption, from Dmitry Kravkov 5) Undo optimization in IP tunneling, we were using the inner IP header in some cases to inherit the IP ID, but that isn't correct in some circumstances. From Pravin B Shelar 6) Use correct struct size when parsing netlink attributes in rtnl_bridge_getlink(). From Asbjoern Sloth Toennesen 7) Length verifications in tun_get_user() are bogus, from Weiping Pan and Dan Carpenter 8) Fix bad merge resolution during 3.11 networking development in openvswitch, albeit a harmless one which added some unreachable code. From Jesse Gross 9) Wrong size used in flexible array allocation in openvswitch, from Pravin B Shelar 10) Clear out firmware capability flags the be2net driver isn't ready to handle yet, from Sarveshwar Bandi 11) Revert DMA mapping error checking addition to cxgb3 driver, it's buggy. From Alexey Kardashevskiy 12) Fix regression in packet scheduler rate limiting when working with a link layer of ATM. From Jesper Dangaard Brouer 13) Fix several errors in TCP Cubic congestion control, in particular overflow errors in timestamp calculations. From Eric Dumazet and Van Jacobson 14) In ipv6 routing lookups, we need to backtrack if subtree traversal don't result in a match. From Hannes Frederic Sowa 15) ipgre_header() returns incorrect packet offset. Fix from Timo Teräs 16) Get "low latency" out of the new MIB counter names. From Eliezer Tamir 17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar Samudrala 18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung Cheng 19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean 20) Fix dealock in TIPC, from Wang Weidong and Ding Tianhong 21) call_rcu() call to destroy SCTP transport is done too early and might result in an oops. From Daniel Borkmann 22) Fix races in genetlink family dumps, from Johannes Berg 23) Flags passed into macvlan by the user need to be validated properly, from Michael S Tsirkin 24) Fix skge build on 32-bit, from Stephen Hemminger 25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira Ayuso 26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay Aleksandrov 27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel Borkmann 28) neigh_parms need to be setup before calling the ->ndo_neigh_setup() method. From Veaceslav Falico 29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet 30) Don't dereference MLD query message if the length isn't value in the bridge multicast code, from Linus Lüssing 31) Fix VXLAN IGMP join regression due to an inverted check, from Cong Wang * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits) net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes tun: signedness bug in tun_get_user() qlcnic: Fix diagnostic interrupt test for 83xx adapters qlcnic: Fix beacon state return status handling qlcnic: Fix set driver version command net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset net_sched: restore "linklayer atm" handling drivers/net/ethernet/via/via-velocity.c: update napi implementation Revert "cxgb3: Check and handle the dma mapping errors" be2net: Clear any capability flags that driver is not interested in. openvswitch: Reset tunnel key between input and output. openvswitch: Use correct type while allocating flex array. openvswitch: Fix bad merge resolution. tun: compare with 0 instead of total_len rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header ethernet/arc/arc_emac - fix NAPI "work > weight" warning ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id. bnx2x: prevent crash in shutdown flow with CNIC bnx2x: fix PTE write access error bnx2x: fix memory leak in VF ...
2013-08-16xfrm: Make xfrm_state timer monotonicFan Du
xfrm_state timer should be independent of system clock change, so switch to CLOCK_BOOTTIME base which is not only monotonic but also counting suspend time. Thus issue reported in commit: 9e0d57fd6dad37d72a3ca6db00ca8c76f2215454 ("xfrm: SAD entries do not expire correctly after suspend-resume") could ALSO be avoided. v2: Use CLOCK_BOOTTIME to count suspend time, but still monotonic. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-15netlink: Eliminate kmalloc in netlink dump operation.Pravin B Shelar
Following patch stores struct netlink_callback in netlink_sock to avoid allocating and freeing it on every netlink dump msg. Only one dump operation is allowed for a given socket at a time therefore we can safely convert cb pointer to cb struct inside netlink_sock. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15net: proc_fs: trivial: print UIDs as unsigned intFrancesco Fusco
UIDs are printed in the proc_fs as signed int, whereas they are unsigned int. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2013-08-15net_sched: restore "linklayer atm" handlingJesper Dangaard Brouer
commit 56b765b79 ("htb: improved accuracy at high rates") broke the "linklayer atm" handling. tc class add ... htb rate X ceil Y linklayer atm The linklayer setting is implemented by modifying the rate table which is send to the kernel. No direct parameter were transferred to the kernel indicating the linklayer setting. The commit 56b765b79 ("htb: improved accuracy at high rates") removed the use of the rate table system. To keep compatible with older iproute2 utils, this patch detects the linklayer by parsing the rate table. It also supports future versions of iproute2 to send this linklayer parameter to the kernel directly. This is done by using the __reserved field in struct tc_ratespec, to convey the choosen linklayer option, but only using the lower 4 bits of this field. Linklayer detection is limited to speeds below 100Mbit/s, because at high rates the rtab is gets too inaccurate, so bad that several fields contain the same values, this resembling the ATM detect. Fields even start to contain "0" time to send, e.g. at 1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have been more broken than we first realized. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>