summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2015-12-16ila: Add generic ILA translation facilityTom Herbert
This patch implements an ILA tanslation table. This table can be configured with identifier to locator mappings, and can be be queried to resolve a mapping. Queries can be parameterized based on interface, direction (incoming or outoing), and matching locator. The table is implemented using rhashtable and is configured via netlink (through "ip ila .." in iproute). The table may be used as alternative means to do do ILA tanslations other than the lw tunnels Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-16ila: Create net/ipv6/ila directoryTom Herbert
Create ila directory in preparation for supporting other hooks in the kernel than LWT for doing ILA. This includes: - Moving ila.c to ila/ila_lwt.c - Splitting out some common functions into ila_common.c Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-15net: Eliminate NETIF_F_GEN_CSUM and NETIF_F_V[46]_CSUMTom Herbert
These netif flags are unnecessary convolutions. It is more straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM, and NETIF_F_IPV6_CSUM directly. This patch also: - Cleans up can_checksum_protocol - Simplifies netdev_intersect_features Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-14ipv6: addrconf: drop ieee802154 specific thingsAlexander Aring
This patch removes ARPHRD_IEEE802154 from addrconf handling. In the earlier days of 802.15.4 6LoWPAN, the interface type was ARPHRD_IEEE802154 which introduced several issues, because 802.15.4 interfaces used the same type. Since commit 965e613d299c ("ieee802154: 6lowpan: fix ARPHRD to ARPHRD_6LOWPAN") we use ARPHRD_6LOWPAN for 6LoWPAN interfaces. This patch will remove ARPHRD_IEEE802154 which is currently deadcode, because ARPHRD_IEEE802154 doesn't reach the minimum 1280 MTU of IPv6. Also we use 6LoWPAN EUI64 specific defines instead using link-layer constanst from 802.15.4 link-layer header. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-05ipv6: Only act upon NETDEV_*_TYPE_CHANGE if we have ipv6 addressesAndrew Lunn
An interface changing type may not have IPv6 addresses. Don't call the address configuration type change in this case. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/renesas/ravb_main.c kernel/bpf/syscall.c net/ipv4/ipmr.c All three conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03net: ipv6: restrict hop_limit sysctl setting to range [1; 255]Phil Sutter
Setting a value bigger than 255 resulted in using only the lower eight bits of that value as it is assigned to the u8 header field. To avoid this unexpected result, reject such values. Setting a value of zero is technically possible, but hosts receiving such a packet have to treat it like hop_limit was set to one, according to RFC2460. Therefore I don't see a use-case for that. Setting a route's hop_limit to zero in iproute2 means to use the sysctl default, which is not the case here: Setting e.g. net.conf.eth0.hop_limit=0 will not make the kernel use net.conf.all.hop_limit for outgoing packets on eth0. To avoid these kinds of confusion, reject zero. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03ipv6: kill sk_dst_lockEric Dumazet
While testing the np->opt RCU conversion, I found that UDP/IPv6 was using a mixture of xchg() and sk_dst_lock to protect concurrent changes to sk->sk_dst_cache, leading to possible corruptions and crashes. ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest way to fix the mess is to remove sk_dst_lock completely, as we did for IPv4. __ip6_dst_store() and ip6_dst_store() share same implementation. sk_setup_caps() being called with socket lock being held or not, we have to use sk_dst_set() instead of __sk_dst_set() Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);" in ip6_dst_store() before the sk_setup_caps(sk, dst) call. This is because ip6_dst_store() can be called from process context, without any lock held. As soon as the dst is installed in sk->sk_dst_cache, dst can be freed from another cpu doing a concurrent ip6_dst_store() Doing the dst dereference before doing the install is needed to make sure no use after free would trigger. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03tcp: suppress too verbose messages in tcp_send_ack()Eric Dumazet
If tcp_send_ack() can not allocate skb, we properly handle this and setup a timer to try later. Use __GFP_NOWARN to avoid polluting syslog in the case host is under memory pressure, so that pertinent messages are not lost under a flood of useless information. sk_gfp_atomic() can use its gfp_mask argument (all callers currently were using GFP_ATOMIC before this patch) We rename sk_gfp_atomic() to sk_gfp_mask() to clearly express this function now takes into account its second argument (gfp_mask) Note that when tcp_transmit_skb() is called with clone_it set to false, we do not attempt memory allocations, so can pass a 0 gfp_mask, which most compilers can emit faster than a non zero or constant value. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03ipv6: add complete rcu protection around np->optEric Dumazet
This patch addresses multiple problems : UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions while socket is not locked : Other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program desmonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()) This patch adds full RCU protection to np->opt Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"Nicolas Dichtel
This reverts commit ab450605b35caa768ca33e86db9403229bf42be4. In IPv6, we cannot inherit the dst of the original dst. ndisc packets are IPv6 packets and may take another route than the original packet. This patch breaks the following scenario: a packet comes from eth0 and is forwarded through vxlan1. The encapsulated packet triggers an NS which cannot be sent because of the wrong route. CC: Jiri Benc <jbenc@redhat.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30net: remove unnecessary mroute.h includesNikolay Aleksandrov
It looks like many files are including mroute.h unnecessarily, so remove the include. Most importantly remove it from ipv6. CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24net: ipmr, ip6mr: fix vif/tunnel failure race conditionNikolay Aleksandrov
Since (at least) commit b17a7c179dd3 ("[NET]: Do sysfs registration as part of register_netdevice."), netdev_run_todo() deals only with unregistration, so we don't need to do the rtnl_unlock/lock cycle to finish registration when failing pimreg or dvmrp device creation. In fact that opens a race condition where someone can delete the device while rtnl is unlocked because it's fully registered. The problem gets worse when netlink support is introduced as there are more points of entry that can cause it and it also makes reusing that code correctly impossible. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24ipv6: distinguish frag queues by device for multicast and link-local packetsMichal Kubeček
If a fragmented multicast packet is received on an ethernet device which has an active macvlan on top of it, each fragment is duplicated and received both on the underlying device and the macvlan. If some fragments for macvlan are processed before the whole packet for the underlying device is reassembled, the "overlapping fragments" test in ip6_frag_queue() discards the whole fragment queue. To resolve this, add device ifindex to the search key and require it to match reassembling multicast packets and packets to link-local addresses. Note: similar patch has been already submitted by Yoshifuji Hideaki in http://patchwork.ozlabs.org/patch/220979/ but got lost and forgotten for some reason. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23net: ip6mr: fix static mfc/dev leaks on table destructionNikolay Aleksandrov
Similar to ipv4, when destroying an mrt table the static mfc entries and the static devices are kept, which leads to devices that can never be destroyed (because of refcnt taken) and leaked memory. Make sure that everything is cleaned up on netns destruction. Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code") CC: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-22net: IPv6 fib lookup tracepointDavid Ahern
Add tracepoint to show fib6 table lookups and result. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18net/ip6_tunnel: fix dst leakPaolo Abeni
the commit cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel") introduced percpu storage for ip6_tunnel dst cache, but while clearing such cache it used raw_cpu_ptr to walk the per cpu entries, so cached dst on non current cpu are not actually reset. This patch replaces raw_cpu_ptr with per_cpu_ptr, properly cleaning such storage. Fixes: cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16snmp: Remove duplicate OUTMCAST stat incrementNeil Horman
the OUTMCAST stat is double incremented, getting bumped once in the mcast code itself, and again in the common ip output path. Remove the mcast bump, as its not needed Validated by the reporter, with good results Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Claus Jensen <claus.jensen@microsemi.com> CC: Claus Jensen <claus.jensen@microsemi.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15tcp: ensure proper barriers in lockless contextsEric Dumazet
Some functions access TCP sockets without holding a lock and might output non consistent data, depending on compiler and or architecture. tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ... Introduce sk_state_load() and sk_state_store() to fix the issues, and more clearly document where this lack of locking is happening. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15ipv6: Check rt->dst.from for the DST_NOCACHE routeMartin KaFai Lau
All DST_NOCACHE rt6_info used to have rt->dst.from set to its parent. After commit 8e3d5be73681 ("ipv6: Avoid double dst_free"), DST_NOCACHE is also set to rt6_info which does not have a parent (i.e. rt->dst.from is NULL). This patch catches the rt->dst.from == NULL case. Fixes: 8e3d5be73681 ("ipv6: Avoid double dst_free") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15ipv6: Check expire on DST_NOCACHE routeMartin KaFai Lau
Since the expires of the DST_NOCACHE rt can be set during the ip6_rt_update_pmtu(), we also need to consider the expires value when doing ip6_dst_check(). This patches creates __rt6_check_expired() to only check the expire value (if one exists) of the current rt. In rt6_dst_from_check(), it adds __rt6_check_expired() as one of the condition check. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 treeMartin KaFai Lau
The original bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1272571 The setup has a IPv4 GRE tunnel running in a IPSec. The bug happens when ndisc starts sending router solicitation at the gre interface. The simplified oops stack is like: __lock_acquire+0x1b2/0x1c30 lock_acquire+0xb9/0x140 _raw_write_lock_bh+0x3f/0x50 __ip6_ins_rt+0x2e/0x60 ip6_ins_rt+0x49/0x50 ~~~~~~~~ __ip6_rt_update_pmtu.part.54+0x145/0x250 ip6_rt_update_pmtu+0x2e/0x40 ~~~~~~~~ ip_tunnel_xmit+0x1f1/0xf40 __gre_xmit+0x7a/0x90 ipgre_xmit+0x15a/0x220 dev_hard_start_xmit+0x2bd/0x480 __dev_queue_xmit+0x696/0x730 dev_queue_xmit+0x10/0x20 neigh_direct_output+0x11/0x20 ip6_finish_output2+0x21f/0x770 ip6_finish_output+0xa7/0x1d0 ip6_output+0x56/0x190 ~~~~~~~~ ndisc_send_skb+0x1d9/0x400 ndisc_send_rs+0x88/0xc0 ~~~~~~~~ The rt passed to ip6_rt_update_pmtu() is created by icmp6_dst_alloc() and it is not managed by the fib6 tree, so its rt6i_table == NULL. When __ip6_rt_update_pmtu() creates a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL and it causes the ip6_ins_rt() oops. During pmtu update, we only want to create a RTF_CACHE clone from a rt which is currently managed (or owned) by the fib6 tree. It means either rt->rt6i_node != NULL or rt is a RTF_PCPU clone. It is worth to note that rt6i_table may not be NULL even it is not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()). Hence, rt6i_node is a better check instead of rt6i_table. Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu> Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-05tcp: use correct req pointer in tcp_move_syn() callsEric Dumazet
I mistakenly took wrong request sock pointer when calling tcp_move_syn() @req_unhash is either a copy of @req, or a NULL value for FastOpen connexions (as we do not expect to unhash the temporary request sock from ehash table) Fixes: 805c4bc05705 ("tcp: fix req->saved_syn race") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ying Cai <ycai@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-05tcp: fix req->saved_syn raceEric Dumazet
For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix ireq->pktopts race"), we need to make sure we do not access req->saved_syn unless we own the request sock. This fixes races for listeners using TCP_SAVE_SYN option. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ying Cai <ycai@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-05ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_devSabrina Dubroca
In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up the dev_snmp6 entry that we have already registered for this device. Call snmp6_unregister_dev in this case. Fixes: a317a2f19da7d ("ipv6: fail early when creating netdev named all or default") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were fixing the "BH-ness" of the counter bumps whilst in 'net-next' the functions were modified to take an explicit 'net' parameter. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03ipv6: fix tunnel error handlingMichal Kubeček
Both tunnel6_protocol and tunnel46_protocol share the same error handler, tunnel6_err(), which traverses through tunnel6_handlers list. For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g. in tunnel46_rcv(). Current code can generate an ICMPv6 error message with an IPv4 packet embedded in it. Fixes: 73d605d1abbd ("[IPSEC]: changing API of xfrm6_tunnel_register") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03sit: fix sit0 percpu double allocationsEric Dumazet
sit0 device allocates its percpu storage twice : - One time in ipip6_tunnel_init() - One time in ipip6_fb_tunnel_init() Thus we leak 48 bytes per possible cpu per network namespace dismantle. ipip6_fb_tunnel_init() can be much simpler and does not return an error, and should be called after register_netdev() Note that ipip6_tunnel_clone_6rd() also needs to be called after register_netdev() (calling ipip6_tunnel_init()) Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03net: fix percpu memory leaksEric Dumazet
This patch fixes following problems : 1) percpu_counter_init() can return an error, therefore init_frag_mem_limit() must propagate this error so that inet_frags_init_net() can do the same up to its callers. 2) If ip[46]_frags_ns_ctl_register() fail, we must unwind properly and free the percpu_counter. Without this fix, we leave freed object in percpu_counters global list (if CONFIG_HOTPLUG_CPU) leading to crashes. This bug was detected by KASAN and syzkaller tool (http://github.com/google/syzkaller) Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed sourceMatthias Schiffer
There are other error values besides ip6_null_entry that can be returned by ip6_route_redirect(): fib6_rule_action() can also result in ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed. Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt() to be called with rt->rt6i_table == NULL in these cases, making the kernel crash. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02tcp/dccp: fix ireq->pktopts raceEric Dumazet
IPv6 request sockets store a pointer to skb containing the SYN packet to be able to transfer it to full blown socket when 3WHS is done (ireq->pktopts -> np->pktoptions) As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions"), we must transfer the skb only if we won the hashdance race, if multiple cpus receive the 'ack' packet completing 3WHS at the same time. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragmentHannes Frederic Sowa
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked socketsHannes Frederic Sowa
We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in the existence of IPv6 extension headers, but the condition was wrong. Fix it up, too. Also the condition to check whether the packet fits into one fragment was wrong and has been corrected. Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-10-30Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-10-30 1) The flow cache is limited by the flow cache limit which depends on the number of cpus and the xfrm garbage collector threshold which is independent of the number of cpus. This leads to the fact that on systems with more than 16 cpus we hit the xfrm garbage collector limit and refuse new allocations, so new flows are dropped. On systems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. 2) Fix some unaligned accesses on sparc systems. From Sowmini Varadhan. 3) Fix some header checks in _decode_session4. We may call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull. This can lead to incorrect policy lookups. We fix this by a check of the data pointer position before we call pskb_may_pull. 4) Reload skb header pointers after calling pskb_may_pull in _decode_session4 as this may change the pointers into the packet. 5) Add a missing statistic counter on inner mode errors. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTUAlexander Duyck
This change makes it so that we reinitialize the interface if the MTU is increased back above IPV6_MIN_MTU and the interface is up. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-29ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-29Revert "Merge branch 'ipv6-overflow-arith'"Hannes Frederic Sowa
Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested. This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-28ipv6: Export nf_ct_frag6_consume_orig()Joe Stringer
This is needed in openvswitch to fix an skb leak in the next patch. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27ipv6: icmp: include addresses in debug messagesBjørn Mork
Messages like "icmp6_send: no reply to icmp error" are close to useless. Adding source and destination addresses to provide some more clue. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23tcp/dccp: fix hashdance race for passive sessionsEric Dumazet
Multiple cpus can process duplicates of incoming ACK messages matching a SYN_RECV request socket. This is a rare event under normal operations, but definitely can happen. Only one must win the race, otherwise corruption would occur. To fix this without adding new atomic ops, we use logic in inet_ehash_nolisten() to detect the request was present in the same ehash bucket where we try to insert the new child. If request socket was not found, we have to undo the child creation. This actually removes a spin_lock()/spin_unlock() pair in reqsk_queue_unlink() for the fast path. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23ipv6: fix the incorrect return value of throw routelucien
The error condition -EAGAIN, which is signaled by throw routes, tells the rules framework to walk on searching for next matches. If the walk ends and we stop walking the rules with the result of a throw route we have to translate the error conditions to -ENETUNREACH. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2015-10-22 1) Fix IPsec pre-encap fragmentation for GSO packets. From Herbert Xu. 2) Fix some header checks in _decode_session6. We skip the header informations if the data pointer points already behind the header in question for some protocols. This is because we call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull in this case. Skipping the header informations can lead to incorrect policy lookups. From Mathias Krause. 3) Allow to change the replay threshold and expiry timer of a state without having to set other attributes like replay counter and byte lifetime. Changing these other attributes may break the SA. From Michael Rossberg. 4) Fix pmtu discovery for local generated packets. We may fail dispatch to the inner address family. As a reault, the local error handler is not called and the mtu value is not reported back to userspace. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr setDavid Ahern
741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if oif is given. Hajime reported that this change breaks the Mobile IPv6 use case that wants to force the message through one interface yet use the source address from another interface. Handle this case by only adding the flag if oif is set and saddr is not set. Fixes: 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") Cc: Hajime Tazaki <thehajime@gmail.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22ipv6: gro: support sit protocolEric Dumazet
Tom Herbert added SIT support to GRO with commit 19424e052fb4 ("sit: Add gro callbacks to sit_offload"), later reverted by Herbert Xu. The problem came because Tom patch was building GRO packets without proper meta data : If packets were locally delivered, we would not care. But if packets needed to be forwarded, GSO engine was not able to segment individual segments. With the following patch, we correctly set skb->encapsulation and inner network header. We also update gso_type. Tested: Server : netserver modprobe dummy ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up arp -s 8.0.0.100 4e:32:51:04:47:e5 iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100 ifconfig sixtofour0 sixtofour0 Link encap:IPv6-in-IPv4 inet6 addr: 2002:af6:798::1/128 Scope:Global inet6 addr: 2002:af6:798::/128 Scope:Global UP RUNNING NOARP MTU:1480 Metric:1 RX packets:411169 errors:0 dropped:0 overruns:0 frame:0 TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:20319631739 (20.3 GB) TX bytes:29529556 (29.5 MB) Client : netperf -H 2002:af6:798::1 -l 1000 & Checked on server traffic copied on dummy0 and verify segments were properly rebuilt, with proper IP headers, TCP checksums... tcpdump on eth0 shows proper GRO aggregation takes place. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains four Netfilter fixes for net, they are: 1) Fix Kconfig dependencies of new nf_dup_ipv4 and nf_dup_ipv6. 2) Remove bogus test nh_scope in IPv4 rpfilter match that is breaking --accept-local, from Xin Long. 3) Wait for RCU grace period after dropping the pending packets in the nfqueue, from Florian Westphal. 4) Fix sleeping allocation while holding spin_lock_bh, from Nikolay Borisov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22netlink: Rightsize IFLA_AF_SPEC size calculationArad, Ronen
if_nlmsg_size() overestimates the minimum allocation size of netlink dump request (when called from rtnl_calcit()) or the size of the message (when called from rtnl_getlink()). This is because ext_filter_mask is not supported by rtnl_link_get_af_size() and rtnl_link_get_size(). The over-estimation is significant when at least one netdev has many VLANs configured (8 bytes for each configured VLAN). This patch-set "rightsizes" the protocol specific attribute size calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and adding this a argument to get_link_af_size op in rtnl_af_ops. Bridge module already used filtering aware sizing for notifications. br_get_link_af_size_filtered() is consistent with the modified get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops. br_get_link_af_size() becomes unused and thus removed. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22net: Really fix vti6 with oif in dst lookupsDavid Ahern
6e28b000825d ("net: Fix vti use case with oif in dst lookups for IPv6") is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them. Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>