summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-08-13udp: consistently apply ufo or fragmentationWillem de Bruijn
[ Upstream commit 85f1bd9a7b5a79d5baa8bf44af19658f7bf77bfa ] When iteratively building a UDP datagram with MSG_MORE and that datagram exceeds MTU, consistently choose UFO or fragmentation. Once skb_is_gso, always apply ufo. Conversely, once a datagram is split across multiple skbs, do not consider ufo. Sendpage already maintains the first invariant, only add the second. IPv6 does not have a sendpage implementation to modify. A gso skb must have a partial checksum, do not follow sk_no_check_tx in udp_send_skb. Found by syzkaller. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13revert "ipv4: Should use consistent conditional judgement for ip fragment in ↵Greg Kroah-Hartman
__ip_append_data and ip_finish_output" This reverts commit f102bb7164c9020e12662998f0fd99c3be72d4f6 which is commit 0a28cfd51e17f4f0a056bcf66bfbe492c3b99f38 upstream as there is another patch that needs to be applied instead of this one. Cc: Zheng Li <james.z.li@ericsson.com> Cc: David S. Miller <davem@davemloft.net> Cc: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13revert "net: account for current skb length when deciding about UFO"Greg Kroah-Hartman
This reverts commit ef09c9ff343122a0b245416066992d096416ff19 which is commit a5cb659bbc1c8644efa0c3138a757a1e432a4880 upstream as it causes merge issues with later patches that are much more important... Cc: Michal Kubecek <mkubecek@suse.cz> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13packet: fix tp_reserve race in packet_set_ringWillem de Bruijn
[ Upstream commit c27927e372f0785f3303e8fad94b85945e2c97b7 ] Updates to tp_reserve can race with reads of the field in packet_set_ring. Avoid this by holding the socket lock during updates in setsockopt PACKET_RESERVE. This bug was discovered by syzkaller. Fixes: 8913336a7e8d ("packet: add PACKET_RESERVE sockopt") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13igmp: Fix regression caused by igmp sysctl namespace code.Nikolay Borisov
[ Upstream commit 1714020e42b17135032c8606f7185b3fb2ba5d78 ] Commit dcd87999d415 ("igmp: net: Move igmp namespace init to correct file") moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This function is only called as part of per-namespace initialization, only if CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is compiled out, casuing the igmp pernet ops to not be registerd and those sysctl being left initialized with 0. However, there are certain functions, such as ip_mc_join_group which are always compiled and make use of some of those sysctls. Let's do a partial revert of the aforementioned commit and move the sysctl initialization into inet_init_net, that way they will always have sane values. Fixes: dcd87999d415 ("igmp: net: Move igmp namespace init to correct file") Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595 Reported-by: Gerardo Exequiel Pozzi <vmlinuz386@gmail.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13net: avoid skb_warn_bad_offload false positives on UFOWillem de Bruijn
[ Upstream commit 8d63bee643f1fb53e472f0e135cae4eb99d62d19 ] skb_warn_bad_offload triggers a warning when an skb enters the GSO stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL checksum offload set. Commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") observed that SKB_GSO_DODGY producers can trigger the check and that passing those packets through the GSO handlers will fix it up. But, the software UFO handler will set ip_summed to CHECKSUM_NONE. When __skb_gso_segment is called from the receive path, this triggers the warning again. Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On Tx these two are equivalent. On Rx, this better matches the skb state (checksum computed), as CHECKSUM_NONE here means no checksum computed. See also this thread for context: http://patchwork.ozlabs.org/patch/799015/ Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13tcp: fastopen: tcp_connect() must refresh the routeEric Dumazet
[ Upstream commit 8ba60924710cde564a3905588b6219741d6356d0 ] With new TCP_FASTOPEN_CONNECT socket option, there is a possibility to call tcp_connect() while socket sk_dst_cache is either NULL or invalid. +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 connect(4, ..., ...) = 0 << sk->sk_dst_cache becomes obsolete, or even set to NULL >> +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000 We need to refresh the route otherwise bad things can happen, especially when syzkaller is running on the host :/ Fixes: 19f6d3f3c8422 ("net/tcp-fastopen: Add new API support") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_targetXin Long
[ Upstream commit 96d9703050a0036a3360ec98bb41e107c90664fe ] Commit 55917a21d0cc ("netfilter: x_tables: add context to know if extension runs from nft_compat") introduced a member nft_compat to xt_tgchk_param structure. But it didn't set it's value for ipt_init_target. With unexpected value in par.nft_compat, it may return unexpected result in some target's checkentry. This patch is to set all it's fields as 0 and only initialize the non-zero fields in ipt_init_target. v1->v2: As Wang Cong's suggestion, fix it by setting all it's fields as 0 and only initializing the non-zero fields. Fixes: 55917a21d0cc ("netfilter: x_tables: add context to know if extension runs from nft_compat") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13net: fix keepalive code vs TCP_FASTOPEN_CONNECTEric Dumazet
[ Upstream commit 2dda640040876cd8ae646408b69eea40c24f9ae9 ] syzkaller was able to trigger a divide by 0 in TCP stack [1] Issue here is that keepalive timer needs to be updated to not attempt to send a probe if the connection setup was deferred using TCP_FASTOPEN_CONNECT socket option added in linux-4.11 [1] divide error: 0000 [#1] SMP CPU: 18 PID: 0 Comm: swapper/18 Not tainted task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000 RIP: 0010:[<ffffffff8409cc0d>] [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160 Call Trace: <IRQ> [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20 [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0 [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160 [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230 [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0 [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280 [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257 [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0 [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80 [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90 <EOI> [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0 [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0 Tested: Following packetdrill no longer crashes the kernel `echo 0 >/proc/sys/net/ipv4/tcp_timestamps` // Cache warmup: send a Fast Open cookie request 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress) +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop> +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop> +0 > . 1:1(0) ack 1 +0 close(3) = 0 +0 > F. 1:1(0) ack 1 +0 < F. 1:1(0) ack 2 win 92 +0 > . 2:2(0) ack 2 +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 +.01 connect(4, ..., ...) = 0 +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0 +10 close(4) = 0 `echo 1 >/proc/sys/net/ipv4/tcp_timestamps` Fixes: 19f6d3f3c842 ("net/tcp-fastopen: Add new API support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction statesYuchung Cheng
[ Upstream commit ed254971edea92c3ac5c67c6a05247a92aa6075e ] If the sender switches the congestion control during ECN-triggered cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to the ssthresh value calculated by the previous congestion control. If the previous congestion control is BBR that always keep ssthresh to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe step is to avoid assigning invalid ssthresh value when recovery ends. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11net: account for current skb length when deciding about UFOMichal Kubeček
[ Upstream commit a5cb659bbc1c8644efa0c3138a757a1e432a4880 ] Our customer encountered stuck NFS writes for blocks starting at specific offsets w.r.t. page boundary caused by networking stack sending packets via UFO enabled device with wrong checksum. The problem can be reproduced by composing a long UDP datagram from multiple parts using MSG_MORE flag: sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 3000, 0, ...); Assume this packet is to be routed via a device with MTU 1500 and NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(), this condition is tested (among others) to decide whether to call ip_ufo_append_data(): ((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb)) At the moment, we already have skb with 1028 bytes of data which is not marked for GSO so that the test is false (fragheaderlen is usually 20). Thus we append second 1000 bytes to this skb without invoking UFO. Third sendto(), however, has sufficient length to trigger the UFO path so that we end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb() uses udp_csum() to calculate the checksum but that assumes all fragments have correct checksum in skb->csum which is not true for UFO fragments. When checking against MTU, we need to add skb->len to length of new segment if we already have a partially filled skb and fragheaderlen only if there isn't one. In the IPv6 case, skb can only be null if this is the first segment so that we have to use headersize (length of the first IPv6 header) rather than fragheaderlen (length of IPv6 header of further fragments) for skb == NULL. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Fixes: e4c5e13aa45c ("ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11ipv4: Should use consistent conditional judgement for ip fragment in ↵zheng li
__ip_append_data and ip_finish_output [ Upstream commit 0a28cfd51e17f4f0a056bcf66bfbe492c3b99f38 ] There is an inconsistent conditional judgement in __ip_append_data and ip_finish_output functions, the variable length in __ip_append_data just include the length of application's payload and udp header, don't include the length of ip header, but in ip_finish_output use (skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the length of ip header. That causes some particular application's udp payload whose length is between (MTU - IP Header) and MTU were fragmented by ip_fragment even though the rst->dev support UFO feature. Add the length of ip header to length in __ip_append_data to keep consistent conditional judgement as ip_finish_output for ip fragment. Signed-off-by: Zheng Li <james.z.li@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned intPavel Tikhomirov
[ Upstream commit b007f09072ca8afa118ade333e717ba443e8d807 ] > cat /proc/sys/net/ipv4/tcp_notsent_lowat -1 > echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat -bash: echo: write error: Invalid argument > echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat > cat /proc/sys/net/ipv4/tcp_notsent_lowat -2147483648 but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER" v2: simplify to just proc_douintvec Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11netfilter: use fwmark_reflect in nf_send_resetPau Espin Pedrol
[ Upstream commit cc31d43b4154ad5a7d8aa5543255a93b7e89edc2 ] Otherwise, RST packets generated by ipt_REJECT always have mark 0 when the routing is checked later in the same code path. Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies") Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11dccp: fix a memleak for dccp_feat_init err processXin Long
[ Upstream commit e90ce2fc27cad7e7b1e72b9e66201a7a4c124c2b ] In dccp_feat_init, when ccid_get_builtin_ccids failsto alloc memory for rx.val, it should free tx.val before returning an error. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properlyXin Long
[ Upstream commit b7953d3c0e30a5fc944f6b7bd0bcceb0794bcd85 ] The patch "dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly" fixed reqsk refcnt leak for dccp_ipv6. The same issue exists on dccp_ipv4. This patch is to fix it for dccp_ipv4. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properlyXin Long
[ Upstream commit 0c2232b0a71db0ac1d22f751aa1ac0cadb950fd2 ] In dccp_v6_conn_request, after reqsk gets alloced and hashed into ehash table, reqsk's refcnt is set 3. one is for req->rsk_timer, one is for hlist, and the other one is for current using. The problem is when dccp_v6_conn_request returns and finishes using reqsk, it doesn't put reqsk. This will cause reqsk refcnt leaks and reqsk obj never gets freed. Jianlin found this issue when running dccp_memleak.c in a loop, the system memory would run out. dccp_memleak.c: int s1 = socket(PF_INET6, 6, IPPROTO_IP); bind(s1, &sa1, 0x20); listen(s1, 0x9); int s2 = socket(PF_INET6, 6, IPPROTO_IP); connect(s2, &sa1, 0x20); close(s1); close(s2); This patch is to put the reqsk before dccp_v6_conn_request returns, just as what tcp_conn_request does. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()Stefano Brivio
[ Upstream commit afce615aaabfbaad02550e75c0bec106dafa1adf ] RFC 2465 defines ipv6IfStatsOutFragFails as: "The number of IPv6 datagrams that have been discarded because they needed to be fragmented at this output interface but could not be." The existing implementation, instead, would increase the counter twice in case we fail to allocate room for single fragments: once for the fragment, once for the datagram. This didn't look intentional though. In one of the two affected affected failure paths, the double increase was simply a result of a new 'goto fail' statement, introduced to avoid a skb leak. The other path appears to be affected since at least 2.6.12-rc2. Reported-by: Sabrina Dubroca <sdubroca@redhat.com> Fixes: 1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11packet: fix use-after-free in prb_retire_rx_blk_timer_expired()WANG Cong
[ Upstream commit c800aaf8d869f2b9b47b10c5c312fe19f0a94042 ] There are multiple reports showing we have a use-after-free in the timer prb_retire_rx_blk_timer_expired(), where we use struct tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by free_pg_vec(). The interesting part is it is not freed via packet_release() but via packet_setsockopt(), which means we are not closing the socket. Looking into the big and fat function packet_set_ring(), this could happen if we satisfy the following conditions: 1. closing == 0, not on packet_release() path 2. req->tp_block_nr == 0, we don't allocate a new pg_vec 3. rx_ring->pg_vec is already set as V3, which means we already called packet_set_ring() wtih req->tp_block_nr > 0 previously 4. req->tp_frame_nr == 0, pass sanity check 5. po->mapped == 0, never called mmap() In this scenario we are clearing the old rx_ring->pg_vec, so we need to free this pg_vec, but we don't stop the timer on this path because of closing==0. The timer has to be stopped as long as we need to free pg_vec, therefore the check on closing!=0 is wrong, we should check pg_vec!=NULL instead. Thanks to liujian for testing different fixes. Reported-by: alexander.levin@verizon.com Reported-by: Dave Jones <davej@codemonkey.org.uk> Reported-by: liujian (CE) <liujian56@huawei.com> Tested-by: liujian (CE) <liujian56@huawei.com> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11openvswitch: fix potential out of bound access in parse_ctLiping Zhang
[ Upstream commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c ] Before the 'type' is validated, we shouldn't use it to fetch the ovs_ct_attr_lens's minlen and maxlen, else, out of bound access may happen. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11rtnetlink: allocate more memory for dev_set_mac_address()WANG Cong
[ Upstream commit 153711f9421be5dbc973dc57a4109dc9d54c89b1 ] virtnet_set_mac_address() interprets mac address as struct sockaddr, but upper layer only allocates dev->addr_len which is ETH_ALEN + sizeof(sa_family_t) in this case. We lack a unified definition for mac address, so just fix the upper layer, this also allows drivers to interpret it to struct sockaddr freely. Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11ipv4: initialize fib_trie prior to register_netdev_notifier call.Mahesh Bandewar
[ Upstream commit 8799a221f5944a7d74516ecf46d58c28ec1d1f75 ] Net stack initialization currently initializes fib-trie after the first call to netdevice_notifier() call. In fact fib_trie initialization needs to happen before first rtnl_register(). It does not cause any problem since there are no devices UP at this moment, but trying to bring 'lo' UP at initialization would make this assumption wrong and exposes the issue. Fixes following crash Call Trace: ? alternate_node_alloc+0x76/0xa0 fib_table_insert+0x1b7/0x4b0 fib_magic.isra.17+0xea/0x120 fib_add_ifaddr+0x7b/0x190 fib_netdev_event+0xc0/0x130 register_netdevice_notifier+0x1c1/0x1d0 ip_fib_init+0x72/0x85 ip_rt_init+0x187/0x1e9 ip_init+0xe/0x1a inet_init+0x171/0x26c ? ipv4_offload_init+0x66/0x66 do_one_initcall+0x43/0x160 kernel_init_freeable+0x191/0x219 ? rest_init+0x80/0x80 kernel_init+0xe/0x150 ret_from_fork+0x22/0x30 Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08 RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28 CR2: 0000000000000014 Fixes: 7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.") Fixes: 7f9b80529b8a ("[IPV4]: fib hash|trie initialization") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11ipv6: avoid overflow of offset in ip6_find_1stfragoptSabrina Dubroca
[ Upstream commit 6399f1fae4ec29fab5ec76070435555e256ca3a6 ] In some cases, offset can overflow and can cause an infinite loop in ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and cap it at IPV6_MAXPLEN, since packets larger than that should be invalid. This problem has been here since before the beginning of git history. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11net: Zero terminate ifr_name in dev_ifname().David S. Miller
[ Upstream commit 63679112c536289826fec61c917621de95ba2ade ] The ifr.ifr_name is passed around and assumed to be NULL terminated. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()Alexander Potapenko
[ Upstream commit 18bcf2907df935981266532e1e0d052aff2e6fae ] KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(), which originated from the TCP request socket created in cookie_v6_check(): ================================================================== BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0 CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters. Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 dump_stack+0x172/0x1c0 lib/dump_stack.c:52 kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 skb_set_hash_from_sk ./include/net/sock.h:2011 tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983 tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493 tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284 tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309 call_timer_fn+0x240/0x520 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 __run_timers+0xc13/0xf10 kernel/time/timer.c:1601 run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614 __do_softirq+0x485/0x942 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 irq_exit+0x1fa/0x230 kernel/softirq.c:405 exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657 smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966 apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489 RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36 RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77 RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440 RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005 RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770 RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004 R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810 R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4 </IRQ> poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293 SYSC_select+0x4b4/0x4e0 fs/select.c:653 SyS_select+0x76/0xa0 fs/select.c:634 entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204 RIP: 0033:0x4597e7 RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20 R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003 chained origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_save_stack mm/kmsan/kmsan.c:317 kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547 __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259 tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472 tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103 tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212 cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337 kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766 reqsk_alloc ./include/net/request_sock.h:87 inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200 cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 ================================================================== Similar error is reported for cookie_v4_check(). Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11tcp_bbr: init pacing rate on first RTT sampleNeal Cardwell
[ Upstream commit 32984565574da7ed3afa10647bb4020d7a9e6c93 ] Fixes the following behavior: for connections that had no RTT sample at the time of initializing congestion control, BBR was initializing the pacing rate to a high nominal rate (based an a guess of RTT=1ms, in case this is LAN traffic). Then BBR never adjusted the pacing rate downward upon obtaining an actual RTT sample, if the connection never filled the pipe (e.g. all sends were small app-limited writes()). This fix adjusts the pacing rate upon obtaining the first RTT sample. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11tcp_bbr: remove sk_pacing_rate=0 transient during initNeal Cardwell
[ Upstream commit 1d3648eb5d1fe9ed3d095ed8fa19ad11ca4c8bc0 ] Fix a corner case noticed by Eric Dumazet, where BBR's setting sk->sk_pacing_rate to 0 during initialization could theoretically cause packets in the sending host to hang if there were packets "in flight" in the pacing infrastructure at the time the BBR congestion control state is initialized. This could occur if the pacing infrastructure happened to race with bbr_init() in a way such that the pacer read the 0 rather than the immediately following non-zero pacing rate. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11tcp_bbr: introduce bbr_init_pacing_rate_from_rtt() helperNeal Cardwell
[ Upstream commit 79135b89b8af304456bd67916b80116ddf03d7b6 ] Introduce a helper to initialize the BBR pacing rate unconditionally, based on the current cwnd and RTT estimate. This is a pure refactor, but is needed for two following fixes. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11tcp_bbr: introduce bbr_bw_to_pacing_rate() helperNeal Cardwell
[ Upstream commit f19fd62dafaf1ed6cf615dba655b82fa9df59074 ] Introduce a helper to convert a BBR bandwidth and gain factor to a pacing rate in bytes per second. This is a pure refactor, but is needed for two following fixes. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11tcp_bbr: cut pacing rate only if filled pipeNeal Cardwell
[ Upstream commit 4aea287e90dd61a48268ff2994b56f9799441b62 ] In bbr_set_pacing_rate(), which decides whether to cut the pacing rate, there was some code that considered exiting STARTUP to be equivalent to the notion of filling the pipe (i.e., bbr_full_bw_reached()). Specifically, as the code was structured, exiting STARTUP and going into PROBE_RTT could cause us to cut the pacing rate down to something silly and low, based on whatever bandwidth samples we've had so far, when it's possible that all of them have been small app-limited bandwidth samples that are not representative of the bandwidth available in the path. (The code was correct at the time it was written, but the state machine changed without this spot being adjusted correspondingly.) Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07xfrm: Don't use sk_family for socket policy lookupsSteffen Klassert
commit 4c86d77743a54fb2d8a4d18a037a074c892bb3be upstream. On IPv4-mapped IPv6 addresses sk_family is AF_INET6, but the flow informations are created based on AF_INET. So the routing set up 'struct flowi4' but we try to access 'struct flowi6' what leads to an out of bounds access. Fix this by using the family we get with the dst_entry, like we do it for the standard policy lookup. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07l2tp: consider '::' as wildcard address in l2tp_ip6 socket lookupGuillaume Nault
[ Upstream commit 97b84fd6d91766ea57dcc350d78f42639e011c30 ] An L2TP socket bound to the unspecified address should match with any address. If not, it can't receive any packet and __l2tp_ip6_bind_lookup() can't prevent another socket from binding on the same device/tunnel ID. While there, rename the 'addr' variable to 'sk_laddr' (local addr), to make following patch clearer. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07ipv6: Should use consistent conditional judgement for ip6 fragment between ↵Zheng Li
__ip6_append_data and ip6_finish_output [ Upstream commit e4c5e13aa45c23692e4acf56f0b3533f328199b2 ] There is an inconsistent conditional judgement between __ip6_append_data and ip6_finish_output functions, the variable length in __ip6_append_data just include the length of application's payload and udp6 header, don't include the length of ipv6 header, but in ip6_finish_output use (skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the length of ipv6 header. That causes some particular application's udp6 payloads whose length are between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even though the rst->dev support UFO feature. Add the length of ipv6 header to length in __ip6_append_data to keep consistent conditional judgement as ip6_finish_output for ip6 fragment. Signed-off-by: Zheng Li <james.z.li@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07net: skb_needs_check() accepts CHECKSUM_NONE for txEric Dumazet
commit 6e7bc478c9a006c701c14476ec9d389a484b4864 upstream. My recent change missed fact that UFO would perform a complete UDP checksum before segmenting in frags. In this case skb->ip_summed is set to CHECKSUM_NONE. We need to add this valid case to skb_needs_check() Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07net: reduce skb_warn_bad_offload() noiseEric Dumazet
commit b2504a5dbef3305ef41988ad270b0e8ec289331c upstream. Dmitry reported warnings occurring in __skb_gso_segment() [1] All SKB_GSO_DODGY producers can allow user space to feed packets that trigger the current check. We could prevent them from doing so, rejecting packets, but this might add regressions to existing programs. It turns out our SKB_GSO_DODGY handlers properly set up checksum information that is needed anyway when packets needs to be segmented. By checking again skb_needs_check() after skb_mac_gso_segment(), we should remove these pesky warnings, at a very minor cost. With help from Willem de Bruijn [1] WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434 lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1 ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20 Call Trace: [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179 [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542 [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565 [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434 [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706 [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline] [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969 [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383 [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424 [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline] [<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955 [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline] [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631 [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954 [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988 [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline] [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995 [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07af_key: Add lock to key dumpYuejie Shi
commit 89e357d83c06b6fac581c3ca7f0ee3ae7e67109e upstream. A dump may come in the middle of another dump, modifying its dump structure members. This race condition will result in NULL pointer dereference in kernel. So add a lock to prevent that race. Fixes: 83321d6b9872 ("[AF_KEY]: Dump SA/SP entries non-atomically") Signed-off-by: Yuejie Shi <syjcnss@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27sunrpc: use constant time memory comparison for macJason A. Donenfeld
commit 15a8b93fd5690de017ce665382ea45e5d61811a4 upstream. Otherwise, we enable a MAC forgery via timing attack. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27ipvs: SNAT packet replies only for NATed connectionsJulian Anastasov
commit 3c5ab3f395d66a9e4e937fcfdf6ebc63894f028b upstream. We do not check if packet from real server is for NAT connection before performing SNAT. This causes problems for setups that use DR/TUN and allow local clients to access the real server directly, for example: - local client in director creates IPVS-DR/TUN connection CIP->VIP and the request packets are routed to RIP. Talks are finished but IPVS connection is not expired yet. - second local client creates non-IPVS connection CIP->RIP with same reply tuple RIP->CIP and when replies are received on LOCAL_IN we wrongly assign them for the first client connection because RIP->CIP matches the reply direction. As result, IPVS SNATs replies for non-IPVS connections. The problem is more visible to local UDP clients but in rare cases it can happen also for TCP or remote clients when the real server sends the reply traffic via the director. So, better to be more precise for the reply traffic. As replies are not expected for DR/TUN connections, better to not touch them. Reported-by: Nick Moriarty <nick.moriarty@york.ac.uk> Tested-by: Nick Moriarty <nick.moriarty@york.ac.uk> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27af_key: Fix sadb_x_ipsecrequest parsingHerbert Xu
commit 096f41d3a8fcbb8dde7f71379b1ca85fe213eded upstream. The parsing of sadb_x_ipsecrequest is broken in a number of ways. First of all we're not verifying sadb_x_ipsecrequest_len. This is needed when the structure carries addresses at the end. Worse we don't even look at the length when we parse those optional addresses. The migration code had similar parsing code that's better but it also has some deficiencies. The length is overcounted first of all as it includes the header itself. It also fails to check the length before dereferencing the sa_family field. This patch fixes those problems in parse_sockaddr_pair and then uses it in parse_ipsecrequest. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27Bluetooth: use constant time memory comparison for secret valuesJason A. Donenfeld
commit 329d82309824ff1082dc4a91a5bbed8c3bec1580 upstream. This file is filled with complex cryptography. Thus, the comparisons of MACs and secret keys and curve points and so forth should not add timing attacks, which could either result in a direct forgery, or, given the complexity, some other type of attack. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27NFC: Add sockaddr length checks before accessing sa_family in bind handlersMateusz Jurczyk
commit f6a5885fc4d68e7f25ffb42b9d8d80aebb3bacbb upstream. Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() handlers of the AF_NFC socket. Since the syscall doesn't enforce a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing .sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27nfc: Fix the sockaddr length sanitization in llcp_sock_connectMateusz Jurczyk
commit 608c4adfcabab220142ee335a2a003ccd1c0b25b upstream. Fix the sockaddr length verification in the connect() handler of NFC/LLCP sockets, to compare against the size of the actual structure expected on input (sockaddr_nfc_llcp) instead of its shorter version (sockaddr_nfc). Both structures are defined in include/uapi/linux/nfc.h. The fields specific to the _llcp extended struct are as follows: 276 __u8 dsap; /* Destination SAP, if known */ 277 __u8 ssap; /* Source SAP to be bound to */ 278 char service_name[NFC_LLCP_MAX_SERVICE_NAME]; /* Service name URI */; 279 size_t service_name_len; If the caller doesn't provide a sufficiently long sockaddr buffer, these fields remain uninitialized (and they currently originate from the stack frame of the top-level sys_connect handler). They are then copied by llcp_sock_connect() into internal storage (nfc_llcp_sock structure), and could be subsequently read back through the user-mode getsockname() function (handled by llcp_sock_getname()). This would result in the disclosure of up to ~70 uninitialized bytes from the kernel stack to user-mode clients capable of creating AFC_NFC sockets. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27nfc: Ensure presence of required attributes in the activate_target handlerMateusz Jurczyk
commit a0323b979f81ad2deb2c8836eab506534891876a upstream. Check that the NFC_ATTR_TARGET_INDEX and NFC_ATTR_PROTOCOLS attributes (in addition to NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to accessing them. This prevents potential unhandled NULL pointer dereference exceptions which can be triggered by malicious user-mode programs, if they omit one or both of these attributes. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27NFC: fix broken device allocationJohan Hovold
commit 20777bc57c346b6994f465e0d8261a7fbf213a09 upstream. Commit 7eda8b8e9677 ("NFC: Use IDR library to assing NFC devices IDs") moved device-id allocation and struct-device initialisation from nfc_allocate_device() to nfc_register_device(). This broke just about every nfc-device-registration error path, which continue to call nfc_free_device() that tries to put the device reference of the now uninitialised (but zeroed) struct device: kobject: '(null)' (ce316420): is not initialized, yet kobject_put() is being called. The late struct-device initialisation also meant that various work queues whose names are derived from the nfc device name were also misnamed: 421 root 0 SW< [(null)_nci_cmd_] 422 root 0 SW< [(null)_nci_rx_w] 423 root 0 SW< [(null)_nci_tx_w] Move the id-allocation and struct-device initialisation back to nfc_allocate_device() and fix up the single call site which did not use nfc_free_device() in its error path. Fixes: 7eda8b8e9677 ("NFC: Use IDR library to assing NFC devices IDs") Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21cfg80211: Check if NAN service ID is of expected sizeSrinivas Dasari
commit 0a27844ce86d039d74221dd56cd8c0349b146b63 upstream. nla policy checks for only maximum length of the attribute data when the attribute type is NLA_BINARY. If userspace sends less data than specified, cfg80211 may access illegal memory. When type is NLA_UNSPEC, nla policy check ensures that userspace sends minimum specified length number of bytes. Remove type assignment to NLA_BINARY from nla_policy of NL80211_NAN_FUNC_SERVICE_ID to make these NLA_UNSPEC and to make sure minimum NL80211_NAN_FUNC_SERVICE_ID_LEN bytes are received from userspace with NL80211_NAN_FUNC_SERVICE_ID. Fixes: a442b761b24 ("cfg80211: add add_nan_func / del_nan_func") Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21cfg80211: Check if PMKID attribute is of expected sizeSrinivas Dasari
commit 9361df14d1cbf966409d5d6f48bb334384fbe138 upstream. nla policy checks for only maximum length of the attribute data when the attribute type is NLA_BINARY. If userspace sends less data than specified, the wireless drivers may access illegal memory. When type is NLA_UNSPEC, nla policy check ensures that userspace sends minimum specified length number of bytes. Remove type assignment to NLA_BINARY from nla_policy of NL80211_ATTR_PMKID to make this NLA_UNSPEC and to make sure minimum WLAN_PMKID_LEN bytes are received from userspace with NL80211_ATTR_PMKID. Fixes: 67fbb16be69d ("nl80211: PMKSA caching support") Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIESSrinivas Dasari
commit d7f13f7450369281a5d0ea463cc69890a15923ae upstream. validate_scan_freqs() retrieves frequencies from attributes nested in the attribute NL80211_ATTR_SCAN_FREQUENCIES with nla_get_u32(), which reads 4 bytes from each attribute without validating the size of data received. Attributes nested in NL80211_ATTR_SCAN_FREQUENCIES don't have an nla policy. Validate size of each attribute before parsing to avoid potential buffer overread. Fixes: 2a519311926 ("cfg80211/nl80211: scanning (and mac80211 update to use it)") Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODESrinivas Dasari
commit 8feb69c7bd89513be80eb19198d48f154b254021 upstream. Buffer overread may happen as nl80211_set_station() reads 4 bytes from the attribute NL80211_ATTR_LOCAL_MESH_POWER_MODE without validating the size of data received when userspace sends less than 4 bytes of data with NL80211_ATTR_LOCAL_MESH_POWER_MODE. Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE to avoid the buffer overread. Fixes: 3b1c5a5307f ("{cfg,nl}80211: mesh power mode primitives and userspace access") Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21rds: tcp: use sock_create_lite() to create the accept socketSowmini Varadhan
commit 0933a578cd55b02dc80f219dc8f2efb17ec61c9a upstream. There are two problems with calling sock_create_kern() from rds_tcp_accept_one() 1. it sets up a new_sock->sk that is wasteful, because this ->sk is going to get replaced by inet_accept() in the subsequent ->accept() 2. The new_sock->sk is a leaked reference in sock_graft() which expects to find a null parent->sk Avoid these problems by calling sock_create_lite(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21net: ipv6: Compare lwstate in detecting duplicate nexthopsDavid Ahern
commit f06b7549b79e29a672336d4e134524373fb7a232 upstream. Lennert reported a failure to add different mpls encaps in a multipath route: $ ip -6 route add 1234::/16 \ nexthop encap mpls 10 via fe80::1 dev ens3 \ nexthop encap mpls 20 via fe80::1 dev ens3 RTNETLINK answers: File exists The problem is that the duplicate nexthop detection does not compare lwtunnel configuration. Add it. Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: David Ahern <dsahern@gmail.com> Reported-by: João Taveira Araújo <joao.taveira@gmail.com> Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Tested-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>