Age | Commit message (Collapse) | Author |
|
commit 71d6ad08379304128e4bdfaf0b4185d54375423e upstream.
Don't assume that server is sane and won't return more data than
asked for.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 105f5528b9bbaa08b526d3405a5bcd2ff0c953c8 ]
In situations where an skb is paged, the transport header pointer and
tail pointer can be the same because the skb contents are in frags.
This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
length of 0 when the length to receive is actually greater than zero.
skb->len is already correctly set in ip6_input_finish() with
pskb_pull(), so use skb->len as it always returns the correct result
for both linear and paged data.
Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c1201444075009507a6818de6518e2822b9a87c8 ]
Always zero out ca_priv data in tcp_assign_congestion_control() so that
ca_priv data is cleared out during socket creation.
Also always zero out ca_priv data in tcp_reinit_congestion_control() so
that when cc algorithm is changed, ca_priv data is cleared out as well.
We should still zero out ca_priv data even in TCP_CLOSE state because
user could call connect() on AF_UNSPEC to disconnect the socket and
leave it in TCP_CLOSE state and later call setsockopt() to switch cc
algorithm on this socket.
Fixes: 2b0a8c9ee ("tcp: add CDG congestion control")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 199ab00f3cdb6f154ea93fa76fd80192861a821d ]
Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
neigh key as an IPv6 address:
neigh = dst_neigh_lookup(skb_dst(skb),
&ipv6_hdr(skb)->daddr);
if (!neigh)
goto tx_err_link_failure;
addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
addr_type = ipv6_addr_type(addr6);
if (addr_type == IPV6_ADDR_ANY)
addr6 = &ipv6_hdr(skb)->daddr;
memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
Also the network header of the skb at this point should be still IPv4
for 4in6 tunnels, we shold not just use it as IPv6 header.
This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
is, we are safe to do the nexthop lookup using skb_dst() and
ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
dest address we can pick here, we have to rely on callers to fill it
from tunnel config, so just fall to ip6_route_output() to make the
decision.
Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8048ced9beb21a52e3305f3332ae82020619f24e ]
Taking down the loopback device wreaks havoc on IPv6 routing. By
extension, taking down a VRF device wreaks havoc on its table.
Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
FIB code while running syzkaller fuzzer. The root cause is a dead dst
that is on the garbage list gets reinserted into the IPv6 FIB. While on
the gc (or perhaps when it gets added to the gc list) the dst->next is
set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
out-of-bounds access.
Andrey's reproducer was the key to getting to the bottom of this.
With IPv6, host routes for an address have the dst->dev set to the
loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
a walk of the fib evicting routes with the 'lo' device which means all
host routes are removed. That process moves the dst which is attached to
an inet6_ifaddr to the gc list and marks it as dead.
The recent change to keep global IPv6 addresses added a new function,
fixup_permanent_addr, that is called on admin up. That function restarts
dad for an inet6_ifaddr and when it completes the host route attached
to it is inserted into the fib. Since the route was marked dead and
moved to the gc list, re-inserting the route causes the reported
out-of-bounds accesses. If the device with the address is taken down
or the address is removed, the WARN_ON in fib6_del is triggered.
All of those faults are fixed by regenerating the host route if the
existing one has been moved to the gc list, something that can be
determined by checking if the rt6i_ref counter is 0.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 723b929ca0f79c0796f160c2eeda4597ee98d2b8 ]
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr that has been resolved by two commits
long time ago by introducing the "notify" parameter to the delete
function and avoiding the unregister when called from a notifier, so
let's do the same for ip6mr.
The trace from Andrey:
------------[ cut here ]------------
kernel BUG at net/core/dev.c:6813!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: netns cleanup_net
task: ffff880069208000 task.stack: ffff8800692d8000
RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
FS: 0000000000000000(0000) GS:ffff88006cb00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
Call Trace:
unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
__raw_notifier_call_chain kernel/notifier.c:394
raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
call_netdevice_notifiers net/core/dev.c:1663
rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
unregister_netdevice_many net/core/dev.c:7880
default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
kthread+0x35e/0x430 kernel/kthread.c:231
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
---[ end trace e0b29c57e9b3292c ]---
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c70b17b775edb21280e9de7531acf6db3b365274 ]
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
otherwise skbs with queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in kernel panic.
One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.
e.g.
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000000000001525
tsk->{mm,active_mm}->pgd = fff800130ff9a000
\|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
kworker/48:1(475): Oops [#1]
CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE
4.11.0-rc3-davem-net+ #7
Workqueue: events queue_process
task: fff80013113299c0 task.stack: fff800131132c000
TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y:
00000000 Tainted: G OE
TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3:
0000000000000001
g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
00000000000000c0
o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
0000000000000003
o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
000000000049ed94
RPC: <set_next_entity+0x34/0xb80>
l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3:
0000000000000000
l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7:
fff8001fa7605028
i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3:
0000000000000000
i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7:
00000000103fa4b0
I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
Call Trace:
[00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
[0000000000998c74] netpoll_start_xmit+0xf4/0x200
[0000000000998e10] queue_process+0x90/0x160
[0000000000485fa8] process_one_work+0x188/0x480
[0000000000486410] worker_thread+0x170/0x4c0
[000000000048c6b8] kthread+0xd8/0x120
[0000000000406064] ret_from_fork+0x1c/0x2c
[0000000000000000] (null)
Disabling lock debugging due to kernel taint
Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
Caller[0000000000998e10]: queue_process+0x90/0x160
Caller[0000000000485fa8]: process_one_work+0x188/0x480
Caller[0000000000486410]: worker_thread+0x170/0x4c0
Caller[000000000048c6b8]: kthread+0xd8/0x120
Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
Caller[0000000000000000]: (null)
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 557c44be917c322860665be3d28376afa84aa936 ]
Andrey reported a fault in the IPv6 route code:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880069809600 task.stack: ffff880062dc8000
RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
RSP: 0018:ffff880062dced30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
Call Trace:
ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
...
Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.
Fix by checking for the flag and failing with EINVAL.
Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 43170c4e0ba709c79130c3fe5a41e66279950cd0 ]
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.
This assumption is not always correct, resulting in the following
warning message in the log:
skb_segment: too many frags
For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.
Validate the assumption, and fall back to software GSO if it not true.
Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212
Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1862d6208db0aeca9c8ace44915b08d5ab2cd667 ]
Syzkaller reported a use-after-free in ip_recv_error at line
info->ipi_ifindex = skb->dev->ifindex;
This function is called on dequeue from the error queue, at which
point the device pointer may no longer be valid.
Save ifindex on enqueue in __skb_complete_tx_timestamp, when the
pointer is valid or NULL. Store it in temporary storage skb->cb.
It is safe to reference skb->dev here, as called from device drivers
or dev_queue_xmit. The exception is when called from tcp_ack_tstamp;
in that case it is NULL and ifindex is set to 0 (invalid).
Do not return a pktinfo cmsg if ifindex is 0. This maintains the
current behavior of not returning a cmsg if skb->dev was NULL.
On dequeue, the ipv4 path will cast from sock_exterr_skb to
in_pktinfo. Both have ifindex as their first element, so no explicit
conversion is needed. This is by design, introduced in commit
0b922b7a829c ("net: original ingress device index in PKTINFO"). For
ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo.
Fixes: 829ae9d61165 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a2d6cbb0670d54806f18192cb0db266b4a6d285a ]
addrconf_ifdown() removes elements from the idev->addr_list without
holding the idev->lock.
If this happens while the loop in __ipv6_dev_get_saddr() is handling the
same element, that function ends up in an infinite loop:
NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [test:1719]
Call Trace:
ipv6_get_saddr_eval+0x13c/0x3a0
__ipv6_dev_get_saddr+0xe4/0x1f0
ipv6_dev_get_saddr+0x1b4/0x204
ip6_dst_lookup_tail+0xcc/0x27c
ip6_dst_lookup_flow+0x38/0x80
udpv6_sendmsg+0x708/0xba8
sock_sendmsg+0x18/0x30
SyS_sendto+0xb8/0xf8
syscall_common+0x34/0x58
Fixes: 6a923934c33 (Revert "ipv6: Revert optional address flusing on ifdown.")
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 17c3060b1701fc69daedb4c90be6325d3d9fca8e ]
In the (very unlikely) case a passive socket becomes a listener,
we do not want to duplicate its saved SYN headers.
This would lead to double frees, use after free, and please hackers and
various fuzzers
Tested:
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, IPPROTO_TCP, TCP_SAVE_SYN, [1], 4) = 0
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 5) = 0
+0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
+0 connect(4, AF_UNSPEC, ...) = 0
+0 close(3) = 0
+0 bind(4, ..., ...) = 0
+0 listen(4, 5) = 0
+0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 257
Fixes: cd8ae85299d5 ("tcp: provide SYN headers for passive connections")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 34b2789f1d9bf8dcca9b5cb553d076ca2cd898ee ]
Now sctp doesn't check sock's state before listening on it. It could
even cause changing a sock with any state to become a listening sock
when doing sctp_listen.
This patch is to fix it by checking sock's state in sctp_listen, so
that it will listen on the sock with right state.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a8801799c6975601fd58ae62f48964caec2eb83f ]
inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
ip_route_input when iif is given. If a multipath route is present for
the designated destination, ip_multipath_icmp_hash ends up being called,
which uses the source/destination addresses within the skb to calculate
a hash. However, those are not set in the synthetic skb, causing it to
return an arbitrary and incorrect result.
Instead, use UDP, which gets no such special treatment.
Signed-off-by: Florian Larysch <fl@n621.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 249ee819e24c180909f43c1173c8ef6724d21faf ]
PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP).
Fixes: f1f39f911027 ("l2tp: auto load type modules")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e08293a4ccbcc993ded0fdc46f1e57926b833d63 ]
Take a reference on the sessions returned by l2tp_session_find_nth()
(and rename it l2tp_session_get_nth() to reflect this change), so that
caller is assured that the session isn't going to disappear while
processing it.
For procfs and debugfs handlers, the session is held in the .start()
callback and dropped in .show(). Given that pppol2tp_seq_session_show()
dereferences the associated PPPoL2TP socket and that
l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to
call the session's .ref() callback to prevent the socket from going
away from under us.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Fixes: 0ad6614048cf ("l2tp: Add debugfs files for dumping l2tp debug info")
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bcc5364bdcfe131e6379363f089e7b4108d35b70 ]
When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.
Fix by checking that tp_reserve <= INT_MAX on assign.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8f8d28e4d6d815a391285e121c3a53a0b6cb9e7b ]
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow.
Add a check that tp_block_size * tp_block_nr <= UINT_MAX.
Since frames_per_block <= tp_block_size, the expression would
never overflow.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e91793bb615cf6cdd59c0b6749fe173687bb0947 ]
The Rx path may grab the socket right before pppol2tp_release(), but
nothing guarantees that it will enqueue packets before
skb_queue_purge(). Therefore, the socket can be destroyed without its
queues fully purged.
Fix this by purging queues in pppol2tp_session_destruct() where we're
guaranteed nothing is still referencing the socket.
Fixes: 9e9cb6221aa7 ("l2tp: fix userspace reception on plain L2TP sockets")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 94d7ee0baa8b764cf64ad91ed69464c1a6a0066b ]
The code following l2tp_tunnel_find() expects that a new reference is
held on sk. Either sk_receive_skb() or the discard_put error path will
drop a reference from the tunnel's socket.
This issue exists in both l2tp_ip and l2tp_ip6.
Fixes: a3c18422a4b4 ("l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv()")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a80db69e47d764bbcaf2fec54b1f308925e7c490 ]
There is no reason to continue after a copy_from_user()
failure.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 48481c8fa16410ffa45939b13b6c53c2ca609e5f ]
Dmitry posted a nice reproducer of a bug triggering in neigh_probe()
when dereferencing a NULL neigh->ops->solicit method.
This can happen for arp_direct_ops/ndisc_direct_ops and similar,
which can be used for NUD_NOARP neighbours (created when dev->header_ops
is NULL). Admin can then force changing nud_state to some other state
that would fire neigh timer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 43a6684519ab0a6c52024b5e25322476cabad893 upstream.
We got a report of yet another bug in ping
http://www.openwall.com/lists/oss-security/2017/03/24/6
->disconnect() is not called with socket lock held.
Fix this by acquiring ping rwlock earlier.
Thanks to Daniel, Alexander and Andrey for letting us know this problem.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Jiang <danieljiang0415@gmail.com>
Reported-by: Solar Designer <solar@openwall.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9e478066eae41211c92a8f63cc69aafc391bd6ab upstream.
There are two bugs in the follow-MAC code:
* it treats the radiotap header as the 802.11 header
(therefore it can't possibly work)
* it doesn't verify that the skb data it accesses is actually
present in the header, which is mitigated by the first point
Fix this by moving all of this out into a separate function.
This function copies the data it needs using skb_copy_bits()
to make sure it can be accessed if it's paged, and offsets
that by the possibly present vendor radiotap header.
This also makes all those conditions more readable.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3018e947d7fd536d57e2b550c33e456d921fff8c upstream.
AP/AP_VLAN modes don't accept any real 802.11 multicast data
frames, but since they do need to accept broadcast management
frames the same is currently permitted for data frames. This
opens a security problem because such frames would be decrypted
with the GTK, and could even contain unicast L3 frames.
Since the spec says that ToDS frames must always have the BSSID
as the RA (addr1), reject any other data frames.
The problem was originally reported in "Predicting, Decrypting,
and Abusing WPA2/802.11 Group Keys" at usenix
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/vanhoef
and brought to my attention by Jouni.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
--
|
|
commit dfcb9f4f99f1e9a49e43398a7bfbf56927544af1 upstream.
commit 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
attempted to avoid a BUG_ON call when the association being used for a
sendmsg() is blocked waiting for more sndbuf and another thread did a
peeloff operation on such asoc, moving it to another socket.
As Ben Hutchings noticed, then in such case it would return without
locking back the socket and would cause two unlocks in a row.
Further analysis also revealed that it could allow a double free if the
application managed to peeloff the asoc that is created during the
sendmsg call, because then sctp_sendmsg() would try to free the asoc
that was created only for that call.
This patch takes another approach. It will deny the peeloff operation
if there is a thread sleeping on the asoc, so this situation doesn't
exist anymore. This avoids the issues described above and also honors
the syscalls that are already being handled (it can be multiple sendmsg
calls).
Joint work with Xin Long.
Fixes: 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c2ed1880fd61a998e3ce40254a99a2ad000f1a7d upstream.
The protocol field is checked when deleting IPv4 routes, but ignored for
IPv6, which causes problems with routing daemons accidentally deleting
externally set routes (observed by multiple bird6 users).
This can be verified using `ip -6 route del <prefix> proto something`.
Signed-off-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3278682123811dd8ef07de5eb701fc4548fcebf2 upstream.
Fixes the mess observed in e.g. rsync over a noisy link we'd been
seeing since last Summer. What happens is that we copy part of
a datagram before noticing a checksum mismatch. Datagram will be
resent, all right, but we want the next try go into the same place,
not after it...
All this family of primitives (copy/checksum and copy a datagram
into destination) is "all or nothing" sort of interface - either
we get 0 (meaning that copy had been successful) or we get an
error (and no way to tell how much had been copied before we ran
into whatever error it had been). Make all of them leave iterator
unadvanced in case of errors - all callers must be able to cope
with that (an error might've been caught before the iterator had
been advanced), it costs very little to arrange, it's safer for
callers and actually fixes at least one bug in said callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2b6867c2ce76c596676bec7d2d525af525fdc6e2 upstream.
Subtracting tp_sizeof_priv from tp_block_size and casting to int
to check whether one is less then the other doesn't always work
(both of them are unsigned ints).
Compare them as is instead.
Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as
it can overflow inside BLK_PLUS_PRIV otherwise.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4d712ef1db05c3aa5c3b690a50c37ebad584c53f ]
S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
whose sequence number lies outside the current window is dropped.
The rationale is:
The reason for discarding requests silently is that the server
is unable to determine if the duplicate or out of range request
was due to a sequencing problem in the client, network, or the
operating system, or due to some quirk in routing, or a replay
attack by an intruder. Discarding the request allows the client
to recover after timing out, if indeed the duplication was
unintentional or well intended.
However, clients may rely on the server dropping the connection to
indicate that a retransmit is needed. Without a connection reset, a
client can wait forever without retransmitting, and the workload
just stops dead. I've reproduced this behavior by running xfstests
generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.
To address this issue, have the server close the connection when it
silently discards an incoming message due to a GSS sequence number
problem.
There are a few other places where the server will never reply.
Change those spots in a similar fashion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7d65f82954dadbbe7b6e1aec7e07ad17bc6d958b upstream.
When internal mac80211 TXQs aren't supported, netdev queues must
always started out started even when driver queues are stopped
while the interface is added. This is necessary because with the
internal TXQ support netdev queues are never stopped and packet
scheduling/dropping is done in mac80211.
Fixes: 80a83cfc434b1 ("mac80211: skip netdev queue control with software queuing")
Reported-and-tested-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b3ef5520c1eabb56064474043c7c55a1a65b8708 upstream.
We got the following use-after-free KASAN report:
BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211]
at addr ffff8803fc244090
Read of size 8 by task kworker/u16:24/2587
CPU: 6 PID: 2587 Comm: kworker/u16:24 Tainted: G B 4.9.13-debug+
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
Workqueue: events_unbound async_run_entry_fn
ffff880425d4f9d8 ffffffffaeedb541 ffff88042b80ef00 ffff8803fc244088
ffff880425d4fa00 ffffffffae84d7a1 ffff880425d4fa98 ffff8803fc244080
ffff88042b80ef00 ffff880425d4fa88 ffffffffae84da3a ffffffffc141f7d9
Call Trace:
[<ffffffffaeedb541>] dump_stack+0x85/0xc4
[<ffffffffae84d7a1>] kasan_object_err+0x21/0x70
[<ffffffffae84da3a>] kasan_report_error+0x1fa/0x500
[<ffffffffc141f7d9>] ? cfg80211_bss_age+0x39/0xc0 [cfg80211]
[<ffffffffc141f83a>] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
[<ffffffffae48d46d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffae84def1>] __asan_report_load8_noabort+0x61/0x70
[<ffffffffc13fb100>] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
[<ffffffffc13fb751>] ? wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb751>] wiphy_resume+0x591/0x5a0 [cfg80211]
[<ffffffffc13fb1c0>] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
[<ffffffffaf3b206e>] dpm_run_callback+0x6e/0x4f0
[<ffffffffaf3b31b2>] device_resume+0x1c2/0x670
[<ffffffffaf3b367d>] async_resume+0x1d/0x50
[<ffffffffae3ee84e>] async_run_entry_fn+0xfe/0x610
[<ffffffffae3d0666>] process_one_work+0x716/0x1a50
[<ffffffffae3d05c9>] ? process_one_work+0x679/0x1a50
[<ffffffffafdd7b6d>] ? _raw_spin_unlock_irq+0x3d/0x60
[<ffffffffae3cff50>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffffae3d1a80>] worker_thread+0xe0/0x1460
[<ffffffffae3d19a0>] ? process_one_work+0x1a50/0x1a50
[<ffffffffae3e54c2>] kthread+0x222/0x2e0
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffae3e52a0>] ? kthread_park+0x80/0x80
[<ffffffffafdd86aa>] ret_from_fork+0x2a/0x40
Object at ffff8803fc244088, in cache kmalloc-1024 size: 1024
Allocated:
PID = 71
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
__kmalloc_track_caller+0x134/0x360
kmemdup+0x20/0x50
brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
request_firmware_work_func+0x135/0x280
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Freed:
PID = 2568
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x71/0xb0
kfree+0xe8/0x2e0
brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
brcmf_detach+0x14a/0x2b0 [brcmfmac]
brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
pci_pm_resume+0x186/0x220
dpm_run_callback+0x6e/0x4f0
device_resume+0x1c2/0x670
async_resume+0x1d/0x50
async_run_entry_fn+0xfe/0x610
process_one_work+0x716/0x1a50
worker_thread+0xe0/0x1460
kthread+0x222/0x2e0
ret_from_fork+0x2a/0x40
Memory state around the buggy address:
ffff8803fc243f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8803fc244000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803fc244080: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8803fc244100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8803fc244180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
What is happening is that brcmf_pcie_resume() detects a device that
is no longer responsive and it decides to unbind resulting in a
wiphy_unregister() and wiphy_free() call. Now the wiphy instance
remains allocated, because PM needs to call wiphy_resume() for it.
However, brcmfmac already does a kfree() for the struct
cfg80211_registered_device::ops field. Change the checks in
wiphy_resume() to only access the struct cfg80211_registered_device::ops
if the wiphy instance is still registered at this time.
Reported-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 633ee407b9d15a75ac9740ba9d3338815e1fcb95 upstream.
sock_alloc_inode() allocates socket+inode and socket_wq with
GFP_KERNEL, which is not allowed on the writeback path:
Workqueue: ceph-msgr con_work [libceph]
ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
Call Trace:
[<ffffffff816dd629>] schedule+0x29/0x70
[<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
[<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
[<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
[<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
[<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
[<ffffffff81086335>] flush_work+0x165/0x250
[<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
[<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
[<ffffffff816d6b42>] ? __slab_free+0xee/0x234
[<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
[<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
[<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
[<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
[<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
[<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
[<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
[<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
[<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
[<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
[<ffffffff811c0c18>] super_cache_scan+0x178/0x180
[<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
[<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
[<ffffffff8115af70>] shrink_slab+0x100/0x140
[<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
[<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
[<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
[<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
[<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
[<ffffffff811a0ac5>] new_slab+0x2c5/0x390
[<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
[<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
[<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
[<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
[<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
[<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
[<ffffffff811d8566>] alloc_inode+0x26/0xa0
[<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
[<ffffffff815b933e>] sock_alloc+0x1e/0x80
[<ffffffff815ba855>] __sock_create+0x95/0x220
[<ffffffff815baa04>] sock_create_kern+0x24/0x30
[<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
[<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
[<ffffffff81084c19>] process_one_work+0x159/0x4f0
[<ffffffff8108561b>] worker_thread+0x11b/0x530
[<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
[<ffffffff8108b6f9>] kthread+0xc9/0xe0
[<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
[<ffffffff816e1b98>] ret_from_fork+0x58/0x90
[<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.
Link: http://tracker.ceph.com/issues/19309
Reported-by: Sergey Jerusalimov <wintchester@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f843ee6dd019bcece3e74e76ad9df0155655d0df upstream.
Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues. To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.
CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 677e806da4d916052585301785d847c3b3e6186a upstream.
When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer. However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call. There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents. We do
not at this point check that the replay_window is within the allocated
memory. This leads to out-of-bounds reads and writes triggered by
netlink packets. This leads to memory corruption and the potential for
priviledge escalation.
We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len(). This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn. It however does not check the replay_window
remains within that buffer. Add validation of the contained
replay_window.
CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c282222a45cb9503cbfbebfdb60491f06ae84b49 upstream.
Dmitry reports following splat:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1
[..]
spin_lock_bh include/linux/spinlock.h:304 [inline]
xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963
xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041
xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091
ops_init+0x10a/0x530 net/core/net_namespace.c:115
setup_net+0x2ed/0x690 net/core/net_namespace.c:291
copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
SYSC_unshare kernel/fork.c:2281 [inline]
Problem is that when we get error during xfrm_net_init we will call
xfrm_policy_fini which will acquire xfrm_policy_lock before it was
initialized. Just move it around so locks get set up first.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 283bc9f35bbbcb0e9 ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea90e0dc8cecba6359b481e24d9c37160f6f524f upstream.
Sowmini pointed out Dmitry's RTNL deadlock report to me, and it turns out
to be perfectly accurate - there are various error paths that miss unlock
of the RTNL.
To fix those, change the locking a bit to not be conditional in all those
nl80211_prepare_*_dump() functions, but make those require the RTNL to
start with, and fix the buggy error paths. This also let me use sparse
(by appropriately overriding the rtnl_lock/rtnl_unlock functions) to
validate the changes.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b581a5854eee4b7851dedb0f8c2ceb54fb902c06 upstream.
Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
This changes the result of applying an incremental for clients, not
just OSDs. Because CRUSH computations are obviously affected,
pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
object placement, resulting in misdirected requests.
Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.
Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals")
Link: http://tracker.ceph.com/issues/19122
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a05d4fd9176003e0c1f9c3d083f4dac19fd346ab upstream.
The net_cls controller controls the classid field of each socket which
is associated with the cgroup. Because the classid is per-socket
attribute, when a task migrates to another cgroup or the configured
classid of the cgroup changes, the controller needs to walk all
sockets and update the classid value, which was implemented by
3b13758f51de ("cgroups: Allow dynamically changing net_classid").
While the approach is not scalable, migrating tasks which have a lot
of fds attached to them is rare and the cost is born by the ones
initiating the operations. However, for simplicity, both the
migration and classid config change paths call update_classid() which
scans all fds of all tasks in the target css. This is an overkill for
the migration path which only needs to cover a much smaller subset of
tasks which are actually getting migrated in.
On cgroup v1, this can lead to unexpected scalability issues when one
tries to migrate a task or process into a net_cls cgroup which already
contains a lot of fds. Even if the migration traget doesn't have many
to get scanned, update_classid() ends up scanning all fds in the
target cgroup which can be extremely numerous.
Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
even worse. Before bfc2cf6f61fc ("cgroup: call subsys->*attach() only
for subsystems which are actually affected by migration"), cgroup core
would call the ->css_attach callback even for controllers which don't
see actual migration to a different css.
As net_cls is always disabled but still mounted on cgroup v2, whenever
a process is migrated on the cgroup v2 hierarchy, net_cls sees
identity migration from root to root and cgroup core used to call
->css_attach callback for those. The net_cls ->css_attach ends up
calling update_classid() on the root net_cls css to which all
processes on the system belong to as the controller isn't used. This
makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
which is horrible and easily leads to noticeable stalls triggering RCU
stall warnings and so on.
The worst symptom is already fixed in upstream by bfc2cf6f61fc
("cgroup: call subsys->*attach() only for subsystems which are
actually affected by migration"); however, backporting that commit is
too invasive and we want to avoid other cases too.
This patch updates net_cls's cgrp_attach() to iterate fds of only the
processes which are actually getting migrated. This removes the
surprising migration cost which is dependent on the total number of
fds in the target cgroup. As this leaves write_classid() the only
user of update_classid(), open-code the helper into write_classid().
Reported-by: David Goode <dgoode@fb.com>
Fixes: 3b13758f51de ("cgroups: Allow dynamically changing net_classid")
Cc: Nina Schiff <ninasc@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 15bb7745e94a665caf42bfaabf0ce062845b533b ]
icsk_ack.lrcvtime has a 0 value at socket creation time.
tcpi_last_data_recv can have bogus value if no payload is ever received.
This patch initializes icsk_ack.lrcvtime for active sessions
in tcp_finish_connect(), and for passive sessions in
tcp_create_openreq_child()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a97e50cc4cb67e1e7bff56f6b41cda62ca832336 ]
In sk_clone_lock(), we create a new socket and inherit most of the
parent's members via sock_copy() which memcpy()'s various sections.
Now, in case the parent socket had a BPF socket filter attached,
then newsk->sk_filter points to the same instance as the original
sk->sk_filter.
sk_filter_charge() is then called on the newsk->sk_filter to take a
reference and should that fail due to hitting max optmem, we bail
out and release the newsk instance.
The issue is that commit 278571baca2a ("net: filter: simplify socket
charging") wrongly combined the dismantle path with the failure path
of xfrm_sk_clone_policy(). This means, even when charging failed, we
call sk_free_unlock_clone() on the newsk, which then still points to
the same sk_filter as the original sk.
Thus, sk_free_unlock_clone() calls into __sk_destruct() eventually
where it tests for present sk_filter and calls sk_filter_uncharge()
on it, which potentially lets sk_omem_alloc wrap around and releases
the eBPF prog and sk_filter structure from the (still intact) parent.
Fix it by making sure that when sk_filter_charge() failed, we reset
newsk->sk_filter back to NULL before passing to sk_free_unlock_clone(),
so that we don't mess with the parents sk_filter.
Only if xfrm_sk_clone_policy() fails, we did reach the point where
either the parent's filter was NULL and as a result newsk's as well
or where we previously had a successful sk_filter_charge(), thus for
that case, we do need sk_filter_uncharge() to release the prior taken
reference on sk_filter.
Fixes: 278571baca2a ("net: filter: simplify socket charging")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c64c0b3cac4c5b8cb093727d2c19743ea3965c0b ]
Alexander reported a KMSAN splat caused by reads of uninitialized
field (tb_id_in) from user provided struct fib_result_nl
It turns out nl_fib_input() sanity tests on user input is a bit
wrong :
User can pretend nlh->nlmsg_len is big enough, but provide
at sendmsg() time a too small buffer.
Reported-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d515684d78148884d5fc425ba904c50f03844020 ]
In the case udp_sk(sk)->pending is AF_INET6, udpv6_sendmsg() would
jump to do_append_data, skipping the initialization of sockc.tsflags.
Fix the problem by moving sockc.tsflags initialization earlier.
The bug was detected with KMSAN.
Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7df9c24625b9981779afb8fcdbe2bb4765e61147 ]
Dmitry has reported that a BUG_ON() condition in unix_notinflight()
may be triggered by a simple code that forwards unix socket in an
SCM_RIGHTS message.
That is caused by incorrect unix socket GC implementation in unix_gc().
The GC first collects list of candidates, then (a) decrements their
"children's" inflight counter, (b) checks which inflight counters are
now 0, and then (c) increments all inflight counters back.
(a) and (c) are done by calling scan_children() with inc_inflight or
dec_inflight as the second argument.
Commit 6209344f5a37 ("net: unix: fix inflight counting bug in garbage
collector") changed scan_children() such that it no longer considers
sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block
of code that that unsets this flag _before_ invoking
scan_children(, dec_iflight, ). This may lead to incorrect inflight
counters for some sockets.
This change fixes this bug by changing order of operations:
UNIX_GC_CANDIDATE is now unset only after all inflight counters are
restored to the original state.
kernel BUG at net/unix/garbage.c:149!
RIP: 0010:[<ffffffff8717ebf4>] [<ffffffff8717ebf4>]
unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
Call Trace:
[<ffffffff8716cfbf>] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487
[<ffffffff8716f6a9>] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496
[<ffffffff86a90a01>] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655
[<ffffffff86a9808a>] skb_release_all+0x1a/0x60 net/core/skbuff.c:668
[<ffffffff86a980ea>] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684
[<ffffffff86a98284>] kfree_skb+0x184/0x570 net/core/skbuff.c:705
[<ffffffff871789d5>] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559
[<ffffffff87179039>] unix_release+0x49/0x90 net/unix/af_unix.c:836
[<ffffffff86a694b2>] sock_release+0x92/0x1f0 net/socket.c:570
[<ffffffff86a6962b>] sock_close+0x1b/0x20 net/socket.c:1017
[<ffffffff81a76b8e>] __fput+0x34e/0x910 fs/file_table.c:208
[<ffffffff81a771da>] ____fput+0x1a/0x20 fs/file_table.c:244
[<ffffffff81483ab0>] task_work_run+0x1a0/0x280 kernel/task_work.c:116
[< inline >] exit_task_work include/linux/task_work.h:21
[<ffffffff8141287a>] do_exit+0x183a/0x2640 kernel/exit.c:828
[<ffffffff8141383e>] do_group_exit+0x14e/0x420 kernel/exit.c:931
[<ffffffff814429d3>] get_signal+0x663/0x1880 kernel/signal.c:2307
[<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
[<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0
arch/x86/entry/common.c:156
[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570
arch/x86/entry/common.c:259
[<ffffffff881478e6>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Link: https://lkml.org/lkml/2017/3/6/252
Signed-off-by: Andrey Ulanov <andreyu@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 6209344 ("net: unix: fix inflight counting bug in garbage collector")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8f3dbfd79ed9ef9770305a7cc4e13dfd31ad2cd0 ]
Added a case for OVS_TUNNEL_KEY_ATTR_PAD to the switch statement
in ip_tun_from_nlattr in order to prevent the default case
returning an error.
Fixes: b46f6ded906e ("libnl: nla_put_be64(): align on a 64-bit area")
Signed-off-by: Kris Murphy <kriskend@linux.vnet.ibm.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 22a0e18eac7a9e986fec76c60fa4a2926d1291e2 ]
I mistakenly added the code to release sk->sk_frag in
sk_common_release() instead of sk_destruct()
TCP sockets using sk->sk_allocation == GFP_ATOMIC do no call
sk_common_release() at close time, thus leaking one (order-3) page.
iSCSI is using such sockets.
Fixes: 5640f7685831 ("net: use a per task frag allocator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3d20f1f7bd575d147ffa75621fa560eea0aec690 ]
When dealing with ipv6 source tunnel key address attribute
(OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel
dst ip, fix that.
Fixes: 6b26ba3a7d95 ('openvswitch: netlink attributes for IPv6 tunneling')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit eed50879d64ab1b9f76445dbab822e43a098b309 upstream.
New complaint from kbuild for 4.9.y:
net/sunrpc/xprtrdma/verbs.c:489:19: sparse: incompatible types in
comparison expression (different type sizes)
verbs.c:
489 max_sge = min(ia->ri_device->attrs.max_sge, RPCRDMA_MAX_SEND_SGES);
I can't reproduce this running sparse here. Likewise, "make W=1
net/sunrpc/xprtrdma/verbs.o" never indicated any issue.
A little poking suggests that because the range of its values is
small, gcc can make the actual width of RPCRDMA_MAX_SEND_SGES
smaller than the width of an unsigned integer.
Fixes: 16f906d66cd7 ("xprtrdma: Reduce required number of send SGEs")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 72ef9c4125c7b257e3a714d62d778ab46583d6a3 ]
This patch fixes a memory leak, which happens if the connection request
is not fulfilled between parsing the DCCP options and handling the SYN
(because e.g. the backlog is full), because we forgot to free the
list of ack vectors.
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ]
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:
#8 [] page_fault at ffffffff8163e648
[exception RIP: __tcp_ack_snd_check+74]
.
.
#9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8
Of course it may happen with other NIC drivers as well.
It's found the freed dst_entry here:
224 static bool tcp_in_quickack_mode(struct sock *sk)↩
225 {↩
226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩
227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩
228 ↩
229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
231 }↩
But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.
All the vmcores showed 2 significant clues:
- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.
- All vmcores showed a postitive LockDroppedIcmps value, e.g:
LockDroppedIcmps 267
A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:
do_redirect()->__sk_dst_check()-> dst_release().
Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.
To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.
The dccp/IPv6 code is very similar in this respect, so fixing it there too.
As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().
Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|