summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-08-08bpf: fix checksum for vlan push/pop helperDaniel Borkmann
When having skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't push rcsum of mac header back in and after BPF run back pull out again as opposed to some other subsystems (ovs, for example). For cases like q-in-q, meaning when a vlan tag for offloading is already present and we're about to push another one, then skb_vlan_push() pushes the inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing the 4 bytes vlan header diff. Likewise, for the reverse operation in skb_vlan_pop() for the case where vlan header needs to be pulled out of the skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes rcsum of the vlan header that was removed. However mangling the rcsum here will lead to hw csum failure for BPF case, since we're pulling or pushing data that was not part of the current rcsum. Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN() is also not really an option since current behaviour is ABI by now, but apart from that would also mean to do quite a bit of useless work in the sense that usually 12 bytes need to be rcsum pushed/pulled also when we don't need to touch this vlan related corner case. One way to fix it would be to push the necessary rcsum fixup down into vlan helpers that are (mostly) slow-path anyway. Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08bpf: fix checksum fixups on bpf_skb_store_bytesDaniel Borkmann
bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way. Where we ran into an issue with bpf_skb_store_bytes() is when we did a single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The underlying issue has been tracked down to a buffer alignment issue. Meaning, that csum_partial() computations via skb_postpull_rcsum() and skb_postpush_rcsum() pair invoked had a wrong result since they operated on an odd address for the hoplimit, while other computations were done on an even address. This mix doesn't work as-is with skb_postpull_rcsum(), skb_postpush_rcsum() pair as it always expects at least half-word alignment of input buffers, which is normally the case. Thus, instead of these helpers using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(), csum_block_add(), respectively. For unaligned offsets, they rotate the sum to align it to a half-word boundary again, otherwise they work the same as csum_sub() and csum_add(). Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers use a 0 constant for offset so that the compiler optimizes the offset & 1 test away and generates the same code as with csum_sub()/_add(). Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08bpf: also call skb_postpush_rcsum on xmit occasionsDaniel Borkmann
Follow-up to commit f8ffad69c9f8 ("bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect locations which need CHECKSUM_COMPLETE fixups on ingress. For the same reasons as described in f8ffad69c9f8 already, we of course also need this here, since dev_queue_xmit() on a veth device will let us end up in the dev_forward_skb() helper again to cross namespaces. Latter then calls into skb_postpull_rcsum() to pull out L2 header, so that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers. Also here we have to address bpf_redirect() and bpf_clone_redirect(). Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08sctp_diag: Respect ss adding TCPF_CLOSE to idiag_statesPhil Sutter
Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't rely upon TCPF_LISTEN flag solely being present when listening sockets are requested. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08sctp_diag: Fix T3_rtx timer exportPhil Sutter
The asoc's timer value is not kept in asoc->timeouts array but in it's primary transport instead. Furthermore, we must export the timer only if it is pending, otherwise the value will underrun when stored in an unsigned variable and user space will only see a very large timeout value. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-07Merge tag 'mac80211-for-davem-2016-08-05' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== First set of fixes for the current cycle: * fix 80+80 bandwidth warning * fix powersave with mac80211 TXQ implementation * use correct way to free SKBs from multicast buffering * mesh: fix operation ordering to work with all drivers * mesh: end service period even when peer goes away * mesh: correct HT opmode validity checks * pass hw pointer from mac80211 to driver in TPT method, fixing a bug (in a bit the wrong way, but that's what we have right now) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06ipv4: panic in leaf_walk_rcu due to stale node pointerDavid Forster
Panic occurs when issuing "cat /proc/net/route" whilst populating FIB with > 1M routes. Use of cached node pointer in fib_route_get_idx is unsafe. BUG: unable to handle kernel paging request at ffffc90001630024 IP: [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0 Oops: 0000 [#1] SMP Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000 RIP: 0010:[<ffffffff814cf6a0>] [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202 RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20 RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950 RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000 R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950 R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480 FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0 Stack: ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00 ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000 ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80 Call Trace: [<ffffffff814d00d3>] ? fib_route_seq_start+0x93/0xc0 [<ffffffff811dd8b9>] ? seq_read+0x149/0x380 [<ffffffff811f8714>] ? fsnotify+0x3b4/0x500 [<ffffffff8138dce0>] ? process_echoes+0x70/0x70 [<ffffffff8121cfa7>] ? proc_reg_read+0x47/0x70 [<ffffffff811bb823>] ? __vfs_read+0x23/0xd0 [<ffffffff811bbd42>] ? rw_verify_area+0x52/0xf0 [<ffffffff811bbe61>] ? vfs_read+0x81/0x120 [<ffffffff811bcbc2>] ? SyS_read+0x42/0xa0 [<ffffffff81549ab2>] ? entry_SYSCALL_64_fastpath+0x16/0x75 Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89 RIP [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 RSP <ffff88011a05fda0> CR2: ffffc90001630024 Signed-off-by: Dave Forster <dforster@brocade.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06rxrpc: Fix races between skb free, ACK generation and replyingDavid Howells
Inside the kafs filesystem it is possible to occasionally have a call processed and terminated before we've had a chance to check whether we need to clean up the rx queue for that call because afs_send_simple_reply() ends the call when it is done, but this is done in a workqueue item that might happen to run to completion before afs_deliver_to_call() completes. Further, it is possible for rxrpc_kernel_send_data() to be called to send a reply before the last request-phase data skb is released. The rxrpc skb destructor is where the ACK processing is done and the call state is advanced upon release of the last skb. ACK generation is also deferred to a work item because it's possible that the skb destructor is not called in a context where kernel_sendmsg() can be invoked. To this end, the following changes are made: (1) kernel_rxrpc_data_consumed() is added. This should be called whenever an skb is emptied so as to crank the ACK and call states. This does not release the skb, however. kernel_rxrpc_free_skb() must now be called to achieve that. These together replace rxrpc_kernel_data_delivered(). (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed(). This makes afs_deliver_to_call() easier to work as the skb can simply be discarded unconditionally here without trying to work out what the return value of the ->deliver() function means. The ->deliver() functions can, via afs_data_complete(), afs_transfer_reply() and afs_extract_data() mark that an skb has been consumed (thereby cranking the state) without the need to conditionally free the skb to make sure the state is correct on an incoming call for when the call processor tries to send the reply. (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it has finished with a packet and MSG_PEEK isn't set. (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data(). Because of this, we no longer need to clear the destructor and put the call before we free the skb in cases where we don't want the ACK/call state to be cranked. (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather than 0 if they expect more data (afs_extract_data() returns -EAGAIN to the delivery function already), and the caller is now responsible for producing an abort if that was the last packet. (6) There are many bits of unmarshalling code where: ret = afs_extract_data(call, skb, last, ...); switch (ret) { case 0: break; case -EAGAIN: return 0; default: return ret; } is to be found. As -EAGAIN can now be passed back to the caller, we now just return if ret < 0: ret = afs_extract_data(call, skb, last, ...); if (ret < 0) return ret; (7) Checks for trailing data and empty final data packets has been consolidated as afs_data_complete(). So: if (skb->len > 0) return -EBADMSG; if (!last) return 0; becomes: ret = afs_data_complete(call, skb, last); if (ret < 0) return ret; (8) afs_transfer_reply() now checks the amount of data it has against the amount of data desired and the amount of data in the skb and returns an error to induce an abort if we don't get exactly what we want. Without these changes, the following oops can occasionally be observed, particularly if some printks are inserted into the delivery path: general protection fault: 0000 [#1] SMP Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc] CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: kafsd afs_async_workfn [kafs] task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000 RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1 RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002 RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710 RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0 Stack: 0000000000000006 000000000be04930 0000000000000000 ffff880400000000 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38 Call Trace: [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff814c928f>] skb_dequeue+0x18/0x61 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs] [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs] [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs] [<ffffffff81063a3a>] process_one_work+0x29d/0x57c [<ffffffff81064ac2>] worker_thread+0x24a/0x385 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0 [<ffffffff810696f5>] kthread+0xf3/0xfb [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06OVS: Ignore negative headroom valueIan Wienand
net_device->ndo_set_rx_headroom (introduced in 871b642adebe300be2e50aa5f65a418510f636ec) says "Setting a negtaive value reset the rx headroom to the default value". It seems that the OVS implementation in 3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets dev->needed_headroom unconditionally. This doesn't have an immediate effect, but can mess up later LL_RESERVED_SPACE calculations, such as done in net/ipv6/mcast.c:mld_newpack. For reference, this issue was found from a skb_panic raised there after the length calculations had given the wrong result. Note the other current users of this interface (drivers/net/tun.c:tun_set_headroom and drivers/net/veth.c:veth_set_rx_headroom) are both checking this correctly thus need no modification. Thanks to Ben for some pointers from the crash dumps! Cc: Benjamin Poirier <bpoirier@suse.com> Cc: Paolo Abeni <pabeni@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414 Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-05mac80211: Add ieee80211_hw pointer to get_expected_throughputMaxim Altshul
The variable is added to allow the driver an easy access to it's own hw->priv when the op is invoked. This fixes a crash in wlcore because it was relying on a station pointer that wasn't initialized yet. It's the wrong way to fix the crash, but it solves the problem for now and it does make sense to have the hw pointer here. Signed-off-by: Maxim Altshul <maxim.altshul@ti.com> [rewrite commit message, fix indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-05nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE valueMasashi Honma
Previously, NL80211_MESHCONF_HT_OPMODE validation rejected correct flag combinations, e.g. IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED | IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT. Doing just a range-check allows setting flags that don't exist (0x8) and invalid flag combinations. Implements some checks based on IEEE 802.11 2012 8.4.2.59 "HT Operation element". Signed-off-by: Masashi Honma <masashi.honma@gmail.com> [reword commit message, simplify a bit] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-05mac80211: End the MPSP even if EOSP frame was not ackedMasashi Honma
If QoS frame with EOSP (end of service period) subfield=1 sent by local peer was not acked by remote peer, local peer did not end the MPSP. This prevents local peer from going to DOZE state. And if the remote peer goes away without closing connection, local peer continues AWAKE state and wastes battery. Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-05mac80211: fix purging multicast PS buffer queueFelix Fietkau
The code currently assumes that buffered multicast PS frames don't have a pending ACK frame for tx status reporting. However, hostapd sends a broadcast deauth frame on teardown for which tx status is requested. This can lead to the "Have pending ack frames" warning on module reload. Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-03openvswitch: Remove incorrect WARN_ONCE().Jarno Rajahalme
ovs_ct_find_existing() issues a warning if an existing conntrack entry classified as IP_CT_NEW is found, with the premise that this should not happen. However, a newly confirmed, non-expected conntrack entry remains IP_CT_NEW as long as no reply direction traffic is seen. This has resulted into somewhat confusing kernel log messages. This patch removes this check and warning. Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.") Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix several cases of missing of_node_put() calls in various networking drivers. From Peter Chen. 2) Don't try to remove unconfigured VLANs in qed driver, from Yuval Mintz. 3) Unbalanced locking in TIPC error handling, from Wei Yongjun. 4) Fix lockups in CPDMA driver, from Grygorii Strashko. 5) More MACSEC refcount et al fixes, from Sabrina Dubroca. 6) Fix MAC address setting in r8169 during runtime suspend, from Chun-Hao Lin. 7) Various printf format specifier fixes, from Heinrich Schuchardt. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) qed: Fail driver load in 100g MSI mode. ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle ethernet: altera: add missing of_node_put 8139too: fix system hang when there is a tx timeout event. qed: Fix error return code in qed_resc_alloc() net: qlcnic: avoid superfluous assignement dsa: b53: remove redundant if ...
2016-08-03mac80211: mesh: flush stations before beacons are stoppedMaital Hahn
Some drivers (e.g. wl18xx) expect that the last stage in the de-initialization process will be stopping the beacons, similar to AP flow. Update ieee80211_stop_mesh() flow accordingly. As peers can be removed dynamically, this would not impact other drivers. Tested also on Ralink RT3572 chipset. Signed-off-by: Maital Hahn <maitalm@ti.com> Signed-off-by: Yaniv Machani <yanivma@ti.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-02Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull Ceph updates from Ilya Dryomov: "The highlights are: - RADOS namespace support in libceph and CephFS (Zheng Yan and myself). The stopgaps added in 4.5 to deny access to inodes in namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature bit is now fully supported - A large rework of the MDS cap flushing code (Zheng Yan) - Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were overly pessimistic before, bailing at the first sight of LOOKUP_RCU On top of that we've got a few CephFS bug fixes, a couple of cleanups and Arnd's workaround for a weird genksyms issue" * tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits) ceph: fix symbol versioning for ceph_monc_do_statfs ceph: Correctly return NXIO errors from ceph_llseek ceph: Mark the file cache as unreclaimable ceph: optimize cap flush waiting ceph: cleanup ceph_flush_snaps() ceph: kick cap flushes before sending other cap message ceph: introduce an inode flag to indicates if snapflush is needed ceph: avoid sending duplicated cap flush message ceph: unify cap flush and snapcap flush ceph: use list instead of rbtree to track cap flushes ceph: update types of some local varibles ceph: include 'follows' of pending snapflush in cap reconnect message ceph: update cap reconnect message to version 3 ceph: mount non-default filesystem by name libceph: fsmap.user subscription support ceph: handle LOOKUP_RCU in ceph_d_revalidate ceph: allow dentry_lease_is_valid to work under RCU walk ceph: clear d_fsinfo pointer under d_lock ceph: remove ceph_mdsc_lease_release ceph: don't use ->d_time ...
2016-08-02mac80211: fix check for buffered powersave frames with txqFelix Fietkau
The logic was inverted here, set the bit if frames are pending. Fixes: ba8c3d6f16a1 ("mac80211: add an intermediate software queue implementation") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-02cfg80211: fix missing break in NL8211_CHAN_WIDTH_80P80 caseColin Ian King
The switch on chandef->width is missing a break on the NL8211_CHAN_WIDTH_80P80 case; currently we get a WARN_ON when center_freq2 is non-zero because of the missing break. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-31sctp: allow receiving msg when TCP-style sk is in CLOSED stateXin Long
Commit 141ddefce7c8 ("sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the assoc is closed when sctp_accept clones a new sk. If there is still data in sk receive queue, users will not be able to read it any more, as sctp_recvmsg returns directly if sk state is CLOSED. This patch is to add CLOSED state check in sctp_recvmsg to allow reading data from TCP-style sk with CLOSED state as what TCP does. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-31sctp: allow delivering notifications after receiving SHUTDOWNXin Long
Prior to this patch, once sctp received SHUTDOWN or shutdown with RD, sk->sk_shutdown would be set with RCV_SHUTDOWN, and all events would be dropped in sctp_ulpq_tail_event(). It would cause: 1. some notifications couldn't be received by users. like SCTP_SHUTDOWN_COMP generated by sctp_sf_do_4_C(). 2. sctp would also never trigger sk_data_ready when the association was closed, making it harder to identify the end of the association by calling recvmsg() and getting an EOF. It was not convenient for kernel users. The check here should be stopping delivering DATA chunks after receiving SHUTDOWN, and stopping delivering ANY chunks after sctp_close(). So this patch is to allow notifications to enqueue into receive queue even if sk->sk_shutdown is set to RCV_SHUTDOWN in sctp_ulpq_tail_event, but if sk->sk_shutdown == RCV_SHUTDOWN | SEND_SHUTDOWN, it drops all events. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-31sctp: fix the issue sctp requeue auth chunk incorrectlyXin Long
sctp needs to queue auth chunk back when we know that we are going to generate another segment. But commit f1533cce60d1 ("sctp: fix panic when sending auth chunks") requeues the last chunk processed which is probably not the auth chunk. It causes panic when calculating the MAC in sctp_auth_calculate_hmac(), as the incorrect offset of the auth chunk in skb->data. This fix is to requeue it by using packet->auth. Fixes: f1533cce60d1 ("sctp: fix panic when sending auth chunks") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-31tcp: consider recv buf for the initial window scaleSoheil Hassas Yeganeh
tcp_select_initial_window() intends to advertise a window scaling for the maximum possible window size. To do so, it considers the maximum of net.ipv4.tcp_rmem[2] and net.core.rmem_max as the only possible upper-bounds. However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE to set the socket's receive buffer size to values larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max. Thus, SO_RCVBUFFORCE is effectively ignored by tcp_select_initial_window(). To fix this, consider the maximum of net.ipv4.tcp_rmem[2], net.core.rmem_max and socket's initial buffer space. Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-31net: ipv6: use list_move instead of list_del/list_addWei Yongjun
Using list_move() instead of list_del() + list_add(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-31tipc: fix imbalance read_unlock_bh in __tipc_nl_add_monitor()Wei Yongjun
In the error handling case of nla_nest_start() failed read_unlock_bh() is called to unlock a lock that had not been taken yet. sparse warns about the context imbalance as the following: net/tipc/monitor.c:799:23: warning: context imbalance in '__tipc_nl_add_monitor' - different lock contexts for basic block Fixes: cf6f7e1d5109 ('tipc: dump monitor attributes') Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30Merge tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - nfs: don't create zero-length requests - several LAYOUTGET bugfixes Features: - several performance related features - more aggressive caching when we can rely on close-to-open cache consistency - remove serialisation of O_DIRECT reads and writes - optimise several code paths to not flush to disk unnecessarily. However allow for the idiosyncracies of pNFS for those layout types that need to issue a LAYOUTCOMMIT before the metadata can be updated on the server. - SUNRPC updates to the client data receive path - pNFS/SCSI support RH/Fedora dm-mpath device nodes - pNFS files/flexfiles can now use unprivileged ports when the generic NFS mount options allow it. Bugfixes: - Don't use RDMA direct data placement together with data integrity or privacy security flavours - Remove the RDMA ALLPHYSICAL memory registration mode as it has potential security holes. - Several layout recall fixes to improve NFSv4.1 protocol compliance. - Fix an Oops in the pNFS files and flexfiles connection setup to the DS - Allow retry of operations that used a returned delegation stateid - Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding - Fix writeback races in nfs4_copy_range() and nfs42_proc_deallocate()" * tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (104 commits) pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding NFSv4: Clean up lookup of SECINFO_NO_NAME NFSv4.2: Fix warning "variable ‘stateids’ set but not used" NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’" SUNRPC: Fix a compiler warning in fs/nfs/clnt.c pNFS: Remove redundant smp_mb() from pnfs_init_lseg() pNFS: Cleanup - do layout segment initialisation in one place pNFS: Remove redundant stateid invalidation pNFS: Remove redundant pnfs_mark_layout_returned_if_empty() pNFS: Clear the layout metadata if the server changed the layout stateid pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid() NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id pNFS: Do not set plh_return_seq for non-callback related layoutreturns pNFS: Ensure layoutreturn acts as a completion for layout callbacks pNFS: Fix CB_LAYOUTRECALL stateid verification pNFS: Always update the layout barrier seqid on LAYOUTGET pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set pNFS: Clear the layout return tracking on layout reinitialisation pNFS: LAYOUTRETURN should only update the stateid if the layout is valid nfs: don't create zero-length requests ...
2016-07-30Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - TPM core and driver updates/fixes - IPv6 security labeling (CALIPSO) - Lots of Apparmor fixes - Seccomp: remove 2-phase API, close hole where ptrace can change syscall #" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits) apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family) tpm: Factor out common startup code tpm: use devm_add_action_or_reset tpm2_i2c_nuvoton: add irq validity check tpm: read burstcount from TPM_STS in one 32-bit transaction tpm: fix byte-order for the value read by tpm2_get_tpm_pt tpm_tis_core: convert max timeouts from msec to jiffies apparmor: fix arg_size computation for when setprocattr is null terminated apparmor: fix oops, validate buffer size in apparmor_setprocattr() apparmor: do not expose kernel stack apparmor: fix module parameters can be changed after policy is locked apparmor: fix oops in profile_unpack() when policy_db is not present apparmor: don't check for vmalloc_addr if kvzalloc() failed apparmor: add missing id bounds check on dfa verification apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task apparmor: use list_next_entry instead of list_entry_next apparmor: fix refcount race when finding a child profile apparmor: fix ref count leak when profile sha1 hash is read apparmor: check that xindex is in trans_table bounds ...
2016-07-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns vfs updates from Eric Biederman: "This tree contains some very long awaited work on generalizing the user namespace support for mounting filesystems to include filesystems with a backing store. The real world target is fuse but the goal is to update the vfs to allow any filesystem to be supported. This patchset is based on a lot of code review and testing to approach that goal. While looking at what is needed to support the fuse filesystem it became clear that there were things like xattrs for security modules that needed special treatment. That the resolution of those concerns would not be fuse specific. That sorting out these general issues made most sense at the generic level, where the right people could be drawn into the conversation, and the issues could be solved for everyone. At a high level what this patchset does a couple of simple things: - Add a user namespace owner (s_user_ns) to struct super_block. - Teach the vfs to handle filesystem uids and gids not mapping into to kuids and kgids and being reported as INVALID_UID and INVALID_GID in vfs data structures. By assigning a user namespace owner filesystems that are mounted with only user namespace privilege can be detected. This allows security modules and the like to know which mounts may not be trusted. This also allows the set of uids and gids that are communicated to the filesystem to be capped at the set of kuids and kgids that are in the owning user namespace of the filesystem. One of the crazier corner casees this handles is the case of inodes whose i_uid or i_gid are not mapped into the vfs. Most of the code simply doesn't care but it is easy to confuse the inode writeback path so no operation that could cause an inode write-back is permitted for such inodes (aka only reads are allowed). This set of changes starts out by cleaning up the code paths involved in user namespace permirted mounts. Then when things are clean enough adds code that cleanly sets s_user_ns. Then additional restrictions are added that are possible now that the filesystem superblock contains owner information. These changes should not affect anyone in practice, but there are some parts of these restrictions that are changes in behavior. - Andy's restriction on suid executables that does not honor the suid bit when the path is from another mount namespace (think /proc/[pid]/fd/) or when the filesystem was mounted by a less privileged user. - The replacement of the user namespace implicit setting of MNT_NODEV with implicitly setting SB_I_NODEV on the filesystem superblock instead. Using SB_I_NODEV is a stronger form that happens to make this state user invisible. The user visibility can be managed but it caused problems when it was introduced from applications reasonably expecting mount flags to be what they were set to. There is a little bit of work remaining before it is safe to support mounting filesystems with backing store in user namespaces, beyond what is in this set of changes. - Verifying the mounter has permission to read/write the block device during mount. - Teaching the integrity modules IMA and EVM to handle filesystems mounted with only user namespace root and to reduce trust in their security xattrs accordingly. - Capturing the mounters credentials and using that for permission checks in d_automount and the like. (Given that overlayfs already does this, and we need the work in d_automount it make sense to generalize this case). Furthermore there are a few changes that are on the wishlist: - Get all filesystems supporting posix acls using the generic posix acls so that posix_acl_fix_xattr_from_user and posix_acl_fix_xattr_to_user may be removed. [Maintainability] - Reducing the permission checks in places such as remount to allow the superblock owner to perform them. - Allowing the superblock owner to chown files with unmapped uids and gids to something that is mapped so the files may be treated normally. I am not considering even obvious relaxations of permission checks until it is clear there are no more corner cases that need to be locked down and handled generically. Many thanks to Seth Forshee who kept this code alive, and putting up with me rewriting substantial portions of what he did to handle more corner cases, and for his diligent testing and reviewing of my changes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits) fs: Call d_automount with the filesystems creds fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns evm: Translate user/group ids relative to s_user_ns when computing HMAC dquot: For now explicitly don't support filesystems outside of init_user_ns quota: Handle quota data stored in s_user_ns in quota_setxquota quota: Ensure qids map to the filesystem vfs: Don't create inodes with a uid or gid unknown to the vfs vfs: Don't modify inodes with a uid or gid unknown to the vfs cred: Reject inodes with invalid ids in set_create_file_as() fs: Check for invalid i_uid in may_follow_link() vfs: Verify acls are valid within superblock's s_user_ns. userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS fs: Refuse uid/gid changes which don't map into s_user_ns selinux: Add support for unprivileged mounts from user namespaces Smack: Handle labels consistently in untrusted mounts Smack: Add support for unprivileged mounts from user namespaces fs: Treat foreign mounts as nosuid fs: Limit file caps to the user namespace of the super block userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag userns: Remove implicit MNT_NODEV fragility. ...
2016-07-28Merge branch 'salted-string-hash'Linus Torvalds
This changes the vfs dentry hashing to mix in the parent pointer at the _beginning_ of the hash, rather than at the end. That actually improves both the hash and the code generation, because we can move more of the computation to the "static" part of the dcache setup, and do less at lookup runtime. It turns out that a lot of other hash users also really wanted to mix in a base pointer as a 'salt' for the hash, and so the slightly extended interface ends up working well for other cases too. Users that want a string hash that is purely about the string pass in a 'salt' pointer of NULL. * merge branch 'salted-string-hash': fs/dcache.c: Save one 32-bit multiply in dcache lookup vfs: make the string hashes salt the hash
2016-07-28libceph: fsmap.user subscription supportYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28libceph: make sure redirect does not change namespaceYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28libceph: rados pool namespace supportYan, Zheng
Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's also used when composing OSD request. The namespace pointer in struct ceph_file_layout is RCU protected. So libceph can read namespace without taking lock. Signed-off-by: Yan, Zheng <zyan@redhat.com> [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28libceph: introduce reference counted stringYan, Zheng
The data structure is for storing namesapce string. It allows namespace string to be shared between cephfs inodes with same layout. This data structure can also be referenced by OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28libceph: define new ceph_file_layout structureYan, Zheng
Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace to ceph_file_layout structure. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28libceph: fix some missing includesIlya Dryomov
- decode.h needs slab.h for kmalloc() - osd_client.h needs msgpool.h for struct ceph_msgpool - msgpool.h doesn't need messenger.h Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Unified UDP encapsulation offload methods for drivers, from Alexander Duyck. 2) Make DSA binding more sane, from Andrew Lunn. 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli. 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar. 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX packets as soon as the device sees them, with the option to mirror the packet on TX via the same interface. From Brenden Blanco and others. 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet. 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli. 8) Simplify netlink conntrack entry layout, from Florian Westphal. 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido Schimmel, Yotam Gigi, and Jiri Pirko. 10) Add SKB array infrastructure and convert tun and macvtap over to it. From Michael S Tsirkin and Jason Wang. 11) Support qdisc packet injection in pktgen, from John Fastabend. 12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy. 13) Add NV congestion control support to TCP, from Lawrence Brakmo. 14) Add GSO support to SCTP, from Marcelo Ricardo Leitner. 15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni. 16) Support MPLS over IPV4, from Simon Horman. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) xgene: Fix build warning with ACPI disabled. be2net: perform temperature query in adapter regardless of its interface state l2tp: Correctly return -EBADF from pppol2tp_getname. net/mlx5_core/health: Remove deprecated create_singlethread_workqueue net: ipmr/ip6mr: update lastuse on entry change macsec: ensure rx_sa is set when validation is disabled tipc: dump monitor attributes tipc: add a function to get the bearer name tipc: get monitor threshold for the cluster tipc: make cluster size threshold for monitoring configurable tipc: introduce constants for tipc address validation net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() MAINTAINERS: xgene: Add driver and documentation path Documentation: dtb: xgene: Add MDIO node dtb: xgene: Add MDIO node drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset drivers: net: xgene: Use exported functions drivers: net: xgene: Enable MDIO driver drivers: net: xgene: Add backward compatibility drivers: net: phy: xgene: Add MDIO driver ...
2016-07-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc bits - ocfs2 - most(?) of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits) thp: fix comments of __pmd_trans_huge_lock() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root mm: memcontrol: fix documentation for compound parameter mm: memcontrol: remove BUG_ON in uncharge_list mm: fix build warnings in <linux/compaction.h> mm, thp: convert from optimistic swapin collapsing to conservative mm, thp: fix comment inconsistency for swapin readahead functions thp: update Documentation/{vm/transhuge,filesystems/proc}.txt shmem: split huge pages beyond i_size under memory pressure thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE khugepaged: add support of collapse for tmpfs/shmem pages shmem: make shmem_inode_info::lock irq-safe khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() thp: extract khugepaged from mm/huge_memory.c shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings shmem: add huge pages support shmem: get_unmapped_area align huge page shmem: prepare huge= mount option and sysfs knob mm, rmap: account shmem thp pages ...
2016-07-26af_unix: charge buffers to kmemcgVladimir Davydov
Unix sockets can consume a significant amount of system memory, hence they should be accounted to kmemcg. Since unix socket buffers are always allocated from process context, all we need to do to charge them to kmemcg is set __GFP_ACCOUNT in sock->sk_allocation mask. Eric asked: > 1) What happens when a buffer, allocated from socket <A> lands in a > different socket <B>, maybe owned by another user/process. > > Who owns it now, in term of kmemcg accounting ? We never move memcg charges. E.g. if two processes from different cgroups are sharing a memory region, each page will be charged to the process which touched it first. Or if two processes are working with the same directory tree, inodes and dentries will be charged to the first user. The same is fair for unix socket buffers - they will be charged to the sender. > 2) Has performance impact been evaluated ? I ran netperf STREAM_STREAM with default options in a kmemcg on a 4 core x2 HT box. The results are below: # clients bandwidth (10^6bits/sec) base patched 1 67643 +- 725 64874 +- 353 - 4.0 % 4 193585 +- 2516 186715 +- 1460 - 3.5 % 8 194820 +- 377 187443 +- 1229 - 3.7 % So the accounting doesn't come for free - it takes ~4% of performance. I believe we could optimize it by using per cpu batching not only on charge, but also on uncharge in memcg core, but that's beyond the scope of this patch set - I'll take a look at this later. Anyway, if performance impact is found to be unacceptable, it is always possible to disable kmem accounting at boot time (cgroup.memory=nokmem) or not use memory cgroups at runtime at all (thanks to jump labels there'll be no overhead even if they are compiled in). Link: http://lkml.kernel.org/r/fcfe6cae27a59fbc5e40145664b3cf085a560c68.1464079538.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26l2tp: Correctly return -EBADF from pppol2tp_getname.phil.turnbull@oracle.com
If 'tunnel' is NULL we should return -EBADF but the 'end_put_sess' path unconditionally sets 'error' back to zero. Rework the error path so it more closely matches pppol2tp_sendmsg. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26net: ipmr/ip6mr: update lastuse on entry changeNikolay Aleksandrov
Currently lastuse is updated on entry creation and cache hit, but it should also be updated on entry change. Since both on add and update the ttl array is updated we can simply update the lastuse in ipmr_update_thresholds. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Donald Sharp <sharpd@cumulusnetworks.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26tipc: dump monitor attributesParthasarathy Bhuvaragan
In this commit, we dump the monitor attributes when queried. The link monitor attributes are separated into two kinds: 1. general attributes per bearer 2. specific attributes per node/peer This style resembles the socket attributes and the nametable publications per socket. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26tipc: add a function to get the bearer nameParthasarathy Bhuvaragan
Introduce a new function to get the bearer name from its id. This is used in subsequent commit. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26tipc: get monitor threshold for the clusterParthasarathy Bhuvaragan
In this commit, we add support to fetch the configured cluster monitoring threshold. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26tipc: make cluster size threshold for monitoring configurableParthasarathy Bhuvaragan
In this commit, we introduce support to configure the minimum threshold to activate the new link monitoring algorithm. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26tipc: introduce constants for tipc address validationParthasarathy Bhuvaragan
In this commit, we introduce defines for tipc address size, offset and mask specification for Zone.Cluster.Node. There is no functional change in this commit. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in ↵He Chunhui
neigh_update() NUD_STALE is used when the caller(e.g. arp_process()) can't guarantee neighbour reachability. If the entry was NUD_VALID and lladdr is unchanged, the entry state should not be changed. Currently the code puts an extra "NUD_CONNECTED" condition. So if old state was NUD_DELAY or NUD_PROBE (they are NUD_VALID but not NUD_CONNECTED), the state can be changed to NUD_STALE. This may cause problem. Because NUD_STALE lladdr doesn't guarantee reachability, when we send traffic, the state will be changed to NUD_DELAY. In normal case, if we get no confirmation (by dst_confirm()), we will change the state to NUD_PROBE and send probe traffic. But now the state may be reset to NUD_STALE again(e.g. by broadcast ARP packets), so the probe traffic will not be sent. This situation may happen again and again, and packets will be sent to an non-reachable lladdr forever. The fix is to remove the "NUD_CONNECTED" condition. After that the "NEIGH_UPDATE_F_WEAK_OVERRIDE" condition (used by IPv6) in that branch will be redundant, so remove it. This change may increase probe traffic, but it's essential since NUD_STALE lladdr is unreliable. To ensure correctness, we prefer to resolve lladdr, when we can't get confirmation, even while remote packets try to set NUD_STALE state. Signed-off-by: Chunhui He <hchunhui@mail.ustc.edu.cn> Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "There are a couple of new things for s390 with this merge request: - a new scheduling domain "drawer" is added to reflect the unusual topology found on z13 machines. Performance tests showed up to 8 percent gain with the additional domain. - the new crc-32 checksum crypto module uses the vector-galois-field multiply and sum SIMD instruction to speed up crc-32 and crc-32c. - proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in the generic vmlinux.lds linker script definitions. - kcov instrumentation support. A prerequisite for that is the inline assembly basic block cleanup, which is the reason for the net/iucv/iucv.c change. - support for 2GB pages is added to the hugetlbfs backend. Then there are two removals: - the oprofile hardware sampling support is dead code and is removed. The oprofile user space uses the perf interface nowadays. - the ETR clock synchronization is removed, this has been superseeded be the STP clock synchronization. And it always has been "interesting" code.. And the usual bug fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits) s390/pci: Delete an unnecessary check before the function call "pci_dev_put" s390/smp: clean up a condition s390/cio/chp : Remove deprecated create_singlethread_workqueue s390/chsc: improve channel path descriptor determination s390/chsc: sanitize fmt check for chp_desc determination s390/cio: make fmt1 channel path descriptor optional s390/chsc: fix ioctl CHSC_INFO_CU command s390/cio/device_ops: fix kernel doc s390/cio: allow to reset channel measurement block s390/console: Make preferred console handling more consistent s390/mm: fix gmap tlb flush issues s390/mm: add support for 2GB hugepages s390: have unique symbol for __switch_to address s390/cpuinfo: show maximum thread id s390/ptrace: clarify bits in the per_struct s390: stack address vs thread_info s390: remove pointless load within __switch_to s390: enable kcov support s390/cpumf: use basic block for ecctr inline assembly s390/hypfs: use basic block for diag inline assembly ...
2016-07-26net_sched: get rid of struct tcf_commonWANG Cong
After the previous patch, struct tc_action should be enough to represent the generic tc action, tcf_common is not necessary any more. This patch gets rid of it to make tc action code more readable. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26net_sched: move tc_action into tcf_commonWANG Cong
struct tc_action is confusing, currently we use it for two purposes: 1) Pass in arguments and carry out results from helper functions 2) A generic representation for tc actions The first one is error-prone, since we need to make sure we don't miss anything. This patch aims to get rid of this use, by moving tc_action into tcf_common, so that they are allocated together in hashtable and can be cast'ed easily. And together with the following patch, we could really make tc_action a generic representation for all tc actions and each type of action can inherit from it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26udp: use sk_filter_trim_cap for udp{,6}_queue_rcv_skbDaniel Borkmann
After a612769774a3 ("udp: prevent bugcheck if filter truncates packet too much"), there followed various other fixes for similar cases such as f4979fcea7fd ("rose: limit sk_filter trim to payload"). Latter introduced a new helper sk_filter_trim_cap(), where we can pass the trim limit directly to the socket filter handling. Make use of it here as well with sizeof(struct udphdr) as lower cap limit and drop the extra skb->len test in UDP's input path. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>