summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-02-20Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skbDouglas Anderson
In commit 44d271377479 ("Bluetooth: Compress the size of struct hci_ctrl") we squashed down the size of the structure by using a union with the assumption that all users would use the flag to determine whether we had a req_complete or a req_complete_skb. Unfortunately we had a case in hci_req_cmd_complete() where we weren't looking at the flag. This can result in a situation where we might be storing a hci_req_complete_skb_t in a hci_req_complete_t variable, or vice versa. During some testing I found at least one case where the function hci_req_sync_complete() was called improperly because the kernel thought that it didn't require an SKB. Looking through the stack in kgdb I found that it was called by hci_event_packet() and that hci_event_packet() had both of its locals "req_complete" and "req_complete_skb" pointing to the same place: both to hci_req_sync_complete(). Let's make sure we always check the flag. For more details on debugging done, see <http://crbug.com/588288>. Fixes: 44d271377479 ("Bluetooth: Compress the size of struct hci_ctrl") Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-02-20af_unix: Don't use continue to re-execute unix_stream_read_generic loopRainer Weikusat
The unix_stream_read_generic function tries to use a continue statement to restart the receive loop after waiting for a message. This may not work as intended as the caller might use a recvmsg call to peek at control messages without specifying a message buffer. If this was the case, the continue will cause the function to return without an error and without the credential information if the function had to wait for a message while it had returned with the credentials otherwise. Change to using goto to restart the loop without checking the condition first in this case so that credentials are returned either way. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20unix_diag: fix incorrect sign extension in unix_lookup_by_inoDmitry V. Levin
The value passed by unix_diag_get_exact to unix_lookup_by_ino has type __u32, but unix_lookup_by_ino's argument ino has type int, which is not a problem yet. However, when ino is compared with sock_i_ino return value of type unsigned long, ino is sign extended to signed long, and this results to incorrect comparison on 64-bit architectures for inode numbers greater than INT_MAX. This bug was found by strace test suite. Fixes: 5d3cae8bc39d ("unix_diag: Dumping exact socket core") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20net: use skb_postpush_rcsum instead of own implementationsDaniel Borkmann
Replace individual implementations with the recently introduced skb_postpush_rcsum() helper. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tom Herbert <tom@herbertland.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20net/ethtool: support set coalesce per queueKan Liang
This patch implements sub command ETHTOOL_SCOALESCE for ioctl ETHTOOL_PERQUEUE. It introduces an interface set_per_queue_coalesce to set coalesce of each masked queue to device driver. The wanted coalesce information are stored in "data" for each masked queue, which can copy from userspace. If it fails to set coalesce to device driver, the value which already set to specific queue will be tried to rollback. Signed-off-by: Kan Liang <kan.liang@intel.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20net/ethtool: support get coalesce per queueKan Liang
This patch implements sub command ETHTOOL_GCOALESCE for ioctl ETHTOOL_PERQUEUE. It introduces an interface get_per_queue_coalesce to get coalesce of each masked queue from device driver. Then the interrupt coalescing parameters will be copied back to user space one by one. Signed-off-by: Kan Liang <kan.liang@intel.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20net/ethtool: introduce a new ioctl for per queue settingKan Liang
Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting. The following patches will enable some SUB_COMMANDs for per queue setting. Signed-off-by: Kan Liang <kan.liang@intel.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19ipv6: pass up EMSGSIZE msg for UDP socket in Ipv6Wei Wang
In ipv4, when the machine receives a ICMP_FRAG_NEEDED message, the connected UDP socket will get EMSGSIZE message on its next read from the socket. However, this is not the case for ipv6. This fix modifies the udp err handler in Ipv6 for ICMP6_PKT_TOOBIG to make it similar to ipv4 behavior. That is when the machine gets an ICMP6_PKT_TOOBIG message, the connected UDP socket will get EMSGSIZE message on its next read from the socket. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19lwt: fix rx checksum setting for lwt devices tunneling over ipv6Paolo Abeni
the commit 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be correctly controlled.") changed the default xmit checksum setting for lwt vxlan/geneve ipv6 tunnels, so that now the checksum is not set into external UDP header. This commit changes the rx checksum setting for both lwt vxlan/geneve devices created by openvswitch accordingly, so that lwt over ipv6 tunnel pairs are again able to communicate with default values. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19tipc: unlock in error pathInsu Yun
tipc_bcast_unlock need to be unlocked in error path. Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Antonio Quartulli says: ==================== Two of the fixes included in this patchset prevent wrong memory access - it was triggered when removing an object from a list after it was already free'd due to bad reference counting. This misbehaviour existed for both the gw_node and the orig_node_vlan object and has been fixed by Sven Eckelmann. The last patch fixes our interface feasibility check and prevents it from looping indefinitely when two net_device objects reference each other via iflink index (i.e. veth pair), by Andrew Lunn ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19rtnl: RTM_GETNETCONF: fix wrong return valueAnton Protopopov
An error response from a RTM_GETNETCONF request can return the positive error value EINVAL in the struct nlmsgerr that can mislead userspace. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19net: make netdev_for_each_lower_dev safe for device removalNikolay Aleksandrov
When I used netdev_for_each_lower_dev in commit bad531623253 ("vrf: remove slave queue and private slave struct") I thought that it acts like netdev_for_each_lower_private and can be used to remove the current device from the list while walking, but unfortunately it acts more like netdev_for_each_lower_private_rcu and doesn't allow it. The difference is where the "iter" points to, right now it points to the current element and that makes it impossible to remove it. Change the logic to be similar to netdev_for_each_lower_private and make it point to the "next" element so we can safely delete the current one. VRF is the only such user right now, there's no change for the read-only users. Here's what can happen now: [98423.249858] general protection fault: 0000 [#1] SMP [98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy virtio_ring virtio scsi_mod [last unloaded: bridge] [98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G O 4.5.0-rc2+ #81 [98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti: ffff88003428c000 [98423.256123] RIP: 0010:[<ffffffff81514f3e>] [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30 [98423.256534] RSP: 0018:ffff88003428f940 EFLAGS: 00010207 [98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX: 0000000000000000 [98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI: ffff880054ff90c0 [98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09: 0000000000000000 [98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003428f9e0 [98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15: 0000000000000001 [98423.258055] FS: 00007f3d76881700(0000) GS:ffff88005d000000(0000) knlGS:0000000000000000 [98423.258418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4: 00000000000406e0 [98423.258902] Stack: [98423.259075] ffff88003428f960 ffffffffa0442636 0002000100000004 ffff880054ff9000 [98423.259647] ffff88003428f9b0 ffffffff81518205 ffff880054ff9000 ffff88003428f978 [98423.260208] ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0 ffff880035b35f00 [98423.260739] Call Trace: [98423.260920] [<ffffffffa0442636>] vrf_dev_uninit+0x76/0xa0 [vrf] [98423.261156] [<ffffffff81518205>] rollback_registered_many+0x205/0x390 [98423.261401] [<ffffffff815183ec>] unregister_netdevice_many+0x1c/0x70 [98423.261641] [<ffffffff8153223c>] rtnl_delete_link+0x3c/0x50 [98423.271557] [<ffffffff815335bb>] rtnl_dellink+0xcb/0x1d0 [98423.271800] [<ffffffff811cd7da>] ? __inc_zone_state+0x4a/0x90 [98423.272049] [<ffffffff815337b4>] rtnetlink_rcv_msg+0x84/0x200 [98423.272279] [<ffffffff810cfe7d>] ? trace_hardirqs_on+0xd/0x10 [98423.272513] [<ffffffff8153370b>] ? rtnetlink_rcv+0x1b/0x40 [98423.272755] [<ffffffff81533730>] ? rtnetlink_rcv+0x40/0x40 [98423.272983] [<ffffffff8155d6e7>] netlink_rcv_skb+0x97/0xb0 [98423.273209] [<ffffffff8153371a>] rtnetlink_rcv+0x2a/0x40 [98423.273476] [<ffffffff8155ce8b>] netlink_unicast+0x11b/0x1a0 [98423.273710] [<ffffffff8155d2f1>] netlink_sendmsg+0x3e1/0x610 [98423.273947] [<ffffffff814fbc98>] sock_sendmsg+0x38/0x70 [98423.274175] [<ffffffff814fc253>] ___sys_sendmsg+0x2e3/0x2f0 [98423.274416] [<ffffffff810d841e>] ? do_raw_spin_unlock+0xbe/0x140 [98423.274658] [<ffffffff811e1bec>] ? handle_mm_fault+0x26c/0x2210 [98423.274894] [<ffffffff811e19cd>] ? handle_mm_fault+0x4d/0x2210 [98423.275130] [<ffffffff81269611>] ? __fget_light+0x91/0xb0 [98423.275365] [<ffffffff814fcd42>] __sys_sendmsg+0x42/0x80 [98423.275595] [<ffffffff814fcd92>] SyS_sendmsg+0x12/0x20 [98423.275827] [<ffffffff81611bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a [98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48 89 06 <48> 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 [98423.279639] RIP [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30 [98423.279920] RSP <ffff88003428f940> CC: David Ahern <dsa@cumulusnetworks.com> CC: David S. Miller <davem@davemloft.net> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Vlad Yasevich <vyasevic@redhat.com> Fixes: bad531623253 ("vrf: remove slave queue and private slave struct") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19bridge: mdb: add support for more attributes and export timerNikolay Aleksandrov
Currently mdb entries are exported directly as a structure inside MDBA_MDB_ENTRY_INFO attribute, we can't really extend it without breaking user-space. In order to export new mdb fields, I've converted the MDBA_MDB_ENTRY_INFO into a nested attribute which starts like before with struct br_mdb_entry (without header, as it's casted directly in iproute2) and continues with MDBA_MDB_EATTR_ attributes. This way we keep compatibility with older users and can export new data. I've tested this with iproute2, both with and without support for the added attribute and it works fine. So basically we again have MDBA_MDB_ENTRY_INFO with struct br_mdb_entry inside but it may contain also some additional MDBA_MDB_EATTR_ attributes such as MDBA_MDB_EATTR_TIMER which can be parsed by user-space. So the new structure is: [MDBA_MDB] = { [MDBA_MDB_ENTRY] = { [MDBA_MDB_ENTRY_INFO] [MDBA_MDB_ENTRY_INFO] { <- Nested attribute struct br_mdb_entry <- nla_put_nohdr() [MDBA_MDB_ENTRY attributes] <- normal netlink attributes } } } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19bridge: mdb: reduce the indentation level in br_mdb_fill_infoNikolay Aleksandrov
Switch the port check and skip if it's null, this allows us to reduce one indentation level. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18net: caif: fix erroneous return valueAnton Protopopov
The cfrfml_receive() function might return positive value EPROTO Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18appletalk: fix erroneous return valueAnton Protopopov
The atalk_sendmsg() function might return wrong value ENETUNREACH instead of -ENETUNREACH. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18IFF_NO_QUEUE: Fix for drivers not calling ether_setup()Phil Sutter
My implementation around IFF_NO_QUEUE driver flag assumed that leaving tx_queue_len untouched (specifically: not setting it to zero) by drivers would make it possible to assign a regular qdisc to them without having to worry about setting tx_queue_len to a useful value. This was only partially true: I overlooked that some drivers don't call ether_setup() and therefore not initialize tx_queue_len to the default value of 1000. Consequently, removing the workarounds in place for that case in qdisc implementations which cared about it (namely, pfifo, bfifo, gred, htb, plug and sfb) leads to problems with these specific interface types and qdiscs. Luckily, there's already a sanitization point for drivers setting tx_queue_len to zero, which can be reused to assign the fallback value most qdisc implementations used, which is 1. Fixes: 348e3435cbefa ("net: sched: drop all special handling of tx_queue_len == 0") Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18gre: clear IFF_TX_SKB_SHARINGJiri Benc
ether_setup sets IFF_TX_SKB_SHARING but this is not supported by gre as it modifies the skb on xmit. Also, clean up whitespace in ipgre_tap_setup when we're already touching it. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18iptunnel: scrub packet in iptunnel_pull_headerJiri Benc
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call skb_scrub_packet directly instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18net: bridge: log port STP state on changeVivien Didelot
Remove the shared br_log_state function and print the info directly in br_set_state, where the net_bridge_port state is actually changed. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18nfnetlink: Revert "nfnetlink: add support for memory mapped netlink"Florian Westphal
reverts commit 3ab1f683bf8b ("nfnetlink: add support for memory mapped netlink")' Like previous commits in the series, remove wrappers that are not needed after mmapped netlink removal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18nfnetlink: remove nfnetlink_alloc_skbFlorian Westphal
Following mmapped netlink removal this code can be simplified by removing the alloc wrapper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18Revert "genl: Add genlmsg_new_unicast() for unicast message allocation"Florian Westphal
This reverts commit bb9b18fb55b0 ("genl: Add genlmsg_new_unicast() for unicast message allocation")'. Nothing wrong with it; its no longer needed since this was only for mmapped netlink support. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18openvswitch: Revert: "Enable memory mapped Netlink i/o"Florian Westphal
revert commit 795449d8b846 ("openvswitch: Enable memory mapped Netlink i/o"). Following the mmaped netlink removal this code can be removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18netlink: remove mmapped netlink supportFlorian Westphal
mmapped netlink has a number of unresolved issues: - TX zerocopy support had to be disabled more than a year ago via commit 4682a0358639b29cf ("netlink: Always copy on mmap TX.") because the content of the mmapped area can change after netlink attribute validation but before message processing. - RX support was implemented mainly to speed up nfqueue dumping packet payload to userspace. However, since commit ae08ce0021087a5d812d2 ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy with the socket-based interface too (via the skb_zerocopy helper). The other problem is that skbs attached to mmaped netlink socket behave different from normal skbs: - they don't have a shinfo area, so all functions that use skb_shinfo() (e.g. skb_clone) cannot be used. - reserving headroom prevents userspace from seeing the content as it expects message to start at skb->head. See for instance commit aa3a022094fa ("netlink: not trim skb for mmaped socket when dump"). - skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we crash because it needs the sk to check if a tx ring is attached. Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359 ("netfilter: nfnetlink: use original skbuff when acking batches"). mmaped netlink also didn't play nicely with the skb_zerocopy helper used by nfqueue and openvswitch. Daniel Borkmann fixed this via commit 6bb0fef489f6 ("netlink, mmap: fix edge-case leakages in nf queue zero-copy")' but at the cost of also needing to provide remaining length to the allocation function. nfqueue also has problems when used with mmaped rx netlink: - mmaped netlink doesn't allow use of nfqueue batch verdict messages. Problem is that in the mmap case, the allocation time also determines the ordering in which the frame will be seen by userspace (A allocating before B means that A is located in earlier ring slot, but this also means that B might get a lower sequence number then A since seqno is decided later. To fix this we would need to extend the spinlocked region to also cover the allocation and message setup which isn't desirable. - nfqueue can now be configured to queue large (GSO) skbs to userspace. Queing GSO packets is faster than having to force a software segmentation in the kernel, so this is a desirable option. However, with a mmap based ring one has to use 64kb per ring slot element, else mmap has to fall back to the socket path (NL_MMAP_STATUS_COPY) for all large packets. To use the mmap interface, userspace not only has to probe for mmap netlink support, it also has to implement a recv/socket receive path in order to handle messages that exceed the size of an rx ring element. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18tcp/dccp: fix another race at listener dismantleEric Dumazet
Ilya reported following lockdep splat: kernel: ========================= kernel: [ BUG: held lock freed! ] kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted kernel: ------------------------- kernel: swapper/5/0 is freeing memory ffff880035c9d200-ffff880035c9dbff, with a lock still held there! kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 kernel: 4 locks held by swapper/5/0: kernel: #0: (rcu_read_lock){......}, at: [<ffffffff8169ef6b>] netif_receive_skb_internal+0x4b/0x1f0 kernel: #1: (rcu_read_lock){......}, at: [<ffffffff816e977f>] ip_local_deliver_finish+0x3f/0x380 kernel: #2: (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>] sk_clone_lock+0x19b/0x440 kernel: #3: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 To properly fix this issue, inet_csk_reqsk_queue_add() needs to return to its callers if the child as been queued into accept queue. We also need to make sure listener is still there before calling sk->sk_data_ready(), by holding a reference on it, since the reference carried by the child can disappear as soon as the child is put on accept queue. Reported-by: Ilya Dryomov <idryomov@gmail.com> Fixes: ebb516af60e1 ("tcp/dccp: fix race at listener dismantle phase") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18route: check and remove route cache when we get routeXin Long
Since the gc of ipv4 route was removed, the route cached would has no chance to be removed, and even it has been timeout, it still could be used, cause no code to check it's expires. Fix this issue by checking and removing route cache when we get route. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18net_sched: Improve readability of filter processingJamal Hadi Salim
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18bridge: switchdev: Offload VLAN flags to hardware bridgeIdo Schimmel
When VLANs are created / destroyed on a VLAN filtering bridge (MASTER flag set), the configuration is passed down to the hardware. However, when only the flags (e.g. PVID) are toggled, the configuration is done in the software bridge alone. While it is possible to pass these flags to hardware when invoked with the SELF flag set, this creates inconsistency with regards to the way the VLANs are initially configured. Pass the flags down to the hardware even when the VLAN already exists and only the flags are toggled. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18net_sched fix: reclassification needs to consider ether protocol changesJamal Hadi Salim
actions could change the etherproto in particular with ethernet tunnelled data. Typically such actions, after peeling the outer header, will ask for the packet to be reclassified. We then need to restart the classification with the new proto header. Example setup used to catch this: sudo tc qdisc add dev $ETH ingress sudo $TC filter add dev $ETH parent ffff: pref 1 protocol 802.1Q \ u32 match u32 0 0 flowid 1:1 \ action vlan pop reclassify Fixes: 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18net-sysfs: remove unused fmt_long_hexColin Ian King
Ever since commit 04ed3e741d0f133e02bed7fa5c98edba128f90e7 ("net: change netdev->features to u32") the format string fmt_long_hex has not been used, so we may as well remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18tcp: correctly crypto_alloc_hash return checkInsu Yun
crypto_alloc_hash never returns NULL Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18vlan: change return type of vlan_proc_rem_devZhang Shengju
Since function vlan_proc_rem_dev() will only return 0, it's better to return void instead of int. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18net: dsa: Unregister slave_dev in error pathFlorian Fainelli
With commit 0071f56e46da ("dsa: Register netdev before phy"), we are now trying to free a network device that has been previously registered, and in case of errors, this will make us hit the BUG_ON(dev->reg_state != NETREG_UNREGISTERED) condition. Fix this by adding a missing unregister_netdev() before free_netdev(). Fixes: 0071f56e46da ("dsa: Register netdev before phy") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17ipv4: Remove inet_lro libraryBen Hutchings
There are no longer any in-tree drivers that use it. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17af_llc: fix types on llc_ui_wait_for_connOne Thousand Gnomes
The timeout is a long, we return it truncated if it is huge. Basically harmless as the only caller does a boolean check, but tidy it up anyway. (64bit build tested this time. Thank you 0day) Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17sctp: remove the unused sctp_datamsg_free()Xin Long
Since commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") used sctp_datamsg_put in sctp_sendmsg, instead of sctp_datamsg_free, this function has no use in sctp. So we will remove it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17sctp: remove rcu_read_lock in sctp_seq_dump_remote_addrs()Xin Long
sctp_seq_dump_remote_addrs is only called by sctp_assocs_seq_show() and it has been protected by rcu_read_lock that is from rhashtable_walk_start(). So we will remove this one. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17sctp: move rcu_read_lock from __sctp_lookup_association to ↵Xin Long
sctp_lookup_association __sctp_lookup_association() is only invoked by sctp_v4_err() and sctp_rcv(), both which run on the rx BH, and it has been protected by rcu_read_lock [see ip_local_deliver_finish() / ipv6_rcv()]. So we can move it to sctp_lookup_association, only let sctp_lookup_association use rcu_read_lock. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17l2tp: Fix error creating L2TP tunnelsMark Tomlinson
A previous commit (33f72e6) added notification via netlink for tunnels when created/modified/deleted. If the notification returned an error, this error was returned from the tunnel function. If there were no listeners, the error code ESRCH was returned, even though having no listeners is not an error. Other calls to this and other similar notification functions either ignore the error code, or filter ESRCH. This patch checks for ESRCH and does not flag this as an error. Reviewed-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz> Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17core: remove unneded headers for net cgroup controllers.Rosen, Rami
commit 3ed80a6 (cgroup: drop module support) made including module.h redundant in the net cgroup controllers, netclassid_cgroup.c and netprio_cgroup.c. This patch removes them. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17net: add tc offload feature flagJohn Fastabend
Its useful to turn off the qdisc offload feature at a per device level. This gives us a big hammer to enable/disable offloading. More fine grained control (i.e. per rule) may be supported later. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17net: sched: add cls_u32 offload hooks for netdevsJohn Fastabend
This patch allows netdev drivers to consume cls_u32 offloads via the ndo_setup_tc ndo op. This works aligns with how network drivers have been doing qdisc offloads for mqprio. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17net: rework setup_tc ndo op to consume general tc operandJohn Fastabend
This patch updates setup_tc so we can pass additional parameters into the ndo op in a generic way. To do this we provide structured union and type flag. This lets each classifier and qdisc provide its own set of attributes without having to add new ndo ops or grow the signature of the callback. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17net: rework ndo tc op to consume additional qdisc handle parameterJohn Fastabend
The ndo_setup_tc() op was added to support drivers offloading tx qdiscs however only support for mqprio was ever added. So we only ever added support for passing the number of traffic classes to the driver. This patch generalizes the ndo_setup_tc op so that a handle can be provided to indicate if the offload is for ingress or egress or potentially even child qdiscs. CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17net: Export ip fragment sysctl to unprivileged usersNikolay Borisov
Now that all the ip fragmentation related sysctls are namespaceified there is no reason to hide them anymore from "root" users inside containers. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17ipv4: namespacify ip fragment max dist sysctl knobNikolay Borisov
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17ipv4: namespacify ip_early_demux sysctl knobNikolay Borisov
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17ipv4: Namespacify ip_dynaddr sysctl knobNikolay Borisov
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>