summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-11-16ravb: Fix int mask value overwritten issueMasaru Nagai
When RX/TX interrupt for Network Control queue and Best Effort queue is issued at the same time, the interrupt mask of Network Control queue will be reset when the mask of Best Effort queue is set. This patch fixes this problem. Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: smsc911x: Reset PHY during initializationPavel Fedin
On certain hardware after software reboot the chip may get stuck and fail to reinitialize during reset. This can be fixed by ensuring that PHY is reset too. Old PHY resetting method required operational MDIO interface, therefore the chip should have been already set up. In order to be able to function during probe, it is changed to use PMT_CTRL register. The problem could be observed on SMDK5410 board. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16bpf, arm64: start flushing icache range from headerDaniel Borkmann
While recently going over ARM64's BPF code, I noticed that the icache range we're flushing should start at header already and not at ctx.image. Reason is that after b569c1c622c5 ("net: bpf: arm64: address randomize and write protect JIT code"), we also want to make sure to flush the random-sized trap in front of the start of the actual program (analogous to x86). No operational differences from user side. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16bpf, arm: start flushing icache range from headerDaniel Borkmann
During review I noticed that the icache range we're flushing should start at header already and not at ctx.image. Reason is that after 55309dd3d4cd ("net: bpf: arm: address randomize and write protect JIT code"), we also want to make sure to flush the random-sized trap in front of the start of the actual program (analogous to x86). No operational differences from user side. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16bpf: samples: exclude asm/sysreg.h for arm64Yang Shi
commit 338d4f49d6f7114a017d294ccf7374df4f998edc ("arm64: kernel: Add support for Privileged Access Never") includes sysreg.h into futex.h and uaccess.h. But, the inline assembly used by asm/sysreg.h is incompatible with llvm so it will cause BPF samples build failure for ARM64. Since sysreg.h is useless for BPF samples, just exclude it from Makefile via defining __ASM_SYSREG_H. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16arm64: bpf: fix JIT frame pointer setupYang Shi
BPF fp should point to the top of the BPF prog stack. The original implementation made it point to the bottom incorrectly. Move A64_SP to fp before reserve BPF prog stack space. CC: Zi Shen Lim <zlim.lnx@gmail.com> CC: Xi Wang <xi.wang@gmail.com> Signed-off-by: Yang Shi <yang.shi@linaro.org> Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: phy: vitesse: add support for VSC8601Måns Rullgård
This adds support for the Vitesse VSC8601 PHY. Generic functions are used for everything except interrupt handling. Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: phy: at803x: support interrupt on 8030 and 8035Måns Rullgård
Commit 77a993942 "phy/at8031: enable at8031 to work on interrupt mode" added interrupt support for the 8031 PHY but left out the other two chips supported by this driver. This patch sets the .ack_interrupt and .config_intr functions for the 8030 and 8035 drivers as well. Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16ip_tunnel: disable preemption when updating per-cpu tstatsJason A. Donenfeld
Drivers like vxlan use the recently introduced udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the packet, updates the struct stats using the usual u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats). udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch tstats, so drivers like vxlan, immediately after, call iptunnel_xmit_stats, which does the same thing - calls u64_stats_update_begin/end on this_cpu_ptr(dev->tstats). While vxlan is probably fine (I don't know?), calling a similar function from, say, an unbound workqueue, on a fully preemptable kernel causes real issues: [ 188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6 [ 188.435579] caller is debug_smp_processor_id+0x17/0x20 [ 188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2 [ 188.435607] Call Trace: [ 188.435611] [<ffffffff8234e936>] dump_stack+0x4f/0x7b [ 188.435615] [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0 [ 188.435619] [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20 The solution would be to protect the whole this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with disabling preemption and then reenabling it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16hwmon: (scpi) skip unsupported sensors properlySudeep Holla
Currently it's assumed that firmware exports only the class of sensors supported by the driver. However with newer firmware or SCPI protocol revision, support for newer classes of sensors can be present. The driver fails to probe with the following warning if an unsupported class of sensor is encountered in the firmware. sysfs: cannot create duplicate filename '/devices/platform/scpi/scpi:sensors/hwmon/hwmon0/' ------------[ cut here ]------------ WARNING: at fs/sysfs/dir.c:31 Modules linked in: CPU: 0 PID: 6 Comm: kworker/u12:0 Not tainted 4.3.0-rc7 #137 Hardware name: ARM Juno development board (r0) (DT) Workqueue: deferwq deferred_probe_work_func PC is at sysfs_warn_dup+0x54/0x78 LR is at sysfs_warn_dup+0x54/0x78 This patch fixes the above issue by skipping through the unsupported class of SCPI sensors. Fixes: 68acc77a2d51 ("hwmon: Support thermal zones registration for SCP temperature sensors") Fixes: ea98b29a05e9 ("hwmon: Support sensors exported via ARM SCP interface") Cc: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-11-16hwmon: (scpi) add thermal-of dependencyArnd Bergmann
The newly added scpi thermal support is broken when the scpi driver is built-in but the thermal driver is a loadable module: drivers/built-in.o: In function `scpi_hwmon_probe': (.text+0x444d70): undefined reference to `thermal_zone_of_sensor_unregister' (.text+0x444d94): undefined reference to `thermal_zone_of_sensor_register' drivers/built-in.o: In function `scpi_hwmon_remove': (text+0x444e6c): undefined reference to `thermal_zone_of_sensor_unregister' This uses the same Kconfig trick that we have in a couple of other drivers already to ensure we can only select the driver in valid configurations when either THERMAL_OF is disabled, or when with a dependency on CONFIG_THERMAL that can force SCPI to be a loadable module in the case I was hitting. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 68acc77a2d51 ("hwmon: Support thermal zones registration for SCP temperature sensors") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-11-16s390: remove SALIPL loaderHeiko Carstens
There is no known user, therefore remove the code. Acked-by: Rob Van Der Heij <robvdheij@nl.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16s390: wire up mlock2 system callHeiko Carstens
Passes mlock2-tests test case in 64 bit and compat mode. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16s390: remove g5 elf platform supportHeiko Carstens
Remove dead code, since this could only happen on a 31 bit machine where the kernel wouldn't IPL. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16s390: avoid cache aliasing under z/VM and KVMMartin Schwidefsky
commit 1f6b83e5e4d3 ("s390: avoid z13 cache aliasing") checks for the machine type to optimize address space randomization and zero page allocation to avoid cache aliases. This check might fail under a hypervisor with migration support. z/VMs "Single System Image and Live Guest Relocation" facility will "fake" the machine type of the oldest system in the group. For example in a group of zEC12 and Z13 the guest appears to run on a zEC12 (architecture fencing within the relocation domain) Remove the machine type detection and always use cache aliasing rules that are known to work for all machines. These are the z13 aliasing rules. Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16hwmon : (applesmc) Fix uninitialized variables warningsShuah Khan
Fix the following "maybe used uninitialized" warnings by initializing the variables to keep the compiler quiet. There is no "used uninitialized" in this case. CC [M] drivers/hwmon/applesmc.o drivers/hwmon/applesmc.c: In function ‘applesmc_init_smcreg’: drivers/hwmon/applesmc.c:595:43: warning: ‘right_light_sensor’ may be used uninitialized in this function [-Wmaybe-uninitialized] s->num_light_sensors = left_light_sensor + right_light_sensor; ^ drivers/hwmon/applesmc.c:540:26: note: ‘right_light_sensor’ was declared here bool left_light_sensor, right_light_sensor; ^ drivers/hwmon/applesmc.c:595:43: warning: ‘left_light_sensor’ may be used uninitialized in this function [-Wmaybe-uninitialized] s->num_light_sensors = left_light_sensor + right_light_sensor; ^ drivers/hwmon/applesmc.c:540:7: note: ‘left_light_sensor’ was declared here bool left_light_sensor, right_light_sensor; ^ Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-11-16hwmon: (ina2xx) Fix build issue by selecting REGMAP_I2CLi Yang
Since a0de56c81fcf ("hwmon: (ina2xx) convert driver to using regmap") the driver requires REGMAP_I2C to build. Select it by default in Kconfig. Reported-by: Guo Chunrong <B40290@freescale.com> Cc: Marc Titinger <mtitinger@baylibre.com> Signed-off-by: Li Yang <leoli@freescale.com> Fixes: a0de56c81fcf ("hwmon: (ina2xx) convert driver to using regmap") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-11-16Merge branch 'mv88e6060-fixes'David S. Miller
Neil Armstrong says: ==================== net: dsa: mv88e6060: cleanup and fix setup This patchset introduces some fixes and a registers addressing cleanup for the mv88e6060 DSA driver. The first patch removes the poll_link as mv88e6xxx. The 3 following patches fixes the setup in regards of the datasheet. The 2 last patches introduces a clean header and replaces all magic values. v2: cleanup InitReady patch, add missing Acked-by and fix header copyright notice ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: dsa: mv88e6060: replace magic values with register definesNeil Armstrong
To align with the mv88e6xxx code, use the register defines to access all the register addresses and bit fields. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: dsa: mv88e6060: add register defines header fileNeil Armstrong
To align with the mv88e6xxx code, add a similar header file with all the register defines. The file is based on the mv88e6xxx header for coherency. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: dsa: mv88e6060: use the correct bit shift for mac0Neil Armstrong
According to the mv88e6060 datasheet, the first mac byte must be at position 9 instead of 8 since the bit 8 is used to select if the mac address must differ for each port for Pause frames. Use the correct shift and set the same mac address for all port. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: dsa: mv88e6060: use the correct MaxFrameSize bitNeil Armstrong
According to the mv88e6060 datasheet, the MaxFrameSize bit position is 10 instead of 11 which is reserved. Use the bit correctly to setup max frame size to 1536. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: dsa: mv88e6060: use the correct InitReady bitNeil Armstrong
According to the mv88e6060 datasheet, the InitReady bit position is 11 and the polarity is inverted. Use the bit correctly to detect the end of initialization. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16net: dsa: mv88e6060: remove poll_link callbackNeil Armstrong
As of mv88e6xxx remove the poll_link callback since the link state change polling is now handled by the phylib. Tested on a mv88e6060 B0 device with a TI DM816X SoC. Suggested-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16Linux 4.4-rc1Linus Torvalds
2015-11-15Merge branch 'mellanox-net-fixes'David S. Miller
Or Gerlitz says: ==================== Mellanox NIC driver update, Nov 12, 2015 Few small mlx5 and mlx4 fixes from the team... done over net commit c5a3788 "Merge branch 'akpm' (patches from Andrew)" Eran's patch needs to go to 4.2 and 4.3 stable kernels. Tariq's patch need to go to 4.3 stable too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15net/mlx4_core: Avoid returning success in case of an error flowNoa Osherovich
The err variable wasn't set with the correct error value in some cases. Fixes: 47605df95398 ('mlx4: Modify proxy/tunnel QP mechanism [..]') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_countersEran Ben Elisha
When cleaning slave's counter resources, we hold a spinlock that protects the slave's counters list. As part of the clean, we call __mlx4_clear_if_stat which calls mlx4_alloc_cmd_mailbox which is a sleepable function. In order to fix this issue, hold the spinlock, and copy all counter indices into a temporary array, and release the spinlock. Afterwards, iterate over this array and free every counter. Repeat this scenario until the original list is empty (a new counter might have been added while releasing the counters from the temporary array). Fixes: b72ca7e96acf ("net/mlx4_core: Reset counters data when freed") Reported-by: Moni Shoua <monis@mellanox.com> Tested-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15net/mlx5e: Use the right DMA free function on TX pathAchiad Shochat
On xmit path we use skb_frag_dma_map() which is using dma_map_page(), while upon completion we dma-unmap the skb fragments using dma_unmap_single() rather than dma_unmap_page(). To fix this, we now save the dma map type on xmit path and use this info to call the right dma unmap method upon TX completion. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15net/mlx5e: Max mtu comparison fixDoron Tsur
On change mtu the driver compares between hardware queried mtu and software requested mtu. We need to compare between software representation of the queried mtu and the requested mtu. Fixes: facc9699f0fe ('net/mlx5e: Fix HW MTU settings') Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15net/mlx5e: Added self loopback preventionTariq Toukan
Prevent outgoing multicast frames from looping back to the RX queue. By introducing new HW capability self_lb_en_modifiable, which indicates the support to modify self_lb_en bit in modify_tir command. When this capability is set we can prevent TIRs from sending back loopback multicast traffic to their own RQs, by "refreshing TIRs" with modify_tir command, on every time new channels (SQs/RQs) are created at device open. This is needed since TIRs are static and only allocated once on driver load, and the loopback decision is under their responsibility. Fixes issues of the kind: "IPv6: eth2: IPv6 duplicate address fe80::e61d:2dff:fe5c:f2e9 detected!" The issue is seen since the IPv6 solicitations multicast messages are loopedback and the network stack thinks they are coming from another host. Fixes: 5c50368f3831 ("net/mlx5e: Light-weight netdev open/stop") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15net/mlx5e: Fix inline header size calculationSaeed Mahameed
mlx5e_get_inline_hdr_size didn't take into account the vlan insertion into the inline WQE segment. This could lead to max inline violation in cases where skb_headlen(skb) + VLAN_HLEN >= sq->max_inline. Fixes: 3ea4891db8d0 ("net/mlx5e: Fix LSO vlan insertion") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15ipvs: use skb_to_full_sk() helperEric Dumazet
SYNACK packets might be attached to request sockets. Use skb_to_full_sk() helper to avoid illegal accesses to inet_sk(skb->sk) Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15tcp: ensure proper barriers in lockless contextsEric Dumazet
Some functions access TCP sockets without holding a lock and might output non consistent data, depending on compiler and or architecture. tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ... Introduce sk_state_load() and sk_state_store() to fix the issues, and more clearly document where this lack of locking is happening. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15net: thunder: Fix crash upon shutdown after failed probePavel Fedin
If device probe fails, driver remains bound to the PCI device. However, driver data has been reset to NULL. This causes crash upon dereferencing it in nicvf_remove() Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15sctp: translate host order to network order when setting a hmacidlucien
now sctp auth cannot work well when setting a hmacid manually, which is caused by that we didn't use the network order for hmacid, so fix it by adding the transformation in sctp_auth_ep_set_hmacs. even we set hmacid with the network order in userspace, it still can't work, because of this condition in sctp_auth_ep_set_hmacs(): if (id > SCTP_AUTH_HMAC_ID_MAX) return -EOPNOTSUPP; so this wasn't working before and thus it won't break compatibility. Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15Merge branch 'packet-fixes'David S. Miller
Daniel Borkmann says: ==================== packet fixes Fixes a couple of issues in packet sockets, i.e. on TX ring side. See individual patches for details. v2 -> v3: - First two patches unchanged, kept Jason's Ack - Reworked 3rd patch and split into 3: - check for dev type as discussed with Willem - infer skb->protocol - fix max len for dgram v1 -> v2: - Added patch 2 as suggested by Dave - Rest is unchanged from previous submission ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15packet: fix tpacket_snd max frame lenDaniel Borkmann
Since it's introduction in commit 69e3c75f4d54 ("net: TX_RING and packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW side. When used with SOCK_DGRAM only, the size_max > dev->mtu + reserve check should have reserve as 0, but currently, this is unconditionally set (in it's original form as dev->hard_header_len). I think this is not correct since tpacket_fill_skb() would then take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM, the extra VLAN_HLEN could be possible in both cases. Presumably, the reserve code was copied from packet_snd(), but later on missed the check. Make it similar as we have it in packet_snd(). Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15packet: infer protocol from ethernet header if unsetDaniel Borkmann
In case no struct sockaddr_ll has been passed to packet socket's sendmsg() when doing a TX_RING flush run, then skb->protocol is set to po->num instead, which is the protocol passed via socket(2)/bind(2). Applications only xmitting can go the path of allocating the socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the TX_RING with sll_protocol of 0. That way, register_prot_hook() is neither called on creation nor on bind time, which saves cycles when there's no interest in capturing anyway. That leaves us however with po->num 0 instead and therefore the TX_RING flush run sets skb->protocol to 0 as well. Eric reported that this leads to problems when using tools like trafgen over bonding device. I.e. the bonding's hash function could invoke the kernel's flow dissector, which depends on skb->protocol being properly set. In the current situation, all the traffic is then directed to a single slave. Fix it up by inferring skb->protocol from the Ethernet header when not set and we have ARPHRD_ETHER device type. This is only done in case of SOCK_RAW and where we have a dev->hard_header_len length. In case of ARPHRD_ETHER devices, this is guaranteed to cover ETH_HLEN, and therefore being accessed on the skb after the skb_store_bits(). Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15packet: only allow extra vlan len on ethernet devicesDaniel Borkmann
Packet sockets can be used by various net devices and are not really restricted to ARPHRD_ETHER device types. However, when currently checking for the extra 4 bytes that can be transmitted in VLAN case, our assumption is that we generally probe on ARPHRD_ETHER devices. Therefore, before looking into Ethernet header, check the device type first. This also fixes the issue where non-ARPHRD_ETHER devices could have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus the check would test unfilled linear part of the skb (instead of non-linear). Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.") Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15packet: always probe for transport headerDaniel Borkmann
We concluded that the skb_probe_transport_header() should better be called unconditionally. Avoiding the call into the flow dissector has also not really much to do with the direct xmit mode. While it seems that only virtio_net code makes use of GSO from non RX/TX ring packet socket paths, we should probe for a transport header nevertheless before they hit devices. Reference: http://thread.gmane.org/gmane.linux.network/386173/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15packet: do skb_probe_transport_header when we actually have dataDaniel Borkmann
In tpacket_fill_skb() commit c1aad275b029 ("packet: set transport header before doing xmit") and later on 40893fd0fd4e ("net: switch to use skb_probe_transport_header()") was probing for a transport header on the skb from a ring buffer slot, but at a time, where the skb has _not even_ been filled with data yet. So that call into the flow dissector is pretty useless. Lets do it after we've set up the skb frags. Fixes: c1aad275b029 ("packet: set transport header before doing xmit") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15tools/net: Use include/uapi with __EXPORTED_HEADERS__Kamal Mostafa
Use the local uapi headers to keep in sync with "recently" added #define's (e.g. SKF_AD_VLAN_TPID). Refactored CFLAGS, and bpf_asm doesn't need -I. Fixes: 3f356385e8a4 ("filter: bpf_asm: add minimal bpf asm tool") Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15Merge branch 'ipv6-route-fixes'David S. Miller
Martin KaFai Lau says: ==================== ipv6: Fixes for pmtu update and DST_NOCACHE route This patchset fixes: 1. An oops during IPv6 pmtu update on a IPv4 GRE running in an IPSec setup 2. Misc fixes on DST_NOCACHE route ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15ipv6: Check rt->dst.from for the DST_NOCACHE routeMartin KaFai Lau
All DST_NOCACHE rt6_info used to have rt->dst.from set to its parent. After commit 8e3d5be73681 ("ipv6: Avoid double dst_free"), DST_NOCACHE is also set to rt6_info which does not have a parent (i.e. rt->dst.from is NULL). This patch catches the rt->dst.from == NULL case. Fixes: 8e3d5be73681 ("ipv6: Avoid double dst_free") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15ipv6: Check expire on DST_NOCACHE routeMartin KaFai Lau
Since the expires of the DST_NOCACHE rt can be set during the ip6_rt_update_pmtu(), we also need to consider the expires value when doing ip6_dst_check(). This patches creates __rt6_check_expired() to only check the expire value (if one exists) of the current rt. In rt6_dst_from_check(), it adds __rt6_check_expired() as one of the condition check. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 treeMartin KaFai Lau
The original bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1272571 The setup has a IPv4 GRE tunnel running in a IPSec. The bug happens when ndisc starts sending router solicitation at the gre interface. The simplified oops stack is like: __lock_acquire+0x1b2/0x1c30 lock_acquire+0xb9/0x140 _raw_write_lock_bh+0x3f/0x50 __ip6_ins_rt+0x2e/0x60 ip6_ins_rt+0x49/0x50 ~~~~~~~~ __ip6_rt_update_pmtu.part.54+0x145/0x250 ip6_rt_update_pmtu+0x2e/0x40 ~~~~~~~~ ip_tunnel_xmit+0x1f1/0xf40 __gre_xmit+0x7a/0x90 ipgre_xmit+0x15a/0x220 dev_hard_start_xmit+0x2bd/0x480 __dev_queue_xmit+0x696/0x730 dev_queue_xmit+0x10/0x20 neigh_direct_output+0x11/0x20 ip6_finish_output2+0x21f/0x770 ip6_finish_output+0xa7/0x1d0 ip6_output+0x56/0x190 ~~~~~~~~ ndisc_send_skb+0x1d9/0x400 ndisc_send_rs+0x88/0xc0 ~~~~~~~~ The rt passed to ip6_rt_update_pmtu() is created by icmp6_dst_alloc() and it is not managed by the fib6 tree, so its rt6i_table == NULL. When __ip6_rt_update_pmtu() creates a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL and it causes the ip6_ins_rt() oops. During pmtu update, we only want to create a RTF_CACHE clone from a rt which is currently managed (or owned) by the fib6 tree. It means either rt->rt6i_node != NULL or rt is a RTF_PCPU clone. It is worth to note that rt6i_table may not be NULL even it is not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()). Hence, rt6i_node is a better check instead of rt6i_table. Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu> Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15fjes: fix inconsistent indentingColin Ian King
minor change, indenting is one tab out. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15af-unix: fix use-after-free with concurrent readers while splicingHannes Frederic Sowa
During splicing an af-unix socket to a pipe we have to drop all af-unix socket locks. While doing so we allow another reader to enter unix_stream_read_generic which can read, copy and finally free another skb. If exactly this skb is just in process of being spliced we get a use-after-free report by kasan. First, we must make sure to not have a free while the skb is used during the splice operation. We simply increment its use counter before unlocking the reader lock. Stream sockets have the nice characteristic that we don't care about zero length writes and they never reach the peer socket's queue. That said, we can take the UNIXCB.consumed field as the indicator if the skb was already freed from the socket's receive queue. If the skb was fully consumed after we locked the reader side again we know it has been dropped by a second reader. We indicate a short read to user space and abort the current splice operation. This bug has been found with syzkaller (http://github.com/google/syzkaller) by Dmitry Vyukov. Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Mostly updates to the perf tool plus two fixes to the kernel core code: - Handle tracepoint filters correctly for inherited events (Peter Zijlstra) - Prevent a deadlock in perf_lock_task_context (Paul McKenney) - Add missing newlines to some pr_err() calls (Arnaldo Carvalho de Melo) - Print full source file paths when using 'perf annotate --print-line --full-paths' (Michael Petlan) - Fix 'perf probe -d' when just one out of uprobes and kprobes is enabled (Wang Nan) - Add compiler.h to list.h to fix 'make perf-tar-src-pkg' generated tarballs, i.e. out of tree building (Arnaldo Carvalho de Melo) - Add the llvm-src-base.c and llvm-src-kbuild.c files, generated by the 'perf test' LLVM entries, when running it in-tree, to .gitignore (Yunlong Song) - libbpf error reporting improvements, using a strerror interface to more precisely tell the user about problems with the provided scriptlet, be it in C or as a ready made object file (Wang Nan) - Do not be case sensitive when searching for matching 'perf test' entries (Arnaldo Carvalho de Melo) - Inform the user about objdump failures in 'perf annotate' (Andi Kleen) - Improve the LLVM 'perf test' entry, introduce a new ones for BPF and kbuild tests to check the environment used by clang to compile .c scriptlets (Wang Nan)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro tools include: Add compiler.h to list.h perf probe: Verify parameters in two functions perf session: Add missing newlines to some pr_err() calls perf annotate: Support full source file paths for srcline fix perf test: Add llvm-src-base.c and llvm-src-kbuild.c to .gitignore perf: Fix inherited events vs. tracepoint filters perf: Disable IRQs across RCU RS CS that acquires scheduler lock perf test: Do not be case sensitive when searching for matching tests perf test: Add 'perf test BPF' perf test: Enhance the LLVM tests: add kbuild test perf test: Enhance the LLVM test: update basic BPF test program perf bpf: Improve BPF related error messages perf tools: Make fetch_kernel_version() publicly available bpf tools: Add new API bpf_object__get_kversion() bpf tools: Improve libbpf error reporting perf probe: Cleanup find_perf_probe_point_from_map to reduce redundancy perf annotate: Inform the user about objdump failures in --stdio perf stat: Make stat options global perf sched latency: Fix thread pid reuse issue ...