summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-08-22include/uapi/linux/if_pppox.h: include linux/in.h and linux/in6.hMikko Rapeli
Fixes userspace compilation errors: error: field ‘addr’ has incomplete type struct sockaddr_in addr; /* IP address and port to send to */ error: field ‘addr’ has incomplete type struct sockaddr_in6 addr; /* IP address and port to send to */ Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22include/uapi/linux/if_pppol2tp.h: include linux/in.h and linux/in6.hMikko Rapeli
Fixes userspace compilation errors like: error: field ‘addr’ has incomplete type struct sockaddr_in addr; /* IP address and port to send to */ ^ error: field ‘addr’ has incomplete type struct sockaddr_in6 addr; /* IP address and port to send to */ Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22include/uapi/linux/if_tunnel.h: include linux/if.h, linux/ip.h and linux/in6.hMikko Rapeli
Fixes userspace compilation errors like: error: field ‘iph’ has incomplete type error: field ‘prefix’ has incomplete type Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22include/uapi/linux/if_pppox.h: include linux/if.hMikko Rapeli
Fixes userspace compilation error: error: ‘IFNAMSIZ’ undeclared here (not in a function) Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-21net: tehuti: fix typo: "eneble" -> "enable"Colin Ian King
trivial typo fix in pr_err message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-21net: xilinx: emaclite: Fallback to random MAC address.Daniel Romell
If the address configured in the device tree is invalid, the driver will fallback to using a random address from the locally administered range. Signed-off-by: Daniel Romell <daro@hms.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-20vmxnet3: fix tx data ring copy for variable sizeShrikrishna Khare
'Commit 3c8b3efc061a ("vmxnet3: allow variable length transmit data ring buffer")' changed the size of the buffers in the tx data ring from a fixed size of 128 bytes to a variable size. However, while copying data to the data ring, vmxnet3_copy_hdr continues to carry the old code that assumes fixed buffer size of 128. This patch fixes it by adding correct offset based on the actual data ring buffer size. Signed-off-by: Guolin Yang <gyang@vmware.com> Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-20ixgbe: Do not clear RAR entry when clearing VMDq for SAN MACAlexander Duyck
The RAR entry for the SAN MAC address was being cleared when we were clearing the VMDq pool bits. In order to prevent this we need to add an extra check to protect the SAN MAC from being cleared. Fixes: 6e982aeae ("ixgbe: Clear stale pool mappings") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-20mlxsw: spectrum_buffers: Fix pool value handling in mlxsw_sp_sb_tc_pool_bind_setJiri Pirko
Pool index has to be converted by get_pool helper to work correctly for egress pool. In mlxsw the egress pool index starts from 0. Fixes: 0f433fa0ecc ("mlxsw: spectrum_buffers: Implement shared buffer configuration") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-20l2tp: Fix the connect status check in pppol2tp_getnameGao Feng
The sk->sk_state is bits flag, so need use bit operation check instead of value check. Signed-off-by: Gao Feng <fgao@ikuai8.com> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-20sctp: linearize early if it's not GSOMarcelo Ricardo Leitner
Because otherwise when crc computation is still needed it's way more expensive than on a linear buffer to the point that it affects performance. It's so expensive that netperf test gives a perf output as below: Overhead Command Shared Object Symbol 18,62% netserver [kernel.vmlinux] [k] crc32_generic_shift 2,57% netserver [kernel.vmlinux] [k] __pskb_pull_tail 1,94% netserver [kernel.vmlinux] [k] fib_table_lookup 1,90% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string 1,66% swapper [kernel.vmlinux] [k] intel_idle 1,63% netserver [kernel.vmlinux] [k] _raw_spin_lock 1,59% netserver [sctp] [k] sctp_packet_transmit 1,55% netserver [kernel.vmlinux] [k] memcpy_erms 1,42% netserver [sctp] [k] sctp_rcv # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000 SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 212992 212992 12000 10.00 3016.42 2.88 3.78 1.874 2.462 After patch: Overhead Command Shared Object Symbol 2,75% netserver [kernel.vmlinux] [k] memcpy_erms 2,63% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string 2,39% netserver [kernel.vmlinux] [k] fib_table_lookup 2,04% netserver [kernel.vmlinux] [k] __pskb_pull_tail 1,91% netserver [kernel.vmlinux] [k] _raw_spin_lock 1,91% netserver [sctp] [k] sctp_packet_transmit 1,72% netserver [mlx4_en] [k] mlx4_en_process_rx_cq 1,68% netserver [sctp] [k] sctp_rcv # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000 SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 212992 212992 12000 10.00 3681.77 3.83 3.46 2.045 1.849 Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19Merge branch 'mlx5-fixes'David S. Miller
Saeed Mahameed says: ==================== Mellanox 100G mlx5 fixes 2016-08-16 This series includes some bug fixes for mlx5e driver. From Saeed and Tariq, Optimize MTU change to not reset when it is not required. From Paul, Command interface message length check to speedup firmware command preparation. From Mohamad, Save pci state when pci error is detected. From Amir, Flow counters "lastuse" update fix. From Hadar, Use correct flow dissector key on flower offloading. Plus a small optimization for switchdev hardware id query. From Or, three patches to address some E-Switch offloads issues. For -stable of 4.6.y and 4.7.y: net/mlx5e: Use correct flow dissector key on flower offloading net/mlx5: Fix pci error recovery flow net/mlx5: Added missing check of msg length in verifying its signature ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5: E-Switch, Avoid ACLs in the offloads modeOr Gerlitz
When we are in the switchdev/offloads mode, HW matching is done as dictated by the offloaded rules and hence we don't need to enable the ACLs mechanism used by the legacy mode. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5: E-Switch, Set the send-to-vport rules in the correct tableOr Gerlitz
While adding actual offloading support to the new switchdev mode, we didn't change the setup of the send-to-vport rules to put them in the slow path table, fix that. Fixes: 1033665e63b6 ('net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5: E-Switch, Return the correct devlink e-switch modeOr Gerlitz
Since mlx5 has also the NONE e-switch mode, we must translate from mlx5 mode to devlink mode on the devlink eswitch mode get call, do that. While here, remove the mlx5_ prefix from the static function helpers that deal with the mode to comply with the rest of the code. Fixes: c930a3ad7453 ('net/mlx5e: Add devlink based SRIOV mode change') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5e: Retrieve the switchdev id from the firmware only onceHadar Hen Zion
Avoid firmware command execution each time the switchdev HW ID attr get call is made. We do that by reading the ID (PF NIC MAC) only once at load time and store it on the representor structure. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5e: Use correct flow dissector key on flower offloadingHadar Hen Zion
The wrong key is used when extracting the address type field set by the flower offload code. We have to use the control key and not the basic key, fix that. Fixes: e3a2b7ed018e ('net/mlx5e: Support offload cls_flower with drop action') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5: Update last-use statistics for flow rulesAmir Vadai
Set lastuse statistic, when number of packets is changed compared to last query. This was wrongly dropped when bulk counter reading was added. Fixes: a351a1b03bf1 ('net/mlx5: Introduce bulk reading of flow counters') Signed-off-by: Amir Vadai <amirva@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5: Added missing check of msg length in verifying its signaturePaul Blakey
Set and verify signature calculates the signature for each of the mailbox nodes, even for those that are unused (from cache). Added a missing length check to set and verify only those which are used. While here, also moved the setting of msg's nodes token to where we already go over them. This saves a pass because checksum is disabled, and the only useful thing remaining that set signature does is setting the token. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5: Fix pci error recovery flowMohamad Haj Yahia
When PCI error is detected we should save the state of the pci prior to disabling it. Also when receiving pci slot reset call we need to verify that the device is responsive. Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5e: Optimization for MTU changeTariq Toukan
Avoid unnecessary interface down/up operations upon an MTU change when it does not affect the rings configuration. Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net/mlx5e: Set port MTU on netdev creation rather on openSaeed Mahameed
Port mtu shouldn't be written to hardware on every single interface open. Here we set it only when needed, on change_mtu and netdevice creation. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19fib_trie: Fix the description of pos and bitsXunlei Pang
1) Fix one typo: s/tn/tp/ 2) Fix the description about the "u" bits. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19Merge branch 'kaweth-oopses'David S. Miller
Oliver Neukum says: ==================== fixes to kaweth in response to Umap2 testing These patches fix an oops in firmware downloading and an oops due to a memory allocation failure ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19kaweth: fix oops upon failed memory allocationOliver Neukum
Just return an error upon failure. Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19kaweth: fix firmware downloadOliver Neukum
This fixes the oops discovered by the Umap2 project and Alan Stern. The intf member needs to be set before the firmware is downloaded. Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net: bgmac: fix reversed check for MII registration errorRafał Miłecki
It was failing on successful registration returning meaningless errors. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Fixes: 55954f3bfdac ("net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19tcp: fix use after free in tcp_xmit_retransmit_queue()Eric Dumazet
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the tail of the write queue using tcp_add_write_queue_tail() Then it attempts to copy user data into this fresh skb. If the copy fails, we undo the work and remove the fresh skb. Unfortunately, this undo lacks the change done to tp->highest_sack and we can leave a dangling pointer (to a freed skb) Later, tcp_xmit_retransmit_queue() can dereference this pointer and access freed memory. For regular kernels where memory is not unmapped, this might cause SACK bugs because tcp_highest_sack_seq() is buggy, returning garbage instead of tp->snd_nxt, but with various debug features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel. This bug was found by Marco Grassi thanks to syzkaller. Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb") Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19cxgb4: Fixes resource allocation for ULD's in kdump kernelHariprasad Shenai
At present the code to check in kdump kernel was not disabling allocation of resources when CONFIG_CHELSIO_T4_DCB is defined, move the code outside #defines so that it gets disabled irrespective of #define, when in kdump kernel. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19net: thunderx: Fix OOPs with ethtool --register-dumpDavid Daney
The ethtool_ops .get_regs function attempts to read the nonexistent register NIC_QSET_SQ_0_7_CNM_CHG, which produces a "bus error" type OOPs. Fix by not attempting to read, and removing the definition of, NIC_QSET_SQ_0_7_CNM_CHG. A zero is written into the register dump to keep the layout unchanged. Signed-off-by: David Daney <david.daney@cavium.com> Cc: <stable@vger.kernel.org> # 4.4.x- Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19qede: Fix Tx timeout due to xmit_moreYuval Mintz
Driver uses netif_tx_queue_stopped() to make sure the xmit_more indication will be honored, but that only checks for DRV_XOFF. At the same time, it's possible that during transmission the DQL will close the transmission queue with STACK_XOFF indication. In re-configuration flows, when the threshold is relatively low, it's possible that the device has no pending tranmissions, and during tranmission the driver would miss doorbelling the HW. Since there are no pending transmission, there will never be a Tx completion [and thus the DQL would not remove the STACK_XOFF indication], eventually causing the Tx queue to timeout. While we're at it - also doorbell in case driver has to close the transmission queue on its own [although this one is less important - if the ring is full, we're bound to receive completion eventually, which means the doorbell would only be postponed and not indefinetly blocked]. Fixes: 312e06761c99 ("qede: Utilize xmit_more") Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter updates for your net tree, they are: 1) Dump only conntrack that belong to this namespace via /proc file. This is some fallout from the conversion to single conntrack table for all netns, patch from Liping Zhang. 2) Missing MODULE_ALIAS_NF_LOGGER() for the ARP family that prevents module autoloading, also from Liping Zhang. 3) Report overquota event to the right netnamespace, again from Liping. 4) Fix tproxy listener sk refcount that leads to crash, from Eric Dumazet. 5) Fix racy refcounting on object deletion from nfnetlink and rule removal both for nfacct and cttimeout, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18netfilter: cttimeout: fix use after free error when delete netnsLiping Zhang
In general, when we want to delete a netns, cttimeout_net_exit will be called before ipt_unregister_table, i.e. before ctnl_timeout_put. But after call kfree_rcu in cttimeout_net_exit, we will still decrease the timeout object's refcnt in ctnl_timeout_put, this is incorrect, and will cause a use after free error. It is easy to reproduce this problem: # while : ; do ip netns add xxx ip netns exec xxx nfct add timeout testx inet icmp timeout 200 ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx ip netns del xxx done ======================================================================= BUG kmalloc-96 (Tainted: G B E ): Poison overwritten ----------------------------------------------------------------------- INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of 0x6b INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout] age=104 cpu=0 pid=3330 ___slab_alloc+0x4da/0x540 __slab_alloc+0x20/0x40 __kmalloc+0x1c8/0x240 cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout] nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink] [ ... ] So only when the refcnt decreased to 0, we call kfree_rcu to free the timeout object. And like nfnetlink_acct do, use atomic_cmpxchg to avoid race between ctnl_timeout_try_del and ctnl_timeout_put. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-18netfilter: nfnetlink_acct: fix race between nfacct del and xt_nfacct destroyLiping Zhang
Suppose that we input the following commands at first: # nfacct add test # iptables -A INPUT -m nfacct --nfacct-name test And now "test" acct's refcnt is 2, but later when we try to delete the "test" nfacct and the related iptables rule at the same time, race maybe happen: CPU0 CPU1 nfnl_acct_try_del nfnl_acct_put atomic_dec_and_test //ref=1,testfail - - atomic_dec_and_test //ref=0,testok - kfree_rcu atomic_inc //ref=1 - So after the rcu grace period, nf_acct will be freed but it is still linked in the nfnl_acct_list, and we can access it later, then oops will happen. Convert atomic_dec_and_test and atomic_inc combinaiton to one atomic operation atomic_cmpxchg here to fix this problem. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Buffers powersave frame test is reversed in cfg80211, fix from Felix Fietkau. 2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme. 3) Fix some tg3 ethtool logic bugs, and one that would cause no interrupts to be generated when rx-coalescing is set to 0. From Satish Baddipadige and Siva Reddy Kallam. 4) QLCNIC mailbox corruption and napi budget handling fix from Manish Chopra. 5) Fix fib_trie logic when walking the trie during /proc/net/route output than can access a stale node pointer. From David Forster. 6) Several sctp_diag fixes from Phil Sutter. 7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel. 8) Checksum fixup fixes in bpf from Daniel Borkmann. 9) Memork leaks in nfnetlink, from Liping Zhang. 10) Use after free in rxrpc, from David Howells. 11) Use after free in new skb_array code of macvtap driver, from Jason Wang. 12) Calipso resource leak, from Colin Ian King. 13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang. 14) Fix bpf non-linear packet write helpers, from Daniel Borkmann. 15) Fix lockdep splats in macsec, from Sabrina Dubroca. 16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF handling. 17) Various tc-action bug fixes, from CONG Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net_sched: allow flushing tc police actions net_sched: unify the init logic for act_police net_sched: convert tcf_exts from list to pointer array net_sched: move tc offload macros to pkt_cls.h net_sched: fix a typo in tc_for_each_action() net_sched: remove an unnecessary list_del() net_sched: remove the leftover cleanup_a() mlxsw: spectrum: Allow packets to be trapped from any PG mlxsw: spectrum: Unmap 802.1Q FID before destroying it mlxsw: spectrum: Add missing rollbacks in error path mlxsw: reg: Fix missing op field fill-up mlxsw: spectrum: Trap loop-backed packets mlxsw: spectrum: Add missing packet traps mlxsw: spectrum: Mark port as active before registering it mlxsw: spectrum: Create PVID vPort before registering netdevice mlxsw: spectrum: Remove redundant errors from the code mlxsw: spectrum: Don't return upon error in removal path i40e: check for and deal with non-contiguous TCs ixgbe: Re-enable ability to toggle VLAN filtering ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths ...
2016-08-17Merge branch 'tc_action-fixes'David S. Miller
Cong Wang says: ==================== net_sched: tc action fixes and updates This patchset fixes a few regressions caused by the previous code refactor and more. Thanks to Jamal for catching them! Note, patch 3/7 and 4/7 are not strictly necessary for this patchset, I just want to carry them together. --- v4: adjust an indention for Jamal add two more patches v3: avoid list for fast path, suggested by Jamal v2: replace flex_array with regular dynamic array keep tcf_action_stats_update() in act_api.h fix macro typos found by Amir ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17net_sched: allow flushing tc police actionsRoman Mashak
The act_police uses its own code to walk the action hashtable, which leads to that we could not flush standalone tc police actions, so just switch to tcf_generic_walker() like other actions. (Joint work from Roman and Cong.) Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17net_sched: unify the init logic for act_policeWANG Cong
Jamal reported a crash when we create a police action with a specific index, this is because the init logic is not correct, we should always create one for this case. Just unify the logic with other tc actions. Fixes: a03e6fe56971 ("act_police: fix a crash during removal") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17net_sched: convert tcf_exts from list to pointer arrayWANG Cong
As pointed out by Jamal, an action could be shared by multiple filters, so we can't use list to chain them any more after we get rid of the original tc_action. Instead, we could just save pointers to these actions in tcf_exts, since they are refcount'ed, so convert the list to an array of pointers. The "ugly" part is the action API still accepts list as a parameter, I just introduce a helper function to convert the array of pointers to a list, instead of relying on the C99 feature to iterate the array. Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17net_sched: move tc offload macros to pkt_cls.hWANG Cong
struct tcf_exts belongs to filters, should not be visible to plain tc actions. Cc: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17net_sched: fix a typo in tc_for_each_action()WANG Cong
It is harmless because all users pass 'a' to this macro. Fixes: 00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef") Cc: Amir Vadai <amir@vadai.me> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17net_sched: remove an unnecessary list_del()WANG Cong
This list_del() for tc action is not needed actually, because we only use this list to chain bulk operations, therefore should not be carried for latter operations. Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17net_sched: remove the leftover cleanup_a()WANG Cong
After refactoring tc_action into tcf_common, we no longer need to cleanup temporary "actions" in list, they are permanently stored in the hashtable. Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2016-08-16 This series contains fixes to e1000e, igb, ixgbe and i40e. Kshitiz Gupta provides a fix for igb to resolve the PHY delay compensation math in several functions. Jarod Wilson provides a fix for e1000e which had to broken up into 2 patches, first is prepares the driver for expanding the list of NICs that have occasional ~10 hour clock jumps when being used for PTP. Second patch actually fixes i218 silicon which has been experiencing the clock jumps while using PTP. Alex provides 2 patches for ixgbe now that he is back at Intel. First fixes setting VLNCTRL.VFE bit, which was left unchanged in earlier patches which resulted in disabling VLAN filtering for all the VFs. Second corrects the support for disabling the VLAN tag filtering via the feature bit. Lastly, David fixes i40e which was causing a kernel panic when non-contiguous traffic classes or traffic classes not starting with TC0, were configured on a link partner switch. To fix this, changed the logic when determining the total number of TCs enabled. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17Merge branch 'mlxsw-fixes'David S. Miller
Jiri Pirko says: ==================== mlxsw: IPv4 UC router fixes Ido says: Patches 1-3 fix a long standing problem in the driver's init sequence, which manifests itself quite often when routing daemons try to configure an IP address on registered netdevs that don't yet have an associated vPort. Patches 4-9 add missing packet traps for the router to work properly and also fix ordering issue following the recent changes to the driver's init sequence. The last patch isn't related to the router, but fixes a general problem in which under certain conditions packets aren't trapped to CPU. v1->v2: - Change order of patch 7 - Add patch 6 following Ilan's comment - Add patchset name and cover letter ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17mlxsw: spectrum: Allow packets to be trapped from any PGIdo Schimmel
When packets enter the device they are classified to a priority group (PG) buffer based on their PCP value. After their egress port and traffic class are determined they are moved to the switch's shared buffer and await transmission, if: (Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres && Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres) || (Ingress{Port}.Usage < Min || Ingress{Port,PG} < Min || Egress{Port}.Usage < Min || Egress{Port,TC}.Usage < Min) Packets scheduled to transmission through CPU port (trapped to CPU) use traffic class 7, which has a zero maximum and minimum quotas. However, when such packets arrive from PG 0 they are admitted to the shared buffer as PG 0 has a non-zero minimum quota. Allow all packets to be trapped to the CPU - regardless of the PG they were classified to - by assigning a 10KB minimum quota for CPU port and TC7. Fixes: 8e8dfe9fdf06 ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support") Reported-by: Tamir Winetroub <tamirw@mellanox.com> Tested-by: Tamir Winetroub <tamirw@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17mlxsw: spectrum: Unmap 802.1Q FID before destroying itIdo Schimmel
Before destroying the 802.1Q FID we should first remove the VID-to-FID mapping. This makes mlxsw_sp_fid_destroy() symmetric with regards to mlxsw_sp_fid_create(). Fixes: 14d39461b3f4 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17mlxsw: spectrum: Add missing rollbacks in error pathIdo Schimmel
While going over the code I noticed we are missing two rollbacks in the port's creation error path. Add them and adjust the place of one of them in the port's removal sequence so that both are symmetric. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17mlxsw: reg: Fix missing op field fill-upJiri Pirko
Ralue pack function needs to set op, otherwise it is 0 for add always. Fixes: d5a1c749d22 ("mlxsw: reg: Add Router Algorithmic LPM Unicast Entry Register definition") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17mlxsw: spectrum: Trap loop-backed packetsIdo Schimmel
One of the conditions to generate an ICMP Redirect Message is that "the packet is being forwarded out the same physical interface that it was received from" (RFC 1812). Therefore, we need to be able to trap such packets and let the kernel decide what to do with them. For each RIF, enable the loop-back filter, which will raise the LBERROR trap whenever the ingress RIF equals the egress RIF. Fixes: 99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces") Reported-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>