summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2013-03-20net/mlx4_en: Disable RFS when running in SRIOV modeAmir Vadai
commit a229e488ac3f904d06c20d8d3f47831db3c7a15a upstream. Commit 37706996 "mlx4_en: fix allocation of CPU affinity reverse-map" fixed a bug when mlx4_dev->caps.comp_pool is larger from the device rx rings, but introduced a regression. When the mlx4_core is activating its "legacy mode" (e.g when running in SRIOV mode) w.r.t to EQs/IRQs usage, comp_pool becomes zero and we're crashing on divide by zero alloc_cpu_rmap. Fix that by enabling RFS only when running in non-legacy mode. Reported-by: Yan Burman <yanb@mellanox.com> Cc: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20net/mlx4_en: Initialize RFS filters lock and list in init_netdevAmir Vadai
commit 78fb2de711ec28997bf38bcf3e48e108e907be77 upstream. filters_lock might have been used while it was re-initialized. Moved filters_lock and filters_list initialization to init_netdev instead of alloc_resources which is called every time the device is configured. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28mlx4_en: fix allocation of CPU affinity reverse-mapKleber Sacilotto de Souza
[ Upstream commit 3770699675dd1b8fc1e86ff369eb3cce44e10082 ] The mlx4_en driver allocates the number of objects for the CPU affinity reverse-map based on the number of rx rings of the device. However, mlx4_assign_eq() calls irq_cpu_rmap_add() as many times as IRQ's are assigned to EQ's, which can be as large as mlx4_dev->caps.comp_pool. If caps.comp_pool is larger than rx_ring_num we will eventually hit the BUG_ON() in cpu_rmap_add(). Fix this problem by allocating space for the maximum number of CPU affinity reverse-map objects we might want to add. Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28mlx4_en: fix allocation of device tx_cqKleber Sacilotto de Souza
[ Upstream commit 427a96252d8eee7b9bbafce15bd37fa3387ede55 ] The memory to hold the network device tx_cq is not being allocated with the correct size in mlx4_en_init_netdev(). It should use MAX_TX_RINGS instead of MAX_RX_RINGS. This can cause problems if the number of tx rings being used is greater than MAX_RX_RINGS. Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-08Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull IB regression fixes from Roland Dreier: - Fix mlx4 VFs not working on old guests because of 64B CQE changes - Fix ill-considered sparse fix for qib - Fix IPoIB crash due to skb double destruct introduced in 3.8-rc1 * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/qib: Fix for broken sparse warning fix mlx4_core: Fix advertisement of wrong PF context behaviour IPoIB: Fix crash due to skb double destruct
2013-02-05mlx4_core: Fix advertisement of wrong PF context behaviourOr Gerlitz
Commit 08ff32352d6f ("mlx4: 64-byte CQE/EQE support") introduced a regression where older guest VF drivers failed to load even when 64-byte EQEs/CQEs are disabled, since the PF wrongly advertises the new context behaviour anyway. The failure looks like: mlx4_core 0000:00:07.0: Unknown pf context behaviour mlx4_core 0000:00:07.0: Failed to obtain slave caps mlx4_core: probe of 0000:00:07.0 failed with error -38 Fix this by basing this advertisement on dev->caps.flags, which is the operational capabilities used by the QUERY_FUNC_CAP command wrapper (dev_cap->flags holds the firmware capabilities). Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-01-18net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaultsOr Gerlitz
The lines if (mlx4_is_mfunc(dev)) { nreq = 2; } else { which hard code the number of requested msi-x vectors under multi-function mode to two can be removed completely, since the firmware sets num_eqs and reserved_eqs appropriately Thus, the code line: nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs, nreq); is by itself sufficient and correct for all cases. Currently, for mfunc mode num_eqs = 32 and reserved_eqs = 28, hence four vectors will be enabled. This triples (one vector is used for the async events and commands EQ) the horse power provided for processing of incoming packets on netdev RSS scheme, IO initiators/targets commands processing flows, etc. Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV modeYan Burman
Commit 5b4c4d36860e "mlx4_en: Allow communication between functions on same host" introduced a regression under which a bridge acting as vSwitch whose uplink is an mlx4 Ethernet device become non-operative in native (non sriov) mode. This happens since broadcast ARP requests sent by VMs were loopback-ed by the HW and hence the bridge learned VM source MACs on both the VM and the uplink ports. The fix is to place the DMAC in the send WQE only under SRIOV/eSwitch configuration or when the device is in selftest. Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-19mlx4_core: Allow choosing flow steering modeJack Morgenstein
Device managed flow steering will be enabled only under administrator directive provided through setting the existing module parameter log_num_mgm_entry_size to -1 (if the device actually supports flow steering). If flow steering isn't requested or not available, the driver will use the value of log_num_mgm_entry_size and B0 steering. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19mlx4_core: Adjustments to Flow Steering activation logic for SR-IOVJack Morgenstein
Separate flow steering capability detection from the decision to activate. For the master (and for native), detect the flow steering capability in mlx4_dev_cap, but activate the appropriate steering type in a new function choose_flow_steering() based on detected data. For VFs, activate flow steering based on what was actually activated by the master, where that info is obtained via QUERY_HCA. This fixes the current VF detection which is wrongly based on QUERY_DEV_CAP. Also, for SR-IOV mode, if flow steering may be activated, do so only if the max number of QPs per rule is sufficient to satisfy one subscription per VF. If not, fall back to B0 mode. This is needed to serve registrations done by L2 network drivers such as mlx4_en and IPoIB when the network stack attempts to join to multicast groups such as all-hosts or the IPoIB broadcast group. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19mlx4_core: Fix error flow in the flow steering wrapperHadar Hen Zion
The error flow of the flow steering wrapper had a typo which caused the wrong firmware command to be called, fix it. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19mlx4_core: Add QPN enforcement for flow steering rules set by VFsHadar Hen Zion
Since VFs may be mapped to VMs which aren't trusted entities, flow steering rules attached through the wrapper on behalf of VFs must be checked to make sure that the specified QP number is assigned to that VF. Also, make sure to keep the QP busy till the end of the operation from the resource tracker point of view. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-14Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband upate from Roland Dreier: "First batch of InfiniBand/RDMA changes for the 3.8 merge window: - A good chunk of Bart Van Assche's SRP fixes - UAPI disintegration from David Howells - mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz - Other miscellaneous fixes" Fix up trivial conflict in mellanox/mlx4 driver. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (33 commits) RDMA/nes: Fix for crash when registering zero length MR for CQ RDMA/nes: Fix for terminate timer crash RDMA/nes: Fix for BUG_ON due to adding already-pending timer IB/srp: Allow SRP disconnect through sysfs srp_transport: Document sysfs attributes srp_transport: Simplify attribute initialization code srp_transport: Fix attribute registration IB/srp: Document sysfs attributes IB/srp: send disconnect request without waiting for CM timewait exit IB/srp: destroy and recreate QP and CQs when reconnecting IB/srp: Eliminate state SRP_TARGET_DEAD IB/srp: Introduce the helper function srp_remove_target() IB/srp: Suppress superfluous error messages IB/srp: Process all error completions IB/srp: Introduce srp_handle_qp_err() IB/srp: Simplify SCSI error handling IB/srp: Keep processing commands during host removal IB/srp: Eliminate state SRP_TARGET_CONNECTING IB/srp: Increase block layer timeout RDMA/cm: Change return value from find_gid_port() ...
2012-12-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial branch from Jiri Kosina: "Usual stuff -- comment/printk typo fixes, documentation updates, dead code elimination." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) HOWTO: fix double words typo x86 mtrr: fix comment typo in mtrr_bp_init propagate name change to comments in kernel source doc: Update the name of profiling based on sysfs treewide: Fix typos in various drivers treewide: Fix typos in various Kconfig wireless: mwifiex: Fix typo in wireless/mwifiex driver messages: i2o: Fix typo in messages/i2o scripts/kernel-doc: check that non-void fcts describe their return value Kernel-doc: Convention: Use a "Return" section to describe return values radeon: Fix typo and copy/paste error in comments doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c various: Fix spelling of "asynchronous" in comments. Fix misspellings of "whether" in comments. eisa: Fix spelling of "asynchronous". various: Fix spelling of "registered" in comments. doc: fix quite a few typos within Documentation target: iscsi: fix comment typos in target/iscsi drivers treewide: fix typo of "suport" in various comments and Kconfig treewide: fix typo of "suppport" in various comments ...
2012-12-12net/mlx4_en: Add support for destination MAC in steering rulesYan Burman
Implement destination MAC rule extension for L3/L4 rules in flow steering. Usefull for vSwitch/macvlan configurations. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12net/mlx4_en: Use generic etherdevice.h functions.Yan Burman
Get rid of full_mac, zero_mac in favour of is_zero_ether_addr and is_broadcast_ether_addr. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07drivers/net: fix up function prototypes after __dev* removalsGreg Kroah-Hartman
The __dev* removal patches for the network drivers ended up messing up the function prototypes for a bunch of drivers. This patch fixes all of them back up to be properly aligned. Bonus is that this almost removes 100 lines of code, always a nice surprise. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03mlx4_core: remove __dev* attributesBill Pemberton
CONFIG_HOTPLUG is going away as an option. As result the __dev* markings will be going away. Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst, and __devexit. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-03net/mlx4_en: Set number of rx/tx channels using ethtoolAmir Vadai
Add support to changing number of rx/tx channels using ethtool ('ethtool -[lL]'). Where the number of tx channels specified in ethtool is the number of rings per user priority - not total number of tx rings. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03net/mlx4_en: Fix TX moderation info loss after set_ringparam is calledAmir Vadai
We need to re-set tx moderation information after calling set_ringparam else default tx moderation will be used. Also avoid related code duplication, by putting it in a utility function. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29mlx4_core: Fix potential deadlock in mlx4_eq_int()Jack Morgenstein
The slave_state_lock spinlock is used in both interrupt context and process context, hence irq locking must be used. Found by lockdep. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28net/mlx4_en: Can set maxrate only for TC0Amir Vadai
Had a typo in memcpy. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26mlx4: 64-byte CQE/EQE supportOr Gerlitz
ConnectX-3 devices can use either 64- or 32-byte completion queue entries (CQEs) and event queue entries (EQEs). Using 64-byte EQEs/CQEs performs better because each entry is aligned to a complete cacheline. This patch queries the HCA's capabilities, and if it supports 64-byte CQEs and EQES the driver will configure the HW to work in 64-byte mode. The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ. Since this mode is global, userspace (libmlx4) must be updated to work with the configured CQE size, and guests using SR-IOV virtual functions need to know both EQE and CQE size. In case one of the 64-byte CQE/EQE capabilities is activated, the patch makes sure that older guest drivers that use the QUERY_DEV_FUNC command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that they need an update to be able to work with the PPF. This is done by changing the returned pf_context_behaviour not to be zero any more. In case none of these capabilities is activated that value remains zero and older guest drivers can run OK. The SRIOV related flow is as follows 1. the PPF does the detection of the new capabilities using QUERY_DEV_CAP command. 2. the PPF activates the new capabilities using INIT_HCA. 3. the VF detects if the PPF activated the capabilities using QUERY_HCA, and if this is the case activates them for itself too. Note that the VF detects that it must be aware to the new PF behaviour using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode. User space notification is done through a new field introduced in struct mlx4_ib_ucontext which holds device capabilities for which user space must take action. This changes the binary interface so the ABI towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only when **needed** i.e. only when the driver does use 64-byte CQEs or future device capabilities which must be in sync by user space. This practice allows to work with unmodified libmlx4 on older devices (e.g A0, B0) which don't support 64-byte CQEs. In order to keep existing systems functional when they update to a newer kernel that contains these changes in VF and userspace ABI, a module parameter enable_64b_cqe_eqe must be set to enable 64-byte mode; the default is currently false. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-20net: Remove bogus dependencies on INETBen Hutchings
Various drivers depend on INET because they used to select INET_LRO, but they have all been converted to use GRO which has no such dependency. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20mlx4_en: Remove remnants of LRO supportBen Hutchings
Commit fa37a9586f92051de03a13e55e5ec3880bb6783e ('mlx4_en: Moving to work with GRO') left behind the Kconfig depends/select, some dead code and comments referring to LRO. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19various: Fix spelling of "asynchronous" in comments.Adam Buchbinder
"Asynchronous" is misspelled in some comments. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c Minor conflict between the BCM_CNIC define removal in net-next and a bug fix added to net. Based upon a conflict resolution patch posted by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-07mlx4: change TX coalescing defaultsEric Dumazet
mlx4 currently uses a too high tx coalescing setting, deferring TX completion interrupts by up to 128 us. With the recent skb_orphan() removal in commit 8112ec3b872, performance of a single TCP flow is capped to ~4 Gbps, unless we increase tcp_limit_output_bytes. I suggest using 16 us instead of 128 us, allowing a finer control. Performance of a single TCP flow is restored to previous levels, while keeping TCP small queues fully enabled with default sysctl. This patch is also a BQL prereq. Reported-by: Vimalkumar <j.vimal@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yevgeny Petrilin <yevgenyp@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "This is what we usually expect at this stage of the game, lots of little things, mostly in drivers. With the occasional 'oops didn't mean to do that' kind of regressions in the core code." 1) Uninitialized data in __ip_vs_get_timeouts(), from Arnd Bergmann 2) Reject invalid ACK sequences in Fast Open sockets, from Jerry Chu. 3) Lost error code on return from _rtl_usb_receive(), from Christian Lamparter. 4) Fix reset resume on USB rt2x00, from Stanislaw Gruszka. 5) Release resources on error in pch_gbe driver, from Veaceslav Falico. 6) Default hop limit not set correctly in ip6_template_metrics[], fix from Li RongQing. 7) Gianfar PTP code requests wrong kind of resource during probe, fix from Wei Yang. 8) Fix VHOST net driver on big-endian, from Michael S Tsirkin. 9) Mallenox driver bug fixes from Jack Morgenstein, Or Gerlitz, Moni Shoua, Dotan Barak, and Uri Habusha. 10) usbnet leaks memory on TX path, fix from Hemant Kumar. 11) Use socket state test, rather than presence of FIN bit packet, to determine FIONREAD/SIOCINQ value. Fix from Eric Dumazet. 12) Fix cxgb4 build failure, from Vipul Pandya. 13) Provide a SYN_DATA_ACKED state to complement SYN_FASTOPEN in socket info dumps. From Yuchung Cheng. 14) Fix leak of security path in kfree_skb_partial(). Fix from Eric Dumazet. 15) Handle RX FIFO overflows more resiliently in pch_gbe driver, from Veaceslav Falico. 16) Fix MAINTAINERS file pattern for networking drivers, from Jean Delvare. 17) Add iPhone5 IDs to IPHETH driver, from Jay Purohit. 18) VLAN device type change restriction is too strict, and should not trigger for the automatically generated vlan0 device. Fix from Jiri Pirko. 19) Make PMTU/redirect flushing work properly again in ipv4, from Steffen Klassert. 20) Fix memory corruptions by using kfree_rcu() in netlink_release(). From Eric Dumazet. 21) More qmi_wwan device IDs, from Bjørn Mork. 22) Fix unintentional change of SNAT/DNAT hooks in generic NAT infrastructure, from Elison Niven. 23) Fix 3.6.x regression in xt_TEE netfilter module, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits) tilegx: fix some issues in the SW TSO support qmi_wwan/cdc_ether: move Novatel 551 and E362 to qmi_wwan net: usb: Fix memory leak on Tx data path net/mlx4_core: Unmap UAR also in the case of error flow net/mlx4_en: Don't use vlan tag value as an indication for vlan presence net/mlx4_en: Fix double-release-range in tx-rings bas_gigaset: fix pre_reset handling vhost: fix mergeable bufs on BE hosts gianfar_ptp: use iomem, not ioports resource tree in probe ipv6: Set default hoplimit as zero. NET_VENDOR_TI: make available for am33xx as well pch_gbe: fix error handling in pch_gbe_up() b43: Fix oops on unload when firmware not found mwifiex: clean up scan state on error mwifiex: return -EBUSY if specific scan request cannot be honored brcmfmac: fix potential NULL dereference Revert "ath9k_hw: Updated AR9003 tx gain table for 5GHz" ath9k_htc: Add PID/VID for a Ubiquiti WiFiStation rt2x00: usb: fix reset resume rtlwifi: pass rx setup error code to caller ...
2012-10-26net/mlx4_core: Unmap UAR also in the case of error flowDotan Barak
If a failure takes place during the EQ creation, we need to unmap the UAR memory block too. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Uri Habusha <urih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26net/mlx4_en: Don't use vlan tag value as an indication for vlan presenceMoni Shoua
The vlan tag can be zero. This is why it can't serve as an indication that packet requires VLAN header in the TX flow. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26net/mlx4_en: Fix double-release-range in tx-ringsJack Morgenstein
The QP range is reserved as a single block. However, when freeing the en resources, the tx-ring QPs are released both in mlx4_en_destroy_tx_ring (one at a time) and in mlx4_en_free_resources (as a block release). Fix by eliminating the one-at-a-time release in mlx4_en_destroy_tx_ring. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-23mlx4_core: Perform correct resource cleanup if mlx4_QUERY_ADAPTER() failsDotan Barak
Fixed the resource cleanup to act correctly and prevent a kernel oops when mlx4_QUERY_ADAPTER() fails. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-23mlx4_core: Remove annoying debug messages from SR-IOV flowOr Gerlitz
These debug prints left behind by commits c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests"), 54679e148287 ("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop") and 993c401e2079 ("mlx4_core: Add IB port-state machine and port mgmt event propagation") make it pretty hard to actually use the mlx4_core debug messages when running in SRIOV/IB mode -- for example, the module load sequence of a device with one VF yielded 631 debug prints, with 408 of them being from this set. Let's just remove them. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-03mlx4_core: Adjust flow steering attach wrapper so that IB works on SR-IOV VFsJack Morgenstein
Currently, the InfiniBand stack does not support flow steering at the verbs level -- the only usage of flow steering in the IB driver is for L2 multicast attaches. We need to add the IB case to procedure mlx4_QP_FLOW_STEERING_ATTACH_wrapper() to allow IPoIB to work on VFs on ConnectX-3 when flow steering is enabled. Currently, the IB case in mlx4_QP_FLOW_STEERING_ATTACH_wrapper() is missing, so the procedure returns -EINVAL and IPoIB on VFs breaks. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-03Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "First batch of InfiniBand/RDMA changes for the 3.7 merge window: - mlx4 IB support for SR-IOV - A couple of SRP initiator fixes - Batch of nes hardware driver fixes - Fix for long-standing use-after-free crash in IPoIB - Other miscellaneous fixes" This merge also removes a new use of __cancel_delayed_work(), and replaces it with the regular cancel_delayed_work() that is now irq-safe thanks to the workqueue updates. That said, I suspect the sequence in question should probably use "mod_delayed_work()". I just did the minimal "don't use deprecated functions" fixup, though. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits) IB/qib: Fix local access validation for user MRs mlx4_core: Disable SENSE_PORT for multifunction devices mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs mlx4_core: Stash PCI ID driver_data in mlx4_priv structure IB/srp: Avoid having aborted requests hang IB/srp: Fix use-after-free in srp_reset_req() IB/qib: Add a qib driver version RDMA/nes: Fix compilation error when nes_debug is enabled RDMA/nes: Print hardware resource type RDMA/nes: Fix for crash when TX checksum offload is off RDMA/nes: Cosmetic changes RDMA/nes: Fix for incorrect MSS when TSO is on RDMA/nes: Fix incorrect resolving of the loopback MAC address mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem mlx4_core: Trivial cleanups to driver log messages mlx4_core: Trivial readability fix: "0X30" -> "0x30" IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations mlx4: Paravirtualize Node Guids for slaves mlx4: Activate SR-IOV mode for IB ...
2012-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...
2012-10-02Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...
2012-10-02Merge branches 'cma', 'cxgb4', 'ipoib', 'mlx4', 'mlx4-sriov', 'nes', 'qib' ↵Roland Dreier
and 'srp' into for-linus
2012-10-01mlx4: dont orphan skbs in mlx4_en_xmit()Eric Dumazet
After commit e22979d96a55d (mlx4_en: Moving to Interrupts for TX completions) we no longer need to orphan skbs in mlx4_en_xmit() since skb wont stay a long time in TX ring before their release. Orphaning skbs in ndo_start_xmit() should be avoided as much as possible, since it breaks TCP Small Queue or other flow control mechanisms (per socket limits) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-01Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds
Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi) - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu) - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu) - Use standard list ops for acpi_pci_drivers (Jiang Liu) Device hotplug - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu) - Remove fakephp driver (Bjorn Helgaas) - Fix VGA ref count in hotplug remove path (Yinghai Lu) - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu) - Implement resume regardless of pciehp_force param (Oliver Neukum) - Make pci_fixup_irqs() work after init (Thierry Reding) Miscellaneous - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang) - Factor out PCI Express Capability accessors (Jiang Liu) - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan) - Make pci_error_handlers const (Stephen Hemminger) - Cleanup drivers/pci/remove.c (Bjorn Helgaas) - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas) - Use standard list ops for bus->devices (Bjorn Helgaas) - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang) - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)" * tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits) PCI: acpiphp: Handle PCIe ports without native hotplug capability PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots PCI/ACPI: Protect acpi_pci_roots list with mutex PCI/ACPI: Use acpi_pci_root info rather than looking it up again PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface PCI/ACPI: Protect acpi_pci_drivers list with mutex PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges PCI/ACPI: Use normal list for struct acpi_pci_driver PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots PCI: Fix default vga ref_count ia64/PCI: Clear host bridge aperture struct resource x86/PCI: Clear host bridge aperture struct resource PCI: Stop all children first, before removing all children Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()" PCI: Provide a default pcibios_update_irq() PCI: Discard __init annotations for pci_fixup_irqs() and related functions PCI: Use correct type when freeing bus resource list PCI: Check P2P bridge for invalid secondary/subordinate range PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot() ...
2012-10-01mlx4_core: Disable SENSE_PORT for multifunction devicesRoland Dreier
In the current driver, the SENSE_PORT firmware command is issued as a "wrapped" command, but the command handling code doesn't have a wrapper, so it will never do anything other than log an error message. The latest ConnectX-3 2.11.500 firmware reports the SENSE_PORT capability even in multi-function (SR-IOV) mode, so the driver will try to issue the command. At least until the driver has a proper wrapper for SENSE_PORT, make sure we disable the command for multi-function devices. Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAsRoland Dreier
Instead of having a hard-coded "PCI device ID != 0x1003" (which obviously breaks as newer devices with ID != 0x1003 become available), instead let's set a flag in our PCI device table for the older devices where we're supposed to force using SENSE_PORT. This also avoids enabling SENSE_PORT for virtual functions by mistake. Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01mlx4_core: Stash PCI ID driver_data in mlx4_priv structureRoland Dreier
That way we can check flags later on, when we've finished with the pci_device_id structure. Also convert the "is VF" flag to an enum: "Never do in the preprocessor what can be done in C." Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01mlx4_core: Fix crash on uninitialized priv->cmd.slave_semRoland Dreier
On an SR-IOV master device, __mlx4_init_one() calls mlx4_init_hca() before mlx4_multi_func_init(). However, for unlucky configurations, mlx4_init_hca() might call mlx4_SENSE_PORT() (via mlx4_dev_cap()), and that calls mlx4_cmd_imm() with MLX4_CMD_WRAPPED set. However, on a multifunction device with MLX4_CMD_WRAPPED, __mlx4_cmd() calls into mlx4_slave_cmd(), and that immediately tries to do down(&priv->cmd.slave_sem); but priv->cmd.slave_sem isn't initialized until mlx4_multi_func_init() (which we haven't called yet). The next thing it tries to do is access priv->mfunc.vhcr, but that hasn't been allocated yet. Fix this by moving the initialization of slave_sem and vhcr up into mlx4_cmd_init(). Also, since slave_sem is really just being used as a mutex, convert it into a slave_cmd_mutex. Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01mlx4_core: Trivial cleanups to driver log messagesRoland Dreier
Also put format string onto one line. Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01mlx4_core: Trivial readability fix: "0X30" -> "0x30"Roland Dreier
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculationsJack Morgenstein
Previously, the structure of a guest's proxy QPs followed the structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1, qp1 port 2, ...). The guest then did offset calculations on the sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP(). This is now changed so that the guest does no offset calculations regarding proxy or tunnel QPs to use. This change frees the PPF from needing to adhere to a specific order in allocating proxy and tunnel QPs. Now QUERY_FUNC_CAP provides each port individually with its proxy qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are used directly where required (with no offset calculations). To accomplish this change, several fields were added to the phys_caps structure for use by the PPF and by non-SR-IOV mode: base_sqpn -- in non-sriov mode, this was formerly sqp_start. base_proxy_sqpn -- the first physical proxy qp number -- used by PPF base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF. The current code in the PPF still adheres to the previous layout of sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this layout without affecting VF or (paravirtualized) PF code. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01mlx4: Paravirtualize Node Guids for slavesJack Morgenstein
This is necessary in order to support > 1 VF/PF in a VM for software that uses the node guid as a discriminator, such as librdmacm. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>