summaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp/ipoib
AgeCommit message (Collapse)Author
2017-11-15IB/ipoib: Change list_del to list_del_init in the tx objectFeras Daoud
[ Upstream commit 27d41d29c7f093f6f77843624fbb080c1b4a8b9c ] Since ipoib_cm_tx_start function and ipoib_cm_tx_reap function belong to different work queues, they can run in parallel. In this case if ipoib_cm_tx_reap calls list_del and release the lock, ipoib_cm_tx_start may acquire it and call list_del_init on the already deleted object. Changing list_del to list_del_init in ipoib_cm_tx_reap fixes the problem. Fixes: 839fcaba355a ("IPoIB: Connected mode experimental support") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/ipoib: Replace list_del of the neigh->list with list_del_initFeras Daoud
[ Upstream commit c586071d1dc8227a7182179b8e50ee92cc43f6d2 ] In order to resolve a situation where a few process delete the same list element in sequence and cause panic, list_del is replaced with list_del_init. In this case if the first process that calls list_del releases the lock before acquiring it again, other processes who can acquire the lock will call list_del_init. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/ipoib: rtnl_unlock can not come after free_netdevFeras Daoud
[ Upstream commit 89a3987ab7a923c047c6dec008e60ad6f41fac22 ] The ipoib_vlan_add function calls rtnl_unlock after free_netdev, rtnl_unlock not only releases the lock, but also calls netdev_run_todo. The latter function browses the net_todo_list array and completes the unregistration of all its net_device instances. If we call free_netdev before rtnl_unlock, then netdev_run_todo call over the freed device causes panic. To fix, move rtnl_unlock call before free_netdev call. Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/ipoib: Fix deadlock over vlan_mutexFeras Daoud
[ Upstream commit 1c3098cdb05207e740715857df7b0998e372f527 ] This patch fixes Deadlock while executing ipoib_vlan_delete. The function takes the vlan_rwsem semaphore and calls unregister_netdevice. The later function calls ipoib_mcast_stop_thread that cause workqueue flush. When the queue has one of the ipoib_ib_dev_flush_xxx events, a deadlock occur because these events also tries to catch the same vlan_rwsem semaphore. To fix, unregister_netdevice should be called after releasing the semaphore. Fixes: cbbe1efa4972 ("IPoIB: Fix deadlock between ipoib_open() and child interface create") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20IB/IPoIB: ibX: failed to create mcg debug fileShamir Rabinovitch
commit 771a52584096c45e4565e8aabb596eece9d73d61 upstream. When udev renames the netdev devices, ipoib debugfs entries does not get renamed. As a result, if subsequent probe of ipoib device reuse the name then creating a debugfs entry for the new device would fail. Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part of ipoib event handling in order to avoid any race condition between these. Fixes: 1732b0ef3b3a ([IPoIB] add path record information in debugfs) Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15IB/IPoIB: Add destination address when re-queue packetErez Shitrit
commit 2b0841766a898aba84630fb723989a77a9d3b4e6 upstream. When sending packet to destination that was not resolved yet via path query, the driver keeps the skb and tries to re-send it again when the path is resolved. But when re-sending via dev_queue_xmit the kernel doesn't call to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb and to put the destination address inside them. In that way the dev_start_xmit will have the correct destination, and the driver won't take the destination from the skb->data, while nothing exists there, which causes to packet be be dropped. The test flow is: 1. Run the SM on remote node, 2. Restart the driver. 4. Ping some destination, 3. Observe that first ICMP request will be dropped. Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15IB/ipoib: Fix deadlock between rmmod and set_modeFeras Daoud
commit 0a0007f28304cb9fc87809c86abb80ec71317f20 upstream. When calling set_mode from sys/fs, the call flow locks the sys/fs lock first and then tries to lock rtnl_lock (when calling ipoib_set_mod). On the other hand, the rmmod call flow takes the rtnl_lock first (when calling unregister_netdev) and then tries to take the sys/fs lock. Deadlock a->b, b->a. The problem starts when ipoib_set_mod frees it's rtnl_lck and tries to get it after that. set_mod: [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90 [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20 [<ffffffff814fed2b>] mutex_lock+0x2b/0x50 [<ffffffff81448675>] rtnl_lock+0x15/0x20 [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib] [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib] [<ffffffff8134b840>] dev_attr_store+0x20/0x30 [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170 [<ffffffff8117b068>] vfs_write+0xb8/0x1a0 [<ffffffff8117ba81>] sys_write+0x51/0x90 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b rmmod: [<ffffffff81279ffc>] ? put_dec+0x10c/0x110 [<ffffffff8127a2ee>] ? number+0x2ee/0x320 [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0 [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0 [<ffffffff8127b550>] ? string+0x40/0x100 [<ffffffff814fe323>] wait_for_common+0x123/0x180 [<ffffffff81060250>] ? default_wake_function+0x0/0x20 [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0 [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20 [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270 [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0 [<ffffffff81273f66>] kobject_del+0x16/0x40 [<ffffffff8134cd14>] device_del+0x184/0x1e0 [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0 [<ffffffff8143c05e>] rollback_registered+0xae/0x130 [<ffffffff8143c102>] unregister_netdevice+0x22/0x70 [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30 [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib] [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core] [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib] [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core] Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26IB/IPoIB: Remove can't use GFP_NOIO warningKamal Heib
commit 0b59970e7d96edcb3c7f651d9d48e1a59af3c3b0 upstream. Remove the warning print of "can't use of GFP_NOIO" to avoid prints in each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO. This print become more annoying when the IPoIB interface is configured to work in connected mode. Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09IPoIB: Avoid reading an uninitialized member variableBart Van Assche
commit 11b642b84e8c43e8597de031678d15c08dd057bc upstream. This patch avoids that Coverity reports the following: Using uninitialized value port_attr.state when calling printk Fixes: commit 94232d9ce817 ("IPoIB: Start multicast join process only on active ports") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-14IB/ipoib: move back IB LL address into the hard headerPaolo Abeni
After the commit 9207f9d45b0a ("net: preserve IP control block during GSO segmentation"), the GSO CB and the IPoIB CB conflict. That destroy the IPoIB address information cached there, causing a severe performance regression, as better described here: http://marc.info/?l=linux-kernel&m=146787279825501&w=2 This change moves the data cached by the IPoIB driver from the skb control lock into the IPoIB hard header, as done before the commit 936d7de3d736 ("IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses"). In order to avoid GRO issue, on packet reception, the IPoIB driver stash into the skb a dummy pseudo header, so that the received packets have actually a hard header matching the declared length. To avoid changing the connected mode maximum mtu, the allocated head buffer size is increased by the pseudo header length. After this commit, IPoIB performances are back to pre-regression value. v2 -> v3: rebased v1 -> v2: avoid changing the max mtu, increasing the head buf size Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-10Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
2016-10-07ipoib: Make ipoib_warn ratelimitedkernel@kyup.com
In certain cases it's possible to be flooded by warning messages. To cope with such situations make the ipoib_warn macro be ratelimited. To prevent accidental limiting of legitimate, bursty messages make the limit fairly liberal by allowing up to 100 messages in 10 seconds. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/ipoib_verbs: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "wq" queues mulitple work items viz &priv->restart_task, &priv->cm.rx_reap_task, &priv->cm.skb_task, &priv->neigh_reap_task, &priv->ah_reap_task, &priv->mcast_task and &priv->carrier_on_task. The work items require strict execution ordering. Hence, an ordered dedicated workqueue has been used. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/ipoib: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar
alloc_ordered_workqueue() replaces deprecated create_singlethread_workqueue(). The workqueue "ipoib_workqueue" that is used for all flush operations for the device. WQ_MEM_RECLAIM has been set since the flush operations may need to complete in order for other network functions to continue, and the memory reclaim operation might need the network functioning in order to make progress. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/ipoib: Don't allow MC joins during light MC flushAlex Vesker
This fix solves a race between light flush and on the fly joins. Light flush doesn't set the device to down and unset IPOIB_OPER_UP flag, this means that if while flushing we have a MC join in progress and the QP was attached to BC MGID we can have a mismatches when re-attaching a QP to the BC MGID. The light flush would set the broadcast group to NULL causing an on the fly join to rejoin and reattach to the BC MCG as well as adding the BC MGID to the multicast list. The flush process would later on remove the BC MGID and detach it from the QP. On the next flush the BC MGID is present in the multicast list but not found when trying to detach it because of the previous double attach and single detach. [18332.714265] ------------[ cut here ]------------ [18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core] ... [18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [18332.779411] 0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000 [18332.784960] 0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300 [18332.790547] ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280 [18332.796199] Call Trace: [18332.798015] [<ffffffff813fed47>] dump_stack+0x63/0x8c [18332.801831] [<ffffffff8109add1>] __warn+0xd1/0xf0 [18332.805403] [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20 [18332.809706] [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core] [18332.814384] [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib] [18332.820031] [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib] [18332.825220] [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib] [18332.830290] [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib] [18332.834911] [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0 [18332.839741] [<ffffffff81772bd1>] rollback_registered+0x31/0x40 [18332.844091] [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80 [18332.848880] [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib] [18332.853848] [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib] [18332.858474] [<ffffffff81520c08>] dev_attr_store+0x18/0x30 [18332.862510] [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50 [18332.866349] [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170 [18332.870471] [<ffffffff81207198>] __vfs_write+0x28/0xe0 [18332.874152] [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50 [18332.878274] [<ffffffff81208062>] vfs_write+0xa2/0x1a0 [18332.881896] [<ffffffff812093a6>] SyS_write+0x46/0xa0 [18332.885632] [<ffffffff810039b7>] do_syscall_64+0x57/0xb0 [18332.889709] [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25 [18332.894727] ---[ end trace 09ebbe31f831ef17 ]--- Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/ipoib: Fix memory corruption in ipoib cm mode connect flowErez Shitrit
When a new CM connection is being requested, ipoib driver copies data from the path pointer in the CM/tx object, the path object might be invalid at the point and memory corruption will happened later when now the CM driver will try using that data. The next scenario demonstrates it: neigh_add_path --> ipoib_cm_create_tx --> queue_work (pointer to path is in the cm/tx struct) #while the work is still in the queue, #the port goes down and causes the ipoib_flush_paths: ipoib_flush_paths --> path_free --> kfree(path) #at this point the work scheduled starts. ipoib_cm_tx_start --> copy from the (invalid)path pointer: (memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);) -> memory corruption. To fix that the driver now starts the CM/tx connection only if that specific path exists in the general paths database. This check is protected with the relevant locks, and uses the gid from the neigh member in the CM/tx object which is valid according to the ref count that was taken by the CM/tx. Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford
2016-08-04IB/ipoib: Report SG feature regardless of HW UD CSUM capabilityYuval Shaia
Decouple SG support from HW ability to do UD checksum. This coupling is for historical reasons and removed with 'commit ec5f06156423 ("net: Kill link between CSUM and SG features.")' During driver load it is assumed that device does not supports SG. The final decision is taken after creating UD QP based on device capability. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/ipoib: Use new device FW version stringIra Weiny
Using this allows for devices to specify the format of their firmware version rather than forcing a format. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Don't update neigh validity for unresolved entriesErez Shitrit
ipoib_neigh_get unconditionally updates the "alive" variable member on any packet send. This prevents the neighbor garbage collection from cleaning out a dead neighbor entry if we are still queueing packets for it. If the queue for this neighbor is full, then don't update the alive timestamp. That way the neighbor can time out even if packets are still being queued as long as none of them are being sent. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Disable bottom half when dealing with device addressMark Bloch
Align locking usage when touching device address with rest of the kernel. Lock the bottom half when doing so using netif_addr_lock_bh. This also solves the following case as reported by lockdep: CPU0 CPU1 ---- ---- lock(_xmit_INFINIBAND); local_irq_disable(); lock(&(&mc->mca_lock)->rlock); lock(_xmit_INFINIBAND); <Interrupt> lock(&(&mc->mca_lock)->rlock); *** DEADLOCK *** Fixes: 492a7e67ff83 ("IB/IPoIB: Allow setting the device address") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Fix race between ipoib_remove_one to sysfs functionsErez Shitrit
In ipoib_remove_one the driver holds the rtnl_lock and tries to do some operation like dev_change_flags or unregister_netdev, while sysfs callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to free its sysfs directory, and we will get a->b, b->a deadlock. Trace like the following: schedule+0x37/0x80 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0xb5/0x120 mutex_lock+0x23/0x40 rtnl_lock+0x15/0x20 netdev_run_todo+0x17c/0x320 rtnl_unlock+0xe/0x10 ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib] delete_child+0x54/0x80 [ib_ipoib] dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 mutex_lock+0x16/0x40 SyS_write+0x55/0xc0 entry_SYSCALL_64_fastpath+0x16/0x75 And schedule+0x37/0x80 __kernfs_remove+0x1a8/0x260 ? wake_atomic_t_function+0x60/0x60 kernfs_remove+0x25/0x40 sysfs_remove_dir+0x50/0x80 kobject_del+0x18/0x50 device_del+0x19f/0x260 netdev_unregister_kobject+0x6a/0x80 rollback_registered_many+0x1fd/0x340 rollback_registered+0x3c/0x70 unregister_netdevice_queue+0x55/0xc0 unregister_netdev+0x20/0x30 ipoib_remove_one+0x114/0x1b0 [ib_ipoib] ib_unregister_client+0x4a/0x170 [ib_core] ? find_module_all+0x71/0xa0 ipoib_cleanup_module+0x10/0x94 [ib_ipoib] SyS_delete_module+0x1b5/0x210 entry_SYSCALL_64_fastpath+0x16/0x75 The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to get out from the sysfs function. Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-28Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "This is the second group of code for the 4.7 merge window. It looks large, but only in one sense. I'll get to that in a minute. The list of changes here breaks down as follows: - Dynamic counter infrastructure in the IB drivers This is a sysfs based code to allow free form access to the hardware counters RDMA devices might support so drivers don't need to code this up repeatedly themselves - SendOnlyFullMember multicast support - IB router support - A couple misc fixes - The big item on the list: hfi1 driver updates, plus moving the hfi1 driver out of staging There was a group of 15 patches in the hfi1 list that I thought I had in the first pull request but they weren't. So that added to the length of the hfi1 section here. As far as these go, everything but the hfi1 is pretty straight forward. The hfi1 is, if you recall, the driver that Al had complaints about how it used the write/writev interfaces in an overloaded fashion. The write portion of their interface behaved like the write handler in the IB stack proper and did bi-directional communications. The writev interface, on the other hand, only accepts SDMA request structures. The completions for those structures are sent back via an entirely different event mechanism. With the security patch, we put security checks on the write interface, however, we also knew they would be going away soon. Now, we've converted the write handler in the hfi1 driver to use ioctls from the IB reserved magic area for its bidirectional communications. With that change, Intel has addressed all of the items originally on their TODO when they went into staging (as well as many items added to the list later). As such, I moved them out, and since they were the last item in the staging/rdma directory, and I don't have immediate plans to use the staging area again, I removed the staging/rdma area. Because of the move out of staging, as well as a series of 5 patches in the hfi1 driver that removed code people thought should be done in a different way and was optional to begin with (a snoop debug interface, an eeprom driver for an eeprom connected directory to their hfi1 chip and not via an i2c bus, and a few other things like that), the line count, especially the removal count, is high" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (56 commits) staging/rdma: Remove the entire rdma subdirectory of staging IB/core: Make device counter infrastructure dynamic IB/hfi1: Fix pio map initialization IB/hfi1: Correct 8051 link parameter settings IB/hfi1: Update pkey table properly after link down or FM start IB/rdamvt: Fix rdmavt s_ack_queue sizing IB/rdmavt: Max atomic value should be a u8 IB/hfi1: Fix hard lockup due to not using save/restore spin lock IB/hfi1: Add tracing support for send with invalidate opcode IB/hfi1, qib: Add ieth to the packet header definitions IB/hfi1: Move driver out of staging IB/hfi1: Do not free hfi1 cdev parent structure early IB/hfi1: Add trace message in user IOCTL handling IB/hfi1: Remove write(), use ioctl() for user cmds IB/hfi1: Add ioctl() interface for user commands IB/hfi1: Remove unused user command IB/hfi1: Remove snoop/diag interface IB/hfi1: Remove EPROM functionality from data device IB/hfi1: Remove UI char device IB/hfi1: Remove multiple device cdev ...
2016-05-25IB/IPoIB: Allow setting the device addressMark Bloch
In IB networks, and specifically in IPoIB/rdmacm traffic, the device address of an IPoIB interface is used as a means to exchange information between nodes needed for communication. Currently an IPoIB interface will always be created with a device address based on its node GUID without a way to change that. This change adds the ability to set the device address of an IPoIB interface by value. We use the set mac address ndo to do that. The flow should be broken down to two: 1) The GID value is already in the GID table, in this case the interface will be able to set carrier up. 2) The GID value is not yet in the GID table, in this case the interface won't try to join the multicast group and will wait (listen on GID_CHANGE event) until the GID is inserted. In order to track those changes, we add a new flag: * IPOIB_FLAG_DEV_ADDR_SET. When set, it means the dev_addr is a based on a value in the gid table. this bit will be cleared upon a dev_addr change triggered by the user and set after validation. Per IB spec the port GUID can't change if the module is loaded. port GUID is the basis for GID at index 0 which is the basis for the default device address of a ipoib interface. The issue is that there are devices that don't follow the spec, they change the port GUID while HCA is powered on, so in order not to break userspace applications. We need to check if the user wanted to control the device address and we assume that if he sets the device address back to be based on GID index 0, he no longer wishs to control it. In order to track this, we add an additional flag: * IPOIB_FLAG_DEV_ADDR_CTRL When setting the device address, there is no validation of the upper twelve bytes of the device address (flags, qpn, subnet prefix) as those bytes are not under the control of the user. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-25IB/ipoib: Support SendOnlyFullMember MCG for SendOnly joinErez Shitrit
Check (via an SA query) if the SM supports the new option for SendOnly multicast joins. If the SM supports that option it will use the new join state to create such multicast group. If SendOnlyFullMember is supported, we wouldn't use faked FullMember state join for SendOnly MCG, use the correct state if supported. This check is performed at every invocation of mcast_restart task, to be sure that the driver stays in sync with the current state of the SM. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-20Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Primary 4.7 merge window changes - Updates to the new Intel X722 iWARP driver - Updates to the hfi1 driver - Fixes for the iw_cxgb4 driver - Misc core fixes - Generic RDMA READ/WRITE API addition - SRP updates - Misc ipoib updates - Minor mlx5 updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (148 commits) IB/mlx5: Fire the CQ completion handler from tasklet net/mlx5_core: Use tasklet for user-space CQ completion events IB/core: Do not require CAP_NET_ADMIN for packet sniffing IB/mlx4: Fix unaligned access in send_reply_to_slave IB/mlx5: Report Scatter FCS device capability when supported IB/mlx5: Add Scatter FCS support for Raw Packet QP IB/core: Add Scatter FCS create flag IB/core: Add Raw Scatter FCS device capability IB/core: Add extended device capability flags i40iw: pass hw_stats by reference rather than by value i40iw: Remove unnecessary synchronize_irq() before free_irq() i40iw: constify i40iw_vf_cqp_ops structure IB/mlx5: Add UARs write-combining and non-cached mapping IB/mlx5: Allow mapping the free running counter on PROT_EXEC IB/mlx4: Use list_for_each_entry_safe IB/SA: Use correct free function IB/core: Fix a potential array overrun in CMA and SA agent IB/core: Remove unnecessary check in ibnl_rcv_msg IB/IWPM: Fix a potential skb leak RDMA/nes: replace custom print_hex_dump() ...
2016-05-13IB/ipoib: Add readout of statistics using ethtoolHans Westgaard Ry
IPoIB collects statistics of traffic including number of packets sent/received, number of bytes transferred, and certain errors. This patch makes these statistics available to be queried by ethtool. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13infiniband/ulp/ipoib: remove pkey_mutexSebastian Andrzej Siewior
The last user of pkey_mutex was removed in db84f8803759 ("IB/ipoib: Use P_Key change event instead of P_Key polling mechanism") but the lock remained. This patch removes it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-04treewide: replace dev->trans_start update with helperFlorian Westphal
Replace all trans_start updates with netif_trans_update helper. change was done via spatch: struct net_device *d; @@ - d->trans_start = jiffies + netif_trans_update(d) Compile tested only. Cc: user-mode-linux-devel@lists.sourceforge.net Cc: linux-xtensa@linux-xtensa.org Cc: linux1394-devel@lists.sourceforge.net Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: MPT-FusionLinux.pdl@broadcom.com Cc: linux-scsi@vger.kernel.org Cc: linux-can@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linux-omap@vger.kernel.org Cc: linux-hams@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: linux-bluetooth@vger.kernel.org Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04drivers: replace dev->trans_start accesses with dev_trans_startFlorian Westphal
a trans_start struct member exists twice: - in struct net_device (legacy) - in struct netdev_queue Instead of open-coding dev->trans_start usage to obtain the current trans_start value, use dev_trans_start() instead. This is not exactly the same, as dev_trans_start also considers the trans_start values of the netdev queues owned by the device and provides the most recent one. For legacy devices this doesn't matter as dev_trans_start can cope with netdev trans_start values of 0 (they are ignored). This is a prerequisite to eventual removal of dev->trans_start. Cc: linux-rdma@vger.kernel.org Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21IB/ipoib: Allow mcast packets from other VFsEli Cohen
With SRIOV enabled, two VFs on the same HCA which have the same port LID and may have the same QP number. To enable receiving multicasts from such VFs, further qualify the check: ignore the receive only if, in addition, the packet source gid equals the receiving VF's source gid. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/ipoib: Add ndo operations for configuring VFsEli Cohen
Add ndo operations to the network driver that enables configuring the following operations: ipoib_set_vf_link_state - configure the VF link policy ipoib_get_vf_config - get link state configuration ipoib_set_vf_guid - set a VF port or node GUID ipoib_get_vf_stats - get statistics of a VF Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/{core, ulp} Support above 32 possible device capability flagsLeon Romanovsky
The old bitwise device_cap_flags variable was limited to u32 which has all bits already defined. In order to overcome it, we converted device_cap_flags variable to be u64 type. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03IB/ipoib: Add handling for sending of skb with many fragsHans Westgaard Ry
IPoIB converts skb-fragments to sge adding 1 extra sge when SG is enabled. Current codepath assumes that the max number of sge a device support is at least MAX_SKB_FRAGS+1, there is no interaction with upper layers to limit number of fragments in an skb if a device suports fewer sges. The assumptions also lead to requesting a fixed number of sge when IPoIB creates queue-pairs with SG enabled. A fallback/slowpath is implemented using skb_linearize to handle cases where the conversion would result in more sges than supported. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-12IB/ipoib: fix for rare multicast join race conditionAlex Estrin
A narrow window for race condition still exist between multicast join thread and *dev_flush workers. A kernel crash caused by prolong erratic link state changes was observed (most likely a faulty cabling): [167275.656270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib] [167275.674443] PGD 0 [167275.677373] Oops: 0000 [#1] SMP ... [167275.977530] Call Trace: [167275.982225] [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib] [167275.992024] [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490 [ib_ipoib] [167276.002149] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [167276.010754] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [167276.019088] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [167276.027737] [<ffffffff810a5aef>] kthread+0xcf/0xe0 Here was a hit spot: ipoib_mcast_join() { .............. rec.qkey = priv->broadcast->mcmember.qkey; ^^^^^^^ ..... } Proposed patch should prevent multicast join task to continue if link state change is detected. Signed-off-by: Alex Estrin <alex.estrin@intel.com> Changes from v4: - as suggested by Doug Ledford, optimized spinlock usage, i.e. ipoib_mcast_join() is called with lock held. Changes from v3: - sync with priv->lock before flag check. Chages from v2: - Move check for OPER_UP flag state to mcast_join() to ensure no event worker is in progress. - minor style fixes. Changes from v1: - No need to lock again if error detected. Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-04IB/IPoIB: Do not set skb truesize since using one linearskbCarol L Soto
We are seeing this warning: at net/core/skbuff.c:4174 and before commit a44878d10063 ("IB/ipoib: Use one linear skb in RX flow") skb truesize was not being set when ipoib was using just one skb. Removing this line avoids the warning when running tcp tests like iperf. Fixes: a44878d10063 ("IB/ipoib: Use one linear skb in RX flow") Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/IPoIB: Fix kernel panic on multicast flowErez Shitrit
ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the parameter mcast->dev. That mcast is a temporary (used as an iterator) variable that may be uninitialized. There is no need to send the variable dev to the function, as each mcast has its dev as a member in the mcast struct. This causes the next panic: RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib] RSP: 0018: EFLAGS: 00010246 RAX: f0201 RBX: 24e00 RCX: 00000 .... .... Stack: Call Trace: ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib] ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib] process_one_work+0x164/0x470 worker_thread+0x11d/0x420 ... Fixes: 5a0e81f6f483 ('IB/IPoIB: factor out common multicast list removal code') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reported-by: Doron Tsur <doront@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB/IPoIB: Move multicast specific code out of ipoib_main.cChristoph Lameter
Code cleanup to move multicast specific code that checks for a sendonly join to ipoib_multicast.c. This allows the removal of the export of __ipoib_mcast_find(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB/IPoIB: factor out common multicast list removal codeChristoph Lameter
Code cleanup to remove multicast specific code from ipoib_main.c The removal of a list of multicast groups occurs in three places. Create a new function ipoib_mcast_remove_list(). Use this new function in ipoib_main.c too. That in turn allows the dropping of two functions that were exported from ipoib_multicast.c for expiration of mc groups. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-22Merge branches '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5Doug Ledford
Signed-off-by: Doug Ledford <dledford@redhat.com> Conflicts: drivers/infiniband/ulp/iser/iser_verbs.c
2015-12-22IB/ulps: Avoid calling ib_query_deviceOr Gerlitz
Instead, use the cached copy of the attributes present on the device. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-11IB: add a proper completion queue abstractionChristoph Hellwig
This adds an abstraction that allows ULPs to simply pass a completion object and completion callback with each submitted WR and let the RDMA core handle the nitty gritty details of how to handle completion interrupts and poll the CQ. In detail there is a new ib_cqe structure which just contains the completion callback, and which can be used to get at the containing object using container_of. It is pointed to by the WR and WC as an alternative to the wr_id field, similar to how many ULPs already use the field to store a pointer using casts. A driver using the new completion callbacks allocates it's CQs using the new ib_create_cq API, which in addition to the number of CQEs and the completion vectors also takes a mode on how we poll for CQEs. Three modes are available: direct for drivers that never take CQ interrupts and just poll for them, softirq to poll from softirq context using the to be renamed blk-iopoll infrastructure which takes care of rearming and budgeting, or a workqueue for consumer who want to be called from user context. Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote the current version of the workqueue code because my two previous attempts sucked too much and converted the iSER initiator to the new API. Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-10-29Merge branch 'wr-cleanup' into k.o/for-4.4Doug Ledford
2015-10-22IB/core: Add netdev and gid attributes paramteres to cacheMatan Barak
Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-13IB/ipoib: For sendonly join free the multicast group on leaveChristoph Lameter
When we leave the multicast group on expiration of a neighbor we do not free the mcast structure. This results in a memory leak that causes ib_dealloc_pd to fail and print a WARN_ON message and backtrace. Fixes: bd99b2e05c4d (IB/ipoib: Expire sendonly multicast joins) Signed-off-by: Christoph Lameter <cl@linux.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-08IB: split struct ib_send_wrChristoph Hellwig
This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>
2015-09-26IB/ipoib: increase the max mcast backlog queueDoug Ledford
When performing sendonly joins, we queue the packets that trigger the join until the join completes. This may take on the order of hundreds of milliseconds. It is easy to have many more than three packets come in during that time. Expand the maximum queue depth in order to try and prevent dropped packets during the time it takes to join the multicast group. Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25IB/ipoib: Make sendonly multicast joins create the mcast groupDoug Ledford
Since IPoIB should, as much as possible, emulate how multicast sends work on Ethernet for regular TCP/IP apps, there should be no requirement to subscribe to a multicast group before your sends are properly sent. However, due to the difference in how multicast is handled on InfiniBand, we must join the appropriate multicast group before we can send to it. Previously we tried not to trigger the auto-create feature of the subnet manager when doing this because we didn't have tracking of these sendonly groups and the auto-creation might never get undone. The previous patch added timing to these sendonly joins and allows us to leave them after a reasonable idle expiration time. So supply all of the information needed to auto-create group. Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25IB/ipoib: Expire sendonly multicast joinsChristoph Lameter
On neighbor expiration, check to see if the neighbor was actually a sendonly multicast join, and if so, leave the multicast group as we expire the neighbor. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>