summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2007-05-29IB/cm: Fix stale connection detectionSean Hefty
The ib_cm can incorrectly detect a stale connection (a new connection request for a QPN that is already connected) as a duplicate connection request. Separate the handling of potential duplicate REQs from stale connections. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29IPoIB/cm: Fix performance regression on MellanoxMichael S. Tsirkin
commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe performance regression on Mellanox cards, because keeping a QP in the error state for extended periods of time moves hardware to the slow path (until the QP is destroyed). For example, MPI latency goes from ~3 usecs to ~7 usecs. Fix this by posting a send WR on one of the QPs that are being flushed, instead of using a separate drain QP that is kept in the error state. This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=636>, reported and bisected by Scott Weitzenkamp at Cisco and debugged by Sasha Mikheev at Voltaire. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29IB/mthca: Fix handling of send CQE with error for QPs connected to SRQMichael S. Tsirkin
mthca_free_err_wqe() currently treats both send and receive CQEs identically if a QP is using an SRQ. But for Tavor hardware, send CQEs with error can be chained together even if the RQ is part of SRQ, so we may miss some CQEs. Fix by following the WQE chain for all send CQEs even for non-SRQ QPs. This fixes crashes in IPoIB CM: <https://bugs.openfabrics.org//show_bug.cgi?id=604> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24IPoIB/cm: Drain cq in ipoib_cm_dev_stop()Michael S. Tsirkin
Since NAPI polling is disabled while ipoib_cm_dev_stop() is running, ipoib_cm_dev_stop() must poll the CQ itself in order to see the packets draining. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24IPoIB/cm: Fix timeout check in ipoib_cm_dev_stop()Michael S. Tsirkin
time_after() was used backwards, so the timeout occurred immediately. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24IB/ehca: Fix number of send WRs reported for new QPStefan Roscher
Due to a typo, the driver was reporting the wrong number of "actual send WRs" after ehca_create_qp(). Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24IB/mlx4: Initialize send queue entry ownership bitsEli Cohen
We need to initialize the owner bit of send queue WQEs to hardware ownership whenever the QP is modified from reset to init, not just when the QP is first allocated. This avoids having the hardware process stale WQEs when the QP is moved to reset but not destroyed and then modified to init again. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-23IB/mlx4: Don't allocate RQ doorbell if using SRQRoland Dreier
If a QP is attached to a shared receive queue (SRQ), then it doesn't have a receive queue (RQ). So don't allocate an RQ doorbell (or map a doorbell from userspace for userspace QPs) for that QP. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: IB/cm: Improve local id allocation IPoIB/cm: Fix SRQ WR leak IB/ipoib: Fix typos in error messages IB/mlx4: Check if SRQ is full when posting receive IB/mlx4: Pass send queue sizes from userspace to kernel IB/mlx4: Fix check of opcode in mlx4_ib_post_send() mlx4_core: Fix array overrun in dump_dev_cap_flags() IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions IB/mthca: Fix RESET to ERROR transition IB/mlx4: Set GRH:HopLimit when sending globally routed MADs IB/mthca: Set GRH:HopLimit when building MLX headers IB/mlx4: Fix check of max_qp_dest_rdma in modify QP IB/mthca: Fix use-after-free on device restart IB/ehca: Return proper error code if register_mr fails IPoIB: Handle P_Key table reordering IB/core: Use start_port() and end_port() IB/core: Add helpers for uncached GID and P_Key searches IB/ipath: Fix potential deadlock with multicast spinlocks IB/core: Free umem when mm is already gone
2007-05-21IB/cm: Improve local id allocationMichael S. Tsirkin
The IB CM uses an idr for local id allocations, with a running counter as start_id. This fails to generate distinct ids if 1. An id is constantly created and destroyed 2. A chunk of ids just beyond the current next_id value is occupied This in turn leads to an increased chance of connection request being mis-detected as a duplicate, sometimes for several retries, until next_id gets past the block of allocated ids. This has been observed in practice. As a fix, remember the last id allocated and start immediately above it. This also fixes a problem with the old code, where next_id might overflow and become negative. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21IPoIB/cm: Fix SRQ WR leakMichael S. Tsirkin
SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on and off will, with time, leak out all WRs and then all connections will start getting RNR NAKs. Fix this in the way suggested by spec: move the QP being destroyed to the error state, wait for "Last WQE Reached" event and then post WR on a "drain QP" connected to the same CQ. Once we observe a completion on the drain QP, it's safe to call ib_destroy_qp. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21IB/ipoib: Fix typos in error messagesMichael S. Tsirkin
Trivial error message fixups. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21Detach sched.h from mm.hAlexey Dobriyan
First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21IB/mlx4: Check if SRQ is full when posting receiveRoland Dreier
Make mlx4_post_srq_recv() fail if the SRQ is full (head == tail). Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-20IB/mlx4: Pass send queue sizes from userspace to kernelEli Cohen
Pass the number of WQEs for the send queue and their size from userspace to the kernel to avoid having to keep the QP size calculations in sync between the kernel driver and libmlx4. This fixes a bug seen with the current mlx4_ib driver and current libmlx4 caused by a difference in the calculated sizes for SQ WQEs. Also, this gives more flexibility for userspace to experiment with using multiple WQE BBs for a single SQ WQE. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mlx4: Fix check of opcode in mlx4_ib_post_send()Roland Dreier
wr->opcode is invalid if it's >= ARRAY_SIZE(mlx4_ib_opcode), not just strictly >. This was spotted by the Coverity checker (CID 1643). Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mlx4: Fix RESET to RESET and RESET to ERROR transitionsMichael S. Tsirkin
According to the IB spec, a QP can be moved from RESET back to RESET or to the ERROR state, but mlx4 firmware does not support this and returns an error if we try. Fix the RESET to RESET transition by just returning 0 without doing anything, and fix RESET to ERROR by moving the QP from RESET to INIT with dummy parameters and then transitioning from INIT to ERROR. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mthca: Fix RESET to ERROR transitionMichael S. Tsirkin
According to the IB spec, a QP can be moved from RESET to the ERROR state, but mthca firmware does not support this and returns an error if we try. Work around this FW limitation by moving the QP from RESET to INIT with dummy parameters and then transitioning from INIT to ERROR. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mlx4: Set GRH:HopLimit when sending globally routed MADsRoland Dreier
This is the same issue discovered in mthca by Rolf Manderscheid <rvm@obsidianresearch.com>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mthca: Set GRH:HopLimit when building MLX headersRolf Manderscheid
Global CM packets used by rmda_cm were being sent with a GRH:hopLimit of zero, causing them to be dropped by the router. The problem is a missing initialization of the hop_limit field in mthca_read_ah(), which was called by build_mlx_header() when sending a MAD on QP1. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mlx4: Fix check of max_qp_dest_rdma in modify QPEli Cohen
max_qp_dest_rdma is already in natural units - no need to shift. This was discovered by a test that deliberately requests more outstanding atomic operation than the device supports. Found by Sagi Rotem at Mellanox. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/mthca: Fix use-after-free on device restartAli Ayoub
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/ehca: Return proper error code if register_mr failsHoang-Nam Nguyen
Set the return code of ehca_register_mr() to ENOMEM if the corresponding firmware call fails due to out of resources. Some other error codes were explicitly mapped to EINVAL -- just remove those cases so they get mapped to the default case, which already returns EINVAL anyway. Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IPoIB: Handle P_Key table reorderingYosef Etigin
SM reconfiguration or failover possibly causes a shuffling of the values in the P_Key table. Right now, IPoIB only queries for the P_Key index once when it creates the device QP, and hence there are problems if the index of a P_Key value changes. Fix this by using the PKEY_CHANGE event to trigger a recheck of the P_Key index. Signed-off-by: Yosef Etigin <yosefe@voltaire.com> Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/core: Use start_port() and end_port()Roland Dreier
Clean up ib_query_port() and ib_modify_port() slightly by using the just-added start_port() and end_port() helpers. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/core: Add helpers for uncached GID and P_Key searchesYosef Etigin
Add ib_find_gid() and ib_find_pkey() functions that use uncached device queries. The calls might block but the returns are always up-to-date. Cache P_Key and GID table lengths in core to avoid extra port info queries. Signed-off-by: Yosef Etigin <yosefe@voltaire.com> Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/ipath: Fix potential deadlock with multicast spinlocksRoland Dreier
Lockdep found the following potential deadlock between mcast_lock and n_mcast_grps_lock: mcast_lock is taken from both interrupt context and process context, so spin_lock_irqsave() must be used to take it. n_mcast_grps_lock is only taken from process context, so at first it seems safe to take it with plain spin_lock(); however, it also nests inside mcast_lock, and hence we could deadlock: cpu A cpu B ipath_mcast_add(): spin_lock_irq(&mcast_lock); ipath_mcast_detach(): spin_lock(&n_mcast_grps_lock); <enter interrupt> ipath_mcast_find(): spin_lock_irqsave(&mcast_lock); spin_lock(&n_mcast_grps_lock); Fix this by using spin_lock_irq() to take n_mcast_grps_lock. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19IB/core: Free umem when mm is already goneEli Cohen
Free umem when task's mm is already destroyed by the time ib_umem_release gets called. Found by Dotan Barak at Mellanox. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-15Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: IPoIB/cm: Optimize stale connection detection IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ IB/mthca: Fix posting >255 recv WRs for Tavor RDMA/cma: Add check to validate that cm_id is bound to a device RDMA/cma: Fix synchronization with device removal in cma_iw_handler RDMA/cma: Simplify device removal handling code IB/ehca: Disable scaling code by default, bump version number IB/ehca: Beautify sysfs attribute code and fix compiler warnings IB/ehca: Remove _irqsave, move #ifdef IB/ehca: Fix AQP0/1 QP number IB/ehca: Correctly set GRH mask bit in ehca_modify_qp() IB/ehca: Serialize hypervisor calls in ehca_register_mr() IB/ipath: Shadow the gpio_mask register IB/mlx4: Fix uninitialized spinlock for 32-bit archs mlx4_core: Remove unused doorbell_lock net: Trivial MLX4_DEBUG dependency fix.
2007-05-14IPoIB/cm: Optimize stale connection detectionMichael S. Tsirkin
In the presence of some running RX connections, we repeat queue_delayed_work calls each 4 RX WRs, which is a waste. It's enough to start stale task when a first passive connection is added, and rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding passive connections. This removes some code from RX data path. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQMichael S. Tsirkin
mthca_cq_clean() updates the CQ consumer index without moving CQEs back to HW ownership. As a result, the same WRID might get reported twice, resulting in a use-after-free. This was observed in IPoIB CM. Fix by moving all freed CQEs to HW ownership. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=617> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/mthca: Fix posting >255 recv WRs for TavorMichael S. Tsirkin
Fix posting lists of > 255 receive WRs for Tavor: rq.next_ind must be updated each doorbell, otherwise the next doorbell will use an incorrect index. Found by Ronni Zimmermann at Mellanox. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14RDMA/cma: Add check to validate that cm_id is bound to a deviceSean Hefty
Several checks in the rdma_cm check against the state of the cm_id, but only to validate that the cm_id is bound to an underlying transport specific CM and an RDMA device. Make the check explicit in what we're trying to check for, since we're not synchronizing against the cm_id state. This will allow a user to disconnect a cm_id or reject a connection after receiving a device removal event. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14RDMA/cma: Fix synchronization with device removal in cma_iw_handlerSean Hefty
The cma_iw_handler needs to validate the state of the rdma_cm_id before processing a new connection request to ensure that a device removal is not already being processed for the same rdma_cm_id. Without the state check, the user can receive simultaneous callbacks for the same cm_id, or a callback after they've destroyed the cm_id. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14RDMA/cma: Simplify device removal handling codeSean Hefty
Add a new routine and rename another to encapsulate common code for synchronizing with device removal. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/ehca: Disable scaling code by default, bump version numberJoachim Fenkes
- Scaling code is still considered experimental, so disable it by default - Increase version to SVNEHCA_0023 Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/ehca: Beautify sysfs attribute code and fix compiler warningsJoachim Fenkes
eHCA's sysfs attributes are now being created via sysfs_create_group(), making the process neatly table-driven. The return value is checked, thus fixing a few compiler warnings. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/ehca: Remove _irqsave, move #ifdefJoachim Fenkes
- In ehca_process_eq(), we're IRQ safe throughout the whole function, so we don't need another _irqsave in the middle of flight. - take_over_work() is only called by comp_pool_callback(), so it can move into the same #ifdef block. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/ehca: Fix AQP0/1 QP numberHoang-Nam Nguyen
AQP0/1 should report qp_num={0|1} and the actual QP# should be stored in struct ehca_qp, not the other way round. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/ehca: Correctly set GRH mask bit in ehca_modify_qp()Joachim Fenkes
The driver needs to always supply the "GRH present" flag to the hypervisor, whether it's true or false. Not supplying it (i.e. not setting the corresponding mask bit) amounts to a "perhaps", which we don't want. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/ehca: Serialize hypervisor calls in ehca_register_mr()Stefan Roscher
Some pSeries hypervisor versions show a race condition in the allocate MR hCall. Serialize this call per adapter to circumvent this problem. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/ipath: Shadow the gpio_mask registerArthur Jones
Once upon a time, GPIO interrupts were rare. But then a chip bug in the waldo series forced the use of a GPIO interrupt to signal packet reception. This greatly increased the frequency of GPIO interrupts which have the gpio_mask bits set on the waldo chips. Other bits in the gpio_status register are used for I2C clock and data lines, these bits are usually on. An "unlikely" annotation leftover from the old days was improperly applied to these bits, and an unnecessary chip mmio read was being accessed in the interrupt fast path on waldo. Remove the stagnant unlikely annotation in the interrupt handler and keep a shadow copy of the gpio_mask register to avoid the slow mmio read when testing for interruptable GPIO bits. Signed-off-by: Arthur Jones <arthur.jones@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14IB/mlx4: Fix uninitialized spinlock for 32-bit archsJack Morgenstein
uar_lock spinlock was used in mlx4_ib_cq_arm without being initialized (this only affects 32-bit archs, because uar_lock is not used on 64-bit archs and MLX4_INIT_DOORBELL_LOCK() is a NOP). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-10[S390] Kconfig: menus with depends on HAS_IOMEM.Martin Schwidefsky
Add "depends on HAS_IOMEM" to a number of menus to make them disappear for s390 which does not have I/O memory. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-10Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters IB: Put rlimit accounting struct in struct ib_umem IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
2007-05-09Add suspend-related notifications for CPU hotplugRafael J. Wysocki
Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adaptersRoland Dreier
Add an InfiniBand driver for Mellanox ConnectX adapters. Because these adapters can also be used as ethernet NICs and Fibre Channel HBAs, the driver is split into two modules: mlx4_core: Handles low-level things like device initialization and processing firmware commands. Also controls resource allocation so that the InfiniBand, ethernet and FC functions can share a device without stepping on each other. mlx4_ib: Handles InfiniBand-specific things; plugs into the InfiniBand midlayer. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-09IB: Put rlimit accounting struct in struct ib_umemRoland Dreier
When memory pinned with ib_umem_get() is released, ib_umem_release() needs to subtract the amount of memory being unpinned from mm->locked_vm. However, ib_umem_release() may be called with mm->mmap_sem already held for writing if the memory is being released as part of an munmap() call, so it is sometimes necessary to defer this accounting into a workqueue. However, the work struct used to defer this accounting is dynamically allocated before it is queued, so there is the possibility of failing that allocation. If the allocation fails, then ib_umem_release has no choice except to bail out and leave the process with a permanently elevated locked_vm. Fix this by allocating the structure to defer accounting as part of the original struct ib_umem, so there's no possibility of failing a later allocation if creating the struct ib_umem and pinning memory succeeds. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-09IB/uverbs: Export ib_umem_get()/ib_umem_release() to modulesRoland Dreier
Export ib_umem_get()/ib_umem_release() and put low-level drivers in control of when to call ib_umem_get() to pin and DMA map userspace, rather than always calling it in ib_uverbs_reg_mr() before calling the low-level driver's reg_user_mr method. Also move these functions to be in the ib_core module instead of ib_uverbs, so that driver modules using them do not depend on ib_uverbs. This has a number of advantages: - It is better design from the standpoint of making generic code a library that can be used or overridden by device-specific code as the details of specific devices dictate. - Drivers that do not need to pin userspace memory regions do not need to take the performance hit of calling ib_mem_get(). For example, although I have not tried to implement it in this patch, the ipath driver should be able to avoid pinning memory and just use copy_{to,from}_user() to access userspace memory regions. - Buffers that need special mapping treatment can be identified by the low-level driver. For example, it may be possible to solve some Altix-specific memory ordering issues with mthca CQs in userspace by mapping CQ buffers with extra flags. - Drivers that need to pin and DMA map userspace memory for things other than memory regions can use ib_umem_get() directly, instead of hacks using extra parameters to their reg_phys_mr method. For example, the mlx4 driver that is pending being merged needs to pin and DMA map QP and CQ buffers, but it does not need to create a memory key for these buffers. So the cleanest solution is for mlx4 to call ib_umem_get() in the create_qp and create_cq methods. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (77 commits) [POWERPC] Abolish powerpc_flash_init() [POWERPC] Early serial debug support for PPC44x [POWERPC] Support for the Ebony 440GP reference board in arch/powerpc [POWERPC] Add device tree for Ebony [POWERPC] Add powerpc/platforms/44x, disable platforms/4xx for now [POWERPC] MPIC U3/U4 MSI backend [POWERPC] MPIC MSI allocator [POWERPC] Enable MSI mappings for MPIC [POWERPC] Tell Phyp we support MSI [POWERPC] RTAS MSI implementation [POWERPC] PowerPC MSI infrastructure [POWERPC] Rip out the existing powerpc msi stubs [POWERPC] Remove use of 4level-fixup.h for ppc32 [POWERPC] Add powerpc PCI-E reset API implementation [POWERPC] Holly bootwrapper [POWERPC] Holly DTS [POWERPC] Holly defconfig [POWERPC] Add support for 750CL Holly board [POWERPC] Generalize tsi108 PCI setup [POWERPC] Generalize tsi108 PHY types ... Fixed conflict in include/asm-powerpc/kdebug.h manually Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>