summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2014-07-17Merge branches 'cxgb4' and 'mlx5' into for-nextRoland Dreier
2014-07-17IB/mlx5: Enable "block multicast loopback" for kernel consumersOr Gerlitz
In commit f360d88a2efd, we advertise blocking multicast loopback to both kernel and userspace consumers, but don't allow kernel consumers (e.g IPoIB) to use it with their UD QPs. Fix that. Fixes: f360d88a2efd ("IB/mlx5: Add block multicast loopback support") Reported-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-13RDMA/cxgb4: Call iwpm_init() only onceSteve Wise
We need to only register with the iwpm core once. Currently it is being done for every adapter, which causes a failure for each adapter but the first, making multiple adapters unusable. Fixes: 9eccfe109b27 ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08RDMA/cxgb4: Initialize the device status pageSteve Wise
The status page is mapped to user processes and allows sharing the device state between the kernel and user processes. This state isn't getting initialized and thus intermittently causes problems. Namely, the user process can mistakenly think the user doorbell writes are disabled which causes SQ work requests to never get fetched by HW. Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes"). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@vger.kernel.org> # v3.15 Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08RDMA/cxgb4: Clean up connection on ARP errorHariprasad S
Based on origninal work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08RDMA/cxgb4: Fix skb_leak in reject_cr()Hariprasad S
Based on origninal work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-13Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "The highlights this round include: - Add support for T10 PI pass-through between vhost-scsi + virtio-scsi (MST + Paolo + MKP + nab) - Add support for T10 PI in qla2xxx target mode (Quinn + MKP + hch + nab, merged through scsi.git) - Add support for percpu-ida pre-allocation in qla2xxx target code (Quinn + nab) - A number of iser-target fixes related to hardening the network portal shutdown path (Sagi + Slava) - Fix response length residual handling for a number of control CDBs (Roland + Christophe V.) - Various iscsi RFC conformance fixes in the CHAP authentication path (Tejas and Calsoft folks + nab) - Return TASK_SET_FULL status for tcm_fc(FCoE) DataIn + Response failures (Vasu + Jun + nab) - Fix long-standing ABORT_TASK + session reset hang (nab) - Convert iser-initiator + iser-target to include T10 bytes into EDTL (Sagi + Or + MKP + Mike Christie) - Fix NULL pointer dereference regression related to XCOPY introduced in v3.15 + CC'ed to v3.12.y (nab)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (34 commits) target: Fix NULL pointer dereference for XCOPY in target_put_sess_cmd vhost-scsi: Include prot_bytes into expected data transfer length TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire libiscsi, iser: Adjust data_length to include protection information scsi_cmnd: Introduce scsi_transfer_length helper target: Report correct response length for some commands target/sbc: Check that the LBA and number of blocks are correct in VERIFY target/sbc: Remove sbc_check_valid_sectors() Target/iscsi: Fix sendtargets response pdu for iser transport Target/iser: Fix a wrong dereference in case discovery session is over iser iscsi-target: Fix ABORT_TASK + connection reset iscsi_queue_req memory leak target: Use complete_all for se_cmd->t_transport_stop_comp target: Set CMD_T_ACTIVE bit for Task Management Requests target: cleanup some boolean tests target/spc: Simplify INQUIRY EVPD=0x80 tcm_fc: Generate TASK_SET_FULL status for response failures tcm_fc: Generate TASK_SET_FULL status for DataIN failures iscsi-target: Reject mutual authentication with reflected CHAP_C iscsi-target: Remove no-op from iscsit_tpg_del_portal_group iscsi-target: Fix CHAP_A parameter list handling ...
2014-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
2014-06-11libiscsi, iser: Adjust data_length to include protection informationSagi Grimberg
In case protection information exists over the wire iscsi header data length is required to include it. Use protection information aware scsi helpers to set the correct transfer length. In order to avoid breakage, remove iser transfer length checks for each task as they are not always true and somewhat redundant anyway. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mike Christie <michaelc@cs.wisc.edu> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11Target/iscsi: Fix sendtargets response pdu for iser transportSagi Grimberg
In case the transport is iser we should not include the iscsi target info in the sendtargets text response pdu. This causes sendtargets response to include the target info twice. Modify iscsit_build_sendtargets_response to filter transport types that don't match. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reported-by: Slava Shwartsman <valyushash@gmail.com> Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11Target/iser: Fix a wrong dereference in case discovery session is over iserSagi Grimberg
In case the discovery session is carried over iser, we can't access the assumed network portal since the default portal is used. In this case we don't really need to allocate the fastreg pool, just prepare to the text pdu that will follow. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reported-by: Alex Tabachnik <alext@mellanox.com> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11iw_cxgb4: don't truncate the recv window sizeHariprasad Shenai
Fixed a bug that shows up with recv window sizes that exceed the size of the RCV_BUFSIZ field in opt0 (>= 1024K). If the recv window exceeds this, then we specify the max possible in opt0, add add the rest in via a RX_DATA_ACK credits. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connectionsHariprasad Shenai
Select the appropriate hw mtu index and initial sequence number to optimize hw memory performance. Add new cxgb4_best_aligned_mtu() which allows callers to provide enough information to be used to [possibly] select an MTU which will result in the TCP Data Segment Size (AKA Maximum Segment Size) to be an aligned value. If an RTR message exhange is required, then align the ISS to 8B - 1 + 4, so that after the SYN the send seqno will align on a 4B boundary. The RTR message exchange will leave the send seqno aligned on an 8B boundary. If an RTR is not required, then align the ISS to 8B - 1. The goal is to have the send seqno be 8B aligned when we send the first FPDU. Based on original work by Casey Leedom <leeedom@chelsio.com> and Steve Wise <swise@opengridcomputing.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11iw_cxgb4: Allocate and use IQs specifically for indirect interruptsHariprasad Shenai
Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA RXQs, which also handle direct interrupts for offload CPLs during RDMA connection setup/teardown. The intended T4 usage model, however, is to have indirect interrupts flow through dedicated IQs. IE not to mix indirect interrupts with CPL messages in an IQ. This patch adds the concept of RDMA concentrator IQs, or CIQs, setup and maintained by the LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will flow through the LLD's RDMA RXQs, and CQ interrupts flow through the CIQs. Design: cxgb4 creates and exports an array of CIQs for the RDMA ULD. These IQs are sized according to the max available CQs available at adapter init. In addition, these IQs don't need FL buffers since they only service indirect interrupts. One CIQ is setup per RX channel similar to the RDMA RXQs. iw_cxgb4 will utilize these CIQs based on the vector value passed into create_cq(). The num_comp_vectors advertised by iw_cxgb4 will be the number of CIQs configured, and thus the vector value will be the index into the array of CIQs. Based on original work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull main InfiniBand/RDMA updates from Roland Dreier: - add iWARP port mapper to avoid conflicts between RDMA and normal stack TCP connections. - fixes for i386 / x86-64 structure padding differences (ABI compatibility for 32-on-64) from Yann Droneaud. - a pile of SRP initiator fixes from Bart Van Assche. - fixes for a writeback / memory allocation deadlock with NFS over IPoIB connected mode from Jiri Kosina. - the usual fixes and cleanups to mlx4, mlx5, cxgb4 and other low-level drivers. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (61 commits) RDMA/cxgb4: Add support for iWARP Port Mapper user space service RDMA/nes: Add support for iWARP Port Mapper user space service RDMA/core: Add support for iWARP Port Mapper user space service IB/mlx4: Fix gfp passing in create_qp_common() IB/umad: Fix use-after-free on close IB/core: Fix kobject leak on device register error flow RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp mlx4_core: Fix GFP flags parameters to be gfp_t IB/core: Fix port kobject deletion during error flow IB/core: Remove unneeded kobject_get/put calls IB/core: Fix sparse warnings about redeclared functions IB/mad: Fix sparse warning about gfp_t use IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO IB: Add a QP creation flag to use GFP_NOIO allocations IB: Return error for unsupported QP creation flags IB: Allow build of hw/ and ulp/ subdirectories independently mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp IB/srp: Avoid problems if a header uses pr_fmt IB/umad: Fix error handling ...
2014-06-10Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', ↵Roland Dreier
'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next
2014-06-10RDMA/cxgb4: Add support for iWARP Port Mapper user space serviceSteve Wise
Based on original work by Vipul Pandya. Signed-off-by: Steve Wise <swise@opengridcomputing.com> [ Fix htons -> ntohs to make sparse happy. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10RDMA/nes: Add support for iWARP Port Mapper user space serviceTatyana Nikolova
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10RDMA/core: Add support for iWARP Port Mapper user space serviceTatyana Nikolova
This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP Port Mapper implementation is based on the port mapper specification section in the Sockets Direct Protocol paper - http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf Existing iWARP RDMA providers use the same IP address as the native TCP/IP stack when creating RDMA connections. They need a mechanism to claim the TCP ports used for RDMA connections to prevent TCP port collisions when other host applications use TCP ports. The iWARP Port Mapper provides a standard mechanism to accomplish this. Without this service it is possible for RDMA application to bind/listen on the same port which is already being used by native TCP host application. If that happens the incoming TCP connection data can be passed to the RDMA stack with error. The iWARP Port Mapper solution doesn't contain any changes to the existing network stack in the kernel space. All the changes are contained with the infiniband tree and also in user space. The iWARP Port Mapper service is implemented as a user space daemon process. Source for the IWPM service is located at http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary The iWARP driver (port mapper client) sends to the IWPM service the local IP address and TCP port it has received from the RDMA application, when starting a connection. The IWPM service performs a socket bind from user space to get an available TCP port, called a mapped port, and communicates it back to the client. In that sense, the IWPM service is used to map the TCP port, which the RDMA application uses to any port available from the host TCP port space. The mapped ports are used in iWARP RDMA connections to avoid collisions with native TCP stack which is aware that these ports are taken. When an RDMA connection using a mapped port is terminated, the client notifies the IWPM service, which then releases the TCP port. The message exchange between the IWPM service and the iWARP drivers (between user space and kernel space) is implemented using netlink sockets. 1) Netlink interface functions are added: ibnl_unicast() and ibnl_mulitcast() for sending netlink messages to user space 2) The signature of the existing ibnl_put_msg() is changed to be more generic 3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW corresponding to the two iWarp drivers - nes and cxgb4 which use the IWPM service 4) Enums are added to enumerate the attributes in the netlink messages, which are exchanged between the user space IWPM service and the iWARP drivers Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com> [ Fold in range checking fixes and nlh_next removal as suggested by Dan Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-09IB/mlx4: Fix gfp passing in create_qp_common()Jiri Kosina
There are two kzalloc() calls which were not converted to use value of gfp passed to create_qp_common() instead of using hardcoded GFP_KERNEL in 40f2287bd583 ("IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO"). Fix this by passing gfp value down properly. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds
Pull SCSI target fixes from Nicholas Bellinger: "Here are the remaining fixes for v3.15. This series includes: - iser-target fix for ImmediateData exception reference count bug (Sagi + nab) - iscsi-target fix for MC/S login + potential iser-target MRDSL buffer overrun (Santosh + Roland) - iser-target fix for v3.15-rc multi network portal shutdown regression (nab) - target fix for allowing READ_CAPCITY during ALUA Standby access state (Chris + nab) - target fix for NULL pointer dereference of alua_access_state for un-configured devices (Chris + nab)" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix alua_access_state attribute OOPs for un-configured devices target: Allow READ_CAPACITY opcode in ALUA Standby access state iser-target: Fix multi network portal shutdown regression iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value() iser-target: Add missing target_put_sess_cmd for ImmedateData failure
2014-06-06IB/umad: Fix use-after-free on closeBart Van Assche
Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n> triggers a use-after-free. __fput() invokes f_op->release() before it invokes cdev_put(). Make sure that the ib_umad_device structure is freed by the cdev_put() call instead of f_op->release(). This avoids that changing the port mode from IB into Ethernet and back to IB followed by restarting opensmd triggers the following kernel oops: general protection fault: 0000 [#1] PREEMPT SMP RIP: 0010:[<ffffffff810cc65c>] [<ffffffff810cc65c>] module_put+0x2c/0x170 Call Trace: [<ffffffff81190f20>] cdev_put+0x20/0x30 [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0 [<ffffffff8118e35e>] ____fput+0xe/0x10 [<ffffffff810723bc>] task_work_run+0xac/0xe0 [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0 [<ffffffff814b8398>] int_signal+0x12/0x17 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Cc: <stable@vger.kernel.org> # 3.x: 8ec0a0e6b58: IB/umad: Fix error handling Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05IB/core: Fix kobject leak on device register error flowHaggai Eran
The ports kobject isn't being released during error flow in device registration. This patch refactors the ports kobject cleanup into a single function called from both the error flow in device registration and from the unregistration function. A couple of attributes aren't being deleted (iw_stats_group, and ib_class_attributes). While this may be handled implicitly by the destruction of their kobjects, it seems better to handle all the attributes the same way. Signed-off-by: Haggai Eran <haggaie@mellanox.com> [ Make free_port_list_attributes() static. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_respYann Droneaud
The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct c4iw_alloc_ucontext_resp gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name c4iw_alloc_ucontext_resp \ drivers/infiniband/hw/cxgb4/iw_cxgb4.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100 --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100 @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp { __u64 status_page_key; /* 0 8 */ __u32 status_page_size; /* 8 4 */ - /* size: 12, cachelines: 1, members: 2 */ - /* last cacheline: 12 bytes */ + /* size: 16, cachelines: 1, members: 2 */ + /* padding: 4 */ + /* last cacheline: 16 bytes */ }; This ABI disagreement will make an x86_64 kernel try to write past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to write past the i386 userspace provided buffer and the uverbs will fail. If the structure is on a page boundary and the next page is not mapped, ib_copy_to_udata() will fail and the uverb will fail. Additionally, as reported by Dan Carpenter, without the implicit padding being properly cleared, an information leak would take place in most architectures. This patch adds an explicit padding to struct c4iw_alloc_ucontext_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_alloc_ucontext() not writting this padding field to userspace. This way, x86_64 kernel will be able to write struct c4iw_alloc_ucontext_resp as expected by unpatched and patched i386 libcxgb4. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Link: http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain Link: http://marc.info/?i=20140328082428.GH25192@mwanda Cc: <stable@vger.kernel.org> Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes") Reported-by: Yann Droneaud <ydroneaud@opteya.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04IB/core: Fix port kobject deletion during error flowHaggai Eran
When encountering an error during the add_port function, adding a port to sysfs, the port kobject is freed without being deleted from sysfs. Instead of freeing it directly, the patch uses kobject_put to release the kobject and delete it. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04IB/core: Remove unneeded kobject_get/put callsHaggai Eran
The ib_core module will call kobject_get on the parent object of each kobject it creates. This is redundant since kobject_add does that anyway. As a side effect, this patch should fix leaking the ports kobject and the device kobject during unregister flow, since the previous code didn't seem to take into account the kobject_get calls on behalf of the child kobjects. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04IB/core: Fix sparse warnings about redeclared functionsRoland Dreier
Fix a few functions that are declared with __attribute_const__ in the ib_verbs.h header file but defined without it in verbs.c. This gets rid of the following sparse warnings: drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04iser-target: Add missing target_put_sess_cmd for ImmedateData failureNicholas Bellinger
This patch addresses a bug where an early exception for SCSI WRITE with ImmediateData=Yes was missing the target_put_sess_cmd() call to drop the extra se_cmd->cmd_kref reference obtained during the normal iscsit_setup_scsi_cmd() codepath execution. This bug was manifesting itself during session shutdown within isert_cq_rx_comp_err() where target_wait_for_sess_cmds() would end up waiting indefinately for the last se_cmd->cmd_kref put to occur for the failed SCSI WRITE + ImmediateData descriptors. This fix follows what traditional iscsi-target code already does for the same failure case within iscsit_get_immediate_data(). Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-03IB/mad: Fix sparse warning about gfp_t useRoland Dreier
Properly convert gfp_t & result to bool to fix: drivers/infiniband/core/sa_query.c:621:33: warning: incorrect type in initializer (different base types) drivers/infiniband/core/sa_query.c:621:33: expected bool [unsigned] [usertype] preload drivers/infiniband/core/sa_query.c:621:33: got restricted gfp_t Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIOJiri Kosina
Modify the various routines used to allocate memory resources which serve QPs in mlx4 to get an input GFP directive. Have the Ethernet driver to use GFP_KERNEL in it's QP allocations as done prior to this commit, and the IB driver to use GFP_NOIO when the IB verbs IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02IB: Add a QP creation flag to use GFP_NOIO allocationsOr Gerlitz
This addresses a problem where NFS client writes over IPoIB connected mode may deadlock on memory allocation/writeback. The problem is not directly memory reclamation. There is an indirect dependency between network filesystems writing back pages and ipoib_cm_tx_init() due to how a kworker is used. Page reclaim cannot make forward progress until ipoib_cm_tx_init() succeeds and it is stuck in page reclaim itself waiting for network transmission. Ordinarily this situation may be avoided by having the caller use GFP_NOFS but ipoib_cm_tx_init() does not have that information. To address this, take a general approach and add a new QP creation flag that tells the low-level hardware driver to use GFP_NOIO for the memory allocations related to the new QP. Use the new flag in the ipoib connected mode path, and if the driver doesn't support it, re-issue the QP creation without the flag. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02IB: Return error for unsupported QP creation flagsOr Gerlitz
Fix the usnic and thw qib drivers to err when QP creation flags that they don't understand are provided. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02IB: Allow build of hw/ and ulp/ subdirectories independentlyYann Droneaud
It is not possible to build only the drivers/infiniband/hw/ (or ulp/) subdirectory with command such as: $ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/ This fails with following error messages: make[2]: Nothing to be done for `all'. make[2]: Nothing to be done for `relocs'. CHK include/config/kernel.release Using /home/ydroneaud/src/linux as source for kernel GEN /home/ydroneaud/src/linux/obj-x86_64/Makefile CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CALL /home/ydroneaud/src/linux/scripts/checksyscalls.sh /home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory make[2]: *** No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'. Stop. make[1]: *** [drivers/infiniband/hw/] Error 2 make: *** [sub-make] Error 2 This patch creates a Makefile in hw/ and ulp/ and moves each corresponding parts of drivers/infiniband/Makefile in the new Makefiles. It should not break build except if some hw/ drivers or ulp/ were allowed previously to be built while CONFIG_INFINIBAND is set to 'n', but according to drivers/infiniband/Kconfig, it's not possible. So it should be safe to apply. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02Revert "net/mlx4_en: Use affinity hint"David S. Miller
This reverts commit 70a640d0dae3a9b1b222ce673eb5d92c263ddd61. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02net/mlx4_en: Use affinity hintYuval Atias
The “affinity hint” mechanism is used by the user space daemon, irqbalancer, to indicate a preferred CPU mask for irqs. Irqbalancer can use this hint to balance the irqs between the cpus indicated by the mask. We wish the HCA to preferentially map the IRQs it uses to numa cores close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that sets the affinity hint according the following policy: First it maps IRQs to “close” numa cores. If these are exhausted, the remaining IRQs are mapped to “far” numa cores. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_respYann Droneaud
The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct c4iw_create_cq_resp gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name c4iw_create_cq_resp \ drivers/infiniband/hw/cxgb4/iw_cxgb4.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100 --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100 @@ -14,9 +13,8 @@ struct c4iw_create_cq_resp { __u32 size; /* 28 4 */ __u32 qid_mask; /* 32 4 */ - /* size: 36, cachelines: 1, members: 6 */ - /* last cacheline: 36 bytes */ + /* size: 40, cachelines: 1, members: 6 */ + /* padding: 4 */ + /* last cacheline: 40 bytes */ }; This ABI disagreement will make an x86_64 kernel try to write past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to write past the i386 userspace provided buffer and the uverbs will fail. If the structure is on a page boundary and the next page is not mapped, ib_copy_to_udata() will fail and the uverb will fail. This patch adds an explicit padding at end of structure c4iw_create_cq_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq() not writting this padding field to userspace. This way, x86_64 kernel will be able to write struct c4iw_create_cq_resp as expected by unpatched and patched i386 libcxgb4. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> Fixes: cfdda9d764362 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC") Fixes: e24a72a3302a6 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()") Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-30IB/srp: Avoid problems if a header uses pr_fmtJoe Perches
SRP defines pr_fmt(fmt) to be "PFX fmt", and then includes a bunch of header files before it gets around to defining PFX. This causes problems if any of the header files do a pr_... and use pr_fmt(). Fix this by using KBUILD_MODNAME instead of the private PFX. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-30IB/umad: Fix error handlingBart Van Assche
Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL or if nonseekable_open() fails. Avoid leaking a kref count, that sm_sem is kept down and also that the IB_PORT_SM capability mask is not cleared in ib_umad_sm_open() if nonseekable_open() fails. Since container_of() never returns NULL, remove the code that tests whether container_of() returns NULL. Moving the kref_get() call from the start of ib_umad_*open() to the end is safe since it is the responsibility of the caller of these functions to ensure that the cdev pointer remains valid until at least when these functions return. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> [ydroneaud@opteya.com: rework a bit to reduce the amount of code changed] Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> [ nonseekable_open() can't actually fail, but.... - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-30IB/mlx4: Add interface for selecting VFs to enable QP0 via MLX proxy QPsJack Morgenstein
This commit adds the sysfs interface for enabling QP0 on VFs for selected VF/port. By default, no VFs are enabled for QP0 operation. To enable QP0 operation on a VF/port, under /sys/class/infiniband/mlx4_x/iov/<b:d:f>/ports/x there are two new entries: - smi_enabled (read-only). Indicates whether smi is currently enabled for the indicated VF/port - enable_smi_admin (rw). Used by the admin to request that smi capability be enabled or disabled for the indicated VF/port. 0 = disable, 1 = enable. The requested enablement will occur at the next reset of the VF (e.g. driver restart on the VM which owns the VF). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-30mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPsJack Morgenstein
This commit adds the infrastructure for enabling selected VFs to operate SMI (QP0) MADs without restriction. Additionally, for these enabled VFs, their QP0 proxy and tunnel QPs are MLX QPs. As such, they operate over VL15. Therefore, they are not affected by "credit" problems or changes in the VLArb table (which may shut down VL0). Non-enabled VFs may only create UD proxy QP0 qps (which are forced by the hypervisor to send packets using the q-key it assigns and places in the qp-context). Thus, non-enabled VFs will not pose a security risk. The hypervisor discards any privileged MADs it receives from these non-enabled VFs. By default, all VFs are NOT enabled, and must explicitly be enabled by the administrator. The sysfs interface which operates the VF enablement infrastructure is provided in the next commit. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-30IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responsesJack Morgenstein
Currently, VFs in SRIOV VFs are denied QP0 access. The main reason for this decision is security, since Subnet Management Datagrams (SMPs) are not restricted by network partitioning and may affect the physical network topology. Moreover, even the SM may be denied access from portions of the network by setting management keys unknown to the SM. However, it is desirable to grant SMI access to certain privileged VFs, so that certain network management activities may be conducted within virtual machines instead of the hypervisor. This commit does the following: 1. Create QP0 tunnel QPs for all VFs. 2. Discard SMI mads sent-from/received-for non-privileged VFs in the hypervisor MAD multiplex/demultiplex logic. SMI mads from/for privileged VFs are allowed to pass. 3. MAD_IFC wrapper changes/fixes. For non-privileged VFs, only host-view MAD_IFC commands are allowed, and only for SMI LID-Routed GET mads. For privileged VFs, there are no restrictions. This commit does not allow privileged VFs as yet. To determine if a VF is privileged, it calls function mlx4_vf_smi_enabled(). This function returns 0 unconditionally for now. The next two commits allow defining and activating privileged VFs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-30IB/mlx4: SET_PORT called by mlx4_ib_modify_port should be wrappedJack Morgenstein
mlx4_ib_modify_port is invoked in IB for resetting the Q_Key violations counters and for modifying the IB port capability flags. For example, when opensm is started up on the hypervisor, mlx4_ib_modify_port is called to set the port's IsSM flag. In multifunction mode, the SET_PORT command used in this flow should be wrapped (so that the PF port capability flags are also tracked, thus enabling the aggregate of all the VF/PF capability flags to be tracked properly). The procedure mlx4_SET_PORT() in main.c is also renamed to mlx4_ib_SET_PORT() to differentiate it from procedure mlx4_SET_PORT() in port.c. mlx4_ib_SET_PORT() is used exclusively by mlx4_ib_modify_port(). Finally, the CM invokes ib_modify_port() to set the IsCMSupported flag even when running over RoCE. Therefore, when RoCE is active, mlx4_ib_modify_port should return OK unconditionally (since the capability flags and qkey violations counter are not relevant). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-30IB/qib: Additional Intel branding changesVinit Agnihotri
This patches changes user visible function names containing "qlogic" in module init and cleanup. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28RDMA/cxgb3: Remove a couple unneeded conditionsDan Carpenter
We know that "reset_tpt_entry" is false on this side of the if else statement so there is no need to check again. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28IB/mlx4: fix unitialised variable is_mcastColin Ian King
Commit 297e0dad7206 ("IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing") introduced a bug where is_mcast is now no longer initialized on the non-multicast condition and so it can be any random value from the stack. This issue was detected by cppcheck: [drivers/infiniband/hw/mlx4/ah.c:103]: (error) Uninitialized variable: is_mcast Simple fix is to initialise is_mcast to zero. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28IB/ipath: Use time_before()/_after()Manuel Schölling
Time comparisons must use time_after / time_before to avoid problems when jiffies wraps. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28IB/mlx5: Fix warning about cast of wr_id back to pointer on 32 bitsRoland Dreier
We need to cast wr_id to unsigned long before casting to a pointer. This fixes: drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_umr_cq_handler': >> drivers/infiniband/hw/mlx5/mr.c:724:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] context = (struct mlx5_ib_umr_context *)wc.wr_id; Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27IB/usnic: Fix source file missing copyright and licenseUpinder Malhi
Prepends copyright and license to usnic_uiom_interval_tree.c Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27IB/ipath: Translate legacy diagpkt into newer extended diagpktDennis Dalessandro
This patch addresses an issue where the legacy diagpacket is sent in from the user, but the driver operates on only the extended diagpkt. This patch specifically initializes the extended diagpkt based on the legacy packet. Cc: <stable@vger.kernel.org> Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27IB/qib: Fix port in pkey change eventMike Marciniszyn
The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE. As of the dual port qib QDR card, this is not necessarily correct. Change to use the port as specified in the call. Cc: <stable@vger.kernel.org> Reported-by: Alex Estrin <alex.estrin@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>