summaryrefslogtreecommitdiff
path: root/drivers/infiniband/core
AgeCommit message (Collapse)Author
2016-06-07IB/core: Initialize sysfs attributes before sysfs create groupMark Bloch
For dynamically allocated sysfs attributes there is a need to call sysfs_attr_init in order to comply with lockdep, not calling it will result in error complaining key is not in .data section. Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: Fix removal of default GID cache entryAviv Heller
When deleting a default GID from the cache, its gid_type field is set to 0. This could set the gid_type to RoCE v1 for a RoCE v2 default GID, essentially making it inaccessible to future modifications, since it is no longer found by find_gid(). This fix preserves the gid_type value for default gids during cache operations. Fixes: b39ffa1df505 ('IB/core: Add gid_type to gid attribute') Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: Fix query port failure in RoCEEli Cohen
Currently ib_query_port always attempts to to read the subnet prefix by calling ib_query_gid(). For RoCE/iWARP there is no subnet manager and no subnet prefix. Fix this by querying GID[0] only for IB networks. Fixes: fad61ad4e755 ('IB/core: Add subnet prefix to port info') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: fix error unwind in sysfs hw counters codeDoug Ledford
Between the initial and final versions of the function setup_hw_stats, the order of variable initialization was changed. However, the unwind flow on error did not properly keep up with the flow changes. Make the unwind flow match a proper unwind of the allocation flow, then remove no longer needed variable initializations. Fixes: b40f4757daa1 (IB/core: Make device counter infrastructure dynamic) Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: Fix array length allocationDoug Ledford
The new sysfs hw_counters code had an off by one in its array allocation length. Fix that and the comment along with it. Reported-by: Mark Bloch <markb@mellanox.com> Fixes: b40f4757daa1 (IB/core: Make device counter infrastructure dynamic) Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/mad: Fix indentationBart Van Assche
Make indentation consistent. Detected by smatch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hal Rosenstock <hal@mellanox.com> Cc: Ira Weiny <ira.weiny@intel.com> Reviewed-By: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06RDMA/core: Fix indentationBart Van Assche
Make indentation consistent. Detected by smatch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Cc: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@gimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/core: fix null pointer deref and mem leak in error handlingColin Ian King
The current error handling in setup_hw_stats has a couple of issues. It is possible to generate a null pointer deference on the kfree of hsag->attrs[i] because two of the early error exit paths jump to the kfree when hsags NULL and not allocated. Fix this by moving the kfree on stats and jumping to that, avoiding the hsag freeing. Secondly, there is a memory leak of stats if the hsag allocation fails; instead of returning, jump to the kfree on stats. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/core: fix an error code in ib_core_init()Dan Carpenter
We should return the error code if ib_add_ibnl_clients() fails. The current code returns success. Fixes: 735c631ae99d ('IB/core: Register SA ibnl client during ib_core initialization') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/cm: Fix a recently introduced locking bugBart Van Assche
ib_cm_notify() can be called from interrupt context. Hence do not reenable interrupts unconditionally in cm_establish(). This patch avoids that lockdep reports the following warning: WARNING: CPU: 0 PID: 23317 at kernel/locking/lockdep.c:2624 trace _hardirqs_on_caller+0x112/0x1b0 DEBUG_LOCKS_WARN_ON(current->hardirq_context) Call Trace: <IRQ> [<ffffffff812bd0e5>] dump_stack+0x67/0x92 [<ffffffff81056f21>] __warn+0xc1/0xe0 [<ffffffff81056f8a>] warn_slowpath_fmt+0x4a/0x50 [<ffffffff810a5932>] trace_hardirqs_on_caller+0x112/0x1b0 [<ffffffff810a59dd>] trace_hardirqs_on+0xd/0x10 [<ffffffff815992c7>] _raw_spin_unlock_irq+0x27/0x40 [<ffffffffa0382e9c>] ib_cm_notify+0x25c/0x290 [ib_cm] [<ffffffffa068fbc1>] srpt_qp_event+0xa1/0xf0 [ib_srpt] [<ffffffffa04efb97>] mlx4_ib_qp_event+0x67/0xd0 [mlx4_ib] [<ffffffffa034ec0a>] mlx4_qp_event+0x5a/0xc0 [mlx4_core] [<ffffffffa03365f8>] mlx4_eq_int+0x3d8/0xcf0 [mlx4_core] [<ffffffffa0336f9c>] mlx4_msi_x_interrupt+0xc/0x20 [mlx4_core] [<ffffffff810b0914>] handle_irq_event_percpu+0x64/0x100 [<ffffffff810b09e4>] handle_irq_event+0x34/0x60 [<ffffffff810b3a6a>] handle_edge_irq+0x6a/0x150 [<ffffffff8101ad05>] handle_irq+0x15/0x20 [<ffffffff8101a66c>] do_IRQ+0x5c/0x110 [<ffffffff8159a2c9>] common_interrupt+0x89/0x89 [<ffffffff81297a17>] blk_run_queue_async+0x37/0x40 [<ffffffffa0163e53>] rq_completed+0x43/0x70 [dm_mod] [<ffffffffa0164896>] dm_softirq_done+0x176/0x280 [dm_mod] [<ffffffff812a26c2>] blk_done_softirq+0x52/0x90 [<ffffffff8105bc1f>] __do_softirq+0x10f/0x230 [<ffffffff8105bec8>] irq_exit+0xa8/0xb0 [<ffffffff8103653e>] smp_trace_call_function_single_interrupt+0x2e/0x30 [<ffffffff81036549>] smp_call_function_single_interrupt+0x9/0x10 [<ffffffff8159a959>] call_function_single_interrupt+0x89/0x90 <EOI> Fixes: commit be4b499323bf (IB/cm: Do not queue work to a device that's going away) Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Nikolay Borisov <kernel@kyup.com> Cc: stable <stable@vger.kernel.org> # v4.2+ Acked-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/core: Make device counter infrastructure dynamicChristoph Lameter
In practice, each RDMA device has a unique set of counters that the hardware implements. Having a central set of counters that they must all adhere to is limiting and causes many useful counters to not be available. Therefore we create a dynamic counter registration infrastructure. The driver must implement a stats structure allocation routine, in which the driver must place the directory name it wants, a list of names for all of the counters, an array of u64 counters themselves, plus a few generic configuration options. We then implement a core routine to create a sysfs file for each of the named stats elements, and a core routine to retrieve the stats when any of the sysfs attribute files are read. To avoid excessive beating on the stats generation routine in the drivers, the core code also caches the stats for a short period of time so that someone attempting to read all of the stats in a given device's directory will not result in a stats generation call per file read. Future work will attempt to standardize just the shared stats elements, and possibly add a method to get the stats via netlink in addition to sysfs. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> [ Add caching, make structure names more informative, add i40iw support, other significant rewrites from the original patch ]
2016-05-26Merge branches 'misc-4.7-2', 'ipoib' and 'ib-router' into k.o/for-4.7Doug Ledford
2016-05-25IB/core: Support new type of join-state for multicastErez Shitrit
There are four types for MCG, FullMember, NonMember, SendOnlyNonMember, and the new added type: SendOnlyFullMember. Add support for the new SendOnlyFullMember join state. The new type allows host to send join request as sendonly, it will cause the group to be created but without getting packets from this multicast back to the host. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-25IB/SA Agent: Add support for SA agent get ClassPortInfoErez Shitrit
New SA query function to return the ClassPortInfo struct from the SA. If the SM supports FullMemberSendOnly mode for MCG's, it sets a capability bit in the capability_mask2 field of the response. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/core: Add IP to GID netlink offloadMark Bloch
There is an assumption that rdmacm is used only between nodes in the same IB subnet, this why ARP resolution can be used to turn IP to GID in rdmacm. When dealing with IB communication between subnets this assumption is no longer valid. ARP resolution will get us the next hop device address and not the peer node's device address. To solve this issue, we will check user space if it can provide the GID of the peer node, and fail if not. We add a sequence number to identify each request and fill in the GID upon answer from userspace. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/core: Register SA ibnl client during ib_core initializationMark Bloch
Move SA ibnl client registration to ib_core module init. This will allow us to register a single client to handle all RDMA_NL_LS operations and make it SA independent. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/SA: Integrate ib_sa module into ib_core moduleMark Bloch
Consolidate ib_sa into ib_core, this commit eliminates ib_sa.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/MAD: Integrate ib_mad module into ib_core moduleMark Bloch
Consolidate ib_mad into ib_core, this commit eliminates ib_mad.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/core: Integrate IB address resolution module into coreLeon Romanovsky
IB address resolution is declared as a module (ib_addr.ko) which loads itself before IB core module (ib_core.ko). It causes to the scenario where IB netlink which is initialized by IB core can't be used by ib_addr.ko. In order to solve it, we are converting ib_addr.ko to be part of IB core module. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-18IB/core: Do not require CAP_NET_ADMIN for packet sniffingChristoph Lameter
In the Ethernet/TCP world, CAP_NET_RAW is sufficient to allow a program to listen to all incoming packets on a specific interface, and the higher CAP_NET_ADMIN is required to set the interface into promiscuous mode. We want to emulate that same basic division of privilege in the RDMA stack, so when dealing with Raw Ethernet QPs, allow apps with CAP_NET_RAW to listen to all incoming flows (and direct them as they see fit in their own listen stream). Do not require CAP_NET_ADMIN just to listen to traffic already incoming. Reserve CAP_NET_ADMIN if we attempt to set promiscuous mode. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13Merge branches 'cxgb4-2', 'i40iw-2', 'ipoib', 'misc-4.7' and 'mlx5-fcs' into ↵Doug Ledford
k.o/for-4.7
2016-05-13IB/core: Add Scatter FCS create flagMajd Dibbiny
Raw Packet QPs that were created with Scatter FCS flag, will scatter the FCS into the receive buffers. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Add extended device capability flagsMajd Dibbiny
Since all the uverbs device_cap_flags are occupied, we need a place to expose more device capabilities. This patch adds a new 64 bit device_cap_flags_ex to expose new device capabilities. The lower 32 bits will be identical to the original device_cap_flags, The upper 32 bits will be new capabilities. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/SA: Use correct free functionMark Bloch
Fixes a direct call to kfree_skb when nlmsg_free should be used. Fixes: 2ca546b92a02 ('IB/sa: Route SA pathrecord query through netlink') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Fix a potential array overrun in CMA and SA agentMark Bloch
Fix array overrun when going over callback table. In declaration of callback table, the max size isn't provided and in registration phase, it is provided. There is potential scenario where a new operation is added and it is not supported by current client. The acceptance of such operation by ib_netlink will cause to array overrun. Fixes: 809d5fc9bf65 ("infiniband: pass rdma_cm module to netlink_dump_start") Fixes: b493d91d333e ("iwcm: common code for port mapper") Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Remove unnecessary check in ibnl_rcv_msgMark Bloch
RDMA_NL_GET_OP is defined like this: (type & ((1 << 10) - 1)) which means op (defined as an int) can never be a negative number. Fixes: b2cbae2c2487 ('RDMA: Add netlink infrastructure') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/IWPM: Fix a potential skb leakMark Bloch
In case ibnl_put_msg fails in send_nlmsg_done, the function returns with -ENOMEM without freeing. This patch fixes this behavior. Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13iwcm: Fix a sparse warningBart Van Assche
Avoid that sparse complains about the comparison of s_addr with INADDR_ANY. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Enhance ib_map_mr_sg()Bart Van Assche
The SRP initiator allows to set max_sectors to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: add RW API support for signature MRsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: generic RDMA READ/WRITE APIChristoph Hellwig
This supports both manual mapping of lots of SGEs, as well as using MRs from the QP's MR pool, for iWarp or other cases where it's more optimal. For now, MRs are only used for iWARP transports. The user of the RDMA-RW API must allocate the QP MR pool as well as size the SQ accordingly. Thanks to Steve Wise for testing, fixing and rewriting the iWarp support, and to Sagi Grimberg for ideas, reviews and fixes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: add a need_inval flag to struct ib_mrSteve Wise
This is the first step toward moving MR invalidation decisions to the core. It will be needed by the upcoming RW API. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: add a simple MR poolChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: refactor ib_create_qpChristoph Hellwig
Split the XRC magic into a separate function, and return early on failure to make the initialization code readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Add passing an offset into the SG to ib_map_mr_sgChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-12IB/cma: pass the port number to ib_create_qpChristoph Hellwig
The new RW API will need this. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/security: Restrict use of the write() interfaceJason Gunthorpe
The drivers/infiniband stack uses write() as a replacement for bi-directional ioctl(). This is not safe. There are ways to trigger write calls that result in the return structure that is normally written to user space being shunted off to user specified kernel memory instead. For the immediate repair, detect and deny suspicious accesses to the write API. For long term, update the user space libraries and the kernel API to something that doesn't present the same security vulnerabilities (likely a structured ioctl() interface). The impacted uAPI interfaces are generally only available if hardware from drivers/infiniband is installed in the system. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [ Expanded check to all known write() entry points ] Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-26IB/core: Don't drain non-existent rq queue-pairSagi Grimberg
The drain_rq function expects a normal receive qp to drain. A qp can only have either a normal rq or an srq. If there is an srq, there is no rq to drain. Until the API supports draining SRQs, simply skip draining the rq when the qp has an srq attached. Fixes: 765d67748bcf ("IB: new common API for draining queues") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-23IB/core: Fix oops in ib_cache_gid_set_default_gidDoug Ledford
When we fail to find the default gid index, we can't continue processing in this routine or else we will pass a negative index to later routines resulting in invalid memory access attempts and a kernel oops. Fixes: 03db3a2d81e6 (IB/core: Add RoCE GID table management) Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-22Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "Round two of 4.6 merge window patches. This is a monster pull request. I held off on the hfi1 driver updates (the hfi1 driver is intimately tied to the qib driver and the new rdmavt software library that was created to help both of them) in my first pull request. The hfi1/qib/rdmavt update is probably 90% of this pull request. The hfi1 driver is being left in staging so that it can be fixed up in regards to the API that Al and yourself didn't like. Intel has agreed to do the work, but in the meantime, this clears out 300+ patches in the backlog queue and brings my tree and their tree closer to sync. This also includes about 10 patches to the core and a few to mlx5 to create an infrastructure for configuring SRIOV ports on IB devices. That series includes one patch to the net core that we sent to netdev@ and Dave Miller with each of the three revisions to the series. We didn't get any response to the patch, so we took that as implicit approval. Finally, this series includes Intel's new iWARP driver for their x722 cards. It's not nearly the beast as the hfi1 driver. It also has a linux-next merge issue, but that has been resolved and it now passes just fine. Summary: - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series effects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits) IB/ipoib: Allow mcast packets from other VFs IB/mlx5: Implement callbacks for manipulating VFs net/mlx5_core: Implement modify HCA vport command net/mlx5_core: Add VF param when querying vport counter IB/ipoib: Add ndo operations for configuring VFs IB/core: Add interfaces to control VF attributes IB/core: Support accessing SA in virtualized environment IB/core: Add subnet prefix to port info IB/mlx5: Fix decision on using MAD_IFC net/core: Add support for configuring VF GUIDs IB/{core, ulp} Support above 32 possible device capability flags IB/core: Replace setting the zero values in ib_uverbs_ex_query_device net/mlx5_core: Introduce offload arithmetic hardware capabilities net/mlx5_core: Refactor device capability function net/mlx5_core: Fix caching ATOMIC endian mode capability ib_srpt: fix a WARN_ON() message i40iw: Replace the obsolete crypto hash interface with shash IB/hfi1: Add SDMA cache eviction algorithm IB/hfi1: Switch to using the pin query function IB/hfi1: Specify mm when releasing pages ...
2016-03-21IB/core: Add interfaces to control VF attributesEli Cohen
Following the practice exercised for network devices which allow the PF net device to configure attributes of its virtual functions, we introduce the following functions to be used by IPoIB which is the network driver implementation for IB devices. ib_set_vf_link_state - set the policy for a VF link. More below. ib_get_vf_config - read configuration information of a VF ib_get_vf_stats - read VF statistics ib_set_vf_guid - set the node or port GUID of a VF Also add an indication in the device cap flags that indicates that this IB devices is based on a virtual function. A VF shares the physical port with the PF and other VFs. When setting the link state we have three options: 1. Auto - in this mode, the virtual port follows the state of the physical port and becomes active only if the physical port's state is active. In all other cases it remains in a Down state. 2. Down - sets the state of the virtual port to Down 3. Up - causes the virtual port to transition into Initialize state if it was not already in this state. A virtualization aware subnet manager can then bring the state of the port into the Active state. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/core: Support accessing SA in virtualized environmentEli Cohen
Per the ongoing standardisation process, when virtual HCAs are present in a network, traffic is routed based on a destination GID. In order to access the SA we use the well known SA GID. We also add a GRH required boolean field to the port attributes which is used to report to the verbs consumer whether this port is connected to a virtual network. We use this field to realize whether we need to create an address vector with GRH to access the subnet administrator. We clear the port attributes struct before calling the hardware driver to make sure the default remains that GRH is not required. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/core: Add subnet prefix to port infoEli Cohen
The subnet prefix is a part of the port_info MAD returned and should be available at the ib_port_attr struct. We define it here and provide a default implementation in case the hardware driver does not provide one. The subnet prefix is required when creating the address vector to access the SA in networks where GRH must be used. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/{core, ulp} Support above 32 possible device capability flagsLeon Romanovsky
The old bitwise device_cap_flags variable was limited to u32 which has all bits already defined. In order to overcome it, we converted device_cap_flags variable to be u64 type. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/core: Replace setting the zero values in ib_uverbs_ex_query_deviceLeon Romanovsky
The setting to zero during variable initialization eliminates the need to explicitly set to zero variables and structures. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21Merge branch 'mm-pkeys-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 protection key support from Ingo Molnar: "This tree adds support for a new memory protection hardware feature that is available in upcoming Intel CPUs: 'protection keys' (pkeys). There's a background article at LWN.net: https://lwn.net/Articles/643797/ The gist is that protection keys allow the encoding of user-controllable permission masks in the pte. So instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a (handful of) protection mask variants and can change the masks runtime relatively cheaply, without having to change every single page in the affected virtual memory range. This allows the dynamic switching of the protection bits of large amounts of virtual memory, via user-space instructions. It also allows more precise control of MMU permission bits: for example the executable bit is separate from the read bit (see more about that below). This tree adds the MM infrastructure and low level x86 glue needed for that, plus it adds a high level API to make use of protection keys - if a user-space application calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case, and will set a special protection key on this memory range. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. So using protection keys the kernel is able to implement 'true' PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies PROT_READ as well. Unreadable executable mappings have security advantages: they cannot be read via information leaks to figure out ASLR details, nor can they be scanned for ROP gadgets - and they cannot be used by exploits for data purposes either. We know about no user-space code that relies on pure PROT_EXEC mappings today, but binary loaders could start making use of this new feature to map binaries and libraries in a more secure fashion. There is other pending pkeys work that offers more high level system call APIs to manage protection keys - but those are not part of this pull request. Right now there's a Kconfig that controls this feature (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled (like most x86 CPU feature enablement code that has no runtime overhead), but it's not user-configurable at the moment. If there's any serious problem with this then we can make it configurable and/or flip the default" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/mm/pkeys: Fix mismerge of protection keys CPUID bits mm/pkeys: Fix siginfo ABI breakage caused by new u64 field x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA mm/core, x86/mm/pkeys: Add execute-only protection keys support x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags x86/mm/pkeys: Allow kernel to modify user pkey rights register x86/fpu: Allow setting of XSAVE state x86/mm: Factor out LDT init from context init mm/core, x86/mm/pkeys: Add arch_validate_pkey() mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU x86/mm/pkeys: Add Kconfig prompt to existing config option x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps x86/mm/pkeys: Dump PKRU with other kernel registers mm/core, x86/mm/pkeys: Differentiate instruction fetches x86/mm/pkeys: Optimize fault handling in access_error() mm/core: Do not enforce PKEY permissions on remote mm access um, pkeys: Add UML arch_*_access_permitted() methods mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys x86/mm/gup: Simplify get_user_pages() PTE bit handling ...
2016-03-18Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Initial roundup of 4.6 merge window patches. This is the first of two pull requests. It is the smaller request, but touches for more different things (this is everything but what is in or going into staging). The pull request for the code in staging/rdma is on hold until after we decide what to do on the write/writev API issue and may be partially deferred until 4.7 as a result. Summary: - cxgb4 updates - nes updates - unification of iwarp portmapper code to core - add drain_cq API - various ib_core updates - minor ipoib updates - minor mlx4 updates - more significant mlx5 updates (including a minor merge conflict with net-next tree...merge is simple to resolve and Stephen's resolution was confirmed by Mellanox) - trivial net/9p rdma conversion - ocrdma RoCEv2 update - srpt updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits) iwpm: crash fix for large connections test iw_cxgb3: support for iWARP port mapping iw_cxgb4: remove port mapper related code iw_nes: remove port mapper related code iwcm: common code for port mapper net/9p: convert to new CQ API IB/mlx5: Add support for don't trap rules net/mlx5_core: Introduce forward to next priority action net/mlx5_core: Create anchor of last flow table iser: Accept arbitrary sg lists mapping if the device supports it mlx5: Add arbitrary sg list support IB/core: Add arbitrary sg_list support IB/mlx5: Expose correct max_fast_reg_page_list_len IB/mlx5: Make coding style more consistent IB/mlx5: Convert UMR CQ to new CQ API IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Delete unnecessary variable initialisations in 11 functions IB/core: Documentation fix in the MAD header file IB/core: trivial prink cleanup. ...
2016-03-17Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfsLinus Torvalds
Pull configfs updates from Christoph Hellwig: - A large patch from me to simplify setting up the list of default groups by actually implementing it as a list instead of an array. - a small Y2083 prep patch from Deepa Dinamani. Probably doesn't matter on it's own, but it seems like he is trying to get rid of all CURRENT_TIME uses in file systems, which is a worthwhile goal. * tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs: configfs: switch ->default groups to a linked list configfs: Replace CURRENT_TIME by current_fs_time()
2016-03-16Merge branches 'nes', 'cxgb4' and 'iwpm' into k.o/for-4.6Doug Ledford
2016-03-16iwpm: crash fix for large connections testFaisal Latif
During large connection test, there is a crash at wake_up() in the callback as waitq is not yet initialized. Callback can happen before iwpm_wait_complete_req() is called to initialize waitq. To resolve, using signaling semaphore instead of waitq. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Reviewed-by: Tatyana E Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>