summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-06-21RDMA/cma: Set qkey for AF_IBSean Hefty
Allow the user to specify the qkey when using AF_IB. The qkey is added to struct rdma_ucm_conn_param in place of a reserved field, but for backwards compatability, is only accessed if the associated rdma_cm_id is using AF_IB. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21RDMA/cma: Expose private data when using AF_IBSean Hefty
If the source or destination address is AF_IB, then do not reserve a portion of the private data in the IB CM REQ or SIDR REQ messages for the cma header. Instead, all private data should be exported to the user. When AF_IB is used, the rdma cm does not have sufficient information to fill in the cma header. Additionally, this will be necessary to support any IB connection through the rdma cm interface, Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21RDMA/cma: Merge cma_get/save_net_infoSean Hefty
With the removal of SDP related code, we can merge cma_get_net_info() with cma_save_net_info(), since we're only ever dealing with a single header format. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Remove unused SDP related codeSean Hefty
The SDP protocol was never merged upstream. Remove unused SDP related code from the RDMA CM. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Add support for AF_IB to cma_get_service_id()Sean Hefty
cma_get_service_id() forms the service ID based on the port space and port number of the rdma_cm_id. Extend the call to support AF_IB, which contains the service ID directly. This will be needed to support any arbitrary SID. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Add support for AF_IB to rdma_resolve_route()Sean Hefty
Allow rdma_resolve_route() to handle the case where the user specified the source and destination addresses using AF_IB. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Add support for AF_IB to rdma_resolve_addr()Sean Hefty
Allow the user to specify the remote address using AF_IB format. When AF_IB is used, the remote address simply needs to be recorded, and no resolution using ARP is done. The local address may still need to be matched with a local IB device. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Verify that source and dest sa_family are the sameSean Hefty
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Restrict AF_IB loopback to binding to IB devices onlySean Hefty
If a user specifies AF_IB as the source address for a loopback connection, limit the resolution to IB devices only. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Add helper functions to return id address informationSean Hefty
Provide inline helpers to extract source and destination address data from the rdma_cm_id. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Do not modify sa_family when setting loopback addressSean Hefty
cma_resolve_loopback is called after an rdma_cm_id has been bound to a specific sa_family and port. Once the source sa_family for the id has been set, do not modify it. Only the actual IP address portion of the source address needs to be set. As part of this fix, we can simplify setting the source address by moving the loopback address assignment from cma_resolve_loopback to cma_bind_loopback. cma_bind_loopback is only invoked when the source address is the loopback address. Finally, add loopback support for AF_IB as part of the change. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Allow user to specify AF_IB when bindingSean Hefty
Modify rdma_bind_addr to allow the user to specify AF_IB when binding to a device. AF_IB indicates that the user is not mapping an IP address to the native IB addressing. (The mapping may have already been done, or is not needed) Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Update port reservation to support AF_IBSean Hefty
The AF_IB uses a 64-bit service id (SID), which the user can control through the use of a mask. The rdma_cm will assign values to the unmasked portions of the SID based on the selected port space and port number. Because the IB spec divides the SID range into several regions, a SID/mask combination may fall into one of the existing port space ranges as defined by the RDMA CM IP Annex. Map the AF_IB SID to the correct RDMA port space. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20IB/addr: Add AF_IB support to ip_addr_sizeSean Hefty
Add support for AF_IB to ip_addr_size, and rename the function to account for the change. Give the compiler more control over whether the call should be inline or not by moving the definition into the .c file, removing the static inline, and exporting it. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Include AF_IB in loopback and any address checksSean Hefty
Enhance checks for loopback and any address to support AF_IB in addition to AF_INET and AF_INT6. This will allow future patches to use AF_IB when binding and resolving addresses. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Allow enabling reuseaddr in any stateSean Hefty
The rdma_cm only allows setting reuseaddr if the corresponding rdma_cm_id is in the idle state. Allow setting this value in other states. This brings the behavior more inline with sockets. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20RDMA/cma: Define native IB addressSean Hefty
Define AF_IB and sockaddr_ib to allow the rdma_cm to use native IB addressing. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-20ndisc: Convert use of typedef ctl_table to struct ctl_tableJoe Perches
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20ipv6: Convert use of typedef ctl_table to struct ctl_tableJoe Perches
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20inet: frag , remove an empty ifdef.Rami Rosen
This patch removes an empty ifdef from inet_frag_intern() in net/ipv4/inet_fragment.c. commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a (hlist: drop the node parameter from iterators) removed hlist from net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef command, which is now empty. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20htb: refactor struct htb_sched fields for performanceEric Dumazet
htb_sched structures are big, and source of false sharing on SMP. Every time a packet is queued or dequeue, many cache lines must be touched because structures are not lay out properly. By carefully splitting htb_sched in two parts, and define sub structures to increase data locality, we can improve performance dramatically on SMP. New htb_prio structure can also be used in htb_class to increase data locality. I got 26 % performance increase on a 24 threads machine, with 200 concurrent netperf in TCP_RR mode, using a HTB hierarchy of 4 classes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20tcp: introduce a per-route knob for quick ackCong Wang
In previous discussions, I tried to find some reasonable heuristics for delayed ACK, however this seems not possible, according to Eric: "ACKS might also be delayed because of bidirectional traffic, and is more controlled by the application response time. TCP stack can not easily estimate it." "ACK can be incredibly useful to recover from losses in a short time. The vast majority of TCP sessions are small lived, and we send one ACK per received segment anyway at beginning or retransmits to let the sender smoothly increase its cwnd, so an auto-tuning facility wont help them that much." and according to David: "ACKs are the only information we have to detect loss. And, for the same reasons that TCP VEGAS is fundamentally broken, we cannot measure the pipe or some other receiver-side-visible piece of information to determine when it's "safe" to stretch ACK. And even if it's "safe", we should not do it so that losses are accurately detected and we don't spuriously retransmit. The only way to know when the bandwidth increases is to "test" it, by sending more and more packets until drops happen. That's why all successful congestion control algorithms must operate on explicited tested pieces of information. Similarly, it's not really possible to universally know if it's safe to stretch ACK or not." It still makes sense to enable or disable quick ack mode like what TCP_QUICK_ACK does. Similar to TCP_QUICK_ACK option, but for people who can't modify the source code and still wants to control TCP delayed ACK behavior. As David suggested, this should belong to per-path scope, since different pathes may want different behaviors. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Rick Jones <rick.jones2@hp.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20sctp: Convert __list_for_each use to list_for_eachDave Jones
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20bnx2: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)Yijing Wang
Pci core has been saved pm cap register offset by pdev->pm_cap in pci_pm_init() in init path. So we can use pdev->pm_cap instead of using pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simplified code. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Cc: Michael Chan <mchan@broadcom.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20amd8111e: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)Yijing Wang
Pci core has been saved pm cap register offset by pdev->pm_cap in pci_pm_init() in init path. So we can use pdev->pm_cap instead of using pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simplified code. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS) Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20Bnx2x: remove redundant D0 power state setYijing Wang
Pci_enable_device() will set device power state to D0, so it's no need to do it again in bnx2x_init_dev(). Also remove redundant PM Cap find code, because pci core has been saved the pci device pm cap value. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20net: Add missing dependencies on NETDEVICESBen Hutchings
ETRAX_ETHERNET selects ETHERNET and MII, which depend on NETDEVICES. I don't think anything should select NETDEVICES, so make it a dependency. It also doesn't need to select or depend on ETHERNET, which has nothing to do with the Ethernet library functions. BPCTL selects MII, which depends on NETDEVICES. But everything in the drivers/staging/silicom directory is related to net devices, so make NET_VENDOR_SILICOM depend on NETDEVICES and remove the now-redundant dependencies on NET. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20at91_ether: Do not select NET_COREBen Hutchings
This has no dependency on any of the drivers under NET_CORE. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20net: Move MII out from under NET_CORE and hide itBen Hutchings
All drivers that select MII also need to select NET_CORE because MII depends on it. This is a bit ridiculous because NET_CORE is just a menu option that doesn't enable any code by itself. There is also no need for it to be a visible option, since its users all select it. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20tcp:typo unset should be unsentWeiping Pan
Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20bonding: trivial: make alb use bond_slave_has_mac()Veaceslav Falico
Also, cleanup bond_alb_handle_active_change() from 2 identical ifs. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20be2net: use pci_vfs_assigned()/pci_num_vf() instead of be_find_vfs()Sathya Perla
be_find_vfs() is no longer needed as the common PCI calls provide the same functionality. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20sit: fix an oops when IFLA_IPTUN_PROTO is not setNicolas Dichtel
The use of this attribute has been added in 32b8a8e59c9c (sit: add IPv4 over IPv4 support). It is optional, by default proto is IPPROTO_IPV6. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20net: sock: adapt SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUFDaniel Borkmann
The current situation is that SOCK_MIN_RCVBUF is 2048 + sizeof(struct sk_buff)) while SOCK_MIN_SNDBUF is 2048. Since in both cases, skb->truesize is used for sk_{r,w}mem_alloc accounting, we should have both sizes adjusted via defining a TCP_SKB_MIN_TRUESIZE. Further, as Eric Dumazet points out, the minimal skb truesize in transmit path is SKB_TRUESIZE(2048) after commit f07d960df33c5 ("tcp: avoid frag allocation for small frames"), and tcp_sendmsg() tries to limit skb size to half the congestion window, meaning we try to build two skbs at minimum. Thus, having SOCK_MIN_SNDBUF as 2048 can hit a small regression for some applications setting to low SO_SNDBUF / SO_RCVBUF. Note that we define a TCP_SKB_MIN_TRUESIZE, because SKB_TRUESIZE(2048) adds SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), but in case of TCP skbs, the skb_shared_info is part of the 2048 bytes allocation for skb->head. The minor adaption in sk_stream_moderate_sndbuf() is to silence a warning by using a typed max macro, as similarly done in SOCK_MIN_RCVBUF occurences, that would appear otherwise. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20neigh: disallow un-init_net to change thresh of neighGao feng
thresh and interval are global resources, only init net can change them. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20neigh: only allow init_net to change the default neigh_parmsGao feng
Though we don't export the /proc/sys/net/ipv[4,6]/neigh/default/ directory to the un-init_net, but we can still use cmd such as "ip ntable change name arp_cache locktime 129" to change the locktime of default neigh_parms. This patch disallows the un-init_net to find out the neigh_table.parms. So the un-init_net will failed to influence the init_net. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20neigh: no need to call lookup_neigh_parms in neigh_parms_allocGao feng
neigh_table.parms always exist and is initialized,kmemdup can use it to create new neigh_parms, actually lookup_neigh_parms here will return neigh_table.parms too. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20bnx2x: replace mechanism to check for next available packetDmitry Kravkov
Check next packet availability by validating that HW has finished CQE placement. This saves latency of another dma transaction performed to update SB indexes. Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20bnx2x: add support for ndo_ll_pollDmitry Kravkov
Adds ndo_ll_poll method and locking for FPs between LL and the napi. When receiving a packet we use skb_mark_ll to record the napi it came from. Add each napi to the napi_hash right after netif_napi_add(). Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20net/mlx4_en: Low Latency recv statisticsAmir Vadai
Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20net/mlx4_en: Add Low Latency Socket (LLS) supportAmir Vadai
Add basic support for LLS. Signed-off-by: Amir Vadai <amirv@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20openvswitch: gre tunneling support.David S. Miller
Pravin B Shelar says: ==================== Following patch series adds support for gre tunneling. First six patches extend kernel gre and ip_tunnel modules api so that there is more code sharing between gre modules and ovs. Rest of patches adds ovs tunneling infrastructre and gre protocol vport. V2 fixes two patches according to comments from Jesse. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20openvswitch: Add gre tunnel support.Pravin B Shelar
Add gre vport implementation. Most of gre protocol processing is pushed to gre module. It make use of gre demultiplexer therefore it can co-exist with linux device based gre tunnels. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20openvswitch: Optimize flow key match for non tunnel flows.Pravin B Shelar
Following patch adds start offset for sw_flow-key, so that we can skip tunneling information in key for non-tunnel flows. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20openvswitch: Expand action buffer size.Pravin B Shelar
MAX_ACTIONS_BUFSIZE limits action list size, set tunnel action needs extra space on action list, for now increase max actions list limit. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20openvswitch: Add tunneling interface.Pravin B Shelar
Add ovs tunnel interface for set tunnel action for userspace. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20openvswitch: Copy individual actions.Pravin B Shelar
Rather than validating actions and then copying all actiaons in one block, following patch does same operation in single pass. This validate and copy action one by one. This is required for ovs tunneling patch. This patch does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20ip_tunnel: Add dont fragment flag.Pravin B Shelar
This flag will be used by ovs tunneling. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20ip_tunnel: push generic protocol handling to ip_tunnel module.Pravin B Shelar
Process skb tunnel header before sending packet to protocol handler. this allows code sharing between gre and ovs gre modules. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20ip_tunnels: extend iptunnel_xmit()Pravin B Shelar
Refactor various ip tunnels xmit functions and extend iptunnel_xmit() so that there is more code sharing. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>