summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-06-18net: sctp: get rid of t_new macro for kzallocDaniel Borkmann
t_new rather obfuscates things where everyone else is using actual function names instead of that macro, so replace it with kzalloc, which is the function t_new wraps. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17fec: Add support to restart autonegotiateChris Healy
Add ethtool operation to restart autonegotiation via the PHY. Tested on i.MX28EVK. Signed-off-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17bonding: don't call alb_set_slave_mac_addr() while atomicVeaceslav Falico
alb_set_slave_mac_addr() sets the mac address in alb mode via dev_set_mac_address(), which might sleep. It's called from alb_handle_addr_collision_on_attach() in atomic context (under read_lock(bond->lock)), thus triggering a bug. Fix this by moving the lock inside alb_handle_addr_collision_on_attach(). v1->v2: As Nikolay Aleksandrov noticed, we can drop the bond->lock completely. Also, use bond_slave_has_mac(), when possible. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tg3: Prevent system hang during repeated EEH errors.Michael Chan
The current tg3 code assumes the pci_error_handlers to be always called in sequence. In particular, during ->error_detected(), NAPI is disabled and the device is shutdown. The device is later reset and NAPI re-enabled in ->slot_reset() and ->resume(). In EEH, if more than 6 errors are detected in a hour, only ->error_detected() will be called. This will leave the driver in an inconsistent state as NAPI is disabled but netif_running state is still true. When the device is later closed, we'll try to disable NAPI again and it will loop forever. We fix this by closing the device if we encounter any error conditions during the normal sequence of the pci_error_handlers. v2: Remove the changes in tg3_io_resume() based on Benjamin Poirier's feedback. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17Merge branch 'tipc'David S. Miller
Paul Gortmaker says: ==================== This is a rework of the content sent earlier[1], with the following changes: -drop the Kconfig --> modparam conversion patch; this was requested to be replaced[2] with a dynamic port quantity resizing. Ying and Erik were discussing how best to achieve this, and then vacation schedules got in the way, so implementing that will come (hopefully) in the next round. -rework the sk_rcvbuf patch to allow memory resizing via sysctl as per what Ying and Neil discussed[3] -add 4 more seemingly straigtforward and relatively small changes from Ying (the last 4 in the series). -add cosmetic UAPI comment update patch from Ying. That said, the largest change is still the one where we make use of the fact that linux supports kernel threads and do the server like operations within kernel threads. As Jon says: We remove the last remnants of the TIPC native API, to make it possible to simplify locking policy and solve a problem with lost topology events. First, we introduce a socket-based alternative to the native API. Second, we convert the two remaining users of the native API, the TIPC internal topology server and the configuarion server, to use the new API. Third, we remove the remaining code pertaining to the native API. I have re-tested this collection of commits between 32 and 64 bit x86 machines using the standard tipc test suite, and build tested for ppc. [1] http://patchwork.ozlabs.org/patch/247687/ [2] http://patchwork.ozlabs.org/patch/247680/ [3] http://patchwork.ozlabs.org/patch/247688/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: remove dev_base_lock use from enable_bearerYing Xue
Convert enable_bearer() to RCU locking with dev_get_by_name(). Based on a similar changeset in commit 840a185d ["aoe: remove dev_base_lock use from aoecmd_cfg_pkts()"] -- quoting that: "dev_base_lock is the legacy way to lock the device list, and is planned to disappear. (writers hold RTNL, readers hold RCU lock)" Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: fix wrong return value for link_send_sections_long routineYing Xue
When skb buffer cannot be allocated in link_send_sections_long(), -ENOMEM error code instead of -EFAULT should be returned to its caller. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: make tipc_link_send_sections_fast exit earlierYing Xue
Once message build request function returns invalid code, the process of sending message cannot continue. So in case of message build failure, tipc_link_send_sections_fast() should return immediately. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: enhance priority of link protocol packetYing Xue
pfifo_fast is set as default traffic class queueing discipline. This queue has three so called "bands". Within each band, FIFO rules apply. However, as long as there are packets waiting in band 0, band 1 won't be processed. Now all kind of TIPC type packet priorities are never set, that is, their priorities are 0, so they are mapped to band 1 of pfifo_fast qdisc. But, especially during link congestion, if link protocol packet can be sent out as earlier as possible than other type of packets so that protocol packet can arrive at peer endpoint in time, the peer will timely reset its link timeout timer to keep the link alive. So enhancing the priority of link protocol packets can meet the specific demand to avoid unnecessary link reset due to a transient link congestion. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: cosmetic realignment of function argumentsPaul Gortmaker
No runtime code changes here. Just a realign of the function arguments to start where the 1st one was, and fit as many args as can be put in an 80 char line. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: save sock structure pointer instead of void pointer to tipc_portYing Xue
Directly save sock structure pointer instead of void pointer to avoid unnecessary cast conversions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: convert config_lock from spinlock to mutexYing Xue
As the configuration server is now running under process context, it's unnecessary for us to have a spinlock serializing the TIPC configuration process. Instead, we replace it with a mutex lock, which gives us more freedom. For instance, we can now call pre-emptable functions within the protected area. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: rename tipc_createport_raw to tipc_createportYing Xue
After the removal of the native API, there is now only one way to to create a TIPC port instance -- the function tipc_createport_raw(). We make it more readable by renaming it to tipc_createport(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: remove user_port instance from tipc_port structureYing Xue
After the native API has been completely removed, the 'user_port' field in struct tipc_port becomes unused, and can be removed. As a consequence, the "usrmem" argument in tipc_msg_build() is no longer needed, and so we remove that one too. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: delete code orphaned by new server infrastructureYing Xue
Having completed the conversion of the topology server and configuration server to use the new server infrastructure, the following functions become unused, and can be deleted: - tipc_createport() - port_wakeup_sh() - port_dispatcher() - port_dispatcher_sigh() - tipc_send_buf_fast() - tipc_send_buf2port Additionally, the following variables become orphaned, and can be deleted: - tipc_msg_err_event - tipc_named_msg_err_event - tipc_conn_shutdown_event - tipc_msg_event - tipc_named_msg_event - tipc_conn_msg_event - tipc_continue_event - msg_queue_head - msg_queue_tail - queue_lock Deletion is done here in a separate commit in order to allow the actual conversion changes to be more easily viewed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: convert configuration server to use new server facilityYing Xue
As the new socket-based TIPC server infrastructure has been introduced, we can now convert the configuration server to use it. Then we can take future steps to simplify the configuration server locking policy. Some minor reordering of initialization is done, due to the dependency on having tipc_socket_init completed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: convert topology server to use new server facilityYing Xue
As the new TIPC server infrastructure has been introduced, we can now convert the TIPC topology server to it. We get two benefits from doing this: 1) It simplifies the topology server locking policy. In the original locking policy, we placed one spin lock pointer in the tipc_subscriber structure to reuse the lock of the subscriber's server port, controlling access to members of tipc_subscriber instance. That is, we only used one lock to ensure both tipc_port and tipc_subscriber members were safely accessed. Now we introduce another spin lock for tipc_subscriber structure only protecting themselves, to get a finer granularity locking policy. Moreover, the change will allow us to make the topology server code more readable and maintainable. 2) It fixes a bug where sent subscription events may be lost when the topology port is congested. Using the new service, the topology server now queues sent events into an outgoing buffer, and then wakes up a sender process which has been blocked in workqueue context. The process will keep picking events from the buffer and send them to their respective subscribers, using the kernel socket interface, until the buffer is empty. Even if the socket is congested during transmission there is no risk that events may be dropped, since the sender process may block when needed. Some minor reordering of initialization is done, since we now have a scenario where the topology server must be started after socket initialization has taken place, as the former depends on the latter. And overall, we see a simplification of the TIPC subscriber code in making this changeover. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: introduce new TIPC server infrastructureYing Xue
TIPC has two internal servers, one providing a subscription service for topology events, and another providing the configuration interface. These servers have previously been running in BH context, accessing the TIPC-port (aka native) API directly. Apart from these servers, even the TIPC socket implementation is partially built on this API. As this API may simultaneously be called via different paths and in different contexts, a complex and costly lock policiy is required in order to protect TIPC internal resources. To eliminate the need for this complex lock policiy, we introduce a new, generic service API that uses kernel sockets for message passing instead of the native API. Once the toplogy and configuration servers are converted to use this new service, all code pertaining to the native API can be removed. This entails a significant reduction in code amount and complexity, and opens up for a complete rework of the locking policy in TIPC. The new service also solves another problem: As the current topology server works in BH context, it cannot easily be blocked when sending of events fails due to congestion. In such cases events may have to be silently dropped, something that is unacceptable. Therefore, the new service keeps a dedicated outbound queue receiving messages from BH context. Once messages are inserted into this queue, we will immediately schedule a work from a special workqueue. This way, messages/events from the topology server are in reality sent in process context, and the server can block if necessary. Analogously, there is a new workqueue for receiving messages. Once a notification about an arriving message is received in BH context, we schedule a work from the receive workqueue to do the job of receiving the message in process context. As both sending and receive messages are now finished in processes, subscribed events cannot be dropped any more. As of this commit, this new server infrastructure is built, but not actually yet called by the existing TIPC code, but since the conversion changes required in order to use it are significant, the addition is kept here as a separate commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: allow implicit connect for stream socketsErik Hugne
TIPC's implied connect feature, aka piggyback connect, allows applications to save one syscall and all SYN/SYN-ACK signalling overhead when setting up a connection. Until now, this has only been supported for SEQPACKET sockets. Here, we make it possible to use this feature even with stream sockets. At the connecting side, the connection is completed when the first data message arrives from the accepting peer. This means that we must allow the connecting user to call blocking recv() before the socket has reached state SS_CONNECTED. So we must must relax the state machine check at recv_stream(), and allow the recv() call even if socket is in state SS_CONNECTING. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: change socket buffer overflow control to respect sk_rcvbufYing Xue
As per feedback from the netdev community, we change the buffer overflow protection algorithm in receiving sockets so that it always respects the nominal upper limit set in sk_rcvbuf. Instead of scaling up from a small sk_rcvbuf value, which leads to violation of the configured sk_rcvbuf limit, we now calculate the weighted per-message limit by scaling down from a much bigger value, still in the same field, according to the importance priority of the received message. To allow for administrative tunability of the socket receive buffer size, we create a tipc_rmem sysctl variable to allow the user to configure an even bigger value via sysctl command. It is a size of three (min/default/max) to be consistent with things like tcp_rmem. By default, the value initialized in tipc_rmem[1] is equal to the receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message. This value is also set as the default value of sk_rcvbuf. Originally-by: Jon Maloy <jon.maloy@ericsson.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> [Ying: added sysctl variation to Jon's original patch] Signed-off-by: Ying Xue <ying.xue@windriver.com> [PG: don't compile sysctl.c if not config'd; add Documentation] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17tipc: update code comments to reflect new uapi header pathYing Xue
Files tipc.h and tipc_config.h were moved to uapi directory, but the corresponding comments were not updated at the same time. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17net: add socket option for low latency pollingEliezer Tamir
adds a socket option for low latency polling. This allows overriding the global sysctl value with a per-socket one. Unexport sysctl_net_ll_poll since for now it's not needed in modules. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17net: remove NET_LL_RX_POLL config menueEliezer Tamir
Remove NET_LL_RX_POLL from the config menu. Change default to y. Busy polling still needs to be enabled at run time. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17net: convert low latency sockets to sched_clock()Eliezer Tamir
Use sched_clock() instead of get_cycles(). We can use sched_clock() because we don't care much about accuracy. Remove the dependency on X86_TSC Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17net: change sysctl_net_ll_poll into an unsigned intEliezer Tamir
There is no reason for sysctl_net_ll_poll to be an unsigned long. Change it into an unsigned int. Fix the proc handler. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14net: sctp: sctp_association_init: put refs in reverse orderDaniel Borkmann
In case we need to bail out for whatever reason during assoc init, we call sctp_endpoint_put() and then sock_put(), however, we've hold both refs in reverse, non-symmetric order, so first sctp_endpoint_hold() and then sock_hold(). Reverse this, so that in an error case we have sock_put() and then sctp_endpoint_put(). Actually shouldn't matter too much, since both cleanup paths do the right thing, but that way, it is more consistent with the rest of the code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14net: sctp: minor: remove variable in sctp_init_sockDaniel Borkmann
It's only used at this one time, so we could remove it as well. This is valid and also makes it more explicit/obvious that in case of error the sp->ep is NULL here, i.e. for the sctp_destroy_sock() check that was recently added. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14net: sctp: sctp_sf_do_prm_asoc: do SCTP_CMD_INIT_CHOOSE_TRANSPORT firstDaniel Borkmann
While this currently cannot trigger any NULL pointer dereference in sctp_seq_dump_local_addrs(), better change the order of commands to prevent a future bug to happen. Although we first add SCTP_CMD_NEW_ASOC and then set the SCTP_CMD_INIT_CHOOSE_TRANSPORT, it is okay for now, since this primitive is only called by sctp_connect() or sctp_sendmsg() with sctp_assoc_add_peer() set first. However, lets do this precaution and first set the transport and then add it to the association hashlist to prevent in future something to possibly triggering this. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14net: sctp: sideeffect: throw BUG if primary_path is NULLDaniel Borkmann
This clearly states a BUG somewhere in the SCTP code as e.g. fixed once in f28156335 ("sctp: Use correct sideffect command in duplicate cookie handling"). If this ever happens, throw a trace in the sideeffect engine where assocs clearly must have a primary_path assigned. When in sctp_seq_dump_local_addrs() also throw a WARN and bail out since we do not need to panic for printing this one asterisk. Also, it will avoid the not so obvious case when primary != NULL test passes and at a later point in time triggering a NULL ptr dereference caused by primary. While at it, also fix up the white space. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A few miscellaneous improvements and cleanups before the GRE tunnel integration series. Intended for net-next/3.11. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14openvswitch: Simplify interface ovs_flow_metadata_from_nlattrs()Pravin B Shelar
This is not functional change, this is just code cleanup. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: make skb->csum consistent with rest of networking stack.Pravin B Shelar
Following patch keeps skb->csum correct across ovs. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: Fix struct comment.Pravin B Shelar
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: Fix misspellings in comments and docs.Andy Hill
Flagged with: https://github.com/lyda/misspell-check Run with: git ls-files | misspellings -f - Signed-off-by: Andy Hill <hillad@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: fix variable names in commentLorand Jakab
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: Unify vport error stats handling.Pravin B Shelar
Following patch changes vport->send return type so that vport layer can do error accounting. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: Remove unused get_config vport op.Jesse Gross
The get_config vport op is left over from old compatibility code, it is neither used nor implemented any more. Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14openvswitch: Immediately exit on error in ovs_vport_cmd_set().Jesse Gross
It is an error to try to change the type of a vport using the set command. However, while we check that this is an error, we still proceed to allocate memory which then gets freed immediately. This stops processing after noticing the error, which does not actually fix a bug but is more correct. Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-06-14net/mlx4: Add VF link state supportRony Efraim
Add support to change the link state of VF (vPort) Signed-off-by: Rony Efraim <ronye@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14net/core: Add VF link state controlRony Efraim
Add netlink directives and ndo entry to allow for controling VF link, which can be in one of three states: Auto - VF link state reflects the PF link state (default) Up - VF link state is up, traffic from VF to VF works even if the actual PF link is down Down - VF link state is down, no traffic from/to this VF, can be of use while configuring the VF Signed-off-by: Rony Efraim <ronye@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14bcm63xx_enet: add support Broadcom BCM6345 EthernetFlorian Fainelli
This patch adds support for the Broadcom BCM6345 SoC Ethernet. BCM6345 has a slightly different and older DMA engine which requires the following modifications: - the width of the DMA channels on BCM6345 is 64 bytes vs 16 bytes, which means that the helpers enet_dma{c,s} need to account for this channel width and we can no longer use macros - BCM6345 DMA engine does not have any internal SRAM for transfering buffers - BCM6345 buffer allocation and flow control is not per-channel but global (done in RSET_ENETDMA) - the DMA engine bits are right-shifted by 3 compared to other DMA generations - the DMA enable/interrupt masks are a little different (we need to enabled more bits for 6345) - some register have the same meaning but are offsetted in the ENET_DMAC space so a lookup table is required to return the proper offset The MAC itself is identical and requires no modifications to work. Signed-off-by: Florian Fainelli <florian@openwrt.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14htb: reorder struct htb_class fields for performanceEric Dumazet
htb_class structures are big, and source of false sharing on SMP. By carefully splitting them in two parts, we can improve performance. I got 9 % performance increase on a 24 threads machine, with 200 concurrent netperf in TCP_RR mode, using a HTB hierarchy of 4 classes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14net-rps: fixes for rps flow limitWillem de Bruijn
Caught by sparse: - __rcu: missing annotation to sd->flow_limit - __user: direct access in cpumask_scnprintf Also - add endline character when printing bitmap if room in buffer - avoid bucket overflow by reducing FLOW_LIMIT_HISTORY The last item warrants some explanation. The hashtable buckets are subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal to bucket size, since all packets may end up in a single bucket. The current (rather arbitrary) history value of 256 happens to match the buffer size (u8). As a result, with a single flow, the first 128 packets are accepted (correct), the second 128 packets dropped (correct) and then the history[] array has filled, so that each subsequent new packet causes an increment in the bucket for new_flow plus a decrement for old_flow: a steady state. This is fine if packets are dropped, as the steady state goes away as soon as a mix of traffic reappears. But, because the 256th packet overflowed the bucket to 0: no packets are dropped. Instead of explicitly adding an overflow check, this patch changes FLOW_LIMIT_HISTORY to never be able to overflow a single bucket. Reported-by: Fengguang Wu <fengguang.wu@intel.com> (first item) Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13tcp: properly send new data in fast recovery in first RTTYuchung Cheng
Linux sends new unset data during disorder and recovery state if all (suspected) lost packets have been retransmitted ( RFC5681, section 3.2 step 1 & 2, RFC3517 section 4, NexSeg() Rule 2). One requirement is to keep the receive window about twice the estimated sender's congestion window (tcp_rcv_space_adjust()), assuming the fast retransmits repair the losses in the next round trip. But currently it's not the case on the first round trip in either normal or Fast Open connection, beucase the initial receive window is identical to (expected) sender's initial congestion window. The fix is to double it. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13sh_eth: remove '__maybe_unused' annotationsSergei Shtylyov
Now that the SoC specific support is no longer done with help of #ifdef'fery, we no longer need '__maybe_unused' annotations to sh_eth_select_mii() and sh_eth_set_duplex()... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13net: Convert uses of typedef ctl_table to struct ctl_tableJoe Perches
Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net | \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13net: make all team port device link events urgentFlavio Leitner
Since team functionality relies heavily on userspace daemon, we need to deliver event to userspace via Netlink as quick as possible. So make all team port device link events urgent. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13net: ping_check_bind_addr() etc. can be staticWu Fengguang
net/ipv4/ping.c:286:5: sparse: symbol 'ping_check_bind_addr' was not declared. Should it be static? net/ipv4/ping.c:355:6: sparse: symbol 'ping_set_saddr' was not declared. Should it be static? net/ipv4/ping.c:370:6: sparse: symbol 'ping_clear_saddr' was not declared. Should it be static? net/ipv6/ping.c:60:5: sparse: symbol 'dummy_ipv6_recv_error' was not declared. Should it be static? net/ipv6/ping.c:64:5: sparse: symbol 'dummy_ip6_datagram_recv_ctl' was not declared. Should it be static? net/ipv6/ping.c:69:5: sparse: symbol 'dummy_icmpv6_err_convert' was not declared. Should it be static? net/ipv6/ping.c:73:6: sparse: symbol 'dummy_ipv6_icmp_error' was not declared. Should it be static? net/ipv6/ping.c:75:5: sparse: symbol 'dummy_ipv6_chk_addr' was not declared. Should it be static? net/ipv6/ping.c:201:5: sparse: symbol 'ping_v6_seq_show' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13cxgb4: Do not set net_device::dev_id to VI indexBen Hutchings
net_device::dev_id should not be used merely to indicate a VI index, as it affects the way the local part of IPv6 addresses is normally generated. This field was intended for use where multiple devices may share a single assigned MAC address and need to have different IPv6 addresses. T4 VIs each have their own MAC address. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13macvtap: fix uninitialized return value macvtap_ioctl_set_queue()Jason Wang
Return -EINVAL on illegal flag instead of uninitialized value. This fixes the kbuild test warning. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>