summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-08-25ipv6: White-space cleansing : Line LayoutsIan Morris
This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. A number of items are addressed in this patch: * Multiple spaces converted to tabs * Spaces before tabs removed. * Spaces in pointer typing cleansed (char *)foo etc. * Remove space after sizeof * Ensure spacing around comparators such as if statements. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25net: ec_bhf: remove excessive debug messagesDarek Marcinkiewicz
This cuts down the number of debug information spit out by the driver. Signed-off-by: Dariusz Marcinkiewicz <reksio@newterm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25random32: improvements to prandom_bytesDaniel Borkmann
This patch addresses a couple of minor items, mostly addesssing prandom_bytes(): 1) prandom_bytes{,_state}() should use size_t for length arguments, 2) We can use put_unaligned() when filling the array instead of open coding it [ perhaps some archs will further benefit from their own arch specific implementation when GCC cannot make up for it ], 3) Fix a typo, 4) Better use unsigned int as type for getting the arch seed, 5) Make use of prandom_u32_max() for timer slack. Regarding the change to put_unaligned(), callers of prandom_bytes() which internally invoke prandom_bytes_state(), don't bother as they expect the array to be filled randomly and don't have any control of the internal state what-so-ever (that's also why we have periodic reseeding there, etc), so they really don't care. Now for the direct callers of prandom_bytes_state(), which are solely located in test cases for MTD devices, that is, drivers/mtd/tests/{oobtest.c,pagetest.c,subpagetest.c}: These tests basically fill a test write-vector through prandom_bytes_state() with an a-priori defined seed each time and write that to a MTD device. Later on, they set up a read-vector and read back that blocks from the device. So in the verification phase, the write-vector is being re-setup [ so same seed and prandom_bytes_state() called ], and then memcmp()'ed against the read-vector to check if the data is the same. Akinobu, Lothar and I also tested this patch and it runs through the 3 relevant MTD test cases w/o any errors on the nandsim device (simulator for MTD devs) for x86_64, ppc64, ARM (i.MX28, i.MX53 and i.MX6): # modprobe nandsim first_id_byte=0x20 second_id_byte=0xac \ third_id_byte=0x00 fourth_id_byte=0x15 # modprobe mtd_oobtest dev=0 # modprobe mtd_pagetest dev=0 # modprobe mtd_subpagetest dev=0 We also don't have any users depending directly on a particular result of the PRNG (except the PRNG self-test itself), and that's just fine as it e.g. allowed us easily to do things like upgrading from taus88 to taus113. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Lothar Waßmann <LW@KARO-electronics.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25Merge branch 'csums-next'David S. Miller
Tom Herbert says: ==================== net: Checksum offload changes - Part V I am working on overhauling RX checksum offload. Goals of this effort are: - Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY - Preserve CHECKSUM_COMPLETE through encapsulation layers - Don't do skb_checksum more than once per packet - Unify GRO and non-GRO csum verification as much as possible - Unify the checksum functions (checksum_init) - Simplify code What is in this fifth patch set: - Added GRO checksum validation functions - Call the GRO validations functions from TCP and GRE gro_receive - Perform checksum verification in the UDP gro_receive path using GRO functions and add support for gro_receive in UDP6 Changes in V2: - Change ip_summed to CHECKSUM_UNNECESSARY instead of moving it to CHECKSUM_COMPLETE from GRO checksum validation. This avoids performance penalty in checksumming bytes which are before the header GRO is at. Please review carefully and test if possible, mucking with basic checksum functions is always a little precarious :-) ---- Test results with this patch set are below. I did not notice any performace regression. Tests run: TCP_STREAM: super_netperf with 200 streams TCP_RR: super_netperf with 200 streams and -r 1,1 Device bnx2x (10Gbps): No GRE RSS hash (RX interrupts occur on one core) UDP RSS port hashing enabled. * GRE with checksum with IPv4 encapsulated packets With fix: TCP_STREAM 9.91% CPU utilization 5163.78 Mbps TCP_RR 50.64% CPU utilization 219/347/502 90/95/99% latencies 834103 tps Without fix: TCP_STREAM 10.05% CPU utilization 5186.22 tps TCP_RR 49.70% CPU utilization 227/338/486 90/95/99% latencies 813450 tps * GRE without checksum with IPv4 encapsulated packets With fix: TCP_STREAM 10.18% CPU utilization 5159 Mbps TCP_RR 51.86% CPU utilization 214/325/471 90/95/99% latencies 865943 tps Without fix: TCP_STREAM 10.26% CPU utilization 5307.87 Mbps TCP_RR 50.59% CPU utilization 224/325/476 90/95/99% latencies 846429 tps *** Simulate device returns CHECKSUM_COMPLETE * VXLAN with checksum With fix: TCP_STREAM 13.03% CPU utilization 9093.9 Mbps TCP_RR 95.96% CPU utilization 161/259/474 90/95/99% latencies 1.14806e+06 tps Without fix: TCP_STREAM 13.59% CPU utilization 9093.97 Mbps TCP_RR 93.95% CPU utilization 160/259/484 90/95/99% latencies 1.10262e+06 tps * VXLAN without checksum With fix: TCP_STREAM 13.28% CPU utilization 9093.87 Mbps TCP_RR 95.04% CPU utilization 155/246/439 90/95/99% latencies 1.15e+06 tps Without fix: TCP_STREAM 13.37% CPU utilization 9178.45 Mbps TCP_RR 93.74% CPU utilization 161/257/469 90/95/99% latencies 1.1068e+06 Mbps ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25gre: When GRE csum is present count as encap layer wrt csumTom Herbert
In GRE demux if the GRE checksum pop rcv encapsulation so that any encapsulated checksums are treated as tunnel checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25udp: additional GRO supportTom Herbert
Implement GRO for UDPv6. Add UDP checksum verification in gro_receive for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25tcp: Call skb_gro_checksum_validateTom Herbert
In tcp[64]_gro_receive call skb_gro_checksum_validate to validate TCP checksum in the gro context. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25gre: call skb_gro_checksum_simple_validateTom Herbert
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25net: add gro_compute_pseudo functionsTom Herbert
Add inet_gro_compute_pseudo and ip6_gro_compute_pseudo. These are the logical equivalents of inet_compute_pseudo and ip6_compute_pseudo for GRO path. The IP header is taken from skb_gro_network_header. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25net: skb_gro_checksum_* functionsTom Herbert
Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check, and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete. These are the cognates of the normal checksum functions but are used in the gro_receive path and operate on GRO related fields in sk_buffs. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: use reciprocal_scale() helperDaniel Borkmann
Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: Allow raw buffers to be passed into the flow dissector.David S. Miller
Drivers, and perhaps other entities we have not yet considered, sometimes want to know how deep the protocol headers go before deciding how large of an SKB to allocate and how much of the packet to place into the linear SKB area. For example, consider a driver which has a device which DMAs into pools of pages and then tells the driver where the data went in the DMA descriptor(s). The driver can then build an SKB and reference most of the data via SKB fragments (which are page/offset/length triplets). However at least some of the front of the packet should be placed into the linear SKB area, which comes before the fragments, so that packet processing can get at the headers efficiently. The first thing each protocol layer is going to do is a "pskb_may_pull()" so we might as well aggregate as much of this as possible while we're building the SKB in the driver. Part of supporting this is that we don't have an SKB yet, so we want to be able to let the flow dissector operate on a raw buffer in order to compute the offset of the end of the headers. So now we have a __skb_flow_dissect() which takes an explicit data pointer and length. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23Merge branch 'bcm7xxx_apd_eee'David S. Miller
Florian Fainelli says: ==================== net: phy: bcm7xxx: APD and EEE support This patch series enables Auto-power down and EEE for the BCM7xxx integrated Gigabit PHYs. I also put a fix for the fixed PHY that would allow clause 45 over clause 22 reads/writes but would return bogus data by using e.g: ethtool --show-eee ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: phy: bcm7xxx: enable EEE at the PHY levelFlorian Fainelli
The 28nm Gigabit PHY on BCM7xxx chips comes out of reset with absolutely no EEE capabilities, such that we would actually return that we do not support EEE when accessing 3.20 (MDIO_PCS_EEE_ABLE) registers. Poke through the vendor-specific C45 register to enable EEE globally at the PHY level, and advertise supported EEE modes. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: phy: allow phy_init_eee() to work with internal PHYsFlorian Fainelli
Internal PHYs do not have any specific phy_interface_t defined because they are within an Ethernet MAC or a larger IC, they will fail the early check in phy_init_eee(). Allow these PHYs to proceed with EEE initialization and report error/success by checking the standard C45 EEE-related registers. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: phy: export phy_{read,write}_mmd_indirectFlorian Fainelli
Some PHY drivers might need to access Clause 45 registers in Clause 22 compatibility mode to e.g: properly advertise EEE support when disabled by default. Export these two helper functions: phy_read_mmd_indirect() and phy_write_mmd_indirect() for drivers to use them. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: phy: fixed: return an error for Clause 45 over 22 readsFlorian Fainelli
The fixed PHY driver does not properly emulate Clause 45 over Clause 22 MDIO reads, and as such, will return bogus values when we access such registers. Return an error when accessing these registers in order to prevent advertising bogus capabilities such as EEE support and such. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: phy: bcm7xxx: enable auto power downFlorian Fainelli
The 28nm process BCM7xxx internal Gigabit PHYs all support automatic power down, turn on that feature as part of the configuration initialization callback. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: phy: broadcom: move shadow 0x1C register accessors to brcmphy.hFlorian Fainelli
The shadow register 0x1C is used both by the BCM54xxx PHYs and the BCM7xxx internal PHYs, move the accessors to a common location so both drivers can use them. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: phy: broadcom: extract all registers to brcmphy.hFlorian Fainelli
Commit 439d39a9ac8fbbba9c04581361188f33f21ced50 ("net: phy: broadcom: extract register definitions") added a bunch of registers to brcmphy.h but left some to broadcom.c, move all of them to the header file since the BCM54xx and BCM7xxx PHY drivers do share all of these registers. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23Merge branch 'tipc-next'David S. Miller
Jon Maloy says: ==================== tipc: Merge port and socket layer code After the removal of the TIPC native interface, there is no reason to keep a distinction between a "generic" port layer and a "specific" socket layer in the code. Throughout the last months, we have posted several series that aimed at facilitating removal of the port layer, and in particular the port_lock spinlock, which in reality duplicates the role normally kept by lock_sock()/bh_lock_sock(). In this series, we finalize this work, by making a significant number of changes to the link, node, port and socket code, all with the aim of reducing dependencies between the layers. In the final commits, we then remove the port spinlock, port.c and port.h altogether. After this series, we have a socket layer that has only few dependencies to the rest of the stack, so that it should be possible to continue cleanups of its code without significantly affecting other code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: merge struct tipc_port into struct tipc_sockJon Paul Maloy
We complete the merging of the port and socket layer by aggregating the fields of struct tipc_port directly into struct tipc_sock, and moving the combined structure into socket.c. We also move all functions and macros that are not any longer exposed to the rest of the stack into socket.c, and rename them accordingly. Despite the size of this commit, there are no functional changes. We have only made such changes that are necessary due of the removal of struct tipc_port. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: remove files ref.h and ref.cJon Paul Maloy
The reference table is now 'socket aware' instead of being generic, and has in reality become a socket internal table. In order to be able to minimize the API exposed by the socket layer towards the rest of the stack, we now move the reference table definitions and functions into the file socket.c, and rename the functions accordingly. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: remove include file port.hJon Paul Maloy
We move the inline functions in the file port.h to socket.c, and modify their names accordingly. We move struct tipc_port and some macros to socket.h. Finally, we remove the file port.h. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: remove source file port.cJon Paul Maloy
In this commit, we move the remaining functions in port.c to socket.c, and give them new names that correspond to their new location. We then remove the file port.c. There are only cosmetic changes to the moved functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: remove port_lockJon Paul Maloy
In previous commits we have reduced usage of port_lock to a minimum, and complemented it with usage of bh_lock_sock() at the remaining locations. The purpose has been to remove this lock altogether, since it largely duplicates the role of bh_lock_sock. We are now ready to do this. However, we still need to protect the BH callers from inadvertent release of the socket while they hold a reference to it. We do this by replacing port_lock by a combination of a rw-lock protecting the reference table as such, and updating the socket reference counter while the socket is referenced from BH. This technique is more standard and comprehensible than the previous approach, and turns out to have a positive effect on overall performance. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: replace port pointer with socket pointer in registryJon Paul Maloy
In order to make tipc_sock the only entity referencable from other parts of the stack, we add a tipc_sock pointer instead of a tipc_port pointer to the registry. As a consequence, we also let the function tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port. We keep the function's name for now, since the lock still is owned by the port. This is another step in the direction of eliminating port_lock, replacing its usage with lock_sock() and bh_lock_sock(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: use registry when scanning socketsJon Paul Maloy
The functions tipc_port_get_ports() and tipc_port_reinit() scan over all sockets/ports to access each of them. This is done by using a dedicated linked list, 'tipc_socks' where all sockets are members. The list is in turn protected by a spinlock, 'port_list_lock', while each socket is locked by using port_lock at the moment of access. In order to reduce complexity and risk of deadlock, we want to get rid of the linked list and the accompanying spinlock. This is what we do in this commit. Instead of the linked list, we use the port registry to scan across the sockets. We also add usage of bh_lock_sock() inside the scope of port_lock in both functions, as a preparation for the complete removal of port_lock. Finally, we move the functions from port.c to socket.c, and rename them to tipc_sk_sock_show() and tipc_sk_reinit() repectively. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: eliminate functions tipc_port_init and tipc_port_destroyJon Paul Maloy
After the latest changes to the socket/port layer the existence of the functions tipc_port_init() and tipc_port_destroy() cannot be justified. They are both called only once, from tipc_sk_create() and tipc_sk_delete() respectively, and their functionality can better be merged into the latter two functions. This also entails that all remaining references to port_lock now are made from inside socket.c, something that will make it easier to remove this lock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: redefine message acknowledge functionJon Paul Maloy
The function tipc_acknowledge() is a remnant from the obsolete native API. Currently, it grabs port_lock, before building an acknowledge message and sending it to the peer. Since all access to socket members now is protected by the socket lock, it has become unnecessary to grab port_lock here. In this commit, we remove the usage of port_lock, simplify the function, and move it to socket.c, renaming it to tipc_sk_send_ack(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: eliminate port_connect()/port_disconnect() functionsJon Paul Maloy
tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete native API. Their only task is to grab port_lock and call the functions __tipc_port_connect()/__tipc_port_disconnect() respectively, which will perform the actual state change. Since socket/port exection now is single-threaded the use of port_lock is not needed any more, so we can safely replace the two functions with their lock-free counterparts. In this commit, we remove the two functions. Furthermore, the contents of __tipc_port_disconnect() is so trivial that we choose to eliminate that function too, expanding its functionality into tipc_shutdown(). __tipc_port_connect() is simplified, moved to socket.c, and given the more correct name tipc_sk_finish_conn(). Finally, we eliminate the function auto_connect(), and expand its contents into filter_connect(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: eliminate function tipc_port_shutdown()Jon Paul Maloy
tipc_port_shutdown() is a remnant from the now obsolete native interface. As such it grabs port_lock in order to protect itself from concurrent BH processing. However, after the recent changes to the port/socket upcalls, sockets are now basically single-threaded, and all execution, except the read-only tipc_sk_timer(), is executing within the protection of lock_sock(). So the use of port_lock is not needed here. In this commit we eliminate the whole function, and merge it into its only caller, tipc_shutdown(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: clean up socket timer functionJon Paul Maloy
The last remaining BH upcall to the socket, apart for the message reception function tipc_sk_rcv(), is the timer function. We prefer to let this function continue executing in BH, since it only does read-acces to semi-permanent data, but we make three changes to it: 1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope of port_lock. This is a preparation for replacing port_lock with bh_lock_sock() at the locations where it is still used. 2) We move the function from port.c to socket.c, as a further step of eliminating the port code level altogether. 3) We let it make use of the newly introduced tipc_msg_create() function. This enables us to get rid of three context specific functions (port_create_self_abort_msg() etc.) in port.c Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: use message to abort connections when losing contact to nodeJon Paul Maloy
In the current implementation, each 'struct tipc_node' instance keeps a linked list of those ports/sockets that are connected to the node represented by that struct. The purpose of this is to let the node object know which sockets to alert when it loses contact with its peer node, i.e., which sockets need to have their connections aborted. This entails an unwanted direct reference from the node structure back to the port/socket structure, and a need to grab port_lock when we have to make an upcall to the port. We want to get rid of this unecessary BH entry point into the socket, and also eliminate its use of port_lock. In this commit, we instead let the node struct keep list of "connected socket" structs, which each represents a connected socket, but is allocated independently by the node at the moment of connection. If the node loses contact with its peer node, the list is traversed, and a "connection abort" message is created for each entry in the list. The message is sent to it respective connected socket using the ordinary data path, and the receiving socket aborts its connections upon reception of the message. This enables us to get rid of the direct reference from 'struct node' to ´struct port', and another unwanted BH access point to the latter. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: use pseudo message to wake up sockets after link congestionJon Paul Maloy
The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep list of a "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: introduce new function tipc_msg_create()Jon Paul Maloy
The function tipc_msg_init() has turned out to be of limited value in many cases. It take too few parameters to be usable for creating a complete message, it makes too many assumptions about what the message should be used for, and it does not allocate any buffer to be returned to the caller. Therefore, we now introduce the new function tipc_msg_create(), which takes all the parameters needed to create a full message, and returns a buffer of the requested size. The new function will be very useful for the changes we will be doing in later commits in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Pulling to get some TIPC fixes that a net-next series depends upon. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tcp: improve undo on timeoutYuchung Cheng
Upon timeout, undo (via both timestamps/Eifel and DSACKs) was disabled if any retransmits were still in flight. The concern was perhaps that spurious retransmission sent in a previous recovery episode may trigger DSACKs to falsely undo the current recovery. However, this inadvertently misses undo opportunities (using either TCP timestamps or DSACKs) when timeout occurs during a loss episode, i.e. recurring timeouts or timeout during fast recovery. In these cases some retransmissions will be in flight but we should allow undo. Furthermore, we should only reset undo_marker and undo_retrans upon timeout if we are starting a new recovery episode. Finally, when we do reset our undo state, we now do so in a manner similar to tcp_enter_recovery(), so that we require a DSACK for each of the outstsanding retransmissions. This will achieve the original goal by requiring that we receive the same number of DSACKs as retransmissions. This patch increases the undo events by 50% on Google servers. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23phylib: use MDIO_DEVS[12]Sergei Shtylyov
The bare register numbers are used despite <uapi/linux/mdio.h> has MDIO_DEVS[12] #define'd for those. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: remove dead code after sk_data_ready changeEric Dumazet
As a followup to commit 676d23690fb ("net: Fix use after free by removing length arg from sk_data_ready callbacks"), we can remove some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23net: use ktime_get_ns() and ktime_get_real_ns() helpersEric Dumazet
ktime_get_ns() replaces ktime_to_ns(ktime_get()) ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real()) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23vxlan: fix incorrect initializer in union vxlan_addrGerhard Stenzel
The first initializer in the following union vxlan_addr ipa = { .sin.sin_addr.s_addr = tip, .sa.sa_family = AF_INET, }; is optimised away by the compiler, due to the second initializer, therefore initialising .sin.sin_addr.s_addr always to 0. This results in netlink messages indicating a L3 miss never contain the missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions. The problem affects user space programs relying on an IP address being sent as part of a netlink message indicating a L3 miss. Changing .sa.sa_family = AF_INET, to .sin.sin_family = AF_INET, fixes the problem. Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23Merge tag 'linux-can-next-for-3.18-20140820' of ↵David S. Miller
git://gitorious.org/linux-can/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2014-08-20 this is a pull request of 10 patches for net-next/master. There is one patch by Wolfram Sang to clean up the build system. Two patches by Stefan Agner that add vf610 support to the flexcan driver. Dong Aisheng add support for bosch's m_can core, which is found in the new freescale ARM SoCs. Sergei Shtylyov improves the rcar_can driver by supporting all input clocks and adding device tree support. The next patch is a small cleanup for the bit rate calculation function by Lad, Prabhakar. And finally a patch by Himangi Saraogi, which converts the mcp251x driver to use dmam_alloc_coherent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22Merge tag 'pwm/for-3.17-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm fix from Thierry Reding: "Just one bugfix for the PWM lookup table code that would cause a PWM channel to be set to the wrong period and polarity for non-perfect matches" * tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: Fix period and polarity in pwm_get() for non-perfect matches
2014-08-22mac80211: fix channel switch for chanctx-based driversMichal Kazior
The new_ctx pointer is set only for non-chanctx drivers. This yielded a crash for chanctx-based drivers during channel switch finalization: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211] Use an adequate chanctx pointer to fix this. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Here are some bug fixes that have piled up during ksummit/linuxcon. 1) Fix endian problems in ibmveth, from Anton Blanchard. 2) IPV6 routing code does GFP_KERNEL allocation in atomic, fix from Benjamin Block. 3) SCTP association fixes from Daniel Borkmann. 4) When multiple VLAN headers are present we have to make sure the second and subsequent ones are pullable in the SKB otherwise we blindly dereference garbage. From Jiri Benc. 5) The argument adjustment of the signature of hlist_add_after*() introduced a regression in the batman-adv code, fix from Sven Eckelmann. 6) Fix TX hang handling to avoid a panic in i40e, from Anjali Singhai Jain. 7) PTP flag test is inverted in i40e driver, from Jesse Brandeburg. 8) ATM LEC driver needs to hold RTNL mutex over MTU changes, from Chas Williams. 9) Truncate packets larger then the TPACKET_V3 format configured buffers, otherwise we overwrite past the end of said buffers. From Eric Dumazet. 10) Fix endianness bugs in qlcnic firmware handling, from Rajesh Borundia and Shahed Shaikh. 11) CXGB4 sometimes doesn't get all of the TX completion events it should resulting in SKBs getting stuck in the TX queue, from Hariprasad Shenai. 12) When the FEC chip's PTP clock is disabled, you can't access the register. Add necessary checks to avoid the resulting hang, from Fugang Duan" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits) drivers: isdn: eicon: xdi_msg.h: Fix typo in #ifndef net: sctp: fix suboptimal edge-case on non-active active/retrans path selection net: sctp: spare unnecessary comparison in sctp_trans_elect_best net: ethernet: broadcom: bnx2x: Remove redundant #ifdef ibmveth: Fix endian issues with rx_no_buffer statistic net: xgene: fix possible NULL dereference in xgene_enet_free_desc_rings() openvswitch: fix panic with multiple vlan headers net: ipv6: fib: don't sleep inside atomic lock net: fec: ptp: avoid register access when ipg clock is disabled cxgb4: Free completed tx skbs promptly cxgb4: Fix race condition in cleanup sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe bnx2x: Revert UNDI flushing mechanism qlcnic: Fix endianess issue in firmware load from file operation qlcnic: Fix endianess issue in FW dump template header qlcnic: Fix flash access interface to application MAINTAINERS: Add section for MRF24J40 IEEE 802.15.4 radio driver macvlan: Allow setting multicast filter on all macvlan types packet: handle too big packets for PACKET_V3 MAINTAINERS: add entry for ec_bhf driver ...
2014-08-22dp83640: Fix length check for event timestamp status messagesChristian Riesch
Event timestamp status messages have a variable length, ranging from 1 to 5 words (16 bit words). The current code however requires a minimum message length of sizeof(*phy_txts). In most cases this condition is fulfilled due to padding bytes. However, if several events are signaled in a single message, padding bytes may not be present. For short event timestamp status messages, the length check will fail, and the event timestamp will be dropped. Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: stmmac: add fix_mac_speed support for socfpgaLey Foon Tan
This patch adds fix_mac_speed() support for Altera socfpga Ethernet controller. Emac splitter is a soft IP core in FPGA system that converts GMII interface from Synopsys mac to RGMII/SGMII interface. This splitter core is an optional IP if user would like to use RGMII/SGMII interface in their system. Software needs to update a register in splitter core when there is speed change. Signed-off-by: Ley Foon Tan <lftan@altera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22r8169:add support for RTL8168H and RTL8107EChun-Hao Lin
RTL8168H is Realtek PCIe Gigabit Ethernet controller. RTL8107E is Realtek PCIe Fast Ethernet controller. This patch add support for these two chips. Signed-off-by: Chun-Hao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22bonding: create netlink event when bonding option is changedJiri Pirko
Userspace needs to be notified if one changes some option. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@gmail.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>