summaryrefslogtreecommitdiff
path: root/include/net/tcp.h
AgeCommit message (Collapse)Author
2012-05-17tcp: bool conversionsEric Dumazet
bool conversions where possible. __inline__ -> inline space cleanups Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11tcp: Move rcvq sending to tcp_input.cPavel Emelyanov
It actually works on the input queue and will use its read mem routines, thus it's better to have in in the tcp_input.c file. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-04tcp: be more strict before accepting ECN negociationEric Dumazet
It appears some networks play bad games with the two bits reserved for ECN. This can trigger false congestion notifications and very slow transferts. Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can disable TCP ECN negociation if it happens we receive mangled CT bits in the SYN packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Perry Lorier <perryl@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Wilmer van der Gaast <wilmer@google.com> Cc: Ankur Jain <jankur@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Dave Täht <dave.taht@bufferbloat.net> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03net: implement tcp coalescing in tcp_queue_rcv()Eric Dumazet
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main receiver function when application is not blocked in recvmsg(). Function tcp_queue_rcv() is moved a bit to allow its call from tcp_data_queue() This gives good results especially if GRO could not kick, and if skb head is a fragment. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03tcp: early retransmit: delayed fast retransmitYuchung Cheng
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2). Delays the fast retransmit by an interval of RTT/4. We borrow the RTO timer to implement the delay. If we receive another ACK or send a new packet, the timer is cancelled and restored to original RTO value offset by time elapsed. When the delayed-ER timer fires, we enter fast recovery and perform fast retransmit. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03tcp: early retransmitYuchung Cheng
This patch implements RFC 5827 early retransmit (ER) for TCP. It reduces DUPACK threshold (dupthresh) if outstanding packets are less than 4 to recover losses by fast recovery instead of timeout. While the algorithm is simple, small but frequent network reordering makes this feature dangerous: the connection repeatedly enter false recovery and degrade performance. Therefore we implement a mitigation suggested in the appendix of the RFC that delays entering fast recovery by a small interval, i.e., RTT/4. Currently ER is conservative and is disabled for the rest of the connection after the first reordering event. A large scale web server experiment on the performance impact of ER is summarized in section 6 of the paper "Proportional Rate Reduction for TCP”, IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf Note that Linux has a similar feature called THIN_DUPACK. The differences are THIN_DUPACK do not mitigate reorderings and is only used after slow start. Currently ER is disabled if THIN_DUPACK is enabled. I would be happy to merge THIN_DUPACK feature with ER if people think it's a good idea. ER is enabled by sysctl_tcp_early_retrans: 0: Disables ER 1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4. 2: (Default) reduce dupthresh like mode 1. In addition, delay entering fast recovery by RTT/4. Note: mode 2 is implemented in the third part of this patch series. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizingEric Dumazet
Quoting Tore Anderson from : https://bugzilla.kernel.org/show_bug.cgi?id=42572 When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment size does not take into account the size of the IPv6 Fragmentation header that needs to be included in outbound packets, causing every transmitted TCP segment to be fragmented across two IPv6 packets, the latter of which will only contain 8 bytes of actual payload. RTAX_FEATURE_ALLFRAG is typically set on a route in response to receving a ICMPv6 Packet Too Big message indicating a Path MTU of less than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6 PTBs with MTU < 1280 are still valid, in particular when an IPv6 packet is sent to an IPv4 destination through a stateless translator. Any ICMPv4 Need To Fragment packets originated from the IPv4 part of the path will be translated to ICMPv6 PTB which may then indicate an MTU of less than 1280. The Linux kernel refuses to reduce the effective MTU to anything below 1280 bytes, instead it sets it to exactly 1280 bytes, and RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header), instead of 1232 (additionally taking into account the 8 bytes required by the IPv6 Fragmentation extension header). This in turn results in rather inefficient transmission, as every transmitted TCP segment now is split in two fragments containing 1232+8 bytes of payload. After this patch, all the outgoing packets that includes a Fragmentation header all are "atomic" or "non-fragmented" fragments, i.e., they both have Offset=0 and More Fragments=0. With help from David S. Miller Reported-by: Tore Anderson <tore@fud.no> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Tom Herbert <therbert@google.com> Tested-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()Neal Cardwell
This commit moves the (substantial) common code shared between tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family independent function, tcp_init_sock(). Centralizing this functionality should help avoid drift issues, e.g. where the IPv4 side is updated without a corresponding update to IPv6. There was already some drift: IPv4 initialized snd_cwnd to TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to 2 (in this case it should not matter, since snd_cwnd is also initialized in tcp_init_metrics(), but the general risks and maintenance overhead remain). When diffing the old and new code, note that new tcp_init_sock() function uses the order of steps from the tcp_v4_init_sock() implementation (the order is slightly different in tcp_v6_init_sock()). Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: Initial repair modePavel Emelyanov
This includes (according the the previous description): * TCP_REPAIR sockoption This one just puts the socket in/out of the repair mode. Allowed for CAP_NET_ADMIN and for closed/establised sockets only. When repair mode is turned off and the socket happens to be in the established state the window probe is sent to the peer to 'unlock' the connection. * TCP_REPAIR_QUEUE sockoption This one sets the queue which we're about to repair. The 'no-queue' is set by default. * TCP_QUEUE_SEQ socoption Sets the write_seq/rcv_nxt of a selected repaired queue. Allowed for TCP_CLOSE-d sockets only. When the socket changes its state the other seq-s are changed by the kernel according to the protocol rules (most of the existing code is actually reused). * Ability to forcibly bind a socket to a port The sk->sk_reuse is set to SK_FORCE_REUSE. * Immediate connect modification The connect syscall initializes the connection, then directly jumps to the code which finalizes it. * Silent close modification The close just aborts the connection (similar to SO_LINGER with 0 time) but without sending any FIN/RST-s to peer. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: Move code aroundPavel Emelyanov
This is just the preparation patch, which makes the needed for TCP repair code ready for use. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16tcp: restore formatting of macros for tcp_skb_cb sacked fieldNeal Cardwell
Commit b82d1bb4 inadvertendly placed unrelated new code between TCPCB_EVER_RETRANS and TCPCB_RETRANS and the other macros that refer to the sacked field in the struct tcp_skb_cb (probably because there was a misleading empty line there). This commit fixes up the formatting so that all macros related to the sacked field are adjacent again. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14tcp: RFC6298 supersedes RFC2988bisEric Dumazet
Updates some comments to track RFC6298 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-24Merge tag 'bug-for-3.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/bug.h> cleanup from Paul Gortmaker: "The changes shown here are to unify linux's BUG support under the one <linux/bug.h> file. Due to historical reasons, we have some BUG code in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h, but old code in kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h was including <asm/bug.h> to pseudo link them. This has caused confusion[1] and general yuck/WTF[2] reactions. Here is an example that violates the principle of least surprise: CC lib/string.o lib/string.c: In function 'strlcat': lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON' make[2]: *** [lib/string.o] Error 1 $ $ grep linux/bug.h lib/string.c #include <linux/bug.h> $ We've included <linux/bug.h> for the BUG infrastructure and yet we still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel development. With the above in mind, the goals of this changeset are: 1) find and fix any include/*.h files that were relying on the implicit presence of BUG code. 2) find and fix any C files that were consuming kernel.h and hence relying on implicitly getting some/all BUG code. 3) Move the BUG related code living in kernel.h to <linux/bug.h> 4) remove the asm/bug.h from kernel.h to finally break the chain. During development, the order was more like 3-4, build-test, 1-2. But to ensure that git history for bisect doesn't get needless build failures introduced, the commits have been reorderd to fix the problem areas in advance. [1] https://lkml.org/lkml/2012/1/3/90 [2] https://lkml.org/lkml/2012/1/17/414" Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul and linux-next. * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: kernel.h: doesn't explicitly use bug.h, so don't include it. bug: consolidate BUILD_BUG_ON with other bug code BUG: headers with BUG/BUG_ON etc. need linux/bug.h bug.h: add include of it to various implicit C users lib: fix implicit users of kernel.h for TAINT_WARN spinlock: macroize assert_spin_locked to avoid bug.h dependency x86: relocate get/set debugreg fcns to include/asm/debugreg.
2012-03-04BUG: headers with BUG/BUG_ON etc. need linux/bug.hPaul Gortmaker
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present. We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-03-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/broadcom/tg3.c Conflicts in the statistics regression bug fix from 'net', but happily Matt Carlson originally posted the fix against 'net-next' so I used that to resolve this. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-28tcp: fix comment for tp->highest_sackNeal Cardwell
There was an off-by-one error in the comments describing the highest_sack field in struct tcp_sock. The comments previously claimed that it was the "start sequence of the highest skb with SACKed bit". This commit fixes the comments to note that it is the "start sequence of the skb just *after* the highest skb with SACKed bit". Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-02-01net: Disambiguate kernel messageArun Sharma
Some of our machines were reporting: TCP: too many of orphaned sockets even when the number of orphaned sockets was well below the limit. We print a different message depending on whether we're out of TCP memory or there are too many orphaned sockets. Also move the check out of line and cleanup the messages that were printed. Signed-off-by: Arun Sharma <asharma@fb.com> Suggested-by: Mohan Srinivasan <mohan@fb.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: David Miller <davem@davemloft.net> Cc: Glauber Costa <glommer@parallels.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-01tcp: md5: protects md5sig_info with RCUEric Dumazet
This patch makes sure we use appropriate memory barriers before publishing tp->md5sig_info, allowing tcp_md5_do_lookup() being used from tcp_v4_send_reset() without holding socket lock (upcoming patch from Shawn Lu) Note we also need to respect rcu grace period before its freeing, since we can free socket without this grace period thanks to SLAB_DESTROY_BY_RCU Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Shawn Lu <shawn.lu@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-31tcp: md5: rcu conversionEric Dumazet
In order to be able to support proper RST messages for TCP MD5 flows, we need to allow access to MD5 keys without locking listener socket. This conversion is a nice cleanup, and shrinks size of timewait sockets by 80 bytes. IPv6 code reuses generic code found in IPv4 instead of duplicating it. Control path uses GFP_KERNEL allocations instead of GFP_ATOMIC. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Shawn Lu <shawn.lu@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-31tcp: md5: remove obsolete md5_add() methodEric Dumazet
We no longer use md5_add() method from struct tcp_sock_af_ops Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-30net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTLGlauber Costa
sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c in commit 3dc43e3e4d0b52197d3205214fe8f162f9e0c334, since it became a per-ns value. That code, however, will never run when CONFIG_SYSCTL is disabled, leading to bogus values on those fields - causing hung TCP sockets. This patch fixes it by keeping an initialization code in tcp_init(). It will be overwritten by the first net namespace init if CONFIG_SYSCTL is compiled in, and do the right thing if it is compiled out. It is also named properly as tcp_init_mem(), to properly signal its non-sysctl side effect on TCP limits. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: David S. Miller <davem@davemloft.net> Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com [ renamed the function, tidied up the changelog a bit ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-21tcp: Replace constants with #define macrosVijay Subramanian
to record the state of SACK/FACK and DSACK for better readability and maintenance. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13per-netns ipv4 sysctl_tcp_memGlauber Costa
This patch allows each namespace to independently set up its levels for tcp memory pressure thresholds. This patch alone does not buy much: we need to make this values per group of process somehow. This is achieved in the patches that follows in this patchset. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13foundations of per-cgroup memory pressure controlling.Glauber Costa
This patch replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to acessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in. Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-11net: use IS_ENABLED(CONFIG_IPV6)Eric Dumazet
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29tcp: do not scale TSO segment size with reordering degreeNeal Cardwell
Since 2005 (c1b4a7e69576d65efc31a8cea0714173c2841244) tcp_tso_should_defer has been using tcp_max_burst() as a target limit for deciding how large to make outgoing TSO packets when not using sysctl_tcp_tso_win_divisor. But since 2008 (dd9e0dda66ba38a2ddd1405ac279894260dc5c36) tcp_max_burst() returns the reordering degree. We should not have tcp_tso_should_defer attempt to build larger segments just because there is more reordering. This commit splits the notion of deferral size used in TSO from the notion of burst size used in cwnd moderation, and returns the TSO deferral limit to its original value. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16net: introduce and use netdev_features_t for device features setsMichał Mirosław
v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01net: make the tcp and udp file_operations for the /proc stuff constArjan van de Ven
the tcp and udp code creates a set of struct file_operations at runtime while it can also be done at compile time, with the added benefit of then having these file operations be const. the trickiest part was to get the "THIS_MODULE" reference right; the naive method of declaring a struct in the place of registration would not work for this reason. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24TCP: remove TCP_DEBUGFlavio Leitner
It was enabled by default and the messages guarded by the define are useful. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24tcp: md5: add more const attributesEric Dumazet
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more const attributes to callers. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24tcp: md5: dont write skb head in tcp_md5_hash_header()Eric Dumazet
tcp_md5_hash_header() writes into skb header a temporary zero value, this might confuse other users of this area. Since tcphdr is small (20 bytes), copy it in a temporary variable and make the change in the copy. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21tcp: add const qualifiers where possibleEric Dumazet
Adding const qualifiers to pointers can ease code review, and spot some bugs. It might allow compiler to optimize code further. For example, is it legal to temporary write a null cksum into tcphdr in tcp_md5_hash_header() ? I am afraid a sniffer could catch the temporary null value... Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27tcp: rename tcp_skb_cb flagsEric Dumazet
Rename struct tcp_skb_cb "flags" to "tcp_flags" to ease code review and maintenance. Its content is a combination of FIN/SYN/RST/PSH/ACK/URG/ECE/CWR flags Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27tcp: unalias tcp_skb_cb flags and ip_dsfieldEric Dumazet
struct tcp_skb_cb contains a "flags" field containing either tcp flags or IP dsfield depending on context (input or output path) Introduce ip_dsfield to make the difference clear and ease maintenance. If later we want to save space, we can union flags/ip_dsfield Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27tcp: ECN blackhole should not force quickack modeEric Dumazet
While playing with a new ADSL box at home, I discovered that ECN blackhole can trigger suboptimal quickack mode on linux : We send one ACK for each incoming data frame, without any delay and eventual piggyback. This is because TCP_ECN_check_ce() considers that if no ECT is seen on a segment, this is because this segment was a retransmit. Refine this heuristic and apply it only if we seen ECT in a previous segment, to detect ECN blackhole at IP level. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jamal Hadi Salim <jhs@mojatatu.com> CC: Jerry Chu <hkchu@google.com> CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> CC: Jim Gettys <jg@freedesktop.org> CC: Dave Taht <dave.taht@gmail.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-22Merge branch 'master' of github.com:davem330/netDavid S. Miller
Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c
2011-09-19tcp: fix build error if !CONFIG_SYN_COOKIESEric Dumazet
commit 946cedccbd7387 (tcp: Change possible SYN flooding messages) added a build error if CONFIG_SYN_COOKIES=n Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-17tcp: md5: remove one indirection level in tcp_md5sig_poolEric Dumazet
tcp_md5sig_pool is currently an 'array' (a percpu object) of pointers to struct tcp_md5sig_pool. Only the pointers are NUMA aware, but objects themselves are all allocated on a single node. Remove this extra indirection to get proper percpu memory (NUMA aware) and make code simpler. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15tcp: Change possible SYN flooding messagesEric Dumazet
"Possible SYN flooding on port xxxx " messages can fill logs on servers. Change logic to log the message only once per listener, and add two new SNMP counters to track : TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client TCPReqQFullDrop : number of times a SYN request was dropped because syncookies were not enabled. Based on a prior patch from Tom Herbert, and suggestions from David. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-09tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open sideJerry Chu
This patch lowers the default initRTO from 3secs to 1sec per RFC2988bis. It falls back to 3secs if the SYN or SYN-ACK packet has been retransmitted, AND the TCP timestamp option is not on. It also adds support to take RTT sample during 3WHS on the passive open side, just like its active open counterpart, and uses it, if valid, to seed the initRTO for the data transmission phase. The patch also resets ssthresh to its initial default at the beginning of the data transmission phase, and reduces cwnd to 1 if there has been MORE THAN ONE retransmission during 3WHS per RFC5681. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-20tcp: Remove debug macro of TCP_CHECK_TIMERShan Wei
Now, TCP_CHECK_TIMER is not used for debuging, it does nothing. And, it has been there for several years, maybe 6 years. Remove it to keep code clearer. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-06tcp: Add reference to initial CWND ietf draft.David S. Miller
Suggested by Alexander Zimmermann Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03tcp: Increase the initial congestion window to 10.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Nandita Dukkipati <nanditad@google.com>
2011-01-24net: change netdev->features to u32Michał Mirosław
Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-21TCP: increase default initial receive window.Nandita Dukkipati
This patch changes the default initial receive window to 10 mss (defined constant). The default window is limited to the maximum of 10*1460 and 2*mss (when mss > 1460). draft-ietf-tcpm-initcwnd-00 is a proposal to the IETF that recommends increasing TCP's initial congestion window to 10 mss or about 15KB. Leading up to this proposal were several large-scale live Internet experiments with an initial congestion window of 10 mss (IW10), where we showed that the average latency of HTTP responses improved by approximately 10%. This was accompanied by a slight increase in retransmission rate (0.5%), most of which is coming from applications opening multiple simultaneous connections. To understand the extreme worst case scenarios, and fairness issues (IW10 versus IW3), we further conducted controlled testbed experiments. We came away finding minimal negative impact even under low link bandwidths (dial-ups) and small buffers. These results are extremely encouraging to adopting IW10. However, an initial congestion window of 10 mss is useless unless a TCP receiver advertises an initial receive window of at least 10 mss. Fortunately, in the large-scale Internet experiments we found that most widely used operating systems advertised large initial receive windows of 64KB, allowing us to experiment with a wide range of initial congestion windows. Linux systems were among the few exceptions that advertised a small receive window of 6KB. The purpose of this patch is to fix this shortcoming. References: 1. A comprehensive list of all IW10 references to date. http://code.google.com/speed/protocols/tcpm-IW10.html 2. Paper describing results from large-scale Internet experiments with IW10. http://ccr.sigcomm.org/drupal/?q=node/621 3. Controlled testbed experiments under worst case scenarios and a fairness study. http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf 4. Raw test data from testbed experiments (Linux senders/receivers) with initial congestion and receive windows of both 10 mss. http://research.csc.ncsu.edu/netsrv/?q=content/iw10 5. Internet-Draft. Increasing TCP's Initial Window. https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/ Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20net: kill unused macrosShan Wei
These macros never be used, so remove them. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16tcp: relax tcp_paws_check()Eric Dumazet
Some windows versions have wrong RFC1323 implementations, with SYN and SYNACKS messages containing zero tcp timestamps. We relaxed in commit fc1ad92dfc4e363 the passive connection case (Windows connects to a linux machine), but the reverse case (linux connects to a Windows machine) has an analogue problem when tsvals from windows machine are 'negative' (high order bit set) : PAWS triggers and we drops incoming messages. Fix this by making zero ts_recent value special, allowing frame to be processed. Based on a report and initial patch from Dmitiy Balakin Bugzilla reference : https://bugzilla.kernel.org/show_bug.cgi?id=24842 Reported-by: dmitriy.balakin@nicneiron.ru Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02net: kill unused macros from head fileShan Wei
These macros have been defined for several years since v2.6.12-rc2(tracing by git), but never be used. So remove them. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>