summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
2012-06-11Merge branch 'master' of git://1984.lsi.us.es/net-nextDavid S. Miller
2012-06-07netfilter: ipvs: switch hook PFs to nfprotoAlban Crequy
This patch is a cleanup. Use NFPROTO_* for consistency with other netfilter code. Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk> Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: xt_recent: add address masking optionDenys Fedoryshchenko
The mask option allows you put all address belonging that mask into the same recent slot. This can be useful in case that recent is used to detect attacks from the same network segment. Tested for backward compatibility. Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: NFQUEUE: don't xor src/dst ip address for load distributionFlorian Westphal
because reply packets need to go to the same nfqueue, src/dst ip address were xor'd prior to jhash(). However, this causes bad distribution for some workloads, e.g. flows a.b.1.{1,n} -> a.b.2.{1,n} all share the same hash value. Avoid this by hashing both. To get same hash for replies, first argument is the smaller address. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_conntrack: add namespace support for cttimeoutGao feng
This patch adds namespace support for cttimeout. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_conntrack: remove now unused sysctl for nf_conntrack_l[3|4]protoPablo Neira Ayuso
Since the sysctl data for l[3|4]proto now resides in pernet nf_proto_net. We can now remove this unused fields from struct nf_contrack_l[3,4]proto. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_gre: use new namespace supportGao feng
This patch modifies the GRE protocol tracker, which partially supported namespace before this patch, to use the new namespace infrastructure for nf_conntrack. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_dccp: use new namespace supportGao feng
This patch modifies the DCCP protocol tracker to use the new namespace infrastructure for nf_conntrack. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_udplite: add namespace supportGao feng
This patch adds namespace support for UDPlite protocol tracker. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_sctp: add namespace supportGao feng
This patch adds namespace support for SCTP protocol tracker. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_icmp: add namespace supportGao feng
This patch adds namespace support for ICMPv6 protocol tracker. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_icmp: add namespace supportGao feng
This patch adds namespace support for ICMP protocol tracker. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_udp: add namespace supportGao feng
This patch adds namespace support for UDP protocol tracker. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_tcp: add namespace supportGao feng
This patch adds namespace support for TCP protocol tracker. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_generic: add namespace supportGao feng
This patch adds namespace support for the generic layer 4 protocol tracker. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_conntrack: prepare namespace support for l3 protocol trackersGao feng
This patch prepares the namespace support for layer 3 protocol trackers. Basically, this modifies the following interfaces: * nf_ct_l3proto_[un]register_sysctl. * nf_conntrack_l3proto_[un]register. We add a new nf_ct_l3proto_net is used to get the pernet data of l3proto. This adds rhe new struct nf_ip_net that is used to store the sysctl header and l3proto_ipv4,l4proto_tcp(6),l4proto_udp(6),l4proto_icmp(v6) because the protos such tcp and tcp6 use the same data,so making nf_ip_net as a field of netns_ct is the easiest way to manager it. This patch also adds init_net to struct nf_conntrack_l3proto to initial the layer 3 protocol pernet data. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_conntrack: prepare namespace support for l4 protocol trackersGao feng
This patch prepares the namespace support for layer 4 protocol trackers. Basically, this modifies the following interfaces: * nf_ct_[un]register_sysctl * nf_conntrack_l4proto_[un]register to include the namespace parameter. We still use init_net in this patch to prepare the ground for follow-up patches for each layer 4 protocol tracker. We add a new net_id field to struct nf_conntrack_l4proto that is used to store the pernet_operations id for each layer 4 protocol tracker. Note that AF_INET6's protocols do not need to do sysctl compat. Thus, we only register compat sysctl when l4proto.l3proto != AF_INET6. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: Add fail-open supportKrishna Kumar
Implement a new "fail-open" mode where packets are not dropped upon queue-full condition. This mode can be enabled/disabled per queue using netlink NFQA_CFG_FLAGS & NFQA_CFG_MASK attributes. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Vivek Kashyap <vivk@us.ibm.com> Signed-off-by: Sridhar Samudrala <samudrala@us.ibm.com>
2012-06-07netfilter: xt_connlimit: remove revision 0Cong Wang
It was scheduled to be removed. Cc: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: nf_ct_h323: fix bug in rtcp nattingPablo Neira Ayuso
The nat_rtp_rtcp hook takes two separate parameters port and rtp_port. port is expected to be the real h245 address (found inside the packet). rtp_port is the even number closest to port (RTP ports are even and RTCP ports are odd). However currently, both port and rtp_port are having same value (both are rounded to nearest even numbers). This works well in case of openlogicalchannel with media (RTP/even) port. But in case of openlogicalchannel for media control (RTCP/odd) port, h245 address in the packet is wrongly modified to have an even port. I am attaching a pcap demonstrating the problem, for any further analysis. This behavior was introduced around v2.6.19 while rewriting the helper. Signed-off-by: Jagdish Motwani <jagdish.motwani@elitecore.com> Signed-off-by: Sanket Shah <sanket.shah@elitecore.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: xt_HMARK: fix endianness and provide consistent hashingHans Schillstrom
This patch addresses two issues: a) Fix usage of u32 and __be32 that causes endianess warnings via sparse. b) Ensure consistent hashing in a cluster that is composed of big and little endian systems. Thus, we obtain the same hash mark in an heterogeneous cluster. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-04net: Remove casts to same typeJoe Perches
Adding casts of objects to the same type is unnecessary and confusing for a human reader. For example, this cast: int y; int *p = (int *)&y; I used the coccinelle script below to find and remove these unnecessary casts. I manually removed the conversions this script produces of casts with __force and __user. @@ type T; T *p; @@ - (T *)p + p Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04net: use consume_skb() in place of kfree_skb()Eric Dumazet
Remove some dropwatch/drop_monitor false positives. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-23Merge branch 'for-3.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "Contains Alex Shi's three patches to remove percpu_xxx() which overlap with this_cpu_xxx(). There shouldn't be any functional change." * 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: remove percpu_xxx() functions x86: replace percpu_xxx funcs with this_cpu_xxx net: replace percpu_xxx funcs with this_cpu_xxx or __this_cpu_xxx
2012-05-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-05-16netfilter: nf_ct_h323: fix usage of MODULE_ALIAS_NFCT_HELPERPablo Neira Ayuso
ctnetlink uses the aliases that are created by MODULE_ALIAS_NFCT_HELPER to auto-load the module based on the helper name. Thus, we have to use RAS, Q.931 and H.245, not H.323. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-16netfilter: xt_CT: remove redundant header includeEldad Zack
nf_conntrack_l4proto.h is included twice. Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-16netfilter: ipset: fix timeout value overflow bugJozsef Kadlecsik
Large timeout parameters could result wrong timeout values due to an overflow at msec to jiffies conversion (reported by Andreas Herz) [ This patch was mangled by Pablo Neira Ayuso since David Laight and Eric Dumazet noticed that we were using hardcoded 1000 instead of MSEC_PER_SEC to calculate the timeout ] Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-16netfilter: nf_ct_tcp: extend log message for invalid ignored packetsPablo Neira Ayuso
Extend log message if packets are ignored to include the TCP state, ie. replace: [ 3968.070196] nf_ct_tcp: invalid packet ignored IN= OUT= SRC=... by: [ 3968.070196] nf_ct_tcp: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=... This information is useful to know in what state we were while ignoring the packet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-05-16netfilter: xt_HMARK: modulus is expensive for hash calculationPablo Neira Ayuso
Use: ((u64)(HASH_VAL * HASH_SIZE)) >> 32 as suggested by David S. Miller. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-16netfilter: xt_HMARK: potential NULL dereference in get_inner_hdr()Dan Carpenter
There is a typo in the error checking and "&&" was used instead of "||". If skb_header_pointer() returns NULL then it leads to a NULL dereference. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-16netfilter: xt_hashlimit: use _ALL macro to reject unknown flag bitsFlorian Westphal
David Miller says: The canonical way to validate if the set bits are in a valid range is to have a "_ALL" macro, and test: if (val & ~XT_HASHLIMIT_ALL) goto err;" make it so. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-16netfilter: ipset: fix hash size checking in kernelJozsef Kadlecsik
The hash size must fit both into u32 (jhash) and the max value of size_t. The missing checking could lead to kernel crash, bug reported by Seblu. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15net: Convert net_ratelimit uses to net_<level>_ratelimitedJoe Perches
Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-14net: replace percpu_xxx funcs with this_cpu_xxx or __this_cpu_xxxAlex Shi
percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace them for further code clean up. And in preempt safe scenario, __this_cpu_xxx funcs may has a bit better performance since __this_cpu_xxx has no redundant preempt_enable/preempt_disable on some architectures. Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-05-10netfilter: Convert compare_ether_addr to ether_addr_equalJoe Perches
Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script: $ cat compare_ether_addr.cocci @@ expression a,b; @@ - !compare_ether_addr(a, b) + ether_addr_equal(a, b) @@ expression a,b; @@ - compare_ether_addr(a, b) + !ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) == 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) != 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) == 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) != 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !!ether_addr_equal(a, b) + ether_addr_equal(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-09netfilter: hashlimit: byte-based limit modeFlorian Westphal
can be used e.g. for ingress traffic policing or to detect when a host/port consumes more bandwidth than expected. This is done by optionally making cost to mean "cost per 16-byte-chunk-of-data" instead of "cost per packet". Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09netfilter: hashlimit: move rateinfo initialization to helperFlorian Westphal
followup patch would bloat main match function too much. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09netfilter: limit, hashlimit: avoid duplicated inlineFlorian Westphal
credit_cap can be set to credit, which avoids inlining user2credits twice. Also, remove inline keyword and let compiler decide. old: 684 192 0 876 36c net/netfilter/xt_limit.o 4927 344 32 5303 14b7 net/netfilter/xt_hashlimit.o now: 668 192 0 860 35c net/netfilter/xt_limit.o 4793 344 32 5169 1431 net/netfilter/xt_hashlimit.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09netfilter: add xt_hmark target for hash-based skb markingHans Schillstrom
The target allows you to create rules in the "raw" and "mangle" tables which set the skbuff mark by means of hash calculation within a given range. The nfmark can influence the routing method (see "Use netfilter MARK value as routing key") and can also be used by other subsystems to change their behaviour. [ Part of this patch has been refactorized and modified by Pablo Neira Ayuso ] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-09netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()Hans Schillstrom
This patch adds the flags parameter to ipv6_find_hdr. This flags allows us to: * know if this is a fragment. * stop at the AH header, so the information contained in that header can be used for some specific packet handling. This patch also adds the offset parameter for inspection of one inner IPv6 header that is contained in error messages. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08netfilter: nf_conntrack: fix explicit helper attachment and NATPablo Neira Ayuso
Explicit helper attachment via the CT target is broken with NAT if non-standard ports are used. This problem was hidden behind the automatic helper assignment routine. Thus, it becomes more noticeable now that we can disable the automatic helper assignment with Eric Leblond's: 9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment Basically, nf_conntrack_alter_reply asks for looking up the helper up if NAT is enabled. Unfortunately, we don't have the conntrack template at that point anymore. Since we don't want to rely on the automatic helper assignment, we can skip the second look-up and stick to the helper that was attached by iptables. With the CT target, the user is in full control of helper attachment, thus, the policy is to trust what the user explicitly configures via iptables (no automatic magic anymore). Interestingly, this bug was hidden by the automatic helper look-up code. But it can be easily trigger if you attach the helper in a non-standard port, eg. iptables -I PREROUTING -t raw -p tcp --dport 8888 \ -j CT --helper ftp And you disabled the automatic helper assignment. I added the IPS_HELPER_BIT that allows us to differenciate between a helper that has been explicitly attached and those that have been automatically assigned. I didn't come up with a better solution (having backward compatibility in mind). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08netfilter: nf_ct_expect: partially implement ctnetlink_change_expectKelvie Wong
This refreshes the "timeout" attribute in existing expectations if one is given. The use case for this would be for userspace helpers to extend the lifetime of the expectation when requested, as this is not possible right now without deleting/recreating the expectation. I use this specifically for forwarding DCERPC traffic through: DCERPC has a port mapper daemon that chooses a (seemingly) random port for future traffic to go to. We expect this traffic (with a reasonable timeout), but sometimes the port mapper will tell the client to continue using the same port. This allows us to extend the expectation accordingly. Signed-off-by: Kelvie Wong <kelvie@ieee.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08ipvs: ip_vs_proto: local functions should not be exposed globallyH Hartley Sweeten
Functions not referenced outside of a source file should be marked static to prevent it from being exposed globally. This quiets the sparse warnings: warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08ipvs: ip_vs_ftp: local functions should not be exposed globallyH Hartley Sweeten
Functions not referenced outside of a source file should be marked static to prevent it from being exposed globally. This quiets the sparse warnings: warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08ipvs: optimize the use of flags in ip_vs_bind_destPablo Neira Ayuso
cp->flags is marked volatile but ip_vs_bind_dest can safely modify the flags, so save some CPU cycles by using temp variable. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08ipvs: add support for sync threadsPablo Neira Ayuso
Allow master and backup servers to use many threads for sync traffic. Add sysctl var "sync_ports" to define the number of threads. Every thread will use single UDP port, thread 0 will use the default port 8848 while last thread will use port 8848+sync_ports-1. The sync traffic for connections is scheduled to many master threads based on the cp address but one connection is always assigned to same thread to avoid reordering of the sync messages. Remove ip_vs_sync_switch_mode because this check for sync mode change is still risky. Instead, check for mode change under sync_buff_lock. Make sure the backup socks do not block on reading. Special thanks to Aleksey Chudov for helping in all tests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08ipvs: reduce sync rate with time thresholdsJulian Anastasov
Add two new sysctl vars to control the sync rate with the main idea to reduce the rate for connection templates because currently it depends on the packet rate for controlled connections. This mechanism should be useful also for normal connections with high traffic. sync_refresh_period: in seconds, difference in reported connection timer that triggers new sync message. It can be used to avoid sync messages for the specified period (or half of the connection timeout if it is lower) if connection state is not changed from last sync. sync_retries: integer, 0..3, defines sync retries with period of sync_refresh_period/8. Useful to protect against loss of sync messages. Allow sysctl_sync_threshold to be used with sysctl_sync_period=0, so that only single sync message is sent if sync_refresh_period is also 0. Add new field "sync_endtime" in connection structure to hold the reported time when connection expires. The 2 lowest bits will represent the retry count. As the sysctl_sync_period now can be 0 use ACCESS_ONCE to avoid division by zero. Special thanks to Aleksey Chudov for being patient with me, for his extensive reports and helping in all tests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08ipvs: wakeup master threadPablo Neira Ayuso
High rate of sync messages in master can lead to overflowing the socket buffer and dropping the messages. Fixed sleep of 1 second without wakeup events is not suitable for loaded masters, Use delayed_work to schedule sending for queued messages and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will reduce the rate of wakeups but to avoid sending long bursts we wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages. Add hard limit for the queued messages before sending by using "sync_qlen_max" sysctl var. It defaults to 1/32 of the memory pages but actually represents number of messages. It will protect us from allocating large parts of memory when the sending rate is lower than the queuing rate. As suggested by Pablo, add new sysctl var "sync_sock_size" to configure the SNDBUF (master) or RCVBUF (slave) socket limit. Default value is 0 (preserve system defaults). Change the master thread to detect and block on SNDBUF overflow, so that we do not drop messages when the socket limit is low but the sync_qlen_max limit is not reached. On ENOBUFS or other errors just drop the messages. Change master thread to enter TASK_INTERRUPTIBLE state early, so that we do not miss wakeups due to messages or kthread_should_stop event. Thanks to Pablo Neira Ayuso for his valuable feedback! Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08ipvs: always update some of the flags bits in backupJulian Anastasov
As the goal is to mirror the inactconns/activeconns counters in the backup server, make sure the cp->flags are updated even if cp is still not bound to dest. If cp->flags are not updated ip_vs_bind_dest will rely only on the initial flags when updating the counters. To avoid mistakes and complicated checks for protocol state rely only on the IP_VS_CONN_F_INACTIVE bit when updating the counters. Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>