summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-02-26sunrpc: integer underflow in rsc_parse()Dan Carpenter
If we call groups_alloc() with invalid values then it's might lead to memory corruption. For example, with a negative value then we might not allocate enough for sizeof(struct group_info). (We're doing this in the caller for consistency with other callers of groups_alloc(). The other alternative might be to move the check out of all the callers into groups_alloc().) Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-02-17svcrpc: fix memory leak in gssp_accept_sec_context_upcallDavid Ramos
Our UC-KLEE tool found a kernel memory leak of 512 bytes (on x86_64) for each call to gssp_accept_sec_context_upcall() (net/sunrpc/auth_gss/gss_rpc_upcall.c). Since it appears that this call can be triggered by remote connections (at least, from a cursory a glance at the call chain), it may be exploitable to cause kernel memory exhaustion. We found the bug in kernel 3.16.3, but it appears to date back to commit 9dfd87da1aeb0fd364167ad199f40fe96a6a87be (2013-08-20). The gssp_accept_sec_context_upcall() function performs a pair of calls to gssp_alloc_receive_pages() and gssp_free_receive_pages(). The first allocates memory for arg->pages. The second then frees the pages pointed to by the arg->pages array, but not the array itself. Reported-by: David A. Ramos <daramos@stanford.edu> Fixes: 9dfd87da1aeb ("rpc: fix huge kmalloc's in gss-proxy”) Signed-off-by: David A. Ramos <daramos@stanford.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-02-02Merge branch 'locks-3.20' of git://git.samba.org/jlayton/linux into for-3.20J. Bruce Fields
Christoph's block pnfs patches have some minor dependencies on these lock patches.
2015-01-23sunrpc/lockd: fix references to the BKLJeff Layton
The BKL is completely out of the picture in the lockd and sunrpc code these days. Update the antiquated comments that refer to it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Handle additional inline contentChuck Lever
Most NFS RPCs place their large payload argument at the end of the RPC header (eg, NFSv3 WRITE). For NFSv3 WRITE and SYMLINK, RPC/RDMA sends the complete RPC header inline, and the payload argument in the read list. Data in the read list is the last part of the XDR stream. One important case is not like this, however. NFSv4 COMPOUND is a counted array of operations. A WRITE operation, with its large data payload, can appear in the middle of the compound's operations array. Thus NFSv4 WRITE compounds can have header content after the WRITE payload. The Linux client, for example, performs an NFSv4 WRITE like this: { PUTFH, WRITE, GETATTR } Though RFC 5667 is not precise about this, the proper way to convey this compound is to place the GETATTR inline, _after_ the front of the RPC header. The receiver inserts the read list payload into the XDR stream after the initial WRITE arguments, and before the GETATTR operation, thanks to the value of the read list "position" field. The Linux client currently sends the GETATTR at the end of the RPC/RDMA read list, which is incorrect. It will be corrected in the future. The Linux server currently rejects NFSv4 compounds with inline content after the read list. For the above NFSv4 WRITE compound, the NFS compound header indicates there are three operations, but the server finds nonsense when it looks in the XDR stream for the third operation, and the compound fails with OP_ILLEGAL. Move trailing inline content to the end of the XDR buffer's page list. This presents incoming NFSv4 WRITE compounds to NFSD in the same way the socket transport does. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Move read list XDR round-up logicChuck Lever
This is a pre-requisite for a subsequent patch. Read list XDR round-up needs to be done _before_ additional inline content is copied to the end of the XDR buffer's page list. Move the logic added by commit e560e3b510d2 ("svcrdma: Add zero padding if the client doesn't send it"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Support RDMA_NOMSG requestsChuck Lever
Currently the Linux server can not decode RDMA_NOMSG type requests. Operations whose length exceeds the fixed size of RDMA SEND buffers, like large NFSv4 CREATE(NF4LNK) operations, must be conveyed via RDMA_NOMSG. For an RDMA_MSG type request, the client sends the RPC/RDMA, RPC headers, and some or all of the NFS arguments via RDMA SEND. For an RDMA_NOMSG type request, the client sends just the RPC/RDMA header via RDMA SEND. The request's read list contains elements for the entire RPC message, including the RPC header. NFSD expects the RPC/RMDA header and RPC header to be contiguous in page zero of the XDR buffer. Add logic in the RDMA READ path to make the read list contents land where the server prefers, when the incoming message is a type RDMA_NOMSG message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: rc_position sanity checkingChuck Lever
An RPC/RDMA client may send large RPC arguments via a read list. This is a list of scatter/gather elements which convey RPC call arguments too large to fit in a small RDMA SEND. Each entry in the read list has a "position" field, whose value is the byte offset in the XDR stream where the data in that entry is to be inserted. Entries which share the same "position" value make up the same RPC argument. The receiver inserts entries with the same position field value in list order into the XDR stream. Currently the Linux NFS/RDMA server cannot handle receiving read chunks in more than one position, mostly because no current client sends read lists with elements in more than one position. As a sanity check, ensure that all received chunks have the same "rc_position." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Plant reader function in struct svcxprt_rdmaChuck Lever
The RDMA reader function doesn't change once an svcxprt_rdma is instantiated. Instead of checking sc_devcap during every incoming RPC, set the reader function once when the connection is accepted. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Find rmsgp more reliablyChuck Lever
xdr_start() can return the wrong rmsgp address if an assumption about how the xdr_buf was constructed changes. When it gets it wrong, the client receives a reply that has gibberish in the RPC/RDMA header, preventing it from matching a waiting RPC request. Instead, make (and document) just one assumption: that the RDMA header for the client's RPC call is at the start of the first page in rq_pages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Scrub BUG_ON() and WARN_ON() call sitesChuck Lever
Current convention is to avoid using BUG_ON() in places where an oops could cause complete system failure. Replace BUG_ON() call sites in svcrdma with an assertion error message and allow execution to continue safely. Some BUG_ON() calls are removed because they have never fired in production (that we are aware of). Some WARN_ON() calls are also replaced where a back trace is not helpful; e.g., in a workqueue task. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Clean up read chunk countingChuck Lever
The byte_count argument is not used, and the function is called only from one place. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Remove unused variableChuck Lever
Nit: remove an unused variable to squelch a compiler warning. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15svcrdma: Clean up dprintkChuck Lever
Nit: Fix inconsistent white space in dprintk messages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Don't use uninitialized data in IPVS, from Dan Carpenter. 2) conntrack race fixes from Pablo Neira Ayuso. 3) Fix TX hangs with i40e, from Jesse Brandeburg. 4) Fix budget return from poll calls in dnet and alx, from Eric Dumazet. 5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph Jaeger. 6) Fix bug introduced by conversion to list_head in TIPC retransmit code, from Jon Paul Maloy. 7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey Khoroshilov. 8) Fix bridge build with INET disabled, from Arnd Bergmann. 9) Fix netlink array overrun for PROBE attributes in openvswitch, from Thomas Graf. 10) Don't hold spinlock across synchronize_irq() in tg3 driver, from Prashant Sreedharan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) tg3: Release tp->lock before invoking synchronize_irq() tg3: tg3_reset_task() needs to use rtnl_lock to synchronize tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin openvswitch: packet messages need their own probe attribtue i40e: adds FCoE configure option cxgb4vf: Fix queue allocation for 40G adapter netdevice: Add missing parentheses in macro bridge: only provide proxy ARP when CONFIG_INET is enabled neighbour: fix base_reachable_time(_ms) not effective immediatly when changed net: fec: fix MDIO bus assignement for dual fec SoC's xen-netfront: use different locks for Rx and Tx stats drivers: net: cpsw: fix multicast flush in dual emac mode cxgb4vf: Initialize mdio_addr before using it net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb() MAINTAINERS: add me as ibmveth maintainer tipc: fix bug in broadcast retransmit code update ip-sysctl.txt documentation (v2) net/at91_ether: prepare and unprepare clock ...
2015-01-14openvswitch: packet messages need their own probe attribtueThomas Graf
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow and packet messages. This leads to an out-of-bounds access in ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE > OVS_PACKET_ATTR_MAX. Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes while maintaining to be binary compatible with existing OVS binaries. Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.") Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Tracked-down-by: Florian Westphal <fw@strlen.de> Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Jesse Gross <jesse@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14bridge: only provide proxy ARP when CONFIG_INET is enabledArnd Bergmann
When IPV4 support is disabled, we cannot call arp_send from the bridge code, which would result in a kernel link error: net/built-in.o: In function `br_handle_frame_finish': :(.text+0x59914): undefined reference to `arp_send' :(.text+0x59a50): undefined reference to `arp_tbl' This makes the newly added proxy ARP support in the bridge code depend on the CONFIG_INET symbol and lets the compiler optimize the code out to avoid the link error. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP") Cc: Kyeyoon Park <kyeyoonp@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14neighbour: fix base_reachable_time(_ms) not effective immediatly when changedJean-Francois Remy
When setting base_reachable_time or base_reachable_time_ms on a specific interface through sysctl or netlink, the reachable_time value is not updated. This means that neighbour entries will continue to be updated using the old value until it is recomputed in neigh_period_work (which recomputes the value every 300*HZ). On systems with HZ equal to 1000 for instance, it means 5mins before the change is effective. This patch changes this behavior by recomputing reachable_time after each set on base_reachable_time or base_reachable_time_ms. The new value will become effective the next time the neighbour's timer is triggered. Changes are made in two places: the netlink code for set and the sysctl handling code. For sysctl, I use a proc_handler. The ipv6 network code does provide its own handler but it already refreshes reachable_time correctly so it's not an issue. Any other user of neighbour which provide its own handlers must refresh reachable_time. Signed-off-by: Jean-Francois Remy <jeff@melix.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12tipc: fix bug in broadcast retransmit codeJon Paul Maloy
In commit 58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic SKB list APIs to manage link transmission queue") we replace all list traversal loops with the macros skb_queue_walk() or skb_queue_walk_safe(). While the previous loops were based on the assumption that the list was NULL-terminated, the standard macros stop when the iterator reaches the list head, which is non-NULL. In the function bclink_retransmit_pkt() this macro replacement has lead to a bug. When we receive a BCAST STATE_MSG we unconditionally call the function bclink_retransmit_pkt(), whether there really is anything to retransmit or not, assuming that the sequence number comparisons will lead to the correct behavior. However, if the transmission queue is empty, or if there are no eligible buffers in the transmission queue, we will by mistake pass the list head pointer to the function tipc_link_retransmit(). Since the list head is not a valid sk_buff, this leads to a crash. In this commit we fix this by only calling tipc_link_retransmit() if we actually found eligible buffers in the transmission queue. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== netfilter/ipvs fixes for net The following patchset contains netfilter/ipvs fixes, they are: 1) Small fix for the FTP helper in IPVS, a diff variable may be left unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter. 2) Fix nf_tables port NAT in little endian archs, patch from leroy christophe. 3) Fix race condition between conntrack confirmation and flush from userspace. This is the second reincarnation to resolve this problem. 4) Make sure inner messages in the batch come with the nfnetlink header. 5) Relax strict check from nfnetlink_bind() that may break old userspace applications using all 1s group mask. 6) Schedule removal of chains once no sets and rules refer to them in the new nf_tables ruleset flush command. Reported by Asbjoern Sloth Toennesen. Note that this batch comes later than usual because of the short winter holidays. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12packet: bail out of packet_snd() if L2 header creation failsChristoph Jaeger
Due to a misplaced parenthesis, the expression (unlikely(offset) < 0), which expands to (__builtin_expect(!!(offset), 0) < 0), never evaluates to true. Therefore, when sending packets with PF_PACKET/SOCK_DGRAM, packet_snd() does not abort as intended if the creation of the layer 2 header fails. Spotted by Coverity - CID 1259975 ("Operands don't affect result"). Fixes: 9c7077622dd9 ("packet: make packet_snd fail on len smaller than l2 header") Signed-off-by: Christoph Jaeger <cj@linux.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-10Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull two nfsd bugfixes from Bruce Fields. * 'for-3.19' of git://linux-nfs.org/~bfields/linux: rpc: fix xdr_truncate_encode to handle buffer ending on page boundary nfsd: fix fi_delegees leak when fi_had_conflict returns true
2015-01-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull two Ceph fixes from Sage Weil: "These are both pretty trivial: a sparse warning fix and size_t printk thing" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix sparse endianness warnings ceph: use %zu for len in ceph_fill_inline_data()
2015-01-08libceph: fix sparse endianness warningsIlya Dryomov
The only real issue is the one in auth_x.c and it came with 3.19-rc1 merge. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-07rpc: fix xdr_truncate_encode to handle buffer ending on page boundaryJ. Bruce Fields
A struct xdr_stream at a page boundary might point to the end of one page or the beginning of the next, but xdr_truncate_encode isn't prepared to handle the former. This can cause corruption of NFSv4 READDIR replies in the case that a readdir entry that would have exceeded the client's dircount/maxcount limit would have ended exactly on a 4k page boundary. You're more likely to hit this case on large directories. Other xdr_truncate_encode callers are probably also affected. Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Fixes: 3e19ce762b53 "rpc: xdr_truncate_encode" Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Just a pile of random fixes, including: 1) Do not apply TSO limits to non-TSO packets, fix from Herbert Xu. 2) MDI{,X} eeprom check in e100 driver is reversed, from John W. Linville. 3) Missing error return assignments in several ethernet drivers, from Julia Lawall. 4) Altera TSE device doesn't come back up after ifconfig down/up sequence, fix from Kostya Belezko. 5) Add more cases to the check for whether the qmi_wwan device has a bogus MAC address and needs to be assigned a random one. From Kristian Evensen. 6) Fix interrupt hangs in CPSW, from Felipe Balbi. 7) Implement ndo_features_check in r8152 so that the stack doesn't feed GSO packets which are outside of the chip's capabilities. From Hayes Wang" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) qla3xxx: don't allow never end busy loop xen-netback: fixing the propagation of the transmit shaper timeout r8152: support ndo_features_check batman-adv: fix potential TT client + orig-node memory leak batman-adv: fix multicast counter when purging originators batman-adv: fix counter for multicast supporting nodes batman-adv: fix lock class for decoding hash in network-coding.c batman-adv: fix delayed foreign originator recognition batman-adv: fix and simplify condition when bonding should be used Revert "mac80211: Fix accounting of the tailroom-needed counter" net: ethernet: cpsw: fix hangs with interrupts enic: free all rq buffs when allocation fails qmi_wwan: Set random MAC on devices with buggy fw openvswitch: Consistently include VLAN header in flow and port stats. tcp: Do not apply TSO segment limit to non-TSO packets Altera TSE: Add missing phydev net/mlx4_core: Fix error flow in mlx4_init_hca() net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow qlcnic: Fix return value in qlcnic_probe() net: axienet: fix error return code ...
2015-01-06netfilter: nf_tables: fix flush ruleset chain dependenciesPablo Neira Ayuso
Jumping between chains doesn't mix well with flush ruleset. Rules from a different chain and set elements may still refer to us. [ 353.373791] ------------[ cut here ]------------ [ 353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159! [ 353.373896] invalid opcode: 0000 [#1] SMP [ 353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi [ 353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98 [ 353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010 [...] [ 353.375018] Call Trace: [ 353.375046] [<ffffffff81964c31>] ? nf_tables_commit+0x381/0x540 [ 353.375101] [<ffffffff81949118>] nfnetlink_rcv+0x3d8/0x4b0 [ 353.375150] [<ffffffff81943fc5>] netlink_unicast+0x105/0x1a0 [ 353.375200] [<ffffffff8194438e>] netlink_sendmsg+0x32e/0x790 [ 353.375253] [<ffffffff818f398e>] sock_sendmsg+0x8e/0xc0 [ 353.375300] [<ffffffff818f36b9>] ? move_addr_to_kernel.part.20+0x19/0x70 [ 353.375357] [<ffffffff818f44f9>] ? move_addr_to_kernel+0x19/0x30 [ 353.375410] [<ffffffff819016d2>] ? verify_iovec+0x42/0xd0 [ 353.375459] [<ffffffff818f3e10>] ___sys_sendmsg+0x3f0/0x400 [ 353.375510] [<ffffffff810615fa>] ? native_sched_clock+0x2a/0x90 [ 353.375563] [<ffffffff81176697>] ? acct_account_cputime+0x17/0x20 [ 353.375616] [<ffffffff8110dc78>] ? account_user_time+0x88/0xa0 [ 353.375667] [<ffffffff818f4bbd>] __sys_sendmsg+0x3d/0x80 [ 353.375719] [<ffffffff81b184f4>] ? int_check_syscall_exit_work+0x34/0x3d [ 353.375776] [<ffffffff818f4c0d>] SyS_sendmsg+0xd/0x20 [ 353.375823] [<ffffffff81b1826d>] system_call_fastpath+0x16/0x1b Release objects in this order: rules -> sets -> chains -> tables, to make sure no references to chains are held anymore. Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.biz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06netfilter: nfnetlink: relax strict multicast group check from netlink_bindPablo Neira Ayuso
Relax the checking that was introduced in 97840cb ("netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind") when the subscription bitmask is used. Existing userspace code code may request to listen to all of the existing netlink groups by setting an all to one subscription group bitmask. Netlink already validates subscription via setsockopt() for us. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06netfilter: nfnetlink: validate nfnetlink header from batchPablo Neira Ayuso
Make sure there is enough room for the nfnetlink header in the netlink messages that are part of the batch. There is a similar check in netlink_rcv_skb(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06netfilter: conntrack: fix race between confirmation and flushPablo Neira Ayuso
Commit 5195c14c8b27c ("netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse") aimed to resolve the race condition between the confirmation (packet path) and the flush command (from control plane). However, it introduced a crash when several packets race to add a new conntrack, which seems easier to reproduce when nf_queue is in place. Fix this race, in __nf_conntrack_confirm(), by removing the CT from unconfirmed list before checking the DYING bit. In case race occured, re-add the CT to the dying list This patch also changes the verdict from NF_ACCEPT to NF_DROP when we lose race. Basically, the confirmation happens for the first packet that we see in a flow. If you just invoked conntrack -F once (which should be the common case), then this is likely to be the first packet of the flow (unless you already called flush anytime soon in the past). This should be hard to trigger, but better drop this packet, otherwise we leave things in inconsistent state since the destination will likely reply to this packet, but it will find no conntrack, unless the origin retransmits. The change of the verdict has been discussed in: https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included changes: - ensure bonding is used (if enabled) for packets coming in the soft interface - fix race condition to avoid orig_nodes to be deleted right after being added - avoid false positive lockdep splats by assigning lockclass to the proper hashtable lock objects - avoid miscounting of multicast 'disabled' nodes in the network - fix memory leak in the Global Translation Table in case of originator interval change Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-06Merge tag 'mac80211-for-davem-2015-01-06' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Here's just a single fix - a revert of a patch that broke the p54 and cw2100 drivers (arguably due to bad assumptions there.) Since this affects kernels since 3.17, I decided to revert for now and we'll revisit this optimisation properly for -next. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-06batman-adv: fix potential TT client + orig-node memory leakLinus Lüssing
This patch fixes a potential memory leak which can occur once an originator times out. On timeout the according global translation table entry might not get purged correctly. Furthermore, the non purged TT entry will cause its orig-node to leak, too. Which additionally can lead to the new multicast optimization feature not kicking in because of a therefore bogus counter. In detail: The batadv_tt_global_entry->orig_list holds the reference to the orig-node. Usually this reference is released after BATADV_PURGE_TIMEOUT through: _batadv_purge_orig()-> batadv_purge_orig_node()->batadv_update_route()->_batadv_update_route()-> batadv_tt_global_del_orig() which purges this global tt entry and releases the reference to the orig-node. However, if between two batadv_purge_orig_node() calls the orig-node timeout grew to 2*BATADV_PURGE_TIMEOUT then this call path isn't reached. Instead the according orig-node is removed from the originator hash in _batadv_purge_orig(), the batadv_update_route() part is skipped and won't be reached anymore. Fixing the issue by moving batadv_tt_global_del_orig() out of the rcu callback. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06batman-adv: fix multicast counter when purging originatorsLinus Lüssing
When purging an orig_node we should only decrease counter tracking the number of nodes without multicast optimizations support if it was increased through this orig_node before. A not yet quite initialized orig_node (meaning it did not have its turn in the mcast-tvlv handler so far) which gets purged would not adhere to this and will lead to a counter imbalance. Fixing this by adding a check whether the orig_node is mcast-initalized before decreasing the counter in the mcast-orig_node-purging routine. Introduced by 60432d756cf06e597ef9da511402dd059b112447 ("batman-adv: Announce new capability via multicast TVLV") Reported-by: Tobias Hachmer <tobias@hachmer.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06batman-adv: fix counter for multicast supporting nodesLinus Lüssing
A miscounting of nodes having multicast optimizations enabled can lead to multicast packet loss in the following scenario: If the first OGM a node receives from another one has no multicast optimizations support (no multicast tvlv) then we are missing to increase the counter. This potentially leads to the wrong assumption that we could safely use multicast optimizations. Fixings this by increasing the counter if the initial OGM has the multicast TVLV unset, too. Introduced by 60432d756cf06e597ef9da511402dd059b112447 ("batman-adv: Announce new capability via multicast TVLV") Reported-by: Tobias Hachmer <tobias@hachmer.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06batman-adv: fix lock class for decoding hash in network-coding.cMartin Hundebøll
batadv_has_set_lock_class() is called with the wrong hash table as first argument (probably due to a copy-paste error), which leads to false positives when running with lockdep. Introduced-by: 612d2b4fe0a1ff2f8389462a6f8be34e54124c05 ("batman-adv: network coding - save overheard and tx packets for decoding") Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06batman-adv: fix delayed foreign originator recognitionLinus Lüssing
Currently it can happen that the reception of an OGM from a new originator is not being accepted. More precisely it can happen that an originator struct gets allocated and initialized (batadv_orig_node_new()), even the TQ gets calculated and set correctly (batadv_iv_ogm_calc_tq()) but still the periodic orig_node purging thread will decide to delete it if it has a chance to jump between these two function calls. This is because batadv_orig_node_new() initializes the last_seen value to zero and its caller (batadv_iv_ogm_orig_get()) makes it visible to other threads by adding it to the hash table already. batadv_iv_ogm_calc_tq() will set the last_seen variable to the correct, current time a few lines later but if the purging thread jumps in between that it will think that the orig_node timed out and will wrongly schedule it for deletion already. If the purging interval is the same as the originator interval (which is the default: 1 second), then this game can continue for several rounds until the random OGM jitter added enough difference between these two (in tests, two to about four rounds seemed common). Fixing this by initializing the last_seen variable of an orig_node to the current time before adding it to the hash table. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-06batman-adv: fix and simplify condition when bonding should be usedSimon Wunderlich
The current condition actually does NOT consider bonding when the interface the packet came in from is the soft interface, which is the opposite of what it should do (and the comment describes). Fix that and slightly simplify the condition. Reported-by: Ray Gibson <booray@gmail.com> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-01-05Revert "mac80211: Fix accounting of the tailroom-needed counter"Johannes Berg
This reverts commit ca34e3b5c808385b175650605faa29e71e91991b. It turns out that the p54 and cw2100 drivers assume that there's tailroom even when they don't say they really need it. However, there's currently no way for them to explicitly say they do need it, so for now revert this. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331. Cc: stable@vger.kernel.org Fixes: ca34e3b5c808 ("mac80211: Fix accounting of the tailroom-needed counter") Reported-by: Christopher Chavez <chrischavez@gmx.us> Bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Debugged-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-02openvswitch: Consistently include VLAN header in flow and port stats.Ben Pfaff
Until now, when VLAN acceleration was in use, the bytes of the VLAN header were not included in port or flow byte counters. They were however included when VLAN acceleration was not used. This commit corrects the inconsistency, by always including the VLAN header in byte counters. Previous discussion at http://openvswitch.org/pipermail/dev/2014-December/049521.html Reported-by: Motonori Shindo <mshindo@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Reviewed-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02tcp: Do not apply TSO segment limit to non-TSO packetsHerbert Xu
Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs. In fact the problem was completely unrelated to IPsec. The bug is also reproducible if you just disable TSO/GSO. The problem is that when the MSS goes down, existing queued packet on the TX queue that have not been transmitted yet all look like TSO packets and get treated as such. This then triggers a bug where tcp_mss_split_point tells us to generate a zero-sized packet on the TX queue. Once that happens we're screwed because the zero-sized packet can never be removed by ACKs. Fixes: 1485348d242 ("tcp: Apply device TSO segment limit earlier") Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31Revert "cfg80211: make WEXT compatibility unselectable"Jiri Kosina
This reverts commit 24a0aa212ee2dbe44360288684478d76a8e20a0a. It's causing severe userspace breakage. Namely, all the utilities from wireless-utils which are relying on CONFIG_WEXT (which means tools like 'iwconfig', 'iwlist', etc) are not working anymore. There is a 'iw' utility in newer wireless-tools, which is supposed to be a replacement for all the "deprecated" binaries, but it's far away from being massively adopted. Please see [1] for example of the userspace breakage this is causing. In addition to that, Larry Finger reports [2] that this patch is also causing ipw2200 driver being impossible to build. To me this clearly shows that CONFIG_WEXT is far, far away from being "deprecated enough" to be removed. [1] http://thread.gmane.org/gmane.linux.kernel/1857010 [2] http://thread.gmane.org/gmane.linux.network/343688 Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen. 2) Fix receive checksum handling in enic driver, from Govindarajulu Varadarajan. 3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from Herbert Xu. Also, add code to detect drivers that have this mistake in the future. 4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai. 5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP input path,f rom Nicolas Dichtel. 6) Fix MPLS action validation in openvswitch, from Pravin B Shelar. 7) Fix double SKB free in vxlan driver, also from Pravin. 8) When we scrub a packet, which happens when we are switching the context of the packet (namespace, etc.), we should reset the secmark. From Thomas Graf. 9) ->ndo_gso_check() needs to do more than return true/false, it also has to allow the driver to clear netdev feature bits in order for the caller to be able to proceed properly. From Jesse Gross. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) genetlink: A genl_bind() to an out-of-range multicast group should not WARN(). netlink/genetlink: pass network namespace to bind/unbind ne2k-pci: Add pci_disable_device in error handling bonding: change error message to debug message in __bond_release_one() genetlink: pass multicast bind/unbind to families netlink: call unbind when releasing socket netlink: update listeners directly when removing socket genetlink: pass only network namespace to genl_has_listeners() netlink: rename netlink_unbind() to netlink_undo_bind() net: Generalize ndo_gso_check to ndo_features_check net: incorrect use of init_completion fixup neigh: remove next ptr from struct neigh_table net: xilinx: Remove unnecessary temac_property in the driver net: phy: micrel: use generic config_init for KSZ8021/KSZ8031 net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding openvswitch: fix odd_ptr_err.cocci warnings Bluetooth: Fix accepting connections when not using mgmt Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR brcmfmac: Do not crash if platform data is not populated ipw2200: select CFG80211_WEXT ...
2014-12-29genetlink: A genl_bind() to an out-of-range multicast group should not WARN().David S. Miller
Users can request to bind to arbitrary multicast groups, so warning when the requested group number is out of range is not appropriate. And with the warning removed, and the 'err' variable properly given an initial value, we can remove 'found' altogether. Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27netlink/genetlink: pass network namespace to bind/unbindJohannes Berg
Netlink families can exist in multiple namespaces, and for the most part multicast subscriptions are per network namespace. Thus it only makes sense to have bind/unbind notifications per network namespace. To achieve this, pass the network namespace of a given client socket to the bind/unbind functions. Also do this in generic netlink, and there also make sure that any bind for multicast groups that only exist in init_net is rejected. This isn't really a problem if it is accepted since a client in a different namespace will never receive any notifications from such a group, but it can confuse the family if not rejected (it's also possible to silently (without telling the family) accept it, but it would also have to be ignored on unbind so families that take any kind of action on bind/unbind won't do unnecessary work for invalid clients like that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27genetlink: pass multicast bind/unbind to familiesJohannes Berg
In order to make the newly fixed multicast bind/unbind functionality in generic netlink, pass them down to the appropriate family. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27netlink: call unbind when releasing socketJohannes Berg
Currently, netlink_unbind() is only called when the socket explicitly unbinds, which limits its usefulness (luckily there are no users of it yet anyway.) Call netlink_unbind() also when a socket is released, so it becomes possible to track listeners with this callback and without also implementing a netlink notifier (and checking netlink_has_listeners() in there.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27netlink: update listeners directly when removing socketJohannes Berg
The code is now confusing to read - first in one function down (netlink_remove) any group subscriptions are implicitly removed by calling __sk_del_bind_node(), but the subscriber database is only updated far later by calling netlink_update_listeners(). Move the latter call to just after removal from the list so it is easier to follow the code. This also enables moving the locking inside the kernel-socket conditional, which improves the normal socket destruction path. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27genetlink: pass only network namespace to genl_has_listeners()Johannes Berg
There's no point to force the caller to know about the internal genl_sock to use inside struct net, just have them pass the network namespace. This doesn't really change code generation since it's an inline, but makes the caller less magic - there's never any reason to pass another socket. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27netlink: rename netlink_unbind() to netlink_undo_bind()Johannes Berg
The new name is more expressive - this isn't a generic unbind function but rather only a little undo helper for use only in netlink_bind(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>