summaryrefslogtreecommitdiff
path: root/net/netlink
AgeCommit message (Collapse)Author
2013-12-08net: rework recvmsg handler msg_name and msg_namelen logicHannes Frederic Sowa
[ Upstream commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c ] This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-06net: netlink: filter particular protocols from analyzersDaniel Borkmann
Fix finer-grained control and let only a whitelist of allowed netlink protocols pass, in our case related to networking. If later on, other subsystems decide they want to add their protocol as well to the list of allowed protocols they shall simply add it. While at it, we also need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can not pick it up (as it's not filled out). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-28genl: Hold reference on correct module while netlink-dump.Pravin B Shelar
netlink dump operations take module as parameter to hold reference for entire netlink dump duration. Currently it holds ref only on genl module which is not correct when we use ops registered to genl from another module. Following patch adds module pointer to genl_ops so that netlink can hold ref count on it. CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-28genl: Fix genl dumpit() locking.Pravin B Shelar
In case of genl-family with parallel ops off, dumpif() callback is expected to run under genl_lock, But commit def3117493eafd9df (genl: Allow concurrent genl callbacks.) changed this behaviour where only first dumpit() op was called under genl-lock. For subsequent dump, only nlk->cb_lock was taken. Following patch fixes it by defining locked dumpit() and done() callback which takes care of genl-locking. CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22Revert "genetlink: fix family dump race"Johannes Berg
This reverts commit 58ad436fcf49810aa006016107f494c9ac9013db. It turns out that the change introduced a potential deadlock by causing a locking dependency with netlink's cb_mutex. I can't seem to find a way to resolve this without doing major changes to the locking, so revert this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-15netlink: Eliminate kmalloc in netlink dump operation.Pravin B Shelar
Following patch stores struct netlink_callback in netlink_sock to avoid allocating and freeing it on every netlink dump msg. Only one dump operation is allowed for a given socket at a time therefore we can safely convert cb pointer to cb struct inside netlink_sock. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13genetlink: fix family dump raceJohannes Berg
When dumping generic netlink families, only the first dump call is locked with genl_lock(), which protects the list of families, and thus subsequent calls can access the data without locking, racing against family addition/removal. This can cause a crash. Fix it - the locking needs to be conditional because the first time around it's already locked. A similar bug was reported to me on an old kernel (3.4.47) but the exact scenario that happened there is no longer possible, on those kernels the first round wasn't locked either. Looking at the current code I found the race described above, which had also existed on the old kernel. Cc: stable@vger.kernel.org Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02net: netlink: minor: remove unused pointer in alloc_pg_vecDaniel Borkmann
Variable ptr is being assigned, but never used, so just remove it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-30genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACEPablo Neira
Currently, it is not possible to use neither NLM_F_EXCL nor NLM_F_REPLACE from genetlink. This is due to this checking in genl_family_rcv_msg: if (nlh->nlmsg_flags & NLM_F_DUMP) NLM_F_DUMP is NLM_F_MATCH|NLM_F_ROOT. Thus, if NLM_F_EXCL or NLM_F_REPLACE flag is set, genetlink believes that you're requesting a dump and it calls the .dumpit callback. The solution that I propose is to refine this checking to make it stricter: if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) And given the combination NLM_F_REPLACE and NLM_F_EXCL does not make sense to me, it removes the ambiguity. There was a patch that tried to fix this some time ago (0ab03c2 netlink: test for all flags of the NLM_F_DUMP composite) but it tried to resolve this ambiguity in *all* existing netlink subsystems, not only genetlink. That patch was reverted since it broke iproute2, which is using NLM_F_ROOT to request the dump of the routing cache. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-28genetlink: release cb_lock before requesting additional moduleStanislaw Gruszka
Requesting external module with cb_lock taken can result in the deadlock like showed below: [ 2458.111347] Showing all locks held in the system: [ 2458.111347] 1 lock held by NetworkManager/582: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40 [ 2458.111347] 1 lock held by modprobe/603: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.579457] SysRq : Show Blocked State [ 2461.580103] task PC stack pid father [ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080 [ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8 [ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff [ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360 [ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140 [ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170 [ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20 [ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210 [ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170 [ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190 [ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0 [ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0 [ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0 [ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0 [ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40 [ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190 [ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750 [ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350 [ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0 [ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50 [ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10 [ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80 [ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100 [ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0 [ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510 [ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510 [ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80 [ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080 [ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8 [ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500 [ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0 [ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0 [ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20 [ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2 [ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211] [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211] [ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 [ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 [ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0 [ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40 [ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1 Problem start to happen after adding net-pf-16-proto-16-family-nl80211 alias name to cfg80211 module by below commit (though that commit itself is perfectly fine): commit fb4e156886ce6e8309e912d8b370d192330d19d3 Author: Marcel Holtmann <marcel@holtmann.org> Date: Sun Apr 28 16:22:06 2013 -0700 nl80211: Add generic netlink module alias for cfg80211/nl80211 Reported-and-tested-by: Jeff Layton <jlayton@redhat.com> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Reviewed-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-28netlink: fix splat in skb_clone with large messagesPablo Neira
Since (c05cdb1 netlink: allow large data transfers from user-space), netlink splats if it invokes skb_clone on large netlink skbs since: * skb_shared_info was not correctly initialized. * skb->destructor is not set in the cloned skb. This was spotted by trinity: [ 894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001 [ 894.991034] IP: [<ffffffff81a212c4>] skb_clone+0x24/0xc0 [...] [ 894.991034] Call Trace: [ 894.991034] [<ffffffff81ad299a>] nl_fib_input+0x6a/0x240 [ 894.991034] [<ffffffff81c3b7e6>] ? _raw_read_unlock+0x26/0x40 [ 894.991034] [<ffffffff81a5f189>] netlink_unicast+0x169/0x1e0 [ 894.991034] [<ffffffff81a601e1>] netlink_sendmsg+0x251/0x3d0 Fix it by: 1) introducing a new netlink_skb_clone function that is used in nl_fib_input, that sets our special skb->destructor in the cloned skb. Moreover, handle the release of the large cloned skb head area in the destructor path. 2) not allowing large skbuffs in the netlink broadcast path. I cannot find any reasonable use of the large data transfer using netlink in that path, moreover this helps to skip extra skb_clone handling. I found two more netlink clients that are cloning the skbs, but they are not in the sendmsg path. Therefore, the sole client cloning that I found seems to be the fib frontend. Thanks to Eric Dumazet for helping to address this issue. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-24net: netlink: virtual tap device managementDaniel Borkmann
Similarly to the networking receive path with ptype_all taps, we add the possibility to register netdevices that are for ARPHRD_NETLINK to the netlink subsystem, so that those can be used for netlink analyzers resp. debuggers. We do not offer a direct callback function as out-of-tree modules could do crap with it. Instead, a netdevice must be registered properly and only receives a clone, managed by the netlink layer. Symbols are exported as GPL-only. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/ath/ath9k/Kconfig drivers/net/xen-netback/netback.c net/batman-adv/bat_iv_ogm.c net/wireless/nl80211.c The ath9k Kconfig conflict was a change of a Kconfig option name right next to the deletion of another option. The xen-netback conflict was overlapping changes involving the handling of the notify list in xen_netbk_rx_action(). Batman conflict resolution provided by Antonio Quartulli, basically keep everything in both conflict hunks. The nl80211 conflict is a little more involved. In 'net' we added a dynamic memory allocation to nl80211_dump_wiphy() to fix a race that Linus reported. Meanwhile in 'net-next' the handlers were converted to use pre and post doit handlers which use a flag to determine whether to hold the RTNL mutex around the operation. However, the dump handlers to not use this logic. Instead they have to explicitly do the locking. There were apparent bugs in the conversion of nl80211_dump_wiphy() in that we were not dropping the RTNL mutex in all the return paths, and it seems we very much should be doing so. So I fixed that whilst handling the overlapping changes. To simplify the initial returns, I take the RTNL mutex after we try to allocate 'tb'. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13netlink: make compare exist all the timeGao feng
Commit da12c90e099789a63073fc82a19542ce54d4efb9 "netlink: Add compare function for netlink_table" only set compare at the time we create kernel netlink, and reset compare to NULL at the time we finially release netlink socket, but netlink_lookup wants the compare exist always. So we should set compare after we allocate nl_table, and never reset it. make comapre exist all the time. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11netlink: fix error propagation in netlink_mmap()Patrick McHardy
Return the error if something went wrong instead of unconditionally returning 0. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11netlink: Add compare function for netlink_tableGao feng
As we know, netlink sockets are private resource of net namespace, they can communicate with each other only when they in the same net namespace. this works well until we try to add namespace support for other subsystems which use netlink. Don't like ipv4 and route table.., it is not suited to make these subsytems belong to net namespace, Such as audit and crypto subsystems,they are more suitable to user namespace. So we must have the ability to make the netlink sockets in same user namespace can communicate with each other. This patch adds a new function pointer "compare" for netlink_table, we can decide if the netlink sockets can communicate with each other through this netlink_table self-defined compare function. The behavior isn't changed if we don't provide the compare function for netlink_table. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07netlink: allow large data transfers from user-spacePablo Neira Ayuso
I can hit ENOBUFS in the sendmsg() path with a large batch that is composed of many netlink messages. Here that limit is 8 MBytes of skbuff data area as kmalloc does not manage to get more than that. While discussing atomic rule-set for nftables with Patrick McHardy, we decided to put all rule-set updates that need to be applied atomically in one single batch to simplify the existing approach. However, as explained above, the existing netlink code limits us to a maximum of ~20000 rules that fit in one single batch without hitting ENOBUFS. iptables does not have such limitation as it is using vmalloc. This patch adds netlink_alloc_large_skb() which is only used in the netlink_sendmsg() path. It uses alloc_skb if the memory requested is <= one memory page, that should be the common case for most subsystems, else vmalloc for higher memory allocations. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-05net: fix sk_buff head without data areaPablo Neira
Eric Dumazet spotted that we have to check skb->head instead of skb->data as skb->head points to the beginning of the data area of the skbuff. Similarly, we have to initialize the skb->head pointer, not skb->data in __alloc_skb_head. After this fix, netlink crashes in the release path of the sk_buff, so let's fix that as well. This bug was introduced in (0ebd0ac net: add function to allocate sk_buff head without data area). Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-01netlink: kconfig: move mmap i/o into netlink kconfigDaniel Borkmann
Currently, in menuconfig, Netlink's new mmaped IO is the very first entry under the ``Networking support'' item and comes even before ``Networking options'': [ ] Netlink: mmaped IO Networking options ---> ... Lets move this into ``Networking options'' under netlink's Kconfig, since this might be more appropriate. Introduced by commit ccdfcc398 (``netlink: mmaped netlink: ring setup''). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-01netlink: Fix skb ref counting.Pravin B Shelar
Commit f9c2288837ba072b21dba955f04a4c97eaa77b1e (netlink: implement memory mapped recvmsg) increamented skb->users ref count twice for a dump op which does not look right. Following patch fixes that. CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-27genetlink: fix possible memory leak in genl_family_rcv_msg()Wei Yongjun
'attrbuf' is malloced in genl_family_rcv_msg() when family->maxattr && family->parallel_ops, thus should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Introduced by commit def3117493eafd9dfa1f809d861e0031b2cc8a07 (genl: Allow concurrent genl callbacks.) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25genl: Allow concurrent genl callbacks.Pravin B Shelar
All genl callbacks are serialized by genl-mutex. This can become bottleneck in multi threaded case. Following patch adds an parameter to genl_family so that a particular family can get concurrent netlink callback without genl_lock held. New rw-sem is used to protect genl callback from genl family unregister. in case of parallel_ops genl-family read-lock is taken for callbacks and write lock is taken for register or unregistration for any family. In case of locked genl family semaphore and gel-mutex is locked for any openration. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-24netlink: fix compilation after memory mapped patchesNicolas Dichtel
Depending of the kernel configuration (CONFIG_UIDGID_STRICT_TYPE_CHECKS), we can get the following errors: net/netlink/af_netlink.c: In function ‘netlink_queue_mmaped_skb’: net/netlink/af_netlink.c:663:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kuid_t’ net/netlink/af_netlink.c:664:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kgid_t’ net/netlink/af_netlink.c: In function ‘netlink_ring_set_copied’: net/netlink/af_netlink.c:693:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kuid_t’ net/netlink/af_netlink.c:694:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kgid_t’ We must use the helpers to get the uid and gid, and also take care of user_ns. Fix suggested by Eric W. Biederman <ebiederm@xmission.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-23netlink: Fix build with mmap disabled.David S. Miller
net/netlink/diag.c: In function 'sk_diag_put_rings_cfg': net/netlink/diag.c:28:17: error: 'struct netlink_sock' has no member named 'pg_vec_lock' net/netlink/diag.c:29:29: error: 'struct netlink_sock' has no member named 'rx_ring' net/netlink/diag.c:31:30: error: 'struct netlink_sock' has no member named 'tx_ring' net/netlink/diag.c:33:19: error: 'struct netlink_sock' has no member named 'pg_vec_lock' Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-23netlink: fix typo in net/netlink/af_netlink.cStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add RX/TX-ring support to netlink diagPatrick McHardy
Based on AF_PACKET. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add flow control for memory mapped I/OPatrick McHardy
Add flow control for memory mapped RX. Since user-space usually doesn't invoke recvmsg() when using memory mapped I/O, flow control is performed in netlink_poll(). Dumps are allowed to continue if at least half of the ring frames are unused. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: implement memory mapped recvmsg()Patrick McHardy
Add support for mmap'ed recvmsg(). To allow the kernel to construct messages into the mapped area, a dataless skb is allocated and the data pointer is set to point into the ring frame. This means frames will be delivered to userspace in order of allocation instead of order of transmission. This usually doesn't matter since the order is either not determinable by userspace or message creation/transmission is serialized. The only case where this can have a visible difference is nfnetlink_queue. Userspace can't assume mmap'ed messages have ordered IDs anymore and needs to check this if using batched verdicts. For non-mapped sockets, nothing changes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: implement memory mapped sendmsg()Patrick McHardy
Add support for mmap'ed sendmsg() to netlink. Since the kernel validates received messages before processing them, the code makes sure userspace can't modify the message contents after invoking sendmsg(). To do that only a single mapping of the TX ring is allowed to exist and the socket must not be shared. If either of these two conditions does not hold, it falls back to copying. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add mmap'ed netlink helper functionsPatrick McHardy
Add helper functions for looking up mmap'ed frame headers, reading and writing their status, allocating skbs with mmap'ed data areas and a poll function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: mmaped netlink: ring setupPatrick McHardy
Add support for mmap'ed RX and TX ring setup and teardown based on the af_packet.c code. The following patches will use this to add the real mmap'ed receive and transmit functionality. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add netlink_skb_set_owner_r()Patrick McHardy
For mmap'ed I/O a netlink specific skb destructor needs to be invoked after the final kfree_skb() to clean up state. This doesn't work currently since the skb's ownership is transfered to the receiving socket using skb_set_owner_r(), which orphans the skb, thereby invoking the destructor prematurely. Since netlink doesn't account skbs to the originating socket, there's no need to orphan the skb. Add a netlink specific skb_set_owner_r() variant that does not orphan the skb and use a netlink specific destructor to call sock_rfree(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: don't orphan skb in netlink_trim()Patrick McHardy
Netlink doesn't account skbs to the sending socket, so the there's no need to orphan the skb before trimming it. Removing the skb_orphan() call is required for mmap'ed netlink, which uses a netlink specific skb destructor that must not be invoked before the final freeing of the skb. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: rename ssk to sk in struct netlink_skb_paramsPatrick McHardy
Memory mapped netlink needs to store the receiving userspace socket when sending from the kernel to userspace. Rename 'ssk' to 'sk' to avoid confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add symbolic value for congested statePatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28net-next: replace obsolete NLMSG_* with type safe nlmsg_*Hong zhi guo
Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21netlink: Diag core and basic socket info dumping (v2)Andrey Vagin
The netlink_diag can be built as a module, just like it's done in unix sockets. The core dumping message carries the basic info about netlink sockets: family, type and protocol, portis, dst_group, dst_portid, state. Groups can be received as an optional parameter NETLINK_DIAG_GROUPS. Netlink sockets cab be filtered by protocols. The socket inode number and cookie is reserved for future per-socket info retrieving. The per-protocol filtering is also reserved for future by requiring the sdiag_protocol to be zero. The file /proc/net/netlink doesn't provide enough information for dumping netlink sockets. It doesn't provide dst_group, dst_portid, groups above 32. v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant. Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21net: prepare netlink code for netlink diagAndrey Vagin
Move a few declarations in a header. Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20genetlink: trigger BUG_ON if a group name is too longMasatake YAMATO
Trigger BUG_ON if a group name is longer than GENL_NAMSIZ. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-28hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-23new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-18net: proc: change proc_net_remove to remove_proc_entryGao feng
proc_net_remove is only used to remove proc entries that under /proc/net,it's not a general function for removing proc entries of netns. if we want to remove some proc entries which under /proc/net/stat/, we still need to call remove_proc_entry. this patch use remove_proc_entry to replace proc_net_remove. we can remove proc_net_remove after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18net: proc: change proc_net_fops_create to proc_createGao feng
Right now, some modules such as bonding use proc_create to create proc entries under /proc/net/, and other modules such as ipv4 use proc_net_fops_create. It looks a little chaos.this patch changes all of proc_net_fops_create to proc_create. we can remove proc_net_fops_create after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10netlink: Use FIELD_SIZEOF() in netlink_proto_init().YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-18netlink: validate addr_len on bindHannes Frederic Sowa
Otherwise an out of bounds read could happen. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>