summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2016-07-08ieee802154: fix skb get fc on big endianAlexander Aring
This patch fixes ieee802154_get_fc_from_skb function on big endian machines. The function get_unaligned_le16 converts the byte order to host byte order but we want to keep the byte order like in mac header. Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-07-08ieee802154: 6lowpan: fix intra pan id checkAlexander Aring
The RIOT-OS stack does send intra-pan frames but don't set the intra pan flag inside the mac header. It seems this is valid frame addressing but inefficient. Anyway this patch adds a new function for intra pan addressing, doesn't matter if intra pan flag or source and destination are the same. The newly introduction function will be used to check on intra pan addressing for 6lowpan. Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-07-08ieee802154: add ieee802154_skb_src_pan helperAlexander Aring
This patch adds ieee802154_skb_src_pan function to get the pointer address of the source pan id at skb mac pointer. Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-07-08ieee802154: add ieee802154_skb_dst_pan helperAlexander Aring
This patch adds ieee802154_skb_dst_pan function to get the pointer address of the destination pan id at skb mac pointer. Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-07-08ieee802154: add netns supportAlexander Aring
This patch adds netns support for 802.15.4 subsystem. Most parts are copy&pasted from wireless subsystem, it has the identically userspace API. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-07-08nl802154: move PAD to right positionAlexander Aring
The PAD define should be above the experimental support. We don't care about if we break userspace in experimental stuff but PAD is part of the existing UAPI. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Stefan Schmidt<stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-07-07Merge tag 'mac80211-next-for-davem-2016-07-06' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== One more set of new features: * beacon report (for radio measurement) support in cfg80211/mac80211 * hwsim: allow wmediumd in namespaces * mac80211: extend 160MHz workaround to CSA IEs * mesh: properly encrypt group-addressed privacy action frames * mesh: allow setting peer AID * first steps for MU-MIMO monitor mode * along with various other cleanups and improvements ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-07Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/selinux ↵James Morris
into next
2016-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en.h drivers/net/ethernet/mellanox/mlx5/core/en_main.c drivers/net/usb/r8152.c All three conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Don't use userspace datatypes in bridge netfilter code, from Tobin Harding. 2) Iterate only once over the expectation table when removing the helper module, instead of once per-netns, from Florian Westphal. 3) Extra sanitization in xt_hook_ops_alloc() to return error in case we ever pass zero hooks, xt_hook_ops_alloc(): 4) Handle NFPROTO_INET from the logging core infrastructure, from Liping Zhang. 5) Autoload loggers when TRACE target is used from rules, this doesn't change the behaviour in case the user already selected nfnetlink_log as preferred way to print tracing logs, also from Liping Zhang. 6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields by cache lines, increases the size of entries in 11% per entry. From Florian Westphal. 7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian. 8) Remove useless defensive check in nf_logger_find_get() from Shivani Bhardwaj. 9) Remove zone extension as place it in the conntrack object, this is always include in the hashing and we expect more intensive use of zones since containers are in place. Also from Florian Westphal. 10) Owner match now works from any namespace, from Eric Bierdeman. 11) Make sure we only reply with TCP reset to TCP traffic from nf_reject_ipv4, patch from Liping Zhang. 12) Introduce --nflog-size to indicate amount of network packet bytes that are copied to userspace via log message, from Vishwanath Pai. This obsoletes --nflog-range that has never worked, it was designed to achieve this but it has never worked. 13) Introduce generic macros for nf_tables object generation masks. 14) Use generation mask in table, chain and set objects in nf_tables. This allows fixes interferences with ongoing preparation phase of the commit protocol and object listings going on at the same time. This update is introduced in three patches, one per object. 15) Check if the object is active in the next generation for element deactivation in the rbtree implementation, given that deactivation happens from the commit phase path we have to observe the future status of the object. 16) Support for deletion of just added elements in the hash set type. 17) Allow to resize hashtable from /proc entry, not only from the obscure /sys entry that maps to the module parameter, from Florian Westphal. 18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised anymore since we tear down the ruleset whenever the netdevice goes away. 19) Support for matching inverted set lookups, from Arturo Borrero. 20) Simplify the iptables_mangle_hook() by removing a superfluous extra branch. 21) Introduce ether_addr_equal_masked() and use it from the netfilter codebase, from Joe Perches. 22) Remove references to "Use netfilter MARK value as routing key" from the Netfilter Kconfig description given that this toggle doesn't exists already for 10 years, from Moritz Sichert. 23) Introduce generic NF_INVF() and use it from the xtables codebase, from Joe Perches. 24) Setting logger to NONE via /proc was not working unless explicit nul-termination was included in the string. This fixes seems to leave the former behaviour there, so we don't break backward. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06cfg80211: Add mesh peer AID setting APIMasashi Honma
Previously, mesh power management functionality works only with kernel MPM. Because user space MPM did not report mesh peer AID to kernel, the kernel could not identify the bit in TIM element. So this patch adds mesh peer AID setting API. Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06mac80211: Add support for beacon report radio measurementAvraham Stern
Add the following to support beacon report radio measurement with the measurement mode field set to passive or active: 1. Propagate the required scan duration to the device 2. Report the scan start time (in terms of TSF) 3. Report each BSS's detection time (also in terms of TSF) TSF times refer to the BSS that the interface that requested the scan is connected to. Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: Avraham Stern <avraham.stern@intel.com> [changed ath9k/10k, at76c59x-usb, iwlegacy, wl1251 and wlcore to match the new API] Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06nl80211: support beacon report scanningAvraham Stern
Beacon report radio measurement requires reporting observed BSSs on the channels specified in the beacon request. If the measurement mode is set to passive or active, it requires actually performing a scan (passive or active, accordingly), and reporting the time that the scan was started and the time each beacon/probe was received (both in terms of TSF of the BSS of the requesting AP). If the request mode is table, this information is optional. In addition, the radio measurement request specifies the channel dwell time for the measurement. In order to use scan for beacon report when the mode is active or passive, add a parameter to scan request that specifies the channel dwell time, and add scan start time and beacon received time to scan results information. Supporting beacon report is required for Multi Band Operation (MBO). Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06nl80211: Add API to support VHT MU-MIMO air snifferAviya Erenfeld
add API to support VHT MU-MIMO air sniffer. in MU-MIMO there are parallel frames on the air while the HW has only one RX. add the capability to sniff one of the MU-MIMO parallel frames by giving the sniffer additional information so it'll know which of the parallel frames it shall follow. Add attribute - NL80211_ATTR_MU_MIMO_GROUP_DATA - for getting a MU-MIMO groupID in order to monitor packets from that group using VHT MU-MIMO. And add attribute -NL80211_ATTR_MU_MIMO_FOLLOW_ADDR - for passing MAC address to monitor mode. that option will be used by VHT MU-MIMO air sniffer to follow a station according to it's MAC address using VHT MU-MIMO. Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-05neigh: Send a notification when DELAY_PROBE_TIME changesIdo Schimmel
When the data plane is offloaded the traffic doesn't go through the networking stack. Therefore, after first resolving a neighbour the NUD state machine will transition it from REACHABLE to STALE until it's finally deleted by the garbage collector. To prevent such situations the offloading driver should notify the NUD state machine on any neighbours that were recently used. The driver's polling interval should be set so that the NUD state machine can function as if the traffic wasn't offloaded. Currently, there are no in-tree drivers that can report confirmation for a neighbour, but only 'used' indication. Therefore, the polling interval should be set according to DELAY_FIRST_PROBE_TIME, as a neighbour will transition from REACHABLE state to DELAY (instead of STALE) if "a packet was sent within the last DELAY_FIRST_PROBE_TIME seconds" (RFC 4861). Send a netevent whenever the DELAY_FIRST_PROBE_TIME changes - either via netlink or sysctl - so that offloading drivers can correctly set their polling interval. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04net sched actions: skbedit add support for mod-ing skb pkt_typeJamal Hadi Salim
Extremely useful for setting packet type to host so i dont have to modify the dst mac address using pedit (which requires that i know the mac address) Example usage: tc filter add dev eth0 parent ffff: protocol ip pref 9 u32 \ match ip src 5.5.5.5/32 \ flowid 1:5 action skbedit ptype host This will tag all packets incoming from 5.5.5.5 with type PACKET_HOST Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04NFC: digital: Add a delay between poll cyclesThierry Escande
This replaces the polling work struct with a delayed work struct and add a 10 ms delay between 2 poll cycles. This avoids to flood the device with 'switch off'/'switch on' commands. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-04NFC: hci: delete unused nfc_llc_get_rx_head_tail_room()Denys Vlasenko
It used to be EXPORTed, but then EXPORT usage was cleaned up (in 2012), without noticing that the function has no users at all (and curiously, never had any users). Delete it. While at it, remove non-static "inline" hints on nearby functions: these hints don't work across compilation units anyway, and these functions are not used in their .c file, thus they are never inlined. IOW: "inline" here does not help in any way. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Samuel Ortiz <sameo@linux.intel.com> CC: Christophe Ricard <christophe.ricard@gmail.com> CC: linux-wireless@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-02net/devlink: Add E-Switch mode controlOr Gerlitz
Add the commands to set and show the mode of SRIOV E-Switch, two modes are supported: * legacy: operating in the "old" L2 based mode (DMAC --> VF vport) * switchdev: the E-Switch is referred to as whitebox switch configured using standard tools such as tc, bridge, openvswitch etc. To allow working with the tools, for each VF, a VF representor netdevice is created by the E-Switch manager vendor device driver instance (e.g PF). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01bonding: prevent out of bound accessesEric Dumazet
ether_addr_equal_64bits() requires some care about its arguments, namely that 8 bytes might be read, even if last 2 byte values are not used. KASan detected a violation with null_mac_addr and lacpdu_mcast_addr in bond_3ad.c Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c : Although the 8-byte alignment was there, KASan would detect out of bound accesses. Fixes: 815117adaf5b ("bonding: use ether_addr_equal_unaligned for bond addr compare") Fixes: bb54e58929f3 ("bonding: Verify RX LACPDU has proper dest mac-addr") Fixes: 885a136c52a8 ("bonding: use compare_ether_addr_64bits() in ALB") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01tcp: md5: use kmalloc() backed scratch areasEric Dumazet
Some arches have virtually mapped kernel stacks, or will soon have. tcp_md5_hash_header() uses an automatic variable to copy tcp header before mangling th->check and calling crypto function, which might be problematic on such arches. David says that using percpu storage is also problematic on non SMP builds. Just use kmalloc() to allocate scratch areas. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30ipv4: Fix ip_skb_dst_mtu to use the sk passed by ip_finish_outputShmulik Ladkani
ip_skb_dst_mtu uses skb->sk, assuming it is an AF_INET socket (e.g. it calls ip_sk_use_pmtu which casts sk as an inet_sk). However, in the case of UDP tunneling, the skb->sk is not necessarily an inet socket (could be AF_PACKET socket, or AF_UNSPEC if arriving from tun/tap). OTOH, the sk passed as an argument throughout IP stack's output path is the one which is of PMTU interest: - In case of local sockets, sk is same as skb->sk; - In case of a udp tunnel, sk is the tunneling socket. Fix, by passing ip_finish_output's sk to ip_skb_dst_mtu. This augments 7026b1ddb6 'netfilter: Pass socket pointer down through okfn().' Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30tcp: increase size at which tcp_bound_to_half_wnd bounds to > TCP_MSS_DEFAULTSeymour, Shane M
In previous commit 01f83d69844d307be2aa6fea88b0e8fe5cbdb2f4 the following comments were added: "When peer uses tiny windows, there is no use in packetizing to sub-MSS pieces for the sake of SWS or making sure there are enough packets in the pipe for fast recovery." The test should be > TCP_MSS_DEFAULT not >= 512. This allows low end devices that send an MSS of 536 (TCP_MSS_DEFAULT) to see better network performance by sending it 536 bytes of data at a time instead of bounding to half window size (268). Other network stacks work this way, e.g. HP-UX. Signed-off-by: Shane Seymour <shane.seymour@hpe.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30net: rtnetlink: add support for the IFLA_STATS_LINK_XSTATS_SLAVE attributeNikolay Aleksandrov
This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute which allows to export per-slave statistics if the master device supports the linkxstats callback. The attribute is passed down to the linkxstats callback and it is up to the callback user to use it (an example has been added to the only current user - the bridge). This allows us to query only specific slaves of master devices like bridge ports and export only what we're interested in instead of having to dump all ports and searching only for a single one. This will be used to export per-port IGMP/MLD stats and also per-port vlan stats in the future, possibly other statistics as well. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-27netlabel: Implement CALIPSO config functions for SMACK.Huw Davies
SMACK uses similar functions to control CIPSO, these are the equivalent functions for CALIPSO and follow exactly the same semantics. int netlbl_cfg_calipso_add(struct calipso_doi *doi_def, struct netlbl_audit *audit_info) Adds a CALIPSO doi. void netlbl_cfg_calipso_del(u32 doi, struct netlbl_audit *audit_info) Removes a CALIPSO doi. int netlbl_cfg_calipso_map_add(u32 doi, const char *domain, const struct in6_addr *addr, const struct in6_addr *mask, struct netlbl_audit *audit_info) Creates a mapping between a domain and a CALIPSO doi. If addr and mask are non-NULL this creates an address-selector type mapping. This also extends netlbl_cfg_map_del() to remove IPv6 address-selector mappings. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Add a label cache.Huw Davies
This works in exactly the same way as the CIPSO label cache. The idea is to allow the lsm to cache the result of a secattr lookup so that it doesn't need to perform the lookup for every skbuff. It introduces two sysctl controls: calipso_cache_enable - enables/disables the cache. calipso_cache_bucket_size - sets the size of a cache bucket. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Add validation of CALIPSO option.Huw Davies
Lengths, checksum and the DOI are checked. Checking of the level and categories are left for the socket layer. CRC validation is performed in the calipso module to avoid unconditionally linking crc_ccitt() into ipv6. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Pass a family parameter to netlbl_skbuff_err().Huw Davies
This makes it possible to route the error to the appropriate labelling engine. CALIPSO is far less verbose than CIPSO when encountering a bogus packet, so there is no need for a CALIPSO error handler. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Allow the lsm to label the skbuff directly.Huw Davies
In some cases, the lsm needs to add the label to the skbuff directly. A NF_INET_LOCAL_OUT IPv6 hook is added to selinux to match the IPv4 behaviour. This allows selinux to label the skbuffs that it requires. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27ipv6: constify the skb pointer of ipv6_find_tlv().Huw Davies
Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Allow request sockets to be relabelled by the lsm.Huw Davies
Request sockets need to have a label that takes into account the incoming connection as well as their parent's label. This is used for the outgoing SYN-ACK and for their child full-socket. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27ipv6: Allow request socks to contain IPv6 options.Huw Davies
If set, these will take precedence over the parent's options during both sending and child creation. If they're not set, the parent's options (if any) will be used. This is to allow the security_inet_conn_request() hook to modify the IPv6 options in just the same way that it already may do for IPv4. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Set the calipso socket label to match the secattr.Huw Davies
CALIPSO is a hop-by-hop IPv6 option. A lot of this patch is based on the equivalent CISPO code. The main difference is due to manipulating the options in the hop-by-hop header. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Move bitmap manipulation functions to the NetLabel core.Huw Davies
This is to allow the CALIPSO labelling engine to use these. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27ipv6: Add ipv6_renew_options_kern() that accepts a kernel mem pointer.Huw Davies
The functionality is equivalent to ipv6_renew_options() except that the newopt pointer is in kernel, not user, memory The kernel memory implementation will be used by the CALIPSO network labelling engine, which needs to be able to set IPv6 hop-by-hop options. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Add support for removing a CALIPSO DOI.Huw Davies
Remove a specified DOI through the NLBL_CALIPSO_C_REMOVE command. It requires the attribute: NLBL_CALIPSO_A_DOI. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Add support for enumerating the CALIPSO DOI list.Huw Davies
Enumerate the DOI list through the NLBL_CALIPSO_C_LISTALL command. It takes no attributes. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Add support for querying a CALIPSO DOI.Huw Davies
Query a specified DOI through the NLBL_CALIPSO_C_LIST command. It requires the attribute: NLBL_CALIPSO_A_DOI. The reply will contain: NLBL_CALIPSO_A_MTYPE Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Initial support for the CALIPSO netlink protocol.Huw Davies
CALIPSO is a packet labelling protocol for IPv6 which is very similar to CIPSO. It is specified in RFC 5570. Much of the code is based on the current CIPSO code. This adds support for adding passthrough-type CALIPSO DOIs through the NLBL_CALIPSO_C_ADD command. It requires attributes: NLBL_CALIPSO_A_TYPE which must be CALIPSO_MAP_PASS. NLBL_CALIPSO_A_DOI. In passthrough mode the CALIPSO engine will map MLS secattr levels and categories directly to the packet label. At this stage, the major difference between this and the CIPSO code is that IPv6 may be compiled as a module. To allow for this the CALIPSO functions are registered at module init time. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-25net_sched: generalize bulk dequeueEric Dumazet
When qdisc bulk dequeue was added in linux-3.18 (commit 5772e9a3463b "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"), it was constrained to some specific qdiscs. With some extra care, we can extend this to all qdiscs, so that typical traffic shaping solutions can benefit from small batches (8 packets in this patch). For example, HTB is often used on some multi queue device. And bonding/team are multi queue devices... Idea is to bulk-dequeue packets mapping to the same transmit queue. This brings between 35 and 80 % performance increase in HTB setup under pressure on a bonding setup : 1) NUMA node contention : 610,000 pps -> 1,110,000 pps 2) No node contention : 1,380,000 pps -> 1,930,000 pps Now we should work to add batches on the enqueue() side ;) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25net_sched: fq_codel: cache skb->truesize into skb->cbEric Dumazet
Now we defer skb drops, it makes sense to keep a copy of skb->truesize in struct codel_skb_cb to avoid one cache line miss per dropped skb in fq_codel_drop(), to reduce latencies a bit further. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25net_sched: drop packets after root qdisc lock is releasedEric Dumazet
Qdisc performance suffers when packets are dropped at enqueue() time because drops (kfree_skb()) are done while qdisc lock is held, delaying a dequeue() draining the queue. Nominal throughput can be reduced by 50 % when this happens, at a time we would like the dequeue() to proceed as fast as possible. Even FQ is vulnerable to this problem, while one of FQ goals was to provide some flow isolation. This patch adds a 'struct sk_buff **to_free' parameter to all qdisc->enqueue(), and in qdisc_drop() helper. I measured a performance increase of up to 12 %, but this patch is a prereq so that future batches in enqueue() can fly. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-24netfilter: nf_tables: get rid of NFT_BASECHAIN_DISABLEDPablo Neira Ayuso
This flag was introduced to restore rulesets from the new netdev family, but since 5ebe0b0eec9d6f7 ("netfilter: nf_tables: destroy basechain and rules on netdevice removal") the ruleset is released once the netdev is gone. This also removes nft_register_basechain() and nft_unregister_basechain() since they have no clients anymore after this rework. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24netfilter: conntrack: allow increasing bucket size via sysctl tooFlorian Westphal
No need to restrict this to module parameter. We export a copy of the real hash size -- when user alters the value we allocate the new table, copy entries etc before we update the real size to the requested one. This is also needed because the real size is used by concurrent readers and cannot be changed without synchronizing the conntrack generation seqcnt. We only allow changing this value from the initial net namespace. Tested using http-client-benchmark vs. httpterm with concurrent while true;do echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets done Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24netfilter: nf_tables: add generation mask to setsPablo Neira Ayuso
Similar to ("netfilter: nf_tables: add generation mask to tables"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24netfilter: nf_tables: add generation mask to chainsPablo Neira Ayuso
Similar to ("netfilter: nf_tables: add generation mask to tables"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24netfilter: nf_tables: add generation mask to tablesPablo Neira Ayuso
This patch addresses two problems: 1) The netlink dump is inconsistent when interfering with an ongoing transaction update for several reasons: 1.a) We don't honor the internal NFT_TABLE_INACTIVE flag, and we should be skipping these inactive objects in the dump. 1.b) We perform speculative deletion during the preparation phase, that may result in skipping active objects. 1.c) The listing order changes, which generates noise when tracking incremental ruleset update via tools like git or our own testsuite. 2) We don't allow to add and to update the object in the same batch, eg. add table x; add table x { flags dormant\; }. In order to resolve these problems: 1) If the user requests a deletion, the object becomes inactive in the next generation. Then, ignore objects that scheduled to be deleted from the lookup path, as they will be effectively removed in the next generation. 2) From the get/dump path, if the object is not currently active, we skip it. 3) Support 'add X -> update X' sequence from a transaction. After this update, we obtain a consistent list as long as we stay in the same generation. The userspace side can detect interferences through the generation counter so it can restart the dumping. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24netfilter: nf_tables: add generic macros to check for generation maskPablo Neira Ayuso
Thus, we can reuse these to check the genmask of any object type, not only rules. This is required now that tables, chain and sets will get a generation mask field too in follow up patches. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24netfilter: xt_NFLOG: nflog-range does not truncate packetsVishwanath Pai
li->u.ulog.copy_len is currently ignored by the kernel, we should truncate the packet to either li->u.ulog.copy_len (if set) or copy_range before sending it to userspace. 0 is a valid input for copy_len, so add a new flag to indicate whether this was option was specified by the user or not. Add two flags to indicate whether nflog-size/copy_len was set or not. XT_NFLOG_F_COPY_LEN is for XT_NFLOG and NFLOG_F_COPY_LEN for nfnetlink_log On the userspace side, this was initially represented by the option nflog-range, this will be replaced by --nflog-size now. --nflog-range would still exist but does not do anything. Reported-by: Joe Dollard <jdollard@akamai.com> Reviewed-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>