summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2014-09-13net: sched: RCU cls_tcindexJohn Fastabend
Make cls_tcindex RCU safe. This patch addds a new RCU routine rcu_dereference_bh_rtnl() to check caller either holds the rcu read lock or RTNL. This is needed to handle the case where tcindex_lookup() is being called in both cases. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13net: rcu-ify tcf_protoJohn Fastabend
rcu'ify tcf_proto this allows calling tc_classify() without holding any locks. Updaters are protected by RTNL. This patch prepares the core net_sched infrastracture for running the classifier/action chains without holding the qdisc lock however it does nothing to ensure cls_xxx and act_xxx types also work without locking. Additional patches are required to address the fall out. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13net: qdisc: use rcu prefix and silence sparse warningsJohn Fastabend
Add __rcu notation to qdisc handling by doing this we can make smatch output more legible. And anyways some of the cases should be using rcu_dereference() see qdisc_all_tx_empty(), qdisc_tx_chainging(), and so on. Also *wake_queue() API is commonly called from driver timer routines without rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10net: bpf: only build bpf_jit_binary_{alloc, free}() when jit selectedDaniel Borkmann
Since BPF JIT depends on the availability of module_alloc() and module_free() helpers (HAVE_BPF_JIT and MODULES), we better build that code only in case we have BPF_JIT in our config enabled, just like with other JIT code. Fixes builds for arm/marzen_defconfig and sh/rsk7269_defconfig. ==================== kernel/built-in.o: In function `bpf_jit_binary_alloc': /home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc' kernel/built-in.o: In function `bpf_jit_binary_free': /home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free' make: *** [vmlinux] Error 1 ==================== Reported-by: Fengguang Wu <fengguang.wu@intel.com> Fixes: 738cbe72adc5 ("net: bpf: consolidate JIT binary allocator") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== nf-next pull request The following patchset contains Netfilter/IPVS updates for your net-next tree. Regarding nf_tables, most updates focus on consolidating the NAT infrastructure and adding support for masquerading. More specifically, they are: 1) use __u8 instead of u_int8_t in arptables header, from Mike Frysinger. 2) Add support to match by skb->pkttype to the meta expression, from Ana Rey. 3) Add support to match by cpu to the meta expression, also from Ana Rey. 4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from Vytas Dauksa. 5) Fix netnet and netportnet hash types the range support for IPv4, from Sergey Popovich. 6) Fix missing-field-initializer warnings resolved, from Mark Rustad. 7) Dan Carperter reported possible integer overflows in ipset, from Jozsef Kadlecsick. 8) Filter out accounting objects in nfacct by type, so you can selectively reset quotas, from Alexey Perevalov. 9) Move specific NAT IPv4 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4. 11) Move specific NAT IPv6 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6. 13) Refactor code to add nft_delrule(), which can be reused in the enhancement of the NFT_MSG_DELTABLE to remove a table and its content, from Arturo Borrero. 14) Add a helper function to unregister chain hooks, from Arturo Borrero. 15) A cleanup to rename to nft_delrule_by_chain for consistency with the new nft_*() functions, also from Arturo. 16) Add support to match devgroup to the meta expression, from Ana Rey. 17) Reduce stack usage for IPVS socket option, from Julian Anastasov. 18) Remove unnecessary textsearch state initialization in xt_string, from Bojan Prtvar. 19) Add several helper functions to nf_tables, more work to prepare the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero. 20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from Arturo Borrero. 21) Support NAT flags in the nat expression to indicate the flavour, eg. random fully, from Arturo. 22) Add missing audit code to ebtables when replacing tables, from Nicolas Dichtel. 23) Generalize the IPv4 masquerading code to allow its re-use from nf_tables, from Arturo. 24) Generalize the IPv6 masquerading code, also from Arturo. 25) Add the new masq expression to support IPv4/IPv6 masquerading from nf_tables, also from Arturo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10net-timestamp: optimize sock_tx_timestamp default pathWillem de Bruijn
Few packets have timestamping enabled. Exit sock_tx_timestamp quickly in this common case. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09net: bpf: be friendly to kmemcheckDaniel Borkmann
Reported by Mikulas Patocka, kmemcheck currently barks out a false positive since we don't have special kmemcheck annotation for bitfields used in bpf_prog structure. We currently have jited:1, len:31 and thus when accessing len while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that we're reading uninitialized memory. As we don't need the whole bit universe for pages member, we can just split it to u16 and use a bool flag for jited instead of a bitfield. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09net: bpf: consolidate JIT binary allocatorDaniel Borkmann
Introduced in commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") and later on replicated in aa2d2c73c21f ("s390/bpf,jit: address randomize and write protect jit code") for s390 architecture, write protection for BPF JIT images got added and a random start address of the JIT code, so that it's not on a page boundary anymore. Since both use a very similar allocator for the BPF binary header, we can consolidate this code into the BPF core as it's mostly JIT independant anyway. This will also allow for future archs that support DEBUG_SET_MODULE_RONX to just reuse instead of reimplementing it. JIT tested on x86_64 and s390x with BPF test suite. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09bridge: implement rtnl_link_ops->get_size and rtnl_link_ops->fill_infoJiri Pirko
Allow rtnetlink users to get bridge master info in IFLA_INFO_DATA attr This initial part implements forward_delay, hello_time, max_age options. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09net/ipv4: bind ip_nonlocal_bind to current netnsVincent Bernat
net.ipv4.ip_nonlocal_bind sysctl was global to all network namespaces. This patch allows to set a different value for each network namespace. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09net: filter: split filter.h and expose eBPF to user spaceAlexei Starovoitov
allow user space to generate eBPF programs uapi/linux/bpf.h: eBPF instruction set definition linux/filter.h: the rest This patch only moves macro definitions, but practically it freezes existing eBPF instruction set, though new instructions can still be added in the future. These eBPF definitions cannot go into uapi/linux/filter.h, since the names may conflict with existing applications. Full eBPF ISA description is in Documentation/networking/filter.txt Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09net: filter: add "load 64-bit immediate" eBPF instructionAlexei Starovoitov
add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register. All previous instructions were 8-byte. This is first 16-byte instruction. Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction: insn[0].code = BPF_LD | BPF_DW | BPF_IMM insn[0].dst_reg = destination register insn[0].imm = lower 32-bit insn[1].code = 0 insn[1].imm = upper 32-bit All unused fields must be zero. Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads 32-bit immediate value into a register. x64 JITs it as single 'movabsq %rax, imm64' arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn Note that old eBPF programs are binary compatible with new interpreter. It helps eBPF programs load 64-bit constant into a register with one instruction instead of using two registers and 4 instructions: BPF_MOV32_IMM(R1, imm32) BPF_ALU64_IMM(BPF_LSH, R1, 32) BPF_MOV32_IMM(R2, imm32) BPF_ALU64_REG(BPF_OR, R1, R2) User space generated programs will use this instruction to load constants only. To tell kernel that user space needs a pointer the _pseudo_ variant of this instruction may be added later, which will use extra bits of encoding to indicate what type of pointer user space is asking kernel to provide. For example 'off' or 'src_reg' fields can be used for such purpose. src_reg = 1 could mean that user space is asking kernel to validate and load in-kernel map pointer. src_reg = 2 could mean that user space needs readonly data section pointer src_reg = 3 could mean that user space needs a pointer to per-cpu local data All such future pseudo instructions will not be carrying the actual pointer as part of the instruction, but rather will be treated as a request to kernel to provide one. The kernel will verify the request_for_a_pointer, then will drop _pseudo_ marking and will store actual internal pointer inside the instruction, so the end result is the interpreter and JITs never see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that loads 64-bit immediate into a register. User space never operates on direct pointers and verifier can easily recognize request_for_pointer vs other instructions. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09netfilter: nf_tables: add new nft_masq expressionArturo Borrero
The nft_masq expression is intended to perform NAT in the masquerade flavour. We decided to have the masquerade functionality in a separated expression other than nft_nat. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09netfilter: nf_nat: generalize IPv6 masquerading support for nf_tablesArturo Borrero
Let's refactor the code so we can reach the masquerade functionality from outside the xt context (ie. nftables). The patch includes the addition of an atomic counter to the masquerade notifier: the stuff to be done by the notifier is the same for xt and nftables. Therefore, only one notification handler is needed. This factorization only involves IPv6; a similar patch exists to handle IPv4. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09netfilter: nf_nat: generalize IPv4 masquerading support for nf_tablesArturo Borrero
Let's refactor the code so we can reach the masquerade functionality from outside the xt context (ie. nftables). The patch includes the addition of an atomic counter to the masquerade notifier: the stuff to be done by the notifier is the same for xt and nftables. Therefore, only one notification handler is needed. This factorization only involves IPv4; a similar patch follows to handle IPv6. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09netfilter: nft_nat: include a flag attributeArturo Borrero
Both SNAT and DNAT (and the upcoming masquerade) can have additional configuration parameters, such as port randomization and NAT addressing persistence. We can cover these scenarios by simply adding a flag attribute for userspace to fill when needed. The flags to use are defined in include/uapi/linux/netfilter/nf_nat.h: NF_NAT_RANGE_MAP_IPS NF_NAT_RANGE_PROTO_SPECIFIED NF_NAT_RANGE_PROTO_RANDOM NF_NAT_RANGE_PERSISTENT NF_NAT_RANGE_PROTO_RANDOM_FULLY NF_NAT_RANGE_PROTO_RANDOM_ALL The caller must take care of not messing up with the flags, as they are added unconditionally to the final resulting nf_nat_range. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09netfilter: nf_tables: add devgroup support in meta expresionAna Rey
Add devgroup support to let us match device group of a packets incoming or outgoing interface. Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09netfilter: nat: move specific NAT IPv6 to corePablo Neira Ayuso
Move the specific NAT IPv6 core functions that are called from the hooks from ip6table_nat.c to nf_nat_l3proto_ipv6.c. This prepares the ground to allow iptables and nft to use the same NAT engine code that comes in a follow up patch. This also renames nf_nat_ipv6_fn to nft_nat_ipv6_fn in net/ipv6/netfilter/nft_chain_nat_ipv6.c to avoid a compilation breakage. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-08Merge tag 'master-2014-09-08' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-09-08 Please pull this batch of updates intended for the 3.18 stream... For the mac80211 bits, Johannes says: "Not that much content this time. Some RCU cleanups, crypto performance improvements, and various patches all over, rather than listing them one might as well look into the git log instead." For the Bluetooth bits, Gustavo says: "The changes consists of: - Coding style fixes to HCI drivers - Corrupted ack value fix for the H5 HCI driver - A couple of Enhanced L2CAP fixes - Conversion of SMP code to use common L2CAP channel API - Page scan optimizations when using the kernel-side whitelist - Various mac802154 and and ieee802154 6lowpan cleanups - One new Atheros USB ID" For the iwlwifi bits, Emmanuel says: "We have a new big thing coming up which is called Dynamic Queue Allocation (or DQA). This is a completely new way to work with the Tx queues and it requires major refactoring. This is being done by Johannes and Avri. Besides this, Johannes disables U-APSD by default because of APs that would disable A-MPDU if the association supports U-ASPD. Luca contributed to the power area which he was cleaning up on the way while working on CSA. A few more random things here and there." For the Atheros bits, Kalle says: "For ath6kl we had two small fixes and a new SDIO device id. For ath10k the bigger changes are: * support for new firmware version 10.2 (Michal) * spectral scan support (Simon, Sven & Mathias) * export a firmware crash dump file (Ben & me) * cleaning up of pci.c (Michal) * print pci id in all messages, which causes most of the churn (Michal)" Beyond that, we have the usual collection of various updates to ath9k, b43, mwifiex, and wil6210, as well as a few other bits here and there. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-08inet: remove dead inetpeer sequence codeWillem de Bruijn
inetpeer sequence numbers are no longer incremented, so no need to check and flush the tree. The function that increments the sequence number was already dead code and removed in in "ipv4: remove unused function" (068a6e18). Remove the code that checks for a change, too. Verifying that v4_seq and v6_seq are never incremented and thus that flush_check compares bp->flush_seq to 0 is trivial. The second part of the change removes flush_check completely even though bp->flush_seq is exactly !0 once, at initialization. This change is correct because the time this branch is true is when bp->root == peer_avl_empty_rcu, in which the branch and inetpeer_invalidate_tree are a NOOP. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-08Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
2014-09-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-09-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix skb leak in mac802154, from Martin Townsend 2) Use select not depends on NF_NAT for NFT_NAT, from Pablo Neira Ayuso 3) Fix union initializer bogosity in vxlan, from Gerhard Stenzel 4) Fix RX checksum configuration in stmmac driver, from Giuseppe CAVALLARO 5) Fix TSO with non-accelerated VLANs in e1000, e1000e, bna, ehea, i40e, i40evf, mvneta, and qlge, from Vlad Yasevich 6) Fix capability checks in phy_init_eee(), from Giuseppe CAVALLARO 7) Try high order allocations more sanely for SKBs, specifically if a high order allocation fails, fall back directly to zero order pages rather than iterating down one order at a time. From Eric Dumazet 8) Fix a memory leak in openvswitch, from Li RongQing 9) amd-xgbe initializes wrong spinlock, from Thomas Lendacky 10) RTNL locking was busted in setsockopt for anycast and multicast, fix from Sabrina Dubroca 11) Fix peer address refcount leak in ipv6, from Nicolas Dichtel 12) DocBook typo fixes, from Masanari Iida * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits) ipv6: restore the behavior of ipv6_sock_ac_drop() amd-xgbe: Enable interrupts for all management counters amd-xgbe: Treat certain counter registers as 64 bit greth: moved TX ring cleaning to NAPI rx poll func cnic : Cleanup CONFIG_IPV6 & VLAN check net: treewide: Fix typo found in DocBook/networking.xml bnx2x: Fix link problems for 1G SFP RJ45 module 3c59x: avoid panic in boomerang_start_xmit when finding page address: netfilter: add explicit Kconfig for NETFILTER_XT_NAT ipv6: use addrconf_get_prefix_route() to remove peer addr ipv6: fix a refcnt leak with peer addr net-timestamp: only report sw timestamp if reporting bit is set drivers/net/fddi/skfp/h/skfbi.h: Remove useless PCI_BASE_2ND macros l2tp: fix race while getting PMTU on PPP pseudo-wire ipv6: fix rtnl locking in setsockopt for anycast and multicast VMXNET3: Check for map error in vmxnet3_set_mc openvswitch: distinguish between the dropped and consumed skb amd-xgbe: Fix initialization of the wrong spin lock openvswitch: fix a memory leak netfilter: fix missing dependencies in NETFILTER_XT_TARGET_LOG ...
2014-09-07Merge tag 'master-2014-09-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-09-05 Please pull this batch of fixes intended for the 3.17 stream... For the mac80211 bits, Johannes says: "Here are a few fixes for mac80211. One has been discussed for a while and adds a terminating NUL-byte to the alpha2 sent to userspace, which shouldn't be necessary but since many places treat it as a string we couldn't move to just sending two bytes. In addition to that, we have two VLAN fixes from Felix, a mesh fix, a fix for the recently introduced RX aggregation offload, a revert for a broken patch (that luckily didn't really cause any harm) and a small fix for alignment in debugfs." For the iwlwifi bits, Emmanuel says: "I revert a patch that disabled CTS to self in dvm because users reported issues. The revert is CCed to stable since the offending patch was sent to stable too. I also bump the firmware API versions since a new firmware is coming up. On top of that, Marcel fixes a bug I introduced while fixing a bug in our Kconfig file." Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-07Merge tag 'pm+acpi-3.17-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These are regression fixes (ACPI sysfs, ACPI video, suspend test), ACPI cpuidle deadlock fix, missing runtime validation of ACPI _DSD output, a fix and a new CPU ID for the RAPL driver, new blacklist entry for the ACPI EC driver and a couple of trivial cleanups (intel_pstate and generic PM domains). Specifics: - Fix for recently broken test_suspend= command line argument (Rafael Wysocki). - Fixes for regressions related to the ACPI video driver caused by switching the default to native backlight handling in 3.16 from Hans de Goede. - Fix for a sysfs attribute of ACPI device objects that returns stale values sometimes due to the fact that they are cached instead of executing the appropriate method (_SUN) every time (broken in 3.14). From Yasuaki Ishimatsu. - Fix for a deadlock between cpuidle_lock and cpu_hotplug.lock in the ACPI processor driver from Jiri Kosina. - Runtime output validation for the ACPI _DSD device configuration object missing from the support for it that has been introduced recently. From Mika Westerberg. - Fix for an unuseful and misleading RAPL (Running Average Power Limit) domain detection message in the RAPL driver from Jacob Pan. - New Intel Haswell CPU ID for the RAPL driver from Jason Baron. - New Clevo W350etq blacklist entry for the ACPI EC driver from Lan Tianyu. - Cleanup for the intel_pstate driver and the core generic PM domains code from Gabriele Mazzotta and Geert Uytterhoeven" * tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock ACPI / scan: not cache _SUN value in struct acpi_device_pnp cpufreq: intel_pstate: Remove unneeded variable powercap / RAPL: change domain detection message powercap / RAPL: add support for CPU model 0x3f PM / domains: Make generic_pm_domain.name const PM / sleep: Fix test_suspend= command line option ACPI / EC: Add msi quirk for Clevo W350etq ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PC ACPI / video: Add a disable_native_backlight quirk ACPI / video: Fix use_native_backlight selection logic ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package.
2014-09-07Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Three fixlets from the timer departement: - Update the timekeeper before updating vsyscall and pvclock. This fixes the kvm-clock regression reported by Chris and Paolo. - Use the proper irq work interface from NMI. This fixes the regression reported by Catalin and Dave. - Clarify the compat_nanosleep error handling mechanism to avoid future confusion" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Update timekeeper before updating vsyscall and pvclock compat: nanosleep: Clarify error handling nohz: Restore NMI safe local irq work for local nohz kick
2014-09-06Merge tag 'for-linus-20140905' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull mtd fixes from Brian Norris: "Two trivial MTD updates for 3.17-rc4: - a tiny comment tweak, to kill a bunch of DocBook warnings added during the merge window - a small fixup to the OTP routines' error handling" * tag 'for-linus-20140905' of git://git.infradead.org/linux-mtd: mtd: nand: fix DocBook warnings on nand_sdr_timings doc mtd: cfi_cmdset_0002: check return code for get_chip()
2014-09-06tcp: remove TCP_SKB_CB(skb)->whenEric Dumazet
After commit 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution"), we no longer need to maintain timestamps in two different fields. TCP_SKB_CB(skb)->when can be removed, as same information sits in skb_mstamp.stamp_jiffies Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-06tcp: introduce TCP_SKB_CB(skb)->tcp_tw_isnEric Dumazet
TCP_SKB_CB(skb)->when has different meaning in output and input paths. In output path, it contains a timestamp. In input path, it contains an ISN, chosen by tcp_timewait_state_process() Lets add a different name to ease code comprehension. Note that 'when' field will disappear in following patch, as skb_mstamp already contains timestamp, the anonymous union will promptly disappear as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-06net: Add function for parsing the header length out of linear ethernet framesAlexander Duyck
This patch updates some of the flow_dissector api so that it can be used to parse the length of ethernet buffers stored in fragments. Most of the changes needed were to __skb_get_poff as it needed to be updated to support sending a linear buffer instead of a skb. I have split __skb_get_poff into two functions, the first is skb_get_poff and it retains the functionality of the original __skb_get_poff. The other function is __skb_get_poff which now works much like __skb_flow_dissect in relation to skb_flow_dissect in that it provides the same functionality but works with just a data buffer and hlen instead of needing an skb. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-06net: merge cases where sock_efree and sock_edemux are the same functionAlexander Duyck
Since sock_efree and sock_demux are essentially the same code for non-TCP sockets and the case where CONFIG_INET is not defined we can combine the code or replace the call to sock_edemux in several spots. As a result we can avoid a bit of unnecessary code or code duplication. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-06net-timestamp: Make the clone operation stand-alone from phy timestampingAlexander Duyck
The phy timestamping takes a different path than the regular timestamping does in that it will create a clone first so that the packets needing to be timestamped can be placed in a queue, or the context block could be used. In order to support these use cases I am pulling the core of the code out so it can be used in other drivers beyond just phy devices. In addition I have added a destructor named sock_efree which is meant to provide a simple way for dropping the reference to skb exceptions that aren't part of either the receive or send windows for the socket, and I have removed some duplication in spots where this destructor could be used in place of sock_edemux. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-06ipv4: harden fnhe_hashfun()Eric Dumazet
Lets make this hash function a bit secure, as ICMP attacks are still in the wild. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-06net: treewide: Fix typo found in DocBook/networking.xmlMasanari Iida
This patch fix spelling typo found in DocBook/networking.xml. It is because the neworking.xml is generated from comments in the source, I have to fix typo in comments within the source. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-06ipv4: fix a race in update_or_create_fnhe()Eric Dumazet
nh_exceptions is effectively used under rcu, but lacks proper barriers. Between kzalloc() and setting of nh->nh_exceptions(), we need a proper memory barrier. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 4895c771c7f00 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05net-timestamp: only report sw timestamp if reporting bit is setWillem de Bruijn
The timestamping API has separate bits for generating and reporting timestamps. A software timestamp should only be reported for a packet when the packet has the relevant generation flag (SKBTX_..) set and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set. The second check was accidentally removed. Reinstitute the original behavior. Tested: Without this patch, Documentation/networking/txtimestamp reports timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set. After the patch, it only reports them when the flag is set. Fixes: f24b9be5957b ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05ethtool: Add generic options for tunablesGovindarajulu Varadarajan
This patch adds new ethtool cmd, ETHTOOL_GTUNABLE & ETHTOOL_STUNABLE for getting tunable values from driver. Add get_tunable and set_tunable to ethtool_ops. Driver implements these functions for getting/setting tunable value. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05net: bpf: make eBPF interpreter images read-onlyDaniel Borkmann
With eBPF getting more extended and exposure to user space is on it's way, hardening the memory range the interpreter uses to steer its command flow seems appropriate. This patch moves the to be interpreted bytecode to read-only pages. In case we execute a corrupted BPF interpreter image for some reason e.g. caused by an attacker which got past a verifier stage, it would not only provide arbitrary read/write memory access but arbitrary function calls as well. After setting up the BPF interpreter image, its contents do not change until destruction time, thus we can setup the image on immutable made pages in order to mitigate modifications to that code. The idea is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). This is possible because bpf_prog is not part of sk_filter anymore. After setup bpf_prog cannot be altered during its life-time. This prevents any modifications to the entire bpf_prog structure (incl. function/JIT image pointer). Every eBPF program (including classic BPF that are migrated) have to call bpf_prog_select_runtime() to select either interpreter or a JIT image as a last setup step, and they all are being freed via bpf_prog_free(), including non-JIT. Therefore, we can easily integrate this into the eBPF life-time, plus since we directly allocate a bpf_prog, we have no performance penalty. Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual inspection of kernel_page_tables. Brad Spengler proposed the same idea via Twitter during development of this patch. Joint work with Hannes Frederic Sowa. Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05Merge tag 'regulator-v3.17-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator documentation fixes from Mark Brown: "All the fixes people have found for the regulator API have been documentation fixes, avoiding warnings while building the kerneldoc, fixing some errors in one of the DT bindings documents and fixing some typos in the header" * tag 'regulator-v3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: fix kernel-doc warnings in header files regulator: Proofread documentation regulator: tps65090: Fix tps65090 typos in example
2014-09-05Merge tag 'gpio-v3.17-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: - some documentation sync - resource leak in the bt8xx driver - again fix the way varargs are used to handle the optional flags on the gpiod_* accessors. Now hopefully nailed the entire problem. * tag 'gpio-v3.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: move varargs hack outside #ifdef GPIOLIB gpio: bt8xx: fix release of managed resources Documentation: gpio: documentation for optional getters functions
2014-09-05Merge branches 'pm-sleep', 'powercap', 'pm-domains' and 'pm-cpufreq'Rafael J. Wysocki
* pm-sleep: PM / sleep: Fix test_suspend= command line option * powercap: powercap / RAPL: change domain detection message powercap / RAPL: add support for CPU model 0x3f * pm-domains: PM / domains: Make generic_pm_domain.name const * pm-cpufreq: cpufreq: intel_pstate: Remove unneeded variable
2014-09-05Merge remote-tracking branches 'regulator/fix/doc' and ↵Mark Brown
'regulator/fix/tps65090' into regulator-linus
2014-09-05ipv4: implement igmp_qrv sysctl to tune igmp robustness variableHannes Frederic Sowa
As in IPv6 people might increase the igmp query robustness variable to make sure unsolicited state change reports aren't lost on the network. Add and document this new knob to igmp code. RFCs allow tuning this parameter back to first IGMP RFC, so we also use this setting for all counters, including source specific multicast. Also take over sysctl value when upping the interface and don't reuse the last one seen on the interface. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05ipv6: add sysctl_mld_qrv to configure query robustness variableHannes Frederic Sowa
This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query robustness variable. It specifies how many retransmit of unsolicited mld retransmit should happen. Admins might want to tune this on lossy links. Also reset mld state on interface down/up, so we pick up new sysctl settings during interface up event. IPv6 certification requests this knob to be available. I didn't make this knob netns specific, as it is mostly a setting in a physical environment and should be per host. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-04nohz: Restore NMI safe local irq work for local nohz kickFrederic Weisbecker
The local nohz kick is currently used by perf which needs it to be NMI-safe. Recent commit though (7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9) changed its implementation to fire the local kick using the remote kick API. It was convenient to make the code more generic but the remote kick isn't NMI-safe. As a result: WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140() CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34 0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37 0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180 0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000 Call Trace: <NMI> [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a [<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0 [<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20 [<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140 [<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90 [<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350 [<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0 [<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150 [<ffffffff9a187934>] perf_event_overflow+0x14/0x20 [<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410 [<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130 [<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50 [<ffffffff9a007b72>] nmi_handle+0xd2/0x390 [<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390 [<ffffffff9a0d131b>] ? lock_release+0xab/0x330 [<ffffffff9a008062>] default_do_nmi+0x72/0x1c0 [<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200 [<ffffffff9a008268>] do_nmi+0xb8/0x100 Lets fix this by restoring the use of local irq work for the nohz local kick. Reported-by: Catalin Iacob <iacobcatalin@gmail.com> Reported-and-tested-by: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-04bcma: get info about flash type SoC booted fromRafał Miłecki
There is an ongoing work on cleaning MIPS's nvram support so it could be re-used on other platforms (bcm53xx to say precisely). This will require a bit of extra logic in bcma this patch implements. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-04Merge tag 'mac80211-next-for-john-2014-08-29' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg <johannes@sipsolutions.net> says: "Not that much content this time. Some RCU cleanups, crypto performance improvements, and various patches all over, rather than listing them one might as well look into the git log instead." Signed-off-by: John W. Linville <linville@tuxdriver.com> Conflicts: drivers/net/wireless/ath/wil6210/wmi.c
2014-09-04Merge tag 'mac80211-for-john-2014-08-29' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg <johannes@sipsolutions.net> says: "Here are a few fixes for mac80211. One has been discussed for a while and adds a terminating NUL-byte to the alpha2 sent to userspace, which shouldn't be necessary but since many places treat it as a string we couldn't move to just sending two bytes. In addition to that, we have two VLAN fixes from Felix, a mesh fix, a fix for the recently introduced RX aggregation offload, a revert for a broken patch (that luckily didn't really cause any harm) and a small fix for alignment in debugfs." Signed-off-by: John W. Linville <linville@redhat.com>
2014-09-04Merge tag 'sound-3.17-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This time it contains a bunch of small ASoC fixes that slipped from in previous updates, in addition to the usual HD-audio fixes and the regression fixes for FireWire updates in 3.17. All commits are reasonably small fixes" * tag 'sound-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix COEF setups for ALC1150 codec ASoC: simple-card: Fix bug of wrong decrement DT node's refcount ALSA: hda - Fix digital mic on Acer Aspire 3830TG ASoC: omap-twl4030: Fix typo in 2nd dai link's platform_name ALSA: firewire-lib/dice: add arrangements of PCM pointer and interrupts for Dice quirk ALSA: dice: fix wrong channel mappping at higher sampling rate ASoC: cs4265: Fix setting of functional mode and clock divider ASoC: cs4265: Fix clock rates in clock map table ASoC: rt5677: correct mismatch widget name ASoC: rt5640: Do not allow regmap to use bulk read-write operations ASoC: tegra: Fix typo in include guard ASoC: da732x: Fix typo in include guard ASoC: core: fix .info for SND_SOC_BYTES_TLV ASoC: rcar: Use && instead of & for boolean expressions ASoC: Use dev_set_name() instead of init_name ASoC: axi: Fix ADI AXI SPDIF specification
2014-09-04lib/rhashtable: allow user to set the minimum shifts of shrinkingYing Xue
Although rhashtable library allows user to specify a quiet big size for user's created hash table, the table may be shrunk to a very small size - HASH_MIN_SIZE(4) after object is removed from the table at the first time. Subsequently, even if the total amount of objects saved in the table is quite lower than user's initial setting in a long time, the hash table size is still dynamically adjusted by rhashtable_shrink() or rhashtable_expand() each time object is inserted or removed from the table. However, as synchronize_rcu() has to be called when table is shrunk or expanded by the two functions, we should permit user to set the minimum table size through configuring the minimum number of shifts according to user specific requirement, avoiding these expensive actions of shrinking or expanding because of calling synchronize_rcu(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>