Age | Commit message (Collapse) | Author |
|
Make cls_tcindex RCU safe.
This patch addds a new RCU routine rcu_dereference_bh_rtnl() to check
caller either holds the rcu read lock or RTNL. This is needed to
handle the case where tcindex_lookup() is being called in both cases.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.
This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.
Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since BPF JIT depends on the availability of module_alloc() and
module_free() helpers (HAVE_BPF_JIT and MODULES), we better build
that code only in case we have BPF_JIT in our config enabled, just
like with other JIT code. Fixes builds for arm/marzen_defconfig
and sh/rsk7269_defconfig.
====================
kernel/built-in.o: In function `bpf_jit_binary_alloc':
/home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc'
kernel/built-in.o: In function `bpf_jit_binary_free':
/home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free'
make: *** [vmlinux] Error 1
====================
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 738cbe72adc5 ("net: bpf: consolidate JIT binary allocator")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
nf-next pull request
The following patchset contains Netfilter/IPVS updates for your
net-next tree. Regarding nf_tables, most updates focus on consolidating
the NAT infrastructure and adding support for masquerading. More
specifically, they are:
1) use __u8 instead of u_int8_t in arptables header, from
Mike Frysinger.
2) Add support to match by skb->pkttype to the meta expression, from
Ana Rey.
3) Add support to match by cpu to the meta expression, also from
Ana Rey.
4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from
Vytas Dauksa.
5) Fix netnet and netportnet hash types the range support for IPv4,
from Sergey Popovich.
6) Fix missing-field-initializer warnings resolved, from Mark Rustad.
7) Dan Carperter reported possible integer overflows in ipset, from
Jozsef Kadlecsick.
8) Filter out accounting objects in nfacct by type, so you can
selectively reset quotas, from Alexey Perevalov.
9) Move specific NAT IPv4 functions to the core so x_tables and
nf_tables can share the same NAT IPv4 engine.
10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4.
11) Move specific NAT IPv6 functions to the core so x_tables and
nf_tables can share the same NAT IPv4 engine.
12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6.
13) Refactor code to add nft_delrule(), which can be reused in the
enhancement of the NFT_MSG_DELTABLE to remove a table and its
content, from Arturo Borrero.
14) Add a helper function to unregister chain hooks, from
Arturo Borrero.
15) A cleanup to rename to nft_delrule_by_chain for consistency with
the new nft_*() functions, also from Arturo.
16) Add support to match devgroup to the meta expression, from Ana Rey.
17) Reduce stack usage for IPVS socket option, from Julian Anastasov.
18) Remove unnecessary textsearch state initialization in xt_string,
from Bojan Prtvar.
19) Add several helper functions to nf_tables, more work to prepare
the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero.
20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from
Arturo Borrero.
21) Support NAT flags in the nat expression to indicate the flavour,
eg. random fully, from Arturo.
22) Add missing audit code to ebtables when replacing tables, from
Nicolas Dichtel.
23) Generalize the IPv4 masquerading code to allow its re-use from
nf_tables, from Arturo.
24) Generalize the IPv6 masquerading code, also from Arturo.
25) Add the new masq expression to support IPv4/IPv6 masquerading
from nf_tables, also from Arturo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Few packets have timestamping enabled. Exit sock_tx_timestamp quickly
in this common case.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have special kmemcheck annotation
for bitfields used in bpf_prog structure.
We currently have jited:1, len:31 and thus when accessing len
while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.
As we don't need the whole bit universe for pages member, we
can just split it to u16 and use a bool flag for jited instead
of a bitfield.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduced in commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") and later on replicated in aa2d2c73c21f
("s390/bpf,jit: address randomize and write protect jit code") for
s390 architecture, write protection for BPF JIT images got added and
a random start address of the JIT code, so that it's not on a page
boundary anymore.
Since both use a very similar allocator for the BPF binary header,
we can consolidate this code into the BPF core as it's mostly JIT
independant anyway.
This will also allow for future archs that support DEBUG_SET_MODULE_RONX
to just reuse instead of reimplementing it.
JIT tested on x86_64 and s390x with BPF test suite.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow rtnetlink users to get bridge master info in IFLA_INFO_DATA attr
This initial part implements forward_delay, hello_time, max_age options.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net.ipv4.ip_nonlocal_bind sysctl was global to all network
namespaces. This patch allows to set a different value for each
network namespace.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
allow user space to generate eBPF programs
uapi/linux/bpf.h: eBPF instruction set definition
linux/filter.h: the rest
This patch only moves macro definitions, but practically it freezes existing
eBPF instruction set, though new instructions can still be added in the future.
These eBPF definitions cannot go into uapi/linux/filter.h, since the names
may conflict with existing applications.
Full eBPF ISA description is in Documentation/networking/filter.txt
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register.
All previous instructions were 8-byte. This is first 16-byte instruction.
Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction:
insn[0].code = BPF_LD | BPF_DW | BPF_IMM
insn[0].dst_reg = destination register
insn[0].imm = lower 32-bit
insn[1].code = 0
insn[1].imm = upper 32-bit
All unused fields must be zero.
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM
which loads 32-bit immediate value into a register.
x64 JITs it as single 'movabsq %rax, imm64'
arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn
Note that old eBPF programs are binary compatible with new interpreter.
It helps eBPF programs load 64-bit constant into a register with one
instruction instead of using two registers and 4 instructions:
BPF_MOV32_IMM(R1, imm32)
BPF_ALU64_IMM(BPF_LSH, R1, 32)
BPF_MOV32_IMM(R2, imm32)
BPF_ALU64_REG(BPF_OR, R1, R2)
User space generated programs will use this instruction to load constants only.
To tell kernel that user space needs a pointer the _pseudo_ variant of
this instruction may be added later, which will use extra bits of encoding
to indicate what type of pointer user space is asking kernel to provide.
For example 'off' or 'src_reg' fields can be used for such purpose.
src_reg = 1 could mean that user space is asking kernel to validate and
load in-kernel map pointer.
src_reg = 2 could mean that user space needs readonly data section pointer
src_reg = 3 could mean that user space needs a pointer to per-cpu local data
All such future pseudo instructions will not be carrying the actual pointer
as part of the instruction, but rather will be treated as a request to kernel
to provide one. The kernel will verify the request_for_a_pointer, then
will drop _pseudo_ marking and will store actual internal pointer inside
the instruction, so the end result is the interpreter and JITs never
see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that
loads 64-bit immediate into a register. User space never operates on direct
pointers and verifier can easily recognize request_for_pointer vs other
instructions.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The nft_masq expression is intended to perform NAT in the masquerade flavour.
We decided to have the masquerade functionality in a separated expression other
than nft_nat.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (ie. nftables).
The patch includes the addition of an atomic counter to the masquerade
notifier: the stuff to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.
This factorization only involves IPv6; a similar patch exists to
handle IPv4.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (ie. nftables).
The patch includes the addition of an atomic counter to the masquerade
notifier: the stuff to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.
This factorization only involves IPv4; a similar patch follows to
handle IPv6.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Both SNAT and DNAT (and the upcoming masquerade) can have additional
configuration parameters, such as port randomization and NAT addressing
persistence. We can cover these scenarios by simply adding a flag
attribute for userspace to fill when needed.
The flags to use are defined in include/uapi/linux/netfilter/nf_nat.h:
NF_NAT_RANGE_MAP_IPS
NF_NAT_RANGE_PROTO_SPECIFIED
NF_NAT_RANGE_PROTO_RANDOM
NF_NAT_RANGE_PERSISTENT
NF_NAT_RANGE_PROTO_RANDOM_FULLY
NF_NAT_RANGE_PROTO_RANDOM_ALL
The caller must take care of not messing up with the flags, as they are
added unconditionally to the final resulting nf_nat_range.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add devgroup support to let us match device group of a packets incoming
or outgoing interface.
Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Move the specific NAT IPv6 core functions that are called from the
hooks from ip6table_nat.c to nf_nat_l3proto_ipv6.c. This prepares the
ground to allow iptables and nft to use the same NAT engine code that
comes in a follow up patch.
This also renames nf_nat_ipv6_fn to nft_nat_ipv6_fn in
net/ipv6/netfilter/nft_chain_nat_ipv6.c to avoid a compilation breakage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
pull request: wireless-next 2014-09-08
Please pull this batch of updates intended for the 3.18 stream...
For the mac80211 bits, Johannes says:
"Not that much content this time. Some RCU cleanups, crypto
performance improvements, and various patches all over,
rather than listing them one might as well look into the
git log instead."
For the Bluetooth bits, Gustavo says:
"The changes consists of:
- Coding style fixes to HCI drivers
- Corrupted ack value fix for the H5 HCI driver
- A couple of Enhanced L2CAP fixes
- Conversion of SMP code to use common L2CAP channel API
- Page scan optimizations when using the kernel-side whitelist
- Various mac802154 and and ieee802154 6lowpan cleanups
- One new Atheros USB ID"
For the iwlwifi bits, Emmanuel says:
"We have a new big thing coming up which is called Dynamic Queue
Allocation (or DQA). This is a completely new way to work with the
Tx queues and it requires major refactoring. This is being done by
Johannes and Avri. Besides this, Johannes disables U-APSD by default
because of APs that would disable A-MPDU if the association supports
U-ASPD. Luca contributed to the power area which he was cleaning
up on the way while working on CSA. A few more random things here
and there."
For the Atheros bits, Kalle says:
"For ath6kl we had two small fixes and a new SDIO device id.
For ath10k the bigger changes are:
* support for new firmware version 10.2 (Michal)
* spectral scan support (Simon, Sven & Mathias)
* export a firmware crash dump file (Ben & me)
* cleaning up of pci.c (Michal)
* print pci id in all messages, which causes most of the churn (Michal)"
Beyond that, we have the usual collection of various updates to ath9k,
b43, mwifiex, and wil6210, as well as a few other bits here and there.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inetpeer sequence numbers are no longer incremented, so no need to
check and flush the tree. The function that increments the sequence
number was already dead code and removed in in "ipv4: remove unused
function" (068a6e18). Remove the code that checks for a change, too.
Verifying that v4_seq and v6_seq are never incremented and thus that
flush_check compares bp->flush_seq to 0 is trivial.
The second part of the change removes flush_check completely even
though bp->flush_seq is exactly !0 once, at initialization. This
change is correct because the time this branch is true is when
bp->root == peer_avl_empty_rcu, in which the branch and
inetpeer_invalidate_tree are a NOOP.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
|
|
|
|
Pull networking fixes from David Miller:
1) Fix skb leak in mac802154, from Martin Townsend
2) Use select not depends on NF_NAT for NFT_NAT, from Pablo Neira
Ayuso
3) Fix union initializer bogosity in vxlan, from Gerhard Stenzel
4) Fix RX checksum configuration in stmmac driver, from Giuseppe
CAVALLARO
5) Fix TSO with non-accelerated VLANs in e1000, e1000e, bna, ehea,
i40e, i40evf, mvneta, and qlge, from Vlad Yasevich
6) Fix capability checks in phy_init_eee(), from Giuseppe CAVALLARO
7) Try high order allocations more sanely for SKBs, specifically if a
high order allocation fails, fall back directly to zero order pages
rather than iterating down one order at a time. From Eric Dumazet
8) Fix a memory leak in openvswitch, from Li RongQing
9) amd-xgbe initializes wrong spinlock, from Thomas Lendacky
10) RTNL locking was busted in setsockopt for anycast and multicast, fix
from Sabrina Dubroca
11) Fix peer address refcount leak in ipv6, from Nicolas Dichtel
12) DocBook typo fixes, from Masanari Iida
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
ipv6: restore the behavior of ipv6_sock_ac_drop()
amd-xgbe: Enable interrupts for all management counters
amd-xgbe: Treat certain counter registers as 64 bit
greth: moved TX ring cleaning to NAPI rx poll func
cnic : Cleanup CONFIG_IPV6 & VLAN check
net: treewide: Fix typo found in DocBook/networking.xml
bnx2x: Fix link problems for 1G SFP RJ45 module
3c59x: avoid panic in boomerang_start_xmit when finding page address:
netfilter: add explicit Kconfig for NETFILTER_XT_NAT
ipv6: use addrconf_get_prefix_route() to remove peer addr
ipv6: fix a refcnt leak with peer addr
net-timestamp: only report sw timestamp if reporting bit is set
drivers/net/fddi/skfp/h/skfbi.h: Remove useless PCI_BASE_2ND macros
l2tp: fix race while getting PMTU on PPP pseudo-wire
ipv6: fix rtnl locking in setsockopt for anycast and multicast
VMXNET3: Check for map error in vmxnet3_set_mc
openvswitch: distinguish between the dropped and consumed skb
amd-xgbe: Fix initialization of the wrong spin lock
openvswitch: fix a memory leak
netfilter: fix missing dependencies in NETFILTER_XT_TARGET_LOG
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:
====================
pull request: wireless 2014-09-05
Please pull this batch of fixes intended for the 3.17 stream...
For the mac80211 bits, Johannes says:
"Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
shouldn't be necessary but since many places treat it as a string we
couldn't move to just sending two bytes.
In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
fix for the recently introduced RX aggregation offload, a revert for
a broken patch (that luckily didn't really cause any harm) and a small
fix for alignment in debugfs."
For the iwlwifi bits, Emmanuel says:
"I revert a patch that disabled CTS to self in dvm because users
reported issues. The revert is CCed to stable since the offending
patch was sent to stable too. I also bump the firmware API versions
since a new firmware is coming up. On top of that, Marcel fixes a
bug I introduced while fixing a bug in our Kconfig file."
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are regression fixes (ACPI sysfs, ACPI video, suspend test),
ACPI cpuidle deadlock fix, missing runtime validation of ACPI _DSD
output, a fix and a new CPU ID for the RAPL driver, new blacklist
entry for the ACPI EC driver and a couple of trivial cleanups
(intel_pstate and generic PM domains).
Specifics:
- Fix for recently broken test_suspend= command line argument (Rafael
Wysocki).
- Fixes for regressions related to the ACPI video driver caused by
switching the default to native backlight handling in 3.16 from
Hans de Goede.
- Fix for a sysfs attribute of ACPI device objects that returns stale
values sometimes due to the fact that they are cached instead of
executing the appropriate method (_SUN) every time (broken in
3.14). From Yasuaki Ishimatsu.
- Fix for a deadlock between cpuidle_lock and cpu_hotplug.lock in the
ACPI processor driver from Jiri Kosina.
- Runtime output validation for the ACPI _DSD device configuration
object missing from the support for it that has been introduced
recently. From Mika Westerberg.
- Fix for an unuseful and misleading RAPL (Running Average Power
Limit) domain detection message in the RAPL driver from Jacob Pan.
- New Intel Haswell CPU ID for the RAPL driver from Jason Baron.
- New Clevo W350etq blacklist entry for the ACPI EC driver from Lan
Tianyu.
- Cleanup for the intel_pstate driver and the core generic PM domains
code from Gabriele Mazzotta and Geert Uytterhoeven"
* tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
ACPI / scan: not cache _SUN value in struct acpi_device_pnp
cpufreq: intel_pstate: Remove unneeded variable
powercap / RAPL: change domain detection message
powercap / RAPL: add support for CPU model 0x3f
PM / domains: Make generic_pm_domain.name const
PM / sleep: Fix test_suspend= command line option
ACPI / EC: Add msi quirk for Clevo W350etq
ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PC
ACPI / video: Add a disable_native_backlight quirk
ACPI / video: Fix use_native_backlight selection logic
ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Three fixlets from the timer departement:
- Update the timekeeper before updating vsyscall and pvclock. This
fixes the kvm-clock regression reported by Chris and Paolo.
- Use the proper irq work interface from NMI. This fixes the
regression reported by Catalin and Dave.
- Clarify the compat_nanosleep error handling mechanism to avoid
future confusion"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Update timekeeper before updating vsyscall and pvclock
compat: nanosleep: Clarify error handling
nohz: Restore NMI safe local irq work for local nohz kick
|
|
Pull mtd fixes from Brian Norris:
"Two trivial MTD updates for 3.17-rc4:
- a tiny comment tweak, to kill a bunch of DocBook warnings added
during the merge window
- a small fixup to the OTP routines' error handling"
* tag 'for-linus-20140905' of git://git.infradead.org/linux-mtd:
mtd: nand: fix DocBook warnings on nand_sdr_timings doc
mtd: cfi_cmdset_0002: check return code for get_chip()
|
|
After commit 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution"),
we no longer need to maintain timestamps in two different fields.
TCP_SKB_CB(skb)->when can be removed, as same information sits in skb_mstamp.stamp_jiffies
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TCP_SKB_CB(skb)->when has different meaning in output and input paths.
In output path, it contains a timestamp.
In input path, it contains an ISN, chosen by tcp_timewait_state_process()
Lets add a different name to ease code comprehension.
Note that 'when' field will disappear in following patch,
as skb_mstamp already contains timestamp, the anonymous
union will promptly disappear as well.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch updates some of the flow_dissector api so that it can be used to
parse the length of ethernet buffers stored in fragments. Most of the
changes needed were to __skb_get_poff as it needed to be updated to support
sending a linear buffer instead of a skb.
I have split __skb_get_poff into two functions, the first is skb_get_poff
and it retains the functionality of the original __skb_get_poff. The other
function is __skb_get_poff which now works much like __skb_flow_dissect in
relation to skb_flow_dissect in that it provides the same functionality but
works with just a data buffer and hlen instead of needing an skb.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since sock_efree and sock_demux are essentially the same code for non-TCP
sockets and the case where CONFIG_INET is not defined we can combine the
code or replace the call to sock_edemux in several spots. As a result we
can avoid a bit of unnecessary code or code duplication.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The phy timestamping takes a different path than the regular timestamping
does in that it will create a clone first so that the packets needing to be
timestamped can be placed in a queue, or the context block could be used.
In order to support these use cases I am pulling the core of the code out
so it can be used in other drivers beyond just phy devices.
In addition I have added a destructor named sock_efree which is meant to
provide a simple way for dropping the reference to skb exceptions that
aren't part of either the receive or send windows for the socket, and I
have removed some duplication in spots where this destructor could be used
in place of sock_edemux.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lets make this hash function a bit secure, as ICMP attacks are still
in the wild.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fix spelling typo found in DocBook/networking.xml.
It is because the neworking.xml is generated from comments
in the source, I have to fix typo in comments within the source.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
nh_exceptions is effectively used under rcu, but lacks proper
barriers. Between kzalloc() and setting of nh->nh_exceptions(),
we need a proper memory barrier.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 4895c771c7f00 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The timestamping API has separate bits for generating and reporting
timestamps. A software timestamp should only be reported for a packet
when the packet has the relevant generation flag (SKBTX_..) set
and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set.
The second check was accidentally removed. Reinstitute the original
behavior.
Tested:
Without this patch, Documentation/networking/txtimestamp reports
timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set.
After the patch, it only reports them when the flag is set.
Fixes: f24b9be5957b ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds new ethtool cmd, ETHTOOL_GTUNABLE & ETHTOOL_STUNABLE for getting
tunable values from driver.
Add get_tunable and set_tunable to ethtool_ops. Driver implements these
functions for getting/setting tunable value.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With eBPF getting more extended and exposure to user space is on it's way,
hardening the memory range the interpreter uses to steer its command flow
seems appropriate. This patch moves the to be interpreted bytecode to
read-only pages.
In case we execute a corrupted BPF interpreter image for some reason e.g.
caused by an attacker which got past a verifier stage, it would not only
provide arbitrary read/write memory access but arbitrary function calls
as well. After setting up the BPF interpreter image, its contents do not
change until destruction time, thus we can setup the image on immutable
made pages in order to mitigate modifications to that code. The idea
is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks").
This is possible because bpf_prog is not part of sk_filter anymore.
After setup bpf_prog cannot be altered during its life-time. This prevents
any modifications to the entire bpf_prog structure (incl. function/JIT
image pointer).
Every eBPF program (including classic BPF that are migrated) have to call
bpf_prog_select_runtime() to select either interpreter or a JIT image
as a last setup step, and they all are being freed via bpf_prog_free(),
including non-JIT. Therefore, we can easily integrate this into the
eBPF life-time, plus since we directly allocate a bpf_prog, we have no
performance penalty.
Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
inspection of kernel_page_tables. Brad Spengler proposed the same idea
via Twitter during development of this patch.
Joint work with Hannes Frederic Sowa.
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator documentation fixes from Mark Brown:
"All the fixes people have found for the regulator API have been
documentation fixes, avoiding warnings while building the kerneldoc,
fixing some errors in one of the DT bindings documents and fixing some
typos in the header"
* tag 'regulator-v3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fix kernel-doc warnings in header files
regulator: Proofread documentation
regulator: tps65090: Fix tps65090 typos in example
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
- some documentation sync
- resource leak in the bt8xx driver
- again fix the way varargs are used to handle the optional flags on
the gpiod_* accessors. Now hopefully nailed the entire problem.
* tag 'gpio-v3.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: move varargs hack outside #ifdef GPIOLIB
gpio: bt8xx: fix release of managed resources
Documentation: gpio: documentation for optional getters functions
|
|
* pm-sleep:
PM / sleep: Fix test_suspend= command line option
* powercap:
powercap / RAPL: change domain detection message
powercap / RAPL: add support for CPU model 0x3f
* pm-domains:
PM / domains: Make generic_pm_domain.name const
* pm-cpufreq:
cpufreq: intel_pstate: Remove unneeded variable
|
|
'regulator/fix/tps65090' into regulator-linus
|
|
As in IPv6 people might increase the igmp query robustness variable to
make sure unsolicited state change reports aren't lost on the network. Add
and document this new knob to igmp code.
RFCs allow tuning this parameter back to first IGMP RFC, so we also use
this setting for all counters, including source specific multicast.
Also take over sysctl value when upping the interface and don't reuse
the last one seen on the interface.
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query
robustness variable. It specifies how many retransmit of unsolicited mld
retransmit should happen. Admins might want to tune this on lossy links.
Also reset mld state on interface down/up, so we pick up new sysctl
settings during interface up event.
IPv6 certification requests this knob to be available.
I didn't make this knob netns specific, as it is mostly a setting in a
physical environment and should be per host.
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The local nohz kick is currently used by perf which needs it to be
NMI-safe. Recent commit though (7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9)
changed its implementation to fire the local kick using the remote kick
API. It was convenient to make the code more generic but the remote kick
isn't NMI-safe.
As a result:
WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
Call Trace:
<NMI> [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
[<ffffffff9a008268>] do_nmi+0xb8/0x100
Lets fix this by restoring the use of local irq work for the nohz local
kick.
Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
There is an ongoing work on cleaning MIPS's nvram support so it could be
re-used on other platforms (bcm53xx to say precisely).
This will require a bit of extra logic in bcma this patch implements.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg <johannes@sipsolutions.net> says:
"Not that much content this time. Some RCU cleanups, crypto
performance improvements, and various patches all over,
rather than listing them one might as well look into the
git log instead."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Conflicts:
drivers/net/wireless/ath/wil6210/wmi.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
shouldn't be necessary but since many places treat it as a string we
couldn't move to just sending two bytes.
In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
fix for the recently introduced RX aggregation offload, a revert for
a broken patch (that luckily didn't really cause any harm) and a small
fix for alignment in debugfs."
Signed-off-by: John W. Linville <linville@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This time it contains a bunch of small ASoC fixes that slipped from in
previous updates, in addition to the usual HD-audio fixes and the
regression fixes for FireWire updates in 3.17.
All commits are reasonably small fixes"
* tag 'sound-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix COEF setups for ALC1150 codec
ASoC: simple-card: Fix bug of wrong decrement DT node's refcount
ALSA: hda - Fix digital mic on Acer Aspire 3830TG
ASoC: omap-twl4030: Fix typo in 2nd dai link's platform_name
ALSA: firewire-lib/dice: add arrangements of PCM pointer and interrupts for Dice quirk
ALSA: dice: fix wrong channel mappping at higher sampling rate
ASoC: cs4265: Fix setting of functional mode and clock divider
ASoC: cs4265: Fix clock rates in clock map table
ASoC: rt5677: correct mismatch widget name
ASoC: rt5640: Do not allow regmap to use bulk read-write operations
ASoC: tegra: Fix typo in include guard
ASoC: da732x: Fix typo in include guard
ASoC: core: fix .info for SND_SOC_BYTES_TLV
ASoC: rcar: Use && instead of & for boolean expressions
ASoC: Use dev_set_name() instead of init_name
ASoC: axi: Fix ADI AXI SPDIF specification
|
|
Although rhashtable library allows user to specify a quiet big size
for user's created hash table, the table may be shrunk to a
very small size - HASH_MIN_SIZE(4) after object is removed from
the table at the first time. Subsequently, even if the total amount
of objects saved in the table is quite lower than user's initial
setting in a long time, the hash table size is still dynamically
adjusted by rhashtable_shrink() or rhashtable_expand() each time
object is inserted or removed from the table. However, as
synchronize_rcu() has to be called when table is shrunk or
expanded by the two functions, we should permit user to set the
minimum table size through configuring the minimum number of shifts
according to user specific requirement, avoiding these expensive
actions of shrinking or expanding because of calling synchronize_rcu().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|