Age | Commit message (Collapse) | Author |
|
Rather than returning earlier value (EINVAL), return ENOMEM if
kzalloc fails. Found while reviewing to find another EINVAL condition.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
include/net/xfrm.h
Simple conflict between Joe Perches "extern" removal for function
declarations in header files and the changes in Steffen's tree.
Steffen Klassert says:
====================
Two patches that are left from the last development cycle.
Manual merging of include/net/xfrm.h is needed. The conflict
can be solved as it is currently done in linux-next.
1) We announce the creation of temporary acquire state via an asyc event,
so the deletion should be annunced too. From Nicolas Dichtel.
2) The VTI tunnels do not real tunning, they just provide a routable
IPsec tunnel interface. So introduce and use xfrm_tunnel_notifier
instead of xfrm_tunnel for xfrm tunnel mode callback. From Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When flags IFF_PROMISC and IFF_ALLMULTI are changed, netlink messages are not
consistent. For example, if a multicast daemon is running (flag IFF_ALLMULTI
set in dev->flags but not dev->gflags, ie not exported to userspace) and then a
user sets it via netlink (flag IFF_ALLMULTI set in dev->flags and dev->gflags, ie
exported to userspace), no netlink message is sent.
Same for IFF_PROMISC and because dev->promiscuity is exported via
IFLA_PROMISCUITY, we may send a netlink message after each change of this
counter.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch only prepares the next one, there is no functional change.
Now, __dev_notify_flags() can also be used to notify flags changes via
rtnetlink.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As mentioned in commit afe4fd062416b ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.
SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.
u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
To be effectively paced, a flow must use FQ packet scheduler.
Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.
I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().
The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.
Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch enables the IP_TTL and IP_TOS values passed from userspace to
be stored in the ipcm_cookie struct. Three fields are added to the struct:
- the TTL, expressed as __u8.
The allowed values are in the [1-255].
A value of 0 means that the TTL is not specified.
- the TOS, expressed as __s16.
The allowed values are in the range [0,255].
A value of -1 means that the TOS is not specified.
- the priority, expressed as a char and computed when
handling the ancillary data.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch provides an additional safety net against NULL
pointer dereferences while walking the fib trie for the new
/proc/net/ipv6_route walkers. I never needed it myself and am unsure
if it is needed at all, but the same checks where introduced in
2bec5a369ee79576a3eea2c23863325089785a2c ("ipv6: fib: fix crash when
changing large fib while dumping it") to fix NULL pointer bugs.
This patch is separated from the first patch to make it easier to revert
if we are sure we can drop this logic.
Cc: Ben Greear <greearb@candelatech.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dumping routes on a system with lots rt6_infos in the fibs causes up to
11-order allocations in seq_file (which fail). While we could switch
there to vmalloc we could just implement the streaming interface for
/proc/net/ipv6_route. This patch switches /proc/net/ipv6_route from
single_open_net to seq_open_net.
loff_t *pos tracks dst entries.
Also kill never used struct rt6_proc_arg and now unused function
fib6_clean_all_ro.
Cc: Ben Greear <greearb@candelatech.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Also, remove the same functionality from bonding - it will be already done
for any device that links to its lower/upper neighbour.
The links will be created for dev's kobject, and will look like
lower_eth0 for lower device eth0 and upper_bridge0 for upper device
bridge0.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, we can have only one master upper neighbour, so it would be
useful to create a symlink to it in the sysfs device directory, the way
that bonding now does it, for every device. Lower devices from
bridge/team/etc will automagically get it, so we could rely on it.
Also, remove the same functionality from bonding.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On netdev unregister we're removing also all of its sysfs-associated stuff,
including the sysfs symlinks that are controlled by netdev neighbour code.
Also, it's a subtle race condition - cause we can still access it after
unregistering.
Move the unlinking right before the unregistering to fix both.
CC: Patrick McHardy <kaber@trash.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Otherwise users might access it without being fully registered, as per
sysfs - it only inits in register_netdevice(), so is unusable till it is
called.
CC: Patrick McHardy <kaber@trash.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It will be useful to get first/last element.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a possibility to iterate through netdev_adjacent's private, currently
only for lower neighbours.
Add both RCU and RTNL/other locking variants of iterators, and make the
non-rcu variant to be safe from removal.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, even though we can access any linked device, we can't attach
anything to it, which is vital to properly manage them.
To fix this, add a new void *private to netdev_adjacent and functions
setting/getting it (per link), so that we can save, per example, bonding's
slave structures there, per slave device.
netdev_master_upper_dev_link_private(dev, upper_dev, private) links dev to
upper dev and populates the neighbour link only with private.
netdev_lower_dev_get_private{,_rcu}() returns the private, if found.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we have only the RTNL flavour, however we can traverse it while
holding only RCU, so add the RCU search. Add an RCU variant that uses
list_head * as an argument, so that it can be universally used afterwards.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, we distinguish neighbours (first-level linked devices) from
non-neighbours by the neighbour bool in the netdev_adjacent. This could be
quite time-consuming in case we would like to traverse *only* through
neighbours - cause we'd have to traverse through all devices and check for
this flag, and in a (quite common) scenario where we have lots of vlans on
top of bridge, which is on top of a bond - the bonding would have to go
through all those vlans to get its upper neighbour linked devices.
This situation is really unpleasant, cause there are already a lot of cases
when a device with slaves needs to go through them in hot path.
To fix this, introduce a new upper/lower device lists structure -
adj_list, which contains only the neighbours. It works always in
pair with the all_adj_list structure (renamed from upper/lower_dev_list),
i.e. both of them contain the same links, only that all_adj_list contains
also non-neighbour device links. It's really a small change visible,
currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't
change the main linked logic at all.
Also, add some comments a fix a name collision in
netdev_for_each_upper_dev_rcu() and rework the naming by the following
rules:
netdev_(all_)(upper|lower)_*
If "all_" is present, then we work with the whole list of upper/lower
devices, otherwise - only with direct neighbours. Uninline functions - to
get better stack traces.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we make use of bool upper when we want to specify if we want to
work with upper/lower list. It's, however, harder to read, debug and
occupies a lot more code.
Fix this by just passing the correct upper/lower_dev_list list_head pointer
instead of bool upper, and work internally with it.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we always use the first member of the arp_queue to determine
the sender ip address of the arp packet (or in case of IPv6 - source
address of the ndisc packet). This skb is fixed as long as the queue is
not drained by a complete purge because of a timeout or by a successful
response.
If the first packet enqueued on the arp_queue is from a local application
with a manually set source address and the to be discovered system
does some kind of uRPF checks on the source address in the arp packet
the resolving process hangs until a timeout and restarts. This hurts
communication with the participating network node.
This could be mitigated a bit if we use the latest enqueued skb's
source address for the resolving process, which is not as static as
the arp_queue's head. This change of the source address could result in
better recovery of a failed solicitation.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There was some bug report on ipv6 module removal path before.
Also, as Stephen pointed out, after vxlan module gets ipv6 support,
the ipv6 stub it used is not safe against this module removal either.
So, let's just remove inet6_exit() so that ipv6 module will not be
able to be unloaded.
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dynamic Right Sizing (DRS) is supposed to open TCP receive window
automatically, but suffers from two bugs, presented by order
of importance.
1) tcp_rcv_space_adjust() fix :
Using twice the last received amount is very pessimistic,
because it doesn't allow fast recovery or proper slow start
ramp up, if sender wants to increase cwin by 100% every RTT.
copied = bytes received in previous RTT
2*copied = bytes we expect to receive in next RTT
4*copied = bytes we need to advertise in rwin at end of next RTT
DRS is one RTT late, it needs a 4x factor.
If sender is not using ABC, and increases cwin by 50% every rtt,
then we needed 1.5*1.5 = 2.25 factor.
This is probably why this bug was not really noticed.
2) There is no window adjustment after first RTT. DRS triggers only
after the second RTT.
DRS needs two RTT to initialize, so tcp_fixup_rcvbuf() should setup
sk_rcvbuf to allow proper window grow for first two RTT.
This patch increases TCP efficiency particularly for large RTT flows
when autotuning is used at the receiver, and more particularly
in presence of packet losses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Halve mss table size to make blind cookie guessing more difficult.
This is sad since the tables were already small, but there
is little alternative except perhaps adding more precise mss information
in the tcp timestamp. Timestamps are unfortunately not ubiquitous.
Guessing all possible cookie values still has 8-in 2**32 chance.
Reported-by: Jakob Lell <jakob@jakoblell.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently accept cookies that were created less than 4 minutes ago
(ie, cookies with counter delta 0-3). Combined with the 8 mss table
values, this yields 32 possible values (out of 2**32) that will be valid.
Reducing the lifetime to < 2 minutes halves the guessing chance while
still providing a large enough period.
While at it, get rid of jiffies value -- they overflow too quickly on
32 bit platforms.
getnstimeofday is used to create a counter that increments every 64s.
perf shows getnstimeofday cost is negible compared to sha_transform;
normal tcp initial sequence number generation uses getnstimeofday, too.
Reported-by: Jakob Lell <jakob@jakoblell.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
HTB already can deal with 64bit rates, we only have to add two new
attributes so that tc can use them to break the current 32bit ABI
barrier.
TCA_HTB_RATE64 : class rate (in bytes per second)
TCA_HTB_CEIL64 : class ceil (in bytes per second)
This allows us to setup HTB on 40Gbps links, as 32bit limit is
actually ~34Gbps
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an extra u64 rate parameter to psched_ratecfg_precompute()
so that some qdisc can opt-in for 64bit rates in the future,
to overcome the ~34 Gbits limit.
psched_ratecfg_getrate() reports a legacy structure to
tc utility, so if actual rate is above the 32bit rate field,
cap it to the 34Gbit limit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
removed these checkpatch.pl warnings:
net/ethernet/eth.c:61: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
net/ethernet/eth.c:136: WARNING: Prefer netdev_dbg(netdev, ... then dev_dbg(dev, ... then pr_debug(... to printk(KERN_DEBUG ...
net/ethernet/eth.c:181: ERROR: space prohibited before that close parenthesis ')'
Signed-off-by: Avinash Kumar <avi.kp.137@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) If the local_df boolean is set on an SKB we have to allocate a
unique ID even if IP_DF is set in the ipv4 headers, from Ansis
Atteka.
2) Some fixups for the new chipset support that went into the sfc
driver, from Ben Hutchings.
3) Because SCTP bypasses a good chunk of, and actually duplicates, the
logic of the ipv6 output path, some IPSEC things don't get done
properly. Integrate SCTP better into the ipv6 output path so that
these problems are fixed and such issues don't get missed in the
future either. From Daniel Borkmann.
4) Fix skge regressions added by the DMA mapping error return checking
added in v3.10, from Mikulas Patocka.
5) Kill some more IRQF_DISABLED references, from Michael Opdenacker.
6) Fix races and deadlocks in the bridging code, from Hong Zhiguo.
7) Fix error handling in tun_set_iff(), in particular don't leak
resources. From Jason Wang.
8) Prevent format-string injection into xen-netback driver, from Kees
Cook.
9) Fix regression added to netpoll ARP packet handling, in particular
check for the right ETH_P_ARP protocol code. From Sonic Zhang.
10) Try to deal with AMD IOMMU errors when using r8169 chips, from
Francois Romieu.
11) Cure freezes due to recent changes in the rt2x00 wireless driver,
from Stanislaw Gruszka.
12) Don't do SPI transfers (which can sleep) in interrupt context in
cw1200 driver, from Solomon Peachy.
13) Fix LEDs handling bug in 5720 tg3 chips already handled for 5719.
From Nithin Sujir.
14) Make xen_netbk_count_skb_slots() count the actual number of slots
that will be used, taking into consideration packing and other
issues that the transmit path will run into. From David Vrabel.
15) Use the correct maximum age when calculating the bridge
message_age_timer, from Chris Healy.
16) Get rid of memory leaks in mcs7780 IRDA driver, from Alexey
Khoroshilov.
17) Netfilter conntrack extensions were converted to RCU but are not
always freed properly using kfree_rcu(). Fix from Michal Kubecek.
18) VF reset recovery not being done correctly in qlcnic driver, from
Manish Chopra.
19) Fix inverted test in ATM nicstar driver, from Andy Shevchenko.
20) Missing workqueue destroy in cxgb4 error handling, from Wei Yang.
21) Internal switch not initialized properly in bgmac driver, from Rafał
Miłecki.
22) Netlink messages report wrong local and remote addresses in IPv6
tunneling, from Ding Zhi.
23) ICMP redirects should not generate socket errors in DCCP and SCTP.
We're still working out how this should be handled for RAW and UDP
sockets. From Daniel Borkmann and Duan Jiong.
24) We've had several bugs wherein the network namespace's loopback
device gets accessed after it is free'd, NULL it out so that we can
catch these problems more readily. From Eric W Biederman.
25) Fix regression in TCP RTO calculations, from Neal Cardwell.
26) Fix too early free of xen-netback network device when VIFs still
exist. From Paul Durrant.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
netconsole: fix a deadlock with rtnl and netconsole's mutex
netpoll: fix NULL pointer dereference in netpoll_cleanup
skge: fix broken driver
ip: generate unique IP identificator if local fragmentation is allowed
ip: use ip_hdr() in __ip_make_skb() to retrieve IP header
xen-netback: Don't destroy the netdev until the vif is shut down
net:dccp: do not report ICMP redirects to user space
cnic: Fix crash in cnic_bnx2x_service_kcq()
bnx2x, cnic, bnx2i, bnx2fc: Fix bnx2i and bnx2fc regressions.
vxlan: Avoid creating fdb entry with NULL destination
tcp: fix RTO calculated from cached RTT
drivers: net: phy: cicada.c: clears warning Use #include <linux/io.h> instead of <asm/io.h>
net loopback: Set loopback_dev to NULL when freed
batman-adv: set the TAG flag for the vid passed to BLA
netfilter: nfnetlink_queue: use network skb for sequence adjustment
net: sctp: rfc4443: do not report ICMP redirects to user space
net: usb: cdc_ether: use usb.h macros whenever possible
net: usb: cdc_ether: fix checkpatch errors and warnings
net: usb: cdc_ether: Use wwan interface for Telit modules
ip6_tunnels: raddr and laddr are inverted in nl msg
...
|
|
I've been hitting a NULL ptr deref while using netconsole because the
np->dev check and the pointer manipulation in netpoll_cleanup are done
without rtnl and the following sequence happens when having a netconsole
over a vlan and we remove the vlan while disabling the netconsole:
CPU 1 CPU2
removes vlan and calls the notifier
enters store_enabled(), calls
netdev_cleanup which checks np->dev
and then waits for rtnl
executes the netconsole netdev
release notifier making np->dev
== NULL and releases rtnl
continues to dereference a member of
np->dev which at this point is == NULL
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If local fragmentation is allowed, then ip_select_ident() and
ip_select_ident_more() need to generate unique IDs to ensure
correct defragmentation on the peer.
For example, if IPsec (tunnel mode) has to encrypt large skbs
that have local_df bit set, then all IP fragments that belonged
to different ESP datagrams would have used the same identificator.
If one of these IP fragments would get lost or reordered, then
peer could possibly stitch together wrong IP fragments that did
not belong to the same datagram. This would lead to a packet loss
or data corruption.
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb->data already points to IP header, but for the sake of
consistency we can also use ip_hdr() to retrieve it.
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
"These fix several bugs with RBD from 3.11 that didn't get tested in
time for the merge window: some error handling, a use-after-free, and
a sequencing issue when unmapping and image races with a notify
operation.
There is also a patch fixing a problem with the new ceph + fscache
code that just went in"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
fscache: check consistency does not decrement refcount
rbd: fix error handling from rbd_snap_name()
rbd: ignore unmapped snapshots that no longer exist
rbd: fix use-after free of rbd_dev->disk
rbd: make rbd_obj_notify_ack() synchronous
rbd: complete notifies before cleaning up osd_client and rbd_dev
libceph: add function to ensure notifies are complete
|
|
DCCP shouldn't be setting sk_err on redirects as it
isn't an error condition. it should be doing exactly
what tcp is doing and leaving the error handler without
touching the socket.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Included change:
- fix the Bridge Loop Avoidance component by marking the variables containing
the VLAN ID with the HAS_TAG flag when needed.
|
|
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for you net tree,
mostly targeted to ipset, they are:
* Fix ICMPv6 NAT due to wrong comparison, code instead of type, from
Phil Oester.
* Fix RCU race in conntrack extensions release path, from Michal Kubecek.
* Fix missing inversion in the userspace ipset test command match if
the nomatch option is specified, from Jozsef Kadlecsik.
* Skip layer 4 protocol matching in ipset in case of IPv6 fragments,
also from Jozsef Kadlecsik.
* Fix sequence adjustment in nfnetlink_queue due to using the netlink
skb instead of the network skb, from Gao feng.
* Make sure we cannot swap of sets with different layer 3 family in
ipset, from Jozsef Kadlecsik.
* Fix possible bogus matching in ipset if hash sets with net elements
are used, from Oliver Smith.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 1b7fdd2ab5852 ("tcp: do not use cached RTT for RTT estimation")
did not correctly account for the fact that crtt is the RTT shifted
left 3 bits. Fix the calculation to consistently reflect this fact.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When receiving or sending a packet a packet on a VLAN, the
vid has to be marked with the TAG flag in order to make any
component in batman-adv understand that the packet is coming
from a really tagged network.
This fix the Bridge Loop Avoidance behaviour which was not
able to send announces over VLAN interfaces.
Introduced by 0b1da1765fdb00ca5d53bc95c9abc70dfc9aae5b
("batman-adv: change VID semantic in the BLA code")
Signed-off-by: Antonio Quartulli <antonio@open-mesh.org>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
|
|
Instead of the netlink skb.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Adapt the same behaviour for SCTP as present in TCP for ICMP redirect
messages. For IPv6, RFC4443, section 2.4. says:
...
(e) An ICMPv6 error message MUST NOT be originated as a result of
receiving the following:
...
(e.2) An ICMPv6 redirect message [IPv6-DISC].
...
Therefore, do not report an error to user space, just invoke dst's redirect
callback and leave, same for IPv4 as done in TCP as well. The implication
w/o having this patch could be that the reception of such packets would
generate a poll notification and in worst case it could even tear down the
whole connection. Therefore, stop updating sk_err on redirects.
Reported-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted.
Introduced by c075b13098b3 (ip6tnl: advertise tunnel param via rtnl).
Signed-off-by: Ding Zhi <zhi.ding@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes a serious bug affecting all hash types with a net element -
specifically, if a CIDR value is deleted such that none of the same size
exist any more, all larger (less-specific) values will then fail to
match. Adding back any prefix with a CIDR equal to or more specific than
the one deleted will fix it.
Steps to reproduce:
ipset -N test hash:net
ipset -A test 1.1.0.0/16
ipset -A test 2.2.2.0/24
ipset -T test 1.1.1.1 #1.1.1.1 IS in set
ipset -D test 2.2.2.0/24
ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set
This is due to the fact that the nets counter was unconditionally
decremented prior to the iteration that shifts up the entries. Now, we
first check if there is a proceeding entry and if not, decrement it and
return. Otherwise, we proceed to iterate and then zero the last element,
which, in most cases, will already be zero.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
swapping
This closes netfilter bugzilla #843, reported by Quentin Armitage.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
The "nomatch" commandline flag should invert the matching at testing,
similarly to the --return-nomatch flag of the "set" match of iptables.
Until now it worked with the elements with "nomatch" flag only. From
now on it works with elements without the flag too, i.e:
# ipset n test hash:net
# ipset a test 10.0.0.0/24 nomatch
# ipset t test 10.0.0.1
10.0.0.1 is NOT in set test.
# ipset t test 10.0.0.1 nomatch
10.0.0.1 is in set test.
# ipset a test 192.168.0.0/24
# ipset t test 192.168.0.1
192.168.0.1 is in set test.
# ipset t test 192.168.0.1 nomatch
192.168.0.1 is NOT in set test.
Before the patch the results were
...
# ipset t test 192.168.0.1
192.168.0.1 is in set test.
# ipset t test 192.168.0.1 nomatch
192.168.0.1 is in set test.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
port/protocol
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
|
|
The NULL deref happens when br_handle_frame is called between these
2 lines of del_nbp:
dev->priv_flags &= ~IFF_BRIDGE_PORT;
/* --> br_handle_frame is called at this time */
netdev_rx_handler_unregister(dev);
In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced
without check but br_port_get_rcu(dev) returns NULL if:
!(dev->priv_flags & IFF_BRIDGE_PORT)
Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary
here since we're in rcu_read_lock and we have synchronize_net() in
netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT
and by the previous patch, make sure br_port_get_rcu is called in
bridging code.
Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
current br_port_get_rcu is problematic in bridging path
(NULL deref). Change these calls in netlink path first.
Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull aio changes from Ben LaHaise:
"First off, sorry for this pull request being late in the merge window.
Al had raised a couple of concerns about 2 items in the series below.
I addressed the first issue (the race introduced by Gu's use of
mm_populate()), but he has not provided any further details on how he
wants to rework the anon_inode.c changes (which were sent out months
ago but have yet to be commented on).
The bulk of the changes have been sitting in the -next tree for a few
months, with all the issues raised being addressed"
* git://git.kvack.org/~bcrl/aio-next: (22 commits)
aio: rcu_read_lock protection for new rcu_dereference calls
aio: fix race in ring buffer page lookup introduced by page migration support
aio: fix rcu sparse warnings introduced by ioctx table lookup patch
aio: remove unnecessary debugging from aio_free_ring()
aio: table lookup: verify ctx pointer
staging/lustre: kiocb->ki_left is removed
aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
aio: convert the ioctx list to table lookup v3
aio: double aio_max_nr in calculations
aio: Kill ki_dtor
aio: Kill ki_users
aio: Kill unneeded kiocb members
aio: Kill aio_rw_vect_retry()
aio: Don't use ctx->tail unnecessarily
aio: io_cancel() no longer returns the io_event
aio: percpu ioctx refcount
aio: percpu reqs_available
aio: reqs_active -> reqs_available
aio: fix build when migration is disabled
...
|
|
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
In commit 58a317f1 (netfilter: ipv6: add IPv6 NAT support), icmpv6_manip_pkt
was added with an incorrect comparison of ICMP codes to types. This causes
problems when using NAT rules with the --random option. Correct the
comparison.
This closes netfilter bugzilla #851, reported by Alexander Neumann.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
At some point limits were added to forward_delay. However, the
limits are only enforced when STP is enabled. This created a
scenario where you could have a value outside the allowed range
while STP is disabled, which then stuck around even after STP
is enabled.
This patch fixes this by clamping the value when we enable STP.
I had to move the locking around a bit to ensure that there is
no window where someone could insert a value outside the range
while we're in the middle of enabling STP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
|