Age | Commit message (Collapse) | Author |
|
This branch simplifies and clarifies the dcache lookup, and allows us to
do certain nice optimizations when comparing dentries. It also cleans
up the interface to __d_lookup_rcu(), especially around passing the
inode information around.
* dentry-cleanups:
vfs: make it possible to access the dentry hash/len as one 64-bit entry
vfs: move dentry name length comparison from dentry_cmp() into callers
vfs: do the careful dentry name access for all dentry_cmp cases
vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu
vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
|
|
|
|
|
|
The following tightens the padding check from commit
c1412fce7eccae62b4de22494f6ab3ff8a90c0c6 :
* Take into account combinations of consecutive Pad1 and PadN.
* Catch the corner case of when only padding is present in the
header, when the extention header length is 0 (i.e., 8 bytes).
In this case, the header would have exactly 6 bytes of padding:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: Next Header : Hdr Ext Len=0 : :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
: Padding (Pad1 or PadN) :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace simple_strtoul with kstrtoul in three similar occurrences, all setup
handlers:
* route.c: set_rhash_entries
* tcp.c: set_thash_entries
* udp.c: set_uhash_entries
Also check if the conversion failed.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The __setup macro should follow the corresponding setup handler.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ip6_frag_reasm() can use skb_try_coalesce() to build optimized skb,
reducing memory used by them (truesize), and reducing number of cache
line misses and overhead for the consumer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
reducing memory used by them (truesize), and reducing number of cache
line misses and overhead for the consumer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move tcp_try_coalesce() protocol independent part to
skb_try_coalesce().
skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly,
to build optimized skbs (less sk_buff, and possibly less 'headers')
skb_try_coalesce() is zero copy, unless the copy can fit in destination
header (its a rare case)
kfree_skb_partial() is also moved to net/core/skbuff.c and exported,
because IPv6 will need it in patch (ipv6: use skb coalescing in
reassembly).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixed space issues relating to operators found by
checkpatch.pl tool in net/ipv6/udp.c
Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixed a trailing white space issue found by
checkpatch.pl tool in net/ipv6/udp.c
Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
xprt_alloc_slot will call rpc_delay() to make the task wait a bit before
retrying when it gets back an -ENOMEM error from xprt_dynamic_alloc_slot.
The problem is that rpc_delay will clear the task->tk_status, causing
call_reserveresult to abort the task.
The solution is simply to let call_reserveresult handle the ENOMEM error
directly.
Reported-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org [>= 3.1]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
If the allfrag feature has been set on a host route (due to an ICMPv6
Packet Too Big received indicating a MTU of less than 1280), we hit a
very slow behavior in TCP stack, because all big packets are dropped and
only a retransmit timer is able to push one MSS frame every 200 ms.
One way to handle this is to disable GSO on the socket the first time a
super packet is dropped. Adding a specific dst_allfrag() in the fast
path is probably overkill since the dst_allfrag() case almost never
happen.
Result on netperf TCP_STREAM, one flow :
Before : 60 kbit/sec
After : 1.6 Gbit/sec
Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No need to export napi_frags_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Mostly bool conversions, some inline removals and const additions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We already unconditionally dereference 'sk' via lock_sock(sk) earlier
in this function, and our caller (sock_do_ioctl()) makes takes similar
liberties.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Quoting Tore Anderson from :
If the allfrag feature has been set on a host route (due to an ICMPv6
Packet Too Big received indicating a MTU of less than 1280),
TCP SYN/ACK packets to that destination appears to get an incorrect
TCP checksum. This in turn means they are thrown away as invalid.
In the case of an IPv4 client behind a link with a MTU of less than
1260, accessing an IPv6 server through a stateless translator,
this means that the client can only download a single large file
from the server, because once it is in the server's routing cache
with the allfrag feature set, new TCP connections can no longer
be established.
</endquote>
It appears ip6_fragment() doesn't handle CHECKSUM_PARTIAL properly.
As network drivers are not prepared to fetch correct transport header, a
safe fix is to call skb_checksum_help() before fragmenting packet.
Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a race between two __unregister_request() callers: the
reply path and the ceph_osdc_wait_request(). If we get a reply
*and* the timeout expires at roughly the same time, both callers
will try to unregister the request, and the second one will do bad
things.
Simply check if the request is still already unregistered; if so,
return immediately and do nothing.
Fixes http://tracker.newdream.net/issues/2420
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
|
|
Move the addition of the authorizer buffer to a connection's
out_kvec out of get_connect_authorizer() and into its caller. This
way, the caller--prepare_write_connect()--can avoid adding the
connect header to out_kvec before it has been fully initialized.
Prior to this patch, it was possible for a connect header to be
sent over the wire before the authorizer protocol or buffer length
fields were initialized. An authorizer buffer associated with that
header could also be queued to send only after the connection header
that describes it was on the wire.
Fixes http://tracker.newdream.net/issues/2424
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
commit c57b5468406 (pktgen: fix crash at module unload) did a very poor
job with list primitives.
1) list_splice() arguments were in the wrong order
2) list_splice(list, head) has undefined behavior if head is not
initialized.
3) We should use the list_splice_init() variant to clear pktgen_threads
list.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
csummode variable is always CHECKSUM_NONE in ip6_append_data()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix two issues introduced in commit a1c7fff7e18f5
( net: netdev_alloc_skb() use build_skb() )
- Must be IRQ safe (non NAPI drivers can use it)
- Must not leak the frag if build_skb() fails to allocate sk_buff
This patch introduces netdev_alloc_frag() for drivers willing to
use build_skb() instead of __netdev_alloc_skb() variants.
Factorize code so that :
__dev_alloc_skb() is a wrapper around __netdev_alloc_skb(), and
dev_alloc_skb() a wrapper around netdev_alloc_skb()
Use __GFP_COLD flag.
Almost all network drivers now benefit from skb->head_frag
infrastructure.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipv6_opt_accepted() returns a bool, and can use const pointers
ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(),
ipv6_addr_orchid() return a bool.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- match() method returns a boolean
- return (A && B && C && D) -> return A && B && C && D
- fix indentation
Signed-off-by: Eric Dumazet <edumazet@google.com>
|
|
More spring cleaning!
The ancient Econet protocol should go. Most of the bug fixes in recent
years have been fixing security vulnerabilities. The hardware hasn't
been made since the 90s, it is only interesting as an archeological curiosity.
For the truly curious, or insomniac, go read up on it.
http://en.wikipedia.org/wiki/Econet
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enable dynamic debugging and remove a bunch of #ifdef/#endifs.
Add a lapb_dbg(level, fmt, ...) macro and replace the
printk(KERN_DEBUG uses.
Add pr_fmt and remove embedded prefixes.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.
Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.
The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.
After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.
The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.
This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
|
|
When I first wrote drop monitor I wrote it to just build monolithically. There
is no reason it can't be built modularly as well, so lets give it that
flexibiity.
I've tested this by building it as both a module and monolithically, and it
seems to work quite well
Change notes:
v2)
* fixed for_each_present_cpu loops to be more correct as per Eric D.
* Converted exit path failures to BUG_ON as per Ben H.
v3)
* Converted del_timer to del_timer_sync to close race noted by Ben H.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netdev_alloc_skb() is used by networks driver in their RX path to
allocate an skb to receive an incoming frame.
With recent skb->head_frag infrastructure, it makes sense to change
netdev_alloc_skb() to use build_skb() and a frag allocator.
This permits a zero copy splice(socket->pipe), and better GRO or TCP
coalescing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The padding destination or hop-by-hop option is called Pad1 and not Pad0.
See RFC2460 (4.2) or the IANA ipv6-parameters registry:
http://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xml
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bool conversions where possible.
__inline__ -> inline
space cleanups
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change the name of prepare_connect_authorizer(). The next
patch is going to make this function no longer add anything to the
connection's out_kvec, so it will no longer fit the pattern of
the rest of the prepare_connect_*() functions.
In addition, pass the address of a variable that will hold the
authorization protocol to use. Move the assignment of that to the
connection's out_connect structure into prepare_write_connect().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Change prepare_connect_authorizer() so it returns a pointer (or
pointer-coded error).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value. This is to pave the way for making use of the
returned value in an upcoming patch.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced. There is no
obvious guarantee that this pointer has been assigned. And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.
Add checks in both routines to make sure they are defined (non-null)
before use. Add similar checks in a few other spots in these files
while we're at it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Make use of the new ceph_auth_handshake structure in order to reduce
the number of arguments passed to the create_authorizor method in
ceph_auth_client_ops. Use a local variable of that type as a
shorthand in the get_authorizer method definitions.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
The definitions for the ceph_mds_session and ceph_osd both contain
five fields related only to "authorizers." Encapsulate those fields
into their own struct type, allowing for better isolation in some
upcoming patches.
Fix the #includes in "linux/ceph/osd_client.h" to lay out their more
complete canonical path.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
In prepare_connect_authorizer(), a connection's get_authorizer
method is called but ignores its return value. This function can
return an error, so check for it and return it if that ever occurs.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Change prepare_connect_authorizer() so it returns without dropping
the connection mutex if the connection has no get_authorizer method.
Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning
authorization protocols.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
prepare_write_connect() can return an error, but only one of its
callers checks for it. All the rest are in functions that already
return errors, so it should be fine to return the error if one
gets returned.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
prepare_write_connect() prepares a connect message, then sets
WRITE_PENDING on the connection. Then *after* this, it calls
prepare_connect_authorizer(), which updates the content of the
connection buffer already queued for sending. It's also possible it
will result in prepare_write_connect() returning -EAGAIN despite the
WRITE_PENDING big getting set.
Fix this by preparing the connect authorizer first, setting the
WRITE_PENDING bit only after that is done.
Partially addresses http://tracker.newdream.net/issues/2424
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
In all cases, the value passed as the msgr argument to
prepare_write_connect() is just con->msgr. Just get the msgr
value from the ceph connection and drop the unneeded argument.
The only msgr passed to prepare_write_banner() is also therefore
just the one from con->msgr, so change that function to drop the
msgr argument as well.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
prepare_write_connect() has an argument indicating whether a banner
should be sent out before sending out a connection message. It's
only ever set in one of its callers, so move the code that arranges
to send the banner into that caller and drop the "include_banner"
argument from prepare_write_connect().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Reset a connection's kvec fields in the caller rather than in
prepare_write_connect(). This ends up repeating a few lines of
code but it's improving the separation between distinct operations
on the connection, which we can take advantage of later.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Move the kvec reset for a connection out of prepare_write_banner and
into its only caller.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Use the current logging style.
This enables use of dynamic debugging as well.
Convert printk(KERN_<LEVEL> to pr_<level>.
Add pr_fmt. Remove embedded prefixes, use
%s, __func__ instead.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Why use several macros when one will do?
Convert the multiple ND_PRINTKx macros to a single
ND_PRINTK macro. Use the new net_<level>_ratelimited
mechanism too.
Add pr_fmt with "ICMPv6: " as prefix.
Remove embedded ICMPv6 prefixes from messages.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert printk(KERN_DEBUG to pr_debug which can
enable dynamic debugging.
Remove embedded prefixes from the conversions as
pr_fmt adds them.
Align arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bool/const conversions where possible
__inline__ -> inline
space cleanups
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|