summaryrefslogtreecommitdiff
path: root/net/sunrpc
AgeCommit message (Collapse)Author
2014-02-13sunrpc: don't wait for write before allowing reads from use-gss-proxy fileJeff Layton
commit 1654a04cd702fd19c297c36300a6ab834cf8c072 upstream. It doesn't make much sense to make reads from this procfile hang. As far as I can tell, only gssproxy itself will open this file and it never reads from it. Change it to just give the present setting of sn->use_gss_proxy without waiting for anything. Note that we do not want to call use_gss_proxy() in this codepath since an inopportune read of this file could cause it to be disabled prematurely. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13sunrpc: Fix infinite loop in RPC state machineWeston Andros Adamson
commit 6ff33b7dd0228b7d7ed44791bbbc98b03fd15d9d upstream. When a task enters call_refreshresult with status 0 from call_refresh and !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting or max number of retries. Instead of trying forever, make use of the retry path that other errors use. This only seems to be possible when the crrefresh callback is gss_refresh_null, which only happens when destroying the context. To reproduce: 1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific operations). 2) reboot - the client will be stuck and will need to be hard rebooted BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy irq event stamp: 195724 hardirqs last enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30 hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80 softirqs last enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276 softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Workqueue: rpciod rpc_async_schedule [sunrpc] task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000 RIP: 0010:[<ffffffffa0064fd4>] [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc] RSP: 0018:ffff880079003d18 EFLAGS: 00000246 RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007 RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900 RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7 R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900 FS: 0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0 Stack: ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830 ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260 ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000 Call Trace: [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1 [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc] [<ffffffff81052974>] process_one_work+0x211/0x3a5 [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5 [<ffffffff81052eeb>] worker_thread+0x134/0x202 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280 [<ffffffff810584a0>] kthread+0xc9/0xd1 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61 [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61 Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00 And the output of "rpcdebug -m rpc -s all": RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refresh (status 0) RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29SUNRPC: Avoid deep recursion in rpc_release_clientTrond Myklebust
commit d07ba8422f1e58be94cc98a1f475946dc1b89f1b upstream. In cases where an rpc client has a parent hierarchy, then rpc_free_client may end up calling rpc_release_client() on the parent, thus recursing back into rpc_free_client. If the hierarchy is deep enough, then we can get into situations where the stack simply overflows. The fix is to have rpc_release_client() loop so that it can take care of the parent rpc client hierarchy without needing to recurse. Reported-by: Jeff Layton <jlayton@redhat.com> Reported-by: Weston Andros Adamson <dros@netapp.com> Reported-by: Bruce Fields <bfields@fieldses.org> Link: http://lkml.kernel.org/r/2C73011F-0939-434C-9E4D-13A1EB1403D7@netapp.com Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29SUNRPC: Fix a data corruption issue when retransmitting RPC callsTrond Myklebust
commit a6b31d18b02ff9d7915c5898c9b5ca41a798cd73 upstream. The following scenario can cause silent data corruption when doing NFS writes. It has mainly been observed when doing database writes using O_DIRECT. 1) The RPC client uses sendpage() to do zero-copy of the page data. 2) Due to networking issues, the reply from the server is delayed, and so the RPC client times out. 3) The client issues a second sendpage of the page data as part of an RPC call retransmission. 4) The reply to the first transmission arrives from the server _before_ the client hardware has emptied the TCP socket send buffer. 5) After processing the reply, the RPC state machine rules that the call to be done, and triggers the completion callbacks. 6) The application notices the RPC call is done, and reuses the pages to store something else (e.g. a new write). 7) The client NIC drains the TCP socket send buffer. Since the page data has now changed, it reads a corrupted version of the initial RPC call, and puts it on the wire. This patch fixes the problem in the following manner: The ordering guarantees of TCP ensure that when the server sends a reply, then we know that the _first_ transmission has completed. Using zero-copy in that situation is therefore safe. If a time out occurs, we then send the retransmission using sendmsg() (i.e. no zero-copy), We then know that the socket contains a full copy of the data, and so it will retransmit a faithful reproduction even if the RPC call completes, and the application reuses the O_DIRECT buffer in the meantime. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 messageTrond Myklebust
commit 5fccc5b52ee07d07a74ce53c6f174bff81e26a16 upstream. Add the missing 'break' to ensure that we don't corrupt a legacy 'v0' type message by appending the 'v1'. Cc: Bruce Fields <bfields@fieldses.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-18RPCSEC_GSS: fix crash on destroying gss authJ. Bruce Fields
This fixes a regression since eb6dc19d8e72ce3a957af5511d20c0db0a8bd007 "RPCSEC_GSS: Share all credential caches on a per-transport basis" which could cause an occasional oops in the nfsd code (see below). The problem was that an auth was left referencing a client that had been freed. To avoid this we need to ensure that auths are shared only between descendants of a common client; the fact that a clone of an rpc_client takes a reference on its parent then ensures that the parent client will last as long as the auth. Also add a comment explaining what I think was the intention of this code. general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc CPU: 3 PID: 4071 Comm: kworker/u8:2 Not tainted 3.11.0-rc2-00182-g025145f #1665 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: nfsd4_callbacks nfsd4_do_callback_rpc [nfsd] task: ffff88003e206080 ti: ffff88003c384000 task.ti: ffff88003c384000 RIP: 0010:[<ffffffffa00001f3>] [<ffffffffa00001f3>] rpc_net_ns+0x53/0x70 [sunrpc] RSP: 0000:ffff88003c385ab8 EFLAGS: 00010246 RAX: 6b6b6b6b6b6b6b6b RBX: ffff88003af9a800 RCX: 0000000000000002 RDX: ffffffffa00001a5 RSI: 0000000000000001 RDI: ffffffff81e284e0 RBP: ffff88003c385ad8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000015 R12: ffff88003c990840 R13: ffff88003c990878 R14: ffff88003c385ba8 R15: ffff88003e206080 FS: 0000000000000000(0000) GS:ffff88003fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fcdf737e000 CR3: 000000003ad2b000 CR4: 00000000000006e0 Stack: ffffffffa00001a5 0000000000000006 0000000000000006 ffff88003af9a800 ffff88003c385b08 ffffffffa00d52a4 ffff88003c385ba8 ffff88003c751bd8 ffff88003c751bc0 ffff88003e113600 ffff88003c385b18 ffffffffa00d530c Call Trace: [<ffffffffa00001a5>] ? rpc_net_ns+0x5/0x70 [sunrpc] [<ffffffffa00d52a4>] __gss_pipe_release+0x54/0x90 [auth_rpcgss] [<ffffffffa00d530c>] gss_pipe_free+0x2c/0x30 [auth_rpcgss] [<ffffffffa00d678b>] gss_destroy+0x9b/0xf0 [auth_rpcgss] [<ffffffffa000de63>] rpcauth_release+0x23/0x30 [sunrpc] [<ffffffffa0001e81>] rpc_release_client+0x51/0xb0 [sunrpc] [<ffffffffa00020d5>] rpc_shutdown_client+0xe5/0x170 [sunrpc] [<ffffffff81098a14>] ? cpuacct_charge+0xa4/0xb0 [<ffffffff81098975>] ? cpuacct_charge+0x5/0xb0 [<ffffffffa019556f>] nfsd4_process_cb_update.isra.17+0x2f/0x210 [nfsd] [<ffffffff819a4ac0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff819a4acb>] ? _raw_spin_unlock_irq+0x3b/0x60 [<ffffffff810703ab>] ? process_one_work+0x15b/0x510 [<ffffffffa01957dd>] nfsd4_do_callback_rpc+0x8d/0xa0 [nfsd] [<ffffffff8107041e>] process_one_work+0x1ce/0x510 [<ffffffff810703ab>] ? process_one_work+0x15b/0x510 [<ffffffff810712ab>] worker_thread+0x11b/0x370 [<ffffffff81071190>] ? manage_workers.isra.24+0x2b0/0x2b0 [<ffffffff8107854b>] kthread+0xdb/0xe0 [<ffffffff819a4ac0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff81078470>] ? __init_kthread_worker+0x70/0x70 [<ffffffff819ac7dc>] ret_from_fork+0x7c/0xb0 [<ffffffff81078470>] ? __init_kthread_worker+0x70/0x70 Code: a5 01 00 a0 31 d2 31 f6 48 c7 c7 e0 84 e2 81 e8 f4 91 0a e1 48 8b 43 60 48 c7 c2 a5 01 00 a0 be 01 00 00 00 48 c7 c7 e0 84 e2 81 <48> 8b 98 10 07 00 00 e8 91 8f 0a e1 e8 +3c 4e 07 e1 48 83 c4 18 RIP [<ffffffffa00001f3>] rpc_net_ns+0x53/0x70 [sunrpc] RSP <ffff88003c385ab8> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 4 from Al Viro: "list_lru pile, mostly" This came out of Andrew's pile, Al ended up doing the merge work so that Andrew didn't have to. Additionally, a few fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) super: fix for destroy lrus list_lru: dynamically adjust node arrays shrinker: Kill old ->shrink API. shrinker: convert remaining shrinkers to count/scan API staging/lustre/libcfs: cleanup linux-mem.h staging/lustre/ptlrpc: convert to new shrinker API staging/lustre/obdclass: convert lu_object shrinker to count/scan API staging/lustre/ldlm: convert to shrinkers to count/scan API hugepage: convert huge zero page shrinker to new shrinker API i915: bail out earlier when shrinker cannot acquire mutex drivers: convert shrinkers to new count/scan API fs: convert fs shrinkers to new scan/count API xfs: fix dquot isolation hang xfs-convert-dquot-cache-lru-to-list_lru-fix xfs: convert dquot cache lru to list_lru xfs: rework buffer dispose list tracking xfs-convert-buftarg-lru-to-generic-code-fix xfs: convert buftarg LRU to generic code fs: convert inode and dentry shrinking to be node aware vmscan: per-node deferred work ...
2013-09-12Merge tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes (part 2) from Trond Myklebust: "Bugfixes: - Fix a few credential reference leaks resulting from the SP4_MACH_CRED NFSv4.1 state protection code. - Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into the intended 64 byte structure. - Fix a long standing XDR issue with FREE_STATEID - Fix a potential WARN_ON spamming issue - Fix a missing dprintk() kuid conversion New features: - Enable the NFSv4.1 state protection support for the WRITE and COMMIT operations" * tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: No, I did not intend to create a 256KiB hashtable sunrpc: Add missing kuids conversion for printing NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE NFSv4.1: sp4_mach_cred: no need to ref count creds NFSv4.1: fix SECINFO* use of put_rpccred NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT NFSv4.1 fix decode_free_stateid
2013-09-12SUNRPC: No, I did not intend to create a 256KiB hashtableTrond Myklebust
Fix the declaration of the gss_auth_hash_table so that it creates a 16 bucket hashtable, as I had intended. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-12sunrpc: Add missing kuids conversion for printingGeert Uytterhoeven
m68k/allmodconfig: net/sunrpc/auth_generic.c: In function ‘generic_key_timeout’: net/sunrpc/auth_generic.c:241: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘kuid_t’ commit cdba321e291f0fbf5abda4d88340292b858e3d4d ("sunrpc: Convert kuids and kgids to uids and gids for printing") forgot to convert one instance. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "This was a very quiet cycle! Just a few bugfixes and some cleanup" * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: rpc: let xdr layer allocate gssproxy receieve pages rpc: fix huge kmalloc's in gss-proxy rpc: comment on linux_cred encoding, treat all as unsigned rpc: clean up decoding of gssproxy linux creds svcrpc: remove unused rq_resused nfsd4: nfsd4_create_clid_dir prints uninitialized data nfsd4: fix leak of inode reference on delegation failure Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR" sunrpc: prepare NFS for 2038 nfsd4: fix setlease error return nfsd: nfs4_file_get_access: need to be more careful with O_RDWR
2013-09-10shrinker: convert remaining shrinkers to count/scan APIDave Chinner
Convert the remaining couple of random shrinkers in the tree to the new API. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-09Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: - Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as lease loss due to a network partition, where doing so may result in data corruption. Add a kernel parameter to control choice of legacy behaviour or not. - Performance improvements when 2 processes are writing to the same file. - Flush data to disk when an RPCSEC_GSS session timeout is imminent. - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other NFS clients from being able to manipulate our lease and file locking state. - Allow sharing of RPCSEC_GSS caches between different rpc clients. - Fix the broken NFSv4 security auto-negotiation between client and server. - Fix rmdir() to wait for outstanding sillyrename unlinks to complete - Add a tracepoint framework for debugging NFSv4 state recovery issues. - Add tracing to the generic NFS layer. - Add tracing for the SUNRPC socket connection state. - Clean up the rpc_pipefs mount/umount event management. - Merge more patches from Chuck in preparation for NFSv4 migration support" * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits) NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set NFSv4: Allow security autonegotiation for submounts NFSv4: Disallow security negotiation for lookups when 'sec=' is specified NFSv4: Fix security auto-negotiation NFS: Clean up nfs_parse_security_flavors() NFS: Clean up the auth flavour array mess NFSv4.1 Use MDS auth flavor for data server connection NFS: Don't check lock owner compatability unless file is locked (part 2) NFS: Don't check lock owner compatibility in writes unless file is locked nfs4: Map NFS4ERR_WRONG_CRED to EPERM nfs4.1: Add SP4_MACH_CRED write and commit support nfs4.1: Add SP4_MACH_CRED stateid support nfs4.1: Add SP4_MACH_CRED secinfo support nfs4.1: Add SP4_MACH_CRED cleanup support nfs4.1: Add state protection handler nfs4.1: Minimal SP4_MACH_CRED implementation SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid SUNRPC: Add an identifier for struct rpc_clnt SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints ...
2013-09-06rpc: let xdr layer allocate gssproxy receieve pagesJ. Bruce Fields
In theory the linux cred in a gssproxy reply can include up to NGROUPS_MAX data, 256K of data. In the common case we expect it to be shorter. So do as the nfsv3 ACL code does and let the xdr code allocate the pages as they come in, instead of allocating a lot of pages that won't typically be used. Tested-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-06rpc: fix huge kmalloc's in gss-proxyJ. Bruce Fields
The reply to a gssproxy can include up to NGROUPS_MAX gid's, which will take up more than a page. We therefore need to allocate an array of pages to hold the reply instead of trying to allocate a single huge buffer. Tested-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-06rpc: comment on linux_cred encoding, treat all as unsignedJ. Bruce Fields
The encoding of linux creds is a bit confusing. Also: I think in practice it doesn't really matter whether we treat any of these things as signed or unsigned, but unsigned seems more straightforward: uid_t/gid_t are unsigned and it simplifies the ngroups overflow check. Tested-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-06rpc: clean up decoding of gssproxy linux credsJ. Bruce Fields
We can use the normal coding infrastructure here. Two minor behavior changes: - we're assuming no wasted space at the end of the linux cred. That seems to match gss-proxy's behavior, and I can't see why it would need to do differently in the future. - NGROUPS_MAX check added: note groups_alloc doesn't do this, this is the caller's responsibility. Tested-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05SUNRPC: Add an identifier for struct rpc_clntTrond Myklebust
Add an identifier in order to aid debugging. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04SUNRPC: Ensure rpc_task->tk_pid is available for tracepointsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04SUNRPC: Add tracepoints to help debug socket connection issuesTrond Myklebust
Add client side debugging to help trace socket connection/disconnection and unexpected state change issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03SUNRPC refactor rpcauth_checkverf error returnsAndy Adamson
Most of the time an error from the credops crvalidate function means the server has sent us a garbage verifier. The gss_validate function is the exception where there is an -EACCES case if the user GSS_context on the client has expired. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03SUNRPC new rpc_credops to test credential expiryAndy Adamson
This patch provides the RPC layer helper functions to allow NFS to manage data in the face of expired credentials - such as avoiding buffered WRITEs and COMMITs when the gss context will expire before the WRITEs are flushed and COMMITs are sent. These helper functions enable checking the expiration of an underlying credential key for a generic rpc credential, e.g. the gss_cred gss context gc_expiry which for Kerberos is set to the remaining TGT lifetime. A new rpc_authops key_timeout is only defined for the generic auth. A new rpc_credops crkey_to_expire is only defined for the generic cred. A new rpc_credops crkey_timeout is only defined for the gss cred. Set a credential key expiry watermark, RPC_KEY_EXPIRE_TIMEO set to 240 seconds as a default and can be set via a module parameter as we need to ensure there is time for any dirty data to be flushed. If key_timeout is called on a credential with an underlying credential key that will expire within watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON flag in the generic_cred acred so that the NFS layer can clean up prior to key expiration. Checking a generic credential's underlying credential involves a cred lookup. To avoid this lookup in the normal case when the underlying credential has a key that is valid (before the watermark), a notify flag is set in the generic credential the first time the key_timeout is called. The generic credential then stops checking the underlying credential key expiry, and the underlying credential (gss_cred) match routine then checks the key expiration upon each normal use and sets a flag in the associated generic credential only when the key expiration is within the watermark. This in turn signals the generic credential key_timeout to perform the extra credential lookup thereafter. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-03SUNRPC: don't map EKEYEXPIRED to EACCES in call_refreshresultAndy Adamson
The NFS layer needs to know when a key has expired. This change also returns -EKEYEXPIRED to the application, and the informative "Key has expired" error message is displayed. The user then knows that credential renewal is required. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-02SUNRPC: rpcauth_create needs to know about rpc_clnt clone statusTrond Myklebust
Ensure that we set rpc_clnt->cl_parent before calling rpc_client_register so that rpcauth_create can find any existing RPCSEC_GSS caches for this transport. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-02RPCSEC_GSS: Share all credential caches on a per-transport basisTrond Myklebust
Ensure that all struct rpc_clnt for any given socket/rdma channel share the same RPCSEC_GSS/krb5,krb5i,krb5p caches. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-01RPCSEC_GSS: Share rpc_pipes when an rpc_clnt owns multiple rpcsec auth cachesTrond Myklebust
Ensure that if an rpc_clnt owns more than one RPCSEC_GSS-based authentication mechanism, then those caches will share the same 'gssd' upcall pipe. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-01SUNRPC: Add a helper to allow sharing of rpc_pipefs directory objectsTrond Myklebust
Add support for looking up existing objects and creating new ones if there is no match. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-01SUNRPC: Remove the rpc_client->cl_dentryTrond Myklebust
It is now redundant. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-01SUNRPC: Remove the obsolete auth-only interface for pipefs dentry managementTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-01RPCSEC_GSS: Switch auth_gss to use the new framework for pipefs dentriesTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30SUNRPC: Add a framework to clean up management of rpc_pipefs directoriesTrond Myklebust
The current system requires everyone to set up notifiers, manage directory locking, etc. What we really want to do is have the rpc_client create its directory, and then create all the entries. This patch will allow the RPCSEC_GSS and NFS code to register all the objects that they want to have appear in the directory, and then have the sunrpc code call them back to actually create/destroy their pipefs dentries when the rpc_client creates/destroys the parent. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30RPCSEC_GSS: Fix an Oopsable condition when creating/destroying pipefs objectsTrond Myklebust
If an error condition occurs on rpc_pipefs creation, or the user mounts rpc_pipefs and then unmounts it, then the dentries in struct gss_auth need to be reset to NULL so that a second call to gss_pipes_dentries_destroy doesn't try to free them again. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30RPCSEC_GSS: Further cleanupsTrond Myklebust
Don't pass the rpc_client as a parameter, when what we really want is the net namespace. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30SUNRPC: Replace clnt->cl_principalTrond Myklebust
The clnt->cl_principal is being used exclusively to store the service target name for RPCSEC_GSS/krb5 callbacks. Replace it with something that is stored only in the RPCSEC_GSS-specific code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30RPCSEC_GSS: Clean up upcall message allocationTrond Myklebust
Optimise away gss_encode_msg: we don't need to look up the pipe version a second time. Save the gss target name in struct gss_auth. It is a property of the auth cache itself, and doesn't really belong in the rpc_client. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30SUNRPC: Cleanup rpc_setup_pipedirTrond Myklebust
The directory name is _always_ clnt->cl_program->pipe_dir_name. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30SUNRPC: Remove unused struct rpc_clnt field cl_protnameTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30SUNRPC: Deprecate rpc_client->cl_protnameTrond Myklebust
It just duplicates the cl_program->name, and is not used in any fast paths where the extra dereference will cause a hit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-28SUNRPC: Fix memory corruption issue on 32-bit highmem systemsTrond Myklebust
Some architectures, such as ARM-32 do not return the same base address when you call kmap_atomic() twice on the same page. This causes problems for the memmove() call in the XDR helper routine "_shift_data_right_pages()", since it defeats the detection of overlapping memory ranges, and has been seen to corrupt memory. The fix is to distinguish between the case where we're doing an inter-page copy or not. In the former case of we know that the memory ranges cannot possibly overlap, so we can additionally micro-optimise by replacing memmove() with memcpy(). Reported-by: Mark Young <MYoung@nvidia.com> Reported-by: Matt Craighead <mcraighead@nvidia.com> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Matt Craighead <mcraighead@nvidia.com>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-07SUNRPC: If the rpcbind channel is disconnected, fail the call to unregisterTrond Myklebust
If rpcbind causes our connection to the AF_LOCAL socket to close after we've registered a service, then we want to be careful about reconnecting since the mount namespace may have changed. By simply refusing to reconnect the AF_LOCAL socket in the case of unregister, we avoid the need to somehow save the mount namespace. While this may lead to some services not unregistering properly, it should be safe. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Nix <nix@esperi.org.uk> Cc: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.9.x
2013-08-06SUNRPC: Don't auto-disconnect from the local rpcbind socketTrond Myklebust
There is no need for the kernel to time out the AF_LOCAL connection to the rpcbind socket, and doing so is problematic because when it is time to reconnect, our process may no longer be using the same mount namespace. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.9.x
2013-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01svcrpc: set cr_gss_mech from gss-proxy as well as legacy upcallJ. Bruce Fields
The change made to rsc_parse() in 0dc1531aca7fd1440918bd55844a054e9c29acad "svcrpc: store gss mech in svc_cred" should also have been propagated to the gss-proxy codepath. This fixes a crash in the gss-proxy case. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01svcrpc: fix kfree oops in gss-proxy codeJ. Bruce Fields
mech_oid.data is an array, not kmalloc()'d memory. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01svcrpc: fix gss-proxy xdr decoding oopsJ. Bruce Fields
Uninitialized stack data was being used as the destination for memcpy's. Longer term we'll just delete some of this code; all we're doing is skipping over xdr that we don't care about. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01svcrpc: fix gss_rpc_upcall create errorJ. Bruce Fields
Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-01NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.NeilBrown
Since we enabled auto-tuning for sunrpc TCP connections we do not guarantee that there is enough write-space on each connection to queue a reply. If memory pressure causes the window to shrink too small, the request throttling in sunrpc/svc will not accept any requests so no more requests will be handled. Even when pressure decreases the window will not grow again until data is sent on the connection. This means we get a deadlock: no requests will be handled until there is more space, and no space will be allocated until a request is handled. This can be simulated by modifying svc_tcp_has_wspace to inflate the number of byte required and removing the 'svc_sock_setbufsize' calls in svc_setup_socket. I found that multiplying by 16 was enough to make the requirement exceed the default allocation. With this modification in place: mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt would block and eventually time out because the nfs server could not accept any requests. This patch relaxes the request throttling to always allow at least one request through per connection. It does this by checking both sk_stream_min_wspace() and xprt->xpt_reserved are zero. The first is zero when the TCP transmit queue is empty. The second is zero when there are no RPC requests being processed. When both of these are zero the socket is idle and so one more request can safely be allowed through. Applying this patch allows the above mount command to succeed cleanly. Tracing shows that the allocated write buffer space quickly grows and after a few requests are handled, the extra tests are no longer needed to permit further requests to be processed. The main purpose of request throttling is to handle the case when one client is slow at collecting replies and the send queue gets full of replies that the client hasn't acknowledged (at the TCP level) yet. As we only change behaviour when the send queue is empty this main purpose is still preserved. Reported-by: Ben Myers <bpm@sgi.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-25net: add sk_stream_is_writeable() helperEric Dumazet
Several call sites use the hardcoded following condition : sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) Lets use a helper because TCP_NOTSENT_LOWAT support will change this condition for TCP sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>