summaryrefslogtreecommitdiff
path: root/fs/nfs/dir.c
AgeCommit message (Collapse)Author
2014-02-11mm: fix page leak at nfs_symlink()Rafael Aquini
Changes in commit a0b8cab3b9b2 ("mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec API") have introduced a call to add_to_page_cache_lru() which causes a leak in nfs_symlink() as now the page gets an extra refcount that is not dropped. Jan Stancek observed and reported the leak effect while running test8 from Connectathon Testsuite. After several iterations over the test case, which creates several symlinks on a NFS mountpoint, the test system was quickly getting into an out-of-memory scenario. This patch fixes the page leak by dropping that extra refcount add_to_page_cache_lru() is grabbing. Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Rafael Aquini <aquini@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: <stable@vger.kernel.org> [3.11.x+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-28nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATINGJeff Layton
If the setting of NFS_INO_INVALIDATING gets reordered to before the clearing of NFS_INO_INVALID_DATA, then another task may hit a race window where both appear to be clear, even though the inode's pages are still in need of invalidation. Fix this by adding the appropriate memory barriers. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-27NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mappingJeff Layton
There is a possible race in how the nfs_invalidate_mapping function is handled. Currently, we go and invalidate the pages in the file and then clear NFS_INO_INVALID_DATA. The problem is that it's possible for a stale page to creep into the mapping after the page was invalidated (i.e., via readahead). If another writer comes along and sets the flag after that happens but before invalidate_inode_pages2 returns then we could clear the flag without the cache having been properly invalidated. So, we must clear the flag first and then invalidate the pages. Doing this however, opens another race: It's possible to have two concurrent read() calls that end up in nfs_revalidate_mapping at the same time. The first one clears the NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping. Just before calling that though, the other task races in, checks the flag and finds it cleared. At that point, it trusts that the mapping is good and gets the lock on the page, allowing the read() to be satisfied from the cache even though the data is no longer valid. These effects are easily manifested by running diotest3 from the LTP test suite on NFS. That program does a series of DIO writes and buffered reads. The operations are serialized and page-aligned but the existing code fails the test since it occasionally allows a read to come out of the cache incorrectly. While mixing direct and buffered I/O isn't recommended, I believe it's possible to hit this in other ways that just use buffered I/O, though that situation is much harder to reproduce. The problem is that the checking/clearing of that flag and the invalidation of the mapping really need to be atomic. Fix this by serializing concurrent invalidations with a bitlock. At the same time, we also need to allow other places that check NFS_INO_INVALID_DATA to check whether we might be in the middle of invalidating the file, so fix up a couple of places that do that to look for the new NFS_INO_INVALIDATING flag. Doing this requires us to be careful not to set the bitlock unnecessarily, so this code only does that if it believes it will be doing an invalidation. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-05NFS: dprintk() should not print negative fileids and inode numbersNiels de Vos
A fileid in NFS is a uint64. There are some occurrences where dprintk() outputs a signed fileid. This leads to confusion and more difficult to read debugging (negative fileids matching positive inode numbers). Signed-off-by: Niels de Vos <ndevos@redhat.com> CC: Santosh Pradhan <spradhan@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2013-11-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "All kinds of stuff this time around; some more notable parts: - RCU'd vfsmounts handling - new primitives for coredump handling - files_lock is gone - Bruce's delegations handling series - exportfs fixes plus misc stuff all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits) ecryptfs: ->f_op is never NULL locks: break delegations on any attribute modification locks: break delegations on link locks: break delegations on rename locks: helper functions for delegation breaking locks: break delegations on unlink namei: minor vfs_unlink cleanup locks: implement delegations locks: introduce new FL_DELEG lock flag vfs: take i_mutex on renamed file vfs: rename I_MUTEX_QUOTA now that it's not used for quotas vfs: don't use PARENT/CHILD lock classes for non-directories vfs: pull ext4's double-i_mutex-locking into common code exportfs: fix quadratic behavior in filehandle lookup exportfs: better variable name exportfs: move most of reconnect_path to helper function exportfs: eliminate unused "noprogress" counter exportfs: stop retrying once we race with rename/remove exportfs: clear DISCONNECTED on all parents sooner exportfs: more detailed comment for path_reconnect ...
2013-10-28Merge branch 'fscache' of ↵Trond Myklebust
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into linux-next Pull fs-cache fixes from David Howells: Can you pull these commits to fix an issue with NFS whereby caching can be enabled on a file that is open for writing by subsequently opening it for reading. This can be made to crash by opening it for writing again if you're quick enough. The gist of the patchset is that the cookie should be acquired at inode creation only and subsequently enabled and disabled as appropriate (which dispenses with the backing objects when they're not needed). The extra synchronisation that NFS does can then be dispensed with as it is thenceforth managed by FS-Cache. Could you send these on to Linus? This likely will need fixing also in CIFS and 9P also once the FS-Cache changes are upstream. AFS and Ceph are probably safe. * 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open() FS-Cache: Provide the ability to enable/disable cookies FS-Cache: Add use/unuse/wake cookie wrappers
2013-10-28nfs: use IS_ROOT not DCACHE_DISCONNECTEDJ. Bruce Fields
This check was added by Al Viro with d9e80b7de91db05c1c4d2e5ebbfd70b3b3ba0e0f "nfs d_revalidate() is too trigger-happy with d_drop()", with the explanation that we don't want to remove the root of a disconnected tree, which will still be included on the s_anon list. But DCACHE_DISCONNECTED does *not* actually identify dentries that are disconnected from the dentry tree or hashed on s_anon. IS_ROOT() is the way to do that. Also add a comment from Al's commit to remind us why this check is there. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-25nfs: use %p[dD] instead of open-coded (and often racy) equivalentsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-27NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()David Howells
Use i_writecount to control whether to get an fscache cookie in nfs_open() as NFS does not do write caching yet. I *think* this is the cause of a problem encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL pointer dereference because cookie->def is NULL: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 PGD 0 Thread overran stack, or stack corrupted Oops: 0000 [#1] SMP Modules linked in: ... CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1 Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012 task: ffff8804203460c0 ti: ffff880420346640 RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 RSP: 0018:ffff8801053af878 EFLAGS: 00210286 RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8 RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780 RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440 R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0 Stack: ... Call Trace: [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70 [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90 [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90 [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620 [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0 [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70 [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40 [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70 [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120 [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0 [<ffffffff81357f78>] nfs_setattr+0xc8/0x150 [<ffffffff8122d95b>] notify_change+0x1cb/0x390 [<ffffffff8120a55b>] do_truncate+0x7b/0xc0 [<ffffffff8121f96c>] do_last+0xa4c/0xfd0 [<ffffffff8121ffbc>] path_openat+0xcc/0x670 [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0 [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0 [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50 [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24 The code at the instruction pointer was disassembled: > (gdb) disas __fscache_uncache_page > Dump of assembler code for function __fscache_uncache_page: > ... > 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax > 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax) > 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237> These instructions make up: ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX); That cmpb is the faulting instruction (%rax is 0). So cookie->def is NULL - which presumably means that the cookie has already been at least partway through __fscache_relinquish_cookie(). What I think may be happening is something like a three-way race on the same file: PROCESS 1 PROCESS 2 PROCESS 3 =============== =============== =============== open(O_TRUNC|O_WRONLY) open(O_RDONLY) open(O_WRONLY) -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_disable_inode_cookie() __fscache_relinquish_cookie() nfs_inode->fscache = NULL <--nfs_fscache_set_inode_cookie() -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_enable_inode_cookie() __fscache_acquire_cookie() nfs_inode->fscache = cookie <--nfs_fscache_set_inode_cookie() <--nfs_open() -->nfs_setattr() ... ... -->nfs_invalidate_page() -->__nfs_fscache_invalidate_page() cookie = nfsi->fscache -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_disable_inode_cookie() -->__fscache_relinquish_cookie() -->__fscache_uncache_page(cookie) <crash> <--__fscache_relinquish_cookie() nfs_inode->fscache = NULL <--nfs_fscache_set_inode_cookie() What is needed is something to prevent process #2 from reacquiring the cookie - and I think checking i_writecount should do the trick. It's also possible to have a two-way race on this if the file is opened O_TRUNC|O_RDONLY instead. Reported-by: Mark Moseley <moseleymark@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-26NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem methodTrond Myklebust
Determine if we've created a new file by examining the directory change attribute and/or the O_EXCL flag. This fixes a regression when doing a non-exclusive create of a new file. If the FILE_CREATED flag is not set, the atomic_open() command will perform full file access permissions checks instead of just checking for MAY_OPEN. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-16nfs: set FILE_CREATEDMiklos Szeredi
Set FILE_CREATED on O_CREAT|O_EXCL. If the NFS server honored our request for exclusivity then this must be correct. Currently this is a no-op, since the VFS sets FILE_CREATED anyway. The next patch will, however, require this flag to be always set by filesystems. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10fs: convert fs shrinkers to new scan/count APIDave Chinner
Convert the filesystem shrinkers to use the new API, and standardise some of the behaviours of the shrinkers at the same time. For example, nr_to_scan means the number of objects to scan, not the number of objects to free. I refactored the CIFS idmap shrinker a little - it really needs to be broken up into a shrinker per tree and keep an item count with the tree root so that we don't need to walk the tree every time the shrinker needs to count the number of objects in the tree (i.e. all the time under memory pressure). [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree] [assorted fixes folded in] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10super: fix calculation of shrinkable objects for small numbersGlauber Costa
The sysctl knob sysctl_vfs_cache_pressure is used to determine which percentage of the shrinkable objects in our cache we should actively try to shrink. It works great in situations in which we have many objects (at least more than 100), because the aproximation errors will be negligible. But if this is not the case, specially when total_objects < 100, we may end up concluding that we have no objects at all (total / 100 = 0, if total < 100). This is certainly not the biggest killer in the world, but may matter in very low kernel memory situations. Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-09Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: - Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as lease loss due to a network partition, where doing so may result in data corruption. Add a kernel parameter to control choice of legacy behaviour or not. - Performance improvements when 2 processes are writing to the same file. - Flush data to disk when an RPCSEC_GSS session timeout is imminent. - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other NFS clients from being able to manipulate our lease and file locking state. - Allow sharing of RPCSEC_GSS caches between different rpc clients. - Fix the broken NFSv4 security auto-negotiation between client and server. - Fix rmdir() to wait for outstanding sillyrename unlinks to complete - Add a tracepoint framework for debugging NFSv4 state recovery issues. - Add tracing to the generic NFS layer. - Add tracing for the SUNRPC socket connection state. - Clean up the rpc_pipefs mount/umount event management. - Merge more patches from Chuck in preparation for NFSv4 migration support" * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits) NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set NFSv4: Allow security autonegotiation for submounts NFSv4: Disallow security negotiation for lookups when 'sec=' is specified NFSv4: Fix security auto-negotiation NFS: Clean up nfs_parse_security_flavors() NFS: Clean up the auth flavour array mess NFSv4.1 Use MDS auth flavor for data server connection NFS: Don't check lock owner compatability unless file is locked (part 2) NFS: Don't check lock owner compatibility in writes unless file is locked nfs4: Map NFS4ERR_WRONG_CRED to EPERM nfs4.1: Add SP4_MACH_CRED write and commit support nfs4.1: Add SP4_MACH_CRED stateid support nfs4.1: Add SP4_MACH_CRED secinfo support nfs4.1: Add SP4_MACH_CRED cleanup support nfs4.1: Add state protection handler nfs4.1: Minimal SP4_MACH_CRED implementation SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid SUNRPC: Add an identifier for struct rpc_clnt SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints ...
2013-09-05nfs: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03NFS: Ensure that rmdir() waits for sillyrenames to completeTrond Myklebust
If an NFS client does mkdir("dir"); fd = open("dir/file"); unlink("dir/file"); close(fd); rmdir("dir"); then the asynchronous nature of the sillyrename operation means that we can end up getting EBUSY for the rmdir() in the above test. Fix that by ensuring that we wait for any in-progress sillyrenames before sending the rmdir() to the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-30NFS: Fix up two use-after-free issues with the new tracing codeTrond Myklebust
We don't want to pass the context argument to trace_nfs_atomic_open_exit() after it has been released. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-22NFS: Add tracepoints for debugging NFS hard linksTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-22NFS: Add tracepoints for debugging NFS rename and sillyrename issuesTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-22NFS: Add tracepoints for debugging directory changesTrond Myklebust
Add tracepoints for mknod, mkdir, rmdir, remove (unlink) and symlink. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-22NFS: Add tracepoints for debugging generic file create eventsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-22NFS: Add event tracing for generic NFS lookupsTrond Myklebust
Add tracepoints for lookup, lookup_revalidate and atomic_open Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-22NFS: Pass in lookup flags from nfs_atomic_open to nfs_lookupTrond Myklebust
When doing an open of a directory, ensure that we do pass the lookup flags from nfs_atomic_open into nfs_lookup. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-22NFS: Add event tracing for generic NFS eventsTrond Myklebust
Add tracepoints for inode attribute updates, attribute revalidation, writeback start/end fsync start/end, attribute change start/end, permission check start/end. The intention is to enable performance tracing using 'perf'as well as improving debugging. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-20NFS: Remove the NFSv4 "open optimisation" from nfs_permissionTrond Myklebust
Ever since commit 6168f62cb (Add ACCESS operation to OPEN compound) the NFSv4 atomic open has primed the access cache, and so nfs_permission will no longer do an RPC call on the wire. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07nfs: verify open flags before allowing an atomic openJeff Layton
Currently, you can open a NFSv4 file with O_APPEND|O_DIRECT, but cannot fcntl(F_SETFL,...) with those flags. This flag combination is explicitly forbidden on NFSv3 opens, and it seems like it should also be on NFSv4. Reported-by: Chao Ye <cye@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09NFS: Make nfs_readdir revalidate less oftenScott Mayhew
Make nfs_readdir revalidate only when we're at the beginning of the directory or if the cached attributes have expired. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09nfs: set verifier on existing dentries in nfs_prime_dcacheJeff Layton
nfs_prime_dcache currently only sets the verifier when it doesn't initially a matching dentry in the dcache. Set the verifier in the case where we do find a dentry in the dcache. This ensures that we don't have to look up the dentry again if we want to use it after a readdir. Cc: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Feature highlights include: - Add basic client support for NFSv4.2 - Add basic client support for Labeled NFS (selinux for NFSv4.2) - Fix the use of credentials in NFSv4.1 stateful operations, and add support for NFSv4.1 state protection. Bugfix highlights: - Fix another NFSv4 open state recovery race - Fix an NFSv4.1 back channel session regression - Various rpc_pipefs races - Fix another issue with NFSv3 auth negotiation Please note that Labeled NFS does require some additional support from the security subsystem. The relevant changesets have all been reviewed and acked by James Morris." * tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits) NFS: Set NFS_CS_MIGRATION for NFSv4 mounts NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs nfs: have NFSv3 try server-specified auth flavors in turn nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it nfs: move server_authlist into nfs_try_mount_request nfs: refactor "need_mount" code out of nfs_try_mount SUNRPC: PipeFS MOUNT notification optimization for dying clients SUNRPC: split client creation routine into setup and registration SUNRPC: fix races on PipeFS UMOUNT notifications SUNRPC: fix races on PipeFS MOUNT notifications NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize NFS: Improve legacy idmapping fallback NFSv4.1 end back channel session draining NFS: Apply v4.1 capabilities to v4.2 NFSv4.1: Clean up layout segment comparison helper names NFSv4.1: layout segment comparison helpers should take 'const' parameters NFSv4: Move the DNS resolver into the NFSv4 module rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set ...
2013-07-05helper for reading ->d_countAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-03mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec APIMel Gorman
Now that the LRU to add a page to is decided at LRU-add time, remove the misleading lru parameter from __pagevec_lru_add. A consequence of this is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers are misleading as the caller no longer has direct control over what LRU the page is added to. Unused helpers are removed by this patch and existing users of pagevec_lru_add_file() are converted to use lru_cache_add_file() directly and use the per-cpu pagevecs instead of creating their own pagevec. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexey Lyahkov <alexey.lyashkov@gmail.com> Cc: Andrew Perepechko <anserper@ya.ru> Cc: Robin Dong <sanbai@taobao.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Bernd Schubert <bernd.schubert@fastmail.fm> Cc: David Howells <dhowells@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-29[readdir] convert nfsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-28Merge branch 'labeled-nfs' into linux-nextTrond Myklebust
* labeled-nfs: NFS: Apply v4.1 capabilities to v4.2 NFS: Add in v4.2 callback operation NFS: Make callbacks minor version generic Kconfig: Add Kconfig entry for Labeled NFS V4 client NFS: Extend NFS xattr handlers to accept the security namespace NFS: Client implementation of Labeled-NFS NFS: Add label lifecycle management NFS:Add labels to client function prototypes NFSv4: Extend fattr bitmaps to support all 3 words NFSv4: Introduce new label structure NFSv4: Add label recommended attribute and NFSv4 flags NFSv4.2: Added NFS v4.2 support to the NFS client SELinux: Add new labeling type native labels LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data. Security: Add Hook to test if the particular xattr is part of a MAC model. Security: Add hook to calculate context based on a negative dentry. NFS: Add NFSv4.2 protocol constants Conflicts: fs/nfs/nfs4proc.c
2013-06-08NFS: Client implementation of Labeled-NFSDavid Quigley
This patch implements the client transport and handling support for labeled NFS. The patch adds two functions to encode and decode the security label recommended attribute which makes use of the LSM hooks added earlier. It also adds code to grab the label from the file attribute structures and encode the label to be sent back to the server. Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08NFS: Add label lifecycle managementDavid Quigley
This patch adds the lifecycle management for the security label structure introduced in an earlier patch. The label is not used yet but allocations and freeing of the structure is handled. Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08NFS:Add labels to client function prototypesDavid Quigley
After looking at all of the nfsv4 operations the label structure has been added to the prototypes of the functions which can transmit label data. Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06NFSv4: Move dentry instantiation into the NFSv4-specific atomic open codeTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06NFSv4: Remove redundant check for FMODE_EXEC in nfs_finish_openTrond Myklebust
We already check the EXEC access mode in the lower layers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25NFSv4.1: Enable open-by-filehandleTrond Myklebust
Sometimes, we actually _want_ to do open-by-filehandle, for instance when recovering opens after a network partition, or when called from nfs4_file_open. Enable that functionality using a new capability NFS_CAP_ATOMIC_OPEN_V1, and which is only enabled for NFSv4.1 servers that support it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-26vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry opJeff Layton
The following set of operations on a NFS client and server will cause server# mkdir a client# cd a server# mv a a.bak client# sleep 30 # (or whatever the dir attrcache timeout is) client# stat . stat: cannot stat `.': Stale NFS file handle Obviously, we should not be getting an ESTALE error back there since the inode still exists on the server. The problem is that the lookup code will call d_revalidate on the dentry that "." refers to, because NFS has FS_REVAL_DOT set. nfs_lookup_revalidate will see that the parent directory has changed and will try to reverify the dentry by redoing a LOOKUP. That of course fails, so the lookup code returns ESTALE. The problem here is that d_revalidate is really a bad fit for this case. What we really want to know at this point is whether the inode is still good or not, but we don't really care what name it goes by or whether the dcache is still valid. Add a new d_op->d_weak_revalidate operation and have complete_walk call that instead of d_revalidate. The intent there is to allow for a "weaker" d_revalidate that just checks to see whether the inode is still good. This is also gives us an opportunity to kill off the FS_REVAL_DOT special casing. [AV: changed method name, added note in porting, fixed confusion re having it possibly called from RCU mode (it won't be)] Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-23new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-01-03NFS: Fix access to suid/sgid executablesWeston Andros Adamson
nfs_open_permission_mask() should only check MAY_EXEC for files that are opened with __FMODE_EXEC. Also fix NFSv4 access-in-open path in a similar way -- openflags must be used because fmode will not always have FMODE_EXEC set. This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=49101 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2012-12-18Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Features include: - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code. Remove altogether where possible, and replace with WARN_ON_ONCE and appropriate error returns where not. - NFSv4.1 client adds session dynamic slot table management. There is matching server side code that has been submitted to Bruce for consideration. Together, this code allows the server to dynamically manage the amount of memory it allocates to the duplicate request cache for each client. It will constantly resize those caches to reserve more memory for clients that are hot while shrinking caches for those that are quiescent. In addition, there are assorted bugfixes for the generic NFS write code, fixes to deal with the drop_nlink() warnings, and yet another fix for NFSv4 getacl." * tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits) SUNRPC: continue run over clients list on PipeFS event instead of break NFS: Don't use SetPageError in the NFS writeback code SUNRPC: variable 'svsk' is unused in function bc_send_request SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket NFSv4.1: Deal effectively with interrupted RPC calls. NFSv4.1: Move the RPC timestamp out of the slot. NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED. NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0 NFS: Fix calls to drop_nlink() NFS: Ensure that we always drop inodes that have been marked as stale nfs: Remove unused list nfs4_clientid_list nfs: Remove duplicate function declaration in internal.h NFS: avoid NULL dereference in nfs_destroy_server SUNRPC handle EKEYEXPIRED in call_refreshresult SUNRPC set gss gc_expiry to full lifetime nfs: fix page dirtying in NFS DIO read codepath nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ NFSv4.1: Be conservative about the client highest slotid NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly nfs: don't extend writes to cover entire page if pagecache is invalid ...
2012-12-18lseek: the "whence" argument is called "whence"Andrew Morton
But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-14NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0Trond Myklebust
If the inode has no links, then we should force a new lookup. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-14NFS: Fix calls to drop_nlink()Trond Myklebust
It is almost always wrong for NFS to call drop_nlink() after removing a file. What we really want is to mark the inode's attributes for revalidation, and we want to ensure that the VFS drops it if we're reasonably sure that this is the final unlink(). Do the former using the usual cache validity flags, and the latter by testing if inode->i_nlink == 1, and clearing it in that case. This also fixes the following warning reported by Neil Brown and Jeff Layton (among others). [634155.004438] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442] Hardware name: Latitude E6510 [634155.004577] crc_itu_t crc32c_intel snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm: bash Tainted: G W 3.5.0-36-desktop # [634155.004611] Call Trace: [634155.004630] [<ffffffff8100444a>] dump_trace+0xaa/0x2b0 [634155.004641] [<ffffffff815a23dc>] dump_stack+0x69/0x6f [634155.004653] [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0 [634155.004662] [<ffffffff811832e4>] drop_nlink+0x34/0x40 [634155.004687] [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs] [634155.004714] [<ffffffff8118049e>] dput+0x12e/0x230 [634155.004726] [<ffffffff8116b230>] __fput+0x170/0x230 [634155.004735] [<ffffffff81167c0f>] filp_close+0x5f/0x90 [634155.004743] [<ffffffff81167cd7>] sys_close+0x97/0x100 [634155.004754] [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b [634155.004767] [<00007f2a73a0d110>] 0x7f2a73a0d10f Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [3.3+]
2012-11-30nfs_lookup_revalidate(): fix a leakAl Viro
We are leaking fattr and fhandle if we decide that dentry is not to be invalidated, after all (e.g. happens to be a mountpoint). Just free both before that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-30don't do blind d_drop() in nfs_prime_dcache()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-01NFSv4: Add ACCESS operation to OPEN compoundWeston Andros Adamson
The OPEN operation has no way to differentiate an open for read and an open for execution - both look like read to the server. This allowed users to read files that didn't have READ access but did have EXEC access, which is obviously wrong. This patch adds an ACCESS call to the OPEN compound to handle the difference between OPENs for reading and execution. Since we're going through the trouble of calling ACCESS, we check all possible access bits and cache the results hopefully avoiding an ACCESS call in the future. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Convert v4 into a moduleBryan Schumaker
This patch exports symbols needed by the v4 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or CONFIG_NFS_V4_MODULE are set. The module (nfs4.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>