path: root/fs/nfs
Age  Commit message  Author
2016-02-22  NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls  Trond Myklebust
Replace another case where the layout 'plh_block_lgets' can trigger infinite loops in send_layoutget(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-22  NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout  Trond Myklebust
If the server reboots while there is a layoutget outstanding, then the call to pnfs_choose_layoutget_stateid() will fail with an EAGAIN error, which causes an infinite loop in send_layoutget(). The reason why we never break out of the loop is that the layout 'plh_block_lgets' field is never cleared. Fix is to replace plh_block_lgets with NFS_LAYOUT_INVALID_STID, which can be reset after a new layoutget. Fixes: ab7d763e477c5 ("pNFS: Ensure nfs4_layoutget_prepare returns...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
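The loop described above can be modelled in user space. The following is a minimal sketch, not the kernel code: every `demo_` name is hypothetical, and the boolean only stands in for the NFS_LAYOUT_INVALID_STID bit. It shows why a blocking condition that a new layoutget resets lets the retry loop terminate, whereas a condition that is never cleared would keep returning EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a layout header; not the kernel structure. */
struct demo_layout {
	bool invalid_stid;	/* models the NFS_LAYOUT_INVALID_STID bit */
	int attempts;		/* how many times a stateid was requested */
};

/* Models stateid selection: fails with -EAGAIN while the stateid is invalid. */
static int demo_choose_stateid(struct demo_layout *lo)
{
	lo->attempts++;
	return lo->invalid_stid ? -EAGAIN : 0;
}

/* Models a fresh layoutget, which establishes a new stateid and resets the bit. */
static void demo_new_layoutget(struct demo_layout *lo)
{
	lo->invalid_stid = false;
}

/*
 * Retry loop in the spirit of send_layoutget(). Because the blocking
 * condition is reset on each retry, the loop makes progress; max_tries is
 * only a safety net for this sketch.
 */
static int demo_send_layoutget(struct demo_layout *lo, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		int err = demo_choose_stateid(lo);

		if (err != -EAGAIN)
			return err;
		demo_new_layoutget(lo);
	}
	return -EAGAIN;
}
```

Calling `demo_send_layoutget()` on a layout whose `invalid_stid` starts out true succeeds on the second attempt in this model.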
2016-02-17  pnfs/blocklayout: fix a memory leak when using vmalloc_to_page  Kinglong Mee
unreferenced object 0xffffc90000abf000 (size 16900):
  comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280
    [<ffffffff811b9c2a>] __vmalloc+0x4a/0x50
    [<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver]
    [<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver]
    [<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4]
    [<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4]
    [<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4]
    [<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0
    [<ffffffff81228f1d>] do_fsync+0x3d/0x70
    [<ffffffff812291d0>] SyS_fsync+0x10/0x20
    [<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76
    [<ffffffffffffffff>] 0xffffffffffffffff
v2, add missing include header
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17  nfs4: fix stateid handling for the NFS v4.2 operations  Christoph Hellwig
The newly added NFS v4.2 operations (ALLOCATE, DEALLOCATE, SEEK and CLONE) use a helper called nfs42_set_rw_stateid to select a stateid that is sent to the server. But they don't set the inode and state fields in the nfs4_exception structure, and thus don't partake in the stateid recovery protocol. Because of this they will simply return errors instead of trying to recover a stateid when the server returns a BAD_STATEID error. Additionally CLONE has the problem that it operates on two files and thus two stateids, and thus needs to call the exception handler twice to recover stateids. While we're at it stop grabbing an additional reference to the open context in all these operations - having the file open guarantees that the open context won't go away. All this can be reproduced with the generic/168 and generic/170 tests in xfstests which stress the CLONE stateid handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17  NFSv4: Fix a dentry leak on alias use  Benjamin Coddington
In the case where d_add_unique() finds an appropriate alias to use it will have already incremented the reference count. An additional dget() to swap the open context's dentry is unnecessary and will leak a reference. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: 275bb307865a3 ("NFSv4: Move dentry instantiation into the NFSv4-...") Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
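The reference-count imbalance described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `demo_` types and helpers are hypothetical and merely mimic the counting behaviour of dget()/dput() and of an alias lookup that returns with a reference already held.

```c
#include <assert.h>

/* Hypothetical dentry model: just a reference count. */
struct demo_dentry {
	int refcount;
};

/* Hypothetical open context holding one dentry reference. */
struct demo_ctx {
	struct demo_dentry *dentry;
};

static struct demo_dentry *demo_dget(struct demo_dentry *d)
{
	d->refcount++;
	return d;
}

static void demo_dput(struct demo_dentry *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Models an alias lookup that returns the alias with a reference already taken. */
static struct demo_dentry *demo_find_alias(struct demo_dentry *alias)
{
	return demo_dget(alias);
}

/* Leaky variant: takes a second reference that nobody will ever drop. */
static void demo_swap_leaky(struct demo_ctx *ctx, struct demo_dentry *found)
{
	demo_dput(ctx->dentry);
	ctx->dentry = demo_dget(found);
}

/* Fixed variant: the context simply adopts the reference the lookup took. */
static void demo_swap_fixed(struct demo_ctx *ctx, struct demo_dentry *found)
{
	demo_dput(ctx->dentry);
	ctx->dentry = found;
}
```

In this model the fixed variant leaves the alias with exactly one reference per holder, while the leaky variant leaves one reference more than there are holders.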
2016-02-15  pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode  Trond Myklebust
When setting the layout return mode, we must always also set the NFS_LAYOUT_RETURN_REQUESTED flag to ensure that we send a layoutreturn. Otherwise pnfs_error_mark_layout_for_return() could set the mode, but fail to send the layoutreturn because another is already in flight. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-15  pNFS: Fix pnfs_mark_matching_lsegs_return()  Trond Myklebust
We don't need to schedule a layoutreturn if the layout segment can be freed immediately. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-28  NFS: Cleanup - rename NFS_LAYOUT_RETURN_BEFORE_CLOSE  Trond Myklebust
NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a layoutreturn is needed, either due to a layout recall or to a layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order to clarify its purpose. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-27  pNFS: Fix missing layoutreturn calls  Trond Myklebust
The layoutreturn code currently relies on pnfs_put_lseg() to initiate the RPC call when conditions are right. A problem arises when we want to free the layout segment from inside an inode->i_lock section (e.g. in pnfs_clear_request_commit()), since we cannot sleep. The workaround is to move the actual call to pnfs_send_layoutreturn() to pnfs_put_layout_hdr(), which doesn't have this restriction. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-23  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  Linus Torvalds
Pull final vfs updates from Al Viro: - The ->i_mutex wrappers (with small prereq in lustre) - a fix for too early freeing of symlink bodies on shmem (they need to be RCU-delayed) (-stable fodder) - followup to dedupe stuff merged this cycle * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: abort dedupe loop if fatal signals are pending make sure that freeing shmem fast symlinks is RCU-delayed wrappers for ->i_mutex access lustre: remove unused declaration
2016-01-23  Merge tag 'nfs-for-4.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs  Linus Torvalds
Pull NFS client bugfixes and cleanups from Trond Myklebust: "Bugfixes: - pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn - pNFS/flexfiles: Improve merging of errors in LAYOUTRETURN Cleanups: - NFS: Simplify nfs_request_add_commit_list() arguments" * tag 'nfs-for-4.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn NFS: Simplify nfs_request_add_commit_list() arguments pNFS/flexfiles: Improve merging of errors in LAYOUTRETURN
2016-01-22  wrappers for ->i_mutex access  Al Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
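The naming pattern "inode_foo(inode) being mutex_foo(&inode->i_mutex)" can be sketched in user space as follows. This is a single-threaded model under loud assumptions: `demo_mutex` is a toy flag, not a real lock, and all `demo_` names are hypothetical stand-ins that only show how callers stop touching the i_mutex field directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded stand-in for a mutex; NOT a real lock. */
struct demo_mutex {
	bool locked;
};

static inline void demo_mutex_lock(struct demo_mutex *m)
{
	assert(!m->locked);	/* a real mutex would sleep here instead */
	m->locked = true;
}

static inline void demo_mutex_unlock(struct demo_mutex *m)
{
	assert(m->locked);
	m->locked = false;
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static inline int demo_mutex_trylock(struct demo_mutex *m)
{
	if (m->locked)
		return 0;
	m->locked = true;
	return 1;
}

static inline bool demo_mutex_is_locked(const struct demo_mutex *m)
{
	return m->locked;
}

/* Hypothetical inode model with the lock as a private detail. */
struct demo_inode {
	struct demo_mutex i_mutex;
};

/* Wrappers: callers name the inode, not the lock field inside it. */
static inline void demo_inode_lock(struct demo_inode *inode)
{
	demo_mutex_lock(&inode->i_mutex);
}

static inline void demo_inode_unlock(struct demo_inode *inode)
{
	demo_mutex_unlock(&inode->i_mutex);
}

static inline int demo_inode_trylock(struct demo_inode *inode)
{
	return demo_mutex_trylock(&inode->i_mutex);
}

static inline bool demo_inode_is_locked(const struct demo_inode *inode)
{
	return demo_mutex_is_locked(&inode->i_mutex);
}
```

Routing every access through wrappers like these means the lock type behind them can later change (the message mentions a planned move to an rwsem) by editing only the wrapper bodies.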
2016-01-22  Merge branch 'bugfixes'  Trond Myklebust
* bugfixes: pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn pNFS/flexfiles: Improve merging of errors in LAYOUTRETURN
2016-01-22  pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn  Trond Myklebust
We must not skip encoding the statistics, or the server will see an XDR encoding error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # 4.0+
2016-01-21  NFS: Simplify nfs_request_add_commit_list() arguments  Anna Schumaker
I noticed that all the callers of this function pass cinfo->mds->list as an argument in addition to the cinfo structure itself. Let's get rid of the extra argument, since it doesn't seem to be adding anything. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-21  pNFS/flexfiles: Improve merging of errors in LAYOUTRETURN  Trond Myklebust
When we hit 22 errors, we start to overflow the memory buffers allocated to the LAYOUTRETURN errors. The issue is that currently, RPC call reply ordering determines how successful we are in merging errors that refer to contiguous READ or WRITE requests. Fix is to use an insertion sort to help detect contiguity. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
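The idea of using an insertion sort to detect contiguity can be sketched as below. This is an illustrative model, not the flexfiles code: the `demo_` structures, the fixed capacity and the choice of (offset, length, status) as the merge key are all assumptions. Keeping the list ordered by offset means a new error only has to be compared with its immediate neighbours, regardless of the order in which RPC replies arrive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ERRS 8		/* hypothetical capacity for this sketch */

/* Hypothetical error record: a byte range plus a status code. */
struct demo_err {
	unsigned long long offset;
	unsigned long long length;
	int status;
};

struct demo_err_list {
	struct demo_err e[DEMO_MAX_ERRS];
	size_t n;
};

/* True if b has the same status as a and starts exactly where a ends. */
static int demo_contiguous(const struct demo_err *a, const struct demo_err *b)
{
	return a->status == b->status && a->offset + a->length == b->offset;
}

/*
 * Insert ne into the offset-sorted list, merging it with a contiguous
 * predecessor and/or successor. Returns 0 on success, -1 if the list is full.
 */
static int demo_err_insert(struct demo_err_list *l, struct demo_err ne)
{
	size_t i = 0;

	/* Find the first entry that does not start before the new one. */
	while (i < l->n && l->e[i].offset < ne.offset)
		i++;

	if (i > 0 && demo_contiguous(&l->e[i - 1], &ne)) {
		l->e[i - 1].length += ne.length;
		/* The grown entry may now touch its successor as well. */
		if (i < l->n && demo_contiguous(&l->e[i - 1], &l->e[i])) {
			l->e[i - 1].length += l->e[i].length;
			memmove(&l->e[i], &l->e[i + 1],
				(l->n - i - 1) * sizeof(l->e[0]));
			l->n--;
		}
		return 0;
	}
	if (i < l->n && demo_contiguous(&ne, &l->e[i])) {
		l->e[i].offset = ne.offset;
		l->e[i].length += ne.length;
		return 0;
	}
	if (l->n == DEMO_MAX_ERRS)
		return -1;
	/* No neighbour to merge with: open a slot and insert in order. */
	memmove(&l->e[i + 1], &l->e[i], (l->n - i) * sizeof(l->e[0]));
	l->e[i] = ne;
	l->n++;
	return 0;
}
```

With this model, three adjacent 4096-byte errors that arrive out of order collapse into a single entry, so they consume one slot instead of three.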
2016-01-15  Merge branch 'akpm' (patches from Andrew)  Linus Torvalds
Merge first patch-bomb from Andrew Morton: - A few hotfixes which missed 4.4 because I was asleep. cc'ed to -stable - A few misc fixes - OCFS2 updates - Part of MM. Including pretty large changes to page-flags handling and to thp management which have been buffered up for 2-3 cycles now. I have a lot of MM material this time. [ It turns out the THP part wasn't quite ready, so that got dropped from this series - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) zsmalloc: reorganize struct size_class to pack 4 bytes hole mm/zbud.c: use list_last_entry() instead of list_tail_entry() zram/zcomp: do not zero out zcomp private pages zram: pass gfp from zcomp frontend to backend zram: try vmalloc() after kmalloc() zram/zcomp: use GFP_NOIO to allocate streams mm: add tracepoint for scanning pages drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64 mm/page_isolation: use macro to judge the alignment mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE() mm: rework virtual memory accounting include/linux/memblock.h: fix ordering of 'flags' argument in comments mm: move lru_to_page to mm_inline.h Documentation/filesystems: describe the shared memory usage/accounting memory-hotplug: don't BUG() in register_memory_resource() hugetlb: make mm and fs code explicitly non-modular mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd() mm: make sure isolate_lru_page() is never called for tail page vmstat: make vmstat_updater deferrable again and shut down on idle ...
2016-01-15  Merge tag 'nfs-for-4.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs  Linus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fix a regression in the SunRPC socket polling code - Fix the attribute cache revalidation code - Fix race in __update_open_stateid() - Fix an lo->plh_block_lgets imbalance in layoutreturn - Fix an Oopsable typo in ff_mirror_match_fh() Features: - pNFS layout recall performance improvements. - pNFS/flexfiles: Support server-supplied layoutstats sampling period Bugfixes + cleanups: - NFSv4: Don't perform cached access checks before we've OPENed the file - Fix starvation issues with background flushes - Reclaim writes should be flushed as unstable writes if there are already entries in the commit lists - Various bugfixes from Chuck to fix NFS/RDMA send queue ordering problems - Ensure that we propagate fatal layoutget errors back to the application - Fixes for sundry flexfiles layoutstats bugs - Fix files/flexfiles to not cache invalidated layouts in the DS commit buckets" * tag 'nfs-for-4.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits) NFS: Fix a compile warning about unused variable in nfs_generic_pg_pgios() NFSv4: Fix a compile warning about no prototype for nfs4_ioctl() NFS: Use wait_on_atomic_t() for unlock after readahead SUNRPC: Fixup socket wait for memory NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() NFSv4.1/pNFS: Fix a race in initiate_file_draining() NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() NFS: Relax requirements in nfs_flush_incompatible NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid NFS: Allow multiple commit requests in flight per file NFS/pNFS: Fix up 
pNFS write reschedule layering violations and bugs SUNRPC: Fix a missing break in rpc_anyaddr() pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh() NFS: Fix attribute cache revalidation NFS: Ensure we revalidate attributes before using execute_ok() ...
2016-01-15  kmemcg: account certain kmem allocations to memcg  Vladimir Davydov
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14  Make sure that highmem pages are not added to symlink page cache  Al Viro
inode_nohighmem() is sufficient to make sure that page_get_link() won't try to allocate a highmem page. Moreover, it is sufficient to make sure that page_symlink/__page_symlink won't do the same thing. However, any filesystem that manually preseeds the symlink's page cache upon symlink(2) needs to make sure that the page it inserts there won't be a highmem one. Fortunately, only nfs and shmem have run afoul of that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-13  Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  Linus Torvalds
Pull vfs copy_file_range updates from Al Viro: "Several series around copy_file_range/CLONE" * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: use new dedupe data function pointer vfs: hoist the btrfs deduplication ioctl to the vfs vfs: wire up compat ioctl for CLONE/CLONE_RANGE cifs: avoid unused variable and label nfsd: implement the NFSv4.2 CLONE operation nfsd: Pass filehandle to nfs4_preprocess_stateid_op() vfs: pull btrfs clone API to vfs layer locks: new locks_mandatory_area calling convention vfs: Add vfs_copy_file_range() support for pagecache copies btrfs: add .copy_file_range file operation x86: add sys_copy_file_range to syscall tables vfs: add copy_file_range syscall and vfs helper
2016-01-11  Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  Linus Torvalds
Pull vfs xattr updates from Al Viro: "Andreas' xattr cleanup series. It's a followup to his xattr work that went in last cycle; -0.5KLoC" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xattr handlers: Simplify list operation ocfs2: Replace list xattr handler operations nfs: Move call to security_inode_listsecurity into nfs_listxattr xfs: Change how listxattr generates synthetic attributes tmpfs: listxattr should include POSIX ACL xattrs tmpfs: Use xattr handler infrastructure btrfs: Use xattr handler infrastructure vfs: Distinguish between full xattr names and proper prefixes posix acls: Remove duplicate xattr name definitions gfs2: Remove gfs2_xattr_acl_chmod vfs: Remove vfs_xattr_cmp
2016-01-11  Merge branch 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  Linus Torvalds
Pull vfs RCU symlink updates from Al Viro: "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode even if the symlink is not an embedded one. No changes since the mailbomb on Jan 1" * 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->get_link() to delayed_call, kill ->put_link() kill free_page_put_link() teach nfs_get_link() to work in RCU mode teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode teach shmem_get_link() to work in RCU mode teach page_get_link() to work in RCU mode replace ->follow_link() with new method that could stay in RCU mode don't put symlink bodies in pagecache into highmem namei: page_getlink() and page_follow_link_light() are the same thing ufs: get rid of ->setattr() for symlinks udf: don't duplicate page_symlink_inode_operations logfs: don't duplicate page_symlink_inode_operations switch befs long symlinks to page_symlink_operations
2016-01-08  NFS: Fix a compile warning about unused variable in nfs_generic_pg_pgios()  Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-08  NFSv4: Fix a compile warning about no prototype for nfs4_ioctl()  Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-07  Merge branch 'bugfixes'  Trond Myklebust
* bugfixes: SUNRPC: Fixup socket wait for memory SUNRPC: Fix a missing break in rpc_anyaddr() pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh() NFS: Fix attribute cache revalidation NFS: Ensure we revalidate attributes before using execute_ok() NFS: Flush reclaim writes using FLUSH_COND_STABLE NFS: Background flush should not be low priority NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn NFSv4: Don't perform cached access checks before we've OPENed the file NFS: Allow the combination pNFS and labeled NFS NFS42: handle layoutstats stateid error nfs: Fix race in __update_open_stateid() nfs: fix missing assignment in nfs4_sequence_done tracepoint
2016-01-07  NFS: Use wait_on_atomic_t() for unlock after readahead  Benjamin Coddington
The use of wait_on_atomic_t() for waiting on I/O to complete before unlocking allows us to get rid of the NFS_IO_INPROGRESS flag, and thus the nfs_iocounter's flags member, and finally the nfs_iocounter altogether. The count of I/O is moved to the lock context, and the counter increment/decrement functions become simple enough to open-code. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> [Trond: Fix up conflict with existing function nfs_wait_atomic_killable()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04  Merge branch 'pnfs_generic'  Trond Myklebust
* pnfs_generic: NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() NFSv4.1/pNFS: Fix a race in initiate_file_draining() NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() NFS: Relax requirements in nfs_flush_incompatible NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid NFS: Allow multiple commit requests in flight per file NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs NFSv4: List stateid information in the callback tracepoints NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1 pNFS: If we have to delay the layout callback, mark the layout for return NFSv4.1/pNFS: Add a helper to mark the layout as returned pNFS: Ensure nfs4_layoutget_prepare returns the correct error
2016-01-04  NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments  Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04  NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures  Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04  NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()  Trond Myklebust
Make it more obvious what we're returning... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04  NFSv4.1/pNFS: Fix a race in initiate_file_draining()  Trond Myklebust
Peng Tao points out that the call to pnfs_mark_matching_lsegs_return() could race with pnfs_put_lseg(), in which case the layout segment is cleared, but no layoutreturn will be sent. Fix is to replace the call to pnfs_mark_matching_lsegs_invalid(). Reported-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04  NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout  Trond Myklebust
Fix a bug whereby if all the layout segments could be immediately freed, the call to pnfs_error_mark_layout_for_return() would never result in a layoutreturn. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04  NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode  Trond Myklebust
If pnfs_mark_matching_lsegs_return() needs to mark a layout segment for return, then it must also set the return iomode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04  NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids  Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-04  NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()  Trond Myklebust
A stateid is a structure, pass it as a pointer. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
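The point about passing a stateid by pointer, together with the preceding cleanup that copies stateids through a helper, can be sketched as follows. This is a user-space illustration only: the `demo_` types and functions are hypothetical, and the 4-byte sequence number plus 12 opaque bytes is an assumed layout for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stateid model: a sequence number plus 12 opaque bytes. */
struct demo_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/* Copy helper, so callers never open-code the structure copy. */
static void demo_stateid_copy(struct demo_stateid *dst,
			      const struct demo_stateid *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Field-by-field comparison of two stateids. */
static bool demo_stateid_match(const struct demo_stateid *a,
			       const struct demo_stateid *b)
{
	return a->seqid == b->seqid &&
	       memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

/* Hypothetical argument block for a layoutreturn request. */
struct demo_layoutreturn_args {
	struct demo_stateid stateid;
};

/*
 * The stateid arrives as a const pointer, so only the address crosses the
 * call boundary; the single structure copy happens here, into the
 * request's own storage.
 */
static void demo_send_layoutreturn(struct demo_layoutreturn_args *args,
				   const struct demo_stateid *stateid)
{
	demo_stateid_copy(&args->stateid, stateid);
}
```

Because the request keeps its own copy, later changes to the caller's stateid do not affect a request that has already been prepared in this model.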
2015-12-31  NFS: Relax requirements in nfs_flush_incompatible  Trond Myklebust
If two processes share the same credentials and NFSv4 open stateid, then allow them both to dirty the same page, even if their nfs_open_context differs. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-31  NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid  Trond Myklebust
If the layout segment is invalid, then we should not be adding more write requests to the commit list. Instead, those writes should be replayed after requesting a new layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-31  NFS: Allow multiple commit requests in flight per file  Trond Myklebust
Allow synchronous RPC calls to wait for pending RPC calls to finish, but also allow asynchronous ones to just fire off another commit. With this patch, the xfstests generic/074 test completes in 226s instead of 242s. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-31  NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs  Trond Myklebust
The flexfiles layout, in particular, seems to want to poke around in the O_DIRECT flags when retransmitting. This patch sets up an interface to allow it to call back into O_DIRECT to handle retransmission correctly. It also fixes a potential bug whereby we could change the behaviour of O_DIRECT if an error is already pending. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-30  switch ->get_link() to delayed_call, kill ->put_link()  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30  pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()  Trond Myklebust
Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out copy+paste has played cruel tricks on a nested loop. Reported-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org # 4.3+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-30  NFS: Fix attribute cache revalidation  Trond Myklebust
If an NFSv4 client uses the cache_consistency_bitmask in order to request only information about the change attribute, timestamps and size, then it has not revalidated all attributes, and hence the attribute timeout timestamp should not be updated. Reported-by: Donald Buczek <buczek@molgen.mpg.de> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-29  NFS: Ensure we revalidate attributes before using execute_ok()  Trond Myklebust
Donald Buczek reports that NFS clients can also report incorrect results for access() due to lack of revalidation of attributes before calling execute_ok(). Looking closely, it seems chdir() is afflicted with the same problem. Fix is to ensure we call nfs_revalidate_inode_rcu() or nfs_revalidate_inode() as appropriate before deciding to trust execute_ok(). Reported-by: Donald Buczek <buczek@molgen.mpg.de> Link: http://lkml.kernel.org/r/1451331530-3748-1-git-send-email-buczek@molgen.mpg.de Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28  Merge branch 'flexfiles'  Trond Myklebust
* flexfiles: pNFS/flexfiles: Ensure we record layoutstats even if RPC is terminated early pNFS: Add flag to track if we've called nfs4_ff_layout_stat_io_start_read/write pNFS/flexfiles: Fix a statistics gathering imbalance pNFS/flexfiles: Don't mark the entire layout as failed, when returning it pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET pnfs/flexfiles: count io stat in rpc_count_stats callback pnfs/flexfiles: do not mark delay-like status as DS failure NFS41: map NFS4ERR_LAYOUTUNAVAILABLE to ENODATA nfs: only remove page from mapping if launder_page fails nfs: handle request add failure properly nfs: centralize pgio error cleanup nfs: clean up rest of reqs when failing to add one NFS41: pop some layoutget errors to application pNFS/flexfiles: Support server-supplied layoutstats sampling period
2015-12-28  NFSv4: List stateid information in the callback tracepoints  Trond Myklebust
The stateid is extremely valuable when debugging. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28  NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL  Trond Myklebust
If the client is promising to return the layout ASAP, then there is no need to return DELAY and have the server retry. Instead default to the normal procedure described in RFC5661. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28  NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1  Trond Myklebust
The RFC requires us to check if the server is recalling a stateid that we haven't yet received. If so, tell it to wait. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28  pNFS: If we have to delay the layout callback, mark the layout for return  Trond Myklebust
If the client needs to delay the layout callback, then speed up the recall process by marking the remaining layout segments to be actively returned by the client. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28  NFSv4.1/pNFS: Add a helper to mark the layout as returned  Trond Myklebust
This ensures that we don't reuse the stateid if a layout return or implied layout return means that we've returned all layout segments. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>