summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-09-06CacheFiles: Implement interface to check cache consistencyDavid Howells
Implement the FS-Cache interface to check the consistency of a cache object in CacheFiles. Original-author: Hongyi Jia <jiayisuse@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Hongyi Jia <jiayisuse@gmail.com> cc: Milosz Tanski <milosz@adfin.com>
2013-09-06FS-Cache: Add interface to check consistency of a cached objectDavid Howells
Extend the fscache netfs API so that the netfs can ask as to whether a cache object is up to date with respect to its corresponding netfs object: int fscache_check_consistency(struct fscache_cookie *cookie) This will call back to the netfs to check whether the auxiliary data associated with a cookie is correct. It returns 0 if it is and -ESTALE if it isn't; it may also return -ENOMEM and -ERESTARTSYS. The backends now have to implement a mandatory operation pointer: int (*check_consistency)(struct fscache_object *object) that corresponds to the above API call. FS-Cache takes care of pinning the object and the cookie in memory and managing this call with respect to the object state. Original-author: Hongyi Jia <jiayisuse@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Hongyi Jia <jiayisuse@gmail.com> cc: Milosz Tanski <milosz@adfin.com>
2013-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc changes from David Miller: "Several bug fixes (from Kirill Tkhai, Geery Uytterhoeven, and Alexey Dobriyan) and some support for Fujitsu sparc64x chips (from Allen Pais)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Export flush_ptrace_access() (needed by lustre) sparc: fix PCI device proc file mmap(2) sparc64: Remove RWSEM export leftovers sparc64: Fix off by one in trampoline TLB mapping installation loop. sparc64: Fix ITLB handler of null page esp_scsi: Fix tag state corruption when autosensing. sparc64: Fix not SRA'ed %o5 in 32-bit traced syscall sparc64: cleanup: Rename ret_from_syscall to ret_from_fork sparc32: Fix exit flag passed from traced sys_sigreturn sparc64: Fix wrong syscall return value passed to trace_sys_exit() support sparc64x chip type in cpumap.c cpu hw caps support for sparc64x
2013-09-05NFS: Don't check lock owner compatibility in writes unless file is lockedTrond Myklebust
If we're doing buffered writes, and there is no file locking involved, then we don't have to worry about whether or not the lock owner information is identical. By relaxing this check, we ensure that fork()ed child processes can write to a page without having to first sync dirty data that was written by the parent to disk. Reported-by: Quentin Barnes <qbarnes@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Quentin Barnes <qbarnes@gmail.com>
2013-09-05fuse: drop dentry on failed revalidateAnand Avati
Drop a subtree when we find that it has moved or been delated. This can be done as long as there are no submounts under this location. If the directory was moved and we come across the same directory in a future lookup it will be reconnected by d_materialise_unique(). Signed-off-by: Anand Avati <avati@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05fuse: clean up return in fuse_dentry_revalidate()Miklos Szeredi
On errors unrelated to the filesystem's state (ENOMEM, ENOTCONN) return the error itself from ->d_revalidate() insted of returning zero (invalid). Also make a common label for invalidating the dentry. This will be used by the next patch. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05fuse: use d_materialise_unique()Miklos Szeredi
Use d_materialise_unique() instead of d_splice_alias(). This allows dentry subtrees to be moved to a new place if there moved, even if something is referencing a dentry in the subtree (open fd, cwd, etc..). This will also allow us to drop a subtree if it is found to be replaced by something else. In this case the disconnected subtree can later be reconnected to its new location. d_materialise_unique() ensures that a directory entry only ever has one alias. We keep fc->inst_mutex around the calls for d_materialise_unique() on directories to prevent a race with mkdir "stealing" the inode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05sysfs: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05nfs: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05gfs2: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05afs: use check_submounts_and_drop()Miklos Szeredi
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries as well. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05vfs: check unlinked ancestors before mountMiklos Szeredi
We check submounts before doing d_drop() on a non-empty directory dentry in NFS (have_submounts()), but we do not exclude a racing mount. Nor do we prevent mounts to be added to the disconnected subtree using relative paths after the d_drop(). This patch fixes these issues by checking for unlinked (unhashed, non-root) ancestors before proceeding with the mount. This is done with rename seqlock taken for write and with ->d_lock grabbed on each ancestor in turn, including our dentry itself. This ensures that the only one of check_submounts_and_drop() or has_unlinked_ancestor() can succeed. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05vfs: check submounts and drop atomicallyMiklos Szeredi
We check submounts before doing d_drop() on a non-empty directory dentry in NFS (have_submounts()), but we do not exclude a racing mount. Process A: have_submounts() -> returns false Process B: mount() -> success Process A: d_drop() This patch prepares the ground for the fix by doing the following operations all under the same rename lock: have_submounts() shrink_dcache_parent() d_drop() This is actually an optimization since have_submounts() and shrink_dcache_parent() both traverse the same dentry tree separately. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: David Howells <dhowells@redhat.com> CC: Steven Whitehouse <swhiteho@redhat.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05vfs: add d_walk()Miklos Szeredi
This one replaces three instances open coded tree walking (have_submounts, select_parent, d_genocide) with a common helper. In addition to slightly reducing the kernel size, this simplifies the callers and makes them less bug prone. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05vfs: restructure d_genocide()Miklos Szeredi
It shouldn't matter when we decrement the refcount during the walk as long as we do it exactly once. Restructure d_genocide() to do the killing on entering the dentry instead of when leaving it. This helps creating a common helper for tree walking. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05sparc: fix PCI device proc file mmap(2)Alexey Dobriyan
Commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries" must have broken mmapping of PCI device proc files on Sparc. Notice how it adds wrapper around ->mmap but doesn't do it around ->get_unmapped_area. Add wrapper around ->get_unmapped_area. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "Unfortunately, this merge window it'll have a be a lot of small piles - my fault, actually, for not keeping #for-next in anything that would resemble a sane shape ;-/ This pile: assorted fixes (the first 3 are -stable fodder, IMO) and cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last components) + several long-standing patches from various folks. There definitely will be a lot more (starting with Miklos' check_submount_and_drop() series)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) direct-io: Handle O_(D)SYNC AIO direct-io: Implement generic deferred AIO completions add formats for dentry/file pathnames kvm eventfd: switch to fdget powerpc kvm: use fdget switch fchmod() to fdget switch epoll_ctl() to fdget switch copy_module_from_fd() to fdget git simplify nilfs check for busy subtree ibmasmfs: don't bother passing superblock when not needed don't pass superblock to hypfs_{mkdir,create*} don't pass superblock to hypfs_diag_create_files don't pass superblock to hypfs_vm_create_files() oprofile: get rid of pointless forward declarations of struct super_block oprofilefs_create_...() do not need superblock argument oprofilefs_mkdir() doesn't need superblock argument don't bother with passing superblock to oprofile_create_stats_files() oprofile: don't bother with passing superblock to ->create_files() don't bother passing sb to oprofile_create_files() coh901318: don't open-code simple_read_from_buffer() ...
2013-09-05nfs4: Map NFS4ERR_WRONG_CRED to EPERMWeston Andros Adamson
Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED write and commit supportWeston Andros Adamson
WRITE and COMMIT can use the machine credential. If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED stateid supportWeston Andros Adamson
TEST_STATEID and FREE_STATEID can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED secinfo supportWeston Andros Adamson
SECINFO and SECINFO_NONAME can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED cleanup supportWeston Andros Adamson
CLOSE and LOCKU can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add state protection handlerWeston Andros Adamson
Add nfs4_state_protect - the function responsible for switching to the machine credential and the correct rpc client when SP4_MACH_CRED is in use. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Minimal SP4_MACH_CRED implementationWeston Andros Adamson
This is a minimal client side implementation of SP4_MACH_CRED. It will attempt to negotiate SP4_MACH_CRED iff the EXCHANGE_ID is using krb5i or krb5p auth. SP4_MACH_CRED will be used if the server supports the minimal operations: BIND_CONN_TO_SESSION EXCHANGE_ID CREATE_SESSION DESTROY_SESSION DESTROY_CLIENTID This patch only includes the EXCHANGE_ID negotiation code because the client will already use the machine cred for these operations. If the server doesn't support SP4_MACH_CRED or doesn't support the minimal operations, the exchange id will be resent with SP4_NONE. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05GFS2: dirty inode correctly in gfs2_write_endBenjamin Marzinski
GFS2 was only setting I_DIRTY_DATASYNC on files that it wrote to, when it actually increased the file size. If gfs2_fsync was called without I_DIRTY_DATASYNC set, it didn't flush the incore data to the log before returning, so any metadata or journaled data changes were not getting fsynced. This meant that writes to the middle of files were not always getting fsynced properly. This patch makes gfs2 set I_DIRTY_DATASYNC whenever metadata has been updated during a write. It also make gfs2_sync flush the incore log if I_DIRTY_PAGES is set, and the file is using data journalling. This will make sure that all incore logged data gets written to disk before returning from a fsync. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-05GFS2: Don't flag consistency error if first mounter is a spectatorBob Peterson
This patch checks for the first mounter being a specator. If so, it makes sure all the journals are clean. If there's a dirty journal, the mount fails. Testing results: # insmod gfs2.ko # mount -tgfs2 -o spectator /dev/sasdrives/scratch /mnt/gfs2 mount: permission denied # dmesg | tail -2 [ 3390.655996] GFS2: fsid=MUSKETEER:home: Now mounting FS... [ 3390.841336] GFS2: fsid=MUSKETEER:home.s: jid=0: Journal is dirty, so the first mounter must not be a spectator. # mount -tgfs2 /dev/sasdrives/scratch /mnt/gfs2 # umount /mnt/gfs2 # mount -tgfs2 -o spectator /dev/sasdrives/scratch /mnt/gfs2 # ls /mnt/gfs2|wc -l 352 # umount /mnt/gfs2 Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-05f2fs: optimize gc for better performanceJin Xu
This patch improves the gc efficiency by optimizing the victim selection policy. With this optimization, the random re-write performance could increase up to 20%. For f2fs, when disk is in shortage of free spaces, gc will selects dirty segments and moves valid blocks around for making more space available. The gc cost of a segment is determined by the valid blocks in the segment. The less the valid blocks, the higher the efficiency. The ideal victim segment is the one that has the most garbage blocks. Currently, it searches up to 20 dirty segments for a victim segment. The selected victim is not likely the best victim for gc when there are much more dirty segments. Why not searching more dirty segments for a better victim? The cost of searching dirty segments is negligible in comparison to moving blocks. In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make the search more aggressively for a possible better victim. Since it also applies to victim selection for SSR, it will likely improve the SSR efficiency as well. The test case is simple. It creates as many files until the disk full. The size for each file is 32KB. Then it writes as many as 100000 records of 4KB size to random offsets of random files in sync mode. The testing was done on a 2GB partition of a SDHC card. Let's see the test result of f2fs without and with the patch. --------------------------------------- 2GB partition, SDHC create 52023 files of size 32768 bytes random re-write 100000 records of 4KB --------------------------------------- | file creation (s) | rewrite time (s) | gc count | gc garbage blocks | [no patch] 341 4227 1174 174840 [patched] 324 2958 645 106682 It's obvious that, with the patch, f2fs finishes the test in 20+% less time than without the patch. And internally it does much less gc with higher efficiency than before. Since the performance improvement is related to gc, it might not be so obvious for other tests that do not trigger gc as often as this one ( This is because f2fs selects dirty segments for SSR use most of the time when free space is in shortage). The well-known iozone test tool was not used for benchmarking the patch becuase it seems do not have a test case that performs random re-write on a full disk. This patch is the revised version based on the suggestion from Jaegeuk Kim. Signed-off-by: Jin Xu <jinuxstyle@gmail.com> [Jaegeuk Kim: suggested simpler solution] Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05f2fs: merge more bios of node block writesJaegeuk Kim
Previously, we experience bio traces as follows when running simple sequential write test. f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500104928, size = 4K f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499922208, size = 368K f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499914752, size = 140K -> total 512K The first one is to write an indirect node block, and the others are to write direct node blocks. The reason why there are two separate bios for direct node blocks is: 0. initial state ------------------ ------------------ | | |xxxxxxxx | ------------------ ------------------ 1. write 368K ------------------ ------------------ | | |xxxxxxxxWWWWWWWW| ------------------ ------------------ 2. write 140K ------------------ ------------------ |WWWWWWW | |xxxxxxxxWWWWWWWW| ------------------ ------------------ This is because f2fs_write_node_pages tries to write just 512K totally, so that we can lose the chance to merge more bios nicely. After this patch is applied, we can get the following bio traces. f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500103168, size = 8K f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500111368, size = 4K f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500107272, size = 512K f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500108296, size = 512K f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500109320, size = 500K And finally, we can improve the sequential write performance, from 458.775 MB/s to 479.945 MB/s on SSD. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05Merge tag 'PTR_RET-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull PTR_RET() removal patches from Rusty Russell: "PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle" [ There are still some PTR_RET users scattered about, with some of them possibly being new, but most of them existing in Rusty's tree too. We have that #define PTR_RET(p) PTR_ERR_OR_ZERO(p) thing in <linux/err.h>, so they continue to work for now - Linus ] * tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR(). staging/zcache: don't use PTR_RET(). remoteproc: don't use PTR_RET(). pinctrl: don't use PTR_RET(). acpi: Replace weird use of PTR_RET. s390: Replace weird use of PTR_RET. PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. PTR_RET is now PTR_ERR_OR_ZERO
2013-09-05Merge tag 'dlm-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes a workqueue cleanup and the removal of incorrect and unneeded signal blocking" * tag 'dlm-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: remove signal blocking dlm: WQ_NON_REENTRANT is meaningless and going away
2013-09-05Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New features for 3.12: - Added aggressive extent caching using the extent status tree. This can actually decrease memory usage in read-mostly workloads since the information is much more compactly stored in the extent status tree than if we had to keep the extent tree metadata blocks in the buffer cache. This also improves Asynchronous I/O since it is it makes much less likely that we need to do metadata I/O to lookup the extent tree information. - Improve the recovery after corrupted allocation bitmaps are found when running in errors=ignore mode. Also fixed some writeback vs truncate races when using a blocksize less than the page size" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits) ext4: allow specifying external journal by pathname mount option ext4: mark group corrupt on group descriptor checksum ext4: mark block group as corrupt on inode bitmap error ext4: mark block group as corrupt on block bitmap error ext4: fix type declaration of ext4_validate_block_bitmap ext4: error out if verifying the block bitmap fails jbd2: Fix endian mixing problems in the checksumming code ext4: isolate ext4_extents.h file ext4: Fix misspellings using 'codespell' tool ext4: convert write_begin methods to stable_page_writes semantics ext4: fix use of potentially uninitialized variables in debugging code ext4: fix lost truncate due to race with writeback ext4: simplify truncation code in ext4_setattr() ext4: fix ext4_writepages() in presence of truncate ext4: move test whether extent to map can be extended to one place ext4: fix warning in ext4_da_update_reserve_space() quota: provide interface for readding allocated space into reserved space ext4: avoid reusing recently deleted inodes in no journal mode ext4: allocate delayed allocation blocks before rename ext4: start handle at least possible moment when renaming files ...
2013-09-04NFSv4: Document the recover_lost_locks kernel parameterTrond Myklebust
Rename the new 'recover_locks' kernel parameter to 'recover_lost_locks' and change the default to 'false'. Document why in Documentation/kernel-parameters.txt Move the 'recover_lost_locks' kernel parameter to fs/nfs/super.c to make it easy to backport to kernels prior to 3.6.x, which don't have a separate NFSv4 module. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04NFSv4: Don't try to recover NFSv4 locks when they are lost.NeilBrown
When an NFSv4 client loses contact with the server it can lose any locks that it holds. Currently when it reconnects to the server it simply tries to reclaim those locks. This might succeed even though some other client has held and released a lock in the mean time. So the first client might think the file is unchanged, but it isn't. This isn't good. If, when recovery happens, the locks cannot be claimed because some other client still holds the lock, then we get a message in the kernel logs, but the client can still write. So two clients can both think they have a lock and can both write at the same time. This is equally not good. There was a patch a while ago http://comments.gmane.org/gmane.linux.nfs/41917 which tried to address some of this, but it didn't seem to go anywhere. That patch would also send a signal to the process. That might be useful but for now this patch just causes writes to fail. For NFSv4 (unlike v2/v3) there is a strong link between the lock and the write request so we can fairly easily fail any IO of the lock is gone. While some applications might not expect this, it is still safer than allowing the write to succeed. Because this is a fairly big change in behaviour a module parameter, "recover_locks", is introduced which defaults to true (the current behaviour) but can be set to "false" to tell the client not to try to recover things that were lost. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04NFS: Fix warning introduced by NFSv4.0 transport blocking patchesChuck Lever
When CONFIG_NFS_V4_1 is not enabled, gcc emits this warning: linux/fs/nfs/nfs4state.c:255:12: warning: ‘nfs4_begin_drain_session’ defined but not used [-Wunused-function] static int nfs4_begin_drain_session(struct nfs_client *clp) ^ Eventually NFSv4.0 migration recovery will invoke this function, but that has not yet been merged. Hide nfs4_begin_drain_session() behind CONFIG_NFS_V4_1 for now. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04When CONFIG_NFS_V4_1 is not enabled, "make C=2" emits this warning:Chuck Lever
linux/fs/nfs/nfs4session.c:337:6: warning: symbol 'nfs41_set_target_slotid' was not declared. Should it be static? Move nfs41_set_target_slotid() and nfs41_update_target_slotid() back behind CONFIG_NFS_V4_1, since, in the final revision of this work, they are used only in NFSv4.1 and later. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04fuse: use list_for_each_entry() for list traversingDong Fang
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-09-04GFS2: Remove unnecessary memory barrierBob Peterson
Function test_and_clear_bit implies a memory barrier, so subsequent memory barriers are unnecessary. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-04direct-io: Handle O_(D)SYNC AIOChristoph Hellwig
Call generic_write_sync() from the deferred I/O completion handler if O_DSYNC is set for a write request. Also make sure various callers don't call generic_write_sync if the direct I/O code returns -EIOCBQUEUED. Based on an earlier patch from Jan Kara <jack@suse.cz> with updates from Jeff Moyer <jmoyer@redhat.com> and Darrick J. Wong <darrick.wong@oracle.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04direct-io: Implement generic deferred AIO completionsChristoph Hellwig
Add support to the core direct-io code to defer AIO completions to user context using a workqueue. This replaces opencoded and less efficient code in XFS and ext4 (we save a memory allocation for each direct IO) and will be needed to properly support O_(D)SYNC for AIO. The communication between the filesystem and the direct I/O code requires a new buffer head flag, which is a bit ugly but not avoidable until the direct I/O code stops abusing the buffer_head structure for communicating with the filesystems. Currently this creates a per-superblock unbound workqueue for these completions, which is taken from an earlier patch by Jan Kara. I'm not really convinced about this use and would prefer a "normal" global workqueue with a high concurrency limit, but this needs further discussion. JK: Fixed ext4 part, dynamic allocation of the workqueue. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04Merge tag 'please-pull-pstore' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore changes from Tony Luck: "A big part of this is the addition of compression to the generic pstore layer so that all backends can use the pitiful amounts of storage they control more effectively. Three other small fixes/cleanups too. * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore/ram: (really) fix undefined usage of rounddown_pow_of_two pstore/ram: Read and write to the 'compressed' flag of pstore efi-pstore: Read and write to the 'compressed' flag of pstore erst: Read and write to the 'compressed' flag of pstore powerpc/pseries: Read and write to the 'compressed' flag of pstore pstore: Add file extension to pstore file if compressed pstore: Add decompression support to pstore pstore: Introduce new argument 'compressed' in the read callback pstore: Add compression support to pstore pstore/Kconfig: Select ZLIB_DEFLATE and ZLIB_INFLATE when PSTORE is selected pstore: Add new argument 'compressed' in pstore write callback powerpc/pseries: Remove (de)compression in nvram with pstore enabled pstore: d_alloc_name() doesn't return an ERR_PTR acpi/apei/erst: Add missing iounmap() on error in erst_exec_move_data()
2013-09-04switch fchmod() to fdgetAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04switch epoll_ctl() to fdgetAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04git simplify nilfs check for busy subtreeAl Viro
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04constify touch_atime()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04vfs: allow umount to handle mountpoints without revalidating themJeff Layton
Christopher reported a regression where he was unable to unmount a NFS filesystem where the root had gone stale. The problem is that d_revalidate handles the root of the filesystem differently from other dentries, but d_weak_revalidate does not. We could simply fix this by making d_weak_revalidate return success on IS_ROOT dentries, but there are cases where we do want to revalidate the root of the fs. A umount is really a special case. We generally aren't interested in anything but the dentry and vfsmount that's attached at that point. If the inode turns out to be stale we just don't care since the intent is to stop using it anyway. Try to handle this situation better by treating umount as a special case in the lookup code. Have it resolve the parent using normal means, and then do a lookup of the final dentry without revalidating it. In most cases, the final lookup will come out of the dcache, but the case where there's a trailing symlink or !LAST_NORM entry on the end complicates things a bit. Cc: Neil Brown <neilb@suse.de> Reported-by: Christopher T Vogan <cvogan@us.ibm.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04vfs: call d_op->d_prune() before unhashing dentryYan, Zheng
The d_prune dentry operation is used to notify filesystem when VFS about to prune a hashed dentry from the dcache. There are three code paths that prune dentries: shrink_dcache_for_umount_subtree(), prune_dcache_sb() and d_prune_aliases(). For the d_prune_aliases() case, VFS unhashes the dentry first, then call the d_prune dentry operation. This confuses ceph_d_prune() (ceph uses the d_prune dentry operation to maintain a flag indicating whether the complete contents of a directory are in the dcache, pruning unhashed dentry does not affect dir's completeness) This patch fixes the issue by calling the d_prune dentry operation in d_prune_aliases(), before unhashing the dentry. Also make VFS only call the d_prune dentry operation for hashed dentry, to avoid calling the d_prune dentry operation twice when dentry is pruned by d_prune_aliases(). Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04only regular files with FMODE_WRITE need to be on s_filesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04nfsd: racy access to ->d_name in nsfd4_encode_path()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04Merge branch 'for-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on the cgroup front. Most changes aren't visible to userland at all at this point and are laying foundation for the planned unified hierarchy. - The biggest change is decoupling the lifetime management of css (cgroup_subsys_state) from that of cgroup's. Because controllers (cpu, memory, block and so on) will need to be dynamically enabled and disabled, css which is the association point between a cgroup and a controller may come and go dynamically across the lifetime of a cgroup. Till now, css's were created when the associated cgroup was created and stayed till the cgroup got destroyed. Assumptions around this tight coupling permeated through cgroup core and controllers. These assumptions are gradually removed, which consists bulk of patches, and css destruction path is completely decoupled from cgroup destruction path. Note that decoupling of creation path is relatively easy on top of these changes and the patchset is pending for the next window. - cgroup has its own event mechanism cgroup.event_control, which is only used by memcg. It is overly complex trying to achieve high flexibility whose benefits seem dubious at best. Going forward, new events will simply generate file modified event and the existing mechanism is being made specific to memcg. This pull request contains prepatory patches for such change. - Various fixes and cleanups" Fixed up conflict in kernel/cgroup.c as per Tejun. * 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits) cgroup: fix cgroup_css() invocation in css_from_id() cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp() cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup cgroup: implement CFTYPE_NO_PREFIX cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax cgroup: fix cgroup_write_event_control() cgroup: fix subsystem file accesses on the root cgroup cgroup: change cgroup_from_id() to css_from_id() cgroup: use css_get() in cgroup_create() to check CSS_ROOT cpuset: remove an unncessary forward declaration cgroup: RCU protect each cgroup_subsys_state release cgroup: move subsys file removal to kill_css() cgroup: factor out kill_css() cgroup: decouple cgroup_subsys_state destruction from cgroup destruction cgroup: replace cgroup->css_kill_cnt with ->nr_css cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item cgroup: move cgroup->subsys[] assignment to online_css() cgroup: reorganize css init / exit paths cgroup: add __rcu modifier to cgroup->subsys[] ...
2013-09-03xfs: XFS_MOUNT_QUOTA_ALL needed by userspaceDave Chinner
So move it to a header file shared with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>