path: root/fs
Age / Commit message / Author
2011-03-05fs/locks.c: Remove stale FIXME left over from BKL conversionMatt Fleming
The comment is no longer true as (now that the BKL conversion is finished) a spinlock _is_ now used to protect file_lock_list, blocked_list and inode->i_flock. Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2011-03-02ufs: remove the BKLArnd Bergmann
This introduces a new per-superblock mutex in UFS to replace the big kernel lock. I have been careful to avoid nested calls to lock_ufs and to get the lock order right with respect to other mutexes, in particular lock_super. I did not make any attempt to prove that the big kernel lock is not needed in a particular place in the code, which is very possible. The mutex has a significant performance impact, so it is only used on SMP or PREEMPT configurations. As Nick Piggin noticed, any allocation inside of the lock may end up deadlocking when we get to ufs_getfrag_block in the reclaim task, so we now use GFP_NOFS. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Nick Bowler <nbowler@elliptictech.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Nick Piggin <npiggin@gmail.com>
2011-03-02hpfs: remove the BKLArnd Bergmann
This removes the BKL in hpfs in a rather awful way, by making the code only work on uniprocessor systems without kernel preemption, as suggested by Andi Kleen. The HPFS code probably has close to zero remaining users on current kernels; all archeological uses of the file system can probably be done with the significant restrictions. The hpfs_lock/hpfs_unlock functions are left in the code, since Mikulas has indicated that he is still interested in fixing it in a better way. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: linux-fsdevel@vger.kernel.org
2011-03-01adfs: remove the big kernel lockArnd Bergmann
According to Russell King, adfs was written to not require the big kernel lock, and all inode updates are done under adfs_dir_lock. All other metadata in adfs is read-only and does not require locking. The use of the BKL is the result of various pushdowns from the VFS operations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
2011-02-22Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: eCryptfs: Copy up lower inode attrs in getattr ecryptfs: read on a directory should return EISDIR if not supported eCryptfs: Handle NULL nameidata pointers eCryptfs: Revert "dont call lookup_one_len to avoid NULL nameidata"
2011-02-21Docbook: add fs/eventfd.c and fix typos in itRandy Dunlap
Add fs/eventfd.c to filesystems docbook. Make typo corrections in fs/eventfd.c. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-21Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: keep reference to parent inode on ceph_dentry ceph: queue cap_snaps once per realm libceph: fix socket write error handling libceph: fix socket read error handling
2011-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] update cifs version cifs: Fix regression in LANMAN (LM) auth code cifs: fix handling of scopeid in cifs_convert_address
2011-02-21[CIFS] update cifs versionSteve French
Update version to 1.71 so we can more easily spot modules with the last two fixes Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-02-21cifs: Fix regression in LANMAN (LM) auth codeShirish Pargaonkar
LANMAN response length was changed to 16 bytes instead of 24 bytes. Revert it back to 24 bytes. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> CC: stable@kernel.org Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-02-21eCryptfs: Copy up lower inode attrs in getattrTyler Hicks
The lower filesystem may do some type of inode revalidation during a getattr call. eCryptfs should take advantage of that by copying the lower inode attributes to the eCryptfs inode after a call to vfs_getattr() on the lower inode. I originally wrote this fix while working on eCryptfs on nfsv3 support, but discovered it also fixed an eCryptfs on ext4 nanosecond timestamp bug that was reported. https://bugs.launchpad.net/bugs/613873 Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-02-21ecryptfs: read on a directory should return EISDIR if not supportedAndy Whitcroft
read() calls against a file descriptor connected to a directory are incorrectly returning EINVAL rather than EISDIR: [EISDIR] [XSI] [Option Start] The fildes argument refers to a directory and the implementation does not allow the directory to be read using read() or pread(). The readdir() function should be used instead. [Option End] This occurs because we do not have a .read operation defined for ecryptfs directories. Connect this up to generic_read_dir(). BugLink: http://bugs.launchpad.net/bugs/719691 Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-02-21eCryptfs: Handle NULL nameidata pointersTyler Hicks
Allow for NULL nameidata pointers in eCryptfs create, lookup, and d_revalidate functions. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-02-20ceph: keep reference to parent inode on ceph_dentryYehuda Sadeh
When creating a new dentry we now hold a reference to the parent inode in the ceph_dentry. This is required due to the new RCU changes from 949854d0, which set dentry->d_parent to NULL in d_kill before calling the ->release() callback. If/when that behavior is changed, we can revert this hack. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-02-18Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' workqueue: wake up a worker when a rescuer is leaving a gcwq
2011-02-18eCryptfs: Revert "dont call lookup_one_len to avoid NULL nameidata"Tyler Hicks
This reverts commit 21edad32205e97dc7ccb81a85234c77e760364c8 and commit 93c3fe40c279f002906ad14584c30671097d4394, which fixed a regression by the former. Al Viro pointed out bypassed dcache lookups in ecryptfs_new_lower_dentry(), misuse of vfs_path_lookup() in ecryptfs_lookup_one_lower() and a dislike of passing nameidata to the lower filesystem. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-02-18fs/partitions: Validate map_count in Mac partition tablesTimo Warns
Validate number of blocks in map and remove redundant variable. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-17Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: correctly handle return value from nfsd_map_name_to_*
2011-02-17cifs: fix handling of scopeid in cifs_convert_addressJeff Layton
The code finds the '%' sign in an ipv6 address and copies that to a buffer allocated on the stack. It then ignores that buffer, and passes 'pct' to simple_strtoul(), which doesn't work right because we're comparing 'endp' against a completely different string. Fix it by passing the correct pointer. While we're at it, this is a good candidate for conversion to strict_strtoul as well. Cc: stable@kernel.org Cc: David Howells <dhowells@redhat.com> Reported-by: Björn JACKE <bj@sernet.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
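The pointer mix-up described in the cifs scope-id entry above can be illustrated with a small userspace sketch. This is not the cifs code: `parse_scope_id` is a hypothetical helper, and it uses the C library's `strtoul()` in place of the kernel's `simple_strtoul()`/`strict_strtoul()`. The point it shows is that the end pointer must be compared against the same buffer that was handed to the parser.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the cifs implementation): extract the numeric
 * scope id that follows '%' in the first `len` bytes of `src`.
 * Returns 0 and stores the id on success, -1 on any parse failure.
 */
int parse_scope_id(const char *src, size_t len, unsigned long *scope_id)
{
    char scope[16];
    char *endp;
    const char *pct = memchr(src, '%', len);
    size_t slen;
    unsigned long val;

    if (pct == NULL)
        return -1;

    /* Number of bytes after the '%' sign. */
    slen = len - (size_t)(pct - src) - 1;
    if (slen == 0 || slen >= sizeof(scope))
        return -1;

    /* Copy the scope into a NUL-terminated buffer of its own. */
    memcpy(scope, pct + 1, slen);
    scope[slen] = '\0';

    /* Reject signs and whitespace that strtoul() would otherwise accept. */
    if (scope[0] < '0' || scope[0] > '9')
        return -1;

    errno = 0;
    val = strtoul(scope, &endp, 10);

    /* endp points into `scope`, so it is compared against `scope`,
     * not against the original string. */
    if (errno != 0 || endp != scope + slen)
        return -1;

    *scope_id = val;
    return 0;
}
```

Calling `parse_scope_id("fe80::1%3", 9, &id)` stores 3 in `id`, while a non-numeric scope such as `%eth0` is rejected in this sketch.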
2011-02-17block: revert block_dev read-only checkChuck Ebbert
This reverts commit 75f1dc0d076d ("block: check bdev_read_only() from blkdev_get()"). That commit added stricter checking to make sure devices that were being used read-only were actually opened in that mode. It turns out that the change breaks a bunch of kernel code that opens block devices. Affected systems include dm, md, and the loop device. Because strict checking for read-only opens of block devices was not done before this, the code that opens the devices was opening them read-write even if they were being used read-only. Auditing all that code will take time, and new userspace packages for dm, mdadm, etc. will also be required. Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-16nfsd: correctly handle return value from nfsd_map_name_to_*NeilBrown
These functions return an nfs status, not a host_err. So don't try to convert before returning. This is a regression introduced by 3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers, but missed these two. Cc: stable@kernel.org Reported-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-16vfs: fix BUG_ON() in fs/namei.c:1461Linus Torvalds
When Al moved the nameidata_dentry_drop_rcu_maybe() call into the do_follow_link function in commit 844a391799c2 ("nothing in do_follow_link() is going to see RCU"), he mistakenly left the BUG_ON(inode != path->dentry->d_inode); behind. Which would otherwise be ok, but that BUG_ON() really needs to be _after_ dropping RCU, since the dentry isn't necessarily stable otherwise. So complete the code movement in that commit, and move the BUG_ON() into do_follow_link() too. This means that we need to pass in 'inode' as an argument (just for this one use), but that's a small thing. And eventually we may be confident enough in our path lookup that we can just remove the BUG_ON() and the unnecessary inode argument. Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-16workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'Tejun Heo
There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>
2011-02-15Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: break lease on unlink due to rename nfsd4: acquire only one lease per file nfsd4: modify fi_delegations under recall_lock nfsd4: remove unused deleg dprintk's. nfsd4: split lease setting into separate function nfsd4: fix leak on allocation error nfsd4: add helper function for lease setup nfsd4: split up nfsd_break_deleg_cb NFSD: memory corruption due to writing beyond the stat array NFSD: use nfserr for status after decode_cb_op_status nfsd: don't leak dentry count on mnt_want_write failure
2011-02-15Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu() drop out of RCU in return_reval split do_revalidate() into RCU and non-RCU cases in do_lookup() split RCU and non-RCU cases of need_revalidate nothing in do_follow_link() is going to see RCU
2011-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check return value of alloc_extent_map() Btrfs - Fix memory leak in btrfs_init_new_device() btrfs: prevent heap corruption in btrfs_ioctl_space_info() Btrfs: Fix balance panic Btrfs: don't release pages when we can't clear the uptodate bits Btrfs: fix page->private races
2011-02-15s390: remove task_show_regsMartin Schwidefsky
task_show_regs used to be a debugging aid in the early bringup days of Linux on s390. /proc/<pid>/status is a world-readable file, so it is not a good idea to show the registers of a process. The only correct fix is to remove task_show_regs. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-15get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu()Al Viro
can't happen anymore and didn't work right anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15drop out of RCU in return_revalAl Viro
... thus killing the need to handle drop-from-RCU in d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15split do_revalidate() into RCU and non-RCU casesAl Viro
fixing oopsen in lookup_one_len() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15in do_lookup() split RCU and non-RCU cases of need_revalidateAl Viro
and use unlikely() instead of gotos, for fsck sake... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15nothing in do_follow_link() is going to see RCUAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-14Btrfs: check return value of alloc_extent_map()Tsutomu Itoh
I add the check on the return value of alloc_extent_map() to several places. In addition, alloc_extent_map() returns only the address or NULL. Therefore, check by IS_ERR() is unnecessary. So, I remove IS_ERR() checking. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
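Why an IS_ERR() check is the wrong test for a function that returns NULL on failure can be seen in a small userspace model. The helpers below are assumptions written for illustration (`is_err_ptr`, `err_ptr` and `alloc_failed` are hypothetical names); they mimic the kernel convention that error pointers occupy the top 4095 values of the address space, which NULL does not.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace model of an error-pointer test: true only for addresses
 * in the last MAX_ERRNO values of the pointer range. */
int is_err_ptr(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Userspace model of encoding a negative errno value in a pointer. */
void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* The check that fits an allocator returning "address or NULL". */
int alloc_failed(const void *ptr)
{
    return ptr == NULL;
}
```

In this model `is_err_ptr(NULL)` is false, so a failed allocation that returns NULL passes an error-pointer check unnoticed; only the explicit NULL test catches it.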
2011-02-14Btrfs - Fix memory leak in btrfs_init_new_device()Ilya Dryomov
Memory allocated by calling kstrdup() should be freed. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14btrfs: prevent heap corruption in btrfs_ioctl_space_info()Dan Rosenberg
Commit bf5fc093c5b625e4259203f1cee7ca73488a5620 refactored btrfs_ioctl_space_info() and introduced several security issues. space_args.space_slots is an unsigned 64-bit type controlled by a possibly unprivileged caller. The comparison as a signed int type allows providing values that are treated as negative and cause the subsequent allocation size calculation to wrap, or be truncated to 0. By providing a size that's truncated to 0, kmalloc() will return ZERO_SIZE_PTR. It's also possible to provide a value smaller than the slot count. The subsequent loop ignores the allocation size when copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR. The fix changes the slot count type and comparison typecast to u64, which prevents truncation or signedness errors, and also ensures that we don't copy more data than we've allocated in the subsequent loop. Note that zero-size allocations are no longer possible since there is already an explicit check for space_args.space_slots being 0 and truncation of this value is no longer an issue. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Josef Bacik <josef@redhat.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
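The two halves of the fix described above (compare the slot count as an unsigned 64-bit value, and never copy more entries than were allocated) can be sketched in userspace C. This is an illustration under assumptions, not the btrfs code: the function names are hypothetical, and slots are modeled as plain `uint64_t` values.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrowing a 64-bit count loses the high bits: for unsigned types the
 * value is reduced modulo 2^32, so 0x100000000 becomes 0. */
uint32_t truncate_to_u32(uint64_t v)
{
    return (uint32_t)v;
}

/* Clamp the caller-supplied slot count in the full 64-bit unsigned
 * domain, so neither truncation nor sign confusion can occur. */
uint64_t clamp_slots(uint64_t requested, uint64_t available)
{
    return requested < available ? requested : available;
}

/* Copy entries, stopping at whichever limit is reached first:
 * the number of source entries or the number of allocated slots. */
size_t copy_slots(uint64_t *dst, uint64_t allocated_slots,
                  const uint64_t *src, size_t src_count)
{
    size_t copied = 0;

    while (copied < src_count && copied < allocated_slots) {
        dst[copied] = src[copied];
        copied++;
    }
    return copied;
}
```

The loop bound on `allocated_slots` is what prevents the overflow when the caller asks for fewer slots than exist; the 64-bit clamp is what prevents a huge request from shrinking to a tiny or zero-sized allocation.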
2011-02-14Btrfs: Fix balance panicYan, Zheng
Mark the cloned backref_node as checked in clone_backref_node() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14Btrfs: don't release pages when we can't clear the uptodate bitsChris Mason
Btrfs tracks uptodate state in an rbtree as well as in the page bits. This is supposed to enable us to use block sizes other than the page size, but there are a few parts still missing before that completely works. But, our readpage routine trusts this additional range based tracking of uptodateness, much in the same way the buffer head up to date bits are trusted for the other filesystems. The problem is that sometimes we need to allocate memory in order to split records in the rbtree, even when we are just clearing bits. This can be difficult when our clearing function is called GFP_ATOMIC, which can happen in the releasepage path. So, what happens today looks like this:
releasepage called with GFP_ATOMIC
btrfs_releasepage calls clear_extent_bit
clear_extent_bit fails to allocate ram, leaving the up to date bit set
btrfs_releasepage returns success
The end result is the page being gone, but btrfs thinking the range is up to date. Later on if someone tries to read that same page, the btrfs readpage code will return immediately thinking the page is already up to date. This commit fixes things to fail the releasepage when we can't clear the extent state bits. It covers both data pages and metadata tree blocks. Signed-off-by: Chris Mason <chris.mason@oracle.com>
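The failure sequence and its fix can be modeled with a toy state machine. Everything here is hypothetical (`struct page_range`, `clear_uptodate`, `release_page`); it only captures the rule the commit describes: if the up-to-date state cannot be cleared, the release must be refused rather than reported as a success.

```c
#include <stdbool.h>

/* Toy model of one tracked range: whether it is marked up to date, and
 * whether clearing that mark would require splitting a record (which
 * in turn needs a memory allocation). */
struct page_range {
    bool uptodate;
    bool needs_split;
};

/* Returns 0 on success, -1 if an allocation is needed but not allowed.
 * On failure the up-to-date mark is left set. */
int clear_uptodate(struct page_range *r, bool can_allocate)
{
    if (r->needs_split && !can_allocate)
        return -1;
    r->uptodate = false;
    return 0;
}

/* Returns 1 if the page may be released, 0 if it must be kept.
 * The release is refused whenever the state could not be cleared. */
int release_page(struct page_range *r, bool can_allocate)
{
    if (clear_uptodate(r, can_allocate) < 0)
        return 0;
    return 1;
}
```

With this rule the stale combination "page gone, range still marked up to date" cannot arise in the model: either both change or neither does.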
2011-02-14Btrfs: fix page->private racesChris Mason
There is a race where btrfs_releasepage can drop the page->private contents just as alloc_extent_buffer is setting up pages for metadata. Because of how the Btrfs page flags work, this results in us skipping the crc on the page during IO. This patch solves the race by waiting until after the extent buffer is inserted into the radix tree before it sets page private. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14nfsd: break lease on unlink due to renameJ. Bruce Fields
4795bb37effb7b8fe77e2d2034545d062d3788a8 "nfsd: break lease on unlink, link, and rename", only broke the lease on the file that was being renamed, and didn't handle the case where the target path refers to an already-existing file that will be unlinked by a rename--in that case the target file should have any leases broken as well. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14nfsd4: acquire only one lease per fileJ. Bruce Fields
Instead of acquiring one lease each time another client opens a file, nfsd can acquire just one lease to represent all of them, and reference count it to determine when to release it. This fixes a regression introduced by c45821d263a8a5109d69a9e8942b8d65bcd5f31a "locks: eliminate fl_mylease callback": after that patch, only the struct file * is used to determine who owns a given lease. But since we recently converted the server to share a single struct file per open, if we acquire multiple leases on the same file from nfsd, it then becomes impossible on unlocking a lease to determine which of those leases (all of whom share the same struct file *) we meant to remove. Thanks to Takashi Iwai <tiwai@suse.de> for catching a bug in a previous version of this patch. Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
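The "one lease, reference counted" idea above can be sketched as follows. This is a minimal illustration, not the nfsd data structures: `struct file_lease`, `lease_get` and `lease_put` are hypothetical names, and locking is omitted.

```c
/* Hypothetical per-file state: how many users share the lease and
 * whether the single underlying lease is currently held. */
struct file_lease {
    unsigned int users;
    int held;
};

/* Take a reference. Returns 1 if this call had to acquire the
 * underlying lease (first user), 0 if it only bumped the count. */
int lease_get(struct file_lease *fl)
{
    if (fl->users++ == 0) {
        fl->held = 1;
        return 1;
    }
    return 0;
}

/* Drop a reference. Returns 1 if this call released the underlying
 * lease (last user), 0 otherwise. Unbalanced puts are ignored. */
int lease_put(struct file_lease *fl)
{
    if (fl->users == 0)
        return 0;
    if (--fl->users == 0) {
        fl->held = 0;
        return 1;
    }
    return 0;
}
```

Because only one lease exists per file, the question of which of several identical leases to remove never comes up; the count alone decides when the lease goes away.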
2011-02-14nfsd4: modify fi_delegations under recall_lockJ. Bruce Fields
Modify fi_delegations only under the recall_lock, allowing us to use that list on lease breaks. Also some trivial cleanup to simplify later changes. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14nfsd4: remove unused deleg dprintk's.J. Bruce Fields
These aren't all that useful, and get in the way of the next steps. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14nfsd4: split lease setting into separate functionJ. Bruce Fields
Splitting some code into a separate function which we'll be adding some more to. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14nfsd4: fix leak on allocation errorJ. Bruce Fields
Also share some common exit code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14nfsd4: add helper function for lease setupJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14nfsd4: split up nfsd_break_deleg_cbJ. Bruce Fields
We'll be adding some more code here soon. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14NFSD: memory corruption due to writing beyond the stat arrayKonstantin Khorenko
If nfsd fails to find a file exported via NFS in the readahead cache, it should increment the corresponding nfsdstats counter (ra_depth[10]), but due to a bug it may instead write to ra_depth[11], corrupting the following field. In a kernel with NFSDv4 compiled in the corruption takes the form of an increment of a counter of the number of NFSv4 operation 0's received; since there is no operation 0, this is harmless. In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the memory beyond nfsdstats. Signed-off-by: Konstantin Khorenko <khorenko@openvz.org> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
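An off-by-one of this kind is easy to reproduce with any histogram whose last bucket is a catch-all. The sketch below is an assumption for illustration only: the bucket formula `depth * 10 / ra_size` and the names `RA_BUCKETS` and `ra_bucket` are not taken from the commit. It shows an 11-entry array (valid indices 0 through 10) and an index computation that is clamped so an oversized depth cannot land on index 11.

```c
#include <stddef.h>

#define RA_BUCKETS 11   /* valid indices are 0..10 */

/* Map a search depth to a histogram bucket. Without the final clamp,
 * a depth larger than ra_size (for example 22 with ra_size 20) would
 * yield index 11, one past the end of an 11-entry array. */
size_t ra_bucket(unsigned int depth, unsigned int ra_size)
{
    unsigned long long idx;

    if (ra_size == 0)
        return RA_BUCKETS - 1;

    idx = (unsigned long long)depth * 10 / ra_size;
    return idx < RA_BUCKETS ? (size_t)idx : RA_BUCKETS - 1;
}
```

A cache miss would be recorded by passing a depth equal to `ra_size`, which maps to the last valid bucket, index 10.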
2011-02-14NFSD: use nfserr for status after decode_cb_op_statusBenny Halevy
Bugs introduced in 85a56480191ca9f08fc775c129b9eb5c8c1f2c05 "NFSD: Update XDR decoders in NFSv4 callback client" Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14nfsd: don't leak dentry count on mnt_want_write failureJ. Bruce Fields
The exit cleanup isn't quite right here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-12Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: call __jbd2_log_start_commit with j_state_lock write locked ext4: serialize unaligned asynchronous DIO ext4: make grpinfo slab cache names static ext4: Fix data corruption with multi-block writepages support ext4: fix up ext4 error handling ext4: unregister features interface on module unload ext4: fix panic on module unload when stopping lazyinit thread