summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-11-22Pure nfs client performance using odirect.Arun Bharadwaj
When an application opens a file with O_DIRECT flag, if the size of the data that is written is equal to wsize, the client sends a WRITE RPC with stable flag set to UNSTABLE followed by a single COMMIT RPC rather than sending a single WRITE RPC with the stable flag set to FILE_SYNC. This a bug. Patch to fix this. Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22Btrfs: avoid NULL pointer deref in try_release_extent_bufferChris Mason
If we fail to find a pointer in the radix tree, don't try to deref the NULL one we do have. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22Btrfs: make btrfs_add_nondir take parent inode as an argumentJosef Bacik
Everybody who calls btrfs_add_nondir just passes in the dentry of the new file and then dereference dentry->d_parent->d_inode, but everybody who calls btrfs_add_nondir() are already passed the parent's inode. So instead of dereferencing dentry->d_parent, just make btrfs_add_nondir take the dir inode as an argument and pass that along so we don't have to worry about d_parent. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22Btrfs: hold i_mutex when calling btrfs_log_dentry_safeJosef Bacik
Since we walk up the path logging all of the parts of the inode's path, we need to hold i_mutex to make sure that the inode is not renamed while we're logging everything. btrfs_log_dentry_safe does dget_parent and all of that jazz, but we may get unexpected results if the rename changes the inode's location while we're higher up the path logging those dentries, so do this for safety reasons. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22Btrfs: use dget_parent where we can UPDATEDJosef Bacik
There are lots of places where we do dentry->d_parent->d_inode without holding the dentry->d_lock. This could cause problems with rename. So instead we need to use dget_parent() and hold the reference to the parent as long as we are going to use it's inode and then dput it at the end. Signed-off-by: Josef Bacik <josef@redhat.com> Cc: raven@themaw.net Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22Btrfs: fix more ESTALE problems with NFSJosef Bacik
When creating new inodes we don't setup inode->i_generation. So if we generate an fh with a newly created inode we save the generation of 0, but if we flush the inode to disk and have to read it back when getting the inode on the server we'll have the right i_generation, so gens wont match and we get ESTALE. This patch properly sets inode->i_generation when we create the new inode and now I'm no longer getting ESTALE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22Btrfs: handle NFS lookups properlyJosef Bacik
People kept reporting NFS issues, specifically getting ESTALE alot. I figured out how to reproduce the problem SERVER mkfs.btrfs /dev/sda1 mount /dev/sda1 /mnt/btrfs-test <add /mnt/btrfs-test to /etc/exports> btrfs subvol create /mnt/btrfs-test/foo service nfs start CLIENT mount server:/mnt/btrfs /mnt/test cd /mnt/test/foo ls SERVER echo 3 > /proc/sys/vm/drop_caches CLIENT ls <-- get an ESTALE here This is because the standard way to lookup a name in nfsd is to use readdir, and what it does is do a readdir on the parent directory looking for the inode of the child. So in this case the parent being / and the child being foo. Well subvols all have the same inode number, so doing a readdir of / looking for inode 256 will return '.', which obviously doesn't match foo. So instead we need to have our own .get_name so that we can find the right name. Our .get_name will either lookup the inode backref or the root backref, whichever we're looking for, and return the name we find. Running the above reproducer with this patch results in everything acting the way its supposed to. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22btrfs: make 1-bit signed fileds unsignedMariusz Kozlowski
Fixes these sparse warnings: fs/btrfs/ctree.h:811:17: error: dubious one-bit signed bitfield fs/btrfs/ctree.h:812:20: error: dubious one-bit signed bitfield fs/btrfs/ctree.h:813:19: error: dubious one-bit signed bitfield Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22btrfs: Show device attr correctly for symlinksLi Zefan
Symlinks and files of other types show different device numbers, though they are on the same partition: $ touch tmp; ln -s tmp tmp2; stat tmp tmp2 File: `tmp' Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 15h/21d Inode: 984027 Links: 1 --- snip --- File: `tmp2' -> `tmp' Size: 3 Blocks: 0 IO Block: 4096 symbolic link Device: 13h/19d Inode: 984028 Links: 1 Reported-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22btrfs: Set file size correctly in file cloneLi Zefan
Set src_offset = 0, src_length = 20K, dest_offset = 20K. And the original filesize of the dest file 'file2' is 30K: # ls -l /mnt/file2 -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2 Now clone file1 to file2, the dest file should be 40K, but it still shows 30K: # ls -l /mnt/file2 -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2 Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22btrfs: Check if dest_offset is block-size aligned before cloning fileLi Zefan
We've done the check for src_offset and src_length, and We should also check dest_offset, otherwise we'll corrupt the destination file: (After cloning file1 to file2 with unaligned dest_offset) # cat /mnt/file2 cat: /mnt/file2: Input/output error Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22Btrfs: handle the space_cache option properlyJosef Bacik
When I added the clear_cache option I screwed up and took the break out of the space_cache case statement, so whenever you mount with space_cache you also get clear_cache, which does you no good if you say set space_cache in fstab so it always gets set. This patch adds the break back in properly. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22btrfs: Fix early enospc because 'unused' calculated with wrong sign.Arne Jansen
'unused' calculated with wrong sign in reserve_metadata_bytes(). This might have lead to unwanted over-reservations. Signed-off-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22btrfs: fix panic caused by direct IOMiao Xie
btrfs paniced when we write >64KB data by direct IO at one time. Reproduce steps: # mkfs.btrfs /dev/sda5 /dev/sda6 # mount /dev/sda5 /mnt # dd if=/dev/zero of=/mnt/tmpfile bs=100K count=1 oflag=direct Then btrfs paniced: mapping failed logical 1103155200 bio len 69632 len 12288 ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:3010! [SNIP] Pid: 1992, comm: btrfs-worker-0 Not tainted 2.6.37-rc1 #1 D2399/PRIMERGY RIP: 0010:[<ffffffffa03d1462>] [<ffffffffa03d1462>] btrfs_map_bio+0x202/0x210 [btrfs] [SNIP] Call Trace: [<ffffffffa03ab3eb>] __btrfs_submit_bio_done+0x1b/0x20 [btrfs] [<ffffffffa03a35ff>] run_one_async_done+0x9f/0xb0 [btrfs] [<ffffffffa03d3d20>] run_ordered_completions+0x80/0xc0 [btrfs] [<ffffffffa03d45a4>] worker_loop+0x154/0x5f0 [btrfs] [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs] [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs] [<ffffffff81083216>] kthread+0x96/0xa0 [<ffffffff8100cec4>] kernel_thread_helper+0x4/0x10 [<ffffffff81083180>] ? kthread+0x0/0xa0 [<ffffffff8100cec0>] ? kernel_thread_helper+0x0/0x10 We fix this problem by splitting bios when we submit bios. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22btrfs: cleanup duplicate bio allocating functionsMiao Xie
extent_bio_alloc() and compressed_bio_alloc() are similar, cleanup similar source code. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22btrfs: fix free dip and dip->csums twiceMiao Xie
bio_endio() will free dip and dip->csums, so dip and dip->csums twice will be freed twice. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-22Btrfs: add migrate page for metadata inodeChris Mason
Migrate page will directly call the btrfs btree writepage function, which isn't actually allowed. Our writepage assumes that you have locked the extent_buffer and flagged the block as written. Without doing these steps, we can corrupt metadata blocks. A later commit will remove the btree writepage function since it is really only safely used internally by btrfs. We use writepages for everything else. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-20Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard fs: Do not dispatch FITRIM through separate super_operation ext4: ext4_fill_super shouldn't return 0 on corruption jbd2: fix /proc/fs/jbd2/<dev> when using an external journal ext4: missing unlock in ext4_clear_request_list() ext4: fix setting random pages PageUptodate
2010-11-20ext4: Add EXT4_IOC_TRIM ioctl to handle batched discardLukas Czerner
Filesystem independent ioctl was rejected as not common enough to be in core vfs ioctl. Since we still need to access to this functionality this commit adds ext4 specific ioctl EXT4_IOC_TRIM to dispatch ext4_trim_fs(). It takes fstrim_range structure as an argument. fstrim_range is definec in the include/linux/fs.h and its definition is as follows. struct fstrim_range { __u64 start; __u64 len; __u64 minlen; } start - first Byte to trim len - number of Bytes to trim from start minlen - minimum extent length to trim, free extents shorter than this number of Bytes will be ignored. This will be rounded up to fs block size. After the FITRIM is done, the number of actually discarded Bytes is stored in fstrim_range.len to give the user better insight on how much storage space has been really released for wear-leveling. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-20fs: Do not dispatch FITRIM through separate super_operationLukas Czerner
There was concern that FITRIM ioctl is not common enough to be included in core vfs ioctl, as Christoph Hellwig pointed out there's no real point in dispatching this out to a separate vector instead of just through ->ioctl. So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs from super_operation structure. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix readdir EOVERFLOW on 32-bit archs ceph: fix frag offset for non-leftmost frags ceph: fix dangling pointer ceph: explicitly specify page alignment in network messages ceph: make page alignment explicit in osd interface ceph: fix comment, remove extraneous args ceph: fix update of ctime from MDS ceph: fix version check on racing inode updates ceph: fix uid/gid on resent mds requests ceph: fix rdcache_gen usage and invalidate ceph: re-request max_size if cap auth changes ceph: only let auth caps update max_size ceph: fix open for write on clustered mds ceph: fix bad pointer dereference in ceph_fill_trace ceph: fix small seq message skipping Revert "ceph: update issue_seq on cap grant"
2010-11-19ext4: ext4_fill_super shouldn't return 0 on corruptionDarrick J. Wong
At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path out of that function returns ret. However, the generic_check_addressable clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g. a group checksum error) returns 0 even though the mount should fail. This causes vfs_kern_mount in turn to think that the mount succeeded, leading to an oops. A simple fix is to avoid using ret for the generic_check_addressable check, which was last changed in commit 30ca22c70e3ef0a96ff84de69cd7e8561b416cb2. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-19GFS2: Userland expects quota limit/warn/usage in 512b blocksAbhijith Das
Userland programs using the quotactl() syscall assume limit/warn/usage block counts in 512b basic blocks which were instead being read/written in fs blocksize in gfs2. With this patch, gfs2 correctly interacts with the syscall using 512b blocks. Signed-off-by: Abhi Das <adas@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-18ocfs2_connection_find() returns pointer to bad structuredann frazier
If ocfs2_live_connection_list is empty, ocfs2_connection_find() will return a pointer to the LIST_HEAD, cast as a ocfs2_live_connection. This can cause an oops when ocfs2_control_send_down() dereferences c->oc_conn: Call Trace: [<ffffffffa00c2a3c>] ocfs2_control_message+0x28c/0x2b0 [ocfs2_stack_user] [<ffffffffa00c2a95>] ocfs2_control_write+0x35/0xb0 [ocfs2_stack_user] [<ffffffff81143a88>] vfs_write+0xb8/0x1a0 [<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0 [<ffffffff811442f1>] sys_write+0x51/0x80 [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b Fix by explicitly returning NULL if no match is found. Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-11-18ocfs2: char is not always signedMilton Miller
Commit 1c66b360fe262 (Change some lock status member in ocfs2_lock_res to char.) states that these fields need to be signed due to comparision to -1, but only changed the type from unsigned char to char. However, it is a compiler option if char is a signed or unsigned type. Change these fields to signed char so the code will work with all compilers. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-11-18Ocfs2: Stop tracking a negative dentry after dentry_iput().Tristan Ye
I suddenly hit the problem during 2.6.37-rc1 regression test, which was introduced by commit '5e98d492406818e6a94c0ba54c61f59d40cefa4a'(Track negative entries v3), following scenario reproduces the issue easily: Node A Node B ================ ============ $touch testfile $ls testfile $rm -rf testfile $touch testfile $ls testfile ls: cannot access testfile: No such file or directory This patch stops tracking the dentry which was negativated by a inode deletion, so as to force the revaliation in next lookup, in case we'll touch the inode again in the same node. It didn't hurt the performance of multiple lookup for none-existed files anyway, while regresses a bit in the first try after a file deletion. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-11-18ocfs2: fix memory leakJiri Slaby
Stanse found that o2hb_heartbeat_group_make_item leaks some memory on fail paths. Fix the paths by adding a new label and jump there. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: ocfs2-devel@oss.oracle.com Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-11-18fs/ocfs2/dlm: Use GFP_ATOMIC under spin_lockDavid Sterba
coccinelle check scripts/coccinelle/locks/call_kern.cocci found that in fs/ocfs2/dlm/dlmdomain.c an allocation with GFP_KERNEL is done with locks held: dlm_query_region_handler spin_lock(dlm_domain_lock) dlm_match_regions kmalloc(GFP_KERNEL) Change it to GFP_ATOMIC. Signed-off-by: David Sterba <dsterba@suse.cz> CC: Joel Becker <joel.becker@oracle.com> CC: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com -- Exists in v2.6.37-rc1 and current linux-next. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-11-18ceph: fix readdir EOVERFLOW on 32-bit archsSage Weil
One of the readdir filldir_t callers was passing the raw ceph 64-bit ino instead of the hashed 32-bit one, producing an EOVERFLOW in the filler callback. Fix this by calling the ceph_vino_to_ino() helper to do the conversion. Reported-by: Jan Smets <jan.smets@alcatel-lucent.com> Tested-by: Jan Smets <jan.smets@alcatel-lucent.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-18jbd2: fix /proc/fs/jbd2/<dev> when using an external journalyangsheng
In jbd2_journal_init_dev(), we need make sure the journal structure is fully initialzied before calling jbd2_stats_proc_init(). Reviewed-by: Andreas Dilger <andreas.dilger@oracle.com> Signed-off-by: yangsheng <sheng.yang@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-18ext4: missing unlock in ext4_clear_request_list()Dan Carpenter
If the the li_request_list was empty then it returned with the lock held. Instead of adding a "goto unlock" I just removed that special case and let it go past the empty list_for_each_safe(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-18ext4: fix setting random pages PageUptodateMarkus Trippelsdorf
ext4_end_bio calls put_page and kmem_cache_free before calling SetPageUpdate(). This can result in setting the PageUptodate bit on random pages and causes the following BUG: BUG: Bad page state in process rm pfn:52e54 page:ffffea0001222260 count:0 mapcount:0 mapping: (null) index:0x0 arch kernel: page flags: 0x4000000000000008(uptodate) Fix the problem by moving put_io_page() after the SetPageUpdate() call. Thanks to Hugh Dickins for analyzing this problem. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-17BKL: remove references to lock_kernel from commentsArnd Bergmann
Lock_kernel is gone from the code, so the comments should be updated, too. nfsd now uses lock_flocks instead of lock_kernel to protect against posix file locks. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: J. Bruce Fields <bfields@redhat.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17BKL: remove extraneous #include <smp_lock.h>Arnd Bergmann
The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-16nfs: Ignore kmemleak false positive in nfs_readdir_make_qstrCatalin Marinas
Strings allocated via kmemdup() in nfs_readdir_make_qstr() are referenced from the nfs_cache_array which is stored in a page cache page. Kmemleak does not scan such pages and it reports several false positives. This patch annotates the string->name pointer so that kmemleak does not consider it a real leak. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-16NFS: readdir shouldn't read beyond the reply returned by the serverTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-16NFS: Fix a couple of regressions in readdir.Trond Myklebust
Fix up the issue that array->eof_index needs to be able to be set even if array->size == 0. Ensure that we catch all important memory allocation error conditions and/or kmap() failures. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-16Revert "NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns ↵Trond Myklebust
EISDIR" This reverts commit 80e60639f1b7c121a7fea53920c5a4b94009361a. This change requires further fixes to ensure that the open doesn't succeed if the lookup later results in a regular file being created. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-16Regression: fix mounting NFS when NFSv3 support is not compiledPaulius Zaleckas
Trying to mount NFS (root partition in my case) fails if CONFIG_NFS_V3 is not selected. nfs_validate_mount_data() returns EPROTONOSUPPORT, because of this check: #ifndef CONFIG_NFS_V3 if (args->version == 3) goto out_v3_not_compiled; #endif /* !CONFIG_NFS_V3 */ and args->version was always initialized to 3. It was working in 2.6.36 Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-16NLM: Fix a regression in lockdTrond Myklebust
Nick Bowler reports: There are no unusual messages on the client... but I just logged into the server and I see lots of messages of the following form: nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! Bisected to commit 9247685088398cf21bcb513bd2832b4cd42516c4 (SUNRPC: Properly initialize sock_xprt.srcaddr in all cases) Apparently, removing the 'transport->srcaddr.ss_family = family' from xs_create_sock() triggers this due to nlmclnt_lookup_host() incorrectly initialising the srcaddr family to AF_UNSPEC. Reported-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Fix inode deallocation race
2010-11-15GFS2: Fix inode deallocation raceSteven Whitehouse
This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-15ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()Greg Thelen
Using: - CONFIG_LOCKUP_DETECTOR=y - CONFIG_PREEMPT=y - CONFIG_LOCKDEP=y - CONFIG_PROVE_LOCKING=y - CONFIG_PROVE_RCU=y found a missing rcu lock during boot on a 512 MiB x86_64 ubuntu vm: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- kernel/pid.c:419 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by ureadahead/1355: #0: (tasklist_lock){.+.+..}, at: [<ffffffff8115bc09>] sys_ioprio_set+0x7f/0x29e stack backtrace: Pid: 1355, comm: ureadahead Not tainted 2.6.37-dbg-DEV #1 Call Trace: [<ffffffff8109c10c>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffff81088cbf>] find_task_by_pid_ns+0x44/0x5d [<ffffffff81088cfa>] find_task_by_vpid+0x22/0x24 [<ffffffff8115bc3e>] sys_ioprio_set+0xb4/0x29e [<ffffffff8147cf21>] ? trace_hardirqs_off_thunk+0x3a/0x3c [<ffffffff8105c409>] sysenter_dispatch+0x7/0x2c [<ffffffff8147cee2>] ? trace_hardirqs_on_thunk+0x3a/0x3f The fix is to: a) grab rcu lock in sys_ioprio_{set,get}() and b) avoid grabbing tasklist_lock. Discussion in: http://marc.info/?l=linux-kernel&m=128951324702889 Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Modified by Jens to remove the now redundant inner rcu lock and unlock since they are now protected by the outer lock. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-11-14Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Change some lock status member in ocfs2_lock_res to char.
2010-11-14[CIFS] fs/cifs/Kconfig: CIFS depends on CRYPTO_HMACSteve French
linux-2.6.37-rc1: I compiled a kernel with CIFS which subsequently failed with an error indicating it couldn't initialize crypto module "hmacmd5". CONFIG_CRYPTO_HMAC=y fixed the problem. This patch makes CIFS depend on CRYPTO_HMAC in kconfig. Signed-off-by: Jody Bruchon<jody@nctritech.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-11-13ocfs2: Change some lock status member in ocfs2_lock_res to char.Tao Ma
Commit 83fd9c7 changes l_level, l_requested and l_blocking of ocfs2_lock_res from int to unsigned char. But actually it is initially as -1(ocfs2_lock_res_init_common) which correspoding to 255 for unsigned char. So the whole dlm lock mechanism doesn't work now which means a disaster to ocfs2. Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-11-13cifs: don't take extra tlink reference in initiate_cifs_searchJeff Layton
It's possible for initiate_cifs_search to be called on a filp that already has private_data attached. If this happens, we'll end up calling cifs_sb_tlink, taking an extra reference to the tlink and attaching that to the cifsFileInfo. This leads to refcount leaks that manifest as a "stuck" cifsd at umount time. Fix this by only looking up the tlink for the cifsFile on the filp's first pass through this function. When called on a filp that already has cifsFileInfo associated with it, just use the tlink reference that it already owns. This patch fixes samba.org bug 7792: https://bugzilla.samba.org/show_bug.cgi?id=7792 Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-11-12Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits) block: remove unused copy_io_context() Documentation: remove anticipatory scheduler info block: remove REQ_HARDBARRIER ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2) ioprio: fix RCU locking around task dereference block: ioctl: fix information leak to userland block: read i_size with i_size_read() cciss: fix proc warning on attempt to remove non-existant directory bio: take care not overflow page count when mapping/copying user data block: limit vec count in bio_kmalloc() and bio_alloc_map_data() block: take care not to overflow when calculating total iov length block: check for proper length of iov entries in blk_rq_map_user_iov() cciss: remove controllers supported by hpsa cciss: use usleep_range not msleep for small sleeps cciss: limit commands allocated on reset_devices cciss: Use kernel provided PCI state save and restore functions cciss: fix board status waiting code drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code ...
2010-11-12Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: remove incorrect assert in xfs_vm_writepage xfs: use hlist_add_fake xfs: fix a few compiler warnings with CONFIG_XFS_QUOTA=n xfs: tell lockdep about parent iolock usage in filestreams xfs: move delayed write buffer trace xfs: fix per-ag reference counting in inode reclaim tree walking xfs: xfs_ioctl: fix information leak to userland xfs: remove experimental tag from the delaylog option
2010-11-12Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: locks: remove dead lease error-handling code locks: fix leak on merging leases nfsd4: fix 4.1 connection registration race