summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-06-09Btrfs: unlock the trans lock properlyJosef Bacik
In btrfs_wait_for_commit if we came upon a transaction that had committed we just exited, but that's bad since we are holding the trans_lock. So break instead so that the lock is dropped. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-09Btrfs: don't map extent buffer if path->skip_locking is setJosef Bacik
Arne's scrub stuff exposed a problem with mapping the extent buffer in reada_for_search. He searches the commit root with multiple threads and with skip_locking set, so we can race and overwrite node->map_token since node isn't locked. So fix this so that we only map the extent buffer if we don't already have a map_token and skip_locking isn't set. Without this patch scrub would panic almost immediately, with the patch it doesn't panic anymore. Thanks, Reported-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-09Merge branch 'urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent
2011-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: trivial: add space in fsc error message cifs: silence printk when establishing first session on socket CIFS ACL support needs CONFIG_KEYS, so depend on it possible memory corruption in cifs_parse_mount_options() cifs: make CIFS depend on CRYPTO_ECB cifs: fix the kernel release version in the default security warning message
2011-06-08Btrfs: fix duplicate checking logicJosef Bacik
When merging my code into the integration test the second check for duplicate entries got screwed up. This patch fixes it by dropping ret2 and just using ret for the return value, and checking if we got an error before adding the bitmap to the local list. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: fix the allocator loop logicJosef Bacik
I was testing with empty_cluster = 0 to try and reproduce a problem and kept hitting early enospc panics. This was because our loop logic was a little confused. So this is what I did 1) Make the loop variable the ultimate decider on wether we should loop again isntead of checking to see if we had an uncached bg, empty size or empty cluster. 2) Increment loop before checking to see what we are on to make the loop definitions make more sense. 3) If we are on the chunk alloc loop don't set empty_size/empty_cluster to 0 unless we didn't actually allocate a chunk. If we did allocate a chunk we should be able to easily setup a new cluster so clearing empty_size/empty_cluster makes us less efficient. This kept me from hitting panics while trying to reproduce the other problem. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: fix bitmap regressionJosef Bacik
In cleaning up the clustering code I accidently introduced a regression by adding bitmap entries to the cluster rb tree. The problem is if we've maxed out the number of bitmaps we can have for the block group we can only add free space to the bitmaps, but since the bitmap is on the cluster we can't find it and we try to create another one. This would result in a panic because the total bitmaps was bigger than the max bitmaps that were allowed. This patch fixes this by checking to see if we have a cluster, and then looking at the cluster rb tree to see if it has a bitmap entry and if it does and that space belongs to that bitmap, go ahead and add it to that bitmap. I could hit this panic every time with an fs_mark test within a couple of minutes. With this patch I no longer hit the panic and fs_mark goes to completion. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: don't commit the transaction if we dont have enough pinned bytesJosef Bacik
I noticed when running an enospc test that we would get stuck committing the transaction in check_data_space even though we truly didn't have enough space. So check to see if bytes_pinned is bigger than num_bytes, if it's not don't commit the transaction. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: noinline the cluster searching functionsJosef Bacik
When profiling the find cluster code it's hard to tell where we are spending our time because the bitmap and non-bitmap functions get inlined by the compiler, so make that not happen. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: cache bitmaps when searching for a clusterJosef Bacik
If we are looking for a cluster in a particularly sparse or fragmented block group, we will do a lot of looping through the free space tree looking for various things, and if we need to look at bitmaps we will endup doing the whole dance twice. So instead add the bitmap entries to a temporary list so if we have to do the bitmap search we can just look through the list of entries we've found quickly instead of having to loop through the entire tree again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08cifs: trivial: add space in fsc error messageJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-08ceph: unwind canceled flock stateSage Weil
If we request a lock and then abort (e.g., ^C), we need to send a matching unlock request to the MDS to unwind our lock attempt to avoid indefinitely blocking other clients. Reported-by: Brian Chrisman <brchrisman@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-08ceph: fix ENOENT logic in striped_readSage Weil
Getting ENOENT is equivalent to reading 0 bytes. Make that correction before setting up the hit_stripe and was_short flags. Fixes the following case: dd if=/dev/zero of=/mnt/fs_depot/dd3 bs=1 seek=1048576 count=0 dd if=/mnt/fs_depot/dd3 of=/root/ddout1 skip=8 bs=500 count=2 iflag=direct Reported-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-08ceph: fix short sync reads from the OSDSage Weil
If we get a short read from the OSD because the object is small, we need to zero the remainder of the buffer. For O_DIRECT reads, the attempted range is not trimmed to i_size by the VFS, so we were actually looping indefinitely. Fix by trimming by i_size, and the unconditionally zeroing the trailing range. Reported-by: Jeff Wu <cpwu@tnsoft.com.cn> Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-08ceph: use ihold when we already have an inode refSage Weil
We should use ihold whenever we already have a stable inode ref, even when we aren't holding i_lock. This avoids adding new and unnecessary locking dependencies. Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6: fat: Fix corrupt inode flags when remove ATTR_SYS flag
2011-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Processes waiting on inode glock that no processes are holding
2011-06-08Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: vfs: make unlink() and rmdir() return ENOENT in preference to EROFS lmLogOpen() broken failure exit usb: remove bad dput after dentry_unhash more conservative S_NOSEC handling
2011-06-07vfs: make unlink() and rmdir() return ENOENT in preference to EROFSTheodore Ts'o
If user space attempts to remove a non-existent file or directory, and the file system is mounted read-only, return ENOENT instead of EROFS. Either error code is arguably valid/correct, but ENOENT is a more specific error message. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-07lmLogOpen() broken failure exitAl Viro
Callers of lmLogOpen() expect it to return -E... on failure exits, which is what it returns, except for the case of blkdev_get_by_dev() failure. It that case lmLogOpen() return the error with the wrong sign... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2011-06-07cifs: silence printk when establishing first session on socketJeff Layton
When signing is enabled, the first session that's established on a socket will cause a printk like this to pop: CIFS VFS: Unexpected SMB signature This is because the key exchange hasn't happened yet, so the signature field is bogus. Don't try to check the signature on the socket until the first session has been established. Also, eliminate the specific check for SMB_COM_NEGOTIATE since this check covers that case too. Cc: stable@kernel.org Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-06CIFS ACL support needs CONFIG_KEYS, so depend on itDarren Salt
Build fails if CONFIG_KEYS is not selected. Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-06possible memory corruption in cifs_parse_mount_options()Vasily Averin
error path after mountdata check frees uninitialized mountdata_copy Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) btrfs: fix uninitialized variable warning btrfs: add helper for fs_info->closing Btrfs: add mount -o inode_cache btrfs: scrub: add explicit plugging btrfs: use btrfs_ino to access inode number Btrfs: don't save the inode cache if we are deleting this root btrfs: false BUG_ON when degraded Btrfs: don't save the inode cache in non-FS roots Btrfs: make sure we don't overflow the free space cache crc page Btrfs: fix uninit variable in the delayed inode code btrfs: scrub: don't reuse bios and pages Btrfs: leave spinning on lookup and map the leaf Btrfs: check for duplicate entries in the free space cache Btrfs: don't try to allocate from a block group that doesn't have enough space Btrfs: don't always do readahead Btrfs: try not to sleep as much when doing slow caching Btrfs: kill BTRFS_I(inode)->block_group Btrfs: don't look at the extent buffer level 3 times in a row Btrfs: map the node block when looking for readahead targets Btrfs: set range_start to the right start in count_range_bits ...
2011-06-04btrfs: fix uninitialized variable warningDavid Sterba
With Linus' tree, today's linux-next build (powercp ppc64_defconfig) produced this warning: fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode': fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used uninitialized in this function Introduced by commit 16cdcec736cd ("btrfs: implement delayed inode items operation"). This fixes a bug in btrfs_update_inode(): if the returned value from btrfs_delayed_update_inode is a nonzero garbage, inode stat data are not updated and several call paths may hit a BUG_ON or fail with strange code. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David Sterba <dsterba@suse.cz>
2011-06-04btrfs: add helper for fs_info->closingDavid Sterba
wrap checking of filesystem 'closing' flag and fix a few missing memory barriers. Signed-off-by: David Sterba <dsterba@suse.cz>
2011-06-04Btrfs: add mount -o inode_cacheChris Mason
This makes the inode map cache default to off until we fix the overflow problem when the free space crcs don't fit inside a single page. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04btrfs: scrub: add explicit pluggingArne Jansen
With the removal of the implicit plugging scrub ends up doing more and smaller I/O than necessary. This patch adds explicit plugging per chunk. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04btrfs: use btrfs_ino to access inode numberDavid Sterba
commit 4cb5300bc ("Btrfs: add mount -o auto_defrag") accesses inode number directly while it should use the helper with the new inode number allocator. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04Btrfs: don't save the inode cache if we are deleting this rootJosef Bacik
With xfstest 254 I can panic the box every time with the inode number caching stuff on. This is because we clean the inodes out when we delete the subvolume, but then we write out the inode cache which adds an inode to the subvolume inode tree, and then when it gets evicted again the root gets added back on the dead roots list and is deleted again, so we have a double free. To stop this from happening just return 0 if refs is 0 (and we're not the tree root since tree root always has refs of 0). With this fix 254 no longer panics. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04btrfs: false BUG_ON when degradedArne Jansen
In degraded mode the struct btrfs_device of missing devs don't have device->name set. A kstrdup of NULL correctly returns NULL. Don't BUG in this case. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04Btrfs: don't save the inode cache in non-FS rootsliubo
This adds extra checks to make sure the inode map we are caching really belongs to a FS root instead of a special relocation tree. It prevents crashes during balancing operations. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04Btrfs: make sure we don't overflow the free space cache crc pageChris Mason
The free space cache uses only one page for crcs right now, which means we can't have a cache file bigger than the crcs we can fit in the first page. This adds a check to enforce that restriction. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04Btrfs: fix uninit variable in the delayed inode codeChris Mason
The nitems counter needs to start at zero Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-04btrfs: scrub: don't reuse bios and pagesArne Jansen
The current scrub implementation reuses bios and pages as often as possible, allocating them only on start and releasing them when finished. This leads to more problems with the block layer than it's worth. The elevator gets confused when there are more pages added to the bio than bi_size suggests. This patch completely rips out the reuse of bios and pages and allocates them freshly for each submit. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Maosn <chris.mason@oracle.com>
2011-06-03Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-block: block: Use hlist_entry() for io_context.cic_list.first cfq-iosched: Remove bogus check in queue_fail path xen/blkback: potential null dereference in error handling xen/blkback: don't call vbd_size() if bd_disk is NULL block: blkdev_get() should access ->bd_disk only after success CFQ: Fix typo and remove unnecessary semicolon block: remove unwanted semicolons Revert "block: Remove extra discard_alignment from hd_struct." nbd: adjust 'max_part' according to part_shift nbd: limit module parameters to a sane value nbd: pass MSG_* flags to kernel_recvmsg() block: improve the bio_add_page() and bio_add_pc_page() descriptions
2011-06-03Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6Linus Torvalds
* 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: fix-up free space earlier UBIFS: intialize LPT earlier UBIFS: assert no fixup when writing a node UBIFS: fix clean znode counter corruption in error cases UBIFS: fix memory leak on error path UBIFS: fix shrinker object count reports UBIFS: fix recovery broken by the previous recovery fix UBIFS: amend ubifs_recover_leb interface UBIFS: introduce a "grouped" journal head flag UBIFS: supress false error messages
2011-06-03more conservative S_NOSEC handlingAl Viro
Caching "we have already removed suid/caps" was overenthusiastic as merged. On network filesystems we might have had suid/caps set on another client, silently picked by this client on revalidate, all of that *without* clearing the S_NOSEC flag. AFAICS, the only reasonably sane way to deal with that is * new superblock flag; unless set, S_NOSEC is not going to be set. * local block filesystems set it in their ->mount() (more accurately, mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than local block ones clear it) * if any network filesystem (or a cluster one) wants to use S_NOSEC, it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when inode attribute changes are picked from other clients. It's not an earth-shattering hole (anybody that can set suid on another client will almost certainly be able to write to the file before doing that anyway), but it's a bug that needs fixing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-03cifs: make CIFS depend on CRYPTO_ECBSuresh Jayaraman
When CONFIG_CRYPTO_ECB is not set, trying to mount a CIFS share with NTLM security resulted in mount failure with the following error: "CIFS VFS: could not allocate des crypto API" Seems like a leftover from commit 43988d7. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> CC: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-03cifs: fix the kernel release version in the default security warning messageSuresh Jayaraman
When ntlm security mechanim is used, the message that warns about the upgrade to ntlmv2 got the kernel release version wrong (Blame it on Linus :). Fix it. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-03UBIFS: fix-up free space earlierBen Gardiner
The free space fixup is currently initiated during mount after the call to ubifs_write_master() which results in a write to PEBs; this has been observed with the patch 'assert no fixup when writing a node' applied: Move the free space fixup on mount to before the calls to ubifs_recover_inl_heads() and ubifs_write_master(). This results in no assertions with the previously mentioned patch applied. Artem: tweaked the patch a bit Signed-off-by: Ben Gardiner <bengardiner@nanometrics> Reviewed-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-03UBIFS: intialize LPT earlierBen Gardiner
The current 'mount_ubifs()' implementation does not initialize the LPT until the the master node is marked dirty. Move the LPT initialization to before marking the master node dirty. This is a preparation for the next patch which will move the free-space-fixup check to before marking the master node dirty, because we have to fix-up the free space before doing any writes. Artem: massaged the patch and commit message. Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca> Reviewed-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-03UBIFS: assert no fixup when writing a nodeBen Gardiner
The current free space fixup can result in some writing to the UBI volume when the space_fixup flag is set. To catch instances where UBIFS is writing to the NAND while the space_fixup flag is set, add an assert to ubifs_write_node(). Artem: tweaked the patch, added similar assertion to the write buffer write path. Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca> Reviewed-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-03UBIFS: fix clean znode counter corruption in error casesArtem Bityutskiy
UBIFS maintains per-filesystem and global clean znode counters ('c->clean_zn_cnt' and 'ubifs_clean_zn_cnt'). It is important to maintain correct values there since the shrinker relies on 'ubifs_clean_zn_cnt'. However, in case of failures during commit the counters were corrupted. E.g., if a failure happens in the middle of 'write_index()', then some nodes in the commit list ('c->cnext') are marked as clean, and some are marked as dirty. And the 'ubifs_destroy_tnc_subtree()' frees does not retrun correct count, and we end up with non-zero 'c->clean_zn_cnt' when unmounting. This means that if we have 2 file-sytem and one of them fails, and we unmount it, 'ubifs_clean_zn_cnt' stays incorrect and confuses the shrinker. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-03UBIFS: fix memory leak on error pathArtem Bityutskiy
UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write failure because it forgets to free the 'struct ubifs_dent_node *dent' object. Although the object is small, the alignment can make it large - e.g., 2KiB if the min. I/O unit is 2KiB. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
2011-06-03UBIFS: fix shrinker object count reportsArtem Bityutskiy
Sometimes VM asks the shrinker to return amount of objects it can shrink, and we return the ubifs_clean_zn_cnt in that case. However, it is possible that this counter is negative for a short period of time, due to the way UBIFS TNC code updates it. And I can observe the following warnings sometimes: shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788 This patch makes sure UBIFS never returns negative count of objects. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
2011-06-01UBIFS: fix recovery broken by the previous recovery fixArtem Bityutskiy
Unfortunately, the recovery fix d1606a59b6be4ea392eabd40d1250aa1eeb19efb (UBIFS: fix extremely rare mount failure) broke recovery. This commit make UBIFS drop the last min. I/O unit in all journal heads, but this is needed only for the GC head. And this does not work for non-GC heads. For example, if suppose we have min. I/O units A and B, and A contains a valid node X, which was fsynced, and then a group of nodes Y which spans the rest of A and B. In this case we'll drop not only Y, but also X, which is obviously incorrect. This patch fixes the issue and additionally makes recovery to drop last min. I/O unit only for the GC head, and leave things as they have been for ages for the other heads - this is safer. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-01UBIFS: amend ubifs_recover_leb interfaceArtem Bityutskiy
Instead of passing "grouped" parameter to 'ubifs_recover_leb()' which tells whether the nodes are grouped in the LEB to recover, pass the journal head number and let 'ubifs_recover_leb()' look at the journal head's 'grouped' flag. This patch is a preparation to a further fix where we'll need to know the journal head number for other purposes. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-01UBIFS: introduce a "grouped" journal head flagArtem Bityutskiy
Journal heads are different in a way how UBIFS writes nodes there. All normal journal heads receive grouped nodes, while the GC journal heads receives ungrouped nodes. This patch adds a 'grouped' flag to 'struct ubifs_jhead' which describes this property. This patch is a preparation to a further recovery fix. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-06-01UBIFS: supress false error messagesArtem Bityutskiy
Commit ab51afe05273741f72383529ef488aa1ea598ec6 was a good clean-up, but it introduced a regression - now UBIFS prints scary error messages during recovery on all corrupted nodes, even though the corruptions are expected (due to a power cut). This patch fixes the issue. Additionally fix a typo in a commentary introduced by the same commit. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>