summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2011-11-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits) Btrfs: check for a null fs root when writing to the backup root log Btrfs: fix race during transaction joins Btrfs: fix a potential btrfs_bio leak on scrub fixups Btrfs: rename btrfs_bio multi -> bbio for consistency Btrfs: stop leaking btrfs_bios on readahead Btrfs: stop the readahead threads on failed mount Btrfs: fix extent_buffer leak in the metadata IO error handling Btrfs: fix the new inspection ioctls for 32 bit compat Btrfs: fix delayed insertion reservation Btrfs: ClearPageError during writepage and clean_tree_block Btrfs: be smarter about committing the transaction in reserve_metadata_bytes Btrfs: make a delayed_block_rsv for the delayed item insertion Btrfs: add a log of past tree roots btrfs: separate superblock items out of fs_info Btrfs: use the global reserve when truncating the free space cache inode Btrfs: release metadata from global reserve if we have to fallback for unlink Btrfs: make sure to flush queued bios if write_cache_pages waits Btrfs: fix extent pinning bugs in the tree log Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN Btrfs: don't wait as long for more batches during SSD log commit ...
2011-11-07Merge branch 'writeback-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Add a 'reason' to wb_writeback_work writeback: send work item to queue_io, move_expired_inodes writeback: trace event balance_dirty_pages writeback: trace event bdi_dirty_ratelimit writeback: fix ppc compile warnings on do_div(long long, unsigned long) writeback: per-bdi background threshold writeback: dirty position control - bdi reserve area writeback: control dirty pause time writeback: limit max dirty pause time writeback: IO-less balance_dirty_pages() writeback: per task dirty rate limit writeback: stabilize bdi->dirty_ratelimit writeback: dirty rate control writeback: add bg_threshold parameter to __bdi_update_bandwidth() writeback: dirty position control writeback: account per-bdi accumulated dirtied pages
2011-11-06Btrfs: check for a null fs root when writing to the backup root logChris Mason
During log replay, can commit the transaction before the fs_root pointers are setup, so we have to make sure they are not null before trying to use them. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: fix race during transaction joinsChris Mason
While we're allocating ram for a new transaction, we drop our spinlock. When we get the lock back, we do check to see if a transaction started while we slept, but we don't check to make sure it isn't blocked because a commit has already started. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: fix a potential btrfs_bio leak on scrub fixupsIlya Dryomov
In case we were able to map less than we wanted (length < PAGE_SIZE clause is true) btrfs_bio is still allocated and we have to free it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: rename btrfs_bio multi -> bbio for consistencyIlya Dryomov
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: stop leaking btrfs_bios on readaheadIlya Dryomov
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: stop the readahead threads on failed mountChris Mason
If we don't stop them, they linger around corrupting memory by using pointers to freed things. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: fix extent_buffer leak in the metadata IO error handlingChris Mason
The scrub readahead branch brought in a new error handling hook, but it was leaking extent_buffer references. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: fix the new inspection ioctls for 32 bit compatChris Mason
The new ioctls to follow backrefs are not clean for 32/64 bit compat. This reworks them for u64s everywhere. They are brand new, so there are no problems with changing the interface now. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Merge git://git.jan-o-sch.net/btrfs-unstable into integrationChris Mason
Conflicts: fs/btrfs/Makefile fs/btrfs/extent_io.c fs/btrfs/extent_io.h fs/btrfs/scrub.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Merge branch 'for-chris' of git://github.com/sensille/linux into integrationChris Mason
Conflicts: fs/btrfs/ctree.h Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: fix delayed insertion reservationJosef Bacik
We all keep getting those stupid warnings from use_block_rsv when running stress.sh, and it's because the delayed insertion stuff is being stupid. It's not the delayed insertion stuffs fault, it's all just stupid. When marking an inode dirty for oh say updating the time on it, we just do a btrfs_join_transaction, which doesn't reserve any space. This is stupid because we're going to have to have space reserve to make this change, but we do it because it's fast because chances are we're going to call it over and over again and it doesn't matter. Well thanks to the delayed insertion stuff this is mostly the case, so we do actually need to make this reservation. So if trans->bytes_reserved is 0 then try to do a normal reservation. If not return ENOSPC which will make the btrfs_dirty_inode start a proper transaction which will let it do the whole ENOSPC dance and reserve enough space for the delayed insertion to steal the reservation from the transaction. The other stupid thing we do is not reserve space for the inode when writing to the thing. Usually this is ok since we have to update the time so we'd have already done all this work before we get to the endio stuff, so it doesn't matter. But this is stupid because we could write the data after the transaction commits where we changed the mtime of the inode so we have to cow all the way down to the inode anyway. This used to be masked by the delalloc reservation stuff, but because we delay the update it doesn't get masked in this case. So again the delayed insertion stuff bites us in the ass. So if our trans->block_rsv is delalloc, just steal the reservation from the delalloc reserve. Hopefully this won't bite us in the ass, but I've said that before. With this patch stress.sh no longer spits out those stupid warnings (famous last words). Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: ClearPageError during writepage and clean_tree_blockChris Mason
Failure testing was tripping up over stale PageError bits in metadata pages. If we have an io error on a block, and later on end up reusing it, nobody ever clears PageError on those pages. During commit, we'll find PageError and think we had trouble writing the block, which will lead to aborts and other problems. This changes clean_tree_block and the btrfs writepage code to clear the PageError bit. In both cases we're either completely done with the page or the page has good stuff and the error bit is no longer valid. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: be smarter about committing the transaction in reserve_metadata_bytesJosef Bacik
Because of the overcommit stuff I had to make it so that we committed the transaction all the time in reserve_metadata_bytes in case we had overcommitted because of delayed items. This was because previously we had no way of knowing how much space was reserved for delayed items. Now that we have the delayed_block_rsv we can check it to see if committing the transaction would get us anywhere. This patch breaks out the committing logic into a helper function that will check to see if committing the transaction would free enough space for us to get anything done. With this patch xfstests 83 goes from taking 445 seconds to taking 28 seconds on my box. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: make a delayed_block_rsv for the delayed item insertionJosef Bacik
I've been hitting warnings in use_block_rsv when running the delayed insertion stuff. It's because we will readjust global block rsv based on what is in use, which means we could end up discarding reservations that are for the delayed insertion stuff. So instead create a seperate block rsv for the delayed insertion stuff. This will also make it easier to debug problems with the delayed insertion reservations since we will know that only the delayed insertion code touches this block_rsv. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: add a log of past tree rootsChris Mason
This takes some of the free space in the btrfs super block to record information about most of the roots in the last four commits. It also adds a -o recovery to use the root history log when we're not able to read the tree of tree roots, the extent tree root, the device tree root or the csum root. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06btrfs: separate superblock items out of fs_infoDavid Sterba
fs_info has now ~9kb, more than fits into one page. This will cause mount failure when memory is too fragmented. Top space consumers are super block structures super_copy and super_for_commit, ~2.8kb each. Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64) Add a wrapper for freeing fs_info and all of it's dynamically allocated members. Signed-off-by: David Sterba <dsterba@suse.cz>
2011-11-06Btrfs: use the global reserve when truncating the free space cache inodeJosef Bacik
We no longer use the orphan block rsv for holding the reservation for truncating the inode, so instead use the global block rsv and check to make sure it has enough space for us to truncate the space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: release metadata from global reserve if we have to fallback for unlinkJosef Bacik
I fixed a problem where we weren't reserving space for an orphan item when we had to fallback to using the global reserve for an unlink, but I introduced another problem. I was migrating the bytes from the transaction reserve to the global reserve and then releasing from the global reserve in btrfs_end_transaction(). The problem with this is that a migrate will jack up the size for the destination, but leave the size alone for the source, with the idea that you can do a release normally on the source and it all washes out, and then you can do a release again on the destination and it works out right. My way was skipping the release on the trans_block_rsv which still had the jacked up size from our original reservation. So instead release manually from the global reserve if this transaction was using it, and then set the trans->block_rsv back to the trans_block_rsv so that btrfs_end_transaction cleans everything up properly. With this patch xfstest 83 doesn't emit warnings about leaking space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: make sure to flush queued bios if write_cache_pages waitsChris Mason
write_cache_pages tries to build up a large bio to stuff down the pipe. But if it needs to wait for a page lock, it needs to make sure and send down any pending writes so we don't deadlock with anyone who has the page lock and is waiting for writeback of things inside the bio. Dave Sterba triggered this as a deadlock between the autodefrag code and the extent write_cache_pages Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: fix extent pinning bugs in the tree logChris Mason
The tree log had two important bugs that could cause corruptions after a crash. Sometimes we were allowing tree log blocks to be reused after the tree log was committed but before the transaction commit was done. This allowed a future metadata write to overwrite the tree log data. It is fixed by adding a new variant of freeing reserved extents that always pins them. Credit goes to Stefan Behrens and Arne Jansen for many many hours spent tracking this bug down. During tree log replay, we do a pass through the tree log and pin all the extents we find. This makes sure the replay code won't go in and use any of those blocks for new allocations during replay. The problem is the free space cache isn't honoring these pinned extents. So the allocator can end up handing them out, leading to all kinds of problems during replay. The fix here is to force any free space cache to load while we pin the extents, and then to make sure we remove the pinned extents from the free space rbtree. Signed-off-by: Chris Mason <chris.mason@oracle.com> Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
2011-11-06Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAINChris Mason
btrfs_remove_free_space needs to make sure to set ret back to a valid return value after setting it to EAGAIN, otherwise we return it to the callers. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Btrfs: don't wait as long for more batches during SSD log commitChris Mason
When we're doing log commits, we try to wait for more writers to come in and make the commit bigger. This helps improve performance on rotating disks, but on SSDs it adds latencies. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-02filesystems: add set_nlink()Miklos Szeredi
Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-30writeback: Add a 'reason' to wb_writeback_workCurt Wohlgemuth
This creates a new 'reason' field in a wb_writeback_work structure, which unambiguously identifies who initiates writeback activity. A 'wb_reason' enumeration has been added to writeback.h, to enumerate the possible reasons. The 'writeback_work_class' and tracepoint event class and 'writeback_queue_io' tracepoints are updated to include the symbolic 'reason' in all trace events. And the 'writeback_inodes_sbXXX' family of routines has had a wb_stats parameter added to them, so callers can specify why writeback is being started. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-28Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits) leases: fix write-open/read-lease race nfs: drop unnecessary locking in llseek ext4: replace cut'n'pasted llseek code with generic_file_llseek_size vfs: add generic_file_llseek_size vfs: do (nearly) lockless generic_file_llseek direct-io: merge direct_io_walker into __blockdev_direct_IO direct-io: inline the complete submission path direct-io: separate map_bh from dio direct-io: use a slab cache for struct dio direct-io: rearrange fields in dio/dio_submit to avoid holes direct-io: fix a wrong comment direct-io: separate fields only used in the submission path from struct dio vfs: fix spinning prevention in prune_icache_sb vfs: add a comment to inode_permission() vfs: pass all mask flags check_acl and posix_acl_permission vfs: add hex format for MAY_* flag values vfs: indicate that the permission functions take all the MAY_* flags compat: sync compat_stats with statfs. vfs: add "device" tag to /proc/self/mountstats cleanup: vfs: small comment fix for block_invalidatepage ... Fix up trivial conflict in fs/gfs2/file.c (llseek changes)
2011-10-28vfs: do (nearly) lockless generic_file_llseekAndi Kleen
The i_mutex lock use of generic _file_llseek hurts. Independent processes accessing the same file synchronize over a single lock, even though they have no need for synchronization at all. Under high utilization this can cause llseek to scale very poorly on larger systems. This patch does some rethinking of the llseek locking model: First the 64bit f_pos is not necessarily atomic without locks on 32bit systems. This can already cause races with read() today. This was discussed on linux-kernel in the past and deemed acceptable. The patch does not change that. Let's look at the different seek variants: SEEK_SET: Doesn't really need any locking. If there's a race one writer wins, the other loses. For 32bit the non atomic update races against read() stay the same. Without a lock they can also happen against write() now. The read() race was deemed acceptable in past discussions, and I think if it's ok for read it's ok for write too. => Don't need a lock. SEEK_END: This behaves like SEEK_SET plus it reads the maximum size too. Reading the maximum size would have the 32bit atomic problem. But luckily we already have a way to read the maximum size without locking (i_size_read), so we can just use that instead. Without i_mutex there is no synchronization with write() anymore, however since the write() update is atomic on 64bit it just behaves like another racy SEEK_SET. On non atomic 32bit it's the same as SEEK_SET. => Don't need a lock, but need to use i_size_read() SEEK_CUR: This has a read-modify-write race window on the same file. One could argue that any application doing unsynchronized seeks on the same file is already broken. But for the sake of not adding a regression here I'm using the file->f_lock to synchronize this. Using this lock is much better than the inode mutex because it doesn't synchronize between processes. => So still need a lock, but can use a f_lock. This patch implements this new scheme in generic_file_llseek. I dropped generic_file_llseek_unlocked and changed all callers. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-25Merge branch 'next' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds
* 'next' of git://selinuxproject.org/~jmorris/linux-security: (95 commits) TOMOYO: Fix incomplete read after seek. Smack: allow to access /smack/access as normal user TOMOYO: Fix unused kernel config option. Smack: fix: invalid length set for the result of /smack/access Smack: compilation fix Smack: fix for /smack/access output, use string instead of byte Smack: domain transition protections (v3) Smack: Provide information for UDS getsockopt(SO_PEERCRED) Smack: Clean up comments Smack: Repair processing of fcntl Smack: Rule list lookup performance Smack: check permissions from user space (v2) TOMOYO: Fix quota and garbage collector. TOMOYO: Remove redundant tasklist_lock. TOMOYO: Fix domain transition failure warning. TOMOYO: Remove tomoyo_policy_memory_lock spinlock. TOMOYO: Simplify garbage collector. TOMOYO: Fix make namespacecheck warnings. target: check hex2bin result encrypted-keys: check hex2bin result ...
2011-10-24btrfs: ratelimit WARN_ON in use_block_rsvDavid Sterba
The WARN_ON under some circumstances heavily polute log and slow down the machine. This is just a safety, as the warning should be fixed by another patch, nevertheless, it still pops up during testing. Signed-off-by: David Sterba <dsterba@suse.cz>
2011-10-24Merge branch 'hotfixes-20111024/josef/for-chris' into btrfs-next-stableDavid Sterba
2011-10-24Merge remote-tracking branch 'remotes/josef/for-chris' into btrfs-next-stableDavid Sterba
2011-10-24btrfs: do not allow mounting non-subvolumes via subvol optionDavid Sterba
There's a missing test whether the path passed to subvol=path option during mount is a real subvolume, allowing any directory located in default subovlume to be passed and accepted for mount. (current btrfs progs prevent this early) $ btrfs subvol snapshot . p1-snap ERROR: '.' is not a subvolume (with "is subvolume?" test bypassed) $ btrfs subvol snapshot . p1-snap Create a snapshot of '.' in './p1-snap' $ btrfs subvol list -p . ID 258 parent 5 top level 5 path subvol ID 259 parent 5 top level 5 path subvol1 ID 260 parent 5 top level 5 path default-subvol1 ID 262 parent 5 top level 5 path p1/p1-snapshot ID 263 parent 259 top level 5 path subvol1/subvol1-snap The problem I see is that this makes a false impression of snapshotting the given subvolume but in fact snapshots the default one: a user expects outcome like ID 263 but in fact gets ID 262 . This patch makes mount fail with EINVAL with a message in syslog. Signed-off-by: David Sterba <dsterba@suse.cz>
2011-10-20Btrfs: close all bdevs on mount failureIlya Dryomov
Fix a bug introduced by 20b45077. We have to return EINVAL on mount failure, but doing that too early in the sequence leaves all of the devices opened exclusively. This also fixes an issue where under some scenarios only a second mount -o degraded <devices> command would succeed. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-10-20Btrfs: fix a bug when opening seed devicesIlya Dryomov
Initialize fs_info->bdev_holder a bit earlier to be able to pass a correct holder id to blkdev_get() when opening seed devices with O_EXCL. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-10-20btrfs: fix oops on failure pathDaniel J Blueman
If lookup_extent_backref fails, path->nodes[0] reasonably could be null along with other callers of btrfs_print_leaf, so ensure we have a valid extent buffer before dereferencing. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
2011-10-20Btrfs: fix race between multi-task space allocation and caching spaceMiao Xie
The task may fail to get free space though it is enough when multi-task space allocation and caching space happen at the same time. Task1 Caching Thread Task2 ------------------------------------------------------------------------ find_free_extent The space has not be cached, and start caching thread. And wait for it. cache space, if the space is > 2MB wake up Task1 find_free_extent get all the space that is cached. try to allocate space, but there is no space now. trigger BUG_ON() The message is following: btrfs allocation failed flags 1, wanted 4096 space_info has 1040187392 free, is not full space_info total=1082130432, used=4096, pinned=41938944, reserved=0, may_use=40828928, readonly=0 block group 12582912 has 8388608 bytes, 0 used 8388608 pinned 0 reserved block group has cluster?: no 0 blocks of free space at or bigger than bytes is block group 1103101952 has 1073741824 bytes, 4096 used 33550336 pinned 0 reserved block group has cluster?: no 0 blocks of free space at or bigger than bytes is ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:835! [<ffffffffa031261b>] __extent_writepage+0x1bf/0x5ce [btrfs] [<ffffffff810cbcb8>] ? __set_page_dirty_nobuffers+0xfe/0x108 [<ffffffffa02f8ada>] ? wait_current_trans+0x23/0xec [btrfs] [<ffffffff810c3fbf>] ? find_get_pages_tag+0x73/0xe2 [<ffffffffa0312d12>] extent_write_cache_pages.clone.0+0x176/0x29a [btrfs] [<ffffffffa0312e74>] extent_writepages+0x3e/0x53 [btrfs] [<ffffffff8110ad2c>] ? do_sync_write+0xc6/0x103 [<ffffffffa0302d6e>] ? btrfs_submit_direct+0x414/0x414 [btrfs] [<ffffffff811380fa>] ? fsnotify+0x236/0x266 [<ffffffffa02fc930>] btrfs_writepages+0x22/0x24 [btrfs] [<ffffffff810cc215>] do_writepages+0x1c/0x25 [<ffffffff810c4958>] __filemap_fdatawrite_range+0x4e/0x50 [<ffffffff810c4982>] filemap_write_and_wait_range+0x28/0x51 [<ffffffffa0306b2e>] btrfs_sync_file+0x7d/0x198 [btrfs] [<ffffffff8110aa26>] ? fsnotify_modify+0x5d/0x65 [<ffffffff8112d150>] vfs_fsync_range+0x18/0x21 [<ffffffff8112d170>] vfs_fsync+0x17/0x19 [<ffffffff8112d316>] do_fsync+0x29/0x3e [<ffffffff8112d348>] sys_fsync+0xb/0xf [<ffffffff81468352>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa02fe08c>] cow_file_range+0x1c4/0x32b [btrfs] We fix this bug by trying to allocate the space again if there are block groups in caching. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2011-10-20Btrfs: fix return value of btrfs_get_acl()Tsutomu Itoh
In btrfs_get_acl(), when the second __btrfs_getxattr() call fails, acl is not correctly set. Therefore, a wrong value might return to the caller. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2011-10-20Btrfs: pass the correct root to lookup_free_space_inode()Ilya Dryomov
Free space items are located in tree of tree roots, not in the extent tree. It didn't pop up because lookup_free_space_inode() grabs the inode all the time instead of actually searching the tree. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-10-20Btrfs: do not set EXTENT_DIRTY along with EXTENT_DELALLOCLiu Bo
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2011-10-20Btrfs: fix direct-io vs nodatacowLi Zefan
To reproduce the bug: # mount -o nodatacow /dev/sda7 /mnt/ # dd if=/dev/zero of=/mnt/tmp bs=4K count=1 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000136115 s, 30.1 MB/s # dd if=/dev/zero of=/mnt/tmp bs=4K count=1 conv=notrunc oflag=direct dd: writing `/mnt/tmp': Input/output error 1+0 records in 0+0 records out btrfs_ordered_update_i_size() may return 1, but btrfs_endio_direct_write() mistakenly takes it as an error. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-10-20Btrfs: remove BUG_ON() in compress_file_range()Li Zefan
It's not a big deal if we fail to allocate the array, and instead of panic we can just give up compressing. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-10-20Btrfs: fix array bound checkingLi Zefan
Otherwise we can execced the array bound of path->slots[]. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-10-20btrfs: return EINVAL if start > total_bytes in fitrim ioctlLukas Czerner
We should retirn EINVAL if the start is beyond the end of the file system in the btrfs_ioctl_fitrim(). Fix that by adding the appropriate check for it. Also in the btrfs_trim_fs() it is possible that len+start might overflow if big values are passed. Fix it by decrementing the len so that start+len is equal to the file system size in the worst case. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2011-10-20Btrfs: honor extent thresh during defragmentationLi Zefan
We won't defrag an extent, if it's bigger than the threshold we specified and there's no small extent before it, but actually the code doesn't work this way. There are three bugs: - When should_defrag_range() decides we should keep on defragmenting an extent, last_len is not incremented. (old bug) - The length that passes to should_defrag_range() is not the length we're going to defrag. (new bug) - We always defrag 256K bytes data, and a big extent can be part of this range. (new bug) For a file with 4 extents: | 4K | 4K | 256K | 256K | The result of defrag with (the default) 256K extent thresh should be: | 264K | 256K | but with those bugs, we'll get: | 520K | Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-10-20btrfs: trivial fix, a potential memory leak in btrfs_parse_early_options()Jeff Liu
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
2011-10-20Btrfs: fix wrong max_to_defrag in btrfs_defrag_file()Li Zefan
It's off-by-one, and thus we may skip the last page while defragmenting. An example case: # create /mnt/file with 2 4K file extents # btrfs fi defrag /mnt/file # sync # filefrag /mnt/file /mnt/file: 2 extents found So it's not defragmented. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-10-20Btrfs: use i_size_read() in btrfs_defrag_file()Li Zefan
Don't use inode->i_size directly, since we're not holding i_mutex. This also fixes another bug, that i_size can change after it's checked against 0 and then (i_size - 1) can be negative. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-10-20Btrfs: fix defragmentation regressionLi Zefan
There's an off-by-one bug: # create a file with lots of 4K file extents # btrfs fi defrag /mnt/file # sync # filefrag -v /mnt/file Filesystem type is: 9123683e File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3372 64 1 64 3136 3435 1 2 65 3436 3136 64 3 129 3201 3499 1 4 130 3500 3201 64 5 194 3266 3563 1 6 195 3564 3266 64 7 259 3331 3627 1 8 260 3628 3331 40 eof After this patch: ... # filefrag -v /mnt/file Filesystem type is: 9123683e File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3372 300 eof /mnt/file: 1 extent found Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-10-20btrfs: fix memory leak in btrfs_defrag_fileDiego Calleja
kmemleak found this: unreferenced object 0xffff8801b64af968 (size 512): comm "btrfs-cleaner", pid 3317, jiffies 4306810886 (age 903.272s) hex dump (first 32 bytes): 00 82 01 07 00 ea ff ff c0 83 01 07 00 ea ff ff ................ 80 82 01 07 00 ea ff ff c0 87 01 07 00 ea ff ff ................ backtrace: [<ffffffff816875cc>] kmemleak_alloc+0x5c/0xc0 [<ffffffff8114aec3>] kmem_cache_alloc_trace+0x163/0x240 [<ffffffff8127a290>] btrfs_defrag_file+0xf0/0xb20 [<ffffffff8125d9a5>] btrfs_run_defrag_inodes+0x165/0x210 [<ffffffff812479d7>] cleaner_kthread+0x177/0x190 [<ffffffff81075c7d>] kthread+0x8d/0xa0 [<ffffffff816af5f4>] kernel_thread_helper+0x4/0x10 [<ffffffffffffffff>] 0xffffffffffffffff "pages" is not always freed. Fix it removing the unnecesary additional return. Signed-off-by: Diego Calleja <diegocg@gmail.com>