summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-05-25xfs: do not do pointer arithmetic on extent recordsChristoph Hellwig
We need to call xfs_iext_get_ext for the previous extent to get a valid pointer, and can't just do pointer arithmetics as they might be in different pages. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-25xfs: do not use unchecked extent indices in xfs_bunmapiChristoph Hellwig
Make sure to only call xfs_iext_get_ext after we've validate the extent index when moving on to the next index in xfs_bunmapi. Also remove the old workaround for too large indices that has been superceeded by the proper fix in xfs_bmap_del_extent. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-25xfs: do not use unchecked extent indices in xfs_bmapiChristoph Hellwig
Make sure to only call xfs_iext_get_ext after we've validate the extent index when moving on to the next index in xfs_bmapi. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-25xfs: do not use unchecked extent indices in xfs_bmap_add_extent_*Christoph Hellwig
Make sure to only call xfs_iext_get_ext after we've validate the extent index in the various xfs_bmap_add_extent_* helpers. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-25xfs: remove if_lastexChristoph Hellwig
The if_lastex field in struct xfs_ifork is only used as a temporary index during xfs_bmapi and xfs_bunmapi. Instead of using the inode fork to store it keep it local in the callchain. Fortunately this is very easy as we already pass a stack copy of it down the whole chain which can simplify be changed to be passed by reference. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-25xfs: remove the unused XFS_BMAPI_RSVBLOCKS flagChristoph Hellwig
The XFS_BMAPI_RSVBLOCKS is unused, and as far as I can see has always been. Remove it to simplify the bmapi implementation and conserve stack space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-24xfs: do not discard alloc btree blocksChristoph Hellwig
Blocks for the allocation btree are allocated from and released to the AGFL, and thus frequently reused. Even worse we do not have an easy way to avoid using an AGFL block when it is discarded due to the simple FILO list of free blocks, and thus can frequently stall on blocks that are currently undergoing a discard. Add a flag to the busy extent tracking structure to skip the discard for allocation btree blocks. In normal operation these blocks are reused frequently enough that there is no need to discard them anyway, but if they spill over to the allocation btree as part of a balance we "leak" blocks that we would otherwise discard. We could fix this by adding another flag and keeping these block in the rbtree even after they aren't busy any more so that we could discard them when they migrate out of the AGFL. Given that this would cause significant overhead I don't think it's worthwile for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-24xfs: add online discard supportChristoph Hellwig
Now that we have reliably tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits. The actual discard is a two stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: obey minleft values during extent allocation correctlyDave Chinner
When allocating an extent that is long enough to consume the remaining free space in an AG, we need to ensure that the allocation leaves enough space in the AG for any subsequent bmap btree blocks that are needed to track the new extent. These have to be allocated in the same AG as we only reserve enough blocks in an allocation transaction for modification of the freespace trees in a single AG. xfs_alloc_fix_minleft() has been considering blocks on the AGFL as free blocks available for extent and bmbt block allocation, which is not correct - blocks on the AGFL are there exclusively for the use of the free space btrees. As a result, when minleft is less than the number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim the given extent to leave minleft blocks available for bmbt allocation, and hence we can fail allocation during bmbt record insertion. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: reset buffer pointers before freeing themDave Chinner
When we free a vmapped buffer, we need to ensure the vmap address and length we free is the same as when it was allocated. In various places in the log code we change the memory the buffer is pointing to before issuing IO, but we never reset the buffer to point back to it's original memory (or no memory, if that is the case for the buffer). As a result, when we free the buffer it points to memory that is owned by something else and attempts to unmap and free it. Because the range does not match any known mapped range, it can trigger BUG_ON() traps in the vmap code, and potentially corrupt the vmap area tracking. Fix this by always resetting these buffers to their original state before freeing them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: avoid getting stuck during async inode flushesDave Chinner
When the underlying inode buffer is locked and xfs_sync_inode_attr() is doing a non-blocking flush, xfs_iflush() can return EAGAIN. When this happens, clear the error rather than returning it to xfs_inode_ag_walk(), as returning EAGAIN will result in the AG walk delaying for a short while and trying again. This can result in background walks getting stuck on the one AG until inode buffer is unlocked by some other means. This behaviour was noticed when analysing event traces followed by code inspection and verification of the fix via further traces. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: fix xfs_itruncate_start tracingDave Chinner
Variables are ordered incorrectly in trace call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: fix duplicate workqueue initialisationDave Chinner
The workqueue initialisation function is called twice when initialising the XFS subsystem. Remove the second initialisation call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-19xfs: kill off xfs_printk()Joe Perches
xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid using xfs_printk() altogether. This is the only remaining use of xfs_printk(), so changing it this way means xfs_printk() can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated.can simply be eliminated. Also add format checking to the non-debug inline function xfs_debug. Miscellaneous function prototype argument alignment. (Updated to delete the definition of xfs_printk(), which is no longer used or needed.) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-09xfs: fix race condition in AIL push triggerDave Chinner
The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One is caused by a race condition in determining whether there is a psh in progress or not. The XFS_AIL_PUSHING_BIT is used to determine whether a push is currently in progress. When the AIL push work completes, it checked whether the target changed and cleared the PUSHING bit to allow a new push to be requeued. The race condition is as follows: Thread 1 push work smp_wmb() smp_rmb() check ailp->xa_target unchanged update ailp->xa_target test/set PUSHING bit does not queue clear PUSHING bit does not requeue Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety sake, the same queue check is done if the push work detects the target change, though only one of the two will will queue new work due to the use of test_and_set_bit() checks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-05-09xfs: make AIL target updates and compares 32bit safe.Dave Chinner
The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems noticed was that updates of the push target are not 32 bit safe as the target is a 64 bit value. We cannot copy a 64 bit LSN without the possibility of corrupting the result when racing with another updating thread. We have function to do this update safely without needing to care about 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when updating the AIL push target. Also move the reading of the target in the push work inside the AIL lock, and use XFS_LSN_CMP() for the unlocked comparison during work termination to close read holes as well. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-05-09xfs: always push the AIL to the targetDave Chinner
The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems discovered is a target mismatch between the item pushing loop and the target itself. The push trigger checks for the target increasing (i.e. new target > current) while the push loop only pushes items that have a LSN < current. As a result, we can get the situation where the push target is X, the items at the tail of the AIL have LSN X and they don't get pushed. The push work then completes thinking it is done, and cannot be restarted until the push target increases to >= X + 1. If the push target then never increases (because the tail is not moving), then we never run the push work again and we stall. Fix it by making sure log items with a LSN that matches the target exactly are pushed during the loop. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-05-09xfs: exit AIL push work correctly when AIL is emptyDave Chinner
The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. The main cause is a regression where a work exit path fails to clear the PUSHING state and recheck the target correctly. Make both exit paths do the same PUSHING bit clearing and target checking when the "no more work to be done" condition is hit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-05-09xfs: ensure reclaim cursor is reset correctly at end of AGDave Chinner
On a 32 bit highmem PowerPC machine, the XFS inode cache was growing without bound and exhausting low memory causing the OOM killer to be triggered. After some effort, the problem was reproduced on a 32 bit x86 highmem machine. The problem is that the per-ag inode reclaim index cursor was not getting reset to the start of the AG if the radix tree tag lookup found no more reclaimable inodes. Hence every further reclaim attempt started at the same index beyond where any reclaimable inodes lay, and no further background reclaim ever occurred from the AG. Without background inode reclaim the VM driven cache shrinker simply cannot keep up with cache growth, and OOM is the result. While the change that exposed the problem was the conversion of the inode reclaim to use work queues for background reclaim, it was not the cause of the bug. The bug was introduced when the cursor code was added, just waiting for some weird configuration to strike.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-By: Christian Kujau <lists@nerdbynature.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-04-28xfs: add an x86 compat handler for XFS_IOC_ZERO_RANGEChristoph Hellwig
XFS_IOC_ZERO_RANGE uses struct xfs_flock64, and thus requires argument translation for 32-bit binaries on x86. Add the required XFS_IOC_ZERO_RANGE_32 defined and add it to the list of commands that require xfs_flock64 translation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28xfs: fix compiler warning in xfs_trace.hChristoph Hellwig
xfs_fsblock_t may be a 32-bit type on if XFS_BIG_BLKNOS is not set, make sure to cast a value of this type to an unsigned long long before using the ll printk qualifier. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28xfs: cleanup duplicate initializationsDavid Sterba
follow these guidelines: - leave initialization in the declaration block if it fits the line - move to the code where it's more suitable ('for' init block) The last chunk was modified from David's original to be a correct fix for what appeared to be a duplicate initialization. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-04-28xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busyChristoph Hellwig
Instead of finding the per-ag and then taking and releasing the pagb_lock for every single busy extent completed sort the list of busy extents and only switch betweens AGs where nessecary. This becomes especially important with the online discard support which will hit this lock more often. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28xfs: exact busy extent trackingChristoph Hellwig
Update the extent tree in case we have to reuse a busy extent, so that it always is kept uptodate. This is done by replacing the busy list searches with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree in case of a reuse. This allows us to allow reusing metadata extents unconditionally, and thus avoid log forces especially for allocation btree blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28xfs: do not immediately reuse busy extent rangesChristoph Hellwig
Every time we reallocate a busy extent, we cause a synchronous log force to occur to ensure the freeing transaction is on disk before we continue and use the newly allocated extent. This is extremely sub-optimal as we have to mark every transaction with blocks that get reused as synchronous. Instead of searching the busy extent list after deciding on the extent to allocate, check each candidate extent during the allocation decisions as to whether they are in the busy list. If they are in the busy list, we trim the busy range out of the extent we have found and determine if that trimmed range is still OK for allocation. In many cases, this check can be incorporated into the allocation extent alignment code which already does trimming of the found extent before determining if it is a valid candidate for allocation. Based on earlier patches from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-28xfs: optimize AGFL refillsChristoph Hellwig
While we need to make sure we do not reuse busy extents, there is no need to force out busy extents when moving them between the AGFL and the freespace btree as we still take care of that when doing the real allocation. To avoid the log force when just moving extents from the different free space tracking structures, move the busy search out of xfs_alloc_get_freelist into the callers that need it, and move the busy list insert from xfs_free_ag_extent which is used both by AGFL refills and real allocation to xfs_free_extent, which is only used by the latter. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-20xfs: fix duplicate message outputDave Chinner
Commit 957935dc ("xfs: fix xfs_debug warnings" broke the logic in __xfs_printk(). Instead of only printing one of two possible output strings based on whether the fs has a name or not, it outputs both. Fix it to only output one message again. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits) Btrfs: fix free space cache leak Btrfs: avoid taking the chunk_mutex in do_chunk_alloc Btrfs end_bio_extent_readpage should look for locked bits Btrfs: don't force chunk allocation in find_free_extent Btrfs: Check validity before setting an acl Btrfs: Fix incorrect inode nlink in btrfs_link() Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir() Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr() Btrfs: make uncache_state unconditional btrfs: using cached extent_state in set/unlock combinations Btrfs: avoid taking the trans_mutex in btrfs_end_transaction Btrfs: fix subvolume mount by name problem when default mount subvolume is set fix user annotation in ioctl.c Btrfs: check for duplicate iov_base's when doing dio reads btrfs: properly handle overlapping areas in memmove_extent_buffer Btrfs: fix memory leaks in btrfs_new_inode() Btrfs: check for duplicate iov_base's when doing dio reads Btrfs: reuse the extent_map we found when calling btrfs_get_extent Btrfs: do not use async submit for small DIO io's Btrfs: don't split dio bios if we don't have to ...
2011-04-18proc: do proper range check on readdir offsetLinus Torvalds
Rather than pass in some random truncated offset to the pid-related functions, check that the offset is in range up-front. This is just cleanup, the previous commit fixed the real problem. Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-18Btrfs: fix free space cache leakChris Mason
The free space caching code was recently reworked to cache all the pages it needed instead of using find_get_page everywhere. One loop was missed though, so it ended up leaking pages. This fixes it to use our page array instead of find_get_page. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-17fs: synchronize_rcu when unregister_filesystem success not failureMilton Miller
While checking unregister_filesystem for saftey vs extra calls for "ext4: register ext2 and ext3 alias after ext4" I realized that the synchronize_rcu() was called on the error path but not on the success path. Cc: stable (2.6.38) Signed-off-by: Milton Miller <miltonm@bga.com> [ This probably won't really make a difference since commit d863b50ab013 ("vfs: call rcu_barrier after ->kill_sb()"), but it's the right thing to do. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-16Btrfs: avoid taking the chunk_mutex in do_chunk_allocJosef Bacik
Everytime we try to allocate disk space we try and see if we can pre-emptively allocate a chunk, but in the common case we don't allocate anything, so there is no sense in taking the chunk_mutex at all. So instead if we are allocating a chunk, mark it in the space_info so we don't get two people trying to allocate at the same time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Reviewed-by: Liu Bo <liubo2009@cn.fujitsu.com>
2011-04-16Btrfs end_bio_extent_readpage should look for locked bitsChris Mason
A recent commit caches the extent state in end_bio_extent_readpage, but the search it does should look for locked extents. This fixes things to make it more effective. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: net/9p: nwname should be an unsigned int 9p: Fix sparse error fs/9p: Fix error reported by coccicheck 9p: revert tsyncfs related changes fs/9p: Use write_inode for data sync on server fs/9p: Fix revalidate to return correct value
2011-04-15vfs: Fix absolute RCU path walk failures due to uninitialized seq numberTim Chen
During RCU walk in path_lookupat and path_openat, the rcu lookup frequently failed if looking up an absolute path, because when root directory was looked up, seq number was not properly set in nameidata. We dropped out of RCU walk in nameidata_drop_rcu due to mismatch in directory entry's seq number. We reverted to slow path walk that need to take references. With the following patch, I saw a 50% increase in an exim mail server benchmark throughput on a 4-socket Nehalem-EX system. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: stable@kernel.org (v2.6.38) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-15fs/9p: Fix error reported by coccicheckAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-04-159p: revert tsyncfs related changesAneesh Kumar K.V
Now that we use write_inode to flush server cache related to fid, we don't need tsyncfs either fort dotl or dotu protocols. For dotu this helps to do a more efficient server flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-04-15fs/9p: Use write_inode for data sync on serverAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-04-15fs/9p: Fix revalidate to return correct valueAneesh Kumar K.V
revalidate should return > 0 on success. Also return 0 on ENOENT to force do_revalidate to return NULL dentry; Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-04-15Btrfs: don't force chunk allocation in find_free_extentChris Mason
find_free_extent likes to allocate in contiguous clusters, which makes writeback faster, especially on SSD storage. As the FS fragments, these clusters become harder to find and we have to decide between allocating a new chunk to make more clusters or giving up on the cluster to allocate from the free space we have. Right now it creates too many chunks, and you can end up with a whole FS that is mostly empty metadata chunks. This commit changes the allocation code to be more strict and only allocate new chunks when we've made good use of the chunks we already have. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-15Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6Linus Torvalds
* 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: fix compilation warnings when compiling with gcc 4.5 UBIFS: fix oops when R/O file-system is fsync'ed
2011-04-15vfs: fix incorrect dentry_update_name_case() BUG_ON() testLinus Torvalds
The case we should be verifying when updating the dentry name is that the _parent_ inode (the directory) semaphore is held, not the semaphore for the dentry itself. It's the directory locking that rename and readdir() etc all care about. The comment just above even says so - but then the BUG_ON() still checked the dentry inode itself. Very few people noticed, because this helper function really isn't used for very much, so you had to be using ncpfs to ever hit it. I think I should just remove the BUG_ON (the function really has just one user), but let's run with it fixed for a while before getting rid of it entirely. Reported-and-tested-by: Bongani Hlope <bonganih@bankservafrica.com> Reported-and-tested-by: Bernd Feige <bernd.feige@uniklinik-freiburg.de> Cc: Petr Vandrovec <petr@vandrovec.name>, Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14ramfs: fix memleak on no-mmu archBob Liu
On no-mmu arch, there is a memleak during shmem test. The cause of this memleak is ramfs_nommu_expand_for_mapping() added page refcount to 2 which makes iput() can't free that pages. The simple test file is like this: int main(void) { int i; key_t k = ftok("/etc", 42); for ( i=0; i<100; ++i) { int id = shmget(k, 10000, 0644|IPC_CREAT); if (id == -1) { printf("shmget error\n"); } if(shmctl(id, IPC_RMID, NULL ) == -1) { printf("shm rm error\n"); return -1; } } printf("run ok...\n"); return 0; } And the result: root:/> free total used free shared buffers Mem: 60320 17912 42408 0 0 -/+ buffers: 17912 42408 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 19096 41224 0 0 -/+ buffers: 19096 41224 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 20296 40024 0 0 -/+ buffers: 20296 40024 ... After this patch the test result is:(no memleak anymore) root:/> free total used free shared buffers Mem: 60320 16668 43652 0 0 -/+ buffers: 16668 43652 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 16668 43652 0 0 -/+ buffers: 16668 43652 Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14fs/fhandle.c: add <linux/personality.h> for ia64Jeff Mahoney
force_o_largefile() on ia64 is defined in <asm/fcntl.h> and requires <linux/personality.h>. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14brk: COMPAT_BRK: fix detection of randomized brkJiri Kosina
5520e89 ("brk: fix min_brk lower bound computation for COMPAT_BRK") tried to get the whole logic of brk randomization for legacy (libc5-based) applications finally right. It turns out that the way to detect whether brk has actually been randomized in the end or not introduced by that patch still doesn't work for those binaries, as reported by Geert: : /sbin/init from my old m68k ramdisk exists prematurely. : : Before the patch: : : | brk(0x80005c8e) = 0x80006000 : : After the patch: : : | brk(0x80005c8e) = 0x80005c8e : : Old libc5 considers brk() to have failed if the return value is not : identical to the requested value. I don't like it, but currently see no better option than a bit flag in task_struct to catch the CONFIG_COMPAT_BRK && randomize_va_space == 2 case. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14fs/partitions/ldm.c: fix oops caused by corrupted partition tableTimo Warns
The kernel automatically evaluates partition tables of storage devices. The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains a bug that causes a kernel oops on certain corrupted LDM partitions. A kernel subsystem seems to crash, because, after the oops, the kernel no longer recognizes newly connected storage devices. The patch validates the value of vblk_size. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Richard Russon <rich@flatcap.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-13UBIFS: fix compilation warnings when compiling with gcc 4.5Maksim Rayskiy
When compiling UBIFS with CONFIG_UBIFS_FS_DEBUG not set, gcc-4.5.2 generates a slew of "warning: statement with no effect" on references to non-void functions defined as 0. To avoid these warnings, replace #defines with dummy inline functions. Artem: massage the patch a bit, also remove the duplicate 'dbg_check_lprops()' prototype. Signed-off-by: Maksim Rayskiy <maksim.rayskiy@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-04-13UBIFS: fix oops when R/O file-system is fsync'edArtem Bityutskiy
This patch fixes severe UBIFS bug: UBIFS oopses when we 'fsync()' an file on R/O-mounter file-system. We (the UBIFS authors) incorrectly thought that VFS would not propagate 'fsync()' down to the file-system if it is read-only, but this is not the case. It is easy to exploit this bug using the following simple perl script: use strict; use File::Sync qw(fsync sync); die "File path is not specified" if not defined $ARGV[0]; my $path = $ARGV[0]; open FILE, "<", "$path" or die "Cannot open $path: $!"; fsync(\*FILE) or die "cannot fsync $path: $!"; close FILE or die "Cannot close $path: $!"; Thanks to Reuben Dowle <Reuben.Dowle@navico.com> for reporting about this issue. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reported-by: Reuben Dowle <Reuben.Dowle@navico.com> Cc: stable@kernel.org
2011-04-13Btrfs: Check validity before setting an aclMiao Xie
Call posix_acl_valid() to check if an acl is valid or not. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-13Btrfs: Fix incorrect inode nlink in btrfs_link()Miao Xie
Link count of the inode is not decreased if btrfs_set_inode_index() fails. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Singed-off-by: Li Zefan <lizf@cn.fujitsu.com>