summaryrefslogtreecommitdiff
path: root/fs/gfs2/rgrp.c
AgeCommit message (Collapse)Author
2012-01-11GFS2: Fix a use-after-free that coverity spottedBob Peterson
In function gfs2_inplace_release it was trying to unlock a gfs2_holder structure associated with a reservation, after said reservation was freed. The problem is that the statements have the wrong order. This patch corrects the order so that the reservation is freed after the gfs2_holder is unlocked. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22GFS2: Fix multi-block allocationSteven Whitehouse
Clean up gfs2_alloc_blocks so that it takes the full extent length rather than just the number of non-inode blocks as an argument. That will only make a difference in the inode allocation case for now. Also, this fixes the extent length handling around gfs2_alloc_extent() so that multi block allocations will work again. The rd_last_alloc block is set to the final block in the allocated extent (as per the update to i_goal, but referenced to a different start point). This also removes the dinode argument to rgblk_search() which is no longer used. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22GFS2: decouple quota allocations from block allocationsBob Peterson
This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. This patch also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21GFS2: split function rgblk_searchBob Peterson
This patch splits function rgblk_search into a function that finds blocks to allocate (rgblk_search) and a function that assigns those blocks (gfs2_alloc_extent). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@rehat.com>
2011-11-21GFS2: Fix up "off by one" in the previous patchSteven Whitehouse
The trace point should take extlen and not *ndata as the extent length. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21GFS2: move toward a generic multi-block allocatorBob Peterson
This patch is a revision of the one I previously posted. I tried to integrate all the suggestions Steve gave. The purpose of the patch is to change function gfs2_alloc_block (allocate either a dinode block or an extent of data blocks) to a more generic gfs2_alloc_blocks function that can allocate both a dinode _and_ an extent of data blocks in the same call. This will ultimately help us create a multi-block reservation scheme to reduce file fragmentation. This patch moves more toward a generic multi-block allocator that takes a pointer to the number of data blocks to allocate, plus whether or not to allocate a dinode. In theory, it could be called to allocate (1) a single dinode block, (2) a group of one or more data blocks, or (3) a dinode plus several data blocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-18GFS2: remove vestigial al_allocedBob Peterson
This patch removes the vestigial variable al_alloced from the gfs2_alloc structure. This is another baby step toward multi-block reservations. My next planned step is to decouple the quota variables from the gfs2_alloc structure so we can use a different method for allocations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-15GFS2: combine gfs2_alloc_block and gfs2_alloc_diBob Peterson
GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically the same things, with a few exceptions. This patch combines the two functions into a slightly more generic gfs2_alloc_block. Having one centralized block allocation function will reduce code redundancy and make it easier to implement multi-block reservations to reduce file fragmentation in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-15GFS2: Add non-try locks back to get_local_rgrpBob Peterson
This upstream patch had what I believe is an unintended consequence: http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-3.0-nmw.git;a=commitdiff;h=beca42486749c1538a5ed58fe9dcc9f26d428c93 The patch changed function get_local_rgrp such that it ONLY used TRY locks for RGRP searches. Prior to that patch, the code used TRY locks during the first loop, and if that was unsuccessful, it used normal blocking locks on subsequent searches. This patch changes it back to the old way. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Remove two unused variablesSteven Whitehouse
The two variables being initialised in gfs2_inplace_reserve to track the file & line number of the caller are never used, so we might as well remove them. If something does go wrong, then a stack trace is probably more useful anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Fix off-by-one in gfs2_blk2rgrpdSteven Whitehouse
Bob reported: I found an off-by-one problem with how I coded this section: It should be: + else if (blk >= cur->rd_data0 + cur->rd_data) In fact, cur->rd_data0 + cur->rd_data is the start of the next rgrp (the next ri_addr), so without the "=" check it can land on the wrong rgrp. In all normal cases, this won't be a problem: you're searching for a block _within_ the rgrp, which will pass the test properly. Where it gets into trouble is if you search the rgrps for the block exactly equal to ri_addr. I don't think anything in the kernel does this, but I found a place in gfs2-utils gfs2_edit where it does. So I definitely need to fix it in libgfs2. I'd like to suggest we fix it in the kernel as well for the sake of keeping the functions similar. So this patch fixes the above mentioned off by one error as well as removing the unused parent pointer. Reported-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Correctly set goal block after allocationSteven Whitehouse
The new goal block should be set to the end of the newly allocated extent, not the start of it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Use cached rgrp in gfs2_rlist_add()Steven Whitehouse
Each block which is deallocated, requires a call to gfs2_rlist_add() and each of those calls was calling gfs2_blk2rgrpd() in order to figure out which rgrp the block belonged in. This can be speeded up by making use of the rgrp cached in the inode. We also reset this cached rgrp in case the block has changed rgrp. This should provide a big reduction in gfs2_blk2rgrpd() calls during deallocation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Remove obsolete assertSteven Whitehouse
Given that a resource group has been locked, there is no reason why we should not be able to allocate as many blocks as are free. The al_requested parameter should really be considered as a minimum number of blocks to be available. Should this limit be overshot, there are other mechanisms which will prevent over allocation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Cache the most recently used resource group in the inodeSteven Whitehouse
This means that after the initial allocation for any inode, the last used resource group is cached in the inode for future use. This drastically reduces the number of lookups of resource groups in the common case, and this the contention on that data structure. The allocation algorithm is the same as previously, except that we always check to see if the goal block is within the cached rgrp first before going to the rbtree to look one up. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Make resource groups "append only" during life of fsSteven Whitehouse
Since we have ruled out supporting online filesystem shrink, it is possible to make the resource group list append only during the life of a super block. This gives several benefits: Firstly, we only need to read new rindex elements as they are added rather than needing to reread the whole rindex file each time one element is added. Secondly, the rindex glock can be held for much shorter periods of time, and is completely removed from the fast path for allocations. The lock is taken in shared mode only when updating the resource groups when the first allocation occurs, and after a grow has taken place. Thirdly, this results in a reduction in code size, and everything gets a lot simpler to understand in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count schemeBob Peterson
Here is an update of Bob's original rbtree patch which, in addition, also resolves the rather strange ref counting that was being done relating to the bitmap blocks. Originally we had a dual system for journaling resource groups. The metadata blocks were journaled and also the rgrp itself was added to a list. The reason for adding the rgrp to the list in the journal was so that the "repolish clones" code could be run to update the free space, and potentially send any discard requests when the log was flushed. This was done by comparing the "cloned" bitmap with what had been written back on disk during the transaction commit. Due to this, there was a requirement to hang on to the rgrps' bitmap buffers until the journal had been flushed. For that reason, there was a rather complicated set up in the ->go_lock ->go_unlock functions for rgrps involving both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference count on the buffers. However, the journal maintains a reference count on the buffers anyway, since they are being journaled as metadata buffers. So by moving the code which deals with the post-journal accounting for bitmap blocks to the metadata journaling code, we can entirely dispense with the rather strange buffer ref counting scheme and also the requirement to journal the rgrps. The net result of all this is that the ->sd_rindex_spin is left to do exactly one job, and that is to look after the rbtree or rgrps. This patch is designed to be a stepping stone towards using RCU for the rbtree of resource groups, however the reduction in the number of uses of the ->sd_rindex_spin is likely to have benefits for multi-threaded workloads, anyway. The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also be removed in future in favour of calling the functions directly where required in the code. That will allow locking of resource groups without needing to actually read them in - something that could be useful in speeding up statfs. In the mean time though it is valid to dereference ->bi_bh only when the rgrp is locked. This is basically the same rule as before, modulo the references not being valid until the following journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Benjamin Marzinski <bmarzins@redhat.com>
2011-07-15GFS2: combine duplicated block freeing routinesEric Sandeen
__gfs2_free_data and __gfs2_free_meta are almost identical, and can be trivially combined. [This is as per Eric's original patch minus gfs2_free_data() which had no callers left and plus the conversion of the bmap.c calls to these functions. All in all, a nice clean up] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-21GFS2: Wipe directory hash table metadata when deallocating a directorySteven Whitehouse
The deallocation code for directories in GFS2 is largely divided into two parts. The first part deallocates any directory leaf blocks and marks the directory as being a regular file when that is complete. The second stage was identical to deallocating regular files. Regular files have their data blocks in a different address space to directories, and thus what would have been normal data blocks in a regular file (the hash table in a GFS2 directory) were deallocated correctly. However, a reference to these blocks was left in the journal (assuming of course that some previous activity had resulted in those blocks being in the journal or ail list). This patch uses the i_depth as a test of whether the inode is an exhash directory (we cannot test the inode type as that has already been changed to a regular file at this stage in deallocation) The original issue was reported by Chris Hertel as an issue he encountered running bonnie++ Reported-by: Christopher R. Hertel <crh@samba.org> Cc: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20GFS2: Alter point of entry to glock lru list for glocks with an address_spaceSteven Whitehouse
Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20GFS2: Dump better debug info if a bitmap inconsistency is detectedBob Peterson
On rare occasions we encounter gfs2 problems where an invalid bitmap state transition is attempted. For example, trying to "unlink" a free block. In these cases, there is really no useful information logged to debug the problem. This patch adds more debug details that should allow us to more closely examine the problem and possibly solve it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18GFS2: filesystem hang caused by incorrect lock orderBob Peterson
This patch fixes a deadlock in GFS2 where two processes are trying to reclaim an unlinked dinode: One holds the inode glock and calls gfs2_lookup_by_inum trying to look up the inode, which it can't, due to I_FREEING. The other has set I_FREEING from vfs and is at the beginning of gfs2_delete_inode waiting for the glock, which is held by the first. The solution is to add a new non_block parameter to the gfs2_iget function that causes it to return -ENOENT if the inode is being freed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-02-24GFS2: deallocation performance patchBob Peterson
This patch is a performance improvement to GFS2's dealloc code. Rather than update the quota file and statfs file for every single block that's stripped off in unlink function do_strip, this patch keeps track and updates them once for every layer that's stripped. This is done entirely inside the existing transaction, so there should be no risk of corruption. The other functions that deallocate blocks will be unaffected because they are using wrapper functions that do the same thing that they do today. I tested this code on my roth cluster by creating 200 files in a directory, each of which is 100MB, then on four nodes, I simultaneously deleted the files, thus competing for GFS2 resources (but different files). The commands I used were: [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done The performance increase was significant: roth-01 roth-02 roth-03 roth-05 --------- --------- --------- --------- old: real 0m34.027 0m25.021s 0m23.906s 0m35.646s new: real 0m22.379s 0m24.362s 0m24.133s 0m18.562s Total time spent deleting: old: 118.6s new: 89.4 For this particular case, this showed a 25% performance increase for GFS2 unlinks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-12-07GFS2: fsck.gfs2 reported statfs error after gfs2_growBob Peterson
When you do gfs2_grow it failed to take the very last rgrp into account when adding up the new free space due to an off-by-one error. It was not reading the last rgrp from the rindex because of a check for "<=" that should have been "<". Therefore, fsck.gfs2 was finding (and fixing) an error with the system statfs file. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2010-11-30GFS2: fix recursive locking during rindex truncatesBenjamin Marzinski
When you truncate the rindex file, you need to avoid calling gfs2_rindex_hold, since you already hold it. However, if you haven't already read in the resource groups, you need to do that. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-30GFS2: reread rindex when necessary to grow rindexBenjamin Marzinski
When GFS2 grew the filesystem, it was never rereading the rindex file during the grow. This is necessary for large grows when the filesystem is almost full, and GFS2 needs to use some of the space allocated earlier in the grow to complete it. Now, if GFS2 fails to reserve the necessary space and the rindex file is not uptodate, it rereads it. Also, the only difference between gfs2_ri_update() and gfs2_ri_update_special() was that gfs2_ri_update_special() didn't clear out the existing resource groups, since you knew that it was only called when there were no resource groups. Attempting to clear out the resource groups when there are none takes almost no time, and rarely happens, so I simply removed gfs2_ri_update_special(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-15GFS2: Fix inode deallocation raceSteven Whitehouse
This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-10-23Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...
2010-09-30GFS2 fatal: filesystem consistency error on renameBob Peterson
This patch fixes a GFS2 problem whereby the first rename after a mount can result in a file system consistency error being flagged improperly and cause the file system to withdraw. The problem is that the rename code tries to run the rgrp list with function gfs2_blk2rgrpd before the rgrp list is guaranteed to be read in from disk. The patch makes the rename function hold the rindex glock (as the gfs2_unlink code does today) which reads in the rgrp list if need be. There were a total of three places in the rename code that improperly referenced the rgrp list without the rindex glock and this patch fixes all three. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-09-20GFS2: fallocate supportBenjamin Marzinski
This patch adds support for fallocate to gfs2. Since the gfs2 does not support uninitialized data blocks, it must write out zeros to all the blocks. However, since it does not need to lock any pages to read from, gfs2 can write out the zero blocks much more efficiently. On a moderately full filesystem, fallocate works around 5 times faster on average. The fallocate call also allows gfs2 to add blocks to the file without changing the filesize, which will make it possible for gfs2 to preallocate space for the rindex file, so that gfs2 can grow a completely full filesystem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-09-20GFS2: Add a bug trap in allocation codeSteven Whitehouse
This adds a check to ensure that if we reach the block allocator that we don't try and proceed if there is no alloc structure hanging off the inode. This should only happen if there is a bug in GFS2. The error return code is distinctive in order that it will be easily spotted. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-09-20GFS2: Remove i_disksizeSteven Whitehouse
With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-09-16block: remove BLKDEV_IFL_WAITChristoph Hellwig
All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10gfs2: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Fix permissions checking for setflags ioctl() GFS2: Don't "get" xattrs for ACLs when ACLs are turned off GFS2: Rework reclaiming unlinked dinodes
2010-05-21Merge branch 'master' into for-2.6.35Jens Axboe
Conflicts: fs/ext3/fsync.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21GFS2: Rework reclaiming unlinked dinodesBob Peterson
The previous patch I wrote for reclaiming unlinked dinodes had some shortcomings and did not prevent all hangs. This version is much cleaner and more logical, and has passed very difficult testing. Sorry for the churn. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-12GFS2: stuck in inode wait, no glocks stuckBob Peterson
This patch changes the lock ordering when gfs2 reclaims unlinked dinodes, thereby avoiding a livelock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-04-28blkdev: generalize flags for blkdev_issue_fn functionsDmitry Monakhov
The patch just convert all blkdev_issue_xxx function to common set of flags. Wait/allocation semantics preserved. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-14GFS2: glock livelockBob Peterson
This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2010-02-01GFS2: Use GFP_NOFS for alloc structureSteven Whitehouse
This is called under a glock, so its a good plan to use GFP_NOFS Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-01GFS2: Fix previous patchSteven Whitehouse
The do_div() call needs to remain. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-01GFS2: Don't withdraw on partial rindex entriesBenjamin Marzinski
ince gfs2 writes the rindex file a block at a time, and releases the exclusive lock after each block, it is possible that another process will grab the lock in the middle of the write. Since rindex entries are not an even divisor of blocks, that other process may see partial entries. On grows, this is fine. The process can simply ignore the the partial entires. Previously, the code withdrew when it saw partial entries. Now it simply ignores them. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-12-03GFS2: Locking order fix in gfs2_check_blk_stateSteven Whitehouse
In some cases we already have the rindex lock when we enter this function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-21trivial: fix typo "to to" in multiple filesAnand Gadiyar
Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-15Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...
2009-09-14GFS2: Whitespace fixesSteven Whitehouse
Reported-by: Daniel Walker <dwalker@fifo99.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-14block: use blkdev_issue_discard in blk_ioctl_discardChristoph Hellwig
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitates this add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting funcitonality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-08GFS2: Be extra careful about deallocating inodesSteven Whitehouse
There is a potential race in the inode deallocation code if two nodes try to deallocate the same inode at the same time. Most of the issue is solved by the iopen locking. There is still a small window which is not covered by the iopen lock. This patches fixes that and also makes the deallocation code more robust in the face of any errors in the rgrp bitmaps, or erroneous iopen callbacks from other nodes. This does introduce one extra disk read, but that is generally not an issue since its the same block that must be written to later in the deallocation process. The total disk accesses therefore stay the same, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-08-27GFS2: Remove no_formal_ino generating codeSteven Whitehouse
The inum structure used throughout GFS2 has two fields. One no_addr is the disk block number of the inode in question and is used everywhere as the inode number. The other, no_formal_ino, is used only as the generation number for NFS. Historically the no_formal_ino field was set using a complicated system of one global and one per-node file containing inode numbers in order to ensure that each no_formal_ino was unique. Also this code made no provision for what would happen when eventually the (64 bit) numbers ran out. Now I know that is pretty unlikely to happen given the large space of numbers, but it is possible nevertheless. The only guarantee required for no_formal_ino is that, for any single inode, the same number doesn't get reused too quickly. We already have a generation number which is kept in the inode and initialised from a counter in the resource group (almost no overhead, since we have to touch the resource group anyway in order to allocate an inode in the first place). Aside from ensuring that we never use the value 0 in the no_formal_ino field, we can use that counter directly. As a result of that change, we lose about 200 lines of code and also gain about 10 creates/sec on the postmark benchmark (on my test machine). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>