summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_inode.c
AgeCommit message (Collapse)Author
2011-05-23Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: obey minleft values during extent allocation correctly xfs: reset buffer pointers before freeing them xfs: avoid getting stuck during async inode flushes xfs: fix xfs_itruncate_start tracing xfs: fix duplicate workqueue initialisation xfs: kill off xfs_printk() xfs: fix race condition in AIL push trigger xfs: make AIL target updates and compares 32bit safe. xfs: always push the AIL to the target xfs: exit AIL push work correctly when AIL is empty xfs: ensure reclaim cursor is reset correctly at end of AG xfs: add an x86 compat handler for XFS_IOC_ZERO_RANGE xfs: fix compiler warning in xfs_trace.h xfs: cleanup duplicate initializations xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy xfs: exact busy extent tracking xfs: do not immediately reuse busy extent ranges xfs: optimize AGFL refills
2011-05-19xfs: fix xfs_itruncate_start tracingDave Chinner
Variables are ordered incorrectly in trace call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-10treewide: fix a few typos in commentsJustin P. Mattock
- kenrel -> kernel - whetehr -> whether - ttt -> tt - sss -> ss Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-25xfs: introduce inode cluster buffer trylocks for xfs_iflushDave Chinner
There is an ABBA deadlock between synchronous inode flushing in xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the buffer, then takes inode ilocks, whilst synchronous reclaim takes the ilock followed by the buffer lock in xfs_iflush(). To avoid this deadlock, separate the inode cluster buffer locking semantics from the synchronous inode flush semantics, allowing callers to attempt to lock the buffer but still issue synchronous IO if it can get the buffer. This requires xfs_iflush() calls that currently use non-blocking semantics to pass SYNC_TRYLOCK rather than 0 as the flags parameter. This allows xfs_reclaim_inode to avoid the deadlock on the buffer lock and detect the failure so that it can drop the inode ilock and restart the reclaim attempt on the inode. This allows xfs_ifree_cluster to obtain the inode lock, mark the inode stale and release it and hence defuse the deadlock situation. It also has the pleasant side effect of avoiding IO in xfs_reclaim_inode when it tries to next reclaim the inode as it is now marked stale. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
2011-03-06xfs: Convert remaining cmn_err() callers to new APIDave Chinner
Once converted, kill the remainder of the cmn_err() interface. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-06xfs: convert xfs_fs_cmn_err to new error logging APIDave Chinner
Continue to clean up the error logging code by converting all the callers of xfs_fs_cmn_err() to the new API. Once done, remove the unused old API function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-06xfs: kill xfs_fs_repair_cmn_err() macroDave Chinner
In certain cases of inode corruption, the xfs_fs_repair_cmn_err() macro is used to output an extra message in the corruption report. That extra message is "unmount and run xfs_repair", which really applies to any corruption report. Each case that this macro is called (except one) a following call to xfs_corruption_error() is made to optionally dump more information about the error. Hence, move the output of "run xfs_repair" to xfs_corruption_error() so that it is output on all corruption reports. Also, convert the callers of the repair macro that don't call xfs_corruption_error() to call it, hence provide consiѕtent error reporting for all cases where xfs_fs_repair_cmn_err() used to be called. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-03-06xfs: convert xfs_cmn_err to xfs_alert_tagDave Chinner
Continue the conversion of the old cmn_err interface be converting all the conditional panic tag errors to xfs_alert_tag() and then removing xfs_cmn_err(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-02-23xfs: more sensible inode refcounting for iallocChristoph Hellwig
Currently we return iodes from xfs_ialloc with just a single reference held. But we need two references, as one is dropped during transaction commit and the second needs to be transfered to the VFS. Change xfs_ialloc to use xfs_iget plus xfs_trans_ijoin_ref to grab two references to the inode, and remove the now superflous IHOLD calls from all callers. This also greatly simplifies the error handling in xfs_create and also allow to remove xfs_trans_iget as no other callers are left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-12-02xfs: connect up buffer reclaim priority hooksDave Chinner
Now that the buffer reclaim infrastructure can handle different reclaim priorities for different types of buffers, reconnect the hooks in the XFS code that has been sitting dormant since it was ported to Linux. This should finally give use reclaim prioritisation that is on a par with the functionality that Irix provided XFS 15 years ago. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-12-17xfs: convert inode cache lookups to use RCU lockingDave Chinner
With delayed logging greatly increasing the sustained parallelism of inode operations, the inode cache locking is showing significant read vs write contention when inode reclaim runs at the same time as lookups. There is also a lot more write lock acquistions than there are read locks (4:1 ratio) so the read locking is not really buying us much in the way of parallelism. To avoid the read vs write contention, change the cache to use RCU locking on the read side. To avoid needing to RCU free every single inode, use the built in slab RCU freeing mechanism. This requires us to be able to detect lookups of freed inodes, so enѕure that ever freed inode has an inode number of zero and the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in cache hit lookup path, but also add a check for a zero inode number as well. We canthen convert all the read locking lockups to use RCU read side locking and hence remove all read side locking. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: Extend project quotas to support 32bit project idsArkadiusz Mi?kiewicz
This patch adds support for 32bit project quota identifiers. On disk format is backward compatible with 16bit projid numbers. projid on disk is now kept in two 16bit values - di_projid_lo (which holds the same position as old 16bit projid value) and new di_projid_hi (takes existing padding) and converts from/to 32bit value on the fly. xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used to enable PROJID32BIT support. Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove xfs_buf wrappersChristoph Hellwig
Stop having two different names for many buffer functions and use the more descriptive xfs_buf_* names directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18xfs: remove xfs_cred.hChristoph Hellwig
We're not actually passing around credentials inside XFS for a while now, so remove all xfs_cred.h with it's cred_t typedef and all instances of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-08-24xfs: ensure we mark all inodes in a freed cluster XFS_ISTALEDave Chinner
Under heavy load parallel metadata loads (e.g. dbench), we can fail to mark all the inodes in a cluster being freed as XFS_ISTALE as we skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on. When this happens and the inode cluster buffer has already been marked stale and freed, inode reclaim can try to write the inode out as it is dirty and not marked stale. This can result in writing th metadata to an freed extent, or in the case it has already been overwritten trigger a magic number check failure and return an EUCLEAN error such as: Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117 Fix this by ensuring that we hoover up all in memory inodes in the cluster and mark them XFS_ISTALE when freeing the cluster. Cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-26xfs: kill the b_strat callback in xfs_bufChristoph Hellwig
The b_strat callback is used by xfs_buf_iostrategy to perform additional checks before submitting a buffer. It is used in xfs_bwrite and when writing out delayed buffers. In xfs_bwrite it we can de-virtualize the call easily as b_strat is set a few lines above the call to xfs_buf_iostrategy. For the delayed buffers the rationale is a bit more complicated: - there are three callers of xfs_buf_delwri_queue, which places buffers on the delwri list: (1) xfs_bdwrite - this sets up b_strat, so it's fine (2) xfs_buf_iorequest. None of the callers can have XBF_DELWRI set: - xlog_bdstrat is only used for log buffers, which are never delwri - _xfs_buf_read explicitly clears the delwri flag - xfs_buf_iodone_work retries log buffers only - xfsbdstrat - only used for reads, superblock writes without the delwri flag, log I/O and file zeroing with explicitly allocated buffers. - xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is not set (3) xfs_buf_unlock - only puts the buffer on the delwri list if the DELWRI flag is already set. The DELWRI flag is only ever set in xfs_bwrite, xfs_buf_iodone_callbacks, or xfs_trans_log_buf. For xfs_buf_iodone_callbacks and xfs_trans_log_buf we require an initialized buf item, which means b_strat was set to xfs_bdstrat_cb in xfs_buf_item_init. Conclusion: we can just get rid of the callback and replace it with explicit calls to xfs_bdstrat_cb. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: fix gcc 4.6 set but not read and unused statement warningsChristoph Hellwig
[hch: dropped a few hunks that need structural changes instead] Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: fix memory reclaim recursion deadlock on locked inode bufferDave Chinner
Calling into memory reclaim with a locked inode buffer can deadlock if memory reclaim tries to lock the inode buffer during inode teardown. Convert the relevant memory allocations to use KM_NOFS to avoid this deadlock condition. Reported-by: Peter Watkins <treestem@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: remove unused delta tracking code in xfs_bmapiChristoph Hellwig
This code was introduced four years ago in commit 3e57ecf640428c01ba1ed8c8fc538447ada1715b without any review and has been unused since. Remove it just as the rest of the code introduced in that commit to reduce that stack usage and complexity in this central piece of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: remove unused XFS_BMAPI_ flagsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: simplify inode to transaction joiningChristoph Hellwig
Currently we need to either call IHOLD or xfs_trans_ihold on an inode when joining it to a transaction via xfs_trans_ijoin. This patches instead makes xfs_trans_ijoin usable on it's own by doing an implicity xfs_trans_ihold, which also allows us to drop the third argument. For the case where we want to hold a reference on the inode a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks the inode for needing an xfs_iput. In addition to the cleaner interface to the caller this also simplifies the implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: give li_cb callbacks the correct prototypeChristoph Hellwig
Stop the function pointer casting madness and give all the li_cb instances correct prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-07-26xfs: remove unneeded #include statementsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
2010-07-26xfs: drop dmapi hooksChristoph Hellwig
Dmapi support was never merged upstream, but we still have a lot of hooks bloating XFS for it, all over the fast pathes of the filesystem. This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM support in mainline at least the namespace events can be done much saner in the VFS instead of the individual filesystem, so it's not like this is much help for future work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-06-24xfs: remove block number from inode lookup codeDave Chinner
The block number comes from bulkstat based inode lookups to shortcut the mapping calculations. We ar enot able to trust anything from bulkstat, so drop the block number as well so that the correct lookups and mappings are always done. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-06-24xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTEDDave Chinner
Inode numbers may come from somewhere external to the filesystem (e.g. file handles, bulkstat information) and so are inherently untrusted. Rename the flag we use for these lookups to make it obvious we are doing a lookup of an untrusted inode number and need to verify it completely before trying to read it from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-06-03xfs: fix race in inode cluster freeing failing to stale inodesDave Chinner
When an inode cluster is freed, it needs to mark all inodes in memory as XFS_ISTALE before marking the buffer as stale. This is eeded because the inodes have a different life cycle to the buffer, and once the buffer is torn down during transaction completion, we must ensure none of the inodes get written back (which is what XFS_ISTALE does). Unfortunately, xfs_ifree_cluster() has some bugs that lead to inodes not being marked with XFS_ISTALE. This shows up when xfs_iflush() is called on these inodes either during inode reclaim or tail pushing on the AIL. The buffer is read back, but no longer contains inodes and so triggers assert failures and shutdowns. This was reproducable with at run.dbench10 invocation from xfstests. There are two main causes of xfs_ifree_cluster() failing. The first is simple - it checks in-memory inodes it finds in the per-ag icache to see if they are clean without holding the flush lock. if they are clean it skips them completely. However, If an inode is flushed delwri, it will appear clean, but is not guaranteed to be written back until the flush lock has been dropped. Hence we may have raced on the clean check and the inode may actually be dirty. Hence always mark inodes found in memory stale before we check properly if they are clean. The second is more complex, and makes the first problem easier to hit. Basically the in-memory inode scan is done with full knowledge it can be racing with inode flushing and AIl tail pushing, which means that inodes that it can't get the flush lock on might not be attached to the buffer after then in-memory inode scan due to IO completion occurring. This is actually documented in the code as "needs better interlocking". i.e. this is a zero-day bug. Effectively, the in-memory scan must be done while the inode buffer is locked and Io cannot be issued on it while we do the in-memory inode scan. This ensures that inodes we couldn't get the flush lock on are guaranteed to be attached to the cluster buffer, so we can then catch all in-memory inodes and mark them stale. Now that the inode cluster buffer is locked before the in-memory scan is done, there is no need for the two-phase update of the in-memory inodes, so simplify the code into two loops and remove the allocation of the temporary buffer used to hold locked inodes across the phases. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-05-28xfs: fix access to upper inodes without inode64Christoph Hellwig
If a filesystem is mounted without the inode64 mount option we should still be able to access inodes not fitting into 32 bits, just not created new ones. For this to work we need to make sure the inode cache radix tree is initialized for all allocation groups, not just those we plan to allocate inodes from. This patch makes sure we initialize the inode cache radix tree for all allocation groups, and also cleans xfs_initialize_perag up a bit to separate the inode32 logical from the general perag structure setup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-19xfs: Add inode pin counts to tracesDave Chinner
We don't record pin counts in inode events right now, and this makes it difficult to track down problems related to pinning inodes. Add the pin count to the inode trace class and add trace events for pinning and unpinning inodes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-03-01xfs: remove xfs_ipin/xfs_iunpinChristoph Hellwig
Inodes are only pinned/unpinned via the inode item methods, and lots of code relies on that fact. So remove the separate xfs_ipin/xfs_iunpin helpers and merge them into their only callers. This also fixes up various duplicate and/or incorrect comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-03-01xfs: cleanup xfs_iunpin_wait/xfs_iunpin_nowaitChristoph Hellwig
Remove the inode item pointer and ili_last_lsn checks in __xfs_iunpin_wait as any pinned inode is guaranteed to have them valid. After this the xfs_iunpin_nowait case is nothing more than a xfs_log_force_lsn, as we know that the caller has already checked the pincount. Make xfs_iunpin_nowait the new low-level routine just doing the log force and rewrite xfs_iunpin_wait around it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-02-06xfs: Use delayed write for inodes rather than async V2Dave Chinner
We currently do background inode flush asynchronously, resulting in inodes being written in whatever order the background writeback issues them. Not only that, there are also blocking and non-blocking asynchronous inode flushes, depending on where the flush comes from. This patch completely removes asynchronous inode writeback. It removes all the strange writeback modes and replaces them with either a synchronous flush or a non-blocking delayed write flush. That is, inode flushes will only issue IO directly if they are synchronous, and background flushing may do nothing if the operation would block (e.g. on a pinned inode or buffer lock). Delayed write flushes will now result in the inode buffer sitting in the delwri queue of the buffer cache to be flushed by either an AIL push or by the xfsbufd timing out the buffer. This will allow accumulation of dirty inode buffers in memory and allow optimisation of inode cluster writeback at the xfsbufd level where we have much greater queue depths than the block layer elevators. We will also get adjacent inode cluster buffer IO merging for free when a later patch in the series allows sorting of the delayed write buffers before dispatch. This effectively means that any inode that is written back by background writeback will be seen as flush locked during AIL pushing, and will result in the buffers being pushed from there. This writeback path is currently non-optimal, but the next patch in the series will fix that problem. A side effect of this delayed write mechanism is that background inode reclaim will no longer directly flush inodes, nor can it wait on the flush lock. The result is that inode reclaim must leave the inode in the reclaimable state until it is clean. Hence attempts to reclaim a dirty inode in the background will simply skip the inode until it is clean and this allows other mechanisms (i.e. xfsbufd) to do more optimal writeback of the dirty buffers. As a result, the inode reclaim code has been rewritten so that it no longer relies on the ambiguous return values of xfs_iflush() to determine whether it is safe to reclaim an inode. Portions of this patch are derived from patches by Christoph Hellwig. Version 2: - cleanup reclaim code as suggested by Christoph - log background reclaim inode flush errors - just pass sync flags to xfs_iflush Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-06xfs: Make inode reclaim states explicitDave Chinner
A.K.A.: don't rely on xfs_iflush() return value in reclaim We have gradually been moving checks out of the reclaim code because they are duplicated in xfs_iflush(). We've had a history of problems in this area, and many of them stem from the overloading of the return values from xfs_iflush() and interaction with inode flush locking to determine if the inode is safe to reclaim. With the desire to move to delayed write flushing of inodes and non-blocking inode tree reclaim walks, the overloading of the return value of xfs_iflush makes it very difficult to determine the correct thing to do next. This patch explicitly re-adds the checks to the inode reclaim code, removing the reliance on the return value of xfs_iflush() to determine what to do next. It also means that we can clearly document all the inode states that reclaim must handle and hence we can easily see that we handled all the necessary cases. This also removes the need for the xfs_inode_clean() check in xfs_iflush() as all callers now check this first (safely). Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-01-21xfs: cleanup up xfs_log_force calling conventionsChristoph Hellwig
Remove the XFS_LOG_FORCE argument which was always set, and the XFS_LOG_URGE define, which was never used. Split xfs_log_force into a two helpers - xfs_log_force which forces the whole log, and xfs_log_force_lsn which forces up to the specified LSN. The underlying implementations already were entirely separate, as were the users. Also re-indent the new _xfs_log_force/_xfs_log_force which previously had a weird coding style. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-01-21xfs: remove duplicate buffer flagsChristoph Hellwig
Currently we define aliases for the buffer flags in various namespaces, which only adds confusion. Remove all but the XBF_ flags to clean this up a bit. Note that we still abuse XFS_B_ASYNC/XBF_ASYNC for some non-buffer uses, but I'll clean that up later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-01-15xfs: convert remaining direct references to m_peragDave Chinner
Convert the remaining direct lookups of the per ag structures to use get/put accesses. Ensure that the loops across AGs and prior users of the interface balance gets and puts correctly. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-01-15xfs: rename xfs_get_peragDave Chinner
xfs_get_perag is really getting the perag that an inode belongs to based on it's inode number. Convert the use of this function to just get the perag from a provided ag number. Use this new function to obtain the per-ag structure when traversing the per AG inode trees for sync and reclaim. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-01-15xfs: fix stale inode flush avoidanceDave Chinner
When reclaiming stale inodes, we need to guarantee that inodes are unpinned before returning with a "clean" status. If we don't we can reclaim inodes that are pinned, leading to use after free in the transaction subsystem as transactions complete. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-01-10xfs: Don't flush stale inodesDave Chinner
Because inodes remain in cache much longer than inode buffers do under memory pressure, we can get the situation where we have stale, dirty inodes being reclaimed but the backing storage has been freed. Hence we should never, ever flush XFS_ISTALE inodes to disk as there is no guarantee that the backing buffer is in cache and still marked stale when the flush occurs. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-15xfs: event tracing supportChristoph Hellwig
Convert the old xfs tracing support that could only be used with the out of tree kdb and xfsidbg patches to use the generic event tracer. To use it make sure CONFIG_EVENT_TRACING is enabled and then enable all xfs trace channels by: echo 1 > /sys/kernel/debug/tracing/events/xfs/enable or alternatively enable single events by just doing the same in one event subdirectory, e.g. echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable or set more complex filters, etc. In Documentation/trace/events.txt all this is desctribed in more detail. To reads the events do a cat /sys/kernel/debug/tracing/trace Compared to the last posting this patch converts the tracing mostly to the one tracepoint per callsite model that other users of the new tracing facility also employ. This allows a very fine-grained control of the tracing, a cleaner output of the traces and also enables the perf tool to use each tracepoint as a virtual performance counter, allowing us to e.g. count how often certain workloads git various spots in XFS. Take a look at http://lwn.net/Articles/346470/ for some examples. Also the btree tracing isn't included at all yet, as it will require additional core tracing features not in mainline yet, I plan to deliver it later. And the really nice thing about this patch is that it actually removes many lines of code while adding this nice functionality: fs/xfs/Makefile | 8 fs/xfs/linux-2.6/xfs_acl.c | 1 fs/xfs/linux-2.6/xfs_aops.c | 52 - fs/xfs/linux-2.6/xfs_aops.h | 2 fs/xfs/linux-2.6/xfs_buf.c | 117 +-- fs/xfs/linux-2.6/xfs_buf.h | 33 fs/xfs/linux-2.6/xfs_fs_subr.c | 3 fs/xfs/linux-2.6/xfs_ioctl.c | 1 fs/xfs/linux-2.6/xfs_ioctl32.c | 1 fs/xfs/linux-2.6/xfs_iops.c | 1 fs/xfs/linux-2.6/xfs_linux.h | 1 fs/xfs/linux-2.6/xfs_lrw.c | 87 -- fs/xfs/linux-2.6/xfs_lrw.h | 45 - fs/xfs/linux-2.6/xfs_super.c | 104 --- fs/xfs/linux-2.6/xfs_super.h | 7 fs/xfs/linux-2.6/xfs_sync.c | 1 fs/xfs/linux-2.6/xfs_trace.c | 75 ++ fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/linux-2.6/xfs_vnode.h | 4 fs/xfs/quota/xfs_dquot.c | 110 --- fs/xfs/quota/xfs_dquot.h | 21 fs/xfs/quota/xfs_qm.c | 40 - fs/xfs/quota/xfs_qm_syscalls.c | 4 fs/xfs/support/ktrace.c | 323 --------- fs/xfs/support/ktrace.h | 85 -- fs/xfs/xfs.h | 16 fs/xfs/xfs_ag.h | 14 fs/xfs/xfs_alloc.c | 230 +----- fs/xfs/xfs_alloc.h | 27 fs/xfs/xfs_alloc_btree.c | 1 fs/xfs/xfs_attr.c | 107 --- fs/xfs/xfs_attr.h | 10 fs/xfs/xfs_attr_leaf.c | 14 fs/xfs/xfs_attr_sf.h | 40 - fs/xfs/xfs_bmap.c | 507 +++------------ fs/xfs/xfs_bmap.h | 49 - fs/xfs/xfs_bmap_btree.c | 6 fs/xfs/xfs_btree.c | 5 fs/xfs/xfs_btree_trace.h | 17 fs/xfs/xfs_buf_item.c | 87 -- fs/xfs/xfs_buf_item.h | 20 fs/xfs/xfs_da_btree.c | 3 fs/xfs/xfs_da_btree.h | 7 fs/xfs/xfs_dfrag.c | 2 fs/xfs/xfs_dir2.c | 8 fs/xfs/xfs_dir2_block.c | 20 fs/xfs/xfs_dir2_leaf.c | 21 fs/xfs/xfs_dir2_node.c | 27 fs/xfs/xfs_dir2_sf.c | 26 fs/xfs/xfs_dir2_trace.c | 216 ------ fs/xfs/xfs_dir2_trace.h | 72 -- fs/xfs/xfs_filestream.c | 8 fs/xfs/xfs_fsops.c | 2 fs/xfs/xfs_iget.c | 111 --- fs/xfs/xfs_inode.c | 67 -- fs/xfs/xfs_inode.h | 76 -- fs/xfs/xfs_inode_item.c | 5 fs/xfs/xfs_iomap.c | 85 -- fs/xfs/xfs_iomap.h | 8 fs/xfs/xfs_log.c | 181 +---- fs/xfs/xfs_log_priv.h | 20 fs/xfs/xfs_log_recover.c | 1 fs/xfs/xfs_mount.c | 2 fs/xfs/xfs_quota.h | 8 fs/xfs/xfs_rename.c | 1 fs/xfs/xfs_rtalloc.c | 1 fs/xfs/xfs_rw.c | 3 fs/xfs/xfs_trans.h | 47 + fs/xfs/xfs_trans_buf.c | 62 - fs/xfs/xfs_vnodeops.c | 8 70 files changed, 2151 insertions(+), 2592 deletions(-) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-15xfs: change the xfs_iext_insert / xfs_iext_removeChristoph Hellwig
Change the xfs_iext_insert / xfs_iext_remove prototypes to pass more information which will allow pushing the trace points from the callers into those functions. This includes folding the whichfork information into the state variable to minimize the addition stack footprint. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-10-08xfs: implement ->dirty_inode to fix timestamp handlingChristoph Hellwig
This is picking up on Felix's repost of Dave's patch to implement a .dirty_inode method. We really need this notification because the VFS keeps writing directly into the inode structure instead of going through methods to update this state. In addition to the long-known atime issue we now also have a caller in VM code that updates c/mtime that way for shared writeable mmaps. And I found another one that no one has noticed in practice in the FIFO code. So implement ->dirty_inode to set i_update_core whenever the inode gets externally dirtied, and switch the c/mtime handling to the same scheme we already use for atime (always picking up the value from the Linux inode). Note that this patch also removes the xfs_synchronize_atime call in xfs_reclaim it was superflous as we already synchronize the time when writing the inode via the log (xfs_inode_item_format) or the normal buffers (xfs_iflush_int). In addition also remove the I_CLEAR check before copying the Linux timestamps - now that we always have the Linux inode available we can always use the timestamps in it. Also switch to just using file_update_time for regular reads/writes - that will get us all optimization done to it for free and make sure we notice early when it breaks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-08-31xfs: add more statics & drop some unused functionsEric Sandeen
A lot more functions could be made static, but they need forward declarations; this does some easy ones, and also found a few unused functions in the process. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-08-12xfs: check for dinode realtime flag corruptionChristoph Hellwig
Ramon tested XFS with a modified version of fsfuzzer and hit a NULL pointer dereference in __xfs_get_blocks due to the RT device target pointer being NULL. To fix this reject inode with the realtime bit set on a a filesystem without an RT subvolume during inode read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reported-by: Ramon de Carvalho Valle <ramon@risesecurity.org> Tested-by: Ramon de Carvalho Valle <ramon@risesecurity.org> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-06-10xfs: use generic Posix ACL codeChristoph Hellwig
This patch rips out the XFS ACL handling code and uses the generic fs/posix_acl.c code instead. The ondisk format is of course left unchanged. This also introduces the same ACL caching all other Linux filesystems do by adding pointers to the acl and default acl in struct xfs_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-04-29xfs_file_last_byte() needs to acquire ilockLachlan McIlroy
We had some systems crash with this stack: [<a00000010000cb20>] ia64_leave_kernel+0x0/0x280 [<a00000021291ca00>] xfs_bmbt_get_startoff+0x0/0x20 [xfs] [<a0000002129080b0>] xfs_bmap_last_offset+0x210/0x280 [xfs] [<a00000021295b010>] xfs_file_last_byte+0x70/0x1a0 [xfs] [<a00000021295b200>] xfs_itruncate_start+0xc0/0x1a0 [xfs] [<a0000002129935f0>] xfs_inactive_free_eofblocks+0x290/0x460 [xfs] [<a000000212998fb0>] xfs_release+0x1b0/0x240 [xfs] [<a0000002129ad930>] xfs_file_release+0x70/0xa0 [xfs] [<a000000100162ea0>] __fput+0x1a0/0x420 [<a000000100163160>] fput+0x40/0x60 The problem here is that xfs_file_last_byte() does not acquire the inode lock and can therefore race with another thread that is modifying the extext list. While xfs_bmap_last_offset() is trying to lookup what was the last extent some extents were merged and the extent list shrunk so the index we lookup is now beyond the end of the extent list and potentially in a freed buffer. Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-01-19xfs: sanity check attr fork sizeChristoph Hellwig
Recently we have quite a few kerneloops reports about dereferencing a NULL if_data in the attribute fork. From looking over the code this can only happen if we pass a 0 size argument to xfs_iformat_local. This implies some sort of corruption and in fact the only mailinglist report about this from earlier this year was after a powerfail presumably on a system with write cache and without barriers. Add a quick sanity check for the attr fork size in xfs_iformat to catch these early and without an oops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
2009-01-16[XFS] Remove the rest of the macro-to-function indirections.Eric Sandeen
Remove the last of the macros-defined-to-static-functions. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-22[XFS] Remove XFS_BUF_SHUT() and friendsLachlan McIlroy
Code does nothing so remove it. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>