summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_trans_resv.c
AgeCommit message (Collapse)Author
2013-08-30xfs: inode log reservations are too smallDave Chinner
We've been seeing occasional problems with log space leaks and transaction underruns such as this for some time: XFS (dm-0): xlog_write: reservation summary: trans type = FSYNC_TS (36) unit res = 2740 bytes current res = -4 bytes total reg = 0 bytes (o/flow = 0 bytes) ophdrs = 0 (ophdr space = 0 bytes) ophdr + reg = 0 bytes num regions = 0 Turns out that xfstests generic/311 is reliably reproducing this problem with the test it runs at sequence 16 of it execution. It is a 100% reliable reproducer with the mkfs configuration of "-b size=1024 -m crc=1" on a 10GB scratch device. The problem? Inode forks in btree format are logged in memory format, not disk format (i.e. bmbt format, not bmdr format). That means there is a btree block header being logged, when such a structure is never written to the inode fork in bmdr format. The bmdr header in the inode is only 4 bytes, while the bmbt header is 24 bytes for v4 filesystems and 72 bytes for v5 filesystems. We currently reserve the inode size plus the rounded up overhead of a logging a buffer, which is 128 bytes. That means the reservation for a 512 byte inode is 640 bytes. What we can actually log is: inode core, data and attr fork = 512 bytes inode log format + log op header = 56 + 12 = 68 bytes data fork bmbt hdr = 24/72 bytes attr fork bmbt hdr = 24/72 bytes So, for a v2 inodes we can log at least 628 bytes, but if we split that inode over the end of the log across log buffers, we need to also another log op header, which takes us to 640 bytes. If there's another reservation taken out of this that I haven't taken into account (perhaps multiple iclog splits?) or I haven't corectly calculated the bmbt format space used (entirely possible), then we will overun it. For v3 inodes the maximum is actually 724 bytes, and even a single maximally sized btree format fork can blow it (652 bytes). And that's exactly what is happening with the FSYNC_TS transaction in the above output - it's consumed 644 bytes of space after the CIL context took the space reserved for it (2100 bytes). This problem has always been present in the XFS code - the btree format inode forks have always been logged in this manner. Hence there has always been the possibility of an overrun with such a transaction. The CRC code has just exposed it frequently enough to be able to debug and understand the root cause.... So, let's fix all the inode log space reservations. [ I'm so glad we spent the effort to clean up the transaction reservation code. This is an easy fix now. ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: refactor xfs_trans_reserve() interfaceJie Liu
With the new xfs_trans_res structure has been introduced, the log reservation size, log count as well as log flags are pre-initialized at mount time. So it's time to refine xfs_trans_reserve() interface to be more neat. Also, introduce a new helper M_RES() to return a pointer to the mp->m_resv structure to simplify the input. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: Introduce tr_fsyncts to m_reservationJie Liu
A preparation step. For now fsync_ts transaction use the pre-calculated log reservation size of tr_swrite. This patch introduce a new item tr_fsyncts to mp->m_reservations structure so that we can fetch the log reservation value for it in a same manner to others. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: Introduce a new structure to hold transaction reservation itemsJie Liu
Introduce a new structure xfs_trans_res to hold transaction reservation item info per log ticket. We also need to improve xfs_trans_resv_calc() by initializing the log count as well as log flags for permanent log reservation. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: create xfs_bmap_util.[ch]Dave Chinner
There is a bunch of code in xfs_bmap.c that is kernel specific and not shared with userspace. To minimise the difference between the kernel and userspace code, shift this unshared code to xfs_bmap_util.c, and the declarations to xfs_bmap_util.h. The biggest issue here is xfs_bmap_finish() - userspace has it's own definition of this function, and so we need to move it out of xfs_bmap.[ch]. This means several other files need to include xfs_bmap_util.h as well. It also introduces and interesting dance for the stack switching code in xfs_bmapi_allocate(). The stack switching/workqueue code is actually moved to xfs_bmap_util.c, so that userspace can simply use a #define in a header file to connect the dots without needing to know about the stack switch code at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: split out transaction reservation codeDave Chinner
The transaction reservation size calculations is used by both kernel and userspace, but most of the transaction code in xfs_trans.c is kernel specific. Split all the transaction reservation code out into it's own files to make sharing with userspace simpler. This just leaves kernel-only definitions in xfs_trans.h, so it doesn't need to be shared with userspace anymore, either. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>