summaryrefslogtreecommitdiff
path: root/fs/nilfs2
AgeCommit message (Collapse)Author
2009-11-20nilfs2: add size option of private object to metadata file allocatorRyusuke Konishi
This adds an optional "object size" argument to nilfs_mdt_new_common() function; the argument specifies the size of private object attached to a newly allocated metadata file inode. This will afford space to keep local variables for meta data files. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: move out mark_inode_dirty calls from bmap routinesRyusuke Konishi
Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called mark_inode_dirty() after they changed the number of data blocks. This moves these calls outside bmap outermost functions like nilfs_bmap_insert() or nilfs_bmap_truncate(). This will mitigate overhead for truncate or delete operation since they repeatedly remove set of blocks. Nearly 10 percent improvement was observed for removal of a large file: # dd if=/dev/zero of=/test/aaa bs=1M count=512 # time rm /test/aaa real 2.968s -> 2.705s Further optimization may be possible by eliminating these mark_inode_dirty() uses though I avoid mixing separate changes here. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: stop marking metadata inode dirty within btree operationsRyusuke Konishi
Since metadata file routines mark the inode dirty after they successfully changed bmap objects, nilfs_mdt_mark_dirty() calls in nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() are redundant. This removes these overlapping calls from the bmap routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: remove buffer locking from btree codeRyusuke Konishi
lock_buffer() and unlock_buffer() uses in btree.c are eliminable because btree functions gain buffer heads through nilfs_btnode_get(), which never returns an on-the-fly buffer. Although nilfs_clear_dirty_page() and nilfs_copy_back_pages() in nilfs_commit_gcdat_inode() juggle btree node buffers of DAT, this is safe because these operations are protected by a log writer lock or the metadata file semaphore of DAT. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: remove buffer locking in nilfs_mark_inode_dirtyRyusuke Konishi
This lock is eliminable because inodes on the buffer can be updated independently. Although a log writer also fills in bmap data on the on-disk inodes, this update is exclusively done by a log writer lock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: cleanup unused match_bool functionJiro SEKIBA
match_bool function is not used anymore. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: Using nobarrier option instead of barrier=offJiro SEKIBA
Since most of fs using nofoobar style option, modified barrier=off option as nobarrier. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: move definition of struct nilfs_btree_nodeJiro SEKIBA
This is a trivial patch to expose struct nilfs_fs_btree_node. The struct should be exposed outside of kernel, for it is disk format. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20nilfs2: get rid of BUG_ON use in btree lookup routinesRyusuke Konishi
The current btree lookup routines make a kernel oops when detected inconsistency in btree blocks. These routines should instead return a proper error code because the inconsistency usually comes from corruption of on-disk metadata. This fixes the issue by converting BUG_ON calls to proper error handlings. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-15nilfs2: deleted inconsistent comment in nilfs_load_inode_block()Jiro SEKIBA
The comment says, "Caller of this function MUST lock s_inode_lock", however just above the comment, it locks s_inode_lock in the function. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-13nilfs2: fix lock order reversal in chcp operationRyusuke Konishi
Will fix the following lock order reversal lockdep detected: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc6 #7 ------------------------------------------------------- chcp/30157 is trying to acquire lock: (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] but task is already holding lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&nilfs->ns_segctor_sem){++++.+}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c14151e2>] down_read+0x31/0x45 [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2] [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #1 (&type->s_umount_key#31/1){+.+.+.}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c104c0f3>] down_write_nested+0x34/0x52 [<c10c08fe>] sget+0x22e/0x389 [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #0 (&nilfs->ns_mount_mutex){+.+.+.}: [<c1057727>] __lock_acquire+0xe27/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c1414d63>] mutex_lock_nested+0x41/0x23e [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2] [<c10cca12>] vfs_ioctl+0x27/0x6e [<c10ccf93>] do_vfs_ioctl+0x491/0x4db [<c10cd022>] sys_ioctl+0x45/0x5f [<c1002a14>] sysenter_do_call+0x12/0x32 Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-08nilfs2: fix missing cleanup of gc cache on error casesRyusuke Konishi
This fixes an -rc1 regression brought by the commit: 1cf58fa840472ec7df6bf2312885949ebb308853 ("nilfs2: shorten freeze period due to GC in write operation v3"). Although the patch moved out a function call of nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding cleanup job needed for the error case. This will move the missing cleanup job to the destination function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jiro SEKIBA <jir@unicus.jp>
2009-11-08nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocksRyusuke Konishi
This fixes a kernel oops reported by Markus Trippelsdorf in the email titled "[NILFS users] kernel Oops while running nilfs_cleanerd". The oops was caused by a bug of error path in nilfs_ioctl_move_blocks() function, which was inlined in nilfs_ioctl_clean_segments(). nilfs_ioctl_move_blocks checks duplication of blocks which will be moved in garbage collection. But, the check should have be done within nilfs_ioctl_move_inode_block() to prevent list corruption among buffers storing the target blocks. To fix the kernel oops, this moves forward the duplication check before the list insertion. I also tested this for stable trees [2.6.30, 2.6.31]. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>
2009-11-03nilfs2: add zero-fill for new btree node buffersRyusuke Konishi
Adds missing initialization of newly allocated b-tree node buffers. This avoids garbage data to be mixed in b-tree node blocks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-03nilfs2: fix irregular checkpoint creation due to data flushRyusuke Konishi
When nilfs flushes out dirty data to reduce memory pressure, creation of checkpoints is wrongly postponed. This bug causes irregular checkpoint creation especially in small footprint systems. To correct this issue, a timer for the checkpoint creation has to be continued if a log writer does not create a checkpoint. This will do the correction. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-03nilfs2: fix dirty page accounting leak causing hang at writeRyusuke Konishi
Bruno Prémont and Dunphy, Bill noticed me that NILFS will certainly hang on ARM-based targets. I found this was caused by an underflow of dirty pages counter. A b-tree cache routine was marking page dirty without adjusting page account information. This fixes the dirty page accounting leak and resolves the hang on arm-based targets. Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Reported-by: Dunphy, Bill <WDunphy@tandbergdata.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: stable <stable@kernel.org>
2009-10-01const: constify remaining file_operationsAlexey Dobriyan
[akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-29nilfs2: fix missing initialization of i_dir_start_lookup memberRyusuke Konishi
The i_dir_start_lookup field in nilfs_inode_info objects should be cleared when the objects are allocated, but the the initialization was missing in case of reading from disk. This adds the initialization. Since the variable just gives a start page on directory lookups, the bug was nonfatal until now. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-29nilfs2: fix missing zero-fill initialization of btree node cacheRyusuke Konishi
This will fix file system corruption which infrequently happens after mount. The problem was reported from users with the title "[NILFS users] Fail to mount NILFS." (Message-ID: <200908211918.34720.yuri@itinteg.net>), and so forth. I've also experienced the corruption multiple times on kernel 2.6.30 and 2.6.31. The problem turned out to be caused due to discordance between mapping->nrpages of a btree node cache and the actual number of pages hung on the cache; if the mapping->nrpages becomes zero even as it has pages, truncate_inode_pages() returns without doing anything. Usually this is harmless except it may cause page leak, but garbage collection fairly infrequently sees a stale page remained in the btree node cache of DAT (i.e. disk address translation file of nilfs), and induces the corruption. I identified a missing initialization in btree node caches was the root cause. This corrects the bug. I've tested this for kernel 2.6.30 and 2.6.31. Reported-by: Yuri Chislov <yuri@itinteg.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>
2009-09-27const: mark struct vm_struct_operationsAlexey Dobriyan
* mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22const: mark remaining inode_operations as constAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22const: mark remaining address_space_operations constAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22const: mark remaining export_operations constAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22const: mark remaining super_operations constAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-16fs: remove bdev->bd_inode_backing_dev_infoJens Axboe
It has been unused since it was introduced in: commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac Author: Andrew Morton <akpm@osdl.org> Date: Fri May 21 00:46:17 2004 -0700 [PATCH] block device layer: separate backing_dev_info infrastructure So lets just kill it. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14fs/Kconfig: move nilfs2 outside misc filesystemsRyusuke Konishi
Some people asked me questions like the following: On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote: > just wondering, any reasons why NILFS2 is one of the miscellaneous > filesystems and, for example, btrfs, is not in Kconfig? Actually, nilfs is NOT a filesystem came from other operating systems, but a filesystem created purely for Linux. Nor is it a flash filesystem but that for generic block devices. So, this moves nilfs outside the misc category as I responded in LKML "Re: Why does NILFS2 hide under Miscellaneous filesystems?" (Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: convert nilfs_bmap_lookup to an inline functionRyusuke Konishi
The nilfs_bmap_lookup() is now a wrapper function of nilfs_bmap_lookup_at_level(). This moves the nilfs_bmap_lookup() to a header file converting it to an inline function and gives an opportunity for optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: allow btree code to directly call dat operationsRyusuke Konishi
The current btree code is written so that btree functions call dat operations via wrapper functions in bmap.c when they allocate, free, or modify virtual block addresses. This abstraction requires additional function calls and causes frequent call of nilfs_bmap_get_dat() function since it is used in the every wrapper function. This removes the wrapper functions and makes them available from btree.c and direct.c, which will increase the opportunity of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: add update functions of virtual block address to datRyusuke Konishi
This is a preparation for the successive cleanup ("nilfs2: allow btree to directly call dat operations"). This adds functions bundling a few operations to change an entry of virtual block address on the dat file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove individual gfp constants for each metadata fileRyusuke Konishi
This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP, and NILFS_IFILE_GFP. All of these constants refer to NILFS_MDT_GFP, and can be removed. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: stop zero-fill of btree path just before free itRyusuke Konishi
The btree path object is cleared just before it is freed. This will remove the code doing the unnecessary clear operation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove unused btree argument from btree functionsRyusuke Konishi
Even though many btree functions take a btree object as their first argument, most of them are not used in their functions. This sticky use of the btree argument is hurting code readability and giving the possibility of inefficient code generation. So, this removes the unnecessary btree arguments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_freeRyusuke Konishi
These functions are not called from any functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: shorten freeze period due to GC in write operation v3Jiro SEKIBA
This is a re-revised patch to shorten freeze period. This version include a fix of the bug Konishi-san mentioned last time. When GC is runnning, GC moves live block to difference segments. Copying live blocks into memory is done in a transaction, however it is not necessarily to be in the transaction. This patch will get the nilfs_ioctl_move_blocks() out from transaction lock and put it before the transaction. I ran sysbench fileio test against nilfs partition. I copied some DVD/CD images and created snapshot to create live blocks before starting the benchmark. Followings are summary of rc8 and rc8 w/ the patch of per-request statistics, which is min/max and avg. I ran each test three times and bellow is average of those numers. According to this benchmark result, average time is slightly degrated. However, worstcase (max) result is significantly improved. This can address a few seconds write freeze. - random write per-request performance of rc8 min 0.843ms max 680.406ms avg 3.050ms - random write per-request performance of rc8 w/ this patch min 0.843ms -> 100.00% max 380.490ms -> 55.90% avg 3.233ms -> 106.00% - sequential write per-request performance of rc8 min 0.736ms max 774.343ms avg 2.883ms - sequential write per-request performance of rc8 w/ this patch min 0.720ms -> 97.80% max 644.280ms-> 83.20% avg 3.130ms -> 108.50% -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- protection_period 150 selection_policy timestamp # timestamp in ascend order nsegments_per_clean 2 cleaning_interval 2 retry_interval 60 use_mmap log_priority info -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: add more check routines in mount processZhu Yanhai
nilfs2: Add more safeguard routines and protections in mount process, which also makes nilfs2 report consistency error messages when checkpoint number is invalid. Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: An unassigned variable is assigned to a never used structure memberZhang Qiang
nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is mounted for the first time, local variable 'nilfs->ns_last_cno' is used before loading the latest checkpoint number from disk (in 'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but 'sd.cno' has never been used in the procedure. Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAITRyusuke Konishi
Alberto Bertogli advised me about bio_alloc() use in nilfs: On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote: > By the way, those bio_alloc()s are using GFP_NOWAIT but it looks > like they could use at least GFP_NOIO or GFP_NOFS, since the caller > can (and sometimes do) sleep. The only caller is nilfs_submit_bh(), > which calls nilfs_submit_seg_bio() which can sleep calling > wait_for_completion(). This takes in the comment and replaces the use of GFP_NOWAIT flag with GFP_NOIO. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: stop using periodic write_super callbackJiro SEKIBA
This removes nilfs_write_super and commit super block in nilfs internal thread, instead of periodic write_super callback. VFS layer calls ->write_super callback periodically. However, it looks like that calling back is ommited when disk I/O is busy. And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus nilfs superblock is not synchronized as nilfs designed. To avoid it, syncing superblock by nilfs thread instead of pdflush. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: clean up nilfs_write_superJiro SEKIBA
Separate conditions that check if syncing super block and alternative super block are required as inline functions to reuse the conditions. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fsJiro SEKIBA
This fixes disorder of nilfs_write_super in nilfs_sync_fs. Commiting super block must be the end of the function so that every changes are reflected. ->sync_fs() is not called frequently so this makes nilfs_sync_fs call nilfs_commit_super instead of nilfs_write_super. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove redundant super block commitJiro SEKIBA
This removes redundant super block commit. nilfs_write_super will call nilfs_commit_super to store super block into block device. However, nilfs_put_super will call nilfs_commit_super right after calling nilfs_write_super. So calling nilfs_write_super in nilfs_put_super would be redundant. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: implement nilfs_show_options to display mount options in /proc/mountsJiro SEKIBA
This is a patch to display mount options in procfs. Mount options will show up in the /proc/mounts as other fs does. ... /dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0 ... Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: always lookup disk block address before reading metadata blockRyusuke Konishi
The current metadata file code skips disk address lookup for its data block if the buffer has a mapped flag. This has a potential risk to cause read request to be performed against the stale block address that GC moved, and it may lead to meta data corruption. The mapped flag is safe if the buffer has an uptodate flag, otherwise it may prevent necessary update of disk address in the next read. This will avoid the potential problem by ensuring disk address lookup before reading metadata block even for buffers with the mapped flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: use semaphore to protect pointer to a writable FS-instanceRyusuke Konishi
will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to retain a writable FS-instance for a period. The pair functions were making up some kind of recursive lock with a mutex, but they became overkill since the commit 201913ed746c7724a40d33ee5a0b6a1fd2ef3193. Furthermore, they caused the following lockdep warning because the mutex can be released by a task which didn't lock it: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at: [<c1359ff5>] mutex_unlock+0x8/0xa but there are no more locks to release! other info that might help us debug this: no locks held by kswapd0/422. stack backtrace: Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51 Call Trace: [<c1358f97>] ? printk+0xf/0x18 [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7 [<c11578de>] ? prop_put_global+0x3/0x35 [<c1050195>] lock_release+0xed/0x1dc [<c1359ff5>] ? mutex_unlock+0x8/0xa [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119 [<c1359ff5>] mutex_unlock+0x8/0xa [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2] [<c1092653>] shrink_page_list+0x379/0x68d [<c109171b>] ? isolate_pages_global+0xb4/0x18c [<c1092bd2>] shrink_list+0x26b/0x54b [<c10930be>] shrink_zone+0x20c/0x2a2 [<c10936b7>] kswapd+0x407/0x591 [<c1091667>] ? isolate_pages_global+0x0/0x18c [<c1040603>] ? autoremove_wake_function+0x0/0x33 [<c10932b0>] ? kswapd+0x0/0x591 [<c104033b>] kthread+0x69/0x6e [<c10402d2>] ? kthread+0x0/0x6e [<c1003e33>] kernel_thread_helper+0x7/0x1a This patch uses a reader/writer semaphore instead of the own lock and kills this warning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix format string compile warning (ino_t)Heiko Carstens
Unlike on most other architectures ino_t is an unsigned int on s390. So add an explicit cast to avoid this compile warning: fs/nilfs2/recovery.c: In function 'recover_dsync_blocks': fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix ignored error code in __nilfs_read_inode()Ryusuke Konishi
The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-31nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_keyRyusuke Konishi
This will fix the following preempt count underflow reported from users with the title "[NILFS users] segctord problem" (Message-ID: <949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID: <debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>): WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0() Hardware name: HP Compaq 6530b (KR980UT#ABC) Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7 Call Trace: [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0 [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0 [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20 [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0 [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2] [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2] [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2] [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2] [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2] [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40 [<ffffffff803c06e0>] ? __up_write+0xe0/0x150 [<ffffffff80262959>] ? up_write+0x9/0x10 [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2] [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2] [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2] [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2] [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2] [<ffffffff80252633>] ? add_timer+0x13/0x20 [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90 [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffff8025e556>] kthread+0x56/0x90 [<ffffffff8020cdea>] child_rip+0xa/0x20 [<ffffffff8025e500>] ? kthread+0x0/0x90 [<ffffffff8020cde0>] ? child_rip+0x0/0x20 This problem was caused due to a missing radix_tree_preload() call in the retry path of nilfs_btnode_prepare_change_key() function. Reported-by: Eric A <eric225125@yahoo.com> Reported-by: Jerome Poulin <jeromepoulin@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Jerome Poulin <jeromepoulin@gmail.com> Cc: stable@kernel.org
2009-08-18nilfs2: fix oopses with doubly mounted snapshotsRyusuke Konishi
will fix kernel oopses like the following: # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2 # umount /test1 # umount /test2 BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069 in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2 1 lock held by umount.nilfs2/3886: #0: (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c irq event stamp: 1219 hardirqs last enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119 hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119 softirqs last enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55 Call Trace: [<c1023549>] __might_sleep+0x107/0x10e [<c13603c0>] do_page_fault+0x246/0x397 [<c136017a>] ? do_page_fault+0x0/0x397 [<c135e753>] error_code+0x6b/0x70 [<c136017a>] ? do_page_fault+0x0/0x397 [<c104f805>] ? __lock_acquire+0x91/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050b2b>] lock_acquire+0xba/0xdd [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c135d4fe>] down_write+0x2a/0x46 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c104ea2c>] ? mark_held_locks+0x43/0x5b [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2] [<c10b3352>] generic_shutdown_super+0x49/0xb8 [<c10b33de>] kill_block_super+0x1d/0x31 [<c10e6599>] ? vfs_quota_off+0x0/0x12 [<c10b398f>] deactivate_super+0x57/0x6c [<c10c4bc3>] mntput_no_expire+0x8c/0xb4 [<c10c5094>] sys_umount+0x27f/0x2a4 [<c10c50c6>] sys_oldumount+0xd/0xf [<c10031a4>] sysenter_do_call+0x12/0x38 ... This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify remaining sget() use"). In the patch, a new "put resource" function, nilfs_put_sbinfo() was introduced to delay freeing nilfs_sb_info struct. But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test() function to check the reference count, and it caused the nilfs_sb_info was freed when user mounted a snapshot twice. This bug also suggests there was unseen memory leak in usual mount /umount operations for nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-18nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint()Zhang Qiang
'ns_cno' of structure 'the_nilfs' must be protected from segment writer, in other words, the caller of nilfs_get_checkpoint should hold read lock for nilfs->ns_segctor_sem. This patch adds the lock/unlock operations in nilfs_attach_checkpoint() when calling nilfs_cpfile_get_checkpoint(). Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-02nilfs2: fix missing unlock in error path of nilfs_mdt_write_pageRyusuke Konishi
This adds a missing unlock of nilfs->ns_writer_mutex in nilfs_mdt_write_page() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>