From 006a0973ed020a81fe1f24b511ce9feb53f70e44 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 25 Aug 2015 14:11:52 -0400 Subject: writeback: sync_inodes_sb() must write out I_DIRTY_TIME inodes and always call wait_sb_inodes() e79729123f63 ("writeback: don't issue wb_writeback_work if clean") updated writeback path to avoid kicking writeback work items if there are no inodes to be written out; unfortunately, the avoidance logic was too aggressive and broke sync_inodes_sb(). * sync_inodes_sb() must write out I_DIRTY_TIME inodes but I_DIRTY_TIME inodes dont't contribute to bdi/wb_has_dirty_io() tests and were being skipped over. * inodes are taken off wb->b_dirty/io/more_io lists after writeback starts on them. sync_inodes_sb() skipping wait_sb_inodes() when bdi_has_dirty_io() breaks it by making it return while writebacks are in-flight. This patch fixes the breakages by * Removing bdi_has_dirty_io() shortcut from bdi_split_work_to_wbs(). The callers are already testing the condition. * Removing bdi_has_dirty_io() shortcut from sync_inodes_sb() so that it always calls into bdi_split_work_to_wbs() and wait_sb_inodes(). * Making bdi_split_work_to_wbs() consider the b_dirty_time list for WB_SYNC_ALL writebacks. Kudos to Eryu, Dave and Jan for tracking down the issue. Signed-off-by: Tejun Heo Fixes: e79729123f63 ("writeback: don't issue wb_writeback_work if clean") Link: http://lkml.kernel.org/g/20150812101204.GE17933@dhcp-13-216.nay.redhat.com Reported-and-bisected-by: Eryu Guan Cc: Dave Chinner Cc: Jan Kara Cc: Ted Ts'o Signed-off-by: Jens Axboe diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 518c629..5fa588e 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -844,14 +844,15 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, struct wb_iter iter; might_sleep(); - - if (!bdi_has_dirty_io(bdi)) - return; restart: rcu_read_lock(); bdi_for_each_wb(wb, bdi, &iter, next_blkcg_id) { - if (!wb_has_dirty_io(wb) || - (skip_if_busy && writeback_in_progress(wb))) + /* SYNC_ALL writes out I_DIRTY_TIME too */ + if (!wb_has_dirty_io(wb) && + (base_work->sync_mode == WB_SYNC_NONE || + list_empty(&wb->b_dirty_time))) + continue; + if (skip_if_busy && writeback_in_progress(wb)) continue; base_work->nr_pages = wb_split_bdi_pages(wb, nr_pages); @@ -899,8 +900,7 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, { might_sleep(); - if (bdi_has_dirty_io(bdi) && - (!skip_if_busy || !writeback_in_progress(&bdi->wb))) { + if (!skip_if_busy || !writeback_in_progress(&bdi->wb)) { base_work->auto_free = 0; base_work->single_wait = 0; base_work->single_done = 0; @@ -2275,8 +2275,12 @@ void sync_inodes_sb(struct super_block *sb) }; struct backing_dev_info *bdi = sb->s_bdi; - /* Nothing to do? */ - if (!bdi_has_dirty_io(bdi) || bdi == &noop_backing_dev_info) + /* + * Can't skip on !bdi_has_dirty() because we should wait for !dirty + * inodes under writeback and I_DIRTY_TIME inodes ignored by + * bdi_has_dirty() need to be written out too. + */ + if (bdi == &noop_backing_dev_info) return; WARN_ON(!rwsem_is_locked(&sb->s_umount)); -- cgit v0.10.2 From 74c9c9134bf8d8a6d5c5683f60262eed3d6fb5ac Mon Sep 17 00:00:00 2001 From: Jeff Moyer Date: Wed, 29 Jul 2015 10:22:50 -0400 Subject: mtip32x: fix regression introduced by blk-mq per-hctx flush Hi, After commit f70ced091707 (blk-mq: support per-distpatch_queue flush machinery), the mtip32xx driver may oops upon module load due to walking off the end of an array in mtip_init_cmd. On initialization of the flush_rq, init_request is called with request_index >= the maximum queue depth the driver supports. For mtip32xx, this value is used to index into an array. What this means is that the driver will walk off the end of the array, and either oops or cause random memory corruption. The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx driver in a loop. I can typically reproduce the problem in about 30 seconds. Now, in the case of mtip32xx, it actually doesn't support flush/fua, so I think we can simply return without doing anything. In addition, no other mq-enabled driver does anything with the request_index passed into init_request(), so no other driver is affected. However, I'm not really sure what is expected of drivers. Ming, what did you envision drivers would do when initializing the flush requests? Signed-off-by: Jeff Moyer Signed-off-by: Jens Axboe diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 4a2ef09..f504232 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3756,6 +3756,14 @@ static int mtip_init_cmd(void *data, struct request *rq, unsigned int hctx_idx, struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq); u32 host_cap_64 = readl(dd->mmio + HOST_CAP) & HOST_CAP_64; + /* + * For flush requests, request_idx starts at the end of the + * tag space. Since we don't support FLUSH/FUA, simply return + * 0 as there's nothing to be done. + */ + if (request_idx >= MTIP_MAX_COMMAND_SLOTS) + return 0; + cmd->command = dmam_alloc_coherent(&dd->pdev->dev, CMD_DMA_ALLOC_SZ, &cmd->command_dma, GFP_KERNEL); if (!cmd->command) -- cgit v0.10.2