Age | Commit message (Collapse) | Author |
|
REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
at this point is:
- various checks inside the block layer.
- sanity checks in bio based drivers.
- now unused bio_empty_barrier helper.
- Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while,
but Xen really needs to sort out it's barrier situaton.
- setting of ordered tags in uas - dead code copied from old scsi
drivers.
- scsi different retry for barriers - it's dead and should have been
removed when flushes were converted to FS requests.
- blktrace handling of barriers - removed. Someone who knows blktrace
better should add support for REQ_FLUSH and REQ_FUA, though.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
In autoclear mode bdev is NULL but the sysfs
entry should be destroyed otherwise this warning appears:
WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0x82/0x95()
sysfs: cannot create duplicate filename '/devices/virtual/block/loop0/loop'
Fixes commit ee86273062cbb310665fe49e1f1937d2cf85b0b9
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Ensure kmap_atomic() usage is strictly nested
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
xen-blkfront: disable barrier/flush write support
Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
block: remove BLKDEV_IFL_WAIT
aic7xxx_old: removed unused 'req' variable
block: remove the BH_Eopnotsupp flag
block: remove the BLKDEV_IFL_BARRIER flag
block: remove the WRITE_BARRIER flag
swap: do not send discards as barriers
fat: do not send discards as barriers
ext4: do not send discards as barriers
jbd2: replace barriers with explicit flush / FUA usage
jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
jbd: replace barriers with explicit flush / FUA usage
nilfs2: replace barriers with explicit flush / FUA usage
reiserfs: replace barriers with explicit flush / FUA usage
gfs2: replace barriers with explicit flush / FUA usage
btrfs: replace barriers with explicit flush / FUA usage
xfs: replace barriers with explicit flush / FUA usage
block: pass gfp_mask and flags to sb_issue_discard
dm: convey that all flushes are processed as empty
...
|
|
* 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block: (95 commits)
cciss: fix PCI IDs for new Smart Array controllers
drbd: add race-breaker to drbd_go_diskless
drbd: use dynamic_dev_dbg to optionally log uuid changes
dynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set
drbd: cleanup: change "<= 0" to "== 0"
drbd: relax the grace period of the md_sync timer again
drbd: add some more explicit drbd_md_sync
drbd: drop wrong debug asserts, fix recently introduced race
drbd: cleanup useless leftover warn/error printk's
drbd: add explicit drbd_md_sync to drbd_resync_finished
drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED
drbd: fix for possible deadlock on IO error during resync
drbd: fix unlikely access after free and list corruption
drbd: fix for spurious fullsync (uuids rotated too fast)
drbd: allow for explicit resync-finished notifications
drbd: preparation commit, using full state in receive_state()
drbd: drbd_send_ack_dp must not rely on header information
drbd: Fix regression in recv_bm_rle_bits (compressed bitmap)
drbd: Fixed a stupid copy and paste error
drbd: Allow larger values for c-fill-target.
...
Fix up trivial conflict in drivers/block/ataflop.c due to BKL removal
|
|
The block device drivers have all gained new lock_kernel
calls from a recent pushdown, and some of the drivers
were already using the BKL before.
This turns the BKL into a set of per-driver mutexes.
Still need to check whether this is safe to do.
file=$1
name=$2
if grep -q lock_kernel ${file} ; then
if grep -q 'include.*linux.mutex.h' ${file} ; then
sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
else
sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
fi
sed -i ${file} \
-e "/^#include.*linux.mutex.h/,$ {
1,/^\(static\|int\|long\)/ {
/^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);
} }" \
-e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
-e '/[ ]*cycle_kernel_lock();/d'
else
sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
-e '/cycle_kernel_lock()/d'
fi
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Deprecate REQ_HARDBARRIER and implement REQ_FLUSH/FUA instead. Also,
instead of checking file->f_op->fsync() directly, look at the value of
vfs_fsync() and ignore -EINVAL return.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with
-EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
blk_queue_flush().
blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a
device has write cache and can flush it, it should set REQ_FLUSH. If
the device can handle FUA writes, it should also set REQ_FUA.
All blk_queue_ordered() users are converted.
* ORDERED_DRAIN is mapped to 0 which is the default value.
* ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
* ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
loop implements FLUSH using fsync but was incorrectly setting its
ordered mode to DRAIN. Change it to DRAIN_FLUSH. In practice, this
doesn't change anything as loop doesn't make use of the block layer
ordered implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Create /sys/block/loopX/loop directory and provide these attributes:
- backing_file
- autoclear
- offset
- sizelimit
This loop directory is present only if loop device is configured.
To be used in util-linux-ng (and possibly elsewhere like udev rules)
where code need to get loop attributes from kernel (and not store
duplicate info in userspace).
Moreover loop ioctls are not even able to provide full backing
file info because of buffer limits.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Return of the bi_rw tests is no longer bool after commit 74450be1. But
results of such tests are stored in bools. This doesn't fit in there
for some compilers (gcc 4.5 here), so either use !! magic to get real
bools or use ulong where the result is assigned somewhere.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
The open and release block_device_operations are currently
called with the BKL held. In order to change that, we must
first make sure that all drivers that currently rely
on this have no regressions.
This blindly pushes the BKL into all .open and .release
operations for all block drivers to prepare for the
next step. The drivers can subsequently replace the BKL
with their own locks or remove it completely when it can
be shown that it is not needed.
The functions blkdev_get and blkdev_put are the only
remaining users of the big kernel lock in the block
layer, besides a few uses in the ioctl code, none
of which need to serialize with blkdev_{get,put}.
Most of these two functions is also under the protection
of bdev->bd_mutex, including the actual calls to
->open and ->release, and the common code does not
access any global data structures that need the BKL.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
This removes q->prepare_flush_fn completely (changes the
blk_queue_ordered API).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.
Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsynmc_range.
The next step will be removig the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Recent udev versions probe loop devices for filesystems meaning that
the /dev/disk hierarchy may contain useful entries such as
$ ls -l /dev/disk/by-label/Fedora-12-x86_64-Live
lrwxrwxrwx 1 root root 11 Mar 11 13:41 /dev/disk/by-label/Fedora-12-x86_64-Live -> ../../loop0
Unfortunately, no "change" uevent is generated when the loop device is
detached so the symlink persists. Additionally, no "change" uevent is
guaranteed to be generated when attaching an fd or changing capacity.
For example, user space could open the loop device O_RDONLY (in fact,
recent util-linux-ng does this) so udev's OPTIONS+="watch" machinery may
not trigger the "change" uevent.
This patch ensures that the "change" uevent is generated in all of
these cases. As a result, the /dev/disk hierarchy works as expected
for loop devices.
Signed-off-by: David Zeuthen <davidz@redhat.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
loop: Update mtime when writing using aops
block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
backing-dev: Handle class_create() failure
Block: Fix block/elevator.c elevator_get() off-by-one error
drbd: lc_element_by_index() never returns NULL
cciss: unlock on error path
cfq-iosched: Do not merge queues of BE and IDLE classes
cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
i2o: Remove the dangerous kobj_to_i2o_device macro
block: remove 16 bytes of padding from struct request on 64bits
cfq-iosched: fix a kbuild regression
block: make CONFIG_BLK_CGROUP visible
Remove GENHD_FL_DRIVERFS
block: Export max number of segments and max segment size in sysfs
block: Finalize conversion of block limits functions
block: Fix overrun in lcm() and move it to lib
vfs: improve writeback_inodes_wb()
paride: fix off-by-one test
drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
...
|
|
Update mtime when writing to backing filesystem using the address space
operations write_begin and write_end.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
|
Commit bb21488482bd36eae6b30b014d93619063773fd4 ("[PATCH] switch loop")
started to pass NULL bdev to ioctl hook.
Steps to reproduce:
[boot with loop.max_part=1]
[mount -o loop something so mount fails]
BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
IP: [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
PGD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:35/ACPI0003:00/power_supply/ACAD/online
CPU 0
Modules linked in: zfs nvidia(P) [last unloaded: zfs]
Pid: 15177, comm: mount Tainted: P 2.6.32-rc4-zfs #2 Satellite X200
RIP: 0010:[<ffffffff811486ee>] [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
RSP: 0018:ffff88003b3d5bb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000125f RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88003b3d5ce8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffffffff000
R13: 0000000000000000 R14: ffff880071cef280 R15: 00000000000200da
FS: 00007fd77cfe7740(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000b8 CR3: 0000000001001000 CR4: 00000000000026f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount (pid: 15177, threadinfo ffff88003b3d4000, task ffff88007572f920)
Stack:
ffff88003b3d5c38 ffffffff812f95f5 ffff88007eeb6600 0000000000000000
<0> 0000000000000000 ffff88003b3d5c18 ffffffff811547d9 ffff88001bf11ef0
<0> 7fffffffffffffff ffff88001bf11ee8 ffff88001bf11ef0 0000000000000000
Call Trace:
[<ffffffff812f95f5>] ? schedule_timeout+0x1f5/0x250
[<ffffffff811547d9>] ? rb_insert_color+0x109/0x140
[<ffffffff812fb754>] ? _spin_unlock_irq+0x14/0x40
[<ffffffff812f84c6>] ? wait_for_common+0x66/0x170
[<ffffffff8105a280>] ? default_wake_function+0x0/0x10
[<ffffffff810f8258>] ioctl_by_bdev+0x38/0x50
[<ffffffff811d2481>] loop_clr_fd+0x1e1/0x210
[<ffffffff811d2522>] lo_release+0x72/0x80
[<ffffffff810f934c>] __blkdev_put+0x1ac/0x1d0
[<ffffffff810f937b>] blkdev_put+0xb/0x10
[<ffffffff810f93b9>] blkdev_close+0x39/0x60
[<ffffffff810ccef3>] __fput+0xd3/0x230
[<ffffffff810cd06d>] fput+0x1d/0x30
[<ffffffff810c9680>] filp_close+0x50/0x80
[<ffffffff81061f11>] put_files_struct+0x81/0x100
[<ffffffff81061fde>] exit_files+0x4e/0x60
[<ffffffff81063ec5>] do_exit+0x6b5/0x730
[<ffffffff8107b279>] ? up_read+0x9/0x10
[<ffffffff8104c86e>] ? do_page_fault+0x18e/0x2a0
[<ffffffff81063f81>] do_group_exit+0x41/0xc0
[<ffffffff81064012>] sys_exit_group+0x12/0x20
[<ffffffff81030deb>] system_call_fastpath+0x16/0x1b
Code: f8 48 89 e5 48 81 ec 30 01 00 00 48 89 5d d8 4c 89 6d e8 4c 89 65 e0 4c 89 75 f0 4c 89 7d f8 48 89 bd e8 fe ff ff 49 89 cd 89 f3 <49> 8b 88 b8 00 00 00 81 fa 68 12 00 00 0f 84 57 05 00 00 0f 86
RIP [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
RSP <ffff88003b3d5bb8>
CR2: 00000000000000b8
---[ end trace c0b4d3c3118d1427 ]---
Fixing recursive fault but reboot is needed!
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If f_op->splice_read() is not implemented, fall back to a plain read.
Use vfs_readv() to read into previously allocated pages.
This will allow splice and functions using splice, such as the loop
device, to work on all filesystems. This includes "direct_io" files
in fuse which bypass the page cache.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Now that the bio list management stuff is generic, convert loop to use
bio lists instead of its own private bio list implementation.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
mount/1865 is trying to release lock (&lo->lo_ctl_mutex) at:
but there are no more locks to release!
mutex is already unlocked in loop_clr_fd(), we should not
try to unlock it in lo_release() again.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Add the ability to 'resize' the loop device on the fly.
One practical application is a loop file with XFS filesystem, already
mounted: You can easily enlarge the file (append some bytes) and then call
ioctl(fd, LOOP_SET_CAPACITY, new); The loop driver will learn about the
new size and you can use xfs_growfs later on, which will allow you to use
full capacity of the loop file without the need to unmount.
Test app:
#include <linux/fs.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define _GNU_SOURCE
#include <getopt.h>
char *me;
void usage(FILE *f)
{
fprintf(f, "%s [options] loop_dev [backend_file]\n"
"-s, --set new_size_in_bytes\n"
"\twhen backend_file is given, "
"it will be expanded too while keeping the original contents\n",
me);
}
struct option opts[] = {
{
.name = "set",
.has_arg = 1,
.flag = NULL,
.val = 's'
},
{
.name = "help",
.has_arg = 0,
.flag = NULL,
.val = 'h'
}
};
void err_size(char *name, __u64 old)
{
fprintf(stderr, "size must be larger than current %s (%llu)\n",
name, old);
}
int main(int argc, char *argv[])
{
int fd, err, c, i, bfd;
ssize_t ssz;
size_t sz;
__u64 old, new, append;
char a[BUFSIZ];
struct stat st;
FILE *out;
char *backend, *dev;
err = EINVAL;
out = stderr;
me = argv[0];
new = 0;
while ((c = getopt_long(argc, argv, "s:h", opts, &i)) != -1) {
switch (c) {
case 's':
errno = 0;
new = strtoull(optarg, NULL, 0);
if (errno) {
err = errno;
perror(argv[i]);
goto out;
}
break;
case 'h':
err = 0;
out = stdout;
goto err;
default:
perror(argv[i]);
goto err;
}
}
if (optind < argc)
dev = argv[optind++];
else
goto err;
fd = open(dev, O_RDONLY);
if (fd < 0) {
err = errno;
perror(dev);
goto out;
}
err = ioctl(fd, BLKGETSIZE64, &old);
if (err) {
err = errno;
perror("ioctl BLKGETSIZE64");
goto out;
}
if (!new) {
printf("%llu\n", old);
goto out;
}
if (new < old) {
err = EINVAL;
err_size(dev, old);
goto out;
}
if (optind < argc) {
backend = argv[optind++];
bfd = open(backend, O_WRONLY|O_APPEND);
if (bfd < 0) {
err = errno;
perror(backend);
goto out;
}
err = fstat(bfd, &st);
if (err) {
err = errno;
perror(backend);
goto out;
}
if (new < st.st_size) {
err = EINVAL;
err_size(backend, st.st_size);
goto out;
}
append = new - st.st_size;
sz = sizeof(a);
while (append > 0) {
if (append < sz)
sz = append;
ssz = write(bfd, a, sz);
if (ssz != sz) {
err = errno;
perror(backend);
goto out;
}
append -= sz;
}
err = fsync(bfd);
if (err) {
err = errno;
perror(backend);
goto out;
}
}
err = ioctl(fd, LOOP_SET_CAPACITY, new);
if (err) {
err = errno;
perror("ioctl LOOP_SET_CAPACITY");
}
goto out;
err:
usage(out);
out:
return err;
}
Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Tomas Matejicek <tomas@slax.org>
Cc: <util-linux-ng@vger.kernel.org>
Cc: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With CONFIG_PROVE_LOCKING enabled
$ losetup /dev/loop0 file
$ losetup -o 32256 /dev/loop1 /dev/loop0
$ losetup -d /dev/loop1
$ losetup -d /dev/loop0
triggers a [ INFO: possible circular locking dependency detected ]
I think this warning is a false positive.
Open/close on a loop device acquires bd_mutex of the device before
acquiring lo_ctl_mutex of the same device. For ioctl(LOOP_CLR_FD) after
acquiring lo_ctl_mutex, fput on the backing_file might acquire the bd_mutex of
a device, if backing file is a device and this is the last reference to the
file being dropped . But it is guaranteed that it is impossible to have a
circular list of backing devices.(say loop2->loop1->loop0->loop2 is not
possible), which guarantees that this can never deadlock.
So this warning should be suppressed. It is very difficult to annotate lockdep
not to warn here in the correct way. A simple way to silence lockdep could be
to mark the lo_ctl_mutex in ioctl to be a sub class, but this might mask some
other real bugs.
@@ -1164,7 +1164,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
struct loop_device *lo = bdev->bd_disk->private_data;
int err;
- mutex_lock(&lo->lo_ctl_mutex);
+ mutex_lock_nested(&lo->lo_ctl_mutex, 1);
switch (cmd) {
case LOOP_SET_FD:
err = loop_set_fd(lo, mode, bdev, arg);
Or actually marking the bd_mutex after lo_ctl_mutex as a sub class could be
a better solution.
Luckily it is easy to avoid calling fput on backing file with lo_ctl_mutex
held, so no lockdep annotation is required.
If you do not like the special handling of the lo_ctl_mutex just for the
LOOP_CLR_FD ioctl in lo_ioctl(), the mutex handling could be moved inside
each of the individual ioctl handlers and I could send you another patch.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Honour barrier requests in the loop back block device driver.
In case of barrier bios, flush the backing file once before processing the
barrier and once after to guarantee ordering. In case of filesystems that
does not support fsync, barrier bios would be failed with -EOPNOTSUPP.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Upon a 'transfer error block' size is set to -EINVAL, but this becomes positive
since size is unsigned: p->offset still gets incremented.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
In loop_unplug() function is expected that mapping is set
and lo->lo_backing_file is not NULL.
Unfortunately loop_set_fd() set the request queue unplug function,
but loop_clr_fd() doesn't clear that.
Loop device allows open of non-configured loop in some situations.
If the unplug on request queue is called, loop module oopses because
of missing lo_backing_file.
Simple reproducer:
losetup /dev/loop0 /xxx
losetup -d /dev/loop0
dmsetup create x --table "0 1 linear /dev/loop0 0"
EIP is at loop_unplug+0x1d/0x3b
...
Call Trace:
blk_unplug+0x57/0x5e
dm_table_unplug_all+0x34/0x77 [dm_mod]
destroy_inode+0x27/0x38
generic_delete_inode+0xd5/0xd9
iput+0x4b/0x4e
dm_resume+0xca/0xfe [dm_mod]
dev_suspend+0x143/0x165 [dm_mod]
dm_ctl_ioctl+0x18e/0x1cf [dm_mod]
dev_suspend+0x0/0x165 [dm_mod]
dm_ctl_ioctl+0x0/0x1cf [dm_mod]
vfs_ioctl+0x22/0x69
do_vfs_ioctl+0x39d/0x3c7
trace_hardirqs_on+0xb/0xd
remove_vma+0x50/0x56
do_munmap+0x21c/0x237
sys_ioctl+0x2c/0x45
sysenter_do_call+0x12/0x31
Several reports here
http://www.kerneloops.org/search.php?search=loop_unplug
Fix it by simply clear unplug function together with
removing of backing file.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
When there are still queued bios and reference count
drops to zero, loop device must flush all queued bios.
Otherwise it can lead to situation that caller
closes the device, but some bios are still running
and endio() function call later OOpses when uses
unallocated mempool.
This happens for example when running dm-crypt over loop,
here is typical oops backtrace:
Oops: 0000 [#1] PREEMPT SMP
EIP is at mempool_free+0x12/0x6b
...
crypt_dec_pending+0x50/0x54 [dm_crypt]
crypt_endio+0x9f/0xa7 [dm_crypt]
crypt_endio+0x0/0xa7 [dm_crypt]
bio_endio+0x2b/0x2e
loop_thread+0x37a/0x3b1
do_lo_send_aops+0x0/0x165
autoremove_wake_function+0x0/0x33
loop_thread+0x0/0x3b1
kthread+0x3b/0x61
kthread+0x0/0x61
kernel_thread_helper+0x7/0x10
(But crash is reproducible with different dm targets
running over loop device too.)
Patch fixes it by flushing the bios in release call,
reusing the flush mechanism for switching backing store.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Morris <jmorris@namei.org>
|
|
Nothing uses prepare_write or commit_write. Remove them from the tree
completely.
[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
ioctl doesn't need BKL here
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
To keep the size of changesets sane we split the switch by drivers;
to keep the damn thing bisectable we do the following:
1) rename the affected methods, add ones with correct
prototypes, make (few) callers handle both. That's this changeset.
2) for each driver convert to new methods. *ALL* drivers
are converted in this series.
3) kill the old (renamed) methods.
Note that it _is_ a flagday; all in-tree drivers are converted and by the
end of this series no trace of old methods remain. The only reason why
we do that this way is to keep the damn thing bisectable and allow per-driver
debugging if anything goes wrong.
New methods:
open(bdev, mode)
release(disk, mode)
ioctl(bdev, mode, cmd, arg) /* Called without BKL */
compat_ioctl(bdev, mode, cmd, arg)
locked_ioctl(bdev, mode, cmd, arg) /* Called with BKL, legacy */
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We can save some atomic ops in the IO path, if we clearly define
the rules of how to modify the queue flags.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
This patch allows to use loop device with partitionned disk image.
Original behavior of loop is not modified.
A new parameter is introduced to define how many partition we want to be
able to manage per loop device. This parameter is "max_part".
For instance, to manage 63 partitions / loop device, we will do:
# modprobe loop max_part=63
# ls -l /dev/loop?*
brw-rw---- 1 root disk 7, 0 2008-03-05 14:55 /dev/loop0
brw-rw---- 1 root disk 7, 64 2008-03-05 14:55 /dev/loop1
brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2
brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3
brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4
brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5
brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6
brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7
And to attach a raw partitionned disk image, the original losetup is used:
# losetup -f etch.img
# ls -l /dev/loop?*
brw-rw---- 1 root disk 7, 0 2008-03-05 14:55 /dev/loop0
brw-rw---- 1 root disk 7, 1 2008-03-05 14:57 /dev/loop0p1
brw-rw---- 1 root disk 7, 2 2008-03-05 14:57 /dev/loop0p2
brw-rw---- 1 root disk 7, 5 2008-03-05 14:57 /dev/loop0p5
brw-rw---- 1 root disk 7, 64 2008-03-05 14:55 /dev/loop1
brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2
brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3
brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4
brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5
brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6
brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7
# mount /dev/loop0p1 /mnt
# ls /mnt
bench cdrom home lib mnt root srv usr
bin dev initrd lost+found opt sbin sys var
boot etc initrd.img media proc selinux tmp vmlinuz
# umount /mnt
# losetup -d /dev/loop0
Of course, the same behavior can be done using kpartx on a loop device,
but modifying loop avoids to stack several layers of block device (loop +
device mapper), this is a very light modification (40% of modifications
are to manage the new parameter).
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
This allows a flag to be set on loop devices so that when they are
closed for the last time, they'll self-destruct.
In general, so that we can automatically allocate loop devices (as with
losetup -f) and have them disappear when we're done with them.
In particular, right now, so that we can stop relying on the hackish
special-case in umount(8) which kills off loop devices which were set up by
'mount -oloop'. That means we can stop putting crap in /etc/mtab which
doesn't belong there, which means it can be a symlink to /proc/mounts, which
means yet another writable file on the root filesystem is eliminated and the
'stateless' folks get happier... and OLPC trac #356 can be closed.
The mount(8) side of that is at
http://marc.info/?l=util-linux-ng&m=119362955431694&w=2
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Bernardo Innocenti <bernie@codewiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Don't allocate room for an iovec when it is not needed.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)
* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
Signed-off-by: Diego Woitasen <diego@woitasen.com.ar>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Partial write can be easily supported by LO_CRYPT_NONE mode, but it is not
easy in LO_CRYPT_CRYPTOAPI case, because of its block nature. I don't know
who still used cryptoapi, but theoretically it is possible. So let's leave
things as they are. Loop device doesn't support partial write before
Nick's "write_begin/write_end" patch set, and let's it behave the same way
after.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
These are intended to replace prepare_write and commit_write with more
flexible alternatives that are also able to avoid the buffered write
deadlock problems efficiently (which prepare_write is unable to do).
[mark.fasheh@oracle.com: API design contributions, code review and fixes]
[akpm@linux-foundation.org: various fixes]
[dmonakhov@sw.ru: new aop block_write_begin fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant. Remove it.
Now there is no need for bio_endio to subtract the size completed
from bi_size. So don't do that either.
While we are at it, change bi_end_io to return void.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweet of
the kernel and get rid of this typedef and replace its uses with
the proper type.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
No need to warn unregister_blkdev() failure by the callers. (The previous
patch makes unregister_blkdev() print error message in error case)
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.
A good return from ->confirm() means that the buffer is really
there, and that the contents are good.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|