Age | Commit message (Collapse) | Author |
|
Pull file locking fixes from Jeff Layton:
"File locking related bugfixes
Nothing too earth-shattering here. A fix for a potential regression
due to a patch in pile #1, and the addition of a memory barrier to
prevent a race condition between break_deleg and generic_add_lease"
* tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux:
locks: set fl_owner for leases back to current->files
locks: add missing memory barrier in break_deleg
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This fixes some lockups in btrfs reported with rc1. It probably has
some performance impact because it is backing off our spinning locks
more often and switching to a blocking lock. I'll be able to nail
that down next week, but for now I want to get the lockups taken care
of.
Otherwise some more stack reduction and assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix wrong error handle when the device is missing or is not writeable
Btrfs: fix deadlock when mounting a degraded fs
Btrfs: use bio_endio_nodec instead of open code
Btrfs: fix NULL pointer crash when running balance and scrub concurrently
btrfs: Skip scrubbing removed chunks to avoid -ENOENT.
Btrfs: fix broken free space cache after the system crashed
Btrfs: make free space cache write out functions more readable
Btrfs: remove unused wait queue in struct extent_buffer
Btrfs: fix deadlocks with trylock on tree nodes
|
|
Pull nfsd bugfixes from Bruce Fields:
"Fixes for a new regression from the xdr encoding rewrite, and a
delegation problem we've had for a while (made somewhat more annoying
by the vfs delegation support added in 3.13)"
* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
NFSD: fix bug for readdir of pseudofs
NFSD: Don't hand out delegations for 30 seconds after recalling them.
|
|
The original bio might be submitted, so we shoud increase bi_remaining to
account for it when we deal with the error that the device is missing or
is not writeable, or we would skip the endio handle.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
The deadlock happened when we mount degraded filesystem, the reproduced
steps are following:
# mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1>
# echo 1 > /sys/block/`basename <dev0>`/device/delete
# mount -o degraded <dev1> <mnt>
The reason was that the counter -- bi_remaining was wrong. If the missing
or unwriteable device was the last device in the mapping array, we would
not submit the original bio, so we shouldn't increase bi_remaining of it
in btrfs_end_bio(), or we would skip the final endio handle.
Fix this problem by adding a flag into btrfs bio structure. If we submit
the original bio, we will set the flag, and we increase bi_remaining counter,
or we don't.
Though there is another way to fix it -- decrease bi_remaining counter of the
original bio when we make sure the original bio is not submitted, this method
need add more check and is easy to make mistake.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
While running balance, scrub, fsstress concurrently we hit the
following kernel crash:
[56561.448845] BTRFS info (device sde): relocating block group 11005853696 flags 132
[56561.524077] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[56561.524237] IP: [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs]
[56561.524297] PGD 9be28067 PUD 7f3dd067 PMD 0
[56561.524325] Oops: 0000 [#1] SMP
[....]
[56561.527237] Call Trace:
[56561.527309] [<ffffffffa038980e>] scrub_enumerate_chunks+0x24e/0x490 [btrfs]
[56561.527392] [<ffffffff810abe00>] ? abort_exclusive_wait+0x50/0xb0
[56561.527476] [<ffffffffa038add4>] btrfs_scrub_dev+0x1a4/0x530 [btrfs]
[56561.527561] [<ffffffffa0368107>] btrfs_ioctl+0x13f7/0x2a90 [btrfs]
[56561.527639] [<ffffffff811c82f0>] do_vfs_ioctl+0x2e0/0x4c0
[56561.527712] [<ffffffff8109c384>] ? vtime_account_user+0x54/0x60
[56561.527788] [<ffffffff810f768c>] ? __audit_syscall_entry+0x9c/0xf0
[56561.527870] [<ffffffff811c8551>] SyS_ioctl+0x81/0xa0
[56561.527941] [<ffffffff815707f7>] tracesys+0xdd/0xe2
[...]
[56561.528304] RIP [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs]
[56561.528395] RSP <ffff88004c0f5be8>
[56561.528454] CR2: 0000000000000078
This is because in btrfs_relocate_chunk(), we will free @bdev directly while
scrub may still hold extent mapping, and may access freed memory.
Fix this problem by wrapping freeing @bdev work into free_extent_map() which
is based on reference count.
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
When run scrub with balance, sometimes -ENOENT will be returned, since
in scrub_enumerate_chunks() will search dev_extent in *COMMIT_ROOT*, but
btrfs_lookup_block_group() will search block group in *MEMORY*, so if a
chunk is removed but not committed, -ENOENT will be returned.
However, there is no need to stop scrubbing since other chunks may be
scrubbed without problem.
So this patch changes the behavior to skip removed chunks and continue
to scrub the rest.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
When we mounted the filesystem after the crash, we got the following
message:
BTRFS error (device xxx): block group xxxx has wrong amount of free space
BTRFS error (device xxx): failed to load free space cache for block group xxx
It is because we didn't update the metadata of the allocated space (in extent
tree) until the file data was written into the disk. During this time, there was
no information about the allocated spaces in either the extent tree nor the
free space cache. when we wrote out the free space cache at this time (commit
transaction), those spaces were lost. In fact, only the free space that is
used to store the file data had this problem, the others didn't because
the metadata of them is updated in the same transaction context.
There are many methods which can fix the above problem
- track the allocated space, and write it out when we write out the free
space cache
- account the size of the allocated space that is used to store the file
data, if the size is not zero, don't write out the free space cache.
The first one is complex and may make the performance drop down.
This patch chose the second method, we use a per-block-group variant to
account the size of that allocated space. Besides that, we also introduce
a per-block-group read-write semaphore to avoid the race between
the allocation and the free space cache write out.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
This patch makes the free space cache write out functions more readable,
and beisdes that, it also reduces the stack space that the function --
__btrfs_write_out_cache uses from 194bytes to 144bytes.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
The lock_wq wait queue is not used anywhere, therefore just remove it.
On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320
bytes down to 296 bytes, which means a 4Kb page can now be used for
13 extent buffers instead of 12.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
The Btrfs tree trylock function is poorly named. It always takes
the spinlock and backs off if the blocking lock is held. This
can lead to surprising lockups because people expect it to really be a
trylock.
This commit makes it a pure trylock, both for the spinlock and the
blocking lock. It also reworks the nested lock handling slightly to
avoid taking the read lock while a spinning write lock might be held.
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Commit 561f0ed498ca (nfsd4: allow large readdirs) introduces a bug
about readdir the root of pseudofs.
Call xdr_truncate_encode() revert encoded name when skipping.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If nfsd needs to recall a delegation for some reason it implies that there is
contention on the file, so further delegations should not be handed out.
The current code fails to do so, and the result is effectively a
live-lock under some workloads: a client attempting a conflicting
operation on a read-delegated file receives NFS4ERR_DELAY and retries
the operation, but by the time it retries the server may already have
given out another delegation.
We could simply avoid delegations for (say) 30 seconds after any recall, but
this is probably too heavy handed.
We could keep a list of inodes (or inode numbers or filehandles) for recalled
delegations, but that requires memory allocation and searching.
The approach taken here is to use a bloom filter to record the filehandles
which are currently blocked from delegation, and to accept the cost of a few
false positives.
We have 2 bloom filters, each of which is valid for 30 seconds. When a
delegation is recalled the filehandle is added to one filter and will remain
disabled for between 30 and 60 seconds.
We keep a count of the number of filehandles that have been added, so when
that count is zero we can bypass all other tests.
The bloom filters have 256 bits and 3 hash functions. This should allow a
couple of dozen blocked filehandles with minimal false positives. If many
more filehandles are all blocked at once, behaviour will degrade towards
rejecting all delegations for between 30 and 60 seconds, then resetting and
allowing new delegations.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This fixes use-after-free of epi->fllink.next inside list loop macro.
This loop actually releases elements in the body. The list is
rcu-protected but here we cannot hold rcu_read_lock because we need to
lock mutex inside.
The obvious solution is to use list_for_each_entry_safe(). RCU-ness
isn't essential because nobody can change this list under us, it's final
fput for this file.
The bug was introduced by ae10b2b4eb01 ("epoll: optimize EPOLL_CTL_DEL
using rcu")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Stable <stable@vger.kernel.org> # 3.13+
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull more btrfs updates from Chris Mason:
"This has a few fixes since our last pull and a new ioctl for doing
btree searches from userland. It's very similar to the existing
ioctl, but lets us return larger items back down to the app"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix error handling in create_pending_snapshot
btrfs: fix use of uninit "ret" in end_extent_writepage()
btrfs: free ulist in qgroup_shared_accounting() error path
Btrfs: fix qgroups sanity test crash or hang
btrfs: prevent RCU warning when dereferencing radix tree slot
Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting
btrfs: new ioctl TREE_SEARCH_V2
btrfs: tree_search, search_ioctl: direct copy to userspace
btrfs: new function read_extent_buffer_to_user
btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW
btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer
btrfs: tree_search, search_ioctl: accept varying buffer
btrfs: tree_search: eliminate redundant nr_items check
|
|
Pull aio fix and cleanups from Ben LaHaise:
"This consists of a couple of code cleanups plus a minor bug fix"
* git://git.kvack.org/~bcrl/aio-next:
aio: cleanup: flatten kill_ioctx()
aio: report error from io_destroy() when threads race in io_destroy()
fs/aio.c: Remove ctx parameter in kiocb_cancel
|
|
fcebe456 cut and pasted some code to a later point
in create_pending_snapshot(), but didn't switch
to the appropriate error handling for this stage
of the function.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
If this condition in end_extent_writepage() is false:
if (tree->ops && tree->ops->writepage_end_io_hook)
we will then test an uninitialized "ret" at:
ret = ret < 0 ? ret : -EIO;
The test for ret is for the case where ->writepage_end_io_hook
failed, and we'd choose that ret as the error; but if
there is no ->writepage_end_io_hook, nothing sets ret.
Initializing ret to 0 should be sufficient; if
writepage_end_io_hook wasn't set, (!uptodate) means
non-zero err was passed in, so we choose -EIO in that case.
Signed-of-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
If tmp = ulist_alloc(GFP_NOFS) fails, we return without
freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS)
and cause a memory leak.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Often when running the qgroups sanity test, a crash or a hang happened.
This is because the extent buffer the test uses for the root node doesn't
have an header level explicitly set, making it have a random level value.
This is a problem when it's not zero for the btrfs_search_slot() calls
the test ends up doing, resulting in crashes or hangs such as the following:
[ 6454.127192] Btrfs loaded, debug=on, assert=on, integrity-checker=on
(...)
[ 6454.127760] BTRFS: selftest: Running qgroup tests
[ 6454.127964] BTRFS: selftest: Running test_test_no_shared_qgroup
[ 6454.127966] BTRFS: selftest: Qgroup basic add
[ 6480.152005] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:5383]
[ 6480.152005] Modules linked in: btrfs(+) xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc i2c_piix4 i2c_core pcspkr evbug psmouse serio_raw e1000 [last unloaded: btrfs]
[ 6480.152005] irq event stamp: 188448
[ 6480.152005] hardirqs last enabled at (188447): [<ffffffff8168ef5c>] restore_args+0x0/0x30
[ 6480.152005] hardirqs last disabled at (188448): [<ffffffff81698e6a>] apic_timer_interrupt+0x6a/0x80
[ 6480.152005] softirqs last enabled at (188446): [<ffffffff810516cf>] __do_softirq+0x1cf/0x450
[ 6480.152005] softirqs last disabled at (188441): [<ffffffff81051c25>] irq_exit+0xb5/0xc0
[ 6480.152005] CPU: 0 PID: 5383 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4
[ 6480.152005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 6480.152005] task: ffff8802146125a0 ti: ffff8800d0d00000 task.ti: ffff8800d0d00000
[ 6480.152005] RIP: 0010:[<ffffffff81349a63>] [<ffffffff81349a63>] __write_lock_failed+0x13/0x20
[ 6480.152005] RSP: 0018:ffff8800d0d038e8 EFLAGS: 00000287
[ 6480.152005] RAX: 0000000000000000 RBX: ffffffff8168ef5c RCX: 000005deb8525852
[ 6480.152005] RDX: 0000000000000000 RSI: 0000000000001d45 RDI: ffff8802105000b8
[ 6480.152005] RBP: ffff8800d0d038e8 R08: fffffe12710f63db R09: ffffffffa03196fb
[ 6480.152005] R10: ffff8802146125a0 R11: ffff880214612e28 R12: ffff8800d0d03858
[ 6480.152005] R13: 0000000000000000 R14: ffff8800d0d00000 R15: ffff8802146125a0
[ 6480.152005] FS: 00007f14ff804700(0000) GS:ffff880215e00000(0000) knlGS:0000000000000000
[ 6480.152005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6480.152005] CR2: 00007fff4df0dac8 CR3: 00000000d1796000 CR4: 00000000000006f0
[ 6480.152005] Stack:
[ 6480.152005] ffff8800d0d03908 ffffffff810ae967 0000000000000001 ffff8802105000b8
[ 6480.152005] ffff8800d0d03938 ffffffff8168e57e ffffffffa0319c16 0000000000000007
[ 6480.152005] ffff880210500000 ffff880210500100 ffff8800d0d039b8 ffffffffa0319c16
[ 6480.152005] Call Trace:
[ 6480.152005] [<ffffffff810ae967>] do_raw_write_lock+0x47/0xa0
[ 6480.152005] [<ffffffff8168e57e>] _raw_write_lock+0x5e/0x80
[ 6480.152005] [<ffffffffa0319c16>] ? btrfs_tree_lock+0x116/0x270 [btrfs]
[ 6480.152005] [<ffffffffa0319c16>] btrfs_tree_lock+0x116/0x270 [btrfs]
[ 6480.152005] [<ffffffffa02b2acb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 6480.152005] [<ffffffffa02b81a6>] btrfs_search_slot+0x916/0xa20 [btrfs]
[ 6480.152005] [<ffffffff811a727f>] ? create_object+0x23f/0x300
[ 6480.152005] [<ffffffffa02b9958>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[ 6480.152005] [<ffffffffa036041a>] insert_normal_tree_ref.constprop.4+0xa2/0x19a [btrfs]
[ 6480.152005] [<ffffffffa03605c3>] test_no_shared_qgroup+0xb1/0x1ca [btrfs]
[ 6480.152005] [<ffffffff8108cad6>] ? local_clock+0x16/0x30
[ 6480.152005] [<ffffffffa035ef8e>] btrfs_test_qgroups+0x1ae/0x1d7 [btrfs]
[ 6480.152005] [<ffffffffa03a69d2>] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs]
[ 6480.152005] [<ffffffffa03a6a86>] init_btrfs_fs+0xb4/0x153 [btrfs]
[ 6480.152005] [<ffffffff81000352>] do_one_initcall+0x102/0x150
[ 6480.152005] [<ffffffff8103d223>] ? set_memory_nx+0x43/0x50
[ 6480.152005] [<ffffffff81682668>] ? set_section_ro_nx+0x6d/0x74
[ 6480.152005] [<ffffffff810d91cc>] load_module+0x1cdc/0x2630
(...)
Therefore initialize the extent buffer as an empty leaf (level 0).
Issue easy to reproduce when btrfs is built as a module via:
$ for ((i = 1; i <= 1000000; i++)); do rmmod btrfs; modprobe btrfs; done
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Mark the dereference as protected by lock. Not doing so triggers
an RCU warning since the radix tree assumed that RCU is in use.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Steps to reproduce:
# mkfs.btrfs -f /dev/sd[b-f] -m raid5 -d raid5
# mkfs.ext4 /dev/sdc --->corrupt one of btrfs device
# mount /dev/sdb /mnt -o degraded
# btrfs scrub start -BRd /mnt
This is because readahead would skip missing device, this is not true
for RAID5/6, because REQ_GET_READ_MIRRORS return 1 for RAID5/6 block
mapping. If expected data locates in missing device, readahead thread
would not call __readahead_hook() which makes event @rc->elems=0
wait forever.
Fix this problem by checking return value of btrfs_map_block(),we
can only skip missing device safely if there are several mirrors.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
This new ioctl call allows the user to supply a buffer of varying size in which
a tree search can store its results. This is much more flexible if you want to
receive items which are larger than the current fixed buffer of 3992 bytes or
if you want to fetch more items at once. Items larger than this buffer are for
example some of the type EXTENT_CSUM.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm fix from David Teigland:
"This contains one small fix related to resending SCTP messages"
* tag 'dlm-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: keep listening connection alive with sctp mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
"This has a mix of bug fixes and cleanups.
Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT
check when a second rbd image is mapped and a couple memory leaks.
Zheng fixes several issues with fragmented directories and multiple
MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's
patches fix setting and unsetting RBD images read-only.
Naturally there are several other cleanups mixed in for good measure"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
rbd: only set disk to read-only once
rbd: move calls that may sleep out of spin lock range
rbd: add ioctl for rbd
ceph: use truncate_pagecache() instead of truncate_inode_pages()
ceph: include time stamp in every MDS request
rbd: fix ida/idr memory leak
rbd: use reference counts for image requests
rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync()
rbd: make sure we have latest osdmap on 'rbd map'
libceph: add ceph_monc_wait_osdmap()
libceph: mon_get_version request infrastructure
libceph: recognize poolop requests in debugfs
ceph: refactor readpage_nounlock() to make the logic clearer
mds: check cap ID when handling cap export message
ceph: remember subtree root dirfrag's auth MDS
ceph: introduce ceph_fill_fragtree()
ceph: handle cap import atomically
ceph: pre-allocate ceph_cap struct for ceph_add_cap()
ceph: update inode fields according to issued caps
rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Ingo Molnar:
"A second round of perf updates:
- wide reaching kprobes sanitization and robustization, with the hope
of fixing all 'probe this function crashes the kernel' bugs, by
Masami Hiramatsu.
- uprobes updates from Oleg Nesterov: tmpfs support, corner case
fixes and robustization work.
- perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo
et al:
* Add support to accumulate hist periods (Namhyung Kim)
* various fixes, refactorings and enhancements"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
perf: Differentiate exec() and non-exec() comm events
perf: Fix perf_event_comm() vs. exec() assumption
uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates
perf/documentation: Add description for conditional branch filter
perf/x86: Add conditional branch filtering support
perf/tool: Add conditional branch filter 'cond' to perf record
perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND'
uprobes: Teach copy_insn() to support tmpfs
uprobes: Shift ->readpage check from __copy_insn() to uprobe_register()
perf/x86: Use common PMU interrupt disabled code
perf/ARM: Use common PMU interrupt disabled code
perf: Disable sampled events if no PMU interrupt
perf: Fix use after free in perf_remove_from_context()
perf tools: Fix 'make help' message error
perf record: Fix poll return value propagation
perf tools: Move elide bool into perf_hpp_fmt struct
perf tools: Remove elide setup for SORT_MODE__MEMORY mode
perf tools: Fix "==" into "=" in ui_browser__warning assignment
perf tools: Allow overriding sysfs and proc finding with env var
perf tools: Consider header files outside perf directory in tags target
...
|
|
By copying each found item seperatly to userspace, we do not need extra
buffer in the kernel.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>
|
|
This new function reads the content of an extent directly to user memory.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>
|
|
If an item in tree_search is too large to be stored in the given buffer, return
the needed size (including the header).
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>
|
|
In copy_to_sk, if an item is too large for the given buffer, it now returns
-EOVERFLOW instead of copying a search_header with len = 0. For backward
compatibility for the first item it still copies such a header to the buffer,
but not any other following items, which could have fitted.
tree_search changes -EOVERFLOW back to 0 to behave similiar to the way it
behaved before this patch.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>
|
|
rewrite search_ioctl to accept a buffer with varying size
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>
|
|
If the amount of items reached the given limit of nr_items, we can leave
copy_to_sk without updating the key. Also by returning 1 we leave the loop in
search_ioctl without rechecking if we reached the given limit.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"This the bunch that sat in -next + lock_parent() fix. This is the
minimal set; there's more pending stuff.
In particular, I really hope to get acct.c fixes merged this cycle -
we need that to deal sanely with delayed-mntput stuff. In the next
pile, hopefully - that series is fairly short and localized
(kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
iov_iter work. Most of prereqs for ->splice_write with sane locking
order are there and Kent's dio rewrite would also fit nicely on top of
this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
lock_parent: don't step on stale ->d_parent of all-but-freed one
kill generic_file_splice_write()
ceph: switch to iter_file_splice_write()
shmem: switch to iter_file_splice_write()
nfs: switch to iter_splice_write_file()
fs/splice.c: remove unneeded exports
ocfs2: switch to iter_file_splice_write()
->splice_write() via ->write_iter()
bio_vec-backed iov_iter
optimize copy_page_{to,from}_iter()
bury generic_file_aio_{read,write}
lustre: get rid of messing with iovecs
ceph: switch to ->write_iter()
ceph_sync_direct_write: stop poking into iov_iter guts
ceph_sync_read: stop poking into iov_iter guts
new helper: copy_page_from_iter()
fuse: switch to ->write_iter()
btrfs: switch to ->write_iter()
ocfs2: switch to ->write_iter()
xfs: switch to ->write_iter()
...
|
|
The connection struct with nodeid 0 is the listening socket,
not a connection to another node. The sctp resend function
was not checking that the nodeid was valid (non-zero), so it
would mistakenly get and resend on the listening connection
when nodeid was zero.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Dentry that had been through (or into) __dentry_kill() might be seen
by shrink_dentry_list(); that's normal, it'll be taken off the shrink
list and freed if __dentry_kill() has already finished. The problem
is, its ->d_parent might be pointing to already freed dentry, so
lock_parent() needs to be careful.
We need to check that dentry hasn't already gone into __dentry_kill()
*and* grab rcu_read_lock() before dropping ->d_lock - the latter makes
sure that whatever we see in ->d_parent after dropping ->d_lock it
won't be freed until we drop rcu_read_lock().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Backmerge of dcache.c changes from mainline. It's that, or complete
rebase...
Conflicts:
fs/splice.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
no callers left
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
ocfs2 was using a bunch of splice.c guts...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
iter_file_splice_write() - a ->splice_write() instance that gathers the
pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
it to ->write_iter(). A bunch of simple cases coverted to that...
[AV: fixed the braino spotted by Cyrill]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull reiserfs and ext3 changes from Jan Kara:
"Big reiserfs cleanup from Jeff, an ext3 deadlock fix, and some small
cleanups"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits)
reiserfs: Fix compilation breakage with CONFIG_REISERFS_CHECK
ext3: Fix deadlock in data=journal mode when fs is frozen
reiserfs: call truncate_setsize under tailpack mutex
fs/jbd/revoke.c: replace shift loop by ilog2
reiserfs: remove obsolete __constant_cpu_to_le32
reiserfs: balance_leaf refactor, split up balance_leaf_when_delete
reiserfs: balance_leaf refactor, format balance_leaf_finish_node
reiserfs: balance_leaf refactor, format balance_leaf_new_nodes_paste
reiserfs: balance_leaf refactor, format balance_leaf_paste_right
reiserfs: balance_leaf refactor, format balance_leaf_insert_right
reiserfs: balance_leaf refactor, format balance_leaf_paste_left
reiserfs: balance_leaf refactor, format balance_leaf_insert_left
reiserfs: balance_leaf refactor, pull out balance_leaf{left, right, new_nodes, finish_node}
reiserfs: balance_leaf refactor, pull out balance_leaf_finish_node_paste
reiserfs: balance_leaf refactor pull out balance_leaf_finish_node_insert
reiserfs: balance_leaf refactor, pull out balance_leaf_new_nodes_paste
reiserfs: balance_leaf refactor, pull out balance_leaf_new_nodes_insert
reiserfs: balance_leaf refactor, pull out balance_leaf_paste_right
reiserfs: balance_leaf refactor, pull out balance_leaf_insert_right
reiserfs: balance_leaf refactor, pull out balance_leaf_paste_left
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
"The biggest change here is Josef's rework of the btrfs quota
accounting, which improves the in-memory tracking of delayed extent
operations.
I had been working on Btrfs stack usage for a while, mostly because it
had become impossible to do long stress runs with slab, lockdep and
pagealloc debugging turned on without blowing the stack. Even though
you upgraded us to a nice king sized stack, I kept most of the
patches.
We also have some very hard to find corruption fixes, an awesome sysfs
use after free, and the usual assortment of optimizations, cleanups
and other fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (80 commits)
Btrfs: convert smp_mb__{before,after}_clear_bit
Btrfs: fix scrub_print_warning to handle skinny metadata extents
Btrfs: make fsync work after cloning into a file
Btrfs: use right type to get real comparison
Btrfs: don't check nodes for extent items
Btrfs: don't release invalid page in btrfs_page_exists_in_range()
Btrfs: make sure we retry if page is a retriable exception
Btrfs: make sure we retry if we couldn't get the page
btrfs: replace EINVAL with EOPNOTSUPP for dev_replace raid56
trivial: fs/btrfs/ioctl.c: fix typo s/substract/subtract/
Btrfs: fix leaf corruption after __btrfs_drop_extents
Btrfs: ensure btrfs_prev_leaf doesn't miss 1 item
Btrfs: fix clone to deal with holes when NO_HOLES feature is enabled
btrfs: free delayed node outside of root->inode_lock
btrfs: replace EINVAL with ERANGE for resize when ULLONG_MAX
Btrfs: fix transaction leak during fsync call
btrfs: Avoid trucating page or punching hole in a already existed hole.
Btrfs: update commit root on snapshot creation after orphan cleanup
Btrfs: ioctl, don't re-lock extent range when not necessary
Btrfs: avoid visiting all extent items when cloning a range
...
|
|
Pull xfs updates from Dave Chinner:
"This update contains:
- cleanup removing unused function args
- rework of the filestreams allocator to use dentry cache parent
lookups
- new on-disk free inode btree and optimised inode allocator
- various bug fixes
- rework of internal attribute API
- cleanup of superblock feature bit support to remove historic cruft
- more fixes and minor cleanups
- added a new directory/attribute geometry abstraction
- yet more fixes and minor cleanups"
* tag 'xfs-for-linus-3.16-rc1' of git://oss.sgi.com/xfs/xfs: (86 commits)
xfs: fix xfs_da_args sparse warning in xfs_readdir
xfs: Fix rounding in xfs_alloc_fix_len()
xfs: tone down writepage/releasepage WARN_ONs
xfs: small cleanup in xfs_lowbit64()
xfs: kill xfs_buf_geterror()
xfs: xfs_readsb needs to check for magic numbers
xfs: block allocation work needs to be kswapd aware
xfs: remove redundant geometry information from xfs_da_state
xfs: replace attr LBSIZE with xfs_da_geometry
xfs: pass xfs_da_args to xfs_attr_leaf_newentsize
xfs: use xfs_da_geometry for block size in attr code
xfs: remove mp->m_dir_geo from directory logging
xfs: reduce direct usage of mp->m_dir_geo
xfs: move node entry counts to xfs_da_geometry
xfs: convert dir/attr btree threshold to xfs_da_geometry
xfs: convert m_dirblksize to xfs_da_geometry
xfs: convert m_dirblkfsbs to xfs_da_geometry
xfs: convert directory segment limits to xfs_da_geometry
xfs: convert directory db conversion to xfs_da_geometry
xfs: convert directory dablk conversion to xfs_da_geometry
...
|
|
There was a bug in debug printout when CONFIG_REISERFS_CHECK was
enabled so one of the assertions in do_balan.c didn't compile. Fix it.
Fixes: 0080e9f9d3ac717537dbd6db1fc8ef72ce0b9cc1
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Merge leftovers from Andrew Morton:
"A few leftovers: ocfs2, gcov, RTC"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
rtc: s5m: consolidate two device type switch statements
rtc: s5m: add support for S2MPS14 RTC
rtc: s5m: support different register layout
rtc: s5m: use shorter time of register update
rtc: s5m: remove undocumented time init on first boot
mfd/rtc: sec/s5m: rename SEC* symbols to S5M
gcov: add support for GCC 4.9
ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one
|
|
rejecting an invalid one
When o2net-accept-one() rejects an illegal connection, it terminates the
loop picking up the remaining queued connections. This fix will
continue accepting connections till the queue is emtpy.
Addresses Orabug 17489469.
Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com>
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- massive cleanup of the NFS read/write code by Anna and Dros
- support multiple NFS read/write requests per page in order to deal
with non-page aligned pNFS striping. Also cleans up the r/wsize <
page size code nicely.
- stable fix for ensuring inode is declared uptodate only after all
the attributes have been checked.
- stable fix for a kernel Oops when remounting
- NFS over RDMA client fixes
- move the pNFS files layout driver into its own subdirectory"
* tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
NFS: populate ->net in mount data when remounting
pnfs: fix lockup caused by pnfs_generic_pg_test
NFSv4.1: Fix typo in dprintk
NFSv4.1: Comment is now wrong and redundant to code
NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state
xprtrdma: Disconnect on registration failure
xprtrdma: Remove BUG_ON() call sites
xprtrdma: Avoid deadlock when credit window is reset
SUNRPC: Move congestion window constants to header file
xprtrdma: Reset connection timeout after successful reconnect
xprtrdma: Use macros for reconnection timeout constants
xprtrdma: Allocate missing pagelist
xprtrdma: Remove Tavor MTU setting
xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
xprtrdma: Reduce the number of hardway buffer allocations
xprtrdma: Limit work done by completion handler
xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
xprtrmda: Reduce lock contention in completion handlers
xprtrdma: Split the completion queue
xprtrdma: Make rpcrdma_ep_destroy() return void
...
|