Age | Commit message (Collapse) | Author |
|
There are several places where we make allocations with GFP_KERNEL while under
a transaction, which could lead to an assertion panic or lockup if under
memory pressure. This patch switches these problem areas to use GFP_NOFS to
keep these problems from happening.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We should always put inode we have reference to, even if quota was reenabled
in the mean time.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix kernel-doc notation in jbd.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
My group ran into a AIO process hang on a 2.6.24 kernel with the process
sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come
and it never would.
We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box
("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't
shared between all eight processors, but is L2 is shared between between all
two processors on the Core2Duo we use.
My analysis of the hang is if you go down to the second while-loop
in read_events(), what happens on processor #1:
1) add_wait_queue_exclusive() adds thread to ctx->wait
2) aio_read_evt() to check tail
3) if aio_read_evt() returned 0, call [io_]schedule() and sleep
In aio_complete() with processor #2:
A) info->tail = tail;
B) waitqueue_active(&ctx->wait)
C) if waitqueue_active() returned non-0, call wake_up()
The way the code is written, step 1 must be seen by all other processors
before processor 1 checks for pending events in step 2 (that were recorded by
step A) and step A by processor 2 must be seen by all other processors
(checked in step 2) before step B is done.
The race I believed I was seeing is that steps 1 and 2 were
effectively swapped due to the __list_add() being delayed by the L2
cache not shared by some of the other processors. Imagine:
proc 2: just before step A
proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
proc 1, step 2: checks tail and sees no pending events
proc 2, step A: updates tail
proc 1, step 3: calls [io_]schedule() and sleeps
proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
so proc 1 sleeps indefinitely
My patch adds a memory barrier between steps A and B. It ensures that the
update in step 1 gets seen on processor 2 before continuing. If processor 1
was just before step 1, the memory barrier makes sure that step A (update
tail) gets seen by the time processor 1 makes it to step 2 (check tail).
Before the patch our AIO process would hang virtually 100% of the time. After
the patch, we have yet to see the process ever hang.
Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
Reviewed-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ We should probably disallow that "if (waitqueue_active()) wake_up()"
coding pattern, because it's so often buggy wrt memory ordering ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Ignoring the return value from nfs_pageio_add_request can cause deadlocks.
In read path:
call nfs_pageio_add_request from readpage_async_filler
assume at this point that there are requests already in desc, that
can't be merged with the current request.
so nfs_pageio_doio is fired up to clear out desc.
assume something goes wrong in setting up the io, so desc->pg_error is set.
This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original
request.
BUT, since return code is ignored, readpage_async_filler assumes it has
been added, and does nothing further, leaving page locked.
do_generic_mapping_read will eventually call lock_page, resulting in deadlock
In write path:
page is marked dirty by generic_perform_write
nfs_writepages is called
call nfs_pageio_add_request from nfs_page_async_flush
assume at this point that there are requests already in desc, that
can't be merged with the current request.
so nfs_pageio_doio is fired up to clear out desc.
assume something goes wrong in setting up the io, so desc->pg_error is set.
This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original
request, yet marking the request as locked (PG_BUSY) and in writeback,
clearing dirty marks.
The next time a write is done to the page, deadlock will result as
nfs_write_end calls nfs_update_request
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Somebody had put struct nameidata in stack frame of link_path_walk().
Unfortunately, there are certain realities to deal with:
* It's in the middle of recursion. Depth is equal to the nesting
depth of symlinks, i.e. up to 8.
* struct namiedata is, even if one discards the intent junk,
at least 12 pointers + 5 ints.
* moreover, adding a stack frame is not free in that situation.
* there are fs methods called on top of that, and they also have
stack footprint.
* kernel stack is not infinite.
The thing is, even if one chooses to deal with -ESTALE that way (and it's
one hell of an overkill), the only thing that needs to be preserved is
vfsmount + dentry, not the entire struct nameidata.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
once we'd done d_instantiate(), we should only do dput().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Some new uses of get_empty_filp() have crept in; switched
to alloc_file() to make sure that pieces of initialization
won't be missing.
We really need to kill get_empty_filp().
[AV] fixed dentry leak on failure exit in anon_inode_getfd()
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Make sure no-one calls dentry_open with a NULL vfsmount argument and crap
out with a stacktrace otherwise. A NULL file->f_vfsmnt has always been
problematic, but with the per-mount r/o tracking we can't accept anymore
at all.
[AV] the last place that passed NULL had been eliminated by the previous
patch (reiserfs xattr stuff)
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
After several posts and bug reports regarding interaction with the NULL
nameidata, here's a patch to clean up the mess with struct file in the
reiserfs xattr code.
As observed in several of the posts, there's really no need for struct file
to exist in the xattr code. It was really only passed around due to the
f_op->readdir() and a_ops->{prepare,commit}_write prototypes requiring it.
reiserfs_prepare_write() and reiserfs_commit_write() don't actually use the
struct file passed to it, and the xattr code uses a private version of
reiserfs_readdir() to enumerate the xattr directories.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* hppfs_iget() and its users are racy; there's no need to pollute icache
anyway, new_inode() works fine and is safe, unlike the current kludges
(these relied on overwriting ->i_ino before another iget_locked() gets
to that one - and did it after unlocking).
* merge hppfs_iget()/init_inode()/hppfs_read_inode(), while we are
at it.
* to pass proper vfsmount to dentry_open() store the reference
in hppfs superblock.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
--
|
|
Here's patch for hppfs that uses vfs_kern_mount to make sure it always has a
procfs instance and passed the vfsmount on through the inode private data.
Also fixes a procfs file_system_type leak for every attempted hppfs mount.
[ jdike - gave this file a style workover, plus deleted hppfs_dentry_ops ]
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b49' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] export sessionid alongside the loginuid in procfs
|
|
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Revert "unexport bio_{,un}map_user"
relay: fix subbuf_splice_actor() adding too many pages
The ps2esdi driver was marked as BROKEN more than two years ago due to being
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
vfs_kern_mount() requires having a reference to fs type, which
makes it impossible for module to create procfs, etc. private
mount. Open-coding is not an option, since e.g. put_filesystem()
is _not_ exported, and for a good reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Outside users like asmlib uses the mapping functions. API wise, the
export is definitely sane. It's a better idea to keep this export
than to require external users to open-code this piece of code instead.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
oops and fs corruption; the latter can happen even on valid fs in case of oom.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This bug was always here, but before my commit 6fa02839bf9412e18e77
("recheck for secure ports in fh_verify"), it could only be triggered by
failure of a kmalloc(). After that commit it could be triggered by a
client making a request from a non-reserved port for access to an export
marked "secure". (Exports are "secure" by default.)
The result is a struct svc_export with a reference count one too low,
resulting in likely oopses next time the export is accessed.
The reference counting here is not straightforward; a later patch will
clean up fh_verify().
Thanks to Lukas Hejtmanek for the bug report and followup.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Shirish Pargaonkar noted:
With cifsacl mount option, when a file is created on the Windows server,
exclusive oplock is broken right away because the get cifs acl code
again opens the file to obtain security descriptor.
The client does not have the newly created file handle or inode in any
of its lists yet so it does not respond to oplock break and server waits for
its duration and then responds to the second open. This slows down file
creation signficantly. The fix is to pass the file descriptor to the get
cifsacl code wherever available so that get cifs acl code does not send
second open (NT Create ANDX) and oplock is not broken.
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
|
|
Kukks noticed that cp -p can write out file data too late, after the timestamp
is already set. This was introduced as an unintentional sideeffect of the change
in an earlier patch (see below) which fixed some delayed return code propagation.
cea218054ad277d6c126890213afde07b4eb1602
Author: Jeff Layton <jlayton@redhat.com>
Date: Tue Nov 20 23:19:03 2007 +0000
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
Fix pagemap_read() error handling by releasing acquired resources and checking
for get_user_pages() partial failure.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
[SCTP]: Fix local_addr deletions during list traversals.
net: fix build with CONFIG_NET=n
[TCP]: Prevent sending past receiver window with TSO (at last skb)
rt2x00: Add new D-Link USB ID
rt2x00: never disable multicast because it disables broadcast too
libertas: fix the 'compare command with itself' properly
drivers/net/Kconfig: fix whitespace for GELIC_WIRELESS entry
[NETFILTER]: nf_queue: don't return error when unregistering a non-existant handler
[NETFILTER]: nfnetlink_queue: fix EPERM when binding/unbinding and instance 0 exists
[NETFILTER]: nfnetlink_log: fix EPERM when binding/unbinding and instance 0 exists
[NETFILTER]: nf_conntrack: replace horrible hack with ksize()
[NETFILTER]: nf_conntrack: add \n to "expectation table full" message
[NETFILTER]: xt_time: fix failure to match on Sundays
[NETFILTER]: nfnetlink_log: fix computation of netlink skb size
[NETFILTER]: nfnetlink_queue: fix computation of allocated size for netlink skb.
[NETFILTER]: nfnetlink: fix ifdef in nfnetlink_compat.h
[NET]: include <linux/types.h> into linux/ethtool.h for __u* typedef
[NET]: Make /proc/net a symlink on /proc/self/net (v3)
RxRPC: fix rxrpc_recvmsg()'s returning of msg_name
net/enc28j60: oops fix
...
|
|
fs/built-in.o:(.rodata+0x1134): undefined reference to `proc_net_inode_operations'
fs/built-in.o:(.rodata+0x1138): undefined reference to `proc_net_operations'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
|
|
In some situations, ocfs2_set_nn_state might get called with sc = NULL and
valid = 0. If sc = NULL, we can't dereference it to get the o2nm_node
member. Instead, do what o2net_initialize_handshake does and use NULL when
calling o2net_reconnect_delay and o2net_idle_timeout.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
This patch addresses the bug in which the dlm_thread could go to sleep
while holding the dlm_spinlock.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
Knowing the dlm recovery master helps in debugging recovery
issues. This patch prints a message on the recovery master node.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
dlm_master_request_handler() forgot to put a lockres when
dlm_assert_master_worker() failed or was skipped.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
During migration, the recovery master node may be asked to master a lockres
it may not know about. In that case, it would not only have to create a
lockres and add it to the hash, but also remember to to do the _put_
corresponding to the kref_init in dlm_init_lockres(), as soon as the migration
is completed. Yes, we don't wait for the dlm_purge_lockres() to do that
matching put. Note the ref added for it being in the hash protects the lockres
from being freed prematurely.
This patch adds that missing put, as described above, to plug a memleak.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
Normally locks for remote nodes are freed when that node sends an UNLOCK
message to the master. The master node tags an DLM_UNLOCK_FREE_LOCK action
to do an extra put on the lock at the end.
However, there are times when the master node has to free the locks for the
remote nodes forcibly.
Two cases when this happens are:
1. When the master has migrated the lockres plus all locks to another node.
2. When the master is clearing all the locks of a dead node.
It was in the above two conditions that the dlm was missing the extra put.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
In ocfs2_group_add, 'cr' is a disk field of type 'ocfs2_chain_rec', and we
were putting cpu byteorder values into it. Swap things to the right endian
before storing.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
struct dlm_query_join_packet is made up of four one-byte fields. They
are effectively in big-endian order already. However, little-endian
machines swap them before putting the packet on the wire (because
query_join's response is a status, and that status is treated as a u32
on the wire). Thus, a big-endian and little-endian machines will
treat this structure differently.
The solution is to have little-endian machines swap the structure when
converting from the structure to the u32 representation.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
__dlm_print_one_lock_resource must be called with spin_lock
the res->spinlock. While in some cases, we use it without this
precondition and lead to the failure of assert_spin_locked.
So call dlm_print_one_lock_resource instead.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
fs/ocfs2/dlm/dlmdomain.c: In function 'dlm_send_join_cancels':
fs/ocfs2/dlm/dlmdomain.c:983: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
if DFS junction point
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
NFS: Fix the fsid revalidation in nfs_update_inode()
SUNRPC: Fix a nfs4 over rdma transport oops
NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
|
|
As long as the directory contents haven't changed, we should just let the
path walk proceed to cross the mountpoint. Apart from being an optimisation
in the case of 'nohide' mountpoint traversals, it also fixes an issue with
referrals: referral inodes don't have valid filehandles, so calling
nfs_revalidate_inode() on them is a bug.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
When we detect that we've crossed a mountpoint on the remote server, we
must take care not to use that inode to revalidate the fsid on our
current superblock. To do so, we label the inode as a remote mountpoint,
and check for that in nfs_update_inode().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
O_SYNC is stored in filp->f_flags.
Thanks to Al Viro for pointing out the bug.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.
The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.
The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.
# ls -l /proc/net
lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net
In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.
Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
screwup pointed out by Stephen.
To get the correct nlink count the ->getattr callback for /proc/net
is overridden to read one from the net->proc_net entry.
To make selinux still work the net->proc_net entry is initialized
properly, i.e. with the "net" name and the proc_net parent.
Selinux fixes are
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] fix inode leak in xfs_iget_core()
[XFS] 977545 977545 977545 977545 977545 977545 xfsaild causing too many
|
|
If the radix_tree_preload() fails, we need to destroy the inode we just
read in before trying again. This could leak xfs_vnode structures when
there is memory pressure. Noticed by Christoph Hellwig.
SGI-PV: 977823
SGI-Modid: xfs-linux-melb:xfs-kern:30606a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
|
|
wakeups
Idle state is not being detected properly by the xfsaild push code. The
current idle state is detected by an empty list which may never happen
with mostly idle filesystem or one using lazy superblock counters. A
single dirty item in the list that exists beyond the push target can
result repeated looping attempting to push up to the target because it
fails to check if the push target has been acheived or not.
Fix by considering a dirty list with everything past the target as an idle
state and set the timeout appropriately.
SGI-PV: 977545
SGI-Modid: xfs-linux-melb:xfs-kern:30532a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|