summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-08-17Btrfs: fix a bug of balance on full multi-disk partitionsliubo
When balancing, we'll first try to shrink devices for some space, but if it is working on a full multi-disk partition with raid protection, we may encounter a bug, that is, while shrinking, total_bytes may be less than bytes_used, and btrfs may allocate a dev extent that accesses out of device's bounds. Then we will not be able to write or read the data which stores at the end of the device, and get the followings: device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5 Btrfs detected SSD devices, enabling SSD mode btrfs: relocating block group 476315648 flags 9 btrfs: found 4 extents attempt to access beyond end of device sdb5: rw=145, want=546176, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546304, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546432, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546560, limit=546147 attempt to access beyond end of device Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-17Btrfs: fix an oops of log replayliubo
When btrfs recovers from a crash, it may hit the oops below: ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:4580! [...] RIP: 0010:[<ffffffffa03df251>] [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs] [...] Call Trace: [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs] [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs] [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs] [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs] [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs] [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs] [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs] [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs] [...] This comes from that while replaying an inode ref item, we forget to check those old conflicting DIR_ITEM and DIR_INDEX items in fs/file tree, then we will come to conflict corners which lead to BUG_ON(). Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Tested-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-17Btrfs: detect wether a device supports discardJosef Bacik
We have a problem where if a user specifies discard but doesn't actually support it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem because this gets called (in a fashion) from the tree log recovery code, which has a nice little BUG_ON(ret) after it, which causes us to fail the tree log replay. So instead detect wether our devices support discard when we're adding them and then don't issue discards if we know that the device doesn't support it. And just for good measure set ret = 0 in btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16cifs: demote cERROR in build_path_from_dentry to cFYIJeff Layton
Running the cthon tests on a recent kernel caused this message to pop occasionally: CIFS VFS: did not end path lookup where expected namelen is 0 Some added debugging showed that namelen and dfsplen were both 0 when this occurred. That means that the read_seqretry returned true. Assuming that the comment inside the if statement is true, this should be harmless and just means that we raced with a rename. If that is the case, then there's no need for alarm and we can demote this to cFYI. While we're at it, print the dfsplen too so that we can see what happened here if the message pops during debugging. Cc: stable@kernel.org Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-15ceph: fix encoding of ino only (not relative) pathsSage Weil
A 'path' consists of a starting ino and relative component. Encode even when there is no relative component. This is primarily needed by the NFS reexport code. Signed-off-by: Sage Weil <sage@newdream.net>
2011-08-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: flush journal completely before releasing metadata inodes
2011-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: Do not set cifs/ntfs acl using a file handle (try #4) [CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable
2011-08-13ext4: fix nomblk_io_submit option so it correctly converts uninit blocksTheodore Ts'o
Bug discovered by Jan Kara: Finally, commit 1449032be17abb69116dbc393f67ceb8bd034f92 returned back the old IO submission code but apparently it forgot to return the old handling of uninitialized buffers so we unconditionnaly call block_write_full_page() without specifying end_io function. So AFAICS we never convert unwritten extents to written in some cases. For example when I mount the fs as: mount -t ext4 -o nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600); char buf[1024]; memset(buf, 'a', sizeof(buf)); fallocate(fd, 0, 0, 16384); write(fd, buf, sizeof(buf)); I get a file full of zeros (after remounting the filesystem so that pagecache is dropped) instead of seeing the first KB contain 'a's. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-13ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.Tao Ma
EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten should be done simultaneously since ext4_end_io_nolock always clear the flag and decrease the counter in the same time. We don't increase i_aiodio_unwritten when setting EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process to wait forever. Part of the patch came from Eric in his e-mail, but it doesn't fix the problem met by Michael actually. http://marc.info/?l=linux-ext4&m=131316851417460&w=2 Reported-and-Tested-by: Michael Tokarev<mjt@tls.msk.ru> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-13ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inodeJiaying Zhang
Flush inode's i_completed_io_list before calling ext4_io_wait to prevent the following deadlock scenario: A page fault happens while some process is writing inode A. During page fault, shrink_icache_memory is called that in turn evicts another inode B. Inode B has some pending io_end work so it calls ext4_ioend_wait() that waits for inode B's i_ioend_count to become zero. However, inode B's ioend work was queued behind some of inode A's ioend work on the same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten thread on that cpu is processing inode A's ioend work, it tries to grab inode A's i_mutex lock. Since the i_mutex lock of inode A is still hold before the page fault happened, we enter a deadlock. Also moves ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode() to ext4_evict_inode(). During inode deleteion, ext4_evict_inode() is called before ext4_destroy_inode() and in ext4_evict_inode(), we may call ext4_truncate() without holding i_mutex lock. As a result, there is a race between flush_completed_IO that is called from ext4_ext_truncate() and ext4_end_io_work, which may cause corruption on an io_end structure. This change moves ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode() to ext4_evict_inode() to resolve the race between ext4_truncate() and ext4_end_io_work during inode deletion. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-13ext4: Fix ext4_should_writeback_data() for no-journal modeCurt Wohlgemuth
ext4_should_writeback_data() had an incorrect sequence of tests to determine if it should return 0 or 1: in particular, even in no-journal mode, 0 was being returned for a non-regular-file inode. This meant that, in non-journal mode, we would use ext4_journalled_aops for directories, symlinks, and other non-regular files. However, calling journalled aop callbacks when there is no valid handle, can cause problems. This would cause a kernel crash with Jan Kara's commit 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data"), because we now dereference 'handle' in ext4_journalled_write_end(). I also added BUG_ONs to check for a valid handle in the obviously journal-only aops callbacks. I tested this running xfstests with a scratch device in these modes: - no-journal - data=ordered - data=writeback - data=journal All work fine; the data=journal run has many failures and a crash in xfstests 074, but this is no different from a vanilla kernel. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-13Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: replace xfs_buf_geterror() with bp->b_error xfs: Check the return value of xfs_buf_read() for NULL "xfs: fix error handling for synchronous writes" revisited xfs: set cursor in xfs_ail_splice() even when AIL was empty xfs: Remove the macro XFS_BUFTARG_NAME xfs: Remove the macro XFS_BUF_TARGET xfs: Remove the macro XFS_BUF_SET_TARGET Replace the macro XFS_BUF_ISPINNED with helper xfs_buf_ispinned xfs: Remove the macro XFS_BUF_SET_PTR xfs: Remove the macro XFS_BUF_PTR xfs: Remove macro XFS_BUF_SET_START xfs: Remove macro XFS_BUF_HOLD xfs: Remove macro XFS_BUF_BUSY and family xfs: Remove the macro XFS_BUF_ERROR and family xfs: Remove the macro XFS_BUF_BFLAGS
2011-08-12xfs: remove subdirectoriesChristoph Hellwig
Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the annoying subdirectories in the XFS source code. Besides the large amount of file rename the only changes are to the Makefile, a few files including headers with the subdirectory prefix, and the binary sysctl compat code that includes a header under fs/xfs/ from kernel/. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-12xfs: don't expect xfs headers to be in subdirectoriesAlex Elder
Fix up some #include directives in preparation for moving a few header files out of xfs source subdirectories. Note that "xfs_linux.h" also got its quoting convention for included files switched. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-08-12xfs: replace xfs_buf_geterror() with bp->b_errorChandra Seetharaman
Since we just checked bp for NULL, it is ok to replace xfs_buf_geterror() with bp->b_error in these places. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-12xfs: Check the return value of xfs_buf_read() for NULLChandra Seetharaman
Check the return value of xfs_buf_read() for NULL and return ENOMEM if it is NULL. This is necessary in a few spots to avoid subsequent code blindly dereferencing the null buffer pointer. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) e1000e: increase driver version number e1000e: alternate MAC address update e1000e: do not disable receiver on 82574/82583 e1000e: alternate MAC address does not work on device id 0x1060 PCnet: Fix section mismatch bnx2x: disable dcb on 578xx since not supported yet bnx2x: properly clean indirect addresses bnx2x: prevent race between undi_unload and load flows bnx2x: fix select_queue when FCoE is disabled bnx2x: init FCOE FP only once ipv4: some rt_iif -> rt_route_iif conversions net/bridge/netfilter/ebtables.c: use available error handling code net/netlabel/netlabel_kapi.c: add missing cleanup code net/irda: sh_sir: tidyup compile warning net/irda: sh_sir: add missing header net/irda: sh_irda: add missing header slcan: ldisc generated skbs are received in softirq context scm: Capture the full credentials of the scm sender tcp: initialize variable ecn_ok in syncookies path drivers/net/wireless/wl1251: add missing kfree ...
2011-08-12pnfs: Automatically select blocks & objects layoutsBoaz Harrosh
Just like files-layout, blocks & objects layouts are part of the NFS 4.1 protocol and should be automatically selected if NFS_4_1 is selected. The small problem is that these depend on other Kernel support being present, while files only depends on NFS itself. This patch removes from the user choice the presence of objects and blocks layout. But makes sure these are selected only if the depended subsystems are present in the Kernel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-12ext4: Properly count journal credits for long symlinksEric Sandeen
Commit df5e6223407e ("ext4: fix deadlock in ext4_symlink() in ENOSPC conditions") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in jbd2_journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-12ext3: Properly count journal credits for long symlinksEric Sandeen
Commit ae54870a1dc9 ("ext3: Fix lock inversion in ext3_symlink()") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11move RLIMIT_NPROC check from set_user() to do_execve_common()Vasiliy Kulikov
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC check in set_user() to check for NPROC exceeding via setuid() and similar functions. Before the check there was a possibility to greatly exceed the allowed number of processes by an unprivileged user if the program relied on rlimit only. But the check created new security threat: many poorly written programs simply don't check setuid() return code and believe it cannot fail if executed with root privileges. So, the check is removed in this patch because of too often privilege escalations related to buggy programs. The NPROC can still be enforced in the common code flow of daemons spawning user processes. Most of daemons do fork()+setuid()+execve(). The check introduced in execve() (1) enforces the same limit as in setuid() and (2) doesn't create similar security issues. Neil Brown suggested to track what specific process has exceeded the limit by setting PF_NPROC_EXCEEDED process flag. With the change only this process would fail on execve(), and other processes' execve() behaviour is not changed. Solar Designer suggested to re-check whether NPROC limit is still exceeded at the moment of execve(). If the process was sleeping for days between set*uid() and execve(), and the NPROC counter step down under the limit, the defered execve() failure because NPROC limit was exceeded days ago would be unexpected. If the limit is not exceeded anymore, we clear the flag on successful calls to execve() and fork(). The flag is also cleared on successful calls to set_user() as the limit was exceeded for the previous user, not the current one. Similar check was introduced in -ow patches (without the process flag). v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user(). Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11cifs: Do not set cifs/ntfs acl using a file handle (try #4)Shirish Pargaonkar
Set security descriptor using path name instead of a file handle. We can't be sure that the file handle has adequate permission to set a security descriptor (to modify DACL). Function set_cifs_acl_by_fid() has been removed since we can't be sure how a file was opened for writing, a valid request can fail if the file was not opened with two above mentioned permissions. We could have opted to add on WRITE_DAC and WRITE_OWNER permissions to file opens and then use that file handle but adding addtional permissions such as WRITE_DAC and WRITE_OWNER could cause an any open to fail. And it was incorrect to look for read file handle to set a security descriptor anyway. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-11[CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines ↵Steve French
more readable Christoph had requested that the stats related code (in CONFIG_CIFS_STATS2) be moved into helpers to make code flow more readable. This patch should help. For example the following section from transport.c spin_unlock(&GlobalMid_Lock); atomic_inc(&ses->server->num_waiters); wait_event(ses->server->request_q, atomic_read(&ses->server->inFlight) < cifs_max_pending); atomic_dec(&ses->server->num_waiters); spin_lock(&GlobalMid_Lock); becomes simpler (with the patch below): spin_unlock(&GlobalMid_Lock); cifs_num_waiters_inc(server); wait_event(server->request_q, atomic_read(&server->inFlight) < cifs_max_pending); cifs_num_waiters_dec(server); spin_lock(&GlobalMid_Lock); Reviewed-by: Jeff Layton <jlayton@redhat.com> CC: Christoph Hellwig <hch@infradead.org> Signed-off-by: Steve French <sfrench@us.ibm.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
2011-08-11NFS41: make PNFS_BLOCK selectablePeng Tao
PNFS_BLOCK needs BLK_DEV_DM/MD, which is not a dependency for other pnfs layout drivers. Seperate it out so others can still build when BLK_DEV_DM/MD is not enabled. Also change select to depends on to avoid build failures. Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Peng Tao <peng_tao@emc.com> Acked-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-10"xfs: fix error handling for synchronous writes" revisitedAjeet Yadav
xfs: fix for hang during synchronous buffer write error If removed storage while synchronous buffer write underway, "xfslogd" hangs. Detailed log http://oss.sgi.com/archives/xfs/2011-07/msg00740.html Related work bfc60177f8ab509bc225becbb58f7e53a0e33e81 "xfs: fix error handling for synchronous writes" Given that xfs_bwrite actually does the shutdown already after waiting for the b_iodone completion and given that we actually found that calling xfs_force_shutdown from inside xfs_buf_iodone_callbacks was a major contributor the problem it better to drop this call. Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: Ecryptfs: Add mount option to check uid of device being mounted = expect uid eCryptfs: Fix payload_len unitialized variable warning eCryptfs: fix compile error eCryptfs: Return error when lower file pointer is NULL
2011-08-10Ecryptfs: Add mount option to check uid of device being mounted = expect uidJohn Johansen
Close a TOCTOU race for mounts done via ecryptfs-mount-private. The mount source (device) can be raced when the ownership test is done in userspace. Provide Ecryptfs a means to force the uid check at mount time. Signed-off-by: John Johansen <john.johansen@canonical.com> Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09xfs: set cursor in xfs_ail_splice() even when AIL was emptyAlex Elder
In xfs_ail_splice(), if a cursor is provided it is updated to point to the last item on the list being spliced into the AIL. But if the AIL was found to be empty, the cursor (if provided) is just initialized instead. There is no reason the empty AIL case needs to be treated any differently. And treating it the same way allows this code to be rearranged a bit, with a somewhat tidier result. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-08-09eCryptfs: Fix payload_len unitialized variable warningTyler Hicks
fs/ecryptfs/keystore.c: In function ‘ecryptfs_generate_key_packet_set’: fs/ecryptfs/keystore.c:1991:28: warning: ‘payload_len’ may be used uninitialized in this function [-Wuninitialized] fs/ecryptfs/keystore.c:1976:9: note: ‘payload_len’ was declared here Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09eCryptfs: fix compile errorRoberto Sassu
This patch fixes the compile error reported at the address: https://bugzilla.kernel.org/show_bug.cgi?id=40292 The problem arises when compiling eCryptfs as built-in and the 'encrypted' key type as a module. The patch prevents this combination from being set in the kernel configuration, by fixing the eCryptfs dependencies. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Reported-by: David Hill <hilld@binarystorm.net> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09eCryptfs: Return error when lower file pointer is NULLTyler Hicks
When an eCryptfs inode's lower file has been closed, and the pointer has been set to NULL, return an error when trying to do a lower read or write rather than calling BUG(). https://bugzilla.kernel.org/show_bug.cgi?id=37292 Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: <stable@kernel.org>
2011-08-08autofs4: fix debug printk warning uncovered by cleanupLinus Torvalds
The previous comit made the autofs4 debug printouts check types against the printout format, and uncovered this bug: fs/autofs4/waitq.c:106:2: warning: format ‘%08lx’ expects type ‘long unsigned int’, but argument 4 has type ‘autofs_wqt_t’ which is due to the insane type for wait_queue_token. That thing should be some fixed well-defined size (preferably just 'unsigned int' or 'u32') but for unexplained reasons it is randomly either 'unsigned long' or 'unsigned int' depending on the architecture. For now, cast it to 'unsigned long' for printing, the way we do elsewhere. Somebody else can try to explain the typedef mess. (There's a reason we don't support excessive use of typedefs in the kernel: it's usually just a good way of confusing yourself). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08autofs4: clean up uaotfs use of debug/info/warning printoutsLinus Torvalds
Use 'pr_debug()' for DPRINTK, which will do the proper type checking on the arguments (without generating code) even when DEBUG isn't #defined. Also, use the standard __VA_ARGS__ for the macros, and stop the pointless abuse of 'do { xyz } while (0)' when the macro is already a perfectly well-formed single statement. Reported-by: David Howells <dhowells@redhat.com> Suggested-by: Joe Perches <joe@perches.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08fuse: mark pages accessed when written toJohannes Weiner
As fuse does not use the page cache library functions when userspace writes to a file, it did not benefit from 'c8236db mm: mark page accessed before we write_end()' that made sure pages are properly marked accessed when written to. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-08-08fuse: delete dead .write_begin and .write_end aopsJohannes Weiner
Ever since 'ea9b990 fuse: implement perform_write', the .write_begin and .write_end aops have been dead code. Their task - acquiring a page from the page cache, sending out a write request and releasing the page again - is now done batch-wise to maximize the number of pages send per userspace request. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-08-08fuse: fix flockMiklos Szeredi
Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a number of issues with supporing flock locks over existing POSIX locking infrastructure: - it's not backward compatible, passing flock(2) calls to userspace unconditionally (if userspace sets FUSE_POSIX_LOCKS) - it doesn't cater for the fact that flock locks are automatically unlocked on file release - it doesn't take into account the fact that flock exclusive locks (write locks) don't need an fd opened for write. The last one invalidates the original premise of the patch that flock locks can be emulated with POSIX locks. This patch fixes the first two issues. The last one needs to be fixed in userspace if the filesystem assumed that a write lock will happen only on a file operned for write (as in the case of the current fuse library). Reported-by: Sebastian Pipping <webmaster@hartwork.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-08-08Merge branch 'master' of ↵Alex Elder
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
2011-08-08compat_ioctl: add compat handler for PPPIOCGL2TPSTATSFlorian Westphal
fixes following error seen on x86_64 kernel: ioctl32(openl2tpd:7480): Unknown cmd fd(14) cmd(80487436){t:'t';sz:72} arg(ffa7e6c0) on socket:[105094] The argument (struct pppol2tp_ioc_stats) uses "aligned_u64" and thus doesn't need fixups. Cc: James Chapman <jchapman@katalix.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07vfs: rename 'do_follow_link' to 'should_follow_link'Linus Torvalds
Al points out that the do_follow_link() helper function really is misnamed - it's about whether we should try to follow a symlink or not, not about actually doing the following. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07Fix POSIX ACL permission checkAri Savolainen
After commit 3567866bf261: "RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached" posix_acl_permission is being called with an unsupported flag and the permission check fails. This patch fixes the issue. Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-07Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds
* 'for-linus' of git://git.open-osd.org/linux-open-osd: ore: Make ore its own module exofs: Rename raid engine from exofs/ios.c => ore exofs: ios: Move to a per inode components & device-table exofs: Move exofs specific osd operations out of ios.c exofs: Add offset/length to exofs_get_io_state exofs: Fix truncate for the raid-groups case exofs: Small cleanup of exofs_fill_super exofs: BUG: Avoid sbi realloc exofs: Remove pnfs-osd private definitions nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
2011-08-07vfs: optimize inode cache access patternsLinus Torvalds
The inode structure layout is largely random, and some of the vfs paths really do care. The path lookup in particular is already quite D$ intensive, and profiles show that accessing the 'inode->i_op->xyz' fields is quite costly. We already optimized the dcache to not unnecessarily load the d_op structure for members that are often NULL using the DCACHE_OP_xyz bits in dentry->d_flags, and this does something very similar for the inode ops that are used during pathname lookup. It also re-orders the fields so that the fields accessed by 'stat' are together at the beginning of the inode structure, and roughly in the order accessed. The effect of this seems to be in the 1-2% range for an empty kernel "make -j" run (which is fairly kernel-intensive, mostly in filename lookup), so it's visible. The numbers are fairly noisy, though, and likely depend a lot on exact microarchitecture. So there's more tuning to be done. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07vfs: renumber DCACHE_xyz flags, remove some stale onesLinus Torvalds
Gcc tends to generate better code with small integers, including the DCACHE_xyz flag tests - so move the common ones to be first in the list. Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and DCACHE_AUTOFS_PENDING values, their users no longer exists in the source tree. And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the common case to be a nice straight-line fall-through. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07ore: Make ore its own moduleBoaz Harrosh
Export everything from ore need exporting. Change Kbuild and Kconfig to build ore.ko as an independent module. Import ore from exofs Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-07exofs: Rename raid engine from exofs/ios.c => oreBoaz Harrosh
ORE stands for "Objects Raid Engine" This patch is a mechanical rename of everything that was in ios.c and its API declaration to an ore.c and an osd_ore.h header. The ore engine will later be used by the pnfs objects layout driver. * File ios.c => ore.c * Declaration of types and API are moved from exofs.h to a new osd_ore.h * All used types are prefixed by ore_ from their exofs_ name. * Shift includes from exofs.h to osd_ore.h so osd_ore.h is independent, include it from exofs.h. Other than a pure rename there are no other changes. Next patch will move the ore into it's own module and will export the API to be used by exofs and later the layout driver Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-07exofs: ios: Move to a per inode components & device-tableBoaz Harrosh
Exofs raid engine was saving on memory space by having a single layout-info, single pid, and a single device-table, global to the filesystem. Then passing a credential and object_id info at the io_state level, private for each inode. It would also devise this contraption of rotating the device table view for each inode->ino to spread out the device usage. This is not compatible with the pnfs-objects standard, demanding that each inode can have it's own layout-info, device-table, and each object component it's own pid, oid and creds. So: Bring exofs raid engine to be usable for generic pnfs-objects use by: * Define an exofs_comp structure that holds obj_id and credential info. * Break up exofs_layout struct to an exofs_components structure that holds a possible array of exofs_comp and the array of devices + the size of the arrays. * Add a "comps" parameter to get_io_state() that specifies the ids creds and device array to use for each IO. This enables to keep the layout global, but the device-table view, creds and IDs at the inode level. It only adds two 64bit to each inode, since some of these members already existed in another form. * ios raid engine now access layout-info and comps-info through the passed pointers. Everything is pre-prepared by caller for generic access of these structures and arrays. At the exofs Level: * Super block holds an exofs_components struct that holds the device array, previously in layout. The devices there are in device-table order. The device-array is twice bigger and repeats the device-table twice so now each inode's device array can point to a random device and have a round-robin view of the table, making it compatible to previous exofs versions. * Each inode has an exofs_components struct that is initialized at load time, with it's own view of the device table IDs and creds. When doing IO this gets passed to the io_state together with the layout. While preforming this change. Bugs where found where credentials with the wrong IDs where used to access the different SB objects (super.c). As well as some dead code. It was never noticed because the target we use does not check the credentials. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-07exofs: Move exofs specific osd operations out of ios.cBoaz Harrosh
ios.c will be moving to an external library, for use by the objects-layout-driver. Remove from it some exofs specific functions. Also g_attr_logical_length is used both by inode.c and ios.c move definition to the later, to keep it independent Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-07exofs: Add offset/length to exofs_get_io_stateBoaz Harrosh
In future raid code we will need to know the IO offset/length and if it's a read or write to determine some of the array sizes we'll need. So add a new exofs_get_rw_state() API for use when writeing/reading. All other simple cases are left using the old way. The major change to this is that now we need to call exofs_get_io_state later at inode.c::read_exec and inode.c::write_exec when we actually know these things. So this patch is kept separate so I can test things apart from other changes. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: cope with negative dentries in cifs_get_root cifs: convert prefixpath delimiters in cifs_build_path_to_root CIFS: Fix missing a decrement of inFlight value cifs: demote DFS referral lookup errors to cFYI Revert "cifs: advertise the right receive buffer size to the server"
2011-08-06vfs: show O_CLOEXE bit properly in /proc/<pid>/fdinfo/<fd> filesLinus Torvalds
The CLOEXE bit is magical, and for performance (and semantic) reasons we don't actually maintain it in the file descriptor itself, but in a separate bit array. Which means that when we show f_flags, the CLOEXE status is shown incorrectly: we show the status not as it is now, but as it was when the file was opened. Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit array. Uli needs this in order to re-implement the pfiles program: "For normal file descriptors (not sockets) this was the last piece of information which wasn't available. This is all part of my 'give Solaris users no reason to not switch' effort. I intend to offer the code to the util-linux-ng maintainers." Requested-by: Ulrich Drepper <drepper@akkadia.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>