summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-04-13Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()Li Zefan
btrfs_next_leaf() can return -errno, and we should propagate it to userspace. This also simplifies how we walk the btree path. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-13Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()Li Zefan
btrfs_next_leaf() can return -errno, and we should propagate it to userspace. This also simplifies how we walk the btree path. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-13Btrfs: make uncache_state unconditionalChris Mason
The extent_io code can take cached pointers into the extent state trees, and these can make lookups much faster in common operations. The caching only happens when specific bits are set that prevent merging and splitting of the extent state. A help function was added to uncache the state, and it was testing the same set of conditionals. This can leak in very strange corner cases where the lock bit goes away unexpectedly. The uncaching should be unconditional. Once we have a ref on the extent we should always give it up. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: don't allow mmap'ed pages to be dirtied while under writeback (try #3) [CIFS] Warn on requesting default security (ntlm) on mount [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood cifs: wrap received signature check in srv_mutex cifs: clean up various nits in unicode routines (try #2) cifs: clean up length checks in check2ndT2 cifs: set ra_pages in backing_dev_info cifs: fix broken BCC check in is_valid_oplock_break cifs: always do is_path_accessible check in cifs_mount various endian fixes to cifs Elminate sparse __CHECK_ENDIAN__ warnings on port conversion Max share size is too small Allow user names longer than 32 bytes cifs: replace /proc/fs/cifs/Experimental with a module parm cifs: check for private_data before trying to put it
2011-04-12nfs: don't call __mark_inode_dirty while holding i_lockDave Chinner
nfs_scan_commit() is called with the inode->i_lock held, but it then calls __mark_inode_dirty() while still holding the lock. This causes a deadlock. Push the inode->i_lock into nfs_scan_commit() so it can protect only the parts of the code it needs to and can be dropped before the call to __mark_inode_dirty() to avoid the deadlock. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Will Simoneau <simoneau@ele.uri.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-12Revert "vfs: Export file system uuid via /proc/<pid>/mountinfo"Linus Torvalds
This reverts commit 93f1c20bc8cdb757be50566eff88d65c3b26881f. It turns out that libmount misparses it because it adds a '-' character in the uuid string, which libmount then incorrectly confuses with the separator string (" - ") at the end of all the optional arguments. Upstream libmount (in the util-linux tree) has been fixed, but until that fix actually percolates up to users, we'd better not expose this change in the kernel. Let's revisit this later (possibly by exposing the UUID without any '-' characters in it, avoiding the user-space bug). Reported-by: Dave Jones <davej@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Karel Zak <kzak@redhat.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-12cifs: don't allow mmap'ed pages to be dirtied while under writeback (try #3)Jeff Layton
This is more or less the same patch as before, but with some merge conflicts fixed up. If a process has a dirty page mapped into its page tables, then it has the ability to change it while the client is trying to write the data out to the server. If that happens after the signature has been calculated then that signature will then be wrong, and the server will likely reset the TCP connection. This patch adds a page_mkwrite handler for CIFS that simply takes the page lock. Because the page lock is held over the life of writepage and writepages, this prevents the page from becoming writeable until the write call has completed. With this, we can also remove the "sign_zero_copy" module option and always inline the pages when writing. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12[CIFS] Warn on requesting default security (ntlm) on mountSteve French
Warn once if default security (ntlm) requested. We will update the default to the stronger security mechanism (ntlmv2) in 2.6.41. Kerberos is also stronger than ntlm, but more servers support ntlmv2 and ntlmv2 does not require an upcall, so ntlmv2 is a better default. Reviewed-by: Jeff Layton <jlayton@redhat.com> CC: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12[CIFS] cifs: clarify the meaning of tcpStatus == CifsGoodSteve French
When the TCP_Server_Info is first allocated and connected, tcpStatus == CifsGood means that the NEGOTIATE_PROTOCOL request has completed and the socket is ready for other calls. cifs_reconnect however sets tcpStatus to CifsGood as soon as the socket is reconnected and the optional RFC1001 session setup is done. We have no clear way to tell the difference between these two states, and we need to know this in order to know whether we can send an echo or not. Resolve this by adding a new statusEnum value -- CifsNeedNegotiate. When the socket has been connected but has not yet had a NEGOTIATE_PROTOCOL request done, set it to this value. Once the NEGOTIATE is done, cifs_negotiate_protocol will set tcpStatus to CifsGood. This also fixes and cleans the logic in cifs_reconnect and cifs_reconnect_tcon. The old code checked for specific states when what it really wants to know is whether the state has actually changed from CifsNeedReconnect. Reported-and-Tested-by: JG <jg@cms.ac> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12cifs: wrap received signature check in srv_mutexJeff Layton
While testing my patchset to fix asynchronous writes, I hit a bunch of signature problems when testing with signing on. The problem seems to be that signature checks on receive can be running at the same time as a process that is sending, or even that multiple receives can be checking signatures at the same time, clobbering the same data structures. While we're at it, clean up the comments over cifs_calculate_signature and add a note that the srv_mutex should be held when calling this function. This patch seems to fix the problems for me, but I'm not clear on whether it's the best approach. If it is, then this should probably go to stable too. Cc: stable@kernel.org Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12cifs: clean up various nits in unicode routines (try #2)Jeff Layton
Minor revision to the original patch. Don't abuse the __le16 variable on the stack by casting it to wchar_t and handing it off to char2uni. Declare an actual wchar_t on the stack instead. This fixes a valid sparse warning. Fix the spelling of UNI_ASTERISK. Eliminate the unneeded len_remaining variable in cifsConvertToUCS. Also, as David Howells points out. We were better off making cifsConvertToUCS *not* use put_unaligned_le16 since it means that we can't optimize the mapped characters at compile time. Switch them instead to use cpu_to_le16, and simply use put_unaligned to set them in the string. Reported-and-acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12cifs: clean up length checks in check2ndT2Jeff Layton
Thus spake David Howells: The code that follows this: remaining = total_data_size - data_in_this_rsp; if (remaining == 0) return 0; else if (remaining < 0) { generates better code if you drop the 'remaining' variable and compare the values directly. Clean it up per his recommendation... Reported-and-acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12cifs: set ra_pages in backing_dev_infoJeff Layton
Commit 522440ed made cifs set backing_dev_info on the mapping attached to new inodes. This change caused a fairly significant read performance regression, as cifs started doing page-sized reads exclusively. By virtue of the fact that they're allocated as part of cifs_sb_info by kzalloc, the ra_pages on cifs BDIs get set to 0, which prevents any readahead. This forces the normal read codepaths to use readpage instead of readpages causing a four-fold increase in the number of read calls with the default rsize. Fix it by setting ra_pages in the BDI to the same value as that in the default_backing_dev_info. Fixes https://bugzilla.kernel.org/show_bug.cgi?id=31662 Cc: stable@kernel.org Reported-and-Tested-by: Till <till2.schaefer@uni-dortmund.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12cifs: fix broken BCC check in is_valid_oplock_breakJeff Layton
The BCC is still __le16 at this point, and in any case we need to use the get_bcc_le macro to make sure we don't hit alignment problems. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12cifs: always do is_path_accessible check in cifs_mountJeff Layton
Currently, we skip doing the is_path_accessible check in cifs_mount if there is no prefixpath. I have a report of at least one server however that allows a TREE_CONNECT to a share that has a DFS referral at its root. The reporter in this case was using a UNC that had no prefixpath, so the is_path_accessible check was not triggered and the box later hit a BUG() because we were chasing a DFS referral on the root dentry for the mount. This patch fixes this by removing the check for a zero-length prefixpath. That should make the is_path_accessible check be done in this situation and should allow the client to chase the DFS referral at mount time instead. Cc: stable@kernel.org Reported-and-Tested-by: Yogesh Sharma <ysharma@cymer.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12various endian fixes to cifsSteve French
make modules C=2 M=fs/cifs CF=-D__CHECK_ENDIAN__ Found for example: CHECK fs/cifs/cifssmb.c fs/cifs/cifssmb.c:728:22: warning: incorrect type in assignment (different base types) fs/cifs/cifssmb.c:728:22: expected unsigned short [unsigned] [usertype] Tid fs/cifs/cifssmb.c:728:22: got restricted __le16 [usertype] <noident> fs/cifs/cifssmb.c:1883:45: warning: incorrect type in assignment (different base types) fs/cifs/cifssmb.c:1883:45: expected long long [signed] [usertype] fl_start fs/cifs/cifssmb.c:1883:45: got restricted __le64 [usertype] start fs/cifs/cifssmb.c:1884:54: warning: restricted __le64 degrades to integer fs/cifs/cifssmb.c:1885:58: warning: restricted __le64 degrades to integer fs/cifs/cifssmb.c:1886:43: warning: incorrect type in assignment (different base types) fs/cifs/cifssmb.c:1886:43: expected unsigned int [unsigned] fl_pid fs/cifs/cifssmb.c:1886:43: got restricted __le32 [usertype] pid In checking new smb2 code for missing endian conversions, I noticed some endian errors had crept in over the last few releases into the cifs code (symlink, ntlmssp, posix lock, and also a less problematic warning in fscache). A followon patch will address a few smb2 endian problems. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12Elminate sparse __CHECK_ENDIAN__ warnings on port conversionSteve French
Ports are __be16 not unsigned short int Eliminates the remaining fixable endian warnings: ~/cifs-2.6$ make modules C=1 M=fs/cifs CF=-D__CHECK_ENDIAN__ CHECK fs/cifs/connect.c fs/cifs/connect.c:2408:23: warning: incorrect type in assignment (different base types) fs/cifs/connect.c:2408:23: expected unsigned short *sport fs/cifs/connect.c:2408:23: got restricted __be16 *<noident> fs/cifs/connect.c:2410:23: warning: incorrect type in assignment (different base types) fs/cifs/connect.c:2410:23: expected unsigned short *sport fs/cifs/connect.c:2410:23: got restricted __be16 *<noident> fs/cifs/connect.c:2416:24: warning: incorrect type in assignment (different base types) fs/cifs/connect.c:2416:24: expected unsigned short [unsigned] [short] <noident> fs/cifs/connect.c:2416:24: got restricted __be16 [usertype] <noident> fs/cifs/connect.c:2423:24: warning: incorrect type in assignment (different base types) fs/cifs/connect.c:2423:24: expected unsigned short [unsigned] [short] <noident> fs/cifs/connect.c:2423:24: got restricted __be16 [usertype] <noident> fs/cifs/connect.c:2326:23: warning: incorrect type in assignment (different base types) fs/cifs/connect.c:2326:23: expected unsigned short [unsigned] sport fs/cifs/connect.c:2326:23: got restricted __be16 [usertype] sin6_port fs/cifs/connect.c:2330:23: warning: incorrect type in assignment (different base types) fs/cifs/connect.c:2330:23: expected unsigned short [unsigned] sport fs/cifs/connect.c:2330:23: got restricted __be16 [usertype] sin_port fs/cifs/connect.c:2394:22: warning: restricted __be16 degrades to integer Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12Max share size is too smallSteve French
Max share name was set to 64, and (at least for Windows) can be 80. Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12Merge branch 'for-chris' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus
2011-04-12btrfs: using cached extent_state in set/unlock combinationsArne Jansen
In several places the sequence (set_extent_uptodate, unlock_extent) is used. This leads to a duplicate lookup of the extent state. This patch lets set_extent_uptodate return a cached extent_state which can be passed to unlock_extent_cached. The occurences of the above sequences are updated to use the cache. Only end_bio_extent_readpage is updated that it first gets a cached state to pass it to the readpage_end_io_hook as the prototype requested and is later on being used for set/unlock. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-12Btrfs: avoid taking the trans_mutex in btrfs_end_transactionJosef Bacik
I've been working on making our O_DIRECT latency not suck and I noticed we were taking the trans_mutex in btrfs_end_transaction. So to do this we convert num_writers and use_count to atomic_t's and just decrement them in btrfs_end_transaction. Instead of deleting the transaction from the trans list in put_transaction we do that in btrfs_commit_transaction() since that's the only time it actually needs to be removed from the list. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-04-12Allow user names longer than 32 bytesSteve French
We artificially limited the user name to 32 bytes, but modern servers handle larger. Set the maximum length to a reasonable 256, and make the user name string dynamically allocated rather than a fixed size in session structure. Also clean up old checkpatch warning. Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12cifs: replace /proc/fs/cifs/Experimental with a module parmJeff Layton
This flag currently only affects whether we allow "zero-copy" writes with signing enabled. Typically we map pages in the pagecache directly into the write request. If signing is enabled however and the contents of the page change after the signature is calculated but before the write is sent then the signature will be wrong. Servers typically respond to this by closing down the socket. Still, this can provide a performance benefit so the "Experimental" flag was overloaded to allow this. That's really not a good place for this option however since it's not clear what that flag does. Move that flag instead to a new module parameter that better describes its purpose. That's also better since it can be set at module insertion time by configuring modprobe.d. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12cifs: check for private_data before trying to put itJeff Layton
cifs_close doesn't check that the filp->private_data is non-NULL before trying to put it. That can cause an oops in certain error conditions that can occur on open or lookup before the private_data is set. Reported-by: Ben Greear <greearb@candelatech.com> CC: Stable <stable@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12Btrfs: fix subvolume mount by name problem when default mount subvolume is setXin Zhong
We create two subvolumes (meego_root and meego_home) in btrfs root directory. And set meego_root as default mount subvolume. After we remount btrfs, meego_root is mounted to top directory by default. Then when we try to mount meego_home (subvol=meego_home) to a subdirectory, it failed. The problem is when default mount subvolume is set to meego_root, we search meego_home in meego_root but can not find it. So the solution is to add a new mount option (subvolrootid) to specify subvol id of root and search subvol name in it. For our case, now we can use "-o subvolrootid=0,subvol=meego_home) to mount meego_home. Detail information can be found in meego bugzilla: https://bugs.meego.com/show_bug.cgi?id=15055 Signed-off-by: Zhong, Xin <xin.zhong@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-12fix user annotation in ioctl.cDaniel J Blueman
Fix address space annotation correct in ioctl.c. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> BTRFS_BLOCK_GROUP_SYSTEM, @@ -2387,7 +2387,7 @@ long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg) up_read(&info->groups_sem); } - user_dest = (struct btrfs_ioctl_space_info *) + user_dest = (struct btrfs_ioctl_space_info __user *) (arg + sizeof(struct btrfs_ioctl_space_args)); if (copy_to_user(user_dest, dest_orig, alloc_size)) Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-12Btrfs: check for duplicate iov_base's when doing dio readsJosef Bacik
Apparently it is ok to submit a read to an IDE device with the same target page for different offsets. This is what Windows does under qemu. The problem is under DIO we expect them to be different buffers for checksumming reasons, and so this sort of thing will result in checksum errors, when in reality the file is fine. So when reading, check to make sure that all iov bases are different, and if they aren't fall back to buffered mode, since that will work out right. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-12btrfs: properly handle overlapping areas in memmove_extent_bufferSergei Trofimovich
Fix data corruption caused by memcpy() usage on overlapping data. I've observed it first when found out usermode linux crash on btrfs. ?all chain is the following: ------------[ cut here ]------------ WARNING: at /home/slyfox/linux-2.6/fs/btrfs/extent_io.c:3900 memcpy_extent_buffer+0x1a5/0x219() Call Trace: 6fa39a58: [<601b495e>] _raw_spin_unlock_irqrestore+0x18/0x1c 6fa39a68: [<60029ad9>] warn_slowpath_common+0x59/0x70 6fa39aa8: [<60029b05>] warn_slowpath_null+0x15/0x17 6fa39ab8: [<600efc97>] memcpy_extent_buffer+0x1a5/0x219 6fa39b48: [<600efd9f>] memmove_extent_buffer+0x94/0x208 6fa39bc8: [<600becbf>] btrfs_del_items+0x214/0x473 6fa39c78: [<600ce1b0>] btrfs_delete_one_dir_name+0x7c/0xda 6fa39cc8: [<600dad6b>] __btrfs_unlink_inode+0xad/0x25d 6fa39d08: [<600d7864>] btrfs_start_transaction+0xe/0x10 6fa39d48: [<600dc9ff>] btrfs_unlink_inode+0x1b/0x3b 6fa39d78: [<600e04bc>] btrfs_unlink+0x70/0xef 6fa39dc8: [<6007f0d0>] vfs_unlink+0x58/0xa3 6fa39df8: [<60080278>] do_unlinkat+0xd4/0x162 6fa39e48: [<600517db>] call_rcu_sched+0xe/0x10 6fa39e58: [<600452a8>] __put_cred+0x58/0x5a 6fa39e78: [<6007446c>] sys_faccessat+0x154/0x166 6fa39ed8: [<60080317>] sys_unlink+0x11/0x13 6fa39ee8: [<60016b80>] handle_syscall+0x58/0x70 6fa39f08: [<60021377>] userspace+0x2d4/0x381 6fa39fc8: [<60014507>] fork_handler+0x62/0x69 ---[ end trace 70b0ca2ef0266b93 ]--- http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09302.html Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-12Btrfs: fix memory leaks in btrfs_new_inode()Yoshinori Sano
This patch fixes memory leaks in btrfs_new_inode(). Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-11Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: use proper interfaces for on-stack plugging xfs: fix xfs_debug warnings xfs: fix variable set but not used warnings xfs: convert log tail checking to a warning xfs: catch bad block numbers freeing extents. xfs: push the AIL from memory reclaim and periodic sync xfs: clean up code layout in xfs_trans_ail.c xfs: convert the xfsaild threads to a workqueue xfs: introduce background inode reclaim work xfs: convert ENOSPC inode flushing to use new syncd workqueue xfs: introduce a xfssyncd workqueue xfs: fix extent format buffer allocation size xfs: fix unreferenced var error in xfs_buf.c Also, applied patch from Tony Luck that fixes ia64: xfs_destroy_workqueues() should not be tagged with__exit in the branch before merging.
2011-04-11xfs_destroy_workqueues() should not be tagged with__exitLuck, Tony
ia64 throws away .exit sections for the built-in CONFIG case, so routines that are used in other circumstances should not be tagged as __exit. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-11Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix data corruption regression by reverting commit 6de9843dab3f ext4: Allow indirect-block file to grow the file size to max file size ext4: allow an active handle to be started when freezing ext4: sync the directory inode in ext4_sync_parent() ext4: init timer earlier to avoid a kernel panic in __save_error_info jbd2: fix potential memory leak on transaction commit ext4: fix a double free in ext4_register_li_request ext4: fix credits computing for indirect mapped files ext4: remove unnecessary [cm]time update of quota file jbd2: move bdget out of critical section
2011-04-11Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux: nfsd4: fix oops on lock failure nfsd: fix auth_domain reference leak on nlm operations
2011-04-11ext4: fix data corruption regression by reverting commit 6de9843dab3fTheodore Ts'o
Revert commit 6de9843dab3f2a1d4d66d80aa9e5782f80977d20, since it caused a data corruption regression with BitTorrent downloads. Thanks to Damien for discovering and bisecting to find the problem commit. https://bugzilla.kernel.org/show_bug.cgi?id=32972 Reported-by: Damien Grassart <damien@grassart.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-11ext4: Allow indirect-block file to grow the file size to max file sizeKazuya Mio
We can create 4402345721856 byte file with indirect block mapping. However, if we grow an indirect-block file to the size with ftruncate(), we can see an ext4 warning. The following patch fixes this problem. How to reproduce: # dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856 0+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s # tail -n 1 /var/log/messages Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12 Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-11ext4: allow an active handle to be started when freezingYongqiang Yang
ext4_journal_start_sb() should not prevent an active handle from being started due to s_frozen. Otherwise, deadlock is easy to happen, below is a situation. ================================================ freeze | truncate ================================================ | ext4_ext_truncate() freeze_super() | starts a handle sets s_frozen | | ext4_ext_truncate() | holds i_data_sem ext4_freeze() | waits for updates | | ext4_free_blocks() | calls dquot_free_block() | | dquot_free_blocks() | calls ext4_dirty_inode() | | ext4_dirty_inode() | trys to start an active | handle | | block due to s_frozen ================================================ Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Amir Goldstein <amir73il@users.sf.net> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2011-04-11ext4: sync the directory inode in ext4_sync_parent()Curt Wohlgemuth
ext4 has taken the stance that, in the absence of a journal, when an fsync/fdatasync of an inode is done, the parent directory should be sync'ed if this inode entry is new. ext4_sync_parent(), which implements this, does indeed sync the dirent pages for parent directories, but it does not sync the directory *inode*. This patch fixes this. Also now return error status from ext4_sync_parent(). I tested this using a power fail test, which panics a machine running a file server getting requests from a client. Without this patch, on about every other test run, the server is missing many, many files that had been synced. With this patch, on > 6 runs, I see zero files being lost. Google-Bug-Id: 4179519 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-10nfsd4: fix oops on lock failureJ. Bruce Fields
Lock stateid's can have access_bmap 0 if they were only partially initialized (due to a failed lock request); handle that case in free_generic_stateid. ------------[ cut here ]------------ kernel BUG at fs/nfsd/nfs4state.c:380! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/kernel/mm/ksm/run Modules linked in: nfs fscache md4 nls_utf8 cifs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 ppdev parport_pc parport pcnet32 mii pcspkr microcode i2c_piix4 BusLogic floppy [last unloaded: mperf] Pid: 1468, comm: nfsd Not tainted 2.6.38+ #120 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform EIP: 0060:[<e24f180d>] EFLAGS: 00010297 CPU: 0 EIP is at nfs4_access_to_omode+0x1c/0x29 [nfsd] EAX: ffffffff EBX: dd758120 ECX: 00000000 EDX: 00000004 ESI: dd758120 EDI: ddfe657c EBP: dd54dde0 ESP: dd54dde0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process nfsd (pid: 1468, ti=dd54c000 task=ddc92580 task.ti=dd54c000) Stack: dd54ddf0 e24f19ca 00000000 ddfe6560 dd54de08 e24f1a5d dd758130 deee3a20 ddfe6560 31270000 dd54df1c e24f52fd 0000000f dd758090 e2505dd0 0be304cf dbb51d68 0000000e ddfe657c ddcd8020 dd758130 dd758128 dd7580d8 dd54de68 Call Trace: [<e24f19ca>] free_generic_stateid+0x1c/0x3e [nfsd] [<e24f1a5d>] release_lockowner+0x71/0x8a [nfsd] [<e24f52fd>] nfsd4_lock+0x617/0x66c [nfsd] [<e24e57b6>] ? nfsd_setuser+0x199/0x1bb [nfsd] [<e24e056c>] ? nfsd_setuser_and_check_port+0x65/0x81 [nfsd] [<c07a0052>] ? _cond_resched+0x8/0x1c [<c04ca61f>] ? slab_pre_alloc_hook.clone.33+0x23/0x27 [<c04cac01>] ? kmem_cache_alloc+0x1a/0xd2 [<c04835a0>] ? __call_rcu+0xd7/0xdd [<e24e0dfb>] ? fh_verify+0x401/0x452 [nfsd] [<e24f0b61>] ? nfsd4_encode_operation+0x52/0x117 [nfsd] [<e24ea0d7>] ? nfsd4_putfh+0x33/0x3b [nfsd] [<e24f4ce6>] ? nfsd4_delegreturn+0xd4/0xd4 [nfsd] [<e24ea2c9>] nfsd4_proc_compound+0x1ea/0x33e [nfsd] [<e24de6ee>] nfsd_dispatch+0xd1/0x1a5 [nfsd] [<e1d6e1c7>] svc_process_common+0x282/0x46f [sunrpc] [<e1d6e578>] svc_process+0xdc/0xfa [sunrpc] [<e24de0fa>] nfsd+0xd6/0x115 [nfsd] [<e24de024>] ? nfsd_shutdown+0x24/0x24 [nfsd] [<c0454322>] kthread+0x62/0x67 [<c04542c0>] ? kthread_worker_fn+0x114/0x114 [<c07a6ebe>] kernel_thread_helper+0x6/0x10 Code: eb 05 b8 00 00 27 4f 8d 65 f4 5b 5e 5f 5d c3 83 e0 03 55 83 f8 02 89 e5 74 17 83 f8 03 74 05 48 75 09 eb 09 b8 02 00 00 00 eb 0b <0f> 0b 31 c0 eb 05 b8 01 00 00 00 5d c3 55 89 e5 57 56 89 d6 8d EIP: [<e24f180d>] nfs4_access_to_omode+0x1c/0x29 [nfsd] SS:ESP 0068:dd54dde0 ---[ end trace 2b0bf6c6557cb284 ]--- The trace route is: -> nfsd4_lock() -> if (lock->lk_is_new) { -> alloc_init_lock_stateid() 3739: stp->st_access_bmap = 0; ->if (status && lock->lk_is_new && lock_sop) -> release_lockowner() -> free_generic_stateid() -> nfs4_access_bmap_to_omode() -> nfs4_access_to_omode() 380: BUG(); ***** This problem was introduced by 0997b173609b9229ece28941c118a2a9b278796e. Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Tested-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-10treewide: remove extra semicolonsJustin P. Mattock
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-04-10ldm: Silence "ldm_validate_partition_table(): Disk read failed" when booting ↵Nikanth Karthikesan
with "quiet" When the kernel does partition detection, on certain configurations with external fibre channel raid systems (e.g. clariion from EMC) the read would fail. And "ldm_validate_partition_table(): Disk read failed" messages are printed to the console. But the failure to read is not a critical error. Now since the message is flagged as KERN_CRIT, it gets printed even when booting with the "quiet" kernel parameter. Fix it by using KERN_INFO, as the failure to read here is not really an error. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Reported-by : Klaus Hartmann <Klaus.Hartmann@ts.fujitsu.com> Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-04-08ufs: Fix a typoAlessio Igor Bogani
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-04-08Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC NFS: Fix a signed vs. unsigned secinfo bug Revert "net/sunrpc: Use static const char arrays"
2011-04-08Btrfs: check for duplicate iov_base's when doing dio readsJosef Bacik
Apparently it is ok to submit a read to an IDE device with the same target page for different offsets. This is what Windows does under qemu. The problem is under DIO we expect them to be different buffers for checksumming reasons, and so this sort of thing will result in checksum errors, when in reality the file is fine. So when reading, check to make sure that all iov bases are different, and if they aren't fall back to buffered mode, since that will work out right. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-04-08Btrfs: reuse the extent_map we found when calling btrfs_get_extentJosef Bacik
In btrfs_get_block_direct we call btrfs_get_extent to lookup the extent for the range that we are looking for. If we don't find an extent, btrfs_get_extent will insert a extent_map for that area and mark it as a hole. So it does the job of allocating a new extent map and inserting it into the io tree. But if we're creating a new extent we free it up and redo all of that work. So instead pass the em to btrfs_new_extent_direct(), and if it will work just allocate the disk space and set it up properly and bypass the freeing/allocating of a new extent map and the expensive operation of inserting the thing into the io_tree. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-04-08Btrfs: do not use async submit for small DIO io'sJosef Bacik
When looking at our DIO performance Chris said that for small IO's doing the async submit stuff tends to be more overhead than it's worth. With this on top of my other fixes I get about a 17-20% speedup doing a sequential dd with 4k IO's. Basically if we don't have to split the bio for the map length it's small enough to be directly submitted, otherwise go back to the async submit. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-04-08Btrfs: don't split dio bios if we don't have toJosef Bacik
We have been unconditionally allocating a new bio and re-adding all pages from our original bio to the new bio. This is needed if our original bio is larger than our stripe size, but if it is smaller than the stripe size then there is no need to do this. So check the map length and if we are under that then go ahead and submit the original bio. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-04-08Btrfs: do not call btrfs_update_inode in endio if nothing changedJosef Bacik
In the DIO code we often don't update the i_disk_size because the i_size isn't updated until after the DIO is completed, so basically we are allocating a path, doing a search, and updating the inode item for no reason since nothing changed. btrfs_ordered_update_i_size will return 1 if it didn't update i_disk_size, so only run btrfs_update_inode if btrfs_ordered_update_i_size returns 0. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-04-08Btrfs: map the inode item when doing fill_inode_itemJosef Bacik
Instead of calling kmap_atomic for every thing we set in the inode item, map the entire inode item at the start and unmap it at the end. This makes a sequential dd of 400mb O_DIRECT something like 1% faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-04-08Btrfs: only retry transaction reservation onceJosef Bacik
I saw a lockup where we kept getting into this start transaction->commit transaction loop because of enospce. The fact is if we fail to make our reservation, we've tried _everything_ several times, so we only need to try and commit the transaction once, and if that doesn't work then we really are out of space and need to just exit. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-04-08Btrfs: deal with the case that we run out of space in the cacheJosef Bacik
Currently we don't handle running out of space in the cache, so to fix this we keep track of how far in the cache we are. Then we only dirty the pages if we successfully modify all of them, otherwise if we have an error or run out of space we can just drop them and not worry about the vm writing them out. Thanks, Tested-by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Josef Bacik <josef@redhat.com>