path: root/fs/ocfs2
Age | Commit message | Author
2010-10-08 | ocfs2/cluster: Track number of global heartbeat regions | Sunil Mushran
In global heartbeat mode, we have an upper limit for the number of active regions. This patch adds the facility to track the number of active global heartbeat regions and fails to start heartbeat if the number exceeds the maximum. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
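The bookkeeping described above can be sketched in userspace C as a counter with a cap. This is only an illustration: `SKETCH_MAX_REGIONS`, `struct hb_tracker`, `hb_region_start()` and `hb_region_stop()` are hypothetical names, and the limit of 32 is an assumed value, not one taken from the o2hb code.

```c
#include <stdbool.h>

/* Assumed limit for illustration; the real maximum lives in the o2hb code. */
#define SKETCH_MAX_REGIONS 32

struct hb_tracker {
    unsigned int active_regions; /* regions currently heartbeating */
};

/* Count a new region if there is room; refuse to start it otherwise. */
bool hb_region_start(struct hb_tracker *t)
{
    if (t->active_regions >= SKETCH_MAX_REGIONS)
        return false;
    t->active_regions++;
    return true;
}

/* Drop one region from the count when its heartbeat stops. */
void hb_region_stop(struct hb_tracker *t)
{
    if (t->active_regions > 0)
        t->active_regions--;
}
```

Starting a region beyond the cap fails, and stopping one frees a slot for a later start.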
2010-10-07 | ocfs2/cluster: Maintain live node bitmap per heartbeat region | Sunil Mushran
Currently we track a global livenode bitmap that keeps track of all nodes that are heartbeating in all regions. This patch adds the ability to track the livenode bitmap on a per region basis. We will use this facility in a later patch to allow us to withstand the loss of a minority number of regions. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
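A rough userspace sketch of the idea: each region keeps its own bitmap of live nodes, and the old global view can still be derived by OR-ing the per-region bitmaps. The type and function names are hypothetical, and the 64-node bitmap width is an assumption made to keep the example small.

```c
#include <stdint.h>

/* Bit n set means node n is heartbeating in this region (assumes <= 64 nodes). */
struct hb_region {
    uint64_t live_nodes;
};

void region_node_up(struct hb_region *r, unsigned int node)
{
    r->live_nodes |= UINT64_C(1) << node;
}

void region_node_down(struct hb_region *r, unsigned int node)
{
    r->live_nodes &= ~(UINT64_C(1) << node);
}

/* The global live map is the union of all per-region maps. */
uint64_t global_live_nodes(const struct hb_region *regions, unsigned int nregions)
{
    uint64_t all = 0;
    for (unsigned int i = 0; i < nregions; i++)
        all |= regions[i].live_nodes;
    return all;
}

/* Number of regions in which a given node is seen heartbeating. */
unsigned int regions_seeing_node(const struct hb_region *regions,
                                 unsigned int nregions, unsigned int node)
{
    unsigned int count = 0;
    for (unsigned int i = 0; i < nregions; i++)
        if (regions[i].live_nodes & (UINT64_C(1) << node))
            count++;
    return count;
}
```

With per-region maps, a policy such as "a node is alive if enough regions still see it" becomes possible; which policy the later patch actually uses is not stated here.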
2010-10-08 | ocfs2/cluster: Reorganize o2hb debugfs init | Sunil Mushran
o2hb debugfs handling is reorganized to allow for easy expansion. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-08 | ocfs2/cluster: Check slots for unconfigured live nodes | Sunil Mushran
o2hb currently checks slots for configured nodes only. This patch makes it check the slots for the live nodes too to take care of a race in which a node is removed from the configuration but not from the live map. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-08 | ocfs2/cluster: Print messages when adding/removing nodes | Sunil Mushran
Prints messages when the user adds or removes nodes. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-07 | ocfs2/cluster: Print messages when adding/removing heartbeat regions | Sunil Mushran
Prints messages when the user adds or removes heartbeat regions in global heartbeat mode. These messages are useful when debugging cluster related issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-07 | ocfs2/dlm: Add message DLM_QUERY_NODEINFO | Sunil Mushran
Adds new dlm message DLM_QUERY_NODEINFO that sends the attributes of all registered nodes. This message is sent if the negotiated dlm protocol is 1.1 or higher. If the information of the joining node does not match that of any existing nodes, the join domain request is rejected. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-07 | ocfs2: Print message if user mounts without starting global heartbeat | Sunil Mushran
In global heartbeat mode, the heartbeat is started by the user. This patch prints an error if the user attempts to mount a volume without starting the heartbeat. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-09 | ocfs2/dlm: Add message DLM_QUERY_REGION | Sunil Mushran
Adds new dlm message DLM_QUERY_REGION that sends the names of all active heartbeat regions. This message is only sent in the global heartbeat mode. If the regions in the joining node do not fully match the ones in the active nodes, the join domain request is rejected. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
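The "fully match" rule can be illustrated as a set comparison over region names. This is a userspace sketch, not the dlm code: `region_in_set()` and `regions_fully_match()` are hypothetical helpers, and the sketch assumes each list contains no duplicate names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `name` appears among the first `n` entries of `set`. */
bool region_in_set(const char *const *set, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return true;
    return false;
}

/*
 * True if both nodes list exactly the same regions, in any order.
 * With equal counts and no duplicates, checking one direction is enough.
 */
bool regions_fully_match(const char *const *joining, size_t njoining,
                         const char *const *active, size_t nactive)
{
    if (njoining != nactive)
        return false;
    for (size_t i = 0; i < njoining; i++)
        if (!region_in_set(active, nactive, joining[i]))
            return false;
    return true;
}
```

A join request would be rejected whenever this comparison returns false, for example when the joining node heartbeats on a region the active nodes do not use.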
2010-10-07 | ocfs2/cluster: Get all heartbeat regions | Sunil Mushran
Export function in o2hb to get a list of heartbeat regions. It also adds an upper limit to the length of the heartbeat region name. o2hb_global_heartbeat_active() currently disables global heartbeat. It will be enabled in a later patch after all the code is added. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-07 | ocfs2/dlm: Expose dlm_protocol in dlm_state | Sunil Mushran
Add dlm_protocol to the list of info shown by the debugfs file, dlm_state. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-07 | ocfs2: Add support for heartbeat=global mount option | Sunil Mushran
Adds support for heartbeat=global mount option. It ensures that the heartbeat mode passed matches the one enabled on disk. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-09 | ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO | Sunil Mushran
OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for both userspace and o2cb cluster stacks. It also allows us to extend cluster info to include stack flags. This patch also adds stackflags to sb->s_cluster_info. It also introduces a clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled global heartbeat mode. This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-07 | ocfs2/cluster: Add heartbeat mode configfs parameter | Sunil Mushran
Add heartbeat mode parameter to the configfs tree. This will be used to set/show the heartbeat mode. The user is free to toggle the mode between local and global as long as there is no active heartbeat region. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2010-10-04 | BKL: Remove BKL from OCFS2 | Arnd Bergmann
The BKL in ocfs2/dlmfs is used in put_super, fill_super and remount_fs, all three of which are protected by the superblock's s_umount rw_semaphore. The use in ocfs2_control_open is evidently unrelated and the function is protected by ocfs2_control_lock. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com>
2010-10-04 | BKL: Explicitly add BKL around get_sb/fill_super | Jan Blunck
This patch is a preparation necessary to remove the BKL from do_new_mount(). It explicitly adds calls to lock_kernel()/unlock_kernel() around get_sb/fill_super operations for filesystems that still use the BKL. I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. do_kern_mount() is already called without the BKL when mounting the rootfs and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called from various places without BKL: simple_pin_fs(), nfs_do_clone_mount() through nfs_follow_mountpoint(), afs_mntpt_do_automount() through afs_mntpt_follow_link(). Both latter functions are actually the filesystem's follow_link inode operation. vfs_kern_mount() calls the specified get_sb function and lets the filesystem do its job by calling the given fill_super function. Therefore I think it is safe to push down the BKL from the VFS to the low-level filesystem's get_sb/fill_super operations. [arnd: do not add the BKL to those file systems that already don't use it elsewhere] Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Christoph Hellwig <hch@infradead.org>
2010-09-30 | ocfs2: Don't walk off the end of fast symlinks. | Joel Becker
ocfs2 fast symlinks are NUL terminated strings stored inline in the inode data area. However, disk corruption or a local attacker could, in theory, remove that NUL. Because we're using strlen() (my fault, introduced in a731d1 when removing vfs_follow_link()), we could walk off the end of that string. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
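The hazard can be shown in plain C: an unbounded `strlen()` keeps reading until it finds a NUL, whereas a bounded search stops at the end of the buffer. The sketch below is a userspace illustration of a bounded length check, not the actual ocfs2 fix; `fast_symlink_len()` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the symlink target stored in `data`, or -1 if no
 * NUL terminator exists within the `area_size` bytes of the inline area.
 * memchr() never reads past `area_size`, unlike strlen().
 */
long fast_symlink_len(const char *data, size_t area_size)
{
    const char *nul = memchr(data, '\0', area_size);

    if (nul == NULL)
        return -1; /* corrupted or tampered: terminator is missing */
    return (long)(nul - data);
}
```

A caller can treat the -1 case as corruption and fail the lookup instead of reading adjacent memory.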
2010-09-23 | o2dlm: force free mles during dlm exit | Srinivas Eeda
While umounting, a block mle doesn't get freed if dlm is shut down after the master request is received but before assert master. This results in an unclean shutdown of the dlm domain. This patch frees all mles that lie around after other nodes were notified about exiting the dlm and marking dlm state as leaving. Only block mles are expected to be around, so we log ERROR for other mles but still free them. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-23 | ocfs2: Sync inode flags with ext2. | Tao Ma
We sync our inode flags with ext2 and define them by hex values. But in commit 3669567 (4 years ago), all these values were moved to include/linux/fs.h. So we had better use them the way ext2 does: sync our inode flags with ext2 by using FS_*. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-23 | ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits. | Tao Ma
The first time I read the function ocfs2_resmap_resv_bits, I wondered what 'wanted' was used for and puzzled over the comments. Then I found it is only used if the reservation is empty. ;) So we'd better move it into the parens to make the code more readable. What's more, ocfs2_resmap_resv_bits is called so frequently that we should save some CPU cycles. Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-23 | ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent. | Tao Ma
e_leaf_clusters is a le16, so use cpu_to_le16 instead of cpu_to_le32. What's more, we change 'clusters' to unsigned int to signify that the size of 'clusters' isn't important here. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
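Why the width of the conversion matters can be modelled in userspace. On a little-endian CPU both conversions are no-ops, so the mistake is invisible there; on a big-endian CPU each one is a byte swap of its own width. The sketch below simulates the big-endian case with explicit swaps. The helper names are hypothetical, and the model is an assumption about how the truncation behaves, not a quote from the patch.

```c
#include <stdint.h>

/* Byte-swap helpers standing in for what a big-endian CPU would do. */
uint16_t sketch_swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

uint32_t sketch_swab32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Correct: a 16-bit swap produces the value a 16-bit LE field expects. */
uint16_t be_store_with_le16(uint16_t clusters)
{
    return sketch_swab16(clusters);
}

/* Wrong: a 32-bit swap moves the data into the high half, then the
 * assignment to a 16-bit field truncates it away. */
uint16_t be_store_with_le32(uint16_t clusters)
{
    return (uint16_t)sketch_swab32(clusters);
}
```

In this model a small cluster count stored through the 32-bit conversion ends up as zero in the 16-bit field, while the 16-bit conversion keeps it.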
2010-09-23 | ocfs2: update ctime when changing the file's permission by setfacl | Tao Ma
ext3 fixed this in commit 30e2bab, so make the same change in ocfs2. Steps to reproduce (the ctime is unchanged after setfacl):
  # touch aaa
  # stat -c %Z aaa
  1283760364
  # setfacl -m 'u::x,g::x,o::x' aaa
  # stat -c %Z aaa
  1283760364
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-21 | Fix various typos of valid in comments | Nikanth Karthikesan
Fix various typos of valid. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-09-18 | ocfs2/net: fix uninitialized ret in o2net_send_message_vec() | Wu Fengguang
mmotm/fs/ocfs2/cluster/tcp.c: In function ‘o2net_send_message_vec’: mmotm/fs/ocfs2/cluster/tcp.c:980:6: warning: ‘ret’ may be used uninitialized in this function It seems a real bug introduced by commit 9af0b38ff3 (ocfs2/net: Use wait_event() in o2net_send_message_vec()). cc: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-16 | ocfs2: Initialize the bktcnt variable properly, and call it bucket_count | Joel Becker
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-15 | ocfs2: Silence unused warning. | Joel Becker
When CONFIG_OCFS2_DEBUG_MASKLOG is undefined, we don't use the dentry variable in ocfs2_sync_file(). Let's just move all access to the dentry inside the logging call. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c | Tristan Ye
This patch tries to handle the case in which list 'dlm->tracking_list' is empty, to avoid accessing an invalid pointer. It fixes the following oops: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1287 Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes. | Tristan Ye
In ocfs2_dx_dir_rebalance(), we need to call journal_access on the blocks again after calling ocfs2_insert_extent(), since growing an extent tree may trigger ocfs2_extend_trans(), which makes the previous journal_access meaningless. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | ocfs2: Fix lockdep warning in reflink. | Tao Ma
This patch change mutex_lock to a new subclass and add a new inode lock subclass for the target inode which caused this lockdep warning. ============================================= [ INFO: possible recursive locking detected ] 2.6.35+ #5 --------------------------------------------- reflink/11086 is trying to acquire lock: (Meta){+++++.}, at: [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] but task is already holding lock: (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2] other info that might help us debug this: 6 locks held by reflink/11086: #0: (&sb->s_type->i_mutex_key#15/1){+.+.+.}, at: [<ffffffff820e09ec>] lookup_create+0x26/0x97 #1: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa06f99a0>] ocfs2_reflink_ioctl+0x4d3/0x1229 [ocfs2] #2: (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2] #3: (&oi->ip_xattr_sem){+.+.+.}, at: [<ffffffffa06f9b58>] ocfs2_reflink_ioctl+0x68b/0x1229 [ocfs2] #4: (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa06f9b67>] ocfs2_reflink_ioctl+0x69a/0x1229 [ocfs2] #5: (&sb->s_type->i_mutex_key#15/2){+.+...}, at: [<ffffffffa06f9d4f>] ocfs2_reflink_ioctl+0x882/0x1229 [ocfs2] stack backtrace: Pid: 11086, comm: reflink Not tainted 2.6.35+ #5 Call Trace: [<ffffffff82063dd9>] validate_chain+0x56e/0xd68 [<ffffffff82062275>] ? mark_held_locks+0x49/0x69 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06c9ade>] __ocfs2_cluster_lock+0x975/0xa0d [ocfs2] [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06e107b>] ? ocfs2_wait_for_recovery+0x15/0x8a [ocfs2] [<ffffffffa06cb6ea>] ocfs2_inode_lock_full_nested+0x1ac/0xdc5 [ocfs2] [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffff820623a0>] ? trace_hardirqs_on_caller+0x10b/0x12f [<ffffffff82060193>] ? 
debug_mutex_free_waiter+0x4f/0x53 [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06ce24a>] ? ocfs2_file_lock_res_init+0x66/0x78 [ocfs2] [<ffffffff820bb2d2>] ? might_fault+0x40/0x8d [<ffffffffa06df9f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2] [<ffffffff820ee5d3>] ? mntput_no_expire+0x1d/0xb0 [<ffffffff820e07b3>] ? path_put+0x2c/0x31 [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae [<ffffffff8233a7f6>] ? _raw_spin_unlock+0x26/0x2a [<ffffffff8200299c>] ? sysret_check+0x27/0x62 [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock. | Tao Ma
As the name shows, we shouldn't have any lock in ocfs2_xattr_get_nolock. so lift ip_xattr_sem to the caller. This should be safe for us since the only 2 callers are: 1. ocfs2_xattr_get which will lock the resources. 2. ocfs2_mknod which don't need this locking. And this also resolves the following lockdep warning. ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35+ #5 ------------------------------------------------------- reflink/30027 is trying to acquire lock: (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2] but task is already holding lock: (&oi->ip_xattr_sem){++++..}, at: [<ffffffffa0673b58>] ocfs2_reflink_ioctl+0x68b/0x1226 [ocfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&oi->ip_xattr_sem){++++..}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffff82339650>] down_read+0x34/0x47 [<ffffffffa0691cb8>] ocfs2_xattr_get_nolock+0xa0/0x4e6 [ocfs2] [<ffffffffa069d64f>] ocfs2_get_acl_nolock+0x5c/0x132 [ocfs2] [<ffffffffa069d9c7>] ocfs2_init_acl+0x60/0x243 [ocfs2] [<ffffffffa066499d>] ocfs2_mknod+0xae8/0xfea [ocfs2] [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2] [<ffffffff820e1c83>] vfs_create+0x9b/0xf4 [<ffffffff820e20bb>] do_last+0x2fd/0x5be [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572 [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7 [<ffffffff820d6dac>] sys_open+0x1b/0x1d [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #2 (jbd2_handle){+.+...}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffffa0604ff8>] start_this_handle+0x4a3/0x4bc [jbd2] [<ffffffffa06051d6>] jbd2__journal_start+0xba/0xee [jbd2] [<ffffffffa0605218>] jbd2_journal_start+0xe/0x10 [jbd2] [<ffffffffa065ca34>] ocfs2_start_trans+0xb7/0x19b [ocfs2] [<ffffffffa06645f3>] ocfs2_mknod+0x73e/0xfea [ocfs2] [<ffffffffa0665041>] 
ocfs2_create+0x9d/0x105 [ocfs2] [<ffffffff820e1c83>] vfs_create+0x9b/0xf4 [<ffffffff820e20bb>] do_last+0x2fd/0x5be [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572 [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7 [<ffffffff820d6dac>] sys_open+0x1b/0x1d [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #1 (&journal->j_trans_barrier){.+.+..}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82064fa9>] lock_release_non_nested+0x1e5/0x24b [<ffffffff82065999>] lock_release+0x158/0x17a [<ffffffff823389f6>] __mutex_unlock_slowpath+0xbf/0x11b [<ffffffff82338a5b>] mutex_unlock+0x9/0xb [<ffffffffa0679673>] ocfs2_free_ac_resource+0x31/0x67 [ocfs2] [<ffffffffa067c6bc>] ocfs2_free_alloc_context+0x11/0x1d [ocfs2] [<ffffffffa0633de0>] ocfs2_write_begin_nolock+0x141e/0x159b [ocfs2] [<ffffffffa0635523>] ocfs2_write_begin+0x11e/0x1e7 [ocfs2] [<ffffffff820a1297>] generic_file_buffered_write+0x10c/0x210 [<ffffffffa0653624>] ocfs2_file_aio_write+0x4cc/0x6d3 [ocfs2] [<ffffffff820d822d>] do_sync_write+0xc2/0x106 [<ffffffff820d897b>] vfs_write+0xae/0x131 [<ffffffff820d8e55>] sys_write+0x47/0x6f [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #0 (&oi->ip_alloc_sem){+.+.+.}: [<ffffffff82063f92>] validate_chain+0x727/0xd68 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffff82339694>] down_write+0x31/0x52 [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2] [<ffffffffa06599f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2] [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | Track negative entries v3 | Goldwyn Rodrigues
Track negative dentries by recording the generation number of the parent directory in d_fsdata. The generation number for the parent directory is recorded in the inode_info, which increments every time the lock on the directory is dropped. If the generation number of the parent directory and the negative dentry matches, there is no need to perform the revalidate, else a revalidate is forced. This improves performance in situations where nodes look for the same non-existent file multiple times. Thanks Mark for explaining the DLM sequence. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
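The generation check can be sketched in userspace as follows. The struct and function names are hypothetical stand-ins: `parent_gen` plays the role of the value stored in d_fsdata, and `lock_gen` the counter kept in the directory's inode info.

```c
#include <stdbool.h>

/* Per-directory state: bumped every time the directory lock is dropped. */
struct dir_info {
    unsigned long lock_gen;
};

/* Per-negative-dentry state: generation of the parent when it was cached. */
struct neg_dentry {
    unsigned long parent_gen;
};

/* Remember the parent's generation when the negative dentry is created. */
void neg_dentry_record(struct neg_dentry *d, const struct dir_info *dir)
{
    d->parent_gen = dir->lock_gen;
}

/* Called when the lock on the directory is dropped. */
void dir_lock_dropped(struct dir_info *dir)
{
    dir->lock_gen++;
}

/* A matching generation means the cached "does not exist" is still valid. */
bool neg_dentry_needs_revalidate(const struct neg_dentry *d,
                                 const struct dir_info *dir)
{
    return d->parent_gen != dir->lock_gen;
}
```

Repeated lookups of the same missing name skip the revalidate as long as the directory lock has not been dropped in between.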
2010-09-10 | ocfs2: Cache system inodes of other slots. | Tao Ma
During orphan scan, if we are slot 0 and we are replaying orphan_dir:0001, the general process for every file in this dir is:
1. We iget orphan_dir:0001. Since there is no inode for it, we have to create an inode and read it from the disk.
2. Do the normal work, such as delete_inode, and remove it from the dir if that is allowed.
3. Call iput on orphan_dir:0001 when we are done. In this case, since we have no dcache for this inode, i_count will reach 0, and VFS will have to call clear_inode; in ocfs2_clear_inode we will checkpoint the inode, which makes ocfs2_cmt and journald begin to work.
4. We loop back to 1 for the next file.
So for every deleted file, we have to read the orphan dir from the disk and checkpoint the journal. This is very time consuming and causes a lot of journal checkpoint I/O. A better solution is to hold another reference for these inodes in ocfs2_super. Then, if there is no other race among nodes (which would let dlmglue checkpoint the inode), clear_inode won't be called in step 3, and in step 1 we only need to read the inode the first time. This is a big win for us. So this patch caches system inodes of other slots so that we have one more reference for these inodes and avoid the extra inode read and journal checkpoint. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | OCFS2: Allow huge (> 16 TiB) volumes to mount | Patrick J. LoPresti
The OCFS2 developers have already done all of the hard work to allow volumes larger than 16 TiB. But there is still a "sanity check" in fs/ocfs2/super.c that prevents the mounting of such volumes, even when the cluster size and journal options would allow it. This patch replaces that sanity check with a more sophisticated one to mount a huge volume provided that (a) it is addressable by the raw word/address size of the system (borrowing a test from ext4); (b) the volume is using JBD2; and (c) the JBD2_FEATURE_INCOMPAT_64BIT flag is set on the journal. I factored out the sanity check into its own function. I also moved it from ocfs2_initialize_super() down to ocfs2_check_volume(); any earlier, and the journal will not have been initialized yet. This patch is one of a pair, and it depends on the other ("JBD2: Allow feature checks before journal recovery"). I have tested this patch on small volumes, huge volumes, and huge volumes without 64-bit block support in the journal. All of them appear to work or to fail gracefully, as appropriate. Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
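The three conditions (a) to (c) can be written down as a small predicate. This is a userspace sketch under stated assumptions, not the code from fs/ocfs2/super.c: `struct volume_desc` and `huge_volume_ok()` are hypothetical, and "huge" is assumed to mean a block count that no longer fits in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the facts the check needs. */
struct volume_desc {
    uint64_t total_blocks;  /* size of the volume in blocks */
    bool uses_jbd2;         /* journal is JBD2 */
    bool journal_64bit;     /* JBD2_FEATURE_INCOMPAT_64BIT is set */
};

/* Decide whether a volume may be mounted, per conditions (a), (b), (c). */
bool huge_volume_ok(const struct volume_desc *v, uint64_t max_addressable_blocks)
{
    /* Assumption: volumes within 32-bit block numbers need no extra checks. */
    if (v->total_blocks <= UINT32_MAX)
        return true;
    if (v->total_blocks > max_addressable_blocks)
        return false;   /* (a) not addressable on this system */
    if (!v->uses_jbd2)
        return false;   /* (b) requires JBD2 */
    if (!v->journal_64bit)
        return false;   /* (c) journal lacks 64-bit block support */
    return true;
}
```

Because the predicate inspects journal features, it can only run once the journal has been initialized, which matches the reason given for moving the check to ocfs2_check_volume().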
2010-09-10 | Merge branch 'cow_readahead' of git://oss.oracle.com/git/tma/linux-2.6 into merge-2 | Joel Becker
2010-09-10 | ocfs2: Remove obsolete comments before ocfs2_start_trans. | Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | ocfs2: Remove unused old_id in ocfs2_commit_cache. | Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | ocfs2: Remove ocfs2_sync_inode() | Jan Kara
ocfs2_sync_inode() is used only from ocfs2_sync_file(). But all data has already been written before calling ocfs2_sync_file() and ocfs2 doesn't use inode's private_list for tracking metadata buffers thus sync_mapping_buffers() is superfluous as well. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | Reorganize data elements to reduce struct sizes | Goldwyn Rodrigues
Thanks for the comments. I have incorporated them all. CONFIG_OCFS2_FS_STATS is enabled and CONFIG_DEBUG_LOCK_ALLOC is disabled. Statistics now look like:
  ocfs2_write_ctxt: 2144 - 2136 = 8
  ocfs2_inode_info: 1960 - 1848 = 112
  ocfs2_journal: 168 - 160 = 8
  ocfs2_lock_res: 336 - 304 = 32
  ocfs2_refcount_tree: 512 - 472 = 40
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
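The savings listed above come from reordering members so the compiler inserts less alignment padding. A minimal illustration with made-up structs (these are not the ocfs2 structures, and the exact sizes depend on the ABI):

```c
#include <stddef.h>
#include <stdint.h>

/* Small members on either side of a large one force padding twice. */
struct stats_before {
    uint8_t  flag_a;
    uint64_t counter;
    uint8_t  flag_b;
};

/* Same members, largest first: the small ones share one padded tail. */
struct stats_after {
    uint64_t counter;
    uint8_t  flag_a;
    uint8_t  flag_b;
};

/* Bytes saved per object by the reordering on the current ABI. */
size_t reorder_bytes_saved(void)
{
    return sizeof(struct stats_before) - sizeof(struct stats_after);
}
```

On common ABIs the reordered struct is smaller, and for structures allocated per inode or per lock the per-object saving is multiplied by the number of live objects.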
2010-09-10 | ocfs2: Remove obscure error handling in direct_write. | Tao Ma
In ocfs2, we don't actually allow any direct write past i_size; see the function ocfs2_prepare_inode_for_write. So we don't need the bogus simple_setsize. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | ocfs2: Add some trace log for orphan scan. | Tao Ma
The orphan scan worker currently has no trace log, so it is very hard to tell whether it has finished or is blocked. Add 2 mlog trace messages so that we can tell whether the current orphan scan worker is blocked or not. This helped when I analyzed an orphan scan bug. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 | Ocfs2: Add new OCFS2_IOC_INFO ioctl for ocfs2 v8. | Tristan Ye
The reason we need this ioctl is to offer non-privileged end users a way to gather filesystem info. We use OCFS2_IOC_INFO to drive the new ioctl: userspace passes the kernel a structure containing an array of request pointers and a request count, such as:

* From userspace:

    struct ocfs2_info_blocksize oib = {
        .ib_req = {
            .ir_magic = OCFS2_INFO_MAGIC,
            .ir_code = OCFS2_INFO_BLOCKSIZE,
            ...
        }
        ...
    }

    struct ocfs2_info_clustersize oic = {
        ...
    }

    uint64_t reqs[2] = {(unsigned long)&oib, (unsigned long)&oic};

    struct ocfs2_info info = {
        .oi_requests = reqs,
        .oi_count = 2,
    }

    ret = ioctl(fd, OCFS2_IOC_INFO, &info);

* In kernel: Get the request pointers from *info*, then handle each request one by one.

The idea here is to make each separate request small enough to guarantee better backward and forward compatibility, since a small request is less likely to break if the filesystem on raw disk changes.

Currently, the following 7 requests are supported per the requirement from the userspace tool o2info, and I believe the list will grow over time :-)

    OCFS2_INFO_CLUSTERSIZE
    OCFS2_INFO_BLOCKSIZE
    OCFS2_INFO_MAXSLOTS
    OCFS2_INFO_LABEL
    OCFS2_INFO_UUID
    OCFS2_INFO_FS_FEATURES
    OCFS2_INFO_JOURNAL_SIZE

This ioctl is specific to OCFS2. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-08 | ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan | Mark Fasheh
ocfs2_create_inode_in_orphan() is used by reflink to create the newly reflinked inode simultaneously in the orphan dir. This allows us to easily handle partially-reflinked files during recovery cleanup. We have a problem though - the orphan dir stringifies inode # to determine a unique name under which the orphan entry dirent can be created. Since ocfs2_create_inode_in_orphan() needs the space allocated in the orphan dir before it can allocate the inode, we currently call into the orphan code:

    /*
     * We give the orphan dir the root blkno to fake an orphan name,
     * and allocate enough space for our insertion.
     */
    status = ocfs2_prepare_orphan_dir(osb, &orphan_dir,
                                      osb->root_blkno,
                                      orphan_name, &orphan_insert);

Using osb->root_blkno might work fine on unindexed directories, but the orphan dir can have an index. When it has that index, the above code fails to allocate the proper index entry. Later, when we try to remove the file from the orphan dir (using the actual inode #), the reflink operation will fail. To fix this, I created a function ocfs2_alloc_orphaned_file() which uses the newly split out orphan and inode alloc code to figure out what the inode block number will be (once allocated) and then prepare the orphan dir from that data. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
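Why a fake block number breaks an indexed orphan dir can be illustrated by deriving names from block numbers. The sketch below assumes the orphan name is the block number printed as 16 zero-padded hex digits; that format, the `ORPHAN_NAMELEN` constant, and both helper functions are assumptions made for illustration rather than code taken from the patch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ORPHAN_NAMELEN 16 /* assumed name length for this sketch */

/* Turn an inode block number into its orphan dir entry name. */
void orphan_name(uint64_t blkno, char name[ORPHAN_NAMELEN + 1])
{
    snprintf(name, ORPHAN_NAMELEN + 1, "%016llx", (unsigned long long)blkno);
}

/*
 * Returns 1 if the name the directory was prepared for equals the name
 * later used for removal, 0 otherwise. An index entry keyed on the
 * prepared name is only found again when the two agree.
 */
int orphan_names_match(uint64_t prepared_blkno, uint64_t actual_blkno)
{
    char prepared[ORPHAN_NAMELEN + 1];
    char actual[ORPHAN_NAMELEN + 1];

    orphan_name(prepared_blkno, prepared);
    orphan_name(actual_blkno, actual);
    return strcmp(prepared, actual) == 0;
}
```

Preparing the directory with the root block number and removing with the real inode block number yields two different names, which is the mismatch the fix avoids by learning the real block number first.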
2010-09-08 | ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions | Mark Fasheh
We do this because ocfs2_create_inode_in_orphan() wants to order locking of the orphan dir with respect to locking of the inode allocator *before* making any changes to the directory. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 | ocfs2: allow return of new inode block location before allocation of the inode | Mark Fasheh
This helps code which needs to know the eventual block number of an inode but can't allocate it yet due to transaction or lock ordering. For example, ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation of the orphan dir because it can't yet know where the actual inode is placed - that code is actually in ocfs2_mknod_locked. This is a problem when the orphan dirs are indexed as the junk inode number will create an index entry which goes unused (and fails the later removal from the orphan dir). Now with these interfaces, ocfs2_create_inode_in_orphan() can run the block group search (and get back the inode block number) *before* any actual allocation occurs. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 | ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open coding | Mark Fasheh
ocfs2_search_chain() makes the same updates as ocfs2_alloc_dinode_update_counts to the alloc inode. Instead of open coding the bitmap update, use our helper function. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 | ocfs2: split out inode alloc code from ocfs2_mknod_locked | Mark Fasheh
Do this by splitting the bulk of the function away from the inode allocation code at the very top of ocfs2_mknod_locked(). Existing callers don't need to change and won't see any difference. The new function created, __ocfs2_mknod_locked(), will be used shortly. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 | Ocfs2: Fix a regression bug from mainline commit(6b933c8e6f1a2f3118082c455eef25f9b1ac7b45). | Tristan Ye
This patch fixes the regression introduced by commit 6b933c8... ('ocfs2: Avoid direct write if we fall back to buffered I/O'): http://oss.oracle.com/bugzilla/show_bug.cgi?id=1285 Commit 6b933c8e6f1a2f3118082c455eef25f9b1ac7b45 changed __generic_file_aio_write to generic_file_buffered_write, which doesn't call filemap_{write,wait}_range to flush the page cache when O_DIRECT writes fall back to buffered ones. That hurt the O_DIRECT semantics in extended O_DIRECT writes. This patch tries to guarantee that O_DIRECT writes that fall back to buffered I/O are correctly flushed. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 | ocfs2: Fix deadlock when allocating page | Jan Kara
We cannot call grab_cache_page() when holding filesystem locks or with a transaction started as grab_cache_page() calls page allocation with GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem causing deadlocks or various assertion failures. We have to use find_or_create_page() instead and pass it GFP_NOFS as we do with other allocations. Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 | ocfs2: properly set and use inode group alloc hint | Mark Fasheh
We were setting ac->ac_last_group in ocfs2_claim_suballoc_bits from res->sr_bg_blkno. Unfortunately, res->sr_bg_blkno is going to be zero under normal (non-fragmented) circumstances. The discontig block group patches effectively turned off that feature. Fix this by correctly calculating what the next group hint should be. Acked-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 | ocfs2: Use the right group in nfs sync check. | Tao Ma
We have added discontig block groups now, so an inode can be allocated in a discontig block group. So get it in ocfs2_get_suballoc_slot_bit. The old ocfs2_test_suballoc_bit gets the group block number from the allocation inode, which is wrong. Fix it by passing the right group. Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>