summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-04-13ocfs2: Pass suballocation results back via a structure.Joel Becker
We're going to be adding more info to a suballocator allocation. Rather than growing every function in the chain, let's pass a result structure around. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-13ocfs2: Allocate discontiguous block groups.Joel Becker
If we cannot get a contiguous region for a block group, allocate a discontiguous one when the filesystem supports it. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-13ocfs2: Define data structures for discontiguous block groups.Joel Becker
Defines the OCFS2_FEATURE_INCOMPAT_DISCONTIG_BG feature bit and modifies struct ocfs2_group_desc for the feature. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-05-06ocfs2/dlm: Increase o2dlm lockres hash sizeSunil Mushran
Lockres hash size of 16KB is far too small for large filesystems (where we have hundreds of thousands of lock resources stored in the table). This patch increases it to 128KB. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: Make ocfs2_extend_trans() really extend.Tao Ma
In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's blocks. But if jbd2_journal_extend() fails, it will only restart with the the new number of blocks. This tends to be awkward since in most cases we want additional reserved blocks. It makes our code harder to mantain since the caller can't be sure all the original blocks will not be accessed and dirtied again. There are 15 callers of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add h_buffer_credits before they call ocfs2_extend_trans(). This makes ocfs2_extend_trans() really extend atop the original block count. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2/trivial: Code cleanup for allocation reservation.Tao Ma
Two tiny cleanup for allocation reservation. 1. Remove some extra codes in ocfs2_local_alloc_find_clear_bits. 2. Remove an unuseful variables in ocfs2_find_resv_lhs. Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: make ocfs2_adjust_resv_from_alloc simple.Tao Ma
When we allocate some bits from the reservation, we always allocate from the r_start(see ocfs2_resmap_resv_bits). So there should be no reason to check between r_start and start. And I don't think we will change this behaviour later by allocating from some bits after r_start. Why not make ocfs2_adjust_resv_from_alloc simple for now? The only chance we have to adjust the reservation is when we haven't reached the end. With this patch, the function is more readable. Note: btw, this patch also fixes an original bug in the function which I haven't found before. if (end < ocfs2_resv_end(resv)) rhs = end - ocfs2_resv_end(resv); This code is of course buggy. ;) Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: Make nointr a default mount optionSunil Mushran
OCFS2 has never really supported intr. This patch acknowledges this reality and makes nointr the default mount option. In a later patch, we intend to support intr. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICESunil Mushran
o2dlm join and leave messages are more than informational as they are required for debugging locking issues. This patch changes them from KERN_INFO to KERN_NOTICE. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06o2net: log socket state changesSrinivas Eeda
This patch logs socket state changes that lead to socket shutdown. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: print node # when tcp failsWengang Wang
Print the node number of a peer node if sending it a message failed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: Add dir_resv_level mount optionMark Fasheh
The default behavior for directory reservations stays the same, but we add a mount option so people can tweak the size of directory reservations according to their workloads. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: change default reservation window sizesMark Fasheh
The default reservation size of 4 (32-bit windows) is a bit too ambitious. Scale it back to 16 bits (resv_level=2). I have been testing various sizes on a 4-node cluster which runs a mixed workload that is heavily threaded. With a 256MB local alloc, I get *roughly* the following levels of average file fragmentation: resv_level=0 70% resv_level=1 21% resv_level=2 23% resv_level=3 24% resv_level=4 60% resv_level=5 did not test resv_level=6 60% resv_level=2 seemed like a good compromise between not letting windows be too small, but not so big that heavier workloads will immediately suffer without tuning. This patch also change the behavior of directory reservations - they now track file reservations. The previous compromise of giving directory windows only 8 bits wound up fragmenting more at some window sizes because file allocations had smaller unused windows to poach from. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: increase the default size of local alloc windowsMark Fasheh
I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up and beyond window sizes of 512 megabytes. This makes sense for a couple reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters: localalloc= avg. fragmentation 8 48 32 16 64 10 120 7 On larger cluster sizes, the difference is more dramatic. The new default size top out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: clean up localalloc mount option size parsingMark Fasheh
This patch pulls the local alloc sizing code into localalloc.c and provides a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged except that I correctly calculate the maximum local alloc size. The old code in ocfs2_parse_options() calculated the max size as: ocfs2_local_alloc_size(sb) * 8 which is correct, in bits. Unfortunately though the option passed in is in megabytes. Ultimately, this bug made no real difference - the shrink code would catch a too-large size and bring it down to something reasonable. Still, it's less than efficient as-is. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: remove ocfs2_local_alloc_in_range()Mark Fasheh
Inodes are always allocated from the global bitmap now so we don't need this any more. Also, the existing implementation bounces reservations around needlessly. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-06ocfs2: allocate btree internal block groups from the global bitmapMark Fasheh
Otherwise, the need for a very large contiguous allocation tends to wreak havoc on many inode allocation reservations on the local alloc, thus ruining any chances for contiguousness. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-06ocfs2: use allocation reservations for directory dataMark Fasheh
Use the reservations system for unindexed dir tree allocations. We don't bother with the indexed tree as reads from it are mostly random anyway. Directory reservations are marked seperately, to allow the reservations code a chance to optimize their window sizes. This patch allocates only 8 bits for directory windows as they generally are not expected to grow as quickly as file data. Future improvements to dir window sizing can trivially be made. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-06ocfs2: use allocation reservations during file writeMark Fasheh
Add a per-inode reservations structure and pass it through to the reservations code. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-06ocfs2: allocation reservationsMark Fasheh
This patch improves Ocfs2 allocation policy by allowing an inode to reserve a portion of the local alloc bitmap for itself. The reserved portion (allocation window) is advisory in that other allocation windows might steal it if the local alloc bitmap becomes full. Otherwise, the reservations are honored and guaranteed to be free. When the local alloc window is moved to a different portion of the bitmap, existing reservations are discarded. Reservation windows are represented internally by a red-black tree. Within that tree, each node represents the reservation window of one inode. An LRU of active reservations is also maintained. When new data is written, we allocate it from the inodes window. When all bits in a window are exhausted, we allocate a new one as close to the previous one as possible. Should we not find free space, an existing reservation is pulled off the LRU and cannibalized. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-06ocfs2: Make ocfs2_journal_dirty() void.Joel Becker
jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-24ocfs2: Clear undo bits when local alloc is freedMark Fasheh
When the local alloc file changes windows, unused bits are freed back to the global bitmap. By defnition, those bits can not be in use by any file. Also, the local alloc will never have been able to allocate those bits if they were part of a previous truncate. Therefore it makes sense that we should clear unused local alloc bits in the undo buffer so that they can be used immediatly. [ Modified to call it ocfs2_release_clusters() -- Joel ] Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-19ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block.Tao Ma
You can't store a pointer that you haven't filled in yet and expect it to work. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-19ocfs2: Fix the update of name_offset when removing xattrsTao Ma
When replacing a xattr's value, in some case we wipe its name/value first and then re-add it. The wipe is done by ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or block. We currently adjust name_offset for all the entries which have (offset < name_offset). This does not adjust the entrie we're replacing. Since we are replacing the entry, we don't adjust the total entry count. When we calculate a new namevalue location, we trust the entries now-wrong offset in ocfs2_xa_get_free_start(). The solution is to also adjust the name_offset for the replaced entry, allowing ocfs2_xa_get_free_start() to calculate the new namevalue location correctly. The following script can trigger a kernel panic easily. echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE mount -t ocfs2 $DEVICE $MNT_DIR FILE=$MNT_DIR/$RANDOM for((i=0;i<76;i++)) do string_76="a$string_76" done string_78="aa$string_76" string_82="aaaa$string_78" touch $FILE setfattr -n 'user.test1234567890' -v $string_76 $FILE setfattr -n 'user.test1234567890' -v $string_78 $FILE setfattr -n 'user.test1234567890' -v $string_82 $FILE Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-18ocfs2: Always try for maximum bits with new local alloc windowsMark Fasheh
What we were doing before was to ask for the current window size as the maximum allocation. This had the effect of limiting the amount of allocation we could get for the local alloc during times when the window size was shrunk due to fragmentation. In some cases, that could actually *increase* fragmentation by artificially limiting the number of bits we can accept. So while we still want to ask for a minimum number of bits equal to window size, there is no reason why we should limit the number of bits the local alloc should accept. Hence always allow the maximum number of local alloc bits. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-17ocfs2: set i_mode on disk during acl operationsMark Fasheh
ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory inode, but never setting it on the disk copy. Thus, acls were some times not getting propagated between nodes. This patch fixes the issue by adding a helper function ocfs2_acl_set_mode() which does this the right way. ocfs2_set_acl() and ocfs2_init_acl() are then updated to call ocfs2_acl_set_mode(). Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-17ocfs2: Update i_blocks in reflink operations.Tao Ma
In reflink, we need to upate i_blocks for the target inode. Reported-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-17ocfs2: Change bg_chain check for ocfs2_validate_gd_parent.Tao Ma
In ocfs2_validate_gd_parent, we check bg_chain against the cl_next_free_rec of the dinode. Actually in resize, we have the chance of bg_chain == cl_next_free_rec. So add some additional condition check for it. I also rename paramter "clean_error" to "resize", since the old one is not clearly enough to indicate that we should only meet with this case in resize. btw, the correpsonding bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-17[PATCH] Skip check for mandatory locks when unlockingSachin Prabhu
ocfs2_lock() will skip locks on file which has mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. ocfs2_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: remove whitespaces before quoted newlines nilfs2: remove spaces before tabs nilfs2: fix various typos in comments nilfs2: fix typo "cout" -> "count" in error message nilfs2: fix function name typos in docbook comments nilfs2: fix discrepancy in use of static specifier
2010-03-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: Skip check for mandatory locks when unlocking 9p: Fixes a simple bug enabling writes beyond 2GB. 9p: Change the name of new protocol from 9p2010.L to 9p2000.L fs/9p: re-init the wstat in readdir loop net/9p: Add sysfs mount_tag file for virtio 9P device net/9p: Use the tag name in the config space for identifying mount point
2010-03-14nilfs2: remove whitespaces before quoted newlinesRyusuke Konishi
This kills the following checkpatch warnings: WARNING: unnecessary whitespace before a quoted newline #869: FILE: super.c:869: + "remount to a different snapshot. \n", WARNING: unnecessary whitespace before a quoted newline #389: FILE: the_nilfs.c:389: + printk(KERN_ERR "NILFS: too short segment. \n"); Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14nilfs2: remove spaces before tabsRyusuke Konishi
This kills the following checkpatch warnings: WARNING: please, no space before tabs #74: FILE: segment.h:74: +^Iunsigned ^I^Iflags;$ WARNING: please, no space before tabs #35: FILE: segbuf.c:35: +^Iint ^I^I^Istart, end; /* The region to be submitted */$ Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14nilfs2: fix various typos in commentsRyusuke Konishi
This fixes various typos I found in comments of nilfs2. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14nilfs2: fix typo "cout" -> "count" in error messageRyusuke Konishi
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14nilfs2: fix function name typos in docbook commentsRyusuke Konishi
Fixes the following typos in docbook comments: nilfs_detroy_transaction_cache -> nilfs_destroy_transaction_cache nilfs_secgtor_start_timer -> nilfs_segctor_start_timer Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14nilfs2: fix discrepancy in use of static specifierRyusuke Konishi
Two segbuf functions, nilfs_segbuf_write and nilfs_segbuf_wait, are declared with the static storage class specifier, but their implementations are not. This fixes the discrepancy. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Skip check for mandatory locks when unlocking GFS2: Allow the number of committed revokes to temporarily be negative GFS2: do not select QUOTA
2010-03-139p: Skip check for mandatory locks when unlockingSachin Prabhu
While investigating a bug, I came across a possible bug in v9fs. The problem is similar to the one reported for NFS by ASANO Masahiro in http://lkml.org/lkml/2005/12/21/334. v9fs_file_lock() will skip locks on file which has mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. Such a lock will be skipped during unlock and the machine will end up with a BUG in locks_remove_flock(). v9fs_file_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-139p: Fixes a simple bug enabling writes beyond 2GB.jvrao
Fixes a simple bug so that large files beyond 2GB can be created. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-139p: Change the name of new protocol from 9p2010.L to 9p2000.LSripathi Kodi
This patch changes the name of the new 9P protocol from 9p2010.L to 9p2000.u. This is because we learnt that the name 9p2010 is already being used by others. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13fs/9p: re-init the wstat in readdir loopAneesh Kumar K.V
This ensure that on failure when we free the stat buf we don't end up freeing an already freed pointer in the earlier loop Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6: fat: Fix stat->f_namelen fat: Fix vfat_lookup()
2010-03-13anon_inodes: mark the anon inode privateEric Paris
Inotify was switched to use anon_inode instead of its own private filesystem which only had one inode in commit c44dcc56d2b5c7 "switch inotify_user to anon_inode" The problem with this is that now the inotify inode is not a distinct inode which can be managed by LSMs. userspace tools which use inotify were allowed to use the inotify inode but may not have had permission to do read/write type operations on the anon_inode. After looking at the anon_inode and its users it looks like the best solution is to just mark the anon_inode as S_PRIVATE so the security system will ignore it. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-13Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: use ext2_find_next_bit udf: Do not read inode before writing it udf: Fix unalloc space handling in udf_update_inode
2010-03-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits) doc: fix typo in comment explaining rb_tree usage Remove fs/ntfs/ChangeLog doc: fix console doc typo doc: cpuset: Update the cpuset flag file Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed Remove drivers/parport/ChangeLog Remove drivers/char/ChangeLog doc: typo - Table 1-2 should refer to "status", not "statm" tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h devres/irq: Fix devm_irq_match comment Remove reference to kthread_create_on_cpu tree-wide: Assorted spelling fixes tree-wide: fix 'lenght' typo in comments and code drm/kms: fix spelling in error message doc: capitalization and other minor fixes in pnp doc devres: typo fix s/dev/devm/ Remove redundant trailing semicolons from macros fix typo "definetly" -> "definitely" in comment tree-wide: s/widht/width/g typo in comments ... Fix trivial conflict in Documentation/laptops/00-INDEX
2010-03-12ufs: make solaris fsck happyEvgeniy Dushistov
Alex Viskovatoff let me know that after copying data to solaris's ufs from linux, solaris's fsck sees some errors in cylinder summary information. This is because of solaris expects find some data on another places, then curernt implementation save it. This patch fixes this issue. It is tested by me, and also Alex reported that it works for him. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Reported-by: Alex Viskovatoff <viskovatoff@imap.cc> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12fs/ufs: recognize Solaris-specific file system stateAlex Viskovatoff
Recent releases of Solaris set the fs_clean state of an unmounted UFS file system as FSLOG ("logging fs"). However, the Linux kernel currently does not recognize the value which represents this state. Thus, attempting to mount such a file system rw produces the message kernel: ufs_read_super: can't grok fs_clean 0xfffffffd and the file system is mounted read-only. This patch makes the kernel recognize that value. Signed-off-by: Alex Viskovatoff <viskovatoff@imap.cc> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12Add generic sys_old_select()Christoph Hellwig
Add a generic implementation of the old select() syscall, which expects its argument in a memory block and switch all architectures over to use it. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: James Morris <jmorris@namei.org> Acked-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Greg Ungerer <gerg@uclinux.org> Acked-by: David Howells <dhowells@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12fs: buffer_head: remove kmem_cache constructor to reduce memory usage under slubRichard Kennedy
When using slub, having a kmem_cache constructor forces slub to add a free pointer to the size of the cached object, which can have a significant impact to the number of small objects that can fit into a slab. As buffer_head is relatively small and we can have large numbers of them, removing the constructor is a definite win. On x86_64 removing the constructor gives me 39 objects/slab, 3 more than without the patch. And on x86_32 73 objects/slab, which is 9 more. As alloc_buffer_head() already initializes each new object there is very little difference in actual code run. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Acked-by: Nick Piggin <npiggin@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>