path: root/mm
Age  Commit message  Author
2005-05-01  [PATCH] use smp_mb/wmb/rmb where possible  akpm@osdl.org
Replace a number of memory barriers with smp_ variants. This means we won't take the unnecessary hit on UP machines. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
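A minimal userspace sketch of why the smp_ variants are cheaper on UP builds: with SMP configured they expand to a real hardware barrier, otherwise they decay to a compiler-only barrier. All `sketch_` names and the `SKETCH_CONFIG_SMP` switch are hypothetical, and the GCC/Clang builtins stand in for the kernel's per-architecture definitions.

```c
/* Compiler-only barrier: blocks compiler reordering, emits no instruction. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

/* Full hardware memory barrier (GCC/Clang builtin). */
#define sketch_mb() __sync_synchronize()

/* Hypothetical config switch modelling an SMP versus a UP build. */
#ifdef SKETCH_CONFIG_SMP
#define sketch_smp_mb() sketch_mb()       /* other CPUs exist: real barrier */
#else
#define sketch_smp_mb() sketch_barrier()  /* single CPU: compiler barrier only */
#endif

static int sketch_data;
static int sketch_ready;

/* Writer: store the payload, then the flag, in that order. */
void sketch_publish(int value)
{
    sketch_data = value;
    sketch_smp_mb();
    sketch_ready = 1;
}

/* Reader: returns 1 and fills *out once the flag has been observed. */
int sketch_consume(int *out)
{
    if (!sketch_ready)
        return 0;
    sketch_smp_mb();
    *out = sketch_data;
    return 1;
}
```

A plain mb() would pay for the hardware barrier even when no second CPU can observe the ordering, which is the cost the patch avoids.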
2005-05-01  [PATCH] add kmalloc_node, inline cleanup  Manfred Spraul
The patch makes the following function calls available to allocate memory on a specific node without changing the basic operation of the slab allocator:

    kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
    kmalloc_node(size_t size, unsigned int flags, int node);

in a similar way to the existing node-blind functions:

    kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags);
    kmalloc(size, flags);

kmem_cache_alloc_node was changed to pass flags and the node information through the existing layers of the slab allocator (which led to some minor rearrangements). The functions at the lowest layer (kmem_getpages, cache_grow) are already node aware. Also __alloc_percpu can call kmalloc_node now.

Performance measurements (using the pageset localization patch) yield:

w/o patches:

  Tasks  jobs/min  jti  jobs/min/task    real      cpu
      1    484.27  100       484.2736   12.02     1.97  Wed Mar 30 20:50:43 2005
    100  25170.83   91       251.7083   23.12   150.10  Wed Mar 30 20:51:06 2005
    200  34601.66   84       173.0083   33.64   294.14  Wed Mar 30 20:51:40 2005
    300  37154.47   86       123.8482   46.99   436.56  Wed Mar 30 20:52:28 2005
    400  39839.82   80        99.5995   58.43   580.46  Wed Mar 30 20:53:27 2005
    500  40036.32   79        80.0726   72.68   728.60  Wed Mar 30 20:54:40 2005
    600  44074.21   79        73.4570   79.23   872.10  Wed Mar 30 20:55:59 2005
    700  44016.60   78        62.8809   92.56  1015.84  Wed Mar 30 20:57:32 2005
    800  40411.05   80        50.5138  115.22  1161.13  Wed Mar 30 20:59:28 2005
    900  42298.56   79        46.9984  123.83  1303.42  Wed Mar 30 21:01:33 2005
   1000  40955.05   80        40.9551  142.11  1441.92  Wed Mar 30 21:03:55 2005

with pageset localization and slab API patches:

  Tasks  jobs/min  jti  jobs/min/task    real      cpu
      1    484.19  100       484.1930   12.02     1.98  Wed Mar 30 21:10:18 2005
    100  27428.25   92       274.2825   21.22   149.79  Wed Mar 30 21:10:40 2005
    200  37228.94   86       186.1447   31.27   293.49  Wed Mar 30 21:11:12 2005
    300  41725.42   85       139.0847   41.84   434.10  Wed Mar 30 21:11:54 2005
    400  43032.22   82       107.5805   54.10   582.06  Wed Mar 30 21:12:48 2005
    500  42211.23   83        84.4225   68.94   722.61  Wed Mar 30 21:13:58 2005
    600  40084.49   82        66.8075   87.12   873.11  Wed Mar 30 21:15:25 2005
    700  44169.30   79        63.0990   92.24  1008.77  Wed Mar 30 21:16:58 2005
    800  43097.94   79        53.8724  108.03  1155.88  Wed Mar 30 21:18:47 2005
    900  41846.75   79        46.4964  125.17  1303.38  Wed Mar 30 21:20:52 2005
   1000  40247.85   79        40.2478  144.60  1442.21  Wed Mar 30 21:23:17 2005

Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01  [PATCH] sync_page() smp_mb() comment  William Lee Irwin III
The smp_mb() is because sync_page() doesn't have PG_locked while it accesses page_mapping(page). The comments in the patch (the entire patch is the addition of this comment) try to explain further how and why smp_mb() is used. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01  [PATCH] RLIMIT_MEMLOCK checking fix  Chris Wright
Always use page counts when doing RLIMIT_MEMLOCK checking to avoid possible overflow. Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
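A hedged illustration of the overflow this avoids, assuming 4 KiB pages and modelling a 32-bit unsigned long with `uint32_t`. The function names are hypothetical and this is not the patch's code: converting a page count to bytes can wrap, while converting the limit down to pages cannot.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Byte-based check: the shift can wrap in 32 bits and make a huge
 * request look small. */
bool sketch_memlock_ok_bytes(uint32_t locked_pages, uint32_t new_pages,
                             uint32_t limit_bytes)
{
    uint32_t total_bytes =
        (uint32_t)((locked_pages + new_pages) << SKETCH_PAGE_SHIFT);
    return total_bytes <= limit_bytes;
}

/* Page-based check: shifting the limit right never overflows. */
bool sketch_memlock_ok_pages(uint32_t locked_pages, uint32_t new_pages,
                             uint32_t limit_bytes)
{
    uint32_t limit_pages = limit_bytes >> SKETCH_PAGE_SHIFT;
    return locked_pages + new_pages <= limit_pages;
}
```

With a 64 KiB limit and a request for 0x100000 pages, the byte-based version wraps to zero and wrongly accepts the request, while the page-based version rejects it.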
2005-05-01  [PATCH] count bounce buffer pages in vmstat  KAMEZAWA Hiroyuki
This patch counts the pages used for bounce buffers and shows the count in /proc/vmstat. Currently the number of bounce pages is not counted anywhere, so when many bounce pages are in use it looks as though pages have leaked, and a user has no easy way to gauge bounce page usage. Showing the number of bounce pages is therefore useful. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01  [PATCH] mm: use __GFP_NOMEMALLOC  Nick Piggin
Use the new __GFP_NOMEMALLOC to simplify the previous handling of PF_MEMALLOC. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01  [PATCH] mempool: simplify alloc  Nick Piggin
Mempool is pretty clever. Looks too clever for its own good :) It shouldn't really know so much about page reclaim internals.

- don't guess about what effective page reclaim might involve.
- don't randomly flush out all dirty data if some unlikely thing happens (alloc returns NULL). page reclaim can (sort of :P) handle it.

I think the main motivation is trying to avoid pool->lock at all costs. However the first allocation is attempted with __GFP_WAIT cleared, so it will be 'can_try_harder' if it hits the page allocator. So if allocation still fails, then we can probably afford to hit the pool->lock - and what's the alternative? Try page reclaim and hit zone->lru_lock?

A nice upshot is that we don't need to do any fancy memory barriers or do (intentionally) racy access to pool-> fields outside the lock.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01  [PATCH] mempool: NOMEMALLOC and NORETRY  Nick Piggin
Mempools have 2 problems. The first is that mempool_alloc can possibly get stuck in __alloc_pages when they should opt to fail, and take an element from their reserved pool. The second is that it will happily eat emergency PF_MEMALLOC reserves instead of going to their reserved pools. Fix the first by passing __GFP_NORETRY in the allocation calls in mempool_alloc. Fix the second by introducing a __GFP_MEMPOOL flag which directs the page allocator not to allocate from the reserve pool. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
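The intended order of operations can be sketched in userspace: try the underlying allocator once in a mode that may fail quickly, and only then fall back to the pool's own reserve. Everything here (`sketch_mempool`, the fixed reserve size, the allocator callback) is a hypothetical model, not the kernel's mempool implementation.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_POOL_MIN 2 /* hypothetical reserve size */

typedef void *(*sketch_alloc_fn)(size_t size);

struct sketch_mempool {
    void *elements[SKETCH_POOL_MIN]; /* pre-allocated reserve */
    int curr_nr;                     /* reserve elements still available */
    size_t elem_size;
    sketch_alloc_fn alloc;           /* underlying allocator, may fail */
};

/* Allocator that always fails, to model memory pressure. */
void *sketch_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Fill the reserve up front; returns 0 on success, -1 on failure. */
int sketch_mempool_init(struct sketch_mempool *pool, size_t elem_size,
                        sketch_alloc_fn alloc)
{
    pool->curr_nr = 0;
    pool->elem_size = elem_size;
    pool->alloc = alloc;
    while (pool->curr_nr < SKETCH_POOL_MIN) {
        void *e = malloc(elem_size);
        if (!e)
            return -1;
        pool->elements[pool->curr_nr++] = e;
    }
    return 0;
}

void *sketch_mempool_alloc(struct sketch_mempool *pool)
{
    /* First attempt: the normal allocator, allowed to fail rather than
     * retry forever or dip into emergency reserves. */
    void *e = pool->alloc(pool->elem_size);
    if (e)
        return e;

    /* Fallback: the pool's own reserve (the kernel takes pool->lock here). */
    if (pool->curr_nr > 0)
        return pool->elements[--pool->curr_nr];

    /* Reserve exhausted; a real mempool would sleep and retry if allowed. */
    return NULL;
}
```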
2005-05-01  [PATCH] mm: pcp use non powers of 2 for batch size  Nick Piggin
Jack Steiner reported this to have fixed his problem (bad colouring): "The patches fix both problems that I found - bad coloring & excessive pages in pagesets." In most workloads this is not likely to be such a pronounced problem, however it should help corner cases. And avoiding powers of 2 in these types of memory operations is always a good idea. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
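One way to steer a batch size away from powers of two is to round it to a value of the form 2^n - 1. The sketch below illustrates that idea only; the exact formula is an assumption rather than a quotation of the patch, and `sketch_fls` is a local stand-in for a find-last-set helper.

```c
/* 1-based index of the highest set bit; 0 when x == 0. */
static int sketch_fls(unsigned int x)
{
    int r = 0;
    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Map a batch size to a nearby value of the form 2^n - 1.
 * Assumes batch is small (well below 2^30), so the shift cannot overflow. */
unsigned int sketch_non_pow2_batch(unsigned int batch)
{
    if (batch < 1)
        batch = 1;
    return (1u << sketch_fls(batch + batch / 2)) - 1;
}
```

For example, a power-of-two batch of 16 becomes 31 and a batch of 8 becomes 15, so consecutive refills no longer land on identically aligned runs of pages.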
2005-05-01  [PATCH] mm: rmap.c cleanup  Nikita Danilov
mm/rmap.c:page_referenced_one() and mm/rmap.c:try_to_unmap_one() contain identical code that

- takes mm->page_table_lock;
- drills through page tables;
- checks that correct pte is reached.

Coalesce this into page_check_address()

Signed-off-by: Nikita Danilov <nikita@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01  [PATCH] RLIMIT_AS checking fix  akpm@osdl.org
Address bug #4508: there's potential for wraparound in the various places where we perform RLIMIT_AS checking. (I'm a bit worried about acct_stack_growth(). Are we sure that vma->vm_mm is always equal to current->mm? If not, then we're comparing some other process's total_vm with the calling process's rlimits). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
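The kind of wraparound at issue can be shown with a small userspace sketch, again modelling a 32-bit unsigned long with `uint32_t`. The helper names are hypothetical, and the wrap-safe form shown here is just one common way to write such a check, not necessarily how the patch does it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: cur + len can wrap past zero and slip under the limit. */
bool sketch_as_ok_naive(uint32_t cur, uint32_t len, uint32_t limit)
{
    return (uint32_t)(cur + len) <= limit;
}

/* Wrap-safe check: compare the request against the remaining headroom. */
bool sketch_as_ok_safe(uint32_t cur, uint32_t len, uint32_t limit)
{
    if (cur > limit)
        return false;
    return len <= limit - cur;
}
```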
2005-05-01  [PATCH] generic_file_buffered_write fixes  akpm@osdl.org
Anton Altaparmakov <aia21@cam.ac.uk> points out:

- It calls fault_in_pages_readable() which is completely bogus if @nr_segs > 1. It needs to be replaced by a to be written "fault_in_pages_readable_iovec()".
- It increments @buf even in the iovec case thus @buf can point to random memory really quickly (in the iovec case) and then it calls fault_in_pages_readable() on this random memory.

Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
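A rough userspace sketch of the safer pattern: track the position as a segment plus an offset within it, instead of incrementing one flat pointer across segment boundaries. The cursor type and helpers are hypothetical, and the sketch assumes non-empty segments.

```c
#include <stddef.h>
#include <sys/uio.h>

struct sketch_iov_cursor {
    const struct iovec *iov; /* current segment */
    size_t nr_segs;          /* segments left, including the current one */
    size_t offset;           /* bytes already consumed in the current segment */
};

/* Address of the next unread byte, or NULL once every segment is consumed. */
const char *sketch_iov_pos(const struct sketch_iov_cursor *c)
{
    if (c->nr_segs == 0)
        return NULL;
    return (const char *)c->iov->iov_base + c->offset;
}

/* Advance by 'bytes', stepping into the next segment when one runs out
 * rather than running off the end of the current buffer. */
void sketch_iov_advance(struct sketch_iov_cursor *c, size_t bytes)
{
    while (bytes > 0 && c->nr_segs > 0) {
        size_t left = c->iov->iov_len - c->offset;
        if (bytes < left) {
            c->offset += bytes;
            return;
        }
        bytes -= left;
        c->iov++;
        c->nr_segs--;
        c->offset = 0;
    }
}
```

Any fault-in or copy step would then use `sketch_iov_pos()` for the address, so it always refers to memory that belongs to one of the caller's segments.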
2005-04-24  [PATCH] mempolicy.c GFP fix  Al Viro
zonelist_policy() forgot to mask non-zone bits from gfp when comparing zone number with policy_zone. ACKed-by: Andi Kleen <ak@suse.de> Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
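A small sketch of the bug class, using made-up flag values: when the zone selector shares a word with unrelated flag bits, comparing the whole word against a zone number gives the wrong answer unless the non-zone bits are masked off first. The `SKETCH_` constants are hypothetical and do not match the kernel's real GFP bit layout.

```c
#include <stdbool.h>

/* Hypothetical layout: the low two bits select a zone, higher bits are
 * unrelated behaviour flags. */
#define SKETCH_GFP_HIGHMEM  0x02u
#define SKETCH_GFP_ZONEMASK 0x03u
#define SKETCH_GFP_WAIT     0x10u

/* Buggy: unrelated flag bits inflate the value being compared. */
bool sketch_zone_matches_buggy(unsigned int gfp, unsigned int policy_zone)
{
    return gfp >= policy_zone;
}

/* Fixed: only the zone bits take part in the comparison. */
bool sketch_zone_matches_fixed(unsigned int gfp, unsigned int policy_zone)
{
    return (gfp & SKETCH_GFP_ZONEMASK) >= policy_zone;
}
```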
2005-04-19  [PATCH] freepgt: remove FIRST_USER_ADDRESS hack  Hugh Dickins
Once all the MMU architectures define FIRST_USER_ADDRESS, remove hack from mmap.c which derived it from FIRST_USER_PGD_NR. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19  [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR  Hugh Dickins
Remove use of FIRST_USER_PGD_NR from sys_mincore: it's inconsistent (no other syscall refers to it), unnecessary (sys_mincore loops over vmas further down) and incorrect (misses user addresses in ARM's first pgd). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19  [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS  Hugh Dickins
The patches to free_pgtables by vma left problems on any architectures which leave some user address page table entries unencapsulated by vma. Andi has fixed the 32-bit vDSO on x86_64 to use a vma. Now fix arm (and arm26), whose first PAGE_SIZE is reserved (perhaps) for machine vectors. Our calls to free_pgtables must not touch that area, and exit_mmap's BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have allocated an extra page table, which its free_pgd_slow would free later. FIRST_USER_PGD_NR has misled me and others: until all the arches define FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other. This patch fixes the bugs, the remaining patches just clean it up. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19  [PATCH] freepgt: mpnt to vma cleanup  Hugh Dickins
While dabbling here in mmap.c, clean up mysterious "mpnt"s to "vma"s. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19  [PATCH] freepgt: hugetlb_free_pgd_range  Hugh Dickins
ia64 and ppc64 had hugetlb_free_pgtables functions which were no longer being called, and it wasn't obvious what to do about them. The ppc64 case turns out to be easy: the associated tables are noted elsewhere and freed later, safe to either skip its hugetlb areas or go through the motions of freeing nothing. Since ia64 does need a special case, restore to ppc64 the special case of skipping them. The ia64 hugetlb case has been broken since pgd_addr_end went in, though it probably appeared to work okay if you just had one such area; in fact it's been broken much longer if you consider a long munmap spanning from another region into the hugetlb region. In the ia64 hugetlb region, more virtual address bits are available than in the other regions, yet the page tables are structured the same way: the page at the bottom is larger. Here we need to scale down each addr before passing it to the standard free_pgd_range. Was about to write a hugely_scaled_down macro, but found htlbpage_to_page already exists for just this purpose. Fixed off-by-one in ia64 is_hugepage_only_range. Uninline free_pgd_range to make it available to ia64. Make sure the vma-gathering loop in free_pgtables cannot join a hugepage_only_range to any other (safe to join huges? probably but don't bother). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19  [PATCH] freepgt: remove MM_VM_SIZE(mm)  Hugh Dickins
There's only one usage of MM_VM_SIZE(mm) left, and it's a troublesome macro because mm doesn't contain the (32-bit emulation?) info needed. But it too is only needed because we ignore the end from the vma list. We could make flush_pgtables return that end, or unmap_vmas. Choose the latter, since it's a natural fit with unmap_mapping_range_vma needing to know its restart addr. This does make more than minimal change, but if unmap_vmas had returned the end before, this is how we'd have done it, rather than storing the break_addr in zap_details. unmap_vmas used to return count of vmas scanned, but that's just debug which hasn't been useful in a while; and if we want the map_count 0 on exit check back, it can easily come from the final remove_vm_struct loop. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19  [PATCH] freepgt: free_pgtables use vma list  Hugh Dickins
Recent woes with some arches needing their own pgd_addr_end macro; and 4-level clear_page_range regression since 2.6.10's clear_page_tables; and its long-standing well-known inefficiency in searching throughout the higher-level page tables for those few entries to clear and free: all can be blamed on ignoring the list of vmas when we free page tables.

Replace exit_mmap's clear_page_range of the total user address space by free_pgtables operating on the mm's vma list; unmap_region use it in the same way, giving floor and ceiling beyond which it may not free tables. This brings lmbench fork/exec/sh numbers back to 2.6.10 (unless preempt is enabled, in which case latency fixes spoil unmap_vmas throughput).

Beware: the do_mmap_pgoff driver failure case must now use unmap_region instead of zap_page_range, since a page table might have been allocated, and can only be freed while it is touched by some vma.

Move free_pgtables from mmap.c to memory.c, where its lower levels are adapted from the clear_page_range levels. (Most of free_pgtables' old code was actually for a non-existent case, prev not properly set up, dating from before hch gave us split_vma.) Pass mmu_gather** in the public interfaces, since we might want to add latency lockdrops later; but no attempt to do so yet, going by vma should itself reduce latency.

But what if is_hugepage_only_range? Those ia64 and ppc64 cases need careful examination: put that off until a later patch of the series.

What of x86_64's 32bit vdso page __map_syscall32 maps outside any vma?

And the range to sparc64's flush_tlb_pgtables? It's less clear to me now that we need to do more than is done here - every PMD_SIZE ever occupied will be flushed, do we really have to flush every PGDIR_SIZE ever partially occupied? A shame to complicate it unnecessarily.

Special thanks to David Miller for time spent repairing my ceilings.

Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16  [PATCH] vmscan: pageout(): remove unneeded test  akpm@osdl.org
We only call pageout() for dirty pages, so this test is redundant. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16  [PATCH] oom-killer disable for iscsi/lvm2/multipath userland critical sections  Andrea Arcangeli
iscsi/lvm2/multipath needs guaranteed protection from the oom-killer, so make the magical value of -17 in /proc/<pid>/oom_adj defeat the oom-killer altogether. (akpm: we still need to document oom_adj and friends in Documentation/filesystems/proc.txt!) Signed-off-by: Andrea Arcangeli <andrea@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
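From userland, opting a process out would amount to writing the value into that proc file. The sketch below assumes the /proc/<pid>/oom_adj interface named in this commit (newer kernels also provide oom_score_adj); the helper names are hypothetical and error handling is kept minimal.

```c
#include <stdio.h>
#include <sys/types.h>

#define SKETCH_OOM_DISABLE (-17) /* the "magical value" from the commit message */

/* Build "/proc/<pid>/oom_adj" into buf; returns the snprintf() result. */
int sketch_oom_adj_path(char *buf, size_t len, pid_t pid)
{
    return snprintf(buf, len, "/proc/%ld/oom_adj", (long)pid);
}

/* Ask the kernel to exempt 'pid' from the oom-killer.
 * Returns 0 on success, -1 on failure (missing file, no permission, ...). */
int sketch_disable_oom_kill(pid_t pid)
{
    char path[64];
    FILE *f;
    int ok;

    sketch_oom_adj_path(path, sizeof(path), pid);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%d\n", SKETCH_OOM_DISABLE) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A daemon in one of the critical sections described above would typically call `sketch_disable_oom_kill(getpid())` early during startup.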
2005-04-16  [PATCH] filemap_getpage can block when MAP_NONBLOCK specified  Jeff Moyer
We will return NULL from filemap_getpage when a page does not exist in the page cache and MAP_NONBLOCK is specified, here:

	page = find_get_page(mapping, pgoff);
	if (!page) {
		if (nonblock)
			return NULL;
		goto no_cached_page;
	}

But we forget to do so when the page in the cache is not uptodate. The following could result in a blocking call:

	/*
	 * Ok, found a page in the page cache, now we need to check
	 * that it's up-to-date.
	 */
	if (!PageUptodate(page))
		goto page_not_uptodate;

Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
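The shape of the fix can be modelled in userspace: a non-blocking caller should also get NULL when the cached page is not up to date, after dropping the reference it just took. The types and helpers below are hypothetical stand-ins, not the kernel's page cache API, and the path that allocates a missing page is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_page {
    bool uptodate;
    int refcount;
};

/* Stand-in for the blocking read that brings a page up to date. */
static struct sketch_page *sketch_read_page_blocking(struct sketch_page *page)
{
    page->uptodate = true;
    return page;
}

/* 'cached' models the page cache lookup result (NULL when absent). */
struct sketch_page *sketch_getpage(struct sketch_page *cached, bool nonblock)
{
    if (!cached)
        return NULL; /* allocation and read of a missing page omitted */

    cached->refcount++; /* the lookup takes a reference */

    if (!cached->uptodate) {
        if (nonblock) {
            cached->refcount--; /* drop the reference before bailing out */
            return NULL;
        }
        return sketch_read_page_blocking(cached);
    }
    return cached;
}
```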
2005-04-16  Linux-2.6.12-rc2  Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!