summaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)Author
2012-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: igmp: Avoid zero delay when receiving odd mixture of IGMP queries netdev: make net_device_ops const bcm63xx: make ethtool_ops const usbnet: make ethtool_ops const net: Fix build with INET disabled. net: introduce netif_addr_lock_nested() and call if when appropriate net: correct lock name in dev_[uc/mc]_sync documentations. net: sk_update_clone is only used in net/core/sock.c 8139cp: fix missing napi_gro_flush. pktgen: set correct max and min in pktgen_setup_inject() smsc911x: Unconditionally include linux/smscphy.h in smsc911x.h asix: fix infinite loop in rx_fixup() net: Default UDP and UNIX diag to 'n'. r6040: fix typo in use of MCR0 register bits net: fix sock_clone reference mismatch with tcp memcontrol
2012-01-09tracing/mm: Move include of trace/events/kmem.h out of header into slab.cSteven Rostedt
Including trace/events/*.h TRACE_EVENT() macro headers in other headers can cause strange side effects if another trace/event/*.h header includes that header. Having trace/events/kmem.h inside slab_def.h caused a compile error in sparc64 when changes were done to some header files. Moving the kmem.h trace header out of slab.h and into slab.c fixes the problem. Note, both slub.c and slob.c already include the trace/events/kmem.h file. Only slab.c had it missing. Link: http://lkml.kernel.org/r/20120105190405.1e3191fb5a43b2a0f1655e1f@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-09Merge branch 'for-3.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: Remove irqsafe_cpu_xxx variants Fix up conflict in arch/x86/include/asm/percpu.h due to clash with cebef5beed3d ("x86: Fix and improve percpu_cmpxchg{8,16}b_double()") which edited the (now removed) irqsafe_cpu_cmpxchg*_double code.
2012-01-09Merge branch 'for-3.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits) cgroup: fix to allow mounting a hierarchy by name cgroup: move assignement out of condition in cgroup_attach_proc() cgroup: Remove task_lock() from cgroup_post_fork() cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end() cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static cgroup: only need to check oldcgrp==newgrp once cgroup: remove redundant get/put of task struct cgroup: remove redundant get/put of old css_set from migrate cgroup: Remove unnecessary task_lock before fetching css_set on migration cgroup: Drop task_lock(parent) on cgroup_fork() cgroups: remove redundant get/put of css_set from css_set_check_fetched() resource cgroups: remove bogus cast cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task() cgroup, cpuset: don't use ss->pre_attach() cgroup: don't use subsys->can_attach_task() or ->attach_task() cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach() cgroup: improve old cgroup handling in cgroup_attach_proc() cgroup: always lock threadgroup during migration threadgroup: extend threadgroup_lock() to cover exit and exec threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem ... Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups: fix a css_set not found bug in cgroup_attach_proc" that already mentioned that the bug is fixed (differently) in Tejun's cgroup patchset. This one, in other words.
2012-01-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
2012-01-08Merge branch 'pm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits) PM / Hibernate: Implement compat_ioctl for /dev/snapshot PM / Freezer: fix return value of freezable_schedule_timeout_killable() PM / shmobile: Allow the A4R domain to be turned off at run time PM / input / touchscreen: Make st1232 use device PM QoS constraints PM / QoS: Introduce dev_pm_qos_add_ancestor_request() PM / shmobile: Remove the stay_on flag from SH7372's PM domains PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode PM: Drop generic_subsys_pm_ops PM / Sleep: Remove forward-only callbacks from AMBA bus type PM / Sleep: Remove forward-only callbacks from platform bus type PM: Run the driver callback directly if the subsystem one is not there PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412. PM / Sleep: Merge internal functions in generic_ops.c PM / Sleep: Simplify generic system suspend callbacks PM / Hibernate: Remove deprecated hibernation snapshot ioctls PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled() ARM: S3C64XX: Implement basic power domain support PM / shmobile: Use common always on power domain governor ... Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused XBT_FORCE_SLEEP bit
2012-01-08Merge branch 'for-linus2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits) reiserfs: Properly display mount options in /proc/mounts vfs: prevent remount read-only if pending removes vfs: count unlinked inodes vfs: protect remounting superblock read-only vfs: keep list of mounts for each superblock vfs: switch ->show_options() to struct dentry * vfs: switch ->show_path() to struct dentry * vfs: switch ->show_devname() to struct dentry * vfs: switch ->show_stats to struct dentry * switch security_path_chmod() to struct path * vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb vfs: trim includes a bit switch mnt_namespace ->root to struct mount vfs: take /proc/*/mounts and friends to fs/proc_namespace.c vfs: opencode mntget() mnt_set_mountpoint() vfs: spread struct mount - remaining argument of next_mnt() vfs: move fsnotify junk to struct mount vfs: move mnt_devname vfs: move mnt_list to struct mount vfs: switch pnode.h macros to struct mount * ...
2012-01-07Merge branch 'driver-core-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07net: fix sock_clone reference mismatch with tcp memcontrolGlauber Costa
Sockets can also be created through sock_clone. Because it copies all data in the sock structure, it also copies the memcg-related pointer, and all should be fine. However, since we now use reference counts in socket creation, we are left with some sockets that have no reference counts. It matters when we destroy them, since it leads to a mismatch. Signed-off-by: Glauber Costa <glommer@parallels.com> CC: David S. Miller <davem@davemloft.net> CC: Greg Thelen <gthelen@google.com> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: Laurent Chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-07vfs: switch ->show_options() to struct dentry *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-07Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into ZAl Viro
2012-01-07Merge branch 'for-linus' of ↵Linus Torvalds
git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm * 'for-linus' of git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm: (207 commits) ARM: 7267/1: Remove BUILD_BUG_ON from asm/bug.h ARM: 7269/1: mach-sa1100: fix sched_clock breakage ARM: 7198/1: arm/imx6: add restart support for imx6q ARM: restart: remove the now empty arch_reset() ARM: restart: remove comments about adding code to arch_reset() ARM: restart: lpc32xx & u300: remove unnecessary printk ARM: restart: plat-samsung: remove plat/reset.h and s5p_reset_hook ARM: restart: w90x900: use new restart hook ARM: restart: Versatile Express: use new restart hook ARM: restart: versatile: use new restart hook ARM: restart: u300: use new restart hook ARM: restart: tegra: use new restart hook ARM: restart: spear: use new restart hook ARM: restart: shark: use new restart hook ARM: restart: sa1100: use new restart hook ARM: 7252/1: restart: S5PV210: use new restart hook ARM: 7251/1: restart: S5PC100: use new restart hook ARM: 7250/1: restart: S5P64X0: use new restart hook ARM: 7266/1: restart: S3C64XX: use new restart hook ARM: 7265/1: restart: S3C24XX: use new restart hook ... Fix up trivial conflict in arch/arm/mm/init.c due to removal of memblock_init() clashing with the movement of the sorting of the meminfo array.
2012-01-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1958 commits) net: pack skb_shared_info more efficiently net_sched: red: split red_parms into parms and vars net_sched: sfq: extend limits cnic: Improve error recovery on bnx2x devices cnic: Re-init dev->stats_addr after chip reset net_sched: Bug in netem reordering bna: fix sparse warnings/errors bna: make ethtool_ops and strings const xgmac: cleanups net: make ethtool_ops const vmxnet3" make ethtool ops const xen-netback: make ops structs const virtio_net: Pass gfp flags when allocating rx buffers. ixgbe: FCoE: Add support for ndo_get_fcoe_hbainfo() call netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call igb: reset PHY after recovering from PHY power down igb: add basic runtime PM support igb: Add support for byte queue limits. e1000: cleanup CE4100 MDIO registers access e1000: unmap ce4100_gbe_mdio_base_virt in e1000_remove ...
2012-01-06Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86: Fix atomic64_xxx_cx8() functions x86: Fix and improve cmpxchg_double{,_local}() x86_64, asm: Optimise fls(), ffs() and fls64() x86, bitops: Move fls64.h inside __KERNEL__ x86: Fix and improve percpu_cmpxchg{8,16}b_double() x86: Report cpb and eff_freq_ro flags correctly x86/i386: Use less assembly in strlen(), speed things up a bit x86: Use the same node_distance for 32 and 64-bit x86: Fix rflags in FAKE_STACK_FRAME x86: Clean up and extend do_int3() x86: Call do_notify_resume() with interrupts enabled x86/div64: Add a micro-optimization shortcut if base is power of two x86-64: Cleanup some assembly entry points x86-64: Slightly shorten line system call entry and exit paths x86-64: Reduce amount of redundant code generated for invalidate_interruptNN x86-64: Slightly shorten int_ret_from_sys_call x86, efi: Convert efi_phys_get_time() args to physical addresses x86: Default to vsyscall=emulate x86-64: Set siginfo and context on vsyscall emulation faults x86: consolidate xchg and xadd macros ...
2012-01-06Merge branch 'driver-core-next' into Linux 3.2Greg Kroah-Hartman
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06Merge branch 'core-memblock-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) memblock: Reimplement memblock allocation using reverse free area iterator memblock: Kill early_node_map[] score: Use HAVE_MEMBLOCK_NODE_MAP s390: Use HAVE_MEMBLOCK_NODE_MAP mips: Use HAVE_MEMBLOCK_NODE_MAP ia64: Use HAVE_MEMBLOCK_NODE_MAP SuperH: Use HAVE_MEMBLOCK_NODE_MAP sparc: Use HAVE_MEMBLOCK_NODE_MAP powerpc: Use HAVE_MEMBLOCK_NODE_MAP memblock: Implement memblock_add_node() memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users memblock: Track total size of regions automatically powerpc: Cleanup memblock usage memblock: Reimplement memblock_enforce_memory_limit() using __memblock_remove() memblock: Make memblock functions handle overflowing range @size memblock: Reimplement __memblock_remove() using memblock_isolate_range() memblock: Separate out memblock_isolate_range() from memblock_set_node() memblock: Kill memblock_init() memblock: Kill sentinel entries at the end of static region arrays memblock: Add __memblock_dump_all() ...
2012-01-05Merge branch 'devel-stable' into for-linusRussell King
Conflicts: arch/arm/kernel/setup.c arch/arm/mach-shmobile/board-kota2.c
2012-01-04x86: Fix and improve cmpxchg_double{,_local}()Jan Beulich
Just like the per-CPU ones they had several problems/shortcomings: Only the first memory operand was mentioned in the asm() operands, and the 2x64-bit version didn't have a memory clobber while the 2x32-bit one did. The former allowed the compiler to not recognize the need to re-load the data in case it had it cached in some register, while the latter was overly destructive. The types of the local copies of the old and new values were incorrect (the types of the pointed-to variables should be used here, to make sure the respective old/new variable types are compatible). The __dummy/__junk variables were pointless, given that local copies of the inputs already existed (and can hence be used for discarded outputs). The 32-bit variant of cmpxchg_double_local() referenced cmpxchg16b_local(). At once also: - change the return value type to what it really is: 'bool' - unify 32- and 64-bit variants - abstract out the common part of the 'normal' and 'local' variants Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/4F01F12A020000780006A19B@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-04should_remove_suid(): inode->i_mode is umode_tAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04shmem, ramfs: propagate umode_t, open-coded S_ISREGAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04switch debugfs to umode_tAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04switch ->mknod() to umode_tAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04switch ->create() to umode_tAl Viro
vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04switch vfs_mkdir() and ->mkdir() to umode_tAl Viro
vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04fs: move code out of buffer.cAl Viro
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export kill_bdev as well, so brd doesn't have to open code it. Reduce buffer_head.h requirement accordingly. Removed a rather large comment from invalidate_bdev, as it looked a bit obsolete to bother moving. The small comment replacing it says enough. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04vfs: fix the stupidity with i_dentry in inode destructorsAl Viro
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-12-30mm: hugetlb: fix non-atomic enqueue of huge pageHillf Danton
If a huge page is enqueued under the protection of hugetlb_lock, then the operation is atomic and safe. Signed-off-by: Hillf Danton <dhillf@gmail.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [2.6.37+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-30mm/mempolicy.c: refix mbind_range() vma issueKOSAKI Motohiro
commit 8aacc9f550 ("mm/mempolicy.c: fix pgoff in mbind vma merge") is the slightly incorrect fix. Why? Think following case. 1. map 4 pages of a file at offset 0 [0123] 2. map 2 pages just after the first mapping of the same file but with page offset 2 [0123][23] 3. mbind() 2 pages from the first mapping at offset 2. mbind_range() should treat new vma is, [0123][23] |23| mbind vma but it does [0123][23] |01| mbind vma Oops. then, it makes wrong vma merge and splitting ([01][0123] or similar). This patch fixes it. [testcase] test result - before the patch case4: 126: test failed. expect '2,4', actual '2,2,2' case5: passed case6: passed case7: passed case8: passed case_n: 246: test failed. expect '4,2', actual '1,4' ------------[ cut here ]------------ kernel BUG at mm/filemap.c:135! invalid opcode: 0000 [#4] SMP DEBUG_PAGEALLOC (snip long bug on messages) test result - after the patch case4: passed case5: passed case6: passed case7: passed case8: passed case_n: passed source: mbind_vma_test.c ============================================================ #include <numaif.h> #include <numa.h> #include <sys/mman.h> #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <string.h> static unsigned long pagesize; void* mmap_addr; struct bitmask *nmask; char buf[1024]; FILE *file; char retbuf[10240] = ""; int mapped_fd; char *rubysrc = "ruby -e '\ pid = %d; \ vstart = 0x%llx; \ vend = 0x%llx; \ s = `pmap -q #{pid}`; \ rary = []; \ s.each_line {|line|; \ ary=line.split(\" \"); \ addr = ary[0].to_i(16); \ if(vstart <= addr && addr < vend) then \ rary.push(ary[1].to_i()/4); \ end; \ }; \ print rary.join(\",\"); \ '"; void init(void) { void* addr; char buf[128]; nmask = numa_allocate_nodemask(); numa_bitmask_setbit(nmask, 0); pagesize = getpagesize(); sprintf(buf, "%s", "mbind_vma_XXXXXX"); mapped_fd = mkstemp(buf); if (mapped_fd == -1) perror("mkstemp "), exit(1); unlink(buf); if (lseek(mapped_fd, pagesize*8, SEEK_SET) < 0) perror("lseek "), exit(1); if (write(mapped_fd, "\0", 1) < 0) perror("write "), exit(1); addr = mmap(NULL, pagesize*8, PROT_NONE, MAP_SHARED, mapped_fd, 0); if (addr == MAP_FAILED) perror("mmap "), exit(1); if (mprotect(addr+pagesize, pagesize*6, PROT_READ|PROT_WRITE) < 0) perror("mprotect "), exit(1); mmap_addr = addr + pagesize; /* make page populate */ memset(mmap_addr, 0, pagesize*6); } void fin(void) { void* addr = mmap_addr - pagesize; munmap(addr, pagesize*8); memset(buf, 0, sizeof(buf)); memset(retbuf, 0, sizeof(retbuf)); } void mem_bind(int index, int len) { int err; err = mbind(mmap_addr+pagesize*index, pagesize*len, MPOL_BIND, nmask->maskp, nmask->size, 0); if (err) perror("mbind "), exit(err); } void mem_interleave(int index, int len) { int err; err = mbind(mmap_addr+pagesize*index, pagesize*len, MPOL_INTERLEAVE, nmask->maskp, nmask->size, 0); if (err) perror("mbind "), exit(err); } void mem_unbind(int index, int len) { int err; err = mbind(mmap_addr+pagesize*index, pagesize*len, MPOL_DEFAULT, NULL, 0, 0); if (err) perror("mbind "), exit(err); } void Assert(char *expected, char *value, char *name, int line) { if (strcmp(expected, value) == 0) { fprintf(stderr, "%s: passed\n", name); return; } else { fprintf(stderr, "%s: %d: test failed. expect '%s', actual '%s'\n", name, line, expected, value); // exit(1); } } /* AAAA PPPPPPNNNNNN might become PPNNNNNNNNNN case 4 below */ void case4(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 4); mem_unbind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("2,4", retbuf, "case4", __LINE__); fin(); } /* AAAA PPPPPPNNNNNN might become PPPPPPPPPPNN case 5 below */ void case5(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 2); mem_bind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("4,2", retbuf, "case5", __LINE__); fin(); } /* AAAA PPPPNNNNXXXX might become PPPPPPPPPPPP 6 */ void case6(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 2); mem_bind(4, 2); mem_bind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("6", retbuf, "case6", __LINE__); fin(); } /* AAAA PPPPNNNNXXXX might become PPPPPPPPXXXX 7 */ void case7(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 2); mem_interleave(4, 2); mem_bind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("4,2", retbuf, "case7", __LINE__); fin(); } /* AAAA PPPPNNNNXXXX might become PPPPNNNNNNNN 8 */ void case8(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 2); mem_interleave(4, 2); mem_interleave(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("2,4", retbuf, "case8", __LINE__); fin(); } void case_n(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); /* make redundunt mappings [0][1234][34][7] */ mmap(mmap_addr + pagesize*4, pagesize*2, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_SHARED, mapped_fd, pagesize*3); /* Expect to do nothing. */ mem_unbind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("4,2", retbuf, "case_n", __LINE__); fin(); } int main(int argc, char** argv) { case4(); case5(); case6(); case7(); case8(); case_n(); return 0; } ============================================================= Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Caspar Zhang <caspar@casparzhang.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: <stable@vger.kernel.org> [3.1.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-25Merge branch 'pm-sleep' into pm-for-linusRafael J. Wysocki
* pm-sleep: (51 commits) PM: Drop generic_subsys_pm_ops PM / Sleep: Remove forward-only callbacks from AMBA bus type PM / Sleep: Remove forward-only callbacks from platform bus type PM: Run the driver callback directly if the subsystem one is not there PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers PM / Sleep: Merge internal functions in generic_ops.c PM / Sleep: Simplify generic system suspend callbacks PM / Hibernate: Remove deprecated hibernation snapshot ioctls PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled() PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep() PM / Sleep: Make [un]lock_system_sleep() generic PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count() PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else' Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer PM / Sleep: Unify diagnostic messages from device suspend/resume ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR PM / Hibernate: Remove deprecated hibernation test modes PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path ... Conflicts: kernel/kmod.c
2011-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23Partial revert "Basic kernel memory functionality for the Memory Controller"Glauber Costa
This reverts commit e5671dfae59b165e2adfd4dfbdeab11ac8db5bda. After a follow up discussion with Michal, it was agreed it would be better to leave the kmem controller with just the tcp files, deferring the behavior of the other general memory.kmem.* files for a later time, when more caches are controlled. This is because generic kmem files are not used by tcp accounting and it is not clear how other slab caches would fit into the scheme. We are reverting the original commit so we can track the reference. Part of the patch is kept, because it was used by the later tcp code. Conflicts are shown in the bottom. init/Kconfig is removed from the revert entirely. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: Paul Menage <paul@paulmenage.org> CC: Greg Thelen <gthelen@google.com> CC: Johannes Weiner <jweiner@redhat.com> CC: David S. Miller <davem@davemloft.net> Conflicts: Documentation/cgroups/memory.txt mm/memcontrol.c Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22percpu: Remove irqsafe_cpu_xxx variantsChristoph Lameter
We simply say that regular this_cpu use must be safe regardless of preemption and interrupt state. That has no material change for x86 and s390 implementations of this_cpu operations. However, arches that do not provide their own implementation for this_cpu operations will now get code generated that disables interrupts instead of preemption. -tj: This is part of on-going percpu API cleanup. For detailed discussion of the subject, please refer to the following thread. http://thread.gmane.org/gmane.linux.kernel/1222078 Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <alpine.DEB.2.00.1112221154380.11787@router.home>
2011-12-22vfs: __read_cache_page should use gfp argument rather than GFP_KERNELDave Kleikamp
lockdep reports a deadlock in jfs because a special inode's rw semaphore is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not used when __read_cache_page() calls add_to_page_cache_lru(). Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-21convert 'memory' sysdev_class to a regular subsystemKay Sievers
This moves the 'memory sysdev_class' over to a regular 'memory' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21Merge branch 'master' into pm-sleepRafael J. Wysocki
* master: (848 commits) SELinux: Fix RCU deref check warning in sel_netport_insert() binary_sysctl(): fix memory leak mm/vmalloc.c: remove static declaration of va from __get_vm_area_node ipmi_watchdog: restore settings when BMC reset oom: fix integer overflow of points in oom_badness memcg: keep root group unchanged if creation fails nilfs2: potential integer overflow in nilfs_ioctl_clean_segments() nilfs2: unbreak compat ioctl cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask evm: prevent racing during tfm allocation evm: key must be set once during initialization mmc: vub300: fix type of firmware_rom_wait_states module parameter Revert "mmc: enable runtime PM by default" mmc: sdhci: remove "state" argument from sdhci_suspend_host x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT IB/qib: Correct sense on freectxts increment and decrement RDMA/cma: Verify private data length cgroups: fix a css_set not found bug in cgroup_attach_proc oprofile: Fix uninitialized memory access when writing to writing to oprofilefs Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel" ... Conflicts: kernel/cgroup_freezer.c
2011-12-20mm/vmalloc.c: remove static declaration of va from __get_vm_area_nodeKautuk Consul
Static storage is not required for the struct vmap_area in __get_vm_area_node. Removing "static" to store this variable on the stack instead. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20oom: fix integer overflow of points in oom_badnessFrantisek Hrbata
An integer overflow will happen on 64bit archs if task's sum of rss, swapents and nr_ptes exceeds (2^31)/1000 value. This was introduced by commit f755a04 oom: use pte pages in OOM score where the oom score computation was divided into several steps and it's no longer computed as one expression in unsigned long(rss, swapents, nr_pte are unsigned long), where the result value assigned to points(int) is in range(1..1000). So there could be an int overflow while computing 176 points *= 1000; and points may have negative value. Meaning the oom score for a mem hog task will be one. 196 if (points <= 0) 197 return 1; For example: [ 3366] 0 3366 35390480 24303939 5 0 0 oom01 Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical memory, but it's oom score is one. In this situation the mem hog task is skipped and oom killer kills another and most probably innocent task with oom score greater than one. The points variable should be of type long instead of int to prevent the int overflow. Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20memcg: keep root group unchanged if creation failsHillf Danton
If the request is to create non-root group and we fail to meet it, we should leave the root unchanged. Signed-off-by: Hillf Danton <dhillf@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20Merge branch 'memblock-kill-early_node_map' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/memblock
2011-12-15percpu: fix per_cpu_ptr_to_phys() handling of non-page-aligned addressesEugene Surovegin
per_cpu_ptr_to_phys() incorrectly rounds up its result for non-kmalloc case to the page boundary, which is bogus for any non-page-aligned address. This affects the only in-tree user of this function - sysfs handler for per-cpu 'crash_notes' physical address. The trouble is that the crash_notes per-cpu variable is not page-aligned: crash_notes = 0xc08e8ed4 PER-CPU OFFSET VALUES: CPU 0: 3711f000 CPU 1: 37129000 CPU 2: 37133000 CPU 3: 3713d000 So, the per-cpu addresses are: crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4 crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4 crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4 crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4 However, /sys/devices/system/cpu/cpu*/crash_notes says: /sys/devices/system/cpu/cpu0/crash_notes: 36b57000 /sys/devices/system/cpu/cpu1/crash_notes: 36b4d000 /sys/devices/system/cpu/cpu2/crash_notes: 36b43000 /sys/devices/system/cpu/cpu3/crash_notes: 36b39000 As you can see, all values are rounded down to a page boundary. Consequently, this is where kexec sets up the NOTE segments, and thus where the secondary kernel is looking for them. However, when the first kernel crashes, it saves the notes to the unaligned addresses, where they are not found. Fix it by adding offset_in_page() to the translated page address. -tj: Combined Eugene's and Petr's commit messages. Signed-off-by: Eugene Surovegin <ebs@ebshome.net> Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Petr Tesarik <ptesarik@suse.cz> Cc: stable@kernel.org
2011-12-13Merge branch 'writeback-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: set max_pause to lowest value on zero bdi_dirty writeback: permit through good bdi even when global dirty exceeded writeback: comment on the bdi dirty threshold fs: Make write(2) interruptible by a fatal signal writeback: Fix issue on make htmldocs
2011-12-13cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), ↵Tejun Heo
cancel_attach() and attach() Currently, there's no way to pass multiple tasks to cgroup_subsys methods necessitating the need for separate per-process and per-task methods. This patch introduces cgroup_taskset which can be used to pass multiple tasks and their associated cgroups to cgroup_subsys methods. Three methods - can_attach(), cancel_attach() and attach() - are converted to use cgroup_taskset. This unifies passed parameters so that all methods have access to all information. Conversions in this patchset are identical and don't introduce any behavior change. -v2: documentation updated as per Paul Menage's suggestion. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Menage <paul@paulmenage.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: James Morris <jmorris@namei.org>
2011-12-13tcp memory pressure controlsGlauber Costa
This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in cg_proto struct. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13socket: initial cgroup code.Glauber Costa
The goal of this work is to move the memory pressure tcp controls to a cgroup, instead of just relying on global conditions. To avoid excessive overhead in the network fast paths, the code that accounts allocated memory to a cgroup is hidden inside a static_branch(). This branch is patched out until the first non-root cgroup is created. So when nobody is using cgroups, even if it is mounted, no significant performance penalty should be seen. This patch handles the generic part of the code, and has nothing tcp-specific. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtsu.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13Basic kernel memory functionality for the Memory ControllerGlauber Costa
This patch lays down the foundation for the kernel memory component of the Memory Controller. As of today, I am only laying down the following files: * memory.independent_kmem_limit * memory.kmem.limit_in_bytes (currently ignored) * memory.kmem.usage_in_bytes (always zero) Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: Paul Menage <paul@paulmenage.org> CC: Greg Thelen <gthelen@google.com> CC: Johannes Weiner <jweiner@redhat.com> CC: Michal Hocko <mhocko@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09mm: vmalloc: check for page allocation failure before vmlist insertionMel Gorman
Commit f5252e00 ("mm: avoid null pointer access in vm_struct via /proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after it is fully initialised. Unfortunately, it did not check that __vmalloc_area_node() successfully populated the area. In the event of allocation failure, the vmalloc area is freed but the pointer to freed memory is inserted into the vmlist leading to a a crash later in get_vmalloc_info(). This patch adds a check for ____vmalloc_area_node() failure within __vmalloc_node_range. It does not use "goto fail" as in the previous error path as a warning was already displayed by __vmalloc_area_node() before it called vfree in its failure path. Credit goes to Luciano Chavez for doing all the real work of identifying exactly where the problem was. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com> Tested-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.1.x+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09mm: Ensure that pfn_valid() is called once per pageblock when reserving ↵Michal Hocko
pageblocks setup_zone_migrate_reserve() expects that zone->start_pfn starts at pageblock_nr_pages aligned pfn otherwise we could access beyond an existing memblock resulting in the following panic if CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid: IP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 *pdpt = 0000000000000000 *pde = f000ff53f000ff53 Oops: 0000 [#1] SMP Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform EIP: 0060:[<c02d331d>] EFLAGS: 00010006 CPU: 0 EIP is at setup_zone_migrate_reserve+0xcd/0x180 EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000 ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000) Call Trace: [<c02d389c>] __setup_per_zone_wmarks+0xec/0x160 [<c02d3a1f>] setup_per_zone_wmarks+0xf/0x20 [<c08a771c>] init_per_zone_wmark_min+0x27/0x86 [<c020111b>] do_one_initcall+0x2b/0x160 [<c086639d>] kernel_init+0xbe/0x157 [<c05cae26>] kernel_thread_helper+0x6/0xd Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 <2b> 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f EIP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58 CR2: 00000000000012b4 We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because highstart_pfn = 0x36ffe. The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"). Make sure that start_pfn is always aligned to pageblock_nr_pages to ensure that pfn_valid s always called at the start of each pageblock. Architectures with holes in pageblocks will be correctly handled by pfn_valid_within in pageblock_is_reserved. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Dang Bo <bdang@vmware.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Arve Hjnnevg <arve@android.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09mm/migrate.c: pair unlock_page() and lock_page() when migrating huge pagesHillf Danton
Avoid unlocking and unlocked page if we failed to lock it. Signed-off-by: Hillf Danton <dhillf@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09thp: set compound tail page _count to zeroYouquan Song
Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all page_tail->_count zero at all times. But the current kernel does not set page_tail->_count to zero if a 1GB page is utilized. So when an IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a tail page's _count does not equal zero. kernel BUG at include/linux/mm.h:386! invalid opcode: 0000 [#1] SMP Call Trace: gup_pud_range+0xb8/0x19d get_user_pages_fast+0xcb/0x192 ? trace_hardirqs_off+0xd/0xf hva_to_pfn+0x119/0x2f2 gfn_to_pfn_memslot+0x2c/0x2e kvm_iommu_map_pages+0xfd/0x1c1 kvm_iommu_map_memslots+0x7c/0xbd kvm_iommu_map_guest+0xaa/0xbf kvm_vm_ioctl_assigned_device+0x2ef/0xa47 kvm_vm_ioctl+0x36c/0x3a2 do_vfs_ioctl+0x49e/0x4e4 sys_ioctl+0x5a/0x7c system_call_fastpath+0x16/0x1b RIP gup_huge_pud+0xf2/0x159 Signed-off-by: Youquan Song <youquan.song@intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>