summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-02-03hfsplus: fix up a comparism in hfsplus_file_extendChristoph Hellwig
Revert an incorrect hunk from commit b2837fcf4994e699a4def002e26f274d95b387c1, "hfsplus: %L-to-%ll, macro correction, and remove unneeded braces" revert a pointless change of comparism operation argument order, which turned out to not even be equivalent. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>
2011-02-03hfsplus: fix two memory leaks in wrapper.cChuck Ebbert
Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>
2011-02-03hfsplus: do not leak buffer on errorChuck Ebbert
Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>
2011-02-03hfsplus: fix failed mount handlingChristoph Hellwig
Currently the error handling in hfsplus_fill_super is a mess, and can lead to accessing fields in the superblock that haven't been even set up yet. Fix this by making sure we do not set up sb->s_root until we have the mount fully set up, and before that do proper step by step unwinding instead of using hfsplus_put_super as a big hammer. Reported-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>
2011-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: [SCSI] libsas: fix runaway error handler problem [SCSI] fix incorrect value of SCSI_MAX_SG_CHAIN_SEGMENTS due to include file ordering [SCSI] arcmsr: Fix the issue of system hangup after commands timeout on ARC-1200 [SCSI] mpt2sas: fix Integrated Raid unsynced on shutdown problem [SCSI] mpt2sas: Kernel Panic during Large Topology discovery [SCSI] mpt2sas: Fix the race between broadcast asyn event and scsi command completion [SCSI] mpt2sas: Correct resizing calculation for max_queue_depth [SCSI] mpt2sas: fix internal device reset for older firmware prior to MPI Rev K [SCSI] mpt2sas: Fix device removal handshake for zoned devices
2011-02-03x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after ↵Suresh Siddha
switching mm Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb IPI's while the cr3 is still pointing to the prev mm. And this window can lead to the possibility of bogus TLB fills resulting in strange failures. One such problematic scenario is mentioned below. T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI etc between the point of clearing the cpu from the mm_cpumask(mm1) and before reloading the cr3 with the new mm2. T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1 as it doesn't see that cpu listed in the mm_cpumask(mm1). T3. After the TLB flush is complete, CPU-2 goes ahead and frees the page-table pages associated with the removed vma mapping. T4. CPU-2 now allocates those freed page-table pages for something else. T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1 can potentially speculate and walk through the page-table caches and can insert new TLB entries. As the page-table pages are already freed and being used on CPU-2, this page walk can potentially insert a bogus global TLB entry depending on the (random) contents of the page that is being used on CPU-2. T6. This bogus TLB entry being global will be active across future CR3 changes and can result in weird memory corruption etc. To avoid this issue, for the prev mm that is handing over the cpu to another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is changed. Marking it for -stable, though we haven't seen any reported failure that can be attributed to this. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org [v2.6.32+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03RTC: Prevents a division by zero in kernel code.Marcelo Roberto Jimenez
This patch prevents a user space program from calling the RTC_IRQP_SET ioctl with a negative value of frequency. Also, if this call is make with a zero value of frequency, there would be a division by zero in the kernel code. [jstultz: Also initialize irq_freq to 1 to catch other divbyzero issues] CC: Alessandro Zummo <a.zummo@towertech.it> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-02-03Merge branch 'perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent
2011-02-03perf stat: Fix aggreate counter reading accountingArnaldo Carvalho de Melo
Introduced in: c52b12ed, when this sequence: count[0] = count[1] = count[2] = 0; Was replaced with: aggr->val = 0; Which is equivalent to zeroing just the first entry in the 'count' array. Fix it by zeroing the three entries with: aggr->val = aggr->ena = aggr->run = 0; Reported-by: Ingo Molnar <mingo@elte.hu> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-02-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA: Update missed conversion of flush_scheduled_work() RDMA/ucma: Copy iWARP route information on queries RDMA/amso1100: Fix compile warnings RDMA/cxgb4: Set the correct device physical function for iWARP connections RDMA/cxgb4: Limit MAXBURST EQ context field to 256B IB/qib: Hold link for TX SERDES settings mlx4_core: Add ConnectX-3 device IDs
2011-02-03Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Prevent irq storm on migration
2011-02-03Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix update_curr_rt() sched, docs: Update schedstats documentation to version 15
2011-02-03Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix reading in perf_event_read() watchdog: Don't change watchdog state on read of sysctl watchdog: Fix sysctl consistency watchdog: Fix broken nowatchdog logic perf: Fix Pentium4 raw event validation perf: Fix alloc_callchain_buffers()
2011-02-03tracing: Replace syscall_meta_data struct array with pointer arraySteven Rostedt
Currently the syscall_meta structures for the syscall tracepoints are placed in the __syscall_metadata section, and at link time, the linker makes one large array of all these syscall metadata structures. On boot up, this array is read (much like the initcall sections) and the syscall data is processed. The problem is that there is no guarantee that gcc will place complex structures nicely together in an array format. Two structures in the same file may be placed awkwardly, because gcc has no clue that they are suppose to be in an array. A hack was used previous to force the alignment to 4, to pack the structures together. But this caused alignment issues with other architectures (sparc). Instead of packing the structures into an array, the structures' addresses are now put into the __syscall_metadata section. As pointers are always the natural alignment, gcc should always pack them tightly together (otherwise initcall, extable, etc would also fail). By having the pointers to the structures in the section, we can still iterate the trace_events without causing unnecessary alignment problems with other architectures, or depending on the current behaviour of gcc that will likely change in the future just to tick us kernel developers off a little more. The __syscall_metadata section is also moved into the .init.data section as it is now only needed at boot up. Suggested-by: David Miller <davem@davemloft.net> Acked-by: David S. Miller <davem@davemloft.net> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03tracepoints: Fix section alignment using pointer arrayMathieu Desnoyers
Make the tracepoints more robust, making them solid enough to handle compiler changes by not relying on anything based on compiler-specific behavior with respect to structure alignment. Implement an approach proposed by David Miller: use an array of const pointers to refer to the individual structures, and export this pointer array through the linker script rather than the structures per se. It will consume 32 extra bytes per tracepoint (24 for structure padding and 8 for the pointers), but are less likely to break due to compiler changes. History: commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() added the aligned(32) type and variable attribute to the tracepoint structures to deal with gcc happily aligning statically defined structures on 32-byte multiples. One attempt was to use a 8-byte alignment for tracepoint structures by applying both the variable and type attribute to tracepoint structures definitions and declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5. The reason is that the "aligned" attribute only specify the _minimum_ alignment for a structure, leaving both the compiler and the linker free to align on larger multiples. Because tracepoint.c expects the structures to be placed as an array within each section, up-alignment cause NULL-pointer exceptions due to the extra unexpected padding. (this patch applies on top of -tip) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: David S. Miller <davem@davemloft.net> LKML-Reference: <20110126222622.GA10794@Krystal> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <peterz@infradead.org> CC: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03sched: Fix update_curr_rt()Peter Zijlstra
cpu_stopper_thread() migration_cpu_stop() __migrate_task() deactivate_task() dequeue_task() dequeue_task_rq() update_curr_rt() Will call update_curr_rt() on rq->curr, which at that time is rq->stop. The problem is that rq->stop.prio matches an RT prio and thus falsely assumes its a rt_sched_class task. Reported-Debuged-Tested-Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Cc: stable@kernel.org # .37 Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03perf: Fix reading in perf_event_read()Peter Zijlstra
It is quite possible for the event to have been disabled between perf_event_read() sending the IPI and the CPU servicing the IPI and calling __perf_event_read(), hence revalidate the state. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03perf: Cure task_oncpu_function_call() racesPeter Zijlstra
Oleg reported that on architectures with __ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from task_oncpu_function_call() can land before perf_event_task_sched_in() and cause interesting situations for eg. perf_install_in_context(). This patch reworks the task_oncpu_function_call() interface to give a more usable primitive as well as rework all its users to hopefully be more obvious as well as remove the races. While looking at the code I also found a number of races against perf_event_task_sched_out() which can flip contexts between tasks so plug those too. Reported-and-reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platformsSuresh Siddha
Markus Kohn ran into a hard hang regression on an acer aspire 1310, when acpi is enabled. git bisect showed the following commit as the bad one that introduced the boot regression. commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 Author: Suresh Siddha <suresh.b.siddha@intel.com> Date: Wed Aug 19 18:05:36 2009 -0700 x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init Because of the UP configuration of that platform, native_smp_prepare_cpus() bailed out (in smp_sanity_check()) before doing the set_mtrr_aps_delayed_init() Further down the boot path, native_smp_cpus_done() will call the delayed MTRR initialization for the AP's (mtrr_aps_init()) with mtrr_aps_delayed_init not set. This resulted in the boot processor reprogramming its MTRR's to the values seen during the start of the OS boot. While this is not needed ideally, this shouldn't have caused any side-effects. This is because the reprogramming of MTRR's (set_mtrr_state() that gets called via set_mtrr()) will check if the live register contents are different from what is being asked to write and will do the actual write only if they are different. BP's mtrr state is read during the start of the OS boot and typically nothing would have changed when we ask to reprogram it on BP again because of the above scenario on an UP platform. So on a normal UP platform no reprogramming of BP MTRR MSR's happens and all is well. However, on this platform, bios seems to be modifying the fixed mtrr range registers between the start of OS boot and when we double check the live registers for reprogramming BP MTRR registers. And as the live registers are modified, we end up reprogramming the MTRR's to the state seen during the start of the OS boot. During ACPI initialization, something in the bios (probably smi handler?) don't like this fact and results in a hard lockup. We didn't see this boot hang issue on this platform before the commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3, because only the AP's (if any) will program its MTRR's to the value that BP had at the start of the OS boot. Fix this issue by checking mtrr_aps_delayed_init before continuing further in the mtrr_aps_init(). Now, only AP's (if any) will program its MTRR's to the BP values during boot. Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393 [ By the way, this behavior of the bios modifying MTRR's after the start of the OS boot is not common and the kernel is not prepared to handle this situation well. Irrespective of this issue, during suspend/resume, linux kernel will try to reprogram the BP's MTRR values to the values seen during the start of the OS boot. So suspend/resume might be already broken on this platform for all linux kernel versions. ] Reported-and-bisected-by: Markus Kohn <jabber@gmx.org> Tested-by: Markus Kohn <jabber@gmx.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Renninger <trenn@novell.com> Cc: Rafael Wysocki <rjw@novell.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: stable@kernel.org # [v2.6.32+] LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03tracing: Replace trace_event struct array with pointer arraySteven Rostedt
Currently the trace_event structures are placed in the _ftrace_events section, and at link time, the linker makes one large array of all the trace_event structures. On boot up, this array is read (much like the initcall sections) and the events are processed. The problem is that there is no guarantee that gcc will place complex structures nicely together in an array format. Two structures in the same file may be placed awkwardly, because gcc has no clue that they are suppose to be in an array. A hack was used previous to force the alignment to 4, to pack the structures together. But this caused alignment issues with other architectures (sparc). Instead of packing the structures into an array, the structures' addresses are now put into the _ftrace_event section. As pointers are always the natural alignment, gcc should always pack them tightly together (otherwise initcall, extable, etc would also fail). By having the pointers to the structures in the section, we can still iterate the trace_events without causing unnecessary alignment problems with other architectures, or depending on the current behaviour of gcc that will likely change in the future just to tick us kernel developers off a little more. The _ftrace_event section is also moved into the .init.data section as it is now only needed at boot up. Suggested-by: David Miller <davem@davemloft.net> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03Revert "exofs: Set i_mapping->backing_dev_info anyway"Boaz Harrosh
This reverts commit 115e19c53501edc11f730191f7f047736815ae3d. Apparently setting inode->bdi to one's own sb->s_bdi stops VFS from sending *read-aheads*. This problem was bisected to this commit. A revert fixes it. I'll investigate farther why is this happening for the next Kernel, but for now a revert. I'm sending to stable@kernel.org as well, since it exists also in 2.6.37. 2.6.36 is good and does not have this patch. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03Merge branch 'media_fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 * 'media_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: [media] fix saa7111 non-detection [media] rc/streamzap: fix reporting response times [media] mceusb: really fix remaining keybounce issues [media] rc: use time unit conversion macros correctly [media] rc/ir-lirc-codec: add back debug spew [media] ir-kbd-i2c: improve remote behavior with z8 behind usb [media] lirc_zilog: z8 on usb doesn't like back-to-back i2c_master_send [media] hdpvr: fix up i2c device registration [media] rc/mce: add mappings for missing keys [media] gspca - zc3xx: Discard the partial frames [media] gspca - zc3xx: Fix bad images with the sensor hv7131r [media] gspca - zc3xx: Bad delay when given by a table
2011-02-03Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] reset default for CONFIG_CHSC_SCH [S390] qdio: prevent compile warning under CONFIG_32BIT [S390] use asm-generic/cacheflush.h [S390] tlb: fix build error caused by THP [S390] missing sacf in uaccess [S390] pgtable_list corruption [S390] dasd: prevent panic with unresumed devices
2011-02-03fs: make block fiemap mapping length at least blocksize longJosef Bacik
Some filesystems don't deal well with being asked to map less than blocksize blocks (GFS2 for example). Since we are always mapping at least blocksize sections anyway, just make sure len is at least as big as a blocksize so we don't trip up any filesystems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03vfs: sparse: add __FMODE_EXECNamhyung Kim
FMODE_EXEC is a constant type of fmode_t but was used with normal integer constants. This results in following warnings from sparse. Fix it using new macro __FMODE_EXEC. fs/exec.c:116:58: warning: restricted fmode_t degrades to integer fs/exec.c:689:58: warning: restricted fmode_t degrades to integer fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03vfs: sparse: remove a warning on OPEN_FMODE()Namhyung Kim
AND-ing FMODE_* constant with normal integer results in following sparse warnings. Fix it. fs/open.c:662:21: warning: restricted fmode_t degrades to integer fs/anon_inodes.c:123:34: warning: restricted fmode_t degrades to integer Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03memcg: fix event counting breakage from recent THP updateKAMEZAWA Hiroyuki
Changes in e401f1761 ("memcg: modify accounting function for supporting THP better") adds nr_pages to support multiple page size in memory_cgroup_charge_statistics. But counting the number of event nees abs(nr_pages) for increasing counters. This patch fixes event counting. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03memcg: never OOM when charging huge pagesJohannes Weiner
Huge page coverage should obviously have less priority than the continued execution of a process. Never kill a process when charging it a huge page fails. Instead, give up after the first failed reclaim attempt and fall back to regular pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03memcg: prevent endless loop when charging huge pages to near-limit groupJohannes Weiner
If reclaim after a failed charging was unsuccessful, the limits are checked again, just in case they settled by means of other tasks. This is all fine as long as every charge is of size PAGE_SIZE, because in that case, being below the limit means having at least PAGE_SIZE bytes available. But with transparent huge pages, we may end up in an endless loop where charging and reclaim fail, but we keep going because the limits are not yet exceeded, although not allowing for a huge page. Fix this up by explicitely checking for enough room, not just whether we are within limits. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03memcg: prevent endless loop when charging huge pagesJohannes Weiner
The charging code can encounter a charge size that is bigger than a regular page in two situations: one is a batched charge to fill the per-cpu stocks, the other is a huge page charge. This code is distributed over two functions, however, and only the outer one is aware of huge pages. In case the charging fails, the inner function will tell the outer function to retry if the charge size is bigger than regular pages--assuming batched charging is the only case. And the outer function will retry forever charging a huge page. This patch makes sure the inner function can distinguish between batch charging and a single huge page charge. It will only signal another attempt if batch charging failed, and go into regular reclaim when it is called on behalf of a huge page. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03thp: fix unsuitable behavior for hwpoisoned tail pageJin Dongming
When a tail page of THP is poisoned, memory-failure will do nothing except setting PG_hwpoison, while the expected behavior is that the process, who is using the poisoned tail page, should be killed. The above problem is caused by lru check of the poisoned tail page of THP. Because PG_lru flag is only set on the head page of THP, the check always consider the poisoned tail page as NON lru page. So the lru check for the tail page of THP should be avoided, as like as hugetlb. This patch adds !PageTransCompound() before lru check for THP, because of the check (!PageHuge() && !PageTransCompound()) the whole branch could be optimized away at build time when both hugetlbfs and THP are set with "N" (or in archs not supporting either of those). [akpm@linux-foundation.org: fix unrelated typo in shake_page() comment] Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03thp: fix the wrong reported address of hwpoisoned hugepagesJin Dongming
When the tail page of THP is poisoned, the head page will be poisoned too. And the wrong address, address of head page, will be sent with sigbus always. So when the poisoned page is used by Guest OS which is running on KVM, after the address changing(hva->gpa) by qemu, the unexpected process on Guest OS will be killed by sigbus. What we expected is that the process using the poisoned tail page could be killed on Guest OS, but not that the process using the healthy head page is killed. Since it is not good to poison the healthy page, avoid poisoning other than the page which is really poisoned. (While we poison all pages in a huge page in case of hugetlb, we can do this for THP thanks to split_huge_page().) Here we fix two parts: 1. Isolate the poisoned page only to make sure the reported address is the address of poisoned page. 2. make the poisoned page work as the poisoned regular page. [akpm@linux-foundation.org: fix spello in comment] Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03thp: fix splitting of hwpoisoned hugepagesJin Dongming
The poisoned THP is now split with split_huge_page() in collect_procs_anon(). If kmalloc() is failed in collect_procs(), split_huge_page() could not be called. And the work after split_huge_page() for collecting the processes using poisoned page will not be done, too. So the processes using the poisoned page could not be killed. The condition becomes worse when CONFIG_DEBUG_VM == "Y". Because the poisoned THP could not be split, system panic will be caused by VM_BUG_ON(PageTransHuge(page)) in try_to_unmap(). This patch does: 1. move split_huge_page() to the place before collect_procs(). This can be sure the failure of splitting THP is caused by itself. 2. when splitting THP is failed, stop the operations after it. This can avoid unexpected system panic or non sense works. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03MAINTAINERS: fixup Simtec support email entriesBen Dooks
The support@simtec.co.uk address is for direct customer support only, the EB2410ITX and EB110ATX entries should direct to the Simtec Linux Team address of linux@simtec.co.uk Also add correct email address for Vincent Sanders [akpm@linux-foundation.org: fix Vincent's address] Signed-off-by: Ben Dooks <ben-linux@fluff.org> Cc: Vincent Sanders <vince@simtec.co.uk> Cc: Simtec Support <support@simtec.co.uk> Cc: Simtec Linux Team <linux@simtec.co.uk> Cc: Jack Stone <jwjstone@fastmail.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03MAINTAINERS: fixup file entries for "SIMTEC EB2410ITX (BAST)"Ben Dooks
Add the correct files for the Simtec BAST machine, ensuring the IDE and IRQ routing are added, and move to the machine specific file instead of trying to catch all of arch/arm/mach-s3c2410 Signed-off-by: Ben Dooks <ben-linux@fluff.org> Cc: Simtec Linux Team <linux@simtec.co.uk> Cc: Simtec Support <support@simtec.co.uk> Cc: Jack Stone <jwjstone@fastmail.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03MAINTAINERS: move s3c2410 drivers to ARM/SAMSUNG ARMBen Dooks
There are currently two entries under the "SIMTEC EB2410ITX (BAST)" machine entry for drivers/*/*s3c2410*, which is catching everything s3c2410 driver related. This entry is for a specific S3C2410 based machine, so move these two file entries to the "ARM/SAMSUNG ARM ARCHITECTURES" entry, where it will reach a wider audience of interested parties. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Cc: Simtec Linux Team <linux@simtec.co.uk> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Cc: Jack Stone <jwjstone@fastmail.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03epoll: epoll_wait() should not use timespec_add_ns()Eric Dumazet
commit 95aac7b1cd224f ("epoll: make epoll_wait() use the hrtimer range feature") added a performance regression because it uses timespec_add_ns() with potential very large 'ns' values. [akpm@linux-foundation.org: s/epoll_set_mstimeout/ep_set_mstimeout/, per Davide] Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Shawn Bohrer <shawn.bohrer@gmail.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> [2.6.37.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03mm/migration: fix page corruption during hugepage migrationMinchan Kim
If migrate_huge_page by memory-failure fails , it calls put_page in itself to decrease page reference and caller of migrate_huge_page also calls putback_lru_pages. It can do double free of page so it can make page corruption on page holder. In addtion, clean of pages on caller is consistent behavior with migrate_pages by cf608ac19c ("mm: compaction: fix COMPACTPAGEFAILED counting"). Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03mm: when migrate_pages returns 0, all pages must have been releasedAndrea Arcangeli
In some cases migrate_pages could return zero while still leaving a few pages in the pagelist (and some caller wouldn't notice it has to call putback_lru_pages after commit cf608ac19c9 ("mm: compaction: fix COMPACTPAGEFAILED counting")). Add one missing putback_lru_pages not added by commit cf608ac19c95 ("mm: compaction: fix COMPACTPAGEFAILED counting"). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03memsw: deprecate noswapaccount kernel parameter and schedule it for removalMichal Hocko
noswapaccount couldn't be used to control memsw for both on/off cases so we have added swapaccount[=0|1] parameter. This way we can turn the feature in two ways noswapaccount resp. swapaccount=0. We have kept the original noswapaccount but I think we should remove it after some time as it just makes more command line parameters without any advantages and also the code to handle parameters is uglier if we want both parameters. Signed-off-by: Michal Hocko <mhocko@suse.cz> Requested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03memsw: handle swapaccount kernel parameter correctlyMichal Hocko
__setup based kernel command line parameters handlers which are handled in obsolete_checksetup are provided with the parameter value including = (more precisely everything right after the parameter name). This means that the current implementation of swapaccount[=1|0] doesn't work at all because if there is a value for the parameter then we are testing for "0" resp. "1" but we are getting "=0" resp. "=1" and if there is no parameter value we are getting an empty string rather than NULL. The original noswapccount parameter, which doesn't care about the value, works correctly. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-03x86, nx: Don't force pages RW when setting NX bitsMatthieu CASTET
Xen want page table pages read only. But the initial page table (from head_*.S) live in .data or .bss. That was broken by 64edc8ed5ffae999d8d413ba006850e9e34166cb. There is absolutely no reason to force these pages RW after they have already been marked RO. Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-02-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2011-02-02tcp_ecn is an integer not a booleanPeter Chubb
There was some confusion at LCA as to why the sysctl tcp_ecn took one of three values when it was documented as a Boolean. This patch fixes the documentation. Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02atl1c: Add missing PCI device IDChuck Ebbert
Commit 8f574b35f22fbb9b5e5f1d11ad6b55b6f35f4533 ("atl1c: Add AR8151 v2 support and change L0s/L1 routine") added support for a new adapter but failed to add it to the PCI device table. Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02s390: Fix possibly wrong size in strncmp (smsgiucv)Stefan Weil
This error was reported by cppcheck: drivers/s390/net/smsgiucv.c:63: error: Using sizeof for array given as function argument returns the size of pointer. Although there is no runtime problem as long as sizeof(u8 *) == 8, this misleading code should get fixed. Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02s390: Fix wrong size in memcmp (netiucv)Stefan Weil
This error was reported by cppcheck: drivers/s390/net/netiucv.c:568: error: Using sizeof for array given as function argument returns the size of pointer. sizeof(ipuser) did not result in 16 (as many programmers would have expected) but sizeof(u8 *), so it is 4 or 8, too small here. Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02qeth: allow OSA CHPARM change in suspend stateUrsula Braun
For OSA the CHPARM-definition determines the number of available outbound queues. A CHPARM-change may occur while a Linux system with probed OSA device is in suspend state. This patch enables proper resuming of an OSA device in this case. Signed-off-by: Ursula braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02qeth: allow HiperSockets framesize change in suspendUrsula Braun
For HiperSockets the framesize-definition determines the selected mtu-size and the size of the allocated qdio buffers. A framesize-change may occur while a Linux system with probed HiperSockets device is in suspend state. This patch enables proper resuming of a HiperSockets device in this case. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02qeth: add more strict MTU checkingFrank Blaschka
HiperSockets and OSA hardware report a maximum MTU size. Add checking to reject larger MTUs than allowed by hardware. Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>