summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-06-06sched: Fix domain iterationPeter Zijlstra
Weird topologies can lead to asymmetric domain setups. This needs further consideration since these setups are typically non-minimal too. For now, make it work by adding an extra mask selecting which CPUs are allowed to iterate up. The topology that triggered it is the one from David Rientjes: 10 20 20 30 20 10 20 20 20 20 10 20 30 20 20 10 resulting in boxes that wouldn't even boot. Reported-by: David Rientjes <rientjes@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-3p86l9cuaqnxz7uxsojmz5rm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06sched/rt: Fix lockdep annotation within find_lock_lowest_rq()Peter Zijlstra
Roland Dreier reported spurious, hard to trigger lockdep warnings within the scheduler - without any real lockup. This bit gives us the right clue: > [89945.640512] [<ffffffff8103fa1a>] double_lock_balance+0x5a/0x90 > [89945.640568] [<ffffffff8104c546>] push_rt_task+0xc6/0x290 if you look at that code you'll find the double_lock_balance() in question is the one in find_lock_lowest_rq() [yay for inlining]. Now find_lock_lowest_rq() has a bug.. it fails to use double_unlock_balance() in one exit path, if this results in a retry in push_rt_task() we'll call double_lock_balance() again, at which point we'll run into said lockdep confusion. Reported-by: Roland Dreier <roland@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337282386.4281.77.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06sched/numa: Load balance between remote nodesAlex Shi
Commit cb83b629b ("sched/numa: Rewrite the CONFIG_NUMA sched domain support") removed the NODE sched domain and started checking if the node distance in SLIT table is farther than REMOTE_DISTANCE, if so, it will lose the load balance chance at exec/fork/wake_affine points. But actually, even the node distance is farther than REMOTE_DISTANCE. Modern CPUs also has QPI like connections, which ensures that memory access is not too slow between nodes. So the above change in behavior on NUMA machine causes a performance regression on various benchmarks: hackbench, tbench, netperf, oltp, etc. This patch will recover the scheduler behavior to old mode on all my Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and thus fixes the perfromance regressions. (all of them just have 2 kinds distance, 10, 21) Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338965571-9812-1-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06sched/x86: Calculate booted cores after construction of sibling_maskKamalesh Babulal
Commit 316ad248307fb ("sched/x86: Rewrite set_cpu_sibling_map()") broke the booted_cores accounting. The problem is that the booted_cores accounting needs all the sibling links set up. So restore the second loop and add a comment as to why its needed. On qemu booted with -smp sockets=1,cores=2,threads=2; Before: $ grep cores /proc/cpuinfo cpu cores : 2 cpu cores : 1 cpu cores : 4 cpu cores : 3 With the patch: $ grep cores /proc/cpuinfo cpu cores : 2 cpu cores : 2 cpu cores : 2 cpu cores : 2 Reported-by: Prarit Bhargava <prarit@redhat.com> Reported-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120531073738.GH7511@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-05Merge tag 'please-pull-mce' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull MCE regression fix from Tony Luck: "Typo/thinko in a cleanup caused a semantic change. Fix it." * tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Fix the MCE poll timer logic
2012-06-05Merge branch 'fixes-for-linus' of ↵Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull arm CMA fix from Marek Szyprowski: "This removes the ARMv6+ CMA dependency and lets one use old, well- tested dma-mapping implementation also on ARMv6+ systems without the need to use EXPERIMENTAL stuff." Russell King complained (rightly) about the experimental feature being forced on by the ARM config. Here CMA is "continuous memory allocator", not "cross-memory attach". We really neet to stop using insane TLA's for things that aren't big industry standards. * 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: ARM: dma-mapping: remove unconditional dependency on CMA
2012-06-05MAINTAINERS: add linux-leds mail list address and git URLBryan Wu
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-05MAINTAINERS: update util-linux infoKarel Zak
Signed-off-by: Karel Zak <kzak@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-05Merge branch '3.5-merge-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull two scsi target fixes from Nicholas Bellinger: "The first is a small name-space collision fix from Stefan for the new sbp-target / ieee-1394 code, and second is the FILEIO backend conversion patch to always use O_DSYNC by default instead of O_SYNC as recommended by hch. Note the latter is CC'ed stable." * '3.5-merge-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target/file: Use O_DSYNC by default for FILEIO backends sbp-target: rename a variable to avoid name clash
2012-06-05Merge branch 'for-3.5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "This fixes the possible premature superblock release on umount bug mentioned during v3.5-rc1 pull request. Originally, cgroup dentry destruction path assumed that cgroup dentry didn't have any reference left after cgroup removal thus put super during dentry removal. Now that there can be lingering dentry references, this led to super being put with live dentries. This patch fixes the problem by putting super ref on dentry release instead of removal." * 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: superblock can't be released with active dentries
2012-06-05Merge branch 'i2c-embedded/for-current' of ↵Linus Torvalds
git://git.pengutronix.de/git/wsa/linux Pull embedded i2c update from Wolfram Sang: "This only contains one new driver which had multiple dependencies (pinctrl, i2c-mux-rework, new devm_* functions), so I decided to wait for rc1. Plus, it had to wait a little for the ack of a devicetree maintainer since the bindings were not trivial enough for me to pass through. So, given that, I hope there is still something like the "new driver rule", so we could have the driver in 3.5 and people can start using it. That would make merging support for some boards easier for 3.6 since the dependency on this driver is gone then." * 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux: i2c: Add generic I2C multiplexer using pinctrl API
2012-06-05Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fix from Avi Kivity: "A one-liner fix for a buffer overflow" * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Fix buffer overflow in kvm_set_irq()
2012-06-05Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm radeon fixes from Dave Airlie: "This is just radeon fixes and a bunch of new PCI ids. The fixes are for a deadlock, an audio regression, and a couple of audio fixes." * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/radeon/kms: add new SI PCI ids drm/radeon/kms: add new BTC PCI ids drm/radeon/kms: add new Palm, Sumo PCI ids drm/radeon/kms: add new Trinity PCI ids drm/radeon: fix vm deadlocks on cayman drm/radeon: fix gpu_init on si drm/radeon/hdmi: don't set SEND_MAX_PACKETS bit drm/radeon/audio: don't hardcode CRTC id drm/radeon: make audio_init consistent across asics
2012-06-05radix-tree: fix contiguous iteratorKonstantin Khlebnikov
This patch fixes bug in macro radix_tree_for_each_contig(). If radix_tree_next_slot() sees NULL in next slot it returns NULL, but following radix_tree_next_chunk() switches iterating into next chunk. As result iterating becomes non-contiguous and breaks vfs "splice" and all its users. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Reported-and-bisected-by: Hans de Bruin <jmdebruin@xmsnet.nl> Reported-and-bisected-by: Ondrej Zary <linux@rainbow-software.org> Reported-bisected-and-tested-by: Toralf Förster <toralf.foerster@gmx.de> Link: https://lkml.org/lkml/2012/6/5/64 Cc: stable <stable@vger.kernel.org> # 3.4.x Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-05x86/mce: Fix the MCE poll timer logicChen Gong
In commit 82f7af09 (x86/mce: Cleanup timer mess), Thomas just forgot the "/ 2" there while cleaning up. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-06-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix blksize calculation fuse: fix stat call on 32 bit platforms fuse: optimize fallocate on permanent failure fuse: add FALLOCATE operation fuse: Convert to kstrtoul_from_user
2012-06-05Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar. * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Remove NULL assignment of dattr_cur sched: Remove the last NULL entry from sched_feat_names sched: Make sched_feat_names const sched/rt: Fix SCHED_RR across cgroups sched: Move nr_cpus_allowed out of 'struct sched_rt_entity' sched: Make sure to not re-read variables after validation sched: Fix SD_OVERLAP sched: Don't try allocating memory from offline nodes sched/nohz: Fix rq->cpu_load calculations some more sched/x86: Use cpu_llc_shared_mask(cpu) for coregroup_mask
2012-06-05drm/radeon/kms: add new SI PCI idsAlex Deucher
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-05drm/radeon/kms: add new BTC PCI idsAlex Deucher
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-05drm/radeon/kms: add new Palm, Sumo PCI idsAlex Deucher
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-05drm/radeon/kms: add new Trinity PCI idsAlex Deucher
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-05KVM: Fix buffer overflow in kvm_set_irq()Avi Kivity
kvm_set_irq() has an internal buffer of three irq routing entries, allowing connecting a GSI to three IRQ chips or on MSI. However setup_routing_entry() does not properly enforce this, allowing three irqchip routes followed by an MSI route to overflow the buffer. Fix by ensuring that an MSI entry is added to an empty list. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-05drm/radeon: fix vm deadlocks on caymanChristian König
Locking mutex in different orders just screams for deadlocks, and some testing showed that it is actually quite easy to trigger them. Signed-off-by: Christian König <deathsimple@vodafone.de> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-05drm/radeon: fix gpu_init on siAlex Deucher
- Properly set up the RBs - Properly set up the SPI - Properly set up gb_addr_config This should fix rendering issues on certain cards. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-05drm/radeon/hdmi: don't set SEND_MAX_PACKETS bitRafał Miłecki
Many TVs and A/V receivers don't work with this bit set. Problem was confirmed using: Onkyo TX-SR605, Sony BRAVIA KDL-52X3500, Sony BRAVIA KDL-40S40xx. In theory this bit shouldn't affect audio engine when feeding it with data, however it seems it does. Driver fglrx doesn't set that bit in any of the above cases. This fixes a regression introduced by 3.5-rc1. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-05drm/radeon/audio: don't hardcode CRTC idRafał Miłecki
This is based on info released by AMD, should allow using audio in much more cases. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: <stable@vger.kernel.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-05drm/radeon: make audio_init consistent across asicsAlex Deucher
Call it in the asic startup callback on all asics. Previously r600 and rv770 called it in the startup and resume callbacks while all the other asics called it in the startup callback. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-06-04Pull 'for-linus' branches of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/{signal,vfs} Pull signal and vfs compile breakage fixes from Al Viro. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: fixups for signal breakage * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nommu: fix compilation of nommu.c
2012-06-04Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French. * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Move get_next_mid to ops struct CIFS: Make accessing is_valid_oplock/dump_detail ops struct field safe CIFS: Improve identation in cifs_unlock_range CIFS: Fix possible wrong memory allocation
2012-06-04fixups for signal breakageAl Viro
Obvious brainos spotted by Geert. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-04nommu: fix compilation of nommu.cGreg Ungerer
Compiling 3.5-rc1 for nommu targets gives: CC mm/nommu.o mm/nommu.c: In function ‘sys_mmap_pgoff’: mm/nommu.c:1489:2: error: ‘ret’ undeclared (first use in this function) mm/nommu.c:1489:2: note: each undeclared identifier is reported only once for each function it appears in It is trivially fixed by replacing 'ret' with the local variable that is already defined for the return value 'retval'. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-04Merge tag 'stable/frontswap.v16-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm Pull frontswap feature from Konrad Rzeszutek Wilk: "Frontswap provides a "transcendent memory" interface for swap pages. In some environments, dramatic performance savings may be obtained because swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk. This tag provides the basic infrastructure along with some changes to the existing backends." Fix up trivial conflict in mm/Makefile due to removal of swap token code changing a line next to the new frontswap entry. This pull request came in before the merge window even opened, it got delayed to after the merge window by me just wanting to make sure it had actual users. Apparently IBM is using this on their embedded side, and Jan Beulich says that it's already made available for SLES and OpenSUSE users. Also acked by Rik van Riel, and Konrad points to other people liking it too. So in it goes. By Dan Magenheimer (4) and Konrad Rzeszutek Wilk (2) via Konrad Rzeszutek Wilk * tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm: frontswap: s/put_page/store/g s/get_page/load MAINTAINER: Add myself for the frontswap API mm: frontswap: config and doc files mm: frontswap: core frontswap functionality mm: frontswap: core swap subsystem hooks and headers mm: frontswap: add frontswap header file
2012-06-04Merge branches 'irq-urgent-for-linus' and 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq and smpboot updates from Thomas Gleixner: "Just cleanup patches with no functional change and a fix for suspend issues." * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Introduce irq_do_set_affinity() to reduce duplicated code genirq: Add IRQS_PENDING for nested and simple irq * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smpboot, idle: Fix comment mismatch over idle_threads_init() smpboot, idle: Optimize calls to smp_processor_id() in idle_threads_init()
2012-06-04Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The clocksource driver is pure hardware enablement and the skew option is default off, well tested and non dangerous." * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: Move skew_tick option into the HIGH_RES_TIMER section clocksource: em_sti: Add DT support clocksource: em_sti: Emma Mobile STI driver clockevents: Make clockevents_config() a global symbol tick: Add tick skew boot option
2012-06-04vfs: Fix /proc/<tid>/fdinfo/<fd> file handlingLinus Torvalds
Cyrill Gorcunov reports that I broke the fdinfo files with commit 30a08bf2d31d ("proc: move fd symlink i_mode calculations into tid_fd_revalidate()"), and he's quite right. The tid_fd_revalidate() function is not just used for the <tid>/fd symlinks, it's also used for the <tid>/fdinfo/<fd> files, and the permission model for those are different. So do the dynamic symlink permission handling just for symlinks, making the fdinfo files once more appear as the proper regular files they are. Of course, Al Viro argued (probably correctly) that we shouldn't do the symlink permission games at all, and make the symlinks always just be the normal 'lrwxrwxrwx'. That would have avoided this issue too, but since somebody noticed that the permissions had changed (which was the reason for that original commit 30a08bf2d31d in the first place), people do apparently use this feature. [ Basically, you can use the symlink permission data as a cheap "fdinfo" replacement, since you see whether the file is open for reading and/or writing by just looking at st_mode of the symlink. So the feature does make sense, even if the pain it has caused means we probably shouldn't have done it to begin with. ] Reported-and-tested-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-04i2c: Add generic I2C multiplexer using pinctrl APIStephen Warren
This is useful for SoCs whose I2C module's signals can be routed to different sets of pins at run-time, using the pinctrl API. +-----+ +-----+ | dev | | dev | +------------------------+ +-----+ +-----+ | SoC | | | | /----|------+--------+ | +---+ +------+ | child bus A, on first set of pins | |I2C|---|Pinmux| | | +---+ +------+ | child bus B, on second set of pins | \----|------+--------+--------+ | | | | | +------------------------+ +-----+ +-----+ +-----+ | dev | | dev | | dev | +-----+ +-----+ +-----+ Signed-off-by: Stephen Warren <swarren@nvidia.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
2012-06-04ARM: dma-mapping: remove unconditional dependency on CMAMarek Szyprowski
CMA has been enabled unconditionally on all ARMv6+ systems to solve the long standing issue of double kernel mappings for all dma coherent buffers. This however created a dependency on CONFIG_EXPERIMENTAL for the whole ARM architecture what should be really avoided. This patch removes this dependency and lets one use old, well-tested dma-mapping implementation also on ARMv6+ systems without the need to use EXPERIMENTAL stuff. Reported-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2012-06-04gpio/samsung: fix the typo 'exynos5_xxx' instead of 'exonys5_xxx'Kukjin Kim
Should be 'exynos5_xxx' instead of 'exonys5_xxx'. It happened at the commit 30b842889eea ("Merge tag 'soc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc") during v3.5 merge window. Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> [ My bad - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-04Merge branch 'pm-acpi' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull some left-over PM patches from Rafael J. Wysocki. * 'pm-acpi' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PM: Make acpi_pm_device_sleep_state() follow the specification ACPI / PM: Make __acpi_bus_get_power() cover D3cold correctly ACPI / PM: Fix error messages in drivers/acpi/bus.c rtc-cmos / PM: report wakeup event on ACPI RTC alarm ACPI / PM: Generate wakeup events on fixed power button
2012-06-04Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks"Linus Torvalds
This reverts commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199. That commit seems to be the cause of the mm compation list corruption issues that Dave Jones reported. The locking (or rather, absense there-of) is dubious, as is the use of the 'page' variable once it has been found to be outside the pageblock range. So revert it for now, we can re-visit this for 3.6. If we even need to: as Minchan Kim says, "The patch wasn't a bug fix and even test workload was very theoretical". Reported-and-tested-by: Dave Jones <davej@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-04mm: fix warning in __set_page_dirty_nobuffersHugh Dickins
New tmpfs use of !PageUptodate pages for fallocate() is triggering the WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers() is called from migrate_page_copy() for compaction. It is anomalous that migration should use __set_page_dirty_nobuffers() on an address_space that does not participate in dirty and writeback accounting; and this has also been observed to insert surprising dirty tags into a tmpfs radix_tree, despite tmpfs not using tags at all. We should probably give migrate_page_copy() a better way to preserve the tag and migrate accounting info, when mapping_cap_account_dirty(). But that needs some more work: so in the interim, avoid the warning by using a simple SetPageDirty on PageSwapBacked pages. Reported-and-tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-03vfs: move inode stat information closer togetherLinus Torvalds
The comment above it says "Stat data, not accessed from path walking", but in fact some of inode fields we use for the common stat data was way down at the end of the inode, causing unnecessary cache misses for the common stat operations. The inode structure is pretty big, and this can change padding depending on field width, but at least on the common 64-bit configurations this doesn't change the size. Some of our inode layout has historically been to tro to avoid unnecessary padding fields, but cache locality is at least as important for layout, if not more. Noticed by looking at kernel profiles, and noticing that the "i_blkbits" access stood out like a sore thumb. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-03target/file: Use O_DSYNC by default for FILEIO backendsNicholas Bellinger
Convert to use O_DSYNC for all cases at FILEIO backend creation time to avoid the extra syncing of pure timestamp updates with legacy O_SYNC during default operation as recommended by hch. Continue to do this independently of Write Cache Enable (WCE) bit, as WCE=0 is currently the default for all backend devices and enabled by user on per device basis via attrib/emulate_write_cache. This patch drops the now unnecessary fd_buffered_io= token usage that was originally signalling when to explictly disable O_SYNC at backend creation time for buffered I/O operation. This can end up being dangerous for a number of reasons during physical node failure, so go ahead and drop this option for now when O_DSYNC is used as the default. Also allow explict FUA WRITEs -> vfs_fsync_range() call to function in fd_execute_cmd() independently of WCE bit setting. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-06-03Linux 3.5-rc1Linus Torvalds
2012-06-03Merge tag 'dm-3.5-changes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull device-mapper updates from Alasdair G Kergon: "Improve multipath's retrying mechanism in some defined circumstances and provide a simple reserve/release mechanism for userspace tools to access thin provisioning metadata while the pool is in use." * tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm thin: provide userspace access to pool metadata dm thin: use slab mempools dm mpath: allow ioctls to trigger pg init dm mpath: delay retry of bypassed pg dm mpath: reduce size of struct multipath
2012-06-02dm thin: provide userspace access to pool metadataJoe Thornber
This patch implements two new messages that can be sent to the thin pool target allowing it to take a snapshot of the _metadata_. This, read-only snapshot can be accessed by userland, concurrently with the live target. Only one metadata snapshot can be held at a time. The pool's status line will give the block location for the current msnap. Since version 0.1.5 of the userland thin provisioning tools, the thin_dump program displays the msnap as follows: thin_dump -m <msnap root> <metadata dev> Available here: https://github.com/jthornber/thin-provisioning-tools Now that userland can access the metadata we can do various things that have traditionally been kernel side tasks: i) Incremental backups. By using metadata snapshots we can work out what blocks have changed over time. Combined with data snapshots we can ensure the data doesn't change while we back it up. A short proof of concept script can be found here: https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb ii) Migration of thin devices from one pool to another. iii) Merging snapshots back into an external origin. iv) Asyncronous replication. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-06-02dm thin: use slab mempoolsMike Snitzer
Use dedicated caches prefixed with a "dm_" name rather than relying on kmalloc mempools backed by generic slab caches so the memory usage of thin provisioning (and any leaks) can be accounted for independently. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-06-02dm mpath: allow ioctls to trigger pg initMikulas Patocka
After the failure of a group of paths, any alternative paths that need initialising do not become available until further I/O is sent to the device. Until this has happened, ioctls return -EAGAIN. With this patch, new paths are made available in response to an ioctl too. The processing of the ioctl gets delayed until this has happened. Instead of returning an error, we submit a work item to kmultipathd (that will potentially activate the new path) and retry in ten milliseconds. Note that the patch doesn't retry an ioctl if the ioctl itself fails due to a path failure. Such retries should be handled intelligently by the code that generated the ioctl in the first place, noting that some SCSI commands should not be retried because they are not idempotent (XOR write commands). For commands that could be retried, there is a danger that if the device rejected the SCSI command, the path could be errorneously marked as failed, and the request would be retried on another path which might fail too. It can be determined if the failure happens on the device or on the SCSI controller, but there is no guarantee that all SCSI drivers set these flags correctly. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-06-02dm mpath: delay retry of bypassed pgMike Christie
If I/O needs retrying and only bypassed priority groups are available, set the pg_init_delay_retry flag to wait before retrying. If, for example, the reason for the bypass is that the controller is getting reset or there is a firmware upgrade happening, retrying right away would cause a flood of log messages and retries for what could be a few seconds or even several minutes. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-06-02dm mpath: reduce size of struct multipathMike Snitzer
Move multipath structure's 'lock' and 'queue_size' members to eliminate two 4-byte holes. Also use a bit within a single unsigned int for each existing flag (saves 8-bytes). This allows future flags to be added without each consuming an unsigned int. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>