summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2012-10-11Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linuxLinus Torvalds
Pull i2c-embedded changes from Wolfram Sang: "The changes for i2c-embedded include: - massive rework of the omap driver - massive rework of the at91 driver. In fact, the old driver gets removed; I am okay with this approach since the old driver was depending on BROKEN and its limitations made it practically unusable, so people used bitbanging instead. But even if there are users, there is no platform_data or module parameter which would need to be converted. It is just another driver doing I2C transfers, just way better. Modifications of arch/arm/at91 related files have proper acks from the maintainer. - new driver for R-Car I2C - devicetree and generic_clock conversions and fixes - usual driver fixes and changes. The rework patches have come a long way and lots of people have been involved in creating/testing them. Most patches have been in linux-next at least since 3.6-rc5. A few have been added in the last week, I have to admit. An unexpected (but welcome :)) peak in private life is the cause for that. The "late" patches shouldn't cause any merge conflicts and I will have a special eye on them during the stabilization phase. This is an exception and I want to have the patches in place properly in time again for the next kernels." * 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux: (44 commits) MXS: Implement DMA support into mxs-i2c i2c: add Renesas R-Car I2C driver i2c: s3c2410: use clk_prepare_enable and clk_disable_unprepare ARM: OMAP: convert I2C driver to PM QoS for MPU latency constraints i2c: nomadik: Add Device Tree support to the Nomadik I2C driver i2c: algo: pca: Fix chip reset function for PCA9665 i2c: mpc: Wait for STOP to hit the bus i2c: davinci: preparation for switch to common clock framework omap-i2c: fix incorrect log message when using a device tree i2c: omap: sanitize exit path i2c: omap: switch over to autosuspend API i2c: omap: remove unnecessary pm_runtime_suspended check i2c: omap: switch to threaded IRQ support i2c: omap: remove redundant status read i2c: omap: get rid of the "complete" label i2c: omap: resize fifos before each message i2c: omap: simplify IRQ exit path i2c: omap: always return IRQ_HANDLED i2c: omap: simplify errata check i2c: omap: bus: add a receiver flag ...
2012-10-11Merge branch 'akpm' (Fixups from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Followups, fixes and some random stuff I found on the internet." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (11 patches) perf: fix duplicate header inclusion memcg, kmem: fix build error when CONFIG_INET is disabled rtc: kconfig: fix RTC_INTF defaults connected to RTC_CLASS rapidio: fix comment lib/kasprintf.c: use kmalloc_track_caller() to get accurate traces for kvasprintf rapidio: update for destination ID allocation rapidio: update asynchronous discovery initialization rapidio: use msleep in discovery wait mm: compaction: fix bit ranges in {get,clear,set}_pageblock_skip() arch/powerpc/platforms/pseries/hotplug-memory.c: section removal cleanups arch/powerpc/platforms/pseries/hotplug-memory.c: fix section handling code
2012-10-11Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO update from Jens Axboe: "Core block IO bits for 3.7. Not a huge round this time, it contains: - First series from Kent cleaning up and generalizing bio allocation and freeing. - WRITE_SAME support from Martin. - Mikulas patches to prevent O_DIRECT crashes when someone changes the block size of a device. - Make bio_split() work on data-less bio's (like trim/discards). - A few other minor fixups." Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew Morton. It is due to the VM no longer using a prio-tree (see commit 6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree"). So make set_blocksize() use mapping_mapped() instead of open-coding the internal VM knowledge that has changed. * 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits) block: makes bio_split support bio without data scatterlist: refactor the sg_nents scatterlist: add sg_nents fs: fix include/percpu-rwsem.h export error percpu-rw-semaphore: fix documentation typos fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared blockdev: turn a rw semaphore into a percpu rw semaphore Fix a crash when block device is read and block size is changed at the same time block: fix request_queue->flags initialization block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() block: ioctl to zero block ranges block: Make blkdev_issue_zeroout use WRITE SAME block: Implement support for WRITE SAME block: Consolidate command flag and queue limit checks for merges block: Clean up special command handling logic block/blk-tag.c: Remove useless kfree block: remove the duplicated setting for congestion_threshold block: reject invalid queue attribute values block: Add bio_clone_bioset(), bio_clone_kmalloc() block: Consolidate bio_alloc_bioset(), bio_kmalloc() ...
2012-10-10memcg, kmem: fix build error when CONFIG_INET is disabledDavid Rientjes
Commit e1aab161e013 ("socket: initial cgroup code.") causes a build error when CONFIG_INET is disabled in Linus' tree: net/built-in.o: In function `sk_update_clone': net/core/sock.c:1336: undefined reference to `sock_update_memcg' sock_update_memcg() is only defined when CONFIG_INET is enabled, so fix it by defining the dummy function without this option. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-10rapidio: fix commentChad Reese
The resource index for the mailboxes was incorrect. Signed-off-by: Chad Reese <kreese@caviumnetworks.com> Acked-by: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-10rapidio: update for destination ID allocationAlexandre Bounine
Address comments provided by Andrew Morton: https://lkml.org/lkml/2012/10/3/550 - Keeps consistent kerneldoc compatible comments style for new static functions. - Removes unnecessary complexity from destination ID allocation routine. - Uses kcalloc() for code clarity. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-10mm: compaction: fix bit ranges in {get,clear,set}_pageblock_skip()Bartlomiej Zolnierkiewicz
{get,clear,set}_pageblock_skip() use incorrect bit ranges (please compare to bit ranges used by {get,set}_pageblock_flags() used for migration types) and can overwrite pageblock migratetype of the next pageblock in the bitmap. Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Thierry Reding <thierry.reding@avionic-design.de> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-10Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Features include: - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1 Aside from the issues discussed at the LKS, distros are shipping NFSv4.1 with all the trimmings. - Fix fdatasync()/fsync() for the corner case of a server reboot. - NFSv4 OPEN access fix: finally distinguish correctly between open-for-read and open-for-execute permissions in all situations. - Ensure that the TCP socket is closed when we're in CLOSE_WAIT - More idmapper bugfixes - Lots of pNFS bugfixes and cleanups to remove unnecessary state and make the code easier to read. - In cases where a pNFS read or write fails, allow the client to resume trying layoutgets after two minutes of read/write- through-mds. - More net namespace fixes to the NFSv4 callback code. - More net namespace fixes to the NFSv3 locking code. - More NFSv4 migration preparatory patches. Including patches to detect network trunking in both NFSv4 and NFSv4.1 - pNFS block updates to optimise LAYOUTGET calls." * tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits) pnfsblock: cleanup nfs4_blkdev_get NFS41: send real read size in layoutget NFS41: send real write size in layoutget NFS: track direct IO left bytes NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked() NFSv4.1: Ensure that the layout sequence id stays 'close' to the current NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code NFSv4 set open access operation call flag in nfs4_init_opendata_res NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL NFSv4 reduce attribute requests for open reclaim NFSv4: nfs4_open_done first must check that GETATTR decoded a file type NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid NFSv4.1: Deal with wraparound issues when updating the layout stateid NFSv4.1: Always set the layout stateid if this is the first layoutget NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout NFSv4: don't put ACCESS in OPEN compound if O_EXCL NFSv4: don't check MAY_WRITE access bit in OPEN NFS: Set key construction data for the legacy upcall NFSv4.1: don't do two EXCHANGE_IDs on mount NFS: nfs41_walk_client_list(): re-lock before iterating ...
2012-10-10Merge tag 'for-3.7-rc1' of git://gitorious.org/linux-pwm/linux-pwmLinus Torvalds
Pull pwm changes from Thierry Reding: "All legacy PWM providers have now been moved to the PWM subsystem. The plan for 3.8 is to adapt all board files to provide a lookup table for PWM devices in order to get rid of the global namespace. Subsequently, users of the legacy pwm_request() and pwm_free() functions can be migrated to the new pwm_get() and pwm_put() functions. Once this has been completed, the legacy API and the compatibility code in the core can be removed. In addition to the above, these changes also add support for configuring the polarity of a PWM signal (currently only supported on ECAP and EHRPWM) and include a much needed rework of the i.MX driver. Managed functions to obtain and release a PWM device (devm_pwm_get() and devm_pwm_put()) have been added and the pwm-backlight driver has been updated to use them. If the PWM subsystem hasn't been enabled, dummy functions are provided that allow the subsystem to safely compile out. Some common checks on input parameters have been moved to the core and removed from the drivers. Finally, a small fix corrects the description of the PWM specifier's second cell in the device tree representation." * tag 'for-3.7-rc1' of git://gitorious.org/linux-pwm/linux-pwm: (23 commits) pwm: dt: Fix description of second PWM cell pwm: Check for negative duty-cycle and period pwm: Add Ingenic JZ4740 support MIPS: JZ4740: Export timer API pwm: Move PUV3 PWM driver to PWM framework unicore32: pwm: Use managed resource allocations unicore32: pwm: Remove unnecessary indirection unicore32: pwm: Use module_platform_driver() unicore32: pwm: Properly remap memory-mapped registers pwm-backlight: Use devm_pwm_get() instead of pwm_get() pwm: Move AB8500 PWM driver to PWM framework pwm: Fix compilation error when CONFIG_PWM is not defined pwm: i.MX: fix clock lookup pwm: i.MX: use per clock unconditionally pwm: i.MX: add devicetree support pwm: i.MX: Use module_platform_driver pwm: i.MX: add functions to enable/disable pwm. pwm: i.MX: remove unnecessary if in pwm_[en|dis]able pwm: i.MX: factor out SoC specific functions pwm: pwm-tiehrpwm: Add support for configuring polarity of PWM ...
2012-10-10Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds Pull LED subsystem update from Bryan Wu. * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: (24 commits) leds: add output driver configuration for pca9633 led driver leds: lm3642: Use regmap_update_bits() in lm3642_chip_init() leds: Add new LED driver for lm3642 chips leds-lp5523: Fix riskiness of the page fault leds-lp5523: turn off the LED engines on unloading the driver leds-lm3530: Fix smatch warnings leds-lm3530: Use devm_regulator_get function leds: leds-gpio: adopt pinctrl support leds: Add new LED driver for lm355x chips leds-lp5523: use the i2c device id rather than fixed name leds-lp5523: add new device id for LP55231 leds-lp5523: support new LP55231 device leds: triggers: send uevent when changing triggers leds-lp5523: minor code style fixes leds-lp5523: change the return type of lp5523_set_mode() leds-lp5523: set the brightness to 0 forcely on removing the driver leds-lp5523: add channel name in the platform data leds: leds-gpio: Use of_get_child_count() helper leds: leds-gpio: Use platform_{get,set}_drvdata leds: leds-gpio: use of_match_ptr() ...
2012-10-10Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull scsi target updates from Nicholas Bellinger: "Things have been calm for the most part with no new fabric drivers in flight for v3.7 (we're up to eight now !), so this update is primarily focused on addressing a few long-standing items within target-core and iscsi-target fabric code. The highlights include: - target: Simplify fabric sense data length handling (roland) - qla2xxx: Fix endianness of task management response code (roland) - target: fix truncation of mode data, support zero allocation length (paolo) - target: Properly support zero-length commands in normal processing path (paolo) - iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT PDU (ronnie + nab) - iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode (ronnie + nab) - target/file: Re-enable optional fd_buffered_io=1 operation (nab + hch) - iscsi-target: Add MaxXmitDataSegmenthLength forr target -> initiator MDRSL declaration (nab) - target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough (nab + hch) - tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch + nab) - tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab + hch) The last series for adding a new target_submit_cmd_map_sgls() fabric caller (as requested by hch) that accepts pre-allocated SGL memory (using existing logic), along with converting tcm_loop + tcm_vhost has only been in -next for the last days, but has gotten enough review +testing and is clear enough a mechanical change that I think it's reasonable to merge for -rc1 code. Thanks again to everyone who contributed this round! Extra special thanks to Roland (PureStorage) for tracking down the qla2xxx target TMR response code endian issue, and to Paolo (Redhat) for resolving the long standing zero-length CDB issues within target-core between virtual and pSCSI backends." * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits) iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values iscsit: proper endianess conversions iscsit: use the itt_t abstract type iscsit: add missing endianess conversion in iscsit_check_inaddr_any iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp iscsit: mark various functions static target/iscsi: precedence bug in iscsit_set_dataout_sequence_values() target/usb-gadget: strlen() doesn't count the terminator target/usb-gadget: remove duplicate initialization tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls target: Add control CDB READ payload zero work-around tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength iscsi-target: Add MaxXmitDataSegmentLength connection recovery check iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength iscsi-target: Enable MaxXmitDataSegmentLength operation in login path iscsi-target: Add base MaxXmitDataSegmentLength code target/file: Re-enable optional fd_buffered_io=1 operation ...
2012-10-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull generic execve() changes from Al Viro: "This introduces the generic kernel_thread() and kernel_execve() functions, and switches x86, arm, alpha, um and s390 over to them." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits) s390: convert to generic kernel_execve() s390: switch to generic kernel_thread() s390: fold kernel_thread_helper() into ret_from_fork() s390: fold execve_tail() into start_thread(), convert to generic sys_execve() um: switch to generic kernel_thread() x86, um/x86: switch to generic sys_execve and kernel_execve x86: split ret_from_fork alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve() alpha: switch to generic kernel_thread() alpha: switch to generic sys_execve() arm: get rid of execve wrapper, switch to generic execve() implementation arm: optimized current_pt_regs() arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve() arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk] generic sys_execve() generic kernel_execve() new helper: current_pt_regs() preparation for generic kernel_thread() um: kill thread->forking um: let signal_delivered() do SIGTRAP on singlestepping into handler ...
2012-10-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking updates from David Miller: 1) UAPI changes for networking from David Howells 2) A netlink dump is an operation we can sleep within, and therefore we need to make sure the dump provider module doesn't disappear on us meanwhile. Fix from Gao Feng. 3) Now that tunnels support GRO, we have to be more careful in skb_gro_reset_offset() otherwise we OOPS, from Eric Dumazet. 4) We can end up processing packets for VLANs we aren't actually configured to be on, fix from Florian Zumbiehl. 5) Fix routing cache removal regression in redirects and IPVS. The core issue on the IPVS side is that it wants to rewrite who the nexthop is and we have to explicitly accomodate that case. From Julian Anastasov. 6) Error code return fixes all over the networking drivers from Peter Senna Tschudin. 7) Fix routing cache removal regressions in IPSEC, from Steffen Klassert. 8) Fix deadlock in RDS during pings, from Jeff Liu. 9) Neighbour packet queue can trigger skb_under_panic() because we do not reset the network header of the SKB in the right spot. From Ramesh Nagappa. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits) RDS: fix rds-ping spinlock recursion netdev/phy: Prototype of_mdio_find_bus() farsync: fix support for over 30 cards be2net: Remove code that stops further access to BE NIC based on UE bits pch_gbe: Fix build error by selecting all the possible dependencies. e1000e: add device IDs for i218 ixgbe/ixgbevf: Limit maximum jumbo frame size to 9.5K to avoid Tx hangs ixgbevf: Set the netdev number of Tx queues UAPI: (Scripted) Disintegrate include/linux/tc_ematch UAPI: (Scripted) Disintegrate include/linux/tc_act UAPI: (Scripted) Disintegrate include/linux/netfilter_ipv6 UAPI: (Scripted) Disintegrate include/linux/netfilter_ipv4 UAPI: (Scripted) Disintegrate include/linux/netfilter_bridge UAPI: (Scripted) Disintegrate include/linux/netfilter_arp UAPI: (Scripted) Disintegrate include/linux/netfilter/ipset UAPI: (Scripted) Disintegrate include/linux/netfilter UAPI: (Scripted) Disintegrate include/linux/isdn UAPI: (Scripted) Disintegrate include/linux/caif net: fix typo in freescale/ucc_geth.c vxlan: fix more sparse warnings ...
2012-10-10Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull slave-dmaengine updates from Vinod Koul: "This time we have Andy updates on dw_dmac which is attempting to make this IP block available as PCI and platform device though not fully complete this time. We also have TI EDMA moving the dma driver to use dmaengine APIs, also have a new driver for mmp-tdma, along with bunch of small updates. Now for your excitement the merge is little unusual here, while merging the auto merge on linux-next picks wrong choice for pl330 (drivers/dma/pl330.c) and this causes build failure. The correct resolution is in linux-next. (DMA: PL330: Fix build error) I didn't back merge your tree this time as you are better than me so no point in doing that for me :)" Fixed the pl330 conflict as in linux-next, along with trivial header file conflicts due to changed includes. * 'next' of git://git.infradead.org/users/vkoul/slave-dma: (29 commits) dma: tegra: fix interrupt name issue with apb dma. dw_dmac: fix a regression in dwc_prep_dma_memcpy dw_dmac: introduce software emulation of LLP transfers dw_dmac: autoconfigure data_width or get it via platform data dw_dmac: autoconfigure block_size or use platform data dw_dmac: get number of channels from hardware if possible dw_dmac: fill optional encoded parameters in register structure dw_dmac: mark dwc_dump_chan_regs as inline DMA: PL330: return ENOMEM instead of 0 from pl330_alloc_chan_resources DMA: PL330: Remove redundant runtime_suspend/resume functions DMA: PL330: Remove controller clock enable/disable dmaengine: use kmem_cache_zalloc instead of kmem_cache_alloc/memset DMA: PL330: Set the capability of pdm0 and pdm1 as DMA_PRIVATE ARM: EXYNOS: Set the capability of pdm0 and pdm1 as DMA_PRIVATE dma: tegra: use list_move_tail instead of list_del/list_add_tail mxs/dma: Enlarge the CCW descriptor area to 4 pages dw_dmac: utilize slave_id to pass request line dmaengine: mmp_tdma: add dt support dmaengine: mmp-pdma support spi: davici - make davinci select edma ...
2012-10-10Merge tag 'mmc-merge-for-3.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc Pull MMC updates from Chris Ball: "Core: - Add DT properties for card detection (broken-cd, cd-gpios, non-removable) - Don't poll non-removable devices - Fixup/rework eMMC sleep mode/"power off notify" feature - Support eMMC background operations (BKOPS). To set the one-time programmable fuse that enables bkops on an eMMC that doesn't already have it set, you can use the "mmc bkops enable" command in: git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc-utils.git Drivers: - atmel-mci, dw_mmc, pxa-mci, dove, s3c, spear: Add device tree support - bfin_sdh: Add support for the controller in bf60x - dw_mmc: Support Samsung Exynos SoCs - eSDHC: Add ADMA support - sdhci: Support testing a cd-gpio (from slot-gpio) instead of presence bit - sdhci-pltfm: Support broken-cd DT property - tegra: Convert to only supporting DT (mach-tegra has gone DT-only)" * tag 'mmc-merge-for-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (67 commits) mmc: core: Fixup broken suspend and eMMC4.5 power off notify mmc: sdhci-spear: Add clk_{un}prepare() support mmc: sdhci-spear: add device tree bindings mmc: sdhci-s3c: Add clk_(enable/disable) in runtime suspend/resume mmc: core: Replace MMC_CAP2_BROKEN_VOLTAGE with test for fixed regulator mmc: sdhci-pxav3: Use sdhci_get_of_property for parsing DT quirks mmc: dt: Support "broken-cd" property in sdhci-pltfm mmc: sdhci-s3c: fix the wrong number of max bus clocks mmc: sh-mmcif: avoid oops on spurious interrupts mmc: sh-mmcif: properly handle MMC_WRITE_MULTIPLE_BLOCK completion IRQ mmc: sdhci-s3c: Fix crash on module insertion for second time mmc: sdhci-s3c: Enable only required bus clock mmc: Revert "mmc: dw_mmc: Add check for IDMAC configuration" mmc: mxcmmc: fix bug that may block a data transfer forever mmc: omap_hsmmc: Pass on the suspend failure to the PM core mmc: atmel-mci: AP700x PDC is not connected to MCI mmc: atmel-mci: DMA can be used with other controllers mmc: mmci: use clk_prepare_enable and clk_disable_unprepare mmc: sdhci-s3c: Add device tree support mmc: dw_mmc: add support for exynos specific implementation of dw-mshc ...
2012-10-10Merge tag 'for-linus-20121009' of git://git.infradead.org/mtd-2.6Linus Torvalds
Pull MTD updates from David Woodhouse: - Disable broken mtdchar mmap() on MMU systems - Additional ECC tests for NAND flash, and some test cleanups - New NAND and SPI chip support - Fixes/cleanup for SH FLCTL NAND controller driver - Improved hardware support for GPMI NAND controller - Conversions to device-tree support for various drivers - Removal of obsolete drivers (sbc8xxx, bcmring, etc.) - New LPC32xx drivers for MLC and SLC NAND - Further cleanup of NAND OOB/ECC handling - UAPI cleanup merge from David Howells (just moving files, since MTD headers were sorted out long ago to separate user-visible from kernel bits) * tag 'for-linus-20121009' of git://git.infradead.org/mtd-2.6: (168 commits) mtd: Disable mtdchar mmap on MMU systems UAPI: (Scripted) Disintegrate include/mtd mtd: nand: detect Samsung K9GBG08U0A, K9GAG08U0F ID mtd: nand: decode Hynix MLC, 6-byte ID length mtd: nand: increase max OOB size to 640 mtd: nand: add generic READ ID length calculation functions mtd: nand: split simple ID decode into its own function mtd: nand: split extended ID decoding into its own function mtd: nand: split BB marker options decoding into its own function mtd: nand: remove redundant ID read mtd: nand: remove unnecessary variable mtd: docg4: add missing HAS_IOMEM dependency mtd: gpmi: initialize the timing registers only one time mtd: gpmi: add EDO feature for imx6q mtd: gpmi: do not set the default values for the extra clocks mtd: gpmi: simplify the DLL setting code mtd: gpmi: add a new field for HW_GPMI_CTRL1 mtd: gpmi: do not get the clock frequency in gpmi_begin() mtd: gpmi: add a new field for HW_GPMI_TIMING1 mtd: add helpers to get the supportted ONFI timing mode ...
2012-10-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "This is a large pull, with the bulk of the updates coming from: - Hole punching - send/receive fixes - fsync performance - Disk format extension allowing more hardlinks inside a single directory (btrfs-progs patch required to enable the compat bit for this one) I'm cooking more unrelated RAID code, but I wanted to make sure this original batch makes it in. The largest updates here are relatively old and have been in testing for some time." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (121 commits) btrfs: init ref_index to zero in add_inode_ref Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_buffer Btrfs: fix page leakage Btrfs: do not warn_on when we cannot alloc a page for an extent buffer Btrfs: don't bug on enomem in readpage Btrfs: cleanup pages properly when ENOMEM in compression Btrfs: make filesystem read-only when submitting barrier fails Btrfs: detect corrupted filesystem after write I/O errors Btrfs: make compress and nodatacow mount options mutually exclusive btrfs: fix message printing Btrfs: don't bother committing delayed inode updates when fsyncing btrfs: move inline function code to header file Btrfs: remove unnecessary IS_ERR in bio_readpage_error() btrfs: remove unused function btrfs_insert_some_items() Btrfs: don't commit instead of overcommitting Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called Btrfs: be smarter about dropping things from the tree log Btrfs: don't lookup csums for prealloc extents Btrfs: cache extent state when writing out dirty metadata pages Btrfs: do not hold the file extent leaf locked when adding extent item ...
2012-10-09Merge tag 'disintegrate-isdn-20121009' of ↵David S. Miller
git://git.infradead.org/users/dhowells/linux-headers UAPI Disintegration 2012-10-09 Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09Merge tag 'disintegrate-net-20121009' of ↵David S. Miller
git://git.infradead.org/users/dhowells/linux-headers UAPI Disintegration 2012-10-09 Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller
Pulled mainline in order to get the UAPI infrastructure already merged before I pull in David Howells's UAPI trees for networking. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09Merge tag 'disintegrate-mtd-20121009' of ↵David Woodhouse
git://git.infradead.org/users/dhowells/linux-headers UAPI Disintegration 2012-10-09 Conflicts: MAINTAINERS arch/arm/configs/bcmring_defconfig arch/arm/mach-imx/clk-imx51-imx53.c drivers/mtd/nand/Kconfig drivers/mtd/nand/bcm_umi_nand.c drivers/mtd/nand/nand_bcm_umi.h drivers/mtd/nand/orion_nand.c
2012-10-09UAPI: (Scripted) Disintegrate include/mtdDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/tc_ematchDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/tc_actDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/netfilter_ipv6David Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/netfilter_ipv4David Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/netfilter_bridgeDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/netfilter_arpDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/netfilter/ipsetDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/netfilterDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/isdnDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09UAPI: (Scripted) Disintegrate include/linux/caifDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge patches from Andrew Morton: "A few misc things and very nearly all of the MM tree. A tremendous amount of stuff (again), including a significant rbtree library rework." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits) sparc64: Support transparent huge pages. mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd(). mm: Add and use update_mmu_cache_pmd() in transparent huge page code. sparc64: Document PGD and PMD layout. sparc64: Eliminate PTE table memory wastage. sparc64: Halve the size of PTE tables sparc64: Only support 4MB huge pages and 8KB base pages. memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning mm: memcg: clean up mm_match_cgroup() signature mm: document PageHuge somewhat mm: use %pK for /proc/vmallocinfo mm, thp: fix mlock statistics mm, thp: fix mapped pages avoiding unevictable list on mlock memory-hotplug: update memory block's state and notify userspace memory-hotplug: preparation to notify memory block's state at memory hot remove mm: avoid section mismatch warning for memblock_type_name make GFP_NOTRACK definition unconditional cma: decrease cc.nr_migratepages after reclaiming pagelist CMA: migrate mlocked pages kpageflags: fix wrong KPF_THP on non-huge compound pages ...
2012-10-09mm: memcg: clean up mm_match_cgroup() signatureJohannes Weiner
It really should return a boolean for match/no match. And since it takes a memcg, not a cgroup, fix that parameter name as well. [akpm@linux-foundation.org: mm_match_cgroup() is not a macro] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm, thp: fix mapped pages avoiding unevictable list on mlockDavid Rientjes
When a transparent hugepage is mapped and it is included in an mlock() range, follow_page() incorrectly avoids setting the page's mlock bit and moving it to the unevictable lru. This is evident if you try to mlock(), munlock(), and then mlock() a range again. Currently: #define MAP_SIZE (4 << 30) /* 4GB */ void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); mlock(ptr, MAP_SIZE); $ grep -E "Unevictable|Inactive\(anon" /proc/meminfo Inactive(anon): 6304 kB Unevictable: 4213924 kB munlock(ptr, MAP_SIZE); Inactive(anon): 4186252 kB Unevictable: 19652 kB mlock(ptr, MAP_SIZE); Inactive(anon): 4198556 kB Unevictable: 21684 kB Notice that less than 2MB was added to the unevictable list; this is because these pages in the range are not transparent hugepages since the 4GB range was allocated with mmap() and has no specific alignment. If posix_memalign() were used instead, unevictable would not have grown at all on the second mlock(). The fix is to call mlock_vma_page() so that the mlock bit is set and the page is added to the unevictable list. With this patch: mlock(ptr, MAP_SIZE); Inactive(anon): 4056 kB Unevictable: 4213940 kB munlock(ptr, MAP_SIZE); Inactive(anon): 4198268 kB Unevictable: 19636 kB mlock(ptr, MAP_SIZE); Inactive(anon): 4008 kB Unevictable: 4213940 kB Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09memory-hotplug: update memory block's state and notify userspaceWen Congyang
remove_memory() will be called when hot removing a memory device. But even if offlining memory, we cannot notice it. So the patch updates the memory block's state and sends notification to userspace. Additionally, the memory device may contain more than one memory block. If the memory block has been offlined, __offline_pages() will fail. So we should try to offline one memory block at a time. Thus remove_memory() also check each memory block's state. So there is no need to check the memory block's state before calling remove_memory(). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09memory-hotplug: preparation to notify memory block's state at memory hot removeWen Congyang
remove_memory() is called in two cases: 1. echo offline >/sys/devices/system/memory/memoryXX/state 2. hot remove a memory device In the 1st case, the memory block's state is changed and the notification that memory block's state changed is sent to userland after calling remove_memory(). So user can notice memory block is changed. But in the 2nd case, the memory block's state is not changed and the notification is not also sent to userspcae even if calling remove_memory(). So user cannot notice memory block is changed. For adding the notification at memory hot remove, the patch just prepare as follows: 1st case uses offline_pages() for offlining memory. 2nd case uses remove_memory() for offlining memory and changing memory block's state and notifing the information. The patch does not implement notification to remove_memory(). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09make GFP_NOTRACK definition unconditionalGlauber Costa
There was a general sentiment in a recent discussion (See https://lkml.org/lkml/2012/9/18/258) that the __GFP flags should be defined unconditionally. Currently, the only offender is GFP_NOTRACK, which is conditional to KMEMCHECK. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09CMA: migrate mlocked pagesMinchan Kim
Presently CMA cannot migrate mlocked pages so it ends up failing to allocate contiguous memory space. This patch makes mlocked pages be migrated out. Of course, it can affect realtime processes but in CMA usecase, contiguous memory allocation failing is far worse than access latency to an mlocked page being variable while CMA is running. If someone wants to make the system realtime, he shouldn't enable CMA because stalls can still happen at random times. [akpm@linux-foundation.org: tweak comment text, per Mel] Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: remove unevictable_pgs_mlockfreedHugh Dickins
Simply remove UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed line from /proc/vmstat: Johannes and Mel point out that it was very unlikely to have been used by any tool, and of course we can restore it easily enough if that turns out to be wrong. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Cc: Ying Han <yinghan@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09memory-hotplug: fix zone stat mismatchMinchan Kim
During memory-hotplug, I found NR_ISOLATED_[ANON|FILE] are increasing, causing the kernel to hang. When the system doesn't have enough free pages, it enters reclaim but never reclaim any pages due to too_many_isolated()==true and loops forever. The cause is that when we do memory-hotadd after memory-remove, __zone_pcp_update() clears a zone's ZONE_STAT_ITEMS in setup_pageset() although the vm_stat_diff of all CPUs still have values. In addtion, when we offline all pages of the zone, we reset them in zone_pcp_reset without draining so we loss some zone stat item. Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: move all mmu notifier invocations to be done outside the PT lockSagi Grimberg
In order to allow sleeping during mmu notifier calls, we need to avoid invoking them under the page table spinlock. This patch solves the problem by calling invalidate_page notification after releasing the lock (but before freeing the page itself), or by wrapping the page invalidation with calls to invalidate_range_begin and invalidate_range_end. To prevent accidental changes to the invalidate_range_end arguments after the call to invalidate_range_begin, the patch introduces a convention of saving the arguments in consistently named locals: unsigned long mmun_start; /* For mmu_notifiers */ unsigned long mmun_end; /* For mmu_notifiers */ ... mmun_start = ... mmun_end = ... mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); ... mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); The patch changes code to use this convention for all calls to mmu_notifier_invalidate_range_start/end, except those where the calls are close enough so that anyone who glances at the code can see the values aren't changing. This patchset is a preliminary step towards on-demand paging design to be added to the RDMA stack. Why do we want on-demand paging for Infiniband? Applications register memory with an RDMA adapter using system calls, and subsequently post IO operations that refer to the corresponding virtual addresses directly to HW. Until now, this was achieved by pinning the memory during the registration calls. The goal of on demand paging is to avoid pinning the pages of registered memory regions (MRs). This will allow users the same flexibility they get when swapping any other part of their processes address spaces. Instead of requiring the entire MR to fit in physical memory, we can allow the MR to be larger, and only fit the current working set in physical memory. Why should anyone care? What problems are users currently experiencing? This can make programming with RDMA much simpler. Today, developers that are working with more data than their RAM can hold need either to deregister and reregister memory regions throughout their process's life, or keep a single memory region and copy the data to it. On demand paging will allow these developers to register a single MR at the beginning of their process's life, and let the operating system manage which pages needs to be fetched at a given time. In the future, we might be able to provide a single memory access key for each process that would provide the entire process's address as one large memory region, and the developers wouldn't need to register memory regions at all. Is there any prospect that any other subsystems will utilise these infrastructural changes? If so, which and how, etc? As for other subsystems, I understand that XPMEM wanted to sleep in MMU notifiers, as Christoph Lameter wrote at http://lkml.indiana.edu/hypermail/linux/kernel/0802.1/0460.html and perhaps Andrea knows about other use cases. Scheduling in mmu notifications is required since we need to sync the hardware with the secondary page tables change. A TLB flush of an IO device is inherently slower than a CPU TLB flush, so our design works by sending the invalidation request to the device, and waiting for an interrupt before exiting the mmu notifier handler. Avi said: kvm may be a buyer. kvm::mmu_lock, which serializes guest page faults, also protects long operations such as destroying large ranges. It would be good to convert it into a spinlock, but as it is used inside mmu notifiers, this cannot be done. (there are alternatives, such as keeping the spinlock and using a generation counter to do the teardown in O(1), which is what the "may" is doing up there). [akpm@linux-foundation.orgpossible speed tweak in hugetlb_cow(), cleanups] Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Liran Liss <liranl@mellanox.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm, numa: reclaim from all nodes within reclaim distanceDavid Rientjes
RECLAIM_DISTANCE represents the distance between nodes at which it is deemed too costly to allocate from; it's preferred to try to reclaim from a local zone before falling back to allocating on a remote node with such a distance. To do this, zone_reclaim_mode is set if the distance between any two nodes on the system is greather than this distance. This, however, ends up causing the page allocator to reclaim from every zone regardless of its affinity. What we really want is to reclaim only from zones that are closer than RECLAIM_DISTANCE. This patch adds a nodemask to each node that represents the set of nodes that are within this distance. During the zone iteration, if the bit for a zone's node is set for the local node, then reclaim is attempted; otherwise, the zone is skipped. [akpm@linux-foundation.org: fix CONFIG_NUMA=n build] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: remove free_page_mlockHugh Dickins
We should not be seeing non-0 unevictable_pgs_mlockfreed any longer. So remove free_page_mlock() from the page freeing paths: __PG_MLOCKED is already in PAGE_FLAGS_CHECK_AT_FREE, so free_pages_check() will now be checking it, reporting "BUG: Bad page state" if it's ever found set. Comment UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed always 0. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: remove vma arg from page_evictableHugh Dickins
page_evictable(page, vma) is an irritant: almost all its callers pass NULL for vma. Remove the vma arg and use mlocked_vma_newpage(vma, page) explicitly in the couple of places it's needed. But in those places we don't even need page_evictable() itself! They're dealing with a freshly allocated anonymous page, which has no "mapping" and cannot be mlocked yet. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: fix-up zone present pagesJianguo Wu
I think zone->present_pages indicates pages that buddy system can management, it should be: zone->present_pages = spanned pages - absent pages - bootmem pages, but is now: zone->present_pages = spanned pages - absent pages - memmap pages. spanned pages: total size, including holes. absent pages: holes. bootmem pages: pages used in system boot, managed by bootmem allocator. memmap pages: pages used by page structs. This may cause zone->present_pages less than it should be. For example, numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's present_pages should be spanned pages - absent pages, but now it also minus memmap pages(free_area_init_core), which are actually allocated from ZONE_MOVABLE. When offlining all memory of a zone, this will cause zone->present_pages less than 0, because present_pages is unsigned long type, it is actually a very large integer, it indirectly caused zone->watermark[WMARK_MIN] becomes a large integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a large integer(calculate_totalreserve_pages()), and finally cause memory allocating failure when fork process(__vm_enough_memory()). [root@localhost ~]# dmesg -bash: fork: Cannot allocate memory I think the bug described in http://marc.info/?l=linux-mm&m=134502182714186&w=2 is also caused by wrong zone present pages. This patch intends to fix-up zone->present_pages when memory are freed to buddy system on x86_64 and IA64 platforms. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Reported-by: Petr Tesarik <ptesarik@suse.cz> Tested-by: Petr Tesarik <ptesarik@suse.cz> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: thp: fix the pmd_clear() arguments in pmdp_get_and_clear()Catalin Marinas
The CONFIG_TRANSPARENT_HUGEPAGE implementation of pmdp_get_and_clear() calls pmd_clear() with 3 arguments instead of 1. This happens only for !__HAVE_ARCH_PMDP_GET_AND_CLEAR which doesn't seem to happen because x86 defines this and it uses pmd_update. [mhocko@suse.cz: changelog addition] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm/page_alloc: refactor out __alloc_contig_migrate_alloc()Minchan Kim
__alloc_contig_migrate_alloc() can be used by memory-hotplug so refactor it out (move + rename as a common name) into page_isolation.c. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: compaction: clear PG_migrate_skip based on compaction and reclaim activityMel Gorman
Compaction caches if a pageblock was scanned and no pages were isolated so that the pageblocks can be skipped in the future to reduce scanning. This information is not cleared by the page allocator based on activity due to the impact it would have to the page allocator fast paths. Hence there is a requirement that something clear the cache or pageblocks will be skipped forever. Currently the cache is cleared if there were a number of recent allocation failures and it has not been cleared within the last 5 seconds. Time-based decisions like this are terrible as they have no relationship to VM activity and is basically a big hammer. Unfortunately, accurate heuristics would add cost to some hot paths so this patch implements a rough heuristic. There are two cases where the cache is cleared. 1. If a !kswapd process completes a compaction cycle (migrate and free scanner meet), the zone is marked compact_blockskip_flush. When kswapd goes to sleep, it will clear the cache. This is expected to be the common case where the cache is cleared. It does not really matter if kswapd happens to be asleep or going to sleep when the flag is set as it will be woken on the next allocation request. 2. If there have been multiple failures recently and compaction just finished being deferred then a process will clear the cache and start a full scan. This situation happens if there are multiple high-order allocation requests under heavy memory pressure. The clearing of the PG_migrate_skip bits and other scans is inherently racy but the race is harmless. For allocations that can fail such as THP, they will simply fail. For requests that cannot fail, they will retry the allocation. Tests indicated that scanning rates were roughly similar to when the time-based heuristic was used and the allocation success rates were similar. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Richard Davies <richard@arachsys.com> Cc: Shaohua Li <shli@kernel.org> Cc: Avi Kivity <avi@redhat.com> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: compaction: Restart compaction from near where it left offMel Gorman
This is almost entirely based on Rik's previous patches and discussions with him about how this might be implemented. Order > 0 compaction stops when enough free pages of the correct page order have been coalesced. When doing subsequent higher order allocations, it is possible for compaction to be invoked many times. However, the compaction code always starts out looking for things to compact at the start of the zone, and for free pages to compact things to at the end of the zone. This can cause quadratic behaviour, with isolate_freepages starting at the end of the zone each time, even though previous invocations of the compaction code already filled up all free memory on that end of the zone. This can cause isolate_freepages to take enormous amounts of CPU with certain workloads on larger memory systems. This patch caches where the migration and free scanner should start from on subsequent compaction invocations using the pageblock-skip information. When compaction starts it begins from the cached restart points and will update the cached restart points until a page is isolated or a pageblock is skipped that would have been scanned by synchronous compaction. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Richard Davies <richard@arachsys.com> Cc: Shaohua Li <shli@kernel.org> Cc: Avi Kivity <avi@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>