Age | Commit message (Collapse) | Author |
|
Similar to commits:
51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
qspinlock suffers from the fact that the _Q_LOCKED_VAL store is
unordered inside the ACQUIRE of the lock.
And while this is not a problem for the regular mutual exclusive
critical section usage of spinlocks, it breaks creative locking like:
spin_lock(A) spin_lock(B)
spin_unlock_wait(B) if (!spin_is_locked(A))
do_something() do_something()
In that both CPUs can end up running do_something at the same time,
because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait()
spin_is_locked() loads (even on x86!!).
To avoid making the normal case slower, add smp_mb()s to the less used
spin_unlock_wait() / spin_is_locked() side of things to avoid this
problem.
Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
Reported-by: Giovanni Gherdovich <ggherdovich@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # v4.2 and later
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull cifs fixes from Steve French:
"Two small cifs fixes, including one spnego upcall cifs security fix
for stable"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Remove some obsolete comments
cifs: Create dedicated keyring for spnego operations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Bob Peterson:
"We've got nine patches this time:
- Abhi Das has two patches that fix a GFS2 splice issue (and an
adjustment).
- Ben Marzinski has a patch which allows the proper unmount of a GFS2
file system after hitting a withdraw error.
- I have a patch to fix a problem where GFS2 would dereference an
error value, plus three cosmetic / refactoring patches.
- Daniel DeFreez has a patch to fix two glock reference count
problems, where GFS2 was not properly "uninitializing" its glock
holder on error paths.
- Denys Vlasenko has a patch to change a function to not be inlined,
thus reducing the memory footprint of the GFS2 module"
* tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
GFS2: Refactor gfs2_remove_from_journal
GFS2: Remove allocation parms from gfs2_rbm_find
gfs2: use inode_lock/unlock instead of accessing i_mutex directly
GFS2: Add calls to gfs2_holder_uninit in two error handlers
GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
GFS2: Get rid of dead code in inode_go_demote_ok
GFS2: ignore unlock failures after withdraw
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
- Rewrite of the unflattening code to avoid recursion and lessen the
stack usage.
- Rewrite of the phandle args parsing code to get rid of the fixed args
size. This is needed for IOMMU code.
- Sync to latest dtc which adds more dts style checking. These
warnings are enabled with "W=1" compiles.
- Tegra documentation updates related to the above warnings.
- A bunch of spelling and other doc fixes.
- Various vendor prefix additions.
* tag 'devicetree-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (52 commits)
devicetree: Add Creative Technology vendor id
gpio: dt-bindings: add ibm,ppc4xx-gpio binding
of/unittest: Remove unnecessary module.h header inclusion
drivers/of: Fix build warning in populate_node()
drivers/of: Fix depth when unflattening devicetree
of: dynamic: changeset prop-update revert fix
drivers/of: Export of_detach_node()
drivers/of: Return allocated memory from of_fdt_unflatten_tree()
drivers/of: Specify parent node in of_fdt_unflatten_tree()
drivers/of: Rename unflatten_dt_node()
drivers/of: Avoid recursively calling unflatten_dt_node()
drivers/of: Split unflatten_dt_node()
of: include errno.h in of_graph.h
of: document refcount incrementation of of_get_cpu_node()
Documentation: dt: soc: fix spelling mistakes
Documentation: dt: power: fix spelling mistake
Documentation: dt: pinctrl: fix spelling mistake
Documentation: dt: opp: fix spelling mistake
Documentation: dt: net: fix spelling mistakes
Documentation: dt: mtd: fix spelling mistake
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"Primary 4.7 merge window changes
- Updates to the new Intel X722 iWARP driver
- Updates to the hfi1 driver
- Fixes for the iw_cxgb4 driver
- Misc core fixes
- Generic RDMA READ/WRITE API addition
- SRP updates
- Misc ipoib updates
- Minor mlx5 updates"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (148 commits)
IB/mlx5: Fire the CQ completion handler from tasklet
net/mlx5_core: Use tasklet for user-space CQ completion events
IB/core: Do not require CAP_NET_ADMIN for packet sniffing
IB/mlx4: Fix unaligned access in send_reply_to_slave
IB/mlx5: Report Scatter FCS device capability when supported
IB/mlx5: Add Scatter FCS support for Raw Packet QP
IB/core: Add Scatter FCS create flag
IB/core: Add Raw Scatter FCS device capability
IB/core: Add extended device capability flags
i40iw: pass hw_stats by reference rather than by value
i40iw: Remove unnecessary synchronize_irq() before free_irq()
i40iw: constify i40iw_vf_cqp_ops structure
IB/mlx5: Add UARs write-combining and non-cached mapping
IB/mlx5: Allow mapping the free running counter on PROT_EXEC
IB/mlx4: Use list_for_each_entry_safe
IB/SA: Use correct free function
IB/core: Fix a potential array overrun in CMA and SA agent
IB/core: Remove unnecessary check in ibnl_rcv_msg
IB/IWPM: Fix a potential skb leak
RDMA/nes: replace custom print_hex_dump()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
- alternative reset driver for new at91 SoCs
- misc fixes
* tag 'for-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
sbs-battery: fix power status when battery charging near dry
power: ipaq-micro-battery: freeing the wrong variable
power/max8925: freeing wrong variable
power: reset: at91-shdwc: add new shutdown controller driver
ARM: dts: at91: shdwc binding: add new shutdown controller documentation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"New Drivers:
- Add new driver for MAXIM MAX77620/MAX20024 PMIC
- Add new driver for Hisilicon HI665X PMIC
New Device Support:
- Add support for AXP809 in axp20x-rsb
- Add support for Power Supply in axp20x
New core features:
- devm_mfd_* managed resources
Fix-ups:
- Remove unused code (da9063-irq, wm8400-core, tps6105x,
smsc-ece1099, twl4030-power)
- Improve clean-up in error path (intel_quark_i2c_gpio)
- Explicitly include headers (syscon.h)
- Allow building as modules (max77693)
- Use IS_ENABLED() instead of rolling your own (dm355evm_msp,
wm8400-core)
- DT adaptions (axp20x, hi655x, arizona, max77620)
- Remove CLK_IS_ROOT flag (intel-lpss, intel_quark)
- Move to gpiochip API (asic3, dm355evm_msp, htc-egpio, htc-i2cpld,
sm501, tc6393xb, tps65010, ucb1x00, vexpress)
- Make use of devm_mfd_* calls (act8945a, as3711, atmel-hlcdc,
bcm590xx, hi6421-pmic-core, lp3943, menf21bmc, mt6397, rdc321x,
rk808, rn5t618, rt5033, sky81452, stw481x, tps6507x, tps65217,
wm8400)
Bug Fixes"
- Fix ACPI child matching (mfd-core)
- Fix start-up ordering issues (mt6397-core, arizona-core)
- Fix forgotten register state on resume (intel-lpss)
- Fix Clock related issues (twl6040)
- Fix scheduling whilst atomic (omap-usb-tll)
- Kconfig changes (vexpress)"
* tag 'mfd-for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (73 commits)
mfd: hi655x: Add MFD driver for hi655x
mfd: ab8500-debugfs: Trivial fix of spelling mistake on "between"
mfd: vexpress: Add !ARCH_USES_GETTIMEOFFSET dependency
mfd: Add device-tree binding doc for PMIC MAX77620/MAX20024
mfd: max77620: Add core driver for MAX77620/MAX20024
mfd: arizona: Add defines for GPSW values that can be used from DT
mfd: omap-usb-tll: Fix scheduling while atomic BUG
mfd: wm5110: ARIZONA_CLOCK_CONTROL should be volatile
mfd: axp20x: Add a cell for the ac power_supply part of the axp20x PMICs
mfd: intel_soc_pmic_core: Terminate panel control GPIO lookup table correctly
mfd: wl1273-core: Use devm_mfd_add_devices() for mfd_device registration
mfd: tps65910: Use devm_mfd_add_devices and devm_regmap_add_irq_chip
mfd: sec: Use devm_mfd_add_devices and devm_regmap_add_irq_chip
mfd: rc5t583: Use devm_mfd_add_devices and devm_request_threaded_irq
mfd: max77686: Use devm_mfd_add_devices and devm_regmap_add_irq_chip
mfd: as3722: Use devm_mfd_add_devices and devm_regmap_add_irq_chip
mfd: twl4030-power: Remove driver path in file comment
MAINTAINERS: Add entry for X-Powers AXP family PMIC drivers
mfd: smsc-ece1099: Remove unnecessarily remove callback
mfd: Use IS_ENABLED(CONFIG_FOO) instead of checking FOO || FOO_MODULE
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi
Pull HSI updates from Sebastian Reichel:
- merge omap-ssi and omap-ssi-port modules
- fix omap-ssi module reloading
- add DVFS support to omap-ssi
* tag 'hsi-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
HSI: omap-ssi: move omap_ssi_port_update_fclk
HSI: omap-ssi: include pinctrl header files
HSI: omap-ssi: add COMMON_CLK dependency
HSI: omap-ssi: add clk change support
HSI: omap_ssi: built omap_ssi and omap_ssi_port into one module
HSI: omap_ssi: fix removal of port platform device
HSI: omap_ssi: make sure probe stays available
HSI: omap_ssi: fix module unloading
HSI: omap_ssi_port: switch to gpiod API
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull fbdev updates from Tomi Valkeinen:
- imxfb: fix lcd power up
- small fixes and cleanups
* tag 'fbdev-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
fbdev: Use IS_ENABLED() instead of checking for built-in or module
efifb: Don't show the mapping VA
video: AMBA CLCD: Remove unncessary include in amba-clcd.c
fbdev: ssd1307fb: Fix charge pump setting
Documentation: fb: fix spelling mistakes
fbdev: fbmem: implement error handling in fbmem_init()
fbdev: sh_mipi_dsi: remove driver
video: fbdev: imxfb: add some error handling
video: fbdev: imxfb: fix semantic of .get_power and .set_power
video: fbdev: omap2: Remove deprecated regulator_can_change_voltage() usage
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression that causes sha-mb to crash"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sha1-mb - make sha1_x8_avx2() conform to C function ABI
|
|
The newly added nps irqchip driver causes build warnings on ARM64.
include/soc/nps/common.h: In function 'nps_host_reg_non_cl':
include/soc/nps/common.h:148:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
As the driver is only used on ARC, we don't need to see it without
COMPILE_TEST elsewhere, and we can avoid the warnings by only building
on 32-bit architectures even with CONFIG_COMPILE_TEST.
Acked-by: Marc Zyngier <narc.zyngier@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V
- Live patching support for ppc64le (also merged via livepatching.git)
Various cleanups & minor fixes from:
- Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie,
Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring,
Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras,
Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung
Bauermann, Valentin Rothberg, Vipin K Parashar.
General:
- Update LMB associativity index during DLPAR add/remove from Nathan
Fontenot
- Fix branching to OOL handlers in relocatable kernel from Hari Bathini
- Add support for userspace Power9 copy/paste from Chris Smart
- Always use STRICT_MM_TYPECHECKS from Michael Ellerman
- Add mask of possible MMU features from Michael Ellerman
PCI:
- Enable pass through of NVLink to guests from Alexey Kardashevskiy
- Cleanups in preparation for powernv PCI hotplug from Gavin Shan
- Don't report error in eeh_pe_reset_and_recover() from Gavin Shan
- Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan
- Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
from Guilherme G Piccoli
- Remove the dependency on EEH struct in DDW mechanism from Guilherme
G Piccoli
selftests:
- Test cp_abort during context switch from Chris Smart
- Add several tests for transactional memory support from Rashmica
Gupta
perf:
- Add support for sampling interrupt register state from Anju T
- Add support for unwinding perf-stackdump from Chandan Kumar
cxl:
- Configure the PSL for two CAPI ports on POWER8NVL from Philippe
Bergheaud
- Allow initialization on timebase sync failures from Frederic Barrat
- Increase timeout for detection of AFU mmio hang from Frederic
Barrat
- Handle num_of_processes larger than can fit in the SPA from Ian
Munsie
- Ensure PSL interrupt is configured for contexts with no AFU IRQs
from Ian Munsie
- Add kernel API to allow a context to operate with relocate disabled
from Ian Munsie
- Check periodically the coherent platform function's state from
Christophe Lombard
Freescale:
- Updates from Scott: "Contains 86xx fixes, minor device tree fixes,
an erratum workaround, and a kconfig dependency fix."
* tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits)
powerpc/86xx: Fix PCI interrupt map definition
powerpc/86xx: Move pci1 definition to the include file
powerpc/fsl: Fix build of the dtb embedded kernel images
powerpc/fsl: Fix rcpm compatible string
powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC
powerpc/fsl-pci: Add a workaround for PCI 5 errata
powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb
powerpc/powernv/npu: Add PE to PHB's list
powerpc/powernv: Fix insufficient memory allocation
powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism
Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner()
powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover()
powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()
powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()
Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()"
powerpc/powernv/npu: Enable NVLink pass through
powerpc/powernv/npu: Rework TCE Kill handling
powerpc/powernv/npu: Add set/unset window helpers
powerpc/powernv/ioda2: Export debug helper pe_level_printk()
...
|
|
Pull ARM updates from Russell King:
"Changes included in this pull request:
- revert pxa2xx-flash back to using ioremap_cached() and switch
memremap() to use arch_memremap_wb()
- remove pci=firmware command line argument handling
- remove unnecessary arm_dma_set_mask() implementation, the generic
implementation will do for ARM
- removal of the ARM kallsyms "hack" to work around mode switching
veneers and vectors located below PAGE_OFFSET
- tidy up build system output a little
- add L2 cache power management DT bindings
- remove duplicated local_irq_disable() in reboot paths
- handle AMBA primecell devices better at registration time with PM
domains (needed for Samsung SoCs)
- ARM specific preparation to support Keystone II kexec"
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8567/1: cache-uniphier: activate ways for secondary CPUs
ARM: 8570/2: Documentation: devicetree: Add PL310 PM bindings
ARM: 8569/1: pl2x0: Add OF control of cache power management
ARM: 8568/1: reboot: remove duplicated local_irq_disable()
ARM: 8566/1: drivers: amba: properly handle devices with power domains
ARM: provide arm_has_idmap_alias() helper
ARM: kexec: remove 512MB restriction on kexec crashdump
ARM: provide improved virt_to_idmap() functionality
ARM: kexec: fix crashkernel= handling
ARM: 8557/1: specify install, zinstall, and uinstall as PHONY targets
ARM: 8562/1: suppress "include/generated/mach-types.h is up to date."
ARM: 8553/1: kallsyms: remove --page-offset command line option
ARM: 8552/1: kallsyms: remove special lower address limit for CONFIG_ARM
ARM: 8555/1: kallsyms: ignore ARM mode switching veneers
ARM: 8548/1: dma-mapping: remove arm_dma_set_mask()
ARM: 8554/1: kernel: pci: remove pci=firmware command line parameter handling
ARM: memremap: implement arch_memremap_wb()
memremap: add arch specific hook for MEMREMAP_WB mappings
mtd: pxa2xx-flash: switch back from memremap to ioremap_cached
ARM: reintroduce ioremap_cached() for creating cached I/O mappings
|
|
Merge updates from Andrew Morton:
- fsnotify fix
- poll() timeout fix
- a few scripts/ tweaks
- debugobjects updates
- the (small) ocfs2 queue
- Minor fixes to kernel/padata.c
- Maybe half of the MM queue
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
mm, page_alloc: restore the original nodemask if the fast path allocation failed
mm, page_alloc: uninline the bad page part of check_new_page()
mm, page_alloc: don't duplicate code in free_pcp_prepare
mm, page_alloc: defer debugging checks of pages allocated from the PCP
mm, page_alloc: defer debugging checks of freed pages until a PCP drain
cpuset: use static key better and convert to new API
mm, page_alloc: inline pageblock lookup in page free fast paths
mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
mm, page_alloc: pull out side effects from free_pages_check
mm, page_alloc: un-inline the bad part of free_pages_check
mm, page_alloc: check multiple page fields with a single branch
mm, page_alloc: remove field from alloc_context
mm, page_alloc: avoid looking up the first zone in a zonelist twice
mm, page_alloc: shortcut watermark checks for order-0 pages
mm, page_alloc: reduce cost of fair zone allocation policy retry
mm, page_alloc: shorten the page allocator fast path
mm, page_alloc: check once if a zone has isolated pageblocks
mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
mm, page_alloc: simplify last cpupid reset
mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
...
|
|
Remove some obsolete comments in the cifs inode_operations
structs that were pointed out by Stephen Rothwell.
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
|
|
The session key is the default keyring set for request_key operations.
This session key is revoked when the user owning the session logs out.
Any long running daemon processes started by this session ends up with
revoked session keyring which prevents these processes from using the
request_key mechanism from obtaining the krb5 keys.
The problem has been reported by a large number of autofs users. The
problem is also seen with multiuser mounts where the share may be used
by processes run by a user who has since logged out. A reproducer using
automount is available on the Red Hat bz.
The patch creates a new keyring which is used to cache cifs spnego
upcalls.
Red Hat bz: 1267754
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
The page allocator fast path uses either the requested nodemask or
cpuset_current_mems_allowed if cpusets are enabled. If the allocation
context allows watermarks to be ignored then it can also ignore memory
policies. However, on entering the allocator slowpath the nodemask may
still be cpuset_current_mems_allowed and the policies are enforced.
This patch resets the nodemask appropriately before entering the
slowpath.
Link: http://lkml.kernel.org/r/20160504143628.GU2858@techsingularity.net
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Bad pages should be rare so the code handling them doesn't need to be
inline for performance reasons. Put it to separate function which
returns void. This also assumes that the initial page_expected_state()
result will match the result of the thorough check, i.e. the page
doesn't become "good" in the meanwhile. This matches the same
expectations already in place in free_pages_check().
!DEBUG_VM bloat-o-meter:
add/remove: 1/0 grow/shrink: 0/1 up/down: 134/-274 (-140)
function old new delta
check_new_page_bad - 134 +134
get_page_from_freelist 3468 3194 -274
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The new free_pcp_prepare() function shares a lot of code with
free_pages_prepare(), which makes this a maintenance risk when some
future patch modifies only one of them. We should be able to achieve
the same effect (skipping free_pages_check() from !DEBUG_VM configs) by
adding a parameter to free_pages_prepare() and making it inline, so the
checks (and the order != 0 parts) are eliminated from the call from
free_pcp_prepare().
!DEBUG_VM: bloat-o-meter reports no difference, as my gcc was already
inlining free_pages_prepare() and the elimination seems to work as
expected
DEBUG_VM bloat-o-meter:
add/remove: 0/1 grow/shrink: 2/0 up/down: 1035/-778 (257)
function old new delta
__free_pages_ok 297 1060 +763
free_hot_cold_page 480 752 +272
free_pages_prepare 778 - -778
Here inlining didn't occur before, and added some code, but it's ok for
a debug option.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Every page allocated checks a number of page fields for validity. This
catches corruption bugs of pages that are already freed but it is
expensive. This patch weakens the debugging check by checking PCP pages
only when the PCP lists are being refilled. All compound pages are
checked. This potentially avoids debugging checks entirely if the PCP
lists are never emptied and refilled so some corruption issues may be
missed. Full checking requires DEBUG_VM.
With the two deferred debugging patches applied, the impact to a page
allocator microbenchmark is
4.6.0-rc3 4.6.0-rc3
inline-v3r6 deferalloc-v3r7
Min alloc-odr0-1 344.00 ( 0.00%) 317.00 ( 7.85%)
Min alloc-odr0-2 248.00 ( 0.00%) 231.00 ( 6.85%)
Min alloc-odr0-4 209.00 ( 0.00%) 192.00 ( 8.13%)
Min alloc-odr0-8 181.00 ( 0.00%) 166.00 ( 8.29%)
Min alloc-odr0-16 168.00 ( 0.00%) 154.00 ( 8.33%)
Min alloc-odr0-32 161.00 ( 0.00%) 148.00 ( 8.07%)
Min alloc-odr0-64 158.00 ( 0.00%) 145.00 ( 8.23%)
Min alloc-odr0-128 156.00 ( 0.00%) 143.00 ( 8.33%)
Min alloc-odr0-256 168.00 ( 0.00%) 154.00 ( 8.33%)
Min alloc-odr0-512 178.00 ( 0.00%) 167.00 ( 6.18%)
Min alloc-odr0-1024 186.00 ( 0.00%) 174.00 ( 6.45%)
Min alloc-odr0-2048 192.00 ( 0.00%) 180.00 ( 6.25%)
Min alloc-odr0-4096 198.00 ( 0.00%) 184.00 ( 7.07%)
Min alloc-odr0-8192 200.00 ( 0.00%) 188.00 ( 6.00%)
Min alloc-odr0-16384 201.00 ( 0.00%) 188.00 ( 6.47%)
Min free-odr0-1 189.00 ( 0.00%) 180.00 ( 4.76%)
Min free-odr0-2 132.00 ( 0.00%) 126.00 ( 4.55%)
Min free-odr0-4 104.00 ( 0.00%) 99.00 ( 4.81%)
Min free-odr0-8 90.00 ( 0.00%) 85.00 ( 5.56%)
Min free-odr0-16 84.00 ( 0.00%) 80.00 ( 4.76%)
Min free-odr0-32 80.00 ( 0.00%) 76.00 ( 5.00%)
Min free-odr0-64 78.00 ( 0.00%) 74.00 ( 5.13%)
Min free-odr0-128 77.00 ( 0.00%) 73.00 ( 5.19%)
Min free-odr0-256 94.00 ( 0.00%) 91.00 ( 3.19%)
Min free-odr0-512 108.00 ( 0.00%) 112.00 ( -3.70%)
Min free-odr0-1024 115.00 ( 0.00%) 118.00 ( -2.61%)
Min free-odr0-2048 120.00 ( 0.00%) 125.00 ( -4.17%)
Min free-odr0-4096 123.00 ( 0.00%) 129.00 ( -4.88%)
Min free-odr0-8192 126.00 ( 0.00%) 130.00 ( -3.17%)
Min free-odr0-16384 126.00 ( 0.00%) 131.00 ( -3.97%)
Note that the free paths for large numbers of pages is impacted as the
debugging cost gets shifted into that path when the page data is no
longer necessarily cache-hot.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Every page free checks a number of page fields for validity. This
catches premature frees and corruptions but it is also expensive. This
patch weakens the debugging check by checking PCP pages at the time they
are drained from the PCP list. This will trigger the bug but the site
that freed the corrupt page will be lost. To get the full context, a
kernel rebuild with DEBUG_VM is necessary.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
An important function for cpusets is cpuset_node_allowed(), which
optimizes on the fact if there's a single root CPU set, it must be
trivially allowed. But the check "nr_cpusets() <= 1" doesn't use the
cpusets_enabled_key static key the right way where static keys eliminate
branching overhead with jump labels.
This patch converts it so that static key is used properly. It's also
switched to the new static key API and the checking functions are
converted to return bool instead of int. We also provide a new variant
__cpuset_zone_allowed() which expects that the static key check was
already done and they key was enabled. This is needed for
get_page_from_freelist() where we want to also avoid the relatively
slower check when ALLOC_CPUSET is not set in alloc_flags.
The impact on the page allocator microbenchmark is less than expected
but the cleanup in itself is worthwhile.
4.6.0-rc2 4.6.0-rc2
multcheck-v1r20 cpuset-v1r20
Min alloc-odr0-1 348.00 ( 0.00%) 348.00 ( 0.00%)
Min alloc-odr0-2 254.00 ( 0.00%) 254.00 ( 0.00%)
Min alloc-odr0-4 213.00 ( 0.00%) 213.00 ( 0.00%)
Min alloc-odr0-8 186.00 ( 0.00%) 183.00 ( 1.61%)
Min alloc-odr0-16 173.00 ( 0.00%) 171.00 ( 1.16%)
Min alloc-odr0-32 166.00 ( 0.00%) 163.00 ( 1.81%)
Min alloc-odr0-64 162.00 ( 0.00%) 159.00 ( 1.85%)
Min alloc-odr0-128 160.00 ( 0.00%) 157.00 ( 1.88%)
Min alloc-odr0-256 169.00 ( 0.00%) 166.00 ( 1.78%)
Min alloc-odr0-512 180.00 ( 0.00%) 180.00 ( 0.00%)
Min alloc-odr0-1024 188.00 ( 0.00%) 187.00 ( 0.53%)
Min alloc-odr0-2048 194.00 ( 0.00%) 193.00 ( 0.52%)
Min alloc-odr0-4096 199.00 ( 0.00%) 198.00 ( 0.50%)
Min alloc-odr0-8192 202.00 ( 0.00%) 201.00 ( 0.50%)
Min alloc-odr0-16384 203.00 ( 0.00%) 202.00 ( 0.49%)
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The function call overhead of get_pfnblock_flags_mask() is measurable in
the page free paths. This patch uses an inlined version that is faster.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The original count is never reused so it can be removed.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Check without side-effects should be easier to maintain. It also
removes the duplicated cpupid and flags reset done in !DEBUG_VM variant
of both free_pcp_prepare() and then bulkfree_pcp_prepare(). Finally, it
enables the next patch.
It shouldn't result in new branches, thanks to inlining of the check.
!DEBUG_VM bloat-o-meter:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27)
function old new delta
__free_pages_ok 748 739 -9
free_pcppages_bulk 1403 1385 -18
DEBUG_VM:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
function old new delta
free_pages_prepare 806 778 -28
This is also slightly faster because cpupid information is not set on
tail pages so we can avoid resets there.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
From: Vlastimil Babka <vbabka@suse.cz>
!DEBUG_VM size and bloat-o-meter:
add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-370 (-246)
function old new delta
free_pages_check_bad - 124 +124
free_pcppages_bulk 1288 1171 -117
__free_pages_ok 948 695 -253
DEBUG_VM:
add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-214 (-90)
function old new delta
free_pages_check_bad - 124 +124
free_pages_prepare 1112 898 -214
[akpm@linux-foundation.org: fix whitespace]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Every page allocated or freed is checked for sanity to avoid corruptions
that are difficult to detect later. A bad page could be due to a number
of fields. Instead of using multiple branches, this patch combines
multiple fields into a single branch. A detailed check is only
necessary if that check fails.
4.6.0-rc2 4.6.0-rc2
initonce-v1r20 multcheck-v1r20
Min alloc-odr0-1 359.00 ( 0.00%) 348.00 ( 3.06%)
Min alloc-odr0-2 260.00 ( 0.00%) 254.00 ( 2.31%)
Min alloc-odr0-4 214.00 ( 0.00%) 213.00 ( 0.47%)
Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
Min alloc-odr0-32 165.00 ( 0.00%) 166.00 ( -0.61%)
Min alloc-odr0-64 162.00 ( 0.00%) 162.00 ( 0.00%)
Min alloc-odr0-128 161.00 ( 0.00%) 160.00 ( 0.62%)
Min alloc-odr0-256 170.00 ( 0.00%) 169.00 ( 0.59%)
Min alloc-odr0-512 181.00 ( 0.00%) 180.00 ( 0.55%)
Min alloc-odr0-1024 190.00 ( 0.00%) 188.00 ( 1.05%)
Min alloc-odr0-2048 196.00 ( 0.00%) 194.00 ( 1.02%)
Min alloc-odr0-4096 202.00 ( 0.00%) 199.00 ( 1.49%)
Min alloc-odr0-8192 205.00 ( 0.00%) 202.00 ( 1.46%)
Min alloc-odr0-16384 205.00 ( 0.00%) 203.00 ( 0.98%)
Again, the benefit is marginal but avoiding excessive branches is
important. Ideally the paths would not have to check these conditions
at all but regrettably abandoning the tests would make use-after-free
bugs much harder to detect.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The classzone_idx can be inferred from preferred_zoneref so remove the
unnecessary field and save stack space.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The allocator fast path looks up the first usable zone in a zonelist and
then get_page_from_freelist does the same job in the zonelist iterator.
This patch preserves the necessary information.
4.6.0-rc2 4.6.0-rc2
fastmark-v1r20 initonce-v1r20
Min alloc-odr0-1 364.00 ( 0.00%) 359.00 ( 1.37%)
Min alloc-odr0-2 262.00 ( 0.00%) 260.00 ( 0.76%)
Min alloc-odr0-4 214.00 ( 0.00%) 214.00 ( 0.00%)
Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
Min alloc-odr0-32 165.00 ( 0.00%) 165.00 ( 0.00%)
Min alloc-odr0-64 161.00 ( 0.00%) 162.00 ( -0.62%)
Min alloc-odr0-128 159.00 ( 0.00%) 161.00 ( -1.26%)
Min alloc-odr0-256 168.00 ( 0.00%) 170.00 ( -1.19%)
Min alloc-odr0-512 180.00 ( 0.00%) 181.00 ( -0.56%)
Min alloc-odr0-1024 190.00 ( 0.00%) 190.00 ( 0.00%)
Min alloc-odr0-2048 196.00 ( 0.00%) 196.00 ( 0.00%)
Min alloc-odr0-4096 202.00 ( 0.00%) 202.00 ( 0.00%)
Min alloc-odr0-8192 206.00 ( 0.00%) 205.00 ( 0.49%)
Min alloc-odr0-16384 206.00 ( 0.00%) 205.00 ( 0.49%)
The benefit is negligible and the results are within the noise but each
cycle counts.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Watermarks have to be checked on every allocation including the number
of pages being allocated and whether reserves can be accessed. The
reserves only matter if memory is limited and the free_pages adjustment
only applies to high-order pages. This patch adds a shortcut for
order-0 pages that avoids numerous calculations if there is plenty of
free memory yielding the following performance difference in a page
allocator microbenchmark;
4.6.0-rc2 4.6.0-rc2
optfair-v1r20 fastmark-v1r20
Min alloc-odr0-1 380.00 ( 0.00%) 364.00 ( 4.21%)
Min alloc-odr0-2 273.00 ( 0.00%) 262.00 ( 4.03%)
Min alloc-odr0-4 227.00 ( 0.00%) 214.00 ( 5.73%)
Min alloc-odr0-8 196.00 ( 0.00%) 186.00 ( 5.10%)
Min alloc-odr0-16 183.00 ( 0.00%) 173.00 ( 5.46%)
Min alloc-odr0-32 173.00 ( 0.00%) 165.00 ( 4.62%)
Min alloc-odr0-64 169.00 ( 0.00%) 161.00 ( 4.73%)
Min alloc-odr0-128 169.00 ( 0.00%) 159.00 ( 5.92%)
Min alloc-odr0-256 180.00 ( 0.00%) 168.00 ( 6.67%)
Min alloc-odr0-512 190.00 ( 0.00%) 180.00 ( 5.26%)
Min alloc-odr0-1024 198.00 ( 0.00%) 190.00 ( 4.04%)
Min alloc-odr0-2048 204.00 ( 0.00%) 196.00 ( 3.92%)
Min alloc-odr0-4096 209.00 ( 0.00%) 202.00 ( 3.35%)
Min alloc-odr0-8192 213.00 ( 0.00%) 206.00 ( 3.29%)
Min alloc-odr0-16384 214.00 ( 0.00%) 206.00 ( 3.74%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The fair zone allocation policy is not without cost but it can be
reduced slightly. This patch removes an unnecessary local variable,
checks the likely conditions of the fair zone policy first, uses a bool
instead of a flags check and falls through when a remote node is
encountered instead of doing a full restart. The benefit is marginal
but it's there
4.6.0-rc2 4.6.0-rc2
decstat-v1r20 optfair-v1r20
Min alloc-odr0-1 377.00 ( 0.00%) 380.00 ( -0.80%)
Min alloc-odr0-2 273.00 ( 0.00%) 273.00 ( 0.00%)
Min alloc-odr0-4 226.00 ( 0.00%) 227.00 ( -0.44%)
Min alloc-odr0-8 196.00 ( 0.00%) 196.00 ( 0.00%)
Min alloc-odr0-16 183.00 ( 0.00%) 183.00 ( 0.00%)
Min alloc-odr0-32 175.00 ( 0.00%) 173.00 ( 1.14%)
Min alloc-odr0-64 172.00 ( 0.00%) 169.00 ( 1.74%)
Min alloc-odr0-128 170.00 ( 0.00%) 169.00 ( 0.59%)
Min alloc-odr0-256 183.00 ( 0.00%) 180.00 ( 1.64%)
Min alloc-odr0-512 191.00 ( 0.00%) 190.00 ( 0.52%)
Min alloc-odr0-1024 199.00 ( 0.00%) 198.00 ( 0.50%)
Min alloc-odr0-2048 204.00 ( 0.00%) 204.00 ( 0.00%)
Min alloc-odr0-4096 210.00 ( 0.00%) 209.00 ( 0.48%)
Min alloc-odr0-8192 213.00 ( 0.00%) 213.00 ( 0.00%)
Min alloc-odr0-16384 214.00 ( 0.00%) 214.00 ( 0.00%)
The benefit is marginal at best but one of the most important benefits,
avoiding a second search when falling back to another node is not
triggered by this particular test so the benefit for some corner cases
is understated.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The page allocator fast path checks page multiple times unnecessarily.
This patch avoids all the slowpath checks if the first allocation
attempt succeeds.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When bulk freeing pages from the per-cpu lists the zone is checked for
isolated pageblocks on every release. This patch checks it once per
drain.
[mgorman@techsingularity.net: fix locking radce, per Vlastimil]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__GFP_HARDWALL only has meaning in the context of cpusets but the fast
path always applies the flag on the first attempt. Move the
manipulations into the cpuset paths where they will be masked by a
static branch in the common case.
With the other micro-optimisations in this series combined, the impact
on a page allocator microbenchmark is
4.6.0-rc2 4.6.0-rc2
decstat-v1r20 micro-v1r20
Min alloc-odr0-1 381.00 ( 0.00%) 377.00 ( 1.05%)
Min alloc-odr0-2 275.00 ( 0.00%) 273.00 ( 0.73%)
Min alloc-odr0-4 229.00 ( 0.00%) 226.00 ( 1.31%)
Min alloc-odr0-8 199.00 ( 0.00%) 196.00 ( 1.51%)
Min alloc-odr0-16 186.00 ( 0.00%) 183.00 ( 1.61%)
Min alloc-odr0-32 179.00 ( 0.00%) 175.00 ( 2.23%)
Min alloc-odr0-64 174.00 ( 0.00%) 172.00 ( 1.15%)
Min alloc-odr0-128 172.00 ( 0.00%) 170.00 ( 1.16%)
Min alloc-odr0-256 181.00 ( 0.00%) 183.00 ( -1.10%)
Min alloc-odr0-512 193.00 ( 0.00%) 191.00 ( 1.04%)
Min alloc-odr0-1024 201.00 ( 0.00%) 199.00 ( 1.00%)
Min alloc-odr0-2048 206.00 ( 0.00%) 204.00 ( 0.97%)
Min alloc-odr0-4096 212.00 ( 0.00%) 210.00 ( 0.94%)
Min alloc-odr0-8192 215.00 ( 0.00%) 213.00 ( 0.93%)
Min alloc-odr0-16384 216.00 ( 0.00%) 214.00 ( 0.93%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The current reset unnecessarily clears flags and makes pointless
calculations.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
page is guaranteed to be set before it is read with or without the
initialisation.
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
zonelist here is a copy of a struct field that is used once. Ditch it.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The number of zones skipped to a zone expiring its fair zone allocation
quota is irrelevant. Convert to bool.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
alloc_flags is a bitmask of flags but it is signed which does not
necessarily generate the best code depending on the compiler. Even
without an impact, it makes more sense that this be unsigned.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pageblocks have an associated bitmap to store migrate types and whether
the pageblock should be skipped during compaction. The bitmap may be
associated with a memory section or a zone but the zone is looked up
unconditionally. The compiler should optimise this away automatically
so this is a cosmetic patch only in many cases.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__dec_zone_state is cheaper to use for removing an order-0 page as it
has fewer conditions to check.
The performance difference on a page allocator microbenchmark is;
4.6.0-rc2 4.6.0-rc2
optiter-v1r20 decstat-v1r20
Min alloc-odr0-1 382.00 ( 0.00%) 381.00 ( 0.26%)
Min alloc-odr0-2 282.00 ( 0.00%) 275.00 ( 2.48%)
Min alloc-odr0-4 233.00 ( 0.00%) 229.00 ( 1.72%)
Min alloc-odr0-8 203.00 ( 0.00%) 199.00 ( 1.97%)
Min alloc-odr0-16 188.00 ( 0.00%) 186.00 ( 1.06%)
Min alloc-odr0-32 182.00 ( 0.00%) 179.00 ( 1.65%)
Min alloc-odr0-64 177.00 ( 0.00%) 174.00 ( 1.69%)
Min alloc-odr0-128 175.00 ( 0.00%) 172.00 ( 1.71%)
Min alloc-odr0-256 184.00 ( 0.00%) 181.00 ( 1.63%)
Min alloc-odr0-512 197.00 ( 0.00%) 193.00 ( 2.03%)
Min alloc-odr0-1024 203.00 ( 0.00%) 201.00 ( 0.99%)
Min alloc-odr0-2048 209.00 ( 0.00%) 206.00 ( 1.44%)
Min alloc-odr0-4096 214.00 ( 0.00%) 212.00 ( 0.93%)
Min alloc-odr0-8192 218.00 ( 0.00%) 215.00 ( 1.38%)
Min alloc-odr0-16384 219.00 ( 0.00%) 216.00 ( 1.37%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The page allocator iterates through a zonelist for zones that match the
addressing limitations and nodemask of the caller but many allocations
will not be restricted. Despite this, there is always functional call
overhead which builds up.
This patch inlines the optimistic basic case and only calls the iterator
function for the complex case. A hindrance was the fact that
cpuset_current_mems_allowed is used in the fastpath as the allowed
nodemask even though all nodes are allowed on most systems. The patch
handles this by only considering cpuset_current_mems_allowed if a cpuset
exists. As well as being faster in the fast-path, this removes some
junk in the slowpath.
The performance difference on a page allocator microbenchmark is;
4.6.0-rc2 4.6.0-rc2
statinline-v1r20 optiter-v1r20
Min alloc-odr0-1 412.00 ( 0.00%) 382.00 ( 7.28%)
Min alloc-odr0-2 301.00 ( 0.00%) 282.00 ( 6.31%)
Min alloc-odr0-4 247.00 ( 0.00%) 233.00 ( 5.67%)
Min alloc-odr0-8 215.00 ( 0.00%) 203.00 ( 5.58%)
Min alloc-odr0-16 199.00 ( 0.00%) 188.00 ( 5.53%)
Min alloc-odr0-32 191.00 ( 0.00%) 182.00 ( 4.71%)
Min alloc-odr0-64 187.00 ( 0.00%) 177.00 ( 5.35%)
Min alloc-odr0-128 185.00 ( 0.00%) 175.00 ( 5.41%)
Min alloc-odr0-256 193.00 ( 0.00%) 184.00 ( 4.66%)
Min alloc-odr0-512 207.00 ( 0.00%) 197.00 ( 4.83%)
Min alloc-odr0-1024 213.00 ( 0.00%) 203.00 ( 4.69%)
Min alloc-odr0-2048 220.00 ( 0.00%) 209.00 ( 5.00%)
Min alloc-odr0-4096 226.00 ( 0.00%) 214.00 ( 5.31%)
Min alloc-odr0-8192 229.00 ( 0.00%) 218.00 ( 4.80%)
Min alloc-odr0-16384 229.00 ( 0.00%) 219.00 ( 4.37%)
perf indicated that next_zones_zonelist disappeared in the profile and
__next_zones_zonelist did not appear. This is expected as the
micro-benchmark would hit the inlined fast-path every time.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
zone_statistics has one call-site but it's a public function. Make it
static and inline.
The performance difference on a page allocator microbenchmark is;
4.6.0-rc2 4.6.0-rc2
statbranch-v1r20 statinline-v1r20
Min alloc-odr0-1 419.00 ( 0.00%) 412.00 ( 1.67%)
Min alloc-odr0-2 305.00 ( 0.00%) 301.00 ( 1.31%)
Min alloc-odr0-4 250.00 ( 0.00%) 247.00 ( 1.20%)
Min alloc-odr0-8 219.00 ( 0.00%) 215.00 ( 1.83%)
Min alloc-odr0-16 203.00 ( 0.00%) 199.00 ( 1.97%)
Min alloc-odr0-32 195.00 ( 0.00%) 191.00 ( 2.05%)
Min alloc-odr0-64 191.00 ( 0.00%) 187.00 ( 2.09%)
Min alloc-odr0-128 189.00 ( 0.00%) 185.00 ( 2.12%)
Min alloc-odr0-256 198.00 ( 0.00%) 193.00 ( 2.53%)
Min alloc-odr0-512 210.00 ( 0.00%) 207.00 ( 1.43%)
Min alloc-odr0-1024 216.00 ( 0.00%) 213.00 ( 1.39%)
Min alloc-odr0-2048 221.00 ( 0.00%) 220.00 ( 0.45%)
Min alloc-odr0-4096 227.00 ( 0.00%) 226.00 ( 0.44%)
Min alloc-odr0-8192 232.00 ( 0.00%) 229.00 ( 1.29%)
Min alloc-odr0-16384 232.00 ( 0.00%) 229.00 ( 1.29%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
zone_statistics has more branches than it really needs to take an
unlikely GFP flag into account. Reduce the number and annotate the
unlikely flag.
The performance difference on a page allocator microbenchmark is;
4.6.0-rc2 4.6.0-rc2
nocompound-v1r10 statbranch-v1r10
Min alloc-odr0-1 417.00 ( 0.00%) 419.00 ( -0.48%)
Min alloc-odr0-2 308.00 ( 0.00%) 305.00 ( 0.97%)
Min alloc-odr0-4 253.00 ( 0.00%) 250.00 ( 1.19%)
Min alloc-odr0-8 221.00 ( 0.00%) 219.00 ( 0.90%)
Min alloc-odr0-16 205.00 ( 0.00%) 203.00 ( 0.98%)
Min alloc-odr0-32 199.00 ( 0.00%) 195.00 ( 2.01%)
Min alloc-odr0-64 193.00 ( 0.00%) 191.00 ( 1.04%)
Min alloc-odr0-128 191.00 ( 0.00%) 189.00 ( 1.05%)
Min alloc-odr0-256 200.00 ( 0.00%) 198.00 ( 1.00%)
Min alloc-odr0-512 212.00 ( 0.00%) 210.00 ( 0.94%)
Min alloc-odr0-1024 219.00 ( 0.00%) 216.00 ( 1.37%)
Min alloc-odr0-2048 225.00 ( 0.00%) 221.00 ( 1.78%)
Min alloc-odr0-4096 231.00 ( 0.00%) 227.00 ( 1.73%)
Min alloc-odr0-8192 234.00 ( 0.00%) 232.00 ( 0.85%)
Min alloc-odr0-16384 234.00 ( 0.00%) 232.00 ( 0.85%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The PageAnon check always checks for compound_head but this is a
relatively expensive check if the caller already knows the page is a
head page. This patch creates a helper and uses it in the page free
path which only operates on head pages.
With this patch and "Only check PageCompound for high-order pages", the
performance difference on a page allocator microbenchmark is;
4.6.0-rc2 4.6.0-rc2
vanilla nocompound-v1r20
Min alloc-odr0-1 425.00 ( 0.00%) 417.00 ( 1.88%)
Min alloc-odr0-2 313.00 ( 0.00%) 308.00 ( 1.60%)
Min alloc-odr0-4 257.00 ( 0.00%) 253.00 ( 1.56%)
Min alloc-odr0-8 224.00 ( 0.00%) 221.00 ( 1.34%)
Min alloc-odr0-16 208.00 ( 0.00%) 205.00 ( 1.44%)
Min alloc-odr0-32 199.00 ( 0.00%) 199.00 ( 0.00%)
Min alloc-odr0-64 195.00 ( 0.00%) 193.00 ( 1.03%)
Min alloc-odr0-128 192.00 ( 0.00%) 191.00 ( 0.52%)
Min alloc-odr0-256 204.00 ( 0.00%) 200.00 ( 1.96%)
Min alloc-odr0-512 213.00 ( 0.00%) 212.00 ( 0.47%)
Min alloc-odr0-1024 219.00 ( 0.00%) 219.00 ( 0.00%)
Min alloc-odr0-2048 225.00 ( 0.00%) 225.00 ( 0.00%)
Min alloc-odr0-4096 230.00 ( 0.00%) 231.00 ( -0.43%)
Min alloc-odr0-8192 235.00 ( 0.00%) 234.00 ( 0.43%)
Min alloc-odr0-16384 235.00 ( 0.00%) 234.00 ( 0.43%)
Min free-odr0-1 215.00 ( 0.00%) 191.00 ( 11.16%)
Min free-odr0-2 152.00 ( 0.00%) 136.00 ( 10.53%)
Min free-odr0-4 119.00 ( 0.00%) 107.00 ( 10.08%)
Min free-odr0-8 106.00 ( 0.00%) 96.00 ( 9.43%)
Min free-odr0-16 97.00 ( 0.00%) 87.00 ( 10.31%)
Min free-odr0-32 91.00 ( 0.00%) 83.00 ( 8.79%)
Min free-odr0-64 89.00 ( 0.00%) 81.00 ( 8.99%)
Min free-odr0-128 88.00 ( 0.00%) 80.00 ( 9.09%)
Min free-odr0-256 106.00 ( 0.00%) 95.00 ( 10.38%)
Min free-odr0-512 116.00 ( 0.00%) 111.00 ( 4.31%)
Min free-odr0-1024 125.00 ( 0.00%) 118.00 ( 5.60%)
Min free-odr0-2048 133.00 ( 0.00%) 126.00 ( 5.26%)
Min free-odr0-4096 136.00 ( 0.00%) 130.00 ( 4.41%)
Min free-odr0-8192 138.00 ( 0.00%) 130.00 ( 5.80%)
Min free-odr0-16384 137.00 ( 0.00%) 130.00 ( 5.11%)
There is a sizable boost to the free allocator performance. While there
is an apparent boost on the allocation side, it's likely a co-incidence
or due to the patches slightly reducing cache footprint.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Another year, another round of page allocator optimisations focusing
this time on the alloc and free fast paths. This should be of help to
workloads that are allocator-intensive from kernel space where the cost
of zeroing is not nceessraily incurred.
The series is motivated by the observation that page alloc
microbenchmarks on multiple machines regressed between 3.12.44 and 4.4.
Second, there is discussions before LSF/MM considering the possibility
of adding another page allocator which is potentially hazardous but a
patch series improving performance is better than whining.
After the series is applied, there are still hazards. In the free
paths, the debugging checking and page zone/pageblock lookups dominate
but there was not an obvious solution to that. In the alloc path, the
major contributers are dealing with zonelists, new page preperation, the
fair zone allocation and numerous statistic updates. The fair zone
allocator is removed by the per-node LRU series if that gets merged so
it's nor a major concern at the moment.
On normal userspace benchmarks, there is little impact as the zeroing
cost is significant but it's visible
aim9
4.6.0-rc3 4.6.0-rc3
vanilla deferalloc-v3
Min page_test 828693.33 ( 0.00%) 887060.00 ( 7.04%)
Min brk_test 4847266.67 ( 0.00%) 4966266.67 ( 2.45%)
Min exec_test 1271.00 ( 0.00%) 1275.67 ( 0.37%)
Min fork_test 12371.75 ( 0.00%) 12380.00 ( 0.07%)
The overall impact on a page allocator microbenchmark for a range of orders
and number of pages allocated in a batch is
4.6.0-rc3 4.6.0-rc3
vanilla deferalloc-v3r7
Min alloc-odr0-1 428.00 ( 0.00%) 316.00 ( 26.17%)
Min alloc-odr0-2 314.00 ( 0.00%) 231.00 ( 26.43%)
Min alloc-odr0-4 256.00 ( 0.00%) 192.00 ( 25.00%)
Min alloc-odr0-8 222.00 ( 0.00%) 166.00 ( 25.23%)
Min alloc-odr0-16 207.00 ( 0.00%) 154.00 ( 25.60%)
Min alloc-odr0-32 197.00 ( 0.00%) 148.00 ( 24.87%)
Min alloc-odr0-64 193.00 ( 0.00%) 144.00 ( 25.39%)
Min alloc-odr0-128 191.00 ( 0.00%) 143.00 ( 25.13%)
Min alloc-odr0-256 203.00 ( 0.00%) 153.00 ( 24.63%)
Min alloc-odr0-512 212.00 ( 0.00%) 165.00 ( 22.17%)
Min alloc-odr0-1024 221.00 ( 0.00%) 172.00 ( 22.17%)
Min alloc-odr0-2048 225.00 ( 0.00%) 179.00 ( 20.44%)
Min alloc-odr0-4096 232.00 ( 0.00%) 185.00 ( 20.26%)
Min alloc-odr0-8192 235.00 ( 0.00%) 187.00 ( 20.43%)
Min alloc-odr0-16384 236.00 ( 0.00%) 188.00 ( 20.34%)
Min alloc-odr1-1 519.00 ( 0.00%) 450.00 ( 13.29%)
Min alloc-odr1-2 391.00 ( 0.00%) 336.00 ( 14.07%)
Min alloc-odr1-4 313.00 ( 0.00%) 268.00 ( 14.38%)
Min alloc-odr1-8 277.00 ( 0.00%) 235.00 ( 15.16%)
Min alloc-odr1-16 256.00 ( 0.00%) 218.00 ( 14.84%)
Min alloc-odr1-32 252.00 ( 0.00%) 212.00 ( 15.87%)
Min alloc-odr1-64 244.00 ( 0.00%) 206.00 ( 15.57%)
Min alloc-odr1-128 244.00 ( 0.00%) 207.00 ( 15.16%)
Min alloc-odr1-256 243.00 ( 0.00%) 207.00 ( 14.81%)
Min alloc-odr1-512 245.00 ( 0.00%) 209.00 ( 14.69%)
Min alloc-odr1-1024 248.00 ( 0.00%) 214.00 ( 13.71%)
Min alloc-odr1-2048 253.00 ( 0.00%) 220.00 ( 13.04%)
Min alloc-odr1-4096 258.00 ( 0.00%) 224.00 ( 13.18%)
Min alloc-odr1-8192 261.00 ( 0.00%) 229.00 ( 12.26%)
Min alloc-odr2-1 560.00 ( 0.00%) 753.00 (-34.46%)
Min alloc-odr2-2 424.00 ( 0.00%) 351.00 ( 17.22%)
Min alloc-odr2-4 339.00 ( 0.00%) 393.00 (-15.93%)
Min alloc-odr2-8 298.00 ( 0.00%) 246.00 ( 17.45%)
Min alloc-odr2-16 276.00 ( 0.00%) 227.00 ( 17.75%)
Min alloc-odr2-32 271.00 ( 0.00%) 221.00 ( 18.45%)
Min alloc-odr2-64 264.00 ( 0.00%) 217.00 ( 17.80%)
Min alloc-odr2-128 264.00 ( 0.00%) 217.00 ( 17.80%)
Min alloc-odr2-256 264.00 ( 0.00%) 218.00 ( 17.42%)
Min alloc-odr2-512 269.00 ( 0.00%) 223.00 ( 17.10%)
Min alloc-odr2-1024 279.00 ( 0.00%) 230.00 ( 17.56%)
Min alloc-odr2-2048 283.00 ( 0.00%) 235.00 ( 16.96%)
Min alloc-odr2-4096 285.00 ( 0.00%) 239.00 ( 16.14%)
Min alloc-odr3-1 629.00 ( 0.00%) 505.00 ( 19.71%)
Min alloc-odr3-2 472.00 ( 0.00%) 374.00 ( 20.76%)
Min alloc-odr3-4 383.00 ( 0.00%) 301.00 ( 21.41%)
Min alloc-odr3-8 341.00 ( 0.00%) 266.00 ( 21.99%)
Min alloc-odr3-16 316.00 ( 0.00%) 248.00 ( 21.52%)
Min alloc-odr3-32 308.00 ( 0.00%) 241.00 ( 21.75%)
Min alloc-odr3-64 305.00 ( 0.00%) 241.00 ( 20.98%)
Min alloc-odr3-128 308.00 ( 0.00%) 244.00 ( 20.78%)
Min alloc-odr3-256 317.00 ( 0.00%) 249.00 ( 21.45%)
Min alloc-odr3-512 327.00 ( 0.00%) 256.00 ( 21.71%)
Min alloc-odr3-1024 331.00 ( 0.00%) 261.00 ( 21.15%)
Min alloc-odr3-2048 333.00 ( 0.00%) 266.00 ( 20.12%)
Min alloc-odr4-1 767.00 ( 0.00%) 572.00 ( 25.42%)
Min alloc-odr4-2 578.00 ( 0.00%) 429.00 ( 25.78%)
Min alloc-odr4-4 474.00 ( 0.00%) 346.00 ( 27.00%)
Min alloc-odr4-8 422.00 ( 0.00%) 310.00 ( 26.54%)
Min alloc-odr4-16 399.00 ( 0.00%) 295.00 ( 26.07%)
Min alloc-odr4-32 392.00 ( 0.00%) 293.00 ( 25.26%)
Min alloc-odr4-64 394.00 ( 0.00%) 293.00 ( 25.63%)
Min alloc-odr4-128 405.00 ( 0.00%) 305.00 ( 24.69%)
Min alloc-odr4-256 417.00 ( 0.00%) 319.00 ( 23.50%)
Min alloc-odr4-512 425.00 ( 0.00%) 326.00 ( 23.29%)
Min alloc-odr4-1024 426.00 ( 0.00%) 329.00 ( 22.77%)
Min free-odr0-1 216.00 ( 0.00%) 178.00 ( 17.59%)
Min free-odr0-2 152.00 ( 0.00%) 125.00 ( 17.76%)
Min free-odr0-4 120.00 ( 0.00%) 99.00 ( 17.50%)
Min free-odr0-8 106.00 ( 0.00%) 85.00 ( 19.81%)
Min free-odr0-16 97.00 ( 0.00%) 80.00 ( 17.53%)
Min free-odr0-32 92.00 ( 0.00%) 76.00 ( 17.39%)
Min free-odr0-64 89.00 ( 0.00%) 74.00 ( 16.85%)
Min free-odr0-128 89.00 ( 0.00%) 73.00 ( 17.98%)
Min free-odr0-256 107.00 ( 0.00%) 90.00 ( 15.89%)
Min free-odr0-512 117.00 ( 0.00%) 108.00 ( 7.69%)
Min free-odr0-1024 125.00 ( 0.00%) 118.00 ( 5.60%)
Min free-odr0-2048 132.00 ( 0.00%) 125.00 ( 5.30%)
Min free-odr0-4096 135.00 ( 0.00%) 130.00 ( 3.70%)
Min free-odr0-8192 137.00 ( 0.00%) 130.00 ( 5.11%)
Min free-odr0-16384 137.00 ( 0.00%) 131.00 ( 4.38%)
Min free-odr1-1 318.00 ( 0.00%) 289.00 ( 9.12%)
Min free-odr1-2 228.00 ( 0.00%) 207.00 ( 9.21%)
Min free-odr1-4 182.00 ( 0.00%) 165.00 ( 9.34%)
Min free-odr1-8 163.00 ( 0.00%) 146.00 ( 10.43%)
Min free-odr1-16 151.00 ( 0.00%) 135.00 ( 10.60%)
Min free-odr1-32 146.00 ( 0.00%) 129.00 ( 11.64%)
Min free-odr1-64 145.00 ( 0.00%) 130.00 ( 10.34%)
Min free-odr1-128 148.00 ( 0.00%) 134.00 ( 9.46%)
Min free-odr1-256 148.00 ( 0.00%) 137.00 ( 7.43%)
Min free-odr1-512 151.00 ( 0.00%) 140.00 ( 7.28%)
Min free-odr1-1024 154.00 ( 0.00%) 143.00 ( 7.14%)
Min free-odr1-2048 156.00 ( 0.00%) 144.00 ( 7.69%)
Min free-odr1-4096 156.00 ( 0.00%) 142.00 ( 8.97%)
Min free-odr1-8192 156.00 ( 0.00%) 140.00 ( 10.26%)
Min free-odr2-1 361.00 ( 0.00%) 457.00 (-26.59%)
Min free-odr2-2 258.00 ( 0.00%) 224.00 ( 13.18%)
Min free-odr2-4 208.00 ( 0.00%) 223.00 ( -7.21%)
Min free-odr2-8 185.00 ( 0.00%) 160.00 ( 13.51%)
Min free-odr2-16 173.00 ( 0.00%) 149.00 ( 13.87%)
Min free-odr2-32 166.00 ( 0.00%) 145.00 ( 12.65%)
Min free-odr2-64 166.00 ( 0.00%) 146.00 ( 12.05%)
Min free-odr2-128 169.00 ( 0.00%) 148.00 ( 12.43%)
Min free-odr2-256 170.00 ( 0.00%) 152.00 ( 10.59%)
Min free-odr2-512 177.00 ( 0.00%) 156.00 ( 11.86%)
Min free-odr2-1024 182.00 ( 0.00%) 162.00 ( 10.99%)
Min free-odr2-2048 181.00 ( 0.00%) 160.00 ( 11.60%)
Min free-odr2-4096 180.00 ( 0.00%) 159.00 ( 11.67%)
Min free-odr3-1 431.00 ( 0.00%) 367.00 ( 14.85%)
Min free-odr3-2 306.00 ( 0.00%) 259.00 ( 15.36%)
Min free-odr3-4 249.00 ( 0.00%) 208.00 ( 16.47%)
Min free-odr3-8 224.00 ( 0.00%) 186.00 ( 16.96%)
Min free-odr3-16 208.00 ( 0.00%) 176.00 ( 15.38%)
Min free-odr3-32 206.00 ( 0.00%) 174.00 ( 15.53%)
Min free-odr3-64 210.00 ( 0.00%) 178.00 ( 15.24%)
Min free-odr3-128 215.00 ( 0.00%) 182.00 ( 15.35%)
Min free-odr3-256 224.00 ( 0.00%) 189.00 ( 15.62%)
Min free-odr3-512 232.00 ( 0.00%) 195.00 ( 15.95%)
Min free-odr3-1024 230.00 ( 0.00%) 195.00 ( 15.22%)
Min free-odr3-2048 229.00 ( 0.00%) 193.00 ( 15.72%)
Min free-odr4-1 561.00 ( 0.00%) 439.00 ( 21.75%)
Min free-odr4-2 418.00 ( 0.00%) 318.00 ( 23.92%)
Min free-odr4-4 339.00 ( 0.00%) 269.00 ( 20.65%)
Min free-odr4-8 299.00 ( 0.00%) 239.00 ( 20.07%)
Min free-odr4-16 289.00 ( 0.00%) 234.00 ( 19.03%)
Min free-odr4-32 291.00 ( 0.00%) 235.00 ( 19.24%)
Min free-odr4-64 298.00 ( 0.00%) 238.00 ( 20.13%)
Min free-odr4-128 308.00 ( 0.00%) 251.00 ( 18.51%)
Min free-odr4-256 321.00 ( 0.00%) 267.00 ( 16.82%)
Min free-odr4-512 327.00 ( 0.00%) 269.00 ( 17.74%)
Min free-odr4-1024 326.00 ( 0.00%) 271.00 ( 16.87%)
Min total-odr0-1 644.00 ( 0.00%) 494.00 ( 23.29%)
Min total-odr0-2 466.00 ( 0.00%) 356.00 ( 23.61%)
Min total-odr0-4 376.00 ( 0.00%) 291.00 ( 22.61%)
Min total-odr0-8 328.00 ( 0.00%) 251.00 ( 23.48%)
Min total-odr0-16 304.00 ( 0.00%) 234.00 ( 23.03%)
Min total-odr0-32 289.00 ( 0.00%) 224.00 ( 22.49%)
Min total-odr0-64 282.00 ( 0.00%) 218.00 ( 22.70%)
Min total-odr0-128 280.00 ( 0.00%) 216.00 ( 22.86%)
Min total-odr0-256 310.00 ( 0.00%) 243.00 ( 21.61%)
Min total-odr0-512 329.00 ( 0.00%) 273.00 ( 17.02%)
Min total-odr0-1024 346.00 ( 0.00%) 290.00 ( 16.18%)
Min total-odr0-2048 357.00 ( 0.00%) 304.00 ( 14.85%)
Min total-odr0-4096 367.00 ( 0.00%) 315.00 ( 14.17%)
Min total-odr0-8192 372.00 ( 0.00%) 317.00 ( 14.78%)
Min total-odr0-16384 373.00 ( 0.00%) 319.00 ( 14.48%)
Min total-odr1-1 838.00 ( 0.00%) 739.00 ( 11.81%)
Min total-odr1-2 619.00 ( 0.00%) 543.00 ( 12.28%)
Min total-odr1-4 495.00 ( 0.00%) 433.00 ( 12.53%)
Min total-odr1-8 440.00 ( 0.00%) 382.00 ( 13.18%)
Min total-odr1-16 407.00 ( 0.00%) 353.00 ( 13.27%)
Min total-odr1-32 398.00 ( 0.00%) 341.00 ( 14.32%)
Min total-odr1-64 389.00 ( 0.00%) 336.00 ( 13.62%)
Min total-odr1-128 392.00 ( 0.00%) 341.00 ( 13.01%)
Min total-odr1-256 391.00 ( 0.00%) 344.00 ( 12.02%)
Min total-odr1-512 396.00 ( 0.00%) 349.00 ( 11.87%)
Min total-odr1-1024 402.00 ( 0.00%) 357.00 ( 11.19%)
Min total-odr1-2048 409.00 ( 0.00%) 364.00 ( 11.00%)
Min total-odr1-4096 414.00 ( 0.00%) 366.00 ( 11.59%)
Min total-odr1-8192 417.00 ( 0.00%) 369.00 ( 11.51%)
Min total-odr2-1 921.00 ( 0.00%) 1210.00 (-31.38%)
Min total-odr2-2 682.00 ( 0.00%) 576.00 ( 15.54%)
Min total-odr2-4 547.00 ( 0.00%) 616.00 (-12.61%)
Min total-odr2-8 483.00 ( 0.00%) 406.00 ( 15.94%)
Min total-odr2-16 449.00 ( 0.00%) 376.00 ( 16.26%)
Min total-odr2-32 437.00 ( 0.00%) 366.00 ( 16.25%)
Min total-odr2-64 431.00 ( 0.00%) 363.00 ( 15.78%)
Min total-odr2-128 433.00 ( 0.00%) 365.00 ( 15.70%)
Min total-odr2-256 434.00 ( 0.00%) 371.00 ( 14.52%)
Min total-odr2-512 446.00 ( 0.00%) 379.00 ( 15.02%)
Min total-odr2-1024 461.00 ( 0.00%) 392.00 ( 14.97%)
Min total-odr2-2048 464.00 ( 0.00%) 395.00 ( 14.87%)
Min total-odr2-4096 465.00 ( 0.00%) 398.00 ( 14.41%)
Min total-odr3-1 1060.00 ( 0.00%) 872.00 ( 17.74%)
Min total-odr3-2 778.00 ( 0.00%) 633.00 ( 18.64%)
Min total-odr3-4 632.00 ( 0.00%) 510.00 ( 19.30%)
Min total-odr3-8 565.00 ( 0.00%) 452.00 ( 20.00%)
Min total-odr3-16 524.00 ( 0.00%) 424.00 ( 19.08%)
Min total-odr3-32 514.00 ( 0.00%) 415.00 ( 19.26%)
Min total-odr3-64 515.00 ( 0.00%) 419.00 ( 18.64%)
Min total-odr3-128 523.00 ( 0.00%) 426.00 ( 18.55%)
Min total-odr3-256 541.00 ( 0.00%) 438.00 ( 19.04%)
Min total-odr3-512 559.00 ( 0.00%) 451.00 ( 19.32%)
Min total-odr3-1024 561.00 ( 0.00%) 456.00 ( 18.72%)
Min total-odr3-2048 562.00 ( 0.00%) 459.00 ( 18.33%)
Min total-odr4-1 1328.00 ( 0.00%) 1011.00 ( 23.87%)
Min total-odr4-2 997.00 ( 0.00%) 747.00 ( 25.08%)
Min total-odr4-4 813.00 ( 0.00%) 615.00 ( 24.35%)
Min total-odr4-8 721.00 ( 0.00%) 550.00 ( 23.72%)
Min total-odr4-16 689.00 ( 0.00%) 529.00 ( 23.22%)
Min total-odr4-32 683.00 ( 0.00%) 528.00 ( 22.69%)
Min total-odr4-64 692.00 ( 0.00%) 531.00 ( 23.27%)
Min total-odr4-128 713.00 ( 0.00%) 556.00 ( 22.02%)
Min total-odr4-256 738.00 ( 0.00%) 586.00 ( 20.60%)
Min total-odr4-512 753.00 ( 0.00%) 595.00 ( 20.98%)
Min total-odr4-1024 752.00 ( 0.00%) 600.00 ( 20.21%)
This patch (of 27):
order-0 pages by definition cannot be compound so avoid the check in the
fast path for those pages.
[akpm@linux-foundation.org: use unlikely(order) in free_pages_prepare(), per Vlastimil]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Right now the oom reaper will clear TIF_MEMDIE only for tasks which were
successfully reaped. This is the safest option because we know that
such an oom victim would only block forward progress of the oom killer
without a good reason because it is highly unlikely it would release
much more memory. Basically most of its memory has been already torn
down.
We can relax this assumption to catch more corner cases though.
The first obvious one is when the oom victim clears its mm and gets
stuck later on. oom_reaper would back of on find_lock_task_mm returning
NULL. We can safely try to clear TIF_MEMDIE in this case because such a
task would be ignored by the oom killer anyway. The flag would be
cleared by that time already most of the time anyway.
The less obvious one is when the oom reaper fails due to mmap_sem
contention. Even if we clear TIF_MEMDIE for this task then it is not
very likely that we would select another task too easily because we
haven't reaped the last victim and so it would be still the #1
candidate. There is a rare race condition possible when the current
victim terminates before the next select_bad_process but considering
that oom_reap_task had retried several times before giving up then this
sounds like a borderline thing.
After this patch we should have a guarantee that the OOM killer will not
be block for unbounded amount of time for most cases.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Raushaniya Maksudova <rmaksudova@parallels.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If either the current task is already killed or PF_EXITING or a selected
task is PF_EXITING then the oom killer is suppressed and so is the oom
reaper. This patch adds try_oom_reaper which checks the given task and
queues it for the oom reaper if that is safe to be done meaning that the
task doesn't share the mm with an alive process.
This might help to release the memory pressure while the task tries to
exit.
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Raushaniya Maksudova <rmaksudova@parallels.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__alloc_pages_may_oom is the central place to decide when the
out_of_memory should be invoked. This is a good approach for most
checks there because they are page allocator specific and the allocation
fails right after for all of them.
The notable exception is GFP_NOFS context which is faking
did_some_progress and keep the page allocator looping even though there
couldn't have been any progress from the OOM killer. This patch doesn't
change this behavior because we are not ready to allow those allocation
requests to fail yet (and maybe we will face the reality that we will
never manage to safely fail these request). Instead __GFP_FS check is
moved down to out_of_memory and prevent from OOM victim selection there.
There are two reasons for that
- OOM notifiers might release some memory even from this context
as none of the registered notifier seems to be FS related
- this might help a dying thread to get an access to memory
reserves and move on which will make the behavior more
consistent with the case when the task gets killed from a
different context.
Keep a comment in __alloc_pages_may_oom to make sure we do not forget
how GFP_NOFS is special and that we really want to do something about
it.
Note to the current oom_notifier users:
The observable difference for you is that oom notifiers cannot depend on
any fs locks because we could deadlock. Not that this would be allowed
today because that would just lockup machine in most of the cases and
ruling out the OOM killer along the way. Another difference is that
callbacks might be invoked sooner now because GFP_NOFS is a weaker
reclaim context and so there could be reclaimable memory which is just
not reachable now. That would require GFP_NOFS only loads which are
really rare and more importantly the observable result would be dropping
of reconstructible object and potential performance drop which is not
such a big deal when we are struggling to fulfill other important
allocation requests.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Raushaniya Maksudova <rmaksudova@parallels.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|