summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-10-15fscrypto: lock inode while setting encryption policyEric Biggers
i_rwsem needs to be acquired while setting an encryption policy so that concurrent calls to FS_IOC_SET_ENCRYPTION_POLICY are correctly serialized (especially the ->get_context() + ->set_context() pair), and so that new files cannot be created in the directory during or after the ->empty_dir() check. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Richard Weinberger <richard@nod.at> Cc: stable@vger.kernel.org
2016-10-15ext4: correct endianness conversion in __xattr_check_inode()Eric Biggers
It should be cpu_to_le32(), not le32_to_cpu(). No change in behavior. Found with sparse, and this was the only endianness warning in fs/ext4/. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-10-13fscrypto: make XTS tweak initialization endian-independentEric Biggers
The XTS tweak (or IV) was initialized differently on little endian and big endian systems. Because the ciphertext depends on the XTS tweak, it was not possible to use an encrypted filesystem created by a little endian system on a big endian system and vice versa, even if they shared the same PAGE_SIZE. Fix this by always using little endian. This will break hypothetical big endian users of ext4 or f2fs encryption. However, all users we are aware of are little endian, and it's believed that "real" big endian users are unlikely to exist yet. So this might as well be fixed now before it's too late. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-10-13ext4: do not advertise encryption support when disabledEric Biggers
The sysfs file /sys/fs/ext4/features/encryption was present on kernels compiled with CONFIG_EXT4_FS_ENCRYPTION=n. This was misleading because such kernels do not actually support ext4 encryption. Therefore, only provide this file on kernels compiled with CONFIG_EXT4_FS_ENCRYPTION=y. Note: since the ext4 feature files are all hardcoded to have a contents of "supported", it really is the presence or absence of the file that is significant, not the contents (and this change reflects that). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-10-13jbd2: fix incorrect unlock on j_list_lockTaesoo Kim
When 'jh->b_transaction == transaction' (asserted by below) J_ASSERT_JH(jh, (jh->b_transaction == transaction || ... 'journal->j_list_lock' will be incorrectly unlocked, since the the lock is aquired only at the end of if / else-if statements (missing the else case). Signed-off-by: Taesoo Kim <tsgatesv@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Fixes: 6e4862a5bb9d12be87e4ea5d9a60836ebed71d28 Cc: stable@vger.kernel.org # 3.14+
2016-10-13ext4: super.c: Update logging style using KERN_CONTJoe Perches
Recent commit require line continuing printks to use PR_CONT. Update super.c to use KERN_CONT and use vsprintf extension %pV to avoid a printk/vprintk/printk("\n") sequence as well. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-10-12Merge tag 'pwm/for-4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This set of changes contains support for PWM signal capture in the STi driver as well as support for the PWM controller found on Meson SoCs. There's also support added for the MediaTek MT2701 and SunXi H3 to the existing drivers. Other than that there's a fair set of miscellaneous cleanups and fixes across the board" * tag 'pwm/for-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (24 commits) pwm: meson: Handle unknown ID values pwm: sti: Take the opportunity to conduct a little house keeping pwm: sti: It's now valid for number of PWM channels to be zero pwm: sti: Add PWM capture callback pwm: sti: Add support for PWM capture interrupts pwm: sti: Initialise PWM capture device data pwm: sti: Supply PWM Capture clock handling pwm: sti: Supply PWM capture register addresses and bit locations pwm: sti: Only request clock rate when needed pwm: sti: Reorganise register names in preparation for new functionality pwm: sti: Rename channel => device dt-bindings: pwm: sti: Update DT bindings for capture support pwm: lpc-18xx: use pwm_set_chip_data pwm: sunxi: Add H3 support pwm: Add support for Meson PWM Controller dt-bindings: pwm: Add bindings for Meson PWM Controller pwm: samsung: Fix to use lowest div for large enough modulation bits pwm: pwm-tipwmss: Remove all runtime PM gets/puts pwm: cros-ec: Add __packed to prevent padding pwm: Add MediaTek MT2701 display PWM driver support ...
2016-10-12Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal managament updates from Zhang Rui: - Enhance thermal "userspace" governor to export the reason when a thermal event is triggered and delivered to user space. From Srinivas Pandruvada - Introduce a single TSENS thermal driver for the different versions of the TSENS IP that exist, on different qcom msm/apq SoCs'. Support for msm8916, msm8960, msm8974 and msm8996 families is also added. From Rajendra Nayak - Introduce hardware-tracked trip points support to the device tree thermal sensor framework. The framework supports an arbitrary number of trip points. Whenever the current temperature is changed, the trip points immediately below and above the current temperature are found, driver callback is invoked to program the hardware to get notified when either of the two trip points are triggered. Hardware-tracked trip points support for rockchip thermal driver is also added at the same time. From Sascha Hauer, Caesar Wang - Introduce a new thermal driver, which enables TMU (Thermal Monitor Unit) on QorIQ platform. From Jia Hongtao - Introduce a new thermal driver for Maxim MAX77620. From Laxman Dewangan - Introduce a new thermal driver for Intel platforms using WhiskeyCove PMIC. From Bin Gao - Add mt2701 chip support to MTK thermal driver. From Dawei Chien - Enhance Tegra thermal driver to enable soctherm node and set "critical", "hot" trips, for Tegra124, Tegra132, Tegra210. From Wei Ni - Add resume support for tango thermal driver. From Marc Gonzalez - several small fixes and improvements for rockchip, qcom, imx, rcar, mtk thermal drivers and thermal core code. From Caesar Wang, Keerthy, Rocky Hao, Wei Yongjun, Peter Robinson, Bui Duc Phuc, Axel Lin, Hugh Kang * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (48 commits) thermal: int3403: Process trip change notification thermal: int340x: New Interface to read trip and notify thermal: user_space gov: Add additional information in uevent thermal: Enhance thermal_zone_device_update for events arm64: tegra: set hot trips for Tegra210 arm64: tegra: set critical trips for Tegra210 arm64: tegra: add soctherm node for Tegra210 arm64: tegra: set hot trips for Tegra132 arm64: tegra: set critical trips for Tegra132 arm64: tegra: use tegra132-soctherm for Tegra132 arm: tegra: set hot trips for Tegra124 arm: tegra: set critical trips for Tegra124 thermal: tegra: add hw-throttle for Tegra132 thermal: tegra: add hw-throttle function of: Add bindings of hw throttle for Tegra soctherm thermal: mtk_thermal: Check return value of devm_thermal_zone_of_sensor_register thermal: Add Mediatek thermal driver for mt2701. dt-bindings: thermal: Add binding document for Mediatek thermal controller thermal: max77620: Add thermal driver for reporting junction temp thermal: max77620: Add DT binding doc for thermal driver ...
2016-10-12Merge tag 'fbdev-4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux Pull fbdev updates from Tomi Valkeinen: "Main changes: - amba-cldc: DT backlight support, Nomadik support, Versatile improvements, fixes - efifb: fix fbcon RGB565 palette - exynos: remove unused DSI driver" * tag 'fbdev-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (42 commits) video: smscufx: remove unused variable matroxfb: fix size of memcpy fbdev: ssd1307fb: fix a possible NULL dereference fbdev: ssd1307fb: constify the device_info pointer simplefb: Disable and release clocks and regulators in destroy callback video: fbdev: constify fb_fix_screeninfo and fb_var_screeninfo structures matroxfb: constify local structures video: fbdev: i810: add in missing white space in error message text video: fbdev: add missing \n at end of printk error message ARM: exynos_defconfig: Remove old non-working MIPI driver video: fbdev: exynos: Remove old non-working MIPI driver omapfb: fix return value check in dsi_bind() MAINTAINERS: update fbdev entries video: fbdev: offb: Call pci_enable_device() before using the PCI VGA device fbdev: vfb: simplify memory management fbdev: vfb: add option for video mode fbdev: vfb: add description to module parameters video: fbdev: intelfb: remove impossible condition fb: adv7393: off by one in probe function video: fbdev: pxafb: add missing of_node_put() in of_get_pxafb_mode_info() ...
2016-10-12Disable the __builtin_return_address() warning globally after allLinus Torvalds
This affectively reverts commit 377ccbb48373 ("Makefile: Mute warning for __builtin_return_address(>0) for tracing only") because it turns out that it really isn't tracing only - it's all over the tree. We already also had the warning disabled separately for mm/usercopy.c (which this commit also removes), and it turns out that we will also want to disable it for get_lock_parent_ip(), that is used for at least TRACE_IRQFLAGS. Which (when enabled) ends up being all over the tree. Steven Rostedt had a patch that tried to limit it to just the config options that actually triggered this, but quite frankly, the extra complexity and abstraction just isn't worth it. We have never actually had a case where the warning is actually useful, so let's just disable it globally and not worry about it. Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-12Merge branch 'parisc-4.9-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "Some final updates and fixes for this merge window for the parisc architecture. Changes include: - Fix boot problems with new memblock allocator on rp3410 machine - Increase initial kernel mapping size for 32- and 64-bit kernels, this allows to boot bigger kernels which have many modules built-in - Fix kernel layout regarding __gp and move exception table into RO section - Show trap names in crashes, use extable.h header instead of module.h" * 'parisc-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Show trap name in kernel crash parisc: Zero-initialize newly alloced memblock parisc: Move exception table into read-only section parisc: Fix kernel memory layout regarding position of __gp parisc: Increase initial kernel mapping size parisc: Migrate exception table users off module.h and onto extable.h
2016-10-12Merge branch 'work.uaccess2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess.h prepwork from Al Viro: "Preparations to tree-wide switch to use of linux/uaccess.h (which, obviously, will allow to start unifying stuff for real). The last step there, ie PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ `git grep -l "$PATT"|grep -v ^include/linux/uaccess.h` is not taken here - I would prefer to do it once just before or just after -rc1. However, everything should be ready for it" * 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: remove a stray reference to asm/uaccess.h in docs sparc64: separate extable_64.h, switch elf_64.h to it score: separate extable.h, switch module.h to it mips: separate extable.h, switch module.h to it x86: separate extable.h, switch sections.h to it remove stray include of asm/uaccess.h from cacheflush.h mn10300: remove a bogus processor.h->uaccess.h include xtensa: split uaccess.h into C and asm sides bonding: quit messing with IOCTL kill __kernel_ds_p off mn10300: finish verify_area() off frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h exceptions: detritus removal
2016-10-12Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "Core: - Fence destaging work - DRIVER_LEGACY to split off legacy drm drivers - drm_mm refactoring - Splitting drm_crtc.c into chunks and documenting better - Display info fixes - rbtree support for prime buffer lookup - Simple VGA DAC driver Panel: - Add Nexus 7 panel - More simple panels i915: - Refactoring GEM naming - Refactored vma/active tracking - Lockless request lookups - Better stolen memory support - FBC fixes - SKL watermark fixes - VGPU improvements - dma-buf fencing support - Better DP dongle support amdgpu: - Powerplay for Iceland asics - Improved GPU reset support - UVD/VEC powergating support for CZ/ST - Preinitialised VRAM buffer support - Virtual display support - Initial SI support - GTT rework - PCI shutdown callback support - HPD IRQ storm fixes amdkfd: - bugfixes tilcdc: - Atomic modesetting support mediatek: - AAL + GAMMA engine support - Hook up gamma LUT - Temporal dithering support imx: - Pixel clock from devicetree - drm bridge support for LVDS bridges - active plane reconfiguration - VDIC deinterlacer support - Frame synchronisation unit support - Color space conversion support analogix: - PSR support - Better panel on/off support rockchip: - rk3399 vop/crtc support - PSR support vc4: - Interlaced vblank timing - 3D rendering CPU overhead reduction - HDMI output fixes tda998x: - HDMI audio ASoC support sunxi: - Allwinner A33 support - better TCON support msm: - DT binding cleanups - Explicit fence-fd support sti: - remove sti415/416 support etnaviv: - MMUv2 refactoring - GC3000 support exynos: - Refactoring HDMI DCC/PHY - G2D pm regression fix - Page fault issues with wait for vblank There is no nouveau work in this tree, as Ben didn't get a pull request in, and he was fighting moving to atomic and adding mst support, so maybe best it waits for a cycle" * tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits) drm/crtc: constify drm_crtc_index parameter drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next drm/i915/guc: Unwind GuC workqueue reservation if request construction fails drm/i915: Reset the breadcrumbs IRQ more carefully drm/i915: Force relocations via cpu if we run out of idle aperture drm/i915: Distinguish last emitted request from last submitted request drm/i915: Allow DP to work w/o EDID drm/i915: Move long hpd handling into the hotplug work drm/i915/execlists: Reinitialise context image after GPU hang drm/i915: Use correct index for backtracking HUNG semaphores drm/i915: Unalias obj->phys_handle and obj->userptr drm/i915: Just clear the mmiodebug before a register access drm/i915/gen9: only add the planes actually affected by ddb changes drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED drm/i915/bxt: Fix HDMI DPLL configuration drm/i915/gen9: fix the watermark res_blocks value drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations drm/i915/gen9: minimum scanlines for Y tile is not always 4 drm/i915/gen9: fix the WaWmMemoryReadLatency implementation drm/i915/kbl: KBL also needs to run the SAGV code ...
2016-10-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - a few block updates that fell in my lap - lib/ updates - checkpatch - autofs - ipc - a ton of misc other things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (100 commits) mm: split gfp_mask and mapping flags into separate fields fs: use mapping_set_error instead of opencoded set_bit treewide: remove redundant #include <linux/kconfig.h> hung_task: allow hung_task_panic when hung_task_warnings is 0 kthread: add kerneldoc for kthread_create() kthread: better support freezable kthread workers kthread: allow to modify delayed kthread work kthread: allow to cancel kthread work kthread: initial support for delayed kthread work kthread: detect when a kthread work is used by more workers kthread: add kthread_destroy_worker() kthread: add kthread_create_worker*() kthread: allow to call __kthread_create_on_node() with va_list args kthread/smpboot: do not park in kthread_create_on_cpu() kthread: kthread worker API cleanup kthread: rename probe_kthread_data() to kthread_probe_data() scripts/tags.sh: enable code completion in VIM mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping kdump, vmcoreinfo: report memory sections virtual addresses ipc/sem.c: add cond_resched in exit_sme ...
2016-10-11mm: split gfp_mask and mapping flags into separate fieldsMichal Hocko
mapping->flags currently encodes two different things into a single flag. It contains sticky gfp_mask for page cache allocations and AS_ codes used to report errors/enospace and other states which are mapping specific. Condensing the two semantically unrelated things saves few bytes but it also complicates other things. For one thing the gfp flags space is reduced and in fact we are already running out of available bits. It can be assumed that more gfp flags will be necessary later on. To not introduce the address_space grow (at least on x86_64) we can stick it right after private_lock because we have a hole there. struct address_space { struct inode * host; /* 0 8 */ struct radix_tree_root page_tree; /* 8 16 */ spinlock_t tree_lock; /* 24 4 */ atomic_t i_mmap_writable; /* 28 4 */ struct rb_root i_mmap; /* 32 8 */ struct rw_semaphore i_mmap_rwsem; /* 40 40 */ /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */ long unsigned int nrpages; /* 80 8 */ long unsigned int nrexceptional; /* 88 8 */ long unsigned int writeback_index; /* 96 8 */ const struct address_space_operations * a_ops; /* 104 8 */ long unsigned int flags; /* 112 8 */ spinlock_t private_lock; /* 120 4 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ struct list_head private_list; /* 128 16 */ void * private_data; /* 144 8 */ /* size: 152, cachelines: 3, members: 14 */ /* sum members: 148, holes: 1, sum holes: 4 */ /* last cacheline: 24 bytes */ }; Link: http://lkml.kernel.org/r/20160912114852.GI14524@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11fs: use mapping_set_error instead of opencoded set_bitMichal Hocko
The mapping_set_error() helper sets the correct AS_ flag for the mapping so there is no reason to open code it. Use the helper directly. [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO] Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11treewide: remove redundant #include <linux/kconfig.h>Masahiro Yamada
Kernel source files need not include <linux/kconfig.h> explicitly because the top Makefile forces to include it with: -include $(srctree)/include/linux/kconfig.h This commit removes explicit includes except the following: * arch/s390/include/asm/facilities_src.h * tools/testing/radix-tree/linux/kernel.h These two are used for host programs. Link: http://lkml.kernel.org/r/1473656164-11929-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11hung_task: allow hung_task_panic when hung_task_warnings is 0John Siddle
Previously hung_task_panic would not be respected if enabled after hung_task_warnings had already been decremented to 0. Permit the kernel to panic if hung_task_panic is enabled after hung_task_warnings has already been decremented to 0 and another task hangs for hung_task_timeout_secs seconds. Check if hung_task_panic is enabled so we don't return prematurely, and check if hung_task_warnings is non-zero so we don't print the warning unnecessarily. [akpm@linux-foundation.org: fix off-by-one] Link: http://lkml.kernel.org/r/1473450214-4049-1-git-send-email-jsiddle@redhat.com Signed-off-by: John Siddle <jsiddle@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: add kerneldoc for kthread_create()Jonathan Corbet
This macro is referenced in other kerneldoc comments, but lacks one of its own; fix that. Link: http://lkml.kernel.org/r/20160826072313.726a3485@lwn.net Signed-off-by: Jonathan Corbet <corbet@lwn.net> Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: better support freezable kthread workersPetr Mladek
This patch allows to make kthread worker freezable via a new @flags parameter. It will allow to avoid an init work in some kthreads. It currently does not affect the function of kthread_worker_fn() but it might help to do some optimization or fixes eventually. I currently do not know about any other use for the @flags parameter but I believe that we will want more flags in the future. Finally, I hope that it will not cause confusion with @flags member in struct kthread. Well, I guess that we will want to rework the basic kthreads implementation once all kthreads are converted into kthread workers or workqueues. It is possible that we will merge the two structures. Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: allow to modify delayed kthread workPetr Mladek
There are situations when we need to modify the delay of a delayed kthread work. For example, when the work depends on an event and the initial delay means a timeout. Then we want to queue the work immediately when the event happens. This patch implements kthread_mod_delayed_work() as inspired workqueues. It cancels the timer, removes the work from any worker list and queues it again with the given timeout. A very special case is when the work is being canceled at the same time. It might happen because of the regular kthread_cancel_delayed_work_sync() or by another kthread_mod_delayed_work(). In this case, we do nothing and let the other operation win. This should not normally happen as the caller is supposed to synchronize these operations a reasonable way. Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: allow to cancel kthread workPetr Mladek
We are going to use kthread workers more widely and sometimes we will need to make sure that the work is neither pending nor running. This patch implements cancel_*_sync() operations as inspired by workqueues. Well, we are synchronized against the other operations via the worker lock, we use del_timer_sync() and a counter to count parallel cancel operations. Therefore the implementation might be easier. First, we check if a worker is assigned. If not, the work has newer been queued after it was initialized. Second, we take the worker lock. It must be the right one. The work must not be assigned to another worker unless it is initialized in between. Third, we try to cancel the timer when it exists. The timer is deleted synchronously to make sure that the timer call back is not running. We need to temporary release the worker->lock to avoid a possible deadlock with the callback. In the meantime, we set work->canceling counter to avoid any queuing. Fourth, we try to remove the work from a worker list. It might be the list of either normal or delayed works. Fifth, if the work is running, we call kthread_flush_work(). It might take an arbitrary time. We need to release the worker-lock again. In the meantime, we again block any queuing by the canceling counter. As already mentioned, the check for a pending kthread work is done under a lock. In compare with workqueues, we do not need to fight for a single PENDING bit to block other operations. Therefore we do not suffer from the thundering storm problem and all parallel canceling jobs might use kthread_flush_work(). Any queuing is blocked until the counter gets zero. Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: initial support for delayed kthread workPetr Mladek
We are going to use kthread_worker more widely and delayed works will be pretty useful. The implementation is inspired by workqueues. It uses a timer to queue the work after the requested delay. If the delay is zero, the work is queued immediately. In compare with workqueues, each work is associated with a single worker (kthread). Therefore the implementation could be much easier. In particular, we use the worker->lock to synchronize all the operations with the work. We do not need any atomic operation with a flags variable. In fact, we do not need any state variable at all. Instead, we add a list of delayed works into the worker. Then the pending work is listed either in the list of queued or delayed works. And the existing check of pending works is the same even for the delayed ones. A work must not be assigned to another worker unless reinitialized. Therefore the timer handler might expect that dwork->work->worker is valid and it could simply take the lock. We just add some sanity checks to help with debugging a potential misuse. Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: detect when a kthread work is used by more workersPetr Mladek
Nothing currently prevents a work from queuing for a kthread worker when it is already running on another one. This means that the work might run in parallel on more than one worker. Also some operations are not reliable, e.g. flush. This problem will be even more visible after we add kthread_cancel_work() function. It will only have "work" as the parameter and will use worker->lock to synchronize with others. Well, normally this is not a problem because the API users are sane. But bugs might happen and users also might be crazy. This patch adds a warning when we try to insert the work for another worker. It does not fully prevent the misuse because it would make the code much more complicated without a big benefit. It adds the same warning also into kthread_flush_work() instead of the repeated attempts to get the right lock. A side effect is that one needs to explicitly reinitialize the work if it must be queued into another worker. This is needed, for example, when the worker is stopped and started again. It is a bit inconvenient. But it looks like a good compromise between the stability and complexity. I have double checked all existing users of the kthread worker API and they all seems to initialize the work after the worker gets started. Just for completeness, the patch adds a check that the work is not already in a queue. The patch also puts all the checks into a separate function. It will be reused when implementing delayed works. Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: add kthread_destroy_worker()Petr Mladek
The current kthread worker users call flush() and stop() explicitly. This function does the same plus it frees the kthread_worker struct in one call. It is supposed to be used together with kthread_create_worker*() that allocates struct kthread_worker. Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: add kthread_create_worker*()Petr Mladek
Kthread workers are currently created using the classic kthread API, namely kthread_run(). kthread_worker_fn() is passed as the @threadfn parameter. This patch defines kthread_create_worker() and kthread_create_worker_on_cpu() functions that hide implementation details. They enforce using kthread_worker_fn() for the main thread. But I doubt that there are any plans to create any alternative. In fact, I think that we do not want any alternative main thread because it would be hard to support consistency with the rest of the kthread worker API. The naming and function of kthread_create_worker() is inspired by the workqueues API like the rest of the kthread worker API. The kthread_create_worker_on_cpu() variant is motivated by the original kthread_create_on_cpu(). Note that we need to bind per-CPU kthread workers already when they are created. It makes the life easier. kthread_bind() could not be used later for an already running worker. This patch does _not_ convert existing kthread workers. The kthread worker API need more improvements first, e.g. a function to destroy the worker. IMPORTANT: kthread_create_worker_on_cpu() allows to use any format of the worker name, in compare with kthread_create_on_cpu(). The good thing is that it is more generic. The bad thing is that most users will need to pass the cpu number in two parameters, e.g. kthread_create_worker_on_cpu(cpu, "helper/%d", cpu). To be honest, the main motivation was to avoid the need for an empty va_list. The only legal way was to create a helper function that would be called with an empty list. Other attempts caused compilation warnings or even errors on different architectures. There were also other alternatives, for example, using #define or splitting __kthread_create_worker(). The used solution looked like the least ugly. Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: allow to call __kthread_create_on_node() with va_list argsPetr Mladek
kthread_create_on_node() implements a bunch of logic to create the kthread. It is already called by kthread_create_on_cpu(). We are going to extend the kthread worker API and will need to call kthread_create_on_node() with va_list args there. This patch does only a refactoring and does not modify the existing behavior. Link: http://lkml.kernel.org/r/1470754545-17632-5-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread/smpboot: do not park in kthread_create_on_cpu()Petr Mladek
kthread_create_on_cpu() was added by the commit 2a1d446019f9a5983e ("kthread: Implement park/unpark facility"). It is currently used only when enabling new CPU. For this purpose, the newly created kthread has to be parked. The CPU binding is a bit tricky. The kthread is parked when the CPU has not been allowed yet. And the CPU is bound when the kthread is unparked. The function would be useful for more per-CPU kthreads, e.g. bnx2fc_thread, fcoethread. For this purpose, the newly created kthread should stay in the uninterruptible state. This patch moves the parking into smpboot. It binds the thread already when created. Then the function might be used universally. Also the behavior is consistent with kthread_create() and kthread_create_on_node(). Link: http://lkml.kernel.org/r/1470754545-17632-4-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: kthread worker API cleanupPetr Mladek
A good practice is to prefix the names of functions by the name of the subsystem. The kthread worker API is a mix of classic kthreads and workqueues. Each worker has a dedicated kthread. It runs a generic function that process queued works. It is implemented as part of the kthread subsystem. This patch renames the existing kthread worker API to use the corresponding name from the workqueues API prefixed by kthread_: __init_kthread_worker() -> __kthread_init_worker() init_kthread_worker() -> kthread_init_worker() init_kthread_work() -> kthread_init_work() insert_kthread_work() -> kthread_insert_work() queue_kthread_work() -> kthread_queue_work() flush_kthread_work() -> kthread_flush_work() flush_kthread_worker() -> kthread_flush_worker() Note that the names of DEFINE_KTHREAD_WORK*() macros stay as they are. It is common that the "DEFINE_" prefix has precedence over the subsystem names. Note that INIT() macros and init() functions use different naming scheme. There is no good solution. There are several reasons for this solution: + "init" in the function names stands for the verb "initialize" aka "initialize worker". While "INIT" in the macro names stands for the noun "INITIALIZER" aka "worker initializer". + INIT() macros are used only in DEFINE() macros + init() functions are used close to the other kthread() functions. It looks much better if all the functions use the same scheme. + There will be also kthread_destroy_worker() that will be used close to kthread_cancel_work(). It is related to the init() function. Again it looks better if all functions use the same naming scheme. + there are several precedents for such init() function names, e.g. amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(), regmap_init_mmio_clk(), + It is not an argument but it was inconsistent even before. [arnd@arndb.de: fix linux-next merge conflict] Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kthread: rename probe_kthread_data() to kthread_probe_data()Petr Mladek
Patch series "kthread: Kthread worker API improvements" The intention of this patchset is to make it easier to manipulate and maintain kthreads. Especially, I want to replace all the custom main cycles with a generic one. Also I want to make the kthreads sleep in a consistent state in a common place when there is no work. This patch (of 11): A good practice is to prefix the names of functions by the name of the subsystem. This patch fixes the name of probe_kthread_data(). The other wrong functions names are part of the kthread worker API and will be fixed separately. Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11scripts/tags.sh: enable code completion in VIMMathieu Maret
Vim, with the omnicppcomplete(#1) plugin, can do code completion using information build by ctags. Add flags needed by omnicppcomplete(#2) to have completion on member of structure. 1: https://github.com/vim-scripts/omnicppcomplete 2: https://github.com/vim-scripts/OmniCppComplete/blob/master/doc/omnicppcomplete.txt#L93 Link: http://lkml.kernel.org/r/20160830191546.4469-1-mathieu.maret@gmail.com Signed-off-by: Mathieu Maret <mathieu.maret@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mappingCatalin Marinas
Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a physical address to a virtual one using __va(). However, such physical addresses may sometimes be located in highmem and using __va() is incorrect, leading to inconsistent object tracking in kmemleak. The following functions have been added to the kmemleak API and they take a physical address as the object pointer. They only perform the corresponding action if the address has a lowmem mapping: kmemleak_alloc_phys kmemleak_free_part_phys kmemleak_not_leak_phys kmemleak_ignore_phys The affected calling places have been updated to use the new kmemleak API. Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kdump, vmcoreinfo: report memory sections virtual addressesThomas Garnier
KASLR memory randomization can randomize the base of the physical memory mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap (VMEMMAP_START). Adding these variables on VMCOREINFO so tools can easily identify the base of each memory section. Link: http://lkml.kernel.org/r/1471531632-23003-1-git-send-email-thgarnie@google.com Signed-off-by: Thomas Garnier <thgarnie@google.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11ipc/sem.c: add cond_resched in exit_smeNikolay Borisov
In CONFIG_PREEMPT=n kernel a softlockup was observed while the for loop in exit_sem. Apparently it's possible for the loop to take quite a long time and it doesn't have a scheduling point in it. Since the codes is executing under an rcu read section this may also cause rcu stalls, which in turn block synchronize_rcu operations, which more or less de-stabilises the whole system. Fix this by introducing a cond_resched() at the beginning of the loop. So this patch fixes the following: NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119] CPU: 10 PID: 18119 Comm: httpd Tainted: G O 4.4.20-clouder2 #6 Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015 task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000 RIP: 0010:[<ffffffff81614bc7>] [<ffffffff81614bc7>] _raw_spin_lock+0x17/0x30 RSP: 0018:ffff881c95553e40 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4 RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88 R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0 R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0 FS: 0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0 Call Trace: ? exit_sem+0x7c/0x280 do_exit+0x338/0xb40 do_group_exit+0x43/0xd0 SyS_exit_group+0x14/0x20 entry_SYSCALL_64_fastpath+0x16/0x6e Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com Signed-off-by: Nikolay Borisov <kernel@kyup.com> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11ipc/msg: avoid waking sender upon full queueDavidlohr Bueso
Blocked tasks queued in q_senders waiting for their message to fit in the queue are blindly awoken every time we think there's a remote chance this might happen. This could cause numerous (and expensive -- thundering herd-ish) bogus wakeups if the queue is still really full. Adding to the scheduling cost/overhead, there's also the fact that we need to take the ipc object lock and requeue ourselves in the q_senders list. By keeping track of the blocked sender's message size, we can know previously if the wakeup ought to occur or not. Otherwise, to maintain the current wakeup order we just move it to the tail. This is exactly what occurs right now if the sender needs to go back to sleep. The case of EIDRM is left completely untouched, as we need to wakeup all the tasks, and shouldn't be playing games in the first place. This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context switches (avg out of ten runs). Although these tests are really about functionality (as opposed to performance), is does show the direct benefits of the optimization. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11ipc/msg: make ss_wakeup() kill arg booleanDavidlohr Bueso
... 'tis annoying. Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11ipc/msg: batch queue sender wakeupsDavidlohr Bueso
Currently the use of wake_qs in sysv msg queues are only for the receiver tasks that are blocked on the queue. But blocked sender tasks (due to queue size constraints) still are awoken with the ipc object lock held, which can be a problem particularly for small sized queues and far from gracious for -rt (just like it was for the receiver side). The paths that actually wakeup a sender are obviously related to when we are either getting rid of the queue or after (some) space is freed-up after a receiver takes the msg (msgrcv). Furthermore, with the exception of msgrcv, we can always piggy-back on expunge_all that has its own tasks lined-up for waking. Finally, upon unlinking the message, it should be no problem delaying the wakeups a bit until after we've released the lock. Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11ipc/msg: implement lockless pipelined wakeupsSebastian Andrzej Siewior
This patch moves the wakeup_process() invocation so it is not done under the ipc global lock by making use of a lockless wake_q. With this change, the waiter is woken up once the message has been assigned and it does not need to loop on SMP if the message points to NULL. In the signal case we still need to check the pointer under the lock to verify the state. This change should also avoid the introduction of preempt_disable() in -RT which avoids a busy-loop which pools for the NULL -> !NULL change if the waiter has a higher priority compared to the waker. By making use of wake_qs, the logic of sysv msg queues is greatly simplified (and very well suited as we can batch lockless wakeups), particularly around the lockless receive algorithm. This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G": test | before | after | diff -----------------|------------|------------|---------- pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 % pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 % pmsg-shared 2 60 | 22,884,224 | 24,278,200 | + ~6.09 % Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11ipc/sem.c: fix complex_count vs. simple op raceManfred Spraul
Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race: sem_lock has a fast path that allows parallel simple operations. There are two reasons why a simple operation cannot run in parallel: - a non-simple operations is ongoing (sma->sem_perm.lock held) - a complex operation is sleeping (sma->complex_count != 0) As both facts are stored independently, a thread can bypass the current checks by sleeping in the right positions. See below for more details (or kernel bugzilla 105651). The patch fixes that by creating one variable (complex_mode) that tracks both reasons why parallel operations are not possible. The patch also updates stale documentation regarding the locking. With regards to stable kernels: The patch is required for all kernels that include the commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?) The alternative is to revert the patch that introduced the race. The patch is safe for backporting, i.e. it makes no assumptions about memory barriers in spin_unlock_wait(). Background: Here is the race of the current implementation: Thread A: (simple op) - does the first "sma->complex_count == 0" test Thread B: (complex op) - does sem_lock(): This includes an array scan. But the scan can't find Thread A, because Thread A does not own sem->lock yet. - the thread does the operation, increases complex_count, drops sem_lock, sleeps Thread A: - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock) - sleeps before the complex_count test Thread C: (complex op) - does sem_lock (no array scan, complex_count==1) - wakes up Thread B. - decrements complex_count Thread A: - does the complex_count test Bug: Now both thread A and thread C operate on the same array, without any synchronization. Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com Reported-by: <felixh@informatik.uni-bremen.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: <1vier1@web.de> Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11kcov: do not instrument lib/stackdepot.cAlexander Potapenko
There's no point in collecting coverage from lib/stackdepot.c, as it is not a function of syscall inputs. Disabling kcov instrumentation for that file will reduce the coverage noise level. Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11config: android: enable CONFIG_SECCOMPRob Herring
As of Android N, SECCOMP is required. Without it, we will get mediaextractor error: E /system/bin/mediaextractor: libminijail: prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER): Invalid argument Link: http://lkml.kernel.org/r/20160908185934.18098-3-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Dmitry Shmidt <dimitrysh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11config: android: set SELinux as default security modeRob Herring
Android won't boot without SELinux enabled, so make it the default. Link: http://lkml.kernel.org/r/20160908185934.18098-2-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11config: android: move device mapper options to recommendedRob Herring
CONFIG_MD is in recommended, but other dependent options like DM_CRYPT and DM_VERITY options are in base. The result is the options in base don't get enabled when applying both base and recommended fragments. Move all the options to recommended. Link: http://lkml.kernel.org/r/20160908185934.18098-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Dmitry Shmidt <dimitrysh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11config/android: Remove CONFIG_IPV6_PRIVACYBorislav Petkov
Option is long gone, see commit 5d9efa7ee99e ("ipv6: Remove privacy config option.") Link: http://lkml.kernel.org/r/20160811170340.9859-1-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Rob Herring <robh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11relay: Use irq_work instead of plain timer for deferred wakeupPeter Zijlstra
Relay avoids calling wake_up_interruptible() for doing the wakeup of readers/consumers, waiting for the generation of new data, from the context of a process which produced the data. This is apparently done to prevent the possibility of a deadlock in case Scheduler itself is is generating data for the relay, after acquiring rq->lock. The following patch used a timer (to be scheduled at next jiffy), for delegating the wakeup to another context. commit 7c9cb38302e78d24e37f7d8a2ea7eed4ae5f2fa7 Author: Tom Zanussi <zanussi@comcast.net> Date: Wed May 9 02:34:01 2007 -0700 relay: use plain timer instead of delayed work relay doesn't need to use schedule_delayed_work() for waking readers when a simple timer will do. Scheduling a plain timer, at next jiffies boundary, to do the wakeup causes a significant wakeup latency for the Userspace client, which makes relay less suitable for the high-frequency low-payload use cases where the data gets generated at a very high rate, like multiple sub buffers getting filled within a milli second. Moreover the timer is re-scheduled on every newly produced sub buffer so the timer keeps getting pushed out if sub buffers are filled in a very quick succession (less than a jiffy gap between filling of 2 sub buffers). As a result relay runs out of sub buffers to store the new data. By using irq_work it is ensured that wakeup of userspace client, blocked in the poll call, is done at earliest (through self IPI or next timer tick) enabling it to always consume the data in time. Also this makes relay consistent with printk & ring buffers (trace), as they too use irq_work for deferred wake up of readers. [arnd@arndb.de: select CONFIG_IRQ_WORK] Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Akash Goel <akash.goel@intel.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11pps: kc: fix non-tickless system config dependencyMaciej S. Szmigiero
CONFIG_NO_HZ currently only sets the default value of dynticks config so if PPS kernel consumer needs periodic timer ticks it should depend on !CONFIG_NO_HZ_COMMON instead of !CONFIG_NO_HZ. Otherwise it is possible to enable it even on tickless system which has CONFIG_NO_HZ not set and CONFIG_NO_HZ_IDLE (or CONFIG_NO_HZ_FULL) set. Link: http://lkml.kernel.org/r/57E2B769.50202@maciej.szmigiero.name Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11mips/panic: replace smp_send_stop() with kdump friendly version in panic pathHidehiro Kawai
Daniel Walker reported problems which happens when crash_kexec_post_notifiers kernel option is enabled (https://lkml.org/lkml/2015/6/24/44). In that case, smp_send_stop() is called before entering kdump routines which assume other CPUs are still online. As the result, kdump routines fail to save other CPUs' registers. Additionally for MIPS OCTEON, it misses to stop the watchdog timer. To fix this problem, call a new kdump friendly function, crash_smp_send_stop(), instead of the smp_send_stop() when crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak function, and it just call smp_send_stop(). Architecture codes should override it so that kdump can work appropriately. This patch provides MIPS version. Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) Link: http://lkml.kernel.org/r/20160810080950.11028.28000.stgit@sysi4-13.yrl.intra.hitachi.co.jp Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Reported-by: Daniel Walker <dwalker@fifo99.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: "Steven J. Hill" <steven.hill@cavium.com> Cc: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11x86/panic: replace smp_send_stop() with kdump friendly version in panic pathHidehiro Kawai
Daniel Walker reported problems which happens when crash_kexec_post_notifiers kernel option is enabled (https://lkml.org/lkml/2015/6/24/44). In that case, smp_send_stop() is called before entering kdump routines which assume other CPUs are still online. As the result, for x86, kdump routines fail to save other CPUs' registers and disable virtualization extensions. To fix this problem, call a new kdump friendly function, crash_smp_send_stop(), instead of the smp_send_stop() when crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak function, and it just call smp_send_stop(). Architecture codes should override it so that kdump can work appropriately. This patch only provides x86-specific version. For Xen's PV kernel, just keep the current behavior. NOTES: - Right solution would be to place crash_smp_send_stop() before __crash_kexec() invocation in all cases and remove smp_send_stop(), but we can't do that until all architectures implement own crash_smp_send_stop() - crash_smp_send_stop()-like work is still needed by machine_crash_shutdown() because crash_kexec() can be called without entering panic() Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Reported-by: Daniel Walker <dwalker@fifo99.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: "Steven J. Hill" <steven.hill@cavium.com> Cc: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11nvme: use the DMA_ATTR_NO_WARN attributeMauricio Faria de Oliveira
Use the DMA_ATTR_NO_WARN attribute for the dma_map_sg() call of the nvme driver that returns BLK_MQ_RQ_QUEUE_BUSY (not for BLK_MQ_RQ_QUEUE_ERROR). Link: http://lkml.kernel.org/r/1470092390-25451-4-git-send-email-mauricfo@linux.vnet.ibm.com Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11powerpc: implement the DMA_ATTR_NO_WARN attributeMauricio Faria de Oliveira
Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code. Link: http://lkml.kernel.org/r/1470092390-25451-3-git-send-email-mauricfo@linux.vnet.ibm.com Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>