summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2015-04-24Merge tag 'md/4.1' of git://neil.brown.name/mdLinus Torvalds
Pull md updates from Neil Brown: "More updates that usual this time. A few have performance impacts which hould mostly be positive, but RAID5 (in particular) can be very work-load ensitive... We'll have to wait and see. Highlights: - "experimental" code for managing md/raid1 across a cluster using DLM. Code is not ready for general use and triggers a WARNING if used. However it is looking good and mostly done and having in mainline will help co-ordinate development. - RAID5/6 can now batch multiple (4K wide) stripe_heads so as to handle a full (chunk wide) stripe as a single unit. - RAID6 can now perform read-modify-write cycles which should help performance on larger arrays: 6 or more devices. - RAID5/6 stripe cache now grows and shrinks dynamically. The value set is used as a minimum. - Resync is now allowed to go a little faster than the 'mininum' when there is competing IO. How much faster depends on the speed of the devices, so the effective minimum should scale with device speed to some extent" * tag 'md/4.1' of git://neil.brown.name/md: (58 commits) md/raid5: don't do chunk aligned read on degraded array. md/raid5: allow the stripe_cache to grow and shrink. md/raid5: change ->inactive_blocked to a bit-flag. md/raid5: move max_nr_stripes management into grow_one_stripe and drop_one_stripe md/raid5: pass gfp_t arg to grow_one_stripe() md/raid5: introduce configuration option rmw_level md/raid5: activate raid6 rmw feature md/raid6 algorithms: xor_syndrome() for SSE2 md/raid6 algorithms: xor_syndrome() for generic int md/raid6 algorithms: improve test program md/raid6 algorithms: delta syndrome functions raid5: handle expansion/resync case with stripe batching raid5: handle io error of batch list RAID5: batch adjacent full stripe write raid5: track overwrite disk count raid5: add a new flag to track if a stripe can be batched raid5: use flex_array for scribble data md raid0: access mddev->queue (request queue member) conditionally because it is not set when accessed from dm-raid md: allow resync to go faster when there is competing IO. md: remove 'go_faster' option from ->sync_request() ...
2015-04-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: 1) ldc_alloc_exp_dring() can be called from softints, so use GFP_ATOMIC. From Sowmini Varadhan. 2) Some minor warning/build fixups for the new iommu-common code on certain archs and with certain debug options enabled. Also from Sowmini Varadhan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context sparc64: Use M7 PMC write on all chips T4 and onward. iommu-common: rename iommu_pool_hash to iommu_hash_common iommu-common: fix x86_64 compiler warnings
2015-04-21md/raid6 algorithms: xor_syndrome() for SSE2Markus Stockhausen
The second and (last) optimized XOR syndrome calculation. This version supports right and left side optimization. All CPUs with architecture older than Haswell will benefit from it. It should be noted that SSE2 movntdq kills performance for memory areas that are read and written simultaneously in chunks smaller than cache line size. So use movdqa instead for P/Q writes in sse21 and sse22 XOR functions. Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-21md/raid6 algorithms: xor_syndrome() for generic intMarkus Stockhausen
Start the algorithms with the very basic one. It is left and right optimized. That means we can avoid all calculations for unneeded pages above the right stop offset. For pages below the left start offset we still need the syndrome multiplication but without reading data pages. Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-21md/raid6 algorithms: improve test programMarkus Stockhausen
It is always helpful to have a test tool in place if we implement new data critical algorithms. So add some test routines to the raid6 checker that can prove if the new xor_syndrome() works as expected. Run through all permutations of start/stop pages per algorithm and simulate a xor_syndrome() assisted rmw run. After each rmw check if the recovery algorithm still confirms that the stripe is fine. Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-21md/raid6 algorithms: delta syndrome functionsMarkus Stockhausen
v3: s-o-b comment, explanation of performance and descision for the start/stop implementation Implementing rmw functionality for RAID6 requires optimized syndrome calculation. Up to now we can only generate a complete syndrome. The target P/Q pages are always overwritten. With this patch we provide a framework for inplace P/Q modification. In the first place simply fill those functions with NULL values. xor_syndrome() has two additional parameters: start & stop. These will indicate the first and last page that are changing during a rmw run. That makes it possible to avoid several unneccessary loops and speed up calculation. The caller needs to implement the following logic to make the functions work. 1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source blocks inside P/Q between (and including) start and end. 2) modify any block with start <= block <= stop 3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of source blocks into P/Q between (and including) start and end. Pages between start and stop that won't be changed should be filled with a pointer to the kernel zero page. The reasons for not taking NULL pages are: 1) Algorithms cross the whole source data line by line. Thus avoid additional branches. 2) Having a NULL page avoids calculating the XOR P parity but still need calulation steps for the Q parity. Depending on the algorithm unrolling that might be only a difference of 2 instructions per loop. The benchmark numbers of the gen_syndrome() functions are displayed in the kernel log. Do the same for the xor_syndrome() functions. This will help to analyze performance problems and give an rough estimate how well the algorithm works. The choice of the fastest algorithm will still depend on the gen_syndrome() performance. With the start/stop page implementation the speed can vary a lot in real life. E.g. a change of page 0 & page 15 on a stripe will be harder to compute than the case where page 0 & page 1 are XOR candidates. To be not to enthusiatic about the expected speeds we will run a worse case test that simulates a change on the upper half of the stripe. So we do: 1) calculation of P/Q for the upper pages 2) continuation of Q for the lower (empty) pages Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-21Merge tag 'char-misc-4.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here's the big char/misc driver patchset for 4.1-rc1. Lots of different driver subsystem updates here, nothing major, full details are in the shortlog. All of this has been in linux-next for a while" * tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (133 commits) mei: trace: remove unused TRACE_SYSTEM_STRING DTS: ARM: OMAP3-N900: Add lis3lv02d support Documentation: DT: lis302: update wakeup binding lis3lv02d: DT: add wakeup unit 2 and wakeup threshold lis3lv02d: DT: use s32 to support negative values Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case mei: replace check for connection instead of transitioning mei: use mei_cl_is_connected consistently mei: fix mei_poll operation hv_vmbus: Add gradually increased delay for retries in vmbus_post_msg() Drivers: hv: hv_balloon: survive ballooning request with num_pages=0 Drivers: hv: hv_balloon: eliminate jumps in piecewiese linear floor function Drivers: hv: hv_balloon: do not online pages in offline blocks hv: remove the per-channel workqueue hv: don't schedule new works in vmbus_onoffer()/vmbus_onoffer_rescind() hv: run non-blocking message handlers in the dispatch tasklet coresight: moving to new "hwtracing" directory coresight-tmc: Adding a status interface to sysfs coresight: remove the unnecessary configuration coresight-default-sink ...
2015-04-20iommu-common: rename iommu_pool_hash to iommu_hash_commonSowmini Varadhan
When CONFIG_DEBUG_FORCE_WEAK_PER_CPU is set, the DEFINE_PER_CPU_SECTION macro will define an extern __pcpu_unique_##name variable that could conflict with the same definition in powerpc at this time. Avoid that conflict by renaming iommu_pool_hash in iommu-common.c Thanks to Guenter Roeck for catching this, and helping to test the fix. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-20iommu-common: fix x86_64 compiler warningsSowmini Varadhan
Declare iommu_large_alloc as static. Remove extern definition for iommu_tbl_pool_init(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-20Merge tag 'cpumask-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull final removal of deprecated cpus_* cpumask functions from Rusty Russell: "This is the final removal (after several years!) of the obsolete cpus_* functions, prompted by their mis-use in staging. With these function removed, all cpu functions should only iterate to nr_cpu_ids, so we finally only allocate that many bits when cpumasks are allocated offstack" * tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits) cpumask: remove __first_cpu / __next_cpu cpumask: resurrect CPU_MASK_CPU0 linux/cpumask.h: add typechecking to cpumask_test_cpu cpumask: only allocate nr_cpumask_bits. Fix weird uses of num_online_cpus(). cpumask: remove deprecated functions. mips: fix obsolete cpumask_of_cpu usage. x86: fix more deprecated cpu function usage. ia64: remove deprecated cpus_ usage. powerpc: fix deprecated CPU_MASK_CPU0 usage. CPU_MASK_ALL/CPU_MASK_NONE: remove from deprecated region. staging/lustre/o2iblnd: Don't use cpus_weight staging/lustre/libcfs: replace deprecated cpus_ calls with cpumask_ staging/lustre/ptlrpc: Do not use deprecated cpus_* functions blackfin: fix up obsolete cpu function usage. parisc: fix up obsolete cpu function usage. tile: fix up obsolete cpu function usage. arm64: fix up obsolete cpu function usage. mips: fix up obsolete cpu function usage. x86: fix up obsolete cpu function usage. ...
2015-04-19hexdump: avoid warning in test functionLinus Torvalds
The test_data_1_le[] array is a const array of const char *. To avoid dropping any const information, we need to use "const char * const *", not just "const char **". I'm not sure why the different test arrays end up having different const'ness, but let's make the pointer we use to traverse them as const as possible, since we modify neither the array of pointers _or_ the pointers we find in the array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-19cpumask: remove __first_cpu / __next_cpuRusty Russell
They were for use by the deprecated first_cpu() and next_cpu() wrappers, but sparc used them directly. They're now replaced by cpumask_first / cpumask_next. And __next_cpu_nr is completely obsolete. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net>
2015-04-18iommu-common: Fix PARISC compile-time warningsSowmini Varadhan
Fixes warnings due to - no DMA_ERROR_CODE on PARISC, - sizeof (unsigned long) == 4 bytes on PARISC. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18Break up monolithic iommu table/lock into finer graularity pools and lockSowmini Varadhan
Investigation of multithreaded iperf experiments on an ethernet interface show the iommu->lock as the hottest lock identified by lockstat, with something of the order of 21M contentions out of 27M acquisitions, and an average wait time of 26 us for the lock. This is not efficient. A more scalable design is to follow the ppc model, where the iommu_map_table has multiple pools, each stretching over a segment of the map, and with a separate lock for each pool. This model allows for better parallelization of the iommu map search. This patch adds the iommu range alloc/free function infrastructure. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18sparc: Revert generic IOMMU allocator.David S. Miller
I applied the wrong version of this patch series, V4 instead of V10, due to a patchwork bundling snafu. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc updates from David Miller: "The PowerPC folks have a really nice scalable IOMMU pool allocator that we wanted to make use of for sparc. So here we have a series that abstracts out their code into a common layer that anyone can make use of. Sparc is converted, and the PowerPC folks have reviewed and ACK'd this series and plan to convert PowerPC over as well" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: iommu-common: Fix PARISC compile-time warnings sparc: Make LDC use common iommu poll management functions sparc: Make sparc64 use scalable lib/iommu-common.c functions sparc: Break up monolithic iommu table/lock into finer graularity pools and lock
2015-04-17iommu-common: Fix PARISC compile-time warningsSowmini Varadhan
Fixes warnings due to - no DMA_ERROR_CODE on PARISC, - sizeof (unsigned long) == 4 bytes on PARISC. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge third patchbomb from Andrew Morton: - various misc things - a couple of lib/ optimisations - provide DIV_ROUND_CLOSEST_ULL() - checkpatch updates - rtc tree - befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs - ptrace fixes - fork() fixes - seccomp cleanups - more mmap_sem hold time reductions from Davidlohr * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (138 commits) proc: show locks in /proc/pid/fdinfo/X docs: add missing and new /proc/PID/status file entries, fix typos drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic Documentation/spi/spidev_test.c: fix warning drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type .gitignore: ignore *.tar MAINTAINERS: add Mediatek SoC mailing list tomoyo: reduce mmap_sem hold for mm->exe_file powerpc/oprofile: reduce mmap_sem hold for exe_file oprofile: reduce mmap_sem hold for mm->exe_file mips: ip32: add platform data hooks to use DS1685 driver lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text x86: switch to using asm-generic for seccomp.h sparc: switch to using asm-generic for seccomp.h powerpc: switch to using asm-generic for seccomp.h parisc: switch to using asm-generic for seccomp.h mips: switch to using asm-generic for seccomp.h microblaze: use asm-generic for seccomp.h arm: use asm-generic for seccomp.h seccomp: allow COMPAT sigreturn overrides ...
2015-04-17lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help textAndrew Morton
Cc: Yalin Wang <yalin.wang@sonymobile.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17cpumask: don't perform while loop in cpumask_next_and()Sergey Senozhatsky
cpumask_next_and() is looking for cpumask_next() in src1 in a loop and tests if found cpu is also present in src2. remove that loop, perform cpumask_and() of src1 and src2 first and use that new mask to find cpumask_next(). Apart from removing while loop, ./bloat-o-meter on x86_64 shows add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-8 (-8) function old new delta cpumask_next_and 62 54 -8 Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib/bitmap.c: bitmap_[empty,full]: remove code duplicationYury Norov
bitmap_empty() has its own implementation. But it's clearly as simple as: find_first_bit(src, nbits) == nbits The same is true for 'bitmap_full'. Signed-off-by: Yury Norov <yury.norov@gmail.com> Cc: George Spelvin <linux@horizon.com> Cc: Alexey Klimov <klimov.linux@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib/vsprintf.c: improve put_dec_trunc8 slightlyRasmus Villemoes
I hadn't had enough coffee when I wrote this. Currently, the final increment of buf depends on the value loaded from the table, and causes gcc to emit a cmov immediately before the return. It is smarter to let it depend on r, since the increment can then be computed in parallel with the final load/store pair. It also shaves 16 bytes of .text. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib/dma-debug: fix bucket_find_contain()Sebastian Ott
bucket_find_contain() will search the bucket list for a dma_debug_entry. When the entry isn't found it needs to search other buckets too, since only the start address of a dma range is hashed (which might be in a different bucket). A copy of the dma_debug_entry is used to get the previous hash bucket but when its list is searched the original dma_debug_entry is to be used not its modified copy. This fixes false "device driver tries to sync DMA memory it has not allocated" warnings. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Horia Geanta <horia.geanta@freescale.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib/vsprintf.c: even faster binary to decimal conversionRasmus Villemoes
The most expensive part of decimal conversion is the divisions by 10 (albeit done using reciprocal multiplication with appropriately chosen constants). I decided to see if one could eliminate around half of these multiplications by emitting two digits at a time, at the cost of a 200 byte lookup table, and it does indeed seem like there is something to be gained, especially on 64 bits. Microbenchmarking shows improvements ranging from -50% (for numbers uniformly distributed in [0, 2^64-1]) to -25% (for numbers heavily biased toward the smaller end, a more realistic distribution). On a larger scale, perf shows that top, one of the big consumers of /proc data, uses 0.5-1.0% fewer cpu cycles. I had to jump through some hoops to get the 32 bit code to compile and run on my 64 bit machine, so I'm not sure how relevant these numbers are, but just for comparison the microbenchmark showed improvements between -30% and -10%. The bloat-o-meter costs are around 150 bytes (the generated code is a little smaller, so it's not the full 200 bytes) on both 32 and 64 bit. I'm aware that extra cache misses won't show up in a microbenchmark as used above, but on the other hand decimal conversions often happen in bulk (for example in the case of top). I have of course tested that the new code generates the same output as the old, for both the first and last 1e10 numbers in [0,2^64-1] and 4e9 'random' numbers in-between. Test and verification code on github: https://github.com/Villemoes/dec. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Tested-by: Jeff Epler <jepler@unpythonic.net> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib: rename lib/find_next_bit.c to lib/find_bit.cYury Norov
This file contains implementation for all find_*_bit{,_le} So giving it more generic name looks reasonable. Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Alexey Klimov <klimov.linux@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mark Salter <msalter@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Valentin Rothberg <valentinrothberg@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib: move find_last_bit to lib/find_next_bit.cYury Norov
Currently all 'find_*_bit' family is located in lib/find_next_bit.c, except 'find_last_bit', which is in lib/find_last_bit.c. It seems, there's no major benefit to have it separated. Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Alexey Klimov <klimov.linux@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mark Salter <msalter@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Valentin Rothberg <valentinrothberg@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib: find_*_bit reimplementationYury Norov
This patchset does rework to find_bit function family to achieve better performance, and decrease size of text. All rework is done in patch 1. Patches 2 and 3 are about code moving and renaming. It was boot-tested on x86_64 and MIPS (big-endian) machines. Performance tests were ran on userspace with code like this: /* addr[] is filled from /dev/urandom */ start = clock(); while (ret < nbits) ret = find_next_bit(addr, nbits, ret + 1); end = clock(); printf("%ld\t", (unsigned long) end - start); On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz measurements are: (for find_next_bit, nbits is 8M, for find_first_bit - 80K) find_next_bit: find_first_bit: new current new current 26932 43151 14777 14925 26947 43182 14521 15423 26507 43824 15053 14705 27329 43759 14473 14777 26895 43367 14847 15023 26990 43693 15103 15163 26775 43299 15067 15232 27282 42752 14544 15121 27504 43088 14644 14858 26761 43856 14699 15193 26692 43075 14781 14681 27137 42969 14451 15061 ... ... find_next_bit performance gain is 35-40%; find_first_bit - no measurable difference. On ARM machine, there is arch-specific implementation for find_bit. Thanks a lot to George Spelvin and Rasmus Villemoes for hints and helpful discussions. This patch (of 3): New implementations takes less space in source file (see diffstat) and in object. For me it's 710 vs 453 bytes of text. It also shows better performance. find_last_bit description fixed due to obvious typo. [akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus] Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Alexey Klimov <klimov.linux@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mark Salter <msalter@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Valentin Rothberg <valentinrothberg@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-16Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This is the usual grab bag of driver updates (lpfc, qla2xxx, storvsc, aacraid, ipr) plus an assortment of minor updates. There's also a major update to aic1542 which moves the driver into this millenium" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (106 commits) change SCSI Maintainer email sd, mmc, virtio_blk, string_helpers: fix block size units ufs: add support to allow non standard behaviours (quirks) ufs-qcom: save controller revision info in internal structure qla2xxx: Update driver version to 8.07.00.18-k qla2xxx: Restore physical port WWPN only, when port down detected for FA-WWPN port. qla2xxx: Fix virtual port configuration, when switch port is disabled/enabled. qla2xxx: Prevent multiple firmware dump collection for ISP27XX. qla2xxx: Disable Interrupt handshake for ISP27XX. qla2xxx: Add debugging info for MBX timeout. qla2xxx: Add serdes read/write support for ISP27XX qla2xxx: Add udev notification to save fw dump for ISP27XX qla2xxx: Add message for sucessful FW dump collected for ISP27XX. qla2xxx: Add support to load firmware from file for ISP 26XX/27XX. qla2xxx: Fix beacon blink for ISP27XX. qla2xxx: Increase the wait time for firmware to be ready for P3P. qla2xxx: Fix crash due to wrong casting of reg for ISP27XX. qla2xxx: Fix warnings reported by static checker. lpfc: Update version to 10.5.0.0 for upstream patch set lpfc: Update copyright to 2015 ...
2015-04-16sparc: Break up monolithic iommu table/lock into finer graularity pools and lockSowmini Varadhan
Investigation of multithreaded iperf experiments on an ethernet interface show the iommu->lock as the hottest lock identified by lockstat, with something of the order of 21M contentions out of 27M acquisitions, and an average wait time of 26 us for the lock. This is not efficient. A more scalable design is to follow the ppc model, where the iommu_table has multiple pools, each stretching over a segment of the map, and with a separate lock for each pool. This model allows for better parallelization of the iommu map search. This patch adds the iommu range alloc/free function infrastructure. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patchbomb from Andrew Morton: - the rest of MM - various misc bits - add ability to run /sbin/reboot at reboot time - printk/vsprintf changes - fiddle with seq_printf() return value * akpm: (114 commits) parisc: remove use of seq_printf return value lru_cache: remove use of seq_printf return value tracing: remove use of seq_printf return value cgroup: remove use of seq_printf return value proc: remove use of seq_printf return value s390: remove use of seq_printf return value cris fasttimer: remove use of seq_printf return value cris: remove use of seq_printf return value openrisc: remove use of seq_printf return value ARM: plat-pxa: remove use of seq_printf return value nios2: cpuinfo: remove use of seq_printf return value microblaze: mb: remove use of seq_printf return value ipc: remove use of seq_printf return value rtc: remove use of seq_printf return value power: wakeup: remove use of seq_printf return value x86: mtrr: if: remove use of seq_printf return value linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK MAINTAINERS: CREDITS: remove Stefano Brivio from B43 .mailmap: add Ricardo Ribalda CREDITS: add Ricardo Ribalda Delgado ...
2015-04-15lru_cache: remove use of seq_printf return valueJoe Perches
The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/string_helpers.c: change semantics of string_escape_memRasmus Villemoes
The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf). So change the semantics for string_escape_mem to be more snprintf-like: Return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever they previously contained. This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops. In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work. In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this. In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals. [andriy.shevchenko@linux.intel.com: simplify qword_add] [andriy.shevchenko@linux.intel.com: add missed curly braces] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/string_helpers.c: refactor string_escape_memRasmus Villemoes
When printf is given the format specifier %pE, it needs a way of obtaining the total output size that would be generated if the buffer was large enough, and string_escape_mem doesn't easily provide that. This is a refactorization of string_escape_mem in preparation of changing its external API to provide that information. The somewhat ugly early returns and subsequent seemingly redundant conditionals are to make the following patch touch as little as possible in string_helpers.c while still preserving the current behaviour of never outputting partial escape sequences. That behaviour must also change for %pE to work as one expects from every other printf specifier. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/vsprintf.c: fix potential NULL deref in hex_stringRasmus Villemoes
The helper hex_string() is broken in two ways. First, it doesn't increment buf regardless of whether there is room to print, so callers such as kasprintf() that try to probe the correct storage to allocate will get a too small return value. But even worse, kasprintf() (and likely anyone else trying to find the size of the result) pass NULL for buf and 0 for size, so we also have end == NULL. But this means that the end-1 in hex_string() is (char*)-1, so buf < end-1 is true and we get a NULL pointer deref. I double-checked this with a trivial kernel module that just did a kasprintf(GFP_KERNEL, "%14ph", "CrashBoomBang"). Nobody seems to be using %ph with kasprintf, but we might as well fix it before it hits someone. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/vsprintf: add %pC{,n,r} format specifiers for clocksGeert Uytterhoeven
Add format specifiers for printing struct clk: - '%pC' or '%pCn': name (Common Clock Framework) or address (legacy clock framework) of the clock, - '%pCr': rate of the clock. [akpm@linux-foundation.org: omit code if !CONFIG_HAVE_CLK] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Turquette <mturquette@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/vsprintf.c: another small hackRasmus Villemoes
Making ZEROPAD == '0'-' ', we can eliminate a few more instructions. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/vsprintf.c: eliminate duplicate hex string arrayRasmus Villemoes
gcc doesn't merge or overlap const char[] objects with identical contents (probably language lawyers would also insist that these things have different addresses), but there's no reason to have the string "0123456789ABCDEF" occur in multiple places. hex_asc_upper is declared in kernel.h and defined in lib/hexdump.c, which is unconditionally compiled in. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/vsprintf.c: reduce stack use in number()Rasmus Villemoes
At least since the initial git commit, when base was passed as a separate parameter, number() has only been called with bases 8, 10 and 16. I'm guessing that 66 was to accommodate 64 0/1, a sign and a '\0', but the buffer is only used for the actual digits. Octal digits carry 3 bits of information, so 24 is enough. Spell that 3*sizeof(num) so one less place needs to be changed should long long ever be 128 bits. Also remove the commented-out code that would handle an arbitrary base. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/vsprintf.c: eliminate some branchesRasmus Villemoes
Since FORMAT_TYPE_INT is simply 1 more than FORMAT_TYPE_UINT, and similarly for BYTE/UBYTE, SHORT/USHORT, LONG/ULONG, we can eliminate a few instructions by making SIGN have the value 1 instead of 2, and then use arithmetic instead of branches for computing the right spec->type. It's a little hacky, but certainly in the same spirit as SMALL needing to have the value 0x20. For example for the spec->qualifier == 'l' case, gcc now generates 75e: 0f b6 53 01 movzbl 0x1(%rbx),%edx 762: 83 e2 01 and $0x1,%edx 765: 83 c2 09 add $0x9,%edx 768: 88 13 mov %dl,(%rbx) instead of 763: 0f b6 53 01 movzbl 0x1(%rbx),%edx 767: 83 e2 02 and $0x2,%edx 76a: 80 fa 01 cmp $0x1,%dl 76d: 19 d2 sbb %edx,%edx 76f: 83 c2 0a add $0xa,%edx 772: 88 13 mov %dl,(%rbx) Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15lib/test-hexdump.c: fix initconst confusionAndi Kleen
const char *...[] is not const, but an array of pointer to const. So these arrays cannot be __initconst, but must be __initdata This fixes section conflicts with LTO. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: "Here is the crypto update for 4.1: New interfaces: - user-space interface for AEAD - user-space interface for RNG (i.e., pseudo RNG) New hashes: - ARMv8 SHA1/256 - ARMv8 AES - ARMv8 GHASH - ARM assembler and NEON SHA256 - MIPS OCTEON SHA1/256/512 - MIPS img-hash SHA1/256 and MD5 - Power 8 VMX AES/CBC/CTR/GHASH - PPC assembler AES, SHA1/256 and MD5 - Broadcom IPROC RNG driver Cleanups/fixes: - prevent internal helper algos from being exposed to user-space - merge common code from assembly/C SHA implementations - misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (169 commits) crypto: arm - workaround for building with old binutils crypto: arm/sha256 - avoid sha256 code on ARMv7-M crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer crypto: x86/sha256_ssse3 - move SHA-224/256 SSSE3 implementation to base layer crypto: x86/sha1_ssse3 - move SHA-1 SSSE3 implementation to base layer crypto: arm64/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer crypto: arm64/sha1-ce - move SHA-1 ARMv8 implementation to base layer crypto: arm/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer crypto: arm/sha256 - move SHA-224/256 ASM/NEON implementation to base layer crypto: arm/sha1-ce - move SHA-1 ARMv8 implementation to base layer crypto: arm/sha1_neon - move SHA-1 NEON implementation to base layer crypto: arm/sha1 - move SHA-1 ARM asm implementation to base layer crypto: sha512-generic - move to generic glue implementation crypto: sha256-generic - move to generic glue implementation crypto: sha1-generic - move to generic glue implementation crypto: sha512 - implement base layer for SHA-512 crypto: sha256 - implement base layer for SHA-256 crypto: sha1 - implement base layer for SHA-1 crypto: api - remove instance when test failed crypto: api - Move alg ref count init to crypto_check_alg ...
2015-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Add BQL support to via-rhine, from Tino Reichardt. 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers can support hw switch offloading. From Floria Fainelli. 3) Allow 'ip address' commands to initiate multicast group join/leave, from Madhu Challa. 4) Many ipv4 FIB lookup optimizations from Alexander Duyck. 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel Borkmann. 6) Remove the ugly compat support in ARP for ugly layers like ax25, rose, etc. And use this to clean up the neigh layer, then use it to implement MPLS support. All from Eric Biederman. 7) Support L3 forwarding offloading in switches, from Scott Feldman. 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed up route lookups even further. From Alexander Duyck. 9) Many improvements and bug fixes to the rhashtable implementation, from Herbert Xu and Thomas Graf. In particular, in the case where an rhashtable user bulk adds a large number of items into an empty table, we expand the table much more sanely. 10) Don't make the tcp_metrics hash table per-namespace, from Eric Biederman. 11) Extend EBPF to access SKB fields, from Alexei Starovoitov. 12) Split out new connection request sockets so that they can be established in the main hash table. Much less false sharing since hash lookups go direct to the request sockets instead of having to go first to the listener then to the request socks hashed underneath. From Eric Dumazet. 13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk. 14) Support stable privacy address generation for RFC7217 in IPV6. From Hannes Frederic Sowa. 15) Hash network namespace into IP frag IDs, also from Hannes Frederic Sowa. 16) Convert PTP get/set methods to use 64-bit time, from Richard Cochran. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits) fm10k: Bump driver version to 0.15.2 fm10k: corrected VF multicast update fm10k: mbx_update_max_size does not drop all oversized messages fm10k: reset head instead of calling update_max_size fm10k: renamed mbx_tx_dropped to mbx_tx_oversized fm10k: update xcast mode before synchronizing multicast addresses fm10k: start service timer on probe fm10k: fix function header comment fm10k: comment next_vf_mbx flow fm10k: don't handle mailbox events in iov_event path and always process mailbox fm10k: use separate workqueue for fm10k driver fm10k: Set PF queues to unlimited bandwidth during virtualization fm10k: expose tx_timeout_count as an ethtool stat fm10k: only increment tx_timeout_count in Tx hang path fm10k: remove extraneous "Reset interface" message fm10k: separate PF only stats so that VF does not display them fm10k: use hw->mac.max_queues for stats fm10k: only show actual queues, not the maximum in hardware fm10k: allow creation of VLAN on default vid fm10k: fix unused warnings ...
2015-04-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge first patchbomb from Andrew Morton: - arch/sh updates - ocfs2 updates - kernel/watchdog feature - about half of mm/ * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (122 commits) Documentation: update arch list in the 'memtest' entry Kconfig: memtest: update number of test patterns up to 17 arm: add support for memtest arm64: add support for memtest memtest: use phys_addr_t for physical addresses mm: move memtest under mm mm, hugetlb: abort __get_user_pages if current has been oom killed mm, mempool: do not allow atomic resizing memcg: print cgroup information when system panics due to panic_on_oom mm: numa: remove migrate_ratelimited mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE mm: split ET_DYN ASLR from mmap ASLR s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE mm: expose arch_mmap_rnd when available s390: standardize mmap_rnd() usage powerpc: standardize mmap_rnd() usage mips: extract logic for mmap_rnd() arm64: standardize mmap_rnd() usage x86: standardize mmap_rnd() usage arm: factor out mmap ASLR into mmap_rnd ...
2015-04-14Kconfig: memtest: update number of test patterns up to 17Vladimir Murzin
Additional test patterns for memtest were introduced since commit 63823126c221 ("x86: memtest: add additional (regular) test patterns"), but looks like Kconfig was not updated that time. Update Kconfig entry with the actual number of maximum test patterns. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14mm: move memtest under mmVladimir Murzin
Memtest is a simple feature which fills the memory with a given set of patterns and validates memory contents, if bad memory regions is detected it reserves them via memblock API. Since memblock API is widely used by other architectures this feature can be enabled outside of x86 world. This patch set promotes memtest to live under generic mm umbrella and enables memtest feature for arm/arm64. It was reported that this patch set was useful for tracking down an issue with some errant DMA on an arm64 platform. This patch (of 6): There is nothing platform dependent in the core memtest code, so other platforms might benefit from this feature too. [linux@roeck-us.net: MEMTEST depends on MEMBLOCK] Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14x86, mm: support huge KVA mappings on x86Toshi Kani
Implement huge KVA mapping interfaces on x86. On x86, MTRRs can override PAT memory types with a 4KB granularity. When using a huge page, MTRRs can override the memory type of the huge page, which may lead a performance penalty. The processor can also behave in an undefined manner if a huge page is mapped to a memory range that MTRRs have mapped with multiple different memory types. Therefore, the mapping code falls back to use a smaller page size toward 4KB when a mapping range is covered by non-WB type of MTRRs. The WB type of MTRRs has no affect on the PAT memory types. pud_set_huge() and pmd_set_huge() call mtrr_type_lookup() to see if a given range is covered by MTRRs. MTRR_TYPE_WRBACK indicates that the range is either covered by WB or not covered and the MTRR default value is set to WB. 0xFF indicates that MTRRs are disabled. HAVE_ARCH_HUGE_VMAP is selected when X86_64 or X86_32 with X86_PAE is set. X86_32 without X86_PAE is not supported since such config can unlikey be benefited from this feature, and there was an issue found in testing. [fengguang.wu@intel.com: ioremap_pud_capable can be static] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Robert Elliott <Elliott@hp.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14mm: change ioremap to set up huge I/O mappingsToshi Kani
ioremap_pud_range() and ioremap_pmd_range() are changed to create huge I/O mappings when their capability is enabled, and a request meets required conditions -- both virtual & physical addresses are aligned by their huge page size, and a requested range fufills their huge page size. When pud_set_huge() or pmd_set_huge() returns zero, i.e. no-operation is performed, the code simply falls back to the next level. The changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined on the architecture. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Robert Elliott <Elliott@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14lib/ioremap.c: add huge I/O map capability interfacesToshi Kani
Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when I/O mappings with pud/pmd are enabled on the kernel. ioremap_huge_init() calls arch_ioremap_pud_supported() and arch_ioremap_pmd_supported() to initialize the capabilities at boot-time. A new kernel option "nohugeiomap" is also added, so that user can disable the huge I/O map capabilities when necessary. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Robert Elliott <Elliott@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14Merge branch 'for-linus-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: "Part one: - struct filename-related cleanups - saner iov_iter_init() replacements (and switching the syscalls to use of those) - ntfs switch to ->write_iter() (Anton) - aio cleanups and splitting iocb into common and async parts (Christoph) - assorted fixes (me, bfields, Andrew Elble) There's a lot more, including the completion of switchover to ->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags race fixes, etc, but that goes after #for-davem merge. David has pulled it, and once it's in I'll send the next vfs pull request" * 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits) sg_start_req(): use import_iovec() sg_start_req(): make sure that there's not too many elements in iovec blk_rq_map_user(): use import_single_range() sg_io(): use import_iovec() process_vm_access: switch to {compat_,}import_iovec() switch keyctl_instantiate_key_common() to iov_iter switch {compat_,}do_readv_writev() to {compat_,}import_iovec() aio_setup_vectored_rw(): switch to {compat_,}import_iovec() vmsplice_to_user(): switch to import_iovec() kill aio_setup_single_vector() aio: simplify arguments of aio_setup_..._rw() aio: lift iov_iter_init() into aio_setup_..._rw() lift iov_iter into {compat_,}do_readv_writev() NFS: fix BUG() crash in notify_change() with patch to chown_common() dcache: return -ESTALE not -EBUSY on distributed fs race NTFS: Version 2.1.32 - Update file write from aio_write to write_iter. VFS: Add iov_iter_fault_in_multipages_readable() drop bogus check in file_open_root() switch security_inode_getattr() to struct path * constify tomoyo_realpath_from_path() ...
2015-04-14Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: "The main changes in this cycle were: - changes permitting use of call_rcu() and friends very early in boot, for example, before rcu_init() is invoked. - add in-kernel API to enable and disable expediting of normal RCU grace periods. - improve RCU's handling of (hotplug-) outgoing CPUs. - NO_HZ_FULL_SYSIDLE fixes. - tiny-RCU updates to make it more tiny. - documentation updates. - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well cpu: Defer smpboot kthread unparking until CPU known to scheduler rcu: Associate quiescent-state reports with grace period rcu: Yet another fix for preemption and CPU hotplug rcu: Add diagnostics to grace-period cleanup rcutorture: Default to grace-period-initialization delays rcu: Handle outgoing CPUs on exit from idle loop cpu: Make CPU-offline idle-loop transition point more precise rcu: Eliminate ->onoff_mutex from rcu_node structure rcu: Process offlining and onlining only at grace-period start rcu: Move rcu_report_unblock_qs_rnp() to common code rcu: Rework preemptible expedited bitmask handling rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs rcutorture: Enable slow grace-period initializations rcu: Provide diagnostic option to slow down grace-period initialization rcu: Detect stalls caused by failure to propagate up rcu_node tree rcu: Eliminate empty HOTPLUG_CPU ifdef rcu: Simplify sync_rcu_preempt_exp_init() rcu: Put all orphan-callback-related code under same comment rcu: Consolidate offline-CPU callback initialization ...