From 3fdb38bd1f43eddf4483160544d267a1e4d40e62 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven
Date: Thu, 23 Jan 2014 15:52:44 -0800
Subject: cris: provide {in,out}[wl]_p()

drivers/staging/comedi/drivers/das6402.c: In function 'intr_handler':
drivers/staging/comedi/drivers/das6402.c:164:3: error: implicit declaration of function 'outw_p' [-Werror=implicit-function-declaration]
drivers/staging/speakup/speakup_dtlk.c: In function 'synth_probe':
drivers/staging/speakup/speakup_dtlk.c:362:2: error: implicit declaration of function 'inw_p' [-Werror=implicit-function-declaration]

Signed-off-by: Geert Uytterhoeven
Cc: Mikael Starvik
Cc: Jesper Nilsson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/arch/cris/include/asm/io.h b/arch/cris/include/asm/io.h
index 4353cf2..e59dba1 100644
--- a/arch/cris/include/asm/io.h
+++ b/arch/cris/include/asm/io.h
@@ -169,7 +169,11 @@ static inline void outsl(unsigned int port, const void *addr,
 }
 
 #define inb_p(port) inb(port)
+#define inw_p(port) inw(port)
+#define inl_p(port) inl(port)
 #define outb_p(val, port) outb((val), (port))
+#define outw_p(val, port) outw((val), (port))
+#define outl_p(val, port) outl((val), (port))
 
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem
-- 
cgit v0.10.2

From 372c7209d6a05130b9d867f7ba350dec19e54030 Mon Sep 17 00:00:00 2001
From: Michal Simek
Date: Thu, 23 Jan 2014 15:52:46 -0800
Subject: microblaze: extable: sort the exception table at build time

Sort the exception table at build-time rather than during boot.

Microblaze is in the same situation as AARCH64, which is why the EM_MICROBLAZE conditional definition was added to allow cross-compilation on machines that are not running the latest libc-dev.

Inspired by AARCH64 commit adace89562c7 ("arm64: extable: sort the exception table at build time").

Signed-off-by: Michal Simek
Acked-by: David Daney
Cc: Catalin Marinas
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index e23cccd..8d581ab 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -30,6 +30,7 @@ config MICROBLAZE
 	select MODULES_USE_ELF_RELA
 	select CLONE_BACKWARDS3
 	select CLKSRC_OF
+	select BUILDTIME_EXTABLE_SORT
 
 config SWAP
 	def_bool n
diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index 7941fbd..cc49062 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -39,6 +39,10 @@
 #define EM_AARCH64 183
 #endif
 
+#ifndef EM_MICROBLAZE
+#define EM_MICROBLAZE 189
+#endif
+
 static int fd_map; /* File descriptor for file being modified. */
 static int mmap_failed; /* Boolean flag. */
 static void *ehdr_curr; /* current ElfXX_Ehdr * for resource cleanup */
@@ -275,6 +279,7 @@ do_file(char const *const fname)
 	case EM_ARCOMPACT:
 	case EM_ARM:
 	case EM_AARCH64:
+	case EM_MICROBLAZE:
 	case EM_MIPS:
 		break;
 	} /* end switch */
-- 
cgit v0.10.2

From 57ea8171d2bc245e22e760d2e4292da8e5a15175 Mon Sep 17 00:00:00 2001
From: Dave Hansen
Date: Thu, 23 Jan 2014 15:52:47 -0800
Subject: mm: documentation: remove hopelessly out-of-date locking doc

Documentation/vm/locking is a blast from the past. In the entire git history, it has had precisely three modifications. Two of those look to be pure renames, and the third was from 2005.

The doc contains such gems as:

> The page_table_lock is grabbed while holding the
> kernel_lock spinning monitor.

> Page stealers hold kernel_lock to protect against a bunch of
> races.

Or this which talks about mmap_sem:

> 4. The exception to this rule is expand_stack, which just
> takes the read lock and the page_table_lock, this is ok
> because it doesn't really modify fields anybody relies on.

expand_stack() no longer takes any locks directly, and the mmap_sem acquisition was long ago moved up into the page fault code itself.

It could be argued that we need to rewrite this, but it is dangerous to leave it as-is. It will confuse more people than it helps.

Signed-off-by: Dave Hansen
Cc: Hugh Dickins
Acked-by: Vlastimil Babka
Cc: Wanpeng Li
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/Documentation/vm/locking b/Documentation/vm/locking
deleted file mode 100644
index f61228b..0000000
--- a/Documentation/vm/locking
+++ /dev/null
@@ -1,130 +0,0 @@
-Started Oct 1999 by Kanoj Sarcar
-
-The intent of this file is to have an uptodate, running commentary
-from different people about how locking and synchronization is done
-in the Linux vm code.
-
-page_table_lock & mmap_sem
---------------------------------------
-Page stealers pick processes out of the process pool and scan for
-the best process to steal pages from. To guarantee the existence
-of the victim mm, a mm_count inc and a mmdrop are done in swap_out().
-Page stealers hold kernel_lock to protect against a bunch of races.
-The vma list of the victim mm is also scanned by the stealer,
-and the page_table_lock is used to preserve list sanity against the
-process adding/deleting to the list. This also guarantees existence
-of the vma. Vma existence is not guaranteed once try_to_swap_out()
-drops the page_table_lock. To guarantee the existence of the underlying
-file structure, a get_file is done before the swapout() method is
-invoked. The page passed into swapout() is guaranteed not to be reused
-for a different purpose because the page reference count due to being
-present in the user's pte is not released till after swapout() returns.
-
-Any code that modifies the vmlist, or the vm_start/vm_end/
-vm_flags:VM_LOCKED/vm_next of any vma *in the list* must prevent
-kswapd from looking at the chain.
-
-The rules are:
-1. To scan the vmlist (look but don't touch) you must hold the
-   mmap_sem with read bias, i.e. down_read(&mm->mmap_sem)
-2. To modify the vmlist you need to hold the mmap_sem with
-   read&write bias, i.e. down_write(&mm->mmap_sem) *AND*
-   you need to take the page_table_lock.
-3. The swapper takes _just_ the page_table_lock, this is done
-   because the mmap_sem can be an extremely long lived lock
-   and the swapper just cannot sleep on that.
-4. The exception to this rule is expand_stack, which just
-   takes the read lock and the page_table_lock, this is ok
-   because it doesn't really modify fields anybody relies on.
-5. You must be able to guarantee that while holding page_table_lock
-   or page_table_lock of mm A, you will not try to get either lock
-   for mm B.
-
-The caveats are:
-1. find_vma() makes use of, and updates, the mmap_cache pointer hint.
-The update of mmap_cache is racy (page stealer can race with other code
-that invokes find_vma with mmap_sem held), but that is okay, since it
-is a hint. This can be fixed, if desired, by having find_vma grab the
-page_table_lock.
-
-
-Code that add/delete elements from the vmlist chain are
-1. callers of insert_vm_struct
-2. callers of merge_segments
-3. callers of avl_remove
-
-Code that changes vm_start/vm_end/vm_flags:VM_LOCKED of vma's on
-the list:
-1. expand_stack
-2. mprotect
-3. mlock
-4. mremap
-
-It is advisable that changes to vm_start/vm_end be protected, although
-in some cases it is not really needed. Eg, vm_start is modified by
-expand_stack(), it is hard to come up with a destructive scenario without
-having the vmlist protection in this case.
-
-The page_table_lock nests with the inode i_mmap_mutex and the kmem cache
-c_spinlock spinlocks. This is okay, since the kmem code asks for pages after
-dropping c_spinlock. The page_table_lock also nests with pagecache_lock and
-pagemap_lru_lock spinlocks, and no code asks for memory with these locks
-held.
-
-The page_table_lock is grabbed while holding the kernel_lock spinning monitor.
-
-The page_table_lock is a spin lock.
-
-Note: PTL can also be used to guarantee that no new clones using the
-mm start up ... this is a loose form of stability on mm_users. For
-example, it is used in copy_mm to protect against a racing tlb_gather_mmu
-single address space optimization, so that the zap_page_range (from
-truncate) does not lose sending ipi's to cloned threads that might
-be spawned underneath it and go to user mode to drag in pte's into tlbs.
-
-swap_lock
---------------
-The swap devices are chained in priority order from the "swap_list" header.
-The "swap_list" is used for the round-robin swaphandle allocation strategy.
-The #free swaphandles is maintained in "nr_swap_pages". These two together
-are protected by the swap_lock.
-
-The swap_lock also protects all the device reference counts on the
-corresponding swaphandles, maintained in the "swap_map" array, and the
-"highest_bit" and "lowest_bit" fields.
-
-The swap_lock is a spinlock, and is never acquired from intr level.
-
-To prevent races between swap space deletion or async readahead swapins
-deciding whether a swap handle is being used, ie worthy of being read in
-from disk, and an unmap -> swap_free making the handle unused, the swap
-delete and readahead code grabs a temp reference on the swaphandle to
-prevent warning messages from swap_duplicate <- read_swap_cache_async.
-
-Swap cache locking
--------------------
-Pages are added into the swap cache with kernel_lock held, to make sure
-that multiple pages are not being added (and hence lost) by associating
-all of them with the same swaphandle.
-
-Pages are guaranteed not to be removed from the scache if the page is
-"shared": ie, other processes hold reference on the page or the associated
-swap handle. The only code that does not follow this rule is shrink_mmap,
-which deletes pages from the swap cache if no process has a reference on
-the page (multiple processes might have references on the corresponding
-swap handle though). lookup_swap_cache() races with shrink_mmap, when
-establishing a reference on a scache page, so, it must check whether the
-page it located is still in the swapcache, or shrink_mmap deleted it.
-(This race is due to the fact that shrink_mmap looks at the page ref
-count with pagecache_lock, but then drops pagecache_lock before deleting
-the page from the scache).
-
-do_wp_page and do_swap_page have MP races in them while trying to figure
-out whether a page is "shared", by looking at the page_count + swap_count.
-To preserve the sum of the counts, the page lock _must_ be acquired before
-calling is_page_shared (else processes might switch their swap_count refs
-to the page count refs, after the page count ref has been snapshotted).
-
-Swap device deletion code currently breaks all the scache assumptions,
-since it grabs neither mmap_sem nor page_table_lock.
-- 
cgit v0.10.2

From 12ab028be0008640de712ca890dc1a9ae224934d Mon Sep 17 00:00:00 2001
From: Dan Streetman
Date: Thu, 23 Jan 2014 15:52:48 -0800
Subject: mm/zswap.c: change params from hidden to ro

The "compressor" and "enabled" params are currently hidden. This changes them to read-only, so userspace can tell whether zswap is enabled and see which compressor is in use.

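Editorial sketch, not part of the patch: with permission 0444 the two parameters should become readable under /sys/module/zswap/parameters/. The userspace helper below shows one way to read such a file. The helper name read_module_param() is hypothetical, and the exact sysfs path is an assumption based on how module parameters are normally exported.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a one-line module parameter file, for example
 * /sys/module/zswap/parameters/enabled (path assumed), into buf and strip
 * the trailing newline. Returns 0 on success, -1 if the file cannot be read.
 */
static int read_module_param(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path && buf && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Cut the string at the first newline, if there is one. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would pass the "enabled" or "compressor" file and treat a -1 return as "zswap not present or parameter still hidden".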
Signed-off-by: Dan Streetman
Cc: Vladimir Murzin
Cc: Bob Liu
Cc: Minchan Kim
Cc: Weijie Yang
Acked-by: Seth Jennings
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/mm/zswap.c b/mm/zswap.c
index 5a63f78..e55bab9 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -77,12 +77,12 @@ static u64 zswap_duplicate_entry;
 **********************************/
 /* Enable/disable zswap (disabled by default, fixed at boot for now) */
 static bool zswap_enabled __read_mostly;
-module_param_named(enabled, zswap_enabled, bool, 0);
+module_param_named(enabled, zswap_enabled, bool, 0444);
 
 /* Compressor to be used by zswap (fixed at boot for now) */
 #define ZSWAP_COMPRESSOR_DEFAULT "lzo"
 static char *zswap_compressor = ZSWAP_COMPRESSOR_DEFAULT;
-module_param_named(compressor, zswap_compressor, charp, 0);
+module_param_named(compressor, zswap_compressor, charp, 0444);
 
 /* The maximum percentage of memory that the compressed pool can occupy */
 static unsigned int zswap_max_pool_percent = 20;
-- 
cgit v0.10.2

From f0b791a34cb3cffd2bbc3ca4365c9b719fa2c9f3 Mon Sep 17 00:00:00 2001
From: Dave Hansen
Date: Thu, 23 Jan 2014 15:52:49 -0800
Subject: mm: print more details for bad_page()

bad_page() is cool in that it prints out a bunch of data about the page. But, I can never remember which page flags are good and which are bad, or whether ->index or ->mapping is required to be NULL.

This patch allows bad/dump_page() callers to specify a string about why they are dumping the page and adds explanation strings to a number of places. It also adds a 'bad_flags' argument to bad_page(), which it then dumps out separately from the flags which are actually set.

This way, the messages will show specifically why the page was bad, *specifically* which flags it is complaining about, if it was a page flag combination which was the problem.

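Editorial sketch, not part of the patch: the reason/bad_flags scheme described above can be modelled in plain userspace C. Everything named fake_* or FAKE_* here is a hypothetical stand-in, not the kernel's struct page or PAGE_FLAGS_CHECK_AT_FREE. As in the patch's free_pages_check(), each failing check overwrites the reason, so the last failing check wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct fake_page {
	int mapcount;
	void *mapping;
	int count;
	unsigned long flags;
};

/* Hypothetical mask standing in for PAGE_FLAGS_CHECK_AT_FREE. */
#define FAKE_FLAGS_CHECK_AT_FREE 0x5UL

/*
 * Return NULL if the page looks fine, otherwise a string naming the
 * failed check. *bad_flags receives the checked mask only when the
 * flag check itself failed, and 0 otherwise.
 */
static const char *fake_free_pages_check(const struct fake_page *page,
					 unsigned long *bad_flags)
{
	const char *bad_reason = NULL;

	assert(page && bad_flags);
	*bad_flags = 0;

	if (page->mapcount)
		bad_reason = "nonzero mapcount";
	if (page->mapping != NULL)
		bad_reason = "non-NULL mapping";
	if (page->count != 0)
		bad_reason = "nonzero _count";
	if (page->flags & FAKE_FLAGS_CHECK_AT_FREE) {
		bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
		*bad_flags = FAKE_FLAGS_CHECK_AT_FREE;
	}
	return bad_reason;
}
```

A reporting function would then print the reason and, when the mask is nonzero, only the page flags that intersect it.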
[akpm@linux-foundation.org: switch to pr_alert]
Signed-off-by: Dave Hansen
Reviewed-by: Christoph Lameter
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a512dd8..03bbcb8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2029,7 +2029,9 @@ extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages;
 extern int soft_offline_page(struct page *page, int flags);
 
-extern void dump_page(struct page *page);
+extern void dump_page(struct page *page, char *reason);
+extern void dump_page_badflags(struct page *page, char *reason,
+			       unsigned long badflags);
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 07dbc8e..6e45a50 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -267,7 +267,7 @@ void balloon_page_putback(struct page *page)
 		put_page(page);
 	} else {
 		WARN_ON(1);
-		dump_page(page);
+		dump_page(page, "not movable balloon page");
 	}
 	unlock_page(page);
 }
@@ -287,7 +287,7 @@ int balloon_page_migrate(struct page *newpage,
 	BUG_ON(!trylock_page(newpage));
 
 	if (WARN_ON(!__is_movable_balloon_page(page))) {
-		dump_page(page);
+		dump_page(page, "not movable balloon page");
 		unlock_page(newpage);
 		return rc;
 	}
diff --git a/mm/memory.c b/mm/memory.c
index 86487df..71d70c0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -671,7 +671,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 		current->comm,
 		(long long)pte_val(pte), (long long)pmd_val(*pmd));
 	if (page)
-		dump_page(page);
+		dump_page(page, "bad pte");
 	printk(KERN_ALERT
 		"addr:%p vm_flags:%08lx anon_vma:%p mapping:%p index:%lx\n",
 		(void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index cc2ab37..a512a47 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1309,7 +1309,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 #ifdef CONFIG_DEBUG_VM
 			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
 			       pfn);
-			dump_page(page);
+			dump_page(page, "failed to remove from LRU");
 #endif
 			put_page(page);
 			/* Because we don't have big zone->lock. we should
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 533e214..1939f44 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -295,7 +295,7 @@ static inline int bad_range(struct zone *zone, struct page *page)
 }
 #endif
 
-static void bad_page(struct page *page)
+static void bad_page(struct page *page, char *reason, unsigned long bad_flags)
 {
 	static unsigned long resume;
 	static unsigned long nr_shown;
@@ -329,7 +329,7 @@ static void bad_page(struct page *page)
 
 	printk(KERN_ALERT "BUG: Bad page state in process %s pfn:%05lx\n",
 		current->comm, page_to_pfn(page));
-	dump_page(page);
+	dump_page_badflags(page, reason, bad_flags);
 
 	print_modules();
 	dump_stack();
@@ -383,7 +383,7 @@ static int destroy_compound_page(struct page *page, unsigned long order)
 	int bad = 0;
 
 	if (unlikely(compound_order(page) != order)) {
-		bad_page(page);
+		bad_page(page, "wrong compound order", 0);
 		bad++;
 	}
 
@@ -392,8 +392,11 @@ static int destroy_compound_page(struct page *page, unsigned long order)
 	for (i = 1; i < nr_pages; i++) {
 		struct page *p = page + i;
 
-		if (unlikely(!PageTail(p) || (p->first_page != page))) {
-			bad_page(page);
+		if (unlikely(!PageTail(p))) {
+			bad_page(page, "PageTail not set", 0);
+			bad++;
+		} else if (unlikely(p->first_page != page)) {
+			bad_page(page, "first_page not consistent", 0);
 			bad++;
 		}
 		__ClearPageTail(p);
@@ -618,12 +621,23 @@ out:
 
 static inline int free_pages_check(struct page *page)
 {
-	if (unlikely(page_mapcount(page) |
-		(page->mapping != NULL) |
-		(atomic_read(&page->_count) != 0) |
-		(page->flags & PAGE_FLAGS_CHECK_AT_FREE) |
-		(mem_cgroup_bad_page_check(page)))) {
-		bad_page(page);
+	char *bad_reason = NULL;
+	unsigned long bad_flags = 0;
+
+	if (unlikely(page_mapcount(page)))
+		bad_reason = "nonzero mapcount";
+	if (unlikely(page->mapping != NULL))
+		bad_reason = "non-NULL mapping";
+	if (unlikely(atomic_read(&page->_count) != 0))
+		bad_reason = "nonzero _count";
+	if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_FREE)) {
+		bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
+		bad_flags = PAGE_FLAGS_CHECK_AT_FREE;
+	}
+	if (unlikely(mem_cgroup_bad_page_check(page)))
+		bad_reason = "cgroup check failed";
+	if (unlikely(bad_reason)) {
+		bad_page(page, bad_reason, bad_flags);
 		return 1;
 	}
 	page_cpupid_reset_last(page);
@@ -843,12 +857,23 @@ static inline void expand(struct zone *zone, struct page *page,
  */
 static inline int check_new_page(struct page *page)
 {
-	if (unlikely(page_mapcount(page) |
-		(page->mapping != NULL) |
-		(atomic_read(&page->_count) != 0) |
-		(page->flags & PAGE_FLAGS_CHECK_AT_PREP) |
-		(mem_cgroup_bad_page_check(page)))) {
-		bad_page(page);
+	char *bad_reason = NULL;
+	unsigned long bad_flags = 0;
+
+	if (unlikely(page_mapcount(page)))
+		bad_reason = "nonzero mapcount";
+	if (unlikely(page->mapping != NULL))
+		bad_reason = "non-NULL mapping";
+	if (unlikely(atomic_read(&page->_count) != 0))
+		bad_reason = "nonzero _count";
+	if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) {
+		bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set";
+		bad_flags = PAGE_FLAGS_CHECK_AT_PREP;
+	}
+	if (unlikely(mem_cgroup_bad_page_check(page)))
+		bad_reason = "cgroup check failed";
+	if (unlikely(bad_reason)) {
+		bad_page(page, bad_reason, bad_flags);
 		return 1;
 	}
 	return 0;
@@ -6494,12 +6519,23 @@ static void dump_page_flags(unsigned long flags)
 	printk(")\n");
 }
 
-void dump_page(struct page *page)
+void dump_page_badflags(struct page *page, char *reason, unsigned long badflags)
 {
 	printk(KERN_ALERT
 	       "page:%p count:%d mapcount:%d mapping:%p index:%#lx\n",
 		page, atomic_read(&page->_count), page_mapcount(page),
 		page->mapping, page->index);
 	dump_page_flags(page->flags);
+	if (reason)
+		pr_alert("page dumped because: %s\n", reason);
+	if (page->flags & badflags) {
+		pr_alert("bad because of flags:\n");
+		dump_page_flags(page->flags & badflags);
+	}
 	mem_cgroup_print_bad_page(page);
 }
+
+void dump_page(struct page *page, char *reason)
+{
+	dump_page_badflags(page, reason, 0);
+}
-- 
cgit v0.10.2

From 01cc2e58697e34c6ee9a40fb6cebc18bf5a1923f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Thu, 23 Jan 2014 15:52:50 -0800
Subject: mm: munlock: fix potential race with THP page split

Since commit ff6a6da60b89 ("mm: accelerate munlock() treatment of THP pages") munlock skips tail pages of a munlocked THP page. There is some attempt to prevent bad consequences of racing with a THP page split, but code inspection indicates that there are two problems that may lead to a non-fatal, yet wrong outcome.

First, __split_huge_page_refcount() copies flags including PageMlocked from the head page to the tail pages. Clearing PageMlocked by munlock_vma_page() in the middle of this operation might result in part of tail pages left with PageMlocked flag. As the head page still appears to be a THP page until all tail pages are processed, munlock_vma_page() might think it munlocked the whole THP page and skip all the former tail pages. Before ff6a6da60, those pages would be cleared in further iterations of munlock_vma_pages_range(), but NR_MLOCK would still become undercounted (related to the next point).

Second, NR_MLOCK accounting is based on call to hpage_nr_pages() after the PageMlocked is cleared. The accounting might also become inconsistent due to race with __split_huge_page_refcount()

- undercount when HUGE_PMD_NR is subtracted, but some tail pages are left with PageMlocked set and counted again (only possible before ff6a6da60)

- overcount when hpage_nr_pages() sees a normal page (split has already finished), but the parallel split has meanwhile cleared PageMlocked from additional tail pages

This patch prevents both problems via extending the scope of lru_lock in munlock_vma_page().
This is convenient because:

- __split_huge_page_refcount() takes lru_lock for its whole operation

- munlock_vma_page() typically takes lru_lock anyway for page isolation

As this becomes a second function where page isolation is done with lru_lock already held, factor this out to a new __munlock_isolate_lru_page() function and clean up the code around.

[akpm@linux-foundation.org: avoid a coding-style ugly]
Signed-off-by: Vlastimil Babka
Cc: Sasha Levin
Cc: Michel Lespinasse
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/mm/mlock.c b/mm/mlock.c
index 10819ed..b30adbe 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -91,6 +91,26 @@ void mlock_vma_page(struct page *page)
 }
 
 /*
+ * Isolate a page from LRU with optional get_page() pin.
+ * Assumes lru_lock already held and page already pinned.
+ */
+static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
+{
+	if (PageLRU(page)) {
+		struct lruvec *lruvec;
+
+		lruvec = mem_cgroup_page_lruvec(page, page_zone(page));
+		if (getpage)
+			get_page(page);
+		ClearPageLRU(page);
+		del_page_from_lru_list(page, lruvec, page_lru(page));
+		return true;
+	}
+
+	return false;
+}
+
+/*
  * Finish munlock after successful page isolation
  *
  * Page must be locked. This is a wrapper for try_to_munlock()
@@ -126,9 +146,9 @@ static void __munlock_isolated_page(struct page *page)
 static void __munlock_isolation_failed(struct page *page)
 {
 	if (PageUnevictable(page))
-		count_vm_event(UNEVICTABLE_PGSTRANDED);
+		__count_vm_event(UNEVICTABLE_PGSTRANDED);
 	else
-		count_vm_event(UNEVICTABLE_PGMUNLOCKED);
+		__count_vm_event(UNEVICTABLE_PGMUNLOCKED);
 }
 
 /**
@@ -152,28 +172,34 @@ static void __munlock_isolation_failed(struct page *page)
 unsigned int munlock_vma_page(struct page *page)
 {
 	unsigned int nr_pages;
+	struct zone *zone = page_zone(page);
 
 	BUG_ON(!PageLocked(page));
 
-	if (TestClearPageMlocked(page)) {
-		nr_pages = hpage_nr_pages(page);
-		mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
-		if (!isolate_lru_page(page))
-			__munlock_isolated_page(page);
-		else
-			__munlock_isolation_failed(page);
-	} else {
-		nr_pages = hpage_nr_pages(page);
-	}
-
 	/*
-	 * Regardless of the original PageMlocked flag, we determine nr_pages
-	 * after touching the flag. This leaves a possible race with a THP page
-	 * split, such that a whole THP page was munlocked, but nr_pages == 1.
-	 * Returning a smaller mask due to that is OK, the worst that can
-	 * happen is subsequent useless scanning of the former tail pages.
-	 * The NR_MLOCK accounting can however become broken.
+	 * Serialize with any parallel __split_huge_page_refcount() which
+	 * might otherwise copy PageMlocked to part of the tail pages before
+	 * we clear it in the head page. It also stabilizes hpage_nr_pages().
 	 */
+	spin_lock_irq(&zone->lru_lock);
+
+	nr_pages = hpage_nr_pages(page);
+	if (!TestClearPageMlocked(page))
+		goto unlock_out;
+
+	__mod_zone_page_state(zone, NR_MLOCK, -nr_pages);
+
+	if (__munlock_isolate_lru_page(page, true)) {
+		spin_unlock_irq(&zone->lru_lock);
+		__munlock_isolated_page(page);
+		goto out;
+	}
+	__munlock_isolation_failed(page);
+
+unlock_out:
+	spin_unlock_irq(&zone->lru_lock);
+
+out:
 	return nr_pages - 1;
 }
 
@@ -310,34 +336,24 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 		struct page *page = pvec->pages[i];
 
 		if (TestClearPageMlocked(page)) {
-			struct lruvec *lruvec;
-			int lru;
-
-			if (PageLRU(page)) {
-				lruvec = mem_cgroup_page_lruvec(page, zone);
-				lru = page_lru(page);
-				/*
-				 * We already have pin from follow_page_mask()
-				 * so we can spare the get_page() here.
-				 */
-				ClearPageLRU(page);
-				del_page_from_lru_list(page, lruvec, lru);
-			} else {
-				__munlock_isolation_failed(page);
-				goto skip_munlock;
-			}
-
-		} else {
-skip_munlock:
 			/*
-			 * We won't be munlocking this page in the next phase
-			 * but we still need to release the follow_page_mask()
-			 * pin. We cannot do it under lru_lock however. If it's
-			 * the last pin, __page_cache_release would deadlock.
+			 * We already have pin from follow_page_mask()
+			 * so we can spare the get_page() here.
 			 */
-			pagevec_add(&pvec_putback, pvec->pages[i]);
-			pvec->pages[i] = NULL;
+			if (__munlock_isolate_lru_page(page, false))
+				continue;
+			else
+				__munlock_isolation_failed(page);
 		}
+
+		/*
+		 * We won't be munlocking this page in the next phase
+		 * but we still need to release the follow_page_mask()
+		 * pin. We cannot do it under lru_lock however. If it's
+		 * the last pin, __page_cache_release() would deadlock.
+		 */
+		pagevec_add(&pvec_putback, pvec->pages[i]);
+		pvec->pages[i] = NULL;
 	}
 	delta_munlocked = -nr + pagevec_count(&pvec_putback);
 	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
-- 
cgit v0.10.2

From 8ff69e2c85f84b6b371e3c1d01207e73c0500125 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov
Date: Thu, 23 Jan 2014 15:52:52 -0800
Subject: memcg: do not use vmalloc for mem_cgroup allocations

The vmalloc was introduced by 33327948782b ("memcgroup: use vmalloc for mem_cgroup allocation"), because at that time MAX_NUMNODES was used for defining the per-node array in the mem_cgroup structure so that the structure could be huge even if the system had only one NUMA node.

The situation was significantly improved by commit 45cf7ebd5a03 ("memcg: reduce the size of struct memcg 244-fold"), which made the size of the mem_cgroup structure calculated dynamically depending on the real number of NUMA nodes installed on the system (nr_node_ids), so now there is no point in using vmalloc here: the structure is allocated rarely and on most systems its size is about 1K.

Signed-off-by: Vladimir Davydov
Acked-by: Michal Hocko
Cc: Glauber Costa
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 67dd2a8..7890ce9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -49,7 +49,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -381,12 +380,6 @@ struct mem_cgroup {
 	/* WARNING: nodeinfo must be the last member here */
 };
 
-static size_t memcg_size(void)
-{
-	return sizeof(struct mem_cgroup) +
-		nr_node_ids * sizeof(struct mem_cgroup_per_node *);
-}
-
 /* internal only representation about the status of kmem accounting. */
 enum {
 	KMEM_ACCOUNTED_ACTIVE = 0, /* accounted by this cgroup itself */
@@ -6405,14 +6398,12 @@ static void free_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 static struct mem_cgroup *mem_cgroup_alloc(void)
 {
 	struct mem_cgroup *memcg;
-	size_t size = memcg_size();
+	size_t size;
 
-	/* Can be very big if nr_node_ids is very big */
-	if (size < PAGE_SIZE)
-		memcg = kzalloc(size, GFP_KERNEL);
-	else
-		memcg = vzalloc(size);
+	size = sizeof(struct mem_cgroup);
+	size += nr_node_ids * sizeof(struct mem_cgroup_per_node *);
 
+	memcg = kzalloc(size, GFP_KERNEL);
 	if (!memcg)
 		return NULL;
 
@@ -6423,10 +6414,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	return memcg;
 
 out_free:
-	if (size < PAGE_SIZE)
-		kfree(memcg);
-	else
-		vfree(memcg);
+	kfree(memcg);
 	return NULL;
 }
 
@@ -6444,7 +6432,6 @@ out_free:
 static void __mem_cgroup_free(struct mem_cgroup *memcg)
 {
 	int node;
-	size_t size = memcg_size();
 
 	mem_cgroup_remove_from_trees(memcg);
 
@@ -6465,10 +6452,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 	 * the cgroup_lock.
 	 */
 	disarm_static_keys(memcg);
-	if (size < PAGE_SIZE)
-		kfree(memcg);
-	else
-		vfree(memcg);
+	kfree(memcg);
 }
 
 /*
-- 
cgit v0.10.2

From e3bba3c3c90cd434c1ccb9e5dc704a96baf9541c Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi
Date: Thu, 23 Jan 2014 15:52:53 -0800
Subject: fs/proc/page.c: add PageAnon check to surely detect thp

stable_page_flags() checks !PageHuge && PageTransCompound && PageLRU to determine whether a specified page is a thp. But sometimes that is not enough, and we fail to detect a thp while it is on a pagevec. This happens only for a few seconds after LRU list operations, but it makes it difficult to control our applications depending on this flag.

So this patch adds another check PageAnon to detect thps on pagevec. It might not give the future extensibility for thp pagecache, but it's OK at least for now.

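Editorial sketch, not part of the patch: the detection rule described above reduces to a small boolean predicate. The struct and function names below are hypothetical, and each field simply records the result of the kernel test named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the page tests used by stable_page_flags(). */
struct fake_page_state {
	bool huge;            /* PageHuge(page): hugetlbfs page */
	bool trans_compound;  /* PageTransCompound(page) */
	bool head_on_lru;     /* PageLRU(compound_trans_head(page)) */
	bool head_anon;       /* PageAnon(compound_trans_head(page)) */
};

/*
 * A page is reported as thp when it is not a hugetlbfs page, is part of
 * a compound page, and its head is either on the LRU or anonymous.
 */
static bool looks_like_thp(const struct fake_page_state *s)
{
	assert(s);
	if (s->huge)
		return false;
	return s->trans_compound && (s->head_on_lru || s->head_anon);
}
```

The case the patch targets is a compound page whose head is temporarily off the LRU (sitting on a pagevec) but is anonymous: the old rule missed it, the new rule accepts it, and a slab or driver compound page is still rejected.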
Signed-off-by: Naoya Horiguchi
Cc: David Rientjes
Cc: KOSAKI Motohiro
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/fs/proc/page.c b/fs/proc/page.c
index b8730d9..cab84b6 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -118,10 +118,12 @@ u64 stable_page_flags(struct page *page)
 	/*
 	 * PageTransCompound can be true for non-huge compound pages (slab
 	 * pages or pages allocated by drivers with __GFP_COMP) because it
-	 * just checks PG_head/PG_tail, so we need to check PageLRU to make
-	 * sure a given page is a thp, not a non-huge compound page.
+	 * just checks PG_head/PG_tail, so we need to check PageLRU/PageAnon
+	 * to make sure a given page is a thp, not a non-huge compound page.
 	 */
-	else if (PageTransCompound(page) && PageLRU(compound_trans_head(page)))
+	else if (PageTransCompound(page) &&
+		 (PageLRU(compound_trans_head(page)) ||
+		  PageAnon(compound_trans_head(page))))
 		u |= 1 << KPF_THP;
 
 	/*
-- 
cgit v0.10.2

From 309381feaee564281c3d9e90fbca8963bb7428ad Mon Sep 17 00:00:00 2001
From: Sasha Levin
Date: Thu, 23 Jan 2014 15:52:54 -0800
Subject: mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE

Most of the VM_BUG_ON assertions are performed on a page. Usually, when one of these assertions fails we'll get a BUG_ON with a call stack and the registers.

I've recently noticed based on the requests to add a small piece of code that dumps the page to various VM_BUG_ON sites that the page dump is quite useful to people debugging issues in mm.

This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what VM_BUG_ON() does, also dumps the page before executing the actual BUG_ON.

[akpm@linux-foundation.org: fix up includes]
Signed-off-by: Sasha Levin
Cc: "Kirill A. Shutemov"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 0596e8e..207d9aef 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -108,8 +108,8 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
 
 static inline void get_head_page_multiple(struct page *page, int nr)
 {
-	VM_BUG_ON(page != compound_head(page));
-	VM_BUG_ON(page_count(page) == 0);
+	VM_BUG_ON_PAGE(page != compound_head(page), page);
+	VM_BUG_ON_PAGE(page_count(page) == 0, page);
 	atomic_add(nr, &page->_count);
 	SetPageReferenced(page);
 }
@@ -135,7 +135,7 @@ static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
 	head = pte_page(pte);
 	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	do {
-		VM_BUG_ON(compound_head(page) != head);
+		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		if (PageTail(page))
 			get_huge_page_tail(page);
@@ -212,7 +212,7 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
 	head = pte_page(pte);
 	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 	do {
-		VM_BUG_ON(compound_head(page) != head);
+		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		if (PageTail(page))
 			get_huge_page_tail(page);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 9b4dd49..0437439 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -1,6 +1,7 @@
 #ifndef __LINUX_GFP_H
 #define __LINUX_GFP_H
 
+#include
 #include
 #include
 #include
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d01cc97..8c43cc4 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -2,6 +2,7 @@
 #define _LINUX_HUGETLB_H
 
 #include
+#include
 #include
 #include
 #include
@@ -354,7 +355,7 @@ static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 
 static inline struct hstate *page_hstate(struct page *page)
 {
-	VM_BUG_ON(!PageHuge(page));
+	VM_BUG_ON_PAGE(!PageHuge(page), page);
 	return size_to_hstate(PAGE_SIZE << compound_order(page));
 }
 
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index ce8217f..787bba3 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -15,6 +15,7 @@ #ifndef _LINUX_HUGETLB_CGROUP_H #define _LINUX_HUGETLB_CGROUP_H +#include #include struct hugetlb_cgroup; @@ -28,7 +29,7 @@ struct hugetlb_cgroup; static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) { - VM_BUG_ON(!PageHuge(page)); + VM_BUG_ON_PAGE(!PageHuge(page), page); if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) return NULL; @@ -38,7 +39,7 @@ static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) static inline int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) { - VM_BUG_ON(!PageHuge(page)); + VM_BUG_ON_PAGE(!PageHuge(page), page); if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) return -1; diff --git a/include/linux/mm.h b/include/linux/mm.h index 03bbcb8..d9992fc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -5,6 +5,7 @@ #ifdef __KERNEL__ +#include #include #include #include @@ -303,7 +304,7 @@ static inline int get_freepage_migratetype(struct page *page) */ static inline int put_page_testzero(struct page *page) { - VM_BUG_ON(atomic_read(&page->_count) == 0); + VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0, page); return atomic_dec_and_test(&page->_count); } @@ -364,7 +365,7 @@ static inline int is_vmalloc_or_module_addr(const void *x) static inline void compound_lock(struct page *page) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE - VM_BUG_ON(PageSlab(page)); + VM_BUG_ON_PAGE(PageSlab(page), page); bit_spin_lock(PG_compound_lock, &page->flags); #endif } @@ -372,7 +373,7 @@ static inline void compound_lock(struct page *page) static inline void compound_unlock(struct page *page) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE - VM_BUG_ON(PageSlab(page)); + VM_BUG_ON_PAGE(PageSlab(page), page); bit_spin_unlock(PG_compound_lock, &page->flags); #endif } @@ -447,7 +448,7 @@ static inline 
bool __compound_tail_refcounted(struct page *page) */ static inline bool compound_tail_refcounted(struct page *page) { - VM_BUG_ON(!PageHead(page)); + VM_BUG_ON_PAGE(!PageHead(page), page); return __compound_tail_refcounted(page); } @@ -456,9 +457,9 @@ static inline void get_huge_page_tail(struct page *page) /* * __split_huge_page_refcount() cannot run from under us. */ - VM_BUG_ON(!PageTail(page)); - VM_BUG_ON(page_mapcount(page) < 0); - VM_BUG_ON(atomic_read(&page->_count) != 0); + VM_BUG_ON_PAGE(!PageTail(page), page); + VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); + VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page); if (compound_tail_refcounted(page->first_page)) atomic_inc(&page->_mapcount); } @@ -474,7 +475,7 @@ static inline void get_page(struct page *page) * Getting a normal page or the head of a compound page * requires to already have an elevated page->_count. */ - VM_BUG_ON(atomic_read(&page->_count) <= 0); + VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page); atomic_inc(&page->_count); } @@ -511,13 +512,13 @@ static inline int PageBuddy(struct page *page) static inline void __SetPageBuddy(struct page *page) { - VM_BUG_ON(atomic_read(&page->_mapcount) != -1); + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE); } static inline void __ClearPageBuddy(struct page *page) { - VM_BUG_ON(!PageBuddy(page)); + VM_BUG_ON_PAGE(!PageBuddy(page), page); atomic_set(&page->_mapcount, -1); } @@ -1401,7 +1402,7 @@ static inline bool ptlock_init(struct page *page) * slab code uses page->slab_cache and page->first_page (for tail * pages), which share storage with page->ptl. 
 	 */
-	VM_BUG_ON(*(unsigned long *)&page->ptl);
+	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
 	if (!ptlock_alloc(page))
 		return false;
 	spin_lock_init(ptlock_ptr(page));
@@ -1492,7 +1493,7 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	VM_BUG_ON(page->pmd_huge_pte);
+	VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
 #endif
 	ptlock_free(page);
 }
@@ -2029,10 +2030,6 @@ extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages;
 extern int soft_offline_page(struct page *page, int flags);

-extern void dump_page(struct page *page, char *reason);
-extern void dump_page_badflags(struct page *page, char *reason,
-				unsigned long badflags);
-
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
 			    unsigned long addr,
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 580bd58..5042c03 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -1,10 +1,19 @@
 #ifndef LINUX_MM_DEBUG_H
 #define LINUX_MM_DEBUG_H 1

+struct page;
+
+extern void dump_page(struct page *page, char *reason);
+extern void dump_page_badflags(struct page *page, char *reason,
+			       unsigned long badflags);
+
 #ifdef CONFIG_DEBUG_VM
 #define VM_BUG_ON(cond) BUG_ON(cond)
+#define VM_BUG_ON_PAGE(cond, page) \
+	do { if (unlikely(cond)) { dump_page(page, NULL); BUG(); } } while (0)
 #else
 #define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
 #endif

 #ifdef CONFIG_DEBUG_VIRTUAL
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 98ada58..e464b4e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -412,7 +412,7 @@ static inline void ClearPageCompound(struct page *page)
  */
 static inline int PageTransHuge(struct page *page)
 {
-	VM_BUG_ON(PageTail(page));
+	VM_BUG_ON_PAGE(PageTail(page), page);
 	return
PageHead(page); } @@ -460,25 +460,25 @@ static inline int PageTransTail(struct page *page) */ static inline int PageSlabPfmemalloc(struct page *page) { - VM_BUG_ON(!PageSlab(page)); + VM_BUG_ON_PAGE(!PageSlab(page), page); return PageActive(page); } static inline void SetPageSlabPfmemalloc(struct page *page) { - VM_BUG_ON(!PageSlab(page)); + VM_BUG_ON_PAGE(!PageSlab(page), page); SetPageActive(page); } static inline void __ClearPageSlabPfmemalloc(struct page *page) { - VM_BUG_ON(!PageSlab(page)); + VM_BUG_ON_PAGE(!PageSlab(page), page); __ClearPageActive(page); } static inline void ClearPageSlabPfmemalloc(struct page *page) { - VM_BUG_ON(!PageSlab(page)); + VM_BUG_ON_PAGE(!PageSlab(page), page); ClearPageActive(page); } diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index e3dea75..1710d1b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -162,7 +162,7 @@ static inline int page_cache_get_speculative(struct page *page) * disabling preempt, and hence no need for the "speculative get" that * SMP requires. 
*/ - VM_BUG_ON(page_count(page) == 0); + VM_BUG_ON_PAGE(page_count(page) == 0, page); atomic_inc(&page->_count); #else @@ -175,7 +175,7 @@ static inline int page_cache_get_speculative(struct page *page) return 0; } #endif - VM_BUG_ON(PageTail(page)); + VM_BUG_ON_PAGE(PageTail(page), page); return 1; } @@ -191,14 +191,14 @@ static inline int page_cache_add_speculative(struct page *page, int count) # ifdef CONFIG_PREEMPT_COUNT VM_BUG_ON(!in_atomic()); # endif - VM_BUG_ON(page_count(page) == 0); + VM_BUG_ON_PAGE(page_count(page) == 0, page); atomic_add(count, &page->_count); #else if (unlikely(!atomic_add_unless(&page->_count, count, 0))) return 0; #endif - VM_BUG_ON(PageCompound(page) && page != compound_head(page)); + VM_BUG_ON_PAGE(PageCompound(page) && page != compound_head(page), page); return 1; } @@ -210,7 +210,7 @@ static inline int page_freeze_refs(struct page *page, int count) static inline void page_unfreeze_refs(struct page *page, int count) { - VM_BUG_ON(page_count(page) != 0); + VM_BUG_ON_PAGE(page_count(page) != 0, page); VM_BUG_ON(count == 0); atomic_set(&page->_count, count); diff --git a/include/linux/percpu.h b/include/linux/percpu.h index 9e4761c..e3817d2 100644 --- a/include/linux/percpu.h +++ b/include/linux/percpu.h @@ -1,6 +1,7 @@ #ifndef __LINUX_PERCPU_H #define __LINUX_PERCPU_H +#include #include #include #include diff --git a/mm/cleancache.c b/mm/cleancache.c index 5875f48..d0eac43 100644 --- a/mm/cleancache.c +++ b/mm/cleancache.c @@ -237,7 +237,7 @@ int __cleancache_get_page(struct page *page) goto out; } - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); fake_pool_id = page->mapping->host->i_sb->cleancache_poolid; if (fake_pool_id < 0) goto out; @@ -279,7 +279,7 @@ void __cleancache_put_page(struct page *page) return; } - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); fake_pool_id = page->mapping->host->i_sb->cleancache_poolid; if (fake_pool_id < 0) return; @@ -318,7 +318,7 @@ void 
__cleancache_invalidate_page(struct address_space *mapping, if (pool_id < 0) return; - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); if (cleancache_get_key(mapping->host, &key) >= 0) { cleancache_ops->invalidate_page(pool_id, key, page->index); diff --git a/mm/compaction.c b/mm/compaction.c index 3a91a2e..e0ab02d 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -601,7 +601,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, if (__isolate_lru_page(page, mode) != 0) continue; - VM_BUG_ON(PageTransCompound(page)); + VM_BUG_ON_PAGE(PageTransCompound(page), page); /* Successfully isolated */ cc->finished_update_migrate = true; diff --git a/mm/filemap.c b/mm/filemap.c index b7749a9..7a7f3e0 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -409,9 +409,9 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask) { int error; - VM_BUG_ON(!PageLocked(old)); - VM_BUG_ON(!PageLocked(new)); - VM_BUG_ON(new->mapping); + VM_BUG_ON_PAGE(!PageLocked(old), old); + VM_BUG_ON_PAGE(!PageLocked(new), new); + VM_BUG_ON_PAGE(new->mapping, new); error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM); if (!error) { @@ -461,8 +461,8 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping, { int error; - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(PageSwapBacked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(PageSwapBacked(page), page); error = mem_cgroup_cache_charge(page, current->mm, gfp_mask & GFP_RECLAIM_MASK); @@ -607,7 +607,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue); */ void unlock_page(struct page *page) { - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); clear_bit_unlock(PG_locked, &page->flags); smp_mb__after_clear_bit(); wake_up_page(page, PG_locked); @@ -760,7 +760,7 @@ repeat: page_cache_release(page); goto repeat; } - VM_BUG_ON(page->index != offset); + VM_BUG_ON_PAGE(page->index != offset, page); } return page; } @@ -1656,7 
+1656,7 @@ retry_find: put_page(page); goto retry_find; } - VM_BUG_ON(page->index != offset); + VM_BUG_ON_PAGE(page->index != offset, page); /* * We have a locked page in the page cache, now we need to check diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 95d1acb..25fab71 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -712,7 +712,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, pgtable_t pgtable; spinlock_t *ptl; - VM_BUG_ON(!PageCompound(page)); + VM_BUG_ON_PAGE(!PageCompound(page), page); pgtable = pte_alloc_one(mm, haddr); if (unlikely(!pgtable)) return VM_FAULT_OOM; @@ -893,7 +893,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, goto out; } src_page = pmd_page(pmd); - VM_BUG_ON(!PageHead(src_page)); + VM_BUG_ON_PAGE(!PageHead(src_page), src_page); get_page(src_page); page_dup_rmap(src_page); add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); @@ -1067,7 +1067,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm, ptl = pmd_lock(mm, pmd); if (unlikely(!pmd_same(*pmd, orig_pmd))) goto out_free_pages; - VM_BUG_ON(!PageHead(page)); + VM_BUG_ON_PAGE(!PageHead(page), page); pmdp_clear_flush(vma, haddr, pmd); /* leave pmd empty until pte is filled */ @@ -1133,7 +1133,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, goto out_unlock; page = pmd_page(orig_pmd); - VM_BUG_ON(!PageCompound(page) || !PageHead(page)); + VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page); if (page_mapcount(page) == 1) { pmd_t entry; entry = pmd_mkyoung(orig_pmd); @@ -1211,7 +1211,7 @@ alloc: add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR); put_huge_zero_page(); } else { - VM_BUG_ON(!PageHead(page)); + VM_BUG_ON_PAGE(!PageHead(page), page); page_remove_rmap(page); put_page(page); } @@ -1249,7 +1249,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, goto out; page = pmd_page(*pmd); - VM_BUG_ON(!PageHead(page)); + VM_BUG_ON_PAGE(!PageHead(page), page); if (flags & 
FOLL_TOUCH) { pmd_t _pmd; /* @@ -1274,7 +1274,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, } } page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT; - VM_BUG_ON(!PageCompound(page)); + VM_BUG_ON_PAGE(!PageCompound(page), page); if (flags & FOLL_GET) get_page_foll(page); @@ -1432,9 +1432,9 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, } else { page = pmd_page(orig_pmd); page_remove_rmap(page); - VM_BUG_ON(page_mapcount(page) < 0); + VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); - VM_BUG_ON(!PageHead(page)); + VM_BUG_ON_PAGE(!PageHead(page), page); atomic_long_dec(&tlb->mm->nr_ptes); spin_unlock(ptl); tlb_remove_page(tlb, page); @@ -2176,9 +2176,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (unlikely(!page)) goto out; - VM_BUG_ON(PageCompound(page)); - BUG_ON(!PageAnon(page)); - VM_BUG_ON(!PageSwapBacked(page)); + VM_BUG_ON_PAGE(PageCompound(page), page); + VM_BUG_ON_PAGE(!PageAnon(page), page); + VM_BUG_ON_PAGE(!PageSwapBacked(page), page); /* cannot use mapcount: can't collapse if there's a gup pin */ if (page_count(page) != 1) @@ -2201,8 +2201,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, } /* 0 stands for page_is_file_cache(page) == false */ inc_zone_page_state(page, NR_ISOLATED_ANON + 0); - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(PageLRU(page), page); /* If there is no mapped pte young don't collapse the page */ if (pte_young(pteval) || PageReferenced(page) || @@ -2232,7 +2232,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, } else { src_page = pte_page(pteval); copy_user_highpage(page, src_page, address, vma); - VM_BUG_ON(page_mapcount(src_page) != 1); + VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page); release_pte_page(src_page); /* * ptl mostly unnecessary, but preempt has to @@ -2311,7 +2311,7 @@ 
static struct page struct vm_area_struct *vma, unsigned long address, int node) { - VM_BUG_ON(*hpage); + VM_BUG_ON_PAGE(*hpage, *hpage); /* * Allocate the page while the vma is still valid and under * the mmap_sem read mode so there is no memory allocation @@ -2580,7 +2580,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, */ node = page_to_nid(page); khugepaged_node_load[node]++; - VM_BUG_ON(PageCompound(page)); + VM_BUG_ON_PAGE(PageCompound(page), page); if (!PageLRU(page) || PageLocked(page) || !PageAnon(page)) goto out_unmap; /* cannot use mapcount: can't collapse if there's a gup pin */ @@ -2876,7 +2876,7 @@ again: return; } page = pmd_page(*pmd); - VM_BUG_ON(!page_count(page)); + VM_BUG_ON_PAGE(!page_count(page), page); get_page(page); spin_unlock(ptl); mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 04306b9..c01cb9f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -584,7 +584,7 @@ static void update_and_free_page(struct hstate *h, struct page *page) 1 << PG_active | 1 << PG_reserved | 1 << PG_private | 1 << PG_writeback); } - VM_BUG_ON(hugetlb_cgroup_from_page(page)); + VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page); set_compound_page_dtor(page, NULL); set_page_refcounted(page); arch_release_hugepage(page); @@ -1089,7 +1089,7 @@ retry: * no users -- drop the buddy allocator's reference. 
*/ put_page_testzero(page); - VM_BUG_ON(page_count(page)); + VM_BUG_ON_PAGE(page_count(page), page); enqueue_huge_page(h, page); } free: @@ -3503,7 +3503,7 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage) bool isolate_huge_page(struct page *page, struct list_head *list) { - VM_BUG_ON(!PageHead(page)); + VM_BUG_ON_PAGE(!PageHead(page), page); if (!get_page_unless_zero(page)) return false; spin_lock(&hugetlb_lock); @@ -3514,7 +3514,7 @@ bool isolate_huge_page(struct page *page, struct list_head *list) void putback_active_hugepage(struct page *page) { - VM_BUG_ON(!PageHead(page)); + VM_BUG_ON_PAGE(!PageHead(page), page); spin_lock(&hugetlb_lock); list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist); spin_unlock(&hugetlb_lock); @@ -3523,7 +3523,7 @@ void putback_active_hugepage(struct page *page) bool is_hugepage_active(struct page *page) { - VM_BUG_ON(!PageHuge(page)); + VM_BUG_ON_PAGE(!PageHuge(page), page); /* * This function can be called for a tail page because the caller, * scan_movable_pages, scans through a given pfn-range which typically diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index d747a84..cb00829 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -390,7 +390,7 @@ void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage) if (hugetlb_cgroup_disabled()) return; - VM_BUG_ON(!PageHuge(oldhpage)); + VM_BUG_ON_PAGE(!PageHuge(oldhpage), oldhpage); spin_lock(&hugetlb_lock); h_cg = hugetlb_cgroup_from_page(oldhpage); set_hugetlb_cgroup(oldhpage, NULL); diff --git a/mm/internal.h b/mm/internal.h index a346ba1..dc95e97 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -27,8 +27,8 @@ static inline void set_page_count(struct page *page, int v) */ static inline void set_page_refcounted(struct page *page) { - VM_BUG_ON(PageTail(page)); - VM_BUG_ON(atomic_read(&page->_count)); + VM_BUG_ON_PAGE(PageTail(page), page); + VM_BUG_ON_PAGE(atomic_read(&page->_count), page); set_page_count(page, 1); } @@ -46,7 +46,7 
@@ static inline void __get_page_tail_foll(struct page *page, * speculative page access (like in * page_cache_get_speculative()) on tail pages. */ - VM_BUG_ON(atomic_read(&page->first_page->_count) <= 0); + VM_BUG_ON_PAGE(atomic_read(&page->first_page->_count) <= 0, page); if (get_page_head) atomic_inc(&page->first_page->_count); get_huge_page_tail(page); @@ -71,7 +71,7 @@ static inline void get_page_foll(struct page *page) * Getting a normal page or the head of a compound page * requires to already have an elevated page->_count. */ - VM_BUG_ON(atomic_read(&page->_count) <= 0); + VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page); atomic_inc(&page->_count); } } @@ -173,7 +173,7 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma) static inline int mlocked_vma_newpage(struct vm_area_struct *vma, struct page *page) { - VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(PageLRU(page), page); if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) return 0; diff --git a/mm/ksm.c b/mm/ksm.c index 3df141e..f91ddf5 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1898,13 +1898,13 @@ int rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc) int ret = SWAP_AGAIN; int search_new_forks = 0; - VM_BUG_ON(!PageKsm(page)); + VM_BUG_ON_PAGE(!PageKsm(page), page); /* * Rely on the page lock to protect against concurrent modifications * to that page's node of the stable tree. 
*/ - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); stable_node = page_stable_node(page); if (!stable_node) @@ -1958,13 +1958,13 @@ void ksm_migrate_page(struct page *newpage, struct page *oldpage) { struct stable_node *stable_node; - VM_BUG_ON(!PageLocked(oldpage)); - VM_BUG_ON(!PageLocked(newpage)); - VM_BUG_ON(newpage->mapping != oldpage->mapping); + VM_BUG_ON_PAGE(!PageLocked(oldpage), oldpage); + VM_BUG_ON_PAGE(!PageLocked(newpage), newpage); + VM_BUG_ON_PAGE(newpage->mapping != oldpage->mapping, newpage); stable_node = page_stable_node(newpage); if (stable_node) { - VM_BUG_ON(stable_node->kpfn != page_to_pfn(oldpage)); + VM_BUG_ON_PAGE(stable_node->kpfn != page_to_pfn(oldpage), oldpage); stable_node->kpfn = page_to_pfn(newpage); /* * newpage->mapping was set in advance; now we need smp_wmb() diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 7890ce9..72f2d90 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2897,7 +2897,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page) unsigned short id; swp_entry_t ent; - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); pc = lookup_page_cgroup(page); lock_page_cgroup(pc); @@ -2931,7 +2931,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg, bool anon; lock_page_cgroup(pc); - VM_BUG_ON(PageCgroupUsed(pc)); + VM_BUG_ON_PAGE(PageCgroupUsed(pc), page); /* * we don't need page_cgroup_lock about tail pages, becase they are not * accessed by any other context at this point. 
@@ -2966,7 +2966,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg, if (lrucare) { if (was_on_lru) { lruvec = mem_cgroup_zone_lruvec(zone, pc->mem_cgroup); - VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(PageLRU(page), page); SetPageLRU(page); add_page_to_lru_list(page, lruvec, page_lru(page)); } @@ -3780,7 +3780,7 @@ void __memcg_kmem_uncharge_pages(struct page *page, int order) if (!memcg) return; - VM_BUG_ON(mem_cgroup_is_root(memcg)); + VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page); memcg_uncharge_kmem(memcg, PAGE_SIZE << order); } #else @@ -3859,7 +3859,7 @@ static int mem_cgroup_move_account(struct page *page, bool anon = PageAnon(page); VM_BUG_ON(from == to); - VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(PageLRU(page), page); /* * The page is isolated from LRU. So, collapse function * will not handle this page. But page splitting can happen. @@ -3952,7 +3952,7 @@ static int mem_cgroup_move_parent(struct page *page, parent = root_mem_cgroup; if (nr_pages > 1) { - VM_BUG_ON(!PageTransHuge(page)); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); flags = compound_lock_irqsave(page); } @@ -3986,7 +3986,7 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm, if (PageTransHuge(page)) { nr_pages <<= compound_order(page); - VM_BUG_ON(!PageTransHuge(page)); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); /* * Never OOM-kill a process for a huge page. The * fault handler will fall back to regular pages. 
@@ -4006,8 +4006,8 @@ int mem_cgroup_newpage_charge(struct page *page, { if (mem_cgroup_disabled()) return 0; - VM_BUG_ON(page_mapped(page)); - VM_BUG_ON(page->mapping && !PageAnon(page)); + VM_BUG_ON_PAGE(page_mapped(page), page); + VM_BUG_ON_PAGE(page->mapping && !PageAnon(page), page); VM_BUG_ON(!mm); return mem_cgroup_charge_common(page, mm, gfp_mask, MEM_CGROUP_CHARGE_TYPE_ANON); @@ -4211,7 +4211,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype, if (PageTransHuge(page)) { nr_pages <<= compound_order(page); - VM_BUG_ON(!PageTransHuge(page)); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); } /* * Check if our page_cgroup is valid @@ -4303,7 +4303,7 @@ void mem_cgroup_uncharge_page(struct page *page) /* early check. */ if (page_mapped(page)) return; - VM_BUG_ON(page->mapping && !PageAnon(page)); + VM_BUG_ON_PAGE(page->mapping && !PageAnon(page), page); /* * If the page is in swap cache, uncharge should be deferred * to the swap path, which also properly accounts swap usage @@ -4323,8 +4323,8 @@ void mem_cgroup_uncharge_page(struct page *page) void mem_cgroup_uncharge_cache_page(struct page *page) { - VM_BUG_ON(page_mapped(page)); - VM_BUG_ON(page->mapping); + VM_BUG_ON_PAGE(page_mapped(page), page); + VM_BUG_ON_PAGE(page->mapping, page); __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE, false); } @@ -6880,7 +6880,7 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, enum mc_target_type ret = MC_TARGET_NONE; page = pmd_page(pmd); - VM_BUG_ON(!page || !PageHead(page)); + VM_BUG_ON_PAGE(!page || !PageHead(page), page); if (!move_anon()) return ret; pc = lookup_page_cgroup(page); diff --git a/mm/memory.c b/mm/memory.c index 71d70c0..be6a0c0 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -289,7 +289,7 @@ int __tlb_remove_page(struct mmu_gather *tlb, struct page *page) return 0; batch = tlb->active; } - VM_BUG_ON(batch->nr > batch->max); + VM_BUG_ON_PAGE(batch->nr > batch->max, page); return batch->max - 
batch->nr; } @@ -2702,7 +2702,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, goto unwritable_page; } } else - VM_BUG_ON(!PageLocked(old_page)); + VM_BUG_ON_PAGE(!PageLocked(old_page), old_page); /* * Since we dropped the lock we need to revalidate @@ -3358,7 +3358,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (unlikely(!(ret & VM_FAULT_LOCKED))) lock_page(vmf.page); else - VM_BUG_ON(!PageLocked(vmf.page)); + VM_BUG_ON_PAGE(!PageLocked(vmf.page), vmf.page); /* * Should we do an early C-O-W break? @@ -3395,7 +3395,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma, goto unwritable_page; } } else - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); page_mkwrite = 1; } } diff --git a/mm/migrate.c b/mm/migrate.c index a8025be..4b3996e 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -499,7 +499,7 @@ void migrate_page_copy(struct page *newpage, struct page *page) if (PageUptodate(page)) SetPageUptodate(newpage); if (TestClearPageActive(page)) { - VM_BUG_ON(PageUnevictable(page)); + VM_BUG_ON_PAGE(PageUnevictable(page), page); SetPageActive(newpage); } else if (TestClearPageUnevictable(page)) SetPageUnevictable(newpage); @@ -871,7 +871,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, * free the metadata, so the page can be freed. 
*/ if (!page->mapping) { - VM_BUG_ON(PageAnon(page)); + VM_BUG_ON_PAGE(PageAnon(page), page); if (page_has_private(page)) { try_to_free_buffers(page); goto uncharge; @@ -1618,7 +1618,7 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) { int page_lru; - VM_BUG_ON(compound_order(page) && !PageTransHuge(page)); + VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page); /* Avoid migrating to a node that is nearly full */ if (!migrate_balanced_pgdat(pgdat, 1UL << compound_order(page))) diff --git a/mm/mlock.c b/mm/mlock.c index b30adbe..4e1a6816 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -279,8 +279,8 @@ static int __mlock_posix_error_return(long retval) static bool __putback_lru_fast_prepare(struct page *page, struct pagevec *pvec, int *pgrescued) { - VM_BUG_ON(PageLRU(page)); - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(PageLRU(page), page); + VM_BUG_ON_PAGE(!PageLocked(page), page); if (page_mapcount(page) <= 1 && page_evictable(page)) { pagevec_add(pvec, page); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1939f44..f18f016 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -509,12 +509,12 @@ static inline int page_is_buddy(struct page *page, struct page *buddy, return 0; if (page_is_guard(buddy) && page_order(buddy) == order) { - VM_BUG_ON(page_count(buddy) != 0); + VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy); return 1; } if (PageBuddy(buddy) && page_order(buddy) == order) { - VM_BUG_ON(page_count(buddy) != 0); + VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy); return 1; } return 0; @@ -564,8 +564,8 @@ static inline void __free_one_page(struct page *page, page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1); - VM_BUG_ON(page_idx & ((1 << order) - 1)); - VM_BUG_ON(bad_range(zone, page)); + VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page); + VM_BUG_ON_PAGE(bad_range(zone, page), page); while (order < MAX_ORDER-1) { buddy_idx = __find_buddy_index(page_idx, order); @@ -827,7 +827,7 @@ static inline void 
expand(struct zone *zone, struct page *page, area--; high--; size >>= 1; - VM_BUG_ON(bad_range(zone, &page[size])); + VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); #ifdef CONFIG_DEBUG_PAGEALLOC if (high < debug_guardpage_minorder()) { @@ -980,7 +980,7 @@ int move_freepages(struct zone *zone, for (page = start_page; page <= end_page;) { /* Make sure we are not inadvertently changing nodes */ - VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone)); + VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page); if (!pfn_valid_within(page_to_pfn(page))) { page++; @@ -1429,8 +1429,8 @@ void split_page(struct page *page, unsigned int order) { int i; - VM_BUG_ON(PageCompound(page)); - VM_BUG_ON(!page_count(page)); + VM_BUG_ON_PAGE(PageCompound(page), page); + VM_BUG_ON_PAGE(!page_count(page), page); #ifdef CONFIG_KMEMCHECK /* @@ -1577,7 +1577,7 @@ again: zone_statistics(preferred_zone, zone, gfp_flags); local_irq_restore(flags); - VM_BUG_ON(bad_range(zone, page)); + VM_BUG_ON_PAGE(bad_range(zone, page), page); if (prep_new_page(page, order, gfp_flags)) goto again; return page; @@ -6021,7 +6021,7 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags, pfn = page_to_pfn(page); bitmap = get_pageblock_bitmap(zone, pfn); bitidx = pfn_to_bitidx(zone, pfn); - VM_BUG_ON(!zone_spans_pfn(zone, pfn)); + VM_BUG_ON_PAGE(!zone_spans_pfn(zone, pfn), page); for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) if (flags & value) @@ -6539,3 +6539,4 @@ void dump_page(struct page *page, char *reason) { dump_page_badflags(page, reason, 0); } +EXPORT_SYMBOL_GPL(dump_page); diff --git a/mm/page_io.c b/mm/page_io.c index 8c79a47..7247be6 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -320,8 +320,8 @@ int swap_readpage(struct page *page) int ret = 0; struct swap_info_struct *sis = page_swap_info(page); - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(PageUptodate(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(PageUptodate(page), page); if 
(frontswap_load(page) == 0) { SetPageUptodate(page); unlock_page(page); diff --git a/mm/rmap.c b/mm/rmap.c index 962e2a1..2dcd335 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -894,9 +894,9 @@ void page_move_anon_rmap(struct page *page, { struct anon_vma *anon_vma = vma->anon_vma; - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON(!anon_vma); - VM_BUG_ON(page->index != linear_page_index(vma, address)); + VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page); anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON; page->mapping = (struct address_space *) anon_vma; @@ -995,7 +995,7 @@ void do_page_add_anon_rmap(struct page *page, if (unlikely(PageKsm(page))) return; - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); /* address might be in next vma when migration races vma_adjust */ if (first) __page_set_anon_rmap(page, vma, address, exclusive); @@ -1481,7 +1481,7 @@ int try_to_unmap(struct page *page, enum ttu_flags flags) .anon_lock = page_lock_anon_vma_read, }; - VM_BUG_ON(!PageHuge(page) && PageTransHuge(page)); + VM_BUG_ON_PAGE(!PageHuge(page) && PageTransHuge(page), page); /* * During exec, a temporary VMA is setup and later moved. 
@@ -1533,7 +1533,7 @@ int try_to_munlock(struct page *page) }; - VM_BUG_ON(!PageLocked(page) || PageLRU(page)); + VM_BUG_ON_PAGE(!PageLocked(page) || PageLRU(page), page); ret = rmap_walk(page, &rwc); return ret; diff --git a/mm/shmem.c b/mm/shmem.c index 902a148..8156f95 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -285,8 +285,8 @@ static int shmem_add_to_page_cache(struct page *page, { int error; - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(!PageSwapBacked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(!PageSwapBacked(page), page); page_cache_get(page); page->mapping = mapping; @@ -491,7 +491,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, continue; if (!unfalloc || !PageUptodate(page)) { if (page->mapping == mapping) { - VM_BUG_ON(PageWriteback(page)); + VM_BUG_ON_PAGE(PageWriteback(page), page); truncate_inode_page(mapping, page); } } @@ -568,7 +568,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, lock_page(page); if (!unfalloc || !PageUptodate(page)) { if (page->mapping == mapping) { - VM_BUG_ON(PageWriteback(page)); + VM_BUG_ON_PAGE(PageWriteback(page), page); truncate_inode_page(mapping, page); } } diff --git a/mm/slub.c b/mm/slub.c index 545a170..34bb8c6 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1559,7 +1559,7 @@ static inline void *acquire_slab(struct kmem_cache *s, new.freelist = freelist; } - VM_BUG_ON(new.frozen); + VM_BUG_ON_PAGE(new.frozen, &new); new.frozen = 1; if (!__cmpxchg_double_slab(s, page, @@ -1812,7 +1812,7 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page, set_freepointer(s, freelist, prior); new.counters = counters; new.inuse--; - VM_BUG_ON(!new.frozen); + VM_BUG_ON_PAGE(!new.frozen, &new); } while (!__cmpxchg_double_slab(s, page, prior, counters, @@ -1840,7 +1840,7 @@ redo: old.freelist = page->freelist; old.counters = page->counters; - VM_BUG_ON(!old.frozen); + VM_BUG_ON_PAGE(!old.frozen, &old); /* Determine target state of the 
slab */ new.counters = old.counters; @@ -1952,7 +1952,7 @@ static void unfreeze_partials(struct kmem_cache *s, old.freelist = page->freelist; old.counters = page->counters; - VM_BUG_ON(!old.frozen); + VM_BUG_ON_PAGE(!old.frozen, &old); new.counters = old.counters; new.freelist = old.freelist; @@ -2225,7 +2225,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page) counters = page->counters; new.counters = counters; - VM_BUG_ON(!new.frozen); + VM_BUG_ON_PAGE(!new.frozen, &new); new.inuse = page->objects; new.frozen = freelist != NULL; @@ -2319,7 +2319,7 @@ load_freelist: * page is pointing to the page from which the objects are obtained. * That page must be frozen for per cpu allocations to work. */ - VM_BUG_ON(!c->page->frozen); + VM_BUG_ON_PAGE(!c->page->frozen, c->page); c->freelist = get_freepointer(s, freelist); c->tid = next_tid(c->tid); local_irq_restore(flags); diff --git a/mm/swap.c b/mm/swap.c index d1100b6..b31ba67 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -57,7 +57,7 @@ static void __page_cache_release(struct page *page) spin_lock_irqsave(&zone->lru_lock, flags); lruvec = mem_cgroup_page_lruvec(page, zone); - VM_BUG_ON(!PageLRU(page)); + VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); spin_unlock_irqrestore(&zone->lru_lock, flags); @@ -130,8 +130,8 @@ static void put_compound_page(struct page *page) * __split_huge_page_refcount cannot race * here. */ - VM_BUG_ON(!PageHead(page_head)); - VM_BUG_ON(page_mapcount(page) != 0); + VM_BUG_ON_PAGE(!PageHead(page_head), page_head); + VM_BUG_ON_PAGE(page_mapcount(page) != 0, page); if (put_page_testzero(page_head)) { /* * If this is the tail of a slab @@ -148,7 +148,7 @@ static void put_compound_page(struct page *page) * the compound page enters the buddy * allocator. 
*/ - VM_BUG_ON(PageSlab(page_head)); + VM_BUG_ON_PAGE(PageSlab(page_head), page_head); __put_compound_page(page_head); } return; @@ -199,7 +199,7 @@ out_put_single: __put_single_page(page); return; } - VM_BUG_ON(page_head != page->first_page); + VM_BUG_ON_PAGE(page_head != page->first_page, page); /* * We can release the refcount taken by * get_page_unless_zero() now that @@ -207,12 +207,12 @@ out_put_single: * compound_lock. */ if (put_page_testzero(page_head)) - VM_BUG_ON(1); + VM_BUG_ON_PAGE(1, page_head); /* __split_huge_page_refcount will wait now */ - VM_BUG_ON(page_mapcount(page) <= 0); + VM_BUG_ON_PAGE(page_mapcount(page) <= 0, page); atomic_dec(&page->_mapcount); - VM_BUG_ON(atomic_read(&page_head->_count) <= 0); - VM_BUG_ON(atomic_read(&page->_count) != 0); + VM_BUG_ON_PAGE(atomic_read(&page_head->_count) <= 0, page_head); + VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page); compound_unlock_irqrestore(page_head, flags); if (put_page_testzero(page_head)) { @@ -223,7 +223,7 @@ out_put_single: } } else { /* page_head is a dangling pointer */ - VM_BUG_ON(PageTail(page)); + VM_BUG_ON_PAGE(PageTail(page), page); goto out_put_single; } } @@ -264,7 +264,7 @@ bool __get_page_tail(struct page *page) * page. __split_huge_page_refcount * cannot race here. 
*/ - VM_BUG_ON(!PageHead(page_head)); + VM_BUG_ON_PAGE(!PageHead(page_head), page_head); __get_page_tail_foll(page, true); return true; } else { @@ -604,8 +604,8 @@ EXPORT_SYMBOL(__lru_cache_add); */ void lru_cache_add(struct page *page) { - VM_BUG_ON(PageActive(page) && PageUnevictable(page)); - VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(PageActive(page) && PageUnevictable(page), page); + VM_BUG_ON_PAGE(PageLRU(page), page); __lru_cache_add(page); } @@ -846,7 +846,7 @@ void release_pages(struct page **pages, int nr, int cold) } lruvec = mem_cgroup_page_lruvec(page, zone); - VM_BUG_ON(!PageLRU(page)); + VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); } @@ -888,9 +888,9 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, { const int file = 0; - VM_BUG_ON(!PageHead(page)); - VM_BUG_ON(PageCompound(page_tail)); - VM_BUG_ON(PageLRU(page_tail)); + VM_BUG_ON_PAGE(!PageHead(page), page); + VM_BUG_ON_PAGE(PageCompound(page_tail), page); + VM_BUG_ON_PAGE(PageLRU(page_tail), page); VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&lruvec_zone(lruvec)->lru_lock)); @@ -929,7 +929,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, int active = PageActive(page); enum lru_list lru = page_lru(page); - VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(PageLRU(page), page); SetPageLRU(page); add_page_to_lru_list(page, lruvec, lru); diff --git a/mm/swap_state.c b/mm/swap_state.c index e6f15f8..98e85e9 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -83,9 +83,9 @@ int __add_to_swap_cache(struct page *page, swp_entry_t entry) int error; struct address_space *address_space; - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(PageSwapCache(page)); - VM_BUG_ON(!PageSwapBacked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(PageSwapCache(page), page); + VM_BUG_ON_PAGE(!PageSwapBacked(page), page); page_cache_get(page); SetPageSwapCache(page); @@ -139,9 +139,9 @@ void 
__delete_from_swap_cache(struct page *page) swp_entry_t entry; struct address_space *address_space; - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(!PageSwapCache(page)); - VM_BUG_ON(PageWriteback(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(!PageSwapCache(page), page); + VM_BUG_ON_PAGE(PageWriteback(page), page); entry.val = page_private(page); address_space = swap_address_space(entry); @@ -165,8 +165,8 @@ int add_to_swap(struct page *page, struct list_head *list) swp_entry_t entry; int err; - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(!PageUptodate(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(!PageUptodate(page), page); entry = get_swap_page(); if (!entry.val) diff --git a/mm/swapfile.c b/mm/swapfile.c index 612a7c9..d443dea 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -906,7 +906,7 @@ int reuse_swap_page(struct page *page) { int count; - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); if (unlikely(PageKsm(page))) return 0; count = page_mapcount(page); @@ -926,7 +926,7 @@ int reuse_swap_page(struct page *page) */ int try_to_free_swap(struct page *page) { - VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON_PAGE(!PageLocked(page), page); if (!PageSwapCache(page)) return 0; @@ -2714,7 +2714,7 @@ struct swap_info_struct *page_swap_info(struct page *page) */ struct address_space *__page_file_mapping(struct page *page) { - VM_BUG_ON(!PageSwapCache(page)); + VM_BUG_ON_PAGE(!PageSwapCache(page), page); return page_swap_info(page)->swap_file->f_mapping; } EXPORT_SYMBOL_GPL(__page_file_mapping); @@ -2722,7 +2722,7 @@ EXPORT_SYMBOL_GPL(__page_file_mapping); pgoff_t __page_file_index(struct page *page) { swp_entry_t swap = { .val = page_private(page) }; - VM_BUG_ON(!PageSwapCache(page)); + VM_BUG_ON_PAGE(!PageSwapCache(page), page); return swp_offset(swap); } EXPORT_SYMBOL_GPL(__page_file_index); diff --git a/mm/vmscan.c b/mm/vmscan.c index eea668d..2254f36 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -603,7 
+603,7 @@ void putback_lru_page(struct page *page) bool is_unevictable; int was_unevictable = PageUnevictable(page); - VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(PageLRU(page), page); redo: ClearPageUnevictable(page); @@ -794,8 +794,8 @@ static unsigned long shrink_page_list(struct list_head *page_list, if (!trylock_page(page)) goto keep; - VM_BUG_ON(PageActive(page)); - VM_BUG_ON(page_zone(page) != zone); + VM_BUG_ON_PAGE(PageActive(page), page); + VM_BUG_ON_PAGE(page_zone(page) != zone, page); sc->nr_scanned++; @@ -1079,14 +1079,14 @@ activate_locked: /* Not a candidate for swapping, so reclaim swap space. */ if (PageSwapCache(page) && vm_swap_full()) try_to_free_swap(page); - VM_BUG_ON(PageActive(page)); + VM_BUG_ON_PAGE(PageActive(page), page); SetPageActive(page); pgactivate++; keep_locked: unlock_page(page); keep: list_add(&page->lru, &ret_pages); - VM_BUG_ON(PageLRU(page) || PageUnevictable(page)); + VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page); } free_hot_cold_page_list(&free_pages, 1); @@ -1240,7 +1240,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, page = lru_to_page(src); prefetchw_prev_lru_page(page, src, flags); - VM_BUG_ON(!PageLRU(page)); + VM_BUG_ON_PAGE(!PageLRU(page), page); switch (__isolate_lru_page(page, mode)) { case 0: @@ -1295,7 +1295,7 @@ int isolate_lru_page(struct page *page) { int ret = -EBUSY; - VM_BUG_ON(!page_count(page)); + VM_BUG_ON_PAGE(!page_count(page), page); if (PageLRU(page)) { struct zone *zone = page_zone(page); @@ -1366,7 +1366,7 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) struct page *page = lru_to_page(page_list); int lru; - VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(PageLRU(page), page); list_del(&page->lru); if (unlikely(!page_evictable(page))) { spin_unlock_irq(&zone->lru_lock); @@ -1586,7 +1586,7 @@ static void move_active_pages_to_lru(struct lruvec *lruvec, page = lru_to_page(list); lruvec = mem_cgroup_page_lruvec(page, zone); - 
VM_BUG_ON(PageLRU(page)); + VM_BUG_ON_PAGE(PageLRU(page), page); SetPageLRU(page); nr_pages = hpage_nr_pages(page); @@ -3701,7 +3701,7 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages) if (page_evictable(page)) { enum lru_list lru = page_lru_base_type(page); - VM_BUG_ON(PageActive(page)); + VM_BUG_ON_PAGE(PageActive(page), page); ClearPageUnevictable(page); del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE); add_page_to_lru_list(page, lruvec, lru); -- cgit v0.10.2 From 3965fc3652244651006ebb31c8c45318ce84818f Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:52:55 -0800 Subject: slab: clean up kmem_cache_create_memcg() error handling Currently kmem_cache_create_memcg() backs off on failure inside conditionals, without using gotos. This duplicates the rollback code, which makes the function look cumbersome even though on error we should only free the allocated cache. Since in the next patch I am going to add yet another rollback function call on the error path there, let's employ labels instead of conditionals for undoing any changes on failure to keep things clean.
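For illustration only, the label-based rollback described above can be sketched in plain userspace C. This is not the kernel code: struct demo_cache, demo_late_init() and demo_cache_create() are hypothetical names, and calloc()/malloc()/free() stand in for the slab allocator calls.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct kmem_cache; not the kernel type. */
struct demo_cache {
	char *name;
	size_t size;
};

/* Hypothetical late init step; fails when fail_late is nonzero. */
static int demo_late_init(struct demo_cache *c, int fail_late)
{
	(void)c;
	return fail_late ? -EINVAL : 0;
}

/*
 * Every failure after the allocation jumps to one rollback label, so
 * the cleanup is written once instead of in each conditional.
 */
static struct demo_cache *demo_cache_create(const char *name, size_t size,
					     int fail_late, int *errp)
{
	struct demo_cache *c;
	int err;

	err = -ENOMEM;
	c = calloc(1, sizeof(*c));
	if (!c)
		goto out;

	c->size = size;
	c->name = malloc(strlen(name) + 1);
	if (!c->name)
		goto out_free_cache;
	strcpy(c->name, name);

	err = demo_late_init(c, fail_late);
	if (err)
		goto out_free_cache;

out:
	*errp = err;
	return err ? NULL : c;

out_free_cache:
	free(c->name);		/* free(NULL) is a no-op */
	free(c);
	goto out;
}
```

Adding another rollback step in this shape means one extra line under out_free_cache, which is roughly the property the commit message is after.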
Signed-off-by: Vladimir Davydov Reviewed-by: Pekka Enberg Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/slab_common.c b/mm/slab_common.c index 0b7bb39..f70df3e 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -171,13 +171,14 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size, struct kmem_cache *parent_cache) { struct kmem_cache *s = NULL; - int err = 0; + int err; get_online_cpus(); mutex_lock(&slab_mutex); - if (!kmem_cache_sanity_check(memcg, name, size) == 0) - goto out_locked; + err = kmem_cache_sanity_check(memcg, name, size); + if (err) + goto out_unlock; /* * Some allocators will constraint the set of valid flags to a subset @@ -189,45 +190,38 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size, s = __kmem_cache_alias(memcg, name, size, align, flags, ctor); if (s) - goto out_locked; + goto out_unlock; + err = -ENOMEM; s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL); - if (s) { - s->object_size = s->size = size; - s->align = calculate_alignment(flags, align, size); - s->ctor = ctor; + if (!s) + goto out_unlock; - if (memcg_register_cache(memcg, s, parent_cache)) { - kmem_cache_free(kmem_cache, s); - err = -ENOMEM; - goto out_locked; - } + s->object_size = s->size = size; + s->align = calculate_alignment(flags, align, size); + s->ctor = ctor; - s->name = kstrdup(name, GFP_KERNEL); - if (!s->name) { - kmem_cache_free(kmem_cache, s); - err = -ENOMEM; - goto out_locked; - } + s->name = kstrdup(name, GFP_KERNEL); + if (!s->name) + goto out_free_cache; - err = __kmem_cache_create(s, flags); - if (!err) { - s->refcount = 1; - list_add(&s->list, &slab_caches); - memcg_cache_list_add(memcg, s); - } else { - kfree(s->name); - kmem_cache_free(kmem_cache, s); - } - } else - err = -ENOMEM; + err = memcg_register_cache(memcg, s, parent_cache); + if (err) + goto out_free_cache; + + err = 
__kmem_cache_create(s, flags); + if (err) + goto out_free_cache; + + s->refcount = 1; + list_add(&s->list, &slab_caches); + memcg_cache_list_add(memcg, s); -out_locked: +out_unlock: mutex_unlock(&slab_mutex); put_online_cpus(); if (err) { - if (flags & SLAB_PANIC) panic("kmem_cache_create: Failed to create slab '%s'. Error %d\n", name, err); @@ -236,11 +230,14 @@ out_locked: name, err); dump_stack(); } - return NULL; } - return s; + +out_free_cache: + kfree(s->name); + kmem_cache_free(kmem_cache, s); + goto out_unlock; } struct kmem_cache * -- cgit v0.10.2 From 363a044f739b0f07a8c063b838c5528d10720e02 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:52:56 -0800 Subject: memcg, slab: kmem_cache_create_memcg(): fix memleak on fail path We do not free the cache's memcg_params if __kmem_cache_create fails. Fix this. Plus, rename memcg_register_cache() to memcg_alloc_cache_params(), because it actually does not register the cache anywhere, but simply initializes kmem_cache::memcg_params.
[akpm@linux-foundation.org: fix build] Signed-off-by: Vladimir Davydov Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Balbir Singh Cc: KAMEZAWA Hiroyuki Cc: Pekka Enberg Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index b3e7a66..284daff 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -497,8 +497,9 @@ void __memcg_kmem_commit_charge(struct page *page, void __memcg_kmem_uncharge_pages(struct page *page, int order); int memcg_cache_id(struct mem_cgroup *memcg); -int memcg_register_cache(struct mem_cgroup *memcg, struct kmem_cache *s, - struct kmem_cache *root_cache); +int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s, + struct kmem_cache *root_cache); +void memcg_free_cache_params(struct kmem_cache *s); void memcg_release_cache(struct kmem_cache *cachep); void memcg_cache_list_add(struct mem_cgroup *memcg, struct kmem_cache *cachep); @@ -640,13 +641,16 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg) return -1; } -static inline int -memcg_register_cache(struct mem_cgroup *memcg, struct kmem_cache *s, - struct kmem_cache *root_cache) +static inline int memcg_alloc_cache_params(struct mem_cgroup *memcg, + struct kmem_cache *s, struct kmem_cache *root_cache) { return 0; } +static inline void memcg_free_cache_params(struct kmem_cache *s) +{ +} + static inline void memcg_release_cache(struct kmem_cache *cachep) { } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 72f2d90..b8ebe71 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3231,8 +3231,8 @@ int memcg_update_cache_size(struct kmem_cache *s, int num_groups) return 0; } -int memcg_register_cache(struct mem_cgroup *memcg, struct kmem_cache *s, - struct kmem_cache *root_cache) +int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s, + struct kmem_cache *root_cache) { size_t size; @@ -3260,6 +3260,11 @@ int 
memcg_register_cache(struct mem_cgroup *memcg, struct kmem_cache *s, return 0; } +void memcg_free_cache_params(struct kmem_cache *s) +{ + kfree(s->memcg_params); +} + void memcg_release_cache(struct kmem_cache *s) { struct kmem_cache *root; @@ -3288,7 +3293,7 @@ void memcg_release_cache(struct kmem_cache *s) css_put(&memcg->css); out: - kfree(s->memcg_params); + memcg_free_cache_params(s); } /* diff --git a/mm/slab_common.c b/mm/slab_common.c index f70df3e..70f9e24 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -205,7 +205,7 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size, if (!s->name) goto out_free_cache; - err = memcg_register_cache(memcg, s, parent_cache); + err = memcg_alloc_cache_params(memcg, s, parent_cache); if (err) goto out_free_cache; @@ -235,6 +235,7 @@ out_unlock: return s; out_free_cache: + memcg_free_cache_params(s); kfree(s->name); kmem_cache_free(kmem_cache, s); goto out_unlock; -- cgit v0.10.2 From 1aa13254259bdef0bca723849ab3bab308d2f0c3 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:52:58 -0800 Subject: memcg, slab: clean up memcg cache initialization/destruction Currently, we have rather a messy function set relating to per-memcg kmem cache initialization/destruction. Per-memcg caches are created in memcg_create_kmem_cache(). This function calls kmem_cache_create_memcg() to allocate and initialize a kmem cache and then "registers" the new cache in the memcg_params::memcg_caches array of the parent cache. During its work-flow, kmem_cache_create_memcg() executes the following memcg-related functions: - memcg_alloc_cache_params(), to initialize memcg_params of the newly created cache; - memcg_cache_list_add(), to add the new cache to the memcg_slab_caches list. 
On the other hand, kmem_cache_destroy() called on cache destruction only calls memcg_release_cache(), which does all the work: it cleans the reference to the cache in its parent's memcg_params::memcg_caches, removes the cache from the memcg_slab_caches list, and frees memcg_params. Such an inconsistency between destruction and initialization paths makes the code difficult to read, so let's clean this up a bit. This patch moves all the code relating to registration of per-memcg caches (adding to memcg list, setting the pointer to a cache from its parent) to the newly created memcg_register_cache() and memcg_unregister_cache() functions, making the initialization and destruction paths look symmetrical. Signed-off-by: Vladimir Davydov Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Balbir Singh Cc: KAMEZAWA Hiroyuki Cc: Pekka Enberg Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 284daff..abd0113 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -500,8 +500,8 @@ int memcg_cache_id(struct mem_cgroup *memcg); int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s, struct kmem_cache *root_cache); void memcg_free_cache_params(struct kmem_cache *s); -void memcg_release_cache(struct kmem_cache *cachep); -void memcg_cache_list_add(struct mem_cgroup *memcg, struct kmem_cache *cachep); +void memcg_register_cache(struct kmem_cache *s); +void memcg_unregister_cache(struct kmem_cache *s); int memcg_update_cache_size(struct kmem_cache *s, int num_groups); void memcg_update_array_size(int num_groups); @@ -651,12 +651,11 @@ static inline void memcg_free_cache_params(struct kmem_cache *s) { } -static inline void memcg_release_cache(struct kmem_cache *cachep) +static inline void memcg_register_cache(struct kmem_cache *s) { } -static inline void memcg_cache_list_add(struct mem_cgroup *memcg, - struct kmem_cache *s) +static
inline void memcg_unregister_cache(struct kmem_cache *s) { } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b8ebe71..739383c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3095,16 +3095,6 @@ static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size) css_put(&memcg->css); } -void memcg_cache_list_add(struct mem_cgroup *memcg, struct kmem_cache *cachep) -{ - if (!memcg) - return; - - mutex_lock(&memcg->slab_caches_mutex); - list_add(&cachep->memcg_params->list, &memcg->memcg_slab_caches); - mutex_unlock(&memcg->slab_caches_mutex); -} - /* * helper for acessing a memcg's index. It will be used as an index in the * child cache array in kmem_cache, and also to derive its name. This function @@ -3265,21 +3255,41 @@ void memcg_free_cache_params(struct kmem_cache *s) kfree(s->memcg_params); } -void memcg_release_cache(struct kmem_cache *s) +void memcg_register_cache(struct kmem_cache *s) { struct kmem_cache *root; struct mem_cgroup *memcg; int id; + if (is_root_cache(s)) + return; + + root = s->memcg_params->root_cache; + memcg = s->memcg_params->memcg; + id = memcg_cache_id(memcg); + + css_get(&memcg->css); + + mutex_lock(&memcg->slab_caches_mutex); + list_add(&s->memcg_params->list, &memcg->memcg_slab_caches); + mutex_unlock(&memcg->slab_caches_mutex); + + root->memcg_params->memcg_caches[id] = s; /* - * This happens, for instance, when a root cache goes away before we - * add any memcg. 
+ * the readers won't lock, make sure everybody sees the updated value, + * so they won't put stuff in the queue again for no reason */ - if (!s->memcg_params) - return; + wmb(); +} - if (s->memcg_params->is_root_cache) - goto out; +void memcg_unregister_cache(struct kmem_cache *s) +{ + struct kmem_cache *root; + struct mem_cgroup *memcg; + int id; + + if (is_root_cache(s)) + return; memcg = s->memcg_params->memcg; id = memcg_cache_id(memcg); @@ -3292,8 +3302,6 @@ void memcg_release_cache(struct kmem_cache *s) mutex_unlock(&memcg->slab_caches_mutex); css_put(&memcg->css); -out: - memcg_free_cache_params(s); } /* @@ -3451,26 +3459,13 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg, mutex_lock(&memcg_cache_mutex); new_cachep = cache_from_memcg_idx(cachep, idx); - if (new_cachep) { - css_put(&memcg->css); + if (new_cachep) goto out; - } new_cachep = kmem_cache_dup(memcg, cachep); - if (new_cachep == NULL) { + if (new_cachep == NULL) new_cachep = cachep; - css_put(&memcg->css); - goto out; - } - - atomic_set(&new_cachep->memcg_params->nr_pages , 0); - cachep->memcg_params->memcg_caches[idx] = new_cachep; - /* - * the readers won't lock, make sure everybody sees the updated value, - * so they won't put stuff in the queue again for no reason - */ - wmb(); out: mutex_unlock(&memcg_cache_mutex); return new_cachep; @@ -3550,6 +3545,7 @@ static void memcg_create_cache_work_func(struct work_struct *w) cw = container_of(w, struct create_work, work); memcg_create_kmem_cache(cw->memcg, cw->cachep); + css_put(&cw->memcg->css); kfree(cw); } diff --git a/mm/slab_common.c b/mm/slab_common.c index 70f9e24..db24ec4 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -215,7 +215,7 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size, s->refcount = 1; list_add(&s->list, &slab_caches); - memcg_cache_list_add(memcg, s); + memcg_register_cache(s); out_unlock: mutex_unlock(&slab_mutex); @@ -265,7 +265,8 @@ void 
kmem_cache_destroy(struct kmem_cache *s) if (s->flags & SLAB_DESTROY_BY_RCU) rcu_barrier(); - memcg_release_cache(s); + memcg_unregister_cache(s); + memcg_free_cache_params(s); kfree(s->name); kmem_cache_free(kmem_cache, s); } else { -- cgit v0.10.2 From 959c8963fc6c8c9b97e80c55ce77105247040e7d Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:52:59 -0800 Subject: memcg, slab: fix barrier usage when accessing memcg_caches Each root kmem_cache has pointers to per-memcg caches stored in its memcg_params::memcg_caches array. Whenever we want to allocate a slab for a memcg, we access this array to get per-memcg cache to allocate from (see memcg_kmem_get_cache()). The access must be lock-free for performance reasons, so we should use barriers to assert the kmem_cache is up-to-date. First, we should place a write barrier immediately before setting the pointer to it in the memcg_caches array in order to make sure nobody will see a partially initialized object. Second, we should issue a read barrier before dereferencing the pointer to conform to the write barrier. However, currently the barrier usage looks rather strange. We have a write barrier *after* setting the pointer and a read barrier *before* reading the pointer, which is incorrect. This patch fixes this. 
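For illustration only, the ordering argued for above (initialize the object, then a write barrier, then publish the pointer; on the reader side, load the pointer, then a read barrier, then dereference) can be sketched in userspace with C11 atomics. This is an assumption-laden analogue, not the kernel code: a release store is somewhat stronger than smp_wmb() followed by a plain store, an acquire load is stronger than smp_read_barrier_depends(), and struct demo_cache, demo_caches, demo_publish() and demo_lookup() are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct demo_cache {
	int object_size;
};

#define DEMO_MAX_IDS 4

/* Hypothetical stand-in for memcg_params::memcg_caches. */
static _Atomic(struct demo_cache *) demo_caches[DEMO_MAX_IDS];

/* Writer: finish initializing the object, then publish the pointer. */
static void demo_publish(int id, struct demo_cache *c, int object_size)
{
	c->object_size = object_size;
	/* Release ordering: the init above is visible before the pointer. */
	atomic_store_explicit(&demo_caches[id], c, memory_order_release);
}

/* Lock-free reader: load the pointer first, dereference afterwards. */
static struct demo_cache *demo_lookup(int id)
{
	/* Acquire ordering pairs with the release store in demo_publish(). */
	return atomic_load_explicit(&demo_caches[id], memory_order_acquire);
}
```

A reader that gets a non-NULL pointer from demo_lookup() is then guaranteed to see the fields written before the publish, which is the property a barrier placed after the store cannot give.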
Signed-off-by: Vladimir Davydov Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Balbir Singh Cc: KAMEZAWA Hiroyuki Cc: Pekka Enberg Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 739383c..322d18d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3274,12 +3274,14 @@ void memcg_register_cache(struct kmem_cache *s) list_add(&s->memcg_params->list, &memcg->memcg_slab_caches); mutex_unlock(&memcg->slab_caches_mutex); - root->memcg_params->memcg_caches[id] = s; /* - * the readers won't lock, make sure everybody sees the updated value, - * so they won't put stuff in the queue again for no reason + * Since readers won't lock (see cache_from_memcg_idx()), we need a + * barrier here to ensure nobody will see the kmem_cache partially + * initialized. */ - wmb(); + smp_wmb(); + + root->memcg_params->memcg_caches[id] = s; } void memcg_unregister_cache(struct kmem_cache *s) @@ -3605,7 +3607,7 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) { struct mem_cgroup *memcg; - int idx; + struct kmem_cache *memcg_cachep; VM_BUG_ON(!cachep->memcg_params); VM_BUG_ON(!cachep->memcg_params->is_root_cache); @@ -3619,15 +3621,9 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep, if (!memcg_can_account_kmem(memcg)) goto out; - idx = memcg_cache_id(memcg); - - /* - * barrier to mare sure we're always seeing the up to date value. The - * code updating memcg_caches will issue a write barrier to match this. 
- */ - read_barrier_depends(); - if (likely(cache_from_memcg_idx(cachep, idx))) { - cachep = cache_from_memcg_idx(cachep, idx); + memcg_cachep = cache_from_memcg_idx(cachep, memcg_cache_id(memcg)); + if (likely(memcg_cachep)) { + cachep = memcg_cachep; goto out; } diff --git a/mm/slab.h b/mm/slab.h index 0859c42..72d1f9d 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -163,9 +163,19 @@ static inline const char *cache_name(struct kmem_cache *s) static inline struct kmem_cache * cache_from_memcg_idx(struct kmem_cache *s, int idx) { + struct kmem_cache *cachep; + if (!s->memcg_params) return NULL; - return s->memcg_params->memcg_caches[idx]; + cachep = s->memcg_params->memcg_caches[idx]; + + /* + * Make sure we will access the up-to-date value. The code updating + * memcg_caches issues a write barrier to match this (see + * memcg_register_cache()). + */ + smp_read_barrier_depends(); + return cachep; } static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) -- cgit v0.10.2 From 96403da244443d9842dbf290c2a02390b78a158e Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:01 -0800 Subject: memcg: fix possible NULL deref while traversing memcg_slab_caches list All caches of the same memory cgroup are linked in the memcg_slab_caches list via kmem_cache::memcg_params::list. This list is traversed, for example, when we read memory.kmem.slabinfo. Since the list actually consists of memcg_cache_params objects, we have to convert an element of the list to a kmem_cache object using memcg_params_to_cache(), which obtains the pointer to the cache from the memcg_params::memcg_caches array of the corresponding root cache. That said the pointer to a kmem_cache in its parent's memcg_params must be initialized before adding the cache to the list, and cleared only after it has been unlinked. Currently it is vice-versa, which can result in a NULL ptr dereference while traversing the memcg_slab_caches list. This patch restores the correct order. 
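For illustration only, the ordering rule above can be sketched as a single-threaded userspace model. It is a simplification under stated assumptions: the real list is protected by slab_caches_mutex and walked concurrently, while this sketch only shows the order of the two steps; demo_slots, demo_list and the demo_* functions are hypothetical names.

```c
#include <stddef.h>

#define DEMO_MAX_IDS 4

/* List element; the owning cache is found through demo_slots[id]. */
struct demo_params {
	int id;
	struct demo_params *next;
};

/* Hypothetical stand-in for struct kmem_cache. */
struct demo_cache {
	struct demo_params params;
};

static struct demo_cache *demo_slots[DEMO_MAX_IDS];	/* like memcg_caches */
static struct demo_params *demo_list;			/* like memcg_slab_caches */

static struct demo_cache *demo_params_to_cache(struct demo_params *p)
{
	return demo_slots[p->id];
}

static void demo_register(struct demo_cache *c, int id)
{
	c->params.id = id;
	demo_slots[id] = c;		/* 1. make the conversion work */
	c->params.next = demo_list;	/* 2. only then link into the list */
	demo_list = &c->params;
}

static void demo_unregister(struct demo_cache *c)
{
	struct demo_params **pp;

	for (pp = &demo_list; *pp; pp = &(*pp)->next) {
		if (*pp == &c->params) {
			*pp = c->params.next;	/* 1. unlink first */
			break;
		}
	}
	demo_slots[c->params.id] = NULL;	/* 2. then clear the slot */
}

/* Returns 1 if every listed element converts to a non-NULL cache. */
static int demo_list_is_consistent(void)
{
	struct demo_params *p;

	for (p = demo_list; p; p = p->next)
		if (!demo_params_to_cache(p))
			return 0;
	return 1;
}
```

With this order, an element is never reachable from the list while its slot is NULL, so a list walker calling demo_params_to_cache() cannot get a NULL result at either step.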
Signed-off-by: Vladimir Davydov Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Balbir Singh Cc: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 322d18d..014a4f1 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3270,9 +3270,6 @@ void memcg_register_cache(struct kmem_cache *s) css_get(&memcg->css); - mutex_lock(&memcg->slab_caches_mutex); - list_add(&s->memcg_params->list, &memcg->memcg_slab_caches); - mutex_unlock(&memcg->slab_caches_mutex); /* * Since readers won't lock (see cache_from_memcg_idx()), we need a @@ -3281,7 +3278,16 @@ void memcg_register_cache(struct kmem_cache *s) */ smp_wmb(); + /* + * Initialize the pointer to this cache in its parent's memcg_params + * before adding it to the memcg_slab_caches list, otherwise we can + * fail to convert memcg_params_to_cache() while traversing the list. + */ root->memcg_params->memcg_caches[id] = s; + + mutex_lock(&memcg->slab_caches_mutex); + list_add(&s->memcg_params->list, &memcg->memcg_slab_caches); + mutex_unlock(&memcg->slab_caches_mutex); } void memcg_unregister_cache(struct kmem_cache *s) @@ -3293,16 +3299,21 @@ void memcg_unregister_cache(struct kmem_cache *s) if (is_root_cache(s)) return; - memcg = s->memcg_params->memcg; - id = memcg_cache_id(memcg); - root = s->memcg_params->root_cache; - root->memcg_params->memcg_caches[id] = NULL; + memcg = s->memcg_params->memcg; + id = memcg_cache_id(memcg); mutex_lock(&memcg->slab_caches_mutex); list_del(&s->memcg_params->list); mutex_unlock(&memcg->slab_caches_mutex); + /* + * Clear the pointer to this cache in its parent's memcg_params only + * after removing it from the memcg_slab_caches list, otherwise we can + * fail to convert memcg_params_to_cache() while traversing the list. 
+ */ + root->memcg_params->memcg_caches[id] = NULL; + css_put(&memcg->css); } -- cgit v0.10.2 From 2edefe1155b3ad3dc92065f6e1018d363525296e Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:02 -0800 Subject: memcg, slab: fix races in per-memcg cache creation/destruction We obtain a per-memcg cache from a root kmem_cache by dereferencing an entry of the root cache's memcg_params::memcg_caches array. If we find no cache for a memcg there on allocation, we initiate the memcg cache creation (see memcg_kmem_get_cache()). The cache creation proceeds asynchronously in memcg_create_kmem_cache() in order to avoid lock clashes, so there can be several threads trying to create the same kmem_cache concurrently, but only one of them may succeed. However, due to a race in the code, this is not always true. The point is that the memcg_caches array can be relocated when we activate kmem accounting for a memcg (see memcg_update_all_caches(), memcg_update_cache_size()). If memcg_update_cache_size() and memcg_create_kmem_cache() proceed concurrently as described below, we can leak a kmem_cache. Assume two threads schedule creation of the same kmem_cache. One of them successfully creates it. Another one should fail then, but if memcg_create_kmem_cache() interleaves with memcg_update_cache_size() as follows, it won't: memcg_create_kmem_cache() memcg_update_cache_size() (called w/o mutexes held) (called with slab_mutex, set_limit_mutex held) ------------------------- ------------------------- mutex_lock(&memcg_cache_mutex) s->memcg_params=kzalloc(...) new_cachep=cache_from_memcg_idx(cachep,idx) // new_cachep==NULL => proceed to creation s->memcg_params->memcg_caches[i] =cur_params->memcg_caches[i] // kmem_cache_create_memcg takes slab_mutex // so we will hang around until // memcg_update_cache_size finishes, but // nothing will prevent it from succeeding so // memcg_caches[idx] will be overwritten in // memcg_register_cache!
new_cachep = kmem_cache_create_memcg(...) mutex_unlock(&memcg_cache_mutex) Let's fix this by moving the check for existence of the memcg cache to kmem_cache_create_memcg() to be called under the slab_mutex and make it return NULL if so. A similar race is possible when destroying a memcg cache (see kmem_cache_destroy()). Since memcg_unregister_cache(), which clears the pointer in the memcg_caches array, is called w/o protection, we can race with memcg_update_cache_size() and omit clearing the pointer. Therefore memcg_unregister_cache() should be moved before we release the slab_mutex. Signed-off-by: Vladimir Davydov Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Balbir Singh Cc: KAMEZAWA Hiroyuki Cc: Pekka Enberg Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 014a4f1..d2da65c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3264,6 +3264,12 @@ void memcg_register_cache(struct kmem_cache *s) if (is_root_cache(s)) return; + /* + * Holding the slab_mutex assures nobody will touch the memcg_caches + * array while we are modifying it. + */ + lockdep_assert_held(&slab_mutex); + root = s->memcg_params->root_cache; memcg = s->memcg_params->memcg; id = memcg_cache_id(memcg); @@ -3283,6 +3289,7 @@ void memcg_register_cache(struct kmem_cache *s) * before adding it to the memcg_slab_caches list, otherwise we can * fail to convert memcg_params_to_cache() while traversing the list. */ + VM_BUG_ON(root->memcg_params->memcg_caches[id]); root->memcg_params->memcg_caches[id] = s; mutex_lock(&memcg->slab_caches_mutex); @@ -3299,6 +3306,12 @@ void memcg_unregister_cache(struct kmem_cache *s) if (is_root_cache(s)) return; + /* + * Holding the slab_mutex assures nobody will touch the memcg_caches + * array while we are modifying it. 
+ */ + lockdep_assert_held(&slab_mutex); + root = s->memcg_params->root_cache; memcg = s->memcg_params->memcg; id = memcg_cache_id(memcg); @@ -3312,6 +3325,7 @@ void memcg_unregister_cache(struct kmem_cache *s) * after removing it from the memcg_slab_caches list, otherwise we can * fail to convert memcg_params_to_cache() while traversing the list. */ + VM_BUG_ON(!root->memcg_params->memcg_caches[id]); root->memcg_params->memcg_caches[id] = NULL; css_put(&memcg->css); @@ -3464,22 +3478,13 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg, struct kmem_cache *cachep) { struct kmem_cache *new_cachep; - int idx; BUG_ON(!memcg_can_account_kmem(memcg)); - idx = memcg_cache_id(memcg); - mutex_lock(&memcg_cache_mutex); - new_cachep = cache_from_memcg_idx(cachep, idx); - if (new_cachep) - goto out; - new_cachep = kmem_cache_dup(memcg, cachep); if (new_cachep == NULL) new_cachep = cachep; - -out: mutex_unlock(&memcg_cache_mutex); return new_cachep; } diff --git a/mm/slab_common.c b/mm/slab_common.c index db24ec4..f34707e 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -180,6 +180,18 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size, if (err) goto out_unlock; + if (memcg) { + /* + * Since per-memcg caches are created asynchronously on first + * allocation (see memcg_kmem_get_cache()), several threads can + * try to create the same cache, but only one of them may + * succeed. Therefore if we get here and see the cache has + * already been created, we silently return NULL. + */ + if (cache_from_memcg_idx(parent_cache, memcg_cache_id(memcg))) + goto out_unlock; + } + /* * Some allocators will constraint the set of valid flags to a subset * of all flags. 
We expect them to define CACHE_CREATE_MASK in this @@ -261,11 +273,11 @@ void kmem_cache_destroy(struct kmem_cache *s) list_del(&s->list); if (!__kmem_cache_shutdown(s)) { + memcg_unregister_cache(s); mutex_unlock(&slab_mutex); if (s->flags & SLAB_DESTROY_BY_RCU) rcu_barrier(); - memcg_unregister_cache(s); memcg_free_cache_params(s); kfree(s->name); kmem_cache_free(kmem_cache, s); -- cgit v0.10.2 From 842e2873697e023d140a8905a41fcf39d4e546f1 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:03 -0800 Subject: memcg: get rid of kmem_cache_dup() kmem_cache_dup() is only called from memcg_create_kmem_cache(). The latter, in fact, does nothing besides this, so let's fold kmem_cache_dup() into memcg_create_kmem_cache(). This patch also makes the memcg_cache_mutex private to memcg_create_kmem_cache(), because it is not used anywhere else. Signed-off-by: Vladimir Davydov Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Balbir Singh Cc: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d2da65c..80197e5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3427,27 +3427,16 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep) schedule_work(&cachep->memcg_params->destroy); } -/* - * This lock protects updaters, not readers. We want readers to be as fast as - * they can, and they will either see NULL or a valid cache value. Our model - * allow them to see NULL, in which case the root memcg will be selected. - * - * We need this lock because multiple allocations to the same cache from a non - * will span more than one worker. Only one of them can create the cache. 
- */ -static DEFINE_MUTEX(memcg_cache_mutex); - -/* - * Called with memcg_cache_mutex held - */ -static struct kmem_cache *kmem_cache_dup(struct mem_cgroup *memcg, - struct kmem_cache *s) +static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg, + struct kmem_cache *s) { struct kmem_cache *new; static char *tmp_name = NULL; + static DEFINE_MUTEX(mutex); /* protects tmp_name */ - lockdep_assert_held(&memcg_cache_mutex); + BUG_ON(!memcg_can_account_kmem(memcg)); + mutex_lock(&mutex); /* * kmem_cache_create_memcg duplicates the given name and * cgroup_name for this name requires RCU context. @@ -3470,25 +3459,13 @@ static struct kmem_cache *kmem_cache_dup(struct mem_cgroup *memcg, if (new) new->allocflags |= __GFP_KMEMCG; + else + new = s; + mutex_unlock(&mutex); return new; } -static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg, - struct kmem_cache *cachep) -{ - struct kmem_cache *new_cachep; - - BUG_ON(!memcg_can_account_kmem(memcg)); - - mutex_lock(&memcg_cache_mutex); - new_cachep = kmem_cache_dup(memcg, cachep); - if (new_cachep == NULL) - new_cachep = cachep; - mutex_unlock(&memcg_cache_mutex); - return new_cachep; -} - void kmem_cache_destroy_memcg_children(struct kmem_cache *s) { struct kmem_cache *c; -- cgit v0.10.2 From f717eb3abb5ea38f60e671dbfdbf512c2c93d22e Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:05 -0800 Subject: slab: do not panic if we fail to create memcg cache There is no point in flooding logs with warnings or especially crashing the system if we fail to create a cache for a memcg. In this case we will be accounting the memcg allocation to the root cgroup until we succeed to create its own cache, but it isn't that critical. 
Signed-off-by: Vladimir Davydov Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Pekka Enberg Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/slab_common.c b/mm/slab_common.c index f34707e..8e40321 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -233,7 +233,14 @@ out_unlock: mutex_unlock(&slab_mutex); put_online_cpus(); - if (err) { + /* + * There is no point in flooding logs with warnings or especially + * crashing the system if we fail to create a cache for a memcg. In + * this case we will be accounting the memcg allocation to the root + * cgroup until we succeed to create its own cache, but it isn't that + * critical. + */ + if (err && !memcg) { if (flags & SLAB_PANIC) panic("kmem_cache_create: Failed to create slab '%s'. Error %d\n", name, err); -- cgit v0.10.2 From f8570263ee16eb1d5038b8e20d7db3a68bbb2b49 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:06 -0800 Subject: memcg, slab: RCU protect memcg_params for root caches We relocate root cache's memcg_params whenever we need to grow the memcg_caches array to accommodate all kmem-active memory cgroups. Currently on relocation we free the old version immediately, which can lead to use-after-free, because the memcg_caches array is accessed lock-free (see cache_from_memcg_idx()). This patch fixes this by making memcg_params RCU-protected for root caches. Signed-off-by: Vladimir Davydov Cc: Michal Hocko Cc: Glauber Costa Cc: Johannes Weiner Cc: Balbir Singh Cc: KAMEZAWA Hiroyuki Cc: Pekka Enberg Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/slab.h b/include/linux/slab.h index 1e2f4fe..a060142 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -513,7 +513,9 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) * * Both the root cache and the child caches will have it. 
For the root cache, * this will hold a dynamically allocated array large enough to hold - * information about the currently limited memcgs in the system. + * information about the currently limited memcgs in the system. To allow the + * array to be accessed without taking any locks, on relocation we free the old + * version only after a grace period. * * Child caches will hold extra metadata needed for its operation. Fields are: * @@ -528,7 +530,10 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) struct memcg_cache_params { bool is_root_cache; union { - struct kmem_cache *memcg_caches[0]; + struct { + struct rcu_head rcu_head; + struct kmem_cache *memcg_caches[0]; + }; struct { struct mem_cgroup *memcg; struct list_head list; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 80197e5..216659d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3178,18 +3178,17 @@ int memcg_update_cache_size(struct kmem_cache *s, int num_groups) if (num_groups > memcg_limited_groups_array_size) { int i; + struct memcg_cache_params *new_params; ssize_t size = memcg_caches_array_size(num_groups); size *= sizeof(void *); size += offsetof(struct memcg_cache_params, memcg_caches); - s->memcg_params = kzalloc(size, GFP_KERNEL); - if (!s->memcg_params) { - s->memcg_params = cur_params; + new_params = kzalloc(size, GFP_KERNEL); + if (!new_params) return -ENOMEM; - } - s->memcg_params->is_root_cache = true; + new_params->is_root_cache = true; /* * There is the chance it will be bigger than @@ -3203,7 +3202,7 @@ int memcg_update_cache_size(struct kmem_cache *s, int num_groups) for (i = 0; i < memcg_limited_groups_array_size; i++) { if (!cur_params->memcg_caches[i]) continue; - s->memcg_params->memcg_caches[i] = + new_params->memcg_caches[i] = cur_params->memcg_caches[i]; } @@ -3216,7 +3215,9 @@ int memcg_update_cache_size(struct kmem_cache *s, int num_groups) * bigger than the others. And all updates will reset this * anyway. 
*/ - kfree(cur_params); + rcu_assign_pointer(s->memcg_params, new_params); + if (cur_params) + kfree_rcu(cur_params, rcu_head); } return 0; } diff --git a/mm/slab.h b/mm/slab.h index 72d1f9d..8184a7c 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -160,14 +160,28 @@ static inline const char *cache_name(struct kmem_cache *s) return s->name; } +/* + * Note, we protect with RCU only the memcg_caches array, not per-memcg caches. + * That said the caller must assure the memcg's cache won't go away. Since once + * created a memcg's cache is destroyed only along with the root cache, it is + * true if we are going to allocate from the cache or hold a reference to the + * root cache by other means. Otherwise, we should hold either the slab_mutex + * or the memcg's slab_caches_mutex while calling this function and accessing + * the returned value. + */ static inline struct kmem_cache * cache_from_memcg_idx(struct kmem_cache *s, int idx) { struct kmem_cache *cachep; + struct memcg_cache_params *params; if (!s->memcg_params) return NULL; - cachep = s->memcg_params->memcg_caches[idx]; + + rcu_read_lock(); + params = rcu_dereference(s->memcg_params); + cachep = params->memcg_caches[idx]; + rcu_read_unlock(); /* * Make sure we will access the up-to-date value. The code updating -- cgit v0.10.2 From 6de64beb3435ab8f2ac1428dd7dddad5dc679c4b Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:08 -0800 Subject: memcg: remove KMEM_ACCOUNTED_ACTIVATED flag Currently we have two state bits in mem_cgroup::kmem_account_flags regarding kmem accounting activation, ACTIVATED and ACTIVE. We start kmem accounting only if both flags are set (memcg_can_account_kmem()), plus throughout the code there are several places where we check only the ACTIVE flag, but we never check the ACTIVATED flag alone. These flags are both set from memcg_update_kmem_limit() under the set_limit_mutex, the ACTIVE flag always being set after ACTIVATED, and they never get cleared. 
That said, checking if both flags are set is equivalent to checking only for the ACTIVE flag, and since there are no checks of the ACTIVATED flag alone, we can safely remove the ACTIVATED flag, and nothing will change.

Let's try to understand what was the reason for introducing these flags. The purpose of the ACTIVE flag is clear - it states that kmem should be accounted to the cgroup. The only requirement for it is that it should be set after we have fully initialized kmem accounting bits for the cgroup and patched all static branches relating to kmem accounting. Since we always check if static branch is enabled before actually considering if we should account (otherwise we wouldn't benefit from static branching), this guarantees us that we won't skip a commit or uncharge after a charge due to an unpatched static branch.

Now let's move on to the ACTIVATED bit. As I proved in the beginning of this message, it is absolutely useless, and removing it will change nothing. So what was the reason for introducing it?

The ACTIVATED flag was introduced by commit a8964b9b84f9 ("memcg: use static branches when code not in use") in order to guarantee that static_key_slow_inc(&memcg_kmem_enabled_key) would be called only once for each memory cgroup when its kmem accounting was activated.
The point was that at that time the memcg_update_kmem_limit() function's work-flow looked like this:

    bool must_inc_static_branch = false;

    cgroup_lock();
    mutex_lock(&set_limit_mutex);
    if (!memcg->kmem_account_flags && val != RESOURCE_MAX) {
        /* The kmem limit is set for the first time */
        ret = res_counter_set_limit(&memcg->kmem, val);

        memcg_kmem_set_activated(memcg);
        must_inc_static_branch = true;
    } else
        ret = res_counter_set_limit(&memcg->kmem, val);
    mutex_unlock(&set_limit_mutex);
    cgroup_unlock();

    if (must_inc_static_branch) {
        /* We can't do this under cgroup_lock */
        static_key_slow_inc(&memcg_kmem_enabled_key);
        memcg_kmem_set_active(memcg);
    }

Thus, without the ACTIVATED flag we could race with other threads trying to set the limit and increment the static branching ref-counter more than once. Today we call the whole memcg_update_kmem_limit() function under the set_limit_mutex and this race is impossible.

Now that we understand why the ACTIVATED bit was introduced and why we don't need it now, and know that removing it will change nothing anyway, let's get rid of it.

Signed-off-by: Vladimir Davydov
Cc: Michal Hocko
Cc: Glauber Costa
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 216659d..706f7bc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -382,15 +382,10 @@ struct mem_cgroup { /* internal only representation about the status of kmem accounting. */ enum { - KMEM_ACCOUNTED_ACTIVE = 0, /* accounted by this cgroup itself */ - KMEM_ACCOUNTED_ACTIVATED, /* static key enabled.
*/ + KMEM_ACCOUNTED_ACTIVE, /* accounted by this cgroup itself */ KMEM_ACCOUNTED_DEAD, /* dead memcg with pending kmem charges */ }; -/* We account when limit is on, but only after call sites are patched */ -#define KMEM_ACCOUNTED_MASK \ - ((1 << KMEM_ACCOUNTED_ACTIVE) | (1 << KMEM_ACCOUNTED_ACTIVATED)) - #ifdef CONFIG_MEMCG_KMEM static inline void memcg_kmem_set_active(struct mem_cgroup *memcg) { @@ -402,16 +397,6 @@ static bool memcg_kmem_is_active(struct mem_cgroup *memcg) return test_bit(KMEM_ACCOUNTED_ACTIVE, &memcg->kmem_account_flags); } -static void memcg_kmem_set_activated(struct mem_cgroup *memcg) -{ - set_bit(KMEM_ACCOUNTED_ACTIVATED, &memcg->kmem_account_flags); -} - -static void memcg_kmem_clear_activated(struct mem_cgroup *memcg) -{ - clear_bit(KMEM_ACCOUNTED_ACTIVATED, &memcg->kmem_account_flags); -} - static void memcg_kmem_mark_dead(struct mem_cgroup *memcg) { /* @@ -2995,8 +2980,7 @@ static DEFINE_MUTEX(set_limit_mutex); static inline bool memcg_can_account_kmem(struct mem_cgroup *memcg) { return !mem_cgroup_disabled() && !mem_cgroup_is_root(memcg) && - (memcg->kmem_account_flags & KMEM_ACCOUNTED_MASK) == - KMEM_ACCOUNTED_MASK; + memcg_kmem_is_active(memcg); } /* @@ -3120,19 +3104,10 @@ static int memcg_update_cache_sizes(struct mem_cgroup *memcg) 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL); if (num < 0) return num; - /* - * After this point, kmem_accounted (that we test atomically in - * the beginning of this conditional), is no longer 0. This - * guarantees only one process will set the following boolean - * to true. We don't need test_and_set because we're protected - * by the set_limit_mutex anyway. 
- */ - memcg_kmem_set_activated(memcg); ret = memcg_update_all_caches(num+1); if (ret) { ida_simple_remove(&kmem_limited_groups, num); - memcg_kmem_clear_activated(memcg); return ret; } -- cgit v0.10.2 From d6441637709ba302905f1076f2afcb6d4ea3a901 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:09 -0800 Subject: memcg: rework memcg_update_kmem_limit synchronization Currently we take both the memcg_create_mutex and the set_limit_mutex when we enable kmem accounting for a memory cgroup, which makes kmem activation events serialize with both memcg creations and other memcg limit updates (memory.limit, memory.memsw.limit). However, there is no point in such strict synchronization rules there. First, the set_limit_mutex was introduced to keep the memory.limit and memory.memsw.limit values in sync. Since memory.kmem.limit can be set independently of them, it is better to introduce a separate mutex to synchronize against concurrent kmem limit updates. Second, we take the memcg_create_mutex in order to make sure all children of this memcg will be kmem-active as well. For achieving that, it is enough to hold this mutex only while checking if memcg_has_children() though. This guarantees that if a child is added after we checked that the memcg has no children, the newly added cgroup will see its parent kmem-active (of course if the latter succeeded), and call kmem activation for itself. This patch simplifies the locking rules of memcg_update_kmem_limit() according to these considerations. 
[vdavydov@parallels.com: fix uninitialized var warning]
Signed-off-by: Vladimir Davydov
Cc: Michal Hocko
Cc: Glauber Costa
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 706f7bc..c871505 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2977,6 +2977,8 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg, static DEFINE_MUTEX(set_limit_mutex); #ifdef CONFIG_MEMCG_KMEM +static DEFINE_MUTEX(activate_kmem_mutex); + static inline bool memcg_can_account_kmem(struct mem_cgroup *memcg) { return !mem_cgroup_disabled() && !mem_cgroup_is_root(memcg) && @@ -3089,34 +3091,6 @@ int memcg_cache_id(struct mem_cgroup *memcg) return memcg ? memcg->kmemcg_id : -1; } -/* - * This ends up being protected by the set_limit mutex, during normal - * operation, because that is its main call site. - * - * But when we create a new cache, we can call this as well if its parent - * is kmem-limited. That will have to hold set_limit_mutex as well. - */ -static int memcg_update_cache_sizes(struct mem_cgroup *memcg) -{ - int num, ret; - - num = ida_simple_get(&kmem_limited_groups, - 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL); - if (num < 0) - return num; - - ret = memcg_update_all_caches(num+1); - if (ret) { - ida_simple_remove(&kmem_limited_groups, num); - return ret; - } - - memcg->kmemcg_id = num; - INIT_LIST_HEAD(&memcg->memcg_slab_caches); - mutex_init(&memcg->slab_caches_mutex); - return 0; -} - static size_t memcg_caches_array_size(int num_groups) { ssize_t size; @@ -3459,9 +3433,10 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s) * * Still, we don't want anyone else freeing memcg_caches under our * noses, which can happen if a new memcg comes to life. As usual, - * we'll take the set_limit_mutex to protect ourselves against this. + * we'll take the activate_kmem_mutex to protect ourselves against + * this.
*/ - mutex_lock(&set_limit_mutex); + mutex_lock(&activate_kmem_mutex); for_each_memcg_cache_index(i) { c = cache_from_memcg_idx(s, i); if (!c) @@ -3484,7 +3459,7 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s) cancel_work_sync(&c->memcg_params->destroy); kmem_cache_destroy(c); } - mutex_unlock(&set_limit_mutex); + mutex_unlock(&activate_kmem_mutex); } struct create_work { @@ -5148,11 +5123,23 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, return val; } -static int memcg_update_kmem_limit(struct cgroup_subsys_state *css, u64 val) -{ - int ret = -EINVAL; #ifdef CONFIG_MEMCG_KMEM - struct mem_cgroup *memcg = mem_cgroup_from_css(css); +/* should be called with activate_kmem_mutex held */ +static int __memcg_activate_kmem(struct mem_cgroup *memcg, + unsigned long long limit) +{ + int err = 0; + int memcg_id; + + if (memcg_kmem_is_active(memcg)) + return 0; + + /* + * We are going to allocate memory for data shared by all memory + * cgroups so let's stop accounting here. + */ + memcg_stop_kmem_account(); + /* * For simplicity, we won't allow this to be disabled. It also can't * be changed if the cgroup has children already, or if tasks had @@ -5166,72 +5153,101 @@ static int memcg_update_kmem_limit(struct cgroup_subsys_state *css, u64 val) * of course permitted. 
*/ mutex_lock(&memcg_create_mutex); - mutex_lock(&set_limit_mutex); - if (!memcg->kmem_account_flags && val != RES_COUNTER_MAX) { - if (cgroup_task_count(css->cgroup) || memcg_has_children(memcg)) { - ret = -EBUSY; - goto out; - } - ret = res_counter_set_limit(&memcg->kmem, val); - VM_BUG_ON(ret); + if (cgroup_task_count(memcg->css.cgroup) || memcg_has_children(memcg)) + err = -EBUSY; + mutex_unlock(&memcg_create_mutex); + if (err) + goto out; - ret = memcg_update_cache_sizes(memcg); - if (ret) { - res_counter_set_limit(&memcg->kmem, RES_COUNTER_MAX); - goto out; - } - static_key_slow_inc(&memcg_kmem_enabled_key); - /* - * setting the active bit after the inc will guarantee no one - * starts accounting before all call sites are patched - */ - memcg_kmem_set_active(memcg); - } else - ret = res_counter_set_limit(&memcg->kmem, val); + memcg_id = ida_simple_get(&kmem_limited_groups, + 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL); + if (memcg_id < 0) { + err = memcg_id; + goto out; + } + + /* + * Make sure we have enough space for this cgroup in each root cache's + * memcg_params. + */ + err = memcg_update_all_caches(memcg_id + 1); + if (err) + goto out_rmid; + + memcg->kmemcg_id = memcg_id; + INIT_LIST_HEAD(&memcg->memcg_slab_caches); + mutex_init(&memcg->slab_caches_mutex); + + /* + * We couldn't have accounted to this cgroup, because it hasn't got the + * active bit set yet, so this should succeed. + */ + err = res_counter_set_limit(&memcg->kmem, limit); + VM_BUG_ON(err); + + static_key_slow_inc(&memcg_kmem_enabled_key); + /* + * Setting the active bit after enabling static branching will + * guarantee no one starts accounting before all call sites are + * patched. 
+ */ + memcg_kmem_set_active(memcg); out: - mutex_unlock(&set_limit_mutex); - mutex_unlock(&memcg_create_mutex); -#endif + memcg_resume_kmem_account(); + return err; + +out_rmid: + ida_simple_remove(&kmem_limited_groups, memcg_id); + goto out; +} + +static int memcg_activate_kmem(struct mem_cgroup *memcg, + unsigned long long limit) +{ + int ret; + + mutex_lock(&activate_kmem_mutex); + ret = __memcg_activate_kmem(memcg, limit); + mutex_unlock(&activate_kmem_mutex); + return ret; +} + +static int memcg_update_kmem_limit(struct mem_cgroup *memcg, + unsigned long long val) +{ + int ret; + + if (!memcg_kmem_is_active(memcg)) + ret = memcg_activate_kmem(memcg, val); + else + ret = res_counter_set_limit(&memcg->kmem, val); return ret; } -#ifdef CONFIG_MEMCG_KMEM static int memcg_propagate_kmem(struct mem_cgroup *memcg) { int ret = 0; struct mem_cgroup *parent = parent_mem_cgroup(memcg); - if (!parent) - goto out; - memcg->kmem_account_flags = parent->kmem_account_flags; - /* - * When that happen, we need to disable the static branch only on those - * memcgs that enabled it. To achieve this, we would be forced to - * complicate the code by keeping track of which memcgs were the ones - * that actually enabled limits, and which ones got it from its - * parents. - * - * It is a lot simpler just to do static_key_slow_inc() on every child - * that is accounted. - */ - if (!memcg_kmem_is_active(memcg)) - goto out; + if (!parent) + return 0; + mutex_lock(&activate_kmem_mutex); /* - * __mem_cgroup_free() will issue static_key_slow_dec() because this - * memcg is active already. If the later initialization fails then the - * cgroup core triggers the cleanup so we do not have to do it here. + * If the parent cgroup is not kmem-active now, it cannot be activated + * after this point, because it has at least one child already. 
*/ - static_key_slow_inc(&memcg_kmem_enabled_key); - - mutex_lock(&set_limit_mutex); - memcg_stop_kmem_account(); - ret = memcg_update_cache_sizes(memcg); - memcg_resume_kmem_account(); - mutex_unlock(&set_limit_mutex); -out: + if (memcg_kmem_is_active(parent)) + ret = __memcg_activate_kmem(memcg, RES_COUNTER_MAX); + mutex_unlock(&activate_kmem_mutex); return ret; } +#else +static int memcg_update_kmem_limit(struct mem_cgroup *memcg, + unsigned long long val) +{ + return -EINVAL; +} #endif /* CONFIG_MEMCG_KMEM */ /* @@ -5265,7 +5281,7 @@ static int mem_cgroup_write(struct cgroup_subsys_state *css, struct cftype *cft, else if (type == _MEMSWAP) ret = mem_cgroup_resize_memsw_limit(memcg, val); else if (type == _KMEM) - ret = memcg_update_kmem_limit(css, val); + ret = memcg_update_kmem_limit(memcg, val); else return -EINVAL; break; @@ -6499,7 +6515,6 @@ mem_cgroup_css_online(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); struct mem_cgroup *parent = mem_cgroup_from_css(css_parent(css)); - int error = 0; if (css->cgroup->id > MEM_CGROUP_ID_MAX) return -ENOSPC; @@ -6534,10 +6549,9 @@ mem_cgroup_css_online(struct cgroup_subsys_state *css) if (parent != root_mem_cgroup) mem_cgroup_subsys.broken_hierarchy = true; } - - error = memcg_init_kmem(memcg, &mem_cgroup_subsys); mutex_unlock(&memcg_create_mutex); - return error; + + return memcg_init_kmem(memcg, &mem_cgroup_subsys); } /* -- cgit v0.10.2

From 87379ec8c2b8ae0acd526b87d2240afca92a7505 Mon Sep 17 00:00:00 2001
From: Philipp Hachtmann
Date: Thu, 23 Jan 2014 15:53:10 -0800
Subject: mm/nobootmem.c: add return value check in __alloc_memory_core_early()

When memblock_reserve() fails because memblock.reserved.regions cannot be resized, the caller (e.g. alloc_bootmem()) is not informed of the failed allocation. Therefore alloc_bootmem() silently returns the same pointer again and again.

This patch adds a check for the return value of memblock_reserve() in __alloc_memory_core_early().
Signed-off-by: Philipp Hachtmann Reviewed-by: Tejun Heo Cc: Joonsoo Kim Cc: Johannes Weiner Cc: Tang Chen Cc: Toshi Kani Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/nobootmem.c b/mm/nobootmem.c index 19121ce..bb1a70c 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -45,7 +45,9 @@ static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align, if (!addr) return NULL; - memblock_reserve(addr, size); + if (memblock_reserve(addr, size)) + return NULL; + ptr = phys_to_virt(addr); memset(ptr, 0, size); /* -- cgit v0.10.2 From 5e270e254885893f8c82ab9b91caa648af3690df Mon Sep 17 00:00:00 2001 From: Philipp Hachtmann Date: Thu, 23 Jan 2014 15:53:11 -0800 Subject: mm: free memblock.memory in free_all_bootmem When calling free_all_bootmem() the free areas under memblock's control are released to the buddy allocator. Additionally the reserved list is freed if it was reallocated by memblock. The same should apply for the memory list. Signed-off-by: Philipp Hachtmann Reviewed-by: Tejun Heo Cc: Joonsoo Kim Cc: Johannes Weiner Cc: Tang Chen Cc: Toshi Kani Cc: Jianguo Wu Cc: Yinghai Lu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/memblock.h b/include/linux/memblock.h index cd0274b..1ef6636 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -61,6 +61,7 @@ phys_addr_t memblock_find_in_range_node(phys_addr_t size, phys_addr_t align, phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end, phys_addr_t size, phys_addr_t align); phys_addr_t get_allocated_memblock_reserved_regions_info(phys_addr_t *addr); +phys_addr_t get_allocated_memblock_memory_regions_info(phys_addr_t *addr); void memblock_allow_resize(void); int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid); int memblock_add(phys_addr_t base, phys_addr_t size); diff --git a/mm/memblock.c b/mm/memblock.c index 1c2ef2c..64ed243 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -291,6 +291,22 @@ 
phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info( memblock.reserved.max); } +#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK + +phys_addr_t __init_memblock get_allocated_memblock_memory_regions_info( + phys_addr_t *addr) +{ + if (memblock.memory.regions == memblock_memory_init_regions) + return 0; + + *addr = __pa(memblock.memory.regions); + + return PAGE_ALIGN(sizeof(struct memblock_region) * + memblock.memory.max); +} + +#endif + /** * memblock_double_array - double the size of the memblock regions array * @type: memblock type of the regions array being doubled diff --git a/mm/nobootmem.c b/mm/nobootmem.c index bb1a70c..17c8902 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -122,11 +122,19 @@ static unsigned long __init free_low_memory_core_early(void) for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL) count += __free_memory_core(start, end); - /* free range that is used for reserved array if we allocate it */ + /* Free memblock.reserved array if it was allocated */ size = get_allocated_memblock_reserved_regions_info(&start); if (size) count += __free_memory_core(start, start + size); +#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK + + /* Free memblock.memory array if it was allocated */ + size = get_allocated_memblock_memory_regions_info(&start); + if (size) + count += __free_memory_core(start, start + size); +#endif + return count; } -- cgit v0.10.2 From 54a43d54988a3731d644fdeb7a1d6f46b4ac64c7 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Thu, 23 Jan 2014 15:53:13 -0800 Subject: numa: add a sysctl for numa_balancing Add a working sysctl to enable/disable automatic numa memory balancing at runtime. This allows us to track down performance problems with this feature and is generally a good idea. This was possible earlier through debugfs, but only with special debugging options set. Also fix the boot message. 
[akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/] Signed-off-by: Andi Kleen Acked-by: Mel Gorman Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 31e0193..b13cf43 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -99,4 +99,8 @@ extern int sched_rt_handler(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +extern int sysctl_numa_balancing(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos); + #endif /* _SCHED_SYSCTL_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4d6964e..7fea865 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1770,7 +1770,29 @@ void set_numabalancing_state(bool enabled) numabalancing_enabled = enabled; } #endif /* CONFIG_SCHED_DEBUG */ -#endif /* CONFIG_NUMA_BALANCING */ + +#ifdef CONFIG_PROC_SYSCTL +int sysctl_numa_balancing(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + struct ctl_table t; + int err; + int state = numabalancing_enabled; + + if (write && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + t = *table; + t.data = &state; + err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos); + if (err < 0) + return err; + if (write) + set_numabalancing_state(state); + return err; +} +#endif +#endif /* * fork()/clone()-time setup: diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 332cefc..693eac3 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -389,6 +389,15 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, + { + .procname = "numa_balancing", + .data = NULL, /* filled in by handler */ + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = sysctl_numa_balancing, + .extra1 = &zero, + .extra2 = &one, + }, #endif /* CONFIG_NUMA_BALANCING */ #endif /* CONFIG_SCHED_DEBUG */ { diff --git 
a/mm/mempolicy.c b/mm/mempolicy.c index 0cd2c4d..947293e 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2668,7 +2668,7 @@ static void __init check_numabalancing_enable(void) if (nr_node_ids > 1 && !numabalancing_override) { printk(KERN_INFO "Enabling automatic NUMA balancing. " - "Configure with numa_balancing= or sysctl"); + "Configure with numa_balancing= or the kernel.numa_balancing sysctl"); set_numabalancing_state(numabalancing_default); } } -- cgit v0.10.2 From 54b9dd14d09f24927285359a227aa363ce46089e Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 23 Jan 2014 15:53:14 -0800 Subject: mm/memory-failure.c: shift page lock from head page to tail page after thp split After thp split in hwpoison_user_mappings(), we hold page lock on the raw error page only between try_to_unmap, hence we are in danger of race condition. I found in the RHEL7 MCE-relay testing that we have "bad page" error when a memory error happens on a thp tail page used by qemu-kvm: Triggering MCE exception on CPU 10 mce: [Hardware Error]: Machine check events logged MCE exception done on CPU 10 MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption MCE 0x38c535: dirty LRU page recovery: Recovered qemu-kvm[8418]: segfault at 20 ip 00007ffb0f0f229a sp 00007fffd6bc5240 error 4 in qemu-kvm[7ffb0ef14000+420000] BUG: Bad page state in process qemu-kvm pfn:38c400 page:ffffea000e310000 count:0 mapcount:0 mapping: (null) index:0x7ffae3c00 page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked) Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ... 
CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G M -------------- 3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1 Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011 Call Trace: dump_stack+0x19/0x1b bad_page.part.59+0xcf/0xe8 free_pages_prepare+0x148/0x160 free_hot_cold_page+0x31/0x140 free_hot_cold_page_list+0x46/0xa0 release_pages+0x1c1/0x200 free_pages_and_swap_cache+0xad/0xd0 tlb_flush_mmu.part.46+0x4c/0x90 tlb_finish_mmu+0x55/0x60 exit_mmap+0xcb/0x170 mmput+0x67/0xf0 vhost_dev_cleanup+0x231/0x260 [vhost_net] vhost_net_release+0x3f/0x90 [vhost_net] __fput+0xe9/0x270 ____fput+0xe/0x10 task_work_run+0xc4/0xe0 do_exit+0x2bb/0xa40 do_group_exit+0x3f/0xa0 get_signal_to_deliver+0x1d0/0x6e0 do_signal+0x48/0x5e0 do_notify_resume+0x71/0xc0 retint_signal+0x48/0x8c The reason of this bug is that a page fault happens before unlocking the head page at the end of memory_failure(). This strange page fault is trying to access to address 0x20 and I'm not sure why qemu-kvm does this, but anyway as a result the SIGSEGV makes qemu-kvm exit and on the way we catch the bad page bug/warning because we try to free a locked page (which was the former head page.) To fix this, this patch suggests to shift page lock from head page to tail page just after thp split. SIGSEGV still happens, but it affects only error affected VMs, not a whole system. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Wanpeng Li Cc: [3.9+] # a3e0f9e47d5ef "mm/memory-failure.c: transfer page count from head page to tail page after split thp" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memory-failure.c b/mm/memory-failure.c index b25ed32..4f08a2d 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -856,14 +856,14 @@ static int page_action(struct page_state *ps, struct page *p, * the pages and send SIGBUS to the processes if the data was dirty. 
*/ static int hwpoison_user_mappings(struct page *p, unsigned long pfn, - int trapno, int flags) + int trapno, int flags, struct page **hpagep) { enum ttu_flags ttu = TTU_UNMAP | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS; struct address_space *mapping; LIST_HEAD(tokill); int ret; int kill = 1, forcekill; - struct page *hpage = compound_head(p); + struct page *hpage = *hpagep; struct page *ppage; if (PageReserved(p) || PageSlab(p)) @@ -942,11 +942,14 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn, * We pinned the head page for hwpoison handling, * now we split the thp and we are interested in * the hwpoisoned raw page, so move the refcount - * to it. + * to it. Similarly, page lock is shifted. */ if (hpage != p) { put_page(hpage); get_page(p); + lock_page(p); + unlock_page(hpage); + *hpagep = p; } /* THP is split, so ppage should be the real poisoned page. */ ppage = p; @@ -964,17 +967,11 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn, if (kill) collect_procs(ppage, &tokill); - if (hpage != ppage) - lock_page(ppage); - ret = try_to_unmap(ppage, ttu); if (ret != SWAP_SUCCESS) printk(KERN_ERR "MCE %#lx: failed to unmap page (mapcount=%d)\n", pfn, page_mapcount(ppage)); - if (hpage != ppage) - unlock_page(ppage); - /* * Now that the dirty bit has been propagated to the * struct page and all unmaps done we can decide if @@ -1193,8 +1190,12 @@ int memory_failure(unsigned long pfn, int trapno, int flags) /* * Now take care of user space mappings. * Abort on fail: __delete_from_page_cache() assumes unmapped page. + * + * When the raw error page is thp tail page, hpage points to the raw + * page after thp split. 
*/ - if (hwpoison_user_mappings(p, pfn, trapno, flags) != SWAP_SUCCESS) { + if (hwpoison_user_mappings(p, pfn, trapno, flags, &hpage) + != SWAP_SUCCESS) { printk(KERN_ERR "MCE %#lx: cannot unmap page, give up\n", pfn); res = -EBUSY; goto out; -- cgit v0.10.2 From cc81717ed3bc6d4f3738d13a1e097437caada0e9 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 23 Jan 2014 15:53:15 -0800 Subject: mm: new_vma_page() cannot see NULL vma for hugetlb pages Commit 11c731e81bb0 ("mm/mempolicy: fix !vma in new_vma_page()") has removed BUG_ON(!vma) from new_vma_page which is partially correct because page_address_in_vma will return EFAULT for non-linear mappings and at least shared shmem might be mapped this way. The patch also tried to prevent NULL ptr for hugetlb pages which is not correct AFAICS because hugetlb pages cannot be mapped as VM_NONLINEAR and other conditions in page_address_in_vma seem to be legit and catch real bugs. This patch restores BUG_ON for PageHuge to catch potential issues when the to-be-migrated page is not set up properly. Signed-off-by: Michal Hocko Reviewed-by: Bob Liu Cc: Sasha Levin Cc: Wanpeng Li Cc: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 947293e..463b7fb 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1199,10 +1199,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int * } if (PageHuge(page)) { - if (vma) - return alloc_huge_page_noerr(vma, address, 1); - else - return NULL; + BUG_ON(!vma); + return alloc_huge_page_noerr(vma, address, 1); } /* * if !vma, alloc_page_vma() will use task or system default policy -- cgit v0.10.2 From da8c757b080ee84f219fa2368cb5dd23ac304fc0 Mon Sep 17 00:00:00 2001 From: Han Pingtian Date: Thu, 23 Jan 2014 15:53:17 -0800 Subject: mm: prevent setting of a value less than 0 to min_free_kbytes After echo -1 > /proc/sys/vm/min_free_kbytes, the system will hang.
Changing proc_dointvec() to proc_dointvec_minmax() in the min_free_kbytes_sysctl_handler() prevents this from happening. mhocko said: : You can still do echo $BIG_VALUE > /proc/sys/vm/min_free_kbytes and make : your machine unusable but I agree that proc_dointvec_minmax is more : suitable here as we already have: : : .proc_handler = min_free_kbytes_sysctl_handler, : .extra1 = &zero, : : It used to work properly but then 6fce56ec91b5 ("sysctl: Remove references : to ctl_name and strategy from the generic sysctl table") has removed : sysctl_intvec strategy and so extra1 is ignored. Signed-off-by: Han Pingtian Acked-by: Michal Hocko Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f18f016..a818d56 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5754,7 +5754,12 @@ module_init(init_per_zone_wmark_min) int min_free_kbytes_sysctl_handler(ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos) { - proc_dointvec(table, write, buffer, length, ppos); + int rc; + + rc = proc_dointvec_minmax(table, write, buffer, length, ppos); + if (rc) + return rc; + if (write) { user_min_free_kbytes = min_free_kbytes; setup_per_zone_wmarks(); -- cgit v0.10.2 From bd7278166aaf8b33da1a3ee437354e2ed88bf70f Mon Sep 17 00:00:00 2001 From: Vinayak Menon Date: Thu, 23 Jan 2014 15:53:18 -0800 Subject: Documentation/trace/postprocess/trace-vmscan-postprocess.pl: fix the traceevent regex When irq, preempt and lockdep fields are printed (field 3 in the example below) in the trace output, the script fails.
An example entry: kswapd0-610 [000] ...1 158.112152: mm_vmscan_kswapd_wake: nid=0 order=0 Signed-off-by: Vinayak Menon Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl index 4a37c47..00e425f 100644 --- a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl +++ b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl @@ -123,7 +123,7 @@ my $regex_writepage; # Static regex used. Specified like this for readability and for use with /o # (process_pid) (cpus ) ( time ) (tpoint ) (details) -my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)'; +my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])(\s*[dX.][Nnp.][Hhs.][0-9a-fA-F.]*|)\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)'; my $regex_statname = '[-0-9]*\s\((.*)\).*'; my $regex_statppid = '[-0-9]*\s\(.*\)\s[A-Za-z]\s([0-9]*).*'; @@ -270,8 +270,8 @@ EVENT_PROCESS: while ($traceevent = ) { if ($traceevent =~ /$regex_traceevent/o) { $process_pid = $1; - $timestamp = $3; - $tracepoint = $4; + $timestamp = $4; + $tracepoint = $5; $process_pid =~ /(.*)-([0-9]*)$/; my $process = $1; @@ -299,7 +299,7 @@ EVENT_PROCESS: $perprocesspid{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}++; $perprocesspid{$process_pid}->{STATE_DIRECT_BEGIN} = $timestamp; - $details = $5; + $details = $6; if ($details !~ /$regex_direct_begin/o) { print "WARNING: Failed to parse mm_vmscan_direct_reclaim_begin as expected\n"; print " $details\n"; @@ -322,7 +322,7 @@ EVENT_PROCESS: $perprocesspid{$process_pid}->{HIGH_DIRECT_RECLAIM_LATENCY}[$index] = "$order-$latency"; } } elsif ($tracepoint eq "mm_vmscan_kswapd_wake") { - $details = $5; + $details = $6; if ($details !~ /$regex_kswapd_wake/o) { print "WARNING: Failed to parse mm_vmscan_kswapd_wake as expected\n"; print " $details\n"; @@ -356,7 +356,7 @@ EVENT_PROCESS: } elsif ($tracepoint eq 
"mm_vmscan_wakeup_kswapd") { $perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}++; - $details = $5; + $details = $6; if ($details !~ /$regex_wakeup_kswapd/o) { print "WARNING: Failed to parse mm_vmscan_wakeup_kswapd as expected\n"; print " $details\n"; @@ -366,7 +366,7 @@ EVENT_PROCESS: my $order = $3; $perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD_PERORDER}[$order]++; } elsif ($tracepoint eq "mm_vmscan_lru_isolate") { - $details = $5; + $details = $6; if ($details !~ /$regex_lru_isolate/o) { print "WARNING: Failed to parse mm_vmscan_lru_isolate as expected\n"; print " $details\n"; @@ -387,7 +387,7 @@ EVENT_PROCESS: } $perprocesspid{$process_pid}->{HIGH_NR_CONTIG_DIRTY} += $nr_contig_dirty; } elsif ($tracepoint eq "mm_vmscan_lru_shrink_inactive") { - $details = $5; + $details = $6; if ($details !~ /$regex_lru_shrink_inactive/o) { print "WARNING: Failed to parse mm_vmscan_lru_shrink_inactive as expected\n"; print " $details\n"; @@ -397,7 +397,7 @@ EVENT_PROCESS: my $nr_reclaimed = $4; $perprocesspid{$process_pid}->{HIGH_NR_RECLAIMED} += $nr_reclaimed; } elsif ($tracepoint eq "mm_vmscan_writepage") { - $details = $5; + $details = $6; if ($details !~ /$regex_writepage/o) { print "WARNING: Failed to parse mm_vmscan_writepage as expected\n"; print " $details\n"; -- cgit v0.10.2 From b30afea019a4548ee77b73e83f03104e0e3a0556 Mon Sep 17 00:00:00 2001 From: Shawn Guo Date: Thu, 23 Jan 2014 15:53:18 -0800 Subject: include/linux/genalloc.h: spinlock_t needs spinlock_types.h Compiling a C file which includes genalloc.h but without spinlock_types.h being included before, we will see the compile error below. include/linux/genalloc.h:54:2: error: unknown type name `spinlock_t' Include spinlock_types.h from genalloc.h to fix the problem. 
Signed-off-by: Shawn Guo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 1eda33d..1c2fdaa 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -30,6 +30,8 @@ #ifndef __GENALLOC_H__ #define __GENALLOC_H__ +#include <linux/spinlock_types.h> + struct device; struct device_node; -- cgit v0.10.2 From c980e66a556659f14da2294e1fc696e1352b5d00 Mon Sep 17 00:00:00 2001 From: Jianguo Wu Date: Thu, 23 Jan 2014 15:53:19 -0800 Subject: mm: do_mincore() cleanup Two cleanups: 1. remove redundant code for hugetlb pages. 2. end = pmd_addr_end(addr, end) restricts [addr, end) within PMD_SIZE, this may increase do_mincore() calls, remove it. Signed-off-by: Jianguo Wu Acked-by: Johannes Weiner Cc: Minchan Kim Cc: qiuxishi Reviewed-by: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/mincore.c b/mm/mincore.c index da2be56..1016233 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -225,13 +225,6 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v end = min(vma->vm_end, addr + (pages << PAGE_SHIFT)); - if (is_vm_hugetlb_page(vma)) { - mincore_hugetlb_page_range(vma, addr, end, vec); - return (end - addr) >> PAGE_SHIFT; - } - - end = pmd_addr_end(addr, end); - if (is_vm_hugetlb_page(vma)) mincore_hugetlb_page_range(vma, addr, end, vec); else -- cgit v0.10.2 From baae911b27b8dbee6830f4e3ef0fcf4dc8e9c07b Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Thu, 23 Jan 2014 15:53:21 -0800 Subject: sched/numa: fix setting of cpupid on page migration twice Commit 7851a45cd3f6 ("mm: numa: Copy cpupid on page migration") copies over the cpupid at page migration time. It is unnecessary to set it again in migrate_misplaced_transhuge_page().
Signed-off-by: Wanpeng Li Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Mel Gorman Cc: Rik van Riel Cc: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/migrate.c b/mm/migrate.c index 4b3996e..734704f 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1753,8 +1753,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, if (!new_page) goto out_fail; - page_cpupid_xchg_last(new_page, page_cpupid_last(page)); - isolated = numamigrate_isolate_page(pgdat, page); if (!isolated) { put_page(new_page); -- cgit v0.10.2 From 0b1fb40a3b1291f2f12f13f644ac95cf756a00e6 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:22 -0800 Subject: mm: vmscan: shrink all slab objects if tight on memory When reclaiming kmem, we currently don't scan slabs that have less than batch_size objects (see shrink_slab_node()): while (total_scan >= batch_size) { shrinkctl->nr_to_scan = batch_size; shrinker->scan_objects(shrinker, shrinkctl); total_scan -= batch_size; } If there are only a few shrinkers available, such a behavior won't cause any problems, because the batch_size is usually small, but if we have a lot of slab shrinkers, which is perfectly possible since FS shrinkers are now per-superblock, we can end up with hundreds of megabytes of practically unreclaimable kmem objects. For instance, mounting a thousand of ext2 FS images with a hundred of files in each and iterating over all the files using du(1) will result in about 200 Mb of FS caches that cannot be dropped even with the aid of the vm.drop_caches sysctl! This problem was initially pointed out by Glauber Costa [*]. Glauber proposed to fix it by making the shrink_slab() always take at least one pass, to put it simply, turning the scan loop above to a do{}while() loop. However, this proposal was rejected, because it could result in more aggressive and frequent slab shrinking even under low memory pressure when total_scan is naturally very small. 
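The difference between the loop quoted above and the rejected do{}while() variant can be seen in a small userspace model. This is only a sketch: the shrinker call is reduced to a pass counter, and passes_while() and passes_do_while() are hypothetical names that do not exist in mm/vmscan.c.

```c
#include <assert.h>

/*
 * Userspace model only: each loop iteration stands in for one call to
 * shrinker->scan_objects(). Function names are hypothetical.
 */

/* Loop as quoted above: a total_scan below batch_size yields zero passes. */
unsigned long passes_while(unsigned long total_scan, unsigned long batch_size)
{
	unsigned long passes = 0;

	assert(batch_size > 0);
	while (total_scan >= batch_size) {
		passes++;
		total_scan -= batch_size;
	}
	return passes;
}

/* do{}while() variant: the body runs once even when total_scan is tiny. */
unsigned long passes_do_while(unsigned long total_scan, unsigned long batch_size)
{
	unsigned long passes = 0;

	assert(batch_size > 0);
	do {
		passes++;
		/* Clamp so the unsigned counter cannot wrap in this model. */
		total_scan -= (total_scan < batch_size) ? total_scan : batch_size;
	} while (total_scan >= batch_size);
	return passes;
}
```

With a batch size of 128, a slab holding 100 objects is never scanned by the first loop and is scanned once by the second. The second loop also takes a pass when total_scan is 0, which is the extra shrinker activity under low pressure that the objection refers to.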
This patch is a slightly modified version of Glauber's approach. Similarly to Glauber's patch, it makes shrink_slab() scan less than batch_size objects, but only if the total number of objects we want to scan (total_scan) is greater than the total number of objects available (max_pass). Since total_scan is biased as half max_pass if the current delta change is small: if (delta < max_pass / 4) total_scan = min(total_scan, max_pass / 2); this is only possible if we are scanning at high prio. That said, this patch shouldn't change the vmscan behaviour if the memory pressure is low, but if we are tight on memory, we will do our best by trying to reclaim all available objects, which sounds reasonable. [*] http://www.spinics.net/lists/cgroups/msg06913.html Signed-off-by: Vladimir Davydov Cc: Mel Gorman Cc: Michal Hocko Cc: Johannes Weiner Cc: Rik van Riel Cc: Dave Chinner Cc: Glauber Costa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/vmscan.c b/mm/vmscan.c index 2254f36..45c1cf6 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -281,17 +281,34 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker, nr_pages_scanned, lru_pages, max_pass, delta, total_scan); - while (total_scan >= batch_size) { + /* + * Normally, we should not scan less than batch_size objects in one + * pass to avoid too frequent shrinker calls, but if the slab has less + * than batch_size objects in total and we are really tight on memory, + * we will try to reclaim all available objects, otherwise we can end + * up failing allocations although there are plenty of reclaimable + * objects spread over several slabs with usage less than the + * batch_size. + * + * We detect the "tight on memory" situations by looking at the total + * number of objects we want to scan (total_scan). If it is greater + * than the total number of objects on slab (max_pass), we must be + * scanning at high prio and therefore should try to reclaim as much as + * possible. 
+ */ + while (total_scan >= batch_size || + total_scan >= max_pass) { unsigned long ret; + unsigned long nr_to_scan = min(batch_size, total_scan); - shrinkctl->nr_to_scan = batch_size; + shrinkctl->nr_to_scan = nr_to_scan; ret = shrinker->scan_objects(shrinker, shrinkctl); if (ret == SHRINK_STOP) break; freed += ret; - count_vm_events(SLABS_SCANNED, batch_size); - total_scan -= batch_size; + count_vm_events(SLABS_SCANNED, nr_to_scan); + total_scan -= nr_to_scan; cond_resched(); } -- cgit v0.10.2 From ec97097bca147d5718a5d2c024d1ec740b10096d Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:23 -0800 Subject: mm: vmscan: call NUMA-unaware shrinkers irrespective of nodemask If a shrinker is not NUMA-aware, shrink_slab() should call it exactly once with nid=0, but currently it is not true: if node 0 is not set in the nodemask or if it is not online, we will not call such shrinkers at all. As a result some slabs will be left untouched under some circumstances. Let us fix it. 
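The failure mode described above can be reproduced in a userspace model. This is a sketch under stated assumptions: node masks are plain bitmasks, MODEL_MAX_NODES is an arbitrary small limit, and calls_old() and calls_new() are hypothetical helpers that only count how often shrink_slab_node() would be reached.

```c
#include <assert.h>

#define MODEL_MAX_NODES 8	/* hypothetical node limit for this model */

/* Pre-patch walk: whether a NUMA-unaware shrinker runs depends on the mask. */
unsigned int calls_old(unsigned int nodes_to_scan, unsigned int online,
		       int numa_aware)
{
	unsigned int calls = 0;

	assert(MODEL_MAX_NODES <= 32);
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
		if (!(nodes_to_scan & (1u << nid)))
			continue;	/* node not in the mask, never visited */
		if (!(online & (1u << nid)))
			continue;	/* offline node is skipped */
		if (!numa_aware && nid != 0)
			break;		/* unaware shrinker only accepted on node 0 */
		calls++;
	}
	return calls;
}

/* Requested behaviour: a NUMA-unaware shrinker runs exactly once, nid=0. */
unsigned int calls_new(unsigned int nodes_to_scan, unsigned int online,
		       int numa_aware)
{
	unsigned int calls = 0;

	if (!numa_aware)
		return 1;
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
		if ((nodes_to_scan & (1u << nid)) && (online & (1u << nid)))
			calls++;
	}
	return calls;
}
```

With node 0 missing from the mask (0x2) or offline (online mask 0x2), calls_old() returns 0 for a NUMA-unaware shrinker while calls_new() returns 1. NUMA-aware shrinkers see the same number of calls in both versions.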
Signed-off-by: Vladimir Davydov Reported-by: Dave Chinner Cc: Mel Gorman Cc: Michal Hocko Cc: Johannes Weiner Cc: Rik van Riel Cc: Glauber Costa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/vmscan.c b/mm/vmscan.c index 45c1cf6..90c4075 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -369,16 +369,17 @@ unsigned long shrink_slab(struct shrink_control *shrinkctl, } list_for_each_entry(shrinker, &shrinker_list, list) { - for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) { - if (!node_online(shrinkctl->nid)) - continue; - - if (!(shrinker->flags & SHRINKER_NUMA_AWARE) && - (shrinkctl->nid != 0)) - break; - + if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) { + shrinkctl->nid = 0; freed += shrink_slab_node(shrinkctl, shrinker, - nr_pages_scanned, lru_pages); + nr_pages_scanned, lru_pages); + continue; + } + + for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) { + if (node_online(shrinkctl->nid)) + freed += shrink_slab_node(shrinkctl, shrinker, + nr_pages_scanned, lru_pages); } } -- cgit v0.10.2 From 354f17e1e2512018a603793cc133e2f296f6ebc6 Mon Sep 17 00:00:00 2001 From: Philipp Hachtmann Date: Thu, 23 Jan 2014 15:53:24 -0800 Subject: mm/nobootmem: free_all_bootmem again get_allocated_memblock_reserved_regions_info() should work if it is compiled in. Extended the ifdef around get_allocated_memblock_memory_regions_info() to include get_allocated_memblock_reserved_regions_info() as well. Similar changes in nobootmem.c/free_low_memory_core_early() where the two functions are called. 
[akpm@linux-foundation.org: cleanup] Signed-off-by: Philipp Hachtmann Cc: qiuxishi Cc: David Howells Cc: Daeseok Youn Cc: Jiang Liu Acked-by: Yinghai Lu Cc: Zhang Yanfei Cc: Santosh Shilimkar Cc: Grygorii Strashko Cc: Tang Chen Cc: Martin Schwidefsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memblock.c b/mm/memblock.c index 64ed243..9c0aeef 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -266,33 +266,20 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u } } +#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK + phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info( phys_addr_t *addr) { if (memblock.reserved.regions == memblock_reserved_init_regions) return 0; - /* - * Don't allow nobootmem allocator to free reserved memory regions - * array if - * - CONFIG_DEBUG_FS is enabled; - * - CONFIG_ARCH_DISCARD_MEMBLOCK is not enabled; - * - reserved memory regions array have been resized during boot. - * Otherwise debug_fs entry "sys/kernel/debug/memblock/reserved" - * will show garbage instead of state of memory reservations. 
- */ - if (IS_ENABLED(CONFIG_DEBUG_FS) && - !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)) - return 0; - *addr = __pa(memblock.reserved.regions); return PAGE_ALIGN(sizeof(struct memblock_region) * memblock.reserved.max); } -#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK - phys_addr_t __init_memblock get_allocated_memblock_memory_regions_info( phys_addr_t *addr) { diff --git a/mm/nobootmem.c b/mm/nobootmem.c index 17c8902..f73f298 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -116,23 +116,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start, static unsigned long __init free_low_memory_core_early(void) { unsigned long count = 0; - phys_addr_t start, end, size; + phys_addr_t start, end; u64 i; for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL) count += __free_memory_core(start, end); - /* Free memblock.reserved array if it was allocated */ - size = get_allocated_memblock_reserved_regions_info(&start); - if (size) - count += __free_memory_core(start, start + size); - #ifdef CONFIG_ARCH_DISCARD_MEMBLOCK - - /* Free memblock.memory array if it was allocated */ - size = get_allocated_memblock_memory_regions_info(&start); - if (size) - count += __free_memory_core(start, start + size); + { + phys_addr_t size; + + /* Free memblock.reserved array if it was allocated */ + size = get_allocated_memblock_reserved_regions_info(&start); + if (size) + count += __free_memory_core(start, start + size); + + /* Free memblock.memory array if it was allocated */ + size = get_allocated_memblock_memory_regions_info(&start); + if (size) + count += __free_memory_core(start, start + size); + } #endif return count; -- cgit v0.10.2 From ac13c4622bda2a9ff8f57bbbfeff48b2a42d0963 Mon Sep 17 00:00:00 2001 From: Nathan Zimmer Date: Thu, 23 Jan 2014 15:53:26 -0800 Subject: mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug We don't need to do register_memory_resource() under lock_memory_hotplug() since it has its own lock and doesn't make any 
callbacks. Also register_memory_resource returns NULL on failure so we don't have anything to clean up at this point. The reason for this rfc is I was doing some experiments with hotplugging of memory on some of our larger systems. While it seems to work, it can be quite slow. With some preliminary digging I found that lock_memory_hotplug is clearly ripe for breakup. It could be broken up per nid or something but it also covers the online_page_callback. The online_page_callback shouldn't be very hard to break out. Also there is the issue of various structures (wmarks come to mind) that are only updated under the lock_memory_hotplug that would need to be dealt with. Cc: Tang Chen Cc: Wen Congyang Cc: Kamezawa Hiroyuki Reviewed-by: Yasuaki Ishimatsu Cc: "Rafael J. Wysocki" Cc: Hedi Cc: Mike Travis Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a512a47..a650db2 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1107,17 +1107,18 @@ int __ref add_memory(int nid, u64 start, u64 size) if (ret) return ret; - lock_memory_hotplug(); - res = register_memory_resource(start, size); ret = -EEXIST; if (!res) - goto out; + return ret; { /* Stupid hack to suppress address-never-null warning */ void *p = NODE_DATA(nid); new_pgdat = !p; } + + lock_memory_hotplug(); + new_node = !node_online(nid); if (new_node) { pgdat = hotadd_new_pgdat(nid, start); -- cgit v0.10.2 From 42aa83cb6757800f4e2b499f5db3127761517a6a Mon Sep 17 00:00:00 2001 From: Han Pingtian Date: Thu, 23 Jan 2014 15:53:28 -0800 Subject: mm: show message when updating min_free_kbytes in thp min_free_kbytes may be raised during THP's initialization. Sometimes, this will change the value which was set by the user. Showing this message will clarify this confusion. Only show this message when changing a value which was set by the user according to Michal Hocko's suggestion.
Show the old value of min_free_kbytes according to Dave Hansen's suggestion. This will give user the chance to restore old value of min_free_kbytes. Signed-off-by: Han Pingtian Reviewed-by: Michal Hocko Cc: David Rientjes Cc: Mel Gorman Cc: Dave Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 25fab71..afe7383 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -130,8 +130,14 @@ static int set_recommended_min_free_kbytes(void) (unsigned long) nr_free_buffer_pages() / 20); recommended_min <<= (PAGE_SHIFT-10); - if (recommended_min > min_free_kbytes) + if (recommended_min > min_free_kbytes) { + if (user_min_free_kbytes >= 0) + pr_info("raising min_free_kbytes from %d to %lu " + "to help transparent hugepage allocations\n", + min_free_kbytes, recommended_min); + min_free_kbytes = recommended_min; + } setup_per_zone_wmarks(); return 0; } diff --git a/mm/internal.h b/mm/internal.h index dc95e97..7e145e8 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -99,6 +99,7 @@ extern void prep_compound_page(struct page *page, unsigned long order); #ifdef CONFIG_MEMORY_FAILURE extern bool is_free_buddy_page(struct page *page); #endif +extern int user_min_free_kbytes; #if defined CONFIG_COMPACTION || defined CONFIG_CMA diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a818d56..e3758a0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -205,7 +205,7 @@ static char * const zone_names[MAX_NR_ZONES] = { }; int min_free_kbytes = 1024; -int user_min_free_kbytes; +int user_min_free_kbytes = -1; static unsigned long __meminitdata nr_kernel_pages; static unsigned long __meminitdata nr_all_pages; -- cgit v0.10.2 From da29bd36224bfa008df5d83df496c07e31a0da6d Mon Sep 17 00:00:00 2001 From: Paul Gortmaker Date: Thu, 23 Jan 2014 15:53:29 -0800 Subject: mm/mm_init.c: make creation of the mm_kobj happen earlier than device_initcall The use of __initcall is to be eventually replaced by choosing one from the 
prioritized groupings laid out in init.h header: pure_initcall 0 core_initcall 1 postcore_initcall 2 arch_initcall 3 subsys_initcall 4 fs_initcall 5 device_initcall 6 late_initcall 7 In the interim, all __initcall are mapped onto device_initcall, which as can be seen above, comes quite late in the ordering. Currently the mm_kobj is created with __initcall in mm_sysfs_init(). This means that any other initcalls that want to reference the mm_kobj have to be device_initcall (or later), otherwise we will for example, trip the BUG_ON(!kobj) in sysfs's internal_create_group(). This unfairly restricts those users; for example something that clearly makes sense to be an arch_initcall will not be able to choose that. However, upon examination, it is only this way for historical reasons (i.e. simply not reprioritized yet). We see that sysfs is ready quite earlier in init/main.c via: vfs_caches_init |_ mnt_init |_ sysfs_init well ahead of the processing of the prioritized calls listed above. So we can recategorize mm_sysfs_init to be a pure_initcall, which in turn allows any mm_kobj initcall users a wider range (1 --> 7) of initcall priorities to choose from. Signed-off-by: Paul Gortmaker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/mm_init.c b/mm/mm_init.c index 68562e9..857a643 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -202,5 +202,4 @@ static int __init mm_sysfs_init(void) return 0; } - -__initcall(mm_sysfs_init); +pure_initcall(mm_sysfs_init); -- cgit v0.10.2 From a64fb3cd610c8e6806512dbac63f3fc45812d8fd Mon Sep 17 00:00:00 2001 From: Paul Gortmaker Date: Thu, 23 Jan 2014 15:53:30 -0800 Subject: mm: audit/fix non-modular users of module_init in core code Code that is obj-y (always built-in) or dependent on a bool Kconfig (built-in or absent) can never be modular. So using module_init as an alias for __initcall can be somewhat misleading. Fix these up now, so that we can relocate module_init from init.h into module.h in the future. 
If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. The audit targets the following module_init users for change: mm/ksm.c bool KSM mm/mmap.c bool MMU mm/huge_memory.c bool TRANSPARENT_HUGEPAGE mm/mmu_notifier.c bool MMU_NOTIFIER Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of subsys_initcall (which makes sense for these files) will thus change this registration from level 6-device to level 4-subsys (i.e. slightly earlier). However no impact of that difference has been observed during testing. One might think that core_initcall (l1) or postcore_initcall (l2) would be more appropriate for anything in mm/ but if we look at some actual init functions themselves, we see things like: mm/huge_memory.c --> hugepage_init --> hugepage_init_sysfs mm/mmap.c --> init_user_reserve --> sysctl_user_reserve_kbytes mm/ksm.c --> ksm_init --> sysfs_create_group and hence the choice of subsys_initcall (l4) seems reasonable, and at the same time minimizes the risk of changing the priority too drastically all at once. We can adjust further in the future. Also, several instances of missing ";" at EOL are fixed.
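The effect of moving a registration from level 6 to level 4 can be illustrated with a small userspace model of initcall ordering. This is only a sketch: the LEVEL_* names, struct initcall, order_initcalls() and initcall_position() are hypothetical, "example_driver_init" is a made-up device-level call, and the model assumes that calls within one level keep their original relative order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Level numbers follow the pure(0) .. late(7) ordering; names are hypothetical. */
enum initcall_level {
	LEVEL_PURE = 0, LEVEL_CORE, LEVEL_POSTCORE, LEVEL_ARCH,
	LEVEL_SUBSYS, LEVEL_FS, LEVEL_DEVICE, LEVEL_LATE
};

struct initcall {
	const char *name;
	enum initcall_level level;
};

/* Stable insertion sort: lower levels first, ties keep their original order. */
void order_initcalls(struct initcall *calls, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct initcall key = calls[i];
		size_t j = i;

		while (j > 0 && calls[j - 1].level > key.level) {
			calls[j] = calls[j - 1];
			j--;
		}
		calls[j] = key;
	}
}

/* Index of the named call in the array, or n if it is not present. */
size_t initcall_position(const struct initcall *calls, size_t n,
			 const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(calls[i].name, name) == 0)
			return i;
	}
	return n;
}
```

In this model, ksm_init registered at LEVEL_DEVICE stays behind an earlier device-level call, while the same function registered at LEVEL_SUBSYS is ordered ahead of it.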
Signed-off-by: Paul Gortmaker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/huge_memory.c b/mm/huge_memory.c index afe7383..65c98eb 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -661,7 +661,7 @@ out: hugepage_exit_sysfs(hugepage_kobj); return err; } -module_init(hugepage_init) +subsys_initcall(hugepage_init); static int __init setup_transparent_hugepage(char *str) { diff --git a/mm/ksm.c b/mm/ksm.c index f91ddf5..aa4c7c7 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -2345,4 +2345,4 @@ out_free: out: return err; } -module_init(ksm_init) +subsys_initcall(ksm_init); diff --git a/mm/mmap.c b/mm/mmap.c index a0e7153..126d8b9 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3142,7 +3142,7 @@ static int init_user_reserve(void) sysctl_user_reserve_kbytes = min(free_kbytes / 32, 1UL << 17); return 0; } -module_init(init_user_reserve) +subsys_initcall(init_user_reserve); /* * Initialise sysctl_admin_reserve_kbytes. @@ -3163,7 +3163,7 @@ static int init_admin_reserve(void) sysctl_admin_reserve_kbytes = min(free_kbytes / 32, 1UL << 13); return 0; } -module_init(init_admin_reserve) +subsys_initcall(init_admin_reserve); /* * Reinititalise user and admin reserves if memory is added or removed. 
@@ -3233,4 +3233,4 @@ static int __meminit init_reserve_notifier(void) return 0; } -module_init(init_reserve_notifier) +subsys_initcall(init_reserve_notifier); diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 93e6089..41cefdf 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -329,5 +329,4 @@ static int __init mmu_notifier_init(void) { return init_srcu_struct(&srcu); } - -module_init(mmu_notifier_init); +subsys_initcall(mmu_notifier_init); -- cgit v0.10.2 From d2ab70aaae74456ed608740915dc82ef52291f69 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 23 Jan 2014 15:53:30 -0800 Subject: mm/memcg: fix last_dead_count memory wastage Shorten mem_cgroup_reclaim_iter.last_dead_count from unsigned long to int: it's assigned from an int and compared with an int, and adjacent to an unsigned int: so there's no point to it being unsigned long, which wasted 104 bytes in every mem_cgroup_per_zone. Signed-off-by: Hugh Dickins Acked-by: Michal Hocko Cc: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c871505..aa66cc4 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -149,7 +149,7 @@ struct mem_cgroup_reclaim_iter { * matches memcg->dead_count of the hierarchy root group. */ struct mem_cgroup *last_visited; - unsigned long last_dead_count; + int last_dead_count; /* scan generation, increased every round-trip */ unsigned int generation; -- cgit v0.10.2 From d8ad30559715ce97afb7d1a93a12fd90e8fff312 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 23 Jan 2014 15:53:32 -0800 Subject: mm/memcg: iteration skip memcgs not yet fully initialized It is surprising that the mem_cgroup iterator can return memcgs which have not yet been fully initialized. By accident (or trial and error?) this appears not to present an actual problem; but it may be better to prevent such surprises, by skipping memcgs not yet online. 
Signed-off-by: Hugh Dickins Cc: Tejun Heo Acked-by: Michal Hocko Cc: Johannes Weiner Cc: [3.12+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index aa66cc4..9537e13 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1119,10 +1119,8 @@ skip_node: * protected by css_get and the tree walk is rcu safe. */ if (next_css) { - struct mem_cgroup *mem = mem_cgroup_from_css(next_css); - - if (css_tryget(&mem->css)) - return mem; + if ((next_css->flags & CSS_ONLINE) && css_tryget(next_css)) + return mem_cgroup_from_css(next_css); else { prev_css = next_css; goto skip_node; -- cgit v0.10.2 From c3ac14b2677e0bc130238c5d01856592ac7a584b Mon Sep 17 00:00:00 2001 From: Xishi Qiu Date: Thu, 23 Jan 2014 15:53:33 -0800 Subject: doc/kmemcheck: add kmemcheck to kernel-parameters Add "kmemcheck=xx" to Documentation/kernel-parameters.txt. Signed-off-by: Xishi Qiu Cc: Vegard Nossum Cc: Pekka Enberg Cc: Rob Landley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index f085a61..5efebfd 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1445,6 +1445,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Valid arguments: on, off Default: on + kmemcheck= [X86] Boot-time kmemcheck enable/disable/one-shot mode + Valid arguments: 0, 1, 2 + kmemcheck=0 (disabled) + kmemcheck=1 (enabled) + kmemcheck=2 (one-shot mode) + Default: 2 (one-shot mode) + kstack=N [X86] Print N words from the kernel stack in oops dumps. 
-- cgit v0.10.2 From d49ad9355420c743c736bfd1dee9eaa5b1a7722a Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Thu, 23 Jan 2014 15:53:34 -0800 Subject: mm, oom: prefer thread group leaders for display purposes When two threads have the same badness score, it's preferable to kill the thread group leader so that the actual process name is printed to the kernel log rather than the thread group name which may be shared amongst several processes. This was the behavior when select_bad_process() used to do for_each_process(), but it now iterates threads instead and leads to ambiguity. Signed-off-by: David Rientjes Cc: Johannes Weiner Cc: Michal Hocko Cc: KAMEZAWA Hiroyuki Cc: Greg Thelen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 9537e13..c8336e8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1841,13 +1841,18 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, break; }; points = oom_badness(task, memcg, NULL, totalpages); - if (points > chosen_points) { - if (chosen) - put_task_struct(chosen); - chosen = task; - chosen_points = points; - get_task_struct(chosen); - } + if (!points || points < chosen_points) + continue; + /* Prefer thread group leaders for display purposes */ + if (points == chosen_points && + thread_group_leader(chosen)) + continue; + + if (chosen) + put_task_struct(chosen); + chosen = task; + chosen_points = points; + get_task_struct(chosen); } css_task_iter_end(&it); } diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 054ff47..37b1b19 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -327,10 +327,14 @@ static struct task_struct *select_bad_process(unsigned int *ppoints, break; }; points = oom_badness(p, NULL, nodemask, totalpages); - if (points > chosen_points) { - chosen = p; - chosen_points = points; - } + if (!points || points < chosen_points) + continue; + /* Prefer thread group leaders for display purposes */ + if (points == chosen_points && 
thread_group_leader(chosen)) + continue; + + chosen = p; + chosen_points = points; } if (chosen) get_task_struct(chosen); -- cgit v0.10.2 From ecc736fc3c71c411a9d201d8588c9e7e049e5d8c Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 23 Jan 2014 15:53:35 -0800 Subject: memcg: fix endless loop caused by mem_cgroup_iter

Hugh has reported an endless loop when the hardlimit reclaim sees the same group all the time. This might happen when the reclaim races with the memcg removal.

shrink_zone
                                                [rmdir root]
  mem_cgroup_iter(root, NULL, reclaim)
    // prev = NULL
    rcu_read_lock()
    mem_cgroup_iter_load
      last_visited = iter->last_visited   // gets root || NULL
      css_tryget(last_visited)            // failed
      last_visited = NULL                                   [1]
    memcg = root = __mem_cgroup_iter_next(root, NULL)
    mem_cgroup_iter_update
      iter->last_visited = root;
    reclaim->generation = iter->generation

  mem_cgroup_iter(root, root, reclaim)
    // prev = root
    rcu_read_lock
    mem_cgroup_iter_load
      last_visited = iter->last_visited   // gets root
      css_tryget(last_visited)            // failed
    [1]

The issue seems to have been introduced by commit 5f5781619718 ("memcg: relax memcg iter caching"), which replaced the unconditional css_get/css_put with css_tryget/css_put for the cached iterator.

This patch fixes the issue by skipping css_tryget on the root of the tree walk in mem_cgroup_iter_load and, symmetrically, not releasing it in mem_cgroup_iter_update.
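The asymmetry described above can be modelled outside the kernel. The sketch below is a toy illustration only, not the memcg code: struct toy_css, toy_tryget(), toy_iter_load() and toy_iter_put() are hypothetical names, and a plain "dying" flag stands in for a css whose tryget fails. It shows the two halves of the fix: the load path never calls tryget on the root of the walk, and the release path never drops a reference that was not taken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a css: "dying" makes tryget fail. */
struct toy_css {
    bool dying;
    int refcnt;
};

/* Toy tryget: takes a reference unless the object is going away. */
static bool toy_tryget(struct toy_css *css)
{
    if (css->dying)
        return false;
    css->refcnt++;
    return true;
}

/*
 * Toy version of the cached-position load. The caller is assumed to
 * hold its own reference on root, so root is handed back without a
 * tryget even if it is dying; any other dying node yields NULL.
 */
static struct toy_css *toy_iter_load(struct toy_css *cached,
                                     struct toy_css *root)
{
    if (cached && cached != root && !toy_tryget(cached))
        return NULL;
    return cached;
}

/* Symmetric release: root was never pinned here, so never unpin it. */
static void toy_iter_put(struct toy_css *last, struct toy_css *root)
{
    if (last && last != root)
        last->refcnt--;
}
```

With a dying root as the cached position, toy_iter_load() still returns root, so a caller that restarts the walk whenever it sees NULL has something to make progress on.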
Signed-off-by: Michal Hocko Reported-by: Hugh Dickins Tested-by: Hugh Dickins Cc: Johannes Weiner Cc: Greg Thelen Cc: [3.10+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c8336e8..da07784 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1158,7 +1158,15 @@ mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter, if (iter->last_dead_count == *sequence) { smp_rmb(); position = iter->last_visited; - if (position && !css_tryget(&position->css)) + + /* + * We cannot take a reference to root because we might race + * with root removal and returning NULL would end up in + * an endless loop on the iterator user level when root + * would be returned all the time. + */ + if (position && position != root && + !css_tryget(&position->css)) position = NULL; } return position; @@ -1167,9 +1175,11 @@ mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter, static void mem_cgroup_iter_update(struct mem_cgroup_reclaim_iter *iter, struct mem_cgroup *last_visited, struct mem_cgroup *new_position, + struct mem_cgroup *root, int sequence) { - if (last_visited) + /* root reference counting symmetric to mem_cgroup_iter_load */ + if (last_visited && last_visited != root) css_put(&last_visited->css); /* * We store the sequence count from the time @last_visited was @@ -1244,7 +1254,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, memcg = __mem_cgroup_iter_next(root, last_visited); if (reclaim) { - mem_cgroup_iter_update(iter, last_visited, memcg, seq); + mem_cgroup_iter_update(iter, last_visited, memcg, root, + seq); if (!memcg) iter->generation++; -- cgit v0.10.2 From 0eef615665ede1e0d603ea9ecca88c1da6f02234 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 23 Jan 2014 15:53:37 -0800 Subject: memcg: fix css reference leak and endless loop in mem_cgroup_iter Commit 19f39402864e ("memcg: simplify mem_cgroup_iter") has reorganized mem_cgroup_iter code in order to simplify it. 
Part of that change was dropping an optimization which didn't call css_tryget on the root of the walked tree. The patch, however, didn't change the css_put part in mem_cgroup_iter, which excludes root. This wasn't an issue at the time because __mem_cgroup_iter_next bailed out for root early without taking a reference, as cgroup iterators (css_next_descendant_pre) didn't visit root themselves. Nevertheless, cgroup iterators were reworked to visit root by commit bd8815a6d802 ("cgroup: make css_for_each_descendant() and friends include the origin css in the iteration"), at which point the root bypass was dropped from __mem_cgroup_iter_next. This means that css_put is not called for root, and so the css, along with the mem_cgroup and other cgroup-internal objects tied to the css lifetime, is never freed. Fix the issue by reintroducing the root check in __mem_cgroup_iter_next and not taking a css reference for root. This reference counting magic also protects us from another issue, an endless loop reported by Hugh Dickins, which occurs when reclaim races with root removal and the css_tryget called internally by the iterator fails. There would be no other nodes to visit, so __mem_cgroup_iter_next would return NULL, mem_cgroup_iter would interpret it as "start looping from root again", and so mem_cgroup_iter would loop forever internally. Signed-off-by: Michal Hocko Reported-by: Hugh Dickins Tested-by: Hugh Dickins Cc: Johannes Weiner Cc: Greg Thelen Cc: [3.12+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index da07784..98f80be 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1117,14 +1117,22 @@ skip_node: * skipped and we should continue the tree walk. * last_visited css is safe to use because it is * protected by css_get and the tree walk is rcu safe.
+ * + * We do not take a reference on the root of the tree walk + * because we might race with the root removal when it would + * be the only node in the iterated hierarchy and mem_cgroup_iter + * would end up in an endless loop because it expects that at + * least one valid node will be returned. Root cannot disappear + * because caller of the iterator should hold it already so + * skipping css reference should be safe. */ if (next_css) { - if ((next_css->flags & CSS_ONLINE) && css_tryget(next_css)) + if ((next_css->flags & CSS_ONLINE) && + (next_css == &root->css || css_tryget(next_css))) return mem_cgroup_from_css(next_css); - else { - prev_css = next_css; - goto skip_node; - } + + prev_css = next_css; + goto skip_node; } return NULL; -- cgit v0.10.2 From 6c14466cc00ff13121ae782d33d9df0fde20b124 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 23 Jan 2014 15:53:38 -0800 Subject: mm: improve documentation of page_order Developers occasionally try and optimise PFN scanners by using page_order but miss that in general it requires zone->lock. This has happened twice for compaction.c and rejected both times. This patch clarifies the documentation of page_order and adds a note to compaction.c why page_order is not used. [akpm@linux-foundation.org: tweaks] [lauraa@codeaurora.org: Corrected a page_zone(page)->lock reference] Signed-off-by: Mel Gorman Acked-by: Rafael Aquini Acked-by: Minchan Kim Cc: Laura Abbott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/compaction.c b/mm/compaction.c index e0ab02d..b48c525 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -523,7 +523,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, if (!isolation_suitable(cc, page)) goto next_pageblock; - /* Skip if free */ + /* + * Skip if free. page_order cannot be used without zone->lock + * as nothing prevents parallel allocations or buddy merging. 
+ */ if (PageBuddy(page)) continue; diff --git a/mm/internal.h b/mm/internal.h index 7e145e8..612c14f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -143,9 +143,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, #endif /* - * function for dealing with page's order in buddy system. - * zone->lock is already acquired when we use these. - * So, we don't need atomic page->flags operations here. + * This function returns the order of a free page in the buddy system. In + * general, page_zone(page)->lock must be held by the caller to prevent the + * page from being allocated in parallel and returning garbage as the order. + * If a caller does not hold page_zone(page)->lock, it must guarantee that the + * page cannot be allocated or merged in parallel. */ static inline unsigned long page_order(struct page *page) { -- cgit v0.10.2 From 0d8a4a3799ab007b7a5e50aff9da9558925e0c15 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 23 Jan 2014 15:53:39 -0800 Subject: memcg: remove unused code from kmem_cache_destroy_work_func Signed-off-by: Vladimir Davydov Reviewed-by: Michal Hocko Cc: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 98f80be..19d5d42 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3359,11 +3359,9 @@ static void kmem_cache_destroy_work_func(struct work_struct *w) * So if we aren't down to zero, we'll just schedule a worker and try * again */ - if (atomic_read(&cachep->memcg_params->nr_pages) != 0) { + if (atomic_read(&cachep->memcg_params->nr_pages) != 0) kmem_cache_shrink(cachep); - if (atomic_read(&cachep->memcg_params->nr_pages) == 0) - return; - } else + else kmem_cache_destroy(cachep); } -- cgit v0.10.2 From a5998061daab27802c418debe662be98a6e42874 Mon Sep 17 00:00:00 2001 From: Jamie Liu Date: Thu, 23 Jan 2014 15:53:40 -0800 Subject: mm/swapfile.c: do not skip lowest_bit in scan_swap_map() scan loop In the second half of 
scan_swap_map()'s scan loop, offset is set to si->lowest_bit and then incremented before entering the loop for the first time, causing si->swap_map[si->lowest_bit] to be skipped. Signed-off-by: Jamie Liu Cc: Shaohua Li Acked-by: Hugh Dickins Cc: Minchan Kim Cc: Akinobu Mita Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/swapfile.c b/mm/swapfile.c index d443dea..c6c13b05 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -616,7 +616,7 @@ scan: } } offset = si->lowest_bit; - while (++offset < scan_base) { + while (offset < scan_base) { if (!si->swap_map[offset]) { spin_lock(&si->lock); goto checks; @@ -629,6 +629,7 @@ scan: cond_resched(); latency_ration = LATENCY_LIMIT; } + offset++; } spin_lock(&si->lock); -- cgit v0.10.2 From 871beb8c313ab270242022d314e37db5044e2bab Mon Sep 17 00:00:00 2001 From: Fengguang Wu Date: Thu, 23 Jan 2014 15:53:41 -0800 Subject: mm/rmap: fix coccinelle warnings mm/rmap.c:851:9-10: WARNING: return of 0/1 in function 'invalid_mkclean_vma' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: coccinelle/misc/boolreturn.cocci Signed-off-by: Fengguang Wu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/rmap.c b/mm/rmap.c index 2dcd335..d9d4231 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -848,9 +848,9 @@ out: static bool invalid_mkclean_vma(struct vm_area_struct *vma, void *arg) { if (vma->vm_flags & VM_SHARED) - return 0; + return false; - return 1; + return true; } int page_mkclean(struct page *page) -- cgit v0.10.2 From 34228d473efe764d4db7c0536375f0c993e6e06a Mon Sep 17 00:00:00 2001 From: Cyrill Gorcunov Date: Thu, 23 Jan 2014 15:53:42 -0800 Subject: mm: ignore VM_SOFTDIRTY on VMA merging The VM_SOFTDIRTY bit affects the VMA merge routine: if two VMAs have all bits in vm_flags matching except the dirty bit, the kernel can no longer merge them, which forces it to generate new VMAs instead.
This may eventually lead to a situation where a userspace application reaches the vm.max_map_count limit and, in the worst case, crashes:

| (gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes
|
| (file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load: gimp_wire_read(): error
| xinit: connection to X server lost
|
| waiting for X server to shut down
| /usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup
| /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
| /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup

https://bugzilla.kernel.org/show_bug.cgi?id=67651
https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0

The initial problem came from a missing VM_SOFTDIRTY in the do_brk() routine, but even if we set VM_SOFTDIRTY there, there is still a way to prevent VMAs from merging: one can call

| echo 4 > /proc/$PID/clear_refs

and clear VM_SOFTDIRTY on all VMAs present in the memory map. A new do_brk() will then try to extend the old VMA, find that the dirty bit doesn't match, and generate a new VMA.

As discussed with Pavel, the right approach is to ignore the VM_SOFTDIRTY bit when we're trying to merge VMAs and, if the merge succeeds, to mark the extended VMA with the dirty bit where needed.

Signed-off-by: Cyrill Gorcunov Reported-by: Bastian Hougaard Reported-by: Mel Gorman Cc: Pavel Emelyanov Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/mmap.c b/mm/mmap.c index 126d8b9..20ff0c3 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -894,7 +894,15 @@ again: remove_next = 1 + (end > next->vm_end); static inline int is_mergeable_vma(struct vm_area_struct *vma, struct file *file, unsigned long vm_flags) { - if (vma->vm_flags ^ vm_flags) + /* + * VM_SOFTDIRTY should not prevent from VMA merging, if we + * match the flags but dirty bit -- the caller should mark + * merged VMA as dirty.
If dirty bit won't be excluded from + * comparison, we increase pressue on the memory system forcing + * the kernel to generate new VMAs when old one could be + * extended instead. + */ + if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY) return 0; if (vma->vm_file != file) return 0; @@ -1083,7 +1091,7 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct * return a->vm_end == b->vm_start && mpol_equal(vma_policy(a), vma_policy(b)) && a->vm_file == b->vm_file && - !((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC)) && + !((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC|VM_SOFTDIRTY)) && b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT); } -- cgit v0.10.2 From 0c79a8e29b5fcbcbfd611daf9d500cfad8370fcf Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 23 Jan 2014 15:53:43 -0800 Subject: asm/types.h: Remove include/asm-generic/int-l64.h Now all 64-bit architectures have been converted to int-ll64.h, we can remove int-l64.h in kernelspace. For backwards compatibility, alpha, ia64, mips64, and powerpc64 still use int-l64.h in userspace. This is the (reworked for UAPI) non-documentation part of more than two year old "asm/types.h: All architectures use int-ll64.h in kernelspace" (https://lkml.org/lkml/2011/8/13/104) Since (from include/uapi/asm-generic/types.h) is used for both kernel and user space, include/asm-generic/int-ll64.h cannot just become include/asm-generic/types.h, as Arnd suggested. Signed-off-by: Geert Uytterhoeven Acked-by: Arnd Bergmann Cc: Al Viro Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/asm-generic/int-l64.h b/include/asm-generic/int-l64.h deleted file mode 100644 index 27d4ec0..0000000 --- a/include/asm-generic/int-l64.h +++ /dev/null @@ -1,49 +0,0 @@ -/* - * asm-generic/int-l64.h - * - * Integer declarations for architectures which use "long" - * for 64-bit types. 
- */ -#ifndef _ASM_GENERIC_INT_L64_H -#define _ASM_GENERIC_INT_L64_H - -#include - - -#ifndef __ASSEMBLY__ - -typedef signed char s8; -typedef unsigned char u8; - -typedef signed short s16; -typedef unsigned short u16; - -typedef signed int s32; -typedef unsigned int u32; - -typedef signed long s64; -typedef unsigned long u64; - -#define S8_C(x) x -#define U8_C(x) x ## U -#define S16_C(x) x -#define U16_C(x) x ## U -#define S32_C(x) x -#define U32_C(x) x ## U -#define S64_C(x) x ## L -#define U64_C(x) x ## UL - -#else /* __ASSEMBLY__ */ - -#define S8_C(x) x -#define U8_C(x) x -#define S16_C(x) x -#define U16_C(x) x -#define S32_C(x) x -#define U32_C(x) x -#define S64_C(x) x -#define U64_C(x) x - -#endif /* __ASSEMBLY__ */ - -#endif /* _ASM_GENERIC_INT_L64_H */ diff --git a/include/uapi/asm-generic/types.h b/include/uapi/asm-generic/types.h index bd39806..a387792 100644 --- a/include/uapi/asm-generic/types.h +++ b/include/uapi/asm-generic/types.h @@ -1,8 +1,7 @@ #ifndef _ASM_GENERIC_TYPES_H #define _ASM_GENERIC_TYPES_H /* - * int-ll64 is used practically everywhere now, - * so use it as a reasonable default. + * int-ll64 is used everywhere now. */ #include -- cgit v0.10.2 From 4a102b4d144f0422eb8b0a59c7cb194bf12163a9 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 23 Jan 2014 15:53:45 -0800 Subject: drivers/mailbox/omap: make mbox->irq signed for error handling There is a bug in omap2_mbox_probe() where we try to do: mbox->irq = platform_get_irq(pdev, info->irq_id); if (mbox->irq < 0) { The problem is that mbox->irq is unsigned so the error handling doesn't work. I've changed it to a signed integer.
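The pitfall can be reproduced in a few lines of standalone C. This is a minimal sketch rather than driver code: toy_get_irq() is a hypothetical stand-in for a helper that returns a negative error code on failure, and the value -6 is an arbitrary placeholder for an errno.

```c
#include <stdbool.h>

/* Hypothetical helper: a valid IRQ number, or a negative error code. */
static int toy_get_irq(bool fail)
{
    return fail ? -6 : 42;
}

/* Buggy pattern: the result is stored in an unsigned variable. */
static bool toy_probe_fails_unsigned(bool fail)
{
    unsigned int irq = toy_get_irq(fail);

    /* Never true: the negative code wrapped to a large positive value. */
    return irq < 0;
}

/* Fixed pattern: a signed variable keeps the error code negative. */
static bool toy_probe_fails_signed(bool fail)
{
    int irq = toy_get_irq(fail);

    return irq < 0;
}
```

Compilers can flag the first comparison as always false (for example under GCC's -Wtype-limits), which is one way to spot this class of bug.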
Signed-off-by: Dan Carpenter Cc: Suman Anna Cc: Greg Kroah-Hartman Cc: Omar Ramirez Luna Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/mailbox/omap-mbox.h b/drivers/mailbox/omap-mbox.h index 6cd38fc..86d7518 100644 --- a/drivers/mailbox/omap-mbox.h +++ b/drivers/mailbox/omap-mbox.h @@ -52,7 +52,7 @@ struct omap_mbox_queue { struct omap_mbox { const char *name; - unsigned int irq; + int irq; struct omap_mbox_queue *txq, *rxq; struct omap_mbox_ops *ops; struct device *dev; -- cgit v0.10.2 From a3b25d9b774fbda2e6add28cf792941fd98fa999 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Thu, 23 Jan 2014 15:53:46 -0800 Subject: drivers/block/Kconfig: update RAM block device module name The RAM block device support module name changed to brd.ko some years ago, with an "rd" alias to match the previous module implementation. This patch updates its Kconfig definition. Signed-off-by: Fabian Frederick Acked-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig index 86b9f37..9ffa90c 100644 --- a/drivers/block/Kconfig +++ b/drivers/block/Kconfig @@ -368,7 +368,8 @@ config BLK_DEV_RAM For details, read . To compile this driver as a module, choose M here: the - module will be called rd. + module will be called brd. An alias "rd" has been defined + for historical reasons. Most normal users won't need the RAM disk functionality, and can thus say N here. -- cgit v0.10.2 From 2252b62a56601c9e31396da230b4ce792f167fb4 Mon Sep 17 00:00:00 2001 From: Younger Liu Date: Thu, 23 Jan 2014 15:53:47 -0800 Subject: logfs: check for the return value after calling find_or_create_page() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In get_mapping_page(), after calling find_or_create_page(), the return value should be checked. This patch was posted previously: http://www.spinics.net/lists/linux-fsdevel/msg66948.html but has not been applied yet.
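The shape of the fix can be sketched in userspace C. Everything below is a hypothetical model, not logfs code: struct toy_page, toy_find_or_create(), toy_unlock() and toy_get_mapping_page() are illustrative names. The one property borrowed from the real helper is that the lookup returns either a locked page or NULL on failure.

```c
#include <stddef.h>

/* Hypothetical page with only a lock flag. */
struct toy_page {
    int locked;
};

/*
 * Stand-in for a lookup that may fail: a NULL argument models an
 * allocation failure, otherwise the page is returned locked.
 */
static struct toy_page *toy_find_or_create(struct toy_page *backing)
{
    if (backing)
        backing->locked = 1;
    return backing;
}

/* Unlocking dereferences the page, so it must never see NULL. */
static void toy_unlock(struct toy_page *page)
{
    page->locked = 0;
}

/* Patched pattern: unlock only when the lookup succeeded. */
static struct toy_page *toy_get_mapping_page(struct toy_page *backing)
{
    struct toy_page *page = toy_find_or_create(backing);

    if (page)
        toy_unlock(page);
    return page;
}
```

The caller still receives NULL on failure and has to handle it; the check only prevents the unlock helper from dereferencing a NULL pointer.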
Signed-off-by: Younger Liu Cc: Younger Liu Cc: Vyacheslav Dubeyko Reviewed-by: Prasad Joshi Cc: Jörn Engel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/logfs/segment.c b/fs/logfs/segment.c index d448a77..7f9b096 100644 --- a/fs/logfs/segment.c +++ b/fs/logfs/segment.c @@ -62,7 +62,8 @@ static struct page *get_mapping_page(struct super_block *sb, pgoff_t index, page = read_cache_page(mapping, index, filler, sb); else { page = find_or_create_page(mapping, index, GFP_NOFS); - unlock_page(page); + if (page) + unlock_page(page); } return page; } -- cgit v0.10.2 From d57c33c5daa4efa9e4d303bd0faf868080b532be Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:48 -0800 Subject: add generic fixmap.h Many architectures provide an asm/fixmap.h which defines support for compile-time 'special' virtual mappings which need to be made before paging_init() has run. This support is also used for early ioremap on x86. Much of this support is identical across the architectures. This patch consolidates all of the common bits into asm-generic/fixmap.h which is intended to be included from arch/*/include/asm/fixmap.h. Signed-off-by: Mark Salter Acked-by: Arnd Bergmann Acked-by: Ralf Baechle Cc: Russell King Cc: Richard Kuo Cc: James Hogan Cc: Michal Simek Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: "H. Peter Anvin" Cc: Chris Metcalf Cc: Ingo Molnar Cc: Jeff Dike Cc: Paul Mundt Cc: Richard Weinberger Cc: Thomas Gleixner Cc: Jonas Bonn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/asm-generic/fixmap.h b/include/asm-generic/fixmap.h new file mode 100644 index 0000000..5a64ca4 --- /dev/null +++ b/include/asm-generic/fixmap.h @@ -0,0 +1,97 @@ +/* + * fixmap.h: compile-time virtual memory allocation + * + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. 
+ * + * Copyright (C) 1998 Ingo Molnar + * + * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999 + * x86_32 and x86_64 integration by Gustavo F. Padovan, February 2009 + * Break out common bits to asm-generic by Mark Salter, November 2013 + */ + +#ifndef __ASM_GENERIC_FIXMAP_H +#define __ASM_GENERIC_FIXMAP_H + +#include + +#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) +#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) + +#ifndef __ASSEMBLY__ +/* + * 'index to address' translation. If anyone tries to use the idx + * directly without translation, we catch the bug with a NULL-deference + * kernel oops. Illegal ranges of incoming indices are caught too. + */ +static __always_inline unsigned long fix_to_virt(const unsigned int idx) +{ + BUILD_BUG_ON(idx >= __end_of_fixed_addresses); + return __fix_to_virt(idx); +} + +static inline unsigned long virt_to_fix(const unsigned long vaddr) +{ + BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); + return __virt_to_fix(vaddr); +} + +/* + * Provide some reasonable defaults for page flags. + * Not all architectures use all of these different types and some + * architectures use different names. 
+ */ +#ifndef FIXMAP_PAGE_NORMAL +#define FIXMAP_PAGE_NORMAL PAGE_KERNEL +#endif +#ifndef FIXMAP_PAGE_NOCACHE +#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_NOCACHE +#endif +#ifndef FIXMAP_PAGE_IO +#define FIXMAP_PAGE_IO PAGE_KERNEL_IO +#endif +#ifndef FIXMAP_PAGE_CLEAR +#define FIXMAP_PAGE_CLEAR __pgprot(0) +#endif + +#ifndef set_fixmap +#define set_fixmap(idx, phys) \ + __set_fixmap(idx, phys, FIXMAP_PAGE_NORMAL) +#endif + +#ifndef clear_fixmap +#define clear_fixmap(idx) \ + __set_fixmap(idx, 0, FIXMAP_PAGE_CLEAR) +#endif + +/* Return a pointer with offset calculated */ +#define __set_fixmap_offset(idx, phys, flags) \ +({ \ + unsigned long addr; \ + __set_fixmap(idx, phys, flags); \ + addr = fix_to_virt(idx) + ((phys) & (PAGE_SIZE - 1)); \ + addr; \ +}) + +#define set_fixmap_offset(idx, phys) \ + __set_fixmap_offset(idx, phys, FIXMAP_PAGE_NORMAL) + +/* + * Some hardware wants to get fixmapped without caching. + */ +#define set_fixmap_nocache(idx, phys) \ + __set_fixmap(idx, phys, FIXMAP_PAGE_NOCACHE) + +#define set_fixmap_offset_nocache(idx, phys) \ + __set_fixmap_offset(idx, phys, FIXMAP_PAGE_NOCACHE) + +/* + * Some fixmaps are for IO + */ +#define set_fixmap_io(idx, phys) \ + __set_fixmap(idx, phys, FIXMAP_PAGE_IO) + +#endif /* __ASSEMBLY__ */ +#endif /* __ASM_GENERIC_FIXMAP_H */ -- cgit v0.10.2 From 114cefc87ec445207f6968f584ea5edcb474f541 Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:50 -0800 Subject: x86: use generic fixmap.h Signed-off-by: Mark Salter Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. 
Peter Anvin" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index e846225..7252cd3 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -175,64 +175,7 @@ static inline void __set_fixmap(enum fixed_addresses idx, } #endif -#define set_fixmap(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL) - -/* - * Some hardware wants to get fixmapped without caching. - */ -#define set_fixmap_nocache(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL_NOCACHE) - -#define clear_fixmap(idx) \ - __set_fixmap(idx, 0, __pgprot(0)) - -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); - -/* - * 'index to address' translation. If anyone tries to use the idx - * directly without translation, we catch the bug with a NULL-deference - * kernel oops. Illegal ranges of incoming indices are caught too. - */ -static __always_inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * this branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. 
- */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); - - return __fix_to_virt(idx); -} - -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} - -/* Return an pointer with offset calculated */ -static __always_inline unsigned long -__set_fixmap_offset(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags) -{ - __set_fixmap(idx, phys, flags); - return fix_to_virt(idx) + (phys & (PAGE_SIZE - 1)); -} - -#define set_fixmap_offset(idx, phys) \ - __set_fixmap_offset(idx, phys, PAGE_KERNEL) - -#define set_fixmap_offset_nocache(idx, phys) \ - __set_fixmap_offset(idx, phys, PAGE_KERNEL_NOCACHE) +#include #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_FIXMAP_H */ -- cgit v0.10.2 From 978c5584015e9a843f96586a725141a80ff2109d Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:51 -0800 Subject: hexagon: use generic fixmap.h Signed-off-by: Mark Salter Acked-by: Richard Kuo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/hexagon/include/asm/fixmap.h b/arch/hexagon/include/asm/fixmap.h index b75b6bf..1387f84 100644 --- a/arch/hexagon/include/asm/fixmap.h +++ b/arch/hexagon/include/asm/fixmap.h @@ -26,45 +26,7 @@ */ #include -/* - * Full fixmap support involves set_fixmap() functions, but - * these may not be needed if all we're after is an area for - * highmem kernel mappings. - */ -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); - -/** - * fix_to_virt -- "index to address" translation. - * - * If anyone tries to use the idx directly without translation, - * we catch the bug with a NULL-deference kernel oops. Illegal - * ranges of incoming indices are caught too. 
- */ -static inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * This branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. - */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); - - return __fix_to_virt(idx); -} - -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} +#include #define kmap_get_fixmap_pte(vaddr) \ pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(vaddr), \ -- cgit v0.10.2 From 1c5c8043f5f4402105e87caec228373bf76d7793 Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:52 -0800 Subject: metag: use generic fixmap.h Signed-off-by: Mark Salter Acked-by: James Hogan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/metag/include/asm/fixmap.h b/arch/metag/include/asm/fixmap.h index 3331275..af621b0 100644 --- a/arch/metag/include/asm/fixmap.h +++ b/arch/metag/include/asm/fixmap.h @@ -51,37 +51,7 @@ enum fixed_addresses { #define FIXADDR_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) #define FIXADDR_START ((FIXADDR_TOP - FIXADDR_SIZE) & PMD_MASK) -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); -/* - * 'index to address' translation. If anyone tries to use the idx - * directly without tranlation, we catch the bug with a NULL-deference - * kernel oops. Illegal ranges of incoming indices are caught too. 
- */ -static inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * this branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. - */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); - - return __fix_to_virt(idx); -} - -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} +#include #define kmap_get_fixmap_pte(vaddr) \ pte_offset_kernel( \ -- cgit v0.10.2 From 142885379a8d296e89b9a89c95e40af7068b4bb3 Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:53 -0800 Subject: microblaze: use generic fixmap.h Signed-off-by: Mark Salter Tested-by: Michal Simek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/microblaze/include/asm/fixmap.h b/arch/microblaze/include/asm/fixmap.h index f2b312e..06c0e2b 100644 --- a/arch/microblaze/include/asm/fixmap.h +++ b/arch/microblaze/include/asm/fixmap.h @@ -58,52 +58,12 @@ enum fixed_addresses { extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags); -#define set_fixmap(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL) -/* - * Some hardware wants to get fixmapped without caching. 
- */ -#define set_fixmap_nocache(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL_CI) - -#define clear_fixmap(idx) \ - __set_fixmap(idx, 0, __pgprot(0)) - #define __FIXADDR_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) #define FIXADDR_START (FIXADDR_TOP - __FIXADDR_SIZE) -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); - -/* - * 'index to address' translation. If anyone tries to use the idx - * directly without tranlation, we catch the bug with a NULL-deference - * kernel oops. Illegal ranges of incoming indices are caught too. - */ -static __always_inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * this branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. 
- */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); - - return __fix_to_virt(idx); -} +#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_CI -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} +#include #endif /* !__ASSEMBLY__ */ #endif -- cgit v0.10.2 From 19fd9629a72e6e9f35b7f4d7f5bb2d794a8ab4c7 Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:54 -0800 Subject: mips: use generic fixmap.h Signed-off-by: Mark Salter Acked-by: Ralf Baechle Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/mips/include/asm/fixmap.h b/arch/mips/include/asm/fixmap.h index dfaaf49..8c012af 100644 --- a/arch/mips/include/asm/fixmap.h +++ b/arch/mips/include/asm/fixmap.h @@ -71,38 +71,7 @@ enum fixed_addresses { #define FIXADDR_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); - -/* - * 'index to address' translation. If anyone tries to use the idx - * directly without tranlation, we catch the bug with a NULL-deference - * kernel oops. Illegal ranges of incoming indices are caught too. - */ -static inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * this branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. 
- */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); - - return __fix_to_virt(idx); -} - -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} +#include #define kmap_get_fixmap_pte(vaddr) \ pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(vaddr), (vaddr)), (vaddr)), (vaddr)) -- cgit v0.10.2 From 9494a1e8428ea8e60ae77b221b9d32a5edc21ef4 Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:55 -0800 Subject: powerpc: use generic fixmap.h Signed-off-by: Mark Salter Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/powerpc/include/asm/fixmap.h b/arch/powerpc/include/asm/fixmap.h index 5c2c023..90f604b 100644 --- a/arch/powerpc/include/asm/fixmap.h +++ b/arch/powerpc/include/asm/fixmap.h @@ -58,52 +58,12 @@ enum fixed_addresses { extern void __set_fixmap (enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags); -#define set_fixmap(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL) -/* - * Some hardware wants to get fixmapped without caching. - */ -#define set_fixmap_nocache(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL_NCG) - -#define clear_fixmap(idx) \ - __set_fixmap(idx, 0, __pgprot(0)) - #define __FIXADDR_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) #define FIXADDR_START (FIXADDR_TOP - __FIXADDR_SIZE) -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); - -/* - * 'index to address' translation. If anyone tries to use the idx - * directly without tranlation, we catch the bug with a NULL-deference - * kernel oops. Illegal ranges of incoming indices are caught too. 
- */ -static __always_inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * this branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. - */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); - - return __fix_to_virt(idx); -} +#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_NCG -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} +#include #endif /* !__ASSEMBLY__ */ #endif -- cgit v0.10.2 From 083f7ba834c09ed9b3904bf2168c378c5550c970 Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:56 -0800 Subject: sh: use generic fixmap.h Signed-off-by: Mark Salter Cc: Paul Mundt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/sh/include/asm/fixmap.h b/arch/sh/include/asm/fixmap.h index cbe0186..4daf91c 100644 --- a/arch/sh/include/asm/fixmap.h +++ b/arch/sh/include/asm/fixmap.h @@ -79,13 +79,6 @@ extern void __set_fixmap(enum fixed_addresses idx, unsigned long phys, pgprot_t flags); extern void __clear_fixmap(enum fixed_addresses idx, pgprot_t flags); -#define set_fixmap(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL) -/* - * Some hardware wants to get fixmapped without caching. - */ -#define set_fixmap_nocache(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL_NOCACHE) /* * used by vmalloc.c. 
* @@ -101,36 +94,8 @@ extern void __clear_fixmap(enum fixed_addresses idx, pgprot_t flags); #define FIXADDR_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); - -/* - * 'index to address' translation. If anyone tries to use the idx - * directly without tranlation, we catch the bug with a NULL-deference - * kernel oops. Illegal ranges of incoming indices are caught too. - */ -static inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * this branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. - */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); +#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_NOCACHE - return __fix_to_virt(idx); -} +#include -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} #endif -- cgit v0.10.2 From 8b49233fd7a733fddd9050c0f67f3c8b9056cc34 Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:57 -0800 Subject: tile: use generic fixmap.h Signed-off-by: Mark Salter Acked-by: Chris Metcalf Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/tile/include/asm/fixmap.h b/arch/tile/include/asm/fixmap.h index c6b9c1b..ffe2637 100644 --- a/arch/tile/include/asm/fixmap.h +++ b/arch/tile/include/asm/fixmap.h @@ -25,9 +25,6 @@ #include #endif -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - /* * Here we define 
all the compile-time 'special' virtual * addresses. The point is to have a constant address at @@ -83,35 +80,7 @@ enum fixed_addresses { #define FIXADDR_START (FIXADDR_TOP + PAGE_SIZE - __FIXADDR_SIZE) #define FIXADDR_BOOT_START (FIXADDR_TOP + PAGE_SIZE - __FIXADDR_BOOT_SIZE) -extern void __this_fixmap_does_not_exist(void); - -/* - * 'index to address' translation. If anyone tries to use the idx - * directly without tranlation, we catch the bug with a NULL-deference - * kernel oops. Illegal ranges of incoming indices are caught too. - */ -static __always_inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * this branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. - */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); - - return __fix_to_virt(idx); -} - -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} +#include #endif /* !__ASSEMBLY__ */ -- cgit v0.10.2 From a6ce7114eed4f2f56edcd02fc6dad019ba2c04fc Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Thu, 23 Jan 2014 15:53:58 -0800 Subject: um: use generic fixmap.h Signed-off-by: Mark Salter Acked-by: Richard Weinberger Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/um/include/asm/fixmap.h b/arch/um/include/asm/fixmap.h index 21a423b..3094ea3c 100644 --- a/arch/um/include/asm/fixmap.h +++ b/arch/um/include/asm/fixmap.h @@ -43,13 +43,6 @@ enum fixed_addresses { extern void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags); -#define set_fixmap(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL) -/* - * Some hardware wants to get fixmapped without caching. 
- */ -#define set_fixmap_nocache(idx, phys) \ - __set_fixmap(idx, phys, PAGE_KERNEL_NOCACHE) /* * used by vmalloc.c. * @@ -62,37 +55,6 @@ extern void __set_fixmap (enum fixed_addresses idx, #define FIXADDR_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) -#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); - -/* - * 'index to address' translation. If anyone tries to use the idx - * directly without tranlation, we catch the bug with a NULL-deference - * kernel oops. Illegal ranges of incoming indices are caught too. - */ -static inline unsigned long fix_to_virt(const unsigned int idx) -{ - /* - * this branch gets completely eliminated after inlining, - * except when someone tries to use fixaddr indices in an - * illegal way. (such as mixing up address types or using - * out-of-range indices). - * - * If it doesn't get removed, the linker will complain - * loudly with a reasonably clear error message.. - */ - if (idx >= __end_of_fixed_addresses) - __this_fixmap_does_not_exist(); - - return __fix_to_virt(idx); -} - -static inline unsigned long virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} +#include #endif -- cgit v0.10.2 From 77719536dc00f8fd8f5abe6dadbde5331c37f996 Mon Sep 17 00:00:00 2001 From: Alex Elder Date: Thu, 23 Jan 2014 15:53:59 -0800 Subject: conditionally define U32_MAX The symbol U32_MAX is defined in several spots. Change these definitions to be conditional. This is in preparation for the next patch, which centralizes the definition in . 
Signed-off-by: Alex Elder Cc: Sage Weil Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h index f8adaee..66a2e83 100644 --- a/fs/reiserfs/reiserfs.h +++ b/fs/reiserfs/reiserfs.h @@ -1958,7 +1958,9 @@ struct treepath var = {.path_length = ILLEGAL_PATH_ELEMENT_OFFSET, .reada = 0,} #define MAX_US_INT 0xffff // reiserfs version 2 has max offset 60 bits. Version 1 - 32 bit offset +#ifndef U32_MAX #define U32_MAX (~(__u32)0) +#endif /* !U32_MAX */ static inline loff_t max_reiserfs_offset(struct inode *inode) { diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h index 0442c3d..27fe66a 100644 --- a/include/linux/ceph/decode.h +++ b/include/linux/ceph/decode.h @@ -10,6 +10,7 @@ /* This seemed to be the easiest place to define these */ +#ifndef U32_MAX #define U8_MAX ((u8)(~0U)) #define U16_MAX ((u16)(~0U)) #define U32_MAX ((u32)(~0U)) @@ -24,6 +25,7 @@ #define S16_MIN ((s16)(-S16_MAX - 1)) #define S32_MIN ((s32)(-S32_MAX - 1)) #define S64_MIN ((s64)(-S64_MAX - 1LL)) +#endif /* !U32_MAX */ /* * in all cases, diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c index 8a52099..f439472 100644 --- a/net/ipv4/tcp_illinois.c +++ b/net/ipv4/tcp_illinois.c @@ -23,7 +23,9 @@ #define ALPHA_MIN ((3*ALPHA_SCALE)/10) /* ~0.3 */ #define ALPHA_MAX (10*ALPHA_SCALE) /* 10.0 */ #define ALPHA_BASE ALPHA_SCALE /* 1.0 */ +#ifndef U32_MAX #define U32_MAX ((u32)~0U) +#endif /* !U32_MAX */ #define RTT_MAX (U32_MAX / ALPHA_MAX) /* 3.3 secs */ #define BETA_SHIFT 6 -- cgit v0.10.2 From 89a0714106aac7309c7dfa0f004b39e1e89d2942 Mon Sep 17 00:00:00 2001 From: Alex Elder Date: Thu, 23 Jan 2014 15:54:00 -0800 Subject: kernel.h: define u8, s8, u32, etc. limits Create constants that define the maximum and minimum values representable by the kernel types u8, s8, u16, s16, and so on. 
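A minimal userspace sketch of how such limit constants evaluate. It assumes the fixed-width <stdint.h> types can stand in for the kernel's u8/s8/... typedefs; the typedefs below are that assumption, added only so the snippet builds outside the kernel tree:

```c
#include <stdint.h>

/* Assumption: userspace stand-ins for the kernel's fixed-width typedefs. */
typedef uint8_t  u8;
typedef int8_t   s8;
typedef uint16_t u16;
typedef int16_t  s16;
typedef uint32_t u32;
typedef int32_t  s32;
typedef uint64_t u64;
typedef int64_t  s64;

/* All-ones of the unsigned type is its maximum value. */
#define U8_MAX   ((u8)~0U)
/* Shifting the unsigned maximum right by one clears the top bit,
 * which yields the signed maximum of the same width. */
#define S8_MAX   ((s8)(U8_MAX >> 1))
/* Two's complement: the minimum is one below the negated maximum. */
#define S8_MIN   ((s8)(-S8_MAX - 1))

#define U16_MAX  ((u16)~0U)
#define S16_MAX  ((s16)(U16_MAX >> 1))
#define S16_MIN  ((s16)(-S16_MAX - 1))

#define U32_MAX  ((u32)~0U)
#define S32_MAX  ((s32)(U32_MAX >> 1))
#define S32_MIN  ((s32)(-S32_MAX - 1))

/* The 64-bit case needs a 64-bit all-ones constant, hence ~0ULL. */
#define U64_MAX  ((u64)~0ULL)
#define S64_MAX  ((s64)(U64_MAX >> 1))
#define S64_MIN  ((s64)(-S64_MAX - 1))
```

Deriving each signed limit from the unsigned maximum keeps every definition independent of the host's int width.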
Signed-off-by: Alex Elder Cc: Sage Weil Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 2aa3d4b0..f74bb58 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -29,6 +29,19 @@ #define ULLONG_MAX (~0ULL) #define SIZE_MAX (~(size_t)0) +#define U8_MAX ((u8)~0U) +#define S8_MAX ((s8)(U8_MAX>>1)) +#define S8_MIN ((s8)(-S8_MAX - 1)) +#define U16_MAX ((u16)~0U) +#define S16_MAX ((s16)(U16_MAX>>1)) +#define S16_MIN ((s16)(-S16_MAX - 1)) +#define U32_MAX ((u32)~0U) +#define S32_MAX ((s32)(U32_MAX>>1)) +#define S32_MIN ((s32)(-S32_MAX - 1)) +#define U64_MAX ((u64)~0ULL) +#define S64_MAX ((s64)(U64_MAX>>1)) +#define S64_MIN ((s64)(-S64_MAX - 1)) + #define STACK_MAGIC 0xdeadbeef #define REPEAT_BYTE(x) ((~0ul / 0xff) * (x)) -- cgit v0.10.2 From 04f9b74e4d96d349de12fdd4e6626af4a9f75e09 Mon Sep 17 00:00:00 2001 From: Alex Elder Date: Thu, 23 Jan 2014 15:54:01 -0800 Subject: remove extra definitions of U32_MAX Now that the definition is centralized in , the definitions of U32_MAX (and related) elsewhere in the kernel can be removed. Signed-off-by: Alex Elder Acked-by: Sage Weil Acked-by: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h index 66a2e83..dfb617b 100644 --- a/fs/reiserfs/reiserfs.h +++ b/fs/reiserfs/reiserfs.h @@ -1958,10 +1958,6 @@ struct treepath var = {.path_length = ILLEGAL_PATH_ELEMENT_OFFSET, .reada = 0,} #define MAX_US_INT 0xffff // reiserfs version 2 has max offset 60 bits. 
Version 1 - 32 bit offset -#ifndef U32_MAX -#define U32_MAX (~(__u32)0) -#endif /* !U32_MAX */ - static inline loff_t max_reiserfs_offset(struct inode *inode) { if (get_inode_item_key_version(inode) == KEY_FORMAT_3_5) diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h index 27fe66a..a6ef9cc 100644 --- a/include/linux/ceph/decode.h +++ b/include/linux/ceph/decode.h @@ -8,25 +8,6 @@ #include -/* This seemed to be the easiest place to define these */ - -#ifndef U32_MAX -#define U8_MAX ((u8)(~0U)) -#define U16_MAX ((u16)(~0U)) -#define U32_MAX ((u32)(~0U)) -#define U64_MAX ((u64)(~0ULL)) - -#define S8_MAX ((s8)(U8_MAX >> 1)) -#define S16_MAX ((s16)(U16_MAX >> 1)) -#define S32_MAX ((s32)(U32_MAX >> 1)) -#define S64_MAX ((s64)(U64_MAX >> 1LL)) - -#define S8_MIN ((s8)(-S8_MAX - 1)) -#define S16_MIN ((s16)(-S16_MAX - 1)) -#define S32_MIN ((s32)(-S32_MAX - 1)) -#define S64_MIN ((s64)(-S64_MAX - 1LL)) -#endif /* !U32_MAX */ - /* * in all cases, * void **p pointer to position pointer diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c index f439472..e498a62 100644 --- a/net/ipv4/tcp_illinois.c +++ b/net/ipv4/tcp_illinois.c @@ -23,9 +23,6 @@ #define ALPHA_MIN ((3*ALPHA_SCALE)/10) /* ~0.3 */ #define ALPHA_MAX (10*ALPHA_SCALE) /* 10.0 */ #define ALPHA_BASE ALPHA_SCALE /* 1.0 */ -#ifndef U32_MAX -#define U32_MAX ((u32)~0U) -#endif /* !U32_MAX */ #define RTT_MAX (U32_MAX / ALPHA_MAX) /* 3.3 secs */ #define BETA_SHIFT 6 -- cgit v0.10.2 From 00b2c76a6abbe082bb5afb89ee49ec325e9cd4d2 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 23 Jan 2014 15:54:02 -0800 Subject: include/linux/of.h: make for_each_child_of_node() reference its args when CONFIG_OF=n Make for_each_child_of_node() reference its args when CONFIG_OF=n to avoid warnings like: drivers/leds/leds-pwm.c:88:22: warning: unused variable 'node' [-Wunused-variable] struct device_node *node = pdev->dev.of_node; ^ Signed-off-by: David Howells Cc: Grant Likely Cc: Rob Herring Signed-off-by: 
Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/of.h b/include/linux/of.h index 276c546..70c64ba 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -377,8 +377,13 @@ static inline bool of_have_populated_dt(void) return false; } +/* Kill an unused variable warning on a device_node pointer */ +static inline void __of_use_dn(const struct device_node *np) +{ +} + #define for_each_child_of_node(parent, child) \ - while (0) + while (__of_use_dn(parent), __of_use_dn(child), 0) #define for_each_available_child_of_node(parent, child) \ while (0) -- cgit v0.10.2 From e13e64ece037104f5b04d0e98929ad2149c5bb09 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 23 Jan 2014 15:54:03 -0800 Subject: drivers/gpu/drm/gma500/backlight.c: fix a defined-but-not-used warning for do_gma_backlight_set() Fix the following warning: drivers/gpu/drm/gma500/backlight.c:29:13: warning: 'do_gma_backlight_set' defined but not used [-Wunused-function] by moving the entire function inside the conditional section currently inside of it. All the places that call it are so conditionalised. 
Signed-off-by: David Howells Cc: Alan Cox Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/gpu/drm/gma500/backlight.c b/drivers/gpu/drm/gma500/backlight.c index 143eba3..ea7dfc59 100644 --- a/drivers/gpu/drm/gma500/backlight.c +++ b/drivers/gpu/drm/gma500/backlight.c @@ -26,13 +26,13 @@ #include "intel_bios.h" #include "power.h" +#ifdef CONFIG_BACKLIGHT_CLASS_DEVICE static void do_gma_backlight_set(struct drm_device *dev) { -#ifdef CONFIG_BACKLIGHT_CLASS_DEVICE struct drm_psb_private *dev_priv = dev->dev_private; backlight_update_status(dev_priv->backlight_device); -#endif } +#endif void gma_backlight_enable(struct drm_device *dev) { -- cgit v0.10.2 From f30c0c32b69b6467fa23e2798432262428587471 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 23 Jan 2014 15:54:04 -0800 Subject: drivers/mfd/max8998.c: fix pointer-integer size mismatch warning in max8998_i2c_get_driver_data() Fix up the following pointer-integer size mismatch warning in max8998_i2c_get_driver_data(): drivers/mfd/max8998.c: In function 'max8998_i2c_get_driver_data': drivers/mfd/max8998.c:178:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] return (int)match->data; ^ Signed-off-by: David Howells Cc: Tomasz Figa Cc: Mark Brown Cc: Samuel Ortiz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/mfd/max8998.c b/drivers/mfd/max8998.c index f47eaa7..612ca40 100644 --- a/drivers/mfd/max8998.c +++ b/drivers/mfd/max8998.c @@ -175,7 +175,7 @@ static inline int max8998_i2c_get_driver_data(struct i2c_client *i2c, if (IS_ENABLED(CONFIG_OF) && i2c->dev.of_node) { const struct of_device_id *match; match = of_match_node(max8998_dt_match, i2c->dev.of_node); - return (int)match->data; + return (int)(long)match->data; } return (int)id->driver_data; -- cgit v0.10.2 From c6d5f989e14f36d5ff71a9b79a6d3c3bf06c185f Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 23 Jan 2014 15:54:05 -0800 
Subject: drivers/mfd/tps65217.c: fix pointer-integer size mismatch warning in tps65217_probe() Fix up the following pointer-integer size mismatch warning in tps65217_probe(): drivers/mfd/tps65217.c: In function 'tps65217_probe': drivers/mfd/tps65217.c:173:13: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] chip_id = (unsigned int)match->data; ^ Signed-off-by: David Howells Cc: AnilKumar Ch Cc: Samuel Ortiz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/mfd/tps65217.c b/drivers/mfd/tps65217.c index 6939ae5..966cf65 100644 --- a/drivers/mfd/tps65217.c +++ b/drivers/mfd/tps65217.c @@ -170,7 +170,7 @@ static int tps65217_probe(struct i2c_client *client, "Failed to find matching dt id\n"); return -EINVAL; } - chip_id = (unsigned int)match->data; + chip_id = (unsigned int)(unsigned long)match->data; status_off = of_property_read_bool(client->dev.of_node, "ti,pmic-shutdown-controller"); } -- cgit v0.10.2 From a79530e4d8c8ef2bece88f8dab680e541162f010 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 23 Jan 2014 15:54:06 -0800 Subject: drivers/video/aty/aty128fb.c: fix a warning pertaining to the aty128fb backlight variable Fix the following warning in the aty128fb driver: drivers/video/aty/aty128fb.c:363:12: warning: 'backlight' defined but not used [-Wunused-variable] static int backlight = 0; ^ as the variable's value is only read if CONFIG_FB_ATY128_BACKLIGHT=y. The variable is also set if MODULE is unset[*]. [*] I wonder if the conditional wrapper around aty128fb_setup() should be using CONFIG_MODULE rather than MODULE. 
Signed-off-by: David Howells Cc: Paul Mackerras Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/aty/aty128fb.c b/drivers/video/aty/aty128fb.c index 12ca031..52108be 100644 --- a/drivers/video/aty/aty128fb.c +++ b/drivers/video/aty/aty128fb.c @@ -357,11 +357,13 @@ static int default_lcd_on = 1; static bool mtrr = true; #endif +#ifdef CONFIG_FB_ATY128_BACKLIGHT #ifdef CONFIG_PMAC_BACKLIGHT static int backlight = 1; #else static int backlight = 0; #endif +#endif /* PLL constants */ struct aty128_constants { @@ -1671,7 +1673,9 @@ static int aty128fb_setup(char *options) default_crt_on = simple_strtoul(this_opt+4, NULL, 0); continue; } else if (!strncmp(this_opt, "backlight:", 10)) { +#ifdef CONFIG_FB_ATY128_BACKLIGHT backlight = simple_strtoul(this_opt+10, NULL, 0); +#endif continue; } #ifdef CONFIG_MTRR -- cgit v0.10.2 From 59d42cd43c7335a3a8081fd6ee54ea41b0c239be Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 23 Jan 2014 15:54:07 -0800 Subject: drivers/vlynq/vlynq.c: fix another resource size off by 1 error We fixed the call to request_mem_region() in commit 3354f73b24c6 ("drivers/vlynq/vlynq.c: fix resource size off by 1 error"). But we need to fix the call to release_mem_region() as well.
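A small sketch of the arithmetic behind the off-by-one, assuming the end address is inclusive (the last valid byte of the region). The struct and helper names are hypothetical stand-ins for illustration, not the vlynq driver's own types:

```c
/* Hypothetical register window; 'end' is assumed to be the last valid
 * address (inclusive), which is why the size needs the extra + 1. */
struct reg_window {
	unsigned long start;
	unsigned long end;
};

/* Number of bytes covered by an inclusive [start, end] range. */
static unsigned long reg_window_size(const struct reg_window *w)
{
	return w->end - w->start + 1;
}
```

With start = 0x1000 and end = 0x1fff the window covers 0x1000 bytes. Omitting the + 1 would give 0xfff, so a release would cover one byte less than the matching request.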
Signed-off-by: Dan Carpenter Cc: Florian Fainelli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/vlynq/vlynq.c b/drivers/vlynq/vlynq.c index 7b07135..c0227f9 100644 --- a/drivers/vlynq/vlynq.c +++ b/drivers/vlynq/vlynq.c @@ -762,7 +762,8 @@ static int vlynq_remove(struct platform_device *pdev) device_unregister(&dev->dev); iounmap(dev->local); - release_mem_region(dev->regs_start, dev->regs_end - dev->regs_start); + release_mem_region(dev->regs_start, + dev->regs_end - dev->regs_start + 1); kfree(dev); -- cgit v0.10.2 From a7e1d98f3e2a0d858fddcac7c66b78b6dcfd9d2e Mon Sep 17 00:00:00 2001 From: Paul Bolle Date: Thu, 23 Jan 2014 15:54:08 -0800 Subject: headers_check: special case seqbuf_dump() "make headers_check" has warned about soundcard.h for (at least) five years now: [...]/usr/include/linux/soundcard.h:1054: userspace cannot reference function or variable defined in the kernel We're apparently stuck with providing OSSlib-3.8 compatibility, so let's special case this declaration just to silence it. Notes: 0) Support for OSSlib post 3.8 was already removed in commit 43a990765a ("sound: Remove OSSlib stuff from linux/soundcard.h"). Five years have passed since that commit: do people still care about OSSlib-3.8? If not, quite a bit of code could be removed from soundcard.h (and probably ultrasound.h). 2) By the way, what is actually meant by: It is no longer possible to actually link against OSSlib with this header, but we still provide these macros for programs using them. Doesn't that mean compatibility to OSSlib isn't even useful? 3) Anyhow, a previous discussion of soundcard.h, which led to that commit, starts at https://lkml.org/lkml/2009/1/20/349 . 4) And, yes, I sneaked in a whitespace fix.
Signed-off-by: Paul Bolle Cc: Takashi Iwai Acked-by: Arnd Bergmann Cc: Michal Marek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/headers_check.pl b/scripts/headers_check.pl index 64ac238..62320f9 100644 --- a/scripts/headers_check.pl +++ b/scripts/headers_check.pl @@ -65,7 +65,11 @@ sub check_include sub check_declarations { - if ($line =~m/^(\s*extern|unsigned|char|short|int|long|void)\b/) { + # soundcard.h is what it is + if ($line =~ m/^void seqbuf_dump\(void\);/) { + return; + } + if ($line =~ m/^(\s*extern|unsigned|char|short|int|long|void)\b/) { printf STDERR "$filename:$lineno: " . "userspace cannot reference function or " . "variable defined in the kernel\n"; -- cgit v0.10.2 From e8b671460410c8fd996c8a1c228b718c547cc236 Mon Sep 17 00:00:00 2001 From: Mike Frysinger Date: Thu, 23 Jan 2014 15:54:09 -0800 Subject: include/uapi/linux/ppp-ioctl.h: pull in ppp_defs.h This header uses enum NPmode but doesn't include ppp_defs.h. If you try to use this header w/out including the defs header first, it leads to a build failure. So add the explicit include to fix it. Don't know of any packages directly impacted, but noticed while building some ppp code by hand. Signed-off-by: Mike Frysinger Cc: Paul Mackerras Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h index 2d9a885..63a23a3 100644 --- a/include/uapi/linux/ppp-ioctl.h +++ b/include/uapi/linux/ppp-ioctl.h @@ -12,6 +12,7 @@ #include #include +#include /* * Bit definitions for flags argument to PPPIOCGFLAGS/PPPIOCSFLAGS. -- cgit v0.10.2 From c318924582cf553c05afa48f81871f4ad46f014c Mon Sep 17 00:00:00 2001 From: Mike Frysinger Date: Thu, 23 Jan 2014 15:54:10 -0800 Subject: include/uapi/linux/dn.h: pull in ioctl.h header This header uses _IOW/_IOR defines but doesn't include ioctl.h for it. 
If you try to use this w/out including ioctl.h yourself, it can fail to build, so add the explicit include. Signed-off-by: Mike Frysinger Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/uapi/linux/dn.h b/include/uapi/linux/dn.h index 5fbdd3d..4295c74 100644 --- a/include/uapi/linux/dn.h +++ b/include/uapi/linux/dn.h @@ -1,6 +1,7 @@ #ifndef _LINUX_DN_H #define _LINUX_DN_H +#include #include #include -- cgit v0.10.2 From 0d9dfc23f4d8c17365c84eb48ecca28b963ba192 Mon Sep 17 00:00:00 2001 From: Mike Frysinger Date: Thu, 23 Jan 2014 15:54:11 -0800 Subject: uapi: convert u64 to __u64 in exported headers The u64 type is not defined in any exported kernel headers, so trying to use it will lead to build failures. Signed-off-by: Mike Frysinger Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/uapi/linux/nfs4.h b/include/uapi/linux/nfs4.h index 788128e..35f5f4c 100644 --- a/include/uapi/linux/nfs4.h +++ b/include/uapi/linux/nfs4.h @@ -150,7 +150,7 @@ #define NFS4_SECINFO_STYLE4_CURRENT_FH 0 #define NFS4_SECINFO_STYLE4_PARENT 1 -#define NFS4_MAX_UINT64 (~(u64)0) +#define NFS4_MAX_UINT64 (~(__u64)0) /* An NFS4 sessions server must support at least NFS4_MAX_OPS operations. * If a compound requires more operations, adjust NFS4_MAX_OPS accordingly. 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index e244ed4..853bc1c 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -788,7 +788,7 @@ union perf_mem_data_src { #define PERF_MEM_TLB_SHIFT 26 #define PERF_MEM_S(a, s) \ - (((u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT) + (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT) /* * single taken branch record layout: -- cgit v0.10.2 From aace05097a0fd467230e39acb148be0fdaa90068 Mon Sep 17 00:00:00 2001 From: "Du, Changbin" Date: Thu, 23 Jan 2014 15:54:12 -0800 Subject: lib/parser.c: add match_wildcard() function The match_wildcard() function is a simple implementation of a wildcard matching algorithm. It supports only the two usual wildcards: '*' - matches zero or more characters '?' - matches one character This algorithm is safe since it is non-recursive. Signed-off-by: Du, Changbin Cc: Jason Baron Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/parser.h b/include/linux/parser.h index ea2281e..39d5b79 100644 --- a/include/linux/parser.h +++ b/include/linux/parser.h @@ -29,5 +29,6 @@ int match_token(char *, const match_table_t table, substring_t args[]); int match_int(substring_t *, int *result); int match_octal(substring_t *, int *result); int match_hex(substring_t *, int *result); +bool match_wildcard(const char *pattern, const char *str); size_t match_strlcpy(char *, const substring_t *, size_t); char *match_strdup(const substring_t *); diff --git a/lib/parser.c b/lib/parser.c index 807b2aa..ee52955 100644 --- a/lib/parser.c +++ b/lib/parser.c @@ -193,6 +193,56 @@ int match_hex(substring_t *s, int *result) } /** + * match_wildcard: - parse if a string matches given wildcard pattern + * @pattern: wildcard pattern + * @str: the string to be parsed + * + * Description: Parse the string @str to check if matches wildcard + * pattern @pattern.
The pattern may contain two type wildcardes: + * '*' - matches zero or more characters + * '?' - matches one character + * If it's matched, return true, else return false. + */ +bool match_wildcard(const char *pattern, const char *str) +{ + const char *s = str; + const char *p = pattern; + bool star = false; + + while (*s) { + switch (*p) { + case '?': + s++; + p++; + break; + case '*': + star = true; + str = s; + if (!*++p) + return true; + pattern = p; + break; + default: + if (*s == *p) { + s++; + p++; + } else { + if (!star) + return false; + str++; + s = str; + p = pattern; + } + break; + } + } + + if (*p == '*') + ++p; + return !*p; +} + +/** * match_strlcpy: - Copy the characters from a substring_t to a sized buffer * @dest: where to copy to * @src: &substring_t to copy @@ -235,5 +285,6 @@ EXPORT_SYMBOL(match_token); EXPORT_SYMBOL(match_int); EXPORT_SYMBOL(match_octal); EXPORT_SYMBOL(match_hex); +EXPORT_SYMBOL(match_wildcard); EXPORT_SYMBOL(match_strlcpy); EXPORT_SYMBOL(match_strdup); -- cgit v0.10.2 From a3d2cca43cd31479d4f0414b4d014c4400a4e6d6 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Thu, 23 Jan 2014 15:54:13 -0800 Subject: lib/parser.c: put EXPORT_SYMBOLs in the conventional place Cc: Du, Changbin Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/parser.c b/lib/parser.c index ee52955..b6d1163 100644 --- a/lib/parser.c +++ b/lib/parser.c @@ -113,6 +113,7 @@ int match_token(char *s, const match_table_t table, substring_t args[]) return p->token; } +EXPORT_SYMBOL(match_token); /** * match_number: scan a number in the given base from a substring_t @@ -163,6 +164,7 @@ int match_int(substring_t *s, int *result) { return match_number(s, result, 0); } +EXPORT_SYMBOL(match_int); /** * match_octal: - scan an octal representation of an integer from a substring_t @@ -177,6 +179,7 @@ int match_octal(substring_t *s, int *result) { return match_number(s, result, 8); } +EXPORT_SYMBOL(match_octal); /** * match_hex: - 
scan a hex representation of an integer from a substring_t @@ -191,6 +194,7 @@ int match_hex(substring_t *s, int *result) { return match_number(s, result, 16); } +EXPORT_SYMBOL(match_hex); /** * match_wildcard: - parse if a string matches given wildcard pattern @@ -241,6 +245,7 @@ bool match_wildcard(const char *pattern, const char *str) ++p; return !*p; } +EXPORT_SYMBOL(match_wildcard); /** * match_strlcpy: - Copy the characters from a substring_t to a sized buffer @@ -263,6 +268,7 @@ size_t match_strlcpy(char *dest, const substring_t *src, size_t size) } return ret; } +EXPORT_SYMBOL(match_strlcpy); /** * match_strdup: - allocate a new string with the contents of a substring_t @@ -280,11 +286,4 @@ char *match_strdup(const substring_t *s) match_strlcpy(p, s, sz); return p; } - -EXPORT_SYMBOL(match_token); -EXPORT_SYMBOL(match_int); -EXPORT_SYMBOL(match_octal); -EXPORT_SYMBOL(match_hex); -EXPORT_SYMBOL(match_wildcard); -EXPORT_SYMBOL(match_strlcpy); EXPORT_SYMBOL(match_strdup); -- cgit v0.10.2 From 578b1e0701af34f9ef69daabda4431f1e8501109 Mon Sep 17 00:00:00 2001 From: "Du, Changbin" Date: Thu, 23 Jan 2014 15:54:14 -0800 Subject: dynamic_debug: add wildcard support to filter files/functions/modules Add wildcard '*' (matches zero or more characters) and '?' (matches one character) support when querying debug flags. Now we can open debug messages using keywords. e.g.: 1. open debug logs in all usb drivers echo "file drivers/usb/* +p" > /dynamic_debug/control 2. open debug logs for usb xhci code echo "file *xhci* +p" > /dynamic_debug/control Signed-off-by: Du, Changbin Cc: Jason Baron Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c index c37aeac..600ac57 100644 --- a/lib/dynamic_debug.c +++ b/lib/dynamic_debug.c @@ -8,6 +8,7 @@ * By Greg Banks * Copyright (c) 2008 Silicon Graphics Inc. All Rights Reserved. * Copyright (C) 2011 Bart Van Assche. All Rights Reserved.
+ * Copyright (C) 2013 Du, Changbin */ #define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__ @@ -24,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -147,7 +149,8 @@ static int ddebug_change(const struct ddebug_query *query, list_for_each_entry(dt, &ddebug_tables, link) { /* match against the module name */ - if (query->module && strcmp(query->module, dt->mod_name)) + if (query->module && + !match_wildcard(query->module, dt->mod_name)) continue; for (i = 0; i < dt->num_ddebugs; i++) { @@ -155,14 +158,16 @@ static int ddebug_change(const struct ddebug_query *query, /* match against the source filename */ if (query->filename && - strcmp(query->filename, dp->filename) && - strcmp(query->filename, kbasename(dp->filename)) && - strcmp(query->filename, trim_prefix(dp->filename))) + !match_wildcard(query->filename, dp->filename) && + !match_wildcard(query->filename, + kbasename(dp->filename)) && + !match_wildcard(query->filename, + trim_prefix(dp->filename))) continue; /* match against the function */ if (query->function && - strcmp(query->function, dp->function)) + !match_wildcard(query->function, dp->function)) continue; /* match against the format */ -- cgit v0.10.2 From 8f073bd0d0bf7c519854a196215a837abbfbdc62 Mon Sep 17 00:00:00 2001 From: "Du, Changbin" Date: Thu, 23 Jan 2014 15:54:15 -0800 Subject: dynamic-debug-howto.txt: update since new wildcard support Add the usage of using new feature wildcard support. Signed-off-by: Du, Changbin Cc: Jason Baron Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/dynamic-debug-howto.txt b/Documentation/dynamic-debug-howto.txt index 1bbdcfc..46325eb 100644 --- a/Documentation/dynamic-debug-howto.txt +++ b/Documentation/dynamic-debug-howto.txt @@ -108,6 +108,12 @@ If your query set is big, you can batch them too: ~# cat query-batch-file > /dynamic_debug/control +A another way is to use wildcard. 
The match rule support '*' (matches +zero or more characters) and '?' (matches exactly one character).For +example, you can match all usb drivers: + + ~# echo "file drivers/usb/* +p" > /dynamic_debug/control + At the syntactical level, a command comprises a sequence of match specifications, followed by a flags change specification. @@ -315,6 +321,9 @@ nullarbor:~ # echo -n 'func svc_process -p' > nullarbor:~ # echo -n 'format "nfsd: READ" +p' > /dynamic_debug/control +// enable messages in files of which the pathes include string "usb" +nullarbor:~ # echo -n '*usb* +p' > /dynamic_debug/control + // enable all messages nullarbor:~ # echo -n '+p' > /dynamic_debug/control -- cgit v0.10.2 From c28aa1f0a847c36daa4280b611e2b54bad75c576 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:16 -0800 Subject: printk/cache: mark printk_once test variable __read_mostly Add #include to define __read_mostly. Convert cache.h to use uapi/linux/kernel.h instead of linux/kernel.h to avoid recursive #includes. Convert the ALIGN macro to __ALIGN_KERNEL. printk_once only sets the bool variable tested once so mark it __read_mostly. Neaten the alignment so it matches the rest of the pr__once #defines too. 
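As a rough userspace illustration of the once-only pattern described above (a sketch, not the kernel macro): print_once(), READ_MOSTLY_STUB, emitted and warn_three_times() are hypothetical names, printf() stands in for printk(), and the stub expands to nothing because the section placement done by __read_mostly has no direct userspace equivalent. A do/while(0) wrapper is used here in place of the kernel's statement expression so the sketch stays within standard C.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for __read_mostly; expands to nothing in userspace. */
#define READ_MOSTLY_STUB

/* Counts how many messages were really printed (illustration only). */
static int emitted;

/*
 * Each expansion gets its own static flag, so every call site prints at
 * most once. The flag is written a single time and only read afterwards,
 * which is why the kernel version marks it __read_mostly.
 */
#define print_once(...)                                     \
do {                                                        \
        static bool printed_once READ_MOSTLY_STUB;          \
                                                            \
        if (!printed_once) {                                \
                printed_once = true;                        \
                emitted++;                                  \
                printf(__VA_ARGS__);                        \
        }                                                   \
} while (0)

/* Hits the same call site three times and returns the print count so far. */
static int warn_three_times(void)
{
        for (int i = 0; i < 3; i++)
                print_once("only the first iteration prints (i=%d)\n", i);
        return emitted;
}
```

Calling warn_three_times() again leaves the count unchanged, since the flag belongs to the call site rather than to the caller.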
Signed-off-by: Joe Perches Reviewed-by: James Hogan Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h index 5a84b3a..efd1b92 100644 --- a/arch/ia64/include/asm/processor.h +++ b/arch/ia64/include/asm/processor.h @@ -71,6 +71,7 @@ #include #include #include +#include #include #include diff --git a/include/linux/cache.h b/include/linux/cache.h index 4c57065..17e7e82 100644 --- a/include/linux/cache.h +++ b/include/linux/cache.h @@ -1,11 +1,11 @@ #ifndef __LINUX_CACHE_H #define __LINUX_CACHE_H -#include +#include #include #ifndef L1_CACHE_ALIGN -#define L1_CACHE_ALIGN(x) ALIGN(x, L1_CACHE_BYTES) +#define L1_CACHE_ALIGN(x) __ALIGN_KERNEL(x, L1_CACHE_BYTES) #endif #ifndef SMP_CACHE_BYTES diff --git a/include/linux/printk.h b/include/linux/printk.h index 6949258..cc6f74d 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -5,6 +5,7 @@ #include #include #include +#include extern const char linux_banner[]; extern const char linux_proc_banner[]; @@ -253,17 +254,17 @@ extern asmlinkage void dump_stack(void) __cold; */ #ifdef CONFIG_PRINTK -#define printk_once(fmt, ...) \ -({ \ - static bool __print_once; \ - \ - if (!__print_once) { \ - __print_once = true; \ - printk(fmt, ##__VA_ARGS__); \ - } \ +#define printk_once(fmt, ...) \ +({ \ + static bool __print_once __read_mostly; \ + \ + if (!__print_once) { \ + __print_once = true; \ + printk(fmt, ##__VA_ARGS__); \ + } \ }) #else -#define printk_once(fmt, ...) \ +#define printk_once(fmt, ...) \ no_printk(fmt, ##__VA_ARGS__) #endif -- cgit v0.10.2 From aaf07621b8bbfdc0d87e9e5dbf1af3b24304998a Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:17 -0800 Subject: vsprintf: add %pad extension for dma_addr_t use dma_addr_t's can be either u32 or u64 depending on a CONFIG option. 
There are a few hundred dma_addr_t's printed via either cast to unsigned long long, unsigned long or no cast at all. Add %pad to be able to emit them without the cast. Update Documentation/printk-formats.txt too. Signed-off-by: Joe Perches Cc: "Shevchenko, Andriy" Cc: Rob Landley Cc: Laurent Pinchart Cc: Julia Lawall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index 445ad74..6f4eb32 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -55,14 +55,21 @@ Struct Resources: For printing struct resources. The 'R' and 'r' specifiers result in a printed resource with ('R') or without ('r') a decoded flags member. -Physical addresses: +Physical addresses types phys_addr_t: - %pa 0x01234567 or 0x0123456789abcdef + %pa[p] 0x01234567 or 0x0123456789abcdef For printing a phys_addr_t type (and its derivatives, such as resource_size_t) which can vary based on build options, regardless of the width of the CPU data path. Passed by reference. +DMA addresses types dma_addr_t: + + %pad 0x01234567 or 0x0123456789abcdef + + For printing a dma_addr_t type which can vary based on build options, + regardless of the width of the CPU data path. Passed by reference. + Raw buffer as a hex string: %*ph 00 01 02 ... 3f %*phC 00:01:02: ... 
:3f diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 10909c5..185b6d3 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -1155,6 +1155,30 @@ char *netdev_feature_string(char *buf, char *end, const u8 *addr, return number(buf, end, *(const netdev_features_t *)addr, spec); } +static noinline_for_stack +char *address_val(char *buf, char *end, const void *addr, + struct printf_spec spec, const char *fmt) +{ + unsigned long long num; + + spec.flags |= SPECIAL | SMALL | ZEROPAD; + spec.base = 16; + + switch (fmt[1]) { + case 'd': + num = *(const dma_addr_t *)addr; + spec.field_width = sizeof(dma_addr_t) * 2 + 2; + break; + case 'p': + default: + num = *(const phys_addr_t *)addr; + spec.field_width = sizeof(phys_addr_t) * 2 + 2; + break; + } + + return number(buf, end, num, spec); +} + int kptr_restrict __read_mostly; /* @@ -1218,7 +1242,8 @@ int kptr_restrict __read_mostly; * N no separator * The maximum supported length is 64 bytes of the input. Consider * to use print_hex_dump() for the larger input. 
- * - 'a' For a phys_addr_t type and its derivative types (passed by reference) + * - 'a[pd]' For address types [p] phys_addr_t, [d] dma_addr_t and derivatives + * (default assumed to be phys_addr_t, passed by reference) * - 'd[234]' For a dentry name (optionally 2-4 last components) * - 'D[234]' Same as 'd' but for a struct file * @@ -1353,11 +1378,7 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr, } break; case 'a': - spec.flags |= SPECIAL | SMALL | ZEROPAD; - spec.field_width = sizeof(phys_addr_t) * 2 + 2; - spec.base = 16; - return number(buf, end, - (unsigned long long) *((phys_addr_t *)ptr), spec); + return address_val(buf, end, ptr, spec, fmt); case 'd': return dentry_name(buf, end, ptr, spec, fmt); case 'D': -- cgit v0.10.2 From 1d3fa370346d9d96ab0efb84e3312aed3aeb35ea Mon Sep 17 00:00:00 2001 From: Arun KS Date: Thu, 23 Jan 2014 15:54:19 -0800 Subject: printk: flush conflicting continuation line An earlier newline was missing and the current print is from a different task. In this scenario, flush the continuation line and store this line separately.
This patch fixes the following scenario of timestamp interleaving: [ 28.154370 ] read_word_reg : reg[0x 3], reg[0x 4] data [0x 642] [ 28.155428 ] uart disconnect [ 31.947341 ] dvfs[cpufreq.c<275>]:plug-in cpu<1> done [ 28.155445 ] UART detached : send switch state 201 [ 32.014112 ] read_reg : reg[0x 3] data[0x21] [akpm@linux-foundation.org: simplify and condense the code] Signed-off-by: Arun KS Signed-off-by: Arun KS Cc: Joe Perches Cc: Tejun Heo Cc: Frederic Weisbecker Cc: Paul Gortmaker Cc: Kay Sievers Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index f8b41bd..b1d255f 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1595,10 +1595,13 @@ asmlinkage int vprintk_emit(int facility, int level, * either merge it with the current buffer and flush, or if * there was a race with interrupts (prefix == true) then just * flush it out and store this line separately. + * If the preceding printk was from a different task and missed + * a newline, flush and append the newline. */ - if (cont.len && cont.owner == current) { - if (!(lflags & LOG_PREFIX)) - stored = cont_add(facility, level, text, text_len); + if (cont.len) { + if (cont.owner == current && !(lflags & LOG_PREFIX)) + stored = cont_add(facility, level, text, + text_len); cont_flush(LOG_NEWLINE); } -- cgit v0.10.2 From c9ecefea0be0673f8b3efbc37b15831d1f02a39f Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:20 -0800 Subject: get_maintainer: add commit author information to --rolestats MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit get_maintainer currently uses "Signed-off-by" style lines to find interested parties to send patches to when the MAINTAINERS file does not have a specific section entry with a matching file pattern. Add statistics for commit authors and lines added and deleted to the information provided by --rolestats.
These statistics are also emitted whenever --rolestats and --git are selected even when there is a specified maintainer. This can have the effect of expanding the number of people that are shown as possible "maintainers" of a particular file because "authors", "added_lines", and "removed_lines" are also used as criteria for the --max-maintainers option separate from the "commit_signers". The first "--git-max-maintainers" values of each criterion are emitted. Any "ties" are not shown. For example: (forcedeth does not have a named maintainer) Old output: $ ./scripts/get_maintainer.pl -f drivers/net/ethernet/nvidia/forcedeth.c "David S. Miller" (commit_signer:8/10=80%) Jiri Pirko (commit_signer:2/10=20%) Patrick McHardy (commit_signer:2/10=20%) Larry Finger (commit_signer:1/10=10%) Peter Zijlstra (commit_signer:1/10=10%) netdev@vger.kernel.org (open list:NETWORKING DRIVERS) linux-kernel@vger.kernel.org (open list) New output: $ ./scripts/get_maintainer.pl -f drivers/net/ethernet/nvidia/forcedeth.c "David S.
Miller" (commit_signer:8/10=80%) Jiri Pirko (commit_signer:2/10=20%,authored:2/10=20%,removed_lines:3/33=9%) Patrick McHardy (commit_signer:2/10=20%,authored:2/10=20%,added_lines:12/95=13%,removed_lines:10/33=30%) Larry Finger (commit_signer:1/10=10%,authored:1/10=10%,added_lines:35/95=37%) Peter Zijlstra (commit_signer:1/10=10%) "Peter Hüwe" (authored:1/10=10%,removed_lines:15/33=45%) Joe Perches (authored:1/10=10%) Neil Horman (added_lines:40/95=42%) Bill Pemberton (removed_lines:3/33=9%) netdev@vger.kernel.org (open list:NETWORKING DRIVERS) linux-kernel@vger.kernel.org (open list) Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl index 5e4fb14..9c3986f 100755 --- a/scripts/get_maintainer.pl +++ b/scripts/get_maintainer.pl @@ -98,6 +98,7 @@ my %VCS_cmds_git = ( "available" => '(which("git") ne "") && (-d ".git")', "find_signers_cmd" => "git log --no-color --follow --since=\$email_git_since " . + '--numstat --no-merges ' . '--format="GitCommit: %H%n' . 'GitAuthor: %an <%ae>%n' . 'GitDate: %aD%n' . @@ -106,6 +107,7 @@ my %VCS_cmds_git = ( " -- \$file", "find_commit_signers_cmd" => "git log --no-color " . + '--numstat ' . '--format="GitCommit: %H%n' . 'GitAuthor: %an <%ae>%n' . 'GitDate: %aD%n' . @@ -114,6 +116,7 @@ my %VCS_cmds_git = ( " -1 \$commit", "find_commit_author_cmd" => "git log --no-color " . + '--numstat ' . '--format="GitCommit: %H%n' . 'GitAuthor: %an <%ae>%n' . 'GitDate: %aD%n' . 
@@ -125,6 +128,7 @@ my %VCS_cmds_git = ( "blame_commit_pattern" => "^([0-9a-f]+) ", "author_pattern" => "^GitAuthor: (.*)", "subject_pattern" => "^GitSubject: (.*)", + "stat_pattern" => "^(\\d+)\\t(\\d+)\\t\$file\$", ); my %VCS_cmds_hg = ( @@ -152,6 +156,7 @@ my %VCS_cmds_hg = ( "blame_commit_pattern" => "^([ 0-9a-f]+):", "author_pattern" => "^HgAuthor: (.*)", "subject_pattern" => "^HgSubject: (.*)", + "stat_pattern" => "^(\\d+)\t(\\d+)\t\$file\$", ); my $conf = which_conf(".get_maintainer.conf"); @@ -1269,20 +1274,30 @@ sub extract_formatted_signatures { } sub vcs_find_signers { - my ($cmd) = @_; + my ($cmd, $file) = @_; my $commits; my @lines = (); my @signatures = (); + my @authors = (); + my @stats = (); @lines = &{$VCS_cmds{"execute_cmd"}}($cmd); my $pattern = $VCS_cmds{"commit_pattern"}; + my $author_pattern = $VCS_cmds{"author_pattern"}; + my $stat_pattern = $VCS_cmds{"stat_pattern"}; + + $stat_pattern =~ s/(\$\w+)/$1/eeg; #interpolate $stat_pattern $commits = grep(/$pattern/, @lines); # of commits + @authors = grep(/$author_pattern/, @lines); @signatures = grep(/^[ \t]*${signature_pattern}.*\@.*$/, @lines); + @stats = grep(/$stat_pattern/, @lines); - return (0, @signatures) if !@signatures; +# print("stats: <@stats>\n"); + + return (0, \@signatures, \@authors, \@stats) if !@signatures; save_commits_by_author(@lines) if ($interactive); save_commits_by_signer(@lines) if ($interactive); @@ -1291,9 +1306,10 @@ sub vcs_find_signers { @signatures = grep(!/${penguin_chiefs}/i, @signatures); } + my ($author_ref, $authors_ref) = extract_formatted_signatures(@authors); my ($types_ref, $signers_ref) = extract_formatted_signatures(@signatures); - return ($commits, @$signers_ref); + return ($commits, $signers_ref, $authors_ref, \@stats); } sub vcs_find_author { @@ -1849,7 +1865,12 @@ sub vcs_assign { sub vcs_file_signoffs { my ($file) = @_; + my $authors_ref; + my $signers_ref; + my $stats_ref; + my @authors = (); my @signers = (); + my @stats = (); my $commits; 
$vcs_used = vcs_exists(); @@ -1858,13 +1879,59 @@ sub vcs_file_signoffs { my $cmd = $VCS_cmds{"find_signers_cmd"}; $cmd =~ s/(\$\w+)/$1/eeg; # interpolate $cmd - ($commits, @signers) = vcs_find_signers($cmd); + ($commits, $signers_ref, $authors_ref, $stats_ref) = vcs_find_signers($cmd, $file); + + @signers = @{$signers_ref} if defined $signers_ref; + @authors = @{$authors_ref} if defined $authors_ref; + @stats = @{$stats_ref} if defined $stats_ref; + +# print("commits: <$commits>\nsigners:<@signers>\nauthors: <@authors>\nstats: <@stats>\n"); foreach my $signer (@signers) { $signer = deduplicate_email($signer); } vcs_assign("commit_signer", $commits, @signers); + vcs_assign("authored", $commits, @authors); + if ($#authors == $#stats) { + my $stat_pattern = $VCS_cmds{"stat_pattern"}; + $stat_pattern =~ s/(\$\w+)/$1/eeg; #interpolate $stat_pattern + + my $added = 0; + my $deleted = 0; + for (my $i = 0; $i <= $#stats; $i++) { + if ($stats[$i] =~ /$stat_pattern/) { + $added += $1; + $deleted += $2; + } + } + my @tmp_authors = uniq(@authors); + foreach my $author (@tmp_authors) { + $author = deduplicate_email($author); + } + @tmp_authors = uniq(@tmp_authors); + my @list_added = (); + my @list_deleted = (); + foreach my $author (@tmp_authors) { + my $auth_added = 0; + my $auth_deleted = 0; + for (my $i = 0; $i <= $#stats; $i++) { + if ($author eq deduplicate_email($authors[$i]) && + $stats[$i] =~ /$stat_pattern/) { + $auth_added += $1; + $auth_deleted += $2; + } + } + for (my $i = 0; $i < $auth_added; $i++) { + push(@list_added, $author); + } + for (my $i = 0; $i < $auth_deleted; $i++) { + push(@list_deleted, $author); + } + } + vcs_assign("added_lines", $added, @list_added); + vcs_assign("removed_lines", $deleted, @list_deleted); + } } sub vcs_file_blame { @@ -1887,6 +1954,10 @@ sub vcs_file_blame { if ($email_git_blame_signatures) { if (vcs_is_hg()) { my $commit_count; + my $commit_authors_ref; + my $commit_signers_ref; + my $stats_ref; + my @commit_authors = (); my 
@commit_signers = (); my $commit = join(" -r ", @commits); my $cmd; @@ -1894,19 +1965,27 @@ sub vcs_file_blame { $cmd = $VCS_cmds{"find_commit_signers_cmd"}; $cmd =~ s/(\$\w+)/$1/eeg; #substitute variables in $cmd - ($commit_count, @commit_signers) = vcs_find_signers($cmd); + ($commit_count, $commit_signers_ref, $commit_authors_ref, $stats_ref) = vcs_find_signers($cmd, $file); + @commit_authors = @{$commit_authors_ref} if defined $commit_authors_ref; + @commit_signers = @{$commit_signers_ref} if defined $commit_signers_ref; push(@signers, @commit_signers); } else { foreach my $commit (@commits) { my $commit_count; + my $commit_authors_ref; + my $commit_signers_ref; + my $stats_ref; + my @commit_authors = (); my @commit_signers = (); my $cmd; $cmd = $VCS_cmds{"find_commit_signers_cmd"}; $cmd =~ s/(\$\w+)/$1/eeg; #substitute variables in $cmd - ($commit_count, @commit_signers) = vcs_find_signers($cmd); + ($commit_count, $commit_signers_ref, $commit_authors_ref, $stats_ref) = vcs_find_signers($cmd, $file); + @commit_authors = @{$commit_authors_ref} if defined $commit_authors_ref; + @commit_signers = @{$commit_signers_ref} if defined $commit_signers_ref; push(@signers, @commit_signers); } -- cgit v0.10.2 From ef575f47363413935e2251141e3477cf6e096500 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 23 Jan 2014 15:54:21 -0800 Subject: MAINTAINERS: add an entry for the Macintosh HFSPlus Filesystem To make scripts/get_maintainer.pl output something sensible. 
Signed-off-by: Geert Uytterhoeven Cc: Alan Cox Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 6710476..4e76cdc 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3955,6 +3955,12 @@ S: Orphan F: Documentation/filesystems/hfs.txt F: fs/hfs/ +HFSPLUS FILESYSTEM +L: linux-fsdevel@vger.kernel.org +S: Orphan +F: Documentation/filesystems/hfsplus.txt +F: fs/hfsplus/ + HGA FRAMEBUFFER DRIVER M: Ferenc Bakonyi L: linux-nvidia@lists.surfsouth.com -- cgit v0.10.2 From 6ab88e00704bde2d7c7f57793e12ed4976275ecd Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:22 -0800 Subject: MAINTAINERS: describe differences between F: and N: patterns MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is a difference in how scripts/get_maintainer.pl treats F: and N: file pattern matches. Describe those differences in the MAINTAINERS file. Signed-off-by: Joe Perches Cc: Greg Kroah-Hartman Cc: Mark Brown Cc: Uwe Kleine-König Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 4e76cdc..f801c3b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -93,6 +93,11 @@ Descriptions of section entries: N: Files and directories with regex patterns. N: [^a-z]tegra all files whose path contains the word tegra One pattern per line. Multiple N: lines acceptable. + scripts/get_maintainer.pl has different behavior for files that + match F: pattern and matches of N: patterns. By default, + get_maintainer will not look at git log history when an F: pattern + match occurs. When an N: match occurs, git log history is used + to also notify the people that have git commit signatures. X: Files and directories that are NOT maintained, same rules as F: Files exclusions are tested before file matches. 
Can be useful for excluding a specific subdirectory, for instance: -- cgit v0.10.2 From 1cc456119c53e5de45748382146c7d5f8765cf4e Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:23 -0800 Subject: MAINTAINERS: remove unnecessary EXYNOS DP DRIVER F: pattern Remove unnecessary pattern for Exynos DP header from MAINTAINERS file. After commit f9b1e013f1c6 ("video: exynos_dp: remove non-DT support for Exynos Display Port"), 'exynos_dp.h' has not been used. Signed-off-by: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index f801c3b..29ccb5f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3349,7 +3349,6 @@ M: Jingoo Han L: linux-fbdev@vger.kernel.org S: Maintained F: drivers/video/exynos/exynos_dp* -F: include/video/exynos_dp* EXYNOS MIPI DISPLAY DRIVERS M: Inki Dae -- cgit v0.10.2 From 0ec585d320acb3e9c33b2b2ba9cefb4a62483d42 Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:24 -0800 Subject: backlight: jornada720: use devm_backlight_device_register() Use devm_backlight_device_register() to make cleanup paths simpler, and remove unnecessary remove(). 
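To illustrate why remove() can go away once registration is device-managed, here is a small userspace model (a sketch only, not the kernel devres API). Every toy_* name is hypothetical; the assumption being modelled is that the device records a release action at registration time and runs all recorded actions, newest first, when it is detached.

```c
#include <stdlib.h>

/* Hypothetical model: one recorded cleanup action. */
struct toy_action {
        void (*release)(void *res);
        void *res;
        struct toy_action *next;
};

/* Hypothetical model of a device that owns a stack of cleanup actions. */
struct toy_device {
        struct toy_action *actions;     /* most recently registered first */
};

/* Record a release callback; returns 0 on success, -1 if allocation fails. */
static int toy_devm_add(struct toy_device *dev, void (*release)(void *),
                        void *res)
{
        struct toy_action *a = malloc(sizeof(*a));

        if (!a)
                return -1;
        a->release = release;
        a->res = res;
        a->next = dev->actions;
        dev->actions = a;
        return 0;
}

/* Run when the device goes away: call every release, newest first. */
static void toy_device_detach(struct toy_device *dev)
{
        while (dev->actions) {
                struct toy_action *a = dev->actions;

                dev->actions = a->next;
                a->release(a->res);
                free(a);
        }
}

static int unregistered;        /* bumped by the release callback below */

static void toy_backlight_unregister(void *res)
{
        (void)res;
        unregistered++;
}

/* A probe() with no matching remove(): cleanup is left to detach. */
static int toy_probe(struct toy_device *dev)
{
        static int backlight;   /* stands in for the registered backlight */

        return toy_devm_add(dev, toy_backlight_unregister, &backlight);
}
```

In this model the driver never calls the unregister function itself, which mirrors how the converted drivers drop their explicit backlight_device_unregister() calls.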
Signed-off-by: Jingoo Han Cc: Tomi Valkeinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/jornada720_bl.c b/drivers/video/backlight/jornada720_bl.c index 3ccb893..6ce96b4 100644 --- a/drivers/video/backlight/jornada720_bl.c +++ b/drivers/video/backlight/jornada720_bl.c @@ -115,9 +115,10 @@ static int jornada_bl_probe(struct platform_device *pdev) memset(&props, 0, sizeof(struct backlight_properties)); props.type = BACKLIGHT_RAW; props.max_brightness = BL_MAX_BRIGHT; - bd = backlight_device_register(S1D_DEVICENAME, &pdev->dev, NULL, - &jornada_bl_ops, &props); + bd = devm_backlight_device_register(&pdev->dev, S1D_DEVICENAME, + &pdev->dev, NULL, &jornada_bl_ops, + &props); if (IS_ERR(bd)) { ret = PTR_ERR(bd); dev_err(&pdev->dev, "failed to register device, err=%x\n", ret); @@ -139,18 +140,8 @@ static int jornada_bl_probe(struct platform_device *pdev) return 0; } -static int jornada_bl_remove(struct platform_device *pdev) -{ - struct backlight_device *bd = platform_get_drvdata(pdev); - - backlight_device_unregister(bd); - - return 0; -} - static struct platform_driver jornada_bl_driver = { .probe = jornada_bl_probe, - .remove = jornada_bl_remove, .driver = { .name = "jornada_bl", }, -- cgit v0.10.2 From 0561c1794a0df435631474a99c24220b9fc1bbfe Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:25 -0800 Subject: backlight: hp680_bl: use devm_backlight_device_register() Use devm_backlight_device_register() to make cleanup paths simpler. 
Signed-off-by: Jingoo Han Cc: Tomi Valkeinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/hp680_bl.c b/drivers/video/backlight/hp680_bl.c index 00076ec..8ea42b8 100644 --- a/drivers/video/backlight/hp680_bl.c +++ b/drivers/video/backlight/hp680_bl.c @@ -110,8 +110,8 @@ static int hp680bl_probe(struct platform_device *pdev) memset(&props, 0, sizeof(struct backlight_properties)); props.type = BACKLIGHT_RAW; props.max_brightness = HP680_MAX_INTENSITY; - bd = backlight_device_register("hp680-bl", &pdev->dev, NULL, - &hp680bl_ops, &props); + bd = devm_backlight_device_register(&pdev->dev, "hp680-bl", &pdev->dev, + NULL, &hp680bl_ops, &props); if (IS_ERR(bd)) return PTR_ERR(bd); @@ -131,8 +131,6 @@ static int hp680bl_remove(struct platform_device *pdev) bd->props.power = 0; hp680bl_send_intensity(bd); - backlight_device_unregister(bd); - return 0; } -- cgit v0.10.2 From c76d1022b4ce399cfad835cc505b4f1d914f53cb Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:26 -0800 Subject: backlight: omap1: use devm_backlight_device_register() Use devm_backlight_device_register() to make cleanup paths simpler, and remove unnecessary remove(). 
Signed-off-by: Jingoo Han Cc: Tomi Valkeinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/omap1_bl.c b/drivers/video/backlight/omap1_bl.c index ac11a46..a0dcd88 100644 --- a/drivers/video/backlight/omap1_bl.c +++ b/drivers/video/backlight/omap1_bl.c @@ -146,8 +146,8 @@ static int omapbl_probe(struct platform_device *pdev) memset(&props, 0, sizeof(struct backlight_properties)); props.type = BACKLIGHT_RAW; props.max_brightness = OMAPBL_MAX_INTENSITY; - dev = backlight_device_register("omap-bl", &pdev->dev, bl, &omapbl_ops, - &props); + dev = devm_backlight_device_register(&pdev->dev, "omap-bl", &pdev->dev, + bl, &omapbl_ops, &props); if (IS_ERR(dev)) return PTR_ERR(dev); @@ -170,20 +170,10 @@ static int omapbl_probe(struct platform_device *pdev) return 0; } -static int omapbl_remove(struct platform_device *pdev) -{ - struct backlight_device *dev = platform_get_drvdata(pdev); - - backlight_device_unregister(dev); - - return 0; -} - static SIMPLE_DEV_PM_OPS(omapbl_pm_ops, omapbl_suspend, omapbl_resume); static struct platform_driver omapbl_driver = { .probe = omapbl_probe, - .remove = omapbl_remove, .driver = { .name = "omap-bl", .pm = &omapbl_pm_ops, -- cgit v0.10.2 From 443956fdd501bdfa8471191ef23d1432ea5fa928 Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:27 -0800 Subject: backlight: ot200_bl: use devm_backlight_device_register() Use devm_backlight_device_register() to make cleanup paths simpler. 
Signed-off-by: Jingoo Han Cc: Tomi Valkeinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/ot200_bl.c b/drivers/video/backlight/ot200_bl.c index fdbb6ee..f5a5202 100644 --- a/drivers/video/backlight/ot200_bl.c +++ b/drivers/video/backlight/ot200_bl.c @@ -118,8 +118,9 @@ static int ot200_backlight_probe(struct platform_device *pdev) props.brightness = 100; props.type = BACKLIGHT_RAW; - bl = backlight_device_register(dev_name(&pdev->dev), &pdev->dev, data, - &ot200_backlight_ops, &props); + bl = devm_backlight_device_register(&pdev->dev, dev_name(&pdev->dev), + &pdev->dev, data, &ot200_backlight_ops, + &props); if (IS_ERR(bl)) { dev_err(&pdev->dev, "failed to register backlight\n"); retval = PTR_ERR(bl); @@ -137,10 +138,6 @@ error_devm_kzalloc: static int ot200_backlight_remove(struct platform_device *pdev) { - struct backlight_device *bl = platform_get_drvdata(pdev); - - backlight_device_unregister(bl); - /* on module unload set brightness to 100% */ cs5535_mfgpt_write(pwm_timer, MFGPT_REG_COUNTER, 0); cs5535_mfgpt_write(pwm_timer, MFGPT_REG_SETUP, MFGPT_SETUP_CNTEN); -- cgit v0.10.2 From ffb1f6c83b273f2b7aac207ad2220c28d3cfa44a Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:28 -0800 Subject: backlight: tosa: use devm_backlight_device_register() Use devm_backlight_device_register() to make cleanup paths simpler. 
Signed-off-by: Jingoo Han Cc: Tomi Valkeinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/tosa_bl.c b/drivers/video/backlight/tosa_bl.c index b8db933..3ad6765 100644 --- a/drivers/video/backlight/tosa_bl.c +++ b/drivers/video/backlight/tosa_bl.c @@ -105,8 +105,9 @@ static int tosa_bl_probe(struct i2c_client *client, memset(&props, 0, sizeof(struct backlight_properties)); props.type = BACKLIGHT_RAW; props.max_brightness = 512 - 1; - data->bl = backlight_device_register("tosa-bl", &client->dev, data, - &bl_ops, &props); + data->bl = devm_backlight_device_register(&client->dev, "tosa-bl", + &client->dev, data, &bl_ops, + &props); if (IS_ERR(data->bl)) { ret = PTR_ERR(data->bl); goto err_reg; @@ -128,9 +129,7 @@ static int tosa_bl_remove(struct i2c_client *client) { struct tosa_bl_data *data = i2c_get_clientdata(client); - backlight_device_unregister(data->bl); data->bl = NULL; - return 0; } -- cgit v0.10.2 From 964598f239c37cd6df73d94458beaabec3dc6928 Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:29 -0800 Subject: backlight: jornada720: use devm_lcd_device_register() Use devm_lcd_device_register() to make cleanup paths simpler, and remove unnecessary remove(). 
Signed-off-by: Jingoo Han Cc: Tomi Valkeinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/jornada720_lcd.c b/drivers/video/backlight/jornada720_lcd.c index b061413..da3876c 100644 --- a/drivers/video/backlight/jornada720_lcd.c +++ b/drivers/video/backlight/jornada720_lcd.c @@ -100,7 +100,8 @@ static int jornada_lcd_probe(struct platform_device *pdev) struct lcd_device *lcd_device; int ret; - lcd_device = lcd_device_register(S1D_DEVICENAME, &pdev->dev, NULL, &jornada_lcd_props); + lcd_device = devm_lcd_device_register(&pdev->dev, S1D_DEVICENAME, + &pdev->dev, NULL, &jornada_lcd_props); if (IS_ERR(lcd_device)) { ret = PTR_ERR(lcd_device); @@ -119,18 +120,8 @@ static int jornada_lcd_probe(struct platform_device *pdev) return 0; } -static int jornada_lcd_remove(struct platform_device *pdev) -{ - struct lcd_device *lcd_device = platform_get_drvdata(pdev); - - lcd_device_unregister(lcd_device); - - return 0; -} - static struct platform_driver jornada_lcd_driver = { .probe = jornada_lcd_probe, - .remove = jornada_lcd_remove, .driver = { .name = "jornada_lcd", }, -- cgit v0.10.2 From 7dd78077369ebe7db031d9de1d6b4c766b3b4f7d Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:30 -0800 Subject: backlight: l4f00242t03: use devm_lcd_device_register() Use devm_lcd_device_register() to make cleanup paths simpler. 
Signed-off-by: Jingoo Han Cc: Tomi Valkeinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/l4f00242t03.c b/drivers/video/backlight/l4f00242t03.c index b5fc13b..63e7638 100644 --- a/drivers/video/backlight/l4f00242t03.c +++ b/drivers/video/backlight/l4f00242t03.c @@ -223,8 +223,8 @@ static int l4f00242t03_probe(struct spi_device *spi) return PTR_ERR(priv->core_reg); } - priv->ld = lcd_device_register("l4f00242t03", - &spi->dev, priv, &l4f_ops); + priv->ld = devm_lcd_device_register(&spi->dev, "l4f00242t03", &spi->dev, + priv, &l4f_ops); if (IS_ERR(priv->ld)) return PTR_ERR(priv->ld); @@ -243,8 +243,6 @@ static int l4f00242t03_remove(struct spi_device *spi) struct l4f00242t03_priv *priv = spi_get_drvdata(spi); l4f00242t03_lcd_power_set(priv->ld, FB_BLANK_POWERDOWN); - lcd_device_unregister(priv->ld); - return 0; } -- cgit v0.10.2 From 0f53449d69b99f0455ecd3e3a73a38aa7555b174 Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:30 -0800 Subject: backlight: tosa: use devm_lcd_device_register() Use devm_lcd_device_register() to make cleanup paths simpler. 
Signed-off-by: Jingoo Han Cc: Tomi Valkeinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/tosa_lcd.c b/drivers/video/backlight/tosa_lcd.c index be5d636..f08d641 100644 --- a/drivers/video/backlight/tosa_lcd.c +++ b/drivers/video/backlight/tosa_lcd.c @@ -206,8 +206,8 @@ static int tosa_lcd_probe(struct spi_device *spi) tosa_lcd_tg_on(data); - data->lcd = lcd_device_register("tosa-lcd", &spi->dev, data, - &tosa_lcd_ops); + data->lcd = devm_lcd_device_register(&spi->dev, "tosa-lcd", &spi->dev, + data, &tosa_lcd_ops); if (IS_ERR(data->lcd)) { ret = PTR_ERR(data->lcd); @@ -226,8 +226,6 @@ static int tosa_lcd_remove(struct spi_device *spi) { struct tosa_lcd_data *data = spi_get_drvdata(spi); - lcd_device_unregister(data->lcd); - if (data->i2c) i2c_unregister_device(data->i2c); -- cgit v0.10.2 From 81f5cdc1b2a48d466bc53f38e843a7981a4e5e0c Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:31 -0800 Subject: backlight: kb3886_bl: fix incorrect placement of __initdata marker The __initdata marker can be virtually anywhere on the line, EXCEPT right after "struct". The preferred location is before the "=" sign if there is one, or before the trailing ";" otherwise. It also fixes the following checkpatch warning.
WARNING: __initdata should be placed after kb3886bl_device_table[] Signed-off-by: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/kb3886_bl.c b/drivers/video/backlight/kb3886_bl.c index 7592cc2..84a110a 100644 --- a/drivers/video/backlight/kb3886_bl.c +++ b/drivers/video/backlight/kb3886_bl.c @@ -78,7 +78,7 @@ static struct kb3886bl_machinfo *bl_machinfo; static unsigned long kb3886bl_flags; #define KB3886BL_SUSPENDED 0x01 -static struct dmi_system_id __initdata kb3886bl_device_table[] = { +static struct dmi_system_id kb3886bl_device_table[] __initdata = { { .ident = "Sahara Touch-iT", .matches = { -- cgit v0.10.2 From 2ce2386072854f25f03052f819d5fd11ddc75f6c Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:32 -0800 Subject: backlight: lp855x: remove unnecessary parentheses Remove unnecessary parentheses in order to fix the following checkpatch error. ERROR: return is not a function, parentheses are not required Signed-off-by: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/lp855x_bl.c b/drivers/video/backlight/lp855x_bl.c index cae80d5..2ca3a04 100644 --- a/drivers/video/backlight/lp855x_bl.c +++ b/drivers/video/backlight/lp855x_bl.c @@ -125,7 +125,7 @@ static bool lp855x_is_valid_rom_area(struct lp855x *lp, u8 addr) return false; } - return (addr >= start && addr <= end); + return addr >= start && addr <= end; } static int lp8557_bl_off(struct lp855x *lp) -- cgit v0.10.2 From 68585c41c9863688bff81fd69e37a2f585d474f9 Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:54:33 -0800 Subject: backlight: lp8788: remove unnecessary parentheses Remove unnecessary parentheses in order to fix the following checkpatch error. 
ERROR: return is not a function, parentheses are not required Signed-off-by: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/video/backlight/lp8788_bl.c b/drivers/video/backlight/lp8788_bl.c index e49905d..daba34d 100644 --- a/drivers/video/backlight/lp8788_bl.c +++ b/drivers/video/backlight/lp8788_bl.c @@ -63,13 +63,13 @@ static struct lp8788_bl_config default_bl_config = { static inline bool is_brightness_ctrl_by_pwm(enum lp8788_bl_ctrl_mode mode) { - return (mode == LP8788_BL_COMB_PWM_BASED); + return mode == LP8788_BL_COMB_PWM_BASED; } static inline bool is_brightness_ctrl_by_register(enum lp8788_bl_ctrl_mode mode) { - return (mode == LP8788_BL_REGISTER_ONLY || - mode == LP8788_BL_COMB_REGISTER_BASED); + return mode == LP8788_BL_REGISTER_ONLY || + mode == LP8788_BL_COMB_REGISTER_BASED; } static int lp8788_backlight_configure(struct lp8788_bl *bl) -- cgit v0.10.2 From ae2924a2bdc5255745e68f2b9206404ddadfc5bf Mon Sep 17 00:00:00 2001 From: Felipe Contreras Date: Thu, 23 Jan 2014 15:54:34 -0800 Subject: lib/kstrtox.c: remove redundant cleanup We can't reach the cleanup code unless the flag KSTRTOX_OVERFLOW is not set, so there's no point in clearing a bit that we know is not set.
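The argument can be checked in isolation. The sketch below is a userspace illustration rather than the kernel source: `FLAG_OVERFLOW` and `mask_is_noop()` are hypothetical stand-ins for KSTRTOX_OVERFLOW and the control flow in _kstrtoull(), and the flag's bit position is an assumption.

```c
/* Hypothetical stand-in for KSTRTOX_OVERFLOW; the real value may differ. */
#define FLAG_OVERFLOW (1U << 31)

/*
 * Mirrors the control flow around the removed line: once the overflow
 * case has returned early, masking the flag off cannot change rv.
 * Returns -1 when the early return is taken, 1 if the mask left rv
 * unchanged, and 0 if it changed rv.
 */
int mask_is_noop(unsigned int rv)
{
	unsigned int before;

	if (rv & FLAG_OVERFLOW)
		return -1;		/* early return, like -ERANGE */

	before = rv;
	rv &= ~FLAG_OVERFLOW;		/* the statement the patch removes */
	return rv == before;
}
```

For any input that gets past the early return, the function cannot return 0, which is the sense in which the cleanup was redundant.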
Signed-off-by: Felipe Contreras Acked-by: Levente Kurusa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/kstrtox.c b/lib/kstrtox.c index f78ae0c..ec8da78 100644 --- a/lib/kstrtox.c +++ b/lib/kstrtox.c @@ -92,7 +92,6 @@ static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res) rv = _parse_integer(s, base, &_res); if (rv & KSTRTOX_OVERFLOW) return -ERANGE; - rv &= ~KSTRTOX_OVERFLOW; if (rv == 0) return -EINVAL; s += rv; -- cgit v0.10.2 From 9fd4305448a4639deade433893c5233a324df3a2 Mon Sep 17 00:00:00 2001 From: Felipe Contreras Date: Thu, 23 Jan 2014 15:54:35 -0800 Subject: lib/cmdline.c: fix style issues WARNING: space prohibited between function name and open parenthesis '(' +int get_option (char **str, int *pint) WARNING: space prohibited between function name and open parenthesis '(' + *pint = simple_strtol (cur, str, 0); ERROR: trailing whitespace + $ WARNING: please, no spaces at the start of a line + $ WARNING: space prohibited between function name and open parenthesis '(' + res = get_option ((char **)&str, ints + i); Signed-off-by: Felipe Contreras Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/cmdline.c b/lib/cmdline.c index eb67911..5466333 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -49,13 +49,13 @@ static int get_range(char **str, int *pint) * 3 - hyphen found to denote a range */ -int get_option (char **str, int *pint) +int get_option(char **str, int *pint) { char *cur = *str; if (!cur || !(*cur)) return 0; - *pint = simple_strtol (cur, str, 0); + *pint = simple_strtol(cur, str, 0); if (cur == *str) return 0; if (**str == ',') { @@ -84,13 +84,13 @@ int get_option (char **str, int *pint) * the parse to end (typically a null terminator, if @str is * completely parseable). 
*/ - + char *get_options(const char *str, int nints, int *ints) { int res, i = 1; while (i < nints) { - res = get_option ((char **)&str, ints + i); + res = get_option((char **)&str, ints + i); if (res == 0) break; if (res == 3) { @@ -153,7 +153,6 @@ unsigned long long memparse(const char *ptr, char **retptr) return ret; } - EXPORT_SYMBOL(memparse); EXPORT_SYMBOL(get_option); EXPORT_SYMBOL(get_options); -- cgit v0.10.2 From ff6f9bbb582c1cb00cbe7ecd96bcde229fd336f7 Mon Sep 17 00:00:00 2001 From: Felipe Contreras Date: Thu, 23 Jan 2014 15:54:36 -0800 Subject: lib/cmdline.c: declare exported symbols immediately WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable +EXPORT_SYMBOL(memparse); WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable +EXPORT_SYMBOL(get_option); WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable +EXPORT_SYMBOL(get_options); Signed-off-by: Felipe Contreras Cc: Levente Kurusa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/cmdline.c b/lib/cmdline.c index 5466333..d4932f7 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -67,6 +67,7 @@ int get_option(char **str, int *pint) return 1; } +EXPORT_SYMBOL(get_option); /** * get_options - Parse a string into a list of integers @@ -112,6 +113,7 @@ char *get_options(const char *str, int nints, int *ints) ints[0] = i - 1; return (char *)str; } +EXPORT_SYMBOL(get_options); /** * memparse - parse a string with mem suffixes into a number @@ -152,7 +154,4 @@ unsigned long long memparse(const char *ptr, char **retptr) return ret; } - EXPORT_SYMBOL(memparse); -EXPORT_SYMBOL(get_option); -EXPORT_SYMBOL(get_options); -- cgit v0.10.2 From 93e9ef83f40603535ffe6b60498149e75f33aa8f Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 23 Jan 2014 15:54:37 -0800 Subject: test: add minimal module for verification testing This is a pair of test modules I'd like to see in the tree. 
These don't make sense in lkdtm, where I've been adding various tests that trigger crashes, since they either need to be distinctly separate or their pass/fail state doesn't need to crash the machine. These live in lib/ for now, along with a few other in-kernel test modules, and use the slightly more common "test_" naming convention, instead of "test-". We should likely standardize on the former: $ find . -name 'test_*.c' | grep -v /tools/ | wc -l 4 $ find . -name 'test-*.c' | grep -v /tools/ | wc -l 2 The first is entirely a no-op module, designed to allow simple testing of the module loading and verification interface. It's useful to have a module that has no other uses or dependencies so it can be reliably used for just testing module loading and verification. The second is a module that exercises the user memory access functions, in an effort to make sure that we can quickly catch any regressions in boundary checking (e.g. what was recently fixed on ARM). This patch (of 2): When doing module loading verification tests (for example, with module signing, or LSM hooks), it is very handy to have a module that can be built on all systems under test, isn't auto-loaded at boot, and has no device or similar dependencies. This creates the "test_module.ko" module for that purpose, which only reports its load and unload to printk. Signed-off-by: Kees Cook Acked-by: Rusty Russell Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 900b63c..7e37a36 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1595,6 +1595,20 @@ config DMA_API_DEBUG If unsure, say N. +config TEST_MODULE + tristate "Test module loading with 'hello world' module" + default n + depends on m + help + This builds the "test_module" module that emits "Hello, world" + on printk when loaded.
It is designed to be used for basic + evaluation of the module loading subsystem (for example when + validating module verification). It lacks any extra dependencies, + and will not normally be loaded by the system unless explicitly + requested by name. + + If unsure, say N. + source "samples/Kconfig" source "lib/Kconfig.kgdb" diff --git a/lib/Makefile b/lib/Makefile index a459c31..b494b9a 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -31,6 +31,7 @@ obj-y += string_helpers.o obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o obj-y += kstrtox.o obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o +obj-$(CONFIG_TEST_MODULE) += test_module.o ifeq ($(CONFIG_DEBUG_KOBJECT),y) CFLAGS_kobject.o += -DDEBUG diff --git a/lib/test_module.c b/lib/test_module.c new file mode 100644 index 0000000..319b66f --- /dev/null +++ b/lib/test_module.c @@ -0,0 +1,33 @@ +/* + * This module emits "Hello, world" on printk when loaded. + * + * It is designed to be used for basic evaluation of the module loading + * subsystem (for example when validating module signing/verification). It + * lacks any extra dependencies, and will not normally be loaded by the + * system unless explicitly requested by name. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include + +static int __init test_module_init(void) +{ + pr_warn("Hello, world\n"); + + return 0; +} + +module_init(test_module_init); + +static void __exit test_module_exit(void) +{ + pr_warn("Goodbye\n"); +} + +module_exit(test_module_exit); + +MODULE_AUTHOR("Kees Cook "); +MODULE_LICENSE("GPL"); -- cgit v0.10.2 From 3e2a4c183ace8708c69f589505fb82bb63010ade Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 23 Jan 2014 15:54:38 -0800 Subject: test: check copy_to/from_user boundary validation To help avoid an architecture failing to correctly check kernel/user boundaries when handling copy_to_user, copy_from_user, put_user, or get_user, perform some simple tests and fail to load if any of them behave unexpectedly. 
Specifically, this is to make sure there is a way to notice if things like what was fixed in commit 8404663f81d2 ("ARM: 7527/1: uaccess: explicitly check __user pointer when !CPU_USE_DOMAINS") ever regresses again, for any architecture. Additionally, adds new "user" selftest target, which loads this module. Signed-off-by: Kees Cook Cc: Rusty Russell Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 7e37a36..e0e2eeb 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1609,6 +1609,19 @@ config TEST_MODULE If unsure, say N. +config TEST_USER_COPY + tristate "Test user/kernel boundary protections" + default n + depends on m + help + This builds the "test_user_copy" module that runs sanity checks + on the copy_to/from_user infrastructure, making sure basic + user/kernel boundary testing is working. If it fails to load, + a regression has been detected in the user/kernel memory boundary + protections. + + If unsure, say N. + source "samples/Kconfig" source "lib/Kconfig.kgdb" diff --git a/lib/Makefile b/lib/Makefile index b494b9a..98ec3b8 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -32,6 +32,7 @@ obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o obj-y += kstrtox.o obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o obj-$(CONFIG_TEST_MODULE) += test_module.o +obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o ifeq ($(CONFIG_DEBUG_KOBJECT),y) CFLAGS_kobject.o += -DDEBUG diff --git a/lib/test_user_copy.c b/lib/test_user_copy.c new file mode 100644 index 0000000..0ecef3e --- /dev/null +++ b/lib/test_user_copy.c @@ -0,0 +1,110 @@ +/* + * Kernel module for testing copy_to/from_user infrastructure. + * + * Copyright 2013 Google Inc. 
All Rights Reserved + * + * Authors: + * Kees Cook + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include + +#define test(condition, msg) \ +({ \ + int cond = (condition); \ + if (cond) \ + pr_warn("%s\n", msg); \ + cond; \ +}) + +static int __init test_user_copy_init(void) +{ + int ret = 0; + char *kmem; + char __user *usermem; + char *bad_usermem; + unsigned long user_addr; + unsigned long value = 0x5A; + + kmem = kmalloc(PAGE_SIZE * 2, GFP_KERNEL); + if (!kmem) + return -ENOMEM; + + user_addr = vm_mmap(NULL, 0, PAGE_SIZE * 2, + PROT_READ | PROT_WRITE | PROT_EXEC, + MAP_ANONYMOUS | MAP_PRIVATE, 0); + if (user_addr >= (unsigned long)(TASK_SIZE)) { + pr_warn("Failed to allocate user memory\n"); + kfree(kmem); + return -ENOMEM; + } + + usermem = (char __user *)user_addr; + bad_usermem = (char *)user_addr; + + /* Legitimate usage: none of these should fail. */ + ret |= test(copy_from_user(kmem, usermem, PAGE_SIZE), + "legitimate copy_from_user failed"); + ret |= test(copy_to_user(usermem, kmem, PAGE_SIZE), + "legitimate copy_to_user failed"); + ret |= test(get_user(value, (unsigned long __user *)usermem), + "legitimate get_user failed"); + ret |= test(put_user(value, (unsigned long __user *)usermem), + "legitimate put_user failed"); + + /* Invalid usage: none of these should succeed. 
*/ + ret |= test(!copy_from_user(kmem, (char __user *)(kmem + PAGE_SIZE), + PAGE_SIZE), + "illegal all-kernel copy_from_user passed"); + ret |= test(!copy_from_user(bad_usermem, (char __user *)kmem, + PAGE_SIZE), + "illegal reversed copy_from_user passed"); + ret |= test(!copy_to_user((char __user *)kmem, kmem + PAGE_SIZE, + PAGE_SIZE), + "illegal all-kernel copy_to_user passed"); + ret |= test(!copy_to_user((char __user *)kmem, bad_usermem, + PAGE_SIZE), + "illegal reversed copy_to_user passed"); + ret |= test(!get_user(value, (unsigned long __user *)kmem), + "illegal get_user passed"); + ret |= test(!put_user(value, (unsigned long __user *)kmem), + "illegal put_user passed"); + + vm_munmap(user_addr, PAGE_SIZE * 2); + kfree(kmem); + + if (ret == 0) { + pr_info("tests passed.\n"); + return 0; + } + + return -EINVAL; +} + +module_init(test_user_copy_init); + +static void __exit test_user_copy_exit(void) +{ + pr_info("unloaded.\n"); +} + +module_exit(test_user_copy_exit); + +MODULE_AUTHOR("Kees Cook "); +MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 9f3eae2..32487ed 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -9,6 +9,7 @@ TARGETS += ptrace TARGETS += timers TARGETS += vm TARGETS += powerpc +TARGETS += user all: for TARGET in $(TARGETS); do \ diff --git a/tools/testing/selftests/user/Makefile b/tools/testing/selftests/user/Makefile new file mode 100644 index 0000000..396255b --- /dev/null +++ b/tools/testing/selftests/user/Makefile @@ -0,0 +1,13 @@ +# Makefile for user memory selftests + +# No binaries, but make sure arg-less "make" doesn't trigger "run_tests" +all: + +run_tests: all + @if /sbin/modprobe test_user_copy ; then \ + rmmod test_user_copy; \ + echo "user_copy: ok"; \ + else \ + echo "user_copy: [FAIL]"; \ + exit 1; \ + fi -- cgit v0.10.2 From cf0744021c5d5de54d2c66e2020c6de2fe800264 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 23 Jan 
2014 15:54:39 -0800 Subject: firmware/dmi_scan: generalize for use by other archs This patch makes a couple of changes to the SMBIOS/DMI scanning code so it can be used on other archs (such as ARM and arm64): (a) wrap the calls to ioremap()/iounmap(), this allows the use of a flavor of ioremap() more suitable for random unaligned access; (b) allow the non-EFI fallback probe into hardcoded physical address 0xF0000 to be disabled. Signed-off-by: Ard Biesheuvel Acked-by: Grant Likely Cc: Ingo Molnar Cc "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index a8c3a11..c063b05 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -104,6 +104,7 @@ config HAVE_SETUP_PER_CPU_AREA config DMI bool default y + select DMI_SCAN_MACHINE_NON_EFI_FALLBACK config EFI bool diff --git a/arch/ia64/include/asm/dmi.h b/arch/ia64/include/asm/dmi.h index 185d3d1..f365a61 100644 --- a/arch/ia64/include/asm/dmi.h +++ b/arch/ia64/include/asm/dmi.h @@ -5,8 +5,10 @@ #include /* Use normal IO mappings for DMI */ -#define dmi_ioremap ioremap -#define dmi_iounmap(x,l) iounmap(x) -#define dmi_alloc(l) kzalloc(l, GFP_ATOMIC) +#define dmi_early_remap ioremap +#define dmi_early_unmap(x, l) iounmap(x) +#define dmi_remap ioremap +#define dmi_unmap iounmap +#define dmi_alloc(l) kzalloc(l, GFP_ATOMIC) #endif diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d3b9186..5aadc49 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -731,6 +731,7 @@ config APB_TIMER # The code disables itself when not needed. config DMI default y + select DMI_SCAN_MACHINE_NON_EFI_FALLBACK bool "Enable DMI scanning" if EXPERT ---help--- Enabled scanning of DMI to identify machine quirks. 
Say Y diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h index fd8f9e2..535192f 100644 --- a/arch/x86/include/asm/dmi.h +++ b/arch/x86/include/asm/dmi.h @@ -13,7 +13,9 @@ static __always_inline __init void *dmi_alloc(unsigned len) } /* Use early IO mappings for DMI because it's initialized early */ -#define dmi_ioremap early_ioremap -#define dmi_iounmap early_iounmap +#define dmi_early_remap early_ioremap +#define dmi_early_unmap early_iounmap +#define dmi_remap ioremap +#define dmi_unmap iounmap #endif /* _ASM_X86_DMI_H */ diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig index 0747872..5a29fac 100644 --- a/drivers/firmware/Kconfig +++ b/drivers/firmware/Kconfig @@ -108,6 +108,9 @@ config DMI_SYSFS under /sys/firmware/dmi when this option is enabled and loaded. +config DMI_SCAN_MACHINE_NON_EFI_FALLBACK + bool + config ISCSI_IBFT_FIND bool "iSCSI Boot Firmware Table Attributes" depends on X86 diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c index c7e81ff..17afc51 100644 --- a/drivers/firmware/dmi_scan.c +++ b/drivers/firmware/dmi_scan.c @@ -116,7 +116,7 @@ static int __init dmi_walk_early(void (*decode)(const struct dmi_header *, { u8 *buf; - buf = dmi_ioremap(dmi_base, dmi_len); + buf = dmi_early_remap(dmi_base, dmi_len); if (buf == NULL) return -1; @@ -124,7 +124,7 @@ static int __init dmi_walk_early(void (*decode)(const struct dmi_header *, add_device_randomness(buf, dmi_len); - dmi_iounmap(buf, dmi_len); + dmi_early_unmap(buf, dmi_len); return 0; } @@ -527,18 +527,18 @@ void __init dmi_scan_machine(void) * needed during early boot. This also means we can * iounmap the space when we're done with it. 
*/ - p = dmi_ioremap(efi.smbios, 32); + p = dmi_early_remap(efi.smbios, 32); if (p == NULL) goto error; memcpy_fromio(buf, p, 32); - dmi_iounmap(p, 32); + dmi_early_unmap(p, 32); if (!dmi_present(buf)) { dmi_available = 1; goto out; } - } else { - p = dmi_ioremap(0xF0000, 0x10000); + } else if (IS_ENABLED(CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK)) { + p = dmi_early_remap(0xF0000, 0x10000); if (p == NULL) goto error; @@ -554,12 +554,12 @@ void __init dmi_scan_machine(void) memcpy_fromio(buf + 16, q, 16); if (!dmi_present(buf)) { dmi_available = 1; - dmi_iounmap(p, 0x10000); + dmi_early_unmap(p, 0x10000); goto out; } memcpy(buf, buf + 16, 16); } - dmi_iounmap(p, 0x10000); + dmi_early_unmap(p, 0x10000); } error: pr_info("DMI not present or invalid.\n"); @@ -831,13 +831,13 @@ int dmi_walk(void (*decode)(const struct dmi_header *, void *), if (!dmi_available) return -1; - buf = ioremap(dmi_base, dmi_len); + buf = dmi_remap(dmi_base, dmi_len); if (buf == NULL) return -1; dmi_table(buf, dmi_len, dmi_num, decode, private_data); - iounmap(buf); + dmi_unmap(buf); return 0; } EXPORT_SYMBOL_GPL(dmi_walk); -- cgit v0.10.2 From 8c5fcd24a9ea608286816a1508c067c8a512af78 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:40 -0800 Subject: checkpatch: more comprehensive split strings warning The current checkpatch test for split strings does not find several cases that should be found. For instance: /* Else poor success; go back to mode in "active" table */ } else { IWL_DEBUG_RATE(mvm, - "LQ: GOING BACK TO THE OLD TABLE suc=%d cur-tpt=%d old-tpt=%d\n", + "GOING BACK TO THE OLD TABLE: SR %d " + "cur-tpt %d old-tpt %d\n", window->success_ratio, window->average_tpt, lq_sta->last_tpt); does not currently emit a warning. Improve the test to find these cases. Add more exceptions to reduce false positives for assembly and octal/hex string constants. 
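Why a split string is worth a warning at all: adjacent string literals are concatenated by the compiler, so both spellings behave identically at run time, but only the one-line form can be found by grepping the source for the message text. A minimal sketch, with `split_msg`, `joined_msg` and `same_message()` as hypothetical names chosen for this illustration:

```c
#include <string.h>

/*
 * Split across lines: compiles fine, but grepping the source for the
 * full message text finds nothing.
 */
static const char split_msg[] = "GOING BACK TO THE OLD TABLE: SR %d "
				"cur-tpt %d old-tpt %d\n";

/* Kept on one line: longer than 80 columns, but greppable. */
static const char joined_msg[] = "GOING BACK TO THE OLD TABLE: SR %d cur-tpt %d old-tpt %d\n";

/* Returns 1 when both spellings yield the same string at run time. */
int same_message(void)
{
	return strcmp(split_msg, joined_msg) == 0;
}
```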
Signed-off-by: Joe Perches Reviewed-by: Josh Triplett Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 9fb30b1..59fa00e 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2049,16 +2049,12 @@ sub process { } # Check for user-visible strings broken across lines, which breaks the ability -# to grep for the string. Limited to strings used as parameters (those -# following an open parenthesis), which almost completely eliminates false -# positives, as well as warning only once per parameter rather than once per -# line of the string. Make an exception when the previous string ends in a -# newline (multiple lines in one string constant) or \n\t (common in inline -# assembly to indent the instruction on the following line). +# to grep for the string. Make exceptions when the previous string ends in a +# newline (multiple lines in one string constant) or '\t', '\r', ';', or '{' +# (common in inline assembly) or is a octal \123 or hexadecimal \xaf value if ($line =~ /^\+\s*"/ && $prevline =~ /"\s*$/ && - $prevline =~ /\(/ && - $prevrawline !~ /\\n(?:\\t)*"\s*$/) { + $prevrawline !~ /(?:\\(?:[ntr]|[0-7]{1,3}|x[0-9a-fA-F]{1,2})|;\s*|\{\s*)"\s*$/) { WARN("SPLIT_STRING", "quoted string split across lines\n" . $hereprev); } -- cgit v0.10.2 From d2e248e7b0068b940f3ca1fc26da603536a533db Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:41 -0800 Subject: checkpatch: warn only on "space before semicolon" at end of line The "space before a non-naked semicolon" test has unwanted output when used in "for ( ;; )" loops. Make the test work only on end-of-line statement termination semicolons. 
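The effect of anchoring the test to end of line can be tried with POSIX regular expressions. This is only an approximation: `OLD_TEST` and `NEW_TEST` are hand translations of the Perl patterns in the diff below into POSIX ERE (with `[[:space:]]` standing in for `\s`), and `line_matches()` is a hypothetical helper, not part of checkpatch.

```c
#include <regex.h>
#include <stddef.h>

/* Approximate POSIX ERE translations of the old and new Perl tests. */
#define OLD_TEST "^\\+.*[^[:space:]][[:space:]]+;"
#define NEW_TEST "^\\+.*[^[:space:]][[:space:]]+;[[:space:]]*$"

/*
 * Returns 1 if line matches pattern, 0 if it does not, and -1 if the
 * pattern fails to compile.
 */
int line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With these translations, an added line such as "+	for ( ;; ) {" matches the old pattern but not the new one, while "+	foo() ;" still matches the new one.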
Signed-off-by: Joe Perches Cc: Dan Carpenter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 59fa00e..8efce59 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -3121,7 +3121,7 @@ sub process { } # check for whitespace before a non-naked semicolon - if ($line =~ /^\+.*\S\s+;/) { + if ($line =~ /^\+.*\S\s+;\s*$/) { if (WARN("SPACING", "space prohibited before semicolon\n" . $herecurr) && $fix) { -- cgit v0.10.2 From 7e4915e78992ebd3cc031051dc23063bbf29e749 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Thu, 23 Jan 2014 15:54:42 -0800 Subject: checkpatch: add warning of future __GFP_NOFAIL use gfp.h and page_alloc.c already specify that __GFP_NOFAIL is deprecated and no new users should be added. Add a warning to checkpatch to catch this. Signed-off-by: David Rientjes Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 8efce59..9bb4a421 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -4113,6 +4113,12 @@ sub process { "$1 uses number as first arg, sizeof is generally wrong\n" . $herecurr); } +# check for GFP_NOWAIT use + if ($line =~ /\b__GFP_NOFAIL\b/) { + WARN("__GFP_NOFAIL", + "Use of __GFP_NOFAIL is deprecated, no new users should be added\n" . $herecurr); + } + # check for multiple semicolons if ($line =~ /;\s*;\s*$/) { if (WARN("ONE_SEMICOLON", -- cgit v0.10.2 From c34c09a8451fac8555cbf0e8df1f6cf31cf1360b Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:43 -0800 Subject: checkpatch: attempt to find missing switch/case break; switch case statements missing a break statement are an unfortunately common error. e.g.: commit 4a2c94c9b6c0 ("HID: kye: Add report fixup for Genius Manticore Keyboard") case blocks should end in a break/return/goto/continue. If a fall-through is used, it should have a comment showing that it is intentional. 
Ideally that comment should be something like: "/* fall-through */" Add a test to look for missing break statements. This looks only at the context lines before an inserted case so it's possible to have false positives when the context contains a close brace and the break is before the brace and not part of the patch context. Looking at recent patches, this is a pretty rare occurrence. The normal kernel style uses a break as the last line of the previous block. Signed-off-by: Joe Perches Cc: Andy Whitcroft Cc: Jiri Kosina Cc: Benjamin Tissoires Cc: Dave Jones Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 9bb4a421..260b324 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -4128,6 +4128,31 @@ sub process { } } +# check for case / default statements not preceeded by break/fallthrough/switch + if ($line =~ /^.\s*(?:case\s+(?:$Ident|$Constant)\s*|default):/) { + my $has_break = 0; + my $has_statement = 0; + my $count = 0; + my $prevline = $linenr; + while ($prevline > 1 && $count < 3 && !$has_break) { + $prevline--; + my $rline = $rawlines[$prevline - 1]; + my $fline = $lines[$prevline - 1]; + last if ($fline =~ /^\@\@/); + next if ($fline =~ /^\-/); + next if ($fline =~ /^.(?:\s*(?:case\s+(?:$Ident|$Constant)[\s$;]*|default):[\s$;]*)*$/); + $has_break = 1 if ($rline =~ /fall[\s_-]*(through|thru)/i); + next if ($fline =~ /^.[\s$;]*$/); + $has_statement = 1; + $count++; + $has_break = 1 if ($fline =~ /\bswitch\b|\b(?:break\s*;[\s$;]*$|return\b|goto\b|continue\b)/); + } + if (!$has_break && $has_statement) { + WARN("MISSING_BREAK", + "Possible switch case/default not preceeded by break or fallthrough comment\n" . 
$herecurr); + } + } + # check for switch/default statements without a break; if ($^V && $^V ge 5.10.0 && defined $stat && -- cgit v0.10.2 From 9624b8d65cd1e9a6415a81a6588e423b1d8c2282 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:44 -0800 Subject: checkpatch: add a --fix-inplace option Add the ability to fix and overwrite existing files/patches instead of creating a new file ".EXPERIMENTAL-checkpatch-fixes". Suggested-by: Manfred Spraul Signed-off-by: Joe Perches Reviewed-by: Josh Triplett Cc: Andy Whitcroft Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 260b324..93f8507 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -29,6 +29,7 @@ my $mailback = 0; my $summary_file = 0; my $show_types = 0; my $fix = 0; +my $fix_inplace = 0; my $root; my %debug; my %camelcase = (); @@ -76,6 +77,9 @@ Options: ".EXPERIMENTAL-checkpatch-fixes" with potential errors corrected to the preferred checkpatch style + --fix-inplace EXPERIMENTAL - may create horrible results + Is the same as --fix, but overwrites the input + file. It's your fault if there's no backup or git --ignore-perl-version override checking of perl version. expect runtime errors. -h, --help, --version display this help and exit @@ -131,6 +135,7 @@ GetOptions( 'mailback!' => \$mailback, 'summary-file!' => \$summary_file, 'fix!' => \$fix, + 'fix-inplace!' => \$fix_inplace, 'ignore-perl-version!' => \$ignore_perl_version, 'debug=s' => \%debug, 'test-only=s' => \$tst_only, @@ -140,6 +145,8 @@ GetOptions( help(0) if ($help); +$fix = 1 if ($fix_inplace); + my $exit = 0; if ($^V && $^V lt $minimum_perl_version) { @@ -4388,7 +4395,8 @@ sub process { hash_show_words(\%ignore_type, "Ignored"); if ($clean == 0 && $fix && "@rawlines" ne "@fixed") { - my $newfile = $filename . 
".EXPERIMENTAL-checkpatch-fixes"; + my $newfile = $filename; + $newfile .= ".EXPERIMENTAL-checkpatch-fixes" if (!$fix_inplace); my $linecount = 0; my $f; -- cgit v0.10.2 From c76f4cb3d25e5dc84017d7e845072e9aef6037f4 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:46 -0800 Subject: checkpatch: improve space before tab --fix option This test should remove all the spaces before a tab not just one space. Substitute a tab for each 8 space block before a tab and remove less than 8 spaces before a tab. This SPACE_BEFORE_TAB test is done after CODE_INDENT. If there are spaces used at the beginning of a line that should be converted to tabs, please make sure that the CODE_INDENT test and conversion is done before this SPACE_BEFORE_TAB test and conversion. Reported-by: Manfred Spraul Signed-off-by: Joe Perches Cc: Josh Triplett Cc: Andy Whitcroft Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 93f8507..3e0b3f4 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2118,8 +2118,10 @@ sub process { if (WARN("SPACE_BEFORE_TAB", "please, no space before tabs\n" . $herevet) && $fix) { - $fixed[$linenr - 1] =~ - s/(^\+.*) +\t/$1\t/; + while ($fixed[$linenr - 1] =~ + s/(^\+.*) {8,8}+\t/$1\t\t/) {} + while ($fixed[$linenr - 1] =~ + s/(^\+.*) +\t/$1\t/) {} } } -- cgit v0.10.2 From 189248d8f4f3ac2fba30da9b40133b5891df95fc Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:47 -0800 Subject: checkpatch: check for if's with unnecessary parentheses If statements don't need multiple parentheses around tested comparisons like "if ((foo == bar))". An == comparison maybe a sign of an intended assignment, so emit a slightly different message if so. 
Signed-off-by: Joe Perches Reviewed-by: Josh Triplett Cc: Manfred Spraul Cc: Andy Whitcroft Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 3e0b3f4..57f10db 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -3254,6 +3254,20 @@ sub process { } } +# if statements using unnecessary parentheses - ie: if ((foo == bar)) + if ($^V && $^V ge 5.10.0 && + $line =~ /\bif\s*((?:\(\s*){2,})/) { + my $openparens = $1; + my $count = $openparens =~ tr@\(@\(@; + my $msg = ""; + if ($line =~ /\bif\s*(?:\(\s*){$count,$count}$LvalOrFunc\s*($Compare)\s*$LvalOrFunc(?:\s*\)){$count,$count}/) { + my $comp = $4; #Not $1 because of $LvalOrFunc + $msg = " - maybe == should be = ?" if ($comp eq "=="); + WARN("UNNECESSARY_PARENTHESES", + "Unnecessary parentheses$msg\n" . $herecurr); + } + } + # Return of what appears to be an errno should normally be -'ve if ($line =~ /^.\s*return\s*(E[A-Z]*)\s*;/) { my $name = $1; -- cgit v0.10.2 From 3e2232f2d03ffa531e31662c447496ec2552d85b Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:48 -0800 Subject: checkpatch: update the FSF/GPL address check The FSF address check is a bit too verbose looking for the GPL text. Quiet it a bit by requiring --strict for the GPL bit. Also make the address tests match a few uses of abbreviations for street names and make it case insensitive. Signed-off-by: Joe Perches Reviewed-by: Josh Triplett Cc: Manfred Spraul Cc: Andy Whitcroft Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 57f10db..82fd120 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -1970,15 +1970,16 @@ sub process { } # Check for FSF mailing addresses. 
- if ($rawline =~ /You should have received a copy/ || - $rawline =~ /write to the Free Software/ || - $rawline =~ /59 Temple Place/ || - $rawline =~ /51 Franklin Street/) { + if ($rawline =~ /\bYou should have received a copy/i || + $rawline =~ /\bwrite to the Free/i || + $rawline =~ /\b59\s+Temple\s+Pl/i || + $rawline =~ /\b51\s+Franklin\s+St/i) { my $herevet = "$here\n" . cat_vet($rawline) . "\n"; my $msg_type = \&ERROR; $msg_type = \&CHK if ($file); + $msg_type = \&CHK if ($rawline =~ /\bYou should have received a copy/i); &{$msg_type}("FSF_MAILING_ADDRESS", - "Do not include the paragraph about writing to the Free Software Foundation's mailing address from the sample GPL notice. The FSF has changed addresses in the past, and may do so again. Linux already includes a copy of the GPL.\n" . $herevet) + "Do not include the paragraph about writing to the Free Software Foundation's mailing address from the sample GPL notice. The FSF has changed addresses in the past, and may do so again. Linux already includes a copy of the GPL.\n" . $herevet) } # check for Kconfig help text having a real description -- cgit v0.10.2 From 31070b5d4490c6c876e0d3b093e5d5b05e4027fa Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:49 -0800 Subject: checkpatch: add tests for function pointer style misuses Kernel style uses function pointers in this form: "type (*funcptr)(args...)" Emit warnings when this function pointer form isn't used. 
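For illustration only (this is not part of the patch, and the names demo_ops and add_one are made up), the canonical form and the spacing variants the new checks are expected to warn about can be sketched as:

```c
/* Hypothetical callback used only for this illustration. */
static int add_one(int x)
{
	return x + 1;
}

/*
 * Canonical kernel style: "type (*funcptr)(args...)".
 *
 * Variants the new SPACING checks are expected to warn about:
 *   int(*op)(int);      missing space after the return type
 *   int  (*op)(int);    multiple spaces after the return type
 *   int ( *op)(int);    space after the open parenthesis
 *   int (* op)(int);    space before the pointer name
 *   int (*op )(int);    space after the pointer name
 *   int (*op) (int);    space before the arguments
 */
struct demo_ops {
	int (*op)(int);
};
```

With --fix, the patch rewrites any of the variants into the canonical "int (*op)(int);" form.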
Signed-off-by: Joe Perches Cc: Andy Whitcroft Cc: Derek Perrin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 82fd120..19e8e86 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2811,6 +2811,65 @@ sub process { } } +# Function pointer declarations +# check spacing between type, funcptr, and args +# canonical declaration is "type (*funcptr)(args...)" +# +# the $Declare variable will capture all spaces after the type +# so check it for trailing missing spaces or multiple spaces + if ($line =~ /^.\s*($Declare)\((\s*)\*(\s*)$Ident(\s*)\)(\s*)\(/) { + my $declare = $1; + my $pre_pointer_space = $2; + my $post_pointer_space = $3; + my $funcname = $4; + my $post_funcname_space = $5; + my $pre_args_space = $6; + + if ($declare !~ /\s$/) { + WARN("SPACING", + "missing space after return type\n" . $herecurr); + } + +# unnecessary space "type (*funcptr)(args...)" + elsif ($declare =~ /\s{2,}$/) { + WARN("SPACING", + "Multiple spaces after return type\n" . $herecurr); + } + +# unnecessary space "type ( *funcptr)(args...)" + if (defined $pre_pointer_space && + $pre_pointer_space =~ /^\s/) { + WARN("SPACING", + "Unnecessary space after function pointer open parenthesis\n" . $herecurr); + } + +# unnecessary space "type (* funcptr)(args...)" + if (defined $post_pointer_space && + $post_pointer_space =~ /^\s/) { + WARN("SPACING", + "Unnecessary space before function pointer name\n" . $herecurr); + } + +# unnecessary space "type (*funcptr )(args...)" + if (defined $post_funcname_space && + $post_funcname_space =~ /^\s/) { + WARN("SPACING", + "Unnecessary space after function pointer name\n" . $herecurr); + } + +# unnecessary space "type (*funcptr) (args...)" + if (defined $pre_args_space && + $pre_args_space =~ /^\s/) { + WARN("SPACING", + "Unnecessary space before function pointer arguments\n" . 
$herecurr); + } + + if (show_type("SPACING") && $fix) { + $fixed[$linenr - 1] =~ + s/^(.\s*$Declare)\(\s*\*\s*($Ident)\s*\)\s*\(/rtrim($1) . " " . "\(\*$2\)\("/ex; + } + } + # check for spacing round square brackets; allowed: # 1. with a type on the left -- int [] a; # 2. at the beginning of a line for slice initialisers -- [0...10] = 5, -- cgit v0.10.2 From 109d8cb2002dcb0fff04ff1afe8f1cec66bbdad9 Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Thu, 23 Jan 2014 15:54:50 -0800 Subject: checkpatch: only flag FSF address, not gnu.org URL This change restricts the check for the FSF address in the GPL copyright statement so that it only flags the address, not the references to the gnu.org/licenses URL which appears to be used in numerous drivers. The idea is to still allow some reference to an external copy of the GPL in the event that files are copied out of the kernel tree without the COPYING file. So for example this statement will still return an error: You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. However, this statement will not return an error after this patch: You should have received a copy of the GNU General Public License along with this program. If not, see . Signed-off-by: Alexander Duyck Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 19e8e86..d89e429 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -1970,14 +1970,12 @@ sub process { } # Check for FSF mailing addresses. - if ($rawline =~ /\bYou should have received a copy/i || - $rawline =~ /\bwrite to the Free/i || + if ($rawline =~ /\bwrite to the Free/i || $rawline =~ /\b59\s+Temple\s+Pl/i || $rawline =~ /\b51\s+Franklin\s+St/i) { my $herevet = "$here\n" . cat_vet($rawline) .
"\n"; my $msg_type = \&ERROR; $msg_type = \&CHK if ($file); - $msg_type = \&CHK if ($rawline =~ /\bYou should have received a copy/i); &{$msg_type}("FSF_MAILING_ADDRESS", "Do not include the paragraph about writing to the Free Software Foundation's mailing address from the sample GPL notice. The FSF has changed addresses in the past, and may do so again. Linux already includes a copy of the GPL.\n" . $herevet) } -- cgit v0.10.2 From bff5da4335256513497cc8c79f9a9d1665e09864 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Thu, 23 Jan 2014 15:54:51 -0800 Subject: checkpatch: add DT compatible string documentation checks This adds a simple check that any compatible strings in DeviceTree dts files are present in Documentation/devicetree/bindings. Vendor prefixes are also checked for existing in vendor-prefixes.txt These should be temporary checks until we have more sophisticated binding schema checking. Signed-off-by: Rob Herring Signed-off-by: Joe Perches Cc: Grant Likely Cc: Andy Whitcroft Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index d89e429..05c99c0 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2040,6 +2040,33 @@ sub process { "Use of $flag is deprecated, please use \`$replacement->{$flag} instead.\n" . $herecurr) if ($replacement->{$flag}); } +# check for DT compatible documentation + if (defined $root && $realfile =~ /\.dts/ && + $rawline =~ /^\+\s*compatible\s*=/) { + my @compats = $rawline =~ /\"([a-zA-Z0-9\-\,\.\+_]+)\"/g; + + foreach my $compat (@compats) { + my $compat2 = $compat; + my $dt_path = $root . "/Documentation/devicetree/bindings/"; + $compat2 =~ s/\,[a-z]*\-/\,<\.\*>\-/; + `grep -Erq "$compat|$compat2" $dt_path`; + if ( $? >> 8 ) { + WARN("UNDOCUMENTED_DT_STRING", + "DT compatible string \"$compat\" appears un-documented -- check $dt_path\n" . $herecurr); + } + + my $vendor = $compat; + my $vendor_path = $dt_path . "vendor-prefixes.txt"; + next if (! 
-f $vendor_path); + $vendor =~ s/^([a-zA-Z0-9]+)\,.*/$1/; + `grep -Eq "$vendor" $vendor_path`; + if ( $? >> 8 ) { + WARN("UNDOCUMENTED_DT_STRING", + "DT compatible string vendor \"$vendor\" appears un-documented -- check $vendor_path\n" . $herecurr); + } + } + } + # check we are in a valid source file if not then ignore this hunk next if ($realfile !~ /\.(h|c|s|S|pl|sh)$/); -- cgit v0.10.2 From 98a9bba51c6e47f69c4fa22cc39a600d2e39536c Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Thu, 23 Jan 2014 15:54:52 -0800 Subject: checkpatch: prefer ether_addr_copy to memcpy(foo, bar, ETH_ALEN) ether_addr_copy was added for kernel version 3.14. It's slightly smaller/faster for some arches. Encourage its use. Signed-off-by: Joe Perches Cc: Andy Whitcroft Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 05c99c0..1dbd6d1 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -4087,6 +4087,16 @@ sub process { } } +# Check for memcpy(foo, bar, ETH_ALEN) that could be ether_addr_copy(foo, bar) + if ($^V && $^V ge 5.10.0 && + $line =~ /^\+(?:.*?)\bmemcpy\s*\(\s*$FuncArg\s*,\s*$FuncArg\s*\,\s*ETH_ALEN\s*\)/s) { + if (WARN("PREFER_ETHER_ADDR_COPY", + "Prefer ether_addr_copy() over memcpy() if the Ethernet addresses are __aligned(2)\n" . 
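As a rough userspace sketch of why the new warning mentions __aligned(2): a 6-byte address that is 2-byte aligned can be copied as three 16-bit words instead of byte by byte. The union and function names below are hypothetical, and this is an assumption-laden illustration, not the kernel's ether_addr_copy() implementation.

```c
#include <stdint.h>

#define DEMO_ETH_ALEN 6	/* length of an Ethernet address in bytes */

/*
 * Hypothetical container: the uint16_t member forces 2-byte alignment,
 * standing in for the __aligned(2) requirement named in the warning.
 */
union demo_mac {
	uint8_t bytes[DEMO_ETH_ALEN];
	uint16_t words[DEMO_ETH_ALEN / 2];
};

/* Copy the address as three 16-bit words rather than six single bytes. */
static void demo_ether_addr_copy(union demo_mac *dst, const union demo_mac *src)
{
	dst->words[0] = src->words[0];
	dst->words[1] = src->words[1];
	dst->words[2] = src->words[2];
}
```

If the addresses cannot be guaranteed to be 2-byte aligned, memcpy(dst, src, ETH_ALEN) remains the safe choice, which is why the check is worded as a preference rather than an error.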
$herecurr) && + $fix) { + $fixed[$linenr - 1] =~ s/\bmemcpy\s*\(\s*$FuncArg\s*,\s*$FuncArg\s*\,\s*ETH_ALEN\s*\)/ether_addr_copy($2, $7)/; + } + } + # typecasts on min/max could be min_t/max_t if ($^V && $^V ge 5.10.0 && defined $stat && -- cgit v0.10.2 From 7a5f4f1cb0e7581ee7deb938d65f97145fa045f8 Mon Sep 17 00:00:00 2001 From: Todor Minchev Date: Thu, 23 Jan 2014 15:54:53 -0800 Subject: fs: binfmt_elf: remove unused defines INTERPRETER_NONE and INTERPRETER_ELF These two defines are unused since the removal of the a.out interpreter support in the ELF loader in kernel 2.6.25 Signed-off-by: Todor Minchev Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 571a423..67be295 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -543,9 +543,6 @@ out: * libraries. There is no binary dependent code anywhere else. */ -#define INTERPRETER_NONE 0 -#define INTERPRETER_ELF 2 - #ifndef STACK_RND_MASK #define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12)) /* 8MB of VA */ #endif -- cgit v0.10.2 From 0fa9aa20c33d76e98f44ff1de6e128e39a7738ca Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Thu, 23 Jan 2014 15:54:54 -0800 Subject: fs/ramfs/file-nommu.c: make ramfs_nommu_get_unmapped_area() and ramfs_nommu_mmap() static Since commit 853ac43ab194 ("shmem: unify regular and tiny shmem"), ramfs_nommu_get_unmapped_area() and ramfs_nommu_mmap() are not directly referenced outside of file-nommu.c. Thus make them static. 
Signed-off-by: Axel Lin Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c index 8d5b438..80862b1 100644 --- a/fs/ramfs/file-nommu.c +++ b/fs/ramfs/file-nommu.c @@ -27,6 +27,12 @@ #include "internal.h" static int ramfs_nommu_setattr(struct dentry *, struct iattr *); +static unsigned long ramfs_nommu_get_unmapped_area(struct file *file, + unsigned long addr, + unsigned long len, + unsigned long pgoff, + unsigned long flags); +static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma); const struct address_space_operations ramfs_aops = { .readpage = simple_readpage, @@ -197,7 +203,7 @@ static int ramfs_nommu_setattr(struct dentry *dentry, struct iattr *ia) * - the pages to be mapped must exist * - the pages be physically contiguous in sequence */ -unsigned long ramfs_nommu_get_unmapped_area(struct file *file, +static unsigned long ramfs_nommu_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { @@ -256,7 +262,7 @@ out: /* * set up a mapping for shared memory segments */ -int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma) +static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma) { if (!(vma->vm_flags & VM_SHARED)) return -ENOSYS; diff --git a/include/linux/ramfs.h b/include/linux/ramfs.h index 753207c..ecc7309 100644 --- a/include/linux/ramfs.h +++ b/include/linux/ramfs.h @@ -14,13 +14,6 @@ ramfs_nommu_expand_for_mapping(struct inode *inode, size_t newsize) } #else extern int ramfs_nommu_expand_for_mapping(struct inode *inode, size_t newsize); -extern unsigned long ramfs_nommu_get_unmapped_area(struct file *file, - unsigned long addr, - unsigned long len, - unsigned long pgoff, - unsigned long flags); - -extern int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma); #endif extern const struct file_operations ramfs_file_operations; -- cgit v0.10.2 From 
87e06aa3a7e5f9fbc2f5215c4ba9c4a42b404192 Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Thu, 23 Jan 2014 15:54:55 -0800 Subject: fs/ramfs: move ramfs_aops to inode.c ramfs_aops is identical in file-mmu.c and file-nommu.c. Thus move it to fs/ramfs/inode.c and make it static. Signed-off-by: Axel Lin Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c index 4884ac5..1e56a4e 100644 --- a/fs/ramfs/file-mmu.c +++ b/fs/ramfs/file-mmu.c @@ -30,13 +30,6 @@ #include "internal.h" -const struct address_space_operations ramfs_aops = { - .readpage = simple_readpage, - .write_begin = simple_write_begin, - .write_end = simple_write_end, - .set_page_dirty = __set_page_dirty_no_writeback, -}; - const struct file_operations ramfs_file_operations = { .read = do_sync_read, .aio_read = generic_file_aio_read, diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c index 80862b1..0b3d8e4 100644 --- a/fs/ramfs/file-nommu.c +++ b/fs/ramfs/file-nommu.c @@ -34,13 +34,6 @@ static unsigned long ramfs_nommu_get_unmapped_area(struct file *file, unsigned long flags); static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma); -const struct address_space_operations ramfs_aops = { - .readpage = simple_readpage, - .write_begin = simple_write_begin, - .write_end = simple_write_end, - .set_page_dirty = __set_page_dirty_no_writeback, -}; - const struct file_operations ramfs_file_operations = { .mmap = ramfs_nommu_mmap, .get_unmapped_area = ramfs_nommu_get_unmapped_area, diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c index 6a3e2c4..d365b1c 100644 --- a/fs/ramfs/inode.c +++ b/fs/ramfs/inode.c @@ -43,6 +43,13 @@ static const struct super_operations ramfs_ops; static const struct inode_operations ramfs_dir_inode_operations; +static const struct address_space_operations ramfs_aops = { + .readpage = simple_readpage, + .write_begin = simple_write_begin, + .write_end = simple_write_end, + .set_page_dirty = 
__set_page_dirty_no_writeback, +}; + static struct backing_dev_info ramfs_backing_dev_info = { .name = "ramfs", .ra_pages = 0, /* No readahead */ diff --git a/fs/ramfs/internal.h b/fs/ramfs/internal.h index 6b33063..a9d8ae8 100644 --- a/fs/ramfs/internal.h +++ b/fs/ramfs/internal.h @@ -10,5 +10,4 @@ */ -extern const struct address_space_operations ramfs_aops; extern const struct inode_operations ramfs_file_inode_operations; -- cgit v0.10.2 From 128e3f4541ec844c90a99320bf7d2909da4ef80b Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 23 Jan 2014 15:54:55 -0800 Subject: init/main.c: remove unused declaration of tc_init() Its user was removed in v2.5.2.4. Signed-off-by: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/init/main.c b/init/main.c index f865261..98e3537 100644 --- a/init/main.c +++ b/init/main.c @@ -99,10 +99,6 @@ extern void radix_tree_init(void); static inline void mark_rodata_ro(void) { } #endif -#ifdef CONFIG_TC -extern void tc_init(void); -#endif - /* * Debug helper: via this flag we know that we are in 'early bootup code' * where only the boot processor is running with IRQ disabled. This means -- cgit v0.10.2 From 499a4584d7f817d43d09ccfc6bb26315eeaab6bc Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Thu, 23 Jan 2014 15:54:56 -0800 Subject: init: fix possible format string bug Use constant format string in case message changes. 
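The hazard being closed here can be sketched in userspace with snprintf() standing in for panic(); format_safely is a hypothetical helper used only for this illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a message whose contents are not under
 * our control.
 *
 * Unsafe form (do not use): snprintf(buf, len, msg);
 *   any '%' sequence inside msg would be parsed as a conversion.
 *
 * Safe form: pass a constant format and hand msg over as data.
 */
static int format_safely(char *buf, size_t len, const char *msg)
{
	return snprintf(buf, len, "%s", msg);
}
```

With the constant "%s" format, a message such as "bad %s %n input" is copied literally instead of being interpreted.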
Signed-off-by: Tetsuo Handa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/init/initramfs.c b/init/initramfs.c index a67ef9d..93b6139 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -583,7 +583,7 @@ static int __init populate_rootfs(void) { char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size); if (err) - panic(err); /* Failed to decompress INTERNAL initramfs */ + panic("%s", err); /* Failed to decompress INTERNAL initramfs */ if (initrd_start) { #ifdef CONFIG_BLK_DEV_RAM int fd; diff --git a/init/main.c b/init/main.c index 98e3537..f333385 100644 --- a/init/main.c +++ b/init/main.c @@ -278,7 +278,7 @@ static int __init unknown_bootoption(char *param, char *val, const char *unused) unsigned int i; for (i = 0; envp_init[i]; i++) { if (i == MAX_INIT_ENVS) { - panic_later = "Too many boot env vars at `%s'"; + panic_later = "env"; panic_param = param; } if (!strncmp(param, envp_init[i], val - param)) @@ -290,7 +290,7 @@ static int __init unknown_bootoption(char *param, char *val, const char *unused) unsigned int i; for (i = 0; argv_init[i]; i++) { if (i == MAX_INIT_ARGS) { - panic_later = "Too many boot init vars at `%s'"; + panic_later = "init"; panic_param = param; } } @@ -582,7 +582,8 @@ asmlinkage void __init start_kernel(void) */ console_init(); if (panic_later) - panic(panic_later, panic_param); + panic("Too many boot %s vars at `%s'", panic_later, + panic_param); lockdep_info(); -- cgit v0.10.2 From 6eaba35b437438988078efc92f1ef445a00cd7bc Mon Sep 17 00:00:00 2001 From: Sukadev Bhattiprolu Date: Thu, 23 Jan 2014 15:54:57 -0800 Subject: autofs4: allow autofs to work outside the initial PID namespace Enable autofs4 to work in a "container". oz_pgrp is converted from pid_t to struct pid and this is stored at mount time based on the "pgrp=" option or if the option is missing then the current pgrp. The "pgrp=" option is interpreted in the PID namespace of the current process. 
This option is flawed in that it doesn't carry the namespace information, so it should be deprecated. AFAICS the autofs daemon always sends the current pgrp, which is the default anyway. The oz_pgrp is also set from the AUTOFS_DEV_IOCTL_SETPIPEFD_CMD ioctl. This ioctl sets oz_pgrp to the current pgrp. It is not allowed to change the pid namespace. oz_pgrp is used mainly to determine whether the process traversing the autofs mount tree is the autofs daemon itself or not. This check now compares the pid pointers instead of the pid_t values. One other use of oz_pgrp is in autofs4_show_options. There it shows the virtual pid number (i.e. the one that is valid inside the PID namespace of the calling process). For the debugging printk, oz_pgrp is converted to the value in the initial pid namespace. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Miklos Szeredi Acked-by: Serge Hallyn Cc: Eric Biederman Acked-by: Ian Kent Cc: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h index 4218e26..acf3205 100644 --- a/fs/autofs4/autofs_i.h +++ b/fs/autofs4/autofs_i.h @@ -104,7 +104,7 @@ struct autofs_sb_info { u32 magic; int pipefd; struct file *pipe; - pid_t oz_pgrp; + struct pid *oz_pgrp; int catatonic; int version; int sub_version; @@ -140,7 +140,7 @@ static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry) filesystem without "magic".) */ static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) { - return sbi->catatonic || task_pgrp_nr(current) == sbi->oz_pgrp; + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; } /* Does a dentry have some pending activity?
*/ diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c index 1818ce7..3182c0e 100644 --- a/fs/autofs4/dev-ioctl.c +++ b/fs/autofs4/dev-ioctl.c @@ -346,6 +346,7 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp, { int pipefd; int err = 0; + struct pid *new_pid = NULL; if (param->setpipefd.pipefd == -1) return -EINVAL; @@ -357,7 +358,17 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp, mutex_unlock(&sbi->wq_mutex); return -EBUSY; } else { - struct file *pipe = fget(pipefd); + struct file *pipe; + + new_pid = get_task_pid(current, PIDTYPE_PGID); + + if (ns_of_pid(new_pid) != ns_of_pid(sbi->oz_pgrp)) { + AUTOFS_WARN("Not allowed to change PID namespace"); + err = -EINVAL; + goto out; + } + + pipe = fget(pipefd); if (!pipe) { err = -EBADF; goto out; @@ -367,12 +378,13 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp, fput(pipe); goto out; } - sbi->oz_pgrp = task_pgrp_nr(current); + swap(sbi->oz_pgrp, new_pid); sbi->pipefd = pipefd; sbi->pipe = pipe; sbi->catatonic = 0; } out: + put_pid(new_pid); mutex_unlock(&sbi->wq_mutex); return err; } diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c index 3b9cc9b..a3de082 100644 --- a/fs/autofs4/inode.c +++ b/fs/autofs4/inode.c @@ -56,8 +56,11 @@ void autofs4_kill_sb(struct super_block *sb) * just call kill_anon_super when we are called from * deactivate_super. 
*/ - if (sbi) /* Free wait queues, close pipe */ + if (sbi) { + /* Free wait queues, close pipe */ autofs4_catatonic_mode(sbi); + put_pid(sbi->oz_pgrp); + } DPRINTK("shutting down"); kill_litter_super(sb); @@ -80,7 +83,7 @@ static int autofs4_show_options(struct seq_file *m, struct dentry *root) if (!gid_eq(root_inode->i_gid, GLOBAL_ROOT_GID)) seq_printf(m, ",gid=%u", from_kgid_munged(&init_user_ns, root_inode->i_gid)); - seq_printf(m, ",pgrp=%d", sbi->oz_pgrp); + seq_printf(m, ",pgrp=%d", pid_vnr(sbi->oz_pgrp)); seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ); seq_printf(m, ",minproto=%d", sbi->min_proto); seq_printf(m, ",maxproto=%d", sbi->max_proto); @@ -124,7 +127,8 @@ static const match_table_t tokens = { }; static int parse_options(char *options, int *pipefd, kuid_t *uid, kgid_t *gid, - pid_t *pgrp, unsigned int *type, int *minproto, int *maxproto) + int *pgrp, bool *pgrp_set, unsigned int *type, + int *minproto, int *maxproto) { char *p; substring_t args[MAX_OPT_ARGS]; @@ -132,7 +136,6 @@ static int parse_options(char *options, int *pipefd, kuid_t *uid, kgid_t *gid, *uid = current_uid(); *gid = current_gid(); - *pgrp = task_pgrp_nr(current); *minproto = AUTOFS_MIN_PROTO_VERSION; *maxproto = AUTOFS_MAX_PROTO_VERSION; @@ -171,6 +174,7 @@ static int parse_options(char *options, int *pipefd, kuid_t *uid, kgid_t *gid, if (match_int(args, &option)) return 1; *pgrp = option; + *pgrp_set = true; break; case Opt_minproto: if (match_int(args, &option)) @@ -206,6 +210,8 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent) int pipefd; struct autofs_sb_info *sbi; struct autofs_info *ino; + int pgrp; + bool pgrp_set = false; sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) @@ -218,7 +224,7 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent) sbi->pipe = NULL; sbi->catatonic = 1; sbi->exp_timeout = 0; - sbi->oz_pgrp = task_pgrp_nr(current); + sbi->oz_pgrp = NULL; sbi->sb = s; sbi->version = 0; sbi->sub_version = 0; @@ -255,12 
+261,23 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent) /* Can this call block? */ if (parse_options(data, &pipefd, &root_inode->i_uid, &root_inode->i_gid, - &sbi->oz_pgrp, &sbi->type, &sbi->min_proto, - &sbi->max_proto)) { + &pgrp, &pgrp_set, &sbi->type, &sbi->min_proto, + &sbi->max_proto)) { printk("autofs: called with bogus options\n"); goto fail_dput; } + if (pgrp_set) { + sbi->oz_pgrp = find_get_pid(pgrp); + if (!sbi->oz_pgrp) { + pr_warn("autofs: could not find process group %d\n", + pgrp); + goto fail_dput; + } + } else { + sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID); + } + if (autofs_type_trigger(sbi->type)) __managed_dentry_set_managed(root); @@ -284,9 +301,9 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent) sbi->version = sbi->max_proto; sbi->sub_version = AUTOFS_PROTO_SUBVERSION; - DPRINTK("pipe fd = %d, pgrp = %u", pipefd, sbi->oz_pgrp); + DPRINTK("pipe fd = %d, pgrp = %u", pipefd, pid_nr(sbi->oz_pgrp)); pipe = fget(pipefd); - + if (!pipe) { printk("autofs: could not open pipe file descriptor\n"); goto fail_dput; @@ -316,6 +333,7 @@ fail_dput: fail_ino: kfree(ino); fail_free: + put_pid(sbi->oz_pgrp); kfree(sbi); s->s_fs_info = NULL; fail_unlock: -- cgit v0.10.2 From fbff08706d12fcdb160604c4ba790df6707c32cb Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 23 Jan 2014 15:54:58 -0800 Subject: autofs4: translate pids to the right namespace for the daemon The PID and the TGID of the process triggering the mount are sent to the daemon. Currently the global pid values are sent (ones valid in the initial pid namespace) but this is wrong if the autofs daemon itself is not running in the initial pid namespace. So send the pid values that are valid in the namespace of the autofs daemon. The namespace to use is taken from the oz_pgrp pid pointer, which was set at mount time to the mounting process' pid namespace. 
If the pid translation fails (the triggering process is in an unrelated pid namespace) then the automount fails with ENOENT. Signed-off-by: Miklos Szeredi Acked-by: Serge Hallyn Cc: Eric Biederman Acked-by: Ian Kent Cc: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c index 689e40d..116fd38 100644 --- a/fs/autofs4/waitq.c +++ b/fs/autofs4/waitq.c @@ -347,11 +347,23 @@ int autofs4_wait(struct autofs_sb_info *sbi, struct dentry *dentry, struct qstr qstr; char *name; int status, ret, type; + pid_t pid; + pid_t tgid; /* In catatonic mode, we don't wait for nobody */ if (sbi->catatonic) return -ENOENT; + /* + * Try translating pids to the namespace of the daemon. + * + * Zero means failure: we are in an unrelated pid namespace. + */ + pid = task_pid_nr_ns(current, ns_of_pid(sbi->oz_pgrp)); + tgid = task_tgid_nr_ns(current, ns_of_pid(sbi->oz_pgrp)); + if (pid == 0 || tgid == 0) + return -ENOENT; + if (!dentry->d_inode) { /* * A wait for a negative dentry is invalid for certain @@ -417,8 +429,8 @@ int autofs4_wait(struct autofs_sb_info *sbi, struct dentry *dentry, wq->ino = autofs4_get_ino(sbi); wq->uid = current_uid(); wq->gid = current_gid(); - wq->pid = current->pid; - wq->tgid = current->tgid; + wq->pid = pid; + wq->tgid = tgid; wq->status = -EINTR; /* Status return if interrupted */ wq->wait_ctr = 2; -- cgit v0.10.2 From da29b7543957c6e967066f1ee18fab2feb0eeeb3 Mon Sep 17 00:00:00 2001 From: Rui Xiang Date: Thu, 23 Jan 2014 15:54:59 -0800 Subject: autofs: fix the return value of autofs4_fill_super While kzallocing sbi/ino fails, it should return -ENOMEM. And it should return the err value from autofs_prepare_pipe. 
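The error-handling shape this change moves towards can be sketched in userspace as follows. demo_fill and demo_prepare are hypothetical stand-ins, with malloc() in place of kzalloc(); this is not the autofs code itself.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical setup step: returns 0 on success or a negative errno. */
static int demo_prepare(int result)
{
	return result;
}

static int demo_fill(size_t size, int options_ok, int prepare_result)
{
	int ret = -EINVAL;	/* default for "bad argument" style failures */
	void *state;

	state = malloc(size);
	if (!state)
		return -ENOMEM;	/* allocation failure, nothing to undo yet */

	if (!options_ok)
		goto fail_free;	/* keeps the -EINVAL default */

	ret = demo_prepare(prepare_result);
	if (ret < 0)
		goto fail_free;	/* pass the callee's error code through */

	free(state);
	return 0;

fail_free:
	free(state);
	return ret;
}
```

The point of the pattern is that the caller sees the most specific error available rather than a blanket -EINVAL.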
Signed-off-by: Rui Xiang Signed-off-by: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c index a3de082..d7bd395 100644 --- a/fs/autofs4/inode.c +++ b/fs/autofs4/inode.c @@ -212,10 +212,11 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent) struct autofs_info *ino; int pgrp; bool pgrp_set = false; + int ret = -EINVAL; sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) - goto fail_unlock; + return -ENOMEM; DPRINTK("starting up, sbi = %p",sbi); s->s_fs_info = sbi; @@ -249,8 +250,10 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent) * Get the root inode and dentry, but defer checking for errors. */ ino = autofs4_new_ino(sbi); - if (!ino) + if (!ino) { + ret = -ENOMEM; goto fail_free; + } root_inode = autofs4_get_inode(s, S_IFDIR | 0755); root = d_make_root(root_inode); if (!root) @@ -308,7 +311,8 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent) printk("autofs: could not open pipe file descriptor\n"); goto fail_dput; } - if (autofs_prepare_pipe(pipe) < 0) + ret = autofs_prepare_pipe(pipe); + if (ret < 0) goto fail_fput; sbi->pipe = pipe; sbi->pipefd = pipefd; @@ -336,8 +340,7 @@ fail_free: put_pid(sbi->oz_pgrp); kfree(sbi); s->s_fs_info = NULL; -fail_unlock: - return -EINVAL; + return ret; } struct inode *autofs4_get_inode(struct super_block *sb, umode_t mode) -- cgit v0.10.2 From c24930a9bbb6219f21f670c38b9473181d5f5e10 Mon Sep 17 00:00:00 2001 From: Rui Xiang Date: Thu, 23 Jan 2014 15:55:00 -0800 Subject: autofs: use IS_ROOT to replace root dentry checks Use the helper macro !IS_ROOT to replace parent != dentry->d_parent. Just clean up. 
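The equivalence relied on here can be sketched in userspace: a root dentry is its own parent, so "dentry->d_parent != dentry" and "!IS_ROOT(dentry)" test the same thing. The struct and macro below are simplified stand-ins with hypothetical demo names, modelling only the d_parent field.

```c
/* Minimal stand-in for the kernel's struct dentry; only d_parent is modelled. */
struct demo_dentry {
	struct demo_dentry *d_parent;
};

/* Same shape as the kernel helper: a root dentry points at itself. */
#define DEMO_IS_ROOT(x) ((x) == (x)->d_parent)
```

Using the helper states the intent (skip the parent refcount for the root) rather than spelling out the pointer comparison at each call site.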
Signed-off-by: Rui Xiang Signed-off-by: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c index 92ef341..2caf36a 100644 --- a/fs/autofs4/root.c +++ b/fs/autofs4/root.c @@ -558,7 +558,7 @@ static int autofs4_dir_symlink(struct inode *dir, dget(dentry); atomic_inc(&ino->count); p_ino = autofs4_dentry_ino(dentry->d_parent); - if (p_ino && dentry->d_parent != dentry) + if (p_ino && !IS_ROOT(dentry)) atomic_inc(&p_ino->count); dir->i_mtime = CURRENT_TIME; @@ -593,7 +593,7 @@ static int autofs4_dir_unlink(struct inode *dir, struct dentry *dentry) if (atomic_dec_and_test(&ino->count)) { p_ino = autofs4_dentry_ino(dentry->d_parent); - if (p_ino && dentry->d_parent != dentry) + if (p_ino && !IS_ROOT(dentry)) atomic_dec(&p_ino->count); } dput(ino->dentry); @@ -732,7 +732,7 @@ static int autofs4_dir_mkdir(struct inode *dir, struct dentry *dentry, umode_t m dget(dentry); atomic_inc(&ino->count); p_ino = autofs4_dentry_ino(dentry->d_parent); - if (p_ino && dentry->d_parent != dentry) + if (p_ino && !IS_ROOT(dentry)) atomic_inc(&p_ino->count); inc_nlink(dir); dir->i_mtime = CURRENT_TIME; -- cgit v0.10.2 From 8dc51fe5ab9edcaebb5438d6462befdc6922b4a1 Mon Sep 17 00:00:00 2001 From: Ian Kent Date: Thu, 23 Jan 2014 15:55:01 -0800 Subject: autofs: fix symlinks aren't checked for expiry The autofs4 module doesn't consider symlinks for expire as it did in the older autofs v3 module (so it's actually a long standing regression). The user space daemon has focused on the use of bind mounts instead of symlinks for a long time now and that's why this has not been noticed. But with the future addition of amd map parsing to automount(8), not to mention amd itself (of am-utils), symlink expiry will be needed. The direct and offset mount types can't be symlinks and the tree mounts of version 4 were always real mounts so only indirect mounts need expire symlinks. 
Since the current users of the autofs4 module haven't reported this as a problem to date this patch probably isn't a candidate for backport to stable. Signed-off-by: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c index 3d9d3f5..394e90b 100644 --- a/fs/autofs4/expire.c +++ b/fs/autofs4/expire.c @@ -402,6 +402,20 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb, goto next; } + if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode)) { + DPRINTK("checking symlink %p %.*s", + dentry, (int)dentry->d_name.len, dentry->d_name.name); + /* + * A symlink can't be "busy" in the usual sense so + * just check last used for expire timeout. + */ + if (autofs4_can_expire(dentry, timeout, do_now)) { + expired = dentry; + goto found; + } + goto next; + } + if (simple_empty(dentry)) goto next; diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c index f27c094..1e8ea19 100644 --- a/fs/autofs4/symlink.c +++ b/fs/autofs4/symlink.c @@ -14,6 +14,10 @@ static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd) { + struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb); + struct autofs_info *ino = autofs4_dentry_ino(dentry); + if (ino && !autofs4_oz_mode(sbi)) + ino->last_used = jiffies; nd_set_link(nd, dentry->d_inode->i_private); return NULL; } -- cgit v0.10.2 From 75465c49f092f24acc236b0f51e9b8bf8adc329e Mon Sep 17 00:00:00 2001 From: Laxman Dewangan Date: Thu, 23 Jan 2014 15:55:02 -0800 Subject: drivers/rtc/rtc-as3722: use devm for rtc and irq registration Use devm_* calls for rtc and irq registration and get rid of remove callback for platform driver. 
Signed-off-by: Laxman Dewangan Cc: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-as3722.c b/drivers/rtc/rtc-as3722.c index 9cfa817..4af0169 100644 --- a/drivers/rtc/rtc-as3722.c +++ b/drivers/rtc/rtc-as3722.c @@ -198,7 +198,7 @@ static int as3722_rtc_probe(struct platform_device *pdev) device_init_wakeup(&pdev->dev, 1); - as3722_rtc->rtc = rtc_device_register("as3722", &pdev->dev, + as3722_rtc->rtc = devm_rtc_device_register(&pdev->dev, "as3722-rtc", &as3722_rtc_ops, THIS_MODULE); if (IS_ERR(as3722_rtc->rtc)) { ret = PTR_ERR(as3722_rtc->rtc); @@ -209,28 +209,16 @@ static int as3722_rtc_probe(struct platform_device *pdev) as3722_rtc->alarm_irq = platform_get_irq(pdev, 0); dev_info(&pdev->dev, "RTC interrupt %d\n", as3722_rtc->alarm_irq); - ret = request_threaded_irq(as3722_rtc->alarm_irq, NULL, + ret = devm_request_threaded_irq(&pdev->dev, as3722_rtc->alarm_irq, NULL, as3722_alarm_irq, IRQF_ONESHOT | IRQF_EARLY_RESUME, "rtc-alarm", as3722_rtc); if (ret < 0) { dev_err(&pdev->dev, "Failed to request alarm IRQ %d: %d\n", as3722_rtc->alarm_irq, ret); - goto scrub; + return ret; } disable_irq(as3722_rtc->alarm_irq); return 0; -scrub: - rtc_device_unregister(as3722_rtc->rtc); - return ret; -} - -static int as3722_rtc_remove(struct platform_device *pdev) -{ - struct as3722_rtc *as3722_rtc = platform_get_drvdata(pdev); - - free_irq(as3722_rtc->alarm_irq, as3722_rtc); - rtc_device_unregister(as3722_rtc->rtc); - return 0; } #ifdef CONFIG_PM_SLEEP @@ -260,7 +248,6 @@ static const struct dev_pm_ops as3722_rtc_pm_ops = { static struct platform_driver as3722_rtc_driver = { .probe = as3722_rtc_probe, - .remove = as3722_rtc_remove, .driver = { .name = "as3722-rtc", .pm = &as3722_rtc_pm_ops, -- cgit v0.10.2 From a3e6ad6740c6e77beda83d4238f2123bf3aef45f Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:55:03 -0800 Subject: drivers/rtc/rtc-ds1305.c: remove unnecessary spi_set_drvdata() The driver core clears the 
driver data to NULL after device_release or on probe failure. Thus, it is not needed to manually clear the device driver data to NULL. Signed-off-by: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-ds1305.c b/drivers/rtc/rtc-ds1305.c index 80f3237..2dd586a 100644 --- a/drivers/rtc/rtc-ds1305.c +++ b/drivers/rtc/rtc-ds1305.c @@ -787,7 +787,6 @@ static int ds1305_remove(struct spi_device *spi) cancel_work_sync(&ds1305->work); } - spi_set_drvdata(spi, NULL); return 0; } -- cgit v0.10.2 From fbd5e754cb03c134ed45ff3417606daf61f576ca Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Thu, 23 Jan 2014 15:55:04 -0800 Subject: drivers/rtc/rtc-mxc.c: remove unneeded label There is no need to jump to the 'exit_free_pdata' label when devm_clk_get() fails, as we can directly return the error and simplify the code a bit. Signed-off-by: Fabio Estevam Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-mxc.c b/drivers/rtc/rtc-mxc.c index 50c5726..a3ed1cf 100644 --- a/drivers/rtc/rtc-mxc.c +++ b/drivers/rtc/rtc-mxc.c @@ -391,8 +391,7 @@ static int mxc_rtc_probe(struct platform_device *pdev) pdata->clk = devm_clk_get(&pdev->dev, NULL); if (IS_ERR(pdata->clk)) { dev_err(&pdev->dev, "unable to get clock!\n"); - ret = PTR_ERR(pdata->clk); - goto exit_free_pdata; + return PTR_ERR(pdata->clk); } clk_prepare_enable(pdata->clk); @@ -447,8 +446,6 @@ static int mxc_rtc_probe(struct platform_device *pdev) exit_put_clk: clk_disable_unprepare(pdata->clk); -exit_free_pdata: - return ret; } -- cgit v0.10.2 From 1b3d2243d049e062d0dc53b85f0e95db67e114af Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Thu, 23 Jan 2014 15:55:05 -0800 Subject: drivers/rtc/rtc-mxc.c: check the return value from clk_prepare_enable() clk_prepare_enable() may fail, so let's check its return value and propagate it in the case of error. 
Signed-off-by: Fabio Estevam Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-mxc.c b/drivers/rtc/rtc-mxc.c index a3ed1cf..419874f 100644 --- a/drivers/rtc/rtc-mxc.c +++ b/drivers/rtc/rtc-mxc.c @@ -394,7 +394,10 @@ static int mxc_rtc_probe(struct platform_device *pdev) return PTR_ERR(pdata->clk); } - clk_prepare_enable(pdata->clk); + ret = clk_prepare_enable(pdata->clk); + if (ret) + return ret; + rate = clk_get_rate(pdata->clk); if (rate == 32768) -- cgit v0.10.2 From 663b35241df1d0ed24be3d17733807cc8723cc4a Mon Sep 17 00:00:00 2001 From: Alexander Shiyan Date: Thu, 23 Jan 2014 15:55:06 -0800 Subject: drivers/rtc/rtc-ds1742.c: add devicetree support This patch allows the driver to be enabled with devicetree. Signed-off-by: Alexander Shiyan Acked-by: Mark Rutland Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/devicetree/bindings/rtc/maxim,ds1742.txt b/Documentation/devicetree/bindings/rtc/maxim,ds1742.txt new file mode 100644 index 0000000..d0f937c --- /dev/null +++ b/Documentation/devicetree/bindings/rtc/maxim,ds1742.txt @@ -0,0 +1,12 @@ +* Maxim (Dallas) DS1742/DS1743 Real Time Clock + +Required properties: +- compatible: Should contain "maxim,ds1742". +- reg: Physical base address of the RTC and length of memory + mapped region. 
+ +Example: + rtc: rtc@10000000 { + compatible = "maxim,ds1742"; + reg = <0x10000000 0x800>; + }; diff --git a/drivers/rtc/rtc-ds1742.c b/drivers/rtc/rtc-ds1742.c index 17b73fd..d7f74f5 100644 --- a/drivers/rtc/rtc-ds1742.c +++ b/drivers/rtc/rtc-ds1742.c @@ -13,12 +13,13 @@ */ #include -#include #include #include #include #include #include +#include +#include #include #include #include @@ -215,12 +216,19 @@ static int ds1742_rtc_remove(struct platform_device *pdev) return 0; } +static struct of_device_id __maybe_unused ds1742_rtc_of_match[] = { + { .compatible = "maxim,ds1742", }, + { } +}; +MODULE_DEVICE_TABLE(of, ds1742_rtc_of_match); + static struct platform_driver ds1742_rtc_driver = { .probe = ds1742_rtc_probe, .remove = ds1742_rtc_remove, .driver = { .name = "rtc-ds1742", .owner = THIS_MODULE, + .of_match_table = of_match_ptr(ds1742_rtc_of_match), }, }; -- cgit v0.10.2 From f53eeb853dfe408737ac6b8c3117bd21a9c60fd4 Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:55:06 -0800 Subject: drivers/rtc/rtc-twl.c: use devm_*() functions Use devm_*() functions to make cleanup paths simpler, and remove unnecessary remove(). 
Signed-off-by: Jingoo Han Cc: Yoichi Yuasa Cc: Grygorii Strashko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-twl.c b/drivers/rtc/rtc-twl.c index c2e80d7..1915464 100644 --- a/drivers/rtc/rtc-twl.c +++ b/drivers/rtc/rtc-twl.c @@ -479,7 +479,7 @@ static int twl_rtc_probe(struct platform_device *pdev) u8 rd_reg; if (irq <= 0) - goto out1; + return ret; /* Initialize the register map */ if (twl_class_is_4030()) @@ -489,7 +489,7 @@ static int twl_rtc_probe(struct platform_device *pdev) ret = twl_rtc_read_u8(&rd_reg, REG_RTC_STATUS_REG); if (ret < 0) - goto out1; + return ret; if (rd_reg & BIT_RTC_STATUS_REG_POWER_UP_M) dev_warn(&pdev->dev, "Power up reset detected.\n"); @@ -500,7 +500,7 @@ static int twl_rtc_probe(struct platform_device *pdev) /* Clear RTC Power up reset and pending alarm interrupts */ ret = twl_rtc_write_u8(rd_reg, REG_RTC_STATUS_REG); if (ret < 0) - goto out1; + return ret; if (twl_class_is_6030()) { twl6030_interrupt_unmask(TWL6030_RTC_INT_MASK, @@ -512,7 +512,7 @@ static int twl_rtc_probe(struct platform_device *pdev) dev_info(&pdev->dev, "Enabling TWL-RTC\n"); ret = twl_rtc_write_u8(BIT_RTC_CTRL_REG_STOP_RTC_M, REG_RTC_CTRL_REG); if (ret < 0) - goto out1; + return ret; /* ensure interrupts are disabled, bootloaders can be strange */ ret = twl_rtc_write_u8(0, REG_RTC_INTERRUPTS_REG); @@ -522,34 +522,29 @@ static int twl_rtc_probe(struct platform_device *pdev) /* init cached IRQ enable bits */ ret = twl_rtc_read_u8(&rtc_irq_bits, REG_RTC_INTERRUPTS_REG); if (ret < 0) - goto out1; + return ret; device_init_wakeup(&pdev->dev, 1); - rtc = rtc_device_register(pdev->name, - &pdev->dev, &twl_rtc_ops, THIS_MODULE); + rtc = devm_rtc_device_register(&pdev->dev, pdev->name, + &twl_rtc_ops, THIS_MODULE); if (IS_ERR(rtc)) { - ret = PTR_ERR(rtc); dev_err(&pdev->dev, "can't register RTC device, err %ld\n", PTR_ERR(rtc)); - goto out1; + return PTR_ERR(rtc); } - ret = request_threaded_irq(irq, NULL, twl_rtc_interrupt, - 
IRQF_TRIGGER_RISING | IRQF_ONESHOT, - dev_name(&rtc->dev), rtc); + ret = devm_request_threaded_irq(&pdev->dev, irq, NULL, + twl_rtc_interrupt, + IRQF_TRIGGER_RISING | IRQF_ONESHOT, + dev_name(&rtc->dev), rtc); if (ret < 0) { dev_err(&pdev->dev, "IRQ is not free.\n"); - goto out2; + return ret; } platform_set_drvdata(pdev, rtc); return 0; - -out2: - rtc_device_unregister(rtc); -out1: - return ret; } /* @@ -559,9 +554,6 @@ out1: static int twl_rtc_remove(struct platform_device *pdev) { /* leave rtc running, but disable irqs */ - struct rtc_device *rtc = platform_get_drvdata(pdev); - int irq = platform_get_irq(pdev, 0); - mask_rtc_irq_bit(BIT_RTC_INTERRUPTS_REG_IT_ALARM_M); mask_rtc_irq_bit(BIT_RTC_INTERRUPTS_REG_IT_TIMER_M); if (twl_class_is_6030()) { @@ -571,10 +563,6 @@ static int twl_rtc_remove(struct platform_device *pdev) REG_INT_MSK_STS_A); } - - free_irq(irq, rtc); - - rtc_device_unregister(rtc); return 0; } -- cgit v0.10.2 From bf6ce1a102797ceca6d44de991def4e0c1825cb2 Mon Sep 17 00:00:00 2001 From: Jingoo Han Date: Thu, 23 Jan 2014 15:55:07 -0800 Subject: drivers/rtc/rtc-vr41xx.c: use devm_*() functions Use devm_*() functions to make cleanup paths simpler, and remove unnecessary remove(). 
Signed-off-by: Jingoo Han Cc: Yoichi Yuasa Cc: Grygorii Strashko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-vr41xx.c b/drivers/rtc/rtc-vr41xx.c index aabc22c..88c9c92 100644 --- a/drivers/rtc/rtc-vr41xx.c +++ b/drivers/rtc/rtc-vr41xx.c @@ -293,7 +293,7 @@ static int rtc_probe(struct platform_device *pdev) if (!res) return -EBUSY; - rtc1_base = ioremap(res->start, resource_size(res)); + rtc1_base = devm_ioremap(&pdev->dev, res->start, resource_size(res)); if (!rtc1_base) return -EBUSY; @@ -303,13 +303,14 @@ static int rtc_probe(struct platform_device *pdev) goto err_rtc1_iounmap; } - rtc2_base = ioremap(res->start, resource_size(res)); + rtc2_base = devm_ioremap(&pdev->dev, res->start, resource_size(res)); if (!rtc2_base) { retval = -EBUSY; goto err_rtc1_iounmap; } - rtc = rtc_device_register(rtc_name, &pdev->dev, &vr41xx_rtc_ops, THIS_MODULE); + rtc = devm_rtc_device_register(&pdev->dev, rtc_name, &vr41xx_rtc_ops, + THIS_MODULE); if (IS_ERR(rtc)) { retval = PTR_ERR(rtc); goto err_iounmap_all; @@ -330,24 +331,24 @@ static int rtc_probe(struct platform_device *pdev) aie_irq = platform_get_irq(pdev, 0); if (aie_irq <= 0) { retval = -EBUSY; - goto err_device_unregister; + goto err_iounmap_all; } - retval = request_irq(aie_irq, elapsedtime_interrupt, 0, - "elapsed_time", pdev); + retval = devm_request_irq(&pdev->dev, aie_irq, elapsedtime_interrupt, 0, + "elapsed_time", pdev); if (retval < 0) - goto err_device_unregister; + goto err_iounmap_all; pie_irq = platform_get_irq(pdev, 1); if (pie_irq <= 0) { retval = -EBUSY; - goto err_free_irq; + goto err_iounmap_all; } - retval = request_irq(pie_irq, rtclong1_interrupt, 0, - "rtclong1", pdev); + retval = devm_request_irq(&pdev->dev, pie_irq, rtclong1_interrupt, 0, + "rtclong1", pdev); if (retval < 0) - goto err_free_irq; + goto err_iounmap_all; platform_set_drvdata(pdev, rtc); @@ -358,47 +359,20 @@ static int rtc_probe(struct platform_device *pdev) return 0; -err_free_irq: - 
free_irq(aie_irq, pdev); - -err_device_unregister: - rtc_device_unregister(rtc); - err_iounmap_all: - iounmap(rtc2_base); rtc2_base = NULL; err_rtc1_iounmap: - iounmap(rtc1_base); rtc1_base = NULL; return retval; } -static int rtc_remove(struct platform_device *pdev) -{ - struct rtc_device *rtc; - - rtc = platform_get_drvdata(pdev); - if (rtc) - rtc_device_unregister(rtc); - - free_irq(aie_irq, pdev); - free_irq(pie_irq, pdev); - if (rtc1_base) - iounmap(rtc1_base); - if (rtc2_base) - iounmap(rtc2_base); - - return 0; -} - /* work with hotplug and coldplug */ MODULE_ALIAS("platform:RTC"); static struct platform_driver rtc_platform_driver = { .probe = rtc_probe, - .remove = rtc_remove, .driver = { .name = rtc_name, .owner = THIS_MODULE, -- cgit v0.10.2 From b7ed189dc742cf4f744ee113e0fcac808b801702 Mon Sep 17 00:00:00 2001 From: Heiko Stuebner Date: Thu, 23 Jan 2014 15:55:08 -0800 Subject: dt-bindings: add hym8563 binding Add binding documentation for the hym8563 rtc chip. Signed-off-by: Heiko Stuebner Cc: Rob Herring Cc: Pawel Moll Cc: Mark Rutland Cc: Stephen Warren Cc: Ian Campbell Cc: Grant Likely Cc: Mike Turquette Cc: Richard Weinberger Cc: Mark Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/devicetree/bindings/rtc/haoyu,hym8563.txt b/Documentation/devicetree/bindings/rtc/haoyu,hym8563.txt new file mode 100644 index 0000000..31406fd --- /dev/null +++ b/Documentation/devicetree/bindings/rtc/haoyu,hym8563.txt @@ -0,0 +1,27 @@ +Haoyu Microelectronics HYM8563 Real Time Clock + +The HYM8563 provides basic rtc and alarm functionality +as well as a clock output of up to 32kHz. + +Required properties: +- compatible: should be: "haoyu,hym8563" +- reg: i2c address +- interrupts: rtc alarm/event interrupt +- #clock-cells: the value should be 0 + +Example: + +hym8563: hym8563@51 { + compatible = "haoyu,hym8563"; + reg = <0x51>; + + interrupts = <13 IRQ_TYPE_EDGE_FALLING>; + + #clock-cells = <0>; +}; + +device { +... 
+ clocks = <&hym8563>; +... +}; diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index f29cd78..0977b15 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -34,6 +34,7 @@ fsl Freescale Semiconductor GEFanuc GE Fanuc Intelligent Platforms Embedded Systems, Inc. gef GE Fanuc Intelligent Platforms Embedded Systems, Inc. gmt Global Mixed-mode Technology, Inc. +haoyu Haoyu Microelectronic Co. Ltd. hisilicon Hisilicon Limited. hp Hewlett Packard ibm International Business Machines (IBM) -- cgit v0.10.2 From dcaf038493525604e3d1648d5761a852e8db2bf9 Mon Sep 17 00:00:00 2001 From: Heiko Stuebner Date: Thu, 23 Jan 2014 15:55:10 -0800 Subject: rtc: add hym8563 rtc-driver The Haoyu Microelectronics HYM8563 provides rtc and alarm functions as well as a clock output of up to 32kHz. Signed-off-by: Heiko Stuebner Cc: Rob Herring Cc: Pawel Moll Cc: Mark Rutland Cc: Stephen Warren Cc: Ian Campbell Cc: Grant Likely Cc: Mike Turquette Cc: Richard Weinberger Cc: Mark Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 0077302..4bd489c 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -212,6 +212,17 @@ config RTC_DRV_DS3232 This driver can also be built as a module. If so, the module will be called rtc-ds3232. +config RTC_DRV_HYM8563 + tristate "Haoyu Microelectronics HYM8563" + depends on I2C && OF + help + Say Y to enable support for the HYM8563 I2C RTC chip. Apart + from the usual rtc functions it provides a clock output of + up to 32kHz. + + This driver can also be built as a module. If so, the module + will be called rtc-hym8563. 
+ config RTC_DRV_LP8788 tristate "TI LP8788 RTC driver" depends on MFD_LP8788 diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile index 27b4bd8..913c5be 100644 --- a/drivers/rtc/Makefile +++ b/drivers/rtc/Makefile @@ -55,6 +55,7 @@ obj-$(CONFIG_RTC_DRV_EP93XX) += rtc-ep93xx.o obj-$(CONFIG_RTC_DRV_FM3130) += rtc-fm3130.o obj-$(CONFIG_RTC_DRV_GENERIC) += rtc-generic.o obj-$(CONFIG_RTC_DRV_HID_SENSOR_TIME) += rtc-hid-sensor-time.o +obj-$(CONFIG_RTC_DRV_HYM8563) += rtc-hym8563.o obj-$(CONFIG_RTC_DRV_IMXDI) += rtc-imxdi.o obj-$(CONFIG_RTC_DRV_ISL1208) += rtc-isl1208.o obj-$(CONFIG_RTC_DRV_ISL12022) += rtc-isl12022.o diff --git a/drivers/rtc/rtc-hym8563.c b/drivers/rtc/rtc-hym8563.c new file mode 100644 index 0000000..b56e3d3 --- /dev/null +++ b/drivers/rtc/rtc-hym8563.c @@ -0,0 +1,606 @@ +/* + * Haoyu HYM8563 RTC driver + * + * Copyright (C) 2013 MundoReader S.L. + * Author: Heiko Stuebner + * + * based on rtc-HYM8563 + * Copyright (C) 2010 ROCKCHIP, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include +#include +#include +#include +#include + +#define HYM8563_CTL1 0x00 +#define HYM8563_CTL1_TEST BIT(7) +#define HYM8563_CTL1_STOP BIT(5) +#define HYM8563_CTL1_TESTC BIT(3) + +#define HYM8563_CTL2 0x01 +#define HYM8563_CTL2_TI_TP BIT(4) +#define HYM8563_CTL2_AF BIT(3) +#define HYM8563_CTL2_TF BIT(2) +#define HYM8563_CTL2_AIE BIT(1) +#define HYM8563_CTL2_TIE BIT(0) + +#define HYM8563_SEC 0x02 +#define HYM8563_SEC_VL BIT(7) +#define HYM8563_SEC_MASK 0x7f + +#define HYM8563_MIN 0x03 +#define HYM8563_MIN_MASK 0x7f + +#define HYM8563_HOUR 0x04 +#define HYM8563_HOUR_MASK 0x3f + +#define HYM8563_DAY 0x05 +#define HYM8563_DAY_MASK 0x3f + +#define HYM8563_WEEKDAY 0x06 +#define HYM8563_WEEKDAY_MASK 0x07 + +#define HYM8563_MONTH 0x07 +#define HYM8563_MONTH_CENTURY BIT(7) +#define HYM8563_MONTH_MASK 0x1f + +#define HYM8563_YEAR 0x08 + +#define HYM8563_ALM_MIN 0x09 +#define HYM8563_ALM_HOUR 0x0a +#define HYM8563_ALM_DAY 0x0b +#define HYM8563_ALM_WEEK 0x0c + +/* Each alarm check can be disabled by setting this bit in the register */ +#define HYM8563_ALM_BIT_DISABLE BIT(7) + +#define HYM8563_CLKOUT 0x0d +#define HYM8563_CLKOUT_DISABLE BIT(7) +#define HYM8563_CLKOUT_32768 0 +#define HYM8563_CLKOUT_1024 1 +#define HYM8563_CLKOUT_32 2 +#define HYM8563_CLKOUT_1 3 +#define HYM8563_CLKOUT_MASK 3 + +#define HYM8563_TMR_CTL 0x0e +#define HYM8563_TMR_CTL_ENABLE BIT(7) +#define HYM8563_TMR_CTL_4096 0 +#define HYM8563_TMR_CTL_64 1 +#define HYM8563_TMR_CTL_1 2 +#define HYM8563_TMR_CTL_1_60 3 +#define HYM8563_TMR_CTL_MASK 3 + +#define HYM8563_TMR_CNT 0x0f + +struct hym8563 { + struct i2c_client *client; + struct rtc_device *rtc; + bool valid; +#ifdef CONFIG_COMMON_CLK + struct clk_hw clkout_hw; +#endif +}; + +/* + * RTC handling + */ + +static int hym8563_rtc_read_time(struct device *dev, struct rtc_time *tm) +{ + struct i2c_client *client = to_i2c_client(dev); + struct hym8563 *hym8563 = i2c_get_clientdata(client); + u8 buf[7]; + int ret; + + if (!hym8563->valid) { + 
dev_warn(&client->dev, "no valid clock/calendar values available\n"); + return -EPERM; + } + + ret = i2c_smbus_read_i2c_block_data(client, HYM8563_SEC, 7, buf); + + tm->tm_sec = bcd2bin(buf[0] & HYM8563_SEC_MASK); + tm->tm_min = bcd2bin(buf[1] & HYM8563_MIN_MASK); + tm->tm_hour = bcd2bin(buf[2] & HYM8563_HOUR_MASK); + tm->tm_mday = bcd2bin(buf[3] & HYM8563_DAY_MASK); + tm->tm_wday = bcd2bin(buf[4] & HYM8563_WEEKDAY_MASK); /* 0 = Sun */ + tm->tm_mon = bcd2bin(buf[5] & HYM8563_MONTH_MASK) - 1; /* 0 = Jan */ + tm->tm_year = bcd2bin(buf[6]) + 100; + + return 0; +} + +static int hym8563_rtc_set_time(struct device *dev, struct rtc_time *tm) +{ + struct i2c_client *client = to_i2c_client(dev); + struct hym8563 *hym8563 = i2c_get_clientdata(client); + u8 buf[7]; + int ret; + + /* Years >= 2100 are to far in the future, 19XX is to early */ + if (tm->tm_year < 100 || tm->tm_year >= 200) + return -EINVAL; + + buf[0] = bin2bcd(tm->tm_sec); + buf[1] = bin2bcd(tm->tm_min); + buf[2] = bin2bcd(tm->tm_hour); + buf[3] = bin2bcd(tm->tm_mday); + buf[4] = bin2bcd(tm->tm_wday); + buf[5] = bin2bcd(tm->tm_mon + 1); + + /* + * While the HYM8563 has a century flag in the month register, + * it does not seem to carry it over a subsequent write/read. + * So we'll limit ourself to 100 years, starting at 2000 for now. 
+ */ + buf[6] = tm->tm_year - 100; + + /* + * CTL1 only contains TEST-mode bits apart from stop, + * so no need to read the value first + */ + ret = i2c_smbus_write_byte_data(client, HYM8563_CTL1, + HYM8563_CTL1_STOP); + if (ret < 0) + return ret; + + ret = i2c_smbus_write_i2c_block_data(client, HYM8563_SEC, 7, buf); + if (ret < 0) + return ret; + + ret = i2c_smbus_write_byte_data(client, HYM8563_CTL1, 0); + if (ret < 0) + return ret; + + hym8563->valid = true; + + return 0; +} + +static int hym8563_rtc_alarm_irq_enable(struct device *dev, + unsigned int enabled) +{ + struct i2c_client *client = to_i2c_client(dev); + int data; + + data = i2c_smbus_read_byte_data(client, HYM8563_CTL2); + if (data < 0) + return data; + + if (enabled) + data |= HYM8563_CTL2_AIE; + else + data &= ~HYM8563_CTL2_AIE; + + return i2c_smbus_write_byte_data(client, HYM8563_CTL2, data); +}; + +static int hym8563_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alm) +{ + struct i2c_client *client = to_i2c_client(dev); + struct rtc_time *alm_tm = &alm->time; + u8 buf[4]; + int ret; + + ret = i2c_smbus_read_i2c_block_data(client, HYM8563_ALM_MIN, 4, buf); + if (ret < 0) + return ret; + + /* The alarm only has a minute accuracy */ + alm_tm->tm_sec = -1; + + alm_tm->tm_min = (buf[0] & HYM8563_ALM_BIT_DISABLE) ? + -1 : + bcd2bin(buf[0] & HYM8563_MIN_MASK); + alm_tm->tm_hour = (buf[1] & HYM8563_ALM_BIT_DISABLE) ? + -1 : + bcd2bin(buf[1] & HYM8563_HOUR_MASK); + alm_tm->tm_mday = (buf[2] & HYM8563_ALM_BIT_DISABLE) ? + -1 : + bcd2bin(buf[2] & HYM8563_DAY_MASK); + alm_tm->tm_wday = (buf[3] & HYM8563_ALM_BIT_DISABLE) ? 
+ -1 : + bcd2bin(buf[3] & HYM8563_WEEKDAY_MASK); + + alm_tm->tm_mon = -1; + alm_tm->tm_year = -1; + + ret = i2c_smbus_read_byte_data(client, HYM8563_CTL2); + if (ret < 0) + return ret; + + if (ret & HYM8563_CTL2_AIE) + alm->enabled = 1; + + return 0; +} + +static int hym8563_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alm) +{ + struct i2c_client *client = to_i2c_client(dev); + struct rtc_time *alm_tm = &alm->time; + u8 buf[4]; + int ret; + + /* + * The alarm has no seconds so deal with it + */ + if (alm_tm->tm_sec) { + alm_tm->tm_sec = 0; + alm_tm->tm_min++; + if (alm_tm->tm_min >= 60) { + alm_tm->tm_min = 0; + alm_tm->tm_hour++; + if (alm_tm->tm_hour >= 24) { + alm_tm->tm_hour = 0; + alm_tm->tm_mday++; + if (alm_tm->tm_mday > 31) + alm_tm->tm_mday = 0; + } + } + } + + ret = i2c_smbus_read_byte_data(client, HYM8563_CTL2); + if (ret < 0) + return ret; + + ret &= ~HYM8563_CTL2_AIE; + + ret = i2c_smbus_write_byte_data(client, HYM8563_CTL2, ret); + if (ret < 0) + return ret; + + buf[0] = (alm_tm->tm_min < 60 && alm_tm->tm_min >= 0) ? + bin2bcd(alm_tm->tm_min) : HYM8563_ALM_BIT_DISABLE; + + buf[1] = (alm_tm->tm_hour < 24 && alm_tm->tm_hour >= 0) ? + bin2bcd(alm_tm->tm_hour) : HYM8563_ALM_BIT_DISABLE; + + buf[2] = (alm_tm->tm_mday <= 31 && alm_tm->tm_mday >= 1) ? + bin2bcd(alm_tm->tm_mday) : HYM8563_ALM_BIT_DISABLE; + + buf[3] = (alm_tm->tm_wday < 7 && alm_tm->tm_wday >= 0) ? 
+ bin2bcd(alm_tm->tm_wday) : HYM8563_ALM_BIT_DISABLE; + + ret = i2c_smbus_write_i2c_block_data(client, HYM8563_ALM_MIN, 4, buf); + if (ret < 0) + return ret; + + return hym8563_rtc_alarm_irq_enable(dev, alm->enabled); +} + +static const struct rtc_class_ops hym8563_rtc_ops = { + .read_time = hym8563_rtc_read_time, + .set_time = hym8563_rtc_set_time, + .alarm_irq_enable = hym8563_rtc_alarm_irq_enable, + .read_alarm = hym8563_rtc_read_alarm, + .set_alarm = hym8563_rtc_set_alarm, +}; + +/* + * Handling of the clkout + */ + +#ifdef CONFIG_COMMON_CLK +#define clkout_hw_to_hym8563(_hw) container_of(_hw, struct hym8563, clkout_hw) + +static int clkout_rates[] = { + 32768, + 1024, + 32, + 1, +}; + +static unsigned long hym8563_clkout_recalc_rate(struct clk_hw *hw, + unsigned long parent_rate) +{ + struct hym8563 *hym8563 = clkout_hw_to_hym8563(hw); + struct i2c_client *client = hym8563->client; + int ret = i2c_smbus_read_byte_data(client, HYM8563_CLKOUT); + + if (ret < 0 || ret & HYM8563_CLKOUT_DISABLE) + return 0; + + ret &= HYM8563_CLKOUT_MASK; + return clkout_rates[ret]; +} + +static long hym8563_clkout_round_rate(struct clk_hw *hw, unsigned long rate, + unsigned long *prate) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(clkout_rates); i++) + if (clkout_rates[i] <= rate) + return clkout_rates[i]; + + return 0; +} + +static int hym8563_clkout_set_rate(struct clk_hw *hw, unsigned long rate, + unsigned long parent_rate) +{ + struct hym8563 *hym8563 = clkout_hw_to_hym8563(hw); + struct i2c_client *client = hym8563->client; + int ret = i2c_smbus_read_byte_data(client, HYM8563_CLKOUT); + int i; + + if (ret < 0) + return ret; + + for (i = 0; i < ARRAY_SIZE(clkout_rates); i++) + if (clkout_rates[i] == rate) { + ret &= ~HYM8563_CLKOUT_MASK; + ret |= i; + return i2c_smbus_write_byte_data(client, + HYM8563_CLKOUT, ret); + } + + return -EINVAL; +} + +static int hym8563_clkout_control(struct clk_hw *hw, bool enable) +{ + struct hym8563 *hym8563 = clkout_hw_to_hym8563(hw); + struct 
i2c_client *client = hym8563->client; + int ret = i2c_smbus_read_byte_data(client, HYM8563_CLKOUT); + + if (ret < 0) + return ret; + + if (enable) + ret &= ~HYM8563_CLKOUT_DISABLE; + else + ret |= HYM8563_CLKOUT_DISABLE; + + return i2c_smbus_write_byte_data(client, HYM8563_CLKOUT, ret); +} + +static int hym8563_clkout_prepare(struct clk_hw *hw) +{ + return hym8563_clkout_control(hw, 1); +} + +static void hym8563_clkout_unprepare(struct clk_hw *hw) +{ + hym8563_clkout_control(hw, 0); +} + +static int hym8563_clkout_is_prepared(struct clk_hw *hw) +{ + struct hym8563 *hym8563 = clkout_hw_to_hym8563(hw); + struct i2c_client *client = hym8563->client; + int ret = i2c_smbus_read_byte_data(client, HYM8563_CLKOUT); + + if (ret < 0) + return ret; + + return !(ret & HYM8563_CLKOUT_DISABLE); +} + +const struct clk_ops hym8563_clkout_ops = { + .prepare = hym8563_clkout_prepare, + .unprepare = hym8563_clkout_unprepare, + .is_prepared = hym8563_clkout_is_prepared, + .recalc_rate = hym8563_clkout_recalc_rate, + .round_rate = hym8563_clkout_round_rate, + .set_rate = hym8563_clkout_set_rate, +}; + +static struct clk *hym8563_clkout_register_clk(struct hym8563 *hym8563) +{ + struct i2c_client *client = hym8563->client; + struct device_node *node = client->dev.of_node; + struct clk *clk; + struct clk_init_data init; + int ret; + + ret = i2c_smbus_write_byte_data(client, HYM8563_CLKOUT, + HYM8563_CLKOUT_DISABLE); + if (ret < 0) + return ERR_PTR(ret); + + init.name = "hym8563-clkout"; + init.ops = &hym8563_clkout_ops; + init.flags = CLK_IS_ROOT; + init.parent_names = NULL; + init.num_parents = 0; + hym8563->clkout_hw.init = &init; + + /* register the clock */ + clk = clk_register(&client->dev, &hym8563->clkout_hw); + + if (!IS_ERR(clk)) + of_clk_add_provider(node, of_clk_src_simple_get, clk); + + return clk; +} +#endif + +/* + * The alarm interrupt is implemented as a level-low interrupt in the + * hym8563, while the timer interrupt uses a falling edge. 
+ * We don't use the timer at all, so the interrupt is requested to + * use the level-low trigger. + */ +static irqreturn_t hym8563_irq(int irq, void *dev_id) +{ + struct hym8563 *hym8563 = (struct hym8563 *)dev_id; + struct i2c_client *client = hym8563->client; + struct mutex *lock = &hym8563->rtc->ops_lock; + int data, ret; + + mutex_lock(lock); + + /* Clear the alarm flag */ + + data = i2c_smbus_read_byte_data(client, HYM8563_CTL2); + if (data < 0) { + dev_err(&client->dev, "%s: error reading i2c data %d\n", + __func__, data); + goto out; + } + + data &= ~HYM8563_CTL2_AF; + + ret = i2c_smbus_write_byte_data(client, HYM8563_CTL2, data); + if (ret < 0) { + dev_err(&client->dev, "%s: error writing i2c data %d\n", + __func__, ret); + } + +out: + mutex_unlock(lock); + return IRQ_HANDLED; +} + +static int hym8563_init_device(struct i2c_client *client) +{ + int ret; + + /* Clear stop flag if present */ + ret = i2c_smbus_write_byte_data(client, HYM8563_CTL1, 0); + if (ret < 0) + return ret; + + ret = i2c_smbus_read_byte_data(client, HYM8563_CTL2); + if (ret < 0) + return ret; + + /* Disable alarm and timer interrupts */ + ret &= ~HYM8563_CTL2_AIE; + ret &= ~HYM8563_CTL2_TIE; + + /* Clear any pending alarm and timer flags */ + if (ret & HYM8563_CTL2_AF) + ret &= ~HYM8563_CTL2_AF; + + if (ret & HYM8563_CTL2_TF) + ret &= ~HYM8563_CTL2_TF; + + ret &= ~HYM8563_CTL2_TI_TP; + + return i2c_smbus_write_byte_data(client, HYM8563_CTL2, ret); +} + +#ifdef CONFIG_PM_SLEEP +static int hym8563_suspend(struct device *dev) +{ + struct i2c_client *client = to_i2c_client(dev); + int ret; + + if (device_may_wakeup(dev)) { + ret = enable_irq_wake(client->irq); + if (ret) { + dev_err(dev, "enable_irq_wake failed, %d\n", ret); + return ret; + } + } + + return 0; +} + +static int hym8563_resume(struct device *dev) +{ + struct i2c_client *client = to_i2c_client(dev); + + if (device_may_wakeup(dev)) + disable_irq_wake(client->irq); + + return 0; +} +#endif + +static 
SIMPLE_DEV_PM_OPS(hym8563_pm_ops, hym8563_suspend, hym8563_resume); + +static int hym8563_probe(struct i2c_client *client, + const struct i2c_device_id *id) +{ + struct hym8563 *hym8563; + int ret; + + hym8563 = devm_kzalloc(&client->dev, sizeof(*hym8563), GFP_KERNEL); + if (!hym8563) + return -ENOMEM; + + hym8563->client = client; + i2c_set_clientdata(client, hym8563); + + device_set_wakeup_capable(&client->dev, true); + + ret = hym8563_init_device(client); + if (ret) { + dev_err(&client->dev, "could not init device, %d\n", ret); + return ret; + } + + ret = devm_request_threaded_irq(&client->dev, client->irq, + NULL, hym8563_irq, + IRQF_TRIGGER_LOW | IRQF_ONESHOT, + client->name, hym8563); + if (ret < 0) { + dev_err(&client->dev, "irq %d request failed, %d\n", + client->irq, ret); + return ret; + } + + /* check state of calendar information */ + ret = i2c_smbus_read_byte_data(client, HYM8563_SEC); + if (ret < 0) + return ret; + + hym8563->valid = !(ret & HYM8563_SEC_VL); + dev_dbg(&client->dev, "rtc information is %s\n", + hym8563->valid ? 
"valid" : "invalid"); + + hym8563->rtc = devm_rtc_device_register(&client->dev, client->name, + &hym8563_rtc_ops, THIS_MODULE); + if (IS_ERR(hym8563->rtc)) + return PTR_ERR(hym8563->rtc); + +#ifdef CONFIG_COMMON_CLK + hym8563_clkout_register_clk(hym8563); +#endif + + return 0; +} + +static const struct i2c_device_id hym8563_id[] = { + { "hym8563", 0 }, + {}, +}; +MODULE_DEVICE_TABLE(i2c, hym8563_id); + +static struct of_device_id hym8563_dt_idtable[] = { + { .compatible = "haoyu,hym8563" }, + {}, +}; +MODULE_DEVICE_TABLE(of, hym8563_dt_idtable); + +static struct i2c_driver hym8563_driver = { + .driver = { + .name = "rtc-hym8563", + .owner = THIS_MODULE, + .pm = &hym8563_pm_ops, + .of_match_table = of_match_ptr(hym8563_dt_idtable), + }, + .probe = hym8563_probe, + .id_table = hym8563_id, +}; + +module_i2c_driver(hym8563_driver); + +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_DESCRIPTION("HYM8563 RTC driver"); +MODULE_LICENSE("GPL"); -- cgit v0.10.2 From c823a20244e1673047ac88b4439809748e2ab34e Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 23 Jan 2014 15:55:11 -0800 Subject: drivers/rtc/rtc-cmos.c: remove superfluous name cast device_driver.name is "const char *" Signed-off-by: Geert Uytterhoeven Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index a2325bc..f28b458 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -1175,7 +1175,7 @@ static struct platform_driver cmos_platform_driver = { .remove = __exit_p(cmos_platform_remove), .shutdown = cmos_platform_shutdown, .driver = { - .name = (char *) driver_name, + .name = driver_name, #ifdef CONFIG_PM .pm = &cmos_pm_ops, #endif -- cgit v0.10.2 From 41c9dbf4ba7cea158cfc1aeb0d6ec3270fb5427b Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 23 Jan 2014 15:55:12 -0800 Subject: drivers/rtc/Kconfig: disable RTC_DRV_CMOS on Atari On ARAnyM (emulating an Atari Falcon, which doesn't have an RTC 
IRQ, as the Second Multi Function Peripheral MFP 68901 is available on Atari TT only), rtc-cmos doesn't work well: - The date is off by 32 years (2045 instead of 2013): rtc_cmos rtc_cmos: setting system clock to 2045-12-02 10:56:17 UTC (2395824977) - The hwclock utility doesn't work: hwclock: ioctl() to /dev/rtc to turn on update interrupts failed unexpectedly, errno=5: Input/output error. As rtc-generic works fine for the RTC part, and nvram works for the NVRAM part, we'll continue on using that. Signed-off-by: Geert Uytterhoeven Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 4bd489c..609b069 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -637,7 +637,7 @@ comment "Platform RTC drivers" config RTC_DRV_CMOS tristate "PC-style 'CMOS'" - depends on X86 || ARM || M32R || ATARI || PPC || MIPS || SPARC64 + depends on X86 || ARM || M32R || PPC || MIPS || SPARC64 default y if X86 help Say "yes" here to get direct support for the real time clock -- cgit v0.10.2 From 7ab26cd1ef817bca74cf82116eaf9eb5fe4a56c7 Mon Sep 17 00:00:00 2001 From: Duan Jiong Date: Thu, 23 Jan 2014 15:55:13 -0800 Subject: drivers/rtc/rtc-pcf2127.c: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO Fix a coccinelle error regarding usage of IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO.
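For readers unfamiliar with the helper, the equivalence can be sketched in userspace. The macros below are a simplified re-creation of the kernel's error-pointer helpers, written for illustration only and not copied from include/linux/err.h; probe_tail_old() and probe_tail_new() are hypothetical names standing in for the tail of a probe function before and after this change.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

/* An error pointer is any address in the top MAX_ERRNO bytes. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Negative errno if ptr encodes an error, 0 otherwise. */
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Hypothetical probe tail using the open-coded form. */
static int probe_tail_old(void *rtc)
{
	if (IS_ERR(rtc))
		return (int)PTR_ERR(rtc);

	return 0;
}

/* Hypothetical probe tail using the combined helper. */
static int probe_tail_new(void *rtc)
{
	return PTR_ERR_OR_ZERO(rtc);
}
```

Under these assumptions both tails return 0 for a valid pointer and the encoded negative errno for an error pointer, so the replacement does not change behaviour.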
Signed-off-by: Duan Jiong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-pcf2127.c b/drivers/rtc/rtc-pcf2127.c index 1ee514a..9bd842e 100644 --- a/drivers/rtc/rtc-pcf2127.c +++ b/drivers/rtc/rtc-pcf2127.c @@ -197,10 +197,7 @@ static int pcf2127_probe(struct i2c_client *client, pcf2127_driver.driver.name, &pcf2127_rtc_ops, THIS_MODULE); - if (IS_ERR(pcf2127->rtc)) - return PTR_ERR(pcf2127->rtc); - - return 0; + return PTR_ERR_OR_ZERO(pcf2127->rtc); } static const struct i2c_device_id pcf2127_id[] = { -- cgit v0.10.2 From 9d2b7e532da8aadfcc1bd85b62ec5dd853e870e3 Mon Sep 17 00:00:00 2001 From: Stephen Warren Date: Thu, 23 Jan 2014 15:55:14 -0800 Subject: rtc: honor device tree /alias entries when assigning IDs Assign RTC device IDs based on device tree /aliases entries if present, falling back to the existing numbering scheme if there is no /aliases entry (which includes when the system isn't booted using DT), or there is a numbering conflict. This is useful in systems with multiple RTC devices, to ensure that the best RTC device is selected as /dev/rtc0, which provides the overall system time. For example, Tegra has an on-SoC RTC that is not battery backed, typically coupled with an off-SoC RTC that is battery backed. Only the latter is useful for populating the system time, yet the former is useful e.g. for wakeup timing, since the time is not lost when the system sleeps.
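The selection policy described above can be sketched in userspace as follows. This is only a model of the "prefer the alias, else lowest free ID" rule; id_pool, pool_get() and assign_rtc_id() are hypothetical names, and the fixed-size pool stands in for the kernel's ID allocator rather than reproducing it.

```c
#include <stdbool.h>

#define MAX_IDS 16

/* Hypothetical stand-in for an ID allocator: one flag per allocated ID. */
struct id_pool {
	bool used[MAX_IDS];
};

/*
 * Allocate the lowest free ID in [start, end). An end of 0 means
 * "no upper bound" (capped at MAX_IDS here). Returns -1 if none is free.
 */
static int pool_get(struct id_pool *pool, int start, int end)
{
	int limit = (end > 0 && end < MAX_IDS) ? end : MAX_IDS;

	for (int id = start; id < limit; id++) {
		if (!pool->used[id]) {
			pool->used[id] = true;
			return id;
		}
	}
	return -1;
}

/*
 * Prefer alias_id when one is given (pass -1 for "no alias"); if it is
 * missing or already taken, fall back to the lowest free ID.
 */
static int assign_rtc_id(struct id_pool *pool, int alias_id)
{
	int id = -1;

	if (alias_id >= 0)
		id = pool_get(pool, alias_id, alias_id + 1);

	if (id < 0)
		id = pool_get(pool, 0, 0);

	return id;
}
```

One consequence visible in this sketch: a device without an alias that registers first takes the lowest free ID, so a later device whose alias names that ID hits the conflict path and is renumbered.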
Signed-off-by: Stephen Warren Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c index 0242681..589351e 100644 --- a/drivers/rtc/class.c +++ b/drivers/rtc/class.c @@ -14,6 +14,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include +#include #include #include #include @@ -157,12 +158,27 @@ struct rtc_device *rtc_device_register(const char *name, struct device *dev, { struct rtc_device *rtc; struct rtc_wkalrm alrm; - int id, err; + int of_id = -1, id = -1, err; + + if (dev->of_node) + of_id = of_alias_get_id(dev->of_node, "rtc"); + else if (dev->parent && dev->parent->of_node) + of_id = of_alias_get_id(dev->parent->of_node, "rtc"); + + if (of_id >= 0) { + id = ida_simple_get(&rtc_ida, of_id, of_id + 1, + GFP_KERNEL); + if (id < 0) + dev_warn(dev, "/aliases ID %d not available\n", + of_id); + } - id = ida_simple_get(&rtc_ida, 0, 0, GFP_KERNEL); if (id < 0) { - err = id; - goto exit; + id = ida_simple_get(&rtc_ida, 0, 0, GFP_KERNEL); + if (id < 0) { + err = id; + goto exit; + } } rtc = kzalloc(sizeof(struct rtc_device), GFP_KERNEL); -- cgit v0.10.2 From 24b34472e2e6e3815f36e39d6996e3b39ebb2a5e Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Thu, 23 Jan 2014 15:55:15 -0800 Subject: drivers/rtc/rtc-cmos.c: propagate hpet_register_irq_handler() failure If hpet_register_irq_handler() fails, cmos_do_probe() will incorrectly return 0. 
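The failure mode can be reproduced in miniature. The sketch below assumes, as the commit message implies, that the function's return variable still holds 0 when the jump to the cleanup label happens; fake_register_handler(), probe_buggy() and probe_fixed() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a registration call that may fail. */
int fake_register_handler(int should_fail)
{
	return should_fail ? -EBUSY : 0;
}

/* Old shape: the error lands in a block-local variable and is lost. */
int probe_buggy(int should_fail)
{
	int retval = 0;

	{
		int err = fake_register_handler(should_fail);

		if (err != 0)
			goto cleanup;	/* retval is still 0 here */
	}

	/* ... remaining probe steps would go here ... */
cleanup:
	return retval;
}

/* New shape: the error is stored in the variable that is returned. */
int probe_fixed(int should_fail)
{
	int retval;

	retval = fake_register_handler(should_fail);
	if (retval)
		goto cleanup;

	/* ... remaining probe steps would go here ... */
cleanup:
	return retval;
}

/* The buggy variant reports success even though registration failed. */
void check_probe_variants(void)
{
	assert(probe_buggy(1) == 0);
	assert(probe_fixed(1) == -EBUSY);
}
```

Storing the result directly in the returned variable removes the extra local and makes the cleanup path report the real error.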
Reported-by: Julia Lawall Cc: John Stultz Cc: Grant Likely Cc: Rob Herring Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index f28b458..cae212f 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -756,11 +756,9 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq) irq_handler_t rtc_cmos_int_handler; if (is_hpet_enabled()) { - int err; - rtc_cmos_int_handler = hpet_rtc_interrupt; - err = hpet_register_irq_handler(cmos_interrupt); - if (err != 0) { + retval = hpet_register_irq_handler(cmos_interrupt); + if (retval) { dev_warn(dev, "hpet_register_irq_handler " " failed in rtc_init()."); goto cleanup1; -- cgit v0.10.2 From 5516f0971793a0f7d0d54bf0220b6b2e13a05d7e Mon Sep 17 00:00:00 2001 From: Sachin Kamat Date: Thu, 23 Jan 2014 15:55:16 -0800 Subject: drivers/rtc/rtc-ds1742.c: remove redundant of_match_ptr() helper 'ds1742_rtc_of_match' is always compiled in. Hence the helper macro is not needed. Signed-off-by: Sachin Kamat Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-ds1742.c b/drivers/rtc/rtc-ds1742.c index d7f74f5..5a1f3b2 100644 --- a/drivers/rtc/rtc-ds1742.c +++ b/drivers/rtc/rtc-ds1742.c @@ -228,7 +228,7 @@ static struct platform_driver ds1742_rtc_driver = { .driver = { .name = "rtc-ds1742", .owner = THIS_MODULE, - .of_match_table = of_match_ptr(ds1742_rtc_of_match), + .of_match_table = ds1742_rtc_of_match, }, }; -- cgit v0.10.2 From 156e3526e8509140723c3af3b6b5468a854cc2fa Mon Sep 17 00:00:00 2001 From: Sachin Kamat Date: Thu, 23 Jan 2014 15:55:17 -0800 Subject: drivers/rtc/rtc-hym8563.c: remove redundant of_match_ptr() helper 'hym8563_dt_idtable' is always compiled in. Hence the helper macro is not needed. 
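As background for this and the preceding ds1742 patch, the helper's effect can be sketched as follows. The macro bodies mirror the usual shape of of_match_ptr() (the pointer itself with CONFIG_OF, NULL without it); MODEL_CONFIG_OF, struct match_entry and demo_match_table are hypothetical names invented for this userspace illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Set to 1 to model a kernel built with CONFIG_OF, 0 for one without. */
#define MODEL_CONFIG_OF 0

#if MODEL_CONFIG_OF
#define of_match_ptr(ptr) (ptr)
#else
#define of_match_ptr(ptr) NULL
#endif

/* Hypothetical, reduced version of a device tree match entry. */
struct match_entry {
	const char *compatible;
};

/* The table is defined unconditionally, as in the two drivers above. */
const struct match_entry demo_match_table[] = {
	{ "vendor,demo-rtc" },
	{ NULL },
};

/* With the helper, the table is dropped when CONFIG_OF is modelled off. */
const struct match_entry *table_via_helper(void)
{
	return of_match_ptr(demo_match_table);
}

/* Without the helper, the always-present table is always referenced. */
const struct match_entry *table_direct(void)
{
	return demo_match_table;
}

void check_tables(void)
{
	assert(table_direct() == demo_match_table);
#if !MODEL_CONFIG_OF
	assert(table_via_helper() == NULL);
#endif
}
```

The helper mainly earns its keep when the table itself sits inside a CONFIG_OF guard; when the table is always compiled in, referencing it directly is simpler.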
Signed-off-by: Sachin Kamat Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-hym8563.c b/drivers/rtc/rtc-hym8563.c index b56e3d3..6118000 100644 --- a/drivers/rtc/rtc-hym8563.c +++ b/drivers/rtc/rtc-hym8563.c @@ -593,7 +593,7 @@ static struct i2c_driver hym8563_driver = { .name = "rtc-hym8563", .owner = THIS_MODULE, .pm = &hym8563_pm_ops, - .of_match_table = of_match_ptr(hym8563_dt_idtable), + .of_match_table = hym8563_dt_idtable, }, .probe = hym8563_probe, .id_table = hym8563_id, -- cgit v0.10.2 From 28ed893c02e7da25792e6742a78b679662a144e3 Mon Sep 17 00:00:00 2001 From: Sachin Kamat Date: Thu, 23 Jan 2014 15:55:18 -0800 Subject: drivers/rtc/rtc-hym8563.c: staticize local symbol 'hym8563_clkout_ops' is used only in this file. Signed-off-by: Sachin Kamat Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-hym8563.c b/drivers/rtc/rtc-hym8563.c index 6118000..bd628a6 100644 --- a/drivers/rtc/rtc-hym8563.c +++ b/drivers/rtc/rtc-hym8563.c @@ -389,7 +389,7 @@ static int hym8563_clkout_is_prepared(struct clk_hw *hw) return !(ret & HYM8563_CLKOUT_DISABLE); } -const struct clk_ops hym8563_clkout_ops = { +static const struct clk_ops hym8563_clkout_ops = { .prepare = hym8563_clkout_prepare, .unprepare = hym8563_clkout_unprepare, .is_prepared = hym8563_clkout_is_prepared, -- cgit v0.10.2 From 75ea799df4cb07e505c91b4abaa87bc28aad3e66 Mon Sep 17 00:00:00 2001 From: Stephen Warren Date: Thu, 23 Jan 2014 15:55:19 -0800 Subject: rtc: max8907: weekday encoding fixes The current MAX8907 driver has two issues related to weekday value handling: 1) The HW WEEKDAY register has range 0..6 rather than 1..7 as documented. Note that I validated the actual HW range by observing the HW register roll from 6->0 rather than 6->7->1 as would otherwise be expected. This matches Linux's tm_wday range of 0..6. When the CMOS RAM content is lost, the date returned from the device is 2007-01-01 00:00:00, which is a Monday. 
The WEEKDAY register reads 1 in this case. This matches the numbering in Linux's tm_wday field. Hence we should write Linux's tm_wday value to the register without modifying it. Hence, remove the +1/-1 calculations for WEEKDAY/tm_wday. 2) There's no need to make alarms match on the WEEKDAY register, since the other fields together uniquely define the alarm date/time. Ignoring the WEEKDAY value in the match isolates the driver from any incorrect value in the current time copy of the WEEKDAY register. Each change individually, or both together, solves an issue that I observed; "hwclock -r" would time out waiting for its alarm to fire if the CMOS RAM content had been lost, and hence the WEEKDAY register value mismatched what the driver expected it to be. "hwclock -w" would solve this by over-writing the HW default WEEKDAY register value with what the driver expected. Signed-off-by: Stephen Warren Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-max8907.c b/drivers/rtc/rtc-max8907.c index 8e45b3c..3032178 100644 --- a/drivers/rtc/rtc-max8907.c +++ b/drivers/rtc/rtc-max8907.c @@ -51,7 +51,7 @@ static irqreturn_t max8907_irq_handler(int irq, void *data) { struct max8907_rtc *rtc = data; - regmap_update_bits(rtc->regmap, MAX8907_REG_ALARM0_CNTL, 0x7f, 0); + regmap_write(rtc->regmap, MAX8907_REG_ALARM0_CNTL, 0); rtc_update_irq(rtc->rtc_dev, 1, RTC_IRQF | RTC_AF); @@ -64,7 +64,7 @@ static void regs_to_tm(u8 *regs, struct rtc_time *tm) bcd2bin(regs[RTC_YEAR1]) - 1900; tm->tm_mon = bcd2bin(regs[RTC_MONTH] & 0x1f) - 1; tm->tm_mday = bcd2bin(regs[RTC_DATE] & 0x3f); - tm->tm_wday = (regs[RTC_WEEKDAY] & 0x07) - 1; + tm->tm_wday = (regs[RTC_WEEKDAY] & 0x07); if (regs[RTC_HOUR] & HOUR_12) { tm->tm_hour = bcd2bin(regs[RTC_HOUR] & 0x01f); if (tm->tm_hour == 12) @@ -88,7 +88,7 @@ static void tm_to_regs(struct rtc_time *tm, u8 *regs) regs[RTC_YEAR1] = bin2bcd(low); regs[RTC_MONTH] = bin2bcd(tm->tm_mon + 1); regs[RTC_DATE] = 
bin2bcd(tm->tm_mday); - regs[RTC_WEEKDAY] = tm->tm_wday + 1; + regs[RTC_WEEKDAY] = tm->tm_wday; regs[RTC_HOUR] = bin2bcd(tm->tm_hour); regs[RTC_MIN] = bin2bcd(tm->tm_min); regs[RTC_SEC] = bin2bcd(tm->tm_sec); @@ -153,7 +153,7 @@ static int max8907_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm) tm_to_regs(&alrm->time, regs); /* Disable alarm while we update the target time */ - ret = regmap_update_bits(rtc->regmap, MAX8907_REG_ALARM0_CNTL, 0x7f, 0); + ret = regmap_write(rtc->regmap, MAX8907_REG_ALARM0_CNTL, 0); if (ret < 0) return ret; @@ -163,8 +163,7 @@ static int max8907_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm) return ret; if (alrm->enabled) - ret = regmap_update_bits(rtc->regmap, MAX8907_REG_ALARM0_CNTL, - 0x7f, 0x7f); + ret = regmap_write(rtc->regmap, MAX8907_REG_ALARM0_CNTL, 0x77); return ret; } -- cgit v0.10.2 From 11ba5a1eeb5406c58ffda0fc07360b2ef5c4f176 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 23 Jan 2014 15:55:19 -0800 Subject: drivers/rtc/rtc-s5m.c: s5m_rtc_{suspend,resume}() should depend on CONFIG_PM_SLEEP If CONFIG_PM_SLEEP=n: drivers/rtc/rtc-s5m.c:643: warning: `s5m_rtc_resume' defined but not used drivers/rtc/rtc-s5m.c:654: warning: `s5m_rtc_suspend' defined but not used Signed-off-by: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c index ae8119d..476af93 100644 --- a/drivers/rtc/rtc-s5m.c +++ b/drivers/rtc/rtc-s5m.c @@ -639,6 +639,7 @@ static void s5m_rtc_shutdown(struct platform_device *pdev) s5m_rtc_enable_smpl(info, false); } +#ifdef CONFIG_PM_SLEEP static int s5m_rtc_resume(struct device *dev) { struct s5m_rtc_info *info = dev_get_drvdata(dev); @@ -660,6 +661,7 @@ static int s5m_rtc_suspend(struct device *dev) return ret; } +#endif /* CONFIG_PM_SLEEP */ static SIMPLE_DEV_PM_OPS(s5m_rtc_pm_ops, s5m_rtc_suspend, s5m_rtc_resume); -- cgit v0.10.2 From d643a49ae16c755b3dc2ef897438b7d9c6dd488b Mon Sep 17 00:00:00 
2001 From: Andreas Werner Date: Thu, 23 Jan 2014 15:55:20 -0800 Subject: drivers/rtc/rtc-rx8581.c: add SMBus-only adapters support Add support for SMBus-only adapters (e.g. i2c-piix4). The driver has implemented only support for I2C adapters which implement the I2C_FUNC_SMBUS_I2C_BLOCK functionality before. With this patch it is possible to load and use the RTC driver with I2C and SMBUS adapters like the rtc-ds1307 does. Tested on AMD G Series Platform (i2c-piix4 adapter driver). Signed-off-by: Andreas Werner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-rx8581.c b/drivers/rtc/rtc-rx8581.c index 00b0eb7..de8d9c4 100644 --- a/drivers/rtc/rtc-rx8581.c +++ b/drivers/rtc/rtc-rx8581.c @@ -52,8 +52,45 @@ #define RX8581_CTRL_STOP 0x02 /* STOP bit */ #define RX8581_CTRL_RESET 0x01 /* RESET bit */ +struct rx8581 { + struct i2c_client *client; + struct rtc_device *rtc; + s32 (*read_block_data)(const struct i2c_client *client, u8 command, + u8 length, u8 *values); + s32 (*write_block_data)(const struct i2c_client *client, u8 command, + u8 length, const u8 *values); +}; + static struct i2c_driver rx8581_driver; +static int rx8581_read_block_data(const struct i2c_client *client, u8 command, + u8 length, u8 *values) +{ + s32 i, data; + + for (i = 0; i < length; i++) { + data = i2c_smbus_read_byte_data(client, command + i); + if (data < 0) + return data; + values[i] = data; + } + return i; +} + +static int rx8581_write_block_data(const struct i2c_client *client, u8 command, + u8 length, const u8 *values) +{ + s32 i, ret; + + for (i = 0; i < length; i++) { + ret = i2c_smbus_write_byte_data(client, command + i, + values[i]); + if (ret < 0) + return ret; + } + return length; +} + /* * In the routines that deal directly with the rx8581 hardware, we use * rtc_time -- month 0-11, hour 0-23, yr = calendar year-epoch. 
@@ -62,6 +99,7 @@ static int rx8581_get_datetime(struct i2c_client *client, struct rtc_time *tm) { unsigned char date[7]; int data, err; + struct rx8581 *rx8581 = i2c_get_clientdata(client); /* First we ensure that the "update flag" is not set, we read the * time and date then re-read the "update flag". If the update flag @@ -80,14 +118,13 @@ static int rx8581_get_datetime(struct i2c_client *client, struct rtc_time *tm) err = i2c_smbus_write_byte_data(client, RX8581_REG_FLAG, (data & ~RX8581_FLAG_UF)); if (err != 0) { - dev_err(&client->dev, "Unable to write device " - "flags\n"); + dev_err(&client->dev, "Unable to write device flags\n"); return -EIO; } } /* Now read time and date */ - err = i2c_smbus_read_i2c_block_data(client, RX8581_REG_SC, + err = rx8581->read_block_data(client, RX8581_REG_SC, 7, date); if (err < 0) { dev_err(&client->dev, "Unable to read date\n"); @@ -140,6 +177,7 @@ static int rx8581_set_datetime(struct i2c_client *client, struct rtc_time *tm) { int data, err; unsigned char buf[7]; + struct rx8581 *rx8581 = i2c_get_clientdata(client); dev_dbg(&client->dev, "%s: secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", @@ -176,7 +214,7 @@ static int rx8581_set_datetime(struct i2c_client *client, struct rtc_time *tm) } /* write register's data */ - err = i2c_smbus_write_i2c_block_data(client, RX8581_REG_SC, 7, buf); + err = rx8581->write_block_data(client, RX8581_REG_SC, 7, buf); if (err < 0) { dev_err(&client->dev, "Unable to write to date registers\n"); return -EIO; @@ -231,22 +269,39 @@ static const struct rtc_class_ops rx8581_rtc_ops = { static int rx8581_probe(struct i2c_client *client, const struct i2c_device_id *id) { - struct rtc_device *rtc; + struct rx8581 *rx8581; dev_dbg(&client->dev, "%s\n", __func__); - if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) - return -ENODEV; + if (!i2c_check_functionality(client->adapter, I2C_FUNC_SMBUS_BYTE_DATA) + && !i2c_check_functionality(client->adapter, 
I2C_FUNC_SMBUS_I2C_BLOCK)) + return -EIO; - dev_info(&client->dev, "chip found, driver version " DRV_VERSION "\n"); + rx8581 = devm_kzalloc(&client->dev, sizeof(struct rx8581), GFP_KERNEL); + if (!rx8581) + return -ENOMEM; - rtc = devm_rtc_device_register(&client->dev, rx8581_driver.driver.name, - &rx8581_rtc_ops, THIS_MODULE); + i2c_set_clientdata(client, rx8581); + rx8581->client = client; - if (IS_ERR(rtc)) - return PTR_ERR(rtc); + if (i2c_check_functionality(client->adapter, I2C_FUNC_SMBUS_I2C_BLOCK)) { + rx8581->read_block_data = i2c_smbus_read_i2c_block_data; + rx8581->write_block_data = i2c_smbus_write_i2c_block_data; + } else { + rx8581->read_block_data = rx8581_read_block_data; + rx8581->write_block_data = rx8581_write_block_data; + } - i2c_set_clientdata(client, rtc); + dev_info(&client->dev, "chip found, driver version " DRV_VERSION "\n"); + + rx8581->rtc = devm_rtc_device_register(&client->dev, + rx8581_driver.driver.name, &rx8581_rtc_ops, THIS_MODULE); + + if (IS_ERR(rx8581->rtc)) { + dev_err(&client->dev, + "unable to register the class device\n"); + return PTR_ERR(rx8581->rtc); + } return 0; } -- cgit v0.10.2 From 7e775f46a125f894a1d71e96797c776dbec161f0 Mon Sep 17 00:00:00 2001 From: Dmitry Monakhov Date: Thu, 23 Jan 2014 15:55:21 -0800 Subject: fs/pipe.c: skip file_update_time on frozen fs A pipe has no data associated with the fs, so it is not a good idea to block pipe_write() if the FS is frozen, but we cannot update the file's time on such a filesystem. Let's use the same idea as we use in touch_time().
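The "try to get write access, otherwise skip the timestamp" idea can be sketched in userspace. Everything below is a hypothetical model: struct fake_sb, struct fake_file and the fake_* functions only imitate the shape of the trylock/end-write pairing used in the patch, and the error handling around the timestamp update is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a superblock that may be frozen. */
struct fake_sb {
	bool frozen;
	int writers;
};

/* Hypothetical model of a pipe's file with a modification time. */
struct fake_file {
	struct fake_sb *sb;
	long mtime;
};

/* Non-blocking: refuses write access instead of waiting for a thaw. */
bool fake_start_write_trylock(struct fake_sb *sb)
{
	if (sb->frozen)
		return false;
	sb->writers++;
	return true;
}

void fake_end_write(struct fake_sb *sb)
{
	assert(sb->writers > 0);
	sb->writers--;
}

/*
 * Tail of a write: the data transfer already succeeded (ret > 0), so
 * the timestamp is updated only if write access is available now.
 */
long write_tail(struct fake_file *file, long ret, long now)
{
	if (ret > 0 && fake_start_write_trylock(file->sb)) {
		file->mtime = now;
		fake_end_write(file->sb);
	}
	return ret;
}

/* Helper for experiments: returns the mtime seen after one write. */
long mtime_after_write(bool frozen, long now)
{
	struct fake_sb sb = { .frozen = frozen, .writers = 0 };
	struct fake_file file = { .sb = &sb, .mtime = 0 };

	write_tail(&file, 1, now);
	return file.mtime;
}
```

In this model a write on a frozen filesystem still returns its byte count but leaves the timestamp untouched, which matches the behaviour the commit message asks for.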
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=65701 Signed-off-by: Dmitry Monakhov Reviewed-by: Jan Kara Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/pipe.c b/fs/pipe.c index 0e0752e..78fd0d0 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -663,10 +663,11 @@ out: wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM); kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); } - if (ret > 0) { + if (ret > 0 && sb_start_write_trylock(file_inode(filp)->i_sb)) { int err = file_update_time(filp); if (err) ret = err; + sb_end_write(file_inode(filp)->i_sb); } return ret; } -- cgit v0.10.2 From 4b15d61718f0d1bd3bc32e15bffb25a31c1d5782 Mon Sep 17 00:00:00 2001 From: Wenliang Fan Date: Thu, 23 Jan 2014 15:55:22 -0800 Subject: fs/nilfs2: fix integer overflow in nilfs_ioctl_wrap_copy() The local variable 'pos' in nilfs_ioctl_wrap_copy function can overflow if a large number was passed to argv->v_index from userspace and the sum of argv->v_index and argv->v_nmembs exceeds the maximum value of __u64 type integer (= ~(__u64)0 = 18446744073709551615). Here, argv->v_index is a 64-bit width argument to specify the start position of target data items (such as segment number, checkpoint number, or virtual block address of nilfs), and argv->v_nmembs gives the total number of the items that userland programs (such as lssu, lscp, or cleanerd) want to get information about, which also gives the maximum element count of argv->v_base[] array. nilfs_ioctl_wrap_copy() calls dofunc() repeatedly and increments the position variable 'pos' at the end of each iteration if dofunc() itself didn't update 'pos': if (pos == ppos) pos += n; This patch prevents the overflow here by rejecting pairs of a start position (argv->v_index) and a total count (argv->v_nmembs) which leads to the overflow. 
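The rejection test can be tried out in isolation. This is a minimal sketch, assuming both values are treated as unsigned 64-bit quantities for the comparison; check_range() and demo_wraparound() are hypothetical names, and the parameters stand in for argv->v_index and argv->v_nmembs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Reject a (start index, count) pair whose sum would not fit in 64
 * bits. The comparison is arranged so that it cannot overflow itself:
 * UINT64_MAX - nmembs is always representable.
 */
int check_range(uint64_t index, uint64_t nmembs)
{
	if (index > UINT64_MAX - nmembs)
		return -EINVAL;

	return 0;
}

/* What the check guards against: unsigned addition silently wraps. */
void demo_wraparound(void)
{
	uint64_t index = UINT64_MAX - 1;
	uint64_t nmembs = 4;

	assert(index + nmembs == 2);	/* wrapped past zero */
	assert(check_range(index, nmembs) == -EINVAL);
}
```

Writing the test as index > MAX - nmembs rather than index + nmembs < index keeps the intent explicit; both forms are well defined for unsigned types.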
[konishi.ryusuke@lab.ntt.co.jp: fix signedness issue] Signed-off-by: Wenliang Fan Cc: Vyacheslav Dubeyko Signed-off-by: Ryusuke Konishi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index b44bdb29..d22281d 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -57,6 +57,14 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs, if (argv->v_size > PAGE_SIZE) return -EINVAL; + /* + * Reject pairs of a start item position (argv->v_index) and a + * total count (argv->v_nmembs) which leads position 'pos' to + * overflow by the increment at the end of the loop. + */ + if (argv->v_index > ~(__u64)0 - argv->v_nmembs) + return -EINVAL; + buf = (void *)__get_free_pages(GFP_NOFS, 0); if (unlikely(!buf)) return -ENOMEM; -- cgit v0.10.2 From d623a9420c9ae2b748ba458c0e9d59084419fce0 Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Thu, 23 Jan 2014 15:55:23 -0800 Subject: nilfs2: add comments for ioctls Add comments for ioctls in fs/nilfs2/ioctl.c file and describe NILFS2 specific ioctls in Documentation/filesystems/nilfs2.txt. Signed-off-by: Vyacheslav Dubeyko Reviewed-by: Ryusuke Konishi Cc: Wenliang Fan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt index 873a2ab..06887d4 100644 --- a/Documentation/filesystems/nilfs2.txt +++ b/Documentation/filesystems/nilfs2.txt @@ -81,6 +81,62 @@ nodiscard(*) The discard/TRIM commands are sent to the underlying block device when blocks are freed. This is useful for SSD devices and sparse/thinly-provisioned LUNs. +Ioctls +====== + +There is some NILFS2 specific functionality which can be accessed by applications +through the system call interfaces. The list of all NILFS2 specific ioctls are +shown in the table below. + +Table of NILFS2 specific ioctls +.............................................................................. 
+ Ioctl Description + NILFS_IOCTL_CHANGE_CPMODE Change mode of given checkpoint between + checkpoint and snapshot state. This ioctl is + used in chcp and mkcp utilities. + + NILFS_IOCTL_DELETE_CHECKPOINT Remove checkpoint from NILFS2 file system. + This ioctl is used in rmcp utility. + + NILFS_IOCTL_GET_CPINFO Return info about requested checkpoints. This + ioctl is used in lscp utility and by + nilfs_cleanerd daemon. + + NILFS_IOCTL_GET_CPSTAT Return checkpoints statistics. This ioctl is + used by lscp, rmcp utilities and by + nilfs_cleanerd daemon. + + NILFS_IOCTL_GET_SUINFO Return segment usage info about requested + segments. This ioctl is used in lssu, + nilfs_resize utilities and by nilfs_cleanerd + daemon. + + NILFS_IOCTL_GET_SUSTAT Return segment usage statistics. This ioctl + is used in lssu, nilfs_resize utilities and + by nilfs_cleanerd daemon. + + NILFS_IOCTL_GET_VINFO Return information on virtual block addresses. + This ioctl is used by nilfs_cleanerd daemon. + + NILFS_IOCTL_GET_BDESCS Return information about descriptors of disk + block numbers. This ioctl is used by + nilfs_cleanerd daemon. + + NILFS_IOCTL_CLEAN_SEGMENTS Do garbage collection operation in the + environment of requested parameters from + userspace. This ioctl is used by + nilfs_cleanerd daemon. + + NILFS_IOCTL_SYNC Make a checkpoint. This ioctl is used in + mkcp utility. + + NILFS_IOCTL_RESIZE Resize NILFS2 volume. This ioctl is used + by nilfs_resize utility. + + NILFS_IOCTL_SET_ALLOC_RANGE Define lower limit of segments in bytes and + upper limit of segments in bytes. This ioctl + is used by nilfs_resize utility. 
+ NILFS2 usage ============ diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index d22281d..2b34021 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -37,7 +37,26 @@ #include "sufile.h" #include "dat.h" - +/** + * nilfs_ioctl_wrap_copy - wrapping function of get/set metadata info + * @nilfs: nilfs object + * @argv: vector of arguments from userspace + * @dir: set of direction flags + * @dofunc: concrete function of get/set metadata info + * + * Description: nilfs_ioctl_wrap_copy() gets/sets metadata info by means of + * calling dofunc() function on the basis of @argv argument. + * + * Return Value: On success, 0 is returned and requested metadata info + * is copied into userspace. On error, one of the following + * negative error codes is returned. + * + * %-EINVAL - Invalid arguments from userspace. + * + * %-ENOMEM - Insufficient amount of memory available. + * + * %-EFAULT - Failure during execution of requested operation. + */ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs, struct nilfs_argv *argv, int dir, ssize_t (*dofunc)(struct the_nilfs *, @@ -107,6 +126,9 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs, return ret; } +/** + * nilfs_ioctl_getflags - ioctl to support lsattr + */ static int nilfs_ioctl_getflags(struct inode *inode, void __user *argp) { unsigned int flags = NILFS_I(inode)->i_flags & FS_FL_USER_VISIBLE; @@ -114,6 +136,9 @@ static int nilfs_ioctl_getflags(struct inode *inode, void __user *argp) return put_user(flags, (int __user *)argp); } +/** + * nilfs_ioctl_setflags - ioctl to support chattr + */ static int nilfs_ioctl_setflags(struct inode *inode, struct file *filp, void __user *argp) { @@ -166,11 +191,33 @@ out: return ret; } +/** + * nilfs_ioctl_getversion - get info about a file's version (generation number) + */ static int nilfs_ioctl_getversion(struct inode *inode, void __user *argp) { return put_user(inode->i_generation, (int __user *)argp); } +/** + * nilfs_ioctl_change_cpmode - change checkpoint mode 
(checkpoint/snapshot) + * @inode: inode object + * @filp: file object + * @cmd: ioctl's request code + * @argp: pointer on argument from userspace + * + * Description: nilfs_ioctl_change_cpmode() function changes mode of + * given checkpoint between checkpoint and snapshot state. This ioctl + * is used in chcp and mkcp utilities. + * + * Return Value: On success, 0 is returned and mode of a checkpoint is + * changed. On error, one of the following negative error codes + * is returned. + * + * %-EPERM - Operation not permitted. + * + * %-EFAULT - Failure during checkpoint mode changing. + */ static int nilfs_ioctl_change_cpmode(struct inode *inode, struct file *filp, unsigned int cmd, void __user *argp) { @@ -206,6 +253,25 @@ out: return ret; } +/** + * nilfs_ioctl_delete_checkpoint - remove checkpoint + * @inode: inode object + * @filp: file object + * @cmd: ioctl's request code + * @argp: pointer on argument from userspace + * + * Description: nilfs_ioctl_delete_checkpoint() function removes + * checkpoint from NILFS2 file system. This ioctl is used in rmcp + * utility. + * + * Return Value: On success, 0 is returned and a checkpoint is + * removed. On error, one of the following negative error codes + * is returned. + * + * %-EPERM - Operation not permitted. + * + * %-EFAULT - Failure during checkpoint removing. + */ static int nilfs_ioctl_delete_checkpoint(struct inode *inode, struct file *filp, unsigned int cmd, void __user *argp) @@ -237,6 +303,21 @@ out: return ret; } +/** + * nilfs_ioctl_do_get_cpinfo - callback method getting info about checkpoints + * @nilfs: nilfs object + * @posp: pointer on array of checkpoint's numbers + * @flags: checkpoint mode (checkpoint or snapshot) + * @buf: buffer for storing checkponts' info + * @size: size in bytes of one checkpoint info item in array + * @nmembs: number of checkpoints in array (numbers and infos) + * + * Description: nilfs_ioctl_do_get_cpinfo() function returns info about + * requested checkpoints. 
The NILFS_IOCTL_GET_CPINFO ioctl is used in + * lscp utility and by nilfs_cleanerd daemon. + * + * Return value: count of nilfs_cpinfo structures in output buffer. + */ static ssize_t nilfs_ioctl_do_get_cpinfo(struct the_nilfs *nilfs, __u64 *posp, int flags, void *buf, size_t size, size_t nmembs) @@ -250,6 +331,27 @@ nilfs_ioctl_do_get_cpinfo(struct the_nilfs *nilfs, __u64 *posp, int flags, return ret; } +/** + * nilfs_ioctl_get_cpstat - get checkpoints statistics + * @inode: inode object + * @filp: file object + * @cmd: ioctl's request code + * @argp: pointer on argument from userspace + * + * Description: nilfs_ioctl_get_cpstat() returns information about checkpoints. + * The NILFS_IOCTL_GET_CPSTAT ioctl is used by lscp, rmcp utilities + * and by nilfs_cleanerd daemon. + * + * Return Value: On success, 0 is returned, and checkpoints information is + * copied into userspace pointer @argp. On error, one of the following + * negative error codes is returned. + * + * %-EIO - I/O error. + * + * %-ENOMEM - Insufficient amount of memory available. + * + * %-EFAULT - Failure during getting checkpoints statistics. + */ static int nilfs_ioctl_get_cpstat(struct inode *inode, struct file *filp, unsigned int cmd, void __user *argp) { @@ -268,6 +370,21 @@ static int nilfs_ioctl_get_cpstat(struct inode *inode, struct file *filp, return ret; } +/** + * nilfs_ioctl_do_get_suinfo - callback method getting segment usage info + * @nilfs: nilfs object + * @posp: pointer on array of segment numbers + * @flags: *not used* + * @buf: buffer for storing suinfo array + * @size: size in bytes of one suinfo item in array + * @nmembs: count of segment numbers and suinfos in array + * + * Description: nilfs_ioctl_do_get_suinfo() function returns segment usage + * info about requested segments. The NILFS_IOCTL_GET_SUINFO ioctl is used + * in lssu, nilfs_resize utilities and by nilfs_cleanerd daemon. + * + * Return value: count of nilfs_suinfo structures in output buffer. 
+ */ static ssize_t nilfs_ioctl_do_get_suinfo(struct the_nilfs *nilfs, __u64 *posp, int flags, void *buf, size_t size, size_t nmembs) @@ -281,6 +398,27 @@ nilfs_ioctl_do_get_suinfo(struct the_nilfs *nilfs, __u64 *posp, int flags, return ret; } +/** + * nilfs_ioctl_get_sustat - get segment usage statistics + * @inode: inode object + * @filp: file object + * @cmd: ioctl's request code + * @argp: pointer on argument from userspace + * + * Description: nilfs_ioctl_get_sustat() returns segment usage statistics. + * The NILFS_IOCTL_GET_SUSTAT ioctl is used in lssu, nilfs_resize utilities + * and by nilfs_cleanerd daemon. + * + * Return Value: On success, 0 is returned, and segment usage information is + * copied into userspace pointer @argp. On error, one of the following + * negative error codes is returned. + * + * %-EIO - I/O error. + * + * %-ENOMEM - Insufficient amount of memory available. + * + * %-EFAULT - Failure during getting segment usage statistics. + */ static int nilfs_ioctl_get_sustat(struct inode *inode, struct file *filp, unsigned int cmd, void __user *argp) { @@ -299,6 +437,21 @@ static int nilfs_ioctl_get_sustat(struct inode *inode, struct file *filp, return ret; } +/** + * nilfs_ioctl_do_get_vinfo - callback method getting virtual blocks info + * @nilfs: nilfs object + * @posp: *not used* + * @flags: *not used* + * @buf: buffer for storing array of nilfs_vinfo structures + * @size: size in bytes of one vinfo item in array + * @nmembs: count of vinfos in array + * + * Description: nilfs_ioctl_do_get_vinfo() function returns information + * on virtual block addresses. The NILFS_IOCTL_GET_VINFO ioctl is used + * by nilfs_cleanerd daemon. + * + * Return value: count of nilfs_vinfo structures in output buffer. 
+ */ static ssize_t nilfs_ioctl_do_get_vinfo(struct the_nilfs *nilfs, __u64 *posp, int flags, void *buf, size_t size, size_t nmembs) @@ -311,6 +464,21 @@ nilfs_ioctl_do_get_vinfo(struct the_nilfs *nilfs, __u64 *posp, int flags, return ret; } +/** + * nilfs_ioctl_do_get_bdescs - callback method getting disk block descriptors + * @nilfs: nilfs object + * @posp: *not used* + * @flags: *not used* + * @buf: buffer for storing array of nilfs_bdesc structures + * @size: size in bytes of one bdesc item in array + * @nmembs: count of bdescs in array + * + * Description: nilfs_ioctl_do_get_bdescs() function returns information + * about descriptors of disk block numbers. The NILFS_IOCTL_GET_BDESCS ioctl + * is used by nilfs_cleanerd daemon. + * + * Return value: count of nilfs_bdescs structures in output buffer. + */ static ssize_t nilfs_ioctl_do_get_bdescs(struct the_nilfs *nilfs, __u64 *posp, int flags, void *buf, size_t size, size_t nmembs) @@ -337,6 +505,29 @@ nilfs_ioctl_do_get_bdescs(struct the_nilfs *nilfs, __u64 *posp, int flags, return nmembs; } +/** + * nilfs_ioctl_get_bdescs - get disk block descriptors + * @inode: inode object + * @filp: file object + * @cmd: ioctl's request code + * @argp: pointer on argument from userspace + * + * Description: nilfs_ioctl_do_get_bdescs() function returns information + * about descriptors of disk block numbers. The NILFS_IOCTL_GET_BDESCS ioctl + * is used by nilfs_cleanerd daemon. + * + * Return Value: On success, 0 is returned, and disk block descriptors are + * copied into userspace pointer @argp. On error, one of the following + * negative error codes is returned. + * + * %-EINVAL - Invalid arguments from userspace. + * + * %-EIO - I/O error. + * + * %-ENOMEM - Insufficient amount of memory available. + * + * %-EFAULT - Failure during getting disk block descriptors. 
+ */ static int nilfs_ioctl_get_bdescs(struct inode *inode, struct file *filp, unsigned int cmd, void __user *argp) { @@ -360,6 +551,26 @@ static int nilfs_ioctl_get_bdescs(struct inode *inode, struct file *filp, return ret; } +/** + * nilfs_ioctl_move_inode_block - prepare data/node block for moving by GC + * @inode: inode object + * @vdesc: descriptor of virtual block number + * @buffers: list of moving buffers + * + * Description: nilfs_ioctl_move_inode_block() function registers data/node + * buffer in the GC pagecache and submit read request. + * + * Return Value: On success, 0 is returned. On error, one of the following + * negative error codes is returned. + * + * %-EIO - I/O error. + * + * %-ENOMEM - Insufficient amount of memory available. + * + * %-ENOENT - Requested block doesn't exist. + * + * %-EEXIST - Blocks conflict is detected. + */ static int nilfs_ioctl_move_inode_block(struct inode *inode, struct nilfs_vdesc *vdesc, struct list_head *buffers) @@ -405,6 +616,19 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode, return 0; } +/** + * nilfs_ioctl_move_blocks - move valid inode's blocks during garbage collection + * @sb: superblock object + * @argv: vector of arguments from userspace + * @buf: array of nilfs_vdesc structures + * + * Description: nilfs_ioctl_move_blocks() function reads valid data/node + * blocks that garbage collector specified with the array of nilfs_vdesc + * structures and stores them into page caches of GC inodes. + * + * Return Value: Number of processed nilfs_vdesc structures or + * error code, otherwise. 
+ */ static int nilfs_ioctl_move_blocks(struct super_block *sb, struct nilfs_argv *argv, void *buf) { @@ -470,6 +694,25 @@ static int nilfs_ioctl_move_blocks(struct super_block *sb, return ret; } +/** + * nilfs_ioctl_delete_checkpoints - delete checkpoints + * @nilfs: nilfs object + * @argv: vector of arguments from userspace + * @buf: array of periods of checkpoints numbers + * + * Description: nilfs_ioctl_delete_checkpoints() function deletes checkpoints + * in the period from p_start to p_end, excluding p_end itself. The checkpoints + * which have been already deleted are ignored. + * + * Return Value: Number of processed nilfs_period structures or + * error code, otherwise. + * + * %-EIO - I/O error. + * + * %-ENOMEM - Insufficient amount of memory available. + * + * %-EINVAL - invalid checkpoints. + */ static int nilfs_ioctl_delete_checkpoints(struct the_nilfs *nilfs, struct nilfs_argv *argv, void *buf) { @@ -487,6 +730,24 @@ static int nilfs_ioctl_delete_checkpoints(struct the_nilfs *nilfs, return nmembs; } +/** + * nilfs_ioctl_free_vblocknrs - free virtual block numbers + * @nilfs: nilfs object + * @argv: vector of arguments from userspace + * @buf: array of virtual block numbers + * + * Description: nilfs_ioctl_free_vblocknrs() function frees + * the virtual block numbers specified by @buf and @argv->v_nmembs. + * + * Return Value: Number of processed virtual block numbers or + * error code, otherwise. + * + * %-EIO - I/O error. + * + * %-ENOMEM - Insufficient amount of memory available. + * + * %-ENOENT - The virtual block number have not been allocated. + */ static int nilfs_ioctl_free_vblocknrs(struct the_nilfs *nilfs, struct nilfs_argv *argv, void *buf) { @@ -498,6 +759,24 @@ static int nilfs_ioctl_free_vblocknrs(struct the_nilfs *nilfs, return (ret < 0) ? 
ret : nmembs; } +/** + * nilfs_ioctl_mark_blocks_dirty - mark blocks dirty + * @nilfs: nilfs object + * @argv: vector of arguments from userspace + * @buf: array of block descriptors + * + * Description: nilfs_ioctl_mark_blocks_dirty() function marks + * metadata file or data blocks as dirty. + * + * Return Value: Number of processed block descriptors or + * error code, otherwise. + * + * %-ENOMEM - Insufficient memory available. + * + * %-EIO - I/O error + * + * %-ENOENT - the specified block does not exist (hole block) + */ static int nilfs_ioctl_mark_blocks_dirty(struct the_nilfs *nilfs, struct nilfs_argv *argv, void *buf) { @@ -579,6 +858,20 @@ int nilfs_ioctl_prepare_clean_segments(struct the_nilfs *nilfs, return ret; } +/** + * nilfs_ioctl_clean_segments - clean segments + * @inode: inode object + * @filp: file object + * @cmd: ioctl's request code + * @argp: pointer on argument from userspace + * + * Description: nilfs_ioctl_clean_segments() function makes garbage + * collection operation in the environment of requested parameters + * from userspace. The NILFS_IOCTL_CLEAN_SEGMENTS ioctl is used by + * nilfs_cleanerd daemon. + * + * Return Value: On success, 0 is returned or error code, otherwise. + */ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp, unsigned int cmd, void __user *argp) { @@ -690,6 +983,33 @@ out: return ret; } +/** + * nilfs_ioctl_sync - make a checkpoint + * @inode: inode object + * @filp: file object + * @cmd: ioctl's request code + * @argp: pointer on argument from userspace + * + * Description: nilfs_ioctl_sync() function constructs a logical segment + * for checkpointing. This function guarantees that all modified data + * and metadata are written out to the device when it successfully + * returned. + * + * Return Value: On success, 0 is retured. On errors, one of the following + * negative error code is returned. + * + * %-EROFS - Read only filesystem. 
+ * + * %-EIO - I/O error + * + * %-ENOSPC - No space left on device (only in a panic state). + * + * %-ERESTARTSYS - Interrupted. + * + * %-ENOMEM - Insufficient memory available. + * + * %-EFAULT - Failure during execution of requested operation. + */ static int nilfs_ioctl_sync(struct inode *inode, struct file *filp, unsigned int cmd, void __user *argp) { @@ -718,6 +1038,14 @@ static int nilfs_ioctl_sync(struct inode *inode, struct file *filp, return 0; } +/** + * nilfs_ioctl_resize - resize NILFS2 volume + * @inode: inode object + * @filp: file object + * @argp: pointer on argument from userspace + * + * Return Value: On success, 0 is returned or error code, otherwise. + */ static int nilfs_ioctl_resize(struct inode *inode, struct file *filp, void __user *argp) { @@ -743,6 +1071,17 @@ out: return ret; } +/** + * nilfs_ioctl_set_alloc_range - limit range of segments to be allocated + * @inode: inode object + * @argp: pointer on argument from userspace + * + * Decription: nilfs_ioctl_set_alloc_range() function defines lower limit + * of segments in bytes and upper limit of segments in bytes. + * The NILFS_IOCTL_SET_ALLOC_RANGE is used by nilfs_resize utility. + * + * Return Value: On success, 0 is returned or error code, otherwise. + */ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp) { struct the_nilfs *nilfs = inode->i_sb->s_fs_info; @@ -775,6 +1114,28 @@ out: return ret; } +/** + * nilfs_ioctl_get_info - wrapping function of get metadata info + * @inode: inode object + * @filp: file object + * @cmd: ioctl's request code + * @argp: pointer on argument from userspace + * @membsz: size of an item in bytes + * @dofunc: concrete function of getting metadata info + * + * Description: nilfs_ioctl_get_info() gets metadata info by means of + * calling dofunc() function. + * + * Return Value: On success, 0 is returned and requested metadata info + * is copied into userspace. 
On error, one of the following
+ * negative error codes is returned.
+ *
+ * %-EINVAL - Invalid arguments from userspace.
+ *
+ * %-ENOMEM - Insufficient amount of memory available.
+ *
+ * %-EFAULT - Failure during execution of requested operation.
+ */
 static int nilfs_ioctl_get_info(struct inode *inode, struct file *filp,
 				unsigned int cmd, void __user *argp,
 				size_t membsz,
-- 
cgit v0.10.2

From d74a054fa4f5a3fc05eae11b3ff0b653b49dd7cb Mon Sep 17 00:00:00 2001
From: Sougata Santra
Date: Thu, 23 Jan 2014 15:55:25 -0800
Subject: hfsplus: remove hfsplus_file_lookup()

HFS+ resource fork lookup breaks the opendir() library function, because
opendir() first calls open() with the O_DIRECTORY flag set. O_DIRECTORY
means "refuse to open if not a directory". The open() system call in the
kernel checks inode->i_op->lookup and returns -ENOTDIR if it is not set.
So if hfsplus_file_lookup is set, opendir() is allowed on plain files.

Also, resource fork lookup in HFS+ does not work, since it is never
invoked after VFS permission checking. It will always return -EACCES.

When we call opendir() on a file, it does not return NULL. The opendir()
library call is based on open() with the O_DIRECTORY flag passed, and is
then layered on top of the getdents() system call. The open() system call
in the kernel does a check in:

  do_sys_open() -->..--> can_lookup()

i.e. it only checks inode->i_op->lookup and returns ENOTDIR if this
function pointer is not set.

In OSX, we can open "file/rsrc" to get the resource fork of "file". This
behavior is emulated inside hfsplus on Linux, which means that to some
degree every file acts like a directory. That is the reason the lookup()
inode operation is supported for files, and it is possible to do a lookup
on this specific name.
As a result, open() succeeds without returning ENOTDIR for HFS+ files.

Please see the LKML discussion thread on this issue:
http://marc.info/?l=linux-fsdevel&m=122823343730412&w=2

I tried to test file/rsrc lookup in the HFS+ driver and the feature does
not work.

From OSX:
$ touch test
$ echo "1234" > test/..namedfork/rsrc
$ ls -l test/..namedfork/rsrc
-rw-r--r-- 1 tuxera staff 5 10 dec 12:59 test/..namedfork/rsrc

[sougata@ultrabook tmp]$ id
uid=1000(sougata) gid=1000(sougata) groups=1000(sougata),5(tty),18(dialout),1001(vboxusers)

[sougata@ultrabook tmp]$ mount
/dev/sdb1 on /mnt/tmp type hfsplus (rw,relatime,umask=0,uid=1000,gid=1000,nls=utf8)

[sougata@ultrabook tmp]$ ls -l test/rsrc
ls: cannot access test/rsrc: Permission denied

According to this LKML thread, that is expected behavior:
http://marc.info/?t=121139033800008&r=1&w=4

I guess this is because permission checking now happens in the VFS
generic_permission()? So it turns out that even though the lookup()
inode operation exists for HFS+ files, it cannot really get invoked.
We can therefore disable this feature to make opendir() work for HFS+.
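
For illustration only (this is not part of the patch): a minimal
user-space sketch of the O_DIRECTORY check that opendir() relies on,
assuming a Linux system. The helper name try_open_directory() is made up
for this example and does not exist in the kernel or in libc.

```c
#define _GNU_SOURCE	/* make sure O_DIRECTORY is exposed by <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open @path with O_DIRECTORY set, which is
 * the first step of opendir() as described in the commit message.
 * Returns 0 if the open succeeds, or a negative errno value otherwise.
 */
static int try_open_directory(const char *path)
{
	int fd = open(path, O_RDONLY | O_DIRECTORY);

	if (fd < 0)
		return -errno;	/* expected: -ENOTDIR for a regular file */

	close(fd);
	return 0;
}
```

On a filesystem whose regular-file inodes have no ->lookup, calling
try_open_directory() on a plain file should give -ENOTDIR, while a real
directory gives 0. The problem reported above is that, with
hfsplus_file_lookup installed, the same call on a plain HFS+ file
succeeds instead.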
Signed-off-by: Sougata Santra Acked-by: Christoph Hellwig Cc: Vyacheslav Dubeyko Cc: Anton Altaparmakov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c index 37213d0..3ebda92 100644 --- a/fs/hfsplus/inode.c +++ b/fs/hfsplus/inode.c @@ -178,64 +178,6 @@ const struct dentry_operations hfsplus_dentry_operations = { .d_compare = hfsplus_compare_dentry, }; -static struct dentry *hfsplus_file_lookup(struct inode *dir, - struct dentry *dentry, unsigned int flags) -{ - struct hfs_find_data fd; - struct super_block *sb = dir->i_sb; - struct inode *inode = NULL; - struct hfsplus_inode_info *hip; - int err; - - if (HFSPLUS_IS_RSRC(dir) || strcmp(dentry->d_name.name, "rsrc")) - goto out; - - inode = HFSPLUS_I(dir)->rsrc_inode; - if (inode) - goto out; - - inode = new_inode(sb); - if (!inode) - return ERR_PTR(-ENOMEM); - - hip = HFSPLUS_I(inode); - inode->i_ino = dir->i_ino; - INIT_LIST_HEAD(&hip->open_dir_list); - mutex_init(&hip->extents_lock); - hip->extent_state = 0; - hip->flags = 0; - hip->userflags = 0; - set_bit(HFSPLUS_I_RSRC, &hip->flags); - - err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd); - if (!err) { - err = hfsplus_find_cat(sb, dir->i_ino, &fd); - if (!err) - err = hfsplus_cat_read_inode(inode, &fd); - hfs_find_exit(&fd); - } - if (err) { - iput(inode); - return ERR_PTR(err); - } - hip->rsrc_inode = dir; - HFSPLUS_I(dir)->rsrc_inode = inode; - igrab(dir); - - /* - * __mark_inode_dirty expects inodes to be hashed. Since we don't - * want resource fork inodes in the regular inode space, we make them - * appear hashed, but do not put on any lists. hlist_del() - * will work fine and require no locking. 
- */ - hlist_add_fake(&inode->i_hash); - - mark_inode_dirty(inode); -out: - d_add(dentry, inode); - return NULL; -} - static void hfsplus_get_perms(struct inode *inode, struct hfsplus_perm *perms, int dir) { @@ -385,7 +327,6 @@ int hfsplus_file_fsync(struct file *file, loff_t start, loff_t end, } static const struct inode_operations hfsplus_file_inode_operations = { - .lookup = hfsplus_file_lookup, .setattr = hfsplus_setattr, .setxattr = generic_setxattr, .getxattr = generic_getxattr, -- cgit v0.10.2 From c1083732908f233c5234a5c8765347602e83630c Mon Sep 17 00:00:00 2001 From: Andre Richter Date: Thu, 23 Jan 2014 15:55:26 -0800 Subject: Documentation/filesystems/sysfs.txt: fix device_attribute declaration Fix a wrong device_attribute declaration example. Signed-off-by: Andre Richter Cc: Greg KH Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index a6619b7..b35a64b 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -108,12 +108,12 @@ static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo); is equivalent to doing: static struct device_attribute dev_attr_foo = { - .attr = { + .attr = { .name = "foo", .mode = S_IWUSR | S_IRUGO, - .show = show_foo, - .store = store_foo, }, + .show = show_foo, + .store = store_foo, }; -- cgit v0.10.2 From f5abc8e75815fc6e8f4635d2c011315d132a32cf Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Thu, 23 Jan 2014 15:55:27 -0800 Subject: Documentation/blockdev/ramdisk.txt: updates - ramdisk_blocksize doesn't exist anymore - Module parameters added to documentation Signed-off-by: Fabian Frederick Acked-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/blockdev/ramdisk.txt b/Documentation/blockdev/ramdisk.txt index fa72e97..fe2ef97 100644 --- a/Documentation/blockdev/ramdisk.txt +++ b/Documentation/blockdev/ramdisk.txt @@ -36,21 +36,30 @@ 
allowing one to squeeze more programs onto an average installation or rescue floppy disk. -2) Kernel Command Line Parameters +2) Parameters --------------------------------- +2a) Kernel Command Line Parameters + ramdisk_size=N ============== This parameter tells the RAM disk driver to set up RAM disks of N k size. The -default is 4096 (4 MB) (8192 (8 MB) on S390). +default is 4096 (4 MB). + +2b) Module parameters - ramdisk_blocksize=N - =================== + rd_nr + ===== + /dev/ramX devices created. -This parameter tells the RAM disk driver how many bytes to use per block. The -default is 1024 (BLOCK_SIZE). + max_part + ======== + Maximum partition number. + rd_size + ======= + See ramdisk_size. 3) Using "rdev -r" ------------------ -- cgit v0.10.2 From 50114c110346a57d25691148e4376886287dd8be Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Thu, 23 Jan 2014 15:55:28 -0800 Subject: Documentation/filesystems/00-INDEX: updates Add the following documentation-files with description : -autofs4-mount-control.txt -btrfs.txt -debugfs.txt -devpts.txt -fiemap.txt -gfs2-glocks.txt -gfs2-uevents.txt -omfs.txt -path-lookup.txt -qnx6.txt -quota.txt -squashfs.txt -sysfs-tagging.txt -ubifs.txt -xfs-delayed-logging-design.txt -xfs-self-describing-metadata.txt Add the following documentation directories with description : -caching -cifs (replacing cifs.txt) -pohmelfs Remove the following documentation-files reference: -dentry-locking.txt -reiser4.txt Signed-off-by: Fabian Frederick Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 8042050..632211c 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -10,24 +10,32 @@ afs.txt - info and examples for the distributed AFS (Andrew File System) fs. affs.txt - info and mount options for the Amiga Fast File System. 
+autofs4-mount-control.txt + - info on device control operations for autofs4 module. automount-support.txt - information about filesystem automount support. befs.txt - information about the BeOS filesystem for Linux. bfs.txt - info for the SCO UnixWare Boot Filesystem (BFS). +btrfs.txt + - info for the BTRFS filesystem. +caching/ + - directory containing filesystem cache documentation. ceph.txt - - info for the Ceph Distributed File System -cifs.txt - - description of the CIFS filesystem. + - info for the Ceph Distributed File System. +cifs/ + - directory containing CIFS filesystem documentation and example code. coda.txt - description of the CODA filesystem. configfs/ - directory containing configfs documentation and example code. cramfs.txt - info on the cram filesystem for small storage (ROMs etc). -dentry-locking.txt - - info on the RCU-based dcache locking model. +debugfs.txt + - info on the debugfs filesystem. +devpts.txt + - info on the devpts filesystem. directory-locking - info about the locking scheme used for directory operations. dlmfs.txt @@ -35,7 +43,7 @@ dlmfs.txt dnotify.txt - info about directory notification in Linux. dnotify_test.c - - example program for dnotify + - example program for dnotify. ecryptfs.txt - docs on eCryptfs: stacked cryptographic filesystem for Linux. efivarfs.txt @@ -48,12 +56,18 @@ ext3.txt - info, mount options and specifications for the Ext3 filesystem. ext4.txt - info, mount options and specifications for the Ext4 filesystem. -files.txt - - info on file management in the Linux kernel. f2fs.txt - info and mount options for the F2FS filesystem. +fiemap.txt + - info on fiemap ioctl. +files.txt + - info on file management in the Linux kernel. fuse.txt - info on the Filesystem in User SpacE including mount options. +gfs2-glocks.txt + - info on the Global File System 2 - Glock internal locking rules. +gfs2-uevents.txt + - info on the Global File System 2 - uevents. gfs2.txt - info on the Global File System 2. 
hfs.txt @@ -84,40 +98,58 @@ ntfs.txt - info and mount options for the NTFS filesystem (Windows NT). ocfs2.txt - info and mount options for the OCFS2 clustered filesystem. +omfs.txt + - info on the Optimized MPEG FileSystem. +path-lookup.txt + - info on path walking and name lookup locking. +pohmelfs/ + - directory containing pohmelfs filesystem documentation. porting - various information on filesystem porting. proc.txt - info on Linux's /proc filesystem. +qnx6.txt + - info on the QNX6 filesystem. +quota.txt + - info on Quota subsystem. ramfs-rootfs-initramfs.txt - info on the 'in memory' filesystems ramfs, rootfs and initramfs. -reiser4.txt - - info on the Reiser4 filesystem based on dancing tree algorithms. relay.txt - info on relay, for efficient streaming from kernel to user space. romfs.txt - description of the ROMFS filesystem. seq_file.txt - - how to use the seq_file API + - how to use the seq_file API. sharedsubtree.txt - a description of shared subtrees for namespaces. spufs.txt - info and mount options for the SPU filesystem used on Cell. +squashfs.txt + - info on the squashfs filesystem. sysfs-pci.txt - info on accessing PCI device resources through sysfs. +sysfs-tagging.txt + - info on sysfs tagging to avoid duplicates. sysfs.txt - info on sysfs, a ram-based filesystem for exporting kernel objects. sysv-fs.txt - info on the SystemV/V7/Xenix/Coherent filesystem. tmpfs.txt - info on tmpfs, a filesystem that holds all files in virtual memory. +ubifs.txt + - info on the Unsorted Block Images FileSystem. udf.txt - info and mount options for the UDF filesystem. ufs.txt - info on the ufs filesystem. vfat.txt - - info on using the VFAT filesystem used in Windows NT and Windows 95 + - info on using the VFAT filesystem used in Windows NT and Windows 95. vfs.txt - - overview of the Virtual File System + - overview of the Virtual File System. +xfs-delayed-logging-design.txt + - info on the XFS Delayed Logging Design. 
+xfs-self-describing-metadata.txt + - info on XFS Self Describing Metadata. xfs.txt - info and mount options for the XFS filesystem. xip.txt -- cgit v0.10.2 From 4a474157747ab7c4432ac269247e0e0e15f85584 Mon Sep 17 00:00:00 2001 From: Robert Graffham Date: Thu, 23 Jan 2014 15:55:29 -0800 Subject: Kconfig: update flightly outdated CONFIG_SMP documentation Remove an outdated reference to "most personal computers" having only one CPU, and change the use of "singleprocessor" and "single processor" in CONFIG_SMP's documentation to "uniprocessor" across all arches where that documentation is present. Signed-off-by: Robert Graffham Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index d39dc9b..3ba48fe 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -539,13 +539,13 @@ config SMP depends on ALPHA_SABLE || ALPHA_LYNX || ALPHA_RAWHIDE || ALPHA_DP264 || ALPHA_WILDFIRE || ALPHA_TITAN || ALPHA_GENERIC || ALPHA_SHARK || ALPHA_MARVEL ---help--- This enables support for systems with more than one CPU. If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. See also the SMP-HOWTO available at diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 9063ae6..5438cab 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -128,8 +128,8 @@ config SMP default n help This enables support for systems with more than one CPU. 
If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. if SMP diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index ab1689c..4797b24 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1435,14 +1435,14 @@ config SMP depends on MMU || ARM_MPU help This enables support for systems with more than one CPU. If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If - you say Y here, the kernel will run on many, but not all, single - processor machines. On a single processor machine, the kernel will - run faster if you say N here. + you say Y here, the kernel will run on many, but not all, + uniprocessor machines. On a uniprocessor machine, the kernel + will run faster if you say N here. See also , and the SMP-HOWTO available at diff --git a/arch/m32r/Kconfig b/arch/m32r/Kconfig index 09ef94a..ca45044 100644 --- a/arch/m32r/Kconfig +++ b/arch/m32r/Kconfig @@ -277,13 +277,13 @@ config SMP bool "Symmetric multi-processing support" ---help--- This enables support for systems with more than one CPU. If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. 
If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. People using multiprocessor machines who say Y here should also say diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index c93d92b..92c8e0b 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -2129,13 +2129,13 @@ config SMP depends on SYS_SUPPORTS_SMP help This enables support for systems with more than one CPU. If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. People using multiprocessor machines who say Y here should also say diff --git a/arch/mn10300/Kconfig b/arch/mn10300/Kconfig index 8bde923..a648de1 100644 --- a/arch/mn10300/Kconfig +++ b/arch/mn10300/Kconfig @@ -184,13 +184,13 @@ config SMP depends on MN10300_PROC_MN2WS0038 || MN10300_PROC_MN2WS0050 ---help--- This enables support for systems with more than one CPU. If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. 
- If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. See also , diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index b5f1858..bb2a8ec 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -229,13 +229,13 @@ config SMP bool "Symmetric multi-processing support" ---help--- This enables support for systems with more than one CPU. If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. See also and the SMP-HOWTO diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index e9f3125..4f858f7 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -334,10 +334,10 @@ config SMP a system with only one CPU, like most personal computers, say N. If you have a system with more than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. 
On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. See also the SMP-HOWTO available at diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index ce29831..6357710 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -701,13 +701,13 @@ config SMP depends on SYS_SUPPORTS_SMP ---help--- This enables support for systems with more than one CPU. If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. People using multiprocessor machines who say Y here should also say diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index d4f7a6a..63dfe68 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -152,10 +152,10 @@ config SMP a system with only one CPU, say N. If you have a system with more than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. 
People using multiprocessor machines who say Y here should also say diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 5aadc49..3e97a3d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -279,13 +279,13 @@ config SMP bool "Symmetric multi-processing support" ---help--- This enables support for systems with more than one CPU. If you have - a system with only one CPU, like most personal computers, say N. If - you have a system with more than one CPU, say Y. + a system with only one CPU, say N. If you have a system with more + than one CPU, say Y. - If you say N here, the kernel will run on single and multiprocessor + If you say N here, the kernel will run on uni- and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel + uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. Note that if you say Y here and choose architecture "586" or -- cgit v0.10.2 From f3c73a99a1fac2db992b6879b8a78a3ae2fcc06e Mon Sep 17 00:00:00 2001 From: Sangjung Woo Date: Thu, 23 Jan 2014 15:55:30 -0800 Subject: Documentation/cpu-hotplug.txt: fix a typo in example code As the notifier_block name (i.e. foobar_cpu_notifer) is different from the parameter (i.e.foobar_cpu_notifier) of register function, that is definitely error and it also makes readers confused. Signed-off-by: Sangjung Woo Reviewed-by: Srivatsa S. Bhat Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index 8cb9938..be675d2 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt @@ -285,7 +285,7 @@ A: This is what you would need in your kernel code to receive notifications. 
return NOTIFY_OK; } - static struct notifier_block foobar_cpu_notifer = + static struct notifier_block foobar_cpu_notifier = { .notifier_call = foobar_cpu_callback, }; -- cgit v0.10.2 From abacd2fe3ca10b3ade57f3634053241a660002c2 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:31 -0800 Subject: coredump: set_dumpable: fix the theoretical race with itself set_dumpable() updates MMF_DUMPABLE_MASK in a non-trivial way to ensure that get_dumpable() can't observe the intermediate state, but this all can't help if multiple threads call set_dumpable() at the same time. And in theory commit_creds()->set_dumpable(SUID_DUMP_ROOT) racing with sys_prctl()->set_dumpable(SUID_DUMP_DISABLE) can result in SUID_DUMP_USER. Change this code to update both bits atomically via cmpxchg(). Note: this assumes that it is safe to mix bitops and cmpxchg. IOW, if, say, an architecture implements cmpxchg() using the locking (like arch/parisc/lib/bitops.c does), then it should use the same locks for set_bit/etc. Signed-off-by: Oleg Nesterov Acked-by: Kees Cook Cc: Alex Kelly Cc: "Eric W. Biederman" Cc: Josh Triplett Cc: Petr Matousek Cc: Vasily Kulikov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index 7ea097f..f039386 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1614,43 +1614,24 @@ EXPORT_SYMBOL(set_binfmt); /* * set_dumpable converts traditional three-value dumpable to two flags and - * stores them into mm->flags. It modifies lower two bits of mm->flags, but - * these bits are not changed atomically. So get_dumpable can observe the - * intermediate state. To avoid doing unexpected behavior, get get_dumpable - * return either old dumpable or new one by paying attention to the order of - * modifying the bits. 
- * - * dumpable | mm->flags (binary) - * old new | initial interim final - * ---------+----------------------- - * 0 1 | 00 01 01 - * 0 2 | 00 10(*) 11 - * 1 0 | 01 00 00 - * 1 2 | 01 11 11 - * 2 0 | 11 10(*) 00 - * 2 1 | 11 11 01 - * - * (*) get_dumpable regards interim value of 10 as 11. + * stores them into mm->flags. */ void set_dumpable(struct mm_struct *mm, int value) { - switch (value) { - case SUID_DUMP_DISABLE: - clear_bit(MMF_DUMPABLE, &mm->flags); - smp_wmb(); - clear_bit(MMF_DUMP_SECURELY, &mm->flags); - break; - case SUID_DUMP_USER: - set_bit(MMF_DUMPABLE, &mm->flags); - smp_wmb(); - clear_bit(MMF_DUMP_SECURELY, &mm->flags); - break; - case SUID_DUMP_ROOT: - set_bit(MMF_DUMP_SECURELY, &mm->flags); - smp_wmb(); - set_bit(MMF_DUMPABLE, &mm->flags); - break; - } + unsigned long old, new; + + do { + old = ACCESS_ONCE(mm->flags); + new = old & ~MMF_DUMPABLE_MASK; + + switch (value) { + case SUID_DUMP_ROOT: + new |= (1 << MMF_DUMP_SECURELY); + case SUID_DUMP_USER: + new |= (1<< MMF_DUMPABLE); + } + + } while (cmpxchg(&mm->flags, old, new) != old); } int __get_dumpable(unsigned long mm_flags) -- cgit v0.10.2 From 7288e1187ba935996232246916418c64bb88da30 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:32 -0800 Subject: coredump: kill MMF_DUMPABLE and MMF_DUMP_SECURELY Nobody actually needs MMF_DUMPABLE/MMF_DUMP_SECURELY, they are only used to enforce the encoding of SUID_DUMP_* enum in mm->flags & MMF_DUMPABLE_MASK. Now that set_dumpable() updates both bits atomically we can kill them and simply store the value "as is" in 2 lower bits. Signed-off-by: Oleg Nesterov Acked-by: Kees Cook Cc: Alex Kelly Cc: "Eric W. 
Biederman" Cc: Josh Triplett Cc: Petr Matousek Cc: Vasily Kulikov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index f039386..f798da0 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1613,33 +1613,24 @@ void set_binfmt(struct linux_binfmt *new) EXPORT_SYMBOL(set_binfmt); /* - * set_dumpable converts traditional three-value dumpable to two flags and - * stores them into mm->flags. + * set_dumpable stores three-value SUID_DUMP_* into mm->flags. */ void set_dumpable(struct mm_struct *mm, int value) { unsigned long old, new; + if (WARN_ON((unsigned)value > SUID_DUMP_ROOT)) + return; + do { old = ACCESS_ONCE(mm->flags); - new = old & ~MMF_DUMPABLE_MASK; - - switch (value) { - case SUID_DUMP_ROOT: - new |= (1 << MMF_DUMP_SECURELY); - case SUID_DUMP_USER: - new |= (1<< MMF_DUMPABLE); - } - + new = (old & ~MMF_DUMPABLE_MASK) | value; } while (cmpxchg(&mm->flags, old, new) != old); } int __get_dumpable(unsigned long mm_flags) { - int ret; - - ret = mm_flags & MMF_DUMPABLE_MASK; - return (ret > SUID_DUMP_USER) ? SUID_DUMP_ROOT : ret; + return mm_flags & MMF_DUMPABLE_MASK; } /* diff --git a/include/linux/sched.h b/include/linux/sched.h index 485234d..124430b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -400,10 +400,8 @@ extern int get_dumpable(struct mm_struct *mm); #define SUID_DUMP_ROOT 2 /* Dump as root */ /* mm flags */ -/* dumpable bits */ -#define MMF_DUMPABLE 0 /* core dump is permitted */ -#define MMF_DUMP_SECURELY 1 /* core file is readable only by root */ +/* for SUID_DUMP_* above */ #define MMF_DUMPABLE_BITS 2 #define MMF_DUMPABLE_MASK ((1 << MMF_DUMPABLE_BITS) - 1) -- cgit v0.10.2 From 942be3875a1931c379bbc37053829dd6847e0f3f Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:34 -0800 Subject: coredump: make __get_dumpable/get_dumpable inline, kill fs/coredump.h 1. Remove fs/coredump.h. 
It is not clear why do we need it, it only declares __get_dumpable(), signal.c includes it for no reason. 2. Now that get_dumpable() and __get_dumpable() are really trivial make them inline in linux/sched.h. Signed-off-by: Oleg Nesterov Acked-by: Kees Cook Cc: Alex Kelly Cc: "Eric W. Biederman" Cc: Josh Triplett Cc: Petr Matousek Cc: Vasily Kulikov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/coredump.c b/fs/coredump.c index bc3fbcd..e3ad709 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -40,7 +40,6 @@ #include #include "internal.h" -#include "coredump.h" #include diff --git a/fs/coredump.h b/fs/coredump.h deleted file mode 100644 index e39ff07..0000000 --- a/fs/coredump.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _FS_COREDUMP_H -#define _FS_COREDUMP_H - -extern int __get_dumpable(unsigned long mm_flags); - -#endif diff --git a/fs/exec.c b/fs/exec.c index f798da0..9cbad5b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -62,7 +62,6 @@ #include #include "internal.h" -#include "coredump.h" #include @@ -1609,7 +1608,6 @@ void set_binfmt(struct linux_binfmt *new) if (new) __module_get(new->module); } - EXPORT_SYMBOL(set_binfmt); /* @@ -1628,22 +1626,6 @@ void set_dumpable(struct mm_struct *mm, int value) } while (cmpxchg(&mm->flags, old, new) != old); } -int __get_dumpable(unsigned long mm_flags) -{ - return mm_flags & MMF_DUMPABLE_MASK; -} - -/* - * This returns the actual value of the suid_dumpable flag. For things - * that are using this for checking for privilege transitions, it must - * test against SUID_DUMP_USER rather than treating it as a boolean - * value. 
- */ -int get_dumpable(struct mm_struct *mm) -{ - return __get_dumpable(mm->flags); -} - SYSCALL_DEFINE3(execve, const char __user *, filename, const char __user *const __user *, argv, diff --git a/include/linux/sched.h b/include/linux/sched.h index 124430b..cf9e414 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -391,10 +391,6 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, static inline void arch_pick_mmap_layout(struct mm_struct *mm) {} #endif - -extern void set_dumpable(struct mm_struct *mm, int value); -extern int get_dumpable(struct mm_struct *mm); - #define SUID_DUMP_DISABLE 0 /* No setuid dumping */ #define SUID_DUMP_USER 1 /* Dump as user of process */ #define SUID_DUMP_ROOT 2 /* Dump as root */ @@ -405,6 +401,23 @@ extern int get_dumpable(struct mm_struct *mm); #define MMF_DUMPABLE_BITS 2 #define MMF_DUMPABLE_MASK ((1 << MMF_DUMPABLE_BITS) - 1) +extern void set_dumpable(struct mm_struct *mm, int value); +/* + * This returns the actual value of the suid_dumpable flag. For things + * that are using this for checking for privilege transitions, it must + * test against SUID_DUMP_USER rather than treating it as a boolean + * value. + */ +static inline int __get_dumpable(unsigned long mm_flags) +{ + return mm_flags & MMF_DUMPABLE_MASK; +} + +static inline int get_dumpable(struct mm_struct *mm) +{ + return __get_dumpable(mm->flags); +} + /* coredump filter bits */ #define MMF_DUMP_ANON_PRIVATE 2 #define MMF_DUMP_ANON_SHARED 3 -- cgit v0.10.2 From 74e37200de8e9c4e09b70c21c3f13c2071e77457 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:35 -0800 Subject: proc: cleanup/simplify get_task_state/task_state_array get_task_state() and task_state_array[] look confusing and suboptimal, it is not clear what it can actually report to user-space and task_state_array[] blows .data for no reason. 1. state = (tsk->state & TASK_REPORT) | tsk->exit_state is not clear. 
TASK_REPORT is self-documenting but it is not clear what ->exit_state can add. Move the potential exit_state's (EXIT_ZOMBIE and EXIT_DEAD) into TASK_REPORT and use it to calculate the final result. 2. With the change above it is obvious that task_state_array[] has the unused entries just to make BUILD_BUG_ON() happy. Change this BUILD_BUG_ON() to use TASK_REPORT rather than TASK_STATE_MAX and shrink task_state_array[]. 3. Turn the "while (state)" loop into fls(state). Signed-off-by: Oleg Nesterov Cc: Peter Zijlstra Cc: David Laight Cc: Geert Uytterhoeven Cc: Ingo Molnar Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/array.c b/fs/proc/array.c index 1bd2077..554a0b2 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -140,24 +140,15 @@ static const char * const task_state_array[] = { "t (tracing stop)", /* 8 */ "Z (zombie)", /* 16 */ "X (dead)", /* 32 */ - "x (dead)", /* 64 */ - "K (wakekill)", /* 128 */ - "W (waking)", /* 256 */ - "P (parked)", /* 512 */ }; static inline const char *get_task_state(struct task_struct *tsk) { - unsigned int state = (tsk->state & TASK_REPORT) | tsk->exit_state; - const char * const *p = &task_state_array[0]; + unsigned int state = (tsk->state | tsk->exit_state) & TASK_REPORT; - BUILD_BUG_ON(1 + ilog2(TASK_STATE_MAX) != ARRAY_SIZE(task_state_array)); + BUILD_BUG_ON(1 + ilog2(TASK_REPORT) != ARRAY_SIZE(task_state_array)-1); - while (state) { - p++; - state >>= 1; - } - return *p; + return task_state_array[fls(state)]; } static inline void task_state(struct seq_file *m, struct pid_namespace *ns, diff --git a/include/linux/sched.h b/include/linux/sched.h index cf9e414..33e4e9e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -229,7 +229,7 @@ extern char ___assert_task_state[1 - 2*!!( /* get_task_state() */ #define TASK_REPORT (TASK_RUNNING | TASK_INTERRUPTIBLE | \ TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \ - __TASK_TRACED) + __TASK_TRACED | EXIT_ZOMBIE | EXIT_DEAD) #define 
task_is_traced(task) ((task->state & __TASK_TRACED) != 0) #define task_is_stopped(task) ((task->state & __TASK_STOPPED) != 0) -- cgit v0.10.2 From 940fe4793a219375c4713a17c61b843720807c9d Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:36 -0800 Subject: proc: fix the potential use-after-free in first_tid() proc_task_readdir() verifies that the result of get_proc_task() is pid_alive() and thus its ->group_leader is fine too. However this is not necessarily true after rcu_read_unlock(), we need to recheck this again after first_tid() does rcu_read_lock(). Otherwise leader->thread_group.next (used by next_thread()) can be invalid if the rcu grace period expires in between. The race is subtle and unlikely, but still it is possible afaics. To simplify lets ignore the "likely" case when tid != 0, f_version can be cleared by proc_task_operations->llseek(). Suppose we have a main thread M and its subthread T. Suppose that f_pos == 3, iow first_tid() should return T. Now suppose that the following happens between rcu_read_unlock() and rcu_read_lock(): 1. T execs and becomes the new leader. This removes M from ->thread_group but next_thread(M) is still T. 2. T creates another thread X which does exec as well, T goes away. 3. X creates another subthread, this increments nr_threads. 4. first_tid() does next_thread(M) and returns the already dead T. Note also that we need 2. and 3. only because of get_nr_threads() check, and this check was supposed to be optimization only. Signed-off-by: Oleg Nesterov Cc: "Eric W. 
Biederman" Cc: Michal Hocko Cc: Sameer Nanda Cc: Sergey Dyasly Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 03c8d74..f223a56 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3109,6 +3109,9 @@ static struct task_struct *first_tid(struct task_struct *leader, pos = NULL; if (nr && nr >= get_nr_threads(leader)) goto out; + /* It could be unhashed before we take rcu lock */ + if (!pid_alive(leader)) + goto out; /* If we haven't found our starting place yet start * with the leader and walk nr threads forward. -- cgit v0.10.2 From c986c14a6a88427946dc77d7018a81b95b3d41b6 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:38 -0800 Subject: proc: change first_tid() to use while_each_thread() rather than next_thread() Rewrite the main loop to use while_each_thread() instead of next_thread(). We are going to fix or replace while_each_thread(), next_thread() should be avoided whenever possible. Signed-off-by: Oleg Nesterov Cc: "Eric W. Biederman" Cc: Michal Hocko Cc: Sameer Nanda Cc: Sergey Dyasly Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index f223a56..be8e17c 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3106,23 +3106,23 @@ static struct task_struct *first_tid(struct task_struct *leader, } /* If nr exceeds the number of threads there is nothing todo */ - pos = NULL; if (nr && nr >= get_nr_threads(leader)) - goto out; + goto fail; /* It could be unhashed before we take rcu lock */ if (!pid_alive(leader)) - goto out; + goto fail; /* If we haven't found our starting place yet start * with the leader and walk nr threads forward.
*/ - for (pos = leader; nr > 0; --nr) { - pos = next_thread(pos); - if (pos == leader) { - pos = NULL; - goto out; - } - } + pos = leader; + do { + if (nr-- <= 0) + goto found; + } while_each_thread(leader, pos); +fail: + pos = NULL; + goto out; found: get_task_struct(pos); out: -- cgit v0.10.2 From d855a4b79f49ea07d1827fc0591490a6a324148b Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:39 -0800 Subject: proc: don't (ab)use ->group_leader in proc_task_readdir() paths proc_task_readdir() does not really need "leader", first_tid() has to revalidate it anyway. Just pass proc_pid(inode) to first_tid() instead, it can do pid_task(PIDTYPE_PID) itself and read ->group_leader only if necessary. The patch also extracts the "inode is dead" code from pid_delete_dentry(dentry) into the new trivial helper, proc_inode_is_dead(inode), proc_task_readdir() uses it to return -ENOENT if this dir was removed. This is a bit racy, but the race is very unlikely and the getdents() after opendir() can see the empty "." + ".." dir only once. Signed-off-by: Oleg Nesterov Cc: "Eric W. Biederman" Cc: Michal Hocko Cc: Sameer Nanda Cc: Sergey Dyasly Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index be8e17c..9b423fe 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1658,13 +1658,18 @@ int pid_revalidate(struct dentry *dentry, unsigned int flags) return 0; } +static inline bool proc_inode_is_dead(struct inode *inode) +{ + return !proc_pid(inode)->tasks[PIDTYPE_PID].first; +} + int pid_delete_dentry(const struct dentry *dentry) { /* Is the task we represent dead? * If so, then don't put the dentry on the lru list, * kill it immediately.
*/ - return !proc_pid(dentry->d_inode)->tasks[PIDTYPE_PID].first; + return proc_inode_is_dead(dentry->d_inode); } const struct dentry_operations pid_dentry_operations = @@ -3092,34 +3097,35 @@ out_no_task: * In the case of a seek we start with the leader and walk nr * threads past it. */ -static struct task_struct *first_tid(struct task_struct *leader, - int tid, int nr, struct pid_namespace *ns) +static struct task_struct *first_tid(struct pid *pid, int tid, + int nr, struct pid_namespace *ns) { - struct task_struct *pos; + struct task_struct *pos, *task; rcu_read_lock(); - /* Attempt to start with the pid of a thread */ + task = pid_task(pid, PIDTYPE_PID); + if (!task) + goto fail; + + /* Attempt to start with the tid of a thread */ if (tid && (nr > 0)) { pos = find_task_by_pid_ns(tid, ns); - if (pos && (pos->group_leader == leader)) + if (pos && same_thread_group(pos, task)) goto found; } /* If nr exceeds the number of threads there is nothing todo */ - if (nr && nr >= get_nr_threads(leader)) - goto fail; - /* It could be unhashed before we take rcu lock */ - if (!pid_alive(leader)) + if (nr && nr >= get_nr_threads(task)) goto fail; /* If we haven't found our starting place yet start * with the leader and walk nr threads forward. 
*/ - pos = leader; + pos = task = task->group_leader; do { if (nr-- <= 0) goto found; - } while_each_thread(leader, pos); + } while_each_thread(task, pos); fail: pos = NULL; goto out; @@ -3155,25 +3161,16 @@ static struct task_struct *next_tid(struct task_struct *start) /* for the /proc/TGID/task/ directories */ static int proc_task_readdir(struct file *file, struct dir_context *ctx) { - struct task_struct *leader = NULL; - struct task_struct *task = get_proc_task(file_inode(file)); + struct inode *inode = file_inode(file); + struct task_struct *task; struct pid_namespace *ns; int tid; - if (!task) - return -ENOENT; - rcu_read_lock(); - if (pid_alive(task)) { - leader = task->group_leader; - get_task_struct(leader); - } - rcu_read_unlock(); - put_task_struct(task); - if (!leader) + if (proc_inode_is_dead(inode)) return -ENOENT; if (!dir_emit_dots(file, ctx)) - goto out; + return 0; /* f_version caches the tgid value that the last readdir call couldn't * return. lseek aka telldir automagically resets f_version to 0. @@ -3181,7 +3178,7 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx) ns = file->f_dentry->d_sb->s_fs_info; tid = (int)file->f_version; file->f_version = 0; - for (task = first_tid(leader, tid, ctx->pos - 2, ns); + for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns); task; task = next_tid(task), ctx->pos++) { char name[PROC_NUMBUF]; @@ -3197,8 +3194,7 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx) break; } } -out: - put_task_struct(leader); + return 0; } -- cgit v0.10.2 From 9f6e963f06c19a57a876cb77a9c87f6a56295b13 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:40 -0800 Subject: proc: fix ->f_pos overflows in first_tid() 1. proc_task_readdir()->first_tid() path truncates f_pos to int, this is wrong even on 64bit. 
We could check that f_pos < PID_MAX or even INT_MAX in proc_task_readdir(), but this patch simply checks the potential overflow in first_tid(), this check is nop on 64bit. We do not care if it was negative and the new unsigned value is huge, all we need to ensure is that we never wrongly return !NULL. 2. Remove the 2nd "nr != 0" check before get_nr_threads(), nr_threads == 0 is not distinguishable from !pid_task() above. Signed-off-by: Oleg Nesterov Cc: "Eric W. Biederman" Cc: Michal Hocko Cc: Sameer Nanda Cc: Sergey Dyasly Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 9b423fe..5150706 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3097,10 +3097,14 @@ out_no_task: * In the case of a seek we start with the leader and walk nr * threads past it. */ -static struct task_struct *first_tid(struct pid *pid, int tid, - int nr, struct pid_namespace *ns) +static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos, + struct pid_namespace *ns) { struct task_struct *pos, *task; + unsigned long nr = f_pos; + + if (nr != f_pos) /* 32bit overflow? 
*/ + return NULL; rcu_read_lock(); task = pid_task(pid, PIDTYPE_PID); @@ -3108,14 +3112,14 @@ static struct task_struct *first_tid(struct pid *pid, int tid, goto fail; /* Attempt to start with the tid of a thread */ - if (tid && (nr > 0)) { + if (tid && nr) { pos = find_task_by_pid_ns(tid, ns); if (pos && same_thread_group(pos, task)) goto found; } /* If nr exceeds the number of threads there is nothing todo */ - if (nr && nr >= get_nr_threads(task)) + if (nr >= get_nr_threads(task)) goto fail; /* If we haven't found our starting place yet start @@ -3123,7 +3127,7 @@ static struct task_struct *first_tid(struct pid *pid, int tid, */ pos = task = task->group_leader; do { - if (nr-- <= 0) + if (!nr--) goto found; } while_each_thread(task, pos); fail: -- cgit v0.10.2 From cdf7e8dded6212cb29f758017a613e4eefc4ce9e Mon Sep 17 00:00:00 2001 From: Rui Xiang Date: Thu, 23 Jan 2014 15:55:41 -0800 Subject: proc: set attributes of pde using accessor functions Use existing accessors proc_set_user() and proc_set_size() to set attributes. Just a cleanup. 
Signed-off-by: Rui Xiang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/generic.c b/fs/proc/generic.c index cca93b6..b7f268e 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -49,8 +49,7 @@ static int proc_notify_change(struct dentry *dentry, struct iattr *iattr) setattr_copy(inode, iattr); mark_inode_dirty(inode); - de->uid = inode->i_uid; - de->gid = inode->i_gid; + proc_set_user(de, inode->i_uid, inode->i_gid); de->mode = inode->i_mode; return 0; } diff --git a/fs/proc/proc_devtree.c b/fs/proc/proc_devtree.c index 70779b2..c824187 100644 --- a/fs/proc/proc_devtree.c +++ b/fs/proc/proc_devtree.c @@ -74,9 +74,9 @@ __proc_device_tree_add_prop(struct proc_dir_entry *de, struct property *pp, return NULL; if (!strncmp(name, "security-", 9)) - ent->size = 0; /* don't leak number of password chars */ + proc_set_size(ent, 0); /* don't leak number of password chars */ else - ent->size = pp->length; + proc_set_size(ent, pp->length); return ent; } -- cgit v0.10.2 From c1d867a54d426b45da017fbe8e585f8a3064ce8d Mon Sep 17 00:00:00 2001 From: Dave Jones Date: Thu, 23 Jan 2014 15:55:43 -0800 Subject: fs/proc/proc_devtree.c: remove empty /proc/device-tree when no openfirmware exists. Distribution kernels might want to build in support for /proc/device-tree for kernels that might end up running on hardware that doesn't support openfirmware. This results in an empty /proc/device-tree existing. Remove it if the OFW root node doesn't exist. This situation actually confuses grub2, resulting in install failures. grub2 sees the /proc/device-tree and picks the wrong install target cf. http://bzr.savannah.gnu.org/lh/grub/trunk/grub/annotate/4300/util/grub-install.in#L311 grub should be more robust, but still, leaving an empty proc dir seems pointless. Addresses https://bugzilla.redhat.com/show_bug.cgi?id=818378. 
Signed-off-by: Dave Jones Cc: Al Viro Cc: Paul Mackerras Cc: Josh Boyer Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/proc_devtree.c b/fs/proc/proc_devtree.c index c824187..c82dd51 100644 --- a/fs/proc/proc_devtree.c +++ b/fs/proc/proc_devtree.c @@ -232,6 +232,7 @@ void __init proc_device_tree_init(void) return; root = of_find_node_by_path("/"); if (root == NULL) { + remove_proc_entry("device-tree", NULL); pr_debug("/proc/device-tree: can't find root\n"); return; } -- cgit v0.10.2 From 3d93116cef306bd516a7645e7b4895d1d0ceec2b Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Thu, 23 Jan 2014 15:55:44 -0800 Subject: fs/proc_namespace.c: simplify testing nsp and nsp->mnt_ns Trivial cleanup to eliminate a goto. Signed-off-by: Axel Lin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 439406e..7be26f0 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -234,17 +234,12 @@ static int mounts_open_common(struct inode *inode, struct file *file, rcu_read_lock(); nsp = task_nsproxy(task); - if (!nsp) { + if (!nsp || !nsp->mnt_ns) { rcu_read_unlock(); put_task_struct(task); goto err; } ns = nsp->mnt_ns; - if (!ns) { - rcu_read_unlock(); - put_task_struct(task); - goto err; - } get_mnt_ns(ns); rcu_read_unlock(); task_lock(task); -- cgit v0.10.2 From abaf3787ac26ba33e2f75e76b1174c32254c25b0 Mon Sep 17 00:00:00 2001 From: Paul Gortmaker Date: Thu, 23 Jan 2014 15:55:45 -0800 Subject: fs/proc: don't use module_init for non-modular core code PROC_FS is a bool, so this code is either present or absent. It will never be modular, so using module_init as an alias for __initcall is rather misleading. Fix this up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be ugly at best. 
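As a rough illustration of why the choice of registration macro matters for built-in code, the sketch below models initcalls as a table of (function, level) pairs that are run in ascending level order. It is a simplified userspace model, not the kernel's linker-section mechanism; the level numbers 5 (fs) and 6 (device) are the ones quoted in this commit message, while `struct initcall_entry`, `run_initcalls()` and the two example functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Level numbers as quoted in the commit message: 5 = fs, 6 = device. */
enum { LEVEL_FS = 5, LEVEL_DEVICE = 6 };

typedef int (*initcall_t)(void);

/* Hypothetical stand-in for one registered initcall. */
struct initcall_entry {
	initcall_t fn;
	int level;
};

/* Records the level of each example initcall in the order it ran. */
static int order_log[8];
static size_t order_len;

static void log_call(int level)
{
	if (order_len < sizeof(order_log) / sizeof(order_log[0]))
		order_log[order_len++] = level;
}

static int proc_example_init(void)   { log_call(LEVEL_FS); return 0; }
static int driver_example_init(void) { log_call(LEVEL_DEVICE); return 0; }

/*
 * Run all entries whose level lies in [min_level, max_level], lowest
 * level first; entries on the same level keep their table order.
 * Returns the number of functions called.
 */
static int run_initcalls(const struct initcall_entry *tbl, size_t n,
			 int min_level, int max_level)
{
	int ran = 0;

	assert(tbl != NULL || n == 0);
	for (int lvl = min_level; lvl <= max_level; lvl++) {
		for (size_t i = 0; i < n; i++) {
			if (tbl[i].level == lvl) {
				tbl[i].fn();
				ran++;
			}
		}
	}
	return ran;
}
```

In this model an fs-level entry runs before a device-level entry even when it appears later in the table, which is the only behavioural effect of moving a registration from level 6 to level 5.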
Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of fs_initcall (which makes sense for fs code) will thus change these registrations from level 6-device to level 5-fs (i.e. slightly earlier). However no observable impact of that small difference has been observed during testing, or is expected. Also note that this change uncovers a missing semicolon bug in the registration of vmcore_init as an initcall. Signed-off-by: Paul Gortmaker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c index 82676e3..cbd82df 100644 --- a/fs/proc/cmdline.c +++ b/fs/proc/cmdline.c @@ -26,4 +26,4 @@ static int __init proc_cmdline_init(void) proc_create("cmdline", 0, NULL, &cmdline_proc_fops); return 0; } -module_init(proc_cmdline_init); +fs_initcall(proc_cmdline_init); diff --git a/fs/proc/consoles.c b/fs/proc/consoles.c index 51942d5..290ba85 100644 --- a/fs/proc/consoles.c +++ b/fs/proc/consoles.c @@ -109,4 +109,4 @@ static int __init proc_consoles_init(void) proc_create("consoles", 0, NULL, &proc_consoles_operations); return 0; } -module_init(proc_consoles_init); +fs_initcall(proc_consoles_init); diff --git a/fs/proc/cpuinfo.c b/fs/proc/cpuinfo.c index 5a1e539..06f4d31 100644 --- a/fs/proc/cpuinfo.c +++ b/fs/proc/cpuinfo.c @@ -21,4 +21,4 @@ static int __init proc_cpuinfo_init(void) proc_create("cpuinfo", 0, NULL, &proc_cpuinfo_operations); return 0; } -module_init(proc_cpuinfo_init); +fs_initcall(proc_cpuinfo_init); diff --git a/fs/proc/devices.c b/fs/proc/devices.c index b143471..50493ed 100644 --- a/fs/proc/devices.c +++ b/fs/proc/devices.c @@ -67,4 +67,4 @@ static int __init proc_devices_init(void) proc_create("devices", 0, NULL, &proc_devinfo_operations); return 0; } -module_init(proc_devices_init); +fs_initcall(proc_devices_init); diff --git a/fs/proc/interrupts.c b/fs/proc/interrupts.c index 05029c0..a352d57 100644 
--- a/fs/proc/interrupts.c +++ b/fs/proc/interrupts.c @@ -50,4 +50,4 @@ static int __init proc_interrupts_init(void) proc_create("interrupts", 0, NULL, &proc_interrupts_operations); return 0; } -module_init(proc_interrupts_init); +fs_initcall(proc_interrupts_init); diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 5ed0e52..39e6ef3 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -639,4 +639,4 @@ static int __init proc_kcore_init(void) return 0; } -module_init(proc_kcore_init); +fs_initcall(proc_kcore_init); diff --git a/fs/proc/kmsg.c b/fs/proc/kmsg.c index bdfabda..05f8dcd 100644 --- a/fs/proc/kmsg.c +++ b/fs/proc/kmsg.c @@ -61,4 +61,4 @@ static int __init proc_kmsg_init(void) proc_create("kmsg", S_IRUSR, NULL, &proc_kmsg_operations); return 0; } -module_init(proc_kmsg_init); +fs_initcall(proc_kmsg_init); diff --git a/fs/proc/loadavg.c b/fs/proc/loadavg.c index 1afa4dd..aec66e6 100644 --- a/fs/proc/loadavg.c +++ b/fs/proc/loadavg.c @@ -42,4 +42,4 @@ static int __init proc_loadavg_init(void) proc_create("loadavg", 0, NULL, &loadavg_proc_fops); return 0; } -module_init(proc_loadavg_init); +fs_initcall(proc_loadavg_init); diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 24270ec..136e548 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -220,4 +220,4 @@ static int __init proc_meminfo_init(void) proc_create("meminfo", 0, NULL, &meminfo_proc_fops); return 0; } -module_init(proc_meminfo_init); +fs_initcall(proc_meminfo_init); diff --git a/fs/proc/nommu.c b/fs/proc/nommu.c index 5f9bc8a..d4a3574 100644 --- a/fs/proc/nommu.c +++ b/fs/proc/nommu.c @@ -131,4 +131,4 @@ static int __init proc_nommu_init(void) return 0; } -module_init(proc_nommu_init); +fs_initcall(proc_nommu_init); diff --git a/fs/proc/page.c b/fs/proc/page.c index cab84b6..02174a6 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -219,4 +219,4 @@ static int __init proc_page_init(void) proc_create("kpageflags", S_IRUSR, NULL, &proc_kpageflags_operations); return 0; } 
-module_init(proc_page_init); +fs_initcall(proc_page_init); diff --git a/fs/proc/softirqs.c b/fs/proc/softirqs.c index 62604be..ad8a77f 100644 --- a/fs/proc/softirqs.c +++ b/fs/proc/softirqs.c @@ -41,4 +41,4 @@ static int __init proc_softirqs_init(void) proc_create("softirqs", 0, NULL, &proc_softirqs_operations); return 0; } -module_init(proc_softirqs_init); +fs_initcall(proc_softirqs_init); diff --git a/fs/proc/stat.c b/fs/proc/stat.c index 1cf86c0..6f599c6 100644 --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -221,4 +221,4 @@ static int __init proc_stat_init(void) proc_create("stat", 0, NULL, &proc_stat_operations); return 0; } -module_init(proc_stat_init); +fs_initcall(proc_stat_init); diff --git a/fs/proc/uptime.c b/fs/proc/uptime.c index 0618946..7141b8d 100644 --- a/fs/proc/uptime.c +++ b/fs/proc/uptime.c @@ -49,4 +49,4 @@ static int __init proc_uptime_init(void) proc_create("uptime", 0, NULL, &uptime_proc_fops); return 0; } -module_init(proc_uptime_init); +fs_initcall(proc_uptime_init); diff --git a/fs/proc/version.c b/fs/proc/version.c index 76817a6..d2154eb 100644 --- a/fs/proc/version.c +++ b/fs/proc/version.c @@ -31,4 +31,4 @@ static int __init proc_version_init(void) proc_create("version", 0, NULL, &version_proc_fops); return 0; } -module_init(proc_version_init); +fs_initcall(proc_version_init); diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index 9100d69..2ca7ba0 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -1082,7 +1082,7 @@ static int __init vmcore_init(void) proc_vmcore->size = vmcore_size; return 0; } -module_init(vmcore_init) +fs_initcall(vmcore_init); /* Cleanup function for vmcore module. 
*/ void vmcore_cleanup(void) -- cgit v0.10.2 From ff252c1fc537b0c9e40f62da0a9d11bf0737b7db Mon Sep 17 00:00:00 2001 From: DaeSeok Youn Date: Thu, 23 Jan 2014 15:55:46 -0800 Subject: kernel/fork.c: make dup_mm() static dup_mm() is used only in kernel/fork.c Signed-off-by: Daeseok Youn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/sched.h b/include/linux/sched.h index 33e4e9e..66a17ad 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2295,8 +2295,6 @@ extern struct mm_struct *get_task_mm(struct task_struct *task); extern struct mm_struct *mm_access(struct task_struct *task, unsigned int mode); /* Remove the current tasks stale references to the old mm_struct */ extern void mm_release(struct task_struct *, struct mm_struct *); -/* Allocate a new mm structure and copy contents from tsk->mm */ -extern struct mm_struct *dup_mm(struct task_struct *tsk); extern int copy_thread(unsigned long, unsigned long, unsigned long, struct task_struct *); diff --git a/kernel/fork.c b/kernel/fork.c index 2f11bbe..5615ead 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -800,7 +800,7 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm) * Allocate a new mm structure and copy contents from the * mm structure of the passed in task structure. */ -struct mm_struct *dup_mm(struct task_struct *tsk) +static struct mm_struct *dup_mm(struct task_struct *tsk) { struct mm_struct *mm, *oldmm = current->mm; int err; -- cgit v0.10.2 From 5d59e18270d4769c9160c282b25c00b6fc004ffb Mon Sep 17 00:00:00 2001 From: Daeseok Youn Date: Thu, 23 Jan 2014 15:55:47 -0800 Subject: kernel/fork.c: fix coding style issues Fix errors reported by checkpatch.pl. One error is parentheses, the other is a whitespace issue. 
Signed-off-by: Daeseok Youn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/fork.c b/kernel/fork.c index 5615ead..01ccc61 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1654,7 +1654,7 @@ SYSCALL_DEFINE0(fork) return do_fork(SIGCHLD, 0, 0, NULL, NULL); #else /* can not support in nommu mode */ - return(-EINVAL); + return -EINVAL; #endif } #endif @@ -1662,7 +1662,7 @@ SYSCALL_DEFINE0(fork) #ifdef __ARCH_WANT_SYS_VFORK SYSCALL_DEFINE0(vfork) { - return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0, + return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0, 0, NULL, NULL); } #endif -- cgit v0.10.2 From 68ce670b6e8edc30551862e7f6a306e45389e189 Mon Sep 17 00:00:00 2001 From: Daeseok Youn Date: Thu, 23 Jan 2014 15:55:48 -0800 Subject: kernel/fork.c: remove redundant NULL check in dup_mm() current->mm doesn't need a NULL check in dup_mm(), because dup_mm() is used only in copy_mm(), which already checks current->mm for NULL before calling dup_mm(). Signed-off-by: Daeseok Youn Acked-by: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/fork.c b/kernel/fork.c index 01ccc61..b6dd0bb 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -805,9 +805,6 @@ static struct mm_struct *dup_mm(struct task_struct *tsk) struct mm_struct *mm, *oldmm = current->mm; int err; - if (!oldmm) - return NULL; - mm = allocate_mm(); if (!mm) goto fail_nomem; -- cgit v0.10.2 From 83f62a2eacb1d6945c78523f20e0c34b5d94913c Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:49 -0800 Subject: exec:check_unsafe_exec: use while_each_thread() rather than next_thread() next_thread() should be avoided, change check_unsafe_exec() to use while_each_thread(). Nobody except signal->curr_target actually needs next_thread-like code, and we need to change (fix) this interface. This particular code is fine, p == current.
But in general code like this can loop forever if p exits and next_thread(t) can't reach the unhashed thread. This also saves 32 bytes. Signed-off-by: Oleg Nesterov Acked-by: KOSAKI Motohiro Cc: Al Viro Cc: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index 9cbad5b..81ae621 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1243,10 +1243,11 @@ static int check_unsafe_exec(struct linux_binprm *bprm) if (current->no_new_privs) bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS; + t = p; n_fs = 1; spin_lock(&p->fs->lock); rcu_read_lock(); - for (t = next_thread(p); t != p; t = next_thread(t)) { + while_each_thread(p, t) { if (t->fs == p->fs) n_fs++; } -- cgit v0.10.2 From 9e00cdb091b008cb3c78192651180896de412a63 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:50 -0800 Subject: exec:check_unsafe_exec: kill the dead -EAGAIN and clear_in_exec logic fs_struct->in_exec == T means that this ->fs is used by a single process (thread group), and one of the threads does do_execve(). To avoid the mt-exec races this code has the following complications: 1. check_unsafe_exec() returns -EAGAIN if ->in_exec was already set by another thread. 2. do_execve_common() records "clear_in_exec" to ensure that the error path can only clear ->in_exec if it was set by current. However, after 9b1bf12d5d51 "signals: move cred_guard_mutex from task_struct to signal_struct" we do not need these complications: 1. We can't race with our sub-thread, this is called under per-process ->cred_guard_mutex. And we can't race with another CLONE_FS task, we already checked that this fs is not shared. We can remove the dead -EAGAIN logic. 2. "out_unmark:" in do_execve_common() is either called under ->cred_guard_mutex, or after de_thread() which kills other threads, so we can't race with sub-thread which could set ->in_exec. And if ->fs is shared with another process ->in_exec should be false anyway. We can clear in_exec unconditionally.
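The simplified control flow argued for above can be sketched as a small userspace model. This is an illustration only: `struct fs_model`, `struct task_model`, `UNSAFE_SHARE` and both functions are hypothetical stand-ins, and the locking (->fs->lock, RCU, ->cred_guard_mutex) that makes the real code safe is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for fs_struct: only the two fields discussed here. */
struct fs_model {
	int users;	/* number of tasks that reference this fs */
	bool in_exec;	/* set while the owning thread group is in exec */
};

/* Hypothetical stand-in for task_struct. */
struct task_model {
	struct fs_model *fs;
};

#define UNSAFE_SHARE 0x1u	/* models LSM_UNSAFE_SHARE */

/*
 * Count the threads of the caller's group (the array includes the caller)
 * that use the caller's fs. If the fs has more users than that, another
 * process shares it and the exec is flagged unsafe; otherwise the fs is
 * marked in_exec. There is no failure case, so nothing is returned.
 */
static void check_unsafe_exec_model(const struct task_model *group,
				    size_t nthreads,
				    const struct task_model *self,
				    unsigned *unsafe)
{
	int n_fs = 0;

	assert(self != NULL && self->fs != NULL && unsafe != NULL);
	for (size_t i = 0; i < nthreads; i++) {
		if (group[i].fs == self->fs)
			n_fs++;
	}

	if (self->fs->users > n_fs)
		*unsafe |= UNSAFE_SHARE;
	else
		self->fs->in_exec = true;
}

/* Error path: no flag is needed to remember who set in_exec. */
static void exec_error_path_model(struct task_model *self)
{
	assert(self != NULL && self->fs != NULL);
	self->fs->in_exec = false;
}
```

When the fs is shared with another process the model never sets in_exec, so clearing it unconditionally on the error path is harmless in both cases.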
This also means that check_unsafe_exec() can be void. Signed-off-by: Oleg Nesterov Acked-by: KOSAKI Motohiro Cc: Al Viro Cc: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index 81ae621..389fe7b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1223,11 +1223,10 @@ EXPORT_SYMBOL(install_exec_creds); * - the caller must hold ->cred_guard_mutex to protect against * PTRACE_ATTACH */ -static int check_unsafe_exec(struct linux_binprm *bprm) +static void check_unsafe_exec(struct linux_binprm *bprm) { struct task_struct *p = current, *t; unsigned n_fs; - int res = 0; if (p->ptrace) { if (p->ptrace & PT_PTRACE_CAP) @@ -1253,22 +1252,15 @@ static int check_unsafe_exec(struct linux_binprm *bprm) } rcu_read_unlock(); - if (p->fs->users > n_fs) { + if (p->fs->users > n_fs) bprm->unsafe |= LSM_UNSAFE_SHARE; - } else { - res = -EAGAIN; - if (!p->fs->in_exec) { - p->fs->in_exec = 1; - res = 1; - } - } + else + p->fs->in_exec = 1; spin_unlock(&p->fs->lock); - - return res; } -/* - * Fill the binprm structure from the inode. +/* + * Fill the binprm structure from the inode. * Check permissions, then read the first 128 (BINPRM_BUF_SIZE) bytes * * This may be called multiple times for binary chains (scripts for example). 
@@ -1453,7 +1445,6 @@ static int do_execve_common(const char *filename, struct linux_binprm *bprm; struct file *file; struct files_struct *displaced; - bool clear_in_exec; int retval; /* @@ -1485,10 +1476,7 @@ static int do_execve_common(const char *filename, if (retval) goto out_free; - retval = check_unsafe_exec(bprm); - if (retval < 0) - goto out_free; - clear_in_exec = retval; + check_unsafe_exec(bprm); current->in_execve = 1; file = open_exec(filename); @@ -1558,8 +1546,7 @@ out_file: } out_unmark: - if (clear_in_exec) - current->fs->in_exec = 0; + current->fs->in_exec = 0; current->in_execve = 0; out_free: -- cgit v0.10.2 From 63e46b95e9eae1161832bf45cb40bbad37bfb182 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:51 -0800 Subject: exec: move the final allow_write_access/fput into free_bprm() Both success/failure paths clean up bprm->file, we can move this code into free_bprm() to simplify and clean up this logic. Signed-off-by: Oleg Nesterov Acked-by: KOSAKI Motohiro Cc: Al Viro Acked-by: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index 389fe7b..f860866 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1138,9 +1138,7 @@ void setup_new_exec(struct linux_binprm * bprm) /* An exec changes our domain. We are no longer part of the thread group */ - current->self_exec_id++; - flush_signal_handlers(current, 0); do_close_on_exec(current->files); } @@ -1172,6 +1170,10 @@ void free_bprm(struct linux_binprm *bprm) mutex_unlock(&current->signal->cred_guard_mutex); abort_creds(bprm->cred); } + if (bprm->file) { + allow_write_access(bprm->file); + fput(bprm->file); + } /* If a binfmt changed the interp, free it.
*/ if (bprm->interp != bprm->filename) kfree(bprm->interp); @@ -1424,12 +1426,6 @@ static int exec_binprm(struct linux_binprm *bprm) ptrace_event(PTRACE_EVENT_EXEC, old_vpid); current->did_exec = 1; proc_exec_connector(current); - - if (bprm->file) { - allow_write_access(bprm->file); - fput(bprm->file); - bprm->file = NULL; /* to catch use-after-free */ - } } return ret; @@ -1492,7 +1488,7 @@ static int do_execve_common(const char *filename, retval = bprm_mm_init(bprm); if (retval) - goto out_file; + goto out_unmark; bprm->argc = count(argv, MAX_ARG_STRINGS); if ((retval = bprm->argc) < 0) @@ -1539,12 +1535,6 @@ out: mmput(bprm->mm); } -out_file: - if (bprm->file) { - allow_write_access(bprm->file); - fput(bprm->file); - } - out_unmark: current->fs->in_exec = 0; current->in_execve = 0; -- cgit v0.10.2 From 98611e4e6a2b4a03fd2d4750cce8e4455a995c8d Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:52 -0800 Subject: exec: kill task_struct->did_exec We can kill either task->did_exec or PF_FORKNOEXEC, they are mutually exclusive. The patch kills ->did_exec because it has a single user. 
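The equivalence this patch relies on can be sketched in userspace. This is only an illustrative model, not kernel code: struct task_model and the model_* helpers are hypothetical names, and the PF_FORKNOEXEC value is copied in as an assumption. The model assumes fork sets PF_FORKNOEXEC and exec clears it, so "did_exec" and "!PF_FORKNOEXEC" always agree.

```c
#include <stdbool.h>

/* Assumed value, for illustration only; the real definition is in
 * include/linux/sched.h. */
#define PF_FORKNOEXEC 0x00000040u

/* Hypothetical stand-in for the two task_struct fields involved. */
struct task_model {
	unsigned int flags;
	bool did_exec;		/* the redundant bit this patch removes */
};

/* fork: the child has forked but not yet exec'd. */
struct task_model model_fork(void)
{
	struct task_model t = { .flags = PF_FORKNOEXEC, .did_exec = false };
	return t;
}

/* exec: clears PF_FORKNOEXEC; the old code also set did_exec. */
void model_exec(struct task_model *t)
{
	t->flags &= ~PF_FORKNOEXEC;
	t->did_exec = true;
}

/* Old setpgid() test: refuse (-EACCES) once the child has exec'd. */
bool model_refuse_old(const struct task_model *t)
{
	return t->did_exec;
}

/* New setpgid() test: same answer, derived from the flag alone. */
bool model_refuse_new(const struct task_model *t)
{
	return !(t->flags & PF_FORKNOEXEC);
}
```

In this model the two predicates return the same value both before and after model_exec(), which is why only one of the two pieces of state needs to be kept.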
Signed-off-by: Oleg Nesterov Acked-by: KOSAKI Motohiro Cc: Al Viro Cc: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index f860866..493b102 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1424,7 +1424,6 @@ static int exec_binprm(struct linux_binprm *bprm) audit_bprm(bprm); trace_sched_process_exec(current, old_pid, bprm); ptrace_event(PTRACE_EVENT_EXEC, old_vpid); - current->did_exec = 1; proc_exec_connector(current); } diff --git a/include/linux/sched.h b/include/linux/sched.h index 66a17ad..68a0e84 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1239,7 +1239,6 @@ struct task_struct { /* Used for emulating ABI behavior of previous Linux versions */ unsigned int personality; - unsigned did_exec:1; unsigned in_execve:1; /* Tell the LSMs that the process is doing an * execve */ unsigned in_iowait:1; diff --git a/kernel/fork.c b/kernel/fork.c index b6dd0bb..a17621c 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1226,7 +1226,6 @@ static struct task_struct *copy_process(unsigned long clone_flags, if (!try_module_get(task_thread_info(p)->exec_domain->module)) goto bad_fork_cleanup_count; - p->did_exec = 0; delayacct_tsk_init(p); /* Must remain after dup_task_struct() */ copy_flags(clone_flags, p); INIT_LIST_HEAD(&p->children); diff --git a/kernel/sys.c b/kernel/sys.c index c723113..ecd3ea1 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -895,8 +895,7 @@ SYSCALL_DEFINE1(times, struct tms __user *, tbuf) * only important on a multi-user system anyway, to make sure one user * can't send a signal to a process owned by another. -TYT, 12/12/91 * - * Auch. Had to add the 'did_exec' flag to conform completely to POSIX. - * LBT 04.03.94 + * !PF_FORKNOEXEC check to conform completely to POSIX. 
*/ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid) { @@ -932,7 +931,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid) if (task_session(p) != task_session(group_leader)) goto out; err = -EACCES; - if (p->did_exec) + if (!(p->flags & PF_FORKNOEXEC)) goto out; } else { err = -ESRCH; -- cgit v0.10.2 From 185ee40ee7fd1ecfc6575e8cefa2331218d1eca2 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:53 -0800 Subject: fs/proc/array.c: change do_task_stat() to use while_each_thread() Change the remaining next_thread (ab)users to use while_each_thread(). The last user which should be changed is next_tid(), but we can't do this now. __exit_signal() and complete_signal() are fine, they actually need next_thread() logic. This patch (of 3): do_task_stat() can use while_each_thread(), no changes in the compiled code. Signed-off-by: Oleg Nesterov Cc: KOSAKI Motohiro Cc: Al Viro Cc: Kees Cook Reviewed-by: Sameer Nanda Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/array.c b/fs/proc/array.c index 554a0b2..656e401 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -444,8 +444,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, min_flt += t->min_flt; maj_flt += t->maj_flt; gtime += task_gtime(t); - t = next_thread(t); - } while (t != task); + } while_each_thread(task, t); min_flt += sig->min_flt; maj_flt += sig->maj_flt; -- cgit v0.10.2 From 2e1f38358246b8f8e5871026b21d374e9bb1a163 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:55 -0800 Subject: kernel/sys.c: k_getrusage() can use while_each_thread() Change k_getrusage() to use while_each_thread(), no changes in the compiled code. 
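Why the compiled code does not change can be seen from a small userspace model. This is a sketch under assumptions: struct thread_model, model_next_thread() and model_while_each_thread() are hypothetical stand-ins for the kernel's circular thread list and helpers, with the macro written as "advance t, stop when it is back at g".

```c
/* Hypothetical circular list standing in for a thread group. */
struct thread_model {
	int rusage;			/* per-thread value to accumulate */
	struct thread_model *next;	/* circular: the last thread points to the first */
};

#define model_next_thread(t)	((t)->next)

/* Assumed shape of the helper: step t forward until it wraps to g. */
#define model_while_each_thread(g, t) \
	while (((t) = model_next_thread(t)) != (g))

/* The open-coded form that the patch removes. */
int sum_open_coded(struct thread_model *p)
{
	struct thread_model *t = p;
	int sum = 0;

	do {
		sum += t->rusage;
		t = model_next_thread(t);
	} while (t != p);
	return sum;
}

/* The same loop using the helper; the advance moves into the condition. */
int sum_with_helper(struct thread_model *p)
{
	struct thread_model *t = p;
	int sum = 0;

	do {
		sum += t->rusage;
	} model_while_each_thread(p, t);
	return sum;
}
```

Both functions visit every thread exactly once, starting with p itself, so a compiler is free to emit identical code for them.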
Signed-off-by: Oleg Nesterov Cc: KOSAKI Motohiro Cc: Al Viro Cc: Kees Cook Reviewed-by: Sameer Nanda Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/sys.c b/kernel/sys.c index ecd3ea1..c0a58be 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1571,8 +1571,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r) t = p; do { accumulate_thread_rusage(t, r); - t = next_thread(t); - } while (t != p); + } while_each_thread(p, t); break; default: -- cgit v0.10.2 From 8d38f203b46c36626285400b9466b08abecaaa80 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 23 Jan 2014 15:55:56 -0800 Subject: kernel/signal.c: change do_signal_stop/do_sigaction to use while_each_thread() Change do_signal_stop() and do_sigaction() to avoid next_thread() and use while_each_thread() instead. Signed-off-by: Oleg Nesterov Cc: KOSAKI Motohiro Cc: Al Viro Cc: Kees Cook Reviewed-by: Sameer Nanda Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/signal.c b/kernel/signal.c index 940b30e..52f881d 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2047,8 +2047,8 @@ static bool do_signal_stop(int signr) if (task_set_jobctl_pending(current, signr | gstop)) sig->group_stop_count++; - for (t = next_thread(current); t != current; - t = next_thread(t)) { + t = current; + while_each_thread(current, t) { /* * Setting state to TASK_STOPPED for a group * stop is always done with the siglock held, @@ -3125,8 +3125,7 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact) rm_from_queue_full(&mask, &t->signal->shared_pending); do { rm_from_queue_full(&mask, &t->pending); - t = next_thread(t); - } while (t != current); + } while_each_thread(current, t); } } -- cgit v0.10.2 From b88fae644e5e3922251a4b242f435f5e3b49c381 Mon Sep 17 00:00:00 2001 From: Zhang Yi Date: Thu, 23 Jan 2014 15:55:57 -0800 Subject: exec: avoid propagating PF_NO_SETAFFINITY into userspace child Userspace process doesn't want the 
PF_NO_SETAFFINITY, but its parent may be a kernel worker thread which has PF_NO_SETAFFINITY set, and this worker thread can do kernel_thread() to create the child. Clear this flag in the userspace child to enable its migration capability. Signed-off-by: Zhang Yi Acked-by: Oleg Nesterov Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index 493b102..44218a7 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1087,8 +1087,8 @@ int flush_old_exec(struct linux_binprm * bprm) bprm->mm = NULL; /* We're using it now */ set_fs(USER_DS); - current->flags &= - ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE); + current->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | + PF_NOFREEZE | PF_NO_SETAFFINITY); flush_thread(); current->personality &= ~bprm->per_clear; -- cgit v0.10.2 From 3b96d7db3b6dc99d207bca50037274d22e48dea5 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Thu, 23 Jan 2014 15:55:58 -0800 Subject: fs/exec.c: call arch_pick_mmap_layout() only once Currently both setup_new_exec() and flush_old_exec() issue a call to arch_pick_mmap_layout(). As setup_new_exec() and flush_old_exec() are always called pairwise, arch_pick_mmap_layout() is called twice. This patch removes one call from setup_new_exec() to have it only called once. Signed-off-by: Richard Weinberger Tested-by: Pat Erley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index 44218a7..e1529b4 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -842,7 +842,6 @@ static int exec_mmap(struct mm_struct *mm) tsk->active_mm = mm; activate_mm(active_mm, mm); task_unlock(tsk); - arch_pick_mmap_layout(mm); if (old_mm) { up_read(&old_mm->mmap_sem); BUG_ON(active_mm != old_mm); -- cgit v0.10.2 From 7984754b99b6c89054edc405e9d9d35810a91d36 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 23 Jan 2014 15:55:59 -0800 Subject: kexec: add sysctl to disable kexec_load For general-purpose (i.e.
distro) kernel builds it makes sense to build with CONFIG_KEXEC to allow end users to choose what kind of things they want to do with kexec. However, in the face of trying to lock down a system with such a kernel, there needs to be a way to disable kexec_load (much like module loading can be disabled). Without this, it is too easy for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM and modules_disabled are set. With this change, it is still possible to load an image for use later, then disable kexec_load so the image (or lack of image) can't be altered. The intention is for using this in environments where "perfect" enforcement is hard. Without a verified boot, along with verified modules, and along with verified kexec, this is trying to give a system a better chance to defend itself (or at least grow the window of discoverability) against attack in the face of a privilege escalation. In my mind, I consider several boot scenarios: 1) Verified boot of read-only verified root fs loading fd-based verification of kexec images. 2) Secure boot of writable root fs loading signed kexec images. 3) Regular boot loading kexec (e.g. kcrash) image early and locking it. 4) Regular boot with no control of kexec image at all. 1 and 2 don't exist yet, but will soon once the verified kexec series has landed. 4 is the state of things now. The gap between 2 and 4 is too large, so this change creates scenario 3, a middle-ground above 4 when 2 and 1 are not possible for a system. 
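The one-way behaviour of the toggle can be modelled in userspace as follows. This is a sketch, not the kernel implementation: model_sysctl_write() and model_kexec_load() are hypothetical helpers, and the range check only assumes that a min/max-bounded integer handler rejects out-of-range writes with -EINVAL. With both bounds set to 1, the only value that can ever be written is 1.

```c
#include <errno.h>

/*
 * Hypothetical model of a bounded integer sysctl write: the value is
 * stored only if it lies within [min, max].
 */
int model_sysctl_write(int *data, int val, int min, int max)
{
	if (val < min || val > max)
		return -EINVAL;
	*data = val;
	return 0;
}

/*
 * Hypothetical model of the permission check at the top of kexec_load():
 * the caller needs the boot capability and the toggle must still be 0.
 */
int model_kexec_load(int has_cap_sys_boot, int load_disabled)
{
	if (!has_cap_sys_boot || load_disabled)
		return -EPERM;
	return 0;
}
```

A usage sequence for scenario 3 would be: load the image while the toggle is 0, write 1 with min = max = 1, and from then on every write of 0 fails and every further load returns -EPERM.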
Signed-off-by: Kees Cook Acked-by: Rik van Riel Cc: Vivek Goyal Cc: Eric Biederman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 6d48640..ee9a2f9 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -33,6 +33,7 @@ show up in /proc/sys/kernel: - domainname - hostname - hotplug +- kexec_load_disabled - kptr_restrict - kstack_depth_to_print [ X86 only ] - l2cr [ PPC only ] @@ -287,6 +288,18 @@ Default value is "/sbin/hotplug". ============================================================== +kexec_load_disabled: + +A toggle indicating if the kexec_load syscall has been disabled. This +value defaults to 0 (false: kexec_load enabled), but can be set to 1 +(true: kexec_load disabled). Once true, kexec can no longer be used, and +the toggle cannot be set back to false. This allows a kexec image to be +loaded before disabling the syscall, allowing a system to set up (and +later use) an image without it being altered. Generally used together +with the "modules_disabled" sysctl. + +============================================================== + kptr_restrict: This toggle indicates whether restrictions are placed on @@ -331,7 +344,7 @@ A toggle value indicating if modules are allowed to be loaded in an otherwise modular kernel. This toggle defaults to off (0), but can be set true (1). Once true, modules can be neither loaded nor unloaded, and the toggle cannot be set back -to false. +to false. Generally used with the "kexec_load_disabled" toggle. 
============================================================== diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 5fd33dc..6d4066c 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -170,6 +170,7 @@ unsigned long paddr_vmcoreinfo_note(void); extern struct kimage *kexec_image; extern struct kimage *kexec_crash_image; +extern int kexec_load_disabled; #ifndef kexec_flush_icache_page #define kexec_flush_icache_page(page) diff --git a/kernel/kexec.c b/kernel/kexec.c index 9c97016..ac73878 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -932,6 +932,7 @@ static int kimage_load_segment(struct kimage *image, */ struct kimage *kexec_image; struct kimage *kexec_crash_image; +int kexec_load_disabled; static DEFINE_MUTEX(kexec_mutex); @@ -942,7 +943,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, int result; /* We only trust the superuser with rebooting the system. */ - if (!capable(CAP_SYS_BOOT)) + if (!capable(CAP_SYS_BOOT) || kexec_load_disabled) return -EPERM; /* diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 693eac3..096db74 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -62,6 +62,7 @@ #include #include #include +#include #include #include @@ -614,6 +615,18 @@ static struct ctl_table kern_table[] = { .proc_handler = proc_dointvec, }, #endif +#ifdef CONFIG_KEXEC + { + .procname = "kexec_load_disabled", + .data = &kexec_load_disabled, + .maxlen = sizeof(int), + .mode = 0644, + /* only handle a transition from default "0" to "1" */ + .proc_handler = proc_dointvec_minmax, + .extra1 = &one, + .extra2 = &one, + }, +#endif #ifdef CONFIG_MODULES { .procname = "modprobe", -- cgit v0.10.2 From 77019967f06b5f30c8b619eac0dfdbc68465fa87 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Thu, 23 Jan 2014 15:56:00 -0800 Subject: kdump: fix exported size of vmcoreinfo note Right now we seem to be exporting the max data size contained inside vmcoreinfo note. 
But this does not include the size of the metadata around the vmcoreinfo data, such as the name of the note and the starting and ending elf_note. I think user space expects the total size, and that is the size put in the PT_NOTE ELF header. Things seem to be fine so far because we are not using the vmcoreinfo note to its maximum capacity. But as it fills up, at some point the problem will become visible. I don't think user space will be broken by this change, so there is no need to introduce vmcoreinfo2. This change is safe and backward compatible. More explanation of why this change is safe is below. vmcoreinfo contains information about the kernel which user space needs to know to do things like filtering: for example, various kernel config options, or the size or offset of some data structures. All this information is communicated to user space with an ELF note present in the ELF /proc/vmcore file. Currently the vmcoreinfo data size is 4096. With some ELF note metadata around it, the actual size is 4132 bytes. But we are using barely 25% of that size; the rest is empty. So even if we tell user space that the size of the ELF note is 4096 and not 4132, nothing will be broken, because after around 1000 bytes everything is zero anyway. But once we start filling up the note to capacity without reporting the full size of the note, bad things will start happening: either some data will be lost, or tools will be confused because they did not find the zero note at the end. So I think this change is safe and should not break existing tools.
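The 4096 versus 4132 figures can be reproduced with a short calculation. The breakdown below is an assumption that happens to match the numbers in this message: a 12-byte ELF note header (three 32-bit words), the name "VMCOREINFO" padded to a 4-byte boundary, the 4096-byte data area, and a second 12-byte header for the terminating empty note. The MODEL_* macros and model_* functions are hypothetical names used only for this sketch.

```c
/* Round x up to a multiple of 4, as ELF note fields are word-aligned. */
#define MODEL_ALIGN4(x)		(((x) + 3u) & ~3u)

/* Assumed: an ELF note header is three 32-bit words (namesz, descsz, type). */
#define MODEL_NOTE_HEAD_BYTES	12u

/* Data capacity quoted in the commit message. */
#define MODEL_VMCOREINFO_BYTES	4096u

/* Note name including its trailing NUL, padded to 4 bytes: 11 -> 12. */
unsigned int model_note_name_bytes(void)
{
	return MODEL_ALIGN4((unsigned int)sizeof("VMCOREINFO"));
}

/* Leading header + padded name + data + terminating empty header. */
unsigned int model_vmcoreinfo_note_size(void)
{
	return MODEL_NOTE_HEAD_BYTES + model_note_name_bytes() +
	       MODEL_VMCOREINFO_BYTES + MODEL_NOTE_HEAD_BYTES;
}
```

Under these assumptions the metadata accounts for 36 bytes, which is exactly the difference between the size exported before this patch and the size exported after it.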
Signed-off-by: Vivek Goyal Cc: Ken'ichi Ohmichi Cc: Dan Aloni Cc: Greg KH Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index 9659d38..d945a94 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -126,7 +126,7 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj, { return sprintf(buf, "%lx %x\n", paddr_vmcoreinfo_note(), - (unsigned int)vmcoreinfo_max_size); + (unsigned int)sizeof(vmcoreinfo_note)); } KERNEL_ATTR_RO(vmcoreinfo); -- cgit v0.10.2 From bdd490ade365b1173485b829e5457af8e16c7f01 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Thu, 23 Jan 2014 15:56:01 -0800 Subject: kdump: add /sys/kernel/vmcoreinfo ABI documentation /sys/kernel/vmcoreinfo was introduced long back but there is no ABI documentation. This patch adds the documentation. Signed-off-by: Vivek Goyal Cc: Ken'ichi Ohmichi Cc: Dan Aloni Cc: Greg KH Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo b/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo new file mode 100644 index 0000000..7bd8116 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo @@ -0,0 +1,14 @@ +What: /sys/kernel/vmcoreinfo +Date: October 2007 +KernelVersion: 2.6.24 +Contact: Ken'ichi Ohmichi + Kexec Mailing List + Vivek Goyal +Description + Shows physical address and size of vmcoreinfo ELF note. + First value contains physical address of note in hex and + second value contains the size of note in hex. This ELF + note info is parsed by second kernel and exported to user + space as part of ELF note in /proc/vmcore file. This note + contains various information like struct size, symbol + values, page size etc. 
-- cgit v0.10.2 From 6c5de79ba22bbde9a3cfc7b405140763a7252410 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 23 Jan 2014 15:56:03 -0800 Subject: partitions/efi: complete documentation of gpt kernel param purpose The usage of the 'gpt' kernel parameter is twofold: (i) skip any mbr integrity checks and (ii) enable the backup GPT header to be used in situations where the primary one is corrupted. This last "feature" is not obvious and needs to be properly documented in the kernel-parameters document. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=63591 Signed-off-by: Davidlohr Bueso Cc: Matt Domsch Cc: Matt Fleming Cc: "Chandramouleeswaran,Aswin" Cc: Chris Murphy Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 5efebfd..248fe9d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1043,7 +1043,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. debugfs files are removed at module unload time. gpt [EFI] Forces disk with valid GPT signature but - invalid Protective MBR to be treated as GPT. + invalid Protective MBR to be treated as GPT. If the + primary GPT is corrupted, it enables the backup/alternate + GPT to be used instead. grcan.enable0= [HW] Configuration of physical interface 0. Determines the "Enable 0" bit of the configuration register. -- cgit v0.10.2 From 56abde7239cb76d4dffcb79c8f96c1dab1cc81d1 Mon Sep 17 00:00:00 2001 From: Alexandre Bounine Date: Thu, 23 Jan 2014 15:56:04 -0800 Subject: rapidio: add modular rapidio core build into powerpc and mips branches Allow modular build option for RapidIO subsystem core in MIPS and PowerPC architectural branches. At this moment modular RapidIO subsystem build is enabled only for platforms that use PCI/PCIe based RapidIO controllers (e.g. Tsi721). 
Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Jean Delvare Cc: Ralf Baechle Cc: Benjamin Herrenschmidt Cc: Li Yang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 92c8e0b..52dac06 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -2430,7 +2430,7 @@ source "drivers/pcmcia/Kconfig" source "drivers/pci/hotplug/Kconfig" config RAPIDIO - bool "RapidIO support" + tristate "RapidIO support" depends on PCI default n help diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index b2be8e8..bedc62b 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -794,7 +794,7 @@ config HAS_RAPIDIO default n config RAPIDIO - bool "RapidIO support" + tristate "RapidIO support" depends on HAS_RAPIDIO || PCI help If you say Y here, the kernel will include drivers and @@ -802,7 +802,7 @@ config RAPIDIO config FSL_RIO bool "Freescale Embedded SRIO Controller support" - depends on RAPIDIO && HAS_RAPIDIO + depends on RAPIDIO = y && HAS_RAPIDIO default "n" ---help--- Include support for RapidIO controller on Freescale embedded -- cgit v0.10.2 From dbf128cbf9b90f97d74c734d1a768c564958e970 Mon Sep 17 00:00:00 2001 From: Cody P Schafer Date: Thu, 23 Jan 2014 15:56:05 -0800 Subject: rbtree/test: move rb_node to the middle of the test struct Avoid making the rb_node the first entry to catch some bugs around NULL checking the rb_node. 
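The reasoning can be illustrated with a userspace sketch. Everything here is hypothetical: struct model_rb_node only mimics the shape of an rb_node, and model_entry_addr() does the container_of-style subtraction on plain integers so that the NULL case can be shown without dereferencing anything. When the embedded node is the first member, a NULL node maps back to a NULL entry and a missing NULL check on the node goes unnoticed; with a non-zero offset it maps to a non-NULL bogus address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same shape as an rb_node. */
struct model_rb_node {
	uintptr_t parent_color;
	struct model_rb_node *right;
	struct model_rb_node *left;
};

/* Old layout: embedded node first, so its offset is 0. */
struct node_first {
	struct model_rb_node rb;
	uint32_t key;
};

/* New layout: a field precedes the node, so its offset is non-zero. */
struct node_middle {
	uint32_t key;
	struct model_rb_node rb;
};

/*
 * Integer-only version of the container_of arithmetic: the address of
 * the enclosing entry is the node address minus the member offset.
 */
uintptr_t model_entry_addr(uintptr_t node_addr, size_t member_offset)
{
	return node_addr - member_offset;
}
```

With the old layout, model_entry_addr(0, offsetof(struct node_first, rb)) is 0, so code that tests the entry for NULL instead of the node still appears to work; with the new layout the result is non-zero and the mistake shows up in testing.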
Signed-off-by: Cody P Schafer Cc: Michel Lespinasse Cc: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/rbtree_test.c b/lib/rbtree_test.c index 31dd4cc..df6c125 100644 --- a/lib/rbtree_test.c +++ b/lib/rbtree_test.c @@ -8,8 +8,8 @@ #define CHECK_LOOPS 100 struct test_node { - struct rb_node rb; u32 key; + struct rb_node rb; /* following fields used for testing augmented rbtree functionality */ u32 val; -- cgit v0.10.2 From 964fe94d71b771c8801134407ad8676874bb589e Mon Sep 17 00:00:00 2001 From: Cody P Schafer Date: Thu, 23 Jan 2014 15:56:06 -0800 Subject: rbtree/test: test rbtree_postorder_for_each_entry_safe() Signed-off-by: Cody P Schafer Cc: Michel Lespinasse Cc: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/rbtree_test.c b/lib/rbtree_test.c index df6c125..8b3c9dc 100644 --- a/lib/rbtree_test.c +++ b/lib/rbtree_test.c @@ -114,6 +114,16 @@ static int black_path_count(struct rb_node *rb) return count; } +static void check_postorder_foreach(int nr_nodes) +{ + struct test_node *cur, *n; + int count = 0; + rbtree_postorder_for_each_entry_safe(cur, n, &root, rb) + count++; + + WARN_ON_ONCE(count != nr_nodes); +} + static void check_postorder(int nr_nodes) { struct rb_node *rb; @@ -148,6 +158,7 @@ static void check(int nr_nodes) WARN_ON_ONCE(count < (1 << black_path_count(rb_last(&root))) - 1); check_postorder(nr_nodes); + check_postorder_foreach(nr_nodes); } static void check_augmented(int nr_nodes) -- cgit v0.10.2 From b182837ac111e87f8e82cbcb0046449d9412187f Mon Sep 17 00:00:00 2001 From: Cody P Schafer Date: Thu, 23 Jan 2014 15:56:07 -0800 Subject: net/netfilter/ipset/ip_set_hash_netiface.c: use rbtree postorder iteration instead of opencoding Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Signed-off-by: Cody P Schafer Cc: Michel Lespinasse Cc: Jan Kara Cc: Pablo Neira Ayuso Cc: Patrick McHardy 
Cc: Jozsef Kadlecsik Cc: "David S. Miller" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c index 3f64a66..b827a0f 100644 --- a/net/netfilter/ipset/ip_set_hash_netiface.c +++ b/net/netfilter/ipset/ip_set_hash_netiface.c @@ -46,31 +46,12 @@ struct iface_node { static void rbtree_destroy(struct rb_root *root) { - struct rb_node *p, *n = root->rb_node; - struct iface_node *node; - - /* Non-recursive destroy, like in ext3 */ - while (n) { - if (n->rb_left) { - n = n->rb_left; - continue; - } - if (n->rb_right) { - n = n->rb_right; - continue; - } - p = rb_parent(n); - node = rb_entry(n, struct iface_node, node); - if (!p) - *root = RB_ROOT; - else if (p->rb_left == n) - p->rb_left = NULL; - else if (p->rb_right == n) - p->rb_right = NULL; + struct iface_node *node, *next; + rbtree_postorder_for_each_entry_safe(node, next, root, node) kfree(node); - n = p; - } + + *root = RB_ROOT; } static int -- cgit v0.10.2 From bb25e49ff8ab0ef0b3c073c09d55cf10ef8a2aa0 Mon Sep 17 00:00:00 2001 From: Cody P Schafer Date: Thu, 23 Jan 2014 15:56:08 -0800 Subject: fs/ubifs: use rbtree postorder iteration helper instead of opencoding Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Signed-off-by: Cody P Schafer Cc: Michel Lespinasse Cc: Jan Kara Cc: Artem Bityutskiy Cc: Adrian Hunter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ubifs/debug.c b/fs/ubifs/debug.c index cc1febd..5157b86 100644 --- a/fs/ubifs/debug.c +++ b/fs/ubifs/debug.c @@ -2118,26 +2118,10 @@ out_free: */ static void free_inodes(struct fsck_data *fsckd) { - struct rb_node *this = fsckd->inodes.rb_node; - struct fsck_inode *fscki; + struct fsck_inode *fscki, *n; - while (this) { - if (this->rb_left) - this = this->rb_left; - else if (this->rb_right) - this = this->rb_right; - else { - fscki = 
rb_entry(this, struct fsck_inode, rb); - this = rb_parent(this); - if (this) { - if (this->rb_left == &fscki->rb) - this->rb_left = NULL; - else - this->rb_right = NULL; - } - kfree(fscki); - } - } + rbtree_postorder_for_each_entry_safe(fscki, n, &fsckd->inodes, rb) + kfree(fscki); } /** diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c index 36bd4ef..a902c59 100644 --- a/fs/ubifs/log.c +++ b/fs/ubifs/log.c @@ -574,27 +574,10 @@ static int done_already(struct rb_root *done_tree, int lnum) */ static void destroy_done_tree(struct rb_root *done_tree) { - struct rb_node *this = done_tree->rb_node; - struct done_ref *dr; + struct done_ref *dr, *n; - while (this) { - if (this->rb_left) { - this = this->rb_left; - continue; - } else if (this->rb_right) { - this = this->rb_right; - continue; - } - dr = rb_entry(this, struct done_ref, rb); - this = rb_parent(this); - if (this) { - if (this->rb_left == &dr->rb) - this->rb_left = NULL; - else - this->rb_right = NULL; - } + rbtree_postorder_for_each_entry_safe(dr, n, done_tree, rb) kfree(dr); - } } /** diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c index ba32da3..f1c3e5a1 100644 --- a/fs/ubifs/orphan.c +++ b/fs/ubifs/orphan.c @@ -815,27 +815,10 @@ static int dbg_find_check_orphan(struct rb_root *root, ino_t inum) static void dbg_free_check_tree(struct rb_root *root) { - struct rb_node *this = root->rb_node; - struct check_orphan *o; + struct check_orphan *o, *n; - while (this) { - if (this->rb_left) { - this = this->rb_left; - continue; - } else if (this->rb_right) { - this = this->rb_right; - continue; - } - o = rb_entry(this, struct check_orphan, rb); - this = rb_parent(this); - if (this) { - if (this->rb_left == &o->rb) - this->rb_left = NULL; - else - this->rb_right = NULL; - } + rbtree_postorder_for_each_entry_safe(o, n, root, rb) kfree(o); - } } static int dbg_orphan_check(struct ubifs_info *c, struct ubifs_zbranch *zbr, diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c index 065096e..c14adb2 100644 --- 
a/fs/ubifs/recovery.c +++ b/fs/ubifs/recovery.c @@ -1335,29 +1335,14 @@ static void remove_ino(struct ubifs_info *c, ino_t inum) */ void ubifs_destroy_size_tree(struct ubifs_info *c) { - struct rb_node *this = c->size_tree.rb_node; - struct size_entry *e; + struct size_entry *e, *n; - while (this) { - if (this->rb_left) { - this = this->rb_left; - continue; - } else if (this->rb_right) { - this = this->rb_right; - continue; - } - e = rb_entry(this, struct size_entry, rb); + rbtree_postorder_for_each_entry_safe(e, n, &c->size_tree, rb) { if (e->inode) iput(e->inode); - this = rb_parent(this); - if (this) { - if (this->rb_left == &e->rb) - this->rb_left = NULL; - else - this->rb_right = NULL; - } kfree(e); } + c->size_tree = RB_ROOT; } diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c index f69daa5..5ded849 100644 --- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -873,26 +873,10 @@ static void free_orphans(struct ubifs_info *c) */ static void free_buds(struct ubifs_info *c) { - struct rb_node *this = c->buds.rb_node; - struct ubifs_bud *bud; - - while (this) { - if (this->rb_left) - this = this->rb_left; - else if (this->rb_right) - this = this->rb_right; - else { - bud = rb_entry(this, struct ubifs_bud, rb); - this = rb_parent(this); - if (this) { - if (this->rb_left == &bud->rb) - this->rb_left = NULL; - else - this->rb_right = NULL; - } - kfree(bud); - } - } + struct ubifs_bud *bud, *n; + + rbtree_postorder_for_each_entry_safe(bud, n, &c->buds, rb) + kfree(bud); } /** diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c index 349f31a..9083bc7 100644 --- a/fs/ubifs/tnc.c +++ b/fs/ubifs/tnc.c @@ -178,27 +178,11 @@ static int ins_clr_old_idx_znode(struct ubifs_info *c, */ void destroy_old_idx(struct ubifs_info *c) { - struct rb_node *this = c->old_idx.rb_node; - struct ubifs_old_idx *old_idx; + struct ubifs_old_idx *old_idx, *n; - while (this) { - if (this->rb_left) { - this = this->rb_left; - continue; - } else if (this->rb_right) { - this = this->rb_right; - continue; - } - 
old_idx = rb_entry(this, struct ubifs_old_idx, rb); - this = rb_parent(this); - if (this) { - if (this->rb_left == &old_idx->rb) - this->rb_left = NULL; - else - this->rb_right = NULL; - } + rbtree_postorder_for_each_entry_safe(old_idx, n, &c->old_idx, rb) kfree(old_idx); - } + c->old_idx = RB_ROOT; } -- cgit v0.10.2 From d1866bd06101eb8ab2bb9d180b47c052c04b7cee Mon Sep 17 00:00:00 2001 From: Cody P Schafer Date: Thu, 23 Jan 2014 15:56:10 -0800 Subject: fs/ext4: use rbtree postorder iteration helper instead of opencoding Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Signed-off-by: Cody P Schafer Reviewed-by: Jan Kara Cc: Michel Lespinasse Cc: Theodore Ts'o Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c index 3f11656..41eb9dc 100644 --- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -180,37 +180,12 @@ int ext4_setup_system_zone(struct super_block *sb) /* Called when the filesystem is unmounted */ void ext4_release_system_zone(struct super_block *sb) { - struct rb_node *n = EXT4_SB(sb)->system_blks.rb_node; - struct rb_node *parent; - struct ext4_system_zone *entry; + struct ext4_system_zone *entry, *n; - while (n) { - /* Do the node's children first */ - if (n->rb_left) { - n = n->rb_left; - continue; - } - if (n->rb_right) { - n = n->rb_right; - continue; - } - /* - * The node has no children; free it, and then zero - * out parent's link to it. Finally go to the - * beginning of the loop and try to free the parent - * node. 
- */ - parent = rb_parent(n); - entry = rb_entry(n, struct ext4_system_zone, node); + rbtree_postorder_for_each_entry_safe(entry, n, + &EXT4_SB(sb)->system_blks, node) kmem_cache_free(ext4_system_zone_cachep, entry); - if (!parent) - EXT4_SB(sb)->system_blks = RB_ROOT; - else if (parent->rb_left == n) - parent->rb_left = NULL; - else if (parent->rb_right == n) - parent->rb_right = NULL; - n = parent; - } + EXT4_SB(sb)->system_blks = RB_ROOT; } diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index 680bb33..d638c57 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -353,41 +353,16 @@ struct fname { */ static void free_rb_tree_fname(struct rb_root *root) { - struct rb_node *n = root->rb_node; - struct rb_node *parent; - struct fname *fname; - - while (n) { - /* Do the node's children first */ - if (n->rb_left) { - n = n->rb_left; - continue; - } - if (n->rb_right) { - n = n->rb_right; - continue; - } - /* - * The node has no children; free it, and then zero - * out parent's link to it. Finally go to the - * beginning of the loop and try to free the parent - * node. 
- */ - parent = rb_parent(n); - fname = rb_entry(n, struct fname, rb_hash); + struct fname *fname, *next; + + rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash) while (fname) { struct fname *old = fname; fname = fname->next; kfree(old); } - if (!parent) - *root = RB_ROOT; - else if (parent->rb_left == n) - parent->rb_left = NULL; - else if (parent->rb_right == n) - parent->rb_right = NULL; - n = parent; - } + + *root = RB_ROOT; } -- cgit v0.10.2 From e8bbeeb755a077cfc0f814b07739f9225642d65c Mon Sep 17 00:00:00 2001 From: Cody P Schafer Date: Thu, 23 Jan 2014 15:56:11 -0800 Subject: fs/jffs2: use rbtree postorder iteration helper instead of opencoding Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Signed-off-by: Cody P Schafer Cc: Michel Lespinasse Cc: Jan Kara Cc: David Woodhouse Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/jffs2/nodelist.c b/fs/jffs2/nodelist.c index 975a1f5..9a5449b 100644 --- a/fs/jffs2/nodelist.c +++ b/fs/jffs2/nodelist.c @@ -564,25 +564,10 @@ struct jffs2_node_frag *jffs2_lookup_node_frag(struct rb_root *fragtree, uint32_ they're killed. */ void jffs2_kill_fragtree(struct rb_root *root, struct jffs2_sb_info *c) { - struct jffs2_node_frag *frag; - struct jffs2_node_frag *parent; - - if (!root->rb_node) - return; + struct jffs2_node_frag *frag, *next; dbg_fragtree("killing\n"); - - frag = (rb_entry(root->rb_node, struct jffs2_node_frag, rb)); - while(frag) { - if (frag->rb.rb_left) { - frag = frag_left(frag); - continue; - } - if (frag->rb.rb_right) { - frag = frag_right(frag); - continue; - } - + rbtree_postorder_for_each_entry_safe(frag, next, root, rb) { if (frag->node && !(--frag->node->frags)) { /* Not a hole, and it's the final remaining frag of this node. 
Free the node */ @@ -591,17 +576,8 @@ void jffs2_kill_fragtree(struct rb_root *root, struct jffs2_sb_info *c) jffs2_free_full_dnode(frag->node); } - parent = frag_parent(frag); - if (parent) { - if (frag_left(parent) == frag) - parent->rb.rb_left = NULL; - else - parent->rb.rb_right = NULL; - } jffs2_free_node_frag(frag); - frag = parent; - cond_resched(); } } diff --git a/fs/jffs2/readinode.c b/fs/jffs2/readinode.c index ae81b01..386303d 100644 --- a/fs/jffs2/readinode.c +++ b/fs/jffs2/readinode.c @@ -543,33 +543,13 @@ static int jffs2_build_inode_fragtree(struct jffs2_sb_info *c, static void jffs2_free_tmp_dnode_info_list(struct rb_root *list) { - struct rb_node *this; - struct jffs2_tmp_dnode_info *tn; - - this = list->rb_node; + struct jffs2_tmp_dnode_info *tn, *next; - /* Now at bottom of tree */ - while (this) { - if (this->rb_left) - this = this->rb_left; - else if (this->rb_right) - this = this->rb_right; - else { - tn = rb_entry(this, struct jffs2_tmp_dnode_info, rb); + rbtree_postorder_for_each_entry_safe(tn, next, list, rb) { jffs2_free_full_dnode(tn->fn); jffs2_free_tmp_dnode_info(tn); - - this = rb_parent(this); - if (!this) - break; - - if (this->rb_left == &tn->rb) - this->rb_left = NULL; - else if (this->rb_right == &tn->rb) - this->rb_right = NULL; - else BUG(); - } } + *list = RB_ROOT; } -- cgit v0.10.2 From b1c8047c6b474c639d923122ab84732cbfeb7225 Mon Sep 17 00:00:00 2001 From: Cody P Schafer Date: Thu, 23 Jan 2014 15:56:12 -0800 Subject: fs/ext3: use rbtree postorder iteration helper instead of opencoding Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Signed-off-by: Cody P Schafer Cc: Michel Lespinasse Cc: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c index bafdd48..a331ad1 100644 --- a/fs/ext3/dir.c +++ b/fs/ext3/dir.c @@ -309,43 +309,17 @@ struct fname { */ static void 
free_rb_tree_fname(struct rb_root *root) { - struct rb_node *n = root->rb_node; - struct rb_node *parent; - struct fname *fname; - - while (n) { - /* Do the node's children first */ - if (n->rb_left) { - n = n->rb_left; - continue; - } - if (n->rb_right) { - n = n->rb_right; - continue; - } - /* - * The node has no children; free it, and then zero - * out parent's link to it. Finally go to the - * beginning of the loop and try to free the parent - * node. - */ - parent = rb_parent(n); - fname = rb_entry(n, struct fname, rb_hash); + struct fname *fname, *next; + + rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash) while (fname) { struct fname * old = fname; fname = fname->next; kfree (old); } - if (!parent) - *root = RB_ROOT; - else if (parent->rb_left == n) - parent->rb_left = NULL; - else if (parent->rb_right == n) - parent->rb_right = NULL; - n = parent; - } -} + *root = RB_ROOT; +} static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp, loff_t pos) -- cgit v0.10.2 From ed8f68669a27287a3b15882e8d88ebccae75ec59 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Thu, 23 Jan 2014 15:56:13 -0800 Subject: fs-ext3-use-rbtree-postorder-iteration-helper-instead-of-opencoding-fix use do{}while - more efficient and it squishes a coccinelle warning Reported-by: Fengguang Wu Cc: Cody P Schafer Cc: Jan Kara Cc: Michel Lespinasse Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c index a331ad1..e66e480 100644 --- a/fs/ext3/dir.c +++ b/fs/ext3/dir.c @@ -312,11 +312,11 @@ static void free_rb_tree_fname(struct rb_root *root) struct fname *fname, *next; rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash) - while (fname) { - struct fname * old = fname; + do { + struct fname *old = fname; fname = fname->next; - kfree (old); - } + kfree(old); + } while (fname); *root = RB_ROOT; } -- cgit v0.10.2 From e376ed7c85fe102ff63db2eb8a0c5595f68151fa Mon Sep 17 00:00:00 2001 From: Cody P Schafer 
Date: Thu, 23 Jan 2014 15:56:14 -0800 Subject: arch/sh/kernel/dwarf.c: use rbtree postorder iteration helper instead of solution using repeated rb_erase() Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of using repeated rb_erase() calls Signed-off-by: Cody P Schafer Cc: Michel Lespinasse Cc: Jan Kara Cc: Paul Mundt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/sh/kernel/dwarf.c b/arch/sh/kernel/dwarf.c index 49c09c7..67a049e 100644 --- a/arch/sh/kernel/dwarf.c +++ b/arch/sh/kernel/dwarf.c @@ -995,29 +995,19 @@ static struct unwinder dwarf_unwinder = { static void dwarf_unwinder_cleanup(void) { - struct rb_node **fde_rb_node = &fde_root.rb_node; - struct rb_node **cie_rb_node = &cie_root.rb_node; + struct dwarf_fde *fde, *next_fde; + struct dwarf_cie *cie, *next_cie; /* * Deallocate all the memory allocated for the DWARF unwinder. * Traverse all the FDE/CIE lists and remove and free all the * memory associated with those data structures. */ - while (*fde_rb_node) { - struct dwarf_fde *fde; - - fde = rb_entry(*fde_rb_node, struct dwarf_fde, node); - rb_erase(*fde_rb_node, &fde_root); + rbtree_postorder_for_each_entry_safe(fde, next_fde, &fde_root, node) kfree(fde); - } - while (*cie_rb_node) { - struct dwarf_cie *cie; - - cie = rb_entry(*cie_rb_node, struct dwarf_cie, node); - rb_erase(*cie_rb_node, &cie_root); + rbtree_postorder_for_each_entry_safe(cie, next_cie, &cie_root, node) kfree(cie); - } kmem_cache_destroy(dwarf_reg_cachep); kmem_cache_destroy(dwarf_frame_cachep); -- cgit v0.10.2 From 949b9c3d4263c9b7c2448588afce37becd58e1ad Mon Sep 17 00:00:00 2001 From: Andreas Gruenbacher Date: Thu, 23 Jan 2014 15:56:15 -0800 Subject: userns: relax the posix_acl_valid() checks So far, POSIX ACLs are using a canonical representation that keeps all ACL entries in a strict order; the ACL_USER and ACL_GROUP entries for specific users and groups are ordered by user and group identifier, respectively. 
The user-space code provides ACL entries in this order; the kernel verifies that the ACL entry order is correct in posix_acl_valid(). User namespaces allow user and group identifiers to be mapped arbitrarily, which can cause the ACL_USER and ACL_GROUP entry order to differ between user space and the kernel; posix_acl_valid() would then fail. Work around this by allowing ACL_USER and ACL_GROUP entries to be in any order in the kernel. The effect is only minor: file permission checks will pick the first matching ACL_USER entry, and check all matching ACL_GROUP entries. (The libacl user-space library and getfacl / setfacl tools will not create ACLs with duplicate user or group identifiers; they will handle ACLs with entries in an arbitrary order correctly.) Signed-off-by: Andreas Gruenbacher Cc: Eric W. Biederman Cc: Theodore Tso Cc: Christoph Hellwig Cc: Andreas Dilger Cc: Jan Kara Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/posix_acl.c b/fs/posix_acl.c index 021e7c0..551e61b 100644 --- a/fs/posix_acl.c +++ b/fs/posix_acl.c @@ -149,8 +149,6 @@ posix_acl_valid(const struct posix_acl *acl) { const struct posix_acl_entry *pa, *pe; int state = ACL_USER_OBJ; - kuid_t prev_uid = INVALID_UID; - kgid_t prev_gid = INVALID_GID; int needs_mask = 0; FOREACH_ACL_ENTRY(pa, acl, pe) { @@ -169,10 +167,6 @@ posix_acl_valid(const struct posix_acl *acl) return -EINVAL; if (!uid_valid(pa->e_uid)) return -EINVAL; - if (uid_valid(prev_uid) && - uid_lte(pa->e_uid, prev_uid)) - return -EINVAL; - prev_uid = pa->e_uid; needs_mask = 1; break; @@ -188,10 +182,6 @@ posix_acl_valid(const struct posix_acl *acl) return -EINVAL; if (!gid_valid(pa->e_gid)) return -EINVAL; - if (gid_valid(prev_gid) && - gid_lte(pa->e_gid, prev_gid)) - return -EINVAL; - prev_gid = pa->e_gid; needs_mask = 1; break; -- cgit v0.10.2 From 63509beaf7ed7a9dc8c574be61189fce791489f0 Mon Sep 17 00:00:00 2001 From: Micky Ching Date: Thu, 23 Jan 2014 15:56:17 -0800 Subject:
drivers/memstick/host/rtsx_pci_ms.c: fix ms card data transfer bug This patch adds support for ms cards. The main difference between ms cards and mspro cards is the long data transfer mode: an mspro card can use auto mode DMA for long data transfers, but an ms card cannot and must use normal mode DMA. The memstick core added support for ms cards, but the original driver makes them fail at initialization because it always uses auto mode DMA. This patch makes ms cards work properly. Signed-off-by: Micky Ching Cc: Maxim Levitsky Cc: Samuel Ortiz Cc: Alex Dubov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/memstick/host/rtsx_pci_ms.c b/drivers/memstick/host/rtsx_pci_ms.c index 25f8f93..2a635b6 100644 --- a/drivers/memstick/host/rtsx_pci_ms.c +++ b/drivers/memstick/host/rtsx_pci_ms.c @@ -145,6 +145,8 @@ static int ms_transfer_data(struct realtek_pci_ms *host, unsigned char data_dir, unsigned int length = sg->length; u16 sec_cnt = (u16)(length / 512); u8 val, trans_mode, dma_dir; + struct memstick_dev *card = host->msh->card; + bool pro_card = card->id.type == MEMSTICK_TYPE_PRO; dev_dbg(ms_dev(host), "%s: tpc = 0x%02x, data_dir = %s, length = %d\n", __func__, tpc, (data_dir == READ) ? "READ" : "WRITE", @@ -152,19 +154,21 @@ static int ms_transfer_data(struct realtek_pci_ms *host, unsigned char data_dir, if (data_dir == READ) { dma_dir = DMA_DIR_FROM_CARD; - trans_mode = MS_TM_AUTO_READ; + trans_mode = pro_card ? MS_TM_AUTO_READ : MS_TM_NORMAL_READ; } else { dma_dir = DMA_DIR_TO_CARD; - trans_mode = MS_TM_AUTO_WRITE; + trans_mode = pro_card ?
MS_TM_AUTO_WRITE : MS_TM_NORMAL_WRITE; } rtsx_pci_init_cmd(pcr); rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, MS_TPC, 0xFF, tpc); - rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, MS_SECTOR_CNT_H, - 0xFF, (u8)(sec_cnt >> 8)); - rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, MS_SECTOR_CNT_L, - 0xFF, (u8)sec_cnt); + if (pro_card) { + rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, MS_SECTOR_CNT_H, + 0xFF, (u8)(sec_cnt >> 8)); + rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, MS_SECTOR_CNT_L, + 0xFF, (u8)sec_cnt); + } rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, MS_TRANS_CFG, 0xFF, cfg); rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, IRQSTAT0, @@ -192,8 +196,14 @@ static int ms_transfer_data(struct realtek_pci_ms *host, unsigned char data_dir, } rtsx_pci_read_register(pcr, MS_TRANS_CFG, &val); - if (val & (MS_INT_CMDNK | MS_INT_ERR | MS_CRC16_ERR | MS_RDY_TIMEOUT)) - return -EIO; + if (pro_card) { + if (val & (MS_INT_CMDNK | MS_INT_ERR | + MS_CRC16_ERR | MS_RDY_TIMEOUT)) + return -EIO; + } else { + if (val & (MS_CRC16_ERR | MS_RDY_TIMEOUT)) + return -EIO; + } return 0; } @@ -462,8 +472,8 @@ static int rtsx_pci_ms_set_param(struct memstick_host *msh, clock = 19000000; ssc_depth = RTSX_SSC_DEPTH_500K; - err = rtsx_pci_write_register(pcr, MS_CFG, - 0x18, MS_BUS_WIDTH_1); + err = rtsx_pci_write_register(pcr, MS_CFG, 0x58, + MS_BUS_WIDTH_1 | PUSH_TIME_DEFAULT); if (err < 0) return err; } else if (value == MEMSTICK_PAR4) { -- cgit v0.10.2 From 3089a4c8d3abc7e2ab105d1d39d415110d1566d6 Mon Sep 17 00:00:00 2001 From: Evgeny Boger Date: Thu, 23 Jan 2014 15:56:18 -0800 Subject: drivers/w1/masters/w1-gpio.c: add strong pullup emulation Strong pullup is emulated by driving pin logic high after write command when using tri-state push-pull GPIO. 
Signed-off-by: Evgeny Boger Cc: Greg KH Acked-by: David Fries Acked-by: Evgeniy Polyakov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/w1/masters/w1-gpio.c b/drivers/w1/masters/w1-gpio.c index e36b18b..9709b8b 100644 --- a/drivers/w1/masters/w1-gpio.c +++ b/drivers/w1/masters/w1-gpio.c @@ -18,10 +18,31 @@ #include #include #include +#include #include "../w1.h" #include "../w1_int.h" +static u8 w1_gpio_set_pullup(void *data, int delay) +{ + struct w1_gpio_platform_data *pdata = data; + + if (delay) { + pdata->pullup_duration = delay; + } else { + if (pdata->pullup_duration) { + gpio_direction_output(pdata->pin, 1); + + msleep(pdata->pullup_duration); + + gpio_direction_input(pdata->pin); + } + pdata->pullup_duration = 0; + } + + return 0; +} + static void w1_gpio_write_bit_dir(void *data, u8 bit) { struct w1_gpio_platform_data *pdata = data; @@ -132,6 +153,7 @@ static int w1_gpio_probe(struct platform_device *pdev) } else { gpio_direction_input(pdata->pin); master->write_bit = w1_gpio_write_bit_dir; + master->set_pullup = w1_gpio_set_pullup; } err = w1_add_master_device(master); diff --git a/drivers/w1/w1_int.c b/drivers/w1/w1_int.c index 5a98649..590bd8a 100644 --- a/drivers/w1/w1_int.c +++ b/drivers/w1/w1_int.c @@ -117,18 +117,6 @@ int w1_add_master_device(struct w1_bus_master *master) printk(KERN_ERR "w1_add_master_device: invalid function set\n"); return(-EINVAL); } - /* While it would be electrically possible to make a device that - * generated a strong pullup in bit bang mode, only hardware that - * controls 1-wire time frames are even expected to support a strong - * pullup. w1_io.c would need to support calling set_pullup before - * the last write_bit operation of a w1_write_8 which it currently - * doesn't. 
- */ - if (!master->write_byte && !master->touch_bit && master->set_pullup) { - printk(KERN_ERR "w1_add_master_device: set_pullup requires " - "write_byte or touch_bit, disabling\n"); - master->set_pullup = NULL; - } /* Lock until the device is added (or not) to w1_masters. */ mutex_lock(&w1_mlock); diff --git a/include/linux/w1-gpio.h b/include/linux/w1-gpio.h index 065e3ae..d58594a 100644 --- a/include/linux/w1-gpio.h +++ b/include/linux/w1-gpio.h @@ -20,6 +20,7 @@ struct w1_gpio_platform_data { unsigned int is_open_drain:1; void (*enable_external_pullup)(int enable); unsigned int ext_pullup_enable_pin; + unsigned int pullup_duration; }; #endif /* _LINUX_W1_GPIO_H */ -- cgit v0.10.2 From 40e2c71d57565a82970a5a2b75f7eb67bb3252f4 Mon Sep 17 00:00:00 2001 From: Rui Xiang Date: Thu, 23 Jan 2014 15:56:19 -0800 Subject: romfs: fix returm err while getting inode in fill_super Getting the root inode with romfs_iget() may fail in fill_super, and that error value should be returned. When d_make_root() fails, fill_super should return -ENOMEM instead of -EINVAL; fix that too.
Signed-off-by: Rui Xiang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/romfs/super.c b/fs/romfs/super.c index ff1d3d4..d841878 100644 --- a/fs/romfs/super.c +++ b/fs/romfs/super.c @@ -533,16 +533,14 @@ static int romfs_fill_super(struct super_block *sb, void *data, int silent) root = romfs_iget(sb, pos); if (IS_ERR(root)) - goto error; + return PTR_ERR(root); sb->s_root = d_make_root(root); if (!sb->s_root) - goto error; + return -ENOMEM; return 0; -error: - return -EINVAL; error_rsb_inval: ret = -EINVAL; error_rsb: -- cgit v0.10.2 From 2a1d689c9ba42a6066540fb221b6ecbd6298b728 Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Thu, 23 Jan 2014 15:56:20 -0800 Subject: lib/decompress_unlz4.c: always set an error return code on failures "ret", being set to -1 early on, gets cleared by the first invocation of lz4_decompress()/lz4_decompress_unknownoutputsize(), and hence subsequent failures wouldn't be noticed by the caller without setting it back to -1 right after those calls. Reported-by: Matthew Daley Signed-off-by: Jan Beulich Cc: Kyungsik Lee Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c index 3e67cfad1..7d1e83c 100644 --- a/lib/decompress_unlz4.c +++ b/lib/decompress_unlz4.c @@ -141,6 +141,7 @@ STATIC inline int INIT unlz4(u8 *input, int in_len, goto exit_2; } + ret = -1; if (flush && flush(outp, dest_len) != dest_len) goto exit_2; if (output) -- cgit v0.10.2
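The failure mode described in the unlz4 commit message above can be reproduced outside the kernel. What follows is a minimal user-space sketch, not the kernel code: decode_chunk() and flush_chunk() are hypothetical stand-ins for lz4_decompress() and the flush callback, and the two unpack_*() functions are invented for illustration. It shows why "ret" has to be set back to -1 after a successful decode so that a later failure still reaches the caller.

```c
#include <stddef.h>

/* Hypothetical stand-in for lz4_decompress(): returns 0 on success. */
int decode_chunk(void)
{
	return 0;
}

/*
 * Hypothetical stand-in for the flush callback: returns the number of
 * bytes written, which is short when `fail` is non-zero.
 */
size_t flush_chunk(size_t len, int fail)
{
	return fail ? 0 : len;
}

/* Without the reset: a failed flush after a successful decode returns 0. */
int unpack_without_reset(int flush_fails)
{
	size_t len = 16;
	int ret = -1;

	ret = decode_chunk();		/* success overwrites -1 with 0 */
	if (ret < 0)
		goto out;
	if (flush_chunk(len, flush_fails) != len)
		goto out;		/* ret is still 0: the failure is lost */
	ret = 0;
out:
	return ret;
}

/* With the reset: ret is set back to -1 before the next fallible step. */
int unpack_with_reset(int flush_fails)
{
	size_t len = 16;
	int ret = -1;

	ret = decode_chunk();
	if (ret < 0)
		goto out;
	ret = -1;			/* re-arm the error code */
	if (flush_chunk(len, flush_fails) != len)
		goto out;		/* ret is -1: the failure is reported */
	ret = 0;
out:
	return ret;
}
```

With these stand-ins, unpack_without_reset(1) returns 0 even though the flush failed, while unpack_with_reset(1) returns -1. Both return 0 when the flush succeeds.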