From edcad25095503c9a390bea0cc2f518793447a6a8 Mon Sep 17 00:00:00 2001 From: Joonsoo Kim Date: Fri, 8 Aug 2014 14:19:15 -0700 Subject: Revert "slab: remove BAD_ALIEN_MAGIC" This reverts commit a640616822b2 ("slab: remove BAD_ALIEN_MAGIC"). commit a640616822b2 ("slab: remove BAD_ALIEN_MAGIC") assumes that the system with !CONFIG_NUMA has only one memory node. But, it turns out to be false by the report from Geert. His system, m68k, has many memory nodes and is configured in !CONFIG_NUMA. So it couldn't boot with above change. Here goes his failure report. With latest mainline, I'm getting a crash during bootup on m68k/ARAnyM: enable_cpucache failed for radix_tree_node, error 12. kernel BUG at /scratch/geert/linux/linux-m68k/mm/slab.c:1522! *** TRAP #7 *** FORMAT=0 Current process id is 0 BAD KERNEL TRAP: 00000000 Modules linked in: PC: [<0039c92c>] kmem_cache_init_late+0x70/0x8c SR: 2200 SP: 00345f90 a2: 0034c2e8 d0: 0000003d d1: 00000000 d2: 00000000 d3: 003ac942 d4: 00000000 d5: 00000000 a0: 0034f686 a1: 0034f682 Process swapper (pid: 0, task=0034c2e8) Frame format=0 Stack from 00345fc4: 002f69ef 002ff7e5 000005f2 000360fa 0017d806 003921d4 00000000 00000000 00000000 00000000 00000000 00000000 003ac942 00000000 003912d6 Call Trace: [<000360fa>] parse_args+0x0/0x2ca [<0017d806>] strlen+0x0/0x1a [<003921d4>] start_kernel+0x23c/0x428 [<003912d6>] _sinittext+0x2d6/0x95e Code: f7e5 4879 002f 69ef 61ff ffca 462a 4e47 <4879> 0035 4b1c 61ff fff0 0cc4 7005 23c0 0037 fd20 588f 265f 285f 4e75 48e7 301c Disabling lock debugging due to kernel taint Kernel panic - not syncing: Attempted to kill the idle task! Although there is a alternative way to fix this issue such as disabling use of alien cache on !CONFIG_NUMA, but, reverting issued commit is better to me in this time. Signed-off-by: Joonsoo Kim Reported-by: Geert Uytterhoeven Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/slab.c b/mm/slab.c index 2e60bf3..a467b30 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -470,6 +470,8 @@ static struct kmem_cache kmem_cache_boot = { .name = "kmem_cache", }; +#define BAD_ALIEN_MAGIC 0x01020304ul + static DEFINE_PER_CPU(struct delayed_work, slab_reap_work); static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep) @@ -836,7 +838,7 @@ static int transfer_objects(struct array_cache *to, static inline struct alien_cache **alloc_alien_cache(int node, int limit, gfp_t gfp) { - return NULL; + return (struct alien_cache **)BAD_ALIEN_MAGIC; } static inline void free_alien_cache(struct alien_cache **ac_ptr) -- cgit v0.10.2 From 4449a51a7c281602d3a385044ab928322a122a02 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Fri, 8 Aug 2014 14:19:17 -0700 Subject: vm_is_stack: use for_each_thread() rather then buggy while_each_thread() Aleksei hit the soft lockup during reading /proc/PID/smaps. David investigated the problem and suggested the right fix. while_each_thread() is racy and should die, this patch updates vm_is_stack(). Signed-off-by: Oleg Nesterov Reported-by: Aleksei Besogonov Tested-by: Aleksei Besogonov Suggested-by: David Rientjes Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/util.c b/mm/util.c index 7b6608d..093c973 100644 --- a/mm/util.c +++ b/mm/util.c @@ -183,17 +183,14 @@ pid_t vm_is_stack(struct task_struct *task, if (in_group) { struct task_struct *t; - rcu_read_lock(); - if (!pid_alive(task)) - goto done; - t = task; - do { + rcu_read_lock(); + for_each_thread(task, t) { if (vm_is_stack_for_task(t, vma)) { ret = t->pid; goto done; } - } while_each_thread(task, t); + } done: rcu_read_unlock(); } -- cgit v0.10.2 From 00501b531c4723972aa11d6d4ebcf8d6552007c8 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Fri, 8 Aug 2014 14:19:20 -0700 Subject: mm: memcontrol: rewrite charge API These patches rework memcg charge lifetime to integrate more naturally with the lifetime of user pages. This drastically simplifies the code and reduces charging and uncharging overhead. The most expensive part of charging and uncharging is the page_cgroup bit spinlock, which is removed entirely after this series. Here are the top-10 profile entries of a stress test that reads a 128G sparse file on a freshly booted box, without even a dedicated cgroup (i.e. executing in the root memcg). Before: 15.36% cat [kernel.kallsyms] [k] copy_user_generic_string 13.31% cat [kernel.kallsyms] [k] memset 11.48% cat [kernel.kallsyms] [k] do_mpage_readpage 4.23% cat [kernel.kallsyms] [k] get_page_from_freelist 2.38% cat [kernel.kallsyms] [k] put_page 2.32% cat [kernel.kallsyms] [k] __mem_cgroup_commit_charge 2.18% kswapd0 [kernel.kallsyms] [k] __mem_cgroup_uncharge_common 1.92% kswapd0 [kernel.kallsyms] [k] shrink_page_list 1.86% cat [kernel.kallsyms] [k] __radix_tree_lookup 1.62% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn After: 15.67% cat [kernel.kallsyms] [k] copy_user_generic_string 13.48% cat [kernel.kallsyms] [k] memset 11.42% cat [kernel.kallsyms] [k] do_mpage_readpage 3.98% cat [kernel.kallsyms] [k] get_page_from_freelist 2.46% cat [kernel.kallsyms] [k] put_page 2.13% kswapd0 [kernel.kallsyms] [k] shrink_page_list 1.88% cat [kernel.kallsyms] [k] __radix_tree_lookup 1.67% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn 1.39% kswapd0 [kernel.kallsyms] [k] free_pcppages_bulk 1.30% cat [kernel.kallsyms] [k] kfree As you can see, the memcg footprint has shrunk quite a bit. text data bss dec hex filename 37970 9892 400 48262 bc86 mm/memcontrol.o.old 35239 9892 400 45531 b1db mm/memcontrol.o This patch (of 4): The memcg charge API charges pages before they are rmapped - i.e. have an actual "type" - and so every callsite needs its own set of charge and uncharge functions to know what type is being operated on. Worse, uncharge has to happen from a context that is still type-specific, rather than at the end of the page's lifetime with exclusive access, and so requires a lot of synchronization. Rewrite the charge API to provide a generic set of try_charge(), commit_charge() and cancel_charge() transaction operations, much like what's currently done for swap-in: mem_cgroup_try_charge() attempts to reserve a charge, reclaiming pages from the memcg if necessary. mem_cgroup_commit_charge() commits the page to the charge once it has a valid page->mapping and PageAnon() reliably tells the type. mem_cgroup_cancel_charge() aborts the transaction. This reduces the charge API and enables subsequent patches to drastically simplify uncharging. As pages need to be committed after rmap is established but before they are added to the LRU, page_add_new_anon_rmap() must stop doing LRU additions again. Revive lru_cache_add_active_or_unevictable(). [hughd@google.com: fix shmem_unuse] [hughd@google.com: Add comments on the private use of -EAGAIN] Signed-off-by: Johannes Weiner Acked-by: Michal Hocko Cc: Tejun Heo Cc: Vladimir Davydov Signed-off-by: Hugh Dickins Cc: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt index 80ac454..bcf750d 100644 --- a/Documentation/cgroups/memcg_test.txt +++ b/Documentation/cgroups/memcg_test.txt @@ -24,24 +24,7 @@ Please note that implementation details can be changed. a page/swp_entry may be charged (usage += PAGE_SIZE) at - mem_cgroup_charge_anon() - Called at new page fault and Copy-On-Write. - - mem_cgroup_try_charge_swapin() - Called at do_swap_page() (page fault on swap entry) and swapoff. - Followed by charge-commit-cancel protocol. (With swap accounting) - At commit, a charge recorded in swap_cgroup is removed. - - mem_cgroup_charge_file() - Called at add_to_page_cache() - - mem_cgroup_cache_charge_swapin() - Called at shmem's swapin. - - mem_cgroup_prepare_migration() - Called before migration. "extra" charge is done and followed by - charge-commit-cancel protocol. - At commit, charge against oldpage or newpage will be committed. + mem_cgroup_try_charge() 2. Uncharge a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by @@ -69,19 +52,14 @@ Please note that implementation details can be changed. to new page is committed. At failure, charge to old page is committed. 3. charge-commit-cancel - In some case, we can't know this "charge" is valid or not at charging - (because of races). - To handle such case, there are charge-commit-cancel functions. - mem_cgroup_try_charge_XXX - mem_cgroup_commit_charge_XXX - mem_cgroup_cancel_charge_XXX - these are used in swap-in and migration. + Memcg pages are charged in two steps: + mem_cgroup_try_charge() + mem_cgroup_commit_charge() or mem_cgroup_cancel_charge() At try_charge(), there are no flags to say "this page is charged". at this point, usage += PAGE_SIZE. - At commit(), the function checks the page should be charged or not - and set flags or avoid charging.(usage -= PAGE_SIZE) + At commit(), the page is associated with the memcg. At cancel(), simply usage -= PAGE_SIZE. diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index eb65d29..1a9a096 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -54,28 +54,11 @@ struct mem_cgroup_reclaim_cookie { }; #ifdef CONFIG_MEMCG -/* - * All "charge" functions with gfp_mask should use GFP_KERNEL or - * (gfp_mask & GFP_RECLAIM_MASK). In current implementatin, memcg doesn't - * alloc memory but reclaims memory from all available zones. So, "where I want - * memory from" bits of gfp_mask has no meaning. So any bits of that field is - * available but adding a rule is better. charge functions' gfp_mask should - * be set to GFP_KERNEL or gfp_mask & GFP_RECLAIM_MASK for avoiding ambiguous - * codes. - * (Of course, if memcg does memory allocation in future, GFP_KERNEL is sane.) - */ - -extern int mem_cgroup_charge_anon(struct page *page, struct mm_struct *mm, - gfp_t gfp_mask); -/* for swap handling */ -extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm, - struct page *page, gfp_t mask, struct mem_cgroup **memcgp); -extern void mem_cgroup_commit_charge_swapin(struct page *page, - struct mem_cgroup *memcg); -extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg); - -extern int mem_cgroup_charge_file(struct page *page, struct mm_struct *mm, - gfp_t gfp_mask); +int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm, + gfp_t gfp_mask, struct mem_cgroup **memcgp); +void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg, + bool lrucare); +void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg); struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *); struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *); @@ -233,30 +216,22 @@ void mem_cgroup_print_bad_page(struct page *page); #else /* CONFIG_MEMCG */ struct mem_cgroup; -static inline int mem_cgroup_charge_anon(struct page *page, - struct mm_struct *mm, gfp_t gfp_mask) -{ - return 0; -} - -static inline int mem_cgroup_charge_file(struct page *page, - struct mm_struct *mm, gfp_t gfp_mask) -{ - return 0; -} - -static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm, - struct page *page, gfp_t gfp_mask, struct mem_cgroup **memcgp) +static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm, + gfp_t gfp_mask, + struct mem_cgroup **memcgp) { + *memcgp = NULL; return 0; } -static inline void mem_cgroup_commit_charge_swapin(struct page *page, - struct mem_cgroup *memcg) +static inline void mem_cgroup_commit_charge(struct page *page, + struct mem_cgroup *memcg, + bool lrucare) { } -static inline void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg) +static inline void mem_cgroup_cancel_charge(struct page *page, + struct mem_cgroup *memcg) { } diff --git a/include/linux/swap.h b/include/linux/swap.h index 1eb6404..46a649e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -320,6 +320,9 @@ extern void swap_setup(void); extern void add_page_to_unevictable_list(struct page *page); +extern void lru_cache_add_active_or_unevictable(struct page *page, + struct vm_area_struct *vma); + /* linux/mm/vmscan.c */ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 6f3254e..1d0af8a 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -167,6 +167,11 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, /* For mmu_notifiers */ const unsigned long mmun_start = addr; const unsigned long mmun_end = addr + PAGE_SIZE; + struct mem_cgroup *memcg; + + err = mem_cgroup_try_charge(kpage, vma->vm_mm, GFP_KERNEL, &memcg); + if (err) + return err; /* For try_to_free_swap() and munlock_vma_page() below */ lock_page(page); @@ -179,6 +184,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, get_page(kpage); page_add_new_anon_rmap(kpage, vma, addr); + mem_cgroup_commit_charge(kpage, memcg, false); + lru_cache_add_active_or_unevictable(kpage, vma); if (!PageAnon(page)) { dec_mm_counter(mm, MM_FILEPAGES); @@ -200,6 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, err = 0; unlock: + mem_cgroup_cancel_charge(kpage, memcg); mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); unlock_page(page); return err; @@ -315,18 +323,11 @@ retry: if (!new_page) goto put_old; - if (mem_cgroup_charge_anon(new_page, mm, GFP_KERNEL)) - goto put_new; - __SetPageUptodate(new_page); copy_highpage(new_page, old_page); copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); ret = __replace_page(vma, vaddr, old_page, new_page); - if (ret) - mem_cgroup_uncharge_page(new_page); - -put_new: page_cache_release(new_page); put_old: put_page(old_page); diff --git a/mm/filemap.c b/mm/filemap.c index af19a6b..349a40e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -31,6 +31,7 @@ #include #include #include /* for BUG_ON(!in_atomic()) only */ +#include #include #include #include @@ -548,19 +549,24 @@ static int __add_to_page_cache_locked(struct page *page, pgoff_t offset, gfp_t gfp_mask, void **shadowp) { + int huge = PageHuge(page); + struct mem_cgroup *memcg; int error; VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageSwapBacked(page), page); - error = mem_cgroup_charge_file(page, current->mm, - gfp_mask & GFP_RECLAIM_MASK); - if (error) - return error; + if (!huge) { + error = mem_cgroup_try_charge(page, current->mm, + gfp_mask, &memcg); + if (error) + return error; + } error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM); if (error) { - mem_cgroup_uncharge_cache_page(page); + if (!huge) + mem_cgroup_cancel_charge(page, memcg); return error; } @@ -575,13 +581,16 @@ static int __add_to_page_cache_locked(struct page *page, goto err_insert; __inc_zone_page_state(page, NR_FILE_PAGES); spin_unlock_irq(&mapping->tree_lock); + if (!huge) + mem_cgroup_commit_charge(page, memcg, false); trace_mm_filemap_add_to_page_cache(page); return 0; err_insert: page->mapping = NULL; /* Leave page->index set: truncation relies upon it */ spin_unlock_irq(&mapping->tree_lock); - mem_cgroup_uncharge_cache_page(page); + if (!huge) + mem_cgroup_cancel_charge(page, memcg); page_cache_release(page); return error; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3630d57..d9a21d06 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -715,13 +715,20 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, unsigned long haddr, pmd_t *pmd, struct page *page) { + struct mem_cgroup *memcg; pgtable_t pgtable; spinlock_t *ptl; VM_BUG_ON_PAGE(!PageCompound(page), page); + + if (mem_cgroup_try_charge(page, mm, GFP_TRANSHUGE, &memcg)) + return VM_FAULT_OOM; + pgtable = pte_alloc_one(mm, haddr); - if (unlikely(!pgtable)) + if (unlikely(!pgtable)) { + mem_cgroup_cancel_charge(page, memcg); return VM_FAULT_OOM; + } clear_huge_page(page, haddr, HPAGE_PMD_NR); /* @@ -734,7 +741,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, ptl = pmd_lock(mm, pmd); if (unlikely(!pmd_none(*pmd))) { spin_unlock(ptl); - mem_cgroup_uncharge_page(page); + mem_cgroup_cancel_charge(page, memcg); put_page(page); pte_free(mm, pgtable); } else { @@ -742,6 +749,8 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, entry = mk_huge_pmd(page, vma->vm_page_prot); entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); page_add_new_anon_rmap(page, vma, haddr); + mem_cgroup_commit_charge(page, memcg, false); + lru_cache_add_active_or_unevictable(page, vma); pgtable_trans_huge_deposit(mm, pmd, pgtable); set_pmd_at(mm, haddr, pmd, entry); add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR); @@ -827,13 +836,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, count_vm_event(THP_FAULT_FALLBACK); return VM_FAULT_FALLBACK; } - if (unlikely(mem_cgroup_charge_anon(page, mm, GFP_TRANSHUGE))) { - put_page(page); - count_vm_event(THP_FAULT_FALLBACK); - return VM_FAULT_FALLBACK; - } if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd, page))) { - mem_cgroup_uncharge_page(page); put_page(page); count_vm_event(THP_FAULT_FALLBACK); return VM_FAULT_FALLBACK; @@ -979,6 +982,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm, struct page *page, unsigned long haddr) { + struct mem_cgroup *memcg; spinlock_t *ptl; pgtable_t pgtable; pmd_t _pmd; @@ -999,20 +1003,21 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm, __GFP_OTHER_NODE, vma, address, page_to_nid(page)); if (unlikely(!pages[i] || - mem_cgroup_charge_anon(pages[i], mm, - GFP_KERNEL))) { + mem_cgroup_try_charge(pages[i], mm, GFP_KERNEL, + &memcg))) { if (pages[i]) put_page(pages[i]); - mem_cgroup_uncharge_start(); while (--i >= 0) { - mem_cgroup_uncharge_page(pages[i]); + memcg = (void *)page_private(pages[i]); + set_page_private(pages[i], 0); + mem_cgroup_cancel_charge(pages[i], memcg); put_page(pages[i]); } - mem_cgroup_uncharge_end(); kfree(pages); ret |= VM_FAULT_OOM; goto out; } + set_page_private(pages[i], (unsigned long)memcg); } for (i = 0; i < HPAGE_PMD_NR; i++) { @@ -1041,7 +1046,11 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm, pte_t *pte, entry; entry = mk_pte(pages[i], vma->vm_page_prot); entry = maybe_mkwrite(pte_mkdirty(entry), vma); + memcg = (void *)page_private(pages[i]); + set_page_private(pages[i], 0); page_add_new_anon_rmap(pages[i], vma, haddr); + mem_cgroup_commit_charge(pages[i], memcg, false); + lru_cache_add_active_or_unevictable(pages[i], vma); pte = pte_offset_map(&_pmd, haddr); VM_BUG_ON(!pte_none(*pte)); set_pte_at(mm, haddr, pte, entry); @@ -1065,12 +1074,12 @@ out: out_free_pages: spin_unlock(ptl); mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); - mem_cgroup_uncharge_start(); for (i = 0; i < HPAGE_PMD_NR; i++) { - mem_cgroup_uncharge_page(pages[i]); + memcg = (void *)page_private(pages[i]); + set_page_private(pages[i], 0); + mem_cgroup_cancel_charge(pages[i], memcg); put_page(pages[i]); } - mem_cgroup_uncharge_end(); kfree(pages); goto out; } @@ -1081,6 +1090,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, spinlock_t *ptl; int ret = 0; struct page *page = NULL, *new_page; + struct mem_cgroup *memcg; unsigned long haddr; unsigned long mmun_start; /* For mmu_notifiers */ unsigned long mmun_end; /* For mmu_notifiers */ @@ -1132,7 +1142,8 @@ alloc: goto out; } - if (unlikely(mem_cgroup_charge_anon(new_page, mm, GFP_TRANSHUGE))) { + if (unlikely(mem_cgroup_try_charge(new_page, mm, + GFP_TRANSHUGE, &memcg))) { put_page(new_page); if (page) { split_huge_page(page); @@ -1161,7 +1172,7 @@ alloc: put_user_huge_page(page); if (unlikely(!pmd_same(*pmd, orig_pmd))) { spin_unlock(ptl); - mem_cgroup_uncharge_page(new_page); + mem_cgroup_cancel_charge(new_page, memcg); put_page(new_page); goto out_mn; } else { @@ -1170,6 +1181,8 @@ alloc: entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); pmdp_clear_flush(vma, haddr, pmd); page_add_new_anon_rmap(new_page, vma, haddr); + mem_cgroup_commit_charge(new_page, memcg, false); + lru_cache_add_active_or_unevictable(new_page, vma); set_pmd_at(mm, haddr, pmd, entry); update_mmu_cache_pmd(vma, address, pmd); if (!page) { @@ -2413,6 +2426,7 @@ static void collapse_huge_page(struct mm_struct *mm, spinlock_t *pmd_ptl, *pte_ptl; int isolated; unsigned long hstart, hend; + struct mem_cgroup *memcg; unsigned long mmun_start; /* For mmu_notifiers */ unsigned long mmun_end; /* For mmu_notifiers */ @@ -2423,7 +2437,8 @@ static void collapse_huge_page(struct mm_struct *mm, if (!new_page) return; - if (unlikely(mem_cgroup_charge_anon(new_page, mm, GFP_TRANSHUGE))) + if (unlikely(mem_cgroup_try_charge(new_page, mm, + GFP_TRANSHUGE, &memcg))) return; /* @@ -2510,6 +2525,8 @@ static void collapse_huge_page(struct mm_struct *mm, spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); page_add_new_anon_rmap(new_page, vma, address); + mem_cgroup_commit_charge(new_page, memcg, false); + lru_cache_add_active_or_unevictable(new_page, vma); pgtable_trans_huge_deposit(mm, pmd, pgtable); set_pmd_at(mm, address, pmd, _pmd); update_mmu_cache_pmd(vma, address, pmd); @@ -2523,7 +2540,7 @@ out_up_write: return; out: - mem_cgroup_uncharge_page(new_page); + mem_cgroup_cancel_charge(new_page, memcg); goto out_up_write; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 90dc501..1cbe1e5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2551,17 +2551,8 @@ static int memcg_cpu_hotplug_callback(struct notifier_block *nb, return NOTIFY_OK; } -/** - * mem_cgroup_try_charge - try charging a memcg - * @memcg: memcg to charge - * @nr_pages: number of pages to charge - * - * Returns 0 if @memcg was charged successfully, -EINTR if the charge - * was bypassed to root_mem_cgroup, and -ENOMEM if the charge failed. - */ -static int mem_cgroup_try_charge(struct mem_cgroup *memcg, - gfp_t gfp_mask, - unsigned int nr_pages) +static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages) { unsigned int batch = max(CHARGE_BATCH, nr_pages); int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; @@ -2660,41 +2651,7 @@ done: return ret; } -/** - * mem_cgroup_try_charge_mm - try charging a mm - * @mm: mm_struct to charge - * @nr_pages: number of pages to charge - * @oom: trigger OOM if reclaim fails - * - * Returns the charged mem_cgroup associated with the given mm_struct or - * NULL the charge failed. - */ -static struct mem_cgroup *mem_cgroup_try_charge_mm(struct mm_struct *mm, - gfp_t gfp_mask, - unsigned int nr_pages) - -{ - struct mem_cgroup *memcg; - int ret; - - memcg = get_mem_cgroup_from_mm(mm); - ret = mem_cgroup_try_charge(memcg, gfp_mask, nr_pages); - css_put(&memcg->css); - if (ret == -EINTR) - memcg = root_mem_cgroup; - else if (ret) - memcg = NULL; - - return memcg; -} - -/* - * Somemtimes we have to undo a charge we got by try_charge(). - * This function is for that and do uncharge, put css's refcnt. - * gotten by try_charge(). - */ -static void __mem_cgroup_cancel_charge(struct mem_cgroup *memcg, - unsigned int nr_pages) +static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages) { unsigned long bytes = nr_pages * PAGE_SIZE; @@ -2760,17 +2717,13 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page) return memcg; } -static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg, - struct page *page, - unsigned int nr_pages, - enum charge_type ctype, - bool lrucare) +static void commit_charge(struct page *page, struct mem_cgroup *memcg, + unsigned int nr_pages, bool anon, bool lrucare) { struct page_cgroup *pc = lookup_page_cgroup(page); struct zone *uninitialized_var(zone); struct lruvec *lruvec; bool was_on_lru = false; - bool anon; lock_page_cgroup(pc); VM_BUG_ON_PAGE(PageCgroupUsed(pc), page); @@ -2807,11 +2760,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg, spin_unlock_irq(&zone->lru_lock); } - if (ctype == MEM_CGROUP_CHARGE_TYPE_ANON) - anon = true; - else - anon = false; - mem_cgroup_charge_statistics(memcg, page, anon, nr_pages); unlock_page_cgroup(pc); @@ -2882,21 +2830,21 @@ static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) if (ret) return ret; - ret = mem_cgroup_try_charge(memcg, gfp, size >> PAGE_SHIFT); + ret = try_charge(memcg, gfp, size >> PAGE_SHIFT); if (ret == -EINTR) { /* - * mem_cgroup_try_charge() chosed to bypass to root due to - * OOM kill or fatal signal. Since our only options are to - * either fail the allocation or charge it to this cgroup, do - * it as a temporary condition. But we can't fail. From a - * kmem/slab perspective, the cache has already been selected, - * by mem_cgroup_kmem_get_cache(), so it is too late to change + * try_charge() chose to bypass to root due to OOM kill or + * fatal signal. Since our only options are to either fail + * the allocation or charge it to this cgroup, do it as a + * temporary condition. But we can't fail. From a kmem/slab + * perspective, the cache has already been selected, by + * mem_cgroup_kmem_get_cache(), so it is too late to change * our minds. * * This condition will only trigger if the task entered - * memcg_charge_kmem in a sane state, but was OOM-killed during - * mem_cgroup_try_charge() above. Tasks that were already - * dying when the allocation triggers should have been already + * memcg_charge_kmem in a sane state, but was OOM-killed + * during try_charge() above. Tasks that were already dying + * when the allocation triggers should have been already * directed to the root cgroup in memcontrol.h */ res_counter_charge_nofail(&memcg->res, size, &fail_res); @@ -3618,164 +3566,6 @@ out: return ret; } -int mem_cgroup_charge_anon(struct page *page, - struct mm_struct *mm, gfp_t gfp_mask) -{ - unsigned int nr_pages = 1; - struct mem_cgroup *memcg; - - if (mem_cgroup_disabled()) - return 0; - - VM_BUG_ON_PAGE(page_mapped(page), page); - VM_BUG_ON_PAGE(page->mapping && !PageAnon(page), page); - VM_BUG_ON(!mm); - - if (PageTransHuge(page)) { - nr_pages <<= compound_order(page); - VM_BUG_ON_PAGE(!PageTransHuge(page), page); - } - - memcg = mem_cgroup_try_charge_mm(mm, gfp_mask, nr_pages); - if (!memcg) - return -ENOMEM; - __mem_cgroup_commit_charge(memcg, page, nr_pages, - MEM_CGROUP_CHARGE_TYPE_ANON, false); - return 0; -} - -/* - * While swap-in, try_charge -> commit or cancel, the page is locked. - * And when try_charge() successfully returns, one refcnt to memcg without - * struct page_cgroup is acquired. This refcnt will be consumed by - * "commit()" or removed by "cancel()" - */ -static int __mem_cgroup_try_charge_swapin(struct mm_struct *mm, - struct page *page, - gfp_t mask, - struct mem_cgroup **memcgp) -{ - struct mem_cgroup *memcg = NULL; - struct page_cgroup *pc; - int ret; - - pc = lookup_page_cgroup(page); - /* - * Every swap fault against a single page tries to charge the - * page, bail as early as possible. shmem_unuse() encounters - * already charged pages, too. The USED bit is protected by - * the page lock, which serializes swap cache removal, which - * in turn serializes uncharging. - */ - if (PageCgroupUsed(pc)) - goto out; - if (do_swap_account) - memcg = try_get_mem_cgroup_from_page(page); - if (!memcg) - memcg = get_mem_cgroup_from_mm(mm); - ret = mem_cgroup_try_charge(memcg, mask, 1); - css_put(&memcg->css); - if (ret == -EINTR) - memcg = root_mem_cgroup; - else if (ret) - return ret; -out: - *memcgp = memcg; - return 0; -} - -int mem_cgroup_try_charge_swapin(struct mm_struct *mm, struct page *page, - gfp_t gfp_mask, struct mem_cgroup **memcgp) -{ - if (mem_cgroup_disabled()) { - *memcgp = NULL; - return 0; - } - /* - * A racing thread's fault, or swapoff, may have already - * updated the pte, and even removed page from swap cache: in - * those cases unuse_pte()'s pte_same() test will fail; but - * there's also a KSM case which does need to charge the page. - */ - if (!PageSwapCache(page)) { - struct mem_cgroup *memcg; - - memcg = mem_cgroup_try_charge_mm(mm, gfp_mask, 1); - if (!memcg) - return -ENOMEM; - *memcgp = memcg; - return 0; - } - return __mem_cgroup_try_charge_swapin(mm, page, gfp_mask, memcgp); -} - -void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg) -{ - if (mem_cgroup_disabled()) - return; - if (!memcg) - return; - __mem_cgroup_cancel_charge(memcg, 1); -} - -static void -__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *memcg, - enum charge_type ctype) -{ - if (mem_cgroup_disabled()) - return; - if (!memcg) - return; - - __mem_cgroup_commit_charge(memcg, page, 1, ctype, true); - /* - * Now swap is on-memory. This means this page may be - * counted both as mem and swap....double count. - * Fix it by uncharging from memsw. Basically, this SwapCache is stable - * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page() - * may call delete_from_swap_cache() before reach here. - */ - if (do_swap_account && PageSwapCache(page)) { - swp_entry_t ent = {.val = page_private(page)}; - mem_cgroup_uncharge_swap(ent); - } -} - -void mem_cgroup_commit_charge_swapin(struct page *page, - struct mem_cgroup *memcg) -{ - __mem_cgroup_commit_charge_swapin(page, memcg, - MEM_CGROUP_CHARGE_TYPE_ANON); -} - -int mem_cgroup_charge_file(struct page *page, struct mm_struct *mm, - gfp_t gfp_mask) -{ - enum charge_type type = MEM_CGROUP_CHARGE_TYPE_CACHE; - struct mem_cgroup *memcg; - int ret; - - if (mem_cgroup_disabled()) - return 0; - if (PageCompound(page)) - return 0; - - if (PageSwapCache(page)) { /* shmem */ - ret = __mem_cgroup_try_charge_swapin(mm, page, - gfp_mask, &memcg); - if (ret) - return ret; - __mem_cgroup_commit_charge_swapin(page, memcg, type); - return 0; - } - - memcg = mem_cgroup_try_charge_mm(mm, gfp_mask, 1); - if (!memcg) - return -ENOMEM; - __mem_cgroup_commit_charge(memcg, page, 1, type, false); - return 0; -} - static void mem_cgroup_do_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages, const enum charge_type ctype) @@ -4122,7 +3912,6 @@ void mem_cgroup_prepare_migration(struct page *page, struct page *newpage, struct mem_cgroup *memcg = NULL; unsigned int nr_pages = 1; struct page_cgroup *pc; - enum charge_type ctype; *memcgp = NULL; @@ -4184,16 +3973,12 @@ void mem_cgroup_prepare_migration(struct page *page, struct page *newpage, * page. In the case new page is migrated but not remapped, new page's * mapcount will be finally 0 and we call uncharge in end_migration(). */ - if (PageAnon(page)) - ctype = MEM_CGROUP_CHARGE_TYPE_ANON; - else - ctype = MEM_CGROUP_CHARGE_TYPE_CACHE; /* * The page is committed to the memcg, but it's not actually * charged to the res_counter since we plan on replacing the * old one and only one page is going to be left afterwards. */ - __mem_cgroup_commit_charge(memcg, newpage, nr_pages, ctype, false); + commit_charge(newpage, memcg, nr_pages, PageAnon(page), false); } /* remove redundant charge if migration failed*/ @@ -4252,7 +4037,6 @@ void mem_cgroup_replace_page_cache(struct page *oldpage, { struct mem_cgroup *memcg = NULL; struct page_cgroup *pc; - enum charge_type type = MEM_CGROUP_CHARGE_TYPE_CACHE; if (mem_cgroup_disabled()) return; @@ -4278,7 +4062,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage, * the newpage may be on LRU(or pagevec for LRU) already. We lock * LRU while we overwrite pc->mem_cgroup. */ - __mem_cgroup_commit_charge(memcg, newpage, 1, type, true); + commit_charge(newpage, memcg, 1, false, true); } #ifdef CONFIG_DEBUG_VM @@ -6319,20 +6103,19 @@ static int mem_cgroup_do_precharge(unsigned long count) int ret; /* Try a single bulk charge without reclaim first */ - ret = mem_cgroup_try_charge(mc.to, GFP_KERNEL & ~__GFP_WAIT, count); + ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_WAIT, count); if (!ret) { mc.precharge += count; return ret; } if (ret == -EINTR) { - __mem_cgroup_cancel_charge(root_mem_cgroup, count); + cancel_charge(root_mem_cgroup, count); return ret; } /* Try charges one by one with reclaim */ while (count--) { - ret = mem_cgroup_try_charge(mc.to, - GFP_KERNEL & ~__GFP_NORETRY, 1); + ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1); /* * In case of failure, any residual charges against * mc.to will be dropped by mem_cgroup_clear_mc() @@ -6340,7 +6123,7 @@ static int mem_cgroup_do_precharge(unsigned long count) * bypassed to root right away or they'll be lost. */ if (ret == -EINTR) - __mem_cgroup_cancel_charge(root_mem_cgroup, 1); + cancel_charge(root_mem_cgroup, 1); if (ret) return ret; mc.precharge++; @@ -6609,7 +6392,7 @@ static void __mem_cgroup_clear_mc(void) /* we must uncharge all the leftover precharges from mc.to */ if (mc.precharge) { - __mem_cgroup_cancel_charge(mc.to, mc.precharge); + cancel_charge(mc.to, mc.precharge); mc.precharge = 0; } /* @@ -6617,7 +6400,7 @@ static void __mem_cgroup_clear_mc(void) * we must uncharge here. */ if (mc.moved_charge) { - __mem_cgroup_cancel_charge(mc.from, mc.moved_charge); + cancel_charge(mc.from, mc.moved_charge); mc.moved_charge = 0; } /* we must fixup refcnts and charges */ @@ -6946,6 +6729,150 @@ static void __init enable_swap_cgroup(void) } #endif +/** + * mem_cgroup_try_charge - try charging a page + * @page: page to charge + * @mm: mm context of the victim + * @gfp_mask: reclaim mode + * @memcgp: charged memcg return + * + * Try to charge @page to the memcg that @mm belongs to, reclaiming + * pages according to @gfp_mask if necessary. + * + * Returns 0 on success, with *@memcgp pointing to the charged memcg. + * Otherwise, an error code is returned. + * + * After page->mapping has been set up, the caller must finalize the + * charge with mem_cgroup_commit_charge(). Or abort the transaction + * with mem_cgroup_cancel_charge() in case page instantiation fails. + */ +int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm, + gfp_t gfp_mask, struct mem_cgroup **memcgp) +{ + struct mem_cgroup *memcg = NULL; + unsigned int nr_pages = 1; + int ret = 0; + + if (mem_cgroup_disabled()) + goto out; + + if (PageSwapCache(page)) { + struct page_cgroup *pc = lookup_page_cgroup(page); + /* + * Every swap fault against a single page tries to charge the + * page, bail as early as possible. shmem_unuse() encounters + * already charged pages, too. The USED bit is protected by + * the page lock, which serializes swap cache removal, which + * in turn serializes uncharging. + */ + if (PageCgroupUsed(pc)) + goto out; + } + + if (PageTransHuge(page)) { + nr_pages <<= compound_order(page); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); + } + + if (do_swap_account && PageSwapCache(page)) + memcg = try_get_mem_cgroup_from_page(page); + if (!memcg) + memcg = get_mem_cgroup_from_mm(mm); + + ret = try_charge(memcg, gfp_mask, nr_pages); + + css_put(&memcg->css); + + if (ret == -EINTR) { + memcg = root_mem_cgroup; + ret = 0; + } +out: + *memcgp = memcg; + return ret; +} + +/** + * mem_cgroup_commit_charge - commit a page charge + * @page: page to charge + * @memcg: memcg to charge the page to + * @lrucare: page might be on LRU already + * + * Finalize a charge transaction started by mem_cgroup_try_charge(), + * after page->mapping has been set up. This must happen atomically + * as part of the page instantiation, i.e. under the page table lock + * for anonymous pages, under the page lock for page and swap cache. + * + * In addition, the page must not be on the LRU during the commit, to + * prevent racing with task migration. If it might be, use @lrucare. + * + * Use mem_cgroup_cancel_charge() to cancel the transaction instead. + */ +void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg, + bool lrucare) +{ + unsigned int nr_pages = 1; + + VM_BUG_ON_PAGE(!page->mapping, page); + VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page); + + if (mem_cgroup_disabled()) + return; + /* + * Swap faults will attempt to charge the same page multiple + * times. But reuse_swap_page() might have removed the page + * from swapcache already, so we can't check PageSwapCache(). + */ + if (!memcg) + return; + + if (PageTransHuge(page)) { + nr_pages <<= compound_order(page); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); + } + + commit_charge(page, memcg, nr_pages, PageAnon(page), lrucare); + + if (do_swap_account && PageSwapCache(page)) { + swp_entry_t entry = { .val = page_private(page) }; + /* + * The swap entry might not get freed for a long time, + * let's not wait for it. The page already received a + * memory+swap charge, drop the swap entry duplicate. + */ + mem_cgroup_uncharge_swap(entry); + } +} + +/** + * mem_cgroup_cancel_charge - cancel a page charge + * @page: page to charge + * @memcg: memcg to charge the page to + * + * Cancel a charge transaction started by mem_cgroup_try_charge(). + */ +void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg) +{ + unsigned int nr_pages = 1; + + if (mem_cgroup_disabled()) + return; + /* + * Swap faults will attempt to charge the same page multiple + * times. But reuse_swap_page() might have removed the page + * from swapcache already, so we can't check PageSwapCache(). + */ + if (!memcg) + return; + + if (PageTransHuge(page)) { + nr_pages <<= compound_order(page); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); + } + + cancel_charge(memcg, nr_pages); +} + /* * subsys_initcall() for memory controller. * diff --git a/mm/memory.c b/mm/memory.c index 5c55270..6d76487 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2049,6 +2049,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, struct page *dirty_page = NULL; unsigned long mmun_start = 0; /* For mmu_notifiers */ unsigned long mmun_end = 0; /* For mmu_notifiers */ + struct mem_cgroup *memcg; old_page = vm_normal_page(vma, address, orig_pte); if (!old_page) { @@ -2204,7 +2205,7 @@ gotten: } __SetPageUptodate(new_page); - if (mem_cgroup_charge_anon(new_page, mm, GFP_KERNEL)) + if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg)) goto oom_free_new; mmun_start = address & PAGE_MASK; @@ -2234,6 +2235,8 @@ gotten: */ ptep_clear_flush(vma, address, page_table); page_add_new_anon_rmap(new_page, vma, address); + mem_cgroup_commit_charge(new_page, memcg, false); + lru_cache_add_active_or_unevictable(new_page, vma); /* * We call the notify macro here because, when using secondary * mmu page tables (such as kvm shadow page tables), we want the @@ -2271,7 +2274,7 @@ gotten: new_page = old_page; ret |= VM_FAULT_WRITE; } else - mem_cgroup_uncharge_page(new_page); + mem_cgroup_cancel_charge(new_page, memcg); if (new_page) page_cache_release(new_page); @@ -2410,10 +2413,10 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma, { spinlock_t *ptl; struct page *page, *swapcache; + struct mem_cgroup *memcg; swp_entry_t entry; pte_t pte; int locked; - struct mem_cgroup *ptr; int exclusive = 0; int ret = 0; @@ -2489,7 +2492,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma, goto out_page; } - if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr)) { + if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg)) { ret = VM_FAULT_OOM; goto out_page; } @@ -2514,10 +2517,6 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma, * while the page is counted on swap but not yet in mapcount i.e. * before page_add_anon_rmap() and swap_free(); try_to_free_swap() * must be called after the swap_free(), or it will never succeed. - * Because delete_from_swap_page() may be called by reuse_swap_page(), - * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry - * in page->private. In this case, a record in swap_cgroup is silently - * discarded at swap_free(). */ inc_mm_counter_fast(mm, MM_ANONPAGES); @@ -2533,12 +2532,14 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma, if (pte_swp_soft_dirty(orig_pte)) pte = pte_mksoft_dirty(pte); set_pte_at(mm, address, page_table, pte); - if (page == swapcache) + if (page == swapcache) { do_page_add_anon_rmap(page, vma, address, exclusive); - else /* ksm created a completely new copy */ + mem_cgroup_commit_charge(page, memcg, true); + } else { /* ksm created a completely new copy */ page_add_new_anon_rmap(page, vma, address); - /* It's better to call commit-charge after rmap is established */ - mem_cgroup_commit_charge_swapin(page, ptr); + mem_cgroup_commit_charge(page, memcg, false); + lru_cache_add_active_or_unevictable(page, vma); + } swap_free(entry); if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) @@ -2571,7 +2572,7 @@ unlock: out: return ret; out_nomap: - mem_cgroup_cancel_charge_swapin(ptr); + mem_cgroup_cancel_charge(page, memcg); pte_unmap_unlock(page_table, ptl); out_page: unlock_page(page); @@ -2627,6 +2628,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, pte_t *page_table, pmd_t *pmd, unsigned int flags) { + struct mem_cgroup *memcg; struct page *page; spinlock_t *ptl; pte_t entry; @@ -2660,7 +2662,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, */ __SetPageUptodate(page); - if (mem_cgroup_charge_anon(page, mm, GFP_KERNEL)) + if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg)) goto oom_free_page; entry = mk_pte(page, vma->vm_page_prot); @@ -2673,6 +2675,8 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, inc_mm_counter_fast(mm, MM_ANONPAGES); page_add_new_anon_rmap(page, vma, address); + mem_cgroup_commit_charge(page, memcg, false); + lru_cache_add_active_or_unevictable(page, vma); setpte: set_pte_at(mm, address, page_table, entry); @@ -2682,7 +2686,7 @@ unlock: pte_unmap_unlock(page_table, ptl); return 0; release: - mem_cgroup_uncharge_page(page); + mem_cgroup_cancel_charge(page, memcg); page_cache_release(page); goto unlock; oom_free_page: @@ -2919,6 +2923,7 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma, pgoff_t pgoff, unsigned int flags, pte_t orig_pte) { struct page *fault_page, *new_page; + struct mem_cgroup *memcg; spinlock_t *ptl; pte_t *pte; int ret; @@ -2930,7 +2935,7 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (!new_page) return VM_FAULT_OOM; - if (mem_cgroup_charge_anon(new_page, mm, GFP_KERNEL)) { + if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg)) { page_cache_release(new_page); return VM_FAULT_OOM; } @@ -2950,12 +2955,14 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma, goto uncharge_out; } do_set_pte(vma, address, new_page, pte, true, true); + mem_cgroup_commit_charge(new_page, memcg, false); + lru_cache_add_active_or_unevictable(new_page, vma); pte_unmap_unlock(pte, ptl); unlock_page(fault_page); page_cache_release(fault_page); return ret; uncharge_out: - mem_cgroup_uncharge_page(new_page); + mem_cgroup_cancel_charge(new_page, memcg); page_cache_release(new_page); return ret; } diff --git a/mm/rmap.c b/mm/rmap.c index 22a4a76..f56b5ed 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1032,25 +1032,6 @@ void page_add_new_anon_rmap(struct page *page, __mod_zone_page_state(page_zone(page), NR_ANON_PAGES, hpage_nr_pages(page)); __page_set_anon_rmap(page, vma, address, 1); - - VM_BUG_ON_PAGE(PageLRU(page), page); - if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) { - SetPageActive(page); - lru_cache_add(page); - return; - } - - if (!TestSetPageMlocked(page)) { - /* - * We use the irq-unsafe __mod_zone_page_stat because this - * counter is not modified from interrupt context, and the pte - * lock is held(spinlock), which implies preemption disabled. - */ - __mod_zone_page_state(page_zone(page), NR_MLOCK, - hpage_nr_pages(page)); - count_vm_event(UNEVICTABLE_PGMLOCKED); - } - add_page_to_unevictable_list(page); } /** diff --git a/mm/shmem.c b/mm/shmem.c index 302d1cf..1f1a808 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -621,7 +621,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info, radswap = swp_to_radix_entry(swap); index = radix_tree_locate_item(&mapping->page_tree, radswap); if (index == -1) - return 0; + return -EAGAIN; /* tell shmem_unuse we found nothing */ /* * Move _head_ to start search for next from here. @@ -680,7 +680,6 @@ static int shmem_unuse_inode(struct shmem_inode_info *info, spin_unlock(&info->lock); swap_free(swap); } - error = 1; /* not an error, but entry was found */ } return error; } @@ -692,7 +691,7 @@ int shmem_unuse(swp_entry_t swap, struct page *page) { struct list_head *this, *next; struct shmem_inode_info *info; - int found = 0; + struct mem_cgroup *memcg; int error = 0; /* @@ -707,26 +706,32 @@ int shmem_unuse(swp_entry_t swap, struct page *page) * the shmem_swaplist_mutex which might hold up shmem_writepage(). * Charged back to the user (not to caller) when swap account is used. */ - error = mem_cgroup_charge_file(page, current->mm, GFP_KERNEL); + error = mem_cgroup_try_charge(page, current->mm, GFP_KERNEL, &memcg); if (error) goto out; /* No radix_tree_preload: swap entry keeps a place for page in tree */ + error = -EAGAIN; mutex_lock(&shmem_swaplist_mutex); list_for_each_safe(this, next, &shmem_swaplist) { info = list_entry(this, struct shmem_inode_info, swaplist); if (info->swapped) - found = shmem_unuse_inode(info, swap, &page); + error = shmem_unuse_inode(info, swap, &page); else list_del_init(&info->swaplist); cond_resched(); - if (found) + if (error != -EAGAIN) break; + /* found nothing in this: move on to search the next */ } mutex_unlock(&shmem_swaplist_mutex); - if (found < 0) - error = found; + if (error) { + if (error != -ENOMEM) + error = 0; + mem_cgroup_cancel_charge(page, memcg); + } else + mem_cgroup_commit_charge(page, memcg, true); out: unlock_page(page); page_cache_release(page); @@ -1030,6 +1035,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, struct address_space *mapping = inode->i_mapping; struct shmem_inode_info *info; struct shmem_sb_info *sbinfo; + struct mem_cgroup *memcg; struct page *page; swp_entry_t swap; int error; @@ -1108,8 +1114,7 @@ repeat: goto failed; } - error = mem_cgroup_charge_file(page, current->mm, - gfp & GFP_RECLAIM_MASK); + error = mem_cgroup_try_charge(page, current->mm, gfp, &memcg); if (!error) { error = shmem_add_to_page_cache(page, mapping, index, swp_to_radix_entry(swap)); @@ -1125,12 +1130,16 @@ repeat: * Reset swap.val? No, leave it so "failed" goes back to * "repeat": reading a hole and writing should succeed. */ - if (error) + if (error) { + mem_cgroup_cancel_charge(page, memcg); delete_from_swap_cache(page); + } } if (error) goto failed; + mem_cgroup_commit_charge(page, memcg, true); + spin_lock(&info->lock); info->swapped--; shmem_recalc_inode(inode); @@ -1168,8 +1177,7 @@ repeat: if (sgp == SGP_WRITE) __SetPageReferenced(page); - error = mem_cgroup_charge_file(page, current->mm, - gfp & GFP_RECLAIM_MASK); + error = mem_cgroup_try_charge(page, current->mm, gfp, &memcg); if (error) goto decused; error = radix_tree_maybe_preload(gfp & GFP_RECLAIM_MASK); @@ -1179,9 +1187,10 @@ repeat: radix_tree_preload_end(); } if (error) { - mem_cgroup_uncharge_cache_page(page); + mem_cgroup_cancel_charge(page, memcg); goto decused; } + mem_cgroup_commit_charge(page, memcg, false); lru_cache_add_anon(page); spin_lock(&info->lock); diff --git a/mm/swap.c b/mm/swap.c index c789d01..3baca70 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -687,6 +687,40 @@ void add_page_to_unevictable_list(struct page *page) spin_unlock_irq(&zone->lru_lock); } +/** + * lru_cache_add_active_or_unevictable + * @page: the page to be added to LRU + * @vma: vma in which page is mapped for determining reclaimability + * + * Place @page on the active or unevictable LRU list, depending on its + * evictability. Note that if the page is not evictable, it goes + * directly back onto it's zone's unevictable list, it does NOT use a + * per cpu pagevec. + */ +void lru_cache_add_active_or_unevictable(struct page *page, + struct vm_area_struct *vma) +{ + VM_BUG_ON_PAGE(PageLRU(page), page); + + if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) { + SetPageActive(page); + lru_cache_add(page); + return; + } + + if (!TestSetPageMlocked(page)) { + /* + * We use the irq-unsafe __mod_zone_page_stat because this + * counter is not modified from interrupt context, and the pte + * lock is held(spinlock), which implies preemption disabled. + */ + __mod_zone_page_state(page_zone(page), NR_MLOCK, + hpage_nr_pages(page)); + count_vm_event(UNEVICTABLE_PGMLOCKED); + } + add_page_to_unevictable_list(page); +} + /* * If the page can not be invalidated, it is moved to the * inactive list to speed up its reclaim. It is moved to the diff --git a/mm/swapfile.c b/mm/swapfile.c index 4c524f7..0883b49 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1106,15 +1106,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, if (unlikely(!page)) return -ENOMEM; - if (mem_cgroup_try_charge_swapin(vma->vm_mm, page, - GFP_KERNEL, &memcg)) { + if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg)) { ret = -ENOMEM; goto out_nolock; } pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) { - mem_cgroup_cancel_charge_swapin(memcg); + mem_cgroup_cancel_charge(page, memcg); ret = 0; goto out; } @@ -1124,11 +1123,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, get_page(page); set_pte_at(vma->vm_mm, addr, pte, pte_mkold(mk_pte(page, vma->vm_page_prot))); - if (page == swapcache) + if (page == swapcache) { page_add_anon_rmap(page, vma, addr); - else /* ksm created a completely new copy */ + mem_cgroup_commit_charge(page, memcg, true); + } else { /* ksm created a completely new copy */ page_add_new_anon_rmap(page, vma, addr); - mem_cgroup_commit_charge_swapin(page, memcg); + mem_cgroup_commit_charge(page, memcg, false); + lru_cache_add_active_or_unevictable(page, vma); + } swap_free(entry); /* * Move the page to the active list so it is not -- cgit v0.10.2 From 0a31bc97c80c3fa87b32c091d9a930ac19cd0c40 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Fri, 8 Aug 2014 14:19:22 -0700 Subject: mm: memcontrol: rewrite uncharge API The memcg uncharging code that is involved towards the end of a page's lifetime - truncation, reclaim, swapout, migration - is impressively complicated and fragile. Because anonymous and file pages were always charged before they had their page->mapping established, uncharges had to happen when the page type could still be known from the context; as in unmap for anonymous, page cache removal for file and shmem pages, and swap cache truncation for swap pages. However, these operations happen well before the page is actually freed, and so a lot of synchronization is necessary: - Charging, uncharging, page migration, and charge migration all need to take a per-page bit spinlock as they could race with uncharging. - Swap cache truncation happens during both swap-in and swap-out, and possibly repeatedly before the page is actually freed. This means that the memcg swapout code is called from many contexts that make no sense and it has to figure out the direction from page state to make sure memory and memory+swap are always correctly charged. - On page migration, the old page might be unmapped but then reused, so memcg code has to prevent untimely uncharging in that case. Because this code - which should be a simple charge transfer - is so special-cased, it is not reusable for replace_page_cache(). But now that charged pages always have a page->mapping, introduce mem_cgroup_uncharge(), which is called after the final put_page(), when we know for sure that nobody is looking at the page anymore. For page migration, introduce mem_cgroup_migrate(), which is called after the migration is successful and the new page is fully rmapped. Because the old page is no longer uncharged after migration, prevent double charges by decoupling the page's memcg association (PCG_USED and pc->mem_cgroup) from the page holding an actual charge. The new bits PCG_MEM and PCG_MEMSW represent the respective charges and are transferred to the new page during migration. mem_cgroup_migrate() is suitable for replace_page_cache() as well, which gets rid of mem_cgroup_replace_page_cache(). However, care needs to be taken because both the source and the target page can already be charged and on the LRU when fuse is splicing: grab the page lock on the charge moving side to prevent changing pc->mem_cgroup of a page under migration. Also, the lruvecs of both pages change as we uncharge the old and charge the new during migration, and putback may race with us, so grab the lru lock and isolate the pages iff on LRU to prevent races and ensure the pages are on the right lruvec afterward. Swap accounting is massively simplified: because the page is no longer uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry before the final put_page() in page reclaim. Finally, page_cgroup changes are now protected by whatever protection the page itself offers: anonymous pages are charged under the page table lock, whereas page cache insertions, swapin, and migration hold the page lock. Uncharging happens under full exclusion with no outstanding references. Charging and uncharging also ensure that the page is off-LRU, which serializes against charge migration. Remove the very costly page_cgroup lock and set pc->flags non-atomically. [mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable] [vdavydov@parallels.com: fix flags definition] Signed-off-by: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Tested-by: Jet Chen Acked-by: Michal Hocko Tested-by: Felipe Balbi Signed-off-by: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt index bcf750d..8870b02 100644 --- a/Documentation/cgroups/memcg_test.txt +++ b/Documentation/cgroups/memcg_test.txt @@ -29,28 +29,13 @@ Please note that implementation details can be changed. 2. Uncharge a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by - mem_cgroup_uncharge_page() - Called when an anonymous page is fully unmapped. I.e., mapcount goes - to 0. If the page is SwapCache, uncharge is delayed until - mem_cgroup_uncharge_swapcache(). - - mem_cgroup_uncharge_cache_page() - Called when a page-cache is deleted from radix-tree. If the page is - SwapCache, uncharge is delayed until mem_cgroup_uncharge_swapcache(). - - mem_cgroup_uncharge_swapcache() - Called when SwapCache is removed from radix-tree. The charge itself - is moved to swap_cgroup. (If mem+swap controller is disabled, no - charge to swap occurs.) + mem_cgroup_uncharge() + Called when a page's refcount goes down to 0. mem_cgroup_uncharge_swap() Called when swp_entry's refcnt goes down to 0. A charge against swap disappears. - mem_cgroup_end_migration(old, new) - At success of migration old is uncharged (if necessary), a charge - to new page is committed. At failure, charge to old page is committed. - 3. charge-commit-cancel Memcg pages are charged in two steps: mem_cgroup_try_charge() @@ -69,18 +54,6 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. Anonymous page is newly allocated at - page fault into MAP_ANONYMOUS mapping. - Copy-On-Write. - It is charged right after it's allocated before doing any page table - related operations. Of course, it's uncharged when another page is used - for the fault address. - - At freeing anonymous page (by exit() or munmap()), zap_pte() is called - and pages for ptes are freed one by one.(see mm/memory.c). Uncharges - are done at page_remove_rmap() when page_mapcount() goes down to 0. - - Another page freeing is by page-reclaim (vmscan.c) and anonymous - pages are swapped out. In this case, the page is marked as - PageSwapCache(). uncharge() routine doesn't uncharge the page marked - as SwapCache(). It's delayed until __delete_from_swap_cache(). 4.1 Swap-in. At swap-in, the page is taken from swap-cache. There are 2 cases. @@ -89,41 +62,6 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. (b) If the SwapCache has been mapped by processes, it has been charged already. - This swap-in is one of the most complicated work. In do_swap_page(), - following events occur when pte is unchanged. - - (1) the page (SwapCache) is looked up. - (2) lock_page() - (3) try_charge_swapin() - (4) reuse_swap_page() (may call delete_swap_cache()) - (5) commit_charge_swapin() - (6) swap_free(). - - Considering following situation for example. - - (A) The page has not been charged before (2) and reuse_swap_page() - doesn't call delete_from_swap_cache(). - (B) The page has not been charged before (2) and reuse_swap_page() - calls delete_from_swap_cache(). - (C) The page has been charged before (2) and reuse_swap_page() doesn't - call delete_from_swap_cache(). - (D) The page has been charged before (2) and reuse_swap_page() calls - delete_from_swap_cache(). - - memory.usage/memsw.usage changes to this page/swp_entry will be - Case (A) (B) (C) (D) - Event - Before (2) 0/ 1 0/ 1 1/ 1 1/ 1 - =========================================== - (3) +1/+1 +1/+1 +1/+1 +1/+1 - (4) - 0/ 0 - -1/ 0 - (5) 0/-1 0/ 0 -1/-1 0/ 0 - (6) - 0/-1 - 0/-1 - =========================================== - Result 1/ 1 1/ 1 1/ 1 1/ 1 - - In any cases, charges to this page should be 1/ 1. - 4.2 Swap-out. At swap-out, typical state transition is below. @@ -136,28 +74,20 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. swp_entry's refcnt -= 1. - At (b), the page is marked as SwapCache and not uncharged. - At (d), the page is removed from SwapCache and a charge in page_cgroup - is moved to swap_cgroup. - Finally, at task exit, (e) zap_pte() is called and swp_entry's refcnt -=1 -> 0. - Here, a charge in swap_cgroup disappears. 5. Page Cache Page Cache is charged at - add_to_page_cache_locked(). - uncharged at - - __remove_from_page_cache(). - The logic is very clear. (About migration, see below) Note: __remove_from_page_cache() is called by remove_from_page_cache() and __remove_mapping(). 6. Shmem(tmpfs) Page Cache - Memcg's charge/uncharge have special handlers of shmem. The best way - to understand shmem's page state transition is to read mm/shmem.c. + The best way to understand shmem's page state transition is to read + mm/shmem.c. But brief explanation of the behavior of memcg around shmem will be helpful to understand the logic. @@ -170,56 +100,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. It's charged when... - A new page is added to shmem's radix-tree. - A swp page is read. (move a charge from swap_cgroup to page_cgroup) - It's uncharged when - - A page is removed from radix-tree and not SwapCache. - - When SwapCache is removed, a charge is moved to swap_cgroup. - - When swp_entry's refcnt goes down to 0, a charge in swap_cgroup - disappears. 7. Page Migration - One of the most complicated functions is page-migration-handler. - Memcg has 2 routines. Assume that we are migrating a page's contents - from OLDPAGE to NEWPAGE. - - Usual migration logic is.. - (a) remove the page from LRU. - (b) allocate NEWPAGE (migration target) - (c) lock by lock_page(). - (d) unmap all mappings. - (e-1) If necessary, replace entry in radix-tree. - (e-2) move contents of a page. - (f) map all mappings again. - (g) pushback the page to LRU. - (-) OLDPAGE will be freed. - - Before (g), memcg should complete all necessary charge/uncharge to - NEWPAGE/OLDPAGE. - - The point is.... - - If OLDPAGE is anonymous, all charges will be dropped at (d) because - try_to_unmap() drops all mapcount and the page will not be - SwapCache. - - - If OLDPAGE is SwapCache, charges will be kept at (g) because - __delete_from_swap_cache() isn't called at (e-1) - - - If OLDPAGE is page-cache, charges will be kept at (g) because - remove_from_swap_cache() isn't called at (e-1) - - memcg provides following hooks. - - - mem_cgroup_prepare_migration(OLDPAGE) - Called after (b) to account a charge (usage += PAGE_SIZE) against - memcg which OLDPAGE belongs to. - - - mem_cgroup_end_migration(OLDPAGE, NEWPAGE) - Called after (f) before (g). - If OLDPAGE is used, commit OLDPAGE again. If OLDPAGE is already - charged, a charge by prepare_migration() is automatically canceled. - If NEWPAGE is used, commit NEWPAGE and uncharge OLDPAGE. - - But zap_pte() (by exit or munmap) can be called while migration, - we have to check if OLDPAGE/NEWPAGE is a valid page after commit(). + + mem_cgroup_migrate() 8. LRU Each memcg has its own private LRU. Now, its handling is under global diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 1a9a096..806b8fa1 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -60,15 +60,17 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg, bool lrucare); void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg); -struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *); -struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *); +void mem_cgroup_uncharge(struct page *page); + +/* Batched uncharging */ +void mem_cgroup_uncharge_start(void); +void mem_cgroup_uncharge_end(void); -/* For coalescing uncharge for reducing memcg' overhead*/ -extern void mem_cgroup_uncharge_start(void); -extern void mem_cgroup_uncharge_end(void); +void mem_cgroup_migrate(struct page *oldpage, struct page *newpage, + bool lrucare); -extern void mem_cgroup_uncharge_page(struct page *page); -extern void mem_cgroup_uncharge_cache_page(struct page *page); +struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *); +struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *); bool __mem_cgroup_same_or_subtree(const struct mem_cgroup *root_memcg, struct mem_cgroup *memcg); @@ -96,12 +98,6 @@ bool mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *memcg) extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg); -extern void -mem_cgroup_prepare_migration(struct page *page, struct page *newpage, - struct mem_cgroup **memcgp); -extern void mem_cgroup_end_migration(struct mem_cgroup *memcg, - struct page *oldpage, struct page *newpage, bool migration_ok); - struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *, struct mem_cgroup *, struct mem_cgroup_reclaim_cookie *); @@ -116,8 +112,6 @@ unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list); void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int); extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p); -extern void mem_cgroup_replace_page_cache(struct page *oldpage, - struct page *newpage); static inline void mem_cgroup_oom_enable(void) { @@ -235,19 +229,21 @@ static inline void mem_cgroup_cancel_charge(struct page *page, { } -static inline void mem_cgroup_uncharge_start(void) +static inline void mem_cgroup_uncharge(struct page *page) { } -static inline void mem_cgroup_uncharge_end(void) +static inline void mem_cgroup_uncharge_start(void) { } -static inline void mem_cgroup_uncharge_page(struct page *page) +static inline void mem_cgroup_uncharge_end(void) { } -static inline void mem_cgroup_uncharge_cache_page(struct page *page) +static inline void mem_cgroup_migrate(struct page *oldpage, + struct page *newpage, + bool lrucare) { } @@ -286,17 +282,6 @@ static inline struct cgroup_subsys_state return NULL; } -static inline void -mem_cgroup_prepare_migration(struct page *page, struct page *newpage, - struct mem_cgroup **memcgp) -{ -} - -static inline void mem_cgroup_end_migration(struct mem_cgroup *memcg, - struct page *oldpage, struct page *newpage, bool migration_ok) -{ -} - static inline struct mem_cgroup * mem_cgroup_iter(struct mem_cgroup *root, struct mem_cgroup *prev, @@ -392,10 +377,6 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx) { } -static inline void mem_cgroup_replace_page_cache(struct page *oldpage, - struct page *newpage) -{ -} #endif /* CONFIG_MEMCG */ #if !defined(CONFIG_MEMCG) || !defined(CONFIG_DEBUG_VM) diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h index 777a524..9bfb8e6 100644 --- a/include/linux/page_cgroup.h +++ b/include/linux/page_cgroup.h @@ -3,9 +3,9 @@ enum { /* flags for mem_cgroup */ - PCG_LOCK, /* Lock for pc->mem_cgroup and following bits. */ - PCG_USED, /* this object is in use. */ - PCG_MIGRATION, /* under page migration */ + PCG_USED = 0x01, /* This page is charged to a memcg */ + PCG_MEM = 0x02, /* This page holds a memory charge */ + PCG_MEMSW = 0x04, /* This page holds a memory+swap charge */ __NR_PCG_FLAGS, }; @@ -44,42 +44,9 @@ static inline void __init page_cgroup_init(void) struct page_cgroup *lookup_page_cgroup(struct page *page); struct page *lookup_cgroup_page(struct page_cgroup *pc); -#define TESTPCGFLAG(uname, lname) \ -static inline int PageCgroup##uname(struct page_cgroup *pc) \ - { return test_bit(PCG_##lname, &pc->flags); } - -#define SETPCGFLAG(uname, lname) \ -static inline void SetPageCgroup##uname(struct page_cgroup *pc)\ - { set_bit(PCG_##lname, &pc->flags); } - -#define CLEARPCGFLAG(uname, lname) \ -static inline void ClearPageCgroup##uname(struct page_cgroup *pc) \ - { clear_bit(PCG_##lname, &pc->flags); } - -#define TESTCLEARPCGFLAG(uname, lname) \ -static inline int TestClearPageCgroup##uname(struct page_cgroup *pc) \ - { return test_and_clear_bit(PCG_##lname, &pc->flags); } - -TESTPCGFLAG(Used, USED) -CLEARPCGFLAG(Used, USED) -SETPCGFLAG(Used, USED) - -SETPCGFLAG(Migration, MIGRATION) -CLEARPCGFLAG(Migration, MIGRATION) -TESTPCGFLAG(Migration, MIGRATION) - -static inline void lock_page_cgroup(struct page_cgroup *pc) -{ - /* - * Don't take this lock in IRQ context. - * This lock is for pc->mem_cgroup, USED, MIGRATION - */ - bit_spin_lock(PCG_LOCK, &pc->flags); -} - -static inline void unlock_page_cgroup(struct page_cgroup *pc) +static inline int PageCgroupUsed(struct page_cgroup *pc) { - bit_spin_unlock(PCG_LOCK, &pc->flags); + return !!(pc->flags & PCG_USED); } #else /* CONFIG_MEMCG */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 46a649e..1b72060 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -381,9 +381,13 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem) } #endif #ifdef CONFIG_MEMCG_SWAP -extern void mem_cgroup_uncharge_swap(swp_entry_t ent); +extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry); +extern void mem_cgroup_uncharge_swap(swp_entry_t entry); #else -static inline void mem_cgroup_uncharge_swap(swp_entry_t ent) +static inline void mem_cgroup_swapout(struct page *page, swp_entry_t entry) +{ +} +static inline void mem_cgroup_uncharge_swap(swp_entry_t entry) { } #endif @@ -443,7 +447,7 @@ extern void swap_shmem_alloc(swp_entry_t); extern int swap_duplicate(swp_entry_t); extern int swapcache_prepare(swp_entry_t); extern void swap_free(swp_entry_t); -extern void swapcache_free(swp_entry_t, struct page *page); +extern void swapcache_free(swp_entry_t); extern int free_swap_and_cache(swp_entry_t); extern int swap_type_of(dev_t, sector_t, struct block_device **); extern unsigned int count_swap_pages(int, int); @@ -507,7 +511,7 @@ static inline void swap_free(swp_entry_t swp) { } -static inline void swapcache_free(swp_entry_t swp, struct page *page) +static inline void swapcache_free(swp_entry_t swp) { } diff --git a/mm/filemap.c b/mm/filemap.c index 349a40e..f501b56 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -234,7 +234,6 @@ void delete_from_page_cache(struct page *page) spin_lock_irq(&mapping->tree_lock); __delete_from_page_cache(page, NULL); spin_unlock_irq(&mapping->tree_lock); - mem_cgroup_uncharge_cache_page(page); if (freepage) freepage(page); @@ -490,8 +489,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask) if (PageSwapBacked(new)) __inc_zone_page_state(new, NR_SHMEM); spin_unlock_irq(&mapping->tree_lock); - /* mem_cgroup codes must not be called under tree_lock */ - mem_cgroup_replace_page_cache(old, new); + mem_cgroup_migrate(old, new, true); radix_tree_preload_end(); if (freepage) freepage(old); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1cbe1e5..9106f1b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -754,9 +754,11 @@ static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_zone *mz, static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_zone *mz, struct mem_cgroup_tree_per_zone *mctz) { - spin_lock(&mctz->lock); + unsigned long flags; + + spin_lock_irqsave(&mctz->lock, flags); __mem_cgroup_remove_exceeded(mz, mctz); - spin_unlock(&mctz->lock); + spin_unlock_irqrestore(&mctz->lock, flags); } @@ -779,7 +781,9 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page) * mem is over its softlimit. */ if (excess || mz->on_tree) { - spin_lock(&mctz->lock); + unsigned long flags; + + spin_lock_irqsave(&mctz->lock, flags); /* if on-tree, remove it */ if (mz->on_tree) __mem_cgroup_remove_exceeded(mz, mctz); @@ -788,7 +792,7 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page) * If excess is 0, no tree ops. */ __mem_cgroup_insert_exceeded(mz, mctz, excess); - spin_unlock(&mctz->lock); + spin_unlock_irqrestore(&mctz->lock, flags); } } } @@ -839,9 +843,9 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz) { struct mem_cgroup_per_zone *mz; - spin_lock(&mctz->lock); + spin_lock_irq(&mctz->lock); mz = __mem_cgroup_largest_soft_limit_node(mctz); - spin_unlock(&mctz->lock); + spin_unlock_irq(&mctz->lock); return mz; } @@ -882,13 +886,6 @@ static long mem_cgroup_read_stat(struct mem_cgroup *memcg, return val; } -static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg, - bool charge) -{ - int val = (charge) ? 1 : -1; - this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], val); -} - static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg, enum mem_cgroup_events_index idx) { @@ -909,13 +906,13 @@ static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg, static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, struct page *page, - bool anon, int nr_pages) + int nr_pages) { /* * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is * counted as CACHE even if it's on ANON LRU. */ - if (anon) + if (PageAnon(page)) __this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS], nr_pages); else @@ -1013,7 +1010,6 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, */ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page) { - preempt_disable(); /* threshold event is triggered in finer grain than soft limit */ if (unlikely(mem_cgroup_event_ratelimit(memcg, MEM_CGROUP_TARGET_THRESH))) { @@ -1026,8 +1022,6 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page) do_numainfo = mem_cgroup_event_ratelimit(memcg, MEM_CGROUP_TARGET_NUMAINFO); #endif - preempt_enable(); - mem_cgroup_threshold(memcg); if (unlikely(do_softlimit)) mem_cgroup_update_tree(memcg, page); @@ -1035,8 +1029,7 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page) if (unlikely(do_numainfo)) atomic_inc(&memcg->numainfo_events); #endif - } else - preempt_enable(); + } } struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p) @@ -1347,20 +1340,6 @@ out: return lruvec; } -/* - * Following LRU functions are allowed to be used without PCG_LOCK. - * Operations are called by routine of global LRU independently from memcg. - * What we have to take care of here is validness of pc->mem_cgroup. - * - * Changes to pc->mem_cgroup happens when - * 1. charge - * 2. moving account - * In typical case, "charge" is done before add-to-lru. Exception is SwapCache. - * It is added to LRU before charge. - * If PCG_USED bit is not set, page_cgroup is not added to this private LRU. - * When moving account, the page is not on LRU. It's isolated. - */ - /** * mem_cgroup_page_lruvec - return lruvec for adding an lru page * @page: the page @@ -2261,22 +2240,14 @@ cleanup: * * Notes: Race condition * - * We usually use lock_page_cgroup() for accessing page_cgroup member but - * it tends to be costly. But considering some conditions, we doesn't need - * to do so _always_. - * - * Considering "charge", lock_page_cgroup() is not required because all - * file-stat operations happen after a page is attached to radix-tree. There - * are no race with "charge". + * Charging occurs during page instantiation, while the page is + * unmapped and locked in page migration, or while the page table is + * locked in THP migration. No race is possible. * - * Considering "uncharge", we know that memcg doesn't clear pc->mem_cgroup - * at "uncharge" intentionally. So, we always see valid pc->mem_cgroup even - * if there are race with "uncharge". Statistics itself is properly handled - * by flags. + * Uncharge happens to pages with zero references, no race possible. * - * Considering "move", this is an only case we see a race. To make the race - * small, we check memcg->moving_account and detect there are possibility - * of race or not. If there is, we take a lock. + * Charge moving between groups is protected by checking mm->moving + * account and taking the move_lock in the slowpath. */ void __mem_cgroup_begin_update_page_stat(struct page *page, @@ -2689,6 +2660,16 @@ static struct mem_cgroup *mem_cgroup_lookup(unsigned short id) return mem_cgroup_from_id(id); } +/* + * try_get_mem_cgroup_from_page - look up page's memcg association + * @page: the page + * + * Look up, get a css reference, and return the memcg that owns @page. + * + * The page must be locked to prevent racing with swap-in and page + * cache charges. If coming from an unlocked page table, the caller + * must ensure the page is on the LRU or this can race with charging. + */ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page) { struct mem_cgroup *memcg = NULL; @@ -2699,7 +2680,6 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page) VM_BUG_ON_PAGE(!PageLocked(page), page); pc = lookup_page_cgroup(page); - lock_page_cgroup(pc); if (PageCgroupUsed(pc)) { memcg = pc->mem_cgroup; if (memcg && !css_tryget_online(&memcg->css)) @@ -2713,19 +2693,46 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page) memcg = NULL; rcu_read_unlock(); } - unlock_page_cgroup(pc); return memcg; } +static void lock_page_lru(struct page *page, int *isolated) +{ + struct zone *zone = page_zone(page); + + spin_lock_irq(&zone->lru_lock); + if (PageLRU(page)) { + struct lruvec *lruvec; + + lruvec = mem_cgroup_page_lruvec(page, zone); + ClearPageLRU(page); + del_page_from_lru_list(page, lruvec, page_lru(page)); + *isolated = 1; + } else + *isolated = 0; +} + +static void unlock_page_lru(struct page *page, int isolated) +{ + struct zone *zone = page_zone(page); + + if (isolated) { + struct lruvec *lruvec; + + lruvec = mem_cgroup_page_lruvec(page, zone); + VM_BUG_ON_PAGE(PageLRU(page), page); + SetPageLRU(page); + add_page_to_lru_list(page, lruvec, page_lru(page)); + } + spin_unlock_irq(&zone->lru_lock); +} + static void commit_charge(struct page *page, struct mem_cgroup *memcg, - unsigned int nr_pages, bool anon, bool lrucare) + unsigned int nr_pages, bool lrucare) { struct page_cgroup *pc = lookup_page_cgroup(page); - struct zone *uninitialized_var(zone); - struct lruvec *lruvec; - bool was_on_lru = false; + int isolated; - lock_page_cgroup(pc); VM_BUG_ON_PAGE(PageCgroupUsed(pc), page); /* * we don't need page_cgroup_lock about tail pages, becase they are not @@ -2736,39 +2743,38 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg, * In some cases, SwapCache and FUSE(splice_buf->radixtree), the page * may already be on some other mem_cgroup's LRU. Take care of it. */ - if (lrucare) { - zone = page_zone(page); - spin_lock_irq(&zone->lru_lock); - if (PageLRU(page)) { - lruvec = mem_cgroup_zone_lruvec(zone, pc->mem_cgroup); - ClearPageLRU(page); - del_page_from_lru_list(page, lruvec, page_lru(page)); - was_on_lru = true; - } - } + if (lrucare) + lock_page_lru(page, &isolated); + /* + * Nobody should be changing or seriously looking at + * pc->mem_cgroup and pc->flags at this point: + * + * - the page is uncharged + * + * - the page is off-LRU + * + * - an anonymous fault has exclusive page access, except for + * a locked page table + * + * - a page cache insertion, a swapin fault, or a migration + * have the page locked + */ pc->mem_cgroup = memcg; - SetPageCgroupUsed(pc); - - if (lrucare) { - if (was_on_lru) { - lruvec = mem_cgroup_zone_lruvec(zone, pc->mem_cgroup); - VM_BUG_ON_PAGE(PageLRU(page), page); - SetPageLRU(page); - add_page_to_lru_list(page, lruvec, page_lru(page)); - } - spin_unlock_irq(&zone->lru_lock); - } + pc->flags = PCG_USED | PCG_MEM | (do_swap_account ? PCG_MEMSW : 0); - mem_cgroup_charge_statistics(memcg, page, anon, nr_pages); - unlock_page_cgroup(pc); + if (lrucare) + unlock_page_lru(page, isolated); + local_irq_disable(); + mem_cgroup_charge_statistics(memcg, page, nr_pages); /* * "charge_statistics" updated event counter. Then, check it. * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree. * if they exceeds softlimit. */ memcg_check_events(memcg, page); + local_irq_enable(); } static DEFINE_MUTEX(set_limit_mutex); @@ -3395,7 +3401,6 @@ static inline void memcg_unregister_all_caches(struct mem_cgroup *memcg) #ifdef CONFIG_TRANSPARENT_HUGEPAGE -#define PCGF_NOCOPY_AT_SPLIT (1 << PCG_LOCK | 1 << PCG_MIGRATION) /* * Because tail pages are not marked as "used", set it. We're under * zone->lru_lock, 'splitting on pmd' and compound_lock. @@ -3416,7 +3421,7 @@ void mem_cgroup_split_huge_fixup(struct page *head) for (i = 1; i < HPAGE_PMD_NR; i++) { pc = head_pc + i; pc->mem_cgroup = memcg; - pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT; + pc->flags = head_pc->flags; } __this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_RSS_HUGE], HPAGE_PMD_NR); @@ -3446,7 +3451,6 @@ static int mem_cgroup_move_account(struct page *page, { unsigned long flags; int ret; - bool anon = PageAnon(page); VM_BUG_ON(from == to); VM_BUG_ON_PAGE(PageLRU(page), page); @@ -3460,15 +3464,21 @@ static int mem_cgroup_move_account(struct page *page, if (nr_pages > 1 && !PageTransHuge(page)) goto out; - lock_page_cgroup(pc); + /* + * Prevent mem_cgroup_migrate() from looking at pc->mem_cgroup + * of its source page while we change it: page migration takes + * both pages off the LRU, but page cache replacement doesn't. + */ + if (!trylock_page(page)) + goto out; ret = -EINVAL; if (!PageCgroupUsed(pc) || pc->mem_cgroup != from) - goto unlock; + goto out_unlock; move_lock_mem_cgroup(from, &flags); - if (!anon && page_mapped(page)) { + if (!PageAnon(page) && page_mapped(page)) { __this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED], nr_pages); __this_cpu_add(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED], @@ -3482,20 +3492,25 @@ static int mem_cgroup_move_account(struct page *page, nr_pages); } - mem_cgroup_charge_statistics(from, page, anon, -nr_pages); + /* + * It is safe to change pc->mem_cgroup here because the page + * is referenced, charged, and isolated - we can't race with + * uncharging, charging, migration, or LRU putback. + */ /* caller should have done css_get */ pc->mem_cgroup = to; - mem_cgroup_charge_statistics(to, page, anon, nr_pages); move_unlock_mem_cgroup(from, &flags); ret = 0; -unlock: - unlock_page_cgroup(pc); - /* - * check events - */ + + local_irq_disable(); + mem_cgroup_charge_statistics(to, page, nr_pages); memcg_check_events(to, page); + mem_cgroup_charge_statistics(from, page, -nr_pages); memcg_check_events(from, page); + local_irq_enable(); +out_unlock: + unlock_page(page); out: return ret; } @@ -3566,193 +3581,6 @@ out: return ret; } -static void mem_cgroup_do_uncharge(struct mem_cgroup *memcg, - unsigned int nr_pages, - const enum charge_type ctype) -{ - struct memcg_batch_info *batch = NULL; - bool uncharge_memsw = true; - - /* If swapout, usage of swap doesn't decrease */ - if (!do_swap_account || ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) - uncharge_memsw = false; - - batch = ¤t->memcg_batch; - /* - * In usual, we do css_get() when we remember memcg pointer. - * But in this case, we keep res->usage until end of a series of - * uncharges. Then, it's ok to ignore memcg's refcnt. - */ - if (!batch->memcg) - batch->memcg = memcg; - /* - * do_batch > 0 when unmapping pages or inode invalidate/truncate. - * In those cases, all pages freed continuously can be expected to be in - * the same cgroup and we have chance to coalesce uncharges. - * But we do uncharge one by one if this is killed by OOM(TIF_MEMDIE) - * because we want to do uncharge as soon as possible. - */ - - if (!batch->do_batch || test_thread_flag(TIF_MEMDIE)) - goto direct_uncharge; - - if (nr_pages > 1) - goto direct_uncharge; - - /* - * In typical case, batch->memcg == mem. This means we can - * merge a series of uncharges to an uncharge of res_counter. - * If not, we uncharge res_counter ony by one. - */ - if (batch->memcg != memcg) - goto direct_uncharge; - /* remember freed charge and uncharge it later */ - batch->nr_pages++; - if (uncharge_memsw) - batch->memsw_nr_pages++; - return; -direct_uncharge: - res_counter_uncharge(&memcg->res, nr_pages * PAGE_SIZE); - if (uncharge_memsw) - res_counter_uncharge(&memcg->memsw, nr_pages * PAGE_SIZE); - if (unlikely(batch->memcg != memcg)) - memcg_oom_recover(memcg); -} - -/* - * uncharge if !page_mapped(page) - */ -static struct mem_cgroup * -__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype, - bool end_migration) -{ - struct mem_cgroup *memcg = NULL; - unsigned int nr_pages = 1; - struct page_cgroup *pc; - bool anon; - - if (mem_cgroup_disabled()) - return NULL; - - if (PageTransHuge(page)) { - nr_pages <<= compound_order(page); - VM_BUG_ON_PAGE(!PageTransHuge(page), page); - } - /* - * Check if our page_cgroup is valid - */ - pc = lookup_page_cgroup(page); - if (unlikely(!PageCgroupUsed(pc))) - return NULL; - - lock_page_cgroup(pc); - - memcg = pc->mem_cgroup; - - if (!PageCgroupUsed(pc)) - goto unlock_out; - - anon = PageAnon(page); - - switch (ctype) { - case MEM_CGROUP_CHARGE_TYPE_ANON: - /* - * Generally PageAnon tells if it's the anon statistics to be - * updated; but sometimes e.g. mem_cgroup_uncharge_page() is - * used before page reached the stage of being marked PageAnon. - */ - anon = true; - /* fallthrough */ - case MEM_CGROUP_CHARGE_TYPE_DROP: - /* See mem_cgroup_prepare_migration() */ - if (page_mapped(page)) - goto unlock_out; - /* - * Pages under migration may not be uncharged. But - * end_migration() /must/ be the one uncharging the - * unused post-migration page and so it has to call - * here with the migration bit still set. See the - * res_counter handling below. - */ - if (!end_migration && PageCgroupMigration(pc)) - goto unlock_out; - break; - case MEM_CGROUP_CHARGE_TYPE_SWAPOUT: - if (!PageAnon(page)) { /* Shared memory */ - if (page->mapping && !page_is_file_cache(page)) - goto unlock_out; - } else if (page_mapped(page)) /* Anon */ - goto unlock_out; - break; - default: - break; - } - - mem_cgroup_charge_statistics(memcg, page, anon, -nr_pages); - - ClearPageCgroupUsed(pc); - /* - * pc->mem_cgroup is not cleared here. It will be accessed when it's - * freed from LRU. This is safe because uncharged page is expected not - * to be reused (freed soon). Exception is SwapCache, it's handled by - * special functions. - */ - - unlock_page_cgroup(pc); - /* - * even after unlock, we have memcg->res.usage here and this memcg - * will never be freed, so it's safe to call css_get(). - */ - memcg_check_events(memcg, page); - if (do_swap_account && ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) { - mem_cgroup_swap_statistics(memcg, true); - css_get(&memcg->css); - } - /* - * Migration does not charge the res_counter for the - * replacement page, so leave it alone when phasing out the - * page that is unused after the migration. - */ - if (!end_migration) - mem_cgroup_do_uncharge(memcg, nr_pages, ctype); - - return memcg; - -unlock_out: - unlock_page_cgroup(pc); - return NULL; -} - -void mem_cgroup_uncharge_page(struct page *page) -{ - /* early check. */ - if (page_mapped(page)) - return; - VM_BUG_ON_PAGE(page->mapping && !PageAnon(page), page); - /* - * If the page is in swap cache, uncharge should be deferred - * to the swap path, which also properly accounts swap usage - * and handles memcg lifetime. - * - * Note that this check is not stable and reclaim may add the - * page to swap cache at any time after this. However, if the - * page is not in swap cache by the time page->mapcount hits - * 0, there won't be any page table references to the swap - * slot, and reclaim will free it and not actually write the - * page to disk. - */ - if (PageSwapCache(page)) - return; - __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_ANON, false); -} - -void mem_cgroup_uncharge_cache_page(struct page *page) -{ - VM_BUG_ON_PAGE(page_mapped(page), page); - VM_BUG_ON_PAGE(page->mapping, page); - __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE, false); -} - /* * Batch_start/batch_end is called in unmap_page_range/invlidate/trucate. * In that cases, pages are freed continuously and we can expect pages @@ -3763,6 +3591,9 @@ void mem_cgroup_uncharge_cache_page(struct page *page) void mem_cgroup_uncharge_start(void) { + unsigned long flags; + + local_irq_save(flags); current->memcg_batch.do_batch++; /* We can do nest. */ if (current->memcg_batch.do_batch == 1) { @@ -3770,21 +3601,18 @@ void mem_cgroup_uncharge_start(void) current->memcg_batch.nr_pages = 0; current->memcg_batch.memsw_nr_pages = 0; } + local_irq_restore(flags); } void mem_cgroup_uncharge_end(void) { struct memcg_batch_info *batch = ¤t->memcg_batch; + unsigned long flags; - if (!batch->do_batch) - return; - - batch->do_batch--; - if (batch->do_batch) /* If stacked, do nothing. */ - return; - - if (!batch->memcg) - return; + local_irq_save(flags); + VM_BUG_ON(!batch->do_batch); + if (--batch->do_batch) /* If stacked, do nothing */ + goto out; /* * This "batch->memcg" is valid without any css_get/put etc... * bacause we hide charges behind us. @@ -3796,61 +3624,16 @@ void mem_cgroup_uncharge_end(void) res_counter_uncharge(&batch->memcg->memsw, batch->memsw_nr_pages * PAGE_SIZE); memcg_oom_recover(batch->memcg); - /* forget this pointer (for sanity check) */ - batch->memcg = NULL; -} - -#ifdef CONFIG_SWAP -/* - * called after __delete_from_swap_cache() and drop "page" account. - * memcg information is recorded to swap_cgroup of "ent" - */ -void -mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout) -{ - struct mem_cgroup *memcg; - int ctype = MEM_CGROUP_CHARGE_TYPE_SWAPOUT; - - if (!swapout) /* this was a swap cache but the swap is unused ! */ - ctype = MEM_CGROUP_CHARGE_TYPE_DROP; - - memcg = __mem_cgroup_uncharge_common(page, ctype, false); - - /* - * record memcg information, if swapout && memcg != NULL, - * css_get() was called in uncharge(). - */ - if (do_swap_account && swapout && memcg) - swap_cgroup_record(ent, mem_cgroup_id(memcg)); +out: + local_irq_restore(flags); } -#endif #ifdef CONFIG_MEMCG_SWAP -/* - * called from swap_entry_free(). remove record in swap_cgroup and - * uncharge "memsw" account. - */ -void mem_cgroup_uncharge_swap(swp_entry_t ent) +static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg, + bool charge) { - struct mem_cgroup *memcg; - unsigned short id; - - if (!do_swap_account) - return; - - id = swap_cgroup_record(ent, 0); - rcu_read_lock(); - memcg = mem_cgroup_lookup(id); - if (memcg) { - /* - * We uncharge this because swap is freed. This memcg can - * be obsolete one. We avoid calling css_tryget_online(). - */ - res_counter_uncharge(&memcg->memsw, PAGE_SIZE); - mem_cgroup_swap_statistics(memcg, false); - css_put(&memcg->css); - } - rcu_read_unlock(); + int val = (charge) ? 1 : -1; + this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], val); } /** @@ -3902,169 +3685,6 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, } #endif -/* - * Before starting migration, account PAGE_SIZE to mem_cgroup that the old - * page belongs to. - */ -void mem_cgroup_prepare_migration(struct page *page, struct page *newpage, - struct mem_cgroup **memcgp) -{ - struct mem_cgroup *memcg = NULL; - unsigned int nr_pages = 1; - struct page_cgroup *pc; - - *memcgp = NULL; - - if (mem_cgroup_disabled()) - return; - - if (PageTransHuge(page)) - nr_pages <<= compound_order(page); - - pc = lookup_page_cgroup(page); - lock_page_cgroup(pc); - if (PageCgroupUsed(pc)) { - memcg = pc->mem_cgroup; - css_get(&memcg->css); - /* - * At migrating an anonymous page, its mapcount goes down - * to 0 and uncharge() will be called. But, even if it's fully - * unmapped, migration may fail and this page has to be - * charged again. We set MIGRATION flag here and delay uncharge - * until end_migration() is called - * - * Corner Case Thinking - * A) - * When the old page was mapped as Anon and it's unmap-and-freed - * while migration was ongoing. - * If unmap finds the old page, uncharge() of it will be delayed - * until end_migration(). If unmap finds a new page, it's - * uncharged when it make mapcount to be 1->0. If unmap code - * finds swap_migration_entry, the new page will not be mapped - * and end_migration() will find it(mapcount==0). - * - * B) - * When the old page was mapped but migraion fails, the kernel - * remaps it. A charge for it is kept by MIGRATION flag even - * if mapcount goes down to 0. We can do remap successfully - * without charging it again. - * - * C) - * The "old" page is under lock_page() until the end of - * migration, so, the old page itself will not be swapped-out. - * If the new page is swapped out before end_migraton, our - * hook to usual swap-out path will catch the event. - */ - if (PageAnon(page)) - SetPageCgroupMigration(pc); - } - unlock_page_cgroup(pc); - /* - * If the page is not charged at this point, - * we return here. - */ - if (!memcg) - return; - - *memcgp = memcg; - /* - * We charge new page before it's used/mapped. So, even if unlock_page() - * is called before end_migration, we can catch all events on this new - * page. In the case new page is migrated but not remapped, new page's - * mapcount will be finally 0 and we call uncharge in end_migration(). - */ - /* - * The page is committed to the memcg, but it's not actually - * charged to the res_counter since we plan on replacing the - * old one and only one page is going to be left afterwards. - */ - commit_charge(newpage, memcg, nr_pages, PageAnon(page), false); -} - -/* remove redundant charge if migration failed*/ -void mem_cgroup_end_migration(struct mem_cgroup *memcg, - struct page *oldpage, struct page *newpage, bool migration_ok) -{ - struct page *used, *unused; - struct page_cgroup *pc; - bool anon; - - if (!memcg) - return; - - if (!migration_ok) { - used = oldpage; - unused = newpage; - } else { - used = newpage; - unused = oldpage; - } - anon = PageAnon(used); - __mem_cgroup_uncharge_common(unused, - anon ? MEM_CGROUP_CHARGE_TYPE_ANON - : MEM_CGROUP_CHARGE_TYPE_CACHE, - true); - css_put(&memcg->css); - /* - * We disallowed uncharge of pages under migration because mapcount - * of the page goes down to zero, temporarly. - * Clear the flag and check the page should be charged. - */ - pc = lookup_page_cgroup(oldpage); - lock_page_cgroup(pc); - ClearPageCgroupMigration(pc); - unlock_page_cgroup(pc); - - /* - * If a page is a file cache, radix-tree replacement is very atomic - * and we can skip this check. When it was an Anon page, its mapcount - * goes down to 0. But because we added MIGRATION flage, it's not - * uncharged yet. There are several case but page->mapcount check - * and USED bit check in mem_cgroup_uncharge_page() will do enough - * check. (see prepare_charge() also) - */ - if (anon) - mem_cgroup_uncharge_page(used); -} - -/* - * At replace page cache, newpage is not under any memcg but it's on - * LRU. So, this function doesn't touch res_counter but handles LRU - * in correct way. Both pages are locked so we cannot race with uncharge. - */ -void mem_cgroup_replace_page_cache(struct page *oldpage, - struct page *newpage) -{ - struct mem_cgroup *memcg = NULL; - struct page_cgroup *pc; - - if (mem_cgroup_disabled()) - return; - - pc = lookup_page_cgroup(oldpage); - /* fix accounting on old pages */ - lock_page_cgroup(pc); - if (PageCgroupUsed(pc)) { - memcg = pc->mem_cgroup; - mem_cgroup_charge_statistics(memcg, oldpage, false, -1); - ClearPageCgroupUsed(pc); - } - unlock_page_cgroup(pc); - - /* - * When called from shmem_replace_page(), in some cases the - * oldpage has already been charged, and in some cases not. - */ - if (!memcg) - return; - /* - * Even if newpage->mapping was NULL before starting replacement, - * the newpage may be on LRU(or pagevec for LRU) already. We lock - * LRU while we overwrite pc->mem_cgroup. - */ - commit_charge(newpage, memcg, 1, false, true); -} - #ifdef CONFIG_DEBUG_VM static struct page_cgroup *lookup_page_cgroup_used(struct page *page) { @@ -4263,7 +3883,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order, gfp_mask, &nr_scanned); nr_reclaimed += reclaimed; *total_scanned += nr_scanned; - spin_lock(&mctz->lock); + spin_lock_irq(&mctz->lock); /* * If we failed to reclaim anything from this memory cgroup @@ -4303,7 +3923,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order, */ /* If excess == 0, no tree ops */ __mem_cgroup_insert_exceeded(mz, mctz, excess); - spin_unlock(&mctz->lock); + spin_unlock_irq(&mctz->lock); css_put(&mz->memcg->css); loop++; /* @@ -6265,9 +5885,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, if (page) { pc = lookup_page_cgroup(page); /* - * Do only loose check w/o page_cgroup lock. - * mem_cgroup_move_account() checks the pc is valid or not under - * the lock. + * Do only loose check w/o serialization. + * mem_cgroup_move_account() checks the pc is valid or + * not under LRU exclusion. */ if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) { ret = MC_TARGET_PAGE; @@ -6729,6 +6349,67 @@ static void __init enable_swap_cgroup(void) } #endif +#ifdef CONFIG_MEMCG_SWAP +/** + * mem_cgroup_swapout - transfer a memsw charge to swap + * @page: page whose memsw charge to transfer + * @entry: swap entry to move the charge to + * + * Transfer the memsw charge of @page to @entry. + */ +void mem_cgroup_swapout(struct page *page, swp_entry_t entry) +{ + struct page_cgroup *pc; + unsigned short oldid; + + VM_BUG_ON_PAGE(PageLRU(page), page); + VM_BUG_ON_PAGE(page_count(page), page); + + if (!do_swap_account) + return; + + pc = lookup_page_cgroup(page); + + /* Readahead page, never charged */ + if (!PageCgroupUsed(pc)) + return; + + VM_BUG_ON_PAGE(!(pc->flags & PCG_MEMSW), page); + + oldid = swap_cgroup_record(entry, mem_cgroup_id(pc->mem_cgroup)); + VM_BUG_ON_PAGE(oldid, page); + + pc->flags &= ~PCG_MEMSW; + css_get(&pc->mem_cgroup->css); + mem_cgroup_swap_statistics(pc->mem_cgroup, true); +} + +/** + * mem_cgroup_uncharge_swap - uncharge a swap entry + * @entry: swap entry to uncharge + * + * Drop the memsw charge associated with @entry. + */ +void mem_cgroup_uncharge_swap(swp_entry_t entry) +{ + struct mem_cgroup *memcg; + unsigned short id; + + if (!do_swap_account) + return; + + id = swap_cgroup_record(entry, 0); + rcu_read_lock(); + memcg = mem_cgroup_lookup(id); + if (memcg) { + res_counter_uncharge(&memcg->memsw, PAGE_SIZE); + mem_cgroup_swap_statistics(memcg, false); + css_put(&memcg->css); + } + rcu_read_unlock(); +} +#endif + /** * mem_cgroup_try_charge - try charging a page * @page: page to charge @@ -6831,7 +6512,7 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg, VM_BUG_ON_PAGE(!PageTransHuge(page), page); } - commit_charge(page, memcg, nr_pages, PageAnon(page), lrucare); + commit_charge(page, memcg, nr_pages, lrucare); if (do_swap_account && PageSwapCache(page)) { swp_entry_t entry = { .val = page_private(page) }; @@ -6873,6 +6554,139 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg) cancel_charge(memcg, nr_pages); } +/** + * mem_cgroup_uncharge - uncharge a page + * @page: page to uncharge + * + * Uncharge a page previously charged with mem_cgroup_try_charge() and + * mem_cgroup_commit_charge(). + */ +void mem_cgroup_uncharge(struct page *page) +{ + struct memcg_batch_info *batch; + unsigned int nr_pages = 1; + struct mem_cgroup *memcg; + struct page_cgroup *pc; + unsigned long pc_flags; + unsigned long flags; + + VM_BUG_ON_PAGE(PageLRU(page), page); + VM_BUG_ON_PAGE(page_count(page), page); + + if (mem_cgroup_disabled()) + return; + + pc = lookup_page_cgroup(page); + + /* Every final put_page() ends up here */ + if (!PageCgroupUsed(pc)) + return; + + if (PageTransHuge(page)) { + nr_pages <<= compound_order(page); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); + } + /* + * Nobody should be changing or seriously looking at + * pc->mem_cgroup and pc->flags at this point, we have fully + * exclusive access to the page. + */ + memcg = pc->mem_cgroup; + pc_flags = pc->flags; + pc->flags = 0; + + local_irq_save(flags); + + if (nr_pages > 1) + goto direct; + if (unlikely(test_thread_flag(TIF_MEMDIE))) + goto direct; + batch = ¤t->memcg_batch; + if (!batch->do_batch) + goto direct; + if (batch->memcg && batch->memcg != memcg) + goto direct; + if (!batch->memcg) + batch->memcg = memcg; + if (pc_flags & PCG_MEM) + batch->nr_pages++; + if (pc_flags & PCG_MEMSW) + batch->memsw_nr_pages++; + goto out; +direct: + if (pc_flags & PCG_MEM) + res_counter_uncharge(&memcg->res, nr_pages * PAGE_SIZE); + if (pc_flags & PCG_MEMSW) + res_counter_uncharge(&memcg->memsw, nr_pages * PAGE_SIZE); + memcg_oom_recover(memcg); +out: + mem_cgroup_charge_statistics(memcg, page, -nr_pages); + memcg_check_events(memcg, page); + + local_irq_restore(flags); +} + +/** + * mem_cgroup_migrate - migrate a charge to another page + * @oldpage: currently charged page + * @newpage: page to transfer the charge to + * @lrucare: both pages might be on the LRU already + * + * Migrate the charge from @oldpage to @newpage. + * + * Both pages must be locked, @newpage->mapping must be set up. + */ +void mem_cgroup_migrate(struct page *oldpage, struct page *newpage, + bool lrucare) +{ + unsigned int nr_pages = 1; + struct page_cgroup *pc; + int isolated; + + VM_BUG_ON_PAGE(!PageLocked(oldpage), oldpage); + VM_BUG_ON_PAGE(!PageLocked(newpage), newpage); + VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage); + VM_BUG_ON_PAGE(!lrucare && PageLRU(newpage), newpage); + VM_BUG_ON_PAGE(PageAnon(oldpage) != PageAnon(newpage), newpage); + + if (mem_cgroup_disabled()) + return; + + /* Page cache replacement: new page already charged? */ + pc = lookup_page_cgroup(newpage); + if (PageCgroupUsed(pc)) + return; + + /* Re-entrant migration: old page already uncharged? */ + pc = lookup_page_cgroup(oldpage); + if (!PageCgroupUsed(pc)) + return; + + VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage); + VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage); + + if (PageTransHuge(oldpage)) { + nr_pages <<= compound_order(oldpage); + VM_BUG_ON_PAGE(!PageTransHuge(oldpage), oldpage); + VM_BUG_ON_PAGE(!PageTransHuge(newpage), newpage); + } + + if (lrucare) + lock_page_lru(oldpage, &isolated); + + pc->flags = 0; + + if (lrucare) + unlock_page_lru(oldpage, isolated); + + local_irq_disable(); + mem_cgroup_charge_statistics(pc->mem_cgroup, oldpage, -nr_pages); + memcg_check_events(pc->mem_cgroup, oldpage); + local_irq_enable(); + + commit_charge(newpage, pc->mem_cgroup, nr_pages, lrucare); +} + /* * subsys_initcall() for memory controller. * diff --git a/mm/memory.c b/mm/memory.c index 6d76487..2a899e4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1292,7 +1292,6 @@ static void unmap_page_range(struct mmu_gather *tlb, details = NULL; BUG_ON(addr >= end); - mem_cgroup_uncharge_start(); tlb_start_vma(tlb, vma); pgd = pgd_offset(vma->vm_mm, addr); do { @@ -1302,7 +1301,6 @@ static void unmap_page_range(struct mmu_gather *tlb, next = zap_pud_range(tlb, vma, pgd, addr, next, details); } while (pgd++, addr = next, addr != end); tlb_end_vma(tlb, vma); - mem_cgroup_uncharge_end(); } diff --git a/mm/migrate.c b/mm/migrate.c index be6dbf9..f78ec9b 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -780,6 +780,7 @@ static int move_to_new_page(struct page *newpage, struct page *page, if (rc != MIGRATEPAGE_SUCCESS) { newpage->mapping = NULL; } else { + mem_cgroup_migrate(page, newpage, false); if (remap_swapcache) remove_migration_ptes(page, newpage); page->mapping = NULL; @@ -795,7 +796,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage, { int rc = -EAGAIN; int remap_swapcache = 1; - struct mem_cgroup *mem; struct anon_vma *anon_vma = NULL; if (!trylock_page(page)) { @@ -821,9 +821,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage, lock_page(page); } - /* charge against new page */ - mem_cgroup_prepare_migration(page, newpage, &mem); - if (PageWriteback(page)) { /* * Only in the case of a full synchronous migration is it @@ -833,10 +830,10 @@ static int __unmap_and_move(struct page *page, struct page *newpage, */ if (mode != MIGRATE_SYNC) { rc = -EBUSY; - goto uncharge; + goto out_unlock; } if (!force) - goto uncharge; + goto out_unlock; wait_on_page_writeback(page); } /* @@ -872,7 +869,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, */ remap_swapcache = 0; } else { - goto uncharge; + goto out_unlock; } } @@ -885,7 +882,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, * the page migration right away (proteced by page lock). */ rc = balloon_page_migrate(newpage, page, mode); - goto uncharge; + goto out_unlock; } /* @@ -904,7 +901,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, VM_BUG_ON_PAGE(PageAnon(page), page); if (page_has_private(page)) { try_to_free_buffers(page); - goto uncharge; + goto out_unlock; } goto skip_unmap; } @@ -923,10 +920,7 @@ skip_unmap: if (anon_vma) put_anon_vma(anon_vma); -uncharge: - mem_cgroup_end_migration(mem, page, newpage, - (rc == MIGRATEPAGE_SUCCESS || - rc == MIGRATEPAGE_BALLOON_SUCCESS)); +out_unlock: unlock_page(page); out: return rc; @@ -1786,7 +1780,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, pg_data_t *pgdat = NODE_DATA(node); int isolated = 0; struct page *new_page = NULL; - struct mem_cgroup *memcg = NULL; int page_lru = page_is_file_cache(page); unsigned long mmun_start = address & HPAGE_PMD_MASK; unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE; @@ -1852,15 +1845,6 @@ fail_putback: goto out_unlock; } - /* - * Traditional migration needs to prepare the memcg charge - * transaction early to prevent the old page from being - * uncharged when installing migration entries. Here we can - * save the potential rollback and start the charge transfer - * only when migration is already known to end successfully. - */ - mem_cgroup_prepare_migration(page, new_page, &memcg); - orig_entry = *pmd; entry = mk_pmd(new_page, vma->vm_page_prot); entry = pmd_mkhuge(entry); @@ -1888,14 +1872,10 @@ fail_putback: goto fail_putback; } + mem_cgroup_migrate(page, new_page, false); + page_remove_rmap(page); - /* - * Finish the charge transaction under the page table lock to - * prevent split_huge_page() from dividing up the charge - * before it's fully transferred to the new page. - */ - mem_cgroup_end_migration(memcg, page, new_page, true); spin_unlock(ptl); mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); diff --git a/mm/rmap.c b/mm/rmap.c index f56b5ed..3e8491c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1089,7 +1089,6 @@ void page_remove_rmap(struct page *page) if (unlikely(PageHuge(page))) goto out; if (anon) { - mem_cgroup_uncharge_page(page); if (PageTransHuge(page)) __dec_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES); diff --git a/mm/shmem.c b/mm/shmem.c index 1f1a808..6dc80d2 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -419,7 +419,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, pvec.pages, indices); if (!pvec.nr) break; - mem_cgroup_uncharge_start(); for (i = 0; i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; @@ -447,7 +446,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, } pagevec_remove_exceptionals(&pvec); pagevec_release(&pvec); - mem_cgroup_uncharge_end(); cond_resched(); index++; } @@ -495,7 +493,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, index = start; continue; } - mem_cgroup_uncharge_start(); for (i = 0; i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; @@ -531,7 +528,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, } pagevec_remove_exceptionals(&pvec); pagevec_release(&pvec); - mem_cgroup_uncharge_end(); index++; } @@ -835,7 +831,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) } mutex_unlock(&shmem_swaplist_mutex); - swapcache_free(swap, NULL); + swapcache_free(swap); redirty: set_page_dirty(page); if (wbc->for_reclaim) @@ -1008,7 +1004,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp, */ oldpage = newpage; } else { - mem_cgroup_replace_page_cache(oldpage, newpage); + mem_cgroup_migrate(oldpage, newpage, false); lru_cache_add_anon(newpage); *pagep = newpage; } diff --git a/mm/swap.c b/mm/swap.c index 3baca70..00523ff 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -62,6 +62,7 @@ static void __page_cache_release(struct page *page) del_page_from_lru_list(page, lruvec, page_off_lru(page)); spin_unlock_irqrestore(&zone->lru_lock, flags); } + mem_cgroup_uncharge(page); } static void __put_single_page(struct page *page) @@ -907,6 +908,8 @@ void release_pages(struct page **pages, int nr, bool cold) struct lruvec *lruvec; unsigned long uninitialized_var(flags); + mem_cgroup_uncharge_start(); + for (i = 0; i < nr; i++) { struct page *page = pages[i]; @@ -938,6 +941,7 @@ void release_pages(struct page **pages, int nr, bool cold) __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); } + mem_cgroup_uncharge(page); /* Clear Active bit in case of parallel mark_page_accessed */ __ClearPageActive(page); @@ -947,6 +951,8 @@ void release_pages(struct page **pages, int nr, bool cold) if (zone) spin_unlock_irqrestore(&zone->lru_lock, flags); + mem_cgroup_uncharge_end(); + free_hot_cold_page_list(&pages_to_free, cold); } EXPORT_SYMBOL(release_pages); diff --git a/mm/swap_state.c b/mm/swap_state.c index 2972eee..e160151 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -176,7 +176,7 @@ int add_to_swap(struct page *page, struct list_head *list) if (unlikely(PageTransHuge(page))) if (unlikely(split_huge_page_to_list(page, list))) { - swapcache_free(entry, NULL); + swapcache_free(entry); return 0; } @@ -202,7 +202,7 @@ int add_to_swap(struct page *page, struct list_head *list) * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free(entry); return 0; } } @@ -225,7 +225,7 @@ void delete_from_swap_cache(struct page *page) __delete_from_swap_cache(page); spin_unlock_irq(&address_space->tree_lock); - swapcache_free(entry, page); + swapcache_free(entry); page_cache_release(page); } @@ -386,7 +386,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free(entry); } while (err != -ENOMEM); if (new_page) diff --git a/mm/swapfile.c b/mm/swapfile.c index 0883b49..8798b2e 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -843,16 +843,13 @@ void swap_free(swp_entry_t entry) /* * Called after dropping swapcache to decrease refcnt to swap entries. */ -void swapcache_free(swp_entry_t entry, struct page *page) +void swapcache_free(swp_entry_t entry) { struct swap_info_struct *p; - unsigned char count; p = swap_info_get(entry); if (p) { - count = swap_entry_free(p, entry, SWAP_HAS_CACHE); - if (page) - mem_cgroup_uncharge_swapcache(page, entry, count != 0); + swap_entry_free(p, entry, SWAP_HAS_CACHE); spin_unlock(&p->lock); } } diff --git a/mm/truncate.c b/mm/truncate.c index eda2473..96d1673 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -281,7 +281,6 @@ void truncate_inode_pages_range(struct address_space *mapping, while (index < end && pagevec_lookup_entries(&pvec, mapping, index, min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) { - mem_cgroup_uncharge_start(); for (i = 0; i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; @@ -307,7 +306,6 @@ void truncate_inode_pages_range(struct address_space *mapping, } pagevec_remove_exceptionals(&pvec); pagevec_release(&pvec); - mem_cgroup_uncharge_end(); cond_resched(); index++; } @@ -369,7 +367,6 @@ void truncate_inode_pages_range(struct address_space *mapping, pagevec_release(&pvec); break; } - mem_cgroup_uncharge_start(); for (i = 0; i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; @@ -394,7 +391,6 @@ void truncate_inode_pages_range(struct address_space *mapping, } pagevec_remove_exceptionals(&pvec); pagevec_release(&pvec); - mem_cgroup_uncharge_end(); index++; } cleancache_invalidate_inode(mapping); @@ -493,7 +489,6 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping, while (index <= end && pagevec_lookup_entries(&pvec, mapping, index, min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1, indices)) { - mem_cgroup_uncharge_start(); for (i = 0; i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; @@ -522,7 +517,6 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping, } pagevec_remove_exceptionals(&pvec); pagevec_release(&pvec); - mem_cgroup_uncharge_end(); cond_resched(); index++; } @@ -553,7 +547,6 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page) BUG_ON(page_has_private(page)); __delete_from_page_cache(page, NULL); spin_unlock_irq(&mapping->tree_lock); - mem_cgroup_uncharge_cache_page(page); if (mapping->a_ops->freepage) mapping->a_ops->freepage(page); @@ -602,7 +595,6 @@ int invalidate_inode_pages2_range(struct address_space *mapping, while (index <= end && pagevec_lookup_entries(&pvec, mapping, index, min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1, indices)) { - mem_cgroup_uncharge_start(); for (i = 0; i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; @@ -655,7 +647,6 @@ int invalidate_inode_pages2_range(struct address_space *mapping, } pagevec_remove_exceptionals(&pvec); pagevec_release(&pvec); - mem_cgroup_uncharge_end(); cond_resched(); index++; } diff --git a/mm/vmscan.c b/mm/vmscan.c index d2f65c8..7068e83 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -577,9 +577,10 @@ static int __remove_mapping(struct address_space *mapping, struct page *page, if (PageSwapCache(page)) { swp_entry_t swap = { .val = page_private(page) }; + mem_cgroup_swapout(page, swap); __delete_from_swap_cache(page); spin_unlock_irq(&mapping->tree_lock); - swapcache_free(swap, page); + swapcache_free(swap); } else { void (*freepage)(struct page *); void *shadow = NULL; @@ -600,7 +601,6 @@ static int __remove_mapping(struct address_space *mapping, struct page *page, shadow = workingset_eviction(mapping, page); __delete_from_page_cache(page, shadow); spin_unlock_irq(&mapping->tree_lock); - mem_cgroup_uncharge_cache_page(page); if (freepage != NULL) freepage(page); @@ -1103,6 +1103,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, */ __clear_page_locked(page); free_it: + mem_cgroup_uncharge(page); nr_reclaimed++; /* @@ -1132,12 +1133,13 @@ keep: list_add(&page->lru, &ret_pages); VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page); } + mem_cgroup_uncharge_end(); free_hot_cold_page_list(&free_pages, true); list_splice(&ret_pages, page_list); count_vm_events(PGACTIVATE, pgactivate); - mem_cgroup_uncharge_end(); + *ret_nr_dirty += nr_dirty; *ret_nr_congested += nr_congested; *ret_nr_unqueued_dirty += nr_unqueued_dirty; @@ -1435,6 +1437,8 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) __ClearPageActive(page); del_page_from_lru_list(page, lruvec, lru); + mem_cgroup_uncharge(page); + if (unlikely(PageCompound(page))) { spin_unlock_irq(&zone->lru_lock); (*get_compound_page_dtor(page))(page); @@ -1656,6 +1660,8 @@ static void move_active_pages_to_lru(struct lruvec *lruvec, __ClearPageActive(page); del_page_from_lru_list(page, lruvec, lru); + mem_cgroup_uncharge(page); + if (unlikely(PageCompound(page))) { spin_unlock_irq(&zone->lru_lock); (*get_compound_page_dtor(page))(page); diff --git a/mm/zswap.c b/mm/zswap.c index 032c21e..9da56af 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -507,7 +507,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry, * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free(entry); } while (err != -ENOMEM); if (new_page) -- cgit v0.10.2 From 747db954cab64c6b7a95b121b517165f34751898 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Fri, 8 Aug 2014 14:19:24 -0700 Subject: mm: memcontrol: use page lists for uncharge batching Pages are now uncharged at release time, and all sources of batched uncharges operate on lists of pages. Directly use those lists, and get rid of the per-task batching state. This also batches statistics accounting, in addition to the res counter charges, to reduce IRQ-disabling and re-enabling. Signed-off-by: Johannes Weiner Acked-by: Michal Hocko Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Naoya Horiguchi Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 806b8fa1..e0752d2 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -59,12 +59,8 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm, void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg, bool lrucare); void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg); - void mem_cgroup_uncharge(struct page *page); - -/* Batched uncharging */ -void mem_cgroup_uncharge_start(void); -void mem_cgroup_uncharge_end(void); +void mem_cgroup_uncharge_list(struct list_head *page_list); void mem_cgroup_migrate(struct page *oldpage, struct page *newpage, bool lrucare); @@ -233,11 +229,7 @@ static inline void mem_cgroup_uncharge(struct page *page) { } -static inline void mem_cgroup_uncharge_start(void) -{ -} - -static inline void mem_cgroup_uncharge_end(void) +static inline void mem_cgroup_uncharge_list(struct list_head *page_list) { } diff --git a/include/linux/sched.h b/include/linux/sched.h index 7c19d55..4fcf82a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1628,12 +1628,6 @@ struct task_struct { unsigned long trace_recursion; #endif /* CONFIG_TRACING */ #ifdef CONFIG_MEMCG /* memcg uses this to do batch job */ - struct memcg_batch_info { - int do_batch; /* incremented when batch uncharge started */ - struct mem_cgroup *memcg; /* target memcg of uncharge */ - unsigned long nr_pages; /* uncharged usage */ - unsigned long memsw_nr_pages; /* uncharged mem+swap usage */ - } memcg_batch; unsigned int memcg_kmem_skip_account; struct memcg_oom_info { struct mem_cgroup *memcg; diff --git a/kernel/fork.c b/kernel/fork.c index fbd3497..f6f5086 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1346,10 +1346,6 @@ static struct task_struct *copy_process(unsigned long clone_flags, #ifdef CONFIG_DEBUG_MUTEXES p->blocked_on = NULL; /* not blocked yet */ #endif -#ifdef CONFIG_MEMCG - p->memcg_batch.do_batch = 0; - p->memcg_batch.memcg = NULL; -#endif #ifdef CONFIG_BCACHE p->sequential_io = 0; p->sequential_io_avg = 0; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 9106f1b..a6e2be0 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3581,53 +3581,6 @@ out: return ret; } -/* - * Batch_start/batch_end is called in unmap_page_range/invlidate/trucate. - * In that cases, pages are freed continuously and we can expect pages - * are in the same memcg. All these calls itself limits the number of - * pages freed at once, then uncharge_start/end() is called properly. - * This may be called prural(2) times in a context, - */ - -void mem_cgroup_uncharge_start(void) -{ - unsigned long flags; - - local_irq_save(flags); - current->memcg_batch.do_batch++; - /* We can do nest. */ - if (current->memcg_batch.do_batch == 1) { - current->memcg_batch.memcg = NULL; - current->memcg_batch.nr_pages = 0; - current->memcg_batch.memsw_nr_pages = 0; - } - local_irq_restore(flags); -} - -void mem_cgroup_uncharge_end(void) -{ - struct memcg_batch_info *batch = ¤t->memcg_batch; - unsigned long flags; - - local_irq_save(flags); - VM_BUG_ON(!batch->do_batch); - if (--batch->do_batch) /* If stacked, do nothing */ - goto out; - /* - * This "batch->memcg" is valid without any css_get/put etc... - * bacause we hide charges behind us. - */ - if (batch->nr_pages) - res_counter_uncharge(&batch->memcg->res, - batch->nr_pages * PAGE_SIZE); - if (batch->memsw_nr_pages) - res_counter_uncharge(&batch->memcg->memsw, - batch->memsw_nr_pages * PAGE_SIZE); - memcg_oom_recover(batch->memcg); -out: - local_irq_restore(flags); -} - #ifdef CONFIG_MEMCG_SWAP static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg, bool charge) @@ -6554,6 +6507,98 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg) cancel_charge(memcg, nr_pages); } +static void uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout, + unsigned long nr_mem, unsigned long nr_memsw, + unsigned long nr_anon, unsigned long nr_file, + unsigned long nr_huge, struct page *dummy_page) +{ + unsigned long flags; + + if (nr_mem) + res_counter_uncharge(&memcg->res, nr_mem * PAGE_SIZE); + if (nr_memsw) + res_counter_uncharge(&memcg->memsw, nr_memsw * PAGE_SIZE); + + memcg_oom_recover(memcg); + + local_irq_save(flags); + __this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_RSS], nr_anon); + __this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_CACHE], nr_file); + __this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_RSS_HUGE], nr_huge); + __this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT], pgpgout); + __this_cpu_add(memcg->stat->nr_page_events, nr_anon + nr_file); + memcg_check_events(memcg, dummy_page); + local_irq_restore(flags); +} + +static void uncharge_list(struct list_head *page_list) +{ + struct mem_cgroup *memcg = NULL; + unsigned long nr_memsw = 0; + unsigned long nr_anon = 0; + unsigned long nr_file = 0; + unsigned long nr_huge = 0; + unsigned long pgpgout = 0; + unsigned long nr_mem = 0; + struct list_head *next; + struct page *page; + + next = page_list->next; + do { + unsigned int nr_pages = 1; + struct page_cgroup *pc; + + page = list_entry(next, struct page, lru); + next = page->lru.next; + + VM_BUG_ON_PAGE(PageLRU(page), page); + VM_BUG_ON_PAGE(page_count(page), page); + + pc = lookup_page_cgroup(page); + if (!PageCgroupUsed(pc)) + continue; + + /* + * Nobody should be changing or seriously looking at + * pc->mem_cgroup and pc->flags at this point, we have + * fully exclusive access to the page. + */ + + if (memcg != pc->mem_cgroup) { + if (memcg) { + uncharge_batch(memcg, pgpgout, nr_mem, nr_memsw, + nr_anon, nr_file, nr_huge, page); + pgpgout = nr_mem = nr_memsw = 0; + nr_anon = nr_file = nr_huge = 0; + } + memcg = pc->mem_cgroup; + } + + if (PageTransHuge(page)) { + nr_pages <<= compound_order(page); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); + nr_huge += nr_pages; + } + + if (PageAnon(page)) + nr_anon += nr_pages; + else + nr_file += nr_pages; + + if (pc->flags & PCG_MEM) + nr_mem += nr_pages; + if (pc->flags & PCG_MEMSW) + nr_memsw += nr_pages; + pc->flags = 0; + + pgpgout++; + } while (next != page_list); + + if (memcg) + uncharge_batch(memcg, pgpgout, nr_mem, nr_memsw, + nr_anon, nr_file, nr_huge, page); +} + /** * mem_cgroup_uncharge - uncharge a page * @page: page to uncharge @@ -6563,67 +6608,34 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg) */ void mem_cgroup_uncharge(struct page *page) { - struct memcg_batch_info *batch; - unsigned int nr_pages = 1; - struct mem_cgroup *memcg; struct page_cgroup *pc; - unsigned long pc_flags; - unsigned long flags; - - VM_BUG_ON_PAGE(PageLRU(page), page); - VM_BUG_ON_PAGE(page_count(page), page); if (mem_cgroup_disabled()) return; + /* Don't touch page->lru of any random page, pre-check: */ pc = lookup_page_cgroup(page); - - /* Every final put_page() ends up here */ if (!PageCgroupUsed(pc)) return; - if (PageTransHuge(page)) { - nr_pages <<= compound_order(page); - VM_BUG_ON_PAGE(!PageTransHuge(page), page); - } - /* - * Nobody should be changing or seriously looking at - * pc->mem_cgroup and pc->flags at this point, we have fully - * exclusive access to the page. - */ - memcg = pc->mem_cgroup; - pc_flags = pc->flags; - pc->flags = 0; - - local_irq_save(flags); + INIT_LIST_HEAD(&page->lru); + uncharge_list(&page->lru); +} - if (nr_pages > 1) - goto direct; - if (unlikely(test_thread_flag(TIF_MEMDIE))) - goto direct; - batch = ¤t->memcg_batch; - if (!batch->do_batch) - goto direct; - if (batch->memcg && batch->memcg != memcg) - goto direct; - if (!batch->memcg) - batch->memcg = memcg; - if (pc_flags & PCG_MEM) - batch->nr_pages++; - if (pc_flags & PCG_MEMSW) - batch->memsw_nr_pages++; - goto out; -direct: - if (pc_flags & PCG_MEM) - res_counter_uncharge(&memcg->res, nr_pages * PAGE_SIZE); - if (pc_flags & PCG_MEMSW) - res_counter_uncharge(&memcg->memsw, nr_pages * PAGE_SIZE); - memcg_oom_recover(memcg); -out: - mem_cgroup_charge_statistics(memcg, page, -nr_pages); - memcg_check_events(memcg, page); +/** + * mem_cgroup_uncharge_list - uncharge a list of page + * @page_list: list of pages to uncharge + * + * Uncharge a list of pages previously charged with + * mem_cgroup_try_charge() and mem_cgroup_commit_charge(). + */ +void mem_cgroup_uncharge_list(struct list_head *page_list) +{ + if (mem_cgroup_disabled()) + return; - local_irq_restore(flags); + if (!list_empty(page_list)) + uncharge_list(page_list); } /** diff --git a/mm/swap.c b/mm/swap.c index 00523ff..6b2dc38 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -908,8 +908,6 @@ void release_pages(struct page **pages, int nr, bool cold) struct lruvec *lruvec; unsigned long uninitialized_var(flags); - mem_cgroup_uncharge_start(); - for (i = 0; i < nr; i++) { struct page *page = pages[i]; @@ -941,7 +939,6 @@ void release_pages(struct page **pages, int nr, bool cold) __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); } - mem_cgroup_uncharge(page); /* Clear Active bit in case of parallel mark_page_accessed */ __ClearPageActive(page); @@ -951,8 +948,7 @@ void release_pages(struct page **pages, int nr, bool cold) if (zone) spin_unlock_irqrestore(&zone->lru_lock, flags); - mem_cgroup_uncharge_end(); - + mem_cgroup_uncharge_list(&pages_to_free); free_hot_cold_page_list(&pages_to_free, cold); } EXPORT_SYMBOL(release_pages); diff --git a/mm/vmscan.c b/mm/vmscan.c index 7068e83..2836b53 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -822,7 +822,6 @@ static unsigned long shrink_page_list(struct list_head *page_list, cond_resched(); - mem_cgroup_uncharge_start(); while (!list_empty(page_list)) { struct address_space *mapping; struct page *page; @@ -1103,7 +1102,6 @@ static unsigned long shrink_page_list(struct list_head *page_list, */ __clear_page_locked(page); free_it: - mem_cgroup_uncharge(page); nr_reclaimed++; /* @@ -1133,8 +1131,8 @@ keep: list_add(&page->lru, &ret_pages); VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page); } - mem_cgroup_uncharge_end(); + mem_cgroup_uncharge_list(&free_pages); free_hot_cold_page_list(&free_pages, true); list_splice(&ret_pages, page_list); @@ -1437,10 +1435,9 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) __ClearPageActive(page); del_page_from_lru_list(page, lruvec, lru); - mem_cgroup_uncharge(page); - if (unlikely(PageCompound(page))) { spin_unlock_irq(&zone->lru_lock); + mem_cgroup_uncharge(page); (*get_compound_page_dtor(page))(page); spin_lock_irq(&zone->lru_lock); } else @@ -1548,6 +1545,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, spin_unlock_irq(&zone->lru_lock); + mem_cgroup_uncharge_list(&page_list); free_hot_cold_page_list(&page_list, true); /* @@ -1660,10 +1658,9 @@ static void move_active_pages_to_lru(struct lruvec *lruvec, __ClearPageActive(page); del_page_from_lru_list(page, lruvec, lru); - mem_cgroup_uncharge(page); - if (unlikely(PageCompound(page))) { spin_unlock_irq(&zone->lru_lock); + mem_cgroup_uncharge(page); (*get_compound_page_dtor(page))(page); spin_lock_irq(&zone->lru_lock); } else @@ -1771,6 +1768,7 @@ static void shrink_active_list(unsigned long nr_to_scan, __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken); spin_unlock_irq(&zone->lru_lock); + mem_cgroup_uncharge_list(&l_hold); free_hot_cold_page_list(&l_hold, true); } -- cgit v0.10.2 From 434584fe68155f884b19f32b3befec8972c5d563 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Fri, 8 Aug 2014 14:19:26 -0700 Subject: page-cgroup: trivial cleanup Add forward declarations for struct pglist_data, mem_cgroup. Remove __init, __meminit from function prototypes and inline functions. Remove redundant inclusion of bit_spinlock.h. Signed-off-by: Vladimir Davydov Acked-by: Michal Hocko Acked-by: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h index 9bfb8e6..b8f8c9e 100644 --- a/include/linux/page_cgroup.h +++ b/include/linux/page_cgroup.h @@ -12,8 +12,10 @@ enum { #ifndef __GENERATING_BOUNDS_H #include +struct pglist_data; + #ifdef CONFIG_MEMCG -#include +struct mem_cgroup; /* * Page Cgroup can be considered as an extended mem_map. @@ -27,16 +29,16 @@ struct page_cgroup { struct mem_cgroup *mem_cgroup; }; -void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat); +extern void pgdat_page_cgroup_init(struct pglist_data *pgdat); #ifdef CONFIG_SPARSEMEM -static inline void __init page_cgroup_init_flatmem(void) +static inline void page_cgroup_init_flatmem(void) { } -extern void __init page_cgroup_init(void); +extern void page_cgroup_init(void); #else -void __init page_cgroup_init_flatmem(void); -static inline void __init page_cgroup_init(void) +extern void page_cgroup_init_flatmem(void); +static inline void page_cgroup_init(void) { } #endif @@ -48,11 +50,10 @@ static inline int PageCgroupUsed(struct page_cgroup *pc) { return !!(pc->flags & PCG_USED); } - -#else /* CONFIG_MEMCG */ +#else /* !CONFIG_MEMCG */ struct page_cgroup; -static inline void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat) +static inline void pgdat_page_cgroup_init(struct pglist_data *pgdat) { } @@ -65,10 +66,9 @@ static inline void page_cgroup_init(void) { } -static inline void __init page_cgroup_init_flatmem(void) +static inline void page_cgroup_init_flatmem(void) { } - #endif /* CONFIG_MEMCG */ #include -- cgit v0.10.2 From 9a3f4d85d58cb4e02e226f9be946d54c33eb715b Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Fri, 8 Aug 2014 14:19:28 -0700 Subject: page-cgroup: get rid of NR_PCG_FLAGS It's not used anywhere today, so let's remove it. Signed-off-by: Vladimir Davydov Acked-by: Michal Hocko Acked-by: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h index b8f8c9e..9d9f540 100644 --- a/include/linux/page_cgroup.h +++ b/include/linux/page_cgroup.h @@ -6,12 +6,8 @@ enum { PCG_USED = 0x01, /* This page is charged to a memcg */ PCG_MEM = 0x02, /* This page holds a memory charge */ PCG_MEMSW = 0x04, /* This page holds a memory+swap charge */ - __NR_PCG_FLAGS, }; -#ifndef __GENERATING_BOUNDS_H -#include - struct pglist_data; #ifdef CONFIG_MEMCG @@ -107,6 +103,4 @@ static inline void swap_cgroup_swapoff(int type) #endif /* CONFIG_MEMCG_SWAP */ -#endif /* !__GENERATING_BOUNDS_H */ - #endif /* __LINUX_PAGE_CGROUP_H */ diff --git a/kernel/bounds.c b/kernel/bounds.c index 9fd4246..e1d1d195 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -9,7 +9,6 @@ #include #include #include -#include #include #include @@ -18,7 +17,6 @@ void foo(void) /* The enum constants to put into include/generated/bounds.h */ DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS); DEFINE(MAX_NR_ZONES, __MAX_NR_ZONES); - DEFINE(NR_PCG_FLAGS, __NR_PCG_FLAGS); #ifdef CONFIG_SMP DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS)); #endif -- cgit v0.10.2 From 3cbb01871e22709fdd39478eca831de317df332f Mon Sep 17 00:00:00 2001 From: Greg Thelen Date: Fri, 8 Aug 2014 14:19:31 -0700 Subject: memcg: remove lookup_cgroup_page() prototype Commit 6b208e3f6e35 ("mm: memcg: remove unused node/section info from pc->flags") deleted lookup_cgroup_page() but left a prototype for it. Kill the vestigial prototype. Signed-off-by: Greg Thelen Cc: Johannes Weiner Acked-by: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h index 9d9f540..5c831f1 100644 --- a/include/linux/page_cgroup.h +++ b/include/linux/page_cgroup.h @@ -40,7 +40,6 @@ static inline void page_cgroup_init(void) #endif struct page_cgroup *lookup_page_cgroup(struct page *page); -struct page *lookup_cgroup_page(struct page_cgroup *pc); static inline int PageCgroupUsed(struct page_cgroup *pc) { -- cgit v0.10.2 From 6abb5a867ba0866cb21827b172cee6aa71244bd1 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Fri, 8 Aug 2014 14:19:33 -0700 Subject: mm: memcontrol: avoid charge statistics churn during page migration Charge migration currently disables IRQs twice to update the charge statistics for the old page and then again for the new page. But migration is a seamless transition of a charge from one physical page to another one of the same size, so this should be a non-event from an accounting point of view. Leave the statistics alone. Signed-off-by: Johannes Weiner Acked-by: Michal Hocko Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a6e2be0..ec4dcf1 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2728,7 +2728,7 @@ static void unlock_page_lru(struct page *page, int isolated) } static void commit_charge(struct page *page, struct mem_cgroup *memcg, - unsigned int nr_pages, bool lrucare) + bool lrucare) { struct page_cgroup *pc = lookup_page_cgroup(page); int isolated; @@ -2765,16 +2765,6 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg, if (lrucare) unlock_page_lru(page, isolated); - - local_irq_disable(); - mem_cgroup_charge_statistics(memcg, page, nr_pages); - /* - * "charge_statistics" updated event counter. Then, check it. - * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree. - * if they exceeds softlimit. - */ - memcg_check_events(memcg, page); - local_irq_enable(); } static DEFINE_MUTEX(set_limit_mutex); @@ -6460,12 +6450,17 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg, if (!memcg) return; + commit_charge(page, memcg, lrucare); + if (PageTransHuge(page)) { nr_pages <<= compound_order(page); VM_BUG_ON_PAGE(!PageTransHuge(page), page); } - commit_charge(page, memcg, nr_pages, lrucare); + local_irq_disable(); + mem_cgroup_charge_statistics(memcg, page, nr_pages); + memcg_check_events(memcg, page); + local_irq_enable(); if (do_swap_account && PageSwapCache(page)) { swp_entry_t entry = { .val = page_private(page) }; @@ -6651,7 +6646,6 @@ void mem_cgroup_uncharge_list(struct list_head *page_list) void mem_cgroup_migrate(struct page *oldpage, struct page *newpage, bool lrucare) { - unsigned int nr_pages = 1; struct page_cgroup *pc; int isolated; @@ -6660,6 +6654,8 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage, VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage); VM_BUG_ON_PAGE(!lrucare && PageLRU(newpage), newpage); VM_BUG_ON_PAGE(PageAnon(oldpage) != PageAnon(newpage), newpage); + VM_BUG_ON_PAGE(PageTransHuge(oldpage) != PageTransHuge(newpage), + newpage); if (mem_cgroup_disabled()) return; @@ -6677,12 +6673,6 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage, VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage); VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage); - if (PageTransHuge(oldpage)) { - nr_pages <<= compound_order(oldpage); - VM_BUG_ON_PAGE(!PageTransHuge(oldpage), oldpage); - VM_BUG_ON_PAGE(!PageTransHuge(newpage), newpage); - } - if (lrucare) lock_page_lru(oldpage, &isolated); @@ -6691,12 +6681,7 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage, if (lrucare) unlock_page_lru(oldpage, isolated); - local_irq_disable(); - mem_cgroup_charge_statistics(pc->mem_cgroup, oldpage, -nr_pages); - memcg_check_events(pc->mem_cgroup, oldpage); - local_irq_enable(); - - commit_charge(newpage, pc->mem_cgroup, nr_pages, lrucare); + commit_charge(newpage, pc->mem_cgroup, lrucare); } /* -- cgit v0.10.2 From c119239b16e0a45ca303cdcb082733dc4de84a45 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:19:35 -0700 Subject: mm/zswap.c: add __init to zswap_entry_cache_destroy() zswap_entry_cache_destroy() is only called by __init init_zswap(). This patch also fixes function name zswap_entry_cache_ s/destory/destroy Signed-off-by: Fabian Frederick Acked-by: Seth Jennings Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/zswap.c b/mm/zswap.c index 9da56af..ea064c1 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -212,7 +212,7 @@ static int zswap_entry_cache_create(void) return zswap_entry_cache == NULL; } -static void zswap_entry_cache_destory(void) +static void __init zswap_entry_cache_destroy(void) { kmem_cache_destroy(zswap_entry_cache); } @@ -941,7 +941,7 @@ static int __init init_zswap(void) pcpufail: zswap_comp_exit(); compfail: - zswap_entry_cache_destory(); + zswap_entry_cache_destroy(); cachefail: zpool_destroy_pool(zswap_pool); error: -- cgit v0.10.2 From ca35664031e166ed57659daace5527f16d42e40e Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:19:39 -0700 Subject: fs/efs/namei.c: return is not a function Fix checkpatch errors: "ERROR: return is not a function, parentheses are not required" Signed-off-by: Fabian Frederick Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/efs/namei.c b/fs/efs/namei.c index 356c044..bbee8f0 100644 --- a/fs/efs/namei.c +++ b/fs/efs/namei.c @@ -12,7 +12,8 @@ #include "efs.h" -static efs_ino_t efs_find_entry(struct inode *inode, const char *name, int len) { +static efs_ino_t efs_find_entry(struct inode *inode, const char *name, int len) +{ struct buffer_head *bh; int slot, namelen; @@ -40,10 +41,10 @@ static efs_ino_t efs_find_entry(struct inode *inode, const char *name, int len) if (be16_to_cpu(dirblock->magic) != EFS_DIRBLK_MAGIC) { pr_err("%s(): invalid directory block\n", __func__); brelse(bh); - return(0); + return 0; } - for(slot = 0; slot < dirblock->slots; slot++) { + for (slot = 0; slot < dirblock->slots; slot++) { dirslot = (struct efs_dentry *) (((char *) bh->b_data) + EFS_SLOTAT(dirblock, slot)); namelen = dirslot->namelen; @@ -52,12 +53,12 @@ static efs_ino_t efs_find_entry(struct inode *inode, const char *name, int len) if ((namelen == len) && (!memcmp(name, nameptr, len))) { inodenum = be32_to_cpu(dirslot->inode); brelse(bh); - return(inodenum); + return inodenum; } } brelse(bh); } - return(0); + return 0; } struct dentry *efs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) -- cgit v0.10.2 From b86280aa48b67c8119ed8f6c6bebd8c0af13a269 Mon Sep 17 00:00:00 2001 From: Namhyung Kim Date: Fri, 8 Aug 2014 14:19:41 -0700 Subject: kernel/kallsyms.c: fix %pB when there's no symbol at the address __sprint_symbol() should restore original address when kallsyms_lookup() failed to find a symbol. It's reported when dumpstack shows an address in a dynamically allocated trampoline for ftrace. [ 1314.612287] [] dump_stack+0x45/0x56 [ 1314.612290] [] ? meminfo_proc_open+0x30/0x30 [ 1314.612293] [] kpatch_ftrace_handler+0x14/0xf0 [kpatch] [ 1314.612306] [] 0xffffffffa00160c3 You can see a difference in the hex address - c4 and c3. Fix it. Signed-off-by: Namhyung Kim Reported-by: Masami Hiramatsu Cc: Steven Rostedt Cc: Frederic Weisbecker Cc: Josh Poimboeuf Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index cb0cf37..ae51670 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -364,7 +364,7 @@ static int __sprint_symbol(char *buffer, unsigned long address, address += symbol_offset; name = kallsyms_lookup(address, &size, &offset, &modname, buffer); if (!name) - return sprintf(buffer, "0x%lx", address); + return sprintf(buffer, "0x%lx", address - symbol_offset); if (name != buffer) strcpy(buffer, name); -- cgit v0.10.2 From 6f4535ed7d626c3760243b9c5223baaa358a5fdd Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:19:43 -0700 Subject: fs/ramfs/file-nommu.c: replace count*size kzalloc by kcalloc Signed-off-by: Fabian Frederick Cc: Axel Lin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c index dda012a..bbafbde 100644 --- a/fs/ramfs/file-nommu.c +++ b/fs/ramfs/file-nommu.c @@ -222,7 +222,7 @@ static unsigned long ramfs_nommu_get_unmapped_area(struct file *file, /* gang-find the pages */ ret = -ENOMEM; - pages = kzalloc(lpages * sizeof(struct page *), GFP_KERNEL); + pages = kcalloc(lpages, sizeof(struct page *), GFP_KERNEL); if (!pages) goto out_free; -- cgit v0.10.2 From 4dfe694f616e00e6fd83e5bbcd7a3c4d7113493d Mon Sep 17 00:00:00 2001 From: Paul Gortmaker Date: Fri, 8 Aug 2014 14:19:46 -0700 Subject: init: make rootdelay=N consistent with rootwait behaviour Currently rootdelay=N and rootwait behave differently (aside from the obvious unbounded wait duration) because they are at different places in the init sequence. The difference manifests itself for md devices because the call to md_run_setup() lives between rootdelay and rootwait, so if you try to use rootdelay=20 to try and allow a slow RAID0 array to assemble, you get this: [ 4.526011] sd 6:0:0:0: [sdc] Attached SCSI removable disk [ 22.972079] md: Waiting for all devices to be available before autodetect i.e. you've achieved nothing other than delaying the probing 20s, when what you wanted was a 20s delay _after_ the probing for md devices was initiated. Here we move the rootdelay code to be right beside the rootwait code, so that their behaviour is consistent. It should be noted that in doing so, the actions based on the saved_root_name[0] and initrd_load() were previously put on hold by rootdelay=N and now currently will not be delayed. However, I think consistent behaviour is more important than matching historical behaviour of delaying the above two operations. Signed-off-by: Paul Gortmaker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/init/do_mounts.c b/init/do_mounts.c index 82f2288..b6237c3 100644 --- a/init/do_mounts.c +++ b/init/do_mounts.c @@ -539,12 +539,6 @@ void __init prepare_namespace(void) { int is_floppy; - if (root_delay) { - printk(KERN_INFO "Waiting %d sec before mounting root device...\n", - root_delay); - ssleep(root_delay); - } - /* * wait for the known devices to complete their probing * @@ -571,6 +565,12 @@ void __init prepare_namespace(void) if (initrd_load()) goto out; + if (root_delay) { + pr_info("Waiting %d sec before mounting root device...\n", + root_delay); + ssleep(root_delay); + } + /* wait for any asynchronous scanning to complete */ if ((ROOT_DEV == 0) && root_wait) { printk(KERN_INFO "Waiting for root device %s...\n", -- cgit v0.10.2 From 4878b14b43188ffeceecfc32295ed2a783b7aa7a Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:19:48 -0700 Subject: kernel/test_kprobes.c: use current logging functions - Add pr_fmt - Coalesce formats - Use current pr_foo() functions instead of printk - Remove unnecessary "failed" display (already in log level). Signed-off-by: Fabian Frederick Cc: Ananth N Mavinakayanahalli Cc: Anil S Keshavamurthy Cc: "David S. Miller" Cc: Masami Hiramatsu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/test_kprobes.c b/kernel/test_kprobes.c index 12d6ebbf..0dbab6d 100644 --- a/kernel/test_kprobes.c +++ b/kernel/test_kprobes.c @@ -14,6 +14,8 @@ * the GNU General Public License for more details. */ +#define pr_fmt(fmt) "Kprobe smoke test: " fmt + #include #include #include @@ -41,8 +43,7 @@ static void kp_post_handler(struct kprobe *p, struct pt_regs *regs, { if (preh_val != (rand1 / div_factor)) { handler_errors++; - printk(KERN_ERR "Kprobe smoke test failed: " - "incorrect value in post_handler\n"); + pr_err("incorrect value in post_handler\n"); } posth_val = preh_val + div_factor; } @@ -59,8 +60,7 @@ static int test_kprobe(void) ret = register_kprobe(&kp); if (ret < 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "register_kprobe returned %d\n", ret); + pr_err("register_kprobe returned %d\n", ret); return ret; } @@ -68,14 +68,12 @@ static int test_kprobe(void) unregister_kprobe(&kp); if (preh_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kprobe pre_handler not called\n"); + pr_err("kprobe pre_handler not called\n"); handler_errors++; } if (posth_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kprobe post_handler not called\n"); + pr_err("kprobe post_handler not called\n"); handler_errors++; } @@ -98,8 +96,7 @@ static void kp_post_handler2(struct kprobe *p, struct pt_regs *regs, { if (preh_val != (rand1 / div_factor) + 1) { handler_errors++; - printk(KERN_ERR "Kprobe smoke test failed: " - "incorrect value in post_handler2\n"); + pr_err("incorrect value in post_handler2\n"); } posth_val = preh_val + div_factor; } @@ -120,8 +117,7 @@ static int test_kprobes(void) kp.flags = 0; ret = register_kprobes(kps, 2); if (ret < 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "register_kprobes returned %d\n", ret); + pr_err("register_kprobes returned %d\n", ret); return ret; } @@ -130,14 +126,12 @@ static int test_kprobes(void) ret = target(rand1); if (preh_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kprobe pre_handler not called\n"); + pr_err("kprobe pre_handler not called\n"); handler_errors++; } if (posth_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kprobe post_handler not called\n"); + pr_err("kprobe post_handler not called\n"); handler_errors++; } @@ -146,14 +140,12 @@ static int test_kprobes(void) ret = target2(rand1); if (preh_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kprobe pre_handler2 not called\n"); + pr_err("kprobe pre_handler2 not called\n"); handler_errors++; } if (posth_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kprobe post_handler2 not called\n"); + pr_err("kprobe post_handler2 not called\n"); handler_errors++; } @@ -166,8 +158,7 @@ static u32 j_kprobe_target(u32 value) { if (value != rand1) { handler_errors++; - printk(KERN_ERR "Kprobe smoke test failed: " - "incorrect value in jprobe handler\n"); + pr_err("incorrect value in jprobe handler\n"); } jph_val = rand1; @@ -186,16 +177,14 @@ static int test_jprobe(void) ret = register_jprobe(&jp); if (ret < 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "register_jprobe returned %d\n", ret); + pr_err("register_jprobe returned %d\n", ret); return ret; } ret = target(rand1); unregister_jprobe(&jp); if (jph_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "jprobe handler not called\n"); + pr_err("jprobe handler not called\n"); handler_errors++; } @@ -217,24 +206,21 @@ static int test_jprobes(void) jp.kp.flags = 0; ret = register_jprobes(jps, 2); if (ret < 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "register_jprobes returned %d\n", ret); + pr_err("register_jprobes returned %d\n", ret); return ret; } jph_val = 0; ret = target(rand1); if (jph_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "jprobe handler not called\n"); + pr_err("jprobe handler not called\n"); handler_errors++; } jph_val = 0; ret = target2(rand1); if (jph_val == 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "jprobe handler2 not called\n"); + pr_err("jprobe handler2 not called\n"); handler_errors++; } unregister_jprobes(jps, 2); @@ -256,13 +242,11 @@ static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs) if (ret != (rand1 / div_factor)) { handler_errors++; - printk(KERN_ERR "Kprobe smoke test failed: " - "incorrect value in kretprobe handler\n"); + pr_err("incorrect value in kretprobe handler\n"); } if (krph_val == 0) { handler_errors++; - printk(KERN_ERR "Kprobe smoke test failed: " - "call to kretprobe entry handler failed\n"); + pr_err("call to kretprobe entry handler failed\n"); } krph_val = rand1; @@ -281,16 +265,14 @@ static int test_kretprobe(void) ret = register_kretprobe(&rp); if (ret < 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "register_kretprobe returned %d\n", ret); + pr_err("register_kretprobe returned %d\n", ret); return ret; } ret = target(rand1); unregister_kretprobe(&rp); if (krph_val != rand1) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kretprobe handler not called\n"); + pr_err("kretprobe handler not called\n"); handler_errors++; } @@ -303,13 +285,11 @@ static int return_handler2(struct kretprobe_instance *ri, struct pt_regs *regs) if (ret != (rand1 / div_factor) + 1) { handler_errors++; - printk(KERN_ERR "Kprobe smoke test failed: " - "incorrect value in kretprobe handler2\n"); + pr_err("incorrect value in kretprobe handler2\n"); } if (krph_val == 0) { handler_errors++; - printk(KERN_ERR "Kprobe smoke test failed: " - "call to kretprobe entry handler failed\n"); + pr_err("call to kretprobe entry handler failed\n"); } krph_val = rand1; @@ -332,24 +312,21 @@ static int test_kretprobes(void) rp.kp.flags = 0; ret = register_kretprobes(rps, 2); if (ret < 0) { - printk(KERN_ERR "Kprobe smoke test failed: " - "register_kretprobe returned %d\n", ret); + pr_err("register_kretprobe returned %d\n", ret); return ret; } krph_val = 0; ret = target(rand1); if (krph_val != rand1) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kretprobe handler not called\n"); + pr_err("kretprobe handler not called\n"); handler_errors++; } krph_val = 0; ret = target2(rand1); if (krph_val != rand1) { - printk(KERN_ERR "Kprobe smoke test failed: " - "kretprobe handler2 not called\n"); + pr_err("kretprobe handler2 not called\n"); handler_errors++; } unregister_kretprobes(rps, 2); @@ -368,7 +345,7 @@ int init_test_probes(void) rand1 = prandom_u32(); } while (rand1 <= div_factor); - printk(KERN_INFO "Kprobe smoke test started\n"); + pr_info("started\n"); num_tests++; ret = test_kprobe(); if (ret < 0) @@ -402,13 +379,11 @@ int init_test_probes(void) #endif /* CONFIG_KRETPROBES */ if (errors) - printk(KERN_ERR "BUG: Kprobe smoke test: %d out of " - "%d tests failed\n", errors, num_tests); + pr_err("BUG: %d out of %d tests failed\n", errors, num_tests); else if (handler_errors) - printk(KERN_ERR "BUG: Kprobe smoke test: %d error(s) " - "running handlers\n", handler_errors); + pr_err("BUG: %d error(s) running handlers\n", handler_errors); else - printk(KERN_INFO "Kprobe smoke test passed successfully\n"); + pr_info("passed successfully\n"); return 0; } -- cgit v0.10.2 From 26b7a54a35cbb2c1b980b339cc1d600edd5f153d Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 8 Aug 2014 14:19:50 -0700 Subject: autofs4: remove unused autofs4_ispending() Signed-off-by: NeilBrown Acked-by: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h index acf3205..22a2801 100644 --- a/fs/autofs4/autofs_i.h +++ b/fs/autofs4/autofs_i.h @@ -143,20 +143,6 @@ static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) { return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; } -/* Does a dentry have some pending activity? */ -static inline int autofs4_ispending(struct dentry *dentry) -{ - struct autofs_info *inf = autofs4_dentry_ino(dentry); - - if (inf->flags & AUTOFS_INF_PENDING) - return 1; - - if (inf->flags & AUTOFS_INF_EXPIRING) - return 1; - - return 0; -} - struct inode *autofs4_get_inode(struct super_block *, umode_t); void autofs4_free_ino(struct autofs_info *); -- cgit v0.10.2 From c312442fe3ce9561137910db35ad1409d67dbac6 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 8 Aug 2014 14:19:52 -0700 Subject: autofs4: remove a redundant assignment The variable 'ino' already exists and already has the correct value. The d_fsdata of a dentry is never changed after the d_fsdata is instantiated, so this new assignment cannot be necessary. It was introduced in commit b5b801779d59 ("autofs4: Add d_manage() dentry operation"). Signed-off-by: NeilBrown Acked-by: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c index 394e90b..a7be57e 100644 --- a/fs/autofs4/expire.c +++ b/fs/autofs4/expire.c @@ -333,7 +333,6 @@ struct dentry *autofs4_expire_direct(struct super_block *sb, if (ino->flags & AUTOFS_INF_PENDING) goto out; if (!autofs4_direct_busy(mnt, root, timeout, do_now)) { - struct autofs_info *ino = autofs4_dentry_ino(root); ino->flags |= AUTOFS_INF_EXPIRING; init_completion(&ino->expire_complete); spin_unlock(&sbi->fs_lock); -- cgit v0.10.2 From 668128e90b9596668141539326178dacc42af188 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 8 Aug 2014 14:19:54 -0700 Subject: autofs4: don't take spinlock when not needed in autofs4_lookup_expiring If the expiring_list is empty, we can avoid a costly spinlock in the rcu-walk path through autofs4_d_manage (once the rest of the path becomes rcu-walk friendly). Signed-off-by: NeilBrown Reviewed-by: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c index cc87c1a..fb202ca 100644 --- a/fs/autofs4/root.c +++ b/fs/autofs4/root.c @@ -166,8 +166,10 @@ static struct dentry *autofs4_lookup_active(struct dentry *dentry) const unsigned char *str = name->name; struct list_head *p, *head; - spin_lock(&sbi->lookup_lock); head = &sbi->active_list; + if (list_empty(head)) + return NULL; + spin_lock(&sbi->lookup_lock); list_for_each(p, head) { struct autofs_info *ino; struct dentry *active; @@ -218,8 +220,10 @@ static struct dentry *autofs4_lookup_expiring(struct dentry *dentry) const unsigned char *str = name->name; struct list_head *p, *head; - spin_lock(&sbi->lookup_lock); head = &sbi->expiring_list; + if (list_empty(head)) + return NULL; + spin_lock(&sbi->lookup_lock); list_for_each(p, head) { struct autofs_info *ino; struct dentry *expiring; -- cgit v0.10.2 From bdac38329e2e26c6b2a8bb9e57e18d1db2f1db92 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 8 Aug 2014 14:19:56 -0700 Subject: autofs4: remove some unused inline functions {__,}manage_dentry_{set,clear}_{automount,transit} are 4 unused inline functions. Discard them. Signed-off-by: NeilBrown Acked-by: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h index 22a2801..9e359fb 100644 --- a/fs/autofs4/autofs_i.h +++ b/fs/autofs4/autofs_i.h @@ -177,55 +177,6 @@ extern const struct file_operations autofs4_root_operations; extern const struct dentry_operations autofs4_dentry_operations; /* VFS automount flags management functions */ - -static inline void __managed_dentry_set_automount(struct dentry *dentry) -{ - dentry->d_flags |= DCACHE_NEED_AUTOMOUNT; -} - -static inline void managed_dentry_set_automount(struct dentry *dentry) -{ - spin_lock(&dentry->d_lock); - __managed_dentry_set_automount(dentry); - spin_unlock(&dentry->d_lock); -} - -static inline void __managed_dentry_clear_automount(struct dentry *dentry) -{ - dentry->d_flags &= ~DCACHE_NEED_AUTOMOUNT; -} - -static inline void managed_dentry_clear_automount(struct dentry *dentry) -{ - spin_lock(&dentry->d_lock); - __managed_dentry_clear_automount(dentry); - spin_unlock(&dentry->d_lock); -} - -static inline void __managed_dentry_set_transit(struct dentry *dentry) -{ - dentry->d_flags |= DCACHE_MANAGE_TRANSIT; -} - -static inline void managed_dentry_set_transit(struct dentry *dentry) -{ - spin_lock(&dentry->d_lock); - __managed_dentry_set_transit(dentry); - spin_unlock(&dentry->d_lock); -} - -static inline void __managed_dentry_clear_transit(struct dentry *dentry) -{ - dentry->d_flags &= ~DCACHE_MANAGE_TRANSIT; -} - -static inline void managed_dentry_clear_transit(struct dentry *dentry) -{ - spin_lock(&dentry->d_lock); - __managed_dentry_clear_transit(dentry); - spin_unlock(&dentry->d_lock); -} - static inline void __managed_dentry_set_managed(struct dentry *dentry) { dentry->d_flags |= (DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT); -- cgit v0.10.2 From 3b97dd05810dcb5db1ca98309d52ba7caa1eb00b Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 8 Aug 2014 14:19:58 -0700 Subject: autofs4: comment typo: remove a a doubled word Signed-off-by: NeilBrown Acked-by: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c index fb202ca..cdb25eb 100644 --- a/fs/autofs4/root.c +++ b/fs/autofs4/root.c @@ -377,7 +377,7 @@ static struct vfsmount *autofs4_d_automount(struct path *path) * this because the leaves of the directory tree under the * mount never trigger mounts themselves (they have an autofs * trigger mount mounted on them). But v4 pseudo direct mounts - * do need the leaves to to trigger mounts. In this case we + * do need the leaves to trigger mounts. In this case we * have no choice but to use the list_empty() check and * require user space behave. */ -- cgit v0.10.2 From 571eb88390182b0439f72326bb18aa10ebef5d88 Mon Sep 17 00:00:00 2001 From: Raghavendra Ganiga Date: Fri, 8 Aug 2014 14:20:01 -0700 Subject: drivers/rtc/rtc-ds1343.c: add support of nvram for maxim dallas rtc ds1343 This is a patch to add support of nvram for maxim dallas rtc ds1343. Signed-off-by: Raghavendra Chandra Ganiga Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-ds1343.c b/drivers/rtc/rtc-ds1343.c index c371918..ae9f997 100644 --- a/drivers/rtc/rtc-ds1343.c +++ b/drivers/rtc/rtc-ds1343.c @@ -4,6 +4,7 @@ * Real Time Clock * * Author : Raghavendra Chandra Ganiga + * Ankur Srivastava : DS1343 Nvram Support * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -45,6 +46,9 @@ #define DS1343_CONTROL_REG 0x0F #define DS1343_STATUS_REG 0x10 #define DS1343_TRICKLE_REG 0x11 +#define DS1343_NVRAM 0x20 + +#define DS1343_NVRAM_LEN 96 /* DS1343 Control Registers bits */ #define DS1343_EOSC 0x80 @@ -149,6 +153,64 @@ static ssize_t ds1343_store_glitchfilter(struct device *dev, static DEVICE_ATTR(glitch_filter, S_IRUGO | S_IWUSR, ds1343_show_glitchfilter, ds1343_store_glitchfilter); +static ssize_t ds1343_nvram_write(struct file *filp, struct kobject *kobj, + struct bin_attribute *attr, + char *buf, loff_t off, size_t count) +{ + int ret; + unsigned char address; + struct device *dev = kobj_to_dev(kobj); + struct ds1343_priv *priv = dev_get_drvdata(dev); + + if (unlikely(!count)) + return count; + + if ((count + off) > DS1343_NVRAM_LEN) + count = DS1343_NVRAM_LEN - off; + + address = DS1343_NVRAM + off; + + ret = regmap_bulk_write(priv->map, address, buf, count); + if (ret < 0) + dev_err(&priv->spi->dev, "Error in nvram write %d", ret); + + return (ret < 0) ? ret : count; +} + + +static ssize_t ds1343_nvram_read(struct file *filp, struct kobject *kobj, + struct bin_attribute *attr, + char *buf, loff_t off, size_t count) +{ + int ret; + unsigned char address; + struct device *dev = kobj_to_dev(kobj); + struct ds1343_priv *priv = dev_get_drvdata(dev); + + if (unlikely(!count)) + return count; + + if ((count + off) > DS1343_NVRAM_LEN) + count = DS1343_NVRAM_LEN - off; + + address = DS1343_NVRAM + off; + + ret = regmap_bulk_read(priv->map, address, buf, count); + if (ret < 0) + dev_err(&priv->spi->dev, "Error in nvram read %d\n", ret); + + return (ret < 0) ? ret : count; +} + + +static struct bin_attribute nvram_attr = { + .attr.name = "nvram", + .attr.mode = S_IRUGO | S_IWUSR, + .read = ds1343_nvram_read, + .write = ds1343_nvram_write, + .size = DS1343_NVRAM_LEN, +}; + static ssize_t ds1343_show_alarmstatus(struct device *dev, struct device_attribute *attr, char *buf) { @@ -274,12 +336,16 @@ static int ds1343_sysfs_register(struct device *dev) if (err) goto error1; + err = device_create_bin_file(dev, &nvram_attr); + if (err) + goto error2; + if (priv->irq <= 0) return err; err = device_create_file(dev, &dev_attr_alarm_mode); if (err) - goto error2; + goto error3; err = device_create_file(dev, &dev_attr_alarm_status); if (!err) @@ -287,6 +353,9 @@ static int ds1343_sysfs_register(struct device *dev) device_remove_file(dev, &dev_attr_alarm_mode); +error3: + device_remove_bin_file(dev, &nvram_attr); + error2: device_remove_file(dev, &dev_attr_trickle_charger); @@ -302,6 +371,7 @@ static void ds1343_sysfs_unregister(struct device *dev) device_remove_file(dev, &dev_attr_glitch_filter); device_remove_file(dev, &dev_attr_trickle_charger); + device_remove_bin_file(dev, &nvram_attr); if (priv->irq <= 0) return; @@ -684,6 +754,7 @@ static struct spi_driver ds1343_driver = { module_spi_driver(ds1343_driver); MODULE_DESCRIPTION("DS1343 RTC SPI Driver"); -MODULE_AUTHOR("Raghavendra Chandra Ganiga "); +MODULE_AUTHOR("Raghavendra Chandra Ganiga ," + "Ankur Srivastava "); MODULE_LICENSE("GPL v2"); MODULE_VERSION(DS1343_DRV_VERSION); -- cgit v0.10.2 From ad0200f72d9875caa2023c59240ee677df66918e Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Fri, 8 Aug 2014 14:20:03 -0700 Subject: drivers/rtc/Kconfig: move DS2404 entry where it belongs Move the DS2404 entry together with the other Dallas entries, and add Maxim so that it looks nicer (and more correct, too.) Signed-off-by: Jean Delvare Cc: Sven Schnelle Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 0754f5c..2d99bf6 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -760,6 +760,15 @@ config RTC_DRV_DS1742 This driver can also be built as a module. If so, the module will be called rtc-ds1742. +config RTC_DRV_DS2404 + tristate "Maxim/Dallas DS2404" + help + If you say yes here you get support for the + Dallas DS2404 RTC chip. + + This driver can also be built as a module. If so, the module + will be called rtc-ds2404. + config RTC_DRV_DA9052 tristate "Dialog DA9052/DA9053 RTC" depends on PMIC_DA9052 @@ -873,15 +882,6 @@ config RTC_DRV_V3020 This driver can also be built as a module. If so, the module will be called rtc-v3020. -config RTC_DRV_DS2404 - tristate "Dallas DS2404" - help - If you say yes here you get support for the - Dallas DS2404 RTC chip. - - This driver can also be built as a module. If so, the module - will be called rtc-ds2404. - config RTC_DRV_WM831X tristate "Wolfson Microelectronics WM831x RTC" depends on MFD_WM831X -- cgit v0.10.2 From 441fb7684782be3553c67dc04defcf304b999bba Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Fri, 8 Aug 2014 14:20:05 -0700 Subject: drivers/rtc/Kconfig: add hardware dependency to rtc-moxart The rtc-moxart driver is only useful on MOXA ART systems so it should depend on ARCH_MOXART as all other MOXA ART drivers already do. Add COMPILE_TEST as an alternative to allow for build testing on other systems. Signed-off-by: Jean Delvare Cc: Alessandro Zummo Cc: Jonas Jensen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 2d99bf6..a672dd1 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -1349,6 +1349,7 @@ config RTC_DRV_SIRFSOC config RTC_DRV_MOXART tristate "MOXA ART RTC" + depends on ARCH_MOXART || COMPILE_TEST help If you say yes here you get support for the MOXA ART RTC module. -- cgit v0.10.2 From 3710f597aca5af4f605cda9793a8b2f0a42bc29a Mon Sep 17 00:00:00 2001 From: Alexander Shiyan Date: Fri, 8 Aug 2014 14:20:07 -0700 Subject: drivers/rtc/rtc-ds1742.c: revert "drivers/rtc/rtc-ds1742.c: remove redundant of_match_ptr() helper" Commit 5516f0971793 ("drivers/rtc/rtc-ds1742.c: remove redundant of_match_ptr() helper") has description as: "'ds1742_rtc_of_match' is always compiled in. Hence the helper macro is not needed". It is not true, of_match_ptr() macro makes of_device_id parameter unused and this constant is declared with __maybe_unused attribute, so normally this variable should be discarded by linker. This patch revers this commit, since there are no reason to compile of_device_id constant for non-DT systems. Signed-off-by: Alexander Shiyan Cc: Alessandro Zummo Cc: Sachin Kamat Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-ds1742.c b/drivers/rtc/rtc-ds1742.c index c6b2191..9822715 100644 --- a/drivers/rtc/rtc-ds1742.c +++ b/drivers/rtc/rtc-ds1742.c @@ -231,7 +231,7 @@ static struct platform_driver ds1742_rtc_driver = { .driver = { .name = "rtc-ds1742", .owner = THIS_MODULE, - .of_match_table = ds1742_rtc_of_match, + .of_match_table = of_match_ptr(ds1742_rtc_of_match), }, }; -- cgit v0.10.2 From 6e85bab6bc1019f9b87c53b32da3ad7791e7ddf9 Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Fri, 8 Aug 2014 14:20:09 -0700 Subject: drivers/rtc/rtc-efi.c: check for invalid data coming back from UEFI In particular seeing zero in eft->month is problematic, as it results in -1 (converted to unsigned int, i.e. yielding 0xffffffff) getting passed to rtc_year_days(), where the value gets used as an array index (normally resulting in a crash). This was observed with the driver enabled on x86 on some Fujitsu system (with possibly not up to date firmware, but anyway). Perhaps efi_read_alarm() should not fail if neither enabled nor pending are set, but the returned time is invalid? Signed-off-by: Jan Beulich Reported-by: Raymund Will Cc: Alessandro Zummo Cc: Jingoo Han Acked-by: Lee, Chun-Yi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-efi.c b/drivers/rtc/rtc-efi.c index c4c3843..8225b89 100644 --- a/drivers/rtc/rtc-efi.c +++ b/drivers/rtc/rtc-efi.c @@ -17,6 +17,7 @@ #include #include +#include #include #include #include @@ -48,8 +49,8 @@ compute_wday(efi_time_t *eft) int y; int ndays = 0; - if (eft->year < 1998) { - pr_err("EFI year < 1998, invalid date\n"); + if (eft->year < EFI_RTC_EPOCH) { + pr_err("EFI year < " __stringify(EFI_RTC_EPOCH) ", invalid date\n"); return -1; } @@ -78,19 +79,36 @@ convert_to_efi_time(struct rtc_time *wtime, efi_time_t *eft) eft->timezone = EFI_UNSPECIFIED_TIMEZONE; } -static void +static bool convert_from_efi_time(efi_time_t *eft, struct rtc_time *wtime) { memset(wtime, 0, sizeof(*wtime)); + + if (eft->second >= 60) + return false; wtime->tm_sec = eft->second; + + if (eft->minute >= 60) + return false; wtime->tm_min = eft->minute; + + if (eft->hour >= 24) + return false; wtime->tm_hour = eft->hour; + + if (!eft->day || eft->day > 31) + return false; wtime->tm_mday = eft->day; + + if (!eft->month || eft->month > 12) + return false; wtime->tm_mon = eft->month - 1; wtime->tm_year = eft->year - 1900; /* day of the week [0-6], Sunday=0 */ wtime->tm_wday = compute_wday(eft); + if (wtime->tm_wday < 0) + return false; /* day in the year [1-365]*/ wtime->tm_yday = compute_yday(eft); @@ -106,6 +124,8 @@ convert_from_efi_time(efi_time_t *eft, struct rtc_time *wtime) default: wtime->tm_isdst = -1; } + + return true; } static int efi_read_alarm(struct device *dev, struct rtc_wkalrm *wkalrm) @@ -122,7 +142,8 @@ static int efi_read_alarm(struct device *dev, struct rtc_wkalrm *wkalrm) if (status != EFI_SUCCESS) return -EINVAL; - convert_from_efi_time(&eft, &wkalrm->time); + if (!convert_from_efi_time(&eft, &wkalrm->time)) + return -EIO; return rtc_valid_tm(&wkalrm->time); } @@ -163,7 +184,8 @@ static int efi_read_time(struct device *dev, struct rtc_time *tm) return -EINVAL; } - convert_from_efi_time(&eft, tm); + if (!convert_from_efi_time(&eft, tm)) + return -EIO; return rtc_valid_tm(tm); } -- cgit v0.10.2 From ca6dc2dab97133b874c2f6a76b6125497b67b429 Mon Sep 17 00:00:00 2001 From: Hyogi Gim Date: Fri, 8 Aug 2014 14:20:11 -0700 Subject: drivers/rtc/interface.c: check the error after __rtc_read_time() In __rtc_set_alarm(), the error after __rtc_read_time() is not checked. If rtc device fail to read time, we cannot guarantee the following process. Add the verification code for returned __rtc_read_time() error. Signed-off-by: Hyogi Gim Acked-by: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index 5813fa5..5b2717f 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -348,6 +348,8 @@ static int __rtc_set_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm) /* Make sure we're not setting alarms in the past */ err = __rtc_read_time(rtc, &tm); + if (err) + return err; rtc_tm_to_time(&tm, &now); if (scheduled <= now) return -ETIME; -- cgit v0.10.2 From da167ad7638759adb811afa3c80ff4cb67608242 Mon Sep 17 00:00:00 2001 From: Mark Salter Date: Fri, 8 Aug 2014 14:20:14 -0700 Subject: rtc: ia64: allow other architectures to use EFI RTC Currently, the rtc-efi driver is restricted to ia64 only. Newer architectures with EFI support may want to also use that driver. This patch moves the platform device setup from ia64 into drivers/rtc and allow any architecture with CONFIG_EFI=y to use the rtc-efi driver. Signed-off-by: Mark Salter Cc: Alessandro Zummo Cc: Tony Luck Cc: Fenghua Yu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c index 3e71ef8..9a0104a 100644 --- a/arch/ia64/kernel/time.c +++ b/arch/ia64/kernel/time.c @@ -384,21 +384,6 @@ static struct irqaction timer_irqaction = { .name = "timer" }; -static struct platform_device rtc_efi_dev = { - .name = "rtc-efi", - .id = -1, -}; - -static int __init rtc_init(void) -{ - if (platform_device_register(&rtc_efi_dev) < 0) - printk(KERN_ERR "unable to register rtc device...\n"); - - /* not necessarily an error */ - return 0; -} -module_init(rtc_init); - void read_persistent_clock(struct timespec *ts) { efi_gettimeofday(ts); diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index a672dd1..83743a1a 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -798,7 +798,7 @@ config RTC_DRV_DA9063 config RTC_DRV_EFI tristate "EFI RTC" - depends on IA64 + depends on EFI help If you say yes here you will get support for the EFI Real Time Clock. diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile index 70347d0..f1dfc36 100644 --- a/drivers/rtc/Makefile +++ b/drivers/rtc/Makefile @@ -10,6 +10,10 @@ obj-$(CONFIG_RTC_SYSTOHC) += systohc.o obj-$(CONFIG_RTC_CLASS) += rtc-core.o rtc-core-y := class.o interface.o +ifdef CONFIG_RTC_DRV_EFI +rtc-core-y += rtc-efi-platform.o +endif + rtc-core-$(CONFIG_RTC_INTF_DEV) += rtc-dev.o rtc-core-$(CONFIG_RTC_INTF_PROC) += rtc-proc.o rtc-core-$(CONFIG_RTC_INTF_SYSFS) += rtc-sysfs.o diff --git a/drivers/rtc/rtc-efi-platform.c b/drivers/rtc/rtc-efi-platform.c new file mode 100644 index 0000000..b40fbe3 --- /dev/null +++ b/drivers/rtc/rtc-efi-platform.c @@ -0,0 +1,31 @@ +/* + * Moved from arch/ia64/kernel/time.c + * + * Copyright (C) 1998-2003 Hewlett-Packard Co + * Stephane Eranian + * David Mosberger + * Copyright (C) 1999 Don Dugger + * Copyright (C) 1999-2000 VA Linux Systems + * Copyright (C) 1999-2000 Walt Drummond + */ +#include +#include +#include +#include +#include + +static struct platform_device rtc_efi_dev = { + .name = "rtc-efi", + .id = -1, +}; + +static int __init rtc_init(void) +{ + if (efi_enabled(EFI_RUNTIME_SERVICES)) + if (platform_device_register(&rtc_efi_dev) < 0) + pr_err("unable to register rtc device...\n"); + + /* not necessarily an error */ + return 0; +} +module_init(rtc_init); -- cgit v0.10.2 From 2784366c6797e6a3078abca2f303ebf7a36290be Mon Sep 17 00:00:00 2001 From: Vincent Donnefort Date: Fri, 8 Aug 2014 14:20:16 -0700 Subject: drivers/rtc/rtc-pcf8563.c: introduce read|write_block_data This functions allow to factorize I2C I/O operations. Signed-off-by: Vincent Donnefort Cc: Simon Guinot Cc: Jason Cooper Cc: Andrew Lunn Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-pcf8563.c b/drivers/rtc/rtc-pcf8563.c index 63b558c..7cbbf50 100644 --- a/drivers/rtc/rtc-pcf8563.c +++ b/drivers/rtc/rtc-pcf8563.c @@ -69,35 +69,66 @@ struct pcf8563 { int voltage_low; /* incicates if a low_voltage was detected */ }; -/* - * In the routines that deal directly with the pcf8563 hardware, we use - * rtc_time -- month 0-11, hour 0-23, yr = calendar year-epoch. - */ -static int pcf8563_get_datetime(struct i2c_client *client, struct rtc_time *tm) +static int pcf8563_read_block_data(struct i2c_client *client, unsigned char reg, + unsigned char length, unsigned char *buf) { - struct pcf8563 *pcf8563 = i2c_get_clientdata(client); - unsigned char buf[13] = { PCF8563_REG_ST1 }; - struct i2c_msg msgs[] = { {/* setup read ptr */ .addr = client->addr, .len = 1, - .buf = buf + .buf = ®, }, - {/* read status + date */ + { .addr = client->addr, .flags = I2C_M_RD, - .len = 13, + .len = length, .buf = buf }, }; - /* read registers */ if ((i2c_transfer(client->adapter, msgs, 2)) != 2) { dev_err(&client->dev, "%s: read error\n", __func__); return -EIO; } + return 0; +} + +static int pcf8563_write_block_data(struct i2c_client *client, + unsigned char reg, unsigned char length, + unsigned char *buf) +{ + int i, err; + + for (i = 0; i < length; i++) { + unsigned char data[2] = { reg + i, buf[i] }; + + err = i2c_master_send(client, data, sizeof(data)); + if (err != sizeof(data)) { + dev_err(&client->dev, + "%s: err=%d addr=%02x, data=%02x\n", + __func__, err, data[0], data[1]); + return -EIO; + } + } + + return 0; +} + +/* + * In the routines that deal directly with the pcf8563 hardware, we use + * rtc_time -- month 0-11, hour 0-23, yr = calendar year-epoch. + */ +static int pcf8563_get_datetime(struct i2c_client *client, struct rtc_time *tm) +{ + struct pcf8563 *pcf8563 = i2c_get_clientdata(client); + unsigned char buf[9]; + int err; + + err = pcf8563_read_block_data(client, PCF8563_REG_ST1, 9, buf); + if (err) + return err; + if (buf[PCF8563_REG_SC] & PCF8563_SC_LV) { pcf8563->voltage_low = 1; dev_info(&client->dev, @@ -144,7 +175,7 @@ static int pcf8563_get_datetime(struct i2c_client *client, struct rtc_time *tm) static int pcf8563_set_datetime(struct i2c_client *client, struct rtc_time *tm) { struct pcf8563 *pcf8563 = i2c_get_clientdata(client); - int i, err; + int err; unsigned char buf[9]; dev_dbg(&client->dev, "%s: secs=%d, mins=%d, hours=%d, " @@ -170,19 +201,10 @@ static int pcf8563_set_datetime(struct i2c_client *client, struct rtc_time *tm) buf[PCF8563_REG_DW] = tm->tm_wday & 0x07; - /* write register's data */ - for (i = 0; i < 7; i++) { - unsigned char data[2] = { PCF8563_REG_SC + i, - buf[PCF8563_REG_SC + i] }; - - err = i2c_master_send(client, data, sizeof(data)); - if (err != sizeof(data)) { - dev_err(&client->dev, - "%s: err=%d addr=%02x, data=%02x\n", - __func__, err, data[0], data[1]); - return -EIO; - } - } + err = pcf8563_write_block_data(client, PCF8563_REG_SC, + 9 - PCF8563_REG_SC, buf + PCF8563_REG_SC); + if (err) + return err; return 0; } -- cgit v0.10.2 From ede3e9d47cca565d96613bea9213eccf91c7a2a7 Mon Sep 17 00:00:00 2001 From: Vincent Donnefort Date: Fri, 8 Aug 2014 14:20:18 -0700 Subject: drivers/rtc/rtc-pcf8563.c: add alarm support This patch adds alarm support for the NXP PCF8563 chip. Signed-off-by: Vincent Donnefort Cc: Simon Guinot Cc: Jason Cooper Cc: Andrew Lunn Cc: Alessandro Zummo Cc: Dan Carpenter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-pcf8563.c b/drivers/rtc/rtc-pcf8563.c index 7cbbf50..5a197d9 100644 --- a/drivers/rtc/rtc-pcf8563.c +++ b/drivers/rtc/rtc-pcf8563.c @@ -26,6 +26,8 @@ #define PCF8563_REG_ST1 0x00 /* status */ #define PCF8563_REG_ST2 0x01 +#define PCF8563_BIT_AIE (1 << 1) +#define PCF8563_BIT_AF (1 << 3) #define PCF8563_REG_SC 0x02 /* datetime */ #define PCF8563_REG_MN 0x03 @@ -36,9 +38,6 @@ #define PCF8563_REG_YR 0x08 #define PCF8563_REG_AMN 0x09 /* alarm */ -#define PCF8563_REG_AHR 0x0A -#define PCF8563_REG_ADM 0x0B -#define PCF8563_REG_ADW 0x0C #define PCF8563_REG_CLKO 0x0D /* clock out */ #define PCF8563_REG_TMRC 0x0E /* timer control */ @@ -67,6 +66,8 @@ struct pcf8563 { */ int c_polarity; /* 0: MO_C=1 means 19xx, otherwise MO_C=1 means 20xx */ int voltage_low; /* incicates if a low_voltage was detected */ + + struct i2c_client *client; }; static int pcf8563_read_block_data(struct i2c_client *client, unsigned char reg, @@ -115,6 +116,69 @@ static int pcf8563_write_block_data(struct i2c_client *client, return 0; } +static int pcf8563_set_alarm_mode(struct i2c_client *client, bool on) +{ + unsigned char buf[2]; + int err; + + err = pcf8563_read_block_data(client, PCF8563_REG_ST2, 1, buf + 1); + if (err < 0) + return err; + + if (on) + buf[1] |= PCF8563_BIT_AIE; + else + buf[1] &= ~PCF8563_BIT_AIE; + + buf[1] &= ~PCF8563_BIT_AF; + buf[0] = PCF8563_REG_ST2; + + err = pcf8563_write_block_data(client, PCF8563_REG_ST2, 1, buf + 1); + if (err < 0) { + dev_err(&client->dev, "%s: write error\n", __func__); + return -EIO; + } + + return 0; +} + +static int pcf8563_get_alarm_mode(struct i2c_client *client, unsigned char *en, + unsigned char *pen) +{ + unsigned char buf; + int err; + + err = pcf8563_read_block_data(client, PCF8563_REG_ST2, 1, &buf); + if (err) + return err; + + if (en) + *en = !!(buf & PCF8563_BIT_AIE); + if (pen) + *pen = !!(buf & PCF8563_BIT_AF); + + return 0; +} + +static irqreturn_t pcf8563_irq(int irq, void *dev_id) +{ + struct pcf8563 *pcf8563 = i2c_get_clientdata(dev_id); + int err; + char pending; + + err = pcf8563_get_alarm_mode(pcf8563->client, NULL, &pending); + if (err < 0) + return err; + + if (pending) { + rtc_update_irq(pcf8563->rtc, 1, RTC_IRQF | RTC_AF); + pcf8563_set_alarm_mode(pcf8563->client, 1); + return IRQ_HANDLED; + } + + return IRQ_NONE; +} + /* * In the routines that deal directly with the pcf8563 hardware, we use * rtc_time -- month 0-11, hour 0-23, yr = calendar year-epoch. @@ -257,16 +321,83 @@ static int pcf8563_rtc_set_time(struct device *dev, struct rtc_time *tm) return pcf8563_set_datetime(to_i2c_client(dev), tm); } +static int pcf8563_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *tm) +{ + struct i2c_client *client = to_i2c_client(dev); + unsigned char buf[4]; + int err; + + err = pcf8563_read_block_data(client, PCF8563_REG_AMN, 4, buf); + if (err) + return err; + + dev_dbg(&client->dev, + "%s: raw data is min=%02x, hr=%02x, mday=%02x, wday=%02x\n", + __func__, buf[0], buf[1], buf[2], buf[3]); + + tm->time.tm_min = bcd2bin(buf[0] & 0x7F); + tm->time.tm_hour = bcd2bin(buf[1] & 0x7F); + tm->time.tm_mday = bcd2bin(buf[2] & 0x1F); + tm->time.tm_wday = bcd2bin(buf[3] & 0x7); + tm->time.tm_mon = -1; + tm->time.tm_year = -1; + tm->time.tm_yday = -1; + tm->time.tm_isdst = -1; + + err = pcf8563_get_alarm_mode(client, &tm->enabled, &tm->pending); + if (err < 0) + return err; + + dev_dbg(&client->dev, "%s: tm is mins=%d, hours=%d, mday=%d, wday=%d," + " enabled=%d, pending=%d\n", __func__, tm->time.tm_min, + tm->time.tm_hour, tm->time.tm_mday, tm->time.tm_wday, + tm->enabled, tm->pending); + + return 0; +} + +static int pcf8563_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *tm) +{ + struct i2c_client *client = to_i2c_client(dev); + unsigned char buf[4]; + int err; + + dev_dbg(dev, "%s, min=%d hour=%d wday=%d mday=%d " + "enabled=%d pending=%d\n", __func__, + tm->time.tm_min, tm->time.tm_hour, tm->time.tm_wday, + tm->time.tm_mday, tm->enabled, tm->pending); + + buf[0] = bin2bcd(tm->time.tm_min); + buf[1] = bin2bcd(tm->time.tm_hour); + buf[2] = bin2bcd(tm->time.tm_mday); + buf[3] = tm->time.tm_wday & 0x07; + + err = pcf8563_write_block_data(client, PCF8563_REG_AMN, 4, buf); + if (err) + return err; + + return pcf8563_set_alarm_mode(client, 1); +} + +static int pcf8563_irq_enable(struct device *dev, unsigned int enabled) +{ + return pcf8563_set_alarm_mode(to_i2c_client(dev), !!enabled); +} + static const struct rtc_class_ops pcf8563_rtc_ops = { .ioctl = pcf8563_rtc_ioctl, .read_time = pcf8563_rtc_read_time, .set_time = pcf8563_rtc_set_time, + .read_alarm = pcf8563_rtc_read_alarm, + .set_alarm = pcf8563_rtc_set_alarm, + .alarm_irq_enable = pcf8563_irq_enable, }; static int pcf8563_probe(struct i2c_client *client, const struct i2c_device_id *id) { struct pcf8563 *pcf8563; + int err; dev_dbg(&client->dev, "%s\n", __func__); @@ -281,12 +412,30 @@ static int pcf8563_probe(struct i2c_client *client, dev_info(&client->dev, "chip found, driver version " DRV_VERSION "\n"); i2c_set_clientdata(client, pcf8563); + pcf8563->client = client; + device_set_wakeup_capable(&client->dev, 1); pcf8563->rtc = devm_rtc_device_register(&client->dev, pcf8563_driver.driver.name, &pcf8563_rtc_ops, THIS_MODULE); - return PTR_ERR_OR_ZERO(pcf8563->rtc); + if (IS_ERR(pcf8563->rtc)) + return PTR_ERR(pcf8563->rtc); + + if (client->irq > 0) { + err = devm_request_threaded_irq(&client->dev, client->irq, + NULL, pcf8563_irq, + IRQF_SHARED|IRQF_ONESHOT|IRQF_TRIGGER_FALLING, + pcf8563->rtc->name, client); + if (err) { + dev_err(&client->dev, "unable to request IRQ %d\n", + client->irq); + return err; + } + + } + + return 0; } static const struct i2c_device_id pcf8563_id[] = { -- cgit v0.10.2 From db04d6284e2a626791e959dc71e922e8570440b6 Mon Sep 17 00:00:00 2001 From: Stuart Longland Date: Fri, 8 Aug 2014 14:20:20 -0700 Subject: drivers/rtc/rtc-isl12022.c: device tree support Add support for configuring the ISL12022 real-time clock via the Device tree framework. This is based on what I've seen in the related ISL12057 driver, it has been tested and works on a Technologic Systems TS-7670 device which uses a ISL12020 RTC device, my device tree looks like this: apbx@80040000 { i2c0: i2c@80058000 { pinctrl-names = "default"; pinctrl-0 = <&i2c0_pins_a>; clock-frequency = <400000>; status = "okay"; isl12022@0x6f { compatible = "isl,isl12022"; reg = <0x6f>; }; }; ... etc }; Signed-off-by: Stuart Longland Cc: Roman Fietze Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-isl12022.c b/drivers/rtc/rtc-isl12022.c index 03b8911..aa55f08 100644 --- a/drivers/rtc/rtc-isl12022.c +++ b/drivers/rtc/rtc-isl12022.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #define DRV_VERSION "0.1" @@ -271,6 +273,13 @@ static int isl12022_probe(struct i2c_client *client, return PTR_ERR_OR_ZERO(isl12022->rtc); } +#ifdef CONFIG_OF +static struct of_device_id isl12022_dt_match[] = { + { .compatible = "isl,isl12022" }, + { }, +}; +#endif + static const struct i2c_device_id isl12022_id[] = { { "isl12022", 0 }, { } @@ -280,6 +289,9 @@ MODULE_DEVICE_TABLE(i2c, isl12022_id); static struct i2c_driver isl12022_driver = { .driver = { .name = "rtc-isl12022", +#ifdef CONFIG_OF + .of_match_table = of_match_ptr(isl12022_dt_match), +#endif }, .probe = isl12022_probe, .id_table = isl12022_id, -- cgit v0.10.2 From 796b7abb33cd78412897a9e927eb5a8f5a9c4fe6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=B8ren=20Andersen?= Date: Fri, 8 Aug 2014 14:20:22 -0700 Subject: rtc: add pcf85063 support Add support for the pcf85063 rtc chip. [akpm@linux-foundation.org: fix comment typo, tweak conding style] Signed-off-by: Soeren Andersen Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt index 37803eb..6af570e 100644 --- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt +++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt @@ -70,6 +70,7 @@ nuvoton,npct501 i2c trusted platform module (TPM) nxp,pca9556 Octal SMBus and I2C registered interface nxp,pca9557 8-bit I2C-bus and SMBus I/O port with reset nxp,pcf8563 Real-time clock/calendar +nxp,pcf85063 Tiny Real-Time Clock ovti,ov5642 OV5642: Color CMOS QSXGA (5-megapixel) Image Sensor with OmniBSI and Embedded TrueFocus pericom,pt7c4338 Real-time Clock Module plx,pex8648 48-Lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 83743a1a..a168e96 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -373,6 +373,14 @@ config RTC_DRV_PCF8563 This driver can also be built as a module. If so, the module will be called rtc-pcf8563. +config RTC_DRV_PCF85063 + tristate "nxp PCF85063" + help + If you say yes here you get support for the PCF85063 RTC chip + + This driver can also be built as a module. If so, the module + will be called rtc-pcf85063. + config RTC_DRV_PCF8583 tristate "Philips PCF8583" help diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile index f1dfc36..56f061c 100644 --- a/drivers/rtc/Makefile +++ b/drivers/rtc/Makefile @@ -97,6 +97,7 @@ obj-$(CONFIG_RTC_DRV_PCAP) += rtc-pcap.o obj-$(CONFIG_RTC_DRV_PCF2127) += rtc-pcf2127.o obj-$(CONFIG_RTC_DRV_PCF8523) += rtc-pcf8523.o obj-$(CONFIG_RTC_DRV_PCF8563) += rtc-pcf8563.o +obj-$(CONFIG_RTC_DRV_PCF85063) += rtc-pcf85063.o obj-$(CONFIG_RTC_DRV_PCF8583) += rtc-pcf8583.o obj-$(CONFIG_RTC_DRV_PCF2123) += rtc-pcf2123.o obj-$(CONFIG_RTC_DRV_PCF50633) += rtc-pcf50633.o diff --git a/drivers/rtc/rtc-pcf85063.c b/drivers/rtc/rtc-pcf85063.c new file mode 100644 index 0000000..6a12bf6 --- /dev/null +++ b/drivers/rtc/rtc-pcf85063.c @@ -0,0 +1,204 @@ +/* + * An I2C driver for the PCF85063 RTC + * Copyright 2014 Rose Technology + * + * Author: Søren Andersen + * Maintainers: http://www.nslu2-linux.org/ + * + * based on the other drivers in this same directory. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +#include +#include +#include +#include + +#define DRV_VERSION "0.0.1" + +#define PCF85063_REG_CTRL1 0x00 /* status */ +#define PCF85063_REG_CTRL2 0x01 + +#define PCF85063_REG_SC 0x04 /* datetime */ +#define PCF85063_REG_MN 0x05 +#define PCF85063_REG_HR 0x06 +#define PCF85063_REG_DM 0x07 +#define PCF85063_REG_DW 0x08 +#define PCF85063_REG_MO 0x09 +#define PCF85063_REG_YR 0x0A + +#define PCF85063_MO_C 0x80 /* century */ + +static struct i2c_driver pcf85063_driver; + +struct pcf85063 { + struct rtc_device *rtc; + int c_polarity; /* 0: MO_C=1 means 19xx, otherwise MO_C=1 means 20xx */ + int voltage_low; /* indicates if a low_voltage was detected */ +}; + +/* + * In the routines that deal directly with the pcf85063 hardware, we use + * rtc_time -- month 0-11, hour 0-23, yr = calendar year-epoch. + */ +static int pcf85063_get_datetime(struct i2c_client *client, struct rtc_time *tm) +{ + struct pcf85063 *pcf85063 = i2c_get_clientdata(client); + unsigned char buf[13] = { PCF85063_REG_CTRL1 }; + struct i2c_msg msgs[] = { + {/* setup read ptr */ + .addr = client->addr, + .len = 1, + .buf = buf + }, + {/* read status + date */ + .addr = client->addr, + .flags = I2C_M_RD, + .len = 13, + .buf = buf + }, + }; + + /* read registers */ + if ((i2c_transfer(client->adapter, msgs, 2)) != 2) { + dev_err(&client->dev, "%s: read error\n", __func__); + return -EIO; + } + + tm->tm_sec = bcd2bin(buf[PCF85063_REG_SC] & 0x7F); + tm->tm_min = bcd2bin(buf[PCF85063_REG_MN] & 0x7F); + tm->tm_hour = bcd2bin(buf[PCF85063_REG_HR] & 0x3F); /* rtc hr 0-23 */ + tm->tm_mday = bcd2bin(buf[PCF85063_REG_DM] & 0x3F); + tm->tm_wday = buf[PCF85063_REG_DW] & 0x07; + tm->tm_mon = bcd2bin(buf[PCF85063_REG_MO] & 0x1F) - 1; /* rtc mn 1-12 */ + tm->tm_year = bcd2bin(buf[PCF85063_REG_YR]); + if (tm->tm_year < 70) + tm->tm_year += 100; /* assume we are in 1970...2069 */ + /* detect the polarity heuristically. see note above. */ + pcf85063->c_polarity = (buf[PCF85063_REG_MO] & PCF85063_MO_C) ? + (tm->tm_year >= 100) : (tm->tm_year < 100); + + /* the clock can give out invalid datetime, but we cannot return + * -EINVAL otherwise hwclock will refuse to set the time on bootup. + */ + if (rtc_valid_tm(tm) < 0) + dev_err(&client->dev, "retrieved date/time is not valid.\n"); + + return 0; +} + +static int pcf85063_set_datetime(struct i2c_client *client, struct rtc_time *tm) +{ + int i = 0, err = 0; + unsigned char buf[11]; + + /* Control & status */ + buf[PCF85063_REG_CTRL1] = 0; + buf[PCF85063_REG_CTRL2] = 5; + + /* hours, minutes and seconds */ + buf[PCF85063_REG_SC] = bin2bcd(tm->tm_sec) & 0x7F; + + buf[PCF85063_REG_MN] = bin2bcd(tm->tm_min); + buf[PCF85063_REG_HR] = bin2bcd(tm->tm_hour); + + /* Day of month, 1 - 31 */ + buf[PCF85063_REG_DM] = bin2bcd(tm->tm_mday); + + /* Day, 0 - 6 */ + buf[PCF85063_REG_DW] = tm->tm_wday & 0x07; + + /* month, 1 - 12 */ + buf[PCF85063_REG_MO] = bin2bcd(tm->tm_mon + 1); + + /* year and century */ + buf[PCF85063_REG_YR] = bin2bcd(tm->tm_year % 100); + + /* write register's data */ + for (i = 0; i < sizeof(buf); i++) { + unsigned char data[2] = { i, buf[i] }; + + err = i2c_master_send(client, data, sizeof(data)); + if (err != sizeof(data)) { + dev_err(&client->dev, "%s: err=%d addr=%02x, data=%02x\n", + __func__, err, data[0], data[1]); + return -EIO; + } + } + + return 0; +} + +static int pcf85063_rtc_read_time(struct device *dev, struct rtc_time *tm) +{ + return pcf85063_get_datetime(to_i2c_client(dev), tm); +} + +static int pcf85063_rtc_set_time(struct device *dev, struct rtc_time *tm) +{ + return pcf85063_set_datetime(to_i2c_client(dev), tm); +} + +static const struct rtc_class_ops pcf85063_rtc_ops = { + .read_time = pcf85063_rtc_read_time, + .set_time = pcf85063_rtc_set_time +}; + +static int pcf85063_probe(struct i2c_client *client, + const struct i2c_device_id *id) +{ + struct pcf85063 *pcf85063; + + dev_dbg(&client->dev, "%s\n", __func__); + + if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) + return -ENODEV; + + pcf85063 = devm_kzalloc(&client->dev, sizeof(struct pcf85063), + GFP_KERNEL); + if (!pcf85063) + return -ENOMEM; + + dev_info(&client->dev, "chip found, driver version " DRV_VERSION "\n"); + + i2c_set_clientdata(client, pcf85063); + + pcf85063->rtc = devm_rtc_device_register(&client->dev, + pcf85063_driver.driver.name, + &pcf85063_rtc_ops, THIS_MODULE); + + return PTR_ERR_OR_ZERO(pcf85063->rtc); +} + +static const struct i2c_device_id pcf85063_id[] = { + { "pcf85063", 0 }, + { } +}; +MODULE_DEVICE_TABLE(i2c, pcf85063_id); + +#ifdef CONFIG_OF +static const struct of_device_id pcf85063_of_match[] = { + { .compatible = "nxp,pcf85063" }, + {} +}; +MODULE_DEVICE_TABLE(of, pcf85063_of_match); +#endif + +static struct i2c_driver pcf85063_driver = { + .driver = { + .name = "rtc-pcf85063", + .owner = THIS_MODULE, + .of_match_table = of_match_ptr(pcf85063_of_match), + }, + .probe = pcf85063_probe, + .id_table = pcf85063_id, +}; + +module_i2c_driver(pcf85063_driver); + +MODULE_AUTHOR("Søren Andersen "); +MODULE_DESCRIPTION("PCF85063 RTC driver"); +MODULE_LICENSE("GPL"); +MODULE_VERSION(DRV_VERSION); -- cgit v0.10.2 From e1d60093ca7341e884578c41a29da7cd1714c80e Mon Sep 17 00:00:00 2001 From: Hyogi Gim Date: Fri, 8 Aug 2014 14:20:24 -0700 Subject: driver/rtc/class.c: check the error after rtc_read_time() In rtc_suspend() and rtc_resume(), the error after rtc_read_time() is not checked. If rtc device fail to read time, we cannot guarantee the following process. Add the verification code for returned rtc_read_time() error. Signed-off-by: Hyogi Gim Cc: Alessandro Zummo Cc: "Rafael J. Wysocki" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c index 589351e..38e26be 100644 --- a/drivers/rtc/class.c +++ b/drivers/rtc/class.c @@ -53,6 +53,7 @@ static int rtc_suspend(struct device *dev) struct rtc_device *rtc = to_rtc_device(dev); struct rtc_time tm; struct timespec delta, delta_delta; + int err; if (has_persistent_clock()) return 0; @@ -61,7 +62,12 @@ static int rtc_suspend(struct device *dev) return 0; /* snapshot the current RTC and system time at suspend*/ - rtc_read_time(rtc, &tm); + err = rtc_read_time(rtc, &tm); + if (err < 0) { + pr_debug("%s: fail to read rtc time\n", dev_name(&rtc->dev)); + return 0; + } + getnstimeofday(&old_system); rtc_tm_to_time(&tm, &old_rtc.tv_sec); @@ -94,6 +100,7 @@ static int rtc_resume(struct device *dev) struct rtc_time tm; struct timespec new_system, new_rtc; struct timespec sleep_time; + int err; if (has_persistent_clock()) return 0; @@ -104,7 +111,12 @@ static int rtc_resume(struct device *dev) /* snapshot the current rtc and system time at resume */ getnstimeofday(&new_system); - rtc_read_time(rtc, &tm); + err = rtc_read_time(rtc, &tm); + if (err < 0) { + pr_debug("%s: fail to read rtc time\n", dev_name(&rtc->dev)); + return 0; + } + if (rtc_valid_tm(&tm) != 0) { pr_debug("%s: bogus resume time\n", dev_name(&rtc->dev)); return 0; -- cgit v0.10.2 From 9f7d7a1d0f36bed7f533807146483e8fdfe12a89 Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Fri, 8 Aug 2014 14:20:26 -0700 Subject: drivers/rtc/rtc-tps65910.c: fix potential NULL-pointer dereference The interrupt handler gets the driver data associated with the RTC device and doesn't check it for validity. This can cause a NULL pointer being dereferenced when and interrupt fires before the driver data was properly set up. Fix this by setting the driver data earlier (before the interrupt is requested). Signed-off-by: Thierry Reding Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rtc/rtc-tps65910.c b/drivers/rtc/rtc-tps65910.c index 7af0020..2583349 100644 --- a/drivers/rtc/rtc-tps65910.c +++ b/drivers/rtc/rtc-tps65910.c @@ -258,6 +258,8 @@ static int tps65910_rtc_probe(struct platform_device *pdev) if (ret < 0) return ret; + platform_set_drvdata(pdev, tps_rtc); + irq = platform_get_irq(pdev, 0); if (irq <= 0) { dev_warn(&pdev->dev, "Wake up is not possible as irq = %d\n", @@ -283,8 +285,6 @@ static int tps65910_rtc_probe(struct platform_device *pdev) return ret; } - platform_set_drvdata(pdev, tps_rtc); - return 0; } -- cgit v0.10.2 From 6d6747f85314687f72012ae85cde401db531e130 Mon Sep 17 00:00:00 2001 From: Qi Yong Date: Fri, 8 Aug 2014 14:20:29 -0700 Subject: minix zmap block counts calculation fix The original minix zmap blocks calculation was correct, in the formula of: sbi->s_nzones - sbi->s_firstdatazone + 1 It is sp->s_zones - (sp->s_firstdatazone - 1) in the minix3 source code. But a later commit 016e8d44bc06 ("fs/minix: Verify bitmap block counts before mounting") has changed it unfortunately as: sbi->s_nzones - (sbi->s_firstdatazone + 1) This would show free blocks one block less than the real when the total data blocks are in "full zmap blocks plus one". This patch corrects that zmap blocks calculation and tidy a printk message while at it. Signed-off-by: Qi Yong Cc: Josh Boyer Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/minix/bitmap.c b/fs/minix/bitmap.c index 4bc50da..742942a 100644 --- a/fs/minix/bitmap.c +++ b/fs/minix/bitmap.c @@ -96,7 +96,7 @@ int minix_new_block(struct inode * inode) unsigned long minix_count_free_blocks(struct super_block *sb) { struct minix_sb_info *sbi = minix_sb(sb); - u32 bits = sbi->s_nzones - (sbi->s_firstdatazone + 1); + u32 bits = sbi->s_nzones - sbi->s_firstdatazone + 1; return (count_free(sbi->s_zmap, sb->s_blocksize, bits) << sbi->s_log_zone_size); diff --git a/fs/minix/inode.c b/fs/minix/inode.c index f007a33..3f57af1 100644 --- a/fs/minix/inode.c +++ b/fs/minix/inode.c @@ -267,12 +267,12 @@ static int minix_fill_super(struct super_block *s, void *data, int silent) block = minix_blocks_needed(sbi->s_ninodes, s->s_blocksize); if (sbi->s_imap_blocks < block) { printk("MINIX-fs: file system does not have enough " - "imap blocks allocated. Refusing to mount\n"); + "imap blocks allocated. Refusing to mount.\n"); goto out_no_bitmap; } block = minix_blocks_needed( - (sbi->s_nzones - (sbi->s_firstdatazone + 1)), + (sbi->s_nzones - sbi->s_firstdatazone + 1), s->s_blocksize); if (sbi->s_zmap_blocks < block) { printk("MINIX-fs: file system does not have enough " -- cgit v0.10.2 From 8e19189ef8d1fce44f3acdf0fe9846cff9b37c78 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:20:31 -0700 Subject: fs/befs/linuxvfs.c: check superblock before dump operation befs_dump_super_block was called between befs_load_sb and befs_check_sb. It has been reported to crash (5/900) with null block testing. This patch loads, checks and only dump superblock if it's a valid one then brelse bh. (befs_dump_super_block uses disk_sb (bh->b_data) so it seems we need to call it before brelse(bh) but I don't know why befs_check_sb was called after brelse. Another thing I don't understand is why this problem appears now). Signed-off-by: Fabian Frederick Reported-by: Fengguang Wu Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c index 0d6c07c..4cf61ec 100644 --- a/fs/befs/linuxvfs.c +++ b/fs/befs/linuxvfs.c @@ -832,16 +832,14 @@ befs_fill_super(struct super_block *sb, void *data, int silent) (befs_super_block *) ((void *) bh->b_data + x86_sb_off); } - if (befs_load_sb(sb, disk_sb) != BEFS_OK) + if ((befs_load_sb(sb, disk_sb) != BEFS_OK) || + (befs_check_sb(sb) != BEFS_OK)) goto unacquire_bh; befs_dump_super_block(sb, disk_sb); brelse(bh); - if (befs_check_sb(sb) != BEFS_OK) - goto unacquire_priv_sbp; - if( befs_sb->num_blocks > ~((sector_t)0) ) { befs_error(sb, "blocks count: %llu " "is larger than the host can use", -- cgit v0.10.2 From 834b46c37a2900bc90b5f1c5a11815be5a025445 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:20:33 -0700 Subject: fs/coda: use linux/uaccess.h Fix checkpatch warning WARNING: Use #include instead of Signed-off-by: Fabian Frederick Cc: Jan Harkes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/coda/cache.c b/fs/coda/cache.c index 1da168c..278f8fd 100644 --- a/fs/coda/cache.c +++ b/fs/coda/cache.c @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/fs/coda/coda_linux.c b/fs/coda/coda_linux.c index 2849f41..1326d38 100644 --- a/fs/coda/coda_linux.c +++ b/fs/coda/coda_linux.c @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include diff --git a/fs/coda/dir.c b/fs/coda/dir.c index cd8a632..9c3dedc 100644 --- a/fs/coda/dir.c +++ b/fs/coda/dir.c @@ -19,8 +19,7 @@ #include #include #include - -#include +#include #include #include diff --git a/fs/coda/file.c b/fs/coda/file.c index 9e83b77..d244d74 100644 --- a/fs/coda/file.c +++ b/fs/coda/file.c @@ -18,7 +18,7 @@ #include #include #include -#include +#include #include #include diff --git a/fs/coda/inode.c b/fs/coda/inode.c index fe3afb2..b945410 100644 --- a/fs/coda/inode.c +++ b/fs/coda/inode.c @@ -21,9 +21,7 @@ #include #include #include - -#include - +#include #include #include diff --git a/fs/coda/pioctl.c b/fs/coda/pioctl.c index 3f5de96..4326d17 100644 --- a/fs/coda/pioctl.c +++ b/fs/coda/pioctl.c @@ -16,7 +16,7 @@ #include #include #include -#include +#include #include #include diff --git a/fs/coda/psdev.c b/fs/coda/psdev.c index 5c1e424..8226291 100644 --- a/fs/coda/psdev.c +++ b/fs/coda/psdev.c @@ -40,7 +40,7 @@ #include #include #include -#include +#include #include #include diff --git a/fs/coda/upcall.c b/fs/coda/upcall.c index 21fcf8d..5bb6e27 100644 --- a/fs/coda/upcall.c +++ b/fs/coda/upcall.c @@ -27,7 +27,7 @@ #include #include #include -#include +#include #include #include -- cgit v0.10.2 From aebe17f6844488ff0b824fbac28d9f342f7b078e Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:37 -0700 Subject: nilfs2: add /sys/fs/nilfs2/features group This patchset implements creation of sysfs groups and attributes with the purpose to show NILFS2 volume details, internal state of the driver and to manage internal state of NILFS2 driver. Sysfs is a virtual file system that exports information about devices and drivers from the kernel device model to user space, and is also used for configuration. NILFS2 is a complex file system that has segctor thread, GC thread, checkpoint/snapshot model and so on. Sysfs namespace provides native and easy way for: (1) getting info and statistics about volume state; (2) getting info and configuration of internal subsystems (segctor thread); (3) snapshots management. Suggested patchset provides basis for managing segctor thread behaviour and manipulation by snapshots. Currently, it informs only about segctor thread's internal parameters and about mounted snapshots. But sysfs interface can provide easy and simple way for deep management of segctor thread and snapshots. This patchset provides opportunity to manage interval of periodical update of superblock (in seconds). Default value is 10 seconds. Now a user can increase this value by means of nilfs2//superblock/sb_update_frequency attribute in the case of necessity. Also the patchset provides opportunity to get information easily about key volumes's parameters (free blocks, superblock write count, superblock update frequency, latest segment info, dirty data blocks count, count of clean segments, count of dirty segments and so on) in real time manner. Such information can be used in scripts for subtle management of filesystem. Implemented functionality creates such groups: (1) /sys/fs/nilfs2 - root group (2) /sys/fs/nilfs2/features - group contains attributes that describe NILFS file system driver features (3) /sys/fs/nilfs2/ - group contains attributes that describe file system partition's details (4) /sys/fs/nilfs2//superblock - group contains attributes that describe superblock's details (5) /sys/fs/nilfs2//segctor - group contains attributes that describe segctor thread activity details (6) /sys/fs/nilfs2//segments - group contains attributes that describe details about volume's segments (7) /sys/fs/nilfs2//checkpoints - group contains attributes that describe details about volume's checkpoints (8) /sys/fs/nilfs2//mounted_snapshots - group contains group for every mounted snapshot (9) /sys/fs/nilfs2//mounted_snapshots/ - group contains details about mounted snapshot This patch (of 9): This patch adds code of creation /sys/fs/nilfs2 group and /sys/fs/nilfs2/features group. The features group contains attributes that describe NILFS file system driver features: (1) revision - show current revision of NILFS file system driver. There are two formats of timestamp output - seconds and human-readable format. Every showed timestamp has two sysfs files (time- and time--secs). One sysfs file (time-) shows time in human-readable format. Another sysfs file (time--secs) shows time in seconds. It was reported by Michael Semon that timestamp output in human-readable format should be changed from "2014-4-12 14:5:38" to "2014-04-12 14:05:38". Second version of the patch fixes this issue. Reported-by: Michael L. Semon Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-fs-nilfs2 b/Documentation/ABI/testing/sysfs-fs-nilfs2 new file mode 100644 index 0000000..71b4fd3 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-fs-nilfs2 @@ -0,0 +1,14 @@ + +What: /sys/fs/nilfs2/features/revision +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show current revision of NILFS file system driver. + This value informs about file system revision that + driver is ready to support. + +What: /sys/fs/nilfs2/features/README +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Describe attributes of /sys/fs/nilfs2/features group. diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c new file mode 100644 index 0000000..41b8f70 --- /dev/null +++ b/fs/nilfs2/sysfs.c @@ -0,0 +1,113 @@ +/* + * sysfs.c - sysfs support implementation. + * + * Copyright (C) 2005-2014 Nippon Telegraph and Telephone Corporation. + * Copyright (C) 2014 HGST, Inc., a Western Digital Company. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Written by Vyacheslav Dubeyko + */ + +#include + +#include "nilfs.h" +#include "mdt.h" +#include "sufile.h" +#include "cpfile.h" +#include "sysfs.h" + +/* /sys/fs// */ +static struct kset *nilfs_kset; + +#define NILFS_SHOW_TIME(time_t_val, buf) ({ \ + struct tm res; \ + int count = 0; \ + time_to_tm(time_t_val, 0, &res); \ + res.tm_year += 1900; \ + res.tm_mon += 1; \ + count = scnprintf(buf, PAGE_SIZE, \ + "%ld-%.2d-%.2d %.2d:%.2d:%.2d\n", \ + res.tm_year, res.tm_mon, res.tm_mday, \ + res.tm_hour, res.tm_min, res.tm_sec);\ + count; \ +}) + +/************************************************************************ + * NILFS feature attrs * + ************************************************************************/ + +static ssize_t nilfs_feature_revision_show(struct kobject *kobj, + struct attribute *attr, char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%d.%d\n", + NILFS_CURRENT_REV, NILFS_MINOR_REV); +} + +static const char features_readme_str[] = + "The features group contains attributes that describe NILFS file\n" + "system driver features.\n\n" + "(1) revision\n\tshow current revision of NILFS file system driver.\n"; + +static ssize_t nilfs_feature_README_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, features_readme_str); +} + +NILFS_FEATURE_RO_ATTR(revision); +NILFS_FEATURE_RO_ATTR(README); + +static struct attribute *nilfs_feature_attrs[] = { + NILFS_FEATURE_ATTR_LIST(revision), + NILFS_FEATURE_ATTR_LIST(README), + NULL, +}; + +static const struct attribute_group nilfs_feature_attr_group = { + .name = "features", + .attrs = nilfs_feature_attrs, +}; + +int __init nilfs_sysfs_init(void) +{ + int err; + + nilfs_kset = kset_create_and_add(NILFS_ROOT_GROUP_NAME, NULL, fs_kobj); + if (!nilfs_kset) { + err = -ENOMEM; + printk(KERN_ERR "NILFS: unable to create sysfs entry: err %d\n", + err); + goto failed_sysfs_init; + } + + err = sysfs_create_group(&nilfs_kset->kobj, &nilfs_feature_attr_group); + if (unlikely(err)) { + printk(KERN_ERR "NILFS: unable to create feature group: err %d\n", + err); + goto cleanup_sysfs_init; + } + + return 0; + +cleanup_sysfs_init: + kset_unregister(nilfs_kset); + +failed_sysfs_init: + return err; +} + +void nilfs_sysfs_exit(void) +{ + sysfs_remove_group(&nilfs_kset->kobj, &nilfs_feature_attr_group); + kset_unregister(nilfs_kset); +} diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h new file mode 100644 index 0000000..0aed246 --- /dev/null +++ b/fs/nilfs2/sysfs.h @@ -0,0 +1,61 @@ +/* + * sysfs.h - sysfs support declarations. + * + * Copyright (C) 2005-2014 Nippon Telegraph and Telephone Corporation. + * Copyright (C) 2014 HGST, Inc., a Western Digital Company. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Written by Vyacheslav Dubeyko + */ + +#ifndef _NILFS_SYSFS_H +#define _NILFS_SYSFS_H + +#include + +#define NILFS_ROOT_GROUP_NAME "nilfs2" + +#define NILFS_COMMON_ATTR_STRUCT(name) \ +struct nilfs_##name##_attr { \ + struct attribute attr; \ + ssize_t (*show)(struct kobject *, struct attribute *, \ + char *); \ + ssize_t (*store)(struct kobject *, struct attribute *, \ + const char *, size_t); \ +}; + +NILFS_COMMON_ATTR_STRUCT(feature); + +#define NILFS_ATTR(type, name, mode, show, store) \ + static struct nilfs_##type##_attr nilfs_##type##_attr_##name = \ + __ATTR(name, mode, show, store) + +#define NILFS_INFO_ATTR(type, name) \ + NILFS_ATTR(type, name, 0444, NULL, NULL) +#define NILFS_RO_ATTR(type, name) \ + NILFS_ATTR(type, name, 0444, nilfs_##type##_##name##_show, NULL) +#define NILFS_RW_ATTR(type, name) \ + NILFS_ATTR(type, name, 0644, \ + nilfs_##type##_##name##_show, \ + nilfs_##type##_##name##_store) + +#define NILFS_FEATURE_INFO_ATTR(name) \ + NILFS_INFO_ATTR(feature, name) +#define NILFS_FEATURE_RO_ATTR(name) \ + NILFS_RO_ATTR(feature, name) +#define NILFS_FEATURE_RW_ATTR(name) \ + NILFS_RW_ATTR(feature, name) + +#define NILFS_FEATURE_ATTR_LIST(name) \ + (&nilfs_feature_attr_##name.attr) + +#endif /* _NILFS_SYSFS_H */ -- cgit v0.10.2 From da7141fb78db915680616e15677539fc8140cf53 Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:39 -0700 Subject: nilfs2: add /sys/fs/nilfs2/ group This patch adds creation of /sys/fs/nilfs2/ group. The group contains attributes that describe file system partition's details: (1) revision - show NILFS file system revision. (2) blocksize - show volume block size in bytes. (3) device_size - show volume size in bytes. (4) free_blocks - show count of free blocks on volume. (5) uuid - show volume's UUID. (6) volume_name - show volume's name. Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Cc: Michael L. Semon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-fs-nilfs2 b/Documentation/ABI/testing/sysfs-fs-nilfs2 index 71b4fd3..0a521d0 100644 --- a/Documentation/ABI/testing/sysfs-fs-nilfs2 +++ b/Documentation/ABI/testing/sysfs-fs-nilfs2 @@ -12,3 +12,47 @@ Date: April 2014 Contact: "Vyacheslav Dubeyko" Description: Describe attributes of /sys/fs/nilfs2/features group. + +What: /sys/fs/nilfs2//revision +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show NILFS file system revision on volume. + This value informs about metadata structures' + revision on mounted volume. + +What: /sys/fs/nilfs2//blocksize +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show volume's block size in bytes. + +What: /sys/fs/nilfs2//device_size +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show volume size in bytes. + +What: /sys/fs/nilfs2//free_blocks +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show count of free blocks on volume. + +What: /sys/fs/nilfs2//uuid +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show volume's UUID (Universally Unique Identifier). + +What: /sys/fs/nilfs2//volume_name +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show volume's label. + +What: /sys/fs/nilfs2//README +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Describe attributes of /sys/fs/nilfs2/ group. diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 41b8f70..780d0f5 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -42,6 +42,174 @@ static struct kset *nilfs_kset; }) /************************************************************************ + * NILFS device attrs * + ************************************************************************/ + +static +ssize_t nilfs_dev_revision_show(struct nilfs_dev_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + struct nilfs_super_block **sbp = nilfs->ns_sbp; + u32 major = le32_to_cpu(sbp[0]->s_rev_level); + u16 minor = le16_to_cpu(sbp[0]->s_minor_rev_level); + + return snprintf(buf, PAGE_SIZE, "%d.%d\n", major, minor); +} + +static +ssize_t nilfs_dev_blocksize_show(struct nilfs_dev_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%u\n", nilfs->ns_blocksize); +} + +static +ssize_t nilfs_dev_device_size_show(struct nilfs_dev_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + struct nilfs_super_block **sbp = nilfs->ns_sbp; + u64 dev_size = le64_to_cpu(sbp[0]->s_dev_size); + + return snprintf(buf, PAGE_SIZE, "%llu\n", dev_size); +} + +static +ssize_t nilfs_dev_free_blocks_show(struct nilfs_dev_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + sector_t free_blocks = 0; + + nilfs_count_free_blocks(nilfs, &free_blocks); + return snprintf(buf, PAGE_SIZE, "%llu\n", + (unsigned long long)free_blocks); +} + +static +ssize_t nilfs_dev_uuid_show(struct nilfs_dev_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + struct nilfs_super_block **sbp = nilfs->ns_sbp; + + return snprintf(buf, PAGE_SIZE, "%pUb\n", sbp[0]->s_uuid); +} + +static +ssize_t nilfs_dev_volume_name_show(struct nilfs_dev_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + struct nilfs_super_block **sbp = nilfs->ns_sbp; + + return scnprintf(buf, sizeof(sbp[0]->s_volume_name), "%s\n", + sbp[0]->s_volume_name); +} + +static const char dev_readme_str[] = + "The group contains attributes that describe file system\n" + "partition's details.\n\n" + "(1) revision\n\tshow NILFS file system revision.\n\n" + "(2) blocksize\n\tshow volume block size in bytes.\n\n" + "(3) device_size\n\tshow volume size in bytes.\n\n" + "(4) free_blocks\n\tshow count of free blocks on volume.\n\n" + "(5) uuid\n\tshow volume's UUID.\n\n" + "(6) volume_name\n\tshow volume's name.\n\n"; + +static ssize_t nilfs_dev_README_show(struct nilfs_dev_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, dev_readme_str); +} + +NILFS_DEV_RO_ATTR(revision); +NILFS_DEV_RO_ATTR(blocksize); +NILFS_DEV_RO_ATTR(device_size); +NILFS_DEV_RO_ATTR(free_blocks); +NILFS_DEV_RO_ATTR(uuid); +NILFS_DEV_RO_ATTR(volume_name); +NILFS_DEV_RO_ATTR(README); + +static struct attribute *nilfs_dev_attrs[] = { + NILFS_DEV_ATTR_LIST(revision), + NILFS_DEV_ATTR_LIST(blocksize), + NILFS_DEV_ATTR_LIST(device_size), + NILFS_DEV_ATTR_LIST(free_blocks), + NILFS_DEV_ATTR_LIST(uuid), + NILFS_DEV_ATTR_LIST(volume_name), + NILFS_DEV_ATTR_LIST(README), + NULL, +}; + +static ssize_t nilfs_dev_attr_show(struct kobject *kobj, + struct attribute *attr, char *buf) +{ + struct the_nilfs *nilfs = container_of(kobj, struct the_nilfs, + ns_dev_kobj); + struct nilfs_dev_attr *a = container_of(attr, struct nilfs_dev_attr, + attr); + + return a->show ? a->show(a, nilfs, buf) : 0; +} + +static ssize_t nilfs_dev_attr_store(struct kobject *kobj, + struct attribute *attr, + const char *buf, size_t len) +{ + struct the_nilfs *nilfs = container_of(kobj, struct the_nilfs, + ns_dev_kobj); + struct nilfs_dev_attr *a = container_of(attr, struct nilfs_dev_attr, + attr); + + return a->store ? a->store(a, nilfs, buf, len) : 0; +} + +static void nilfs_dev_attr_release(struct kobject *kobj) +{ + struct the_nilfs *nilfs = container_of(kobj, struct the_nilfs, + ns_dev_kobj); + complete(&nilfs->ns_dev_kobj_unregister); +} + +static const struct sysfs_ops nilfs_dev_attr_ops = { + .show = nilfs_dev_attr_show, + .store = nilfs_dev_attr_store, +}; + +static struct kobj_type nilfs_dev_ktype = { + .default_attrs = nilfs_dev_attrs, + .sysfs_ops = &nilfs_dev_attr_ops, + .release = nilfs_dev_attr_release, +}; + +int nilfs_sysfs_create_device_group(struct super_block *sb) +{ + struct the_nilfs *nilfs = sb->s_fs_info; + int err; + + nilfs->ns_dev_kobj.kset = nilfs_kset; + init_completion(&nilfs->ns_dev_kobj_unregister); + err = kobject_init_and_add(&nilfs->ns_dev_kobj, &nilfs_dev_ktype, NULL, + "%s", sb->s_id); + if (err) + goto failed_create_device_group; + + return 0; + +failed_create_device_group: + return err; +} + +void nilfs_sysfs_delete_device_group(struct the_nilfs *nilfs) +{ + kobject_del(&nilfs->ns_dev_kobj); +} + +/************************************************************************ * NILFS feature attrs * ************************************************************************/ diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h index 0aed246..2353b28 100644 --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -35,6 +35,17 @@ struct nilfs_##name##_attr { \ NILFS_COMMON_ATTR_STRUCT(feature); +#define NILFS_DEV_ATTR_STRUCT(name) \ +struct nilfs_##name##_attr { \ + struct attribute attr; \ + ssize_t (*show)(struct nilfs_##name##_attr *, struct the_nilfs *, \ + char *); \ + ssize_t (*store)(struct nilfs_##name##_attr *, struct the_nilfs *, \ + const char *, size_t); \ +}; + +NILFS_DEV_ATTR_STRUCT(dev); + #define NILFS_ATTR(type, name, mode, show, store) \ static struct nilfs_##type##_attr nilfs_##type##_attr_##name = \ __ATTR(name, mode, show, store) @@ -55,7 +66,16 @@ NILFS_COMMON_ATTR_STRUCT(feature); #define NILFS_FEATURE_RW_ATTR(name) \ NILFS_RW_ATTR(feature, name) +#define NILFS_DEV_INFO_ATTR(name) \ + NILFS_INFO_ATTR(dev, name) +#define NILFS_DEV_RO_ATTR(name) \ + NILFS_RO_ATTR(dev, name) +#define NILFS_DEV_RW_ATTR(name) \ + NILFS_RW_ATTR(dev, name) + #define NILFS_FEATURE_ATTR_LIST(name) \ (&nilfs_feature_attr_##name.attr) +#define NILFS_DEV_ATTR_LIST(name) \ + (&nilfs_dev_attr_##name.attr) #endif /* _NILFS_SYSFS_H */ diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index de8cc53..34bc7bd 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -95,6 +95,8 @@ enum { * @ns_inode_size: size of on-disk inode * @ns_first_ino: first not-special inode number * @ns_crc_seed: seed value of CRC32 calculation + * @ns_dev_kobj: /sys/fs// + * @ns_dev_kobj_unregister: completion state */ struct the_nilfs { unsigned long ns_flags; @@ -188,6 +190,10 @@ struct the_nilfs { int ns_inode_size; int ns_first_ino; u32 ns_crc_seed; + + /* /sys/fs// */ + struct kobject ns_dev_kobj; + struct completion ns_dev_kobj_unregister; }; #define THE_NILFS_FNS(bit, name) \ -- cgit v0.10.2 From caa05d49dfd7fe04820ba8b7e424343d5426a7f3 Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:42 -0700 Subject: nilfs2: add /sys/fs/nilfs2//superblock group This patch adds creation of /sys/fs/nilfs2//superblock group. The superblock group contains attributes that describe superblock's details: (1) sb_write_time - show previous write time of super block in human-readable format. (2) sb_write_time_secs - show previous write time of super block in seconds. (3) sb_write_count - show write count of super block. (4) sb_update_frequency - show/set interval of periodical update of superblock (in seconds). You can set preferable frequency of superblock update by command: echo > /sys/fs/nilfs2//superblock/sb_update_frequency Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Cc: Michael L. Semon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-fs-nilfs2 b/Documentation/ABI/testing/sysfs-fs-nilfs2 index 0a521d0..e9b4c61 100644 --- a/Documentation/ABI/testing/sysfs-fs-nilfs2 +++ b/Documentation/ABI/testing/sysfs-fs-nilfs2 @@ -56,3 +56,36 @@ Date: April 2014 Contact: "Vyacheslav Dubeyko" Description: Describe attributes of /sys/fs/nilfs2/ group. + +What: /sys/fs/nilfs2//superblock/sb_write_time +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show last write time of super block in human-readable + format. + +What: /sys/fs/nilfs2//superblock/sb_write_time_secs +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show last write time of super block in seconds. + +What: /sys/fs/nilfs2//superblock/sb_write_count +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show current write count of super block. + +What: /sys/fs/nilfs2//superblock/sb_update_frequency +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show/Set interval of periodical update of superblock + (in seconds). + +What: /sys/fs/nilfs2//superblock/README +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Describe attributes of /sys/fs/nilfs2//superblock + group. diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 780d0f5..239f239 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -41,6 +41,202 @@ static struct kset *nilfs_kset; count; \ }) +#define NILFS_DEV_INT_GROUP_OPS(name, parent_name) \ +static ssize_t nilfs_##name##_attr_show(struct kobject *kobj, \ + struct attribute *attr, char *buf) \ +{ \ + struct the_nilfs *nilfs = container_of(kobj->parent, \ + struct the_nilfs, \ + ns_##parent_name##_kobj); \ + struct nilfs_##name##_attr *a = container_of(attr, \ + struct nilfs_##name##_attr, \ + attr); \ + return a->show ? a->show(a, nilfs, buf) : 0; \ +} \ +static ssize_t nilfs_##name##_attr_store(struct kobject *kobj, \ + struct attribute *attr, \ + const char *buf, size_t len) \ +{ \ + struct the_nilfs *nilfs = container_of(kobj->parent, \ + struct the_nilfs, \ + ns_##parent_name##_kobj); \ + struct nilfs_##name##_attr *a = container_of(attr, \ + struct nilfs_##name##_attr, \ + attr); \ + return a->store ? a->store(a, nilfs, buf, len) : 0; \ +} \ +static const struct sysfs_ops nilfs_##name##_attr_ops = { \ + .show = nilfs_##name##_attr_show, \ + .store = nilfs_##name##_attr_store, \ +}; + +#define NILFS_DEV_INT_GROUP_TYPE(name, parent_name) \ +static void nilfs_##name##_attr_release(struct kobject *kobj) \ +{ \ + struct nilfs_sysfs_##parent_name##_subgroups *subgroups; \ + struct the_nilfs *nilfs = container_of(kobj->parent, \ + struct the_nilfs, \ + ns_##parent_name##_kobj); \ + subgroups = nilfs->ns_##parent_name##_subgroups; \ + complete(&subgroups->sg_##name##_kobj_unregister); \ +} \ +static struct kobj_type nilfs_##name##_ktype = { \ + .default_attrs = nilfs_##name##_attrs, \ + .sysfs_ops = &nilfs_##name##_attr_ops, \ + .release = nilfs_##name##_attr_release, \ +}; + +#define NILFS_DEV_INT_GROUP_FNS(name, parent_name) \ +int nilfs_sysfs_create_##name##_group(struct the_nilfs *nilfs) \ +{ \ + struct kobject *parent; \ + struct kobject *kobj; \ + struct completion *kobj_unregister; \ + struct nilfs_sysfs_##parent_name##_subgroups *subgroups; \ + int err; \ + subgroups = nilfs->ns_##parent_name##_subgroups; \ + kobj = &subgroups->sg_##name##_kobj; \ + kobj_unregister = &subgroups->sg_##name##_kobj_unregister; \ + parent = &nilfs->ns_##parent_name##_kobj; \ + kobj->kset = nilfs_kset; \ + init_completion(kobj_unregister); \ + err = kobject_init_and_add(kobj, &nilfs_##name##_ktype, parent, \ + #name); \ + if (err) \ + return err; \ + return 0; \ +} \ +void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ +{ \ + kobject_del(&nilfs->ns_##parent_name##_subgroups->sg_##name##_kobj); \ +} + +/************************************************************************ + * NILFS superblock attrs * + ************************************************************************/ + +static ssize_t +nilfs_superblock_sb_write_time_show(struct nilfs_superblock_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + time_t sbwtime; + + down_read(&nilfs->ns_sem); + sbwtime = nilfs->ns_sbwtime; + up_read(&nilfs->ns_sem); + + return NILFS_SHOW_TIME(sbwtime, buf); +} + +static ssize_t +nilfs_superblock_sb_write_time_secs_show(struct nilfs_superblock_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + time_t sbwtime; + + down_read(&nilfs->ns_sem); + sbwtime = nilfs->ns_sbwtime; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%llu\n", (unsigned long long)sbwtime); +} + +static ssize_t +nilfs_superblock_sb_write_count_show(struct nilfs_superblock_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + unsigned sbwcount; + + down_read(&nilfs->ns_sem); + sbwcount = nilfs->ns_sbwcount; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%u\n", sbwcount); +} + +static ssize_t +nilfs_superblock_sb_update_frequency_show(struct nilfs_superblock_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + unsigned sb_update_freq; + + down_read(&nilfs->ns_sem); + sb_update_freq = nilfs->ns_sb_update_freq; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%u\n", sb_update_freq); +} + +static ssize_t +nilfs_superblock_sb_update_frequency_store(struct nilfs_superblock_attr *attr, + struct the_nilfs *nilfs, + const char *buf, size_t count) +{ + unsigned val; + int err; + + err = kstrtouint(skip_spaces(buf), 0, &val); + if (err) { + printk(KERN_ERR "NILFS: unable to convert string: err=%d\n", + err); + return err; + } + + if (val < NILFS_SB_FREQ) { + val = NILFS_SB_FREQ; + printk(KERN_WARNING "NILFS: superblock update frequency cannot be lesser than 10 seconds\n"); + } + + down_write(&nilfs->ns_sem); + nilfs->ns_sb_update_freq = val; + up_write(&nilfs->ns_sem); + + return count; +} + +static const char sb_readme_str[] = + "The superblock group contains attributes that describe\n" + "superblock's details.\n\n" + "(1) sb_write_time\n\tshow previous write time of super block " + "in human-readable format.\n\n" + "(2) sb_write_time_secs\n\tshow previous write time of super block " + "in seconds.\n\n" + "(3) sb_write_count\n\tshow write count of super block.\n\n" + "(4) sb_update_frequency\n" + "\tshow/set interval of periodical update of superblock (in seconds).\n\n" + "\tYou can set preferable frequency of superblock update by command:\n\n" + "\t'echo > /sys/fs///superblock/sb_update_frequency'\n"; + +static ssize_t +nilfs_superblock_README_show(struct nilfs_superblock_attr *attr, + struct the_nilfs *nilfs, char *buf) +{ + return snprintf(buf, PAGE_SIZE, sb_readme_str); +} + +NILFS_SUPERBLOCK_RO_ATTR(sb_write_time); +NILFS_SUPERBLOCK_RO_ATTR(sb_write_time_secs); +NILFS_SUPERBLOCK_RO_ATTR(sb_write_count); +NILFS_SUPERBLOCK_RW_ATTR(sb_update_frequency); +NILFS_SUPERBLOCK_RO_ATTR(README); + +static struct attribute *nilfs_superblock_attrs[] = { + NILFS_SUPERBLOCK_ATTR_LIST(sb_write_time), + NILFS_SUPERBLOCK_ATTR_LIST(sb_write_time_secs), + NILFS_SUPERBLOCK_ATTR_LIST(sb_write_count), + NILFS_SUPERBLOCK_ATTR_LIST(sb_update_frequency), + NILFS_SUPERBLOCK_ATTR_LIST(README), + NULL, +}; + +NILFS_DEV_INT_GROUP_OPS(superblock, dev); +NILFS_DEV_INT_GROUP_TYPE(superblock, dev); +NILFS_DEV_INT_GROUP_FNS(superblock, dev); + /************************************************************************ * NILFS device attrs * ************************************************************************/ @@ -189,24 +385,44 @@ static struct kobj_type nilfs_dev_ktype = { int nilfs_sysfs_create_device_group(struct super_block *sb) { struct the_nilfs *nilfs = sb->s_fs_info; + size_t devgrp_size = sizeof(struct nilfs_sysfs_dev_subgroups); int err; + nilfs->ns_dev_subgroups = kzalloc(devgrp_size, GFP_KERNEL); + if (unlikely(!nilfs->ns_dev_subgroups)) { + err = -ENOMEM; + printk(KERN_ERR "NILFS: unable to allocate memory for device group\n"); + goto failed_create_device_group; + } + nilfs->ns_dev_kobj.kset = nilfs_kset; init_completion(&nilfs->ns_dev_kobj_unregister); err = kobject_init_and_add(&nilfs->ns_dev_kobj, &nilfs_dev_ktype, NULL, "%s", sb->s_id); if (err) - goto failed_create_device_group; + goto free_dev_subgroups; + + err = nilfs_sysfs_create_superblock_group(nilfs); + if (err) + goto cleanup_dev_kobject; return 0; +cleanup_dev_kobject: + kobject_del(&nilfs->ns_dev_kobj); + +free_dev_subgroups: + kfree(nilfs->ns_dev_subgroups); + failed_create_device_group: return err; } void nilfs_sysfs_delete_device_group(struct the_nilfs *nilfs) { + nilfs_sysfs_delete_superblock_group(nilfs); kobject_del(&nilfs->ns_dev_kobj); + kfree(nilfs->ns_dev_subgroups); } /************************************************************************ diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h index 2353b28..1d76af5 100644 --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -24,6 +24,17 @@ #define NILFS_ROOT_GROUP_NAME "nilfs2" +/* + * struct nilfs_sysfs_dev_subgroups - device subgroup kernel objects + * @sg_superblock_kobj: /sys/fs///superblock + * @sg_superblock_kobj_unregister: completion state + */ +struct nilfs_sysfs_dev_subgroups { + /* /sys/fs///superblock */ + struct kobject sg_superblock_kobj; + struct completion sg_superblock_kobj_unregister; +}; + #define NILFS_COMMON_ATTR_STRUCT(name) \ struct nilfs_##name##_attr { \ struct attribute attr; \ @@ -45,6 +56,7 @@ struct nilfs_##name##_attr { \ }; NILFS_DEV_ATTR_STRUCT(dev); +NILFS_DEV_ATTR_STRUCT(superblock); #define NILFS_ATTR(type, name, mode, show, store) \ static struct nilfs_##type##_attr nilfs_##type##_attr_##name = \ @@ -73,9 +85,16 @@ NILFS_DEV_ATTR_STRUCT(dev); #define NILFS_DEV_RW_ATTR(name) \ NILFS_RW_ATTR(dev, name) +#define NILFS_SUPERBLOCK_RO_ATTR(name) \ + NILFS_RO_ATTR(superblock, name) +#define NILFS_SUPERBLOCK_RW_ATTR(name) \ + NILFS_RW_ATTR(superblock, name) + #define NILFS_FEATURE_ATTR_LIST(name) \ (&nilfs_feature_attr_##name.attr) #define NILFS_DEV_ATTR_LIST(name) \ (&nilfs_dev_attr_##name.attr) +#define NILFS_SUPERBLOCK_ATTR_LIST(name) \ + (&nilfs_superblock_attr_##name.attr) #endif /* _NILFS_SYSFS_H */ diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c index 8ba8229..59d5008 100644 --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -85,6 +85,7 @@ struct the_nilfs *alloc_nilfs(struct block_device *bdev) nilfs->ns_cptree = RB_ROOT; spin_lock_init(&nilfs->ns_cptree_lock); init_rwsem(&nilfs->ns_segctor_sem); + nilfs->ns_sb_update_freq = NILFS_SB_FREQ; return nilfs; } diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index 34bc7bd..7fe8223 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -33,6 +33,7 @@ #include struct nilfs_sc_info; +struct nilfs_sysfs_dev_subgroups; /* the_nilfs struct */ enum { @@ -54,6 +55,7 @@ enum { * @ns_sbwcount: write count of super block * @ns_sbsize: size of valid data in super block * @ns_mount_state: file system state + * @ns_sb_update_freq: interval of periodical update of superblocks (in seconds) * @ns_seg_seq: segment sequence counter * @ns_segnum: index number of the latest full segment. * @ns_nextnum: index number of the full segment index to be used next @@ -97,6 +99,7 @@ enum { * @ns_crc_seed: seed value of CRC32 calculation * @ns_dev_kobj: /sys/fs// * @ns_dev_kobj_unregister: completion state + * @ns_dev_subgroups: subgroups pointer */ struct the_nilfs { unsigned long ns_flags; @@ -116,6 +119,7 @@ struct the_nilfs { unsigned ns_sbwcount; unsigned ns_sbsize; unsigned ns_mount_state; + unsigned ns_sb_update_freq; /* * Following fields are dedicated to a writable FS-instance. @@ -194,6 +198,7 @@ struct the_nilfs { /* /sys/fs// */ struct kobject ns_dev_kobj; struct completion ns_dev_kobj_unregister; + struct nilfs_sysfs_dev_subgroups *ns_dev_subgroups; }; #define THE_NILFS_FNS(bit, name) \ @@ -260,7 +265,8 @@ struct nilfs_root { static inline int nilfs_sb_need_update(struct the_nilfs *nilfs) { u64 t = get_seconds(); - return t < nilfs->ns_sbwtime || t > nilfs->ns_sbwtime + NILFS_SB_FREQ; + return t < nilfs->ns_sbwtime || + t > nilfs->ns_sbwtime + nilfs->ns_sb_update_freq; } static inline int nilfs_sb_will_flip(struct the_nilfs *nilfs) -- cgit v0.10.2 From abc968dbf291955ac750ecf59e3baf2b529a8257 Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:44 -0700 Subject: nilfs2: add /sys/fs/nilfs2//segctor group This patch adds creation of /sys/fs/nilfs2//segctor group. The segctor group contains attributes that describe segctor thread activity details: (1) last_pseg_block - show start block number of the latest segment. (2) last_seg_sequence - show sequence value of the latest segment. (3) last_seg_checkpoint - show checkpoint number of the latest segment. (4) current_seg_sequence - show segment sequence counter. (5) current_last_full_seg - show index number of the latest full segment. (6) next_full_seg - show index number of the full segment index to be used next. (7) next_pseg_offset - show offset of next partial segment in the current full segment. (8) next_checkpoint - show next checkpoint number. (9) last_seg_write_time - show write time of the last segment in human-readable format. (10) last_seg_write_time_secs - show write time of the last segment in seconds. (11) last_nongc_write_time - show write time of the last segment not for cleaner operation in human-readable format. (12) last_nongc_write_time_secs - show write time of the last segment not for cleaner operation in seconds. (13) dirty_data_blocks_count - show number of dirty data blocks. Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Cc: Michael L. Semon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-fs-nilfs2 b/Documentation/ABI/testing/sysfs-fs-nilfs2 index e9b4c61..c9952f4 100644 --- a/Documentation/ABI/testing/sysfs-fs-nilfs2 +++ b/Documentation/ABI/testing/sysfs-fs-nilfs2 @@ -89,3 +89,93 @@ Contact: "Vyacheslav Dubeyko" Description: Describe attributes of /sys/fs/nilfs2//superblock group. + +What: /sys/fs/nilfs2//segctor/last_pseg_block +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show start block number of the latest segment. + +What: /sys/fs/nilfs2//segctor/last_seg_sequence +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show sequence value of the latest segment. + +What: /sys/fs/nilfs2//segctor/last_seg_checkpoint +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show checkpoint number of the latest segment. + +What: /sys/fs/nilfs2//segctor/current_seg_sequence +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show segment sequence counter. + +What: /sys/fs/nilfs2//segctor/current_last_full_seg +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show index number of the latest full segment. + +What: /sys/fs/nilfs2//segctor/next_full_seg +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show index number of the full segment index + to be used next. + +What: /sys/fs/nilfs2//segctor/next_pseg_offset +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show offset of next partial segment in the current + full segment. + +What: /sys/fs/nilfs2//segctor/next_checkpoint +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show next checkpoint number. + +What: /sys/fs/nilfs2//segctor/last_seg_write_time +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show write time of the last segment in + human-readable format. + +What: /sys/fs/nilfs2//segctor/last_seg_write_time_secs +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show write time of the last segment in seconds. + +What: /sys/fs/nilfs2//segctor/last_nongc_write_time +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show write time of the last segment not for cleaner + operation in human-readable format. + +What: /sys/fs/nilfs2//segctor/last_nongc_write_time_secs +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show write time of the last segment not for cleaner + operation in seconds. + +What: /sys/fs/nilfs2//segctor/dirty_data_blocks_count +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show number of dirty data blocks. + +What: /sys/fs/nilfs2//segctor/README +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Describe attributes of /sys/fs/nilfs2//segctor + group. diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 239f239..61454a8 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -112,6 +112,268 @@ void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ } /************************************************************************ + * NILFS segctor attrs * + ************************************************************************/ + +static ssize_t +nilfs_segctor_last_pseg_block_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + sector_t last_pseg; + + spin_lock(&nilfs->ns_last_segment_lock); + last_pseg = nilfs->ns_last_pseg; + spin_unlock(&nilfs->ns_last_segment_lock); + + return snprintf(buf, PAGE_SIZE, "%llu\n", + (unsigned long long)last_pseg); +} + +static ssize_t +nilfs_segctor_last_seg_sequence_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + u64 last_seq; + + spin_lock(&nilfs->ns_last_segment_lock); + last_seq = nilfs->ns_last_seq; + spin_unlock(&nilfs->ns_last_segment_lock); + + return snprintf(buf, PAGE_SIZE, "%llu\n", last_seq); +} + +static ssize_t +nilfs_segctor_last_seg_checkpoint_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + __u64 last_cno; + + spin_lock(&nilfs->ns_last_segment_lock); + last_cno = nilfs->ns_last_cno; + spin_unlock(&nilfs->ns_last_segment_lock); + + return snprintf(buf, PAGE_SIZE, "%llu\n", last_cno); +} + +static ssize_t +nilfs_segctor_current_seg_sequence_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + u64 seg_seq; + + down_read(&nilfs->ns_sem); + seg_seq = nilfs->ns_seg_seq; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%llu\n", seg_seq); +} + +static ssize_t +nilfs_segctor_current_last_full_seg_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + __u64 segnum; + + down_read(&nilfs->ns_sem); + segnum = nilfs->ns_segnum; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%llu\n", segnum); +} + +static ssize_t +nilfs_segctor_next_full_seg_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + __u64 nextnum; + + down_read(&nilfs->ns_sem); + nextnum = nilfs->ns_nextnum; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%llu\n", nextnum); +} + +static ssize_t +nilfs_segctor_next_pseg_offset_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + unsigned long pseg_offset; + + down_read(&nilfs->ns_sem); + pseg_offset = nilfs->ns_pseg_offset; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%lu\n", pseg_offset); +} + +static ssize_t +nilfs_segctor_next_checkpoint_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + __u64 cno; + + down_read(&nilfs->ns_sem); + cno = nilfs->ns_cno; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%llu\n", cno); +} + +static ssize_t +nilfs_segctor_last_seg_write_time_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + time_t ctime; + + down_read(&nilfs->ns_sem); + ctime = nilfs->ns_ctime; + up_read(&nilfs->ns_sem); + + return NILFS_SHOW_TIME(ctime, buf); +} + +static ssize_t +nilfs_segctor_last_seg_write_time_secs_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + time_t ctime; + + down_read(&nilfs->ns_sem); + ctime = nilfs->ns_ctime; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%llu\n", (unsigned long long)ctime); +} + +static ssize_t +nilfs_segctor_last_nongc_write_time_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + time_t nongc_ctime; + + down_read(&nilfs->ns_sem); + nongc_ctime = nilfs->ns_nongc_ctime; + up_read(&nilfs->ns_sem); + + return NILFS_SHOW_TIME(nongc_ctime, buf); +} + +static ssize_t +nilfs_segctor_last_nongc_write_time_secs_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + time_t nongc_ctime; + + down_read(&nilfs->ns_sem); + nongc_ctime = nilfs->ns_nongc_ctime; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%llu\n", + (unsigned long long)nongc_ctime); +} + +static ssize_t +nilfs_segctor_dirty_data_blocks_count_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + u32 ndirtyblks; + + down_read(&nilfs->ns_sem); + ndirtyblks = atomic_read(&nilfs->ns_ndirtyblks); + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%u\n", ndirtyblks); +} + +static const char segctor_readme_str[] = + "The segctor group contains attributes that describe\n" + "segctor thread activity details.\n\n" + "(1) last_pseg_block\n" + "\tshow start block number of the latest segment.\n\n" + "(2) last_seg_sequence\n" + "\tshow sequence value of the latest segment.\n\n" + "(3) last_seg_checkpoint\n" + "\tshow checkpoint number of the latest segment.\n\n" + "(4) current_seg_sequence\n\tshow segment sequence counter.\n\n" + "(5) current_last_full_seg\n" + "\tshow index number of the latest full segment.\n\n" + "(6) next_full_seg\n" + "\tshow index number of the full segment index to be used next.\n\n" + "(7) next_pseg_offset\n" + "\tshow offset of next partial segment in the current full segment.\n\n" + "(8) next_checkpoint\n\tshow next checkpoint number.\n\n" + "(9) last_seg_write_time\n" + "\tshow write time of the last segment in human-readable format.\n\n" + "(10) last_seg_write_time_secs\n" + "\tshow write time of the last segment in seconds.\n\n" + "(11) last_nongc_write_time\n" + "\tshow write time of the last segment not for cleaner operation " + "in human-readable format.\n\n" + "(12) last_nongc_write_time_secs\n" + "\tshow write time of the last segment not for cleaner operation " + "in seconds.\n\n" + "(13) dirty_data_blocks_count\n" + "\tshow number of dirty data blocks.\n\n"; + +static ssize_t +nilfs_segctor_README_show(struct nilfs_segctor_attr *attr, + struct the_nilfs *nilfs, char *buf) +{ + return snprintf(buf, PAGE_SIZE, segctor_readme_str); +} + +NILFS_SEGCTOR_RO_ATTR(last_pseg_block); +NILFS_SEGCTOR_RO_ATTR(last_seg_sequence); +NILFS_SEGCTOR_RO_ATTR(last_seg_checkpoint); +NILFS_SEGCTOR_RO_ATTR(current_seg_sequence); +NILFS_SEGCTOR_RO_ATTR(current_last_full_seg); +NILFS_SEGCTOR_RO_ATTR(next_full_seg); +NILFS_SEGCTOR_RO_ATTR(next_pseg_offset); +NILFS_SEGCTOR_RO_ATTR(next_checkpoint); +NILFS_SEGCTOR_RO_ATTR(last_seg_write_time); +NILFS_SEGCTOR_RO_ATTR(last_seg_write_time_secs); +NILFS_SEGCTOR_RO_ATTR(last_nongc_write_time); +NILFS_SEGCTOR_RO_ATTR(last_nongc_write_time_secs); +NILFS_SEGCTOR_RO_ATTR(dirty_data_blocks_count); +NILFS_SEGCTOR_RO_ATTR(README); + +static struct attribute *nilfs_segctor_attrs[] = { + NILFS_SEGCTOR_ATTR_LIST(last_pseg_block), + NILFS_SEGCTOR_ATTR_LIST(last_seg_sequence), + NILFS_SEGCTOR_ATTR_LIST(last_seg_checkpoint), + NILFS_SEGCTOR_ATTR_LIST(current_seg_sequence), + NILFS_SEGCTOR_ATTR_LIST(current_last_full_seg), + NILFS_SEGCTOR_ATTR_LIST(next_full_seg), + NILFS_SEGCTOR_ATTR_LIST(next_pseg_offset), + NILFS_SEGCTOR_ATTR_LIST(next_checkpoint), + NILFS_SEGCTOR_ATTR_LIST(last_seg_write_time), + NILFS_SEGCTOR_ATTR_LIST(last_seg_write_time_secs), + NILFS_SEGCTOR_ATTR_LIST(last_nongc_write_time), + NILFS_SEGCTOR_ATTR_LIST(last_nongc_write_time_secs), + NILFS_SEGCTOR_ATTR_LIST(dirty_data_blocks_count), + NILFS_SEGCTOR_ATTR_LIST(README), + NULL, +}; + +NILFS_DEV_INT_GROUP_OPS(segctor, dev); +NILFS_DEV_INT_GROUP_TYPE(segctor, dev); +NILFS_DEV_INT_GROUP_FNS(segctor, dev); + +/************************************************************************ * NILFS superblock attrs * ************************************************************************/ @@ -406,8 +668,15 @@ int nilfs_sysfs_create_device_group(struct super_block *sb) if (err) goto cleanup_dev_kobject; + err = nilfs_sysfs_create_segctor_group(nilfs); + if (err) + goto delete_superblock_group; + return 0; +delete_superblock_group: + nilfs_sysfs_delete_superblock_group(nilfs); + cleanup_dev_kobject: kobject_del(&nilfs->ns_dev_kobj); @@ -421,6 +690,7 @@ failed_create_device_group: void nilfs_sysfs_delete_device_group(struct the_nilfs *nilfs) { nilfs_sysfs_delete_superblock_group(nilfs); + nilfs_sysfs_delete_segctor_group(nilfs); kobject_del(&nilfs->ns_dev_kobj); kfree(nilfs->ns_dev_subgroups); } diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h index 1d76af5..ad4d5bc 100644 --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -28,11 +28,17 @@ * struct nilfs_sysfs_dev_subgroups - device subgroup kernel objects * @sg_superblock_kobj: /sys/fs///superblock * @sg_superblock_kobj_unregister: completion state + * @sg_segctor_kobj: /sys/fs///segctor + * @sg_segctor_kobj_unregister: completion state */ struct nilfs_sysfs_dev_subgroups { /* /sys/fs///superblock */ struct kobject sg_superblock_kobj; struct completion sg_superblock_kobj_unregister; + + /* /sys/fs///segctor */ + struct kobject sg_segctor_kobj; + struct completion sg_segctor_kobj_unregister; }; #define NILFS_COMMON_ATTR_STRUCT(name) \ @@ -57,6 +63,7 @@ struct nilfs_##name##_attr { \ NILFS_DEV_ATTR_STRUCT(dev); NILFS_DEV_ATTR_STRUCT(superblock); +NILFS_DEV_ATTR_STRUCT(segctor); #define NILFS_ATTR(type, name, mode, show, store) \ static struct nilfs_##type##_attr nilfs_##type##_attr_##name = \ @@ -90,11 +97,20 @@ NILFS_DEV_ATTR_STRUCT(superblock); #define NILFS_SUPERBLOCK_RW_ATTR(name) \ NILFS_RW_ATTR(superblock, name) +#define NILFS_SEGCTOR_INFO_ATTR(name) \ + NILFS_INFO_ATTR(segctor, name) +#define NILFS_SEGCTOR_RO_ATTR(name) \ + NILFS_RO_ATTR(segctor, name) +#define NILFS_SEGCTOR_RW_ATTR(name) \ + NILFS_RW_ATTR(segctor, name) + #define NILFS_FEATURE_ATTR_LIST(name) \ (&nilfs_feature_attr_##name.attr) #define NILFS_DEV_ATTR_LIST(name) \ (&nilfs_dev_attr_##name.attr) #define NILFS_SUPERBLOCK_ATTR_LIST(name) \ (&nilfs_superblock_attr_##name.attr) +#define NILFS_SEGCTOR_ATTR_LIST(name) \ + (&nilfs_segctor_attr_##name.attr) #endif /* _NILFS_SYSFS_H */ -- cgit v0.10.2 From ef43d5cd84b7d2ea09846de34e14be7d74be3e6f Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:46 -0700 Subject: nilfs2: add /sys/fs/nilfs2//segments group This patch adds creation of /sys/fs/nilfs2//segments group. The segments group contains attributes that describe details about volume's segments: (1) segments_number - show number of segments on volume. (2) blocks_per_segment - show number of blocks in segment. (3) clean_segments - show count of clean segments. (4) dirty_segments - show count of dirty segments. Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Cc: Michael L. Semon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-fs-nilfs2 b/Documentation/ABI/testing/sysfs-fs-nilfs2 index c9952f4..d39e076 100644 --- a/Documentation/ABI/testing/sysfs-fs-nilfs2 +++ b/Documentation/ABI/testing/sysfs-fs-nilfs2 @@ -179,3 +179,34 @@ Contact: "Vyacheslav Dubeyko" Description: Describe attributes of /sys/fs/nilfs2//segctor group. + +What: /sys/fs/nilfs2//segments/segments_number +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show number of segments on a volume. + +What: /sys/fs/nilfs2//segments/blocks_per_segment +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show number of blocks in segment. + +What: /sys/fs/nilfs2//segments/clean_segments +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show count of clean segments. + +What: /sys/fs/nilfs2//segments/dirty_segments +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show count of dirty segments. + +What: /sys/fs/nilfs2//segments/README +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Describe attributes of /sys/fs/nilfs2//segments + group. diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 61454a8..fe8e141 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -112,6 +112,95 @@ void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ } /************************************************************************ + * NILFS segments attrs * + ************************************************************************/ + +static ssize_t +nilfs_segments_segments_number_show(struct nilfs_segments_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%lu\n", nilfs->ns_nsegments); +} + +static ssize_t +nilfs_segments_blocks_per_segment_show(struct nilfs_segments_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%lu\n", nilfs->ns_blocks_per_segment); +} + +static ssize_t +nilfs_segments_clean_segments_show(struct nilfs_segments_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + unsigned long ncleansegs; + + down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem); + ncleansegs = nilfs_sufile_get_ncleansegs(nilfs->ns_sufile); + up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem); + + return snprintf(buf, PAGE_SIZE, "%lu\n", ncleansegs); +} + +static ssize_t +nilfs_segments_dirty_segments_show(struct nilfs_segments_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + struct nilfs_sustat sustat; + int err; + + down_read(&nilfs->ns_segctor_sem); + err = nilfs_sufile_get_stat(nilfs->ns_sufile, &sustat); + up_read(&nilfs->ns_segctor_sem); + if (err < 0) { + printk(KERN_ERR "NILFS: unable to get segment stat: err=%d\n", + err); + return err; + } + + return snprintf(buf, PAGE_SIZE, "%llu\n", sustat.ss_ndirtysegs); +} + +static const char segments_readme_str[] = + "The segments group contains attributes that describe\n" + "details about volume's segments.\n\n" + "(1) segments_number\n\tshow number of segments on volume.\n\n" + "(2) blocks_per_segment\n\tshow number of blocks in segment.\n\n" + "(3) clean_segments\n\tshow count of clean segments.\n\n" + "(4) dirty_segments\n\tshow count of dirty segments.\n\n"; + +static ssize_t +nilfs_segments_README_show(struct nilfs_segments_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, segments_readme_str); +} + +NILFS_SEGMENTS_RO_ATTR(segments_number); +NILFS_SEGMENTS_RO_ATTR(blocks_per_segment); +NILFS_SEGMENTS_RO_ATTR(clean_segments); +NILFS_SEGMENTS_RO_ATTR(dirty_segments); +NILFS_SEGMENTS_RO_ATTR(README); + +static struct attribute *nilfs_segments_attrs[] = { + NILFS_SEGMENTS_ATTR_LIST(segments_number), + NILFS_SEGMENTS_ATTR_LIST(blocks_per_segment), + NILFS_SEGMENTS_ATTR_LIST(clean_segments), + NILFS_SEGMENTS_ATTR_LIST(dirty_segments), + NILFS_SEGMENTS_ATTR_LIST(README), + NULL, +}; + +NILFS_DEV_INT_GROUP_OPS(segments, dev); +NILFS_DEV_INT_GROUP_TYPE(segments, dev); +NILFS_DEV_INT_GROUP_FNS(segments, dev); + +/************************************************************************ * NILFS segctor attrs * ************************************************************************/ @@ -664,10 +753,14 @@ int nilfs_sysfs_create_device_group(struct super_block *sb) if (err) goto free_dev_subgroups; - err = nilfs_sysfs_create_superblock_group(nilfs); + err = nilfs_sysfs_create_segments_group(nilfs); if (err) goto cleanup_dev_kobject; + err = nilfs_sysfs_create_superblock_group(nilfs); + if (err) + goto delete_segments_group; + err = nilfs_sysfs_create_segctor_group(nilfs); if (err) goto delete_superblock_group; @@ -677,6 +770,9 @@ int nilfs_sysfs_create_device_group(struct super_block *sb) delete_superblock_group: nilfs_sysfs_delete_superblock_group(nilfs); +delete_segments_group: + nilfs_sysfs_delete_segments_group(nilfs); + cleanup_dev_kobject: kobject_del(&nilfs->ns_dev_kobj); @@ -689,6 +785,7 @@ failed_create_device_group: void nilfs_sysfs_delete_device_group(struct the_nilfs *nilfs) { + nilfs_sysfs_delete_segments_group(nilfs); nilfs_sysfs_delete_superblock_group(nilfs); nilfs_sysfs_delete_segctor_group(nilfs); kobject_del(&nilfs->ns_dev_kobj); diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h index ad4d5bc..228bdd1 100644 --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -30,6 +30,8 @@ * @sg_superblock_kobj_unregister: completion state * @sg_segctor_kobj: /sys/fs///segctor * @sg_segctor_kobj_unregister: completion state + * @sg_segments_kobj: /sys/fs///segments + * @sg_segments_kobj_unregister: completion state */ struct nilfs_sysfs_dev_subgroups { /* /sys/fs///superblock */ @@ -39,6 +41,10 @@ struct nilfs_sysfs_dev_subgroups { /* /sys/fs///segctor */ struct kobject sg_segctor_kobj; struct completion sg_segctor_kobj_unregister; + + /* /sys/fs///segments */ + struct kobject sg_segments_kobj; + struct completion sg_segments_kobj_unregister; }; #define NILFS_COMMON_ATTR_STRUCT(name) \ @@ -62,6 +68,7 @@ struct nilfs_##name##_attr { \ }; NILFS_DEV_ATTR_STRUCT(dev); +NILFS_DEV_ATTR_STRUCT(segments); NILFS_DEV_ATTR_STRUCT(superblock); NILFS_DEV_ATTR_STRUCT(segctor); @@ -92,6 +99,11 @@ NILFS_DEV_ATTR_STRUCT(segctor); #define NILFS_DEV_RW_ATTR(name) \ NILFS_RW_ATTR(dev, name) +#define NILFS_SEGMENTS_RO_ATTR(name) \ + NILFS_RO_ATTR(segments, name) +#define NILFS_SEGMENTS_RW_ATTR(name) \ + NILFS_RW_ATTR(segs_info, name) + #define NILFS_SUPERBLOCK_RO_ATTR(name) \ NILFS_RO_ATTR(superblock, name) #define NILFS_SUPERBLOCK_RW_ATTR(name) \ @@ -108,6 +120,8 @@ NILFS_DEV_ATTR_STRUCT(segctor); (&nilfs_feature_attr_##name.attr) #define NILFS_DEV_ATTR_LIST(name) \ (&nilfs_dev_attr_##name.attr) +#define NILFS_SEGMENTS_ATTR_LIST(name) \ + (&nilfs_segments_attr_##name.attr) #define NILFS_SUPERBLOCK_ATTR_LIST(name) \ (&nilfs_superblock_attr_##name.attr) #define NILFS_SEGCTOR_ATTR_LIST(name) \ -- cgit v0.10.2 From 02a0ba1c60c2ad532322089a60256c8b0f46678c Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:48 -0700 Subject: nilfs2: add /sys/fs/nilfs2//checkpoints group This patch adds creation of /sys/fs/nilfs2//checkpoints group. The checkpoints group contains attributes that describe details about volume's checkpoints: (1) checkpoints_number - show number of checkpoints on volume. (2) snapshots_number - show number of snapshots on volume. (3) last_seg_checkpoint - show checkpoint number of the latest segment. (4) next_checkpoint - show next checkpoint number. Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Cc: Michael L. Semon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-fs-nilfs2 b/Documentation/ABI/testing/sysfs-fs-nilfs2 index d39e076..7b820d6 100644 --- a/Documentation/ABI/testing/sysfs-fs-nilfs2 +++ b/Documentation/ABI/testing/sysfs-fs-nilfs2 @@ -210,3 +210,34 @@ Contact: "Vyacheslav Dubeyko" Description: Describe attributes of /sys/fs/nilfs2//segments group. + +What: /sys/fs/nilfs2//checkpoints/checkpoints_number +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show number of checkpoints on volume. + +What: /sys/fs/nilfs2//checkpoints/snapshots_number +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show number of snapshots on volume. + +What: /sys/fs/nilfs2//checkpoints/last_seg_checkpoint +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show checkpoint number of the latest segment. + +What: /sys/fs/nilfs2//checkpoints/next_checkpoint +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show next checkpoint number. + +What: /sys/fs/nilfs2//checkpoints/README +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Describe attributes of /sys/fs/nilfs2//checkpoints + group. diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index fe8e141..5cc735f 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -112,6 +112,119 @@ void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ } /************************************************************************ + * NILFS checkpoints attrs * + ************************************************************************/ + +static ssize_t +nilfs_checkpoints_checkpoints_number_show(struct nilfs_checkpoints_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + __u64 ncheckpoints; + struct nilfs_cpstat cpstat; + int err; + + down_read(&nilfs->ns_segctor_sem); + err = nilfs_cpfile_get_stat(nilfs->ns_cpfile, &cpstat); + up_read(&nilfs->ns_segctor_sem); + if (err < 0) { + printk(KERN_ERR "NILFS: unable to get checkpoint stat: err=%d\n", + err); + return err; + } + + ncheckpoints = cpstat.cs_ncps; + + return snprintf(buf, PAGE_SIZE, "%llu\n", ncheckpoints); +} + +static ssize_t +nilfs_checkpoints_snapshots_number_show(struct nilfs_checkpoints_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + __u64 nsnapshots; + struct nilfs_cpstat cpstat; + int err; + + down_read(&nilfs->ns_segctor_sem); + err = nilfs_cpfile_get_stat(nilfs->ns_cpfile, &cpstat); + up_read(&nilfs->ns_segctor_sem); + if (err < 0) { + printk(KERN_ERR "NILFS: unable to get checkpoint stat: err=%d\n", + err); + return err; + } + + nsnapshots = cpstat.cs_nsss; + + return snprintf(buf, PAGE_SIZE, "%llu\n", nsnapshots); +} + +static ssize_t +nilfs_checkpoints_last_seg_checkpoint_show(struct nilfs_checkpoints_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + __u64 last_cno; + + spin_lock(&nilfs->ns_last_segment_lock); + last_cno = nilfs->ns_last_cno; + spin_unlock(&nilfs->ns_last_segment_lock); + + return snprintf(buf, PAGE_SIZE, "%llu\n", last_cno); +} + +static ssize_t +nilfs_checkpoints_next_checkpoint_show(struct nilfs_checkpoints_attr *attr, + struct the_nilfs *nilfs, + char *buf) +{ + __u64 cno; + + down_read(&nilfs->ns_sem); + cno = nilfs->ns_cno; + up_read(&nilfs->ns_sem); + + return snprintf(buf, PAGE_SIZE, "%llu\n", cno); +} + +static const char checkpoints_readme_str[] = + "The checkpoints group contains attributes that describe\n" + "details about volume's checkpoints.\n\n" + "(1) checkpoints_number\n\tshow number of checkpoints on volume.\n\n" + "(2) snapshots_number\n\tshow number of snapshots on volume.\n\n" + "(3) last_seg_checkpoint\n" + "\tshow checkpoint number of the latest segment.\n\n" + "(4) next_checkpoint\n\tshow next checkpoint number.\n\n"; + +static ssize_t +nilfs_checkpoints_README_show(struct nilfs_checkpoints_attr *attr, + struct the_nilfs *nilfs, char *buf) +{ + return snprintf(buf, PAGE_SIZE, checkpoints_readme_str); +} + +NILFS_CHECKPOINTS_RO_ATTR(checkpoints_number); +NILFS_CHECKPOINTS_RO_ATTR(snapshots_number); +NILFS_CHECKPOINTS_RO_ATTR(last_seg_checkpoint); +NILFS_CHECKPOINTS_RO_ATTR(next_checkpoint); +NILFS_CHECKPOINTS_RO_ATTR(README); + +static struct attribute *nilfs_checkpoints_attrs[] = { + NILFS_CHECKPOINTS_ATTR_LIST(checkpoints_number), + NILFS_CHECKPOINTS_ATTR_LIST(snapshots_number), + NILFS_CHECKPOINTS_ATTR_LIST(last_seg_checkpoint), + NILFS_CHECKPOINTS_ATTR_LIST(next_checkpoint), + NILFS_CHECKPOINTS_ATTR_LIST(README), + NULL, +}; + +NILFS_DEV_INT_GROUP_OPS(checkpoints, dev); +NILFS_DEV_INT_GROUP_TYPE(checkpoints, dev); +NILFS_DEV_INT_GROUP_FNS(checkpoints, dev); + +/************************************************************************ * NILFS segments attrs * ************************************************************************/ @@ -753,10 +866,14 @@ int nilfs_sysfs_create_device_group(struct super_block *sb) if (err) goto free_dev_subgroups; - err = nilfs_sysfs_create_segments_group(nilfs); + err = nilfs_sysfs_create_checkpoints_group(nilfs); if (err) goto cleanup_dev_kobject; + err = nilfs_sysfs_create_segments_group(nilfs); + if (err) + goto delete_checkpoints_group; + err = nilfs_sysfs_create_superblock_group(nilfs); if (err) goto delete_segments_group; @@ -773,6 +890,9 @@ delete_superblock_group: delete_segments_group: nilfs_sysfs_delete_segments_group(nilfs); +delete_checkpoints_group: + nilfs_sysfs_delete_checkpoints_group(nilfs); + cleanup_dev_kobject: kobject_del(&nilfs->ns_dev_kobj); @@ -785,6 +905,7 @@ failed_create_device_group: void nilfs_sysfs_delete_device_group(struct the_nilfs *nilfs) { + nilfs_sysfs_delete_checkpoints_group(nilfs); nilfs_sysfs_delete_segments_group(nilfs); nilfs_sysfs_delete_superblock_group(nilfs); nilfs_sysfs_delete_segctor_group(nilfs); diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h index 228bdd1..62f0a0d 100644 --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -30,6 +30,8 @@ * @sg_superblock_kobj_unregister: completion state * @sg_segctor_kobj: /sys/fs///segctor * @sg_segctor_kobj_unregister: completion state + * @sg_checkpoints_kobj: /sys/fs///checkpoints + * @sg_checkpoints_kobj_unregister: completion state * @sg_segments_kobj: /sys/fs///segments * @sg_segments_kobj_unregister: completion state */ @@ -42,6 +44,10 @@ struct nilfs_sysfs_dev_subgroups { struct kobject sg_segctor_kobj; struct completion sg_segctor_kobj_unregister; + /* /sys/fs///checkpoints */ + struct kobject sg_checkpoints_kobj; + struct completion sg_checkpoints_kobj_unregister; + /* /sys/fs///segments */ struct kobject sg_segments_kobj; struct completion sg_segments_kobj_unregister; @@ -69,6 +75,7 @@ struct nilfs_##name##_attr { \ NILFS_DEV_ATTR_STRUCT(dev); NILFS_DEV_ATTR_STRUCT(segments); +NILFS_DEV_ATTR_STRUCT(checkpoints); NILFS_DEV_ATTR_STRUCT(superblock); NILFS_DEV_ATTR_STRUCT(segctor); @@ -104,6 +111,11 @@ NILFS_DEV_ATTR_STRUCT(segctor); #define NILFS_SEGMENTS_RW_ATTR(name) \ NILFS_RW_ATTR(segs_info, name) +#define NILFS_CHECKPOINTS_RO_ATTR(name) \ + NILFS_RO_ATTR(checkpoints, name) +#define NILFS_CHECKPOINTS_RW_ATTR(name) \ + NILFS_RW_ATTR(checkpoints, name) + #define NILFS_SUPERBLOCK_RO_ATTR(name) \ NILFS_RO_ATTR(superblock, name) #define NILFS_SUPERBLOCK_RW_ATTR(name) \ @@ -122,6 +134,8 @@ NILFS_DEV_ATTR_STRUCT(segctor); (&nilfs_dev_attr_##name.attr) #define NILFS_SEGMENTS_ATTR_LIST(name) \ (&nilfs_segments_attr_##name.attr) +#define NILFS_CHECKPOINTS_ATTR_LIST(name) \ + (&nilfs_checkpoints_attr_##name.attr) #define NILFS_SUPERBLOCK_ATTR_LIST(name) \ (&nilfs_superblock_attr_##name.attr) #define NILFS_SEGCTOR_ATTR_LIST(name) \ -- cgit v0.10.2 From a2ecb791a9d6e71a2d37d66034475a92ebc7e02c Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:50 -0700 Subject: nilfs2: add /sys/fs/nilfs2//mounted_snapshots group This patch adds creation of /sys/fs/nilfs2//mounted_snapshots group. The mounted_snapshots group contains group for every mounted snapshot. Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Cc: Michael L. Semon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-fs-nilfs2 b/Documentation/ABI/testing/sysfs-fs-nilfs2 index 7b820d6..29dbc8f 100644 --- a/Documentation/ABI/testing/sysfs-fs-nilfs2 +++ b/Documentation/ABI/testing/sysfs-fs-nilfs2 @@ -241,3 +241,10 @@ Contact: "Vyacheslav Dubeyko" Description: Describe attributes of /sys/fs/nilfs2//checkpoints group. + +What: /sys/fs/nilfs2//mounted_snapshots/README +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Describe content of /sys/fs/nilfs2//mounted_snapshots + group. diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 5cc735f..fc0c961 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -112,6 +112,32 @@ void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ } /************************************************************************ + * NILFS mounted snapshots attrs * + ************************************************************************/ + +static const char mounted_snapshots_readme_str[] = + "The mounted_snapshots group contains group for\n" + "every mounted snapshot.\n"; + +static ssize_t +nilfs_mounted_snapshots_README_show(struct nilfs_mounted_snapshots_attr *attr, + struct the_nilfs *nilfs, char *buf) +{ + return snprintf(buf, PAGE_SIZE, mounted_snapshots_readme_str); +} + +NILFS_MOUNTED_SNAPSHOTS_RO_ATTR(README); + +static struct attribute *nilfs_mounted_snapshots_attrs[] = { + NILFS_MOUNTED_SNAPSHOTS_ATTR_LIST(README), + NULL, +}; + +NILFS_DEV_INT_GROUP_OPS(mounted_snapshots, dev); +NILFS_DEV_INT_GROUP_TYPE(mounted_snapshots, dev); +NILFS_DEV_INT_GROUP_FNS(mounted_snapshots, dev); + +/************************************************************************ * NILFS checkpoints attrs * ************************************************************************/ @@ -866,10 +892,14 @@ int nilfs_sysfs_create_device_group(struct super_block *sb) if (err) goto free_dev_subgroups; - err = nilfs_sysfs_create_checkpoints_group(nilfs); + err = nilfs_sysfs_create_mounted_snapshots_group(nilfs); if (err) goto cleanup_dev_kobject; + err = nilfs_sysfs_create_checkpoints_group(nilfs); + if (err) + goto delete_mounted_snapshots_group; + err = nilfs_sysfs_create_segments_group(nilfs); if (err) goto delete_checkpoints_group; @@ -893,6 +923,9 @@ delete_segments_group: delete_checkpoints_group: nilfs_sysfs_delete_checkpoints_group(nilfs); +delete_mounted_snapshots_group: + nilfs_sysfs_delete_mounted_snapshots_group(nilfs); + cleanup_dev_kobject: kobject_del(&nilfs->ns_dev_kobj); @@ -905,6 +938,7 @@ failed_create_device_group: void nilfs_sysfs_delete_device_group(struct the_nilfs *nilfs) { + nilfs_sysfs_delete_mounted_snapshots_group(nilfs); nilfs_sysfs_delete_checkpoints_group(nilfs); nilfs_sysfs_delete_segments_group(nilfs); nilfs_sysfs_delete_superblock_group(nilfs); diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h index 62f0a0d..4e84e23 100644 --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -30,6 +30,8 @@ * @sg_superblock_kobj_unregister: completion state * @sg_segctor_kobj: /sys/fs///segctor * @sg_segctor_kobj_unregister: completion state + * @sg_mounted_snapshots_kobj: /sys/fs///mounted_snapshots + * @sg_mounted_snapshots_kobj_unregister: completion state * @sg_checkpoints_kobj: /sys/fs///checkpoints * @sg_checkpoints_kobj_unregister: completion state * @sg_segments_kobj: /sys/fs///segments @@ -44,6 +46,10 @@ struct nilfs_sysfs_dev_subgroups { struct kobject sg_segctor_kobj; struct completion sg_segctor_kobj_unregister; + /* /sys/fs///mounted_snapshots */ + struct kobject sg_mounted_snapshots_kobj; + struct completion sg_mounted_snapshots_kobj_unregister; + /* /sys/fs///checkpoints */ struct kobject sg_checkpoints_kobj; struct completion sg_checkpoints_kobj_unregister; @@ -75,6 +81,7 @@ struct nilfs_##name##_attr { \ NILFS_DEV_ATTR_STRUCT(dev); NILFS_DEV_ATTR_STRUCT(segments); +NILFS_DEV_ATTR_STRUCT(mounted_snapshots); NILFS_DEV_ATTR_STRUCT(checkpoints); NILFS_DEV_ATTR_STRUCT(superblock); NILFS_DEV_ATTR_STRUCT(segctor); @@ -111,6 +118,9 @@ NILFS_DEV_ATTR_STRUCT(segctor); #define NILFS_SEGMENTS_RW_ATTR(name) \ NILFS_RW_ATTR(segs_info, name) +#define NILFS_MOUNTED_SNAPSHOTS_RO_ATTR(name) \ + NILFS_RO_ATTR(mounted_snapshots, name) + #define NILFS_CHECKPOINTS_RO_ATTR(name) \ NILFS_RO_ATTR(checkpoints, name) #define NILFS_CHECKPOINTS_RW_ATTR(name) \ @@ -134,6 +144,8 @@ NILFS_DEV_ATTR_STRUCT(segctor); (&nilfs_dev_attr_##name.attr) #define NILFS_SEGMENTS_ATTR_LIST(name) \ (&nilfs_segments_attr_##name.attr) +#define NILFS_MOUNTED_SNAPSHOTS_ATTR_LIST(name) \ + (&nilfs_mounted_snapshots_attr_##name.attr) #define NILFS_CHECKPOINTS_ATTR_LIST(name) \ (&nilfs_checkpoints_attr_##name.attr) #define NILFS_SUPERBLOCK_ATTR_LIST(name) \ -- cgit v0.10.2 From a5a7332a291b55beb0863b119816d12ffc04dfb0 Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:52 -0700 Subject: nilfs2: add /sys/fs/nilfs2//mounted_snapshots/ group This patch adds creation of group for every mounted snapshot in /sys/fs/nilfs2//mounted_snapshots group. The group contains details about mounted snapshot: (1) inodes_count - show number of inodes for snapshot. (2) blocks_count - show number of blocks for snapshot. Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Cc: Michael L. Semon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/ABI/testing/sysfs-fs-nilfs2 b/Documentation/ABI/testing/sysfs-fs-nilfs2 index 29dbc8f..304ba84 100644 --- a/Documentation/ABI/testing/sysfs-fs-nilfs2 +++ b/Documentation/ABI/testing/sysfs-fs-nilfs2 @@ -248,3 +248,22 @@ Contact: "Vyacheslav Dubeyko" Description: Describe content of /sys/fs/nilfs2//mounted_snapshots group. + +What: /sys/fs/nilfs2//mounted_snapshots//inodes_count +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show number of inodes for snapshot. + +What: /sys/fs/nilfs2//mounted_snapshots//blocks_count +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Show number of blocks for snapshot. + +What: /sys/fs/nilfs2//mounted_snapshots//README +Date: April 2014 +Contact: "Vyacheslav Dubeyko" +Description: + Describe attributes of /sys/fs/nilfs2//mounted_snapshots/ + group. diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index fc0c961..0f6148c 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -112,6 +112,124 @@ void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ } /************************************************************************ + * NILFS snapshot attrs * + ************************************************************************/ + +static ssize_t +nilfs_snapshot_inodes_count_show(struct nilfs_snapshot_attr *attr, + struct nilfs_root *root, char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%llu\n", + (unsigned long long)atomic64_read(&root->inodes_count)); +} + +static ssize_t +nilfs_snapshot_blocks_count_show(struct nilfs_snapshot_attr *attr, + struct nilfs_root *root, char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%llu\n", + (unsigned long long)atomic64_read(&root->blocks_count)); +} + +static const char snapshot_readme_str[] = + "The group contains details about mounted snapshot.\n\n" + "(1) inodes_count\n\tshow number of inodes for snapshot.\n\n" + "(2) blocks_count\n\tshow number of blocks for snapshot.\n\n"; + +static ssize_t +nilfs_snapshot_README_show(struct nilfs_snapshot_attr *attr, + struct nilfs_root *root, char *buf) +{ + return snprintf(buf, PAGE_SIZE, snapshot_readme_str); +} + +NILFS_SNAPSHOT_RO_ATTR(inodes_count); +NILFS_SNAPSHOT_RO_ATTR(blocks_count); +NILFS_SNAPSHOT_RO_ATTR(README); + +static struct attribute *nilfs_snapshot_attrs[] = { + NILFS_SNAPSHOT_ATTR_LIST(inodes_count), + NILFS_SNAPSHOT_ATTR_LIST(blocks_count), + NILFS_SNAPSHOT_ATTR_LIST(README), + NULL, +}; + +static ssize_t nilfs_snapshot_attr_show(struct kobject *kobj, + struct attribute *attr, char *buf) +{ + struct nilfs_root *root = + container_of(kobj, struct nilfs_root, snapshot_kobj); + struct nilfs_snapshot_attr *a = + container_of(attr, struct nilfs_snapshot_attr, attr); + + return a->show ? a->show(a, root, buf) : 0; +} + +static ssize_t nilfs_snapshot_attr_store(struct kobject *kobj, + struct attribute *attr, + const char *buf, size_t len) +{ + struct nilfs_root *root = + container_of(kobj, struct nilfs_root, snapshot_kobj); + struct nilfs_snapshot_attr *a = + container_of(attr, struct nilfs_snapshot_attr, attr); + + return a->store ? a->store(a, root, buf, len) : 0; +} + +static void nilfs_snapshot_attr_release(struct kobject *kobj) +{ + struct nilfs_root *root = container_of(kobj, struct nilfs_root, + snapshot_kobj); + complete(&root->snapshot_kobj_unregister); +} + +static const struct sysfs_ops nilfs_snapshot_attr_ops = { + .show = nilfs_snapshot_attr_show, + .store = nilfs_snapshot_attr_store, +}; + +static struct kobj_type nilfs_snapshot_ktype = { + .default_attrs = nilfs_snapshot_attrs, + .sysfs_ops = &nilfs_snapshot_attr_ops, + .release = nilfs_snapshot_attr_release, +}; + +int nilfs_sysfs_create_snapshot_group(struct nilfs_root *root) +{ + struct the_nilfs *nilfs; + struct kobject *parent; + int err; + + nilfs = root->nilfs; + parent = &nilfs->ns_dev_subgroups->sg_mounted_snapshots_kobj; + root->snapshot_kobj.kset = nilfs_kset; + init_completion(&root->snapshot_kobj_unregister); + + if (root->cno == NILFS_CPTREE_CURRENT_CNO) { + err = kobject_init_and_add(&root->snapshot_kobj, + &nilfs_snapshot_ktype, + &nilfs->ns_dev_kobj, + "current_checkpoint"); + } else { + err = kobject_init_and_add(&root->snapshot_kobj, + &nilfs_snapshot_ktype, + parent, + "%llu", root->cno); + } + + if (err) + return err; + + return 0; +} + +void nilfs_sysfs_delete_snapshot_group(struct nilfs_root *root) +{ + kobject_del(&root->snapshot_kobj); +} + +/************************************************************************ * NILFS mounted snapshots attrs * ************************************************************************/ diff --git a/fs/nilfs2/sysfs.h b/fs/nilfs2/sysfs.h index 4e84e23..677e3a1 100644 --- a/fs/nilfs2/sysfs.h +++ b/fs/nilfs2/sysfs.h @@ -86,6 +86,17 @@ NILFS_DEV_ATTR_STRUCT(checkpoints); NILFS_DEV_ATTR_STRUCT(superblock); NILFS_DEV_ATTR_STRUCT(segctor); +#define NILFS_CP_ATTR_STRUCT(name) \ +struct nilfs_##name##_attr { \ + struct attribute attr; \ + ssize_t (*show)(struct nilfs_##name##_attr *, struct nilfs_root *, \ + char *); \ + ssize_t (*store)(struct nilfs_##name##_attr *, struct nilfs_root *, \ + const char *, size_t); \ +}; + +NILFS_CP_ATTR_STRUCT(snapshot); + #define NILFS_ATTR(type, name, mode, show, store) \ static struct nilfs_##type##_attr nilfs_##type##_attr_##name = \ __ATTR(name, mode, show, store) @@ -126,6 +137,13 @@ NILFS_DEV_ATTR_STRUCT(segctor); #define NILFS_CHECKPOINTS_RW_ATTR(name) \ NILFS_RW_ATTR(checkpoints, name) +#define NILFS_SNAPSHOT_INFO_ATTR(name) \ + NILFS_INFO_ATTR(snapshot, name) +#define NILFS_SNAPSHOT_RO_ATTR(name) \ + NILFS_RO_ATTR(snapshot, name) +#define NILFS_SNAPSHOT_RW_ATTR(name) \ + NILFS_RW_ATTR(snapshot, name) + #define NILFS_SUPERBLOCK_RO_ATTR(name) \ NILFS_RO_ATTR(superblock, name) #define NILFS_SUPERBLOCK_RW_ATTR(name) \ @@ -148,6 +166,8 @@ NILFS_DEV_ATTR_STRUCT(segctor); (&nilfs_mounted_snapshots_attr_##name.attr) #define NILFS_CHECKPOINTS_ATTR_LIST(name) \ (&nilfs_checkpoints_attr_##name.attr) +#define NILFS_SNAPSHOT_ATTR_LIST(name) \ + (&nilfs_snapshot_attr_##name.attr) #define NILFS_SUPERBLOCK_ATTR_LIST(name) \ (&nilfs_superblock_attr_##name.attr) #define NILFS_SEGCTOR_ATTR_LIST(name) \ diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index 7fe8223..d01ead1 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -243,6 +243,8 @@ THE_NILFS_FNS(SB_DIRTY, sb_dirty) * @ifile: inode file * @inodes_count: number of inodes * @blocks_count: number of blocks + * @snapshot_kobj: /sys/fs///mounted_snapshots/ + * @snapshot_kobj_unregister: completion state for kernel object */ struct nilfs_root { __u64 cno; @@ -254,6 +256,10 @@ struct nilfs_root { atomic64_t inodes_count; atomic64_t blocks_count; + + /* /sys/fs///mounted_snapshots/ */ + struct kobject snapshot_kobj; + struct completion snapshot_kobj_unregister; }; /* Special checkpoint number */ -- cgit v0.10.2 From dd70edbde2627f47df118d899de6bbb55abcfdbf Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Fri, 8 Aug 2014 14:20:55 -0700 Subject: nilfs2: integrate sysfs support into driver This patch integrates creation of sysfs groups and attributes into NILFS file system driver. It was found the issue with nilfs_sysfs_{create/delete}_snapshot_group functions by Michael L Semon in the first version of the patch: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579 in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2 2 locks held by umount.nilfs2/32676: #0: (&type->s_umount_key#21){++++..}, at: [<790c18e2>] deactivate_super+0x37/0x58 #1: (&(&nilfs->ns_cptree_lock)->rlock){+.+...}, at: [<791bf659>] nilfs_put_root+0x23/0x5a Preemption disabled at:[<791bf659>] nilfs_put_root+0x23/0x5a CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2 Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002 Call Trace: dump_stack+0x4b/0x75 __might_sleep+0x111/0x16f mutex_lock_nested+0x1e/0x3ad kernfs_remove+0x12/0x26 sysfs_remove_dir+0x3d/0x62 kobject_del+0x13/0x38 nilfs_sysfs_delete_snapshot_group+0xb/0xd nilfs_put_root+0x2a/0x5a nilfs_detach_log_writer+0x1ab/0x2c1 nilfs_put_super+0x13/0x68 generic_shutdown_super+0x60/0xd1 kill_block_super+0x1d/0x60 deactivate_locked_super+0x22/0x3f deactivate_super+0x3e/0x58 mntput_no_expire+0xe2/0x141 SyS_oldumount+0x70/0xa5 syscall_call+0x7/0xb The reason of the issue was placement of nilfs_sysfs_{create/delete}_snapshot_group() call under nilfs->ns_cptree_lock protection. But this protection is unnecessary and wrong solution. The second version of the patch fixes this issue. [fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static] Reported-by: Michael L. Semon Signed-off-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Cc: Ryusuke Konishi Tested-by: Michael L. Semon Signed-off-by: Fengguang Wu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/nilfs2/Makefile b/fs/nilfs2/Makefile index 85c9873..fc603e0 100644 --- a/fs/nilfs2/Makefile +++ b/fs/nilfs2/Makefile @@ -2,4 +2,4 @@ obj-$(CONFIG_NILFS2_FS) += nilfs2.o nilfs2-y := inode.o file.o dir.o super.o namei.o page.o mdt.o \ btnode.o bmap.o btree.o direct.o dat.o recovery.o \ the_nilfs.o segbuf.o segment.o cpfile.o sufile.o \ - ifile.o alloc.o gcinode.o ioctl.o + ifile.o alloc.o gcinode.o ioctl.o sysfs.o diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h index 9bc72de..0696161 100644 --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -320,6 +320,14 @@ int nilfs_gccache_wait_and_mark_dirty(struct buffer_head *); int nilfs_init_gcinode(struct inode *inode); void nilfs_remove_all_gcinodes(struct the_nilfs *nilfs); +/* sysfs.c */ +int __init nilfs_sysfs_init(void); +void nilfs_sysfs_exit(void); +int nilfs_sysfs_create_device_group(struct super_block *); +void nilfs_sysfs_delete_device_group(struct the_nilfs *); +int nilfs_sysfs_create_snapshot_group(struct nilfs_root *); +void nilfs_sysfs_delete_snapshot_group(struct nilfs_root *); + /* * Inodes and files operations */ diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index 8c532b2..c519927 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -1452,13 +1452,19 @@ static int __init init_nilfs_fs(void) if (err) goto fail; - err = register_filesystem(&nilfs_fs_type); + err = nilfs_sysfs_init(); if (err) goto free_cachep; + err = register_filesystem(&nilfs_fs_type); + if (err) + goto deinit_sysfs_entry; + printk(KERN_INFO "NILFS version 2 loaded\n"); return 0; +deinit_sysfs_entry: + nilfs_sysfs_exit(); free_cachep: nilfs_destroy_cachep(); fail: @@ -1468,6 +1474,7 @@ fail: static void __exit exit_nilfs_fs(void) { nilfs_destroy_cachep(); + nilfs_sysfs_exit(); unregister_filesystem(&nilfs_fs_type); } diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 0f6148c..bbb0dcc 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -87,7 +87,7 @@ static struct kobj_type nilfs_##name##_ktype = { \ }; #define NILFS_DEV_INT_GROUP_FNS(name, parent_name) \ -int nilfs_sysfs_create_##name##_group(struct the_nilfs *nilfs) \ +static int nilfs_sysfs_create_##name##_group(struct the_nilfs *nilfs) \ { \ struct kobject *parent; \ struct kobject *kobj; \ @@ -106,7 +106,7 @@ int nilfs_sysfs_create_##name##_group(struct the_nilfs *nilfs) \ return err; \ return 0; \ } \ -void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ +static void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ { \ kobject_del(&nilfs->ns_##parent_name##_subgroups->sg_##name##_kobj); \ } diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c index 59d5008..9da25fe 100644 --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -98,6 +98,7 @@ void destroy_nilfs(struct the_nilfs *nilfs) { might_sleep(); if (nilfs_init(nilfs)) { + nilfs_sysfs_delete_device_group(nilfs); brelse(nilfs->ns_sbh[0]); brelse(nilfs->ns_sbh[1]); } @@ -641,6 +642,10 @@ int init_nilfs(struct the_nilfs *nilfs, struct super_block *sb, char *data) if (err) goto failed_sbh; + err = nilfs_sysfs_create_device_group(sb); + if (err) + goto failed_sbh; + set_nilfs_init(nilfs); err = 0; out: @@ -741,12 +746,13 @@ nilfs_find_or_create_root(struct the_nilfs *nilfs, __u64 cno) { struct rb_node **p, *parent; struct nilfs_root *root, *new; + int err; root = nilfs_lookup_root(nilfs, cno); if (root) return root; - new = kmalloc(sizeof(*root), GFP_KERNEL); + new = kzalloc(sizeof(*root), GFP_KERNEL); if (!new) return NULL; @@ -783,6 +789,12 @@ nilfs_find_or_create_root(struct the_nilfs *nilfs, __u64 cno) spin_unlock(&nilfs->ns_cptree_lock); + err = nilfs_sysfs_create_snapshot_group(new); + if (err) { + kfree(new); + new = NULL; + } + return new; } @@ -791,6 +803,8 @@ void nilfs_put_root(struct nilfs_root *root) if (atomic_dec_and_test(&root->count)) { struct the_nilfs *nilfs = root->nilfs; + nilfs_sysfs_delete_snapshot_group(root); + spin_lock(&nilfs->ns_cptree_lock); rb_erase(&root->rb_node, &nilfs->ns_cptree); spin_unlock(&nilfs->ns_cptree_lock); -- cgit v0.10.2 From a9814c5d2dae4b6a3052321a6cb8f2dcc0b3e30b Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:20:57 -0700 Subject: fs/ufs: convert printk to pr_foo() Use current logging functions. - no level printk under CONFIG_UFS_DEBUG converted to pr_debug - no level printk elsewhere converted to pr_err - add DDEBUG flag in Makefile - coalesce formats Signed-off-by: Fabian Frederick Cc: Evgeniy Dushistov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ufs/Makefile b/fs/ufs/Makefile index dd39980..4d0e02b 100644 --- a/fs/ufs/Makefile +++ b/fs/ufs/Makefile @@ -6,3 +6,4 @@ obj-$(CONFIG_UFS_FS) += ufs.o ufs-objs := balloc.o cylinder.o dir.o file.o ialloc.o inode.o \ namei.o super.o symlink.o truncate.o util.o +ccflags-$(CONFIG_UFS_DEBUG) += -DDEBUG diff --git a/fs/ufs/super.c b/fs/ufs/super.c index b879f1b..9778b9f 100644 --- a/fs/ufs/super.c +++ b/fs/ufs/super.c @@ -172,73 +172,73 @@ static void ufs_print_super_stuff(struct super_block *sb, { u32 magic = fs32_to_cpu(sb, usb3->fs_magic); - printk("ufs_print_super_stuff\n"); - printk(" magic: 0x%x\n", magic); + pr_debug("ufs_print_super_stuff\n"); + pr_debug(" magic: 0x%x\n", magic); if (fs32_to_cpu(sb, usb3->fs_magic) == UFS2_MAGIC) { - printk(" fs_size: %llu\n", (unsigned long long) - fs64_to_cpu(sb, usb3->fs_un1.fs_u2.fs_size)); - printk(" fs_dsize: %llu\n", (unsigned long long) - fs64_to_cpu(sb, usb3->fs_un1.fs_u2.fs_dsize)); - printk(" bsize: %u\n", - fs32_to_cpu(sb, usb1->fs_bsize)); - printk(" fsize: %u\n", - fs32_to_cpu(sb, usb1->fs_fsize)); - printk(" fs_volname: %s\n", usb2->fs_un.fs_u2.fs_volname); - printk(" fs_sblockloc: %llu\n", (unsigned long long) - fs64_to_cpu(sb, usb2->fs_un.fs_u2.fs_sblockloc)); - printk(" cs_ndir(No of dirs): %llu\n", (unsigned long long) - fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_ndir)); - printk(" cs_nbfree(No of free blocks): %llu\n", - (unsigned long long) - fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_nbfree)); - printk(KERN_INFO" cs_nifree(Num of free inodes): %llu\n", - (unsigned long long) - fs64_to_cpu(sb, usb3->fs_un1.fs_u2.cs_nifree)); - printk(KERN_INFO" cs_nffree(Num of free frags): %llu\n", - (unsigned long long) - fs64_to_cpu(sb, usb3->fs_un1.fs_u2.cs_nffree)); - printk(KERN_INFO" fs_maxsymlinklen: %u\n", - fs32_to_cpu(sb, usb3->fs_un2.fs_44.fs_maxsymlinklen)); + pr_debug(" fs_size: %llu\n", (unsigned long long) + fs64_to_cpu(sb, usb3->fs_un1.fs_u2.fs_size)); + pr_debug(" fs_dsize: %llu\n", (unsigned long long) + fs64_to_cpu(sb, usb3->fs_un1.fs_u2.fs_dsize)); + pr_debug(" bsize: %u\n", + fs32_to_cpu(sb, usb1->fs_bsize)); + pr_debug(" fsize: %u\n", + fs32_to_cpu(sb, usb1->fs_fsize)); + pr_debug(" fs_volname: %s\n", usb2->fs_un.fs_u2.fs_volname); + pr_debug(" fs_sblockloc: %llu\n", (unsigned long long) + fs64_to_cpu(sb, usb2->fs_un.fs_u2.fs_sblockloc)); + pr_debug(" cs_ndir(No of dirs): %llu\n", (unsigned long long) + fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_ndir)); + pr_debug(" cs_nbfree(No of free blocks): %llu\n", + (unsigned long long) + fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_nbfree)); + pr_info(" cs_nifree(Num of free inodes): %llu\n", + (unsigned long long) + fs64_to_cpu(sb, usb3->fs_un1.fs_u2.cs_nifree)); + pr_info(" cs_nffree(Num of free frags): %llu\n", + (unsigned long long) + fs64_to_cpu(sb, usb3->fs_un1.fs_u2.cs_nffree)); + pr_info(" fs_maxsymlinklen: %u\n", + fs32_to_cpu(sb, usb3->fs_un2.fs_44.fs_maxsymlinklen)); } else { - printk(" sblkno: %u\n", fs32_to_cpu(sb, usb1->fs_sblkno)); - printk(" cblkno: %u\n", fs32_to_cpu(sb, usb1->fs_cblkno)); - printk(" iblkno: %u\n", fs32_to_cpu(sb, usb1->fs_iblkno)); - printk(" dblkno: %u\n", fs32_to_cpu(sb, usb1->fs_dblkno)); - printk(" cgoffset: %u\n", - fs32_to_cpu(sb, usb1->fs_cgoffset)); - printk(" ~cgmask: 0x%x\n", - ~fs32_to_cpu(sb, usb1->fs_cgmask)); - printk(" size: %u\n", fs32_to_cpu(sb, usb1->fs_size)); - printk(" dsize: %u\n", fs32_to_cpu(sb, usb1->fs_dsize)); - printk(" ncg: %u\n", fs32_to_cpu(sb, usb1->fs_ncg)); - printk(" bsize: %u\n", fs32_to_cpu(sb, usb1->fs_bsize)); - printk(" fsize: %u\n", fs32_to_cpu(sb, usb1->fs_fsize)); - printk(" frag: %u\n", fs32_to_cpu(sb, usb1->fs_frag)); - printk(" fragshift: %u\n", - fs32_to_cpu(sb, usb1->fs_fragshift)); - printk(" ~fmask: %u\n", ~fs32_to_cpu(sb, usb1->fs_fmask)); - printk(" fshift: %u\n", fs32_to_cpu(sb, usb1->fs_fshift)); - printk(" sbsize: %u\n", fs32_to_cpu(sb, usb1->fs_sbsize)); - printk(" spc: %u\n", fs32_to_cpu(sb, usb1->fs_spc)); - printk(" cpg: %u\n", fs32_to_cpu(sb, usb1->fs_cpg)); - printk(" ipg: %u\n", fs32_to_cpu(sb, usb1->fs_ipg)); - printk(" fpg: %u\n", fs32_to_cpu(sb, usb1->fs_fpg)); - printk(" csaddr: %u\n", fs32_to_cpu(sb, usb1->fs_csaddr)); - printk(" cssize: %u\n", fs32_to_cpu(sb, usb1->fs_cssize)); - printk(" cgsize: %u\n", fs32_to_cpu(sb, usb1->fs_cgsize)); - printk(" fstodb: %u\n", - fs32_to_cpu(sb, usb1->fs_fsbtodb)); - printk(" nrpos: %u\n", fs32_to_cpu(sb, usb3->fs_nrpos)); - printk(" ndir %u\n", - fs32_to_cpu(sb, usb1->fs_cstotal.cs_ndir)); - printk(" nifree %u\n", - fs32_to_cpu(sb, usb1->fs_cstotal.cs_nifree)); - printk(" nbfree %u\n", - fs32_to_cpu(sb, usb1->fs_cstotal.cs_nbfree)); - printk(" nffree %u\n", - fs32_to_cpu(sb, usb1->fs_cstotal.cs_nffree)); + pr_debug(" sblkno: %u\n", fs32_to_cpu(sb, usb1->fs_sblkno)); + pr_debug(" cblkno: %u\n", fs32_to_cpu(sb, usb1->fs_cblkno)); + pr_debug(" iblkno: %u\n", fs32_to_cpu(sb, usb1->fs_iblkno)); + pr_debug(" dblkno: %u\n", fs32_to_cpu(sb, usb1->fs_dblkno)); + pr_debug(" cgoffset: %u\n", + fs32_to_cpu(sb, usb1->fs_cgoffset)); + pr_debug(" ~cgmask: 0x%x\n", + ~fs32_to_cpu(sb, usb1->fs_cgmask)); + pr_debug(" size: %u\n", fs32_to_cpu(sb, usb1->fs_size)); + pr_debug(" dsize: %u\n", fs32_to_cpu(sb, usb1->fs_dsize)); + pr_debug(" ncg: %u\n", fs32_to_cpu(sb, usb1->fs_ncg)); + pr_debug(" bsize: %u\n", fs32_to_cpu(sb, usb1->fs_bsize)); + pr_debug(" fsize: %u\n", fs32_to_cpu(sb, usb1->fs_fsize)); + pr_debug(" frag: %u\n", fs32_to_cpu(sb, usb1->fs_frag)); + pr_debug(" fragshift: %u\n", + fs32_to_cpu(sb, usb1->fs_fragshift)); + pr_debug(" ~fmask: %u\n", ~fs32_to_cpu(sb, usb1->fs_fmask)); + pr_debug(" fshift: %u\n", fs32_to_cpu(sb, usb1->fs_fshift)); + pr_debug(" sbsize: %u\n", fs32_to_cpu(sb, usb1->fs_sbsize)); + pr_debug(" spc: %u\n", fs32_to_cpu(sb, usb1->fs_spc)); + pr_debug(" cpg: %u\n", fs32_to_cpu(sb, usb1->fs_cpg)); + pr_debug(" ipg: %u\n", fs32_to_cpu(sb, usb1->fs_ipg)); + pr_debug(" fpg: %u\n", fs32_to_cpu(sb, usb1->fs_fpg)); + pr_debug(" csaddr: %u\n", fs32_to_cpu(sb, usb1->fs_csaddr)); + pr_debug(" cssize: %u\n", fs32_to_cpu(sb, usb1->fs_cssize)); + pr_debug(" cgsize: %u\n", fs32_to_cpu(sb, usb1->fs_cgsize)); + pr_debug(" fstodb: %u\n", + fs32_to_cpu(sb, usb1->fs_fsbtodb)); + pr_debug(" nrpos: %u\n", fs32_to_cpu(sb, usb3->fs_nrpos)); + pr_debug(" ndir %u\n", + fs32_to_cpu(sb, usb1->fs_cstotal.cs_ndir)); + pr_debug(" nifree %u\n", + fs32_to_cpu(sb, usb1->fs_cstotal.cs_nifree)); + pr_debug(" nbfree %u\n", + fs32_to_cpu(sb, usb1->fs_cstotal.cs_nbfree)); + pr_debug(" nffree %u\n", + fs32_to_cpu(sb, usb1->fs_cstotal.cs_nffree)); } - printk("\n"); + pr_debug("\n"); } /* @@ -247,38 +247,38 @@ static void ufs_print_super_stuff(struct super_block *sb, static void ufs_print_cylinder_stuff(struct super_block *sb, struct ufs_cylinder_group *cg) { - printk("\nufs_print_cylinder_stuff\n"); - printk("size of ucg: %zu\n", sizeof(struct ufs_cylinder_group)); - printk(" magic: %x\n", fs32_to_cpu(sb, cg->cg_magic)); - printk(" time: %u\n", fs32_to_cpu(sb, cg->cg_time)); - printk(" cgx: %u\n", fs32_to_cpu(sb, cg->cg_cgx)); - printk(" ncyl: %u\n", fs16_to_cpu(sb, cg->cg_ncyl)); - printk(" niblk: %u\n", fs16_to_cpu(sb, cg->cg_niblk)); - printk(" ndblk: %u\n", fs32_to_cpu(sb, cg->cg_ndblk)); - printk(" cs_ndir: %u\n", fs32_to_cpu(sb, cg->cg_cs.cs_ndir)); - printk(" cs_nbfree: %u\n", fs32_to_cpu(sb, cg->cg_cs.cs_nbfree)); - printk(" cs_nifree: %u\n", fs32_to_cpu(sb, cg->cg_cs.cs_nifree)); - printk(" cs_nffree: %u\n", fs32_to_cpu(sb, cg->cg_cs.cs_nffree)); - printk(" rotor: %u\n", fs32_to_cpu(sb, cg->cg_rotor)); - printk(" frotor: %u\n", fs32_to_cpu(sb, cg->cg_frotor)); - printk(" irotor: %u\n", fs32_to_cpu(sb, cg->cg_irotor)); - printk(" frsum: %u, %u, %u, %u, %u, %u, %u, %u\n", + pr_debug("\nufs_print_cylinder_stuff\n"); + pr_debug("size of ucg: %zu\n", sizeof(struct ufs_cylinder_group)); + pr_debug(" magic: %x\n", fs32_to_cpu(sb, cg->cg_magic)); + pr_debug(" time: %u\n", fs32_to_cpu(sb, cg->cg_time)); + pr_debug(" cgx: %u\n", fs32_to_cpu(sb, cg->cg_cgx)); + pr_debug(" ncyl: %u\n", fs16_to_cpu(sb, cg->cg_ncyl)); + pr_debug(" niblk: %u\n", fs16_to_cpu(sb, cg->cg_niblk)); + pr_debug(" ndblk: %u\n", fs32_to_cpu(sb, cg->cg_ndblk)); + pr_debug(" cs_ndir: %u\n", fs32_to_cpu(sb, cg->cg_cs.cs_ndir)); + pr_debug(" cs_nbfree: %u\n", fs32_to_cpu(sb, cg->cg_cs.cs_nbfree)); + pr_debug(" cs_nifree: %u\n", fs32_to_cpu(sb, cg->cg_cs.cs_nifree)); + pr_debug(" cs_nffree: %u\n", fs32_to_cpu(sb, cg->cg_cs.cs_nffree)); + pr_debug(" rotor: %u\n", fs32_to_cpu(sb, cg->cg_rotor)); + pr_debug(" frotor: %u\n", fs32_to_cpu(sb, cg->cg_frotor)); + pr_debug(" irotor: %u\n", fs32_to_cpu(sb, cg->cg_irotor)); + pr_debug(" frsum: %u, %u, %u, %u, %u, %u, %u, %u\n", fs32_to_cpu(sb, cg->cg_frsum[0]), fs32_to_cpu(sb, cg->cg_frsum[1]), fs32_to_cpu(sb, cg->cg_frsum[2]), fs32_to_cpu(sb, cg->cg_frsum[3]), fs32_to_cpu(sb, cg->cg_frsum[4]), fs32_to_cpu(sb, cg->cg_frsum[5]), fs32_to_cpu(sb, cg->cg_frsum[6]), fs32_to_cpu(sb, cg->cg_frsum[7])); - printk(" btotoff: %u\n", fs32_to_cpu(sb, cg->cg_btotoff)); - printk(" boff: %u\n", fs32_to_cpu(sb, cg->cg_boff)); - printk(" iuseoff: %u\n", fs32_to_cpu(sb, cg->cg_iusedoff)); - printk(" freeoff: %u\n", fs32_to_cpu(sb, cg->cg_freeoff)); - printk(" nextfreeoff: %u\n", fs32_to_cpu(sb, cg->cg_nextfreeoff)); - printk(" clustersumoff %u\n", - fs32_to_cpu(sb, cg->cg_u.cg_44.cg_clustersumoff)); - printk(" clusteroff %u\n", - fs32_to_cpu(sb, cg->cg_u.cg_44.cg_clusteroff)); - printk(" nclusterblks %u\n", - fs32_to_cpu(sb, cg->cg_u.cg_44.cg_nclusterblks)); - printk("\n"); + pr_debug(" btotoff: %u\n", fs32_to_cpu(sb, cg->cg_btotoff)); + pr_debug(" boff: %u\n", fs32_to_cpu(sb, cg->cg_boff)); + pr_debug(" iuseoff: %u\n", fs32_to_cpu(sb, cg->cg_iusedoff)); + pr_debug(" freeoff: %u\n", fs32_to_cpu(sb, cg->cg_freeoff)); + pr_debug(" nextfreeoff: %u\n", fs32_to_cpu(sb, cg->cg_nextfreeoff)); + pr_debug(" clustersumoff %u\n", + fs32_to_cpu(sb, cg->cg_u.cg_44.cg_clustersumoff)); + pr_debug(" clusteroff %u\n", + fs32_to_cpu(sb, cg->cg_u.cg_44.cg_clusteroff)); + pr_debug(" nclusterblks %u\n", + fs32_to_cpu(sb, cg->cg_u.cg_44.cg_nclusterblks)); + pr_debug("\n"); } #else # define ufs_print_super_stuff(sb, usb1, usb2, usb3) /**/ @@ -316,7 +316,7 @@ void ufs_error (struct super_block * sb, const char * function, case UFS_MOUNT_ONERROR_LOCK: case UFS_MOUNT_ONERROR_UMOUNT: case UFS_MOUNT_ONERROR_REPAIR: - printk (KERN_CRIT "UFS-fs error (device %s): %s: %s\n", + pr_crit("UFS-fs error (device %s): %s: %s\n", sb->s_id, function, error_buf); } } @@ -340,7 +340,7 @@ void ufs_panic (struct super_block * sb, const char * function, vsnprintf (error_buf, sizeof(error_buf), fmt, args); va_end (args); sb->s_flags |= MS_RDONLY; - printk (KERN_CRIT "UFS-fs panic (device %s): %s: %s\n", + pr_crit("UFS-fs panic (device %s): %s: %s\n", sb->s_id, function, error_buf); } @@ -352,7 +352,7 @@ void ufs_warning (struct super_block * sb, const char * function, va_start (args, fmt); vsnprintf (error_buf, sizeof(error_buf), fmt, args); va_end (args); - printk (KERN_WARNING "UFS-fs warning (device %s): %s: %s\n", + pr_warn("UFS-fs warning (device %s): %s: %s\n", sb->s_id, function, error_buf); } @@ -464,14 +464,12 @@ static int ufs_parse_options (char * options, unsigned * mount_options) ufs_set_opt (*mount_options, ONERROR_UMOUNT); break; case Opt_onerror_repair: - printk("UFS-fs: Unable to do repair on error, " - "will lock lock instead\n"); + pr_err("UFS-fs: Unable to do repair on error, will lock lock instead\n"); ufs_clear_opt (*mount_options, ONERROR); ufs_set_opt (*mount_options, ONERROR_REPAIR); break; default: - printk("UFS-fs: Invalid option: \"%s\" " - "or missing value\n", p); + pr_err("UFS-fs: Invalid option: \"%s\" or missing value\n", p); return 0; } } @@ -788,8 +786,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) #ifndef CONFIG_UFS_FS_WRITE if (!(sb->s_flags & MS_RDONLY)) { - printk("ufs was compiled with read-only support, " - "can't be mounted as read-write\n"); + pr_err("ufs was compiled with read-only support, can't be mounted as read-write\n"); return -EROFS; } #endif @@ -812,12 +809,12 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) sbi->s_mount_opt = 0; ufs_set_opt (sbi->s_mount_opt, ONERROR_LOCK); if (!ufs_parse_options ((char *) data, &sbi->s_mount_opt)) { - printk("wrong mount options\n"); + pr_err("wrong mount options\n"); goto failed; } if (!(sbi->s_mount_opt & UFS_MOUNT_UFSTYPE)) { if (!silent) - printk("You didn't specify the type of your ufs filesystem\n\n" + pr_err("You didn't specify the type of your ufs filesystem\n\n" "mount -t ufs -o ufstype=" "sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...\n\n" ">>>WARNING<<< Wrong ufstype may corrupt your filesystem, " @@ -900,7 +897,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) flags |= UFS_DE_OLD | UFS_UID_OLD | UFS_ST_OLD | UFS_CG_OLD; if (!(sb->s_flags & MS_RDONLY)) { if (!silent) - printk(KERN_INFO "ufstype=old is supported read-only\n"); + pr_info("ufstype=old is supported read-only\n"); sb->s_flags |= MS_RDONLY; } break; @@ -916,7 +913,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) flags |= UFS_DE_OLD | UFS_UID_OLD | UFS_ST_OLD | UFS_CG_OLD; if (!(sb->s_flags & MS_RDONLY)) { if (!silent) - printk(KERN_INFO "ufstype=nextstep is supported read-only\n"); + pr_info("ufstype=nextstep is supported read-only\n"); sb->s_flags |= MS_RDONLY; } break; @@ -932,7 +929,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) flags |= UFS_DE_OLD | UFS_UID_OLD | UFS_ST_OLD | UFS_CG_OLD; if (!(sb->s_flags & MS_RDONLY)) { if (!silent) - printk(KERN_INFO "ufstype=nextstep-cd is supported read-only\n"); + pr_info("ufstype=nextstep-cd is supported read-only\n"); sb->s_flags |= MS_RDONLY; } break; @@ -948,7 +945,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) flags |= UFS_DE_44BSD | UFS_UID_44BSD | UFS_ST_44BSD | UFS_CG_44BSD; if (!(sb->s_flags & MS_RDONLY)) { if (!silent) - printk(KERN_INFO "ufstype=openstep is supported read-only\n"); + pr_info("ufstype=openstep is supported read-only\n"); sb->s_flags |= MS_RDONLY; } break; @@ -963,19 +960,19 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) flags |= UFS_DE_OLD | UFS_UID_OLD | UFS_ST_OLD | UFS_CG_OLD; if (!(sb->s_flags & MS_RDONLY)) { if (!silent) - printk(KERN_INFO "ufstype=hp is supported read-only\n"); + pr_info("ufstype=hp is supported read-only\n"); sb->s_flags |= MS_RDONLY; } break; default: if (!silent) - printk("unknown ufstype\n"); + pr_err("unknown ufstype\n"); goto failed; } again: if (!sb_set_blocksize(sb, block_size)) { - printk(KERN_ERR "UFS: failed to set blocksize\n"); + pr_err("UFS: failed to set blocksize\n"); goto failed; } @@ -1034,7 +1031,7 @@ again: goto again; } if (!silent) - printk("ufs_read_super: bad magic number\n"); + pr_err("ufs_read_super: bad magic number\n"); goto failed; magic_found: @@ -1048,33 +1045,33 @@ magic_found: uspi->s_fshift = fs32_to_cpu(sb, usb1->fs_fshift); if (!is_power_of_2(uspi->s_fsize)) { - printk(KERN_ERR "ufs_read_super: fragment size %u is not a power of 2\n", - uspi->s_fsize); - goto failed; + pr_err("ufs_read_super: fragment size %u is not a power of 2\n", + uspi->s_fsize); + goto failed; } if (uspi->s_fsize < 512) { - printk(KERN_ERR "ufs_read_super: fragment size %u is too small\n", - uspi->s_fsize); + pr_err("ufs_read_super: fragment size %u is too small\n", + uspi->s_fsize); goto failed; } if (uspi->s_fsize > 4096) { - printk(KERN_ERR "ufs_read_super: fragment size %u is too large\n", - uspi->s_fsize); + pr_err("ufs_read_super: fragment size %u is too large\n", + uspi->s_fsize); goto failed; } if (!is_power_of_2(uspi->s_bsize)) { - printk(KERN_ERR "ufs_read_super: block size %u is not a power of 2\n", - uspi->s_bsize); + pr_err("ufs_read_super: block size %u is not a power of 2\n", + uspi->s_bsize); goto failed; } if (uspi->s_bsize < 4096) { - printk(KERN_ERR "ufs_read_super: block size %u is too small\n", - uspi->s_bsize); + pr_err("ufs_read_super: block size %u is too small\n", + uspi->s_bsize); goto failed; } if (uspi->s_bsize / uspi->s_fsize > 8) { - printk(KERN_ERR "ufs_read_super: too many fragments per block (%u)\n", - uspi->s_bsize / uspi->s_fsize); + pr_err("ufs_read_super: too many fragments per block (%u)\n", + uspi->s_bsize / uspi->s_fsize); goto failed; } if (uspi->s_fsize != block_size || uspi->s_sbsize != super_block_size) { @@ -1113,20 +1110,21 @@ magic_found: UFSD("fs is DEC OSF/1\n"); break; case UFS_FSACTIVE: - printk("ufs_read_super: fs is active\n"); + pr_err("ufs_read_super: fs is active\n"); sb->s_flags |= MS_RDONLY; break; case UFS_FSBAD: - printk("ufs_read_super: fs is bad\n"); + pr_err("ufs_read_super: fs is bad\n"); sb->s_flags |= MS_RDONLY; break; default: - printk("ufs_read_super: can't grok fs_clean 0x%x\n", usb1->fs_clean); + pr_err("ufs_read_super: can't grok fs_clean 0x%x\n", + usb1->fs_clean); sb->s_flags |= MS_RDONLY; break; } } else { - printk("ufs_read_super: fs needs fsck\n"); + pr_err("ufs_read_super: fs needs fsck\n"); sb->s_flags |= MS_RDONLY; } @@ -1299,7 +1297,7 @@ static int ufs_remount (struct super_block *sb, int *mount_flags, char *data) if (!(new_mount_opt & UFS_MOUNT_UFSTYPE)) { new_mount_opt |= ufstype; } else if ((new_mount_opt & UFS_MOUNT_UFSTYPE) != ufstype) { - printk("ufstype can't be changed during remount\n"); + pr_err("ufstype can't be changed during remount\n"); unlock_ufs(sb); return -EINVAL; } @@ -1328,8 +1326,7 @@ static int ufs_remount (struct super_block *sb, int *mount_flags, char *data) * fs was mounted as ro, remounting rw */ #ifndef CONFIG_UFS_FS_WRITE - printk("ufs was compiled with read-only support, " - "can't be mounted as read-write\n"); + pr_err("ufs was compiled with read-only support, can't be mounted as read-write\n"); unlock_ufs(sb); return -EINVAL; #else @@ -1338,12 +1335,12 @@ static int ufs_remount (struct super_block *sb, int *mount_flags, char *data) ufstype != UFS_MOUNT_UFSTYPE_44BSD && ufstype != UFS_MOUNT_UFSTYPE_SUNx86 && ufstype != UFS_MOUNT_UFSTYPE_UFS2) { - printk("this ufstype is read-only supported\n"); + pr_err("this ufstype is read-only supported\n"); unlock_ufs(sb); return -EINVAL; } if (!ufs_read_cylinder_structures(sb)) { - printk("failed during remounting\n"); + pr_err("failed during remounting\n"); unlock_ufs(sb); return -EPERM; } -- cgit v0.10.2 From de771bdaaa9dd2585c59e1bd6cf2cac83aa0beec Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:20:59 -0700 Subject: fs/ufs: use pr_fmt Replace UFS-fs, UFS-fs: and UFS: by pr_fmt with module name "ufs: " Signed-off-by: Fabian Frederick Cc: Evgeniy Dushistov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ufs/super.c b/fs/ufs/super.c index 9778b9f..cfa878e 100644 --- a/fs/ufs/super.c +++ b/fs/ufs/super.c @@ -65,7 +65,6 @@ * Evgeniy Dushistov , 2007 */ - #include #include #include @@ -310,13 +309,13 @@ void ufs_error (struct super_block * sb, const char * function, va_end (args); switch (UFS_SB(sb)->s_mount_opt & UFS_MOUNT_ONERROR) { case UFS_MOUNT_ONERROR_PANIC: - panic ("UFS-fs panic (device %s): %s: %s\n", + panic("panic (device %s): %s: %s\n", sb->s_id, function, error_buf); case UFS_MOUNT_ONERROR_LOCK: case UFS_MOUNT_ONERROR_UMOUNT: case UFS_MOUNT_ONERROR_REPAIR: - pr_crit("UFS-fs error (device %s): %s: %s\n", + pr_crit("error (device %s): %s: %s\n", sb->s_id, function, error_buf); } } @@ -340,7 +339,7 @@ void ufs_panic (struct super_block * sb, const char * function, vsnprintf (error_buf, sizeof(error_buf), fmt, args); va_end (args); sb->s_flags |= MS_RDONLY; - pr_crit("UFS-fs panic (device %s): %s: %s\n", + pr_crit("panic (device %s): %s: %s\n", sb->s_id, function, error_buf); } @@ -352,7 +351,7 @@ void ufs_warning (struct super_block * sb, const char * function, va_start (args, fmt); vsnprintf (error_buf, sizeof(error_buf), fmt, args); va_end (args); - pr_warn("UFS-fs warning (device %s): %s: %s\n", + pr_warn("warning (device %s): %s: %s\n", sb->s_id, function, error_buf); } @@ -464,12 +463,12 @@ static int ufs_parse_options (char * options, unsigned * mount_options) ufs_set_opt (*mount_options, ONERROR_UMOUNT); break; case Opt_onerror_repair: - pr_err("UFS-fs: Unable to do repair on error, will lock lock instead\n"); + pr_err("Unable to do repair on error, will lock lock instead\n"); ufs_clear_opt (*mount_options, ONERROR); ufs_set_opt (*mount_options, ONERROR_REPAIR); break; default: - pr_err("UFS-fs: Invalid option: \"%s\" or missing value\n", p); + pr_err("Invalid option: \"%s\" or missing value\n", p); return 0; } } @@ -972,7 +971,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) again: if (!sb_set_blocksize(sb, block_size)) { - pr_err("UFS: failed to set blocksize\n"); + pr_err("failed to set blocksize\n"); goto failed; } diff --git a/fs/ufs/ufs.h b/fs/ufs/ufs.h index 343e6fc..cbe1e9b 100644 --- a/fs/ufs/ufs.h +++ b/fs/ufs/ufs.h @@ -1,6 +1,12 @@ #ifndef _UFS_UFS_H #define _UFS_UFS_H 1 +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #define UFS_MAX_GROUP_LOADED 8 #define UFS_CGNO_EMPTY ((unsigned)-1) -- cgit v0.10.2 From 07bc94fdb4dc86d7008ab6b48ea410cbe72c96c3 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:01 -0700 Subject: fs/ufs/super.c: use __func__ in logging Replace approximate function name by __func__ using standard format "function():" Signed-off-by: Fabian Frederick Cc: Evgeniy Dushistov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ufs/super.c b/fs/ufs/super.c index cfa878e..ca24557 100644 --- a/fs/ufs/super.c +++ b/fs/ufs/super.c @@ -1030,7 +1030,7 @@ again: goto again; } if (!silent) - pr_err("ufs_read_super: bad magic number\n"); + pr_err("%s(): bad magic number\n", __func__); goto failed; magic_found: @@ -1044,33 +1044,33 @@ magic_found: uspi->s_fshift = fs32_to_cpu(sb, usb1->fs_fshift); if (!is_power_of_2(uspi->s_fsize)) { - pr_err("ufs_read_super: fragment size %u is not a power of 2\n", - uspi->s_fsize); + pr_err("%s(): fragment size %u is not a power of 2\n", + __func__, uspi->s_fsize); goto failed; } if (uspi->s_fsize < 512) { - pr_err("ufs_read_super: fragment size %u is too small\n", - uspi->s_fsize); + pr_err("%s(): fragment size %u is too small\n", + __func__, uspi->s_fsize); goto failed; } if (uspi->s_fsize > 4096) { - pr_err("ufs_read_super: fragment size %u is too large\n", - uspi->s_fsize); + pr_err("%s(): fragment size %u is too large\n", + __func__, uspi->s_fsize); goto failed; } if (!is_power_of_2(uspi->s_bsize)) { - pr_err("ufs_read_super: block size %u is not a power of 2\n", - uspi->s_bsize); + pr_err("%s(): block size %u is not a power of 2\n", + __func__, uspi->s_bsize); goto failed; } if (uspi->s_bsize < 4096) { - pr_err("ufs_read_super: block size %u is too small\n", - uspi->s_bsize); + pr_err("%s(): block size %u is too small\n", + __func__, uspi->s_bsize); goto failed; } if (uspi->s_bsize / uspi->s_fsize > 8) { - pr_err("ufs_read_super: too many fragments per block (%u)\n", - uspi->s_bsize / uspi->s_fsize); + pr_err("%s(): too many fragments per block (%u)\n", + __func__, uspi->s_bsize / uspi->s_fsize); goto failed; } if (uspi->s_fsize != block_size || uspi->s_sbsize != super_block_size) { @@ -1109,21 +1109,21 @@ magic_found: UFSD("fs is DEC OSF/1\n"); break; case UFS_FSACTIVE: - pr_err("ufs_read_super: fs is active\n"); + pr_err("%s(): fs is active\n", __func__); sb->s_flags |= MS_RDONLY; break; case UFS_FSBAD: - pr_err("ufs_read_super: fs is bad\n"); + pr_err("%s(): fs is bad\n", __func__); sb->s_flags |= MS_RDONLY; break; default: - pr_err("ufs_read_super: can't grok fs_clean 0x%x\n", - usb1->fs_clean); + pr_err("%s(): can't grok fs_clean 0x%x\n", + __func__, usb1->fs_clean); sb->s_flags |= MS_RDONLY; break; } } else { - pr_err("ufs_read_super: fs needs fsck\n"); + pr_err("%s(): fs needs fsck\n", __func__); sb->s_flags |= MS_RDONLY; } -- cgit v0.10.2 From 7e1e4167d489b929a5f786076731de92e3e513e0 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:03 -0700 Subject: fs/ufs/super.c: use va_format instead of buffer/vsnprintf Remove error_buffer and use %pV Signed-off-by: Fabian Frederick Cc: Evgeniy Dushistov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ufs/super.c b/fs/ufs/super.c index ca24557..206f7d6 100644 --- a/fs/ufs/super.c +++ b/fs/ufs/super.c @@ -286,13 +286,12 @@ static void ufs_print_cylinder_stuff(struct super_block *sb, static const struct super_operations ufs_super_ops; -static char error_buf[1024]; - void ufs_error (struct super_block * sb, const char * function, const char * fmt, ...) { struct ufs_sb_private_info * uspi; struct ufs_super_block_first * usb1; + struct va_format vaf; va_list args; uspi = UFS_SB(sb)->s_uspi; @@ -304,20 +303,21 @@ void ufs_error (struct super_block * sb, const char * function, ufs_mark_sb_dirty(sb); sb->s_flags |= MS_RDONLY; } - va_start (args, fmt); - vsnprintf (error_buf, sizeof(error_buf), fmt, args); - va_end (args); + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = &args; switch (UFS_SB(sb)->s_mount_opt & UFS_MOUNT_ONERROR) { case UFS_MOUNT_ONERROR_PANIC: - panic("panic (device %s): %s: %s\n", - sb->s_id, function, error_buf); + panic("panic (device %s): %s: %pV\n", + sb->s_id, function, &vaf); case UFS_MOUNT_ONERROR_LOCK: case UFS_MOUNT_ONERROR_UMOUNT: case UFS_MOUNT_ONERROR_REPAIR: - pr_crit("error (device %s): %s: %s\n", - sb->s_id, function, error_buf); - } + pr_crit("error (device %s): %s: %pV\n", + sb->s_id, function, &vaf); + } + va_end(args); } void ufs_panic (struct super_block * sb, const char * function, @@ -325,6 +325,7 @@ void ufs_panic (struct super_block * sb, const char * function, { struct ufs_sb_private_info * uspi; struct ufs_super_block_first * usb1; + struct va_format vaf; va_list args; uspi = UFS_SB(sb)->s_uspi; @@ -335,24 +336,27 @@ void ufs_panic (struct super_block * sb, const char * function, ubh_mark_buffer_dirty(USPI_UBH(uspi)); ufs_mark_sb_dirty(sb); } - va_start (args, fmt); - vsnprintf (error_buf, sizeof(error_buf), fmt, args); - va_end (args); + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = &args; sb->s_flags |= MS_RDONLY; - pr_crit("panic (device %s): %s: %s\n", - sb->s_id, function, error_buf); + pr_crit("panic (device %s): %s: %pV\n", + sb->s_id, function, &vaf); + va_end(args); } void ufs_warning (struct super_block * sb, const char * function, const char * fmt, ...) { + struct va_format vaf; va_list args; - va_start (args, fmt); - vsnprintf (error_buf, sizeof(error_buf), fmt, args); - va_end (args); - pr_warn("warning (device %s): %s: %s\n", - sb->s_id, function, error_buf); + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = &args; + pr_warn("(device %s): %s: %pV\n", + sb->s_id, function, &vaf); + va_end(args); } enum { -- cgit v0.10.2 From d4beaabd307b4b7853ac97013ab0d6ee69429b04 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:05 -0700 Subject: fs/ufs: convert UFSD printk to pr_debug Convert no level printk to pr_debug in UFSD. DEBUG is defined with CONFIG_UFS_DEBUG so pr_debug are emitted here. Also fixing call to UFSD (add;) Signed-off-by: Fabian Frederick Cc: Evgeniy Dushistov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ufs/super.c b/fs/ufs/super.c index 206f7d6..da73801 100644 --- a/fs/ufs/super.c +++ b/fs/ufs/super.c @@ -868,7 +868,7 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent) break; case UFS_MOUNT_UFSTYPE_SUNOS: - UFSD(("ufstype=sunos\n")) + UFSD("ufstype=sunos\n"); uspi->s_fsize = block_size = 1024; uspi->s_fmask = ~(1024 - 1); uspi->s_fshift = 10; diff --git a/fs/ufs/ufs.h b/fs/ufs/ufs.h index cbe1e9b..2a07396 100644 --- a/fs/ufs/ufs.h +++ b/fs/ufs/ufs.h @@ -77,9 +77,9 @@ struct ufs_inode_info { */ #ifdef CONFIG_UFS_DEBUG # define UFSD(f, a...) { \ - printk ("UFSD (%s, %d): %s:", \ + pr_debug("UFSD (%s, %d): %s:", \ __FILE__, __LINE__, __func__); \ - printk (f, ## a); \ + pr_debug(f, ## a); \ } #else # define UFSD(f, a...) /**/ -- cgit v0.10.2 From edc023caf407b74ee2eb56dda23437c5836655f3 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:08 -0700 Subject: fs/ufs/inode.c: kernel-doc warning fixes Signed-off-by: Fabian Frederick Cc: Evgeniy Dushistov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c index 61e8a9b..7c580c9 100644 --- a/fs/ufs/inode.c +++ b/fs/ufs/inode.c @@ -158,16 +158,16 @@ out: /** * ufs_inode_getfrag() - allocate new fragment(s) - * @inode - pointer to inode - * @fragment - number of `fragment' which hold pointer + * @inode: pointer to inode + * @fragment: number of `fragment' which hold pointer * to new allocated fragment(s) - * @new_fragment - number of new allocated fragment(s) - * @required - how many fragment(s) we require - * @err - we set it if something wrong - * @phys - pointer to where we save physical number of new allocated fragments, + * @new_fragment: number of new allocated fragment(s) + * @required: how many fragment(s) we require + * @err: we set it if something wrong + * @phys: pointer to where we save physical number of new allocated fragments, * NULL if we allocate not data(indirect blocks for example). - * @new - we set it if we allocate new block - * @locked_page - for ufs_new_fragments() + * @new: we set it if we allocate new block + * @locked_page: for ufs_new_fragments() */ static struct buffer_head * ufs_inode_getfrag(struct inode *inode, u64 fragment, @@ -315,16 +315,16 @@ repeat2: /** * ufs_inode_getblock() - allocate new block - * @inode - pointer to inode - * @bh - pointer to block which hold "pointer" to new allocated block - * @fragment - number of `fragment' which hold pointer + * @inode: pointer to inode + * @bh: pointer to block which hold "pointer" to new allocated block + * @fragment: number of `fragment' which hold pointer * to new allocated block - * @new_fragment - number of new allocated fragment + * @new_fragment: number of new allocated fragment * (block will hold this fragment and also uspi->s_fpb-1) - * @err - see ufs_inode_getfrag() - * @phys - see ufs_inode_getfrag() - * @new - see ufs_inode_getfrag() - * @locked_page - see ufs_inode_getfrag() + * @err: see ufs_inode_getfrag() + * @phys: see ufs_inode_getfrag() + * @new: see ufs_inode_getfrag() + * @locked_page: see ufs_inode_getfrag() */ static struct buffer_head * ufs_inode_getblock(struct inode *inode, struct buffer_head *bh, -- cgit v0.10.2 From 53872ed07786714bff3642ee9ee61afd3f4eb749 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:10 -0700 Subject: fs/reiserfs: replace not-standard %Lu/%Ld Fixes checkpatch warnings: "WARNING: %Ld/%Lu are not-standard C, use %lld/%llu" Signed-off-by: Fabian Frederick Cc: Jeff Mahoney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/reiserfs/item_ops.c b/fs/reiserfs/item_ops.c index cfaee91..aca73dd 100644 --- a/fs/reiserfs/item_ops.c +++ b/fs/reiserfs/item_ops.c @@ -54,7 +54,7 @@ static void sd_print_item(struct item_head *ih, char *item) } else { struct stat_data *sd = (struct stat_data *)item; - printk("\t0%-6o | %6Lu | %2u | %d | %s\n", sd_v2_mode(sd), + printk("\t0%-6o | %6llu | %2u | %d | %s\n", sd_v2_mode(sd), (unsigned long long)sd_v2_size(sd), sd_v2_nlink(sd), sd_v2_rdev(sd), print_time(sd_v2_mtime(sd))); } @@ -408,7 +408,7 @@ static void direntry_print_item(struct item_head *ih, char *item) namebuf[namelen + 2] = 0; } - printk("%d: %-15s%-15d%-15d%-15Ld%-15Ld(%s)\n", + printk("%d: %-15s%-15d%-15d%-15lld%-15lld(%s)\n", i, namebuf, deh_dir_id(deh), deh_objectid(deh), GET_HASH_VALUE(deh_offset(deh)), diff --git a/fs/reiserfs/prints.c b/fs/reiserfs/prints.c index c9b47e9..ae1dc84 100644 --- a/fs/reiserfs/prints.c +++ b/fs/reiserfs/prints.c @@ -17,7 +17,7 @@ static char off_buf[80]; static char *reiserfs_cpu_offset(struct cpu_key *key) { if (cpu_key_k_type(key) == TYPE_DIRENTRY) - sprintf(off_buf, "%Lu(%Lu)", + sprintf(off_buf, "%llu(%llu)", (unsigned long long) GET_HASH_VALUE(cpu_key_k_offset(key)), (unsigned long long) @@ -34,7 +34,7 @@ static char *le_offset(struct reiserfs_key *key) version = le_key_version(key); if (le_key_k_type(version, key) == TYPE_DIRENTRY) - sprintf(off_buf, "%Lu(%Lu)", + sprintf(off_buf, "%llu(%llu)", (unsigned long long) GET_HASH_VALUE(le_key_k_offset(version, key)), (unsigned long long) diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c index dd44468..24cbe01 100644 --- a/fs/reiserfs/stree.c +++ b/fs/reiserfs/stree.c @@ -2006,7 +2006,7 @@ int reiserfs_do_truncate(struct reiserfs_transaction_handle *th, &s_search_path) == POSITION_FOUND); RFALSE(file_size > ROUND_UP(new_file_size), - "PAP-5680: truncate did not finish: new_file_size %Ld, current %Ld, oid %d", + "PAP-5680: truncate did not finish: new_file_size %lld, current %lld, oid %d", new_file_size, file_size, s_item_key.on_disk_key.k_objectid); update_and_out: diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index a392cef..debdb8e 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -331,7 +331,7 @@ static int finish_unfinished(struct super_block *s) * not completed truncate found. New size was * committed together with "save" link */ - reiserfs_info(s, "Truncating %k to %Ld ..", + reiserfs_info(s, "Truncating %k to %lld ..", INODE_PKEY(inode), inode->i_size); /* don't update modification time */ @@ -1577,7 +1577,7 @@ static int read_super_block(struct super_block *s, int offset) rs = (struct reiserfs_super_block *)bh->b_data; if (sb_blocksize(rs) != s->s_blocksize) { reiserfs_warning(s, "sh-2011", "can't find a reiserfs " - "filesystem on (dev %s, block %Lu, size %lu)", + "filesystem on (dev %s, block %llu, size %lu)", s->s_id, (unsigned long long)bh->b_blocknr, s->s_blocksize); @@ -2441,8 +2441,7 @@ static ssize_t reiserfs_quota_write(struct super_block *sb, int type, struct buffer_head tmp_bh, *bh; if (!current->journal_info) { - printk(KERN_WARNING "reiserfs: Quota write (off=%Lu, len=%Lu)" - " cancelled because transaction is not started.\n", + printk(KERN_WARNING "reiserfs: Quota write (off=%llu, len=%llu) cancelled because transaction is not started.\n", (unsigned long long)off, (unsigned long long)len); return -EIO; } -- cgit v0.10.2 From 17093991af4995c4b93f6d8ac63aab68fcd9e1be Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:12 -0700 Subject: fs/reiserfs: use linux/uaccess.h Fix checkpatch warning WARNING: Use #include instead of Signed-off-by: Fabian Frederick Cc: Jeff Mahoney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/reiserfs/dir.c b/fs/reiserfs/dir.c index d9f5a60..0a7dc94 100644 --- a/fs/reiserfs/dir.c +++ b/fs/reiserfs/dir.c @@ -9,7 +9,7 @@ #include #include #include -#include +#include extern const struct reiserfs_key MIN_KEY; diff --git a/fs/reiserfs/do_balan.c b/fs/reiserfs/do_balan.c index 54fdf19..5739cb9 100644 --- a/fs/reiserfs/do_balan.c +++ b/fs/reiserfs/do_balan.c @@ -10,7 +10,7 @@ * and using buffers obtained after all above. */ -#include +#include #include #include "reiserfs.h" #include diff --git a/fs/reiserfs/file.c b/fs/reiserfs/file.c index db9e80b..751dd3f 100644 --- a/fs/reiserfs/file.c +++ b/fs/reiserfs/file.c @@ -6,7 +6,7 @@ #include "reiserfs.h" #include "acl.h" #include "xattr.h" -#include +#include #include #include #include diff --git a/fs/reiserfs/ibalance.c b/fs/reiserfs/ibalance.c index 73231b1..b751eea 100644 --- a/fs/reiserfs/ibalance.c +++ b/fs/reiserfs/ibalance.c @@ -2,7 +2,7 @@ * Copyright 2000 by Hans Reiser, licensing governed by reiserfs/README */ -#include +#include #include #include #include "reiserfs.h" diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index 63b2b0e..a7eec98 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -11,7 +11,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/fs/reiserfs/ioctl.c b/fs/reiserfs/ioctl.c index 501ed68..6ec8a30 100644 --- a/fs/reiserfs/ioctl.c +++ b/fs/reiserfs/ioctl.c @@ -7,7 +7,7 @@ #include #include "reiserfs.h" #include -#include +#include #include #include diff --git a/fs/reiserfs/lbalance.c b/fs/reiserfs/lbalance.c index d6744c8..814dda3 100644 --- a/fs/reiserfs/lbalance.c +++ b/fs/reiserfs/lbalance.c @@ -2,7 +2,7 @@ * Copyright 2000 by Hans Reiser, licensing governed by reiserfs/README */ -#include +#include #include #include #include "reiserfs.h" diff --git a/fs/reiserfs/procfs.c b/fs/reiserfs/procfs.c index 02b0b7d..621b9f3 100644 --- a/fs/reiserfs/procfs.c +++ b/fs/reiserfs/procfs.c @@ -11,7 +11,7 @@ #include #include #include -#include +#include #include "reiserfs.h" #include #include diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index debdb8e..709ea92 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -15,7 +15,7 @@ #include #include #include -#include +#include #include "reiserfs.h" #include "acl.h" #include "xattr.h" diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c index ca416d0..40c13dd 100644 --- a/fs/reiserfs/xattr.c +++ b/fs/reiserfs/xattr.c @@ -45,7 +45,7 @@ #include #include "xattr.h" #include "acl.h" -#include +#include #include #include #include diff --git a/fs/reiserfs/xattr_acl.c b/fs/reiserfs/xattr_acl.c index 44503e2..4b34b9d 100644 --- a/fs/reiserfs/xattr_acl.c +++ b/fs/reiserfs/xattr_acl.c @@ -9,7 +9,7 @@ #include #include "xattr.h" #include "acl.h" -#include +#include static int __reiserfs_set_acl(struct reiserfs_transaction_handle *th, struct inode *inode, int type, diff --git a/fs/reiserfs/xattr_security.c b/fs/reiserfs/xattr_security.c index 800a3ce..e7f8939 100644 --- a/fs/reiserfs/xattr_security.c +++ b/fs/reiserfs/xattr_security.c @@ -6,7 +6,7 @@ #include #include "xattr.h" #include -#include +#include static int security_get(struct dentry *dentry, const char *name, void *buffer, size_t size, diff --git a/fs/reiserfs/xattr_trusted.c b/fs/reiserfs/xattr_trusted.c index a003571..5eeb0c4 100644 --- a/fs/reiserfs/xattr_trusted.c +++ b/fs/reiserfs/xattr_trusted.c @@ -5,7 +5,7 @@ #include #include #include "xattr.h" -#include +#include static int trusted_get(struct dentry *dentry, const char *name, void *buffer, size_t size, diff --git a/fs/reiserfs/xattr_user.c b/fs/reiserfs/xattr_user.c index 8667491..e50eab0 100644 --- a/fs/reiserfs/xattr_user.c +++ b/fs/reiserfs/xattr_user.c @@ -4,7 +4,7 @@ #include #include #include "xattr.h" -#include +#include static int user_get(struct dentry *dentry, const char *name, void *buffer, size_t size, -- cgit v0.10.2 From f3fb9e27325c4e1730440820ea8a1e9d9a5af709 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:14 -0700 Subject: fs/reiserfs/xattr.c: fix blank line missing after declarations Fix checkpatch warning: WARNING: Missing a blank line after declarations Signed-off-by: Fabian Frederick Cc: Jeff Mahoney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c index 40c13dd..7c36898 100644 --- a/fs/reiserfs/xattr.c +++ b/fs/reiserfs/xattr.c @@ -84,6 +84,7 @@ static int xattr_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) static int xattr_unlink(struct inode *dir, struct dentry *dentry) { int error; + BUG_ON(!mutex_is_locked(&dir->i_mutex)); mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD); @@ -98,6 +99,7 @@ static int xattr_unlink(struct inode *dir, struct dentry *dentry) static int xattr_rmdir(struct inode *dir, struct dentry *dentry) { int error; + BUG_ON(!mutex_is_locked(&dir->i_mutex)); mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD); @@ -117,6 +119,7 @@ static struct dentry *open_xa_root(struct super_block *sb, int flags) { struct dentry *privroot = REISERFS_SB(sb)->priv_root; struct dentry *xaroot; + if (!privroot->d_inode) return ERR_PTR(-ENODATA); @@ -127,6 +130,7 @@ static struct dentry *open_xa_root(struct super_block *sb, int flags) xaroot = ERR_PTR(-ENODATA); else if (!xaroot->d_inode) { int err = -ENODATA; + if (xattr_may_create(flags)) err = xattr_mkdir(privroot->d_inode, xaroot, 0700); if (err) { @@ -157,6 +161,7 @@ static struct dentry *open_xa_dir(const struct inode *inode, int flags) xadir = lookup_one_len(namebuf, xaroot, strlen(namebuf)); if (!IS_ERR(xadir) && !xadir->d_inode) { int err = -ENODATA; + if (xattr_may_create(flags)) err = xattr_mkdir(xaroot->d_inode, xadir, 0700); if (err) { @@ -188,6 +193,7 @@ fill_with_dentries(void *buf, const char *name, int namelen, loff_t offset, { struct reiserfs_dentry_buf *dbuf = buf; struct dentry *dentry; + WARN_ON_ONCE(!mutex_is_locked(&dbuf->xadir->d_inode->i_mutex)); if (dbuf->count == ARRAY_SIZE(dbuf->dentries)) @@ -218,6 +224,7 @@ static void cleanup_dentry_buf(struct reiserfs_dentry_buf *buf) { int i; + for (i = 0; i < buf->count; i++) if (buf->dentries[i]) dput(buf->dentries[i]); @@ -283,11 +290,13 @@ static int reiserfs_for_each_xattr(struct inode *inode, int blocks = JOURNAL_PER_BALANCE_CNT * 2 + 2 + 4 * REISERFS_QUOTA_TRANS_BLOCKS(inode->i_sb); struct reiserfs_transaction_handle th; + reiserfs_write_lock(inode->i_sb); err = journal_begin(&th, inode->i_sb, blocks); reiserfs_write_unlock(inode->i_sb); if (!err) { int jerror; + mutex_lock_nested(&dir->d_parent->d_inode->i_mutex, I_MUTEX_XATTR); err = action(dir, data); @@ -340,6 +349,7 @@ static int chown_one_xattr(struct dentry *dentry, void *data) int reiserfs_delete_xattrs(struct inode *inode) { int err = reiserfs_for_each_xattr(inode, delete_one_xattr, NULL); + if (err) reiserfs_warning(inode->i_sb, "jdm-20004", "Couldn't delete all xattrs (%d)\n", err); @@ -350,6 +360,7 @@ int reiserfs_delete_xattrs(struct inode *inode) int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs) { int err = reiserfs_for_each_xattr(inode, chown_one_xattr, attrs); + if (err) reiserfs_warning(inode->i_sb, "jdm-20007", "Couldn't chown all xattrs (%d)\n", err); @@ -439,6 +450,7 @@ int reiserfs_commit_write(struct file *f, struct page *page, static void update_ctime(struct inode *inode) { struct timespec now = current_fs_time(inode->i_sb); + if (inode_unhashed(inode) || !inode->i_nlink || timespec_equal(&inode->i_ctime, &now)) return; @@ -514,6 +526,7 @@ reiserfs_xattr_set_handle(struct reiserfs_transaction_handle *th, size_t chunk; size_t skip = 0; size_t page_offset = (file_pos & (PAGE_CACHE_SIZE - 1)); + if (buffer_size - buffer_pos > PAGE_CACHE_SIZE) chunk = PAGE_CACHE_SIZE; else @@ -530,6 +543,7 @@ reiserfs_xattr_set_handle(struct reiserfs_transaction_handle *th, if (file_pos == 0) { struct reiserfs_xattr_header *rxh; + skip = file_pos = sizeof(struct reiserfs_xattr_header); if (chunk + skip > PAGE_CACHE_SIZE) chunk = PAGE_CACHE_SIZE - skip; @@ -659,6 +673,7 @@ reiserfs_xattr_get(struct inode *inode, const char *name, void *buffer, size_t chunk; char *data; size_t skip = 0; + if (isize - file_pos > PAGE_CACHE_SIZE) chunk = PAGE_CACHE_SIZE; else @@ -792,6 +807,7 @@ reiserfs_setxattr(struct dentry *dentry, const char *name, const void *value, int reiserfs_removexattr(struct dentry *dentry, const char *name) { const struct xattr_handler *handler; + handler = find_xattr_handler_prefix(dentry->d_sb->s_xattr, name); if (!handler || get_inode_sd_version(dentry->d_inode) == STAT_DATA_V1) @@ -813,9 +829,11 @@ static int listxattr_filler(void *buf, const char *name, int namelen, { struct listxattr_buf *b = (struct listxattr_buf *)buf; size_t size; + if (name[0] != '.' || (namelen != 1 && (name[1] != '.' || namelen != 2))) { const struct xattr_handler *handler; + handler = find_xattr_handler_prefix(b->dentry->d_sb->s_xattr, name); if (!handler) /* Unsupported xattr name */ @@ -885,6 +903,7 @@ static int create_privroot(struct dentry *dentry) { int err; struct inode *inode = dentry->d_parent->d_inode; + WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex)); err = xattr_mkdir(inode, dentry, 0700); @@ -1015,6 +1034,7 @@ int reiserfs_xattr_init(struct super_block *s, int mount_flags) mutex_lock(&privroot->d_inode->i_mutex); if (!REISERFS_SB(s)->xattr_root) { struct dentry *dentry; + dentry = lookup_one_len(XAROOT_NAME, privroot, strlen(XAROOT_NAME)); if (!IS_ERR(dentry)) -- cgit v0.10.2 From f58d6c75471c1b034b5bb0b38a7d6a9671f96299 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:16 -0700 Subject: fs/hpfs/dnode.c: fix suspect code indent Fix 2 checkpatch warnings: WARNING: suspect code indent for conditional statements Signed-off-by: Fabian Frederick Cc: Mikulas Patocka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/hpfs/dnode.c b/fs/hpfs/dnode.c index f36fc01..2923a7b 100644 --- a/fs/hpfs/dnode.c +++ b/fs/hpfs/dnode.c @@ -545,12 +545,13 @@ static void delete_empty_dnode(struct inode *i, dnode_secno dno) struct dnode *d1; struct quad_buffer_head qbh1; if (hpfs_sb(i->i_sb)->sb_chk) - if (up != i->i_ino) { - hpfs_error(i->i_sb, - "bad pointer to fnode, dnode %08x, pointing to %08x, should be %08lx", - dno, up, (unsigned long)i->i_ino); - return; - } + if (up != i->i_ino) { + hpfs_error(i->i_sb, + "bad pointer to fnode, dnode %08x, pointing to %08x, should be %08lx", + dno, up, + (unsigned long)i->i_ino); + return; + } if ((d1 = hpfs_map_dnode(i->i_sb, down, &qbh1))) { d1->up = cpu_to_le32(up); d1->root_dnode = 1; @@ -1061,8 +1062,8 @@ struct hpfs_dirent *map_fnode_dirent(struct super_block *s, fnode_secno fno, hpfs_brelse4(qbh); if (hpfs_sb(s)->sb_chk) if (hpfs_stop_cycles(s, dno, &c1, &c2, "map_fnode_dirent #1")) { - kfree(name2); - return NULL; + kfree(name2); + return NULL; } goto go_down; } -- cgit v0.10.2 From a0be55dee71d437f7593c8c3673edd92962bafaf Mon Sep 17 00:00:00 2001 From: Ionut Alexa Date: Fri, 8 Aug 2014 14:21:18 -0700 Subject: kernel/exit.c: fix coding style warnings and errors Fixed coding style warnings and errors. Signed-off-by: Ionut Alexa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/exit.c b/kernel/exit.c index 88c6b3e..32c58f7 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -59,7 +59,7 @@ #include #include -static void exit_mm(struct task_struct * tsk); +static void exit_mm(struct task_struct *tsk); static void __unhash_process(struct task_struct *p, bool group_dead) { @@ -151,7 +151,7 @@ static void __exit_signal(struct task_struct *tsk) spin_unlock(&sighand->siglock); __cleanup_sighand(sighand); - clear_tsk_thread_flag(tsk,TIF_SIGPENDING); + clear_tsk_thread_flag(tsk, TIF_SIGPENDING); if (group_dead) { flush_sigqueue(&sig->shared_pending); tty_kref_put(tty); @@ -168,7 +168,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp) } -void release_task(struct task_struct * p) +void release_task(struct task_struct *p) { struct task_struct *leader; int zap_leader; @@ -192,7 +192,8 @@ repeat: */ zap_leader = 0; leader = p->group_leader; - if (leader != p && thread_group_empty(leader) && leader->exit_state == EXIT_ZOMBIE) { + if (leader != p && thread_group_empty(leader) + && leader->exit_state == EXIT_ZOMBIE) { /* * If we were the last child thread and the leader has * exited already, and the leader's parent ignores SIGCHLD, @@ -241,7 +242,8 @@ struct pid *session_of_pgrp(struct pid *pgrp) * * "I ask you, have you ever known what it is to be an orphan?" */ -static int will_become_orphaned_pgrp(struct pid *pgrp, struct task_struct *ignored_task) +static int will_become_orphaned_pgrp(struct pid *pgrp, + struct task_struct *ignored_task) { struct task_struct *p; @@ -294,9 +296,9 @@ kill_orphaned_pgrp(struct task_struct *tsk, struct task_struct *parent) struct task_struct *ignored_task = tsk; if (!parent) - /* exit: our father is in a different pgrp than - * we are and we were the only connection outside. - */ + /* exit: our father is in a different pgrp than + * we are and we were the only connection outside. + */ parent = tsk->real_parent; else /* reparent: our child is in a different pgrp than @@ -405,7 +407,7 @@ assign_new_owner: * Turn us into a lazy TLB process if we * aren't already.. */ -static void exit_mm(struct task_struct * tsk) +static void exit_mm(struct task_struct *tsk) { struct mm_struct *mm = tsk->mm; struct core_state *core_state; @@ -425,6 +427,7 @@ static void exit_mm(struct task_struct * tsk) core_state = mm->core_state; if (core_state) { struct core_thread self; + up_read(&mm->mmap_sem); self.task = tsk; @@ -566,6 +569,7 @@ static void forget_original_parent(struct task_struct *father) list_for_each_entry_safe(p, n, &father->children, sibling) { struct task_struct *t = p; + do { t->real_parent = reaper; if (t->parent == father) { @@ -599,7 +603,7 @@ static void exit_notify(struct task_struct *tsk, int group_dead) /* * This does two things: * - * A. Make init inherit all the child processes + * A. Make init inherit all the child processes * B. Check to see if any process groups have become orphaned * as a result of our exiting, and if they have any stopped * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) @@ -649,9 +653,8 @@ static void check_stack_usage(void) spin_lock(&low_water_lock); if (free < lowest_to_date) { - printk(KERN_WARNING "%s (%d) used greatest stack depth: " - "%lu bytes left\n", - current->comm, task_pid_nr(current), free); + pr_warn("%s (%d) used greatest stack depth: %lu bytes left\n", + current->comm, task_pid_nr(current), free); lowest_to_date = free; } spin_unlock(&low_water_lock); @@ -692,8 +695,7 @@ void do_exit(long code) * leave this task alone and wait for reboot. */ if (unlikely(tsk->flags & PF_EXITING)) { - printk(KERN_ALERT - "Fixing recursive fault but reboot is needed!\n"); + pr_alert("Fixing recursive fault but reboot is needed!\n"); /* * We can do this unlocked here. The futex code uses * this flag just to verify whether the pi state @@ -717,9 +719,9 @@ void do_exit(long code) raw_spin_unlock_wait(&tsk->pi_lock); if (unlikely(in_atomic())) - printk(KERN_INFO "note: %s[%d] exited with preempt_count %d\n", - current->comm, task_pid_nr(current), - preempt_count()); + pr_info("note: %s[%d] exited with preempt_count %d\n", + current->comm, task_pid_nr(current), + preempt_count()); acct_update_integrals(tsk); /* sync mm's RSS info before statistics gathering */ @@ -837,7 +839,6 @@ void do_exit(long code) for (;;) cpu_relax(); /* For when BUG is null */ } - EXPORT_SYMBOL_GPL(do_exit); void complete_and_exit(struct completion *comp, long code) @@ -847,7 +848,6 @@ void complete_and_exit(struct completion *comp, long code) do_exit(code); } - EXPORT_SYMBOL(complete_and_exit); SYSCALL_DEFINE1(exit, int, error_code) @@ -870,6 +870,7 @@ do_group_exit(int exit_code) exit_code = sig->group_exit_code; else if (!thread_group_empty(current)) { struct sighand_struct *const sighand = current->sighand; + spin_lock_irq(&sighand->siglock); if (signal_group_exit(sig)) /* Another thread got here before we took the lock. */ @@ -1034,9 +1035,9 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p) * as other threads in the parent group can be right * here reaping other children at the same time. * - * We use thread_group_cputime_adjusted() to get times for the thread - * group, which consolidates times for all threads in the - * group including the group leader. + * We use thread_group_cputime_adjusted() to get times for + * the thread group, which consolidates times for all threads + * in the group including the group leader. */ thread_group_cputime_adjusted(p, &tgutime, &tgstime); spin_lock_irq(&p->real_parent->sighand->siglock); @@ -1418,6 +1419,7 @@ static int do_wait_thread(struct wait_opts *wo, struct task_struct *tsk) list_for_each_entry(p, &tsk->children, sibling) { int ret = wait_consider_task(wo, 0, p); + if (ret) return ret; } @@ -1431,6 +1433,7 @@ static int ptrace_do_wait(struct wait_opts *wo, struct task_struct *tsk) list_for_each_entry(p, &tsk->ptraced, ptrace_entry) { int ret = wait_consider_task(wo, 1, p); + if (ret) return ret; } -- cgit v0.10.2 From 108a8a11cb64981b03bcb78fd78d09a53967d14f Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:20 -0700 Subject: fs/proc/kcore.c: use PAGE_ALIGN instead of ALIGN(PAGE_SIZE) Use mm.h definition. Signed-off-by: Fabian Frederick Cc: Xishi Qiu Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 39e6ef3..6df8d07 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -172,7 +172,7 @@ get_sparsemem_vmemmap_info(struct kcore_list *ent, struct list_head *head) start = ((unsigned long)pfn_to_page(pfn)) & PAGE_MASK; end = ((unsigned long)pfn_to_page(pfn + nr_pages)) - 1; - end = ALIGN(end, PAGE_SIZE); + end = PAGE_ALIGN(end); /* overlap check (because we have to align page */ list_for_each_entry(tmp, head, list) { if (tmp->type != KCORE_VMEMMAP) -- cgit v0.10.2 From ccf94f1b4a8560ffdc221840535bae5e5a91a53c Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:21:22 -0700 Subject: proc: constify seq_operations proc_uid_seq_operations, proc_gid_seq_operations and proc_projid_seq_operations are only called in proc_id_map_open with seq_open as const struct seq_operations so we can constify the 3 structures and update proc_id_map_open prototype. text data bss dec hex filename 6817 404 1984 9205 23f5 kernel/user_namespace.o-before 6913 308 1984 9205 23f5 kernel/user_namespace.o-after Signed-off-by: Fabian Frederick Cc: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 2d696b0..79df9ff 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2449,7 +2449,7 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer) #ifdef CONFIG_USER_NS static int proc_id_map_open(struct inode *inode, struct file *file, - struct seq_operations *seq_ops) + const struct seq_operations *seq_ops) { struct user_namespace *ns = NULL; struct task_struct *task; diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 4836ba3..e953726 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -57,9 +57,9 @@ static inline void put_user_ns(struct user_namespace *ns) } struct seq_operations; -extern struct seq_operations proc_uid_seq_operations; -extern struct seq_operations proc_gid_seq_operations; -extern struct seq_operations proc_projid_seq_operations; +extern const struct seq_operations proc_uid_seq_operations; +extern const struct seq_operations proc_gid_seq_operations; +extern const struct seq_operations proc_projid_seq_operations; extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *); extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, loff_t *); diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index fcc0256..aa312b0 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -526,21 +526,21 @@ static void m_stop(struct seq_file *seq, void *v) return; } -struct seq_operations proc_uid_seq_operations = { +const struct seq_operations proc_uid_seq_operations = { .start = uid_m_start, .stop = m_stop, .next = m_next, .show = uid_m_show, }; -struct seq_operations proc_gid_seq_operations = { +const struct seq_operations proc_gid_seq_operations = { .start = gid_m_start, .stop = m_stop, .next = m_next, .show = gid_m_show, }; -struct seq_operations proc_projid_seq_operations = { +const struct seq_operations proc_projid_seq_operations = { .start = projid_m_start, .stop = m_stop, .next = m_next, -- cgit v0.10.2 From dbcdb504417ae108a20454ef89776a614b948571 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:25 -0700 Subject: proc: add and remove /proc entry create checks * remove proc_create(NULL, ...) check, let it oops * warn about proc_create("", ...) and proc_create("very very long name", ...) proc code keeps length as u8, no 256+ name length possible * warn about proc_create("123", ...) /proc/$PID and /proc/misc namespaces are separate things, but dumb module might create funky a-la $PID entry. * remove post mortem strchr('/') check Triggering it implies either strchr() is buggy or memory corruption. It should be VFS check anyway. In reality, none of these checks will ever trigger, it is preparation for the next patch. Based on patch from Al Viro. Signed-off-by: Alexey Dobriyan Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 79df9ff..1137521 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2785,7 +2785,7 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign unsigned tgid; struct pid_namespace *ns; - tgid = name_to_int(dentry); + tgid = name_to_int(&dentry->d_name); if (tgid == ~0U) goto out; @@ -3033,7 +3033,7 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry if (!leader) goto out_no_task; - tid = name_to_int(dentry); + tid = name_to_int(&dentry->d_name); if (tid == ~0U) goto out; diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 0788d09..955bb55 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -206,7 +206,7 @@ static struct dentry *proc_lookupfd_common(struct inode *dir, { struct task_struct *task = get_proc_task(dir); int result = -ENOENT; - unsigned fd = name_to_int(dentry); + unsigned fd = name_to_int(&dentry->d_name); if (!task) goto out_no_task; diff --git a/fs/proc/generic.c b/fs/proc/generic.c index b7f268e..190862e 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -330,28 +330,28 @@ static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent, nlink_t nlink) { struct proc_dir_entry *ent = NULL; - const char *fn = name; - unsigned int len; - - /* make sure name is valid */ - if (!name || !strlen(name)) - goto out; + const char *fn; + struct qstr qstr; if (xlate_proc_name(name, parent, &fn) != 0) goto out; + qstr.name = fn; + qstr.len = strlen(fn); + if (qstr.len == 0 || qstr.len >= 256) { + WARN(1, "name len %u\n", qstr.len); + return NULL; + } + if (*parent == &proc_root && name_to_int(&qstr) != ~0U) { + WARN(1, "create '/proc/%s' by hand\n", qstr.name); + return NULL; + } - /* At this point there must not be any '/' characters beyond *fn */ - if (strchr(fn, '/')) - goto out; - - len = strlen(fn); - - ent = kzalloc(sizeof(struct proc_dir_entry) + len + 1, GFP_KERNEL); + ent = kzalloc(sizeof(struct proc_dir_entry) + qstr.len + 1, GFP_KERNEL); if (!ent) goto out; - memcpy(ent->name, fn, len + 1); - ent->namelen = len; + memcpy(ent->name, fn, qstr.len + 1); + ent->namelen = qstr.len; ent->mode = mode; ent->nlink = nlink; atomic_set(&ent->count, 1); diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 3ab6d14..a38408a 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -112,10 +112,10 @@ static inline int task_dumpable(struct task_struct *task) return 0; } -static inline unsigned name_to_int(struct dentry *dentry) +static inline unsigned name_to_int(const struct qstr *qstr) { - const char *name = dentry->d_name.name; - int len = dentry->d_name.len; + const char *name = qstr->name; + int len = qstr->len; unsigned n = 0; if (len > 1 && *name == '0') -- cgit v0.10.2 From 335eb53158466a4c4d018fa53ceb8c8ba1067fa3 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:27 -0700 Subject: proc: faster /proc/$PID lookup Currently lookup for /proc/$PID first goes through spinlock and whole list of misc /proc entries only to confirm that, yes, /proc/42 can not possibly match random proc entry. List is is several dozens entries long (52 entries on my setup). None of this is necessary. Try to convert dentry name to integer first. If it works, it must be /proc/$PID. If it doesn't, it must be random proc entry. Based on patch from Al Viro. Signed-off-by: Alexey Dobriyan Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 1137521..35afb2e 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2780,7 +2780,7 @@ out: struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsigned int flags) { - int result = 0; + int result = -ENOENT; struct task_struct *task; unsigned tgid; struct pid_namespace *ns; diff --git a/fs/proc/root.c b/fs/proc/root.c index 5dbadec..574bafc 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -199,10 +199,10 @@ static int proc_root_getattr(struct vfsmount *mnt, struct dentry *dentry, struct static struct dentry *proc_root_lookup(struct inode * dir, struct dentry * dentry, unsigned int flags) { - if (!proc_lookup(dir, dentry, flags)) + if (!proc_pid_lookup(dir, dentry, flags)) return NULL; - return proc_pid_lookup(dir, dentry, flags); + return proc_lookup(dir, dentry, flags); } static int proc_root_readdir(struct file *file, struct dir_context *ctx) -- cgit v0.10.2 From 4dcc03fc4553943c8bae086aa826bb75171e82a1 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:29 -0700 Subject: proc: make proc_subdir_lock static Signed-off-by: Alexey Dobriyan Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 190862e..317b726 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -27,7 +27,7 @@ #include "internal.h" -DEFINE_SPINLOCK(proc_subdir_lock); +static DEFINE_SPINLOCK(proc_subdir_lock); static int proc_match(unsigned int len, const char *name, struct proc_dir_entry *de) { diff --git a/fs/proc/internal.h b/fs/proc/internal.h index a38408a..a17accbd 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -178,8 +178,6 @@ extern bool proc_fill_cache(struct file *, struct dir_context *, const char *, i /* * generic.c */ -extern spinlock_t proc_subdir_lock; - extern struct dentry *proc_lookup(struct inode *, struct dentry *, unsigned int); extern struct dentry *proc_lookup_de(struct proc_dir_entry *, struct inode *, struct dentry *); -- cgit v0.10.2 From 849d20a12822cba355c054ab0bfe490473413cce Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:31 -0700 Subject: proc: remove proc_tty_ldisc variable /proc/tty/ldisc appear to be unused as a directory and it had been always that way. But it is userspace visible thing. Cowardly remove only in-kernel variable holding it. [akpm@linux-foundation.org: add comment] Signed-off-by: Alexey Dobriyan Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/proc_tty.c b/fs/proc/proc_tty.c index cb761f0..15f327b 100644 --- a/fs/proc/proc_tty.c +++ b/fs/proc/proc_tty.c @@ -18,7 +18,7 @@ /* * The /proc/tty directory inodes... */ -static struct proc_dir_entry *proc_tty_ldisc, *proc_tty_driver; +static struct proc_dir_entry *proc_tty_driver; /* * This is the handler for /proc/tty/drivers @@ -176,7 +176,7 @@ void __init proc_tty_init(void) { if (!proc_mkdir("tty", NULL)) return; - proc_tty_ldisc = proc_mkdir("tty/ldisc", NULL); + proc_mkdir("tty/ldisc", NULL); /* Preserved: it's userspace visible */ /* * /proc/tty/driver/serial reveals the exact character counts for * serial links which is just too easy to abuse for inferring -- cgit v0.10.2 From cedbccab8bb104a1e2d6b40e3db2159b30c6cd76 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:33 -0700 Subject: proc: more "const char *" pointers Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 35afb2e..0e76895 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -105,7 +105,7 @@ */ struct pid_entry { - char *name; + const char *name; int len; umode_t mode; const struct inode_operations *iop; @@ -418,8 +418,8 @@ static int proc_oom_score(struct task_struct *task, char *buffer) } struct limit_names { - char *name; - char *unit; + const char *name; + const char *unit; }; static const struct limit_names lnames[RLIM_NLIMITS] = { @@ -2056,7 +2056,7 @@ static int show_timer(struct seq_file *m, void *v) struct k_itimer *timer; struct timers_private *tp = m->private; int notify; - static char *nstr[] = { + static const char * const nstr[] = { [SIGEV_SIGNAL] = "signal", [SIGEV_NONE] = "none", [SIGEV_THREAD] = "thread", -- cgit v0.10.2 From f9ea536ef80d7b3ce531943e2c6aae7c0fedd9bd Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:35 -0700 Subject: proc: convert /proc/$PID/auxv to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 0e76895..154bbd3 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -205,22 +205,20 @@ static int proc_pid_cmdline(struct task_struct *task, char *buffer) return get_cmdline(task, buffer, PAGE_SIZE); } -static int proc_pid_auxv(struct task_struct *task, char *buffer) +static int proc_pid_auxv(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { struct mm_struct *mm = mm_access(task, PTRACE_MODE_READ); - int res = PTR_ERR(mm); if (mm && !IS_ERR(mm)) { unsigned int nwords = 0; do { nwords += 2; } while (mm->saved_auxv[nwords - 2] != 0); /* AT_NULL */ - res = nwords * sizeof(mm->saved_auxv[0]); - if (res > PAGE_SIZE) - res = PAGE_SIZE; - memcpy(buffer, mm->saved_auxv, res); + seq_write(m, mm->saved_auxv, nwords * sizeof(mm->saved_auxv[0])); mmput(mm); - } - return res; + return 0; + } else + return PTR_ERR(mm); } @@ -2557,7 +2555,7 @@ static const struct pid_entry tgid_base_stuff[] = { DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations), #endif REG("environ", S_IRUSR, proc_environ_operations), - INF("auxv", S_IRUSR, proc_pid_auxv), + ONE("auxv", S_IRUSR, proc_pid_auxv), ONE("status", S_IRUGO, proc_pid_status), ONE("personality", S_IRUSR, proc_pid_personality), INF("limits", S_IRUGO, proc_pid_limits), @@ -2896,7 +2894,7 @@ static const struct pid_entry tid_base_stuff[] = { DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations), DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations), REG("environ", S_IRUSR, proc_environ_operations), - INF("auxv", S_IRUSR, proc_pid_auxv), + ONE("auxv", S_IRUSR, proc_pid_auxv), ONE("status", S_IRUGO, proc_pid_status), ONE("personality", S_IRUSR, proc_pid_personality), INF("limits", S_IRUGO, proc_pid_limits), -- cgit v0.10.2 From 1c963eb135075ecc5126c1ef695090cb0aea95c4 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:37 -0700 Subject: proc: convert /proc/$PID/limits to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 154bbd3..1dd6921 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -440,12 +440,11 @@ static const struct limit_names lnames[RLIM_NLIMITS] = { }; /* Display limits for a process */ -static int proc_pid_limits(struct task_struct *task, char *buffer) +static int proc_pid_limits(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { unsigned int i; - int count = 0; unsigned long flags; - char *bufptr = buffer; struct rlimit rlim[RLIM_NLIMITS]; @@ -457,31 +456,29 @@ static int proc_pid_limits(struct task_struct *task, char *buffer) /* * print the file header */ - count += sprintf(&bufptr[count], "%-25s %-20s %-20s %-10s\n", + seq_printf(m, "%-25s %-20s %-20s %-10s\n", "Limit", "Soft Limit", "Hard Limit", "Units"); for (i = 0; i < RLIM_NLIMITS; i++) { if (rlim[i].rlim_cur == RLIM_INFINITY) - count += sprintf(&bufptr[count], "%-25s %-20s ", + seq_printf(m, "%-25s %-20s ", lnames[i].name, "unlimited"); else - count += sprintf(&bufptr[count], "%-25s %-20lu ", + seq_printf(m, "%-25s %-20lu ", lnames[i].name, rlim[i].rlim_cur); if (rlim[i].rlim_max == RLIM_INFINITY) - count += sprintf(&bufptr[count], "%-20s ", "unlimited"); + seq_printf(m, "%-20s ", "unlimited"); else - count += sprintf(&bufptr[count], "%-20lu ", - rlim[i].rlim_max); + seq_printf(m, "%-20lu ", rlim[i].rlim_max); if (lnames[i].unit) - count += sprintf(&bufptr[count], "%-10s\n", - lnames[i].unit); + seq_printf(m, "%-10s\n", lnames[i].unit); else - count += sprintf(&bufptr[count], "\n"); + seq_putc(m, '\n'); } - return count; + return 0; } #ifdef CONFIG_HAVE_ARCH_TRACEHOOK @@ -2558,7 +2555,7 @@ static const struct pid_entry tgid_base_stuff[] = { ONE("auxv", S_IRUSR, proc_pid_auxv), ONE("status", S_IRUGO, proc_pid_status), ONE("personality", S_IRUSR, proc_pid_personality), - INF("limits", S_IRUGO, proc_pid_limits), + ONE("limits", S_IRUGO, proc_pid_limits), #ifdef CONFIG_SCHED_DEBUG REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations), #endif @@ -2897,7 +2894,7 @@ static const struct pid_entry tid_base_stuff[] = { ONE("auxv", S_IRUSR, proc_pid_auxv), ONE("status", S_IRUGO, proc_pid_status), ONE("personality", S_IRUSR, proc_pid_personality), - INF("limits", S_IRUGO, proc_pid_limits), + ONE("limits", S_IRUGO, proc_pid_limits), #ifdef CONFIG_SCHED_DEBUG REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations), #endif -- cgit v0.10.2 From 09d93bd6270981446cfaa5938c8d5f6cfc2d84a1 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:39 -0700 Subject: proc: convert /proc/$PID/syscall to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 1dd6921..f5d0f8f 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -482,7 +482,8 @@ static int proc_pid_limits(struct seq_file *m, struct pid_namespace *ns, } #ifdef CONFIG_HAVE_ARCH_TRACEHOOK -static int proc_pid_syscall(struct task_struct *task, char *buffer) +static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { long nr; unsigned long args[6], sp, pc; @@ -491,11 +492,11 @@ static int proc_pid_syscall(struct task_struct *task, char *buffer) return res; if (task_current_syscall(task, &nr, args, 6, &sp, &pc)) - res = sprintf(buffer, "running\n"); + seq_puts(m, "running\n"); else if (nr < 0) - res = sprintf(buffer, "%ld 0x%lx 0x%lx\n", nr, sp, pc); + seq_printf(m, "%ld 0x%lx 0x%lx\n", nr, sp, pc); else - res = sprintf(buffer, + seq_printf(m, "%ld 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx\n", nr, args[0], args[1], args[2], args[3], args[4], args[5], @@ -2564,7 +2565,7 @@ static const struct pid_entry tgid_base_stuff[] = { #endif REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations), #ifdef CONFIG_HAVE_ARCH_TRACEHOOK - INF("syscall", S_IRUSR, proc_pid_syscall), + ONE("syscall", S_IRUSR, proc_pid_syscall), #endif INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), @@ -2900,7 +2901,7 @@ static const struct pid_entry tid_base_stuff[] = { #endif REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations), #ifdef CONFIG_HAVE_ARCH_TRACEHOOK - INF("syscall", S_IRUSR, proc_pid_syscall), + ONE("syscall", S_IRUSR, proc_pid_syscall), #endif INF("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), -- cgit v0.10.2 From 2ca66ff70a49f0c92352282d6a67d47be090a462 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:41 -0700 Subject: proc: convert /proc/$PID/cmdline to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index f5d0f8f..a6d0214 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -200,9 +200,16 @@ static int proc_root_link(struct dentry *dentry, struct path *path) return result; } -static int proc_pid_cmdline(struct task_struct *task, char *buffer) +static int proc_pid_cmdline(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { - return get_cmdline(task, buffer, PAGE_SIZE); + /* + * Rely on struct seq_operations::show() being called once + * per internal buffer allocation. See single_open(), traverse(). + */ + BUG_ON(m->size < PAGE_SIZE); + m->count += get_cmdline(task, m->buf, PAGE_SIZE); + return 0; } static int proc_pid_auxv(struct seq_file *m, struct pid_namespace *ns, @@ -2567,7 +2574,7 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_HAVE_ARCH_TRACEHOOK ONE("syscall", S_IRUSR, proc_pid_syscall), #endif - INF("cmdline", S_IRUGO, proc_pid_cmdline), + ONE("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tgid_stat), ONE("statm", S_IRUGO, proc_pid_statm), REG("maps", S_IRUGO, proc_pid_maps_operations), @@ -2903,7 +2910,7 @@ static const struct pid_entry tid_base_stuff[] = { #ifdef CONFIG_HAVE_ARCH_TRACEHOOK ONE("syscall", S_IRUSR, proc_pid_syscall), #endif - INF("cmdline", S_IRUGO, proc_pid_cmdline), + ONE("cmdline", S_IRUGO, proc_pid_cmdline), ONE("stat", S_IRUGO, proc_tid_stat), ONE("statm", S_IRUGO, proc_pid_statm), REG("maps", S_IRUGO, proc_tid_maps_operations), -- cgit v0.10.2 From edfcd6064fa7d368ea9b67f1fa0b1ea4c14a31c1 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:44 -0700 Subject: proc: convert /proc/$PID/wchan to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index a6d0214..353f287 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -234,7 +234,8 @@ static int proc_pid_auxv(struct seq_file *m, struct pid_namespace *ns, * Provides a wchan file via kallsyms in a proper one-value-per-file format. * Returns the resolved symbol. If that fails, simply return the address. */ -static int proc_pid_wchan(struct task_struct *task, char *buffer) +static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { unsigned long wchan; char symname[KSYM_NAME_LEN]; @@ -245,9 +246,9 @@ static int proc_pid_wchan(struct task_struct *task, char *buffer) if (!ptrace_may_access(task, PTRACE_MODE_READ)) return 0; else - return sprintf(buffer, "%lu", wchan); + return seq_printf(m, "%lu", wchan); else - return sprintf(buffer, "%s", symname); + return seq_printf(m, "%s", symname); } #endif /* CONFIG_KALLSYMS */ @@ -2597,7 +2598,7 @@ static const struct pid_entry tgid_base_stuff[] = { DIR("attr", S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations), #endif #ifdef CONFIG_KALLSYMS - INF("wchan", S_IRUGO, proc_pid_wchan), + ONE("wchan", S_IRUGO, proc_pid_wchan), #endif #ifdef CONFIG_STACKTRACE ONE("stack", S_IRUSR, proc_pid_stack), @@ -2935,7 +2936,7 @@ static const struct pid_entry tid_base_stuff[] = { DIR("attr", S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations), #endif #ifdef CONFIG_KALLSYMS - INF("wchan", S_IRUGO, proc_pid_wchan), + ONE("wchan", S_IRUGO, proc_pid_wchan), #endif #ifdef CONFIG_STACKTRACE ONE("stack", S_IRUSR, proc_pid_stack), -- cgit v0.10.2 From f6e826ca37a56e103cf45e0658ef293bb6b54c7c Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:46 -0700 Subject: proc: convert /proc/$PID/schedstat to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 353f287..4bb8d34 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -310,9 +310,10 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns, /* * Provides /proc/PID/schedstat */ -static int proc_pid_schedstat(struct task_struct *task, char *buffer) +static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { - return sprintf(buffer, "%llu %llu %lu\n", + return seq_printf(m, "%llu %llu %lu\n", (unsigned long long)task->se.sum_exec_runtime, (unsigned long long)task->sched_info.run_delay, task->sched_info.pcount); @@ -2604,7 +2605,7 @@ static const struct pid_entry tgid_base_stuff[] = { ONE("stack", S_IRUSR, proc_pid_stack), #endif #ifdef CONFIG_SCHEDSTATS - INF("schedstat", S_IRUGO, proc_pid_schedstat), + ONE("schedstat", S_IRUGO, proc_pid_schedstat), #endif #ifdef CONFIG_LATENCYTOP REG("latency", S_IRUGO, proc_lstats_operations), @@ -2942,7 +2943,7 @@ static const struct pid_entry tid_base_stuff[] = { ONE("stack", S_IRUSR, proc_pid_stack), #endif #ifdef CONFIG_SCHEDSTATS - INF("schedstat", S_IRUGO, proc_pid_schedstat), + ONE("schedstat", S_IRUGO, proc_pid_schedstat), #endif #ifdef CONFIG_LATENCYTOP REG("latency", S_IRUGO, proc_lstats_operations), -- cgit v0.10.2 From 6ba51e3751a3543f353a06cd8a6c975bda707c45 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:48 -0700 Subject: proc: convert /proc/$PID/oom_score to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index 4bb8d34..d0bf000 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -411,7 +411,8 @@ static const struct file_operations proc_cpuset_operations = { }; #endif -static int proc_oom_score(struct task_struct *task, char *buffer) +static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { unsigned long totalpages = totalram_pages + total_swap_pages; unsigned long points = 0; @@ -421,7 +422,7 @@ static int proc_oom_score(struct task_struct *task, char *buffer) points = oom_badness(task, NULL, NULL, totalpages) * 1000 / totalpages; read_unlock(&tasklist_lock); - return sprintf(buffer, "%lu\n", points); + return seq_printf(m, "%lu\n", points); } struct limit_names { @@ -2616,7 +2617,7 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_CGROUPS REG("cgroup", S_IRUGO, proc_cgroup_operations), #endif - INF("oom_score", S_IRUGO, proc_oom_score), + ONE("oom_score", S_IRUGO, proc_oom_score), REG("oom_adj", S_IRUGO|S_IWUSR, proc_oom_adj_operations), REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations), #ifdef CONFIG_AUDITSYSCALL @@ -2954,7 +2955,7 @@ static const struct pid_entry tid_base_stuff[] = { #ifdef CONFIG_CGROUPS REG("cgroup", S_IRUGO, proc_cgroup_operations), #endif - INF("oom_score", S_IRUGO, proc_oom_score), + ONE("oom_score", S_IRUGO, proc_oom_score), REG("oom_adj", S_IRUGO|S_IWUSR, proc_oom_adj_operations), REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations), #ifdef CONFIG_AUDITSYSCALL -- cgit v0.10.2 From 19aadc98d6a242e84c4a9e7cfac3d5140b885348 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:50 -0700 Subject: proc: convert /proc/$PID/io to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index d0bf000..089373e 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2398,7 +2398,7 @@ static const struct file_operations proc_coredump_filter_operations = { #endif #ifdef CONFIG_TASK_IO_ACCOUNTING -static int do_io_accounting(struct task_struct *task, char *buffer, int whole) +static int do_io_accounting(struct task_struct *task, struct seq_file *m, int whole) { struct task_io_accounting acct = task->ioac; unsigned long flags; @@ -2422,7 +2422,7 @@ static int do_io_accounting(struct task_struct *task, char *buffer, int whole) unlock_task_sighand(task, &flags); } - result = sprintf(buffer, + result = seq_printf(m, "rchar: %llu\n" "wchar: %llu\n" "syscr: %llu\n" @@ -2442,14 +2442,16 @@ out_unlock: return result; } -static int proc_tid_io_accounting(struct task_struct *task, char *buffer) +static int proc_tid_io_accounting(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { - return do_io_accounting(task, buffer, 0); + return do_io_accounting(task, m, 0); } -static int proc_tgid_io_accounting(struct task_struct *task, char *buffer) +static int proc_tgid_io_accounting(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { - return do_io_accounting(task, buffer, 1); + return do_io_accounting(task, m, 1); } #endif /* CONFIG_TASK_IO_ACCOUNTING */ @@ -2631,7 +2633,7 @@ static const struct pid_entry tgid_base_stuff[] = { REG("coredump_filter", S_IRUGO|S_IWUSR, proc_coredump_filter_operations), #endif #ifdef CONFIG_TASK_IO_ACCOUNTING - INF("io", S_IRUSR, proc_tgid_io_accounting), + ONE("io", S_IRUSR, proc_tgid_io_accounting), #endif #ifdef CONFIG_HARDWALL INF("hardwall", S_IRUGO, proc_pid_hardwall), @@ -2966,7 +2968,7 @@ static const struct pid_entry tid_base_stuff[] = { REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations), #endif #ifdef CONFIG_TASK_IO_ACCOUNTING - INF("io", S_IRUSR, proc_tid_io_accounting), + ONE("io", S_IRUSR, proc_tid_io_accounting), #endif #ifdef CONFIG_HARDWALL INF("hardwall", S_IRUGO, proc_pid_hardwall), -- cgit v0.10.2 From d962c144839b231d7a787f9d2503f2d1171e2310 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:52 -0700 Subject: proc: convert /proc/$PID/hardwall to seq_file interface Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/tile/include/asm/hardwall.h b/arch/tile/include/asm/hardwall.h index 2f572b6..44d2765 100644 --- a/arch/tile/include/asm/hardwall.h +++ b/arch/tile/include/asm/hardwall.h @@ -23,7 +23,7 @@ struct proc_dir_entry; #ifdef CONFIG_HARDWALL void proc_tile_hardwall_init(struct proc_dir_entry *root); -int proc_pid_hardwall(struct task_struct *task, char *buffer); +int proc_pid_hardwall(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); #else static inline void proc_tile_hardwall_init(struct proc_dir_entry *root) {} #endif diff --git a/arch/tile/kernel/hardwall.c b/arch/tile/kernel/hardwall.c index 531f4c3..aca6000 100644 --- a/arch/tile/kernel/hardwall.c +++ b/arch/tile/kernel/hardwall.c @@ -947,15 +947,15 @@ static void hardwall_remove_proc(struct hardwall_info *info) remove_proc_entry(buf, info->type->proc_dir); } -int proc_pid_hardwall(struct task_struct *task, char *buffer) +int proc_pid_hardwall(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) { int i; int n = 0; for (i = 0; i < HARDWALL_TYPES; ++i) { struct hardwall_info *info = task->thread.hardwall[i].info; if (info) - n += sprintf(&buffer[n], "%s: %d\n", - info->type->name, info->id); + seq_printf(m, "%s: %d\n", info->type->name, info->id); } return n; } diff --git a/fs/proc/base.c b/fs/proc/base.c index 089373e..be41e8b 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2636,7 +2636,7 @@ static const struct pid_entry tgid_base_stuff[] = { ONE("io", S_IRUSR, proc_tgid_io_accounting), #endif #ifdef CONFIG_HARDWALL - INF("hardwall", S_IRUGO, proc_pid_hardwall), + ONE("hardwall", S_IRUGO, proc_pid_hardwall), #endif #ifdef CONFIG_USER_NS REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), @@ -2971,7 +2971,7 @@ static const struct pid_entry tid_base_stuff[] = { ONE("io", S_IRUSR, proc_tid_io_accounting), #endif #ifdef CONFIG_HARDWALL - INF("hardwall", S_IRUGO, proc_pid_hardwall), + ONE("hardwall", S_IRUGO, proc_pid_hardwall), #endif #ifdef CONFIG_USER_NS REG("uid_map", S_IRUGO|S_IWUSR, proc_uid_map_operations), -- cgit v0.10.2 From 8f053ac11f96cc6edcabcbb154c9cf06c5d63333 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 8 Aug 2014 14:21:54 -0700 Subject: proc: remove INF macro If you're applying this patch, all /proc/$PID/* files were converted to seq_file interface and this code became unused. Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/base.c b/fs/proc/base.c index be41e8b..043c83c 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -130,10 +130,6 @@ struct pid_entry { { .proc_get_link = get_link } ) #define REG(NAME, MODE, fops) \ NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}) -#define INF(NAME, MODE, read) \ - NOD(NAME, (S_IFREG|(MODE)), \ - NULL, &proc_info_file_operations, \ - { .proc_read = read } ) #define ONE(NAME, MODE, show) \ NOD(NAME, (S_IFREG|(MODE)), \ NULL, &proc_single_file_operations, \ @@ -604,43 +600,6 @@ static const struct inode_operations proc_def_inode_operations = { .setattr = proc_setattr, }; -#define PROC_BLOCK_SIZE (3*1024) /* 4K page size but our output routines use some slack for overruns */ - -static ssize_t proc_info_read(struct file * file, char __user * buf, - size_t count, loff_t *ppos) -{ - struct inode * inode = file_inode(file); - unsigned long page; - ssize_t length; - struct task_struct *task = get_proc_task(inode); - - length = -ESRCH; - if (!task) - goto out_no_task; - - if (count > PROC_BLOCK_SIZE) - count = PROC_BLOCK_SIZE; - - length = -ENOMEM; - if (!(page = __get_free_page(GFP_TEMPORARY))) - goto out; - - length = PROC_I(inode)->op.proc_read(task, (char*)page); - - if (length >= 0) - length = simple_read_from_buffer(buf, count, ppos, (char *)page, length); - free_page(page); -out: - put_task_struct(task); -out_no_task: - return length; -} - -static const struct file_operations proc_info_file_operations = { - .read = proc_info_read, - .llseek = generic_file_llseek, -}; - static int proc_single_show(struct seq_file *m, void *v) { struct inode *inode = m->private; diff --git a/fs/proc/internal.h b/fs/proc/internal.h index a17accbd..a024cf7 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -52,7 +52,6 @@ struct proc_dir_entry { union proc_op { int (*proc_get_link)(struct dentry *, struct path *); - int (*proc_read)(struct task_struct *task, char *page); int (*proc_show)(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); -- cgit v0.10.2 From 41f727fde1fe40efeb4fef6fdce74ff794be5aeb Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Fri, 8 Aug 2014 14:21:56 -0700 Subject: fork/exec: cleanup mm initialization mm initialization on fork/exec is spread all over the place, which makes the code look inconsistent. We have mm_init(), which is supposed to init/nullify mm's internals, but it doesn't init all the fields it should: - on fork ->mmap,mm_rb,vmacache_seqnum,map_count,mm_cpumask,locked_vm are zeroed in dup_mmap(); - on fork ->pmd_huge_pte is zeroed in dup_mm(), immediately before calling mm_init(); - ->cpu_vm_mask_var ptr is initialized by mm_init_cpumask(), which is called before mm_init() on both fork and exec; - ->context is initialized by init_new_context(), which is called after mm_init() on both fork and exec; Let's consolidate all the initializations in mm_init() to make the code look cleaner. Signed-off-by: Vladimir Davydov Cc: Oleg Nesterov Cc: David Rientjes Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exec.c b/fs/exec.c index ab1f120..a2b42a9 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -368,10 +368,6 @@ static int bprm_mm_init(struct linux_binprm *bprm) if (!mm) goto err; - err = init_new_context(current, mm); - if (err) - goto err; - err = __bprm_mm_init(bprm); if (err) goto err; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 796deac..6e0b286 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -461,6 +461,7 @@ static inline void mm_init_cpumask(struct mm_struct *mm) #ifdef CONFIG_CPUMASK_OFFSTACK mm->cpu_vm_mask_var = &mm->cpumask_allocation; #endif + cpumask_clear(mm->cpu_vm_mask_var); } /* Future-safe accessor for struct mm_struct's cpu_vm_mask. */ diff --git a/kernel/fork.c b/kernel/fork.c index f6f5086..418b52a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -374,12 +374,6 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) */ down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING); - mm->locked_vm = 0; - mm->mmap = NULL; - mm->vmacache_seqnum = 0; - mm->map_count = 0; - cpumask_clear(mm_cpumask(mm)); - mm->mm_rb = RB_ROOT; rb_link = &mm->mm_rb.rb_node; rb_parent = NULL; pprev = &mm->mmap; @@ -538,17 +532,27 @@ static void mm_init_aio(struct mm_struct *mm) static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) { + mm->mmap = NULL; + mm->mm_rb = RB_ROOT; + mm->vmacache_seqnum = 0; atomic_set(&mm->mm_users, 1); atomic_set(&mm->mm_count, 1); init_rwsem(&mm->mmap_sem); INIT_LIST_HEAD(&mm->mmlist); mm->core_state = NULL; atomic_long_set(&mm->nr_ptes, 0); + mm->map_count = 0; + mm->locked_vm = 0; memset(&mm->rss_stat, 0, sizeof(mm->rss_stat)); spin_lock_init(&mm->page_table_lock); + mm_init_cpumask(mm); mm_init_aio(mm); mm_init_owner(mm, p); + mmu_notifier_mm_init(mm); clear_tlb_flush_pending(mm); +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS + mm->pmd_huge_pte = NULL; +#endif if (current->mm) { mm->flags = current->mm->flags & MMF_INIT_MASK; @@ -558,11 +562,17 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) mm->def_flags = 0; } - if (likely(!mm_alloc_pgd(mm))) { - mmu_notifier_mm_init(mm); - return mm; - } + if (mm_alloc_pgd(mm)) + goto fail_nopgd; + + if (init_new_context(p, mm)) + goto fail_nocontext; + return mm; + +fail_nocontext: + mm_free_pgd(mm); +fail_nopgd: free_mm(mm); return NULL; } @@ -596,7 +606,6 @@ struct mm_struct *mm_alloc(void) return NULL; memset(mm, 0, sizeof(*mm)); - mm_init_cpumask(mm); return mm_init(mm, current); } @@ -828,17 +837,10 @@ static struct mm_struct *dup_mm(struct task_struct *tsk) goto fail_nomem; memcpy(mm, oldmm, sizeof(*mm)); - mm_init_cpumask(mm); -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS - mm->pmd_huge_pte = NULL; -#endif if (!mm_init(mm, tsk)) goto fail_nomem; - if (init_new_context(tsk, mm)) - goto fail_nocontext; - dup_mm_exe_file(oldmm, mm); err = dup_mmap(mm, oldmm); @@ -860,15 +862,6 @@ free_pt: fail_nomem: return NULL; - -fail_nocontext: - /* - * If init_new_context() failed, we cannot use mmput() to free the mm - * because it calls destroy_context() - */ - mm_free_pgd(mm); - free_mm(mm); - return NULL; } static int copy_mm(unsigned long clone_flags, struct task_struct *tsk) -- cgit v0.10.2 From ce65cefa5debefc0e81d0a533bda467f0aa67350 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Fri, 8 Aug 2014 14:21:58 -0700 Subject: fork: reset mm->pinned_vm mm->pinned_vm counts pages of mm's address space that were permanently pinned in memory by increasing their reference counter. The counter was introduced by commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages"), while before it locked_vm had been used for such pages. Obviously, we should reset the counter on fork if !CLONE_VM, just like we do with locked_vm, but currently we don't. Let's fix it. This patch will fix the contents of /proc/pid/status:VmPin. ib_umem_get[infiniband] and perf_mmap still check pinned_vm against RLIMIT_MEMLOCK. It's left from the times when pinned pages were accounted under locked_vm, but today it looks wrong. It isn't clear how we should deal with it. We still have some drivers accounting pinned pages under mm->locked_vm - this is what commit bc3e53f682d9 was fighting against. It's infiniband/usnic and vfio. Signed-off-by: Vladimir Davydov Cc: Oleg Nesterov Cc: David Rientjes Cc: Christoph Lameter Cc: Roland Dreier Cc: Sean Hefty Cc: Hal Rosenstock Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/fork.c b/kernel/fork.c index 418b52a..5a547a5 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -543,6 +543,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) atomic_long_set(&mm->nr_ptes, 0); mm->map_count = 0; mm->locked_vm = 0; + mm->pinned_vm = 0; memset(&mm->rss_stat, 0, sizeof(mm->rss_stat)); spin_lock_init(&mm->page_table_lock); mm_init_cpumask(mm); -- cgit v0.10.2 From 4f7d461433bb4a4deee61baefdac6cd1a1ecb546 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Fri, 8 Aug 2014 14:22:01 -0700 Subject: fork: copy mm's vm usage counters under mmap_sem If a forking process has a thread calling (un)mmap (silly but still), the child process may have some of its mm's vm usage counters (total_vm and friends) screwed up, because currently they are copied from oldmm w/o holding any locks (memcpy in dup_mm). This patch moves the counters initialization to dup_mmap() to be called under oldmm->mmap_sem, which eliminates any possibility of race. Signed-off-by: Vladimir Davydov Cc: Oleg Nesterov Cc: David Rientjes Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/fork.c b/kernel/fork.c index 5a547a5..aff84f8 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -374,6 +374,11 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) */ down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING); + mm->total_vm = oldmm->total_vm; + mm->shared_vm = oldmm->shared_vm; + mm->exec_vm = oldmm->exec_vm; + mm->stack_vm = oldmm->stack_vm; + rb_link = &mm->mm_rb.rb_node; rb_parent = NULL; pprev = &mm->mmap; -- cgit v0.10.2 From 33144e8429bd7fceacbb869a7f5061db42e13fe6 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Fri, 8 Aug 2014 14:22:03 -0700 Subject: kernel/fork.c: make mm_init_owner static It's only used in fork.c:mm_init(). Signed-off-by: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/sched.h b/include/linux/sched.h index 4fcf82a..b21e921 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2961,15 +2961,10 @@ static inline void inc_syscw(struct task_struct *tsk) #ifdef CONFIG_MEMCG extern void mm_update_next_owner(struct mm_struct *mm); -extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p); #else static inline void mm_update_next_owner(struct mm_struct *mm) { } - -static inline void mm_init_owner(struct mm_struct *mm, struct task_struct *p) -{ -} #endif /* CONFIG_MEMCG */ static inline unsigned long task_rlimit(const struct task_struct *tsk, diff --git a/kernel/fork.c b/kernel/fork.c index aff84f8..86da59e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -535,6 +535,13 @@ static void mm_init_aio(struct mm_struct *mm) #endif } +static void mm_init_owner(struct mm_struct *mm, struct task_struct *p) +{ +#ifdef CONFIG_MEMCG + mm->owner = p; +#endif +} + static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) { mm->mmap = NULL; @@ -1139,13 +1146,6 @@ static void rt_mutex_init_task(struct task_struct *p) #endif } -#ifdef CONFIG_MEMCG -void mm_init_owner(struct mm_struct *mm, struct task_struct *p) -{ - mm->owner = p; -} -#endif /* CONFIG_MEMCG */ - /* * Initialize POSIX timer handling for a single task. */ -- cgit v0.10.2 From 0692dedcf64bf3cdcfb9f6a51c70d49c8db351d2 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Fri, 8 Aug 2014 14:22:05 -0700 Subject: fs/proc/vmcore.c:mmap_vmcore: skip non-ram pages reported by hypervisors We have a special check in read_vmcore() handler to check if the page was reported as ram or not by the hypervisor (pfn_is_ram()). However, when vmcore is read with mmap() no such check is performed. That can lead to unpredictable results, e.g. when running Xen PVHVM guest memcpy() after mmap() on /proc/vmcore will hang processing HVMMEM_mmio_dm pages creating enormous load in both DomU and Dom0. Fix the issue by mapping each non-ram page to the zero page. Keep direct path with remap_oldmem_pfn_range() to avoid looping through all pages on bare metal. The issue can also be solved by overriding remap_oldmem_pfn_range() in xen-specific code, as remap_oldmem_pfn_range() was been designed for. That, however, would involve non-obvious xen code path for all x86 builds with CONFIG_XEN_PVHVM=y and would prevent all other hypervisor-specific code on x86 arch from doing the same override. [fengguang.wu@intel.com: remap_oldmem_pfn_checked() can be static] [akpm@linux-foundation.org: clean up layout] Signed-off-by: Vitaly Kuznetsov Reviewed-by: Andrew Jones Cc: Michael Holzheu Acked-by: Vivek Goyal Cc: David Vrabel Signed-off-by: Fengguang Wu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index 382aa89..a90d6d35 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -328,6 +328,82 @@ static inline char *alloc_elfnotes_buf(size_t notes_sz) * virtually contiguous user-space in ELF layout. */ #ifdef CONFIG_MMU +/* + * remap_oldmem_pfn_checked - do remap_oldmem_pfn_range replacing all pages + * reported as not being ram with the zero page. + * + * @vma: vm_area_struct describing requested mapping + * @from: start remapping from + * @pfn: page frame number to start remapping to + * @size: remapping size + * @prot: protection bits + * + * Returns zero on success, -EAGAIN on failure. + */ +static int remap_oldmem_pfn_checked(struct vm_area_struct *vma, + unsigned long from, unsigned long pfn, + unsigned long size, pgprot_t prot) +{ + unsigned long map_size; + unsigned long pos_start, pos_end, pos; + unsigned long zeropage_pfn = my_zero_pfn(0); + size_t len = 0; + + pos_start = pfn; + pos_end = pfn + (size >> PAGE_SHIFT); + + for (pos = pos_start; pos < pos_end; ++pos) { + if (!pfn_is_ram(pos)) { + /* + * We hit a page which is not ram. Remap the continuous + * region between pos_start and pos-1 and replace + * the non-ram page at pos with the zero page. + */ + if (pos > pos_start) { + /* Remap continuous region */ + map_size = (pos - pos_start) << PAGE_SHIFT; + if (remap_oldmem_pfn_range(vma, from + len, + pos_start, map_size, + prot)) + goto fail; + len += map_size; + } + /* Remap the zero page */ + if (remap_oldmem_pfn_range(vma, from + len, + zeropage_pfn, + PAGE_SIZE, prot)) + goto fail; + len += PAGE_SIZE; + pos_start = pos + 1; + } + } + if (pos > pos_start) { + /* Remap the rest */ + map_size = (pos - pos_start) << PAGE_SHIFT; + if (remap_oldmem_pfn_range(vma, from + len, pos_start, + map_size, prot)) + goto fail; + } + return 0; +fail: + do_munmap(vma->vm_mm, from, len); + return -EAGAIN; +} + +static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma, + unsigned long from, unsigned long pfn, + unsigned long size, pgprot_t prot) +{ + /* + * Check if oldmem_pfn_is_ram was registered to avoid + * looping over all pages without a reason. + */ + if (oldmem_pfn_is_ram) + return remap_oldmem_pfn_checked(vma, from, pfn, size, prot); + else + return remap_oldmem_pfn_range(vma, from, pfn, size, prot); +} + static int mmap_vmcore(struct file *file, struct vm_area_struct *vma) { size_t size = vma->vm_end - vma->vm_start; @@ -387,9 +463,9 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma) tsz = min_t(size_t, m->offset + m->size - start, size); paddr = m->paddr + start - m->offset; - if (remap_oldmem_pfn_range(vma, vma->vm_start + len, - paddr >> PAGE_SHIFT, tsz, - vma->vm_page_prot)) + if (vmcore_remap_oldmem_pfn(vma, vma->vm_start + len, + paddr >> PAGE_SHIFT, tsz, + vma->vm_page_prot)) goto fail; size -= tsz; start += tsz; -- cgit v0.10.2 From 93b7aca35dd7bf0c3ba7ea0542b556bcfdb28e76 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Fri, 8 Aug 2014 14:22:07 -0700 Subject: lib/idr.c: fix out-of-bounds pointer dereference I'm working on address sanitizer project for kernel. Recently we started experiments with stack instrumentation, to detect out-of-bounds read/write bugs on stack. Just after booting I've hit out-of-bounds read on stack in idr_for_each (and in __idr_remove_all as well): struct idr_layer **paa = &pa[0]; while (id >= 0 && id <= max) { ... while (n < fls(id)) { n += IDR_BITS; p = *--paa; <--- here we are reading pa[-1] value. } } Despite the fact that after this dereference we are exiting out of loop and never use p, such behaviour is undefined and should be avoided. Fix this by moving pointer derference to the beggining of the loop, right before we will use it. Signed-off-by: Andrey Ryabinin Reviewed-by: Lai Jiangshan Cc: Tejun Heo Cc: Alexey Preobrazhensky Cc: Dmitry Vyukov Cc: Konstantin Khlebnikov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/idr.c b/lib/idr.c index 39158ab..50be3fa 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -590,26 +590,27 @@ static void __idr_remove_all(struct idr *idp) struct idr_layer **paa = &pa[0]; n = idp->layers * IDR_BITS; - p = idp->top; + *paa = idp->top; RCU_INIT_POINTER(idp->top, NULL); max = idr_max(idp->layers); id = 0; while (id >= 0 && id <= max) { + p = *paa; while (n > IDR_BITS && p) { n -= IDR_BITS; - *paa++ = p; p = p->ary[(id >> n) & IDR_MASK]; + *++paa = p; } bt_mask = id; id += 1 << n; /* Get the highest bit that the above add changed from 0->1. */ while (n < fls(id ^ bt_mask)) { - if (p) - free_layer(idp, p); + if (*paa) + free_layer(idp, *paa); n += IDR_BITS; - p = *--paa; + --paa; } } idp->layers = 0; @@ -692,15 +693,16 @@ int idr_for_each(struct idr *idp, struct idr_layer **paa = &pa[0]; n = idp->layers * IDR_BITS; - p = rcu_dereference_raw(idp->top); + *paa = rcu_dereference_raw(idp->top); max = idr_max(idp->layers); id = 0; while (id >= 0 && id <= max) { + p = *paa; while (n > 0 && p) { n -= IDR_BITS; - *paa++ = p; p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]); + *++paa = p; } if (p) { @@ -712,7 +714,7 @@ int idr_for_each(struct idr *idp, id += 1 << n; while (n < fls(id)) { n += IDR_BITS; - p = *--paa; + --paa; } } @@ -740,17 +742,18 @@ void *idr_get_next(struct idr *idp, int *nextidp) int n, max; /* find first ent */ - p = rcu_dereference_raw(idp->top); + p = *paa = rcu_dereference_raw(idp->top); if (!p) return NULL; n = (p->layer + 1) * IDR_BITS; max = idr_max(p->layer + 1); while (id >= 0 && id <= max) { + p = *paa; while (n > 0 && p) { n -= IDR_BITS; - *paa++ = p; p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]); + *++paa = p; } if (p) { @@ -768,7 +771,7 @@ void *idr_get_next(struct idr *idp, int *nextidp) id = round_up(id + 1, 1 << n); while (n < fls(id)) { n += IDR_BITS; - p = *--paa; + --paa; } } return NULL; -- cgit v0.10.2 From 4aff1ce7add1c432fe5ea3ae0231155f33e5ef38 Mon Sep 17 00:00:00 2001 From: Alexandre Bounine Date: Fri, 8 Aug 2014 14:22:09 -0700 Subject: rapidio: add new RapidIO DMA interface routines Add RapidIO DMA interface routines that directly use reference to the mport device object and/or target device destination ID as parameters. This allows to perform RapidIO DMA transfer requests by modules that do not have an access to the RapidIO device list. Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Andre van Herk Cc: Stef van Os Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c index a54ba04..d7b87c6 100644 --- a/drivers/rapidio/rio.c +++ b/drivers/rapidio/rio.c @@ -1509,30 +1509,39 @@ EXPORT_SYMBOL_GPL(rio_route_clr_table); static bool rio_chan_filter(struct dma_chan *chan, void *arg) { - struct rio_dev *rdev = arg; + struct rio_mport *mport = arg; /* Check that DMA device belongs to the right MPORT */ - return (rdev->net->hport == - container_of(chan->device, struct rio_mport, dma)); + return mport == container_of(chan->device, struct rio_mport, dma); } /** - * rio_request_dma - request RapidIO capable DMA channel that supports - * specified target RapidIO device. - * @rdev: RIO device control structure + * rio_request_mport_dma - request RapidIO capable DMA channel associated + * with specified local RapidIO mport device. + * @mport: RIO mport to perform DMA data transfers * * Returns pointer to allocated DMA channel or NULL if failed. */ -struct dma_chan *rio_request_dma(struct rio_dev *rdev) +struct dma_chan *rio_request_mport_dma(struct rio_mport *mport) { dma_cap_mask_t mask; - struct dma_chan *dchan; dma_cap_zero(mask); dma_cap_set(DMA_SLAVE, mask); - dchan = dma_request_channel(mask, rio_chan_filter, rdev); + return dma_request_channel(mask, rio_chan_filter, mport); +} +EXPORT_SYMBOL_GPL(rio_request_mport_dma); - return dchan; +/** + * rio_request_dma - request RapidIO capable DMA channel that supports + * specified target RapidIO device. + * @rdev: RIO device associated with DMA transfer + * + * Returns pointer to allocated DMA channel or NULL if failed. + */ +struct dma_chan *rio_request_dma(struct rio_dev *rdev) +{ + return rio_request_mport_dma(rdev->net->hport); } EXPORT_SYMBOL_GPL(rio_request_dma); @@ -1547,10 +1556,10 @@ void rio_release_dma(struct dma_chan *dchan) EXPORT_SYMBOL_GPL(rio_release_dma); /** - * rio_dma_prep_slave_sg - RapidIO specific wrapper + * rio_dma_prep_xfer - RapidIO specific wrapper * for device_prep_slave_sg callback defined by DMAENGINE. - * @rdev: RIO device control structure * @dchan: DMA channel to configure + * @destid: target RapidIO device destination ID * @data: RIO specific data descriptor * @direction: DMA data transfer direction (TO or FROM the device) * @flags: dmaengine defined flags @@ -1560,11 +1569,10 @@ EXPORT_SYMBOL_GPL(rio_release_dma); * target RIO device. * Returns pointer to DMA transaction descriptor or NULL if failed. */ -struct dma_async_tx_descriptor *rio_dma_prep_slave_sg(struct rio_dev *rdev, - struct dma_chan *dchan, struct rio_dma_data *data, +struct dma_async_tx_descriptor *rio_dma_prep_xfer(struct dma_chan *dchan, + u16 destid, struct rio_dma_data *data, enum dma_transfer_direction direction, unsigned long flags) { - struct dma_async_tx_descriptor *txd = NULL; struct rio_dma_ext rio_ext; if (dchan->device->device_prep_slave_sg == NULL) { @@ -1572,15 +1580,35 @@ struct dma_async_tx_descriptor *rio_dma_prep_slave_sg(struct rio_dev *rdev, return NULL; } - rio_ext.destid = rdev->destid; + rio_ext.destid = destid; rio_ext.rio_addr_u = data->rio_addr_u; rio_ext.rio_addr = data->rio_addr; rio_ext.wr_type = data->wr_type; - txd = dmaengine_prep_rio_sg(dchan, data->sg, data->sg_len, - direction, flags, &rio_ext); + return dmaengine_prep_rio_sg(dchan, data->sg, data->sg_len, + direction, flags, &rio_ext); +} +EXPORT_SYMBOL_GPL(rio_dma_prep_xfer); - return txd; +/** + * rio_dma_prep_slave_sg - RapidIO specific wrapper + * for device_prep_slave_sg callback defined by DMAENGINE. + * @rdev: RIO device control structure + * @dchan: DMA channel to configure + * @data: RIO specific data descriptor + * @direction: DMA data transfer direction (TO or FROM the device) + * @flags: dmaengine defined flags + * + * Initializes RapidIO capable DMA channel for the specified data transfer. + * Uses DMA channel private extension to pass information related to remote + * target RIO device. + * Returns pointer to DMA transaction descriptor or NULL if failed. + */ +struct dma_async_tx_descriptor *rio_dma_prep_slave_sg(struct rio_dev *rdev, + struct dma_chan *dchan, struct rio_dma_data *data, + enum dma_transfer_direction direction, unsigned long flags) +{ + return rio_dma_prep_xfer(dchan, rdev->destid, data, direction, flags); } EXPORT_SYMBOL_GPL(rio_dma_prep_slave_sg); diff --git a/include/linux/rio_drv.h b/include/linux/rio_drv.h index 5059994..9fc2f21 100644 --- a/include/linux/rio_drv.h +++ b/include/linux/rio_drv.h @@ -384,11 +384,16 @@ void rio_dev_put(struct rio_dev *); #ifdef CONFIG_RAPIDIO_DMA_ENGINE extern struct dma_chan *rio_request_dma(struct rio_dev *rdev); +extern struct dma_chan *rio_request_mport_dma(struct rio_mport *mport); extern void rio_release_dma(struct dma_chan *dchan); extern struct dma_async_tx_descriptor *rio_dma_prep_slave_sg( struct rio_dev *rdev, struct dma_chan *dchan, struct rio_dma_data *data, enum dma_transfer_direction direction, unsigned long flags); +extern struct dma_async_tx_descriptor *rio_dma_prep_xfer( + struct dma_chan *dchan, u16 destid, + struct rio_dma_data *data, + enum dma_transfer_direction direction, unsigned long flags); #endif /** -- cgit v0.10.2 From 50835e977b69c3278bd5e4264737138346df133f Mon Sep 17 00:00:00 2001 From: Alexandre Bounine Date: Fri, 8 Aug 2014 14:22:12 -0700 Subject: rapidio/tsi721_dma: rework scatter-gather list handling Rework Tsi721 RapidIO DMA engine support to allow handling data scatter/gather lists longer than number of hardware buffer descriptors in the DMA channel's descriptor list. The current implementation of Tsi721 DMA transfers requires that number of entries in a scatter/gather list provided by a caller of dmaengine_prep_rio_sg() should not exceed number of allocated hardware buffer descriptors. This patch removes the limitation by processing long scatter/gather lists by sections that can be transferred using hardware descriptor ring of configured size. It also introduces a module parameter "dma_desc_per_channel" to allow run-time configuration of Tsi721 hardware buffer descriptor rings. Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Andre van Herk Cc: Stef van Os Cc: Vinod Koul Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/rapidio/tsi721.txt b/Documentation/rapidio/tsi721.txt index 335f3c6..626052f 100644 --- a/Documentation/rapidio/tsi721.txt +++ b/Documentation/rapidio/tsi721.txt @@ -20,13 +20,26 @@ II. Known problems None. -III. To do +III. DMA Engine Support - Add DMA data transfers (non-messaging). - Add inbound region (SRIO-to-PCIe) mapping. +Tsi721 mport driver supports DMA data transfers between local system memory and +remote RapidIO devices. This functionality is implemented according to SLAVE +mode API defined by common Linux kernel DMA Engine framework. + +Depending on system requirements RapidIO DMA operations can be included/excluded +by setting CONFIG_RAPIDIO_DMA_ENGINE option. Tsi721 miniport driver uses seven +out of eight available BDMA channels to support DMA data transfers. +One BDMA channel is reserved for generation of maintenance read/write requests. + +If Tsi721 mport driver have been built with RAPIDIO_DMA_ENGINE support included, +this driver will accept DMA-specific module parameter: + "dma_desc_per_channel" - defines number of hardware buffer descriptors used by + each BDMA channel of Tsi721 (by default - 128). IV. Version History + 1.1.0 - DMA operations re-worked to support data scatter/gather lists larger + than hardware buffer descriptors ring. 1.0.0 - Initial driver release. V. License diff --git a/drivers/rapidio/devices/tsi721.h b/drivers/rapidio/devices/tsi721.h index 0305675..a7b4268 100644 --- a/drivers/rapidio/devices/tsi721.h +++ b/drivers/rapidio/devices/tsi721.h @@ -644,27 +644,26 @@ enum tsi721_smsg_int_flag { #ifdef CONFIG_RAPIDIO_DMA_ENGINE -#define TSI721_BDMA_BD_RING_SZ 128 #define TSI721_BDMA_MAX_BCOUNT (TSI721_DMAD_BCOUNT1 + 1) struct tsi721_tx_desc { struct dma_async_tx_descriptor txd; - struct tsi721_dma_desc *hw_desc; u16 destid; /* low 64-bits of 66-bit RIO address */ u64 rio_addr; /* upper 2-bits of 66-bit RIO address */ u8 rio_addr_u; - u32 bcount; - bool interrupt; + enum dma_rtype rtype; struct list_head desc_node; - struct list_head tx_list; + struct scatterlist *sg; + unsigned int sg_len; + enum dma_status status; }; struct tsi721_bdma_chan { int id; void __iomem *regs; - int bd_num; /* number of buffer descriptors */ + int bd_num; /* number of HW buffer descriptors */ void *bd_base; /* start of DMA descriptors */ dma_addr_t bd_phys; void *sts_base; /* start of DMA BD status FIFO */ @@ -680,7 +679,6 @@ struct tsi721_bdma_chan { struct list_head active_list; struct list_head queue; struct list_head free_list; - dma_cookie_t completed_cookie; struct tasklet_struct tasklet; bool active; }; diff --git a/drivers/rapidio/devices/tsi721_dma.c b/drivers/rapidio/devices/tsi721_dma.c index 44341dc..f64c5de 100644 --- a/drivers/rapidio/devices/tsi721_dma.c +++ b/drivers/rapidio/devices/tsi721_dma.c @@ -1,7 +1,7 @@ /* * DMA Engine support for Tsi721 PCIExpress-to-SRIO bridge * - * Copyright 2011 Integrated Device Technology, Inc. + * Copyright (c) 2011-2014 Integrated Device Technology, Inc. * Alexandre Bounine * * This program is free software; you can redistribute it and/or modify it @@ -14,9 +14,8 @@ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for * more details. * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 - * Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * The full GNU General Public License is included in this distribution in the + * file called COPYING. */ #include @@ -32,9 +31,22 @@ #include #include #include +#include "../../dma/dmaengine.h" #include "tsi721.h" +#define TSI721_DMA_TX_QUEUE_SZ 16 /* number of transaction descriptors */ + +#ifdef CONFIG_PCI_MSI +static irqreturn_t tsi721_bdma_msix(int irq, void *ptr); +#endif +static int tsi721_submit_sg(struct tsi721_tx_desc *desc); + +static unsigned int dma_desc_per_channel = 128; +module_param(dma_desc_per_channel, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(dma_desc_per_channel, + "Number of DMA descriptors per channel (default: 128)"); + static inline struct tsi721_bdma_chan *to_tsi721_chan(struct dma_chan *chan) { return container_of(chan, struct tsi721_bdma_chan, dchan); @@ -59,7 +71,7 @@ struct tsi721_tx_desc *tsi721_dma_first_active( struct tsi721_tx_desc, desc_node); } -static int tsi721_bdma_ch_init(struct tsi721_bdma_chan *bdma_chan) +static int tsi721_bdma_ch_init(struct tsi721_bdma_chan *bdma_chan, int bd_num) { struct tsi721_dma_desc *bd_ptr; struct device *dev = bdma_chan->dchan.device->dev; @@ -67,17 +79,23 @@ static int tsi721_bdma_ch_init(struct tsi721_bdma_chan *bdma_chan) dma_addr_t bd_phys; dma_addr_t sts_phys; int sts_size; - int bd_num = bdma_chan->bd_num; +#ifdef CONFIG_PCI_MSI + struct tsi721_device *priv = to_tsi721(bdma_chan->dchan.device); +#endif dev_dbg(dev, "Init Block DMA Engine, CH%d\n", bdma_chan->id); - /* Allocate space for DMA descriptors */ + /* + * Allocate space for DMA descriptors + * (add an extra element for link descriptor) + */ bd_ptr = dma_zalloc_coherent(dev, - bd_num * sizeof(struct tsi721_dma_desc), + (bd_num + 1) * sizeof(struct tsi721_dma_desc), &bd_phys, GFP_KERNEL); if (!bd_ptr) return -ENOMEM; + bdma_chan->bd_num = bd_num; bdma_chan->bd_phys = bd_phys; bdma_chan->bd_base = bd_ptr; @@ -85,8 +103,8 @@ static int tsi721_bdma_ch_init(struct tsi721_bdma_chan *bdma_chan) bd_ptr, (unsigned long long)bd_phys); /* Allocate space for descriptor status FIFO */ - sts_size = (bd_num >= TSI721_DMA_MINSTSSZ) ? - bd_num : TSI721_DMA_MINSTSSZ; + sts_size = ((bd_num + 1) >= TSI721_DMA_MINSTSSZ) ? + (bd_num + 1) : TSI721_DMA_MINSTSSZ; sts_size = roundup_pow_of_two(sts_size); sts_ptr = dma_zalloc_coherent(dev, sts_size * sizeof(struct tsi721_dma_sts), @@ -94,7 +112,7 @@ static int tsi721_bdma_ch_init(struct tsi721_bdma_chan *bdma_chan) if (!sts_ptr) { /* Free space allocated for DMA descriptors */ dma_free_coherent(dev, - bd_num * sizeof(struct tsi721_dma_desc), + (bd_num + 1) * sizeof(struct tsi721_dma_desc), bd_ptr, bd_phys); bdma_chan->bd_base = NULL; return -ENOMEM; @@ -108,11 +126,11 @@ static int tsi721_bdma_ch_init(struct tsi721_bdma_chan *bdma_chan) "desc status FIFO @ %p (phys = %llx) size=0x%x\n", sts_ptr, (unsigned long long)sts_phys, sts_size); - /* Initialize DMA descriptors ring */ - bd_ptr[bd_num - 1].type_id = cpu_to_le32(DTYPE3 << 29); - bd_ptr[bd_num - 1].next_lo = cpu_to_le32((u64)bd_phys & + /* Initialize DMA descriptors ring using added link descriptor */ + bd_ptr[bd_num].type_id = cpu_to_le32(DTYPE3 << 29); + bd_ptr[bd_num].next_lo = cpu_to_le32((u64)bd_phys & TSI721_DMAC_DPTRL_MASK); - bd_ptr[bd_num - 1].next_hi = cpu_to_le32((u64)bd_phys >> 32); + bd_ptr[bd_num].next_hi = cpu_to_le32((u64)bd_phys >> 32); /* Setup DMA descriptor pointers */ iowrite32(((u64)bd_phys >> 32), @@ -134,6 +152,55 @@ static int tsi721_bdma_ch_init(struct tsi721_bdma_chan *bdma_chan) ioread32(bdma_chan->regs + TSI721_DMAC_INT); +#ifdef CONFIG_PCI_MSI + /* Request interrupt service if we are in MSI-X mode */ + if (priv->flags & TSI721_USING_MSIX) { + int rc, idx; + + idx = TSI721_VECT_DMA0_DONE + bdma_chan->id; + + rc = request_irq(priv->msix[idx].vector, tsi721_bdma_msix, 0, + priv->msix[idx].irq_name, (void *)bdma_chan); + + if (rc) { + dev_dbg(dev, "Unable to get MSI-X for BDMA%d-DONE\n", + bdma_chan->id); + goto err_out; + } + + idx = TSI721_VECT_DMA0_INT + bdma_chan->id; + + rc = request_irq(priv->msix[idx].vector, tsi721_bdma_msix, 0, + priv->msix[idx].irq_name, (void *)bdma_chan); + + if (rc) { + dev_dbg(dev, "Unable to get MSI-X for BDMA%d-INT\n", + bdma_chan->id); + free_irq( + priv->msix[TSI721_VECT_DMA0_DONE + + bdma_chan->id].vector, + (void *)bdma_chan); + } + +err_out: + if (rc) { + /* Free space allocated for DMA descriptors */ + dma_free_coherent(dev, + (bd_num + 1) * sizeof(struct tsi721_dma_desc), + bd_ptr, bd_phys); + bdma_chan->bd_base = NULL; + + /* Free space allocated for status descriptors */ + dma_free_coherent(dev, + sts_size * sizeof(struct tsi721_dma_sts), + sts_ptr, sts_phys); + bdma_chan->sts_base = NULL; + + return -EIO; + } + } +#endif /* CONFIG_PCI_MSI */ + /* Toggle DMA channel initialization */ iowrite32(TSI721_DMAC_CTL_INIT, bdma_chan->regs + TSI721_DMAC_CTL); ioread32(bdma_chan->regs + TSI721_DMAC_CTL); @@ -147,6 +214,9 @@ static int tsi721_bdma_ch_init(struct tsi721_bdma_chan *bdma_chan) static int tsi721_bdma_ch_free(struct tsi721_bdma_chan *bdma_chan) { u32 ch_stat; +#ifdef CONFIG_PCI_MSI + struct tsi721_device *priv = to_tsi721(bdma_chan->dchan.device); +#endif if (bdma_chan->bd_base == NULL) return 0; @@ -159,9 +229,18 @@ static int tsi721_bdma_ch_free(struct tsi721_bdma_chan *bdma_chan) /* Put DMA channel into init state */ iowrite32(TSI721_DMAC_CTL_INIT, bdma_chan->regs + TSI721_DMAC_CTL); +#ifdef CONFIG_PCI_MSI + if (priv->flags & TSI721_USING_MSIX) { + free_irq(priv->msix[TSI721_VECT_DMA0_DONE + + bdma_chan->id].vector, (void *)bdma_chan); + free_irq(priv->msix[TSI721_VECT_DMA0_INT + + bdma_chan->id].vector, (void *)bdma_chan); + } +#endif /* CONFIG_PCI_MSI */ + /* Free space allocated for DMA descriptors */ dma_free_coherent(bdma_chan->dchan.device->dev, - bdma_chan->bd_num * sizeof(struct tsi721_dma_desc), + (bdma_chan->bd_num + 1) * sizeof(struct tsi721_dma_desc), bdma_chan->bd_base, bdma_chan->bd_phys); bdma_chan->bd_base = NULL; @@ -243,8 +322,8 @@ static void tsi721_start_dma(struct tsi721_bdma_chan *bdma_chan) } dev_dbg(bdma_chan->dchan.device->dev, - "tx_chan: %p, chan: %d, regs: %p\n", - bdma_chan, bdma_chan->dchan.chan_id, bdma_chan->regs); + "%s: chan_%d (wrc=%d)\n", __func__, bdma_chan->id, + bdma_chan->wr_count_next); iowrite32(bdma_chan->wr_count_next, bdma_chan->regs + TSI721_DMAC_DWRCNT); @@ -253,72 +332,19 @@ static void tsi721_start_dma(struct tsi721_bdma_chan *bdma_chan) bdma_chan->wr_count = bdma_chan->wr_count_next; } -static void tsi721_desc_put(struct tsi721_bdma_chan *bdma_chan, - struct tsi721_tx_desc *desc) -{ - dev_dbg(bdma_chan->dchan.device->dev, - "Put desc: %p into free list\n", desc); - - if (desc) { - spin_lock_bh(&bdma_chan->lock); - list_splice_init(&desc->tx_list, &bdma_chan->free_list); - list_add(&desc->desc_node, &bdma_chan->free_list); - bdma_chan->wr_count_next = bdma_chan->wr_count; - spin_unlock_bh(&bdma_chan->lock); - } -} - -static -struct tsi721_tx_desc *tsi721_desc_get(struct tsi721_bdma_chan *bdma_chan) -{ - struct tsi721_tx_desc *tx_desc, *_tx_desc; - struct tsi721_tx_desc *ret = NULL; - int i; - - spin_lock_bh(&bdma_chan->lock); - list_for_each_entry_safe(tx_desc, _tx_desc, - &bdma_chan->free_list, desc_node) { - if (async_tx_test_ack(&tx_desc->txd)) { - list_del(&tx_desc->desc_node); - ret = tx_desc; - break; - } - dev_dbg(bdma_chan->dchan.device->dev, - "desc %p not ACKed\n", tx_desc); - } - - if (ret == NULL) { - dev_dbg(bdma_chan->dchan.device->dev, - "%s: unable to obtain tx descriptor\n", __func__); - goto err_out; - } - - i = bdma_chan->wr_count_next % bdma_chan->bd_num; - if (i == bdma_chan->bd_num - 1) { - i = 0; - bdma_chan->wr_count_next++; /* skip link descriptor */ - } - - bdma_chan->wr_count_next++; - tx_desc->txd.phys = bdma_chan->bd_phys + - i * sizeof(struct tsi721_dma_desc); - tx_desc->hw_desc = &((struct tsi721_dma_desc *)bdma_chan->bd_base)[i]; -err_out: - spin_unlock_bh(&bdma_chan->lock); - - return ret; -} - static int -tsi721_desc_fill_init(struct tsi721_tx_desc *desc, struct scatterlist *sg, - enum dma_rtype rtype, u32 sys_size) +tsi721_desc_fill_init(struct tsi721_tx_desc *desc, + struct tsi721_dma_desc *bd_ptr, + struct scatterlist *sg, u32 sys_size) { - struct tsi721_dma_desc *bd_ptr = desc->hw_desc; u64 rio_addr; + if (bd_ptr == NULL) + return -EINVAL; + /* Initialize DMA descriptor */ bd_ptr->type_id = cpu_to_le32((DTYPE1 << 29) | - (rtype << 19) | desc->destid); + (desc->rtype << 19) | desc->destid); bd_ptr->bcount = cpu_to_le32(((desc->rio_addr & 0x3) << 30) | (sys_size << 26)); rio_addr = (desc->rio_addr >> 2) | @@ -335,51 +361,32 @@ tsi721_desc_fill_init(struct tsi721_tx_desc *desc, struct scatterlist *sg, } static int -tsi721_desc_fill_end(struct tsi721_tx_desc *desc) +tsi721_desc_fill_end(struct tsi721_dma_desc *bd_ptr, u32 bcount, bool interrupt) { - struct tsi721_dma_desc *bd_ptr = desc->hw_desc; + if (bd_ptr == NULL) + return -EINVAL; /* Update DMA descriptor */ - if (desc->interrupt) + if (interrupt) bd_ptr->type_id |= cpu_to_le32(TSI721_DMAD_IOF); - bd_ptr->bcount |= cpu_to_le32(desc->bcount & TSI721_DMAD_BCOUNT1); + bd_ptr->bcount |= cpu_to_le32(bcount & TSI721_DMAD_BCOUNT1); return 0; } - -static void tsi721_dma_chain_complete(struct tsi721_bdma_chan *bdma_chan, - struct tsi721_tx_desc *desc) +static void tsi721_dma_tx_err(struct tsi721_bdma_chan *bdma_chan, + struct tsi721_tx_desc *desc) { struct dma_async_tx_descriptor *txd = &desc->txd; dma_async_tx_callback callback = txd->callback; void *param = txd->callback_param; - list_splice_init(&desc->tx_list, &bdma_chan->free_list); list_move(&desc->desc_node, &bdma_chan->free_list); - bdma_chan->completed_cookie = txd->cookie; if (callback) callback(param); } -static void tsi721_dma_complete_all(struct tsi721_bdma_chan *bdma_chan) -{ - struct tsi721_tx_desc *desc, *_d; - LIST_HEAD(list); - - BUG_ON(!tsi721_dma_is_idle(bdma_chan)); - - if (!list_empty(&bdma_chan->queue)) - tsi721_start_dma(bdma_chan); - - list_splice_init(&bdma_chan->active_list, &list); - list_splice_init(&bdma_chan->queue, &bdma_chan->active_list); - - list_for_each_entry_safe(desc, _d, &list, desc_node) - tsi721_dma_chain_complete(bdma_chan, desc); -} - static void tsi721_clr_stat(struct tsi721_bdma_chan *bdma_chan) { u32 srd_ptr; @@ -403,20 +410,159 @@ static void tsi721_clr_stat(struct tsi721_bdma_chan *bdma_chan) bdma_chan->sts_rdptr = srd_ptr; } +/* Must be called with the channel spinlock held */ +static int tsi721_submit_sg(struct tsi721_tx_desc *desc) +{ + struct dma_chan *dchan = desc->txd.chan; + struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(dchan); + u32 sys_size; + u64 rio_addr; + dma_addr_t next_addr; + u32 bcount; + struct scatterlist *sg; + unsigned int i; + int err = 0; + struct tsi721_dma_desc *bd_ptr = NULL; + u32 idx, rd_idx; + u32 add_count = 0; + + if (!tsi721_dma_is_idle(bdma_chan)) { + dev_err(bdma_chan->dchan.device->dev, + "BUG: Attempt to use non-idle channel\n"); + return -EIO; + } + + /* + * Fill DMA channel's hardware buffer descriptors. + * (NOTE: RapidIO destination address is limited to 64 bits for now) + */ + rio_addr = desc->rio_addr; + next_addr = -1; + bcount = 0; + sys_size = dma_to_mport(bdma_chan->dchan.device)->sys_size; + + rd_idx = ioread32(bdma_chan->regs + TSI721_DMAC_DRDCNT); + rd_idx %= (bdma_chan->bd_num + 1); + + idx = bdma_chan->wr_count_next % (bdma_chan->bd_num + 1); + if (idx == bdma_chan->bd_num) { + /* wrap around link descriptor */ + idx = 0; + add_count++; + } + + dev_dbg(dchan->device->dev, "%s: BD ring status: rdi=%d wri=%d\n", + __func__, rd_idx, idx); + + for_each_sg(desc->sg, sg, desc->sg_len, i) { + + dev_dbg(dchan->device->dev, "sg%d/%d addr: 0x%llx len: %d\n", + i, desc->sg_len, + (unsigned long long)sg_dma_address(sg), sg_dma_len(sg)); + + if (sg_dma_len(sg) > TSI721_BDMA_MAX_BCOUNT) { + dev_err(dchan->device->dev, + "%s: SG entry %d is too large\n", __func__, i); + err = -EINVAL; + break; + } + + /* + * If this sg entry forms contiguous block with previous one, + * try to merge it into existing DMA descriptor + */ + if (next_addr == sg_dma_address(sg) && + bcount + sg_dma_len(sg) <= TSI721_BDMA_MAX_BCOUNT) { + /* Adjust byte count of the descriptor */ + bcount += sg_dma_len(sg); + goto entry_done; + } else if (next_addr != -1) { + /* Finalize descriptor using total byte count value */ + tsi721_desc_fill_end(bd_ptr, bcount, 0); + dev_dbg(dchan->device->dev, + "%s: prev desc final len: %d\n", + __func__, bcount); + } + + desc->rio_addr = rio_addr; + + if (i && idx == rd_idx) { + dev_dbg(dchan->device->dev, + "%s: HW descriptor ring is full @ %d\n", + __func__, i); + desc->sg = sg; + desc->sg_len -= i; + break; + } + + bd_ptr = &((struct tsi721_dma_desc *)bdma_chan->bd_base)[idx]; + err = tsi721_desc_fill_init(desc, bd_ptr, sg, sys_size); + if (err) { + dev_err(dchan->device->dev, + "Failed to build desc: err=%d\n", err); + break; + } + + dev_dbg(dchan->device->dev, "bd_ptr = %p did=%d raddr=0x%llx\n", + bd_ptr, desc->destid, desc->rio_addr); + + next_addr = sg_dma_address(sg); + bcount = sg_dma_len(sg); + + add_count++; + if (++idx == bdma_chan->bd_num) { + /* wrap around link descriptor */ + idx = 0; + add_count++; + } + +entry_done: + if (sg_is_last(sg)) { + tsi721_desc_fill_end(bd_ptr, bcount, 0); + dev_dbg(dchan->device->dev, "%s: last desc final len: %d\n", + __func__, bcount); + desc->sg_len = 0; + } else { + rio_addr += sg_dma_len(sg); + next_addr += sg_dma_len(sg); + } + } + + if (!err) + bdma_chan->wr_count_next += add_count; + + return err; +} + static void tsi721_advance_work(struct tsi721_bdma_chan *bdma_chan) { - if (list_empty(&bdma_chan->active_list) || - list_is_singular(&bdma_chan->active_list)) { - dev_dbg(bdma_chan->dchan.device->dev, - "%s: Active_list empty\n", __func__); - tsi721_dma_complete_all(bdma_chan); - } else { - dev_dbg(bdma_chan->dchan.device->dev, - "%s: Active_list NOT empty\n", __func__); - tsi721_dma_chain_complete(bdma_chan, - tsi721_dma_first_active(bdma_chan)); - tsi721_start_dma(bdma_chan); + struct tsi721_tx_desc *desc; + int err; + + dev_dbg(bdma_chan->dchan.device->dev, "%s: Enter\n", __func__); + + /* + * If there are any new transactions in the queue add them + * into the processing list + */ + if (!list_empty(&bdma_chan->queue)) + list_splice_init(&bdma_chan->queue, &bdma_chan->active_list); + + /* Start new transaction (if available) */ + if (!list_empty(&bdma_chan->active_list)) { + desc = tsi721_dma_first_active(bdma_chan); + err = tsi721_submit_sg(desc); + if (!err) + tsi721_start_dma(bdma_chan); + else { + tsi721_dma_tx_err(bdma_chan, desc); + dev_dbg(bdma_chan->dchan.device->dev, + "ERR: tsi721_submit_sg failed with err=%d\n", + err); + } } + + dev_dbg(bdma_chan->dchan.device->dev, "%s: Exit\n", __func__); } static void tsi721_dma_tasklet(unsigned long data) @@ -444,8 +590,29 @@ static void tsi721_dma_tasklet(unsigned long data) } if (dmac_int & (TSI721_DMAC_INT_DONE | TSI721_DMAC_INT_IOFDONE)) { + struct tsi721_tx_desc *desc; + tsi721_clr_stat(bdma_chan); spin_lock(&bdma_chan->lock); + desc = tsi721_dma_first_active(bdma_chan); + + if (desc->sg_len == 0) { + dma_async_tx_callback callback = NULL; + void *param = NULL; + + desc->status = DMA_COMPLETE; + dma_cookie_complete(&desc->txd); + if (desc->txd.flags & DMA_PREP_INTERRUPT) { + callback = desc->txd.callback; + param = desc->txd.callback_param; + } + list_move(&desc->desc_node, &bdma_chan->free_list); + spin_unlock(&bdma_chan->lock); + if (callback) + callback(param); + spin_lock(&bdma_chan->lock); + } + tsi721_advance_work(bdma_chan); spin_unlock(&bdma_chan->lock); } @@ -460,21 +627,24 @@ static dma_cookie_t tsi721_tx_submit(struct dma_async_tx_descriptor *txd) struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(txd->chan); dma_cookie_t cookie; - spin_lock_bh(&bdma_chan->lock); + /* Check if the descriptor is detached from any lists */ + if (!list_empty(&desc->desc_node)) { + dev_err(bdma_chan->dchan.device->dev, + "%s: wrong state of descriptor %p\n", __func__, txd); + return -EIO; + } - cookie = txd->chan->cookie; - if (++cookie < 0) - cookie = 1; - txd->chan->cookie = cookie; - txd->cookie = cookie; + spin_lock_bh(&bdma_chan->lock); - if (list_empty(&bdma_chan->active_list)) { - list_add_tail(&desc->desc_node, &bdma_chan->active_list); - tsi721_start_dma(bdma_chan); - } else { - list_add_tail(&desc->desc_node, &bdma_chan->queue); + if (!bdma_chan->active) { + spin_unlock_bh(&bdma_chan->lock); + return -ENODEV; } + cookie = dma_cookie_assign(txd); + desc->status = DMA_IN_PROGRESS; + list_add_tail(&desc->desc_node, &bdma_chan->queue); + spin_unlock_bh(&bdma_chan->lock); return cookie; } @@ -482,115 +652,52 @@ static dma_cookie_t tsi721_tx_submit(struct dma_async_tx_descriptor *txd) static int tsi721_alloc_chan_resources(struct dma_chan *dchan) { struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(dchan); -#ifdef CONFIG_PCI_MSI - struct tsi721_device *priv = to_tsi721(dchan->device); -#endif struct tsi721_tx_desc *desc = NULL; - LIST_HEAD(tmp_list); int i; - int rc; + + dev_dbg(dchan->device->dev, "%s: for channel %d\n", + __func__, bdma_chan->id); if (bdma_chan->bd_base) - return bdma_chan->bd_num - 1; + return TSI721_DMA_TX_QUEUE_SZ; /* Initialize BDMA channel */ - if (tsi721_bdma_ch_init(bdma_chan)) { + if (tsi721_bdma_ch_init(bdma_chan, dma_desc_per_channel)) { dev_err(dchan->device->dev, "Unable to initialize data DMA" " channel %d, aborting\n", bdma_chan->id); - return -ENOMEM; + return -ENODEV; } - /* Alocate matching number of logical descriptors */ - desc = kcalloc((bdma_chan->bd_num - 1), sizeof(struct tsi721_tx_desc), + /* Allocate queue of transaction descriptors */ + desc = kcalloc(TSI721_DMA_TX_QUEUE_SZ, sizeof(struct tsi721_tx_desc), GFP_KERNEL); if (!desc) { dev_err(dchan->device->dev, "Failed to allocate logical descriptors\n"); - rc = -ENOMEM; - goto err_out; + tsi721_bdma_ch_free(bdma_chan); + return -ENOMEM; } bdma_chan->tx_desc = desc; - for (i = 0; i < bdma_chan->bd_num - 1; i++) { + for (i = 0; i < TSI721_DMA_TX_QUEUE_SZ; i++) { dma_async_tx_descriptor_init(&desc[i].txd, dchan); desc[i].txd.tx_submit = tsi721_tx_submit; desc[i].txd.flags = DMA_CTRL_ACK; - INIT_LIST_HEAD(&desc[i].tx_list); - list_add_tail(&desc[i].desc_node, &tmp_list); + list_add(&desc[i].desc_node, &bdma_chan->free_list); } - spin_lock_bh(&bdma_chan->lock); - list_splice(&tmp_list, &bdma_chan->free_list); - bdma_chan->completed_cookie = dchan->cookie = 1; - spin_unlock_bh(&bdma_chan->lock); - -#ifdef CONFIG_PCI_MSI - if (priv->flags & TSI721_USING_MSIX) { - /* Request interrupt service if we are in MSI-X mode */ - rc = request_irq( - priv->msix[TSI721_VECT_DMA0_DONE + - bdma_chan->id].vector, - tsi721_bdma_msix, 0, - priv->msix[TSI721_VECT_DMA0_DONE + - bdma_chan->id].irq_name, - (void *)bdma_chan); - - if (rc) { - dev_dbg(dchan->device->dev, - "Unable to allocate MSI-X interrupt for " - "BDMA%d-DONE\n", bdma_chan->id); - goto err_out; - } - - rc = request_irq(priv->msix[TSI721_VECT_DMA0_INT + - bdma_chan->id].vector, - tsi721_bdma_msix, 0, - priv->msix[TSI721_VECT_DMA0_INT + - bdma_chan->id].irq_name, - (void *)bdma_chan); - - if (rc) { - dev_dbg(dchan->device->dev, - "Unable to allocate MSI-X interrupt for " - "BDMA%d-INT\n", bdma_chan->id); - free_irq( - priv->msix[TSI721_VECT_DMA0_DONE + - bdma_chan->id].vector, - (void *)bdma_chan); - rc = -EIO; - goto err_out; - } - } -#endif /* CONFIG_PCI_MSI */ + dma_cookie_init(dchan); bdma_chan->active = true; tsi721_bdma_interrupt_enable(bdma_chan, 1); - return bdma_chan->bd_num - 1; - -err_out: - kfree(desc); - tsi721_bdma_ch_free(bdma_chan); - return rc; + return TSI721_DMA_TX_QUEUE_SZ; } -static void tsi721_free_chan_resources(struct dma_chan *dchan) +static void tsi721_sync_dma_irq(struct tsi721_bdma_chan *bdma_chan) { - struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(dchan); - struct tsi721_device *priv = to_tsi721(dchan->device); - LIST_HEAD(list); - - dev_dbg(dchan->device->dev, "%s: Entry\n", __func__); - - if (bdma_chan->bd_base == NULL) - return; - - BUG_ON(!list_empty(&bdma_chan->active_list)); - BUG_ON(!list_empty(&bdma_chan->queue)); - - tsi721_bdma_interrupt_enable(bdma_chan, 0); - bdma_chan->active = false; + struct tsi721_device *priv = to_tsi721(bdma_chan->dchan.device); #ifdef CONFIG_PCI_MSI if (priv->flags & TSI721_USING_MSIX) { @@ -601,64 +708,48 @@ static void tsi721_free_chan_resources(struct dma_chan *dchan) } else #endif synchronize_irq(priv->pdev->irq); +} - tasklet_kill(&bdma_chan->tasklet); +static void tsi721_free_chan_resources(struct dma_chan *dchan) +{ + struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(dchan); - spin_lock_bh(&bdma_chan->lock); - list_splice_init(&bdma_chan->free_list, &list); - spin_unlock_bh(&bdma_chan->lock); + dev_dbg(dchan->device->dev, "%s: for channel %d\n", + __func__, bdma_chan->id); -#ifdef CONFIG_PCI_MSI - if (priv->flags & TSI721_USING_MSIX) { - free_irq(priv->msix[TSI721_VECT_DMA0_DONE + - bdma_chan->id].vector, (void *)bdma_chan); - free_irq(priv->msix[TSI721_VECT_DMA0_INT + - bdma_chan->id].vector, (void *)bdma_chan); - } -#endif /* CONFIG_PCI_MSI */ + if (bdma_chan->bd_base == NULL) + return; - tsi721_bdma_ch_free(bdma_chan); + BUG_ON(!list_empty(&bdma_chan->active_list)); + BUG_ON(!list_empty(&bdma_chan->queue)); + + tsi721_bdma_interrupt_enable(bdma_chan, 0); + bdma_chan->active = false; + tsi721_sync_dma_irq(bdma_chan); + tasklet_kill(&bdma_chan->tasklet); + INIT_LIST_HEAD(&bdma_chan->free_list); kfree(bdma_chan->tx_desc); + tsi721_bdma_ch_free(bdma_chan); } static enum dma_status tsi721_tx_status(struct dma_chan *dchan, dma_cookie_t cookie, struct dma_tx_state *txstate) { - struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(dchan); - dma_cookie_t last_used; - dma_cookie_t last_completed; - int ret; - - spin_lock_bh(&bdma_chan->lock); - last_completed = bdma_chan->completed_cookie; - last_used = dchan->cookie; - spin_unlock_bh(&bdma_chan->lock); - - ret = dma_async_is_complete(cookie, last_completed, last_used); - - dma_set_tx_state(txstate, last_completed, last_used, 0); - - dev_dbg(dchan->device->dev, - "%s: exit, ret: %d, last_completed: %d, last_used: %d\n", - __func__, ret, last_completed, last_used); - - return ret; + return dma_cookie_status(dchan, cookie, txstate); } static void tsi721_issue_pending(struct dma_chan *dchan) { struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(dchan); - dev_dbg(dchan->device->dev, "%s: Entry\n", __func__); + dev_dbg(dchan->device->dev, "%s: Enter\n", __func__); - if (tsi721_dma_is_idle(bdma_chan)) { + if (tsi721_dma_is_idle(bdma_chan) && bdma_chan->active) { spin_lock_bh(&bdma_chan->lock); tsi721_advance_work(bdma_chan); spin_unlock_bh(&bdma_chan->lock); - } else - dev_dbg(dchan->device->dev, - "%s: DMA channel still busy\n", __func__); + } } static @@ -668,21 +759,19 @@ struct dma_async_tx_descriptor *tsi721_prep_rio_sg(struct dma_chan *dchan, void *tinfo) { struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(dchan); - struct tsi721_tx_desc *desc = NULL; - struct tsi721_tx_desc *first = NULL; - struct scatterlist *sg; + struct tsi721_tx_desc *desc, *_d; struct rio_dma_ext *rext = tinfo; - u64 rio_addr = rext->rio_addr; /* limited to 64-bit rio_addr for now */ - unsigned int i; - u32 sys_size = dma_to_mport(dchan->device)->sys_size; enum dma_rtype rtype; - dma_addr_t next_addr = -1; + struct dma_async_tx_descriptor *txd = NULL; if (!sgl || !sg_len) { dev_err(dchan->device->dev, "%s: No SG list\n", __func__); return NULL; } + dev_dbg(dchan->device->dev, "%s: %s\n", __func__, + (dir == DMA_DEV_TO_MEM)?"READ":"WRITE"); + if (dir == DMA_DEV_TO_MEM) rtype = NREAD; else if (dir == DMA_MEM_TO_DEV) { @@ -704,97 +793,26 @@ struct dma_async_tx_descriptor *tsi721_prep_rio_sg(struct dma_chan *dchan, return NULL; } - for_each_sg(sgl, sg, sg_len, i) { - int err; - - if (sg_dma_len(sg) > TSI721_BDMA_MAX_BCOUNT) { - dev_err(dchan->device->dev, - "%s: SG entry %d is too large\n", __func__, i); - goto err_desc_put; - } - - /* - * If this sg entry forms contiguous block with previous one, - * try to merge it into existing DMA descriptor - */ - if (desc) { - if (next_addr == sg_dma_address(sg) && - desc->bcount + sg_dma_len(sg) <= - TSI721_BDMA_MAX_BCOUNT) { - /* Adjust byte count of the descriptor */ - desc->bcount += sg_dma_len(sg); - goto entry_done; - } - - /* - * Finalize this descriptor using total - * byte count value. - */ - tsi721_desc_fill_end(desc); - dev_dbg(dchan->device->dev, "%s: desc final len: %d\n", - __func__, desc->bcount); - } - - /* - * Obtain and initialize a new descriptor - */ - desc = tsi721_desc_get(bdma_chan); - if (!desc) { - dev_err(dchan->device->dev, - "%s: Failed to get new descriptor for SG %d\n", - __func__, i); - goto err_desc_put; - } - - desc->destid = rext->destid; - desc->rio_addr = rio_addr; - desc->rio_addr_u = 0; - desc->bcount = sg_dma_len(sg); - - dev_dbg(dchan->device->dev, - "sg%d desc: 0x%llx, addr: 0x%llx len: %d\n", - i, (u64)desc->txd.phys, - (unsigned long long)sg_dma_address(sg), - sg_dma_len(sg)); - - dev_dbg(dchan->device->dev, - "bd_ptr = %p did=%d raddr=0x%llx\n", - desc->hw_desc, desc->destid, desc->rio_addr); - - err = tsi721_desc_fill_init(desc, sg, rtype, sys_size); - if (err) { - dev_err(dchan->device->dev, - "Failed to build desc: %d\n", err); - goto err_desc_put; - } - - next_addr = sg_dma_address(sg); - - if (!first) - first = desc; - else - list_add_tail(&desc->desc_node, &first->tx_list); + spin_lock_bh(&bdma_chan->lock); -entry_done: - if (sg_is_last(sg)) { - desc->interrupt = (flags & DMA_PREP_INTERRUPT) != 0; - tsi721_desc_fill_end(desc); - dev_dbg(dchan->device->dev, "%s: desc final len: %d\n", - __func__, desc->bcount); - } else { - rio_addr += sg_dma_len(sg); - next_addr += sg_dma_len(sg); + list_for_each_entry_safe(desc, _d, &bdma_chan->free_list, desc_node) { + if (async_tx_test_ack(&desc->txd)) { + list_del_init(&desc->desc_node); + desc->destid = rext->destid; + desc->rio_addr = rext->rio_addr; + desc->rio_addr_u = 0; + desc->rtype = rtype; + desc->sg_len = sg_len; + desc->sg = sgl; + txd = &desc->txd; + txd->flags = flags; + break; } } - first->txd.cookie = -EBUSY; - desc->txd.flags = flags; - - return &first->txd; + spin_unlock_bh(&bdma_chan->lock); -err_desc_put: - tsi721_desc_put(bdma_chan, first); - return NULL; + return txd; } static int tsi721_device_control(struct dma_chan *dchan, enum dma_ctrl_cmd cmd, @@ -802,23 +820,34 @@ static int tsi721_device_control(struct dma_chan *dchan, enum dma_ctrl_cmd cmd, { struct tsi721_bdma_chan *bdma_chan = to_tsi721_chan(dchan); struct tsi721_tx_desc *desc, *_d; + u32 dmac_int; LIST_HEAD(list); dev_dbg(dchan->device->dev, "%s: Entry\n", __func__); if (cmd != DMA_TERMINATE_ALL) - return -ENXIO; + return -ENOSYS; spin_lock_bh(&bdma_chan->lock); - /* make sure to stop the transfer */ - iowrite32(TSI721_DMAC_CTL_SUSP, bdma_chan->regs + TSI721_DMAC_CTL); + bdma_chan->active = false; + + if (!tsi721_dma_is_idle(bdma_chan)) { + /* make sure to stop the transfer */ + iowrite32(TSI721_DMAC_CTL_SUSP, + bdma_chan->regs + TSI721_DMAC_CTL); + + /* Wait until DMA channel stops */ + do { + dmac_int = ioread32(bdma_chan->regs + TSI721_DMAC_INT); + } while ((dmac_int & TSI721_DMAC_INT_SUSP) == 0); + } list_splice_init(&bdma_chan->active_list, &list); list_splice_init(&bdma_chan->queue, &list); list_for_each_entry_safe(desc, _d, &list, desc_node) - tsi721_dma_chain_complete(bdma_chan, desc); + tsi721_dma_tx_err(bdma_chan, desc); spin_unlock_bh(&bdma_chan->lock); @@ -828,22 +857,18 @@ static int tsi721_device_control(struct dma_chan *dchan, enum dma_ctrl_cmd cmd, int tsi721_register_dma(struct tsi721_device *priv) { int i; - int nr_channels = TSI721_DMA_MAXCH; + int nr_channels = 0; int err; struct rio_mport *mport = priv->mport; - mport->dma.dev = &priv->pdev->dev; - mport->dma.chancnt = nr_channels; - INIT_LIST_HEAD(&mport->dma.channels); - for (i = 0; i < nr_channels; i++) { + for (i = 0; i < TSI721_DMA_MAXCH; i++) { struct tsi721_bdma_chan *bdma_chan = &priv->bdma[i]; if (i == TSI721_DMACH_MAINT) continue; - bdma_chan->bd_num = TSI721_BDMA_BD_RING_SZ; bdma_chan->regs = priv->regs + TSI721_DMAC_BASE(i); bdma_chan->dchan.device = &mport->dma; @@ -862,12 +887,15 @@ int tsi721_register_dma(struct tsi721_device *priv) (unsigned long)bdma_chan); list_add_tail(&bdma_chan->dchan.device_node, &mport->dma.channels); + nr_channels++; } + mport->dma.chancnt = nr_channels; dma_cap_zero(mport->dma.cap_mask); dma_cap_set(DMA_PRIVATE, mport->dma.cap_mask); dma_cap_set(DMA_SLAVE, mport->dma.cap_mask); + mport->dma.dev = &priv->pdev->dev; mport->dma.device_alloc_chan_resources = tsi721_alloc_chan_resources; mport->dma.device_free_chan_resources = tsi721_free_chan_resources; mport->dma.device_tx_status = tsi721_tx_status; -- cgit v0.10.2 From 1b9c53e849aa65776d4f611d99aa09f856518dad Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Fri, 8 Aug 2014 14:22:14 -0700 Subject: lib/rbtree.c: fix typo in comment of __rb_insert() In case 1, it passes down the BLACK color from G to p and u, and maintains the color of n. By doing so, it maintains the black height of the sub-tree. While in the comment, it marks the color of n to BLACK. This is a typo and not consistents with the code. This patch fixs this typo in comment. Signed-off-by: Wei Yang Acked-by: Michel Lespinasse Cc: Xiao Guangrong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/rbtree.c b/lib/rbtree.c index 65f4eff..c16c81a 100644 --- a/lib/rbtree.c +++ b/lib/rbtree.c @@ -101,7 +101,7 @@ __rb_insert(struct rb_node *node, struct rb_root *root, * / \ / \ * p u --> P U * / / - * n N + * n n * * However, since g's parent might be red, and * 4) does not allow this, we need to recurse -- cgit v0.10.2 From e5eea0981a3840f3f39f43d2d00461c4c24018e7 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:22:16 -0700 Subject: sysctl: remove typedef ctl_table Remove the final user, and the typedef itself. Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 7129046..f92d5dd 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -632,7 +632,7 @@ out: return ret; } -static int scan(struct ctl_table_header *head, ctl_table *table, +static int scan(struct ctl_table_header *head, struct ctl_table *table, unsigned long *pos, struct file *file, struct dir_context *ctx) { diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 14a8ff2..b7361f8 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -34,8 +34,6 @@ struct ctl_table_root; struct ctl_table_header; struct ctl_dir; -typedef struct ctl_table ctl_table; - typedef int proc_handler (struct ctl_table *ctl, int write, void __user *buffer, size_t *lenp, loff_t *ppos); -- cgit v0.10.2 From b134079f1f4691a0e271211aefad757c5380cec7 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:18 -0700 Subject: fs/exofs/ore_raid.c: replace count*size kzalloc by kcalloc kcalloc manages count*sizeof overflow. Signed-off-by: Fabian Frederick Acked-by: Boaz Harrosh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/exofs/ore_raid.c b/fs/exofs/ore_raid.c index 7f20f25..84529b8 100644 --- a/fs/exofs/ore_raid.c +++ b/fs/exofs/ore_raid.c @@ -116,7 +116,7 @@ static int _sp2d_alloc(unsigned pages_in_unit, unsigned group_width, num_a1pa = min_t(unsigned, PAGE_SIZE / sizeof__a1pa, pages_in_unit - i); - __a1pa = kzalloc(num_a1pa * sizeof__a1pa, GFP_KERNEL); + __a1pa = kcalloc(num_a1pa, sizeof__a1pa, GFP_KERNEL); if (unlikely(!__a1pa)) { ORE_DBGMSG("!! Failed to _alloc_1p_arrays=%d\n", num_a1pa); -- cgit v0.10.2 From 834b18b23e1012e6c2987af703490bc60956d211 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:20 -0700 Subject: kernel/gcov/fs.c: remove unnecessary null test before debugfs_remove This fixes checkpatch warning: WARNING: debugfs_remove(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick Cc: Peter Oberparleiter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/gcov/fs.c b/kernel/gcov/fs.c index 15ff01a..edf67c4 100644 --- a/kernel/gcov/fs.c +++ b/kernel/gcov/fs.c @@ -784,8 +784,7 @@ static __init int gcov_fs_init(void) err_remove: pr_err("init failed\n"); - if (root_node.dentry) - debugfs_remove(root_node.dentry); + debugfs_remove(root_node.dentry); return rc; } -- cgit v0.10.2 From e2ffcf5c7ee7eb979b1f2375404668a2f5076c1b Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:22 -0700 Subject: fs/adfs/dir_fplus.c: use ARRAY_SIZE instead of sizeof/sizeof[0] Use kernel.h definition. Signed-off-by: Fabian Frederick Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/adfs/dir_fplus.c b/fs/adfs/dir_fplus.c index d9e3bee..7673c1f 100644 --- a/fs/adfs/dir_fplus.c +++ b/fs/adfs/dir_fplus.c @@ -55,7 +55,7 @@ adfs_fplus_read(struct super_block *sb, unsigned int id, unsigned int sz, struct } size >>= sb->s_blocksize_bits; - if (size > sizeof(dir->bh)/sizeof(dir->bh[0])) { + if (size > ARRAY_SIZE(dir->bh)) { /* this directory is too big for fixed bh set, must allocate */ struct buffer_head **bh_fplus = kzalloc(size * sizeof(struct buffer_head *), -- cgit v0.10.2 From b16214d43d9060722360360f741416660b54c4f5 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:24 -0700 Subject: fs/adfs/dir_fplus.c: replace count*size kzalloc by kcalloc kcalloc manages count*sizeof overflow. Signed-off-by: Fabian Frederick Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/adfs/dir_fplus.c b/fs/adfs/dir_fplus.c index 7673c1f..c52f1ce 100644 --- a/fs/adfs/dir_fplus.c +++ b/fs/adfs/dir_fplus.c @@ -58,7 +58,7 @@ adfs_fplus_read(struct super_block *sb, unsigned int id, unsigned int sz, struct if (size > ARRAY_SIZE(dir->bh)) { /* this directory is too big for fixed bh set, must allocate */ struct buffer_head **bh_fplus = - kzalloc(size * sizeof(struct buffer_head *), + kcalloc(size, sizeof(struct buffer_head *), GFP_KERNEL); if (!bh_fplus) { adfs_error(sb, "not enough memory for" -- cgit v0.10.2 From 19bdd41a57e1418b8661148125e9b6d99f468c1b Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:22:26 -0700 Subject: adfs: add __printf verification, fix format/argument mismatches Might as well do the right thing. Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/adfs/adfs.h b/fs/adfs/adfs.h index c770337..24575d9 100644 --- a/fs/adfs/adfs.h +++ b/fs/adfs/adfs.h @@ -153,6 +153,7 @@ extern int adfs_map_lookup(struct super_block *sb, unsigned int frag_id, unsigne extern unsigned int adfs_map_free(struct super_block *sb); /* Misc */ +__printf(3, 4) void __adfs_error(struct super_block *sb, const char *function, const char *fmt, ...); #define adfs_error(sb, fmt...) __adfs_error(sb, __func__, fmt) diff --git a/fs/adfs/dir.c b/fs/adfs/dir.c index 0d138c0..51c279a 100644 --- a/fs/adfs/dir.c +++ b/fs/adfs/dir.c @@ -138,7 +138,7 @@ adfs_dir_lookup_byname(struct inode *inode, struct qstr *name, struct object_inf goto out; if (ADFS_I(inode)->parent_id != dir.parent_id) { - adfs_error(sb, "parent directory changed under me! (%lx but got %lx)\n", + adfs_error(sb, "parent directory changed under me! (%lx but got %x)\n", ADFS_I(inode)->parent_id, dir.parent_id); ret = -EIO; goto free_out; diff --git a/fs/adfs/dir_fplus.c b/fs/adfs/dir_fplus.c index c52f1ce..f2ba88a 100644 --- a/fs/adfs/dir_fplus.c +++ b/fs/adfs/dir_fplus.c @@ -79,9 +79,8 @@ adfs_fplus_read(struct super_block *sb, unsigned int id, unsigned int sz, struct dir->bh_fplus[blk] = sb_bread(sb, block); if (!dir->bh_fplus[blk]) { - adfs_error(sb, "dir object %X failed read for" - " offset %d, mapped block %X", - id, blk, block); + adfs_error(sb, "dir object %x failed read for offset %d, mapped block %lX", + id, blk, block); goto out; } -- cgit v0.10.2 From 1da85fdff55da539fbe9c270689a7bcedead1cdd Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:29 -0700 Subject: fs/bfs: use bfs prefix for dump_imap All bfs related functions use bfs_ prefix. This patch also moves extern declaration to bfs.h and removes prototype from inode.c This fixes checkpatch warning: WARNING: externs should be avoided in .c files Signed-off-by: Fabian Frederick Cc: "Tigran A. Aivazian" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/bfs/bfs.h b/fs/bfs/bfs.h index f7f87e2..f40006d 100644 --- a/fs/bfs/bfs.h +++ b/fs/bfs/bfs.h @@ -46,6 +46,7 @@ static inline struct bfs_inode_info *BFS_I(struct inode *inode) /* inode.c */ extern struct inode *bfs_iget(struct super_block *sb, unsigned long ino); +extern void bfs_dump_imap(const char *, struct super_block *); /* file.c */ extern const struct inode_operations bfs_file_inops; diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c index a399e6d..08063ae 100644 --- a/fs/bfs/dir.c +++ b/fs/bfs/dir.c @@ -75,8 +75,6 @@ const struct file_operations bfs_dir_operations = { .llseek = generic_file_llseek, }; -extern void dump_imap(const char *, struct super_block *); - static int bfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool excl) { @@ -110,7 +108,7 @@ static int bfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, BFS_I(inode)->i_eblock = 0; insert_inode_hash(inode); mark_inode_dirty(inode); - dump_imap("create", s); + bfs_dump_imap("create", s); err = bfs_add_entry(dir, dentry->d_name.name, dentry->d_name.len, inode->i_ino); diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c index 7041ac3..90bc079 100644 --- a/fs/bfs/inode.c +++ b/fs/bfs/inode.c @@ -30,8 +30,6 @@ MODULE_LICENSE("GPL"); #define dprintf(x...) #endif -void dump_imap(const char *prefix, struct super_block *s); - struct inode *bfs_iget(struct super_block *sb, unsigned long ino) { struct bfs_inode *di; @@ -194,7 +192,7 @@ static void bfs_evict_inode(struct inode *inode) info->si_freeb += bi->i_eblock + 1 - bi->i_sblock; info->si_freei++; clear_bit(ino, info->si_imap); - dump_imap("delete_inode", s); + bfs_dump_imap("delete_inode", s); } /* @@ -297,7 +295,7 @@ static const struct super_operations bfs_sops = { .statfs = bfs_statfs, }; -void dump_imap(const char *prefix, struct super_block *s) +void bfs_dump_imap(const char *prefix, struct super_block *s) { #ifdef DEBUG int i; @@ -443,7 +441,7 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent) } brelse(bh); brelse(sbh); - dump_imap("read_super", s); + bfs_dump_imap("read_super", s); return 0; out3: -- cgit v0.10.2 From 69361eef9056b0babb507798c2135ad1572f0ef7 Mon Sep 17 00:00:00 2001 From: Josh Hunt Date: Fri, 8 Aug 2014 14:22:31 -0700 Subject: panic: add TAINT_SOFTLOCKUP This taint flag will be set if the system has ever entered a softlockup state. Similar to TAINT_WARN it is useful to know whether or not the system has been in a softlockup state when debugging. [akpm@linux-foundation.org: apply the taint before calling panic()] Signed-off-by: Josh Hunt Cc: Jason Baron Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Documentation/oops-tracing.txt b/Documentation/oops-tracing.txt index e315599..beefb9f 100644 --- a/Documentation/oops-tracing.txt +++ b/Documentation/oops-tracing.txt @@ -268,6 +268,8 @@ characters, each representing a particular tainted value. 14: 'E' if an unsigned module has been loaded in a kernel supporting module signature. + 15: 'L' if a soft lockup has previously occurred on the system. + The primary reason for the 'Tainted: ' string is to tell kernel debuggers if this is a clean kernel or if anything unusual has occurred. Tainting is permanent: even if an offending module is diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index c14374e..f79eb96 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -826,6 +826,7 @@ can be ORed together: 4096 - An out-of-tree module has been loaded. 8192 - An unsigned module has been loaded in a kernel supporting module signature. +16384 - A soft lockup has previously occurred on the system. ============================================================== diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 3dc22ab..31ae66f 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -470,6 +470,7 @@ extern enum system_states { #define TAINT_FIRMWARE_WORKAROUND 11 #define TAINT_OOT_MODULE 12 #define TAINT_UNSIGNED_MODULE 13 +#define TAINT_SOFTLOCKUP 14 extern const char hex_asc[]; #define hex_asc_lo(x) hex_asc[((x) & 0x0f)] diff --git a/kernel/panic.c b/kernel/panic.c index 62e16ce..d09dc5c 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -224,6 +224,7 @@ static const struct tnt tnts[] = { { TAINT_FIRMWARE_WORKAROUND, 'I', ' ' }, { TAINT_OOT_MODULE, 'O', ' ' }, { TAINT_UNSIGNED_MODULE, 'E', ' ' }, + { TAINT_SOFTLOCKUP, 'L', ' ' }, }; /** diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 51b29e9..a8d6914 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -368,6 +368,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) smp_mb__after_atomic(); } + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK); if (softlockup_panic) panic("softlockup: hung tasks"); __this_cpu_write(soft_watchdog_warn, true); -- cgit v0.10.2 From 340f365dd9d74d0ca55089481b9c665955db0aaa Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:33 -0700 Subject: drivers/parport/parport_ip32.c: use PTR_ERR_OR_ZERO replace IS_ERR/PTR_ERR Signed-off-by: Fabian Frederick Cc: Wolfram Sang Cc: Linus Walleij Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/parport/parport_ip32.c b/drivers/parport/parport_ip32.c index c864f82..30e981b 100644 --- a/drivers/parport/parport_ip32.c +++ b/drivers/parport/parport_ip32.c @@ -2204,7 +2204,7 @@ static int __init parport_ip32_init(void) { pr_info(PPIP32 "SGI IP32 built-in parallel port driver v0.6\n"); this_port = parport_ip32_probe_port(); - return IS_ERR(this_port) ? PTR_ERR(this_port) : 0; + return PTR_ERR_OR_ZERO(this_port); } /** -- cgit v0.10.2 From b8f52d89c0c6daec8a9f9e00c1c9afb680cbcf5e Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:35 -0700 Subject: fs/pstore/ram_core.c: replace count*size kmalloc by kmalloc_array kmalloc_array manages count*sizeof overflow. Signed-off-by: Fabian Frederick Cc: Anton Vorontsov Cc: Colin Cross Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index 34a1e5a..9d7b9a8 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -394,7 +394,7 @@ static void *persistent_ram_vmap(phys_addr_t start, size_t size) prot = pgprot_noncached(PAGE_KERNEL); - pages = kmalloc(sizeof(struct page *) * page_count, GFP_KERNEL); + pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL); if (!pages) { pr_err("%s: Failed to allocate array for %u pages\n", __func__, page_count); -- cgit v0.10.2 From 998d6688fb399ab313dcd6f24d1196e3aaa8578c Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:41 -0700 Subject: fs/omfs/inode.c: replace count*size kzalloc by kcalloc kcalloc manages count*sizeof overflow. Signed-off-by: Fabian Frederick Acked-by: Bob Copeland Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/omfs/inode.c b/fs/omfs/inode.c index ec58c76..ba88197 100644 --- a/fs/omfs/inode.c +++ b/fs/omfs/inode.c @@ -321,7 +321,7 @@ static int omfs_get_imap(struct super_block *sb) goto out; sbi->s_imap_size = array_size; - sbi->s_imap = kzalloc(array_size * sizeof(unsigned long *), GFP_KERNEL); + sbi->s_imap = kcalloc(array_size, sizeof(unsigned long *), GFP_KERNEL); if (!sbi->s_imap) goto nomem; -- cgit v0.10.2 From 89b3ac63013e64621369f619fe732b629879c671 Mon Sep 17 00:00:00 2001 From: Himangi Saraogi Date: Fri, 8 Aug 2014 14:22:44 -0700 Subject: kfifo: use BUG_ON Use BUG_ON(x) rather than if(x) BUG(); The semantic patch that fixes this problem is as follows: // @@ identifier x; @@ -if (!x) BUG(); +BUG_ON(!x); // Signed-off-by: Himangi Saraogi Acked-by: Julia Lawall Cc: Stefani Seibold Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/kfifo.c b/lib/kfifo.c index d79b9d2..90ba1eb 100644 --- a/lib/kfifo.c +++ b/lib/kfifo.c @@ -561,8 +561,7 @@ EXPORT_SYMBOL(__kfifo_to_user_r); unsigned int __kfifo_dma_in_prepare_r(struct __kfifo *fifo, struct scatterlist *sgl, int nents, unsigned int len, size_t recsize) { - if (!nents) - BUG(); + BUG_ON(!nents); len = __kfifo_max_r(len, recsize); @@ -585,8 +584,7 @@ EXPORT_SYMBOL(__kfifo_dma_in_finish_r); unsigned int __kfifo_dma_out_prepare_r(struct __kfifo *fifo, struct scatterlist *sgl, int nents, unsigned int len, size_t recsize) { - if (!nents) - BUG(); + BUG_ON(!nents); len = __kfifo_max_r(len, recsize); -- cgit v0.10.2 From 8b6aaf65d3b001ec9b5dcba0992b3b68cbf6057f Mon Sep 17 00:00:00 2001 From: Thierry Fauck Date: Fri, 8 Aug 2014 14:22:46 -0700 Subject: tools/testing/selftests/ptrace/peeksiginfo.c: add PAGE_SIZE definition On IBM powerpc where multiple page size value are supported, current ppc64 and ppc64el distro don't define the PAGE_SIZE variable in /usr/include as this is a dynamic value retrieved by the getpagesize() or sysconf() defined in unistd.h. The PAGE_SIZE variable sounds defined when only one value is supported by the kernel. As such, when the PAGE_SIZE definition doesn't exist system should retrieve the dynamic value. Signed-off-by: Thierry Fauck Cc: Andrey Vagin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/tools/testing/selftests/ptrace/peeksiginfo.c b/tools/testing/selftests/ptrace/peeksiginfo.c index d46558b..c34cd8a 100644 --- a/tools/testing/selftests/ptrace/peeksiginfo.c +++ b/tools/testing/selftests/ptrace/peeksiginfo.c @@ -31,6 +31,10 @@ static int sys_ptrace(int request, pid_t pid, void *addr, void *data) #define TEST_SICODE_PRIV -1 #define TEST_SICODE_SHARE -2 +#ifndef PAGE_SIZE +#define PAGE_SIZE sysconf(_SC_PAGESIZE) +#endif + #define err(fmt, ...) \ fprintf(stderr, \ "Error (%s:%d): " fmt, \ -- cgit v0.10.2 From f175ff8100eef0eb4b946c08e78a27bd17ca5896 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:48 -0700 Subject: fs/cramfs: convert printk to pr_foo() Use current logging functions. No level printk converted to pr_err Signed-off-by: Fabian Frederick Cc: "Theodore Ts'o" Cc: Sasha Levin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index ddcfe59..480fcb8 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -277,7 +277,7 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent) /* check for wrong endianness */ if (super.magic == CRAMFS_MAGIC_WEND) { if (!silent) - printk(KERN_ERR "cramfs: wrong endianness\n"); + pr_err("cramfs: wrong endianness\n"); return -EINVAL; } @@ -287,22 +287,22 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent) mutex_unlock(&read_mutex); if (super.magic != CRAMFS_MAGIC) { if (super.magic == CRAMFS_MAGIC_WEND && !silent) - printk(KERN_ERR "cramfs: wrong endianness\n"); + pr_err("cramfs: wrong endianness\n"); else if (!silent) - printk(KERN_ERR "cramfs: wrong magic\n"); + pr_err("cramfs: wrong magic\n"); return -EINVAL; } } /* get feature flags first */ if (super.flags & ~CRAMFS_SUPPORTED_FLAGS) { - printk(KERN_ERR "cramfs: unsupported filesystem features\n"); + pr_err("cramfs: unsupported filesystem features\n"); return -EINVAL; } /* Check that the root inode is in a sane state */ if (!S_ISDIR(super.root.mode)) { - printk(KERN_ERR "cramfs: root is not a directory\n"); + pr_err("cramfs: root is not a directory\n"); return -EINVAL; } /* correct strange, hard-coded permissions of mkcramfs */ @@ -321,12 +321,12 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent) sbi->magic=super.magic; sbi->flags=super.flags; if (root_offset == 0) - printk(KERN_INFO "cramfs: empty filesystem"); + pr_info("cramfs: empty filesystem"); else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) && ((root_offset != sizeof(struct cramfs_super)) && (root_offset != 512 + sizeof(struct cramfs_super)))) { - printk(KERN_ERR "cramfs: bad root offset %lu\n", root_offset); + pr_err("cramfs: bad root offset %lu\n", root_offset); return -EINVAL; } diff --git a/fs/cramfs/uncompress.c b/fs/cramfs/uncompress.c index 1760c1b..76428cc 100644 --- a/fs/cramfs/uncompress.c +++ b/fs/cramfs/uncompress.c @@ -37,7 +37,7 @@ int cramfs_uncompress_block(void *dst, int dstlen, void *src, int srclen) err = zlib_inflateReset(&stream); if (err != Z_OK) { - printk("zlib_inflateReset error %d\n", err); + pr_err("zlib_inflateReset error %d\n", err); zlib_inflateEnd(&stream); zlib_inflateInit(&stream); } @@ -48,8 +48,8 @@ int cramfs_uncompress_block(void *dst, int dstlen, void *src, int srclen) return stream.total_out; err: - printk("Error %d while decompressing!\n", err); - printk("%p(%d)->%p(%d)\n", src, srclen, dst, dstlen); + pr_err("Error %d while decompressing!\n", err); + pr_err("%p(%d)->%p(%d)\n", src, srclen, dst, dstlen); return -EIO; } -- cgit v0.10.2 From 4f21e1ea09e1e337604f235a22ec2493ae1bd1db Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:50 -0700 Subject: fs/cramfs: use pr_fmt Use module name for "cramfs: " prefix. (note that uncompress.c printk had no prefix). Signed-off-by: Fabian Frederick Cc: "Theodore Ts'o" Cc: Sasha Levin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 480fcb8..fa5c759 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -11,6 +11,8 @@ * The actual compression is based on zlib, see the other files. */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include @@ -277,7 +279,7 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent) /* check for wrong endianness */ if (super.magic == CRAMFS_MAGIC_WEND) { if (!silent) - pr_err("cramfs: wrong endianness\n"); + pr_err("wrong endianness\n"); return -EINVAL; } @@ -287,22 +289,22 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent) mutex_unlock(&read_mutex); if (super.magic != CRAMFS_MAGIC) { if (super.magic == CRAMFS_MAGIC_WEND && !silent) - pr_err("cramfs: wrong endianness\n"); + pr_err("wrong endianness\n"); else if (!silent) - pr_err("cramfs: wrong magic\n"); + pr_err("wrong magic\n"); return -EINVAL; } } /* get feature flags first */ if (super.flags & ~CRAMFS_SUPPORTED_FLAGS) { - pr_err("cramfs: unsupported filesystem features\n"); + pr_err("unsupported filesystem features\n"); return -EINVAL; } /* Check that the root inode is in a sane state */ if (!S_ISDIR(super.root.mode)) { - pr_err("cramfs: root is not a directory\n"); + pr_err("root is not a directory\n"); return -EINVAL; } /* correct strange, hard-coded permissions of mkcramfs */ @@ -321,12 +323,12 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent) sbi->magic=super.magic; sbi->flags=super.flags; if (root_offset == 0) - pr_info("cramfs: empty filesystem"); + pr_info("empty filesystem"); else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) && ((root_offset != sizeof(struct cramfs_super)) && (root_offset != 512 + sizeof(struct cramfs_super)))) { - pr_err("cramfs: bad root offset %lu\n", root_offset); + pr_err("bad root offset %lu\n", root_offset); return -EINVAL; } @@ -511,7 +513,7 @@ static int cramfs_readpage(struct file *file, struct page * page) if (compr_len == 0) ; /* hole */ else if (unlikely(compr_len > (PAGE_CACHE_SIZE << 1))) { - pr_err("cramfs: bad compressed blocksize %u\n", + pr_err("bad compressed blocksize %u\n", compr_len); goto err; } else { diff --git a/fs/cramfs/uncompress.c b/fs/cramfs/uncompress.c index 76428cc..b7f030b 100644 --- a/fs/cramfs/uncompress.c +++ b/fs/cramfs/uncompress.c @@ -15,6 +15,8 @@ * then is used by multiple filesystems. */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include -- cgit v0.10.2 From 31d92e55198d4ec32862aea9441de46a13b33ed8 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:52 -0700 Subject: fs/cramfs: code clean-up Fixes some checkpatch errors/warnings: WARNING: Missing a blank line after declarations ERROR: spaces required around that '=' (ctx:VxV) ERROR: "foo * bar" should be "foo *bar" ERROR: space prohibited after that open parenthesis '(' Signed-off-by: Fabian Frederick Cc: "Theodore Ts'o" Cc: Sasha Levin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index fa5c759..4d37c35 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -155,7 +155,7 @@ static struct inode *get_cramfs_inode(struct super_block *sb, static unsigned char read_buffers[READ_BUFFERS][BUFFER_SIZE]; static unsigned buffer_blocknr[READ_BUFFERS]; -static struct super_block * buffer_dev[READ_BUFFERS]; +static struct super_block *buffer_dev[READ_BUFFERS]; static int next_buffer; /* @@ -207,6 +207,7 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i for (i = 0; i < BLKS_PER_BUF; i++) { struct page *page = pages[i]; + if (page) { wait_on_page_locked(page); if (!PageUptodate(page)) { @@ -225,6 +226,7 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i data = read_buffers[buffer]; for (i = 0; i < BLKS_PER_BUF; i++) { struct page *page = pages[i]; + if (page) { memcpy(data, kmap(page), PAGE_CACHE_SIZE); kunmap(page); @@ -239,6 +241,7 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i static void cramfs_kill_sb(struct super_block *sb) { struct cramfs_sb_info *sbi = CRAMFS_SB(sb); + kill_block_super(sb); kfree(sbi); } @@ -312,16 +315,16 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent) root_offset = super.root.offset << 2; if (super.flags & CRAMFS_FLAG_FSID_VERSION_2) { - sbi->size=super.size; - sbi->blocks=super.fsid.blocks; - sbi->files=super.fsid.files; + sbi->size = super.size; + sbi->blocks = super.fsid.blocks; + sbi->files = super.fsid.files; } else { - sbi->size=1<<28; - sbi->blocks=0; - sbi->files=0; + sbi->size = 1<<28; + sbi->blocks = 0; + sbi->files = 0; } - sbi->magic=super.magic; - sbi->flags=super.flags; + sbi->magic = super.magic; + sbi->flags = super.flags; if (root_offset == 0) pr_info("empty filesystem"); else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) && @@ -427,7 +430,7 @@ static int cramfs_readdir(struct file *file, struct dir_context *ctx) /* * Lookup and fill in the inode data.. */ -static struct dentry * cramfs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) +static struct dentry *cramfs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { unsigned int offset = 0; struct inode *inode = NULL; @@ -485,7 +488,7 @@ out: return NULL; } -static int cramfs_readpage(struct file *file, struct page * page) +static int cramfs_readpage(struct file *file, struct page *page) { struct inode *inode = page->mapping->host; u32 maxblock; diff --git a/fs/cramfs/uncompress.c b/fs/cramfs/uncompress.c index b7f030b..ec4f1d4 100644 --- a/fs/cramfs/uncompress.c +++ b/fs/cramfs/uncompress.c @@ -59,7 +59,7 @@ int cramfs_uncompress_init(void) { if (!initialized++) { stream.workspace = vmalloc(zlib_inflate_workspacesize()); - if ( !stream.workspace ) { + if (!stream.workspace) { initialized = 0; return -ENOMEM; } -- cgit v0.10.2 From 1508f3eb6970858c04fd6a1899a8c999cc3c4ae2 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:55 -0700 Subject: fs/cramfs/inode.c: use linux/uaccess.h Fixes checkpatch warning: WARNING: Use #include instead of Signed-off-by: Fabian Frederick Cc: "Theodore Ts'o" Cc: Sasha Levin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 4d37c35..355c522 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -23,7 +23,7 @@ #include #include #include -#include +#include #include "internal.h" -- cgit v0.10.2 From 3d79d3145f9c0a955300c03f5386d2c86318e4b0 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:57 -0700 Subject: fs/romfs/super.c: convert printk to pr_foo() Use current logging functions. Coalesce formats. Signed-off-by: Fabian Frederick Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/romfs/super.c b/fs/romfs/super.c index ef90e8b..c32006e 100644 --- a/fs/romfs/super.c +++ b/fs/romfs/super.c @@ -380,7 +380,7 @@ static struct inode *romfs_iget(struct super_block *sb, unsigned long pos) eio: ret = -EIO; error: - printk(KERN_ERR "ROMFS: read error for inode 0x%lx\n", pos); + pr_err("ROMFS: read error for inode 0x%lx\n", pos); return ERR_PTR(ret); } @@ -507,14 +507,13 @@ static int romfs_fill_super(struct super_block *sb, void *data, int silent) if (rsb->word0 != ROMSB_WORD0 || rsb->word1 != ROMSB_WORD1 || img_size < ROMFH_SIZE) { if (!silent) - printk(KERN_WARNING "VFS:" - " Can't find a romfs filesystem on dev %s.\n", + pr_warn("VFS: Can't find a romfs filesystem on dev %s.\n", sb->s_id); goto error_rsb_inval; } if (romfs_checksum(rsb, min_t(size_t, img_size, 512))) { - printk(KERN_ERR "ROMFS: bad initial checksum on dev %s.\n", + pr_err("ROMFS: bad initial checksum on dev %s.\n", sb->s_id); goto error_rsb_inval; } @@ -523,7 +522,7 @@ static int romfs_fill_super(struct super_block *sb, void *data, int silent) len = strnlen(rsb->name, ROMFS_MAXFN); if (!silent) - printk(KERN_NOTICE "ROMFS: Mounting image '%*.*s' through %s\n", + pr_notice("ROMFS: Mounting image '%*.*s' through %s\n", (unsigned) len, (unsigned) len, rsb->name, storage); kfree(rsb); @@ -614,7 +613,7 @@ static int __init init_romfs_fs(void) { int ret; - printk(KERN_INFO "ROMFS MTD (C) 2007 Red Hat, Inc.\n"); + pr_info("ROMFS MTD (C) 2007 Red Hat, Inc.\n"); romfs_inode_cachep = kmem_cache_create("romfs_i", @@ -623,13 +622,12 @@ static int __init init_romfs_fs(void) romfs_i_init_once); if (!romfs_inode_cachep) { - printk(KERN_ERR - "ROMFS error: Failed to initialise inode cache\n"); + pr_err("ROMFS error: Failed to initialise inode cache\n"); return -ENOMEM; } ret = register_filesystem(&romfs_fs_type); if (ret) { - printk(KERN_ERR "ROMFS error: Failed to register filesystem\n"); + pr_err("ROMFS error: Failed to register filesystem\n"); goto error_register; } return 0; -- cgit v0.10.2 From ca9e5368a7d993558a39c327fa3991ec242ea4c3 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:22:59 -0700 Subject: fs/romfs/super.c: use pr_fmt in logging - Remove "Error" in format logging (already in pr_ level) - Use modulename in pr_fmt instead of ROMFS: in each pr_ callsites. Signed-off-by: Fabian Frederick Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/romfs/super.c b/fs/romfs/super.c index c32006e..6e73879 100644 --- a/fs/romfs/super.c +++ b/fs/romfs/super.c @@ -56,6 +56,8 @@ * 2 of the Licence, or (at your option) any later version. */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include @@ -380,7 +382,7 @@ static struct inode *romfs_iget(struct super_block *sb, unsigned long pos) eio: ret = -EIO; error: - pr_err("ROMFS: read error for inode 0x%lx\n", pos); + pr_err("read error for inode 0x%lx\n", pos); return ERR_PTR(ret); } @@ -513,8 +515,7 @@ static int romfs_fill_super(struct super_block *sb, void *data, int silent) } if (romfs_checksum(rsb, min_t(size_t, img_size, 512))) { - pr_err("ROMFS: bad initial checksum on dev %s.\n", - sb->s_id); + pr_err("bad initial checksum on dev %s.\n", sb->s_id); goto error_rsb_inval; } @@ -522,8 +523,8 @@ static int romfs_fill_super(struct super_block *sb, void *data, int silent) len = strnlen(rsb->name, ROMFS_MAXFN); if (!silent) - pr_notice("ROMFS: Mounting image '%*.*s' through %s\n", - (unsigned) len, (unsigned) len, rsb->name, storage); + pr_notice("Mounting image '%*.*s' through %s\n", + (unsigned) len, (unsigned) len, rsb->name, storage); kfree(rsb); rsb = NULL; @@ -622,12 +623,12 @@ static int __init init_romfs_fs(void) romfs_i_init_once); if (!romfs_inode_cachep) { - pr_err("ROMFS error: Failed to initialise inode cache\n"); + pr_err("Failed to initialise inode cache\n"); return -ENOMEM; } ret = register_filesystem(&romfs_fs_type); if (ret) { - pr_err("ROMFS error: Failed to register filesystem\n"); + pr_err("Failed to register filesystem\n"); goto error_register; } return 0; -- cgit v0.10.2 From b6d53b1b113492632bdc37ee9ab1dda8b9a4bf34 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:23:01 -0700 Subject: fs/romfs/super.c: add blank line after declarations Fix checkpatch warning: WARNING: Missing a blank line after declarations Signed-off-by: Fabian Frederick Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/romfs/super.c b/fs/romfs/super.c index 6e73879..e98dd88 100644 --- a/fs/romfs/super.c +++ b/fs/romfs/super.c @@ -392,6 +392,7 @@ error: static struct inode *romfs_alloc_inode(struct super_block *sb) { struct romfs_inode_info *inode; + inode = kmem_cache_alloc(romfs_inode_cachep, GFP_KERNEL); return inode ? &inode->vfs_inode : NULL; } @@ -402,6 +403,7 @@ static struct inode *romfs_alloc_inode(struct super_block *sb) static void romfs_i_callback(struct rcu_head *head) { struct inode *inode = container_of(head, struct inode, i_rcu); + kmem_cache_free(romfs_inode_cachep, ROMFS_I(inode)); } -- cgit v0.10.2 From e00d5b5ad70c6c4fb978fe843c0bbb3294d63223 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:23:03 -0700 Subject: fs/qnx6: convert printk to pr_foo() Use current logging functions. Coalesce formats. Signed-off-by: Fabian Frederick Cc: Joe Perches Cc: Kai Bankett Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/qnx6/dir.c b/fs/qnx6/dir.c index 15b7d92..2289ea1 100644 --- a/fs/qnx6/dir.c +++ b/fs/qnx6/dir.c @@ -77,13 +77,12 @@ static int qnx6_dir_longfilename(struct inode *inode, if (de->de_size != 0xff) { /* error - long filename entries always have size 0xff in direntry */ - printk(KERN_ERR "qnx6: invalid direntry size (%i).\n", - de->de_size); + pr_err("qnx6: invalid direntry size (%i).\n", de->de_size); return 0; } lf = qnx6_longname(s, de, &page); if (IS_ERR(lf)) { - printk(KERN_ERR "qnx6:Error reading longname\n"); + pr_err("qnx6:Error reading longname\n"); return 0; } @@ -91,7 +90,7 @@ static int qnx6_dir_longfilename(struct inode *inode, if (lf_size > QNX6_LONG_NAME_MAX) { QNX6DEBUG((KERN_INFO "file %s\n", lf->lf_fname)); - printk(KERN_ERR "qnx6:Filename too long (%i)\n", lf_size); + pr_err("qnx6:Filename too long (%i)\n", lf_size); qnx6_put_page(page); return 0; } @@ -100,7 +99,7 @@ static int qnx6_dir_longfilename(struct inode *inode, mmi 3g filesystem does not have that checksum */ if (!test_opt(s, MMI_FS) && fs32_to_cpu(sbi, de->de_checksum) != qnx6_lfile_checksum(lf->lf_fname, lf_size)) - printk(KERN_INFO "qnx6: long filename checksum error.\n"); + pr_info("qnx6: long filename checksum error.\n"); QNX6DEBUG((KERN_INFO "qnx6_readdir:%.*s inode:%u\n", lf_size, lf->lf_fname, de_inode)); @@ -136,7 +135,7 @@ static int qnx6_readdir(struct file *file, struct dir_context *ctx) int i = start; if (IS_ERR(page)) { - printk(KERN_ERR "qnx6_readdir: read failed\n"); + pr_err("qnx6_readdir: read failed\n"); ctx->pos = (n + 1) << PAGE_CACHE_SHIFT; return PTR_ERR(page); } @@ -259,8 +258,7 @@ unsigned qnx6_find_entry(int len, struct inode *dir, const char *name, if (ino) goto found; } else - printk(KERN_ERR "qnx6: undefined " - "filename size in inode.\n"); + pr_err("qnx6: undefined filename size in inode.\n"); } qnx6_put_page(page); } diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c index 65cdaab..0042af9 100644 --- a/fs/qnx6/inode.c +++ b/fs/qnx6/inode.c @@ -87,7 +87,7 @@ static int qnx6_get_block(struct inode *inode, sector_t iblock, static int qnx6_check_blockptr(__fs32 ptr) { if (ptr == ~(__fs32)0) { - printk(KERN_ERR "qnx6: hit unused blockpointer.\n"); + pr_err("qnx6: hit unused blockpointer.\n"); return 0; } return 1; @@ -127,8 +127,7 @@ static unsigned qnx6_block_map(struct inode *inode, unsigned no) levelptr = no >> bitdelta; if (levelptr > QNX6_NO_DIRECT_POINTERS - 1) { - printk(KERN_ERR "qnx6:Requested file block number (%u) too big.", - no); + pr_err("qnx6:Requested file block number (%u) too big.", no); return 0; } @@ -137,8 +136,7 @@ static unsigned qnx6_block_map(struct inode *inode, unsigned no) for (i = 0; i < depth; i++) { bh = sb_bread(s, block); if (!bh) { - printk(KERN_ERR "qnx6:Error reading block (%u)\n", - block); + pr_err("qnx6:Error reading block (%u)\n", block); return 0; } bitdelta -= ptrbits; @@ -277,7 +275,7 @@ static struct buffer_head *qnx6_check_first_superblock(struct super_block *s, start with the first superblock */ bh = sb_bread(s, offset); if (!bh) { - printk(KERN_ERR "qnx6: unable to read the first superblock\n"); + pr_err("qnx6: unable to read the first superblock\n"); return NULL; } sb = (struct qnx6_super_block *)bh->b_data; @@ -292,13 +290,10 @@ static struct buffer_head *qnx6_check_first_superblock(struct super_block *s, sbi->s_bytesex = BYTESEX_LE; if (!silent) { if (offset == 0) { - printk(KERN_ERR "qnx6: wrong signature (magic)" - " in superblock #1.\n"); + pr_err("qnx6: wrong signature (magic) in superblock #1.\n"); } else { - printk(KERN_INFO "qnx6: wrong signature (magic)" - " at position (0x%lx) - will try" - " alternative position (0x0000).\n", - offset * s->s_blocksize); + pr_info("qnx6: wrong signature (magic) at position (0x%lx) - will try alternative position (0x0000).\n", + offset * s->s_blocksize); } } brelse(bh); @@ -329,13 +324,13 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) /* Superblock always is 512 Byte long */ if (!sb_set_blocksize(s, QNX6_SUPERBLOCK_SIZE)) { - printk(KERN_ERR "qnx6: unable to set blocksize\n"); + pr_err("qnx6: unable to set blocksize\n"); goto outnobh; } /* parse the mount-options */ if (!qnx6_parse_options((char *) data, s)) { - printk(KERN_ERR "qnx6: invalid mount options.\n"); + pr_err("qnx6: invalid mount options.\n"); goto outnobh; } if (test_opt(s, MMI_FS)) { @@ -355,7 +350,7 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) /* try again without bootblock offset */ bh1 = qnx6_check_first_superblock(s, 0, silent); if (!bh1) { - printk(KERN_ERR "qnx6: unable to read the first superblock\n"); + pr_err("qnx6: unable to read the first superblock\n"); goto outnobh; } /* seems that no bootblock at partition start */ @@ -370,13 +365,13 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) /* checksum check - start at byte 8 and end at byte 512 */ if (fs32_to_cpu(sbi, sb1->sb_checksum) != crc32_be(0, (char *)(bh1->b_data + 8), 504)) { - printk(KERN_ERR "qnx6: superblock #1 checksum error\n"); + pr_err("qnx6: superblock #1 checksum error\n"); goto out; } /* set new blocksize */ if (!sb_set_blocksize(s, fs32_to_cpu(sbi, sb1->sb_blocksize))) { - printk(KERN_ERR "qnx6: unable to set blocksize\n"); + pr_err("qnx6: unable to set blocksize\n"); goto out; } /* blocksize invalidates bh - pull it back in */ @@ -398,21 +393,20 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) /* next the second superblock */ bh2 = sb_bread(s, offset); if (!bh2) { - printk(KERN_ERR "qnx6: unable to read the second superblock\n"); + pr_err("qnx6: unable to read the second superblock\n"); goto out; } sb2 = (struct qnx6_super_block *)bh2->b_data; if (fs32_to_cpu(sbi, sb2->sb_magic) != QNX6_SUPER_MAGIC) { if (!silent) - printk(KERN_ERR "qnx6: wrong signature (magic)" - " in superblock #2.\n"); + pr_err("qnx6: wrong signature (magic) in superblock #2.\n"); goto out; } /* checksum check - start at byte 8 and end at byte 512 */ if (fs32_to_cpu(sbi, sb2->sb_checksum) != crc32_be(0, (char *)(bh2->b_data + 8), 504)) { - printk(KERN_ERR "qnx6: superblock #2 checksum error\n"); + pr_err("qnx6: superblock #2 checksum error\n"); goto out; } @@ -422,25 +416,24 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) sbi->sb_buf = bh1; sbi->sb = (struct qnx6_super_block *)bh1->b_data; brelse(bh2); - printk(KERN_INFO "qnx6: superblock #1 active\n"); + pr_info("qnx6: superblock #1 active\n"); } else { /* superblock #2 active */ sbi->sb_buf = bh2; sbi->sb = (struct qnx6_super_block *)bh2->b_data; brelse(bh1); - printk(KERN_INFO "qnx6: superblock #2 active\n"); + pr_info("qnx6: superblock #2 active\n"); } mmi_success: /* sanity check - limit maximum indirect pointer levels */ if (sb1->Inode.levels > QNX6_PTR_MAX_LEVELS) { - printk(KERN_ERR "qnx6: too many inode levels (max %i, sb %i)\n", - QNX6_PTR_MAX_LEVELS, sb1->Inode.levels); + pr_err("qnx6: too many inode levels (max %i, sb %i)\n", + QNX6_PTR_MAX_LEVELS, sb1->Inode.levels); goto out; } if (sb1->Longfile.levels > QNX6_PTR_MAX_LEVELS) { - printk(KERN_ERR "qnx6: too many longfilename levels" - " (max %i, sb %i)\n", - QNX6_PTR_MAX_LEVELS, sb1->Longfile.levels); + pr_err("qnx6: too many longfilename levels (max %i, sb %i)\n", + QNX6_PTR_MAX_LEVELS, sb1->Longfile.levels); goto out; } s->s_op = &qnx6_sops; @@ -460,7 +453,7 @@ mmi_success: /* prefetch root inode */ root = qnx6_iget(s, QNX6_ROOT_INO); if (IS_ERR(root)) { - printk(KERN_ERR "qnx6: get inode failed\n"); + pr_err("qnx6: get inode failed\n"); ret = PTR_ERR(root); goto out2; } @@ -474,7 +467,7 @@ mmi_success: errmsg = qnx6_checkroot(s); if (errmsg != NULL) { if (!silent) - printk(KERN_ERR "qnx6: %s\n", errmsg); + pr_err("qnx6: %s\n", errmsg); goto out3; } return 0; @@ -555,8 +548,7 @@ struct inode *qnx6_iget(struct super_block *sb, unsigned ino) inode->i_mode = 0; if (ino == 0) { - printk(KERN_ERR "qnx6: bad inode number on dev %s: %u is " - "out of range\n", + pr_err("qnx6: bad inode number on dev %s: %u is out of range\n", sb->s_id, ino); iget_failed(inode); return ERR_PTR(-EIO); @@ -566,8 +558,8 @@ struct inode *qnx6_iget(struct super_block *sb, unsigned ino) mapping = sbi->inodes->i_mapping; page = read_mapping_page(mapping, n, NULL); if (IS_ERR(page)) { - printk(KERN_ERR "qnx6: major problem: unable to read inode from " - "dev %s\n", sb->s_id); + pr_err("qnx6: major problem: unable to read inode from dev %s\n", + sb->s_id); iget_failed(inode); return ERR_CAST(page); } @@ -689,7 +681,7 @@ static int __init init_qnx6_fs(void) return err; } - printk(KERN_INFO "QNX6 filesystem 1.0.0 registered.\n"); + pr_info("QNX6 filesystem 1.0.0 registered.\n"); return 0; } diff --git a/fs/qnx6/super_mmi.c b/fs/qnx6/super_mmi.c index 29c32cb..8e056b0 100644 --- a/fs/qnx6/super_mmi.c +++ b/fs/qnx6/super_mmi.c @@ -44,15 +44,14 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) start with the first superblock */ bh1 = sb_bread(s, 0); if (!bh1) { - printk(KERN_ERR "qnx6: Unable to read first mmi superblock\n"); + pr_err("qnx6: Unable to read first mmi superblock\n"); return NULL; } sb1 = (struct qnx6_mmi_super_block *)bh1->b_data; sbi = QNX6_SB(s); if (fs32_to_cpu(sbi, sb1->sb_magic) != QNX6_SUPER_MAGIC) { if (!silent) { - printk(KERN_ERR "qnx6: wrong signature (magic) in" - " superblock #1.\n"); + pr_err("qnx6: wrong signature (magic) in superblock #1.\n"); goto out; } } @@ -60,7 +59,7 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) /* checksum check - start at byte 8 and end at byte 512 */ if (fs32_to_cpu(sbi, sb1->sb_checksum) != crc32_be(0, (char *)(bh1->b_data + 8), 504)) { - printk(KERN_ERR "qnx6: superblock #1 checksum error\n"); + pr_err("qnx6: superblock #1 checksum error\n"); goto out; } @@ -70,7 +69,7 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) /* set new blocksize */ if (!sb_set_blocksize(s, fs32_to_cpu(sbi, sb1->sb_blocksize))) { - printk(KERN_ERR "qnx6: unable to set blocksize\n"); + pr_err("qnx6: unable to set blocksize\n"); goto out; } /* blocksize invalidates bh - pull it back in */ @@ -83,27 +82,26 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) /* read second superblock */ bh2 = sb_bread(s, offset); if (!bh2) { - printk(KERN_ERR "qnx6: unable to read the second superblock\n"); + pr_err("qnx6: unable to read the second superblock\n"); goto out; } sb2 = (struct qnx6_mmi_super_block *)bh2->b_data; if (fs32_to_cpu(sbi, sb2->sb_magic) != QNX6_SUPER_MAGIC) { if (!silent) - printk(KERN_ERR "qnx6: wrong signature (magic) in" - " superblock #2.\n"); + pr_err("qnx6: wrong signature (magic) in superblock #2.\n"); goto out; } /* checksum check - start at byte 8 and end at byte 512 */ if (fs32_to_cpu(sbi, sb2->sb_checksum) != crc32_be(0, (char *)(bh2->b_data + 8), 504)) { - printk(KERN_ERR "qnx6: superblock #1 checksum error\n"); + pr_err("qnx6: superblock #1 checksum error\n"); goto out; } qsb = kmalloc(sizeof(*qsb), GFP_KERNEL); if (!qsb) { - printk(KERN_ERR "qnx6: unable to allocate memory.\n"); + pr_err("qnx6: unable to allocate memory.\n"); goto out; } @@ -119,7 +117,7 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) sbi->sb_buf = bh1; sbi->sb = (struct qnx6_super_block *)bh1->b_data; brelse(bh2); - printk(KERN_INFO "qnx6: superblock #1 active\n"); + pr_info("qnx6: superblock #1 active\n"); } else { /* superblock #2 active */ qnx6_mmi_copy_sb(qsb, sb2); @@ -131,7 +129,7 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) sbi->sb_buf = bh2; sbi->sb = (struct qnx6_super_block *)bh2->b_data; brelse(bh1); - printk(KERN_INFO "qnx6: superblock #2 active\n"); + pr_info("qnx6: superblock #2 active\n"); } kfree(qsb); -- cgit v0.10.2 From e6c3261653a22f7481791e02fe19d11faac214fb Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:23:05 -0700 Subject: fs/qnx6: use pr_fmt and __func__ in logging Remove "qnx6:" and "qnx6: " from each logging instruction. Signed-off-by: Fabian Frederick Cc: Joe Perches Cc: Kai Bankett Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/qnx6/dir.c b/fs/qnx6/dir.c index 2289ea1..c78a339 100644 --- a/fs/qnx6/dir.c +++ b/fs/qnx6/dir.c @@ -77,12 +77,12 @@ static int qnx6_dir_longfilename(struct inode *inode, if (de->de_size != 0xff) { /* error - long filename entries always have size 0xff in direntry */ - pr_err("qnx6: invalid direntry size (%i).\n", de->de_size); + pr_err("invalid direntry size (%i).\n", de->de_size); return 0; } lf = qnx6_longname(s, de, &page); if (IS_ERR(lf)) { - pr_err("qnx6:Error reading longname\n"); + pr_err("Error reading longname\n"); return 0; } @@ -90,7 +90,7 @@ static int qnx6_dir_longfilename(struct inode *inode, if (lf_size > QNX6_LONG_NAME_MAX) { QNX6DEBUG((KERN_INFO "file %s\n", lf->lf_fname)); - pr_err("qnx6:Filename too long (%i)\n", lf_size); + pr_err("Filename too long (%i)\n", lf_size); qnx6_put_page(page); return 0; } @@ -99,7 +99,7 @@ static int qnx6_dir_longfilename(struct inode *inode, mmi 3g filesystem does not have that checksum */ if (!test_opt(s, MMI_FS) && fs32_to_cpu(sbi, de->de_checksum) != qnx6_lfile_checksum(lf->lf_fname, lf_size)) - pr_info("qnx6: long filename checksum error.\n"); + pr_info("long filename checksum error.\n"); QNX6DEBUG((KERN_INFO "qnx6_readdir:%.*s inode:%u\n", lf_size, lf->lf_fname, de_inode)); @@ -135,7 +135,7 @@ static int qnx6_readdir(struct file *file, struct dir_context *ctx) int i = start; if (IS_ERR(page)) { - pr_err("qnx6_readdir: read failed\n"); + pr_err("%s(): read failed\n", __func__); ctx->pos = (n + 1) << PAGE_CACHE_SHIFT; return PTR_ERR(page); } @@ -258,7 +258,7 @@ unsigned qnx6_find_entry(int len, struct inode *dir, const char *name, if (ino) goto found; } else - pr_err("qnx6: undefined filename size in inode.\n"); + pr_err("undefined filename size in inode.\n"); } qnx6_put_page(page); } diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c index 0042af9..1652e8b 100644 --- a/fs/qnx6/inode.c +++ b/fs/qnx6/inode.c @@ -87,7 +87,7 @@ static int qnx6_get_block(struct inode *inode, sector_t iblock, static int qnx6_check_blockptr(__fs32 ptr) { if (ptr == ~(__fs32)0) { - pr_err("qnx6: hit unused blockpointer.\n"); + pr_err("hit unused blockpointer.\n"); return 0; } return 1; @@ -127,7 +127,7 @@ static unsigned qnx6_block_map(struct inode *inode, unsigned no) levelptr = no >> bitdelta; if (levelptr > QNX6_NO_DIRECT_POINTERS - 1) { - pr_err("qnx6:Requested file block number (%u) too big.", no); + pr_err("Requested file block number (%u) too big.", no); return 0; } @@ -136,7 +136,7 @@ static unsigned qnx6_block_map(struct inode *inode, unsigned no) for (i = 0; i < depth; i++) { bh = sb_bread(s, block); if (!bh) { - pr_err("qnx6:Error reading block (%u)\n", block); + pr_err("Error reading block (%u)\n", block); return 0; } bitdelta -= ptrbits; @@ -275,7 +275,7 @@ static struct buffer_head *qnx6_check_first_superblock(struct super_block *s, start with the first superblock */ bh = sb_bread(s, offset); if (!bh) { - pr_err("qnx6: unable to read the first superblock\n"); + pr_err("unable to read the first superblock\n"); return NULL; } sb = (struct qnx6_super_block *)bh->b_data; @@ -290,9 +290,9 @@ static struct buffer_head *qnx6_check_first_superblock(struct super_block *s, sbi->s_bytesex = BYTESEX_LE; if (!silent) { if (offset == 0) { - pr_err("qnx6: wrong signature (magic) in superblock #1.\n"); + pr_err("wrong signature (magic) in superblock #1.\n"); } else { - pr_info("qnx6: wrong signature (magic) at position (0x%lx) - will try alternative position (0x0000).\n", + pr_info("wrong signature (magic) at position (0x%lx) - will try alternative position (0x0000).\n", offset * s->s_blocksize); } } @@ -324,13 +324,13 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) /* Superblock always is 512 Byte long */ if (!sb_set_blocksize(s, QNX6_SUPERBLOCK_SIZE)) { - pr_err("qnx6: unable to set blocksize\n"); + pr_err("unable to set blocksize\n"); goto outnobh; } /* parse the mount-options */ if (!qnx6_parse_options((char *) data, s)) { - pr_err("qnx6: invalid mount options.\n"); + pr_err("invalid mount options.\n"); goto outnobh; } if (test_opt(s, MMI_FS)) { @@ -350,7 +350,7 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) /* try again without bootblock offset */ bh1 = qnx6_check_first_superblock(s, 0, silent); if (!bh1) { - pr_err("qnx6: unable to read the first superblock\n"); + pr_err("unable to read the first superblock\n"); goto outnobh; } /* seems that no bootblock at partition start */ @@ -365,13 +365,13 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) /* checksum check - start at byte 8 and end at byte 512 */ if (fs32_to_cpu(sbi, sb1->sb_checksum) != crc32_be(0, (char *)(bh1->b_data + 8), 504)) { - pr_err("qnx6: superblock #1 checksum error\n"); + pr_err("superblock #1 checksum error\n"); goto out; } /* set new blocksize */ if (!sb_set_blocksize(s, fs32_to_cpu(sbi, sb1->sb_blocksize))) { - pr_err("qnx6: unable to set blocksize\n"); + pr_err("unable to set blocksize\n"); goto out; } /* blocksize invalidates bh - pull it back in */ @@ -393,20 +393,20 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) /* next the second superblock */ bh2 = sb_bread(s, offset); if (!bh2) { - pr_err("qnx6: unable to read the second superblock\n"); + pr_err("unable to read the second superblock\n"); goto out; } sb2 = (struct qnx6_super_block *)bh2->b_data; if (fs32_to_cpu(sbi, sb2->sb_magic) != QNX6_SUPER_MAGIC) { if (!silent) - pr_err("qnx6: wrong signature (magic) in superblock #2.\n"); + pr_err("wrong signature (magic) in superblock #2.\n"); goto out; } /* checksum check - start at byte 8 and end at byte 512 */ if (fs32_to_cpu(sbi, sb2->sb_checksum) != crc32_be(0, (char *)(bh2->b_data + 8), 504)) { - pr_err("qnx6: superblock #2 checksum error\n"); + pr_err("superblock #2 checksum error\n"); goto out; } @@ -416,23 +416,23 @@ static int qnx6_fill_super(struct super_block *s, void *data, int silent) sbi->sb_buf = bh1; sbi->sb = (struct qnx6_super_block *)bh1->b_data; brelse(bh2); - pr_info("qnx6: superblock #1 active\n"); + pr_info("superblock #1 active\n"); } else { /* superblock #2 active */ sbi->sb_buf = bh2; sbi->sb = (struct qnx6_super_block *)bh2->b_data; brelse(bh1); - pr_info("qnx6: superblock #2 active\n"); + pr_info("superblock #2 active\n"); } mmi_success: /* sanity check - limit maximum indirect pointer levels */ if (sb1->Inode.levels > QNX6_PTR_MAX_LEVELS) { - pr_err("qnx6: too many inode levels (max %i, sb %i)\n", + pr_err("too many inode levels (max %i, sb %i)\n", QNX6_PTR_MAX_LEVELS, sb1->Inode.levels); goto out; } if (sb1->Longfile.levels > QNX6_PTR_MAX_LEVELS) { - pr_err("qnx6: too many longfilename levels (max %i, sb %i)\n", + pr_err("too many longfilename levels (max %i, sb %i)\n", QNX6_PTR_MAX_LEVELS, sb1->Longfile.levels); goto out; } @@ -453,7 +453,7 @@ mmi_success: /* prefetch root inode */ root = qnx6_iget(s, QNX6_ROOT_INO); if (IS_ERR(root)) { - pr_err("qnx6: get inode failed\n"); + pr_err("get inode failed\n"); ret = PTR_ERR(root); goto out2; } @@ -467,7 +467,7 @@ mmi_success: errmsg = qnx6_checkroot(s); if (errmsg != NULL) { if (!silent) - pr_err("qnx6: %s\n", errmsg); + pr_err("%s\n", errmsg); goto out3; } return 0; @@ -548,7 +548,7 @@ struct inode *qnx6_iget(struct super_block *sb, unsigned ino) inode->i_mode = 0; if (ino == 0) { - pr_err("qnx6: bad inode number on dev %s: %u is out of range\n", + pr_err("bad inode number on dev %s: %u is out of range\n", sb->s_id, ino); iget_failed(inode); return ERR_PTR(-EIO); @@ -558,7 +558,7 @@ struct inode *qnx6_iget(struct super_block *sb, unsigned ino) mapping = sbi->inodes->i_mapping; page = read_mapping_page(mapping, n, NULL); if (IS_ERR(page)) { - pr_err("qnx6: major problem: unable to read inode from dev %s\n", + pr_err("major problem: unable to read inode from dev %s\n", sb->s_id); iget_failed(inode); return ERR_CAST(page); diff --git a/fs/qnx6/qnx6.h b/fs/qnx6/qnx6.h index b00fcc9..b0aceda 100644 --- a/fs/qnx6/qnx6.h +++ b/fs/qnx6/qnx6.h @@ -10,6 +10,12 @@ * */ +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include diff --git a/fs/qnx6/super_mmi.c b/fs/qnx6/super_mmi.c index 8e056b0..62aaf3e 100644 --- a/fs/qnx6/super_mmi.c +++ b/fs/qnx6/super_mmi.c @@ -44,14 +44,14 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) start with the first superblock */ bh1 = sb_bread(s, 0); if (!bh1) { - pr_err("qnx6: Unable to read first mmi superblock\n"); + pr_err("Unable to read first mmi superblock\n"); return NULL; } sb1 = (struct qnx6_mmi_super_block *)bh1->b_data; sbi = QNX6_SB(s); if (fs32_to_cpu(sbi, sb1->sb_magic) != QNX6_SUPER_MAGIC) { if (!silent) { - pr_err("qnx6: wrong signature (magic) in superblock #1.\n"); + pr_err("wrong signature (magic) in superblock #1.\n"); goto out; } } @@ -59,7 +59,7 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) /* checksum check - start at byte 8 and end at byte 512 */ if (fs32_to_cpu(sbi, sb1->sb_checksum) != crc32_be(0, (char *)(bh1->b_data + 8), 504)) { - pr_err("qnx6: superblock #1 checksum error\n"); + pr_err("superblock #1 checksum error\n"); goto out; } @@ -69,7 +69,7 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) /* set new blocksize */ if (!sb_set_blocksize(s, fs32_to_cpu(sbi, sb1->sb_blocksize))) { - pr_err("qnx6: unable to set blocksize\n"); + pr_err("unable to set blocksize\n"); goto out; } /* blocksize invalidates bh - pull it back in */ @@ -82,26 +82,26 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) /* read second superblock */ bh2 = sb_bread(s, offset); if (!bh2) { - pr_err("qnx6: unable to read the second superblock\n"); + pr_err("unable to read the second superblock\n"); goto out; } sb2 = (struct qnx6_mmi_super_block *)bh2->b_data; if (fs32_to_cpu(sbi, sb2->sb_magic) != QNX6_SUPER_MAGIC) { if (!silent) - pr_err("qnx6: wrong signature (magic) in superblock #2.\n"); + pr_err("wrong signature (magic) in superblock #2.\n"); goto out; } /* checksum check - start at byte 8 and end at byte 512 */ if (fs32_to_cpu(sbi, sb2->sb_checksum) != crc32_be(0, (char *)(bh2->b_data + 8), 504)) { - pr_err("qnx6: superblock #1 checksum error\n"); + pr_err("superblock #1 checksum error\n"); goto out; } qsb = kmalloc(sizeof(*qsb), GFP_KERNEL); if (!qsb) { - pr_err("qnx6: unable to allocate memory.\n"); + pr_err("unable to allocate memory.\n"); goto out; } @@ -117,7 +117,7 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) sbi->sb_buf = bh1; sbi->sb = (struct qnx6_super_block *)bh1->b_data; brelse(bh2); - pr_info("qnx6: superblock #1 active\n"); + pr_info("superblock #1 active\n"); } else { /* superblock #2 active */ qnx6_mmi_copy_sb(qsb, sb2); @@ -129,7 +129,7 @@ struct qnx6_super_block *qnx6_mmi_fill_super(struct super_block *s, int silent) sbi->sb_buf = bh2; sbi->sb = (struct qnx6_super_block *)bh2->b_data; brelse(bh1); - pr_info("qnx6: superblock #2 active\n"); + pr_info("superblock #2 active\n"); } kfree(qsb); -- cgit v0.10.2 From fa5a7a41a601d952e53bfcfa6d50ca22b956ee3a Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:23:07 -0700 Subject: fs/qnx6: update debugging to current functions Add DDEBUG in Makefile when CONFIG_QNX6FS_DEBUG is set. All QNX6DEBUG messages are replaced by pr_debug which means debugging will be emitted in debug level only and no more in error and info levels. debug uses now pr_fmt and __func__ QNX6DEBUG definition has been removed. Signed-off-by: Fabian Frederick Cc: Joe Perches Cc: Kai Bankett Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/qnx6/Makefile b/fs/qnx6/Makefile index 9dd0619..5e6bae6 100644 --- a/fs/qnx6/Makefile +++ b/fs/qnx6/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_QNX6FS_FS) += qnx6.o qnx6-objs := inode.o dir.o namei.o super_mmi.o +ccflags-$(CONFIG_QNX6FS_DEBUG) += -DDEBUG diff --git a/fs/qnx6/dir.c b/fs/qnx6/dir.c index c78a339..8d64bb5 100644 --- a/fs/qnx6/dir.c +++ b/fs/qnx6/dir.c @@ -89,7 +89,7 @@ static int qnx6_dir_longfilename(struct inode *inode, lf_size = fs16_to_cpu(sbi, lf->lf_size); if (lf_size > QNX6_LONG_NAME_MAX) { - QNX6DEBUG((KERN_INFO "file %s\n", lf->lf_fname)); + pr_debug("file %s\n", lf->lf_fname); pr_err("Filename too long (%i)\n", lf_size); qnx6_put_page(page); return 0; @@ -101,8 +101,8 @@ static int qnx6_dir_longfilename(struct inode *inode, qnx6_lfile_checksum(lf->lf_fname, lf_size)) pr_info("long filename checksum error.\n"); - QNX6DEBUG((KERN_INFO "qnx6_readdir:%.*s inode:%u\n", - lf_size, lf->lf_fname, de_inode)); + pr_debug("qnx6_readdir:%.*s inode:%u\n", + lf_size, lf->lf_fname, de_inode); if (!dir_emit(ctx, lf->lf_fname, lf_size, de_inode, DT_UNKNOWN)) { qnx6_put_page(page); return 0; @@ -158,9 +158,9 @@ static int qnx6_readdir(struct file *file, struct dir_context *ctx) break; } } else { - QNX6DEBUG((KERN_INFO "qnx6_readdir:%.*s" - " inode:%u\n", size, de->de_fname, - no_inode)); + pr_debug("%s():%.*s inode:%u\n", + __func__, size, de->de_fname, + no_inode); if (!dir_emit(ctx, de->de_fname, size, no_inode, DT_UNKNOWN)) { done = true; diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c index 1652e8b..44e7392 100644 --- a/fs/qnx6/inode.c +++ b/fs/qnx6/inode.c @@ -73,8 +73,8 @@ static int qnx6_get_block(struct inode *inode, sector_t iblock, { unsigned phys; - QNX6DEBUG((KERN_INFO "qnx6: qnx6_get_block inode=[%ld] iblock=[%ld]\n", - inode->i_ino, (unsigned long)iblock)); + pr_debug("qnx6_get_block inode=[%ld] iblock=[%ld]\n", + inode->i_ino, (unsigned long)iblock); phys = qnx6_block_map(inode, iblock); if (phys) { @@ -205,26 +205,16 @@ void qnx6_superblock_debug(struct qnx6_super_block *sb, struct super_block *s) { struct qnx6_sb_info *sbi = QNX6_SB(s); - QNX6DEBUG((KERN_INFO "magic: %08x\n", - fs32_to_cpu(sbi, sb->sb_magic))); - QNX6DEBUG((KERN_INFO "checksum: %08x\n", - fs32_to_cpu(sbi, sb->sb_checksum))); - QNX6DEBUG((KERN_INFO "serial: %llx\n", - fs64_to_cpu(sbi, sb->sb_serial))); - QNX6DEBUG((KERN_INFO "flags: %08x\n", - fs32_to_cpu(sbi, sb->sb_flags))); - QNX6DEBUG((KERN_INFO "blocksize: %08x\n", - fs32_to_cpu(sbi, sb->sb_blocksize))); - QNX6DEBUG((KERN_INFO "num_inodes: %08x\n", - fs32_to_cpu(sbi, sb->sb_num_inodes))); - QNX6DEBUG((KERN_INFO "free_inodes: %08x\n", - fs32_to_cpu(sbi, sb->sb_free_inodes))); - QNX6DEBUG((KERN_INFO "num_blocks: %08x\n", - fs32_to_cpu(sbi, sb->sb_num_blocks))); - QNX6DEBUG((KERN_INFO "free_blocks: %08x\n", - fs32_to_cpu(sbi, sb->sb_free_blocks))); - QNX6DEBUG((KERN_INFO "inode_levels: %02x\n", - sb->Inode.levels)); + pr_debug("magic: %08x\n", fs32_to_cpu(sbi, sb->sb_magic)); + pr_debug("checksum: %08x\n", fs32_to_cpu(sbi, sb->sb_checksum)); + pr_debug("serial: %llx\n", fs64_to_cpu(sbi, sb->sb_serial)); + pr_debug("flags: %08x\n", fs32_to_cpu(sbi, sb->sb_flags)); + pr_debug("blocksize: %08x\n", fs32_to_cpu(sbi, sb->sb_blocksize)); + pr_debug("num_inodes: %08x\n", fs32_to_cpu(sbi, sb->sb_num_inodes)); + pr_debug("free_inodes: %08x\n", fs32_to_cpu(sbi, sb->sb_free_inodes)); + pr_debug("num_blocks: %08x\n", fs32_to_cpu(sbi, sb->sb_num_blocks)); + pr_debug("free_blocks: %08x\n", fs32_to_cpu(sbi, sb->sb_free_blocks)); + pr_debug("inode_levels: %02x\n", sb->Inode.levels); } #endif @@ -283,8 +273,7 @@ static struct buffer_head *qnx6_check_first_superblock(struct super_block *s, sbi->s_bytesex = BYTESEX_BE; if (fs32_to_cpu(sbi, sb->sb_magic) == QNX6_SUPER_MAGIC) { /* we got a big endian fs */ - QNX6DEBUG((KERN_INFO "qnx6: fs got different" - " endianness.\n")); + pr_debug("fs got different endianness.\n"); return bh; } else sbi->s_bytesex = BYTESEX_LE; diff --git a/fs/qnx6/namei.c b/fs/qnx6/namei.c index 0561326..6c1a323 100644 --- a/fs/qnx6/namei.c +++ b/fs/qnx6/namei.c @@ -29,12 +29,12 @@ struct dentry *qnx6_lookup(struct inode *dir, struct dentry *dentry, foundinode = qnx6_iget(dir->i_sb, ino); qnx6_put_page(page); if (IS_ERR(foundinode)) { - QNX6DEBUG((KERN_ERR "qnx6: lookup->iget -> " - " error %ld\n", PTR_ERR(foundinode))); + pr_debug("lookup->iget -> error %ld\n", + PTR_ERR(foundinode)); return ERR_CAST(foundinode); } } else { - QNX6DEBUG((KERN_INFO "qnx6_lookup: not found %s\n", name)); + pr_debug("%s(): not found %s\n", __func__, name); return NULL; } d_add(dentry, foundinode); diff --git a/fs/qnx6/qnx6.h b/fs/qnx6/qnx6.h index b0aceda..d3fb2b6 100644 --- a/fs/qnx6/qnx6.h +++ b/fs/qnx6/qnx6.h @@ -25,12 +25,6 @@ typedef __u64 __bitwise __fs64; #include -#ifdef CONFIG_QNX6FS_DEBUG -#define QNX6DEBUG(X) printk X -#else -#define QNX6DEBUG(X) (void) 0 -#endif - struct qnx6_sb_info { struct buffer_head *sb_buf; /* superblock buffer */ struct qnx6_super_block *sb; /* our superblock */ -- cgit v0.10.2 From 4d4b866aee039d609c0b40e7e5b27204607ce614 Mon Sep 17 00:00:00 2001 From: Yinghai Lu Date: Fri, 8 Aug 2014 14:23:10 -0700 Subject: initrd: fix lz4 decompress with initrd During testing initrd (>2G) support, find decompress/lz4 does not work with initrd at all. decompress_* should support: 1. inbuf[]/outbuf[] for kernel preboot. 2. inbuf[]/flush() for initramfs 3. fill()/flush() for initrd. in the unlz4 does not handle case 3, as input len is passed as 0, and it failed in first try. Fix that add one extra if (fill) checking, and get out if EOF from the fill(). Signed-off-by: Yinghai Lu Cc: Kyungsik Lee Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c index 7d1e83c..3ad7f39 100644 --- a/lib/decompress_unlz4.c +++ b/lib/decompress_unlz4.c @@ -83,13 +83,20 @@ STATIC inline int INIT unlz4(u8 *input, int in_len, if (posp) *posp = 0; - if (fill) - fill(inp, 4); + if (fill) { + size = fill(inp, 4); + if (size < 4) { + error("data corrupted"); + goto exit_2; + } + } chunksize = get_unaligned_le32(inp); if (chunksize == ARCHIVE_MAGICNUMBER) { - inp += 4; - size -= 4; + if (!fill) { + inp += 4; + size -= 4; + } } else { error("invalid header"); goto exit_2; @@ -100,29 +107,44 @@ STATIC inline int INIT unlz4(u8 *input, int in_len, for (;;) { - if (fill) - fill(inp, 4); + if (fill) { + size = fill(inp, 4); + if (size == 0) + break; + if (size < 4) { + error("data corrupted"); + goto exit_2; + } + } chunksize = get_unaligned_le32(inp); if (chunksize == ARCHIVE_MAGICNUMBER) { - inp += 4; - size -= 4; + if (!fill) { + inp += 4; + size -= 4; + } if (posp) *posp += 4; continue; } - inp += 4; - size -= 4; + if (posp) *posp += 4; - if (fill) { + if (!fill) { + inp += 4; + size -= 4; + } else { if (chunksize > lz4_compressbound(uncomp_chunksize)) { error("chunk length is longer than allocated"); goto exit_2; } - fill(inp, chunksize); + size = fill(inp, chunksize); + if (size < chunksize) { + error("data corrupted"); + goto exit_2; + } } #ifdef PREBOOT if (out_len >= uncomp_chunksize) { @@ -149,18 +171,17 @@ STATIC inline int INIT unlz4(u8 *input, int in_len, if (posp) *posp += chunksize; - size -= chunksize; + if (!fill) { + size -= chunksize; - if (size == 0) - break; - else if (size < 0) { - error("data corrupted"); - goto exit_2; + if (size == 0) + break; + else if (size < 0) { + error("data corrupted"); + goto exit_2; + } + inp += chunksize; } - - inp += chunksize; - if (fill) - inp = inp_start; } ret = 0; -- cgit v0.10.2 From 38747439914c468ecba70b492b54dc4ef0b50453 Mon Sep 17 00:00:00 2001 From: Yinghai Lu Date: Fri, 8 Aug 2014 14:23:12 -0700 Subject: initramfs: support initrd that is bigger than 2GiB When initrd (compressed or not) is used, kernel report data corrupted with /dev/ram0. The root cause: During initramfs checking, if it is initrd, it will be transferred to /initrd.image with sys_write. sys_write only support 2G-4K write, so if the initrd ram is more than that, /initrd.image will not complete at all. Add local xwrite to loop calling sys_write to workaround the problem. Also need to use xwrite in write_buffer() to handle: image is uncompressed cpio and there is one big file (>2G) in it. unpack_to_rootfs ===> write_buffer ===> actions[]/do_copy At the same time, we don't need to worry about sys_read/sys_write in do_mounts_rd.c::crd_load. As decompressor will have fill/flush and local buffer that is smaller than 2G. Test with uncompressed initrd, and compressed ones with gz, bz2, lzma,xz, lzop. Signed-off-by: Yinghai Lu Acked-by: H. Peter Anvin Cc: Ingo Molnar Cc: Geert Uytterhoeven Cc: Tetsuo Handa Cc: "Daniel M. Weeks" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/init/initramfs.c b/init/initramfs.c index a8497fa..4f276b6 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -19,6 +19,29 @@ #include #include +static ssize_t __init xwrite(int fd, const char *p, size_t count) +{ + ssize_t out = 0; + + /* sys_write only can write MAX_RW_COUNT aka 2G-4K bytes at most */ + while (count) { + ssize_t rv = sys_write(fd, p, count); + + if (rv < 0) { + if (rv == -EINTR || rv == -EAGAIN) + continue; + return out ? out : rv; + } else if (rv == 0) + break; + + p += rv; + out += rv; + count -= rv; + } + + return out; +} + static __initdata char *message; static void __init error(char *x) { @@ -346,7 +369,7 @@ static int __init do_name(void) static int __init do_copy(void) { if (count >= body_len) { - sys_write(wfd, victim, body_len); + xwrite(wfd, victim, body_len); sys_close(wfd); do_utime(vcollected, mtime); kfree(vcollected); @@ -354,7 +377,7 @@ static int __init do_copy(void) state = SkipIt; return 0; } else { - sys_write(wfd, victim, count); + xwrite(wfd, victim, count); body_len -= count; eat(count); return 1; @@ -603,8 +626,13 @@ static int __init populate_rootfs(void) fd = sys_open("/initrd.image", O_WRONLY|O_CREAT, 0700); if (fd >= 0) { - sys_write(fd, (char *)initrd_start, - initrd_end - initrd_start); + ssize_t written = xwrite(fd, (char *)initrd_start, + initrd_end - initrd_start); + + if (written != initrd_end - initrd_start) + pr_err("/initrd.image: incomplete write (%zd != %ld)\n", + written, initrd_end - initrd_start); + sys_close(fd); free_initrd(); } -- cgit v0.10.2 From d97b07c54f34e88352ebe676beb798c8f59ac588 Mon Sep 17 00:00:00 2001 From: Yinghai Lu Date: Fri, 8 Aug 2014 14:23:14 -0700 Subject: initramfs: support initramfs that is bigger than 2GiB Now with 64bit bzImage and kexec tools, we support ramdisk that size is bigger than 2g, as we could put it above 4G. Found compressed initramfs image could not be decompressed properly. It turns out that image length is int during decompress detection, and it will become < 0 when length is more than 2G. Furthermore, during decompressing len as int is used for inbuf count, that has problem too. Change len to long, that should be ok as on 32 bit platform long is 32bits. Tested with following compressed initramfs image as root with kexec. gzip, bzip2, xz, lzma, lzop, lz4. run time for populate_rootfs(): size name Nehalem-EX Westmere-EX Ivybridge-EX 9034400256 root_img : 26s 24s 30s 3561095057 root_img.lz4 : 28s 27s 27s 3459554629 root_img.lzo : 29s 29s 28s 3219399480 root_img.gz : 64s 62s 49s 2251594592 root_img.xz : 262s 260s 183s 2226366598 root_img.lzma: 386s 376s 277s 2901482513 root_img.bz2 : 635s 599s Signed-off-by: Yinghai Lu Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Rashika Kheria Cc: Josh Triplett Cc: Kyungsik Lee Cc: P J P Cc: Al Viro Cc: Tetsuo Handa Cc: "Daniel M. Weeks" Cc: Alexandre Courbot Cc: Jan Beulich Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/crypto/zlib.c b/crypto/zlib.c index 06b62e5..c9ee681 100644 --- a/crypto/zlib.c +++ b/crypto/zlib.c @@ -168,7 +168,7 @@ static int zlib_compress_update(struct crypto_pcomp *tfm, } ret = req->avail_out - stream->avail_out; - pr_debug("avail_in %u, avail_out %u (consumed %u, produced %u)\n", + pr_debug("avail_in %lu, avail_out %lu (consumed %lu, produced %u)\n", stream->avail_in, stream->avail_out, req->avail_in - stream->avail_in, ret); req->next_in = stream->next_in; @@ -198,7 +198,7 @@ static int zlib_compress_final(struct crypto_pcomp *tfm, } ret = req->avail_out - stream->avail_out; - pr_debug("avail_in %u, avail_out %u (consumed %u, produced %u)\n", + pr_debug("avail_in %lu, avail_out %lu (consumed %lu, produced %u)\n", stream->avail_in, stream->avail_out, req->avail_in - stream->avail_in, ret); req->next_in = stream->next_in; @@ -283,7 +283,7 @@ static int zlib_decompress_update(struct crypto_pcomp *tfm, } ret = req->avail_out - stream->avail_out; - pr_debug("avail_in %u, avail_out %u (consumed %u, produced %u)\n", + pr_debug("avail_in %lu, avail_out %lu (consumed %lu, produced %u)\n", stream->avail_in, stream->avail_out, req->avail_in - stream->avail_in, ret); req->next_in = stream->next_in; @@ -331,7 +331,7 @@ static int zlib_decompress_final(struct crypto_pcomp *tfm, } ret = req->avail_out - stream->avail_out; - pr_debug("avail_in %u, avail_out %u (consumed %u, produced %u)\n", + pr_debug("avail_in %lu, avail_out %lu (consumed %lu, produced %u)\n", stream->avail_in, stream->avail_out, req->avail_in - stream->avail_in, ret); req->next_in = stream->next_in; diff --git a/fs/isofs/compress.c b/fs/isofs/compress.c index 592e511..f311bf0 100644 --- a/fs/isofs/compress.c +++ b/fs/isofs/compress.c @@ -158,8 +158,8 @@ static loff_t zisofs_uncompress_block(struct inode *inode, loff_t block_start, "zisofs: zisofs_inflate returned" " %d, inode = %lu," " page idx = %d, bh idx = %d," - " avail_in = %d," - " avail_out = %d\n", + " avail_in = %ld," + " avail_out = %ld\n", zerr, inode->i_ino, curpage, curbh, stream.avail_in, stream.avail_out); diff --git a/fs/jffs2/compr_zlib.c b/fs/jffs2/compr_zlib.c index 0b9a1e4..5698dae 100644 --- a/fs/jffs2/compr_zlib.c +++ b/fs/jffs2/compr_zlib.c @@ -94,11 +94,12 @@ static int jffs2_zlib_compress(unsigned char *data_in, while (def_strm.total_out < *dstlen - STREAM_END_SPACE && def_strm.total_in < *sourcelen) { def_strm.avail_out = *dstlen - (def_strm.total_out + STREAM_END_SPACE); - def_strm.avail_in = min((unsigned)(*sourcelen-def_strm.total_in), def_strm.avail_out); - jffs2_dbg(1, "calling deflate with avail_in %d, avail_out %d\n", + def_strm.avail_in = min_t(unsigned long, + (*sourcelen-def_strm.total_in), def_strm.avail_out); + jffs2_dbg(1, "calling deflate with avail_in %ld, avail_out %ld\n", def_strm.avail_in, def_strm.avail_out); ret = zlib_deflate(&def_strm, Z_PARTIAL_FLUSH); - jffs2_dbg(1, "deflate returned with avail_in %d, avail_out %d, total_in %ld, total_out %ld\n", + jffs2_dbg(1, "deflate returned with avail_in %ld, avail_out %ld, total_in %ld, total_out %ld\n", def_strm.avail_in, def_strm.avail_out, def_strm.total_in, def_strm.total_out); if (ret != Z_OK) { diff --git a/include/linux/decompress/bunzip2.h b/include/linux/decompress/bunzip2.h index 1152721..4d683df 100644 --- a/include/linux/decompress/bunzip2.h +++ b/include/linux/decompress/bunzip2.h @@ -1,10 +1,10 @@ #ifndef DECOMPRESS_BUNZIP2_H #define DECOMPRESS_BUNZIP2_H -int bunzip2(unsigned char *inbuf, int len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +int bunzip2(unsigned char *inbuf, long len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *output, - int *pos, + long *pos, void(*error)(char *x)); #endif diff --git a/include/linux/decompress/generic.h b/include/linux/decompress/generic.h index 0c7111a..1fcfd64 100644 --- a/include/linux/decompress/generic.h +++ b/include/linux/decompress/generic.h @@ -1,11 +1,11 @@ #ifndef DECOMPRESS_GENERIC_H #define DECOMPRESS_GENERIC_H -typedef int (*decompress_fn) (unsigned char *inbuf, int len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +typedef int (*decompress_fn) (unsigned char *inbuf, long len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *outbuf, - int *posp, + long *posp, void(*error)(char *x)); /* inbuf - input buffer @@ -33,7 +33,7 @@ typedef int (*decompress_fn) (unsigned char *inbuf, int len, /* Utility routine to detect the decompression method */ -decompress_fn decompress_method(const unsigned char *inbuf, int len, +decompress_fn decompress_method(const unsigned char *inbuf, long len, const char **name); #endif diff --git a/include/linux/decompress/inflate.h b/include/linux/decompress/inflate.h index 1d0aede..e4f411f 100644 --- a/include/linux/decompress/inflate.h +++ b/include/linux/decompress/inflate.h @@ -1,10 +1,10 @@ #ifndef LINUX_DECOMPRESS_INFLATE_H #define LINUX_DECOMPRESS_INFLATE_H -int gunzip(unsigned char *inbuf, int len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +int gunzip(unsigned char *inbuf, long len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *output, - int *pos, + long *pos, void(*error_fn)(char *x)); #endif diff --git a/include/linux/decompress/unlz4.h b/include/linux/decompress/unlz4.h index d5b68bf..3273c2f 100644 --- a/include/linux/decompress/unlz4.h +++ b/include/linux/decompress/unlz4.h @@ -1,10 +1,10 @@ #ifndef DECOMPRESS_UNLZ4_H #define DECOMPRESS_UNLZ4_H -int unlz4(unsigned char *inbuf, int len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +int unlz4(unsigned char *inbuf, long len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *output, - int *pos, + long *pos, void(*error)(char *x)); #endif diff --git a/include/linux/decompress/unlzma.h b/include/linux/decompress/unlzma.h index 7796538..8a891a1 100644 --- a/include/linux/decompress/unlzma.h +++ b/include/linux/decompress/unlzma.h @@ -1,11 +1,11 @@ #ifndef DECOMPRESS_UNLZMA_H #define DECOMPRESS_UNLZMA_H -int unlzma(unsigned char *, int, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +int unlzma(unsigned char *, long, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *output, - int *posp, + long *posp, void(*error)(char *x) ); diff --git a/include/linux/decompress/unlzo.h b/include/linux/decompress/unlzo.h index 9872297..af18f95 100644 --- a/include/linux/decompress/unlzo.h +++ b/include/linux/decompress/unlzo.h @@ -1,10 +1,10 @@ #ifndef DECOMPRESS_UNLZO_H #define DECOMPRESS_UNLZO_H -int unlzo(unsigned char *inbuf, int len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +int unlzo(unsigned char *inbuf, long len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *output, - int *pos, + long *pos, void(*error)(char *x)); #endif diff --git a/include/linux/decompress/unxz.h b/include/linux/decompress/unxz.h index 41728fc..f764e2a 100644 --- a/include/linux/decompress/unxz.h +++ b/include/linux/decompress/unxz.h @@ -10,10 +10,10 @@ #ifndef DECOMPRESS_UNXZ_H #define DECOMPRESS_UNXZ_H -int unxz(unsigned char *in, int in_size, - int (*fill)(void *dest, unsigned int size), - int (*flush)(void *src, unsigned int size), - unsigned char *out, int *in_used, +int unxz(unsigned char *in, long in_size, + long (*fill)(void *dest, unsigned long size), + long (*flush)(void *src, unsigned long size), + unsigned char *out, long *in_used, void (*error)(char *x)); #endif diff --git a/include/linux/zlib.h b/include/linux/zlib.h index 197abb2..92dbbd3 100644 --- a/include/linux/zlib.h +++ b/include/linux/zlib.h @@ -83,11 +83,11 @@ struct internal_state; typedef struct z_stream_s { const Byte *next_in; /* next input byte */ - uInt avail_in; /* number of bytes available at next_in */ + uLong avail_in; /* number of bytes available at next_in */ uLong total_in; /* total nb of input bytes read so far */ Byte *next_out; /* next output byte should be put there */ - uInt avail_out; /* remaining free space at next_out */ + uLong avail_out; /* remaining free space at next_out */ uLong total_out; /* total nb of bytes output so far */ char *msg; /* last error message, NULL if no error */ diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c index a822702..e5d059e 100644 --- a/init/do_mounts_rd.c +++ b/init/do_mounts_rd.c @@ -311,9 +311,9 @@ static int exit_code; static int decompress_error; static int crd_infd, crd_outfd; -static int __init compr_fill(void *buf, unsigned int len) +static long __init compr_fill(void *buf, unsigned long len) { - int r = sys_read(crd_infd, buf, len); + long r = sys_read(crd_infd, buf, len); if (r < 0) printk(KERN_ERR "RAMDISK: error while reading compressed data"); else if (r == 0) @@ -321,13 +321,13 @@ static int __init compr_fill(void *buf, unsigned int len) return r; } -static int __init compr_flush(void *window, unsigned int outcnt) +static long __init compr_flush(void *window, unsigned long outcnt) { - int written = sys_write(crd_outfd, window, outcnt); + long written = sys_write(crd_outfd, window, outcnt); if (written != outcnt) { if (decompress_error == 0) printk(KERN_ERR - "RAMDISK: incomplete write (%d != %d)\n", + "RAMDISK: incomplete write (%ld != %ld)\n", written, outcnt); decompress_error = 1; return -1; diff --git a/init/initramfs.c b/init/initramfs.c index 4f276b6..a756603 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -197,7 +197,7 @@ static __initdata enum state { } state, next_state; static __initdata char *victim; -static __initdata unsigned count; +static unsigned long count __initdata; static __initdata loff_t this_header, next_header; static inline void __init eat(unsigned n) @@ -209,7 +209,7 @@ static inline void __init eat(unsigned n) static __initdata char *vcollected; static __initdata char *collected; -static __initdata int remains; +static long remains __initdata; static __initdata char *collect; static void __init read_into(char *buf, unsigned size, enum state next) @@ -236,7 +236,7 @@ static int __init do_start(void) static int __init do_collect(void) { - unsigned n = remains; + unsigned long n = remains; if (count < n) n = count; memcpy(collect, victim, n); @@ -407,7 +407,7 @@ static __initdata int (*actions[])(void) = { [Reset] = do_reset, }; -static int __init write_buffer(char *buf, unsigned len) +static long __init write_buffer(char *buf, unsigned long len) { count = len; victim = buf; @@ -417,11 +417,11 @@ static int __init write_buffer(char *buf, unsigned len) return len - count; } -static int __init flush_buffer(void *bufv, unsigned len) +static long __init flush_buffer(void *bufv, unsigned long len) { char *buf = (char *) bufv; - int written; - int origLen = len; + long written; + long origLen = len; if (message) return -1; while ((written = write_buffer(buf, len)) < len && !message) { @@ -440,13 +440,13 @@ static int __init flush_buffer(void *bufv, unsigned len) return origLen; } -static unsigned my_inptr; /* index of next byte to be processed in inbuf */ +static unsigned long my_inptr; /* index of next byte to be processed in inbuf */ #include -static char * __init unpack_to_rootfs(char *buf, unsigned len) +static char * __init unpack_to_rootfs(char *buf, unsigned long len) { - int written, res; + long written; decompress_fn decompress; const char *compress_name; static __initdata char msg_buf[64]; @@ -480,7 +480,7 @@ static char * __init unpack_to_rootfs(char *buf, unsigned len) decompress = decompress_method(buf, len, &compress_name); pr_debug("Detected %s compressed data\n", compress_name); if (decompress) { - res = decompress(buf, len, NULL, flush_buffer, NULL, + int res = decompress(buf, len, NULL, flush_buffer, NULL, &my_inptr, error); if (res) error("decompressor failed"); diff --git a/lib/decompress.c b/lib/decompress.c index 86069d74..37f3c78 100644 --- a/lib/decompress.c +++ b/lib/decompress.c @@ -54,7 +54,7 @@ static const struct compress_format compressed_formats[] __initconst = { { {0, 0}, NULL, NULL } }; -decompress_fn __init decompress_method(const unsigned char *inbuf, int len, +decompress_fn __init decompress_method(const unsigned char *inbuf, long len, const char **name) { const struct compress_format *cf; diff --git a/lib/decompress_bunzip2.c b/lib/decompress_bunzip2.c index 31c5f76..8290e0b 100644 --- a/lib/decompress_bunzip2.c +++ b/lib/decompress_bunzip2.c @@ -92,8 +92,8 @@ struct bunzip_data { /* State for interrupting output loop */ int writeCopies, writePos, writeRunCountdown, writeCount, writeCurrent; /* I/O tracking data (file handles, buffers, positions, etc.) */ - int (*fill)(void*, unsigned int); - int inbufCount, inbufPos /*, outbufPos*/; + long (*fill)(void*, unsigned long); + long inbufCount, inbufPos /*, outbufPos*/; unsigned char *inbuf /*,*outbuf*/; unsigned int inbufBitCount, inbufBits; /* The CRC values stored in the block header and calculated from the @@ -617,7 +617,7 @@ decode_next_byte: goto decode_next_byte; } -static int INIT nofill(void *buf, unsigned int len) +static long INIT nofill(void *buf, unsigned long len) { return -1; } @@ -625,8 +625,8 @@ static int INIT nofill(void *buf, unsigned int len) /* Allocate the structure, read file header. If in_fd ==-1, inbuf must contain a complete bunzip file (len bytes long). If in_fd!=-1, inbuf and len are ignored, and data is read from file handle into temporary buffer. */ -static int INIT start_bunzip(struct bunzip_data **bdp, void *inbuf, int len, - int (*fill)(void*, unsigned int)) +static int INIT start_bunzip(struct bunzip_data **bdp, void *inbuf, long len, + long (*fill)(void*, unsigned long)) { struct bunzip_data *bd; unsigned int i, j, c; @@ -675,11 +675,11 @@ static int INIT start_bunzip(struct bunzip_data **bdp, void *inbuf, int len, /* Example usage: decompress src_fd to dst_fd. (Stops at end of bzip2 data, not end of file.) */ -STATIC int INIT bunzip2(unsigned char *buf, int len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +STATIC int INIT bunzip2(unsigned char *buf, long len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *outbuf, - int *pos, + long *pos, void(*error)(char *x)) { struct bunzip_data *bd; @@ -743,11 +743,11 @@ exit_0: } #ifdef PREBOOT -STATIC int INIT decompress(unsigned char *buf, int len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +STATIC int INIT decompress(unsigned char *buf, long len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *outbuf, - int *pos, + long *pos, void(*error)(char *x)) { return bunzip2(buf, len - 4, fill, flush, outbuf, pos, error); diff --git a/lib/decompress_inflate.c b/lib/decompress_inflate.c index 0edfd74..d4c7891 100644 --- a/lib/decompress_inflate.c +++ b/lib/decompress_inflate.c @@ -27,17 +27,17 @@ #define GZIP_IOBUF_SIZE (16*1024) -static int INIT nofill(void *buffer, unsigned int len) +static long INIT nofill(void *buffer, unsigned long len) { return -1; } /* Included from initramfs et al code */ -STATIC int INIT gunzip(unsigned char *buf, int len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +STATIC int INIT gunzip(unsigned char *buf, long len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *out_buf, - int *pos, + long *pos, void(*error)(char *x)) { u8 *zbuf; struct z_stream_s *strm; @@ -142,7 +142,7 @@ STATIC int INIT gunzip(unsigned char *buf, int len, /* Write any data generated */ if (flush && strm->next_out > out_buf) { - int l = strm->next_out - out_buf; + long l = strm->next_out - out_buf; if (l != flush(out_buf, l)) { rc = -1; error("write error"); diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c index 3ad7f39..40f66eb 100644 --- a/lib/decompress_unlz4.c +++ b/lib/decompress_unlz4.c @@ -31,10 +31,10 @@ #define LZ4_DEFAULT_UNCOMPRESSED_CHUNK_SIZE (8 << 20) #define ARCHIVE_MAGICNUMBER 0x184C2102 -STATIC inline int INIT unlz4(u8 *input, int in_len, - int (*fill) (void *, unsigned int), - int (*flush) (void *, unsigned int), - u8 *output, int *posp, +STATIC inline int INIT unlz4(u8 *input, long in_len, + long (*fill)(void *, unsigned long), + long (*flush)(void *, unsigned long), + u8 *output, long *posp, void (*error) (char *x)) { int ret = -1; @@ -43,7 +43,7 @@ STATIC inline int INIT unlz4(u8 *input, int in_len, u8 *inp; u8 *inp_start; u8 *outp; - int size = in_len; + long size = in_len; #ifdef PREBOOT size_t out_len = get_unaligned_le32(input + in_len); #endif @@ -196,11 +196,11 @@ exit_0: } #ifdef PREBOOT -STATIC int INIT decompress(unsigned char *buf, int in_len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +STATIC int INIT decompress(unsigned char *buf, long in_len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *output, - int *posp, + long *posp, void(*error)(char *x) ) { diff --git a/lib/decompress_unlzma.c b/lib/decompress_unlzma.c index 32adb73..0be83af 100644 --- a/lib/decompress_unlzma.c +++ b/lib/decompress_unlzma.c @@ -65,11 +65,11 @@ static long long INIT read_int(unsigned char *ptr, int size) #define LZMA_IOBUF_SIZE 0x10000 struct rc { - int (*fill)(void*, unsigned int); + long (*fill)(void*, unsigned long); uint8_t *ptr; uint8_t *buffer; uint8_t *buffer_end; - int buffer_size; + long buffer_size; uint32_t code; uint32_t range; uint32_t bound; @@ -82,7 +82,7 @@ struct rc { #define RC_MODEL_TOTAL_BITS 11 -static int INIT nofill(void *buffer, unsigned int len) +static long INIT nofill(void *buffer, unsigned long len) { return -1; } @@ -99,8 +99,8 @@ static void INIT rc_read(struct rc *rc) /* Called once */ static inline void INIT rc_init(struct rc *rc, - int (*fill)(void*, unsigned int), - char *buffer, int buffer_size) + long (*fill)(void*, unsigned long), + char *buffer, long buffer_size) { if (fill) rc->fill = fill; @@ -280,7 +280,7 @@ struct writer { size_t buffer_pos; int bufsize; size_t global_pos; - int(*flush)(void*, unsigned int); + long (*flush)(void*, unsigned long); struct lzma_header *header; }; @@ -534,11 +534,11 @@ static inline int INIT process_bit1(struct writer *wr, struct rc *rc, -STATIC inline int INIT unlzma(unsigned char *buf, int in_len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +STATIC inline int INIT unlzma(unsigned char *buf, long in_len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *output, - int *posp, + long *posp, void(*error)(char *x) ) { @@ -667,11 +667,11 @@ exit_0: } #ifdef PREBOOT -STATIC int INIT decompress(unsigned char *buf, int in_len, - int(*fill)(void*, unsigned int), - int(*flush)(void*, unsigned int), +STATIC int INIT decompress(unsigned char *buf, long in_len, + long (*fill)(void*, unsigned long), + long (*flush)(void*, unsigned long), unsigned char *output, - int *posp, + long *posp, void(*error)(char *x) ) { diff --git a/lib/decompress_unlzo.c b/lib/decompress_unlzo.c index 960183d..b94a31b 100644 --- a/lib/decompress_unlzo.c +++ b/lib/decompress_unlzo.c @@ -51,7 +51,7 @@ static const unsigned char lzop_magic[] = { #define HEADER_SIZE_MIN (9 + 7 + 4 + 8 + 1 + 4) #define HEADER_SIZE_MAX (9 + 7 + 1 + 8 + 8 + 4 + 1 + 255 + 4) -STATIC inline int INIT parse_header(u8 *input, int *skip, int in_len) +STATIC inline long INIT parse_header(u8 *input, long *skip, long in_len) { int l; u8 *parse = input; @@ -108,14 +108,14 @@ STATIC inline int INIT parse_header(u8 *input, int *skip, int in_len) return 1; } -STATIC inline int INIT unlzo(u8 *input, int in_len, - int (*fill) (void *, unsigned int), - int (*flush) (void *, unsigned int), - u8 *output, int *posp, +STATIC int INIT unlzo(u8 *input, long in_len, + long (*fill)(void *, unsigned long), + long (*flush)(void *, unsigned long), + u8 *output, long *posp, void (*error) (char *x)) { u8 r = 0; - int skip = 0; + long skip = 0; u32 src_len, dst_len; size_t tmp; u8 *in_buf, *in_buf_save, *out_buf; diff --git a/lib/decompress_unxz.c b/lib/decompress_unxz.c index 9f34eb5..b07a783 100644 --- a/lib/decompress_unxz.c +++ b/lib/decompress_unxz.c @@ -248,10 +248,10 @@ void *memmove(void *dest, const void *src, size_t size) * both input and output buffers are available as a single chunk, i.e. when * fill() and flush() won't be used. */ -STATIC int INIT unxz(unsigned char *in, int in_size, - int (*fill)(void *dest, unsigned int size), - int (*flush)(void *src, unsigned int size), - unsigned char *out, int *in_used, +STATIC int INIT unxz(unsigned char *in, long in_size, + long (*fill)(void *dest, unsigned long size), + long (*flush)(void *src, unsigned long size), + unsigned char *out, long *in_used, void (*error)(char *x)) { struct xz_buf b; @@ -329,7 +329,7 @@ STATIC int INIT unxz(unsigned char *in, int in_size, * returned by xz_dec_run(), but probably * it's not too bad. */ - if (flush(b.out, b.out_pos) != (int)b.out_pos) + if (flush(b.out, b.out_pos) != (long)b.out_pos) ret = XZ_BUF_ERROR; b.out_pos = 0; -- cgit v0.10.2 From 9687fd9101afaa1c4b1de7ffd2f9d7e53f45b29f Mon Sep 17 00:00:00 2001 From: David Engraf Date: Fri, 8 Aug 2014 14:23:16 -0700 Subject: initramfs: add write error checks On a system with low memory extracting the initramfs may fail. If this happens the user gets "Failed to execute /init" instead of an initramfs error. Check return value of sys_write and call error() when the write was incomplete or failed. Signed-off-by: David Engraf Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/init/initramfs.c b/init/initramfs.c index a756603..bece48c 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -369,7 +369,8 @@ static int __init do_name(void) static int __init do_copy(void) { if (count >= body_len) { - xwrite(wfd, victim, body_len); + if (xwrite(wfd, victim, body_len) != body_len) + error("write error"); sys_close(wfd); do_utime(vcollected, mtime); kfree(vcollected); @@ -377,7 +378,8 @@ static int __init do_copy(void) state = SkipIt; return 0; } else { - xwrite(wfd, victim, count); + if (xwrite(wfd, victim, count) != count) + error("write error"); body_len -= count; eat(count); return 1; -- cgit v0.10.2 From ab602f799159393143d567e5c04b936fec79d6bd Mon Sep 17 00:00:00 2001 From: Jack Miller Date: Fri, 8 Aug 2014 14:23:19 -0700 Subject: shm: make exit_shm work proportional to task activity This is small set of patches our team has had kicking around for a few versions internally that fixes tasks getting hung on shm_exit when there are many threads hammering it at once. Anton wrote a simple test to cause the issue: http://ozlabs.org/~anton/junkcode/bust_shm_exit.c Before applying this patchset, this test code will cause either hanging tracebacks or pthread out of memory errors. After this patchset, it will still produce output like: root@somehost:~# ./bust_shm_exit 1024 160 ... INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113) INFO: Stall ended before state dump start ... But the task will continue to run along happily, so we consider this an improvement over hanging, even if it's a bit noisy. This patch (of 3): exit_shm obtains the ipc_ns shm rwsem for write and holds it while it walks every shared memory segment in the namespace. Thus the amount of work is related to the number of shm segments in the namespace not the number of segments that might need to be cleaned. In addition, this occurs after the task has been notified the thread has exited, so the number of tasks waiting for the ns shm rwsem can grow without bound until memory is exausted. Add a list to the task struct of all shmids allocated by this task. Init the list head in copy_process. Use the ns->rwsem for locking. Add segments after id is added, remove before removing from id. On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar to handling of semaphore undo. I chose a define for the init sequence since its a simple list init, otherwise it would require a function call to avoid include loops between the semaphore code and the task struct. Converting the list_del to list_del_init for the unshare cases would remove the exit followed by init, but I left it blow up if not inited. Signed-off-by: Milton Miller Signed-off-by: Jack Miller Cc: Davidlohr Bueso Cc: Manfred Spraul Cc: Anton Blanchard Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/sched.h b/include/linux/sched.h index b21e921..db2f647 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -33,6 +33,7 @@ struct sched_param { #include #include +#include #include #include #include @@ -1385,6 +1386,7 @@ struct task_struct { #ifdef CONFIG_SYSVIPC /* ipc stuff */ struct sysv_sem sysvsem; + struct sysv_shm sysvshm; #endif #ifdef CONFIG_DETECT_HUNG_TASK /* hung task detection */ diff --git a/include/linux/shm.h b/include/linux/shm.h index 57d7770..fd20638 100644 --- a/include/linux/shm.h +++ b/include/linux/shm.h @@ -1,6 +1,7 @@ #ifndef _LINUX_SHM_H_ #define _LINUX_SHM_H_ +#include #include #include #include @@ -20,6 +21,7 @@ struct shmid_kernel /* private to the kernel */ /* The task created the shm object. NULL if the task is dead. */ struct task_struct *shm_creator; + struct list_head shm_clist; /* list by creator */ }; /* shm_mode upper byte flags */ @@ -44,11 +46,20 @@ struct shmid_kernel /* private to the kernel */ #define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT) #ifdef CONFIG_SYSVIPC +struct sysv_shm { + struct list_head shm_clist; +}; + long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr, unsigned long shmlba); extern int is_file_shm_hugepages(struct file *file); -extern void exit_shm(struct task_struct *task); +void exit_shm(struct task_struct *task); +#define shm_init_task(task) INIT_LIST_HEAD(&(task)->sysvshm.shm_clist) #else +struct sysv_shm { + /* empty */ +}; + static inline long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr, unsigned long shmlba) @@ -62,6 +73,9 @@ static inline int is_file_shm_hugepages(struct file *file) static inline void exit_shm(struct task_struct *task) { } +static inline void shm_init_task(struct task_struct *task) +{ +} #endif #endif /* _LINUX_SHM_H_ */ diff --git a/ipc/shm.c b/ipc/shm.c index 89fc354..1fc3a61 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -178,6 +178,7 @@ static void shm_rcu_free(struct rcu_head *head) static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s) { + list_del(&s->shm_clist); ipc_rmid(&shm_ids(ns), &s->shm_perm); } @@ -268,14 +269,10 @@ static void shm_close(struct vm_area_struct *vma) } /* Called with ns->shm_ids(ns).rwsem locked */ -static int shm_try_destroy_current(int id, void *p, void *data) +static void shm_mark_orphan(struct shmid_kernel *shp, struct ipc_namespace *ns) { - struct ipc_namespace *ns = data; - struct kern_ipc_perm *ipcp = p; - struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm); - - if (shp->shm_creator != current) - return 0; + if (WARN_ON(shp->shm_creator != current)) /* Remove me when it works */ + return; /* * Mark it as orphaned to destroy the segment when @@ -289,13 +286,12 @@ static int shm_try_destroy_current(int id, void *p, void *data) * is not set, it shouldn't be deleted here. */ if (!ns->shm_rmid_forced) - return 0; + return; if (shm_may_destroy(ns, shp)) { shm_lock_by_ptr(shp); shm_destroy(ns, shp); } - return 0; } /* Called with ns->shm_ids(ns).rwsem locked */ @@ -333,14 +329,17 @@ void shm_destroy_orphaned(struct ipc_namespace *ns) void exit_shm(struct task_struct *task) { struct ipc_namespace *ns = task->nsproxy->ipc_ns; + struct shmid_kernel *shp, *n; if (shm_ids(ns).in_use == 0) return; /* Destroy all already created segments, but not mapped yet */ down_write(&shm_ids(ns).rwsem); - if (shm_ids(ns).in_use) - idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns); + list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) + shm_mark_orphan(shp, ns); + /* remove the list head from any segments still attached */ + list_del(&task->sysvshm.shm_clist); up_write(&shm_ids(ns).rwsem); } @@ -561,6 +560,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params) shp->shm_nattch = 0; shp->shm_file = file; shp->shm_creator = current; + list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist); /* * shmid gets reported as "inode#" in /proc/pid/maps. diff --git a/kernel/fork.c b/kernel/fork.c index 86da59e..fa91243 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1362,6 +1362,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, if (retval) goto bad_fork_cleanup_policy; /* copy all the process information */ + shm_init_task(p); retval = copy_semundo(clone_flags, p); if (retval) goto bad_fork_cleanup_audit; @@ -1913,6 +1914,11 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) */ exit_sem(current); } + if (unshare_flags & CLONE_NEWIPC) { + /* Orphan segments in old ns (see sem above). */ + exit_shm(current); + shm_init_task(current); + } if (new_nsproxy) switch_task_namespaces(current, new_nsproxy); -- cgit v0.10.2 From 83293c0f5a6130bf7d60b7b406f4a4de0cadd3d8 Mon Sep 17 00:00:00 2001 From: Jack Miller Date: Fri, 8 Aug 2014 14:23:21 -0700 Subject: shm: allow exit_shm in parallel if only marking orphans If shm_rmid_force (the default state) is not set then the shmids are only marked as orphaned and does not require any add, delete, or locking of the tree structure. Seperate the sysctl on and off case, and only obtain the read lock. The newly added list head can be deleted under the read lock because we are only called with current and will only change the semids allocated by this task and not manipulate the list. This commit assumes that up_read includes a sufficient memory barrier for the writes to be seen my others that later obtain a write lock. Signed-off-by: Milton Miller Signed-off-by: Jack Miller Cc: Davidlohr Bueso Cc: Manfred Spraul Cc: Anton Blanchard Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/ipc/shm.c b/ipc/shm.c index 1fc3a61..7fc9f9f 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -269,32 +269,6 @@ static void shm_close(struct vm_area_struct *vma) } /* Called with ns->shm_ids(ns).rwsem locked */ -static void shm_mark_orphan(struct shmid_kernel *shp, struct ipc_namespace *ns) -{ - if (WARN_ON(shp->shm_creator != current)) /* Remove me when it works */ - return; - - /* - * Mark it as orphaned to destroy the segment when - * kernel.shm_rmid_forced is changed. - * It is noop if the following shm_may_destroy() returns true. - */ - shp->shm_creator = NULL; - - /* - * Don't even try to destroy it. If shm_rmid_forced=0 and IPC_RMID - * is not set, it shouldn't be deleted here. - */ - if (!ns->shm_rmid_forced) - return; - - if (shm_may_destroy(ns, shp)) { - shm_lock_by_ptr(shp); - shm_destroy(ns, shp); - } -} - -/* Called with ns->shm_ids(ns).rwsem locked */ static int shm_try_destroy_orphaned(int id, void *p, void *data) { struct ipc_namespace *ns = data; @@ -325,20 +299,49 @@ void shm_destroy_orphaned(struct ipc_namespace *ns) up_write(&shm_ids(ns).rwsem); } - +/* Locking assumes this will only be called with task == current */ void exit_shm(struct task_struct *task) { struct ipc_namespace *ns = task->nsproxy->ipc_ns; struct shmid_kernel *shp, *n; - if (shm_ids(ns).in_use == 0) + if (list_empty(&task->sysvshm.shm_clist)) + return; + + /* + * If kernel.shm_rmid_forced is not set then only keep track of + * which shmids are orphaned, so that a later set of the sysctl + * can clean them up. + */ + if (!ns->shm_rmid_forced) { + down_read(&shm_ids(ns).rwsem); + list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist) + shp->shm_creator = NULL; + /* + * Only under read lock but we are only called on current + * so no entry on the list will be shared. + */ + list_del(&task->sysvshm.shm_clist); + up_read(&shm_ids(ns).rwsem); return; + } - /* Destroy all already created segments, but not mapped yet */ + /* + * Destroy all already created segments, that were not yet mapped, + * and mark any mapped as orphan to cover the sysctl toggling. + * Destroy is skipped if shm_may_destroy() returns false. + */ down_write(&shm_ids(ns).rwsem); - list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) - shm_mark_orphan(shp, ns); - /* remove the list head from any segments still attached */ + list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) { + shp->shm_creator = NULL; + + if (shm_may_destroy(ns, shp)) { + shm_lock_by_ptr(shp); + shm_destroy(ns, shp); + } + } + + /* Remove the list head from any segments still attached. */ list_del(&task->sysvshm.shm_clist); up_write(&shm_ids(ns).rwsem); } -- cgit v0.10.2 From 2f137d66fb65ef41df6e558f23d481f07394a424 Mon Sep 17 00:00:00 2001 From: Jack Miller Date: Fri, 8 Aug 2014 14:23:23 -0700 Subject: shm: remove unneeded extern for function A small cleanup while changing adjacent code. Extern is not needed for functions and only one declaration had it so remove it from the odd line. Signed-off-by: Milton Miller Signed-off-by: Jack Miller Cc: Davidlohr Bueso Cc: Manfred Spraul Cc: Anton Blanchard Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/shm.h b/include/linux/shm.h index fd20638..6fb8016 100644 --- a/include/linux/shm.h +++ b/include/linux/shm.h @@ -52,7 +52,7 @@ struct sysv_shm { long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr, unsigned long shmlba); -extern int is_file_shm_hugepages(struct file *file); +int is_file_shm_hugepages(struct file *file); void exit_shm(struct task_struct *task); #define shm_init_task(task) INIT_LIST_HEAD(&(task)->sysvshm.shm_clist) #else -- cgit v0.10.2 From 308c09f17da4adc53935115dbeb5bce4f067d8f9 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Fri, 8 Aug 2014 14:23:25 -0700 Subject: lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig Rather than have architectures #define ARCH_HAS_SG_CHAIN in an architecture specific scatterlist.h, make it a proper Kconfig option and use that instead. At same time, remove the header files are are now mostly useless and just include asm-generic/scatterlist.h. [sfr@canb.auug.org.au: powerpc files now need asm/dma.h] Signed-off-by: Laura Abbott Acked-by: Thomas Gleixner [x86] Acked-by: Benjamin Herrenschmidt [powerpc] Acked-by: Heiko Carstens Cc: Russell King Cc: Tony Luck Cc: Fenghua Yu Cc: Paul Mackerras Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: "James E.J. Bottomley" Cc: Martin Schwidefsky Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index d31c500..8e9dbcb 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -83,6 +83,7 @@ config ARM . config ARM_HAS_SG_CHAIN + select ARCH_HAS_SG_CHAIN bool config NEED_SG_DMA_LENGTH diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index f5a3576..70cd84e 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -22,6 +22,7 @@ generic-y += poll.h generic-y += preempt.h generic-y += resource.h generic-y += rwsem.h +generic-y += scatterlist.h generic-y += sections.h generic-y += segment.h generic-y += sembuf.h diff --git a/arch/arm/include/asm/scatterlist.h b/arch/arm/include/asm/scatterlist.h deleted file mode 100644 index cefdb8f..0000000 --- a/arch/arm/include/asm/scatterlist.h +++ /dev/null @@ -1,12 +0,0 @@ -#ifndef _ASMARM_SCATTERLIST_H -#define _ASMARM_SCATTERLIST_H - -#ifdef CONFIG_ARM_HAS_SG_CHAIN -#define ARCH_HAS_SG_CHAIN -#endif - -#include -#include -#include - -#endif /* _ASMARM_SCATTERLIST_H */ diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index b0f9c9d..fd4e81a 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1,6 +1,7 @@ config ARM64 def_bool y select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE + select ARCH_HAS_SG_CHAIN select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_USE_CMPXCHG_LOCKREF select ARCH_SUPPORTS_ATOMIC_RMW diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 44a6915..c84c88b 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -28,6 +28,7 @@ config IA64 select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP select HAVE_VIRT_CPU_ACCOUNTING + select ARCH_HAS_SG_CHAIN select VIRT_TO_BUS select ARCH_DISCARD_MEMBLOCK select GENERIC_IRQ_PROBE diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild index 0da4aa2..e8317d2 100644 --- a/arch/ia64/include/asm/Kbuild +++ b/arch/ia64/include/asm/Kbuild @@ -5,5 +5,6 @@ generic-y += hash.h generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h generic-y += vtime.h diff --git a/arch/ia64/include/asm/scatterlist.h b/arch/ia64/include/asm/scatterlist.h deleted file mode 100644 index 08fd93b..0000000 --- a/arch/ia64/include/asm/scatterlist.h +++ /dev/null @@ -1,7 +0,0 @@ -#ifndef _ASM_IA64_SCATTERLIST_H -#define _ASM_IA64_SCATTERLIST_H - -#include -#define ARCH_HAS_SG_CHAIN - -#endif /* _ASM_IA64_SCATTERLIST_H */ diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 80b94b0..4bc7b62 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -111,6 +111,7 @@ config PPC select HAVE_DMA_API_DEBUG select HAVE_OPROFILE select HAVE_DEBUG_KMEMLEAK + select ARCH_HAS_SG_CHAIN select GENERIC_ATOMIC64 if PPC32 select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select HAVE_PERF_EVENTS diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild index 3fb1bc4..7f23f16 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -4,5 +4,6 @@ generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h generic-y += rwsem.h +generic-y += scatterlist.h generic-y += trace_clock.h generic-y += vtime.h diff --git a/arch/powerpc/include/asm/scatterlist.h b/arch/powerpc/include/asm/scatterlist.h deleted file mode 100644 index de1f620..0000000 --- a/arch/powerpc/include/asm/scatterlist.h +++ /dev/null @@ -1,17 +0,0 @@ -#ifndef _ASM_POWERPC_SCATTERLIST_H -#define _ASM_POWERPC_SCATTERLIST_H -/* - * Copyright (C) 2001 PPC64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include - -#define ARCH_HAS_SG_CHAIN - -#endif /* _ASM_POWERPC_SCATTERLIST_H */ diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 7b6c107..d85e86a 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -33,6 +33,7 @@ #include #include +#include #include "mmu_decl.h" diff --git a/arch/powerpc/platforms/44x/warp.c b/arch/powerpc/platforms/44x/warp.c index 534574a..3a10428 100644 --- a/arch/powerpc/platforms/44x/warp.c +++ b/arch/powerpc/platforms/44x/warp.c @@ -25,6 +25,7 @@ #include #include #include +#include static __initdata struct of_device_id warp_of_bus[] = { diff --git a/arch/powerpc/platforms/52xx/efika.c b/arch/powerpc/platforms/52xx/efika.c index 6e19b0a..3feffde 100644 --- a/arch/powerpc/platforms/52xx/efika.c +++ b/arch/powerpc/platforms/52xx/efika.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/powerpc/platforms/amigaone/setup.c b/arch/powerpc/platforms/amigaone/setup.c index 03aabc0..2fe1204 100644 --- a/arch/powerpc/platforms/amigaone/setup.c +++ b/arch/powerpc/platforms/amigaone/setup.c @@ -24,6 +24,7 @@ #include #include #include +#include extern void __flush_disable_L1(void); diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 8ca60f8..05c78bb 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -145,6 +145,7 @@ config S390 select TTY select VIRT_CPU_ACCOUNTING select VIRT_TO_BUS + select ARCH_HAS_SG_CHAIN config SCHED_OMIT_FRAME_POINTER def_bool y diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild index 57892a8..b3fea07 100644 --- a/arch/s390/include/asm/Kbuild +++ b/arch/s390/include/asm/Kbuild @@ -4,4 +4,5 @@ generic-y += clkdev.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/s390/include/asm/scatterlist.h b/arch/s390/include/asm/scatterlist.h deleted file mode 100644 index 6d45ef6..0000000 --- a/arch/s390/include/asm/scatterlist.h +++ /dev/null @@ -1,3 +0,0 @@ -#include - -#define ARCH_HAS_SG_CHAIN diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 4692c90..a537816 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -42,6 +42,7 @@ config SPARC select MODULES_USE_ELF_RELA select ODD_RT_SIGACTION select OLD_SIGSUSPEND + select ARCH_HAS_SG_CHAIN config SPARC32 def_bool !64BIT diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild index a458218..cdd1b44 100644 --- a/arch/sparc/include/asm/Kbuild +++ b/arch/sparc/include/asm/Kbuild @@ -15,6 +15,7 @@ generic-y += mcs_spinlock.h generic-y += module.h generic-y += mutex.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += serial.h generic-y += trace_clock.h generic-y += types.h diff --git a/arch/sparc/include/asm/scatterlist.h b/arch/sparc/include/asm/scatterlist.h deleted file mode 100644 index 92bb638..0000000 --- a/arch/sparc/include/asm/scatterlist.h +++ /dev/null @@ -1,8 +0,0 @@ -#ifndef _SPARC_SCATTERLIST_H -#define _SPARC_SCATTERLIST_H - -#include - -#define ARCH_HAS_SG_CHAIN - -#endif /* !(_SPARC_SCATTERLIST_H) */ diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild index a5e4b60..7bd64aa 100644 --- a/arch/um/include/asm/Kbuild +++ b/arch/um/include/asm/Kbuild @@ -21,6 +21,7 @@ generic-y += param.h generic-y += pci.h generic-y += percpu.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += sections.h generic-y += switch_to.h generic-y += topology.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index bf24050..c915cc6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -96,6 +96,7 @@ config X86 select IRQ_FORCED_THREADING select HAVE_BPF_JIT if X86_64 select HAVE_ARCH_TRANSPARENT_HUGEPAGE + select ARCH_HAS_SG_CHAIN select CLKEVT_I8253 select ARCH_HAVE_NMI_SAFE_CMPXCHG select GENERIC_IOMAP diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild index 3ca9762..3bf000f 100644 --- a/arch/x86/include/asm/Kbuild +++ b/arch/x86/include/asm/Kbuild @@ -5,6 +5,7 @@ genhdr-y += unistd_64.h genhdr-y += unistd_x32.h generic-y += clkdev.h -generic-y += early_ioremap.h generic-y += cputime.h +generic-y += early_ioremap.h generic-y += mcs_spinlock.h +generic-y += scatterlist.h diff --git a/arch/x86/include/asm/scatterlist.h b/arch/x86/include/asm/scatterlist.h deleted file mode 100644 index 4240878..0000000 --- a/arch/x86/include/asm/scatterlist.h +++ /dev/null @@ -1,8 +0,0 @@ -#ifndef _ASM_X86_SCATTERLIST_H -#define _ASM_X86_SCATTERLIST_H - -#include - -#define ARCH_HAS_SG_CHAIN - -#endif /* _ASM_X86_SCATTERLIST_H */ diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index f4ec8bb..ed8f9e7 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -136,7 +136,7 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf, static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents, struct scatterlist *sgl) { -#ifndef ARCH_HAS_SG_CHAIN +#ifndef CONFIG_ARCH_HAS_SG_CHAIN BUG(); #endif diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h index e6df23c..261e708 100644 --- a/include/scsi/scsi.h +++ b/include/scsi/scsi.h @@ -31,7 +31,7 @@ enum scsi_timeouts { * Like SCSI_MAX_SG_SEGMENTS, but for archs that have sg chaining. This limit * is totally arbitrary, a setting of 2048 will get you at least 8mb ios. */ -#ifdef ARCH_HAS_SG_CHAIN +#ifdef CONFIG_ARCH_HAS_SG_CHAIN #define SCSI_MAX_SG_CHAIN_SEGMENTS 2048 #else #define SCSI_MAX_SG_CHAIN_SEGMENTS SCSI_MAX_SG_SEGMENTS diff --git a/lib/Kconfig b/lib/Kconfig index df87265..a5ce0c7 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -508,4 +508,11 @@ config UCS2_STRING source "lib/fonts/Kconfig" +# +# sg chaining option +# + +config ARCH_HAS_SG_CHAIN + def_bool n + endmenu diff --git a/lib/scatterlist.c b/lib/scatterlist.c index b4415fc..9cdf62f 100644 --- a/lib/scatterlist.c +++ b/lib/scatterlist.c @@ -73,7 +73,7 @@ EXPORT_SYMBOL(sg_nents); **/ struct scatterlist *sg_last(struct scatterlist *sgl, unsigned int nents) { -#ifndef ARCH_HAS_SG_CHAIN +#ifndef CONFIG_ARCH_HAS_SG_CHAIN struct scatterlist *ret = &sgl[nents - 1]; #else struct scatterlist *sg, *ret = NULL; @@ -255,7 +255,7 @@ int __sg_alloc_table(struct sg_table *table, unsigned int nents, if (nents == 0) return -EINVAL; -#ifndef ARCH_HAS_SG_CHAIN +#ifndef CONFIG_ARCH_HAS_SG_CHAIN if (WARN_ON_ONCE(nents > max_ents)) return -EINVAL; #endif -- cgit v0.10.2 From 56106da7932074804f9ef70191f126ea15dc3fc4 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Fri, 8 Aug 2014 14:23:27 -0700 Subject: lib/scatterlist: clean up useless architecture versions of scatterlist.h There's no need to have an architecture version of scatterlist.h if the only thing the file does is include asm-generic/scatterlist.h. Switch to the asm-generic versions directly. Acked-by: Jesper Nilsson Acked-by: David Howells Signed-off-by: Laura Abbott Cc: Lennox Wu Cc: Chen Liqin , Cc: Koichi Yasutake Cc: Michal Simek Cc: Hirokazu Takata Cc: Mikael Starvik Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Heiko Carstens Cc: Russell King Cc: Tony Luck Cc: Fenghua Yu Cc: Paul Mackerras Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: "James E.J. Bottomley" Cc: Martin Schwidefsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index 96e54be..e858aa0 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -6,4 +6,5 @@ generic-y += exec.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/alpha/include/asm/scatterlist.h b/arch/alpha/include/asm/scatterlist.h deleted file mode 100644 index 017d747..0000000 --- a/arch/alpha/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ALPHA_SCATTERLIST_H -#define _ALPHA_SCATTERLIST_H - -#include - -#endif /* !(_ALPHA_SCATTERLIST_H) */ diff --git a/arch/cris/include/asm/Kbuild b/arch/cris/include/asm/Kbuild index afff510..31742df 100644 --- a/arch/cris/include/asm/Kbuild +++ b/arch/cris/include/asm/Kbuild @@ -13,6 +13,7 @@ generic-y += linkage.h generic-y += mcs_spinlock.h generic-y += module.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h generic-y += vga.h generic-y += xor.h diff --git a/arch/cris/include/asm/scatterlist.h b/arch/cris/include/asm/scatterlist.h deleted file mode 100644 index f11f8f4..0000000 --- a/arch/cris/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_CRIS_SCATTERLIST_H -#define __ASM_CRIS_SCATTERLIST_H - -#include - -#endif /* !(__ASM_CRIS_SCATTERLIST_H) */ diff --git a/arch/frv/include/asm/Kbuild b/arch/frv/include/asm/Kbuild index 87b95eb..5b73921 100644 --- a/arch/frv/include/asm/Kbuild +++ b/arch/frv/include/asm/Kbuild @@ -5,4 +5,5 @@ generic-y += exec.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/frv/include/asm/scatterlist.h b/arch/frv/include/asm/scatterlist.h deleted file mode 100644 index 0e5eb30..0000000 --- a/arch/frv/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_SCATTERLIST_H -#define _ASM_SCATTERLIST_H - -#include - -#endif /* !_ASM_SCATTERLIST_H */ diff --git a/arch/m32r/include/asm/Kbuild b/arch/m32r/include/asm/Kbuild index 67779a7..accc10a 100644 --- a/arch/m32r/include/asm/Kbuild +++ b/arch/m32r/include/asm/Kbuild @@ -6,4 +6,5 @@ generic-y += hash.h generic-y += mcs_spinlock.h generic-y += module.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/m32r/include/asm/scatterlist.h b/arch/m32r/include/asm/scatterlist.h deleted file mode 100644 index 7370b8b..0000000 --- a/arch/m32r/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_M32R_SCATTERLIST_H -#define _ASM_M32R_SCATTERLIST_H - -#include - -#endif /* _ASM_M32R_SCATTERLIST_H */ diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild index 35b3ecaf..27a3acd 100644 --- a/arch/microblaze/include/asm/Kbuild +++ b/arch/microblaze/include/asm/Kbuild @@ -7,5 +7,6 @@ generic-y += exec.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += syscalls.h generic-y += trace_clock.h diff --git a/arch/microblaze/include/asm/scatterlist.h b/arch/microblaze/include/asm/scatterlist.h deleted file mode 100644 index 35d786f..0000000 --- a/arch/microblaze/include/asm/scatterlist.h +++ /dev/null @@ -1 +0,0 @@ -#include diff --git a/arch/mn10300/include/asm/Kbuild b/arch/mn10300/include/asm/Kbuild index 654d5ba..ecbd667 100644 --- a/arch/mn10300/include/asm/Kbuild +++ b/arch/mn10300/include/asm/Kbuild @@ -6,4 +6,5 @@ generic-y += exec.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/mn10300/include/asm/scatterlist.h b/arch/mn10300/include/asm/scatterlist.h deleted file mode 100644 index 7baa400..0000000 --- a/arch/mn10300/include/asm/scatterlist.h +++ /dev/null @@ -1,16 +0,0 @@ -/* MN10300 Scatterlist definitions - * - * Copyright (C) 2007 Matsushita Electric Industrial Co., Ltd. - * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public Licence - * as published by the Free Software Foundation; either version - * 2 of the Licence, or (at your option) any later version. - */ -#ifndef _ASM_SCATTERLIST_H -#define _ASM_SCATTERLIST_H - -#include - -#endif /* _ASM_SCATTERLIST_H */ diff --git a/arch/score/include/asm/Kbuild b/arch/score/include/asm/Kbuild index 2f947ab..aad2091 100644 --- a/arch/score/include/asm/Kbuild +++ b/arch/score/include/asm/Kbuild @@ -8,5 +8,6 @@ generic-y += cputime.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h generic-y += xor.h diff --git a/arch/score/include/asm/scatterlist.h b/arch/score/include/asm/scatterlist.h deleted file mode 100644 index 9f533b8..0000000 --- a/arch/score/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_SCORE_SCATTERLIST_H -#define _ASM_SCORE_SCATTERLIST_H - -#include - -#endif /* _ASM_SCORE_SCATTERLIST_H */ -- cgit v0.10.2 From 791dfeb49519f18a6efea4619f5678457f1532ff Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:23:29 -0700 Subject: scripts/coccinelle/free: add NULL test before freeing functions Warns or generates patch for NULL check before the following functions: kfree usb_free_urb debugfs_remove debugfs_remove_recursive Signed-off-by: Fabian Frederick Acked-by: Julia Lawall Cc: Gilles Muller Cc: Joe Perches Cc: Markus Elfring Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/coccinelle/free/ifnullfree.cocci b/scripts/coccinelle/free/ifnullfree.cocci new file mode 100644 index 0000000..77ab43f --- /dev/null +++ b/scripts/coccinelle/free/ifnullfree.cocci @@ -0,0 +1,52 @@ +/// NULL check before some freeing functions is not needed. +/// +/// Based on checkpatch warning +/// "kfree(NULL) is safe this check is probably not required" +/// and kfreeaddr.cocci by Julia Lawall. +/// +/// Comments: - +/// Options: --no-includes --include-headers + +virtual patch +virtual org +virtual report +virtual context + +@r2 depends on patch@ +expression E; +@@ +- if (E) +( +- kfree(E); ++ kfree(E); +| +- debugfs_remove(E); ++ debugfs_remove(E); +| +- debugfs_remove_recursive(E); ++ debugfs_remove_recursive(E); +| +- usb_free_urb(E); ++ usb_free_urb(E); +) + +@r depends on context || report || org @ +expression E; +position p; +@@ + +* if (E) +* \(kfree@p\|debugfs_remove@p\|debugfs_remove_recursive@p\|usb_free_urb\)(E); + +@script:python depends on org@ +p << r.p; +@@ + +cocci.print_main("NULL check before that freeing function is not needed", p) + +@script:python depends on report@ +p << r.p; +@@ + +msg = "WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values." +coccilib.report.print_report(p[0], msg) -- cgit v0.10.2 From 45715f33d447ba878a3ca81673514785bb4dfe5a Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:23:31 -0700 Subject: scripts/coccinelle/free/ifnullfree.cocci: add copyright information All coccinelle scripts have a copyright in the header. Signed-off-by: Fabian Frederick Suggested-by: Julia Lawall Acked-by: Julia Lawall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/coccinelle/free/ifnullfree.cocci b/scripts/coccinelle/free/ifnullfree.cocci index 77ab43f..a42d70b 100644 --- a/scripts/coccinelle/free/ifnullfree.cocci +++ b/scripts/coccinelle/free/ifnullfree.cocci @@ -4,8 +4,9 @@ /// "kfree(NULL) is safe this check is probably not required" /// and kfreeaddr.cocci by Julia Lawall. /// -/// Comments: - -/// Options: --no-includes --include-headers +// Copyright: (C) 2014 Fabian Frederick. GPLv2. +// Comments: - +// Options: --no-includes --include-headers virtual patch virtual org -- cgit v0.10.2 From c965526a82a62a97363ea7f3caae0e4a8a809bb9 Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Fri, 8 Aug 2014 14:23:33 -0700 Subject: scripts/tags.sh: include compat_sys_* symbols in the generated tags Since the kernel now has a COMPAT_SYSCALL infrastructure via commit 468366138850 ("COMPAT_SYSCALL_DEFINE: infrastructure"), add the corresponding regex for generating compat_sys_* symbols in the tags files (similar to sys_*). Signed-off-by: Catalin Marinas Cc: Alexander Viro Cc: Michal Marek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/tags.sh b/scripts/tags.sh index e6b011f..cbfd269 100755 --- a/scripts/tags.sh +++ b/scripts/tags.sh @@ -168,6 +168,7 @@ exuberant() --extra=+f --c-kinds=+px \ --regex-asm='/^(ENTRY|_GLOBAL)\(([^)]*)\).*/\2/' \ --regex-c='/^SYSCALL_DEFINE[[:digit:]]?\(([^,)]*).*/sys_\1/' \ + --regex-c='/^COMPAT_SYSCALL_DEFINE[[:digit:]]?\(([^,)]*).*/compat_sys_\1/' \ --regex-c++='/^TRACE_EVENT\(([^,)]*).*/trace_\1/' \ --regex-c++='/^DEFINE_EVENT\([^,)]*, *([^,)]*).*/trace_\1/' \ --regex-c++='/PAGEFLAG\(([^,)]*).*/Page\1/' \ @@ -231,6 +232,7 @@ emacs() all_target_sources | xargs $1 -a \ --regex='/^\(ENTRY\|_GLOBAL\)(\([^)]*\)).*/\2/' \ --regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/' \ + --regex='/^COMPAT_SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/compat_sys_\1/' \ --regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/' \ --regex='/^DEFINE_EVENT([^,)]*, *\([^,)]*\).*/trace_\1/' \ --regex='/PAGEFLAG(\([^,)]*\).*/Page\1/' \ -- cgit v0.10.2 From fda9f9903be6c3b590472c175c514b0834bb3c83 Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Fri, 8 Aug 2014 14:23:35 -0700 Subject: scripts/checkstack.pl: automatically handle 32-bit and 64-bit mode for ARCH=x86 This patch adds support for ARCH=x86 into checkstack. Commit ffee0de411fd ("x86: Default to ARCH=x86 to avoid overriding CONFIG_64BIT") had merged ARCH=i386 and ARCH=x86_64 into one ARCH=x86. checkstack.pl searches patterns of machine instructions which are usually used for allocating stack frames. checkstalk.pl needs either i386 or x86_64, x86 isn't enough: $ make checkstack objdump -d vmlinux $(find . -name '*.ko') | \ perl linux/scripts/checkstack.pl x86 wrong or unknown architecture "x86" Signed-off-by: Konstantin Khlebnikov Cc: David Woodhouse Cc: "H. Peter Anvin" Cc: Richard Weinberger Cc: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl index c05d586..899b423 100755 --- a/scripts/checkstack.pl +++ b/scripts/checkstack.pl @@ -52,14 +52,12 @@ my (@stack, $re, $dre, $x, $xs, $funcre); #8000008a: 20 1d sub sp,4 #80000ca8: fa cd 05 b0 sub sp,sp,1456 $re = qr/^.*sub.*sp.*,([0-9]{1,8})/o; - } elsif ($arch =~ /^i[3456]86$/) { + } elsif ($arch =~ /^x86(_64)?$/ || $arch =~ /^i[3456]86$/) { #c0105234: 81 ec ac 05 00 00 sub $0x5ac,%esp - $re = qr/^.*[as][du][db] \$(0x$x{1,8}),\%esp$/o; - $dre = qr/^.*[as][du][db] (%.*),\%esp$/o; - } elsif ($arch eq 'x86_64') { - # 2f60: 48 81 ec e8 05 00 00 sub $0x5e8,%rsp - $re = qr/^.*[as][du][db] \$(0x$x{1,8}),\%rsp$/o; - $dre = qr/^.*[as][du][db] (\%.*),\%rsp$/o; + # or + # 2f60: 48 81 ec e8 05 00 00 sub $0x5e8,%rsp + $re = qr/^.*[as][du][db] \$(0x$x{1,8}),\%(e|r)sp$/o; + $dre = qr/^.*[as][du][db] (%.*),\%(e|r)sp$/o; } elsif ($arch eq 'ia64') { #e0000000044011fc: 01 0f fc 8c adds r12=-384,r12 $re = qr/.*adds.*r12=-(([0-9]{2}|[3-9])[0-9]{2}),r12/o; -- cgit v0.10.2 From e0d9bf4cc0888befd00b1a7db383681be68aada9 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:23:37 -0700 Subject: fs/dlm/debug_fs.c: remove unnecessary null test before debugfs_remove This fixes checkpatch warning: WARNING: debugfs_remove(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick Cc: Christine Caulfield Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c index 8d77ba7..1323c56 100644 --- a/fs/dlm/debug_fs.c +++ b/fs/dlm/debug_fs.c @@ -718,16 +718,11 @@ static const struct file_operations waiters_fops = { void dlm_delete_debug_file(struct dlm_ls *ls) { - if (ls->ls_debug_rsb_dentry) - debugfs_remove(ls->ls_debug_rsb_dentry); - if (ls->ls_debug_waiters_dentry) - debugfs_remove(ls->ls_debug_waiters_dentry); - if (ls->ls_debug_locks_dentry) - debugfs_remove(ls->ls_debug_locks_dentry); - if (ls->ls_debug_all_dentry) - debugfs_remove(ls->ls_debug_all_dentry); - if (ls->ls_debug_toss_dentry) - debugfs_remove(ls->ls_debug_toss_dentry); + debugfs_remove(ls->ls_debug_rsb_dentry); + debugfs_remove(ls->ls_debug_waiters_dentry); + debugfs_remove(ls->ls_debug_locks_dentry); + debugfs_remove(ls->ls_debug_all_dentry); + debugfs_remove(ls->ls_debug_toss_dentry); } int dlm_create_debug_file(struct dlm_ls *ls) -- cgit v0.10.2 From a6c19dfe39941a5d3f4d072121c0a4841e7e26fd Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Fri, 8 Aug 2014 14:23:40 -0700 Subject: arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area The core mm code will provide a default gate area based on FIXADDR_USER_START and FIXADDR_USER_END if !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR). This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit UML, and x86_32 have their own code just to disable it. arm, 32-bit UML, and x86_64 have gate areas, but they have their own implementations. This gets rid of the default and moves the code into ia64. This should save some code on architectures without a gate area: it's now possible to inline the gate_area functions in the default case. Signed-off-by: Andy Lutomirski Acked-by: Nathan Lynch Acked-by: H. Peter Anvin Acked-by: Benjamin Herrenschmidt [in principle] Acked-by: Richard Weinberger [for um] Acked-by: Will Deacon [for arm64] Cc: Catalin Marinas Cc: Will Deacon Cc: Tony Luck Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Chris Metcalf Cc: Jeff Dike Cc: Richard Weinberger Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Nathan Lynch Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h index 7a3f462..22b1623 100644 --- a/arch/arm64/include/asm/page.h +++ b/arch/arm64/include/asm/page.h @@ -28,9 +28,6 @@ #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) -/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */ -#define __HAVE_ARCH_GATE_AREA 1 - /* * The idmap and swapper page tables need some space reserved in the kernel * image. Both require pgd, pud (4 levels only) and pmd tables to (section) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index a81a446..32aeea0 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -195,25 +195,6 @@ up_fail: } /* - * We define AT_SYSINFO_EHDR, so we need these function stubs to keep - * Linux happy. - */ -int in_gate_area_no_mm(unsigned long addr) -{ - return 0; -} - -int in_gate_area(struct mm_struct *mm, unsigned long addr) -{ - return 0; -} - -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ - return NULL; -} - -/* * Update the vDSO data page to keep in sync with kernel timekeeping. */ void update_vsyscall(struct timekeeper *tk) diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h index f1e1b2e..1f1bf14 100644 --- a/arch/ia64/include/asm/page.h +++ b/arch/ia64/include/asm/page.h @@ -231,4 +231,6 @@ get_order (unsigned long size) #define PERCPU_ADDR (-PERCPU_PAGE_SIZE) #define LOAD_OFFSET (KERNEL_START - KERNEL_TR_PAGE_SIZE) +#define __HAVE_ARCH_GATE_AREA 1 + #endif /* _ASM_IA64_PAGE_H */ diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 892d43e..6b33457 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -278,6 +278,37 @@ setup_gate (void) ia64_patch_gate(); } +static struct vm_area_struct gate_vma; + +static int __init gate_vma_init(void) +{ + gate_vma.vm_mm = NULL; + gate_vma.vm_start = FIXADDR_USER_START; + gate_vma.vm_end = FIXADDR_USER_END; + gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC; + gate_vma.vm_page_prot = __P101; + + return 0; +} +__initcall(gate_vma_init); + +struct vm_area_struct *get_gate_vma(struct mm_struct *mm) +{ + return &gate_vma; +} + +int in_gate_area_no_mm(unsigned long addr) +{ + if ((addr >= FIXADDR_USER_START) && (addr < FIXADDR_USER_END)) + return 1; + return 0; +} + +int in_gate_area(struct mm_struct *mm, unsigned long addr) +{ + return in_gate_area_no_mm(addr); +} + void ia64_mmu_init(void *my_cpu_data) { unsigned long pta, impl_va_bits; diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h index 32e4e21..26fe1ae 100644 --- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -48,9 +48,6 @@ extern unsigned int HPAGE_SHIFT; #define HUGE_MAX_HSTATE (MMU_PAGE_COUNT-1) #endif -/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */ -#define __HAVE_ARCH_GATE_AREA 1 - /* * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So if we * assign PAGE_MASK to a larger type it gets extended the way we want diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index ce74c33..f174351 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -840,19 +840,3 @@ static int __init vdso_init(void) return 0; } arch_initcall(vdso_init); - -int in_gate_area_no_mm(unsigned long addr) -{ - return 0; -} - -int in_gate_area(struct mm_struct *mm, unsigned long addr) -{ - return 0; -} - -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ - return NULL; -} - diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h index 114258e..7b2ac6e 100644 --- a/arch/s390/include/asm/page.h +++ b/arch/s390/include/asm/page.h @@ -162,6 +162,4 @@ static inline int devmem_is_allowed(unsigned long pfn) #include #include -#define __HAVE_ARCH_GATE_AREA 1 - #endif /* _S390_PAGE_H */ diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c index 6136490..0bbb7e0 100644 --- a/arch/s390/kernel/vdso.c +++ b/arch/s390/kernel/vdso.c @@ -316,18 +316,3 @@ static int __init vdso_init(void) return 0; } early_initcall(vdso_init); - -int in_gate_area_no_mm(unsigned long addr) -{ - return 0; -} - -int in_gate_area(struct mm_struct *mm, unsigned long addr) -{ - return 0; -} - -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ - return NULL; -} diff --git a/arch/sh/include/asm/page.h b/arch/sh/include/asm/page.h index 15d9703..fe20d14 100644 --- a/arch/sh/include/asm/page.h +++ b/arch/sh/include/asm/page.h @@ -186,11 +186,6 @@ typedef struct page *pgtable_t; #include #include -/* vDSO support */ -#ifdef CONFIG_VSYSCALL -#define __HAVE_ARCH_GATE_AREA -#endif - /* * Some drivers need to perform DMA into kmalloc'ed buffers * and so we have to increase the kmalloc minalign for this. diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c index 5ca5797..ea2aa13 100644 --- a/arch/sh/kernel/vsyscall/vsyscall.c +++ b/arch/sh/kernel/vsyscall/vsyscall.c @@ -92,18 +92,3 @@ const char *arch_vma_name(struct vm_area_struct *vma) return NULL; } - -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ - return NULL; -} - -int in_gate_area(struct mm_struct *mm, unsigned long address) -{ - return 0; -} - -int in_gate_area_no_mm(unsigned long address) -{ - return 0; -} diff --git a/arch/tile/include/asm/page.h b/arch/tile/include/asm/page.h index 6727680..a213a8d 100644 --- a/arch/tile/include/asm/page.h +++ b/arch/tile/include/asm/page.h @@ -39,12 +39,6 @@ #define HPAGE_MASK (~(HPAGE_SIZE - 1)) /* - * We do define AT_SYSINFO_EHDR to support vDSO, - * but don't use the gate mechanism. - */ -#define __HAVE_ARCH_GATE_AREA 1 - -/* * If the Kconfig doesn't specify, set a maximum zone order that * is enough so that we can create huge pages from small pages given * the respective sizes of the two page types. See . diff --git a/arch/tile/kernel/vdso.c b/arch/tile/kernel/vdso.c index 1533af2..5bc51d7 100644 --- a/arch/tile/kernel/vdso.c +++ b/arch/tile/kernel/vdso.c @@ -121,21 +121,6 @@ const char *arch_vma_name(struct vm_area_struct *vma) return NULL; } -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ - return NULL; -} - -int in_gate_area(struct mm_struct *mm, unsigned long address) -{ - return 0; -} - -int in_gate_area_no_mm(unsigned long address) -{ - return 0; -} - int setup_vdso_pages(void) { struct page **pagelist; diff --git a/arch/um/include/asm/page.h b/arch/um/include/asm/page.h index 5ff53d9..71c5d13 100644 --- a/arch/um/include/asm/page.h +++ b/arch/um/include/asm/page.h @@ -119,4 +119,9 @@ extern unsigned long uml_physmem; #include #endif /* __ASSEMBLY__ */ + +#ifdef CONFIG_X86_32 +#define __HAVE_ARCH_GATE_AREA 1 +#endif + #endif /* __UM_PAGE_H */ diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 775873d..802dde3 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -70,7 +70,6 @@ extern bool __virt_addr_valid(unsigned long kaddr); #include #include -#define __HAVE_ARCH_GATE_AREA 1 #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA #endif /* __KERNEL__ */ diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h index 0f1ddee..f408caf 100644 --- a/arch/x86/include/asm/page_64.h +++ b/arch/x86/include/asm/page_64.h @@ -39,4 +39,6 @@ void copy_page(void *to, void *from); #endif /* !__ASSEMBLY__ */ +#define __HAVE_ARCH_GATE_AREA 1 + #endif /* _ASM_X86_PAGE_64_H */ diff --git a/arch/x86/um/asm/elf.h b/arch/x86/um/asm/elf.h index 0feee2f..25a1022 100644 --- a/arch/x86/um/asm/elf.h +++ b/arch/x86/um/asm/elf.h @@ -216,6 +216,5 @@ extern long elf_aux_hwcap; #define ELF_HWCAP (elf_aux_hwcap) #define SET_PERSONALITY(ex) do ; while(0) -#define __HAVE_ARCH_GATE_AREA 1 #endif diff --git a/arch/x86/um/mem_64.c b/arch/x86/um/mem_64.c index c6492e7..f8fecad 100644 --- a/arch/x86/um/mem_64.c +++ b/arch/x86/um/mem_64.c @@ -9,18 +9,3 @@ const char *arch_vma_name(struct vm_area_struct *vma) return NULL; } - -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ - return NULL; -} - -int in_gate_area(struct mm_struct *mm, unsigned long addr) -{ - return 0; -} - -int in_gate_area_no_mm(unsigned long addr) -{ - return 0; -} diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c index e4f7781..e904c27 100644 --- a/arch/x86/vdso/vdso32-setup.c +++ b/arch/x86/vdso/vdso32-setup.c @@ -115,23 +115,6 @@ static __init int ia32_binfmt_init(void) return 0; } __initcall(ia32_binfmt_init); -#endif - -#else /* CONFIG_X86_32 */ - -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ - return NULL; -} - -int in_gate_area(struct mm_struct *mm, unsigned long addr) -{ - return 0; -} - -int in_gate_area_no_mm(unsigned long addr) -{ - return 0; -} +#endif /* CONFIG_SYSCTL */ #endif /* CONFIG_X86_64 */ diff --git a/include/linux/mm.h b/include/linux/mm.h index e03dd29..8981cc8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2014,13 +2014,20 @@ static inline bool kernel_page_present(struct page *page) { return true; } #endif /* CONFIG_HIBERNATION */ #endif +#ifdef __HAVE_ARCH_GATE_AREA extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm); -#ifdef __HAVE_ARCH_GATE_AREA -int in_gate_area_no_mm(unsigned long addr); -int in_gate_area(struct mm_struct *mm, unsigned long addr); +extern int in_gate_area_no_mm(unsigned long addr); +extern int in_gate_area(struct mm_struct *mm, unsigned long addr); #else -int in_gate_area_no_mm(unsigned long addr); -#define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_mm(addr);}) +static inline struct vm_area_struct *get_gate_vma(struct mm_struct *mm) +{ + return NULL; +} +static inline int in_gate_area_no_mm(unsigned long addr) { return 0; } +static inline int in_gate_area(struct mm_struct *mm, unsigned long addr) +{ + return 0; +} #endif /* __HAVE_ARCH_GATE_AREA */ #ifdef CONFIG_SYSCTL diff --git a/mm/memory.c b/mm/memory.c index 2a899e4..ab3537b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3430,44 +3430,6 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address) } #endif /* __PAGETABLE_PMD_FOLDED */ -#if !defined(__HAVE_ARCH_GATE_AREA) - -#if defined(AT_SYSINFO_EHDR) -static struct vm_area_struct gate_vma; - -static int __init gate_vma_init(void) -{ - gate_vma.vm_mm = NULL; - gate_vma.vm_start = FIXADDR_USER_START; - gate_vma.vm_end = FIXADDR_USER_END; - gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC; - gate_vma.vm_page_prot = __P101; - - return 0; -} -__initcall(gate_vma_init); -#endif - -struct vm_area_struct *get_gate_vma(struct mm_struct *mm) -{ -#ifdef AT_SYSINFO_EHDR - return &gate_vma; -#else - return NULL; -#endif -} - -int in_gate_area_no_mm(unsigned long addr) -{ -#ifdef AT_SYSINFO_EHDR - if ((addr >= FIXADDR_USER_START) && (addr < FIXADDR_USER_END)) - return 1; -#endif - return 0; -} - -#endif /* __HAVE_ARCH_GATE_AREA */ - static int __follow_pte(struct mm_struct *mm, unsigned long address, pte_t **ptepp, spinlock_t **ptlp) { diff --git a/mm/nommu.c b/mm/nommu.c index 4a852f6..a881d96 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1981,11 +1981,6 @@ error: return -ENOMEM; } -int in_gate_area_no_mm(unsigned long addr) -{ - return 0; -} - int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) { BUG(); -- cgit v0.10.2 From 934fc295b30ea8ce5d5e0ab9024a10fab9b6f200 Mon Sep 17 00:00:00 2001 From: Ionut Alexa Date: Fri, 8 Aug 2014 14:23:42 -0700 Subject: kernel/acct.c: fix coding style warnings and errors Signed-off-by: Ionut Alexa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/acct.c b/kernel/acct.c index a1844f1..5179352 100644 --- a/kernel/acct.c +++ b/kernel/acct.c @@ -141,12 +141,12 @@ static int check_free_space(struct bsd_acct_struct *acct, struct file *file) if (acct->active) { if (act < 0) { acct->active = 0; - printk(KERN_INFO "Process accounting paused\n"); + pr_info("Process accounting paused\n"); } } else { if (act > 0) { acct->active = 1; - printk(KERN_INFO "Process accounting resumed\n"); + pr_info("Process accounting resumed\n"); } } @@ -261,6 +261,7 @@ SYSCALL_DEFINE1(acct, const char __user *, name) if (name) { struct filename *tmp = getname(name); + if (IS_ERR(tmp)) return PTR_ERR(tmp); error = acct_on(tmp); @@ -376,7 +377,7 @@ static comp_t encode_comp_t(unsigned long value) return exp; } -#if ACCT_VERSION==1 || ACCT_VERSION==2 +#if ACCT_VERSION == 1 || ACCT_VERSION == 2 /* * encode an u64 into a comp2_t (24 bits) * @@ -389,7 +390,7 @@ static comp_t encode_comp_t(unsigned long value) #define MANTSIZE2 20 /* 20 bit mantissa. */ #define EXPSIZE2 5 /* 5 bit base 2 exponent. */ #define MAXFRACT2 ((1ul << MANTSIZE2) - 1) /* Maximum fractional value. */ -#define MAXEXP2 ((1 < 0){ + if (value == 0) + return 0; + while ((s64)value > 0) { value <<= 1; exp--; } @@ -486,16 +488,17 @@ static void do_acct_process(struct bsd_acct_struct *acct, run_time -= current->group_leader->start_time; /* convert nsec -> AHZ */ elapsed = nsec_to_AHZ(run_time); -#if ACCT_VERSION==3 +#if ACCT_VERSION == 3 ac.ac_etime = encode_float(elapsed); #else ac.ac_etime = encode_comp_t(elapsed < (unsigned long) -1l ? - (unsigned long) elapsed : (unsigned long) -1l); + (unsigned long) elapsed : (unsigned long) -1l); #endif -#if ACCT_VERSION==1 || ACCT_VERSION==2 +#if ACCT_VERSION == 1 || ACCT_VERSION == 2 { /* new enlarged etime field */ comp2_t etime = encode_comp2_t(elapsed); + ac.ac_etime_hi = etime >> 16; ac.ac_etime_lo = (u16) etime; } @@ -505,15 +508,15 @@ static void do_acct_process(struct bsd_acct_struct *acct, /* we really need to bite the bullet and change layout */ ac.ac_uid = from_kuid_munged(file->f_cred->user_ns, orig_cred->uid); ac.ac_gid = from_kgid_munged(file->f_cred->user_ns, orig_cred->gid); -#if ACCT_VERSION==2 +#if ACCT_VERSION == 2 ac.ac_ahz = AHZ; #endif -#if ACCT_VERSION==1 || ACCT_VERSION==2 +#if ACCT_VERSION == 1 || ACCT_VERSION == 2 /* backward-compatible 16 bit fields */ ac.ac_uid16 = ac.ac_uid; ac.ac_gid16 = ac.ac_gid; #endif -#if ACCT_VERSION==3 +#if ACCT_VERSION == 3 ac.ac_pid = task_tgid_nr_ns(current, ns); rcu_read_lock(); ac.ac_ppid = task_tgid_nr_ns(rcu_dereference(current->real_parent), ns); @@ -574,6 +577,7 @@ void acct_collect(long exitcode, int group_dead) if (group_dead && current->mm) { struct vm_area_struct *vma; + down_read(¤t->mm->mmap_sem); vma = current->mm->mmap; while (vma) { -- cgit v0.10.2 From dd4d9fecbeba893e9ce2488e4d619e5397a2712a Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Fri, 8 Aug 2014 14:23:44 -0700 Subject: init/main.c: code clean-up Fixing some checkpatch warnings(remove global initialization, move __initdata, coalesce formats ...) Signed-off-by: Fabian Frederick Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/init/main.c b/init/main.c index e8ae1fe..bb1aed9 100644 --- a/init/main.c +++ b/init/main.c @@ -6,7 +6,7 @@ * GK 2/5/95 - Changed to support mounting root fs via NFS * Added initrd & change_root: Werner Almesberger & Hans Lermen, Feb '96 * Moan early if gcc is old, avoiding bogus kernels - Paul Gortmaker, May '96 - * Simplified starting of init: Michael A. Griffith + * Simplified starting of init: Michael A. Griffith */ #define DEBUG /* Enable initcall_debug */ @@ -136,7 +136,7 @@ static char *ramdisk_execute_command; * Used to generate warnings if static_key manipulation functions are used * before jump_label_init is called. */ -bool static_key_initialized __read_mostly = false; +bool static_key_initialized __read_mostly; EXPORT_SYMBOL_GPL(static_key_initialized); /* @@ -159,8 +159,8 @@ static int __init set_reset_devices(char *str) __setup("reset_devices", set_reset_devices); -static const char * argv_init[MAX_INIT_ARGS+2] = { "init", NULL, }; -const char * envp_init[MAX_INIT_ENVS+2] = { "HOME=/", "TERM=linux", NULL, }; +static const char *argv_init[MAX_INIT_ARGS+2] = { "init", NULL, }; +const char *envp_init[MAX_INIT_ENVS+2] = { "HOME=/", "TERM=linux", NULL, }; static const char *panic_later, *panic_param; extern const struct obs_kernel_param __setup_start[], __setup_end[]; @@ -199,7 +199,6 @@ static int __init obsolete_checksetup(char *line) * still work even if initially too large, it will just take slightly longer */ unsigned long loops_per_jiffy = (1<<12); - EXPORT_SYMBOL(loops_per_jiffy); static int __init debug_kernel(char *str) @@ -376,8 +375,8 @@ static void __init setup_command_line(char *command_line) initcall_command_line = memblock_virt_alloc(strlen(boot_command_line) + 1, 0); static_command_line = memblock_virt_alloc(strlen(command_line) + 1, 0); - strcpy (saved_command_line, boot_command_line); - strcpy (static_command_line, command_line); + strcpy(saved_command_line, boot_command_line); + strcpy(static_command_line, command_line); } /* @@ -445,8 +444,8 @@ void __init parse_early_options(char *cmdline) /* Arch code calls this early on, or if not, just before other parsing. */ void __init parse_early_param(void) { - static __initdata int done = 0; - static __initdata char tmp_cmdline[COMMAND_LINE_SIZE]; + static int done __initdata; + static char tmp_cmdline[COMMAND_LINE_SIZE] __initdata; if (done) return; @@ -500,7 +499,8 @@ static void __init mm_init(void) asmlinkage __visible void __init start_kernel(void) { - char * command_line, *after_dashes; + char *command_line; + char *after_dashes; extern const struct kernel_param __start___param[], __stop___param[]; /* @@ -572,7 +572,8 @@ asmlinkage __visible void __init start_kernel(void) * fragile until we cpu_idle() for the first time. */ preempt_disable(); - if (WARN(!irqs_disabled(), "Interrupts were enabled *very* early, fixing it\n")) + if (WARN(!irqs_disabled(), + "Interrupts were enabled *very* early, fixing it\n")) local_irq_disable(); idr_init_cache(); rcu_init(); -- cgit v0.10.2 From e846ee5f013d8236326f890ea4573004b286df88 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Fri, 8 Aug 2014 14:23:46 -0700 Subject: update Roland McGrath's mail roland@redhat.com bounces, change it to roland@hack.frob.com. Signed-off-by: Richard Weinberger Acked-by: Roland McGrath Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 946a67e..1219dbae 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7203,7 +7203,7 @@ F: drivers/ptp/* F: include/linux/ptp_cl* PTRACE SUPPORT -M: Roland McGrath +M: Roland McGrath M: Oleg Nesterov S: Maintained F: include/asm-generic/syscall.h -- cgit v0.10.2 From d347087b6eb5793ec96be998c5a283935972673d Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:23:48 -0700 Subject: MAINTAINERS: remove two ancient EATA sections These haven't had a single ack by the listed maintainer in all git history and the email addresses don't work. An EATA entry for Michael Neuffer is already in CREDITS. Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 1219dbae..7624103d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3239,26 +3239,12 @@ T: git git://linuxtv.org/anttip/media_tree.git S: Maintained F: drivers/media/tuners/e4000* -EATA-DMA SCSI DRIVER -M: Michael Neuffer -L: linux-eata@i-connect.net -L: linux-scsi@vger.kernel.org -S: Maintained -F: drivers/scsi/eata* - EATA ISA/EISA/PCI SCSI DRIVER M: Dario Ballabio L: linux-scsi@vger.kernel.org S: Maintained F: drivers/scsi/eata.c -EATA-PIO SCSI DRIVER -M: Michael Neuffer -L: linux-eata@i-connect.net -L: linux-scsi@vger.kernel.org -S: Maintained -F: drivers/scsi/eata_pio.* - EC100 MEDIA DRIVER M: Antti Palosaari L: linux-media@vger.kernel.org -- cgit v0.10.2 From f9213e78c42ce9f712400b699511acfff51351a5 Mon Sep 17 00:00:00 2001 From: Michael Opdenacker Date: Fri, 8 Aug 2014 14:23:50 -0700 Subject: MAINTAINERS: update IBM ServeRAID RAID info - Invalid maintainer e-mail address: Mail server reply: Recipient address rejected: User unknown in virtual alias table - Remove no longer working webpage URL - Remove obsolete "Person" field - Move status to "Orphan" - Add Dave Jeffery and Jack Hammer to the CREDITS file Signed-off-by: Michael Opdenacker Reviewed-by: Jean Delvare Cc: David Jeffery Cc: James Bottomley Cc: Paul Bolle Reviewed-by: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/CREDITS b/CREDITS index a80b667..bb62788 100644 --- a/CREDITS +++ b/CREDITS @@ -1381,6 +1381,9 @@ S: 17 rue Danton S: F - 94270 Le Kremlin-Bicêtre S: France +N: Jack Hammer +D: IBM ServeRAID RAID (ips) driver maintenance + N: Greg Hankins E: gregh@cc.gatech.edu D: fixed keyboard driver to separate LED and locking status @@ -1691,6 +1694,10 @@ S: Reading S: RG6 2NU S: United Kingdom +N: Dave Jeffery +E: dhjeffery@gmail.com +D: SCSI hacks and IBM ServeRAID RAID driver maintenance + N: Jakub Jelinek E: jakub@redhat.com W: http://sunsite.mff.cuni.cz/~jj diff --git a/MAINTAINERS b/MAINTAINERS index 7624103d..17f5f9f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4495,10 +4495,7 @@ S: Supported F: drivers/scsi/ibmvscsi/ibmvfc* IBM ServeRAID RAID DRIVER -P: Jack Hammer -M: Dave Jeffery -W: http://www.developer.ibm.com/welcome/netfinity/serveraid.html -S: Supported +S: Orphan F: drivers/scsi/ips.* ICH LPC AND GPIO DRIVER -- cgit v0.10.2 From 056610ac140ea4775d8855989c4316017ed3bd0a Mon Sep 17 00:00:00 2001 From: Daniel Walter Date: Fri, 8 Aug 2014 14:23:52 -0700 Subject: arch/arm/mach-omap2: replace strict_strto* with kstrto* Replace obsolete strict_strto call with kstrto calls. Simplify copy_from_user/strict_strto by using kstrto_from_user Signed-off-by: Daniel Walter Cc: Tony Lindgren Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/arm/mach-omap2/board-omap3touchbook.c b/arch/arm/mach-omap2/board-omap3touchbook.c index 7da48bc..70b904c 100644 --- a/arch/arm/mach-omap2/board-omap3touchbook.c +++ b/arch/arm/mach-omap2/board-omap3touchbook.c @@ -336,7 +336,7 @@ static int __init early_touchbook_revision(char *p) if (!p) return 0; - return strict_strtoul(p, 10, &touchbook_revision); + return kstrtoul(p, 10, &touchbook_revision); } early_param("tbr", early_touchbook_revision); diff --git a/arch/arm/mach-omap2/mux.c b/arch/arm/mach-omap2/mux.c index f62f753..ac8a249 100644 --- a/arch/arm/mach-omap2/mux.c +++ b/arch/arm/mach-omap2/mux.c @@ -681,29 +681,19 @@ static ssize_t omap_mux_dbg_signal_write(struct file *file, const char __user *user_buf, size_t count, loff_t *ppos) { - char buf[OMAP_MUX_MAX_ARG_CHAR]; struct seq_file *seqf; struct omap_mux *m; - unsigned long val; - int buf_size, ret; + u16 val; + int ret; struct omap_mux_partition *partition; if (count > OMAP_MUX_MAX_ARG_CHAR) return -EINVAL; - memset(buf, 0, sizeof(buf)); - buf_size = min(count, sizeof(buf) - 1); - - if (copy_from_user(buf, user_buf, buf_size)) - return -EFAULT; - - ret = strict_strtoul(buf, 0x10, &val); + ret = kstrtou16_from_user(user_buf, count, 0x10, &val); if (ret < 0) return ret; - if (val > 0xffff) - return -EINVAL; - seqf = file->private_data; m = seqf->private; @@ -711,7 +701,7 @@ static ssize_t omap_mux_dbg_signal_write(struct file *file, if (!partition) return -ENODEV; - omap_mux_write(partition, (u16)val, m->reg_offset); + omap_mux_write(partition, val, m->reg_offset); *ppos += count; return count; @@ -917,14 +907,14 @@ static void __init omap_mux_set_cmdline_signals(void) while ((token = strsep(&next_opt, ",")) != NULL) { char *keyval, *name; - unsigned long val; + u16 val; keyval = token; name = strsep(&keyval, "="); if (name) { int res; - res = strict_strtoul(keyval, 0x10, &val); + res = kstrtou16(keyval, 0x10, &val); if (res < 0) continue; -- cgit v0.10.2 From 4fce45b44bdbf293bc1aed5526e4da90f32eb8eb Mon Sep 17 00:00:00 2001 From: Daniel Walter Date: Fri, 8 Aug 2014 14:23:54 -0700 Subject: arch/arm/mach-pxa: replace strict_strto call with kstrto Replace obsolete call to strict_strto with kstrto Signed-off-by: Daniel Walter Cc: Eric Miao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/arm/mach-pxa/balloon3.c b/arch/arm/mach-pxa/balloon3.c index 43596e0..d897292 100644 --- a/arch/arm/mach-pxa/balloon3.c +++ b/arch/arm/mach-pxa/balloon3.c @@ -90,7 +90,7 @@ int __init parse_balloon3_features(char *arg) if (!arg) return 0; - return strict_strtoul(arg, 0, &balloon3_features_present); + return kstrtoul(arg, 0, &balloon3_features_present); } early_param("balloon3_features", parse_balloon3_features); diff --git a/arch/arm/mach-pxa/viper.c b/arch/arm/mach-pxa/viper.c index 41f27f6..de3b080 100644 --- a/arch/arm/mach-pxa/viper.c +++ b/arch/arm/mach-pxa/viper.c @@ -769,7 +769,7 @@ static unsigned long viper_tpm; static int __init viper_tpm_setup(char *str) { - return strict_strtoul(str, 10, &viper_tpm) >= 0; + return kstrtoul(str, 10, &viper_tpm) >= 0; } __setup("tpm=", viper_tpm_setup); -- cgit v0.10.2 From 5c2432cbe87d693879c11b0e39a32d2528233c64 Mon Sep 17 00:00:00 2001 From: Daniel Walter Date: Fri, 8 Aug 2014 14:23:57 -0700 Subject: arch/arm/mach-s3c24xx/mach-jive.c: replace strict_strto* with kstrto* Replace obsolete strict_strto call with kstrto Signed-off-by: Daniel Walter Cc: Ben Dooks Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/arm/mach-s3c24xx/mach-jive.c b/arch/arm/mach-s3c24xx/mach-jive.c index e81ea82..bac9bb5 100644 --- a/arch/arm/mach-s3c24xx/mach-jive.c +++ b/arch/arm/mach-s3c24xx/mach-jive.c @@ -243,7 +243,7 @@ static int __init jive_mtdset(char *options) if (options == NULL || options[0] == '\0') return 0; - if (strict_strtoul(options, 10, &set)) { + if (kstrtoul(options, 10, &set)) { printk(KERN_ERR "failed to parse mtdset=%s\n", options); return 0; } -- cgit v0.10.2 From 37028623d78c7289af0a044202a16c1dd0482b6e Mon Sep 17 00:00:00 2001 From: Daniel Walter Date: Fri, 8 Aug 2014 14:23:59 -0700 Subject: arch/arm/mach-w90x900/cpu.c: replace obsolete strict_strto Replace obsolete strict_strto with kstrto calls Signed-off-by: Daniel Walter Cc: Wan ZongShun Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/arm/mach-w90x900/cpu.c b/arch/arm/mach-w90x900/cpu.c index b1eabaa..213230ee 100644 --- a/arch/arm/mach-w90x900/cpu.c +++ b/arch/arm/mach-w90x900/cpu.c @@ -178,7 +178,8 @@ static int __init nuc900_set_cpufreq(char *str) if (!*str) return 0; - strict_strtoul(str, 0, &cpufreq); + if (kstrtoul(str, 0, &cpufreq)) + return 0; nuc900_clock_source(NULL, "ext"); -- cgit v0.10.2 From 1618bd53e6f43918f90ca04a4fcaf664b0a78749 Mon Sep 17 00:00:00 2001 From: Daniel Walter Date: Fri, 8 Aug 2014 14:24:01 -0700 Subject: arch/powerpc: replace obsolete strict_strto* calls Replace strict_strto calls with more appropriate kstrto calls Signed-off-by: Daniel Walter Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index d022557..75d62d6 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -149,13 +149,13 @@ static void check_smt_enabled(void) else if (!strcmp(smt_enabled_cmdline, "off")) smt_enabled_at_boot = 0; else { - long smt; + int smt; int rc; - rc = strict_strtol(smt_enabled_cmdline, 10, &smt); + rc = kstrtoint(smt_enabled_cmdline, 10, &smt); if (!rc) smt_enabled_at_boot = - min(threads_per_core, (int)smt); + min(threads_per_core, smt); } } else { dn = of_find_node_by_path("/options"); diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c index 904c661..5bfdab90 100644 --- a/arch/powerpc/kernel/vio.c +++ b/arch/powerpc/kernel/vio.c @@ -977,7 +977,7 @@ static ssize_t viodev_cmo_desired_set(struct device *dev, size_t new_desired; int ret; - ret = strict_strtoul(buf, 10, &new_desired); + ret = kstrtoul(buf, 10, &new_desired); if (ret) return ret; diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index 2d0b4d6..a2450b8 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -400,10 +400,10 @@ out: static ssize_t dlpar_cpu_probe(const char *buf, size_t count) { struct device_node *dn, *parent; - unsigned long drc_index; + u32 drc_index; int rc; - rc = strict_strtoul(buf, 0, &drc_index); + rc = kstrtou32(buf, 0, &drc_index); if (rc) return -EINVAL; diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index d146fef..e7cb6d4 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -320,7 +320,7 @@ static ssize_t migrate_store(struct class *class, struct class_attribute *attr, u64 streamid; int rc; - rc = strict_strtoull(buf, 0, &streamid); + rc = kstrtou64(buf, 0, &streamid); if (rc) return rc; -- cgit v0.10.2 From 164109e3cdba52b9f2ece063bc3aa2a90f77c273 Mon Sep 17 00:00:00 2001 From: Daniel Walter Date: Fri, 8 Aug 2014 14:24:03 -0700 Subject: arch/x86: replace strict_strto calls Replace obsolete strict_strto calls with appropriate kstrto calls Signed-off-by: Daniel Walter Acked-by: Borislav Petkov Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c index 9c8f739..c703507 100644 --- a/arch/x86/kernel/cpu/intel_cacheinfo.c +++ b/arch/x86/kernel/cpu/intel_cacheinfo.c @@ -461,7 +461,7 @@ static ssize_t store_cache_disable(struct _cpuid4_info *this_leaf, cpu = cpumask_first(to_cpumask(this_leaf->shared_cpu_map)); - if (strict_strtoul(buf, 10, &val) < 0) + if (kstrtoul(buf, 10, &val) < 0) return -EINVAL; err = amd_set_l3_disable_slot(this_leaf->base.nb, cpu, slot, val); @@ -511,7 +511,7 @@ store_subcaches(struct _cpuid4_info *this_leaf, const char *buf, size_t count, if (!this_leaf->base.nb || !amd_nb_has_feature(AMD_NB_L3_PARTITIONING)) return -EINVAL; - if (strict_strtoul(buf, 16, &val) < 0) + if (kstrtoul(buf, 16, &val) < 0) return -EINVAL; if (amd_set_subcaches(cpu, val)) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 4fc5797..bd9ccda 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2136,7 +2136,7 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr, { u64 new; - if (strict_strtoull(buf, 0, &new) < 0) + if (kstrtou64(buf, 0, &new) < 0) return -EINVAL; attr_to_bank(attr)->ctl = new; @@ -2174,7 +2174,7 @@ static ssize_t set_ignore_ce(struct device *s, { u64 new; - if (strict_strtoull(buf, 0, &new) < 0) + if (kstrtou64(buf, 0, &new) < 0) return -EINVAL; if (mca_cfg.ignore_ce ^ !!new) { @@ -2198,7 +2198,7 @@ static ssize_t set_cmci_disabled(struct device *s, { u64 new; - if (strict_strtoull(buf, 0, &new) < 0) + if (kstrtou64(buf, 0, &new) < 0) return -EINVAL; if (mca_cfg.cmci_disabled ^ !!new) { diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index 603df4f..1e49f8f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -353,7 +353,7 @@ store_interrupt_enable(struct threshold_block *b, const char *buf, size_t size) if (!b->interrupt_capable) return -EINVAL; - if (strict_strtoul(buf, 0, &new) < 0) + if (kstrtoul(buf, 0, &new) < 0) return -EINVAL; b->interrupt_enable = !!new; @@ -372,7 +372,7 @@ store_threshold_limit(struct threshold_block *b, const char *buf, size_t size) struct thresh_restart tr; unsigned long new; - if (strict_strtoul(buf, 0, &new) < 0) + if (kstrtoul(buf, 0, &new) < 0) return -EINVAL; if (new > THRESHOLD_MAX) diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c index 1185fe7..9ade5cf 100644 --- a/arch/x86/kvm/mmu_audit.c +++ b/arch/x86/kvm/mmu_audit.c @@ -273,7 +273,7 @@ static int mmu_audit_set(const char *val, const struct kernel_param *kp) int ret; unsigned long enable; - ret = strict_strtoul(val, 10, &enable); + ret = kstrtoul(val, 10, &enable); if (ret < 0) return -EINVAL; diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c index ed161c6..3968d67 100644 --- a/arch/x86/platform/uv/tlb_uv.c +++ b/arch/x86/platform/uv/tlb_uv.c @@ -1479,7 +1479,7 @@ static ssize_t ptc_proc_write(struct file *file, const char __user *user, return count; } - if (strict_strtol(optstr, 10, &input_arg) < 0) { + if (kstrtol(optstr, 10, &input_arg) < 0) { printk(KERN_DEBUG "%s is invalid\n", optstr); return -EINVAL; } -- cgit v0.10.2 From f7c65af5136f7999121f05a8398dd34a43bad139 Mon Sep 17 00:00:00 2001 From: Daniel Walter Date: Fri, 8 Aug 2014 14:24:05 -0700 Subject: drivers/scsi: replace strict_strto calls Replace obsolete strict_strto with more appropriate kstrto calls Signed-off-by: Daniel Walter Cc: "James E.J. Bottomley" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c index 017f8b9..6f3275d 100644 --- a/drivers/scsi/pmcraid.c +++ b/drivers/scsi/pmcraid.c @@ -4213,9 +4213,9 @@ static ssize_t pmcraid_store_log_level( { struct Scsi_Host *shost; struct pmcraid_instance *pinstance; - unsigned long val; + u8 val; - if (strict_strtoul(buf, 10, &val)) + if (kstrtou8(buf, 10, &val)) return -EINVAL; /* log-level should be from 0 to 2 */ if (val > 2) diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 406b303..8b4105a 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -910,9 +910,9 @@ sdev_store_queue_ramp_up_period(struct device *dev, const char *buf, size_t count) { struct scsi_device *sdev = to_scsi_device(dev); - unsigned long period; + unsigned int period; - if (strict_strtoul(buf, 10, &period)) + if (kstrtouint(buf, 10, &period)) return -EINVAL; sdev->queue_ramp_up_period = msecs_to_jiffies(period); -- cgit v0.10.2 From 82bf0baad9768c21850d06ddc24ea47984c4c90f Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:08 -0700 Subject: pci-dma-compat: add pci_zalloc_consistent helper Add this helper for consistency with pci_zalloc_coherent and the ability to remove unnecessary memset(,0,) uses. Signed-off-by: Joe Perches Cc: Arnd Bergmann Cc: "James E.J. Bottomley" Cc: "John W. Linville" Cc: "Stephen M. Cameron" Cc: Adam Radford Cc: Chaoming Li Cc: Chas Williams Cc: Christian Benvenuti Cc: Christopher Harrer Cc: Dario Ballabio Cc: David Airlie Cc: Don Fry Cc: Faisal Latif Cc: Forest Bond Cc: Govindarajulu Varadarajan <_govind@gmx.com> Cc: Greg Kroah-Hartman Cc: Hal Rosenstock Cc: Hans Verkuil Cc: Jayamohan Kallickal Cc: Jiri Slaby Cc: Jitendra Kalsaria Cc: Larry Finger Cc: Lennert Buytenhek Cc: Lior Dotan Cc: Manish Chopra Cc: Manohar Vanga Cc: Martyn Welch Cc: Mauro Carvalho Chehab Cc: Michael Neuffer Cc: Mirko Lindner Cc: Neel Patel Cc: Neela Syam Kolli Cc: Rajesh Borundia Cc: Roland Dreier Cc: Ron Mercer Cc: Samuel Ortiz Cc: Sean Hefty Cc: Shahed Shaikh Cc: Sony Chacko Cc: Stanislav Yakovlev Cc: Stephen Hemminger Cc: Steve Wise Cc: Sujith Sankar Cc: Tom Tucker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/asm-generic/pci-dma-compat.h b/include/asm-generic/pci-dma-compat.h index 1437b7d..c110843 100644 --- a/include/asm-generic/pci-dma-compat.h +++ b/include/asm-generic/pci-dma-compat.h @@ -19,6 +19,14 @@ pci_alloc_consistent(struct pci_dev *hwdev, size_t size, return dma_alloc_coherent(hwdev == NULL ? NULL : &hwdev->dev, size, dma_handle, GFP_ATOMIC); } +static inline void * +pci_zalloc_consistent(struct pci_dev *hwdev, size_t size, + dma_addr_t *dma_handle) +{ + return dma_zalloc_coherent(hwdev == NULL ? NULL : &hwdev->dev, + size, dma_handle, GFP_ATOMIC); +} + static inline void pci_free_consistent(struct pci_dev *hwdev, size_t size, void *vaddr, dma_addr_t dma_handle) -- cgit v0.10.2 From 6f2a011afc324260d8e5e2aa480a58c34283c581 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:10 -0700 Subject: atm: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Chas Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/atm/he.c b/drivers/atm/he.c index aa6be26..c39702b 100644 --- a/drivers/atm/he.c +++ b/drivers/atm/he.c @@ -533,14 +533,13 @@ static void he_init_tx_lbfp(struct he_dev *he_dev) static int he_init_tpdrq(struct he_dev *he_dev) { - he_dev->tpdrq_base = pci_alloc_consistent(he_dev->pci_dev, - CONFIG_TPDRQ_SIZE * sizeof(struct he_tpdrq), &he_dev->tpdrq_phys); + he_dev->tpdrq_base = pci_zalloc_consistent(he_dev->pci_dev, + CONFIG_TPDRQ_SIZE * sizeof(struct he_tpdrq), + &he_dev->tpdrq_phys); if (he_dev->tpdrq_base == NULL) { hprintk("failed to alloc tpdrq\n"); return -ENOMEM; } - memset(he_dev->tpdrq_base, 0, - CONFIG_TPDRQ_SIZE * sizeof(struct he_tpdrq)); he_dev->tpdrq_tail = he_dev->tpdrq_base; he_dev->tpdrq_head = he_dev->tpdrq_base; @@ -804,13 +803,13 @@ static int he_init_group(struct he_dev *he_dev, int group) goto out_free_rbpl_virt; } - he_dev->rbpl_base = pci_alloc_consistent(he_dev->pci_dev, - CONFIG_RBPL_SIZE * sizeof(struct he_rbp), &he_dev->rbpl_phys); + he_dev->rbpl_base = pci_zalloc_consistent(he_dev->pci_dev, + CONFIG_RBPL_SIZE * sizeof(struct he_rbp), + &he_dev->rbpl_phys); if (he_dev->rbpl_base == NULL) { hprintk("failed to alloc rbpl_base\n"); goto out_destroy_rbpl_pool; } - memset(he_dev->rbpl_base, 0, CONFIG_RBPL_SIZE * sizeof(struct he_rbp)); INIT_LIST_HEAD(&he_dev->rbpl_outstanding); @@ -843,13 +842,13 @@ static int he_init_group(struct he_dev *he_dev, int group) /* rx buffer ready queue */ - he_dev->rbrq_base = pci_alloc_consistent(he_dev->pci_dev, - CONFIG_RBRQ_SIZE * sizeof(struct he_rbrq), &he_dev->rbrq_phys); + he_dev->rbrq_base = pci_zalloc_consistent(he_dev->pci_dev, + CONFIG_RBRQ_SIZE * sizeof(struct he_rbrq), + &he_dev->rbrq_phys); if (he_dev->rbrq_base == NULL) { hprintk("failed to allocate rbrq\n"); goto out_free_rbpl; } - memset(he_dev->rbrq_base, 0, CONFIG_RBRQ_SIZE * sizeof(struct he_rbrq)); he_dev->rbrq_head = he_dev->rbrq_base; he_writel(he_dev, he_dev->rbrq_phys, G0_RBRQ_ST + (group * 16)); @@ -867,13 +866,13 @@ static int he_init_group(struct he_dev *he_dev, int group) /* tx buffer ready queue */ - he_dev->tbrq_base = pci_alloc_consistent(he_dev->pci_dev, - CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq), &he_dev->tbrq_phys); + he_dev->tbrq_base = pci_zalloc_consistent(he_dev->pci_dev, + CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq), + &he_dev->tbrq_phys); if (he_dev->tbrq_base == NULL) { hprintk("failed to allocate tbrq\n"); goto out_free_rbpq_base; } - memset(he_dev->tbrq_base, 0, CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq)); he_dev->tbrq_head = he_dev->tbrq_base; @@ -1460,13 +1459,13 @@ static int he_start(struct atm_dev *dev) /* host status page */ - he_dev->hsp = pci_alloc_consistent(he_dev->pci_dev, - sizeof(struct he_hsp), &he_dev->hsp_phys); + he_dev->hsp = pci_zalloc_consistent(he_dev->pci_dev, + sizeof(struct he_hsp), + &he_dev->hsp_phys); if (he_dev->hsp == NULL) { hprintk("failed to allocate host status page\n"); return -ENOMEM; } - memset(he_dev->hsp, 0, sizeof(struct he_hsp)); he_writel(he_dev, he_dev->hsp_phys, HSP_BA); /* initialize framer */ diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c index b621f56..2b24ed0 100644 --- a/drivers/atm/idt77252.c +++ b/drivers/atm/idt77252.c @@ -641,13 +641,11 @@ alloc_scq(struct idt77252_dev *card, int class) scq = kzalloc(sizeof(struct scq_info), GFP_KERNEL); if (!scq) return NULL; - scq->base = pci_alloc_consistent(card->pcidev, SCQ_SIZE, - &scq->paddr); + scq->base = pci_zalloc_consistent(card->pcidev, SCQ_SIZE, &scq->paddr); if (scq->base == NULL) { kfree(scq); return NULL; } - memset(scq->base, 0, SCQ_SIZE); scq->next = scq->base; scq->last = scq->base + (SCQ_ENTRIES - 1); @@ -972,13 +970,12 @@ init_rsq(struct idt77252_dev *card) { struct rsq_entry *rsqe; - card->rsq.base = pci_alloc_consistent(card->pcidev, RSQSIZE, - &card->rsq.paddr); + card->rsq.base = pci_zalloc_consistent(card->pcidev, RSQSIZE, + &card->rsq.paddr); if (card->rsq.base == NULL) { printk("%s: can't allocate RSQ.\n", card->name); return -1; } - memset(card->rsq.base, 0, RSQSIZE); card->rsq.last = card->rsq.base + RSQ_NUM_ENTRIES - 1; card->rsq.next = card->rsq.last; @@ -3400,14 +3397,14 @@ static int init_card(struct atm_dev *dev) writel(0, SAR_REG_GP); /* Initialize RAW Cell Handle Register */ - card->raw_cell_hnd = pci_alloc_consistent(card->pcidev, 2 * sizeof(u32), - &card->raw_cell_paddr); + card->raw_cell_hnd = pci_zalloc_consistent(card->pcidev, + 2 * sizeof(u32), + &card->raw_cell_paddr); if (!card->raw_cell_hnd) { printk("%s: memory allocation failure.\n", card->name); deinit_card(card); return -1; } - memset(card->raw_cell_hnd, 0, 2 * sizeof(u32)); writel(card->raw_cell_paddr, SAR_REG_RAWHND); IPRINTK("%s: raw cell handle is at 0x%p.\n", card->name, card->raw_cell_hnd); -- cgit v0.10.2 From a5bbf6160c2b5ebce1533bf989c7cd0086beeabf Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:12 -0700 Subject: block: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Mike Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c index 125d845..811e11c 100644 --- a/drivers/block/DAC960.c +++ b/drivers/block/DAC960.c @@ -6741,11 +6741,11 @@ static long DAC960_gam_ioctl(struct file *file, unsigned int Request, ErrorCode = -ENOMEM; if (DataTransferLength > 0) { - DataTransferBuffer = pci_alloc_consistent(Controller->PCIDevice, - DataTransferLength, &DataTransferBufferDMA); + DataTransferBuffer = pci_zalloc_consistent(Controller->PCIDevice, + DataTransferLength, + &DataTransferBufferDMA); if (DataTransferBuffer == NULL) break; - memset(DataTransferBuffer, 0, DataTransferLength); } else if (DataTransferLength < 0) { @@ -6877,11 +6877,11 @@ static long DAC960_gam_ioctl(struct file *file, unsigned int Request, ErrorCode = -ENOMEM; if (DataTransferLength > 0) { - DataTransferBuffer = pci_alloc_consistent(Controller->PCIDevice, - DataTransferLength, &DataTransferBufferDMA); + DataTransferBuffer = pci_zalloc_consistent(Controller->PCIDevice, + DataTransferLength, + &DataTransferBufferDMA); if (DataTransferBuffer == NULL) break; - memset(DataTransferBuffer, 0, DataTransferLength); } else if (DataTransferLength < 0) { @@ -6899,14 +6899,14 @@ static long DAC960_gam_ioctl(struct file *file, unsigned int Request, RequestSenseLength = UserCommand.RequestSenseLength; if (RequestSenseLength > 0) { - RequestSenseBuffer = pci_alloc_consistent(Controller->PCIDevice, - RequestSenseLength, &RequestSenseBufferDMA); + RequestSenseBuffer = pci_zalloc_consistent(Controller->PCIDevice, + RequestSenseLength, + &RequestSenseBufferDMA); if (RequestSenseBuffer == NULL) { ErrorCode = -ENOMEM; goto Failure2; } - memset(RequestSenseBuffer, 0, RequestSenseLength); } spin_lock_irqsave(&Controller->queue_lock, flags); while ((Command = DAC960_AllocateCommand(Controller)) == NULL) diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c index 4595c22..ff20f19 100644 --- a/drivers/block/cciss.c +++ b/drivers/block/cciss.c @@ -1014,24 +1014,21 @@ static CommandList_struct *cmd_special_alloc(ctlr_info_t *h) u64bit temp64; dma_addr_t cmd_dma_handle, err_dma_handle; - c = (CommandList_struct *) pci_alloc_consistent(h->pdev, - sizeof(CommandList_struct), &cmd_dma_handle); + c = pci_zalloc_consistent(h->pdev, sizeof(CommandList_struct), + &cmd_dma_handle); if (c == NULL) return NULL; - memset(c, 0, sizeof(CommandList_struct)); c->cmdindex = -1; - c->err_info = (ErrorInfo_struct *) - pci_alloc_consistent(h->pdev, sizeof(ErrorInfo_struct), - &err_dma_handle); + c->err_info = pci_zalloc_consistent(h->pdev, sizeof(ErrorInfo_struct), + &err_dma_handle); if (c->err_info == NULL) { pci_free_consistent(h->pdev, sizeof(CommandList_struct), c, cmd_dma_handle); return NULL; } - memset(c->err_info, 0, sizeof(ErrorInfo_struct)); INIT_LIST_HEAD(&c->list); c->busaddr = (__u32) cmd_dma_handle; diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c index 608532d..f0a089d 100644 --- a/drivers/block/skd_main.c +++ b/drivers/block/skd_main.c @@ -4112,16 +4112,14 @@ static int skd_cons_skcomp(struct skd_device *skdev) skdev->name, __func__, __LINE__, nbytes, SKD_N_COMPLETION_ENTRY); - skcomp = pci_alloc_consistent(skdev->pdev, nbytes, - &skdev->cq_dma_address); + skcomp = pci_zalloc_consistent(skdev->pdev, nbytes, + &skdev->cq_dma_address); if (skcomp == NULL) { rc = -ENOMEM; goto err_out; } - memset(skcomp, 0, nbytes); - skdev->skcomp_table = skcomp; skdev->skerr_table = (struct fit_comp_error_info *)((char *)skcomp + sizeof(*skcomp) * @@ -4304,15 +4302,14 @@ static int skd_cons_skspcl(struct skd_device *skdev) nbytes = SKD_N_SPECIAL_FITMSG_BYTES; - skspcl->msg_buf = pci_alloc_consistent(skdev->pdev, nbytes, - &skspcl->mb_dma_address); + skspcl->msg_buf = + pci_zalloc_consistent(skdev->pdev, nbytes, + &skspcl->mb_dma_address); if (skspcl->msg_buf == NULL) { rc = -ENOMEM; goto err_out; } - memset(skspcl->msg_buf, 0, nbytes); - skspcl->req.sg = kzalloc(sizeof(struct scatterlist) * SKD_N_SG_PER_SPECIAL, GFP_KERNEL); if (skspcl->req.sg == NULL) { @@ -4353,25 +4350,21 @@ static int skd_cons_sksb(struct skd_device *skdev) nbytes = SKD_N_INTERNAL_BYTES; - skspcl->data_buf = pci_alloc_consistent(skdev->pdev, nbytes, - &skspcl->db_dma_address); + skspcl->data_buf = pci_zalloc_consistent(skdev->pdev, nbytes, + &skspcl->db_dma_address); if (skspcl->data_buf == NULL) { rc = -ENOMEM; goto err_out; } - memset(skspcl->data_buf, 0, nbytes); - nbytes = SKD_N_SPECIAL_FITMSG_BYTES; - skspcl->msg_buf = pci_alloc_consistent(skdev->pdev, nbytes, - &skspcl->mb_dma_address); + skspcl->msg_buf = pci_zalloc_consistent(skdev->pdev, nbytes, + &skspcl->mb_dma_address); if (skspcl->msg_buf == NULL) { rc = -ENOMEM; goto err_out; } - memset(skspcl->msg_buf, 0, nbytes); - skspcl->req.sksg_list = skd_cons_sg_list(skdev, 1, &skspcl->req.sksg_dma_address); if (skspcl->req.sksg_list == NULL) { -- cgit v0.10.2 From 7e835084fe1fff705be16e1db8a9e5e8f56b9b73 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:14 -0700 Subject: crypto: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Herbert Xu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/crypto/hifn_795x.c b/drivers/crypto/hifn_795x.c index 12fea3e2..8d2a772 100644 --- a/drivers/crypto/hifn_795x.c +++ b/drivers/crypto/hifn_795x.c @@ -2617,14 +2617,13 @@ static int hifn_probe(struct pci_dev *pdev, const struct pci_device_id *id) } } - dev->desc_virt = pci_alloc_consistent(pdev, sizeof(struct hifn_dma), - &dev->desc_dma); + dev->desc_virt = pci_zalloc_consistent(pdev, sizeof(struct hifn_dma), + &dev->desc_dma); if (!dev->desc_virt) { dprintk("Failed to allocate descriptor rings.\n"); err = -ENOMEM; goto err_out_unmap_bars; } - memset(dev->desc_virt, 0, sizeof(struct hifn_dma)); dev->pdev = pdev; dev->irq = pdev->irq; -- cgit v0.10.2 From 9011a67b2fe9b89048bcba81745ef34394f6f7ab Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:16 -0700 Subject: infiniband: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Tom Tucker Acked-by: Steve Wise Cc: Roland Dreier Cc: Sean Hefty Cc: Hal Rosenstock Cc: Faisal Latif Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/infiniband/hw/amso1100/c2.c b/drivers/infiniband/hw/amso1100/c2.c index 00400c3..766a71c 100644 --- a/drivers/infiniband/hw/amso1100/c2.c +++ b/drivers/infiniband/hw/amso1100/c2.c @@ -604,16 +604,14 @@ static int c2_up(struct net_device *netdev) tx_size = c2_port->tx_ring.count * sizeof(struct c2_tx_desc); c2_port->mem_size = tx_size + rx_size; - c2_port->mem = pci_alloc_consistent(c2dev->pcidev, c2_port->mem_size, - &c2_port->dma); + c2_port->mem = pci_zalloc_consistent(c2dev->pcidev, c2_port->mem_size, + &c2_port->dma); if (c2_port->mem == NULL) { pr_debug("Unable to allocate memory for " "host descriptor rings\n"); return -ENOMEM; } - memset(c2_port->mem, 0, c2_port->mem_size); - /* Create the Rx host descriptor ring */ if ((ret = c2_rx_ring_alloc(&c2_port->rx_ring, c2_port->mem, c2_port->dma, diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 9020024..02120d3 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1003,13 +1003,13 @@ int nes_init_cqp(struct nes_device *nesdev) (sizeof(struct nes_hw_aeqe) * nesadapter->max_qp) + sizeof(struct nes_hw_cqp_qp_context); - nesdev->cqp_vbase = pci_alloc_consistent(nesdev->pcidev, nesdev->cqp_mem_size, - &nesdev->cqp_pbase); + nesdev->cqp_vbase = pci_zalloc_consistent(nesdev->pcidev, + nesdev->cqp_mem_size, + &nesdev->cqp_pbase); if (!nesdev->cqp_vbase) { nes_debug(NES_DBG_INIT, "Unable to allocate memory for host descriptor rings\n"); return -ENOMEM; } - memset(nesdev->cqp_vbase, 0, nesdev->cqp_mem_size); /* Allocate a twice the number of CQP requests as the SQ size */ nesdev->nes_cqp_requests = kzalloc(sizeof(struct nes_cqp_request) * @@ -1691,13 +1691,13 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev) (NES_NIC_WQ_SIZE * 2 * sizeof(struct nes_hw_nic_cqe)) + sizeof(struct nes_hw_nic_qp_context); - nesvnic->nic_vbase = pci_alloc_consistent(nesdev->pcidev, nesvnic->nic_mem_size, - &nesvnic->nic_pbase); + nesvnic->nic_vbase = pci_zalloc_consistent(nesdev->pcidev, + nesvnic->nic_mem_size, + &nesvnic->nic_pbase); if (!nesvnic->nic_vbase) { nes_debug(NES_DBG_INIT, "Unable to allocate memory for NIC host descriptor rings\n"); return -ENOMEM; } - memset(nesvnic->nic_vbase, 0, nesvnic->nic_mem_size); nes_debug(NES_DBG_INIT, "Allocated NIC QP structures at %p (phys = %016lX), size = %u.\n", nesvnic->nic_vbase, (unsigned long)nesvnic->nic_pbase, nesvnic->nic_mem_size); diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 218dd35..fef067c 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -1616,8 +1616,8 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, entries, nescq->cq_mem_size, nescq->hw_cq.cq_number); /* allocate the physical buffer space */ - mem = pci_alloc_consistent(nesdev->pcidev, nescq->cq_mem_size, - &nescq->hw_cq.cq_pbase); + mem = pci_zalloc_consistent(nesdev->pcidev, nescq->cq_mem_size, + &nescq->hw_cq.cq_pbase); if (!mem) { printk(KERN_ERR PFX "Unable to allocate pci memory for cq\n"); nes_free_resource(nesadapter, nesadapter->allocated_cqs, cq_num); @@ -1625,7 +1625,6 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, return ERR_PTR(-ENOMEM); } - memset(mem, 0, nescq->cq_mem_size); nescq->hw_cq.cq_vbase = mem; nescq->hw_cq.cq_head = 0; nes_debug(NES_DBG_CQ, "CQ%u virtual address @ %p, phys = 0x%08X\n", -- cgit v0.10.2 From 59e2623b433552d7402dfacd2bc2d7a22a360a4f Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:19 -0700 Subject: i810: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: David Airlie Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/gpu/drm/i810/i810_dma.c b/drivers/gpu/drm/i810/i810_dma.c index e88bac1..bae897d 100644 --- a/drivers/gpu/drm/i810/i810_dma.c +++ b/drivers/gpu/drm/i810/i810_dma.c @@ -393,15 +393,14 @@ static int i810_dma_initialize(struct drm_device *dev, /* Program Hardware Status Page */ dev_priv->hw_status_page = - pci_alloc_consistent(dev->pdev, PAGE_SIZE, - &dev_priv->dma_status_page); + pci_zalloc_consistent(dev->pdev, PAGE_SIZE, + &dev_priv->dma_status_page); if (!dev_priv->hw_status_page) { dev->dev_private = (void *)dev_priv; i810_dma_cleanup(dev); DRM_ERROR("Can not allocate hardware status page\n"); return -ENOMEM; } - memset(dev_priv->hw_status_page, 0, PAGE_SIZE); DRM_DEBUG("hw status page @ %p\n", dev_priv->hw_status_page); I810_WRITE(0x02080, dev_priv->dma_status_page); -- cgit v0.10.2 From 6850aeabdd1fde7cd35f26d4e8779bec943e1cd9 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:21 -0700 Subject: media: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Hans Verkuil Cc: Mauro Carvalho Chehab Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/media/common/saa7146/saa7146_core.c b/drivers/media/common/saa7146/saa7146_core.c index 34b0d0d..97afee6 100644 --- a/drivers/media/common/saa7146/saa7146_core.c +++ b/drivers/media/common/saa7146/saa7146_core.c @@ -421,23 +421,20 @@ static int saa7146_init_one(struct pci_dev *pci, const struct pci_device_id *ent err = -ENOMEM; /* get memory for various stuff */ - dev->d_rps0.cpu_addr = pci_alloc_consistent(pci, SAA7146_RPS_MEM, - &dev->d_rps0.dma_handle); + dev->d_rps0.cpu_addr = pci_zalloc_consistent(pci, SAA7146_RPS_MEM, + &dev->d_rps0.dma_handle); if (!dev->d_rps0.cpu_addr) goto err_free_irq; - memset(dev->d_rps0.cpu_addr, 0x0, SAA7146_RPS_MEM); - dev->d_rps1.cpu_addr = pci_alloc_consistent(pci, SAA7146_RPS_MEM, - &dev->d_rps1.dma_handle); + dev->d_rps1.cpu_addr = pci_zalloc_consistent(pci, SAA7146_RPS_MEM, + &dev->d_rps1.dma_handle); if (!dev->d_rps1.cpu_addr) goto err_free_rps0; - memset(dev->d_rps1.cpu_addr, 0x0, SAA7146_RPS_MEM); - dev->d_i2c.cpu_addr = pci_alloc_consistent(pci, SAA7146_RPS_MEM, - &dev->d_i2c.dma_handle); + dev->d_i2c.cpu_addr = pci_zalloc_consistent(pci, SAA7146_RPS_MEM, + &dev->d_i2c.dma_handle); if (!dev->d_i2c.cpu_addr) goto err_free_rps1; - memset(dev->d_i2c.cpu_addr, 0x0, SAA7146_RPS_MEM); /* the rest + print status message */ diff --git a/drivers/media/common/saa7146/saa7146_fops.c b/drivers/media/common/saa7146/saa7146_fops.c index d9e1d63..6c47f3f 100644 --- a/drivers/media/common/saa7146/saa7146_fops.c +++ b/drivers/media/common/saa7146/saa7146_fops.c @@ -520,14 +520,15 @@ int saa7146_vv_init(struct saa7146_dev* dev, struct saa7146_ext_vv *ext_vv) configuration data) */ dev->ext_vv_data = ext_vv; - vv->d_clipping.cpu_addr = pci_alloc_consistent(dev->pci, SAA7146_CLIPPING_MEM, &vv->d_clipping.dma_handle); + vv->d_clipping.cpu_addr = + pci_zalloc_consistent(dev->pci, SAA7146_CLIPPING_MEM, + &vv->d_clipping.dma_handle); if( NULL == vv->d_clipping.cpu_addr ) { ERR("out of memory. aborting.\n"); kfree(vv); v4l2_ctrl_handler_free(hdl); return -1; } - memset(vv->d_clipping.cpu_addr, 0x0, SAA7146_CLIPPING_MEM); saa7146_video_uops.init(dev,vv); if (dev->ext_vv_data->capabilities & V4L2_CAP_VBI_CAPTURE) diff --git a/drivers/media/pci/bt8xx/bt878.c b/drivers/media/pci/bt8xx/bt878.c index d0c281f..1176583 100644 --- a/drivers/media/pci/bt8xx/bt878.c +++ b/drivers/media/pci/bt8xx/bt878.c @@ -101,28 +101,20 @@ static int bt878_mem_alloc(struct bt878 *bt) if (!bt->buf_cpu) { bt->buf_size = 128 * 1024; - bt->buf_cpu = - pci_alloc_consistent(bt->dev, bt->buf_size, - &bt->buf_dma); - + bt->buf_cpu = pci_zalloc_consistent(bt->dev, bt->buf_size, + &bt->buf_dma); if (!bt->buf_cpu) return -ENOMEM; - - memset(bt->buf_cpu, 0, bt->buf_size); } if (!bt->risc_cpu) { bt->risc_size = PAGE_SIZE; - bt->risc_cpu = - pci_alloc_consistent(bt->dev, bt->risc_size, - &bt->risc_dma); - + bt->risc_cpu = pci_zalloc_consistent(bt->dev, bt->risc_size, + &bt->risc_dma); if (!bt->risc_cpu) { bt878_mem_free(bt); return -ENOMEM; } - - memset(bt->risc_cpu, 0, bt->risc_size); } return 0; diff --git a/drivers/media/pci/ngene/ngene-core.c b/drivers/media/pci/ngene/ngene-core.c index 826228c..4930b55 100644 --- a/drivers/media/pci/ngene/ngene-core.c +++ b/drivers/media/pci/ngene/ngene-core.c @@ -1075,12 +1075,11 @@ static int AllocCommonBuffers(struct ngene *dev) dev->ngenetohost = dev->FWInterfaceBuffer + 256; dev->EventBuffer = dev->FWInterfaceBuffer + 512; - dev->OverflowBuffer = pci_alloc_consistent(dev->pci_dev, - OVERFLOW_BUFFER_SIZE, - &dev->PAOverflowBuffer); + dev->OverflowBuffer = pci_zalloc_consistent(dev->pci_dev, + OVERFLOW_BUFFER_SIZE, + &dev->PAOverflowBuffer); if (!dev->OverflowBuffer) return -ENOMEM; - memset(dev->OverflowBuffer, 0, OVERFLOW_BUFFER_SIZE); for (i = STREAM_VIDEOIN1; i < MAX_STREAM; i++) { int type = dev->card_info->io_type[i]; diff --git a/drivers/media/usb/ttusb-budget/dvb-ttusb-budget.c b/drivers/media/usb/ttusb-budget/dvb-ttusb-budget.c index f166ffc..cef7a00 100644 --- a/drivers/media/usb/ttusb-budget/dvb-ttusb-budget.c +++ b/drivers/media/usb/ttusb-budget/dvb-ttusb-budget.c @@ -803,11 +803,9 @@ static int ttusb_alloc_iso_urbs(struct ttusb *ttusb) { int i; - ttusb->iso_buffer = pci_alloc_consistent(NULL, - ISO_FRAME_SIZE * - FRAMES_PER_ISO_BUF * - ISO_BUF_COUNT, - &ttusb->iso_dma_handle); + ttusb->iso_buffer = pci_zalloc_consistent(NULL, + ISO_FRAME_SIZE * FRAMES_PER_ISO_BUF * ISO_BUF_COUNT, + &ttusb->iso_dma_handle); if (!ttusb->iso_buffer) { dprintk("%s: pci_alloc_consistent - not enough memory\n", @@ -815,9 +813,6 @@ static int ttusb_alloc_iso_urbs(struct ttusb *ttusb) return -ENOMEM; } - memset(ttusb->iso_buffer, 0, - ISO_FRAME_SIZE * FRAMES_PER_ISO_BUF * ISO_BUF_COUNT); - for (i = 0; i < ISO_BUF_COUNT; i++) { struct urb *urb; diff --git a/drivers/media/usb/ttusb-dec/ttusb_dec.c b/drivers/media/usb/ttusb-dec/ttusb_dec.c index 29724af..15ab584 100644 --- a/drivers/media/usb/ttusb-dec/ttusb_dec.c +++ b/drivers/media/usb/ttusb-dec/ttusb_dec.c @@ -1151,11 +1151,9 @@ static int ttusb_dec_alloc_iso_urbs(struct ttusb_dec *dec) dprintk("%s\n", __func__); - dec->iso_buffer = pci_alloc_consistent(NULL, - ISO_FRAME_SIZE * - (FRAMES_PER_ISO_BUF * - ISO_BUF_COUNT), - &dec->iso_dma_handle); + dec->iso_buffer = pci_zalloc_consistent(NULL, + ISO_FRAME_SIZE * (FRAMES_PER_ISO_BUF * ISO_BUF_COUNT), + &dec->iso_dma_handle); if (!dec->iso_buffer) { dprintk("%s: pci_alloc_consistent - not enough memory\n", @@ -1163,9 +1161,6 @@ static int ttusb_dec_alloc_iso_urbs(struct ttusb_dec *dec) return -ENOMEM; } - memset(dec->iso_buffer, 0, - ISO_FRAME_SIZE * (FRAMES_PER_ISO_BUF * ISO_BUF_COUNT)); - for (i = 0; i < ISO_BUF_COUNT; i++) { struct urb *urb; -- cgit v0.10.2 From 8c6a5dca88401d119e2e6b8e81c2ea3b9b18b4ff Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:23 -0700 Subject: amd: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Acked-by: Don Fry Acked-by: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/ethernet/amd/pcnet32.c b/drivers/net/ethernet/amd/pcnet32.c index e7cc917..8099fdc 100644 --- a/drivers/net/ethernet/amd/pcnet32.c +++ b/drivers/net/ethernet/amd/pcnet32.c @@ -484,15 +484,14 @@ static void pcnet32_realloc_tx_ring(struct net_device *dev, pcnet32_purge_tx_ring(dev); - new_tx_ring = pci_alloc_consistent(lp->pci_dev, - sizeof(struct pcnet32_tx_head) * - (1 << size), - &new_ring_dma_addr); + new_tx_ring = pci_zalloc_consistent(lp->pci_dev, + sizeof(struct pcnet32_tx_head) * + (1 << size), + &new_ring_dma_addr); if (new_tx_ring == NULL) { netif_err(lp, drv, dev, "Consistent memory allocation failed\n"); return; } - memset(new_tx_ring, 0, sizeof(struct pcnet32_tx_head) * (1 << size)); new_dma_addr_list = kcalloc(1 << size, sizeof(dma_addr_t), GFP_ATOMIC); @@ -551,15 +550,14 @@ static void pcnet32_realloc_rx_ring(struct net_device *dev, int new, overlap; unsigned int entries = 1 << size; - new_rx_ring = pci_alloc_consistent(lp->pci_dev, - sizeof(struct pcnet32_rx_head) * - entries, - &new_ring_dma_addr); + new_rx_ring = pci_zalloc_consistent(lp->pci_dev, + sizeof(struct pcnet32_rx_head) * + entries, + &new_ring_dma_addr); if (new_rx_ring == NULL) { netif_err(lp, drv, dev, "Consistent memory allocation failed\n"); return; } - memset(new_rx_ring, 0, sizeof(struct pcnet32_rx_head) * entries); new_dma_addr_list = kcalloc(entries, sizeof(dma_addr_t), GFP_ATOMIC); if (!new_dma_addr_list) -- cgit v0.10.2 From 191182688a3303f70ed358f3b0029110d74dec5a Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:25 -0700 Subject: atl1e: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Acked-by: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c index 4345332..316e0c3 100644 --- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c +++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c @@ -831,17 +831,14 @@ static int atl1e_setup_ring_resources(struct atl1e_adapter *adapter) /* real ring DMA buffer */ size = adapter->ring_size; - adapter->ring_vir_addr = pci_alloc_consistent(pdev, - adapter->ring_size, &adapter->ring_dma); - + adapter->ring_vir_addr = pci_zalloc_consistent(pdev, adapter->ring_size, + &adapter->ring_dma); if (adapter->ring_vir_addr == NULL) { netdev_err(adapter->netdev, "pci_alloc_consistent failed, size = D%d\n", size); return -ENOMEM; } - memset(adapter->ring_vir_addr, 0, adapter->ring_size); - rx_page_desc = rx_ring->rx_page_desc; /* Init TPD Ring */ -- cgit v0.10.2 From 87f44b4e282a419fe82e9625ea6d3f35f614381f Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:27 -0700 Subject: enic: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Christian Benvenuti Cc: Sujith Sankar Cc: Govindarajulu Varadarajan <_govind@gmx.com> Cc: Neel Patel Acked-by: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c b/drivers/net/ethernet/cisco/enic/vnic_dev.c index 5abc496..37472ce 100644 --- a/drivers/net/ethernet/cisco/enic/vnic_dev.c +++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c @@ -432,14 +432,12 @@ int vnic_dev_fw_info(struct vnic_dev *vdev, int err = 0; if (!vdev->fw_info) { - vdev->fw_info = pci_alloc_consistent(vdev->pdev, - sizeof(struct vnic_devcmd_fw_info), - &vdev->fw_info_pa); + vdev->fw_info = pci_zalloc_consistent(vdev->pdev, + sizeof(struct vnic_devcmd_fw_info), + &vdev->fw_info_pa); if (!vdev->fw_info) return -ENOMEM; - memset(vdev->fw_info, 0, sizeof(struct vnic_devcmd_fw_info)); - a0 = vdev->fw_info_pa; a1 = sizeof(struct vnic_devcmd_fw_info); -- cgit v0.10.2 From 12fe08b2b5acada5ce708866786d08918e1f4819 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:29 -0700 Subject: sky2: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Mirko Lindner Cc: Stephen Hemminger Acked-by: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c index 6969338..5991514 100644 --- a/drivers/net/ethernet/marvell/sky2.c +++ b/drivers/net/ethernet/marvell/sky2.c @@ -1622,11 +1622,10 @@ static int sky2_alloc_buffers(struct sky2_port *sky2) if (!sky2->tx_ring) goto nomem; - sky2->rx_le = pci_alloc_consistent(hw->pdev, RX_LE_BYTES, - &sky2->rx_le_map); + sky2->rx_le = pci_zalloc_consistent(hw->pdev, RX_LE_BYTES, + &sky2->rx_le_map); if (!sky2->rx_le) goto nomem; - memset(sky2->rx_le, 0, RX_LE_BYTES); sky2->rx_ring = kcalloc(sky2->rx_pending, sizeof(struct rx_ring_info), GFP_KERNEL); -- cgit v0.10.2 From a2d0abc6aacf2ba9e76ffad60f1782aa10942bce Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:31 -0700 Subject: micrel: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Acked-by: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/ethernet/micrel/ksz884x.c b/drivers/net/ethernet/micrel/ksz884x.c index 064a48d..cd5f106 100644 --- a/drivers/net/ethernet/micrel/ksz884x.c +++ b/drivers/net/ethernet/micrel/ksz884x.c @@ -4409,14 +4409,13 @@ static int ksz_alloc_desc(struct dev_info *adapter) DESC_ALIGNMENT; adapter->desc_pool.alloc_virt = - pci_alloc_consistent( - adapter->pdev, adapter->desc_pool.alloc_size, - &adapter->desc_pool.dma_addr); + pci_zalloc_consistent(adapter->pdev, + adapter->desc_pool.alloc_size, + &adapter->desc_pool.dma_addr); if (adapter->desc_pool.alloc_virt == NULL) { adapter->desc_pool.alloc_size = 0; return 1; } - memset(adapter->desc_pool.alloc_virt, 0, adapter->desc_pool.alloc_size); /* Align to the next cache line boundary. */ offset = (((ulong) adapter->desc_pool.alloc_virt % DESC_ALIGNMENT) ? -- cgit v0.10.2 From 440c734fefdabcf9caf4532e9bc51313cd0c546b Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:34 -0700 Subject: qlogic: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Manish Chopra Cc: Sony Chacko Cc: Rajesh Borundia Cc: Shahed Shaikh Cc: Jitendra Kalsaria Cc: Ron Mercer Acked-by: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c index 6f6be57..b8d5270 100644 --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c @@ -129,14 +129,12 @@ netxen_get_minidump_template(struct netxen_adapter *adapter) return NX_RCODE_INVALID_ARGS; } - addr = pci_alloc_consistent(adapter->pdev, size, &md_template_addr); - + addr = pci_zalloc_consistent(adapter->pdev, size, &md_template_addr); if (!addr) { dev_err(&adapter->pdev->dev, "Unable to allocate dmable memory for template.\n"); return -ENOMEM; } - memset(addr, 0, size); memset(&cmd, 0, sizeof(cmd)); memset(&cmd.rsp, 1, sizeof(struct _cdrp_cmd)); cmd.req.cmd = NX_CDRP_CMD_GET_TEMP_HDR; diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c index b40050e..d836ace5 100644 --- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c +++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c @@ -2727,23 +2727,22 @@ static void ql_free_shadow_space(struct ql_adapter *qdev) static int ql_alloc_shadow_space(struct ql_adapter *qdev) { qdev->rx_ring_shadow_reg_area = - pci_alloc_consistent(qdev->pdev, - PAGE_SIZE, &qdev->rx_ring_shadow_reg_dma); + pci_zalloc_consistent(qdev->pdev, PAGE_SIZE, + &qdev->rx_ring_shadow_reg_dma); if (qdev->rx_ring_shadow_reg_area == NULL) { netif_err(qdev, ifup, qdev->ndev, "Allocation of RX shadow space failed.\n"); return -ENOMEM; } - memset(qdev->rx_ring_shadow_reg_area, 0, PAGE_SIZE); + qdev->tx_ring_shadow_reg_area = - pci_alloc_consistent(qdev->pdev, PAGE_SIZE, - &qdev->tx_ring_shadow_reg_dma); + pci_zalloc_consistent(qdev->pdev, PAGE_SIZE, + &qdev->tx_ring_shadow_reg_dma); if (qdev->tx_ring_shadow_reg_area == NULL) { netif_err(qdev, ifup, qdev->ndev, "Allocation of TX shadow space failed.\n"); goto err_wqp_sh_area; } - memset(qdev->tx_ring_shadow_reg_area, 0, PAGE_SIZE); return 0; err_wqp_sh_area: -- cgit v0.10.2 From 38537b7f558bca5375b87f0c75ec5cfa2ed69dab Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:36 -0700 Subject: irda: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Samuel Ortiz Acked-by: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/irda/vlsi_ir.c b/drivers/net/irda/vlsi_ir.c index 4850066..58ef594 100644 --- a/drivers/net/irda/vlsi_ir.c +++ b/drivers/net/irda/vlsi_ir.c @@ -485,13 +485,13 @@ static int vlsi_create_hwif(vlsi_irda_dev_t *idev) idev->virtaddr = NULL; idev->busaddr = 0; - ringarea = pci_alloc_consistent(idev->pdev, HW_RING_AREA_SIZE, &idev->busaddr); + ringarea = pci_zalloc_consistent(idev->pdev, HW_RING_AREA_SIZE, + &idev->busaddr); if (!ringarea) { IRDA_ERROR("%s: insufficient memory for descriptor rings\n", __func__); goto out; } - memset(ringarea, 0, HW_RING_AREA_SIZE); hwmap = (struct ring_descr_hw *)ringarea; idev->rx_ring = vlsi_alloc_ring(idev->pdev, hwmap, ringsize[1], -- cgit v0.10.2 From f0b6539cfaa6c13c9ebaa47f0136fee7a4c2d981 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:38 -0700 Subject: ipw2100: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Stanislav Yakovlev Cc: "John W. Linville" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/wireless/ipw2x00/ipw2100.c b/drivers/net/wireless/ipw2x00/ipw2100.c index dfc6dfc..1ab8e50 100644 --- a/drivers/net/wireless/ipw2x00/ipw2100.c +++ b/drivers/net/wireless/ipw2x00/ipw2100.c @@ -3449,8 +3449,9 @@ static int ipw2100_msg_allocate(struct ipw2100_priv *priv) return -ENOMEM; for (i = 0; i < IPW_COMMAND_POOL_SIZE; i++) { - v = pci_alloc_consistent(priv->pci_dev, - sizeof(struct ipw2100_cmd_header), &p); + v = pci_zalloc_consistent(priv->pci_dev, + sizeof(struct ipw2100_cmd_header), + &p); if (!v) { printk(KERN_ERR DRV_NAME ": " "%s: PCI alloc failed for msg " @@ -3459,8 +3460,6 @@ static int ipw2100_msg_allocate(struct ipw2100_priv *priv) break; } - memset(v, 0, sizeof(struct ipw2100_cmd_header)); - priv->msg_buffers[i].type = COMMAND; priv->msg_buffers[i].info.c_struct.cmd = (struct ipw2100_cmd_header *)v; @@ -4336,16 +4335,12 @@ static int status_queue_allocate(struct ipw2100_priv *priv, int entries) IPW_DEBUG_INFO("enter\n"); q->size = entries * sizeof(struct ipw2100_status); - q->drv = - (struct ipw2100_status *)pci_alloc_consistent(priv->pci_dev, - q->size, &q->nic); + q->drv = pci_zalloc_consistent(priv->pci_dev, q->size, &q->nic); if (!q->drv) { IPW_DEBUG_WARNING("Can not allocate status queue.\n"); return -ENOMEM; } - memset(q->drv, 0, q->size); - IPW_DEBUG_INFO("exit\n"); return 0; @@ -4374,13 +4369,12 @@ static int bd_queue_allocate(struct ipw2100_priv *priv, q->entries = entries; q->size = entries * sizeof(struct ipw2100_bd); - q->drv = pci_alloc_consistent(priv->pci_dev, q->size, &q->nic); + q->drv = pci_zalloc_consistent(priv->pci_dev, q->size, &q->nic); if (!q->drv) { IPW_DEBUG_INFO ("can't allocate shared memory for buffer descriptors\n"); return -ENOMEM; } - memset(q->drv, 0, q->size); IPW_DEBUG_INFO("exit\n"); -- cgit v0.10.2 From e6443b24c74110309790a2bb3f5566c9fb5eac66 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:40 -0700 Subject: mwl8k: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Lennert Buytenhek Cc: "John W. Linville" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/wireless/mwl8k.c b/drivers/net/wireless/mwl8k.c index 9a3d4d6..fc6cb21 100644 --- a/drivers/net/wireless/mwl8k.c +++ b/drivers/net/wireless/mwl8k.c @@ -1159,12 +1159,11 @@ static int mwl8k_rxq_init(struct ieee80211_hw *hw, int index) size = MWL8K_RX_DESCS * priv->rxd_ops->rxd_size; - rxq->rxd = pci_alloc_consistent(priv->pdev, size, &rxq->rxd_dma); + rxq->rxd = pci_zalloc_consistent(priv->pdev, size, &rxq->rxd_dma); if (rxq->rxd == NULL) { wiphy_err(hw->wiphy, "failed to alloc RX descriptors\n"); return -ENOMEM; } - memset(rxq->rxd, 0, size); rxq->buf = kcalloc(MWL8K_RX_DESCS, sizeof(*rxq->buf), GFP_KERNEL); if (rxq->buf == NULL) { @@ -1451,12 +1450,11 @@ static int mwl8k_txq_init(struct ieee80211_hw *hw, int index) size = MWL8K_TX_DESCS * sizeof(struct mwl8k_tx_desc); - txq->txd = pci_alloc_consistent(priv->pdev, size, &txq->txd_dma); + txq->txd = pci_zalloc_consistent(priv->pdev, size, &txq->txd_dma); if (txq->txd == NULL) { wiphy_err(hw->wiphy, "failed to alloc TX descriptors\n"); return -ENOMEM; } - memset(txq->txd, 0, size); txq->skb = kcalloc(MWL8K_TX_DESCS, sizeof(*txq->skb), GFP_KERNEL); if (txq->skb == NULL) { -- cgit v0.10.2 From 504e3b4f5a210057419fe7a2ea3a0b5b3bc15aa1 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:42 -0700 Subject: rtl818x: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: "John W. Linville" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/wireless/rtl818x/rtl8180/dev.c b/drivers/net/wireless/rtl818x/rtl8180/dev.c index 4b904f70..fcc45e5 100644 --- a/drivers/net/wireless/rtl818x/rtl8180/dev.c +++ b/drivers/net/wireless/rtl818x/rtl8180/dev.c @@ -972,16 +972,13 @@ static int rtl8180_init_rx_ring(struct ieee80211_hw *dev) else priv->rx_ring_sz = sizeof(struct rtl8180_rx_desc); - priv->rx_ring = pci_alloc_consistent(priv->pdev, - priv->rx_ring_sz * 32, - &priv->rx_ring_dma); - + priv->rx_ring = pci_zalloc_consistent(priv->pdev, priv->rx_ring_sz * 32, + &priv->rx_ring_dma); if (!priv->rx_ring || (unsigned long)priv->rx_ring & 0xFF) { wiphy_err(dev->wiphy, "Cannot allocate RX ring\n"); return -ENOMEM; } - memset(priv->rx_ring, 0, priv->rx_ring_sz * 32); priv->rx_idx = 0; for (i = 0; i < 32; i++) { @@ -1040,14 +1037,14 @@ static int rtl8180_init_tx_ring(struct ieee80211_hw *dev, dma_addr_t dma; int i; - ring = pci_alloc_consistent(priv->pdev, sizeof(*ring) * entries, &dma); + ring = pci_zalloc_consistent(priv->pdev, sizeof(*ring) * entries, + &dma); if (!ring || (unsigned long)ring & 0xFF) { wiphy_err(dev->wiphy, "Cannot allocate TX ring (prio = %d)\n", prio); return -ENOMEM; } - memset(ring, 0, sizeof(*ring)*entries); priv->tx_ring[prio].desc = ring; priv->tx_ring[prio].dma = dma; priv->tx_ring[prio].idx = 0; -- cgit v0.10.2 From 8ac41b9dc719b4a5c65a2192988e8199278bb71a Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:44 -0700 Subject: rtlwifi: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Larry Finger Cc: Chaoming Li Cc: "John W. Linville" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c index dae5525..67d1ee6 100644 --- a/drivers/net/wireless/rtlwifi/pci.c +++ b/drivers/net/wireless/rtlwifi/pci.c @@ -1092,16 +1092,14 @@ static int _rtl_pci_init_tx_ring(struct ieee80211_hw *hw, u32 nextdescaddress; int i; - ring = pci_alloc_consistent(rtlpci->pdev, - sizeof(*ring) * entries, &dma); - + ring = pci_zalloc_consistent(rtlpci->pdev, sizeof(*ring) * entries, + &dma); if (!ring || (unsigned long)ring & 0xFF) { RT_TRACE(rtlpriv, COMP_ERR, DBG_EMERG, "Cannot allocate TX ring (prio = %d)\n", prio); return -ENOMEM; } - memset(ring, 0, sizeof(*ring) * entries); rtlpci->tx_ring[prio].desc = ring; rtlpci->tx_ring[prio].dma = dma; rtlpci->tx_ring[prio].idx = 0; @@ -1139,10 +1137,9 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw) for (rx_queue_idx = 0; rx_queue_idx < RTL_PCI_MAX_RX_QUEUE; rx_queue_idx++) { rtlpci->rx_ring[rx_queue_idx].desc = - pci_alloc_consistent(rtlpci->pdev, - sizeof(*rtlpci->rx_ring[rx_queue_idx]. - desc) * rtlpci->rxringcount, - &rtlpci->rx_ring[rx_queue_idx].dma); + pci_zalloc_consistent(rtlpci->pdev, + sizeof(*rtlpci->rx_ring[rx_queue_idx].desc) * rtlpci->rxringcount, + &rtlpci->rx_ring[rx_queue_idx].dma); if (!rtlpci->rx_ring[rx_queue_idx].desc || (unsigned long)rtlpci->rx_ring[rx_queue_idx].desc & 0xFF) { @@ -1151,10 +1148,6 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw) return -ENOMEM; } - memset(rtlpci->rx_ring[rx_queue_idx].desc, 0, - sizeof(*rtlpci->rx_ring[rx_queue_idx].desc) * - rtlpci->rxringcount); - rtlpci->rx_ring[rx_queue_idx].idx = 0; /* If amsdu_8k is disabled, set buffersize to 4096. This -- cgit v0.10.2 From 7c845eb5e184977d9c7135ae20d012b59f8cc729 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:46 -0700 Subject: scsi: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Adam Radford Cc: "James E.J. Bottomley" Cc: Jayamohan Kallickal Cc: Dario Ballabio Cc: Michael Neuffer Cc: "Stephen M. Cameron" Cc: Neela Syam Kolli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/scsi/3w-sas.c b/drivers/scsi/3w-sas.c index 4de3460..6da6cec 100644 --- a/drivers/scsi/3w-sas.c +++ b/drivers/scsi/3w-sas.c @@ -683,14 +683,13 @@ static int twl_allocate_memory(TW_Device_Extension *tw_dev, int size, int which) unsigned long *cpu_addr; int retval = 1; - cpu_addr = pci_alloc_consistent(tw_dev->tw_pci_dev, size*TW_Q_LENGTH, &dma_handle); + cpu_addr = pci_zalloc_consistent(tw_dev->tw_pci_dev, size * TW_Q_LENGTH, + &dma_handle); if (!cpu_addr) { TW_PRINTK(tw_dev->host, TW_DRIVER, 0x5, "Memory allocation failed"); goto out; } - memset(cpu_addr, 0, size*TW_Q_LENGTH); - for (i = 0; i < TW_Q_LENGTH; i++) { switch(which) { case 0: diff --git a/drivers/scsi/a100u2w.c b/drivers/scsi/a100u2w.c index 522570d..7e33a61 100644 --- a/drivers/scsi/a100u2w.c +++ b/drivers/scsi/a100u2w.c @@ -1125,23 +1125,19 @@ static int inia100_probe_one(struct pci_dev *pdev, /* Get total memory needed for SCB */ sz = ORC_MAXQUEUE * sizeof(struct orc_scb); - host->scb_virt = pci_alloc_consistent(pdev, sz, - &host->scb_phys); + host->scb_virt = pci_zalloc_consistent(pdev, sz, &host->scb_phys); if (!host->scb_virt) { printk("inia100: SCB memory allocation error\n"); goto out_host_put; } - memset(host->scb_virt, 0, sz); /* Get total memory needed for ESCB */ sz = ORC_MAXQUEUE * sizeof(struct orc_extended_scb); - host->escb_virt = pci_alloc_consistent(pdev, sz, - &host->escb_phys); + host->escb_virt = pci_zalloc_consistent(pdev, sz, &host->escb_phys); if (!host->escb_virt) { printk("inia100: ESCB memory allocation error\n"); goto out_free_scb_array; } - memset(host->escb_virt, 0, sz); biosaddr = host->BIOScfg; biosaddr = (biosaddr << 4); diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c index 56467df..eb3e3e6 100644 --- a/drivers/scsi/be2iscsi/be_main.c +++ b/drivers/scsi/be2iscsi/be_main.c @@ -3538,10 +3538,9 @@ static int be_queue_alloc(struct beiscsi_hba *phba, struct be_queue_info *q, q->len = len; q->entry_size = entry_size; mem->size = len * entry_size; - mem->va = pci_alloc_consistent(phba->pcidev, mem->size, &mem->dma); + mem->va = pci_zalloc_consistent(phba->pcidev, mem->size, &mem->dma); if (!mem->va) return -ENOMEM; - memset(mem->va, 0, mem->size); return 0; } @@ -4320,9 +4319,9 @@ static int beiscsi_get_boot_info(struct beiscsi_hba *phba) "BM_%d : No boot session\n"); return ret; } - nonemb_cmd.va = pci_alloc_consistent(phba->ctrl.pdev, - sizeof(*session_resp), - &nonemb_cmd.dma); + nonemb_cmd.va = pci_zalloc_consistent(phba->ctrl.pdev, + sizeof(*session_resp), + &nonemb_cmd.dma); if (nonemb_cmd.va == NULL) { beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_INIT | BEISCSI_LOG_CONFIG, @@ -4332,7 +4331,6 @@ static int beiscsi_get_boot_info(struct beiscsi_hba *phba) return -ENOMEM; } - memset(nonemb_cmd.va, 0, sizeof(*session_resp)); tag = mgmt_get_session_info(phba, s_handle, &nonemb_cmd); if (!tag) { diff --git a/drivers/scsi/be2iscsi/be_mgmt.c b/drivers/scsi/be2iscsi/be_mgmt.c index a3e5648..665afcb 100644 --- a/drivers/scsi/be2iscsi/be_mgmt.c +++ b/drivers/scsi/be2iscsi/be_mgmt.c @@ -900,13 +900,12 @@ free_cmd: static int mgmt_alloc_cmd_data(struct beiscsi_hba *phba, struct be_dma_mem *cmd, int iscsi_cmd, int size) { - cmd->va = pci_alloc_consistent(phba->ctrl.pdev, size, &cmd->dma); + cmd->va = pci_zalloc_consistent(phba->ctrl.pdev, size, &cmd->dma); if (!cmd->va) { beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG, "BG_%d : Failed to allocate memory for if info\n"); return -ENOMEM; } - memset(cmd->va, 0, size); cmd->size = size; be_cmd_hdr_prepare(cmd->va, CMD_SUBSYSTEM_ISCSI, iscsi_cmd, size); return 0; diff --git a/drivers/scsi/csiostor/csio_wr.c b/drivers/scsi/csiostor/csio_wr.c index 4255ce2..773da14 100644 --- a/drivers/scsi/csiostor/csio_wr.c +++ b/drivers/scsi/csiostor/csio_wr.c @@ -232,7 +232,7 @@ csio_wr_alloc_q(struct csio_hw *hw, uint32_t qsize, uint32_t wrsize, q = wrm->q_arr[free_idx]; - q->vstart = pci_alloc_consistent(hw->pdev, qsz, &q->pstart); + q->vstart = pci_zalloc_consistent(hw->pdev, qsz, &q->pstart); if (!q->vstart) { csio_err(hw, "Failed to allocate DMA memory for " @@ -240,12 +240,6 @@ csio_wr_alloc_q(struct csio_hw *hw, uint32_t qsize, uint32_t wrsize, return -1; } - /* - * We need to zero out the contents, importantly for ingress, - * since we start with a generatiom bit of 1 for ingress. - */ - memset(q->vstart, 0, qsz); - q->type = type; q->owner = owner; q->pidx = q->cidx = q->inc_idx = 0; diff --git a/drivers/scsi/eata.c b/drivers/scsi/eata.c index 03372cf..813dd5c 100644 --- a/drivers/scsi/eata.c +++ b/drivers/scsi/eata.c @@ -1238,8 +1238,8 @@ static int port_detect(unsigned long port_base, unsigned int j, struct eata_config *cf; dma_addr_t cf_dma_addr; - cf = pci_alloc_consistent(pdev, sizeof(struct eata_config), - &cf_dma_addr); + cf = pci_zalloc_consistent(pdev, sizeof(struct eata_config), + &cf_dma_addr); if (!cf) { printk @@ -1249,7 +1249,6 @@ static int port_detect(unsigned long port_base, unsigned int j, } /* Set board configuration */ - memset((char *)cf, 0, sizeof(struct eata_config)); cf->len = (ushort) H2DEV16((ushort) 510); cf->ocena = 1; diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index 8545d18..6b35d0d 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -4732,23 +4732,21 @@ static struct CommandList *cmd_special_alloc(struct ctlr_info *h) union u64bit temp64; dma_addr_t cmd_dma_handle, err_dma_handle; - c = pci_alloc_consistent(h->pdev, sizeof(*c), &cmd_dma_handle); + c = pci_zalloc_consistent(h->pdev, sizeof(*c), &cmd_dma_handle); if (c == NULL) return NULL; - memset(c, 0, sizeof(*c)); c->cmd_type = CMD_SCSI; c->cmdindex = -1; - c->err_info = pci_alloc_consistent(h->pdev, sizeof(*c->err_info), - &err_dma_handle); + c->err_info = pci_zalloc_consistent(h->pdev, sizeof(*c->err_info), + &err_dma_handle); if (c->err_info == NULL) { pci_free_consistent(h->pdev, sizeof(*c), c, cmd_dma_handle); return NULL; } - memset(c->err_info, 0, sizeof(*c->err_info)); INIT_LIST_HEAD(&c->list); c->busaddr = (u32) cmd_dma_handle; diff --git a/drivers/scsi/megaraid/megaraid_mbox.c b/drivers/scsi/megaraid/megaraid_mbox.c index e2237a9..531dce4 100644 --- a/drivers/scsi/megaraid/megaraid_mbox.c +++ b/drivers/scsi/megaraid/megaraid_mbox.c @@ -998,8 +998,9 @@ megaraid_alloc_cmd_packets(adapter_t *adapter) * Allocate the common 16-byte aligned memory for the handshake * mailbox. */ - raid_dev->una_mbox64 = pci_alloc_consistent(adapter->pdev, - sizeof(mbox64_t), &raid_dev->una_mbox64_dma); + raid_dev->una_mbox64 = pci_zalloc_consistent(adapter->pdev, + sizeof(mbox64_t), + &raid_dev->una_mbox64_dma); if (!raid_dev->una_mbox64) { con_log(CL_ANN, (KERN_WARNING @@ -1007,7 +1008,6 @@ megaraid_alloc_cmd_packets(adapter_t *adapter) __LINE__)); return -1; } - memset(raid_dev->una_mbox64, 0, sizeof(mbox64_t)); /* * Align the mailbox at 16-byte boundary @@ -1026,8 +1026,8 @@ megaraid_alloc_cmd_packets(adapter_t *adapter) align; // Allocate memory for commands issued internally - adapter->ibuf = pci_alloc_consistent(pdev, MBOX_IBUF_SIZE, - &adapter->ibuf_dma_h); + adapter->ibuf = pci_zalloc_consistent(pdev, MBOX_IBUF_SIZE, + &adapter->ibuf_dma_h); if (!adapter->ibuf) { con_log(CL_ANN, (KERN_WARNING @@ -1036,7 +1036,6 @@ megaraid_alloc_cmd_packets(adapter_t *adapter) goto out_free_common_mbox; } - memset(adapter->ibuf, 0, MBOX_IBUF_SIZE); // Allocate memory for our SCSI Command Blocks and their associated // memory @@ -2972,8 +2971,8 @@ megaraid_mbox_product_info(adapter_t *adapter) * Issue an ENQUIRY3 command to find out certain adapter parameters, * e.g., max channels, max commands etc. */ - pinfo = pci_alloc_consistent(adapter->pdev, sizeof(mraid_pinfo_t), - &pinfo_dma_h); + pinfo = pci_zalloc_consistent(adapter->pdev, sizeof(mraid_pinfo_t), + &pinfo_dma_h); if (pinfo == NULL) { con_log(CL_ANN, (KERN_WARNING @@ -2982,7 +2981,6 @@ megaraid_mbox_product_info(adapter_t *adapter) return -1; } - memset(pinfo, 0, sizeof(mraid_pinfo_t)); mbox->xferaddr = (uint32_t)adapter->ibuf_dma_h; memset((void *)adapter->ibuf, 0, MBOX_IBUF_SIZE); diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index 112799b..22a04e3 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -2038,9 +2038,9 @@ int megasas_sriov_start_heartbeat(struct megasas_instance *instance, if (initial) { instance->hb_host_mem = - pci_alloc_consistent(instance->pdev, - sizeof(struct MR_CTRL_HB_HOST_MEM), - &instance->hb_host_mem_h); + pci_zalloc_consistent(instance->pdev, + sizeof(struct MR_CTRL_HB_HOST_MEM), + &instance->hb_host_mem_h); if (!instance->hb_host_mem) { printk(KERN_DEBUG "megasas: SR-IOV: Couldn't allocate" " memory for heartbeat host memory for " @@ -2048,8 +2048,6 @@ int megasas_sriov_start_heartbeat(struct megasas_instance *instance, retval = -ENOMEM; goto out; } - memset(instance->hb_host_mem, 0, - sizeof(struct MR_CTRL_HB_HOST_MEM)); } memset(dcmd->mbox.b, 0, MFI_MBOX_SIZE); diff --git a/drivers/scsi/mesh.c b/drivers/scsi/mesh.c index 7a6160f..57a95e2 100644 --- a/drivers/scsi/mesh.c +++ b/drivers/scsi/mesh.c @@ -1915,14 +1915,12 @@ static int mesh_probe(struct macio_dev *mdev, const struct of_device_id *match) /* We use the PCI APIs for now until the generic one gets fixed * enough or until we get some macio-specific versions */ - dma_cmd_space = pci_alloc_consistent(macio_get_pci_dev(mdev), - ms->dma_cmd_size, - &dma_cmd_bus); + dma_cmd_space = pci_zalloc_consistent(macio_get_pci_dev(mdev), + ms->dma_cmd_size, &dma_cmd_bus); if (dma_cmd_space == NULL) { printk(KERN_ERR "mesh: can't allocate DMA table\n"); goto out_unmap; } - memset(dma_cmd_space, 0, ms->dma_cmd_size); ms->dma_cmds = (struct dbdma_cmd *) DBDMA_ALIGN(dma_cmd_space); ms->dma_cmd_space = dma_cmd_space; diff --git a/drivers/scsi/mvumi.c b/drivers/scsi/mvumi.c index edbee8d..3e716b2 100644 --- a/drivers/scsi/mvumi.c +++ b/drivers/scsi/mvumi.c @@ -142,8 +142,8 @@ static struct mvumi_res *mvumi_alloc_mem_resource(struct mvumi_hba *mhba, case RESOURCE_UNCACHED_MEMORY: size = round_up(size, 8); - res->virt_addr = pci_alloc_consistent(mhba->pdev, size, - &res->bus_addr); + res->virt_addr = pci_zalloc_consistent(mhba->pdev, size, + &res->bus_addr); if (!res->virt_addr) { dev_err(&mhba->pdev->dev, "unable to allocate consistent mem," @@ -151,7 +151,6 @@ static struct mvumi_res *mvumi_alloc_mem_resource(struct mvumi_hba *mhba, kfree(res); return NULL; } - memset(res->virt_addr, 0, size); break; default: @@ -258,12 +257,10 @@ static int mvumi_internal_cmd_sgl(struct mvumi_hba *mhba, struct mvumi_cmd *cmd, if (size == 0) return 0; - virt_addr = pci_alloc_consistent(mhba->pdev, size, &phy_addr); + virt_addr = pci_zalloc_consistent(mhba->pdev, size, &phy_addr); if (!virt_addr) return -1; - memset(virt_addr, 0, size); - m_sg = (struct mvumi_sgl *) &cmd->frame->payload[0]; cmd->frame->sg_counts = 1; cmd->data_buf = virt_addr; diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c index 34cea82..76570e6 100644 --- a/drivers/scsi/pm8001/pm8001_sas.c +++ b/drivers/scsi/pm8001/pm8001_sas.c @@ -116,13 +116,12 @@ int pm8001_mem_alloc(struct pci_dev *pdev, void **virt_addr, u64 align_offset = 0; if (align) align_offset = (dma_addr_t)align - 1; - mem_virt_alloc = - pci_alloc_consistent(pdev, mem_size + align, &mem_dma_handle); + mem_virt_alloc = pci_zalloc_consistent(pdev, mem_size + align, + &mem_dma_handle); if (!mem_virt_alloc) { pm8001_printk("memory allocation error\n"); return -1; } - memset((void *)mem_virt_alloc, 0, mem_size+align); *pphys_addr = mem_dma_handle; phys_align = (*pphys_addr + align_offset) & ~align_offset; *virt_addr = (void *)mem_virt_alloc + phys_align - *pphys_addr; -- cgit v0.10.2 From 8b983be54bb5c4ff133e3aa45fe50669089d0a68 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:48 -0700 Subject: staging: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Acked-by: Greg Kroah-Hartman Cc: Lior Dotan Cc: Christopher Harrer Cc: Forest Bond Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c index 2920e40..5729cf6 100644 --- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c +++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c @@ -2065,20 +2065,16 @@ static short rtl8192_alloc_rx_desc_ring(struct net_device *dev) int i, rx_queue_idx; for (rx_queue_idx = 0; rx_queue_idx < MAX_RX_QUEUE; rx_queue_idx++) { - priv->rx_ring[rx_queue_idx] = pci_alloc_consistent(priv->pdev, - sizeof(*priv->rx_ring[rx_queue_idx]) * - priv->rxringcount, - &priv->rx_ring_dma[rx_queue_idx]); - + priv->rx_ring[rx_queue_idx] = + pci_zalloc_consistent(priv->pdev, + sizeof(*priv->rx_ring[rx_queue_idx]) * priv->rxringcount, + &priv->rx_ring_dma[rx_queue_idx]); if (!priv->rx_ring[rx_queue_idx] || (unsigned long)priv->rx_ring[rx_queue_idx] & 0xFF) { RT_TRACE(COMP_ERR, "Cannot allocate RX ring\n"); return -ENOMEM; } - memset(priv->rx_ring[rx_queue_idx], 0, - sizeof(*priv->rx_ring[rx_queue_idx]) * - priv->rxringcount); priv->rx_idx[rx_queue_idx] = 0; for (i = 0; i < priv->rxringcount; i++) { @@ -2118,14 +2114,13 @@ static int rtl8192_alloc_tx_desc_ring(struct net_device *dev, dma_addr_t dma; int i; - ring = pci_alloc_consistent(priv->pdev, sizeof(*ring) * entries, &dma); + ring = pci_zalloc_consistent(priv->pdev, sizeof(*ring) * entries, &dma); if (!ring || (unsigned long)ring & 0xFF) { RT_TRACE(COMP_ERR, "Cannot allocate TX ring (prio = %d)\n", prio); return -ENOMEM; } - memset(ring, 0, sizeof(*ring)*entries); priv->tx_ring[prio].desc = ring; priv->tx_ring[prio].dma = dma; priv->tx_ring[prio].idx = 0; diff --git a/drivers/staging/rtl8192ee/pci.c b/drivers/staging/rtl8192ee/pci.c index f3abbcc..0215aef 100644 --- a/drivers/staging/rtl8192ee/pci.c +++ b/drivers/staging/rtl8192ee/pci.c @@ -1224,10 +1224,10 @@ static int _rtl_pci_init_tx_ring(struct ieee80211_hw *hw, /* alloc tx buffer desc for new trx flow*/ if (rtlpriv->use_new_trx_flow) { - buffer_desc = pci_alloc_consistent(rtlpci->pdev, - sizeof(*buffer_desc) * entries, - &buffer_desc_dma); - + buffer_desc = + pci_zalloc_consistent(rtlpci->pdev, + sizeof(*buffer_desc) * entries, + &buffer_desc_dma); if (!buffer_desc || (unsigned long)buffer_desc & 0xFF) { RT_TRACE(COMP_ERR, DBG_EMERG, ("Cannot allocate TX ring (prio = %d)\n", @@ -1235,7 +1235,6 @@ static int _rtl_pci_init_tx_ring(struct ieee80211_hw *hw, return -ENOMEM; } - memset(buffer_desc, 0, sizeof(*buffer_desc) * entries); rtlpci->tx_ring[prio].buffer_desc = buffer_desc; rtlpci->tx_ring[prio].buffer_desc_dma = buffer_desc_dma; @@ -1245,16 +1244,14 @@ static int _rtl_pci_init_tx_ring(struct ieee80211_hw *hw, } /* alloc dma for this ring */ - desc = pci_alloc_consistent(rtlpci->pdev, - sizeof(*desc) * entries, &desc_dma); - + desc = pci_zalloc_consistent(rtlpci->pdev, sizeof(*desc) * entries, + &desc_dma); if (!desc || (unsigned long)desc & 0xFF) { RT_TRACE(COMP_ERR, DBG_EMERG, ("Cannot allocate TX ring (prio = %d)\n", prio)); return -ENOMEM; } - memset(desc, 0, sizeof(*desc) * entries); rtlpci->tx_ring[prio].desc = desc; rtlpci->tx_ring[prio].dma = desc_dma; @@ -1290,11 +1287,9 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw, int rxring_idx) struct rtl_rx_buffer_desc *entry = NULL; /* alloc dma for this ring */ rtlpci->rx_ring[rxring_idx].buffer_desc = - pci_alloc_consistent(rtlpci->pdev, - sizeof(*rtlpci->rx_ring[rxring_idx]. - buffer_desc) * - rtlpci->rxringcount, - &rtlpci->rx_ring[rxring_idx].dma); + pci_zalloc_consistent(rtlpci->pdev, + sizeof(*rtlpci->rx_ring[rxring_idx].buffer_desc) * rtlpci->rxringcount, + &rtlpci->rx_ring[rxring_idx].dma); if (!rtlpci->rx_ring[rxring_idx].buffer_desc || (unsigned long)rtlpci->rx_ring[rxring_idx].buffer_desc & 0xFF) { RT_TRACE(COMP_ERR, DBG_EMERG, @@ -1302,10 +1297,6 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw, int rxring_idx) return -ENOMEM; } - memset(rtlpci->rx_ring[rxring_idx].buffer_desc, 0, - sizeof(*rtlpci->rx_ring[rxring_idx].buffer_desc) * - rtlpci->rxringcount); - /* init every desc in this ring */ rtlpci->rx_ring[rxring_idx].idx = 0; @@ -1320,19 +1311,15 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw, int rxring_idx) u8 tmp_one = 1; /* alloc dma for this ring */ rtlpci->rx_ring[rxring_idx].desc = - pci_alloc_consistent(rtlpci->pdev, - sizeof(*rtlpci->rx_ring[rxring_idx]. - desc) * rtlpci->rxringcount, - &rtlpci->rx_ring[rxring_idx].dma); + pci_zalloc_consistent(rtlpci->pdev, + sizeof(*rtlpci->rx_ring[rxring_idx].desc) * rtlpci->rxringcount, + &rtlpci->rx_ring[rxring_idx].dma); if (!rtlpci->rx_ring[rxring_idx].desc || (unsigned long)rtlpci->rx_ring[rxring_idx].desc & 0xFF) { RT_TRACE(COMP_ERR, DBG_EMERG, ("Cannot allocate RX ring\n")); return -ENOMEM; } - memset(rtlpci->rx_ring[rxring_idx].desc, 0, - sizeof(*rtlpci->rx_ring[rxring_idx].desc) * - rtlpci->rxringcount); /* init every desc in this ring */ rtlpci->rx_ring[rxring_idx].idx = 0; diff --git a/drivers/staging/rtl8821ae/pci.c b/drivers/staging/rtl8821ae/pci.c index f9847d1..26d7b2f 100644 --- a/drivers/staging/rtl8821ae/pci.c +++ b/drivers/staging/rtl8821ae/pci.c @@ -1248,9 +1248,10 @@ static int _rtl_pci_init_tx_ring(struct ieee80211_hw *hw, /* alloc tx buffer desc for new trx flow*/ if (rtlpriv->use_new_trx_flow) { - buffer_desc = pci_alloc_consistent(rtlpci->pdev, - sizeof(*buffer_desc) * entries, - &buffer_desc_dma); + buffer_desc = + pci_zalloc_consistent(rtlpci->pdev, + sizeof(*buffer_desc) * entries, + &buffer_desc_dma); if (!buffer_desc || (unsigned long)buffer_desc & 0xFF) { RT_TRACE(COMP_ERR, DBG_EMERG, @@ -1259,7 +1260,6 @@ static int _rtl_pci_init_tx_ring(struct ieee80211_hw *hw, return -ENOMEM; } - memset(buffer_desc, 0, sizeof(*buffer_desc) * entries); rtlpci->tx_ring[prio].buffer_desc = buffer_desc; rtlpci->tx_ring[prio].buffer_desc_dma = buffer_desc_dma; @@ -1270,8 +1270,8 @@ static int _rtl_pci_init_tx_ring(struct ieee80211_hw *hw, } /* alloc dma for this ring */ - desc = pci_alloc_consistent(rtlpci->pdev, - sizeof(*desc) * entries, &desc_dma); + desc = pci_zalloc_consistent(rtlpci->pdev, sizeof(*desc) * entries, + &desc_dma); if (!desc || (unsigned long)desc & 0xFF) { RT_TRACE(COMP_ERR, DBG_EMERG, @@ -1279,7 +1279,6 @@ static int _rtl_pci_init_tx_ring(struct ieee80211_hw *hw, return -ENOMEM; } - memset(desc, 0, sizeof(*desc) * entries); rtlpci->tx_ring[prio].desc = desc; rtlpci->tx_ring[prio].dma = desc_dma; @@ -1316,21 +1315,15 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw, int rxring_idx) struct rtl_rx_buffer_desc *entry = NULL; /* alloc dma for this ring */ rtlpci->rx_ring[rxring_idx].buffer_desc = - pci_alloc_consistent(rtlpci->pdev, - sizeof(*rtlpci->rx_ring[rxring_idx]. - buffer_desc) * - rtlpci->rxringcount, - &rtlpci->rx_ring[rxring_idx].dma); + pci_zalloc_consistent(rtlpci->pdev, + sizeof(*rtlpci->rx_ring[rxring_idx].buffer_desc) * rtlpci->rxringcount, + &rtlpci->rx_ring[rxring_idx].dma); if (!rtlpci->rx_ring[rxring_idx].buffer_desc || (unsigned long)rtlpci->rx_ring[rxring_idx].buffer_desc & 0xFF) { RT_TRACE(COMP_ERR, DBG_EMERG, ("Cannot allocate RX ring\n")); return -ENOMEM; } - memset(rtlpci->rx_ring[rxring_idx].buffer_desc, 0, - sizeof(*rtlpci->rx_ring[rxring_idx].buffer_desc) * - rtlpci->rxringcount); - /* init every desc in this ring */ rtlpci->rx_ring[rxring_idx].idx = 0; for (i = 0; i < rtlpci->rxringcount; i++) { @@ -1344,10 +1337,9 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw, int rxring_idx) u8 tmp_one = 1; /* alloc dma for this ring */ rtlpci->rx_ring[rxring_idx].desc = - pci_alloc_consistent(rtlpci->pdev, - sizeof(*rtlpci->rx_ring[rxring_idx]. - desc) * rtlpci->rxringcount, - &rtlpci->rx_ring[rxring_idx].dma); + pci_zalloc_consistent(rtlpci->pdev, + sizeof(*rtlpci->rx_ring[rxring_idx].desc) * rtlpci->rxringcount, + &rtlpci->rx_ring[rxring_idx].dma); if (!rtlpci->rx_ring[rxring_idx].desc || (unsigned long)rtlpci->rx_ring[rxring_idx].desc & 0xFF) { RT_TRACE(COMP_ERR, DBG_EMERG, @@ -1355,10 +1347,6 @@ static int _rtl_pci_init_rx_ring(struct ieee80211_hw *hw, int rxring_idx) return -ENOMEM; } - memset(rtlpci->rx_ring[rxring_idx].desc, 0, - sizeof(*rtlpci->rx_ring[rxring_idx].desc) * - rtlpci->rxringcount); - /* init every desc in this ring */ rtlpci->rx_ring[rxring_idx].idx = 0; for (i = 0; i < rtlpci->rxringcount; i++) { diff --git a/drivers/staging/slicoss/slicoss.c b/drivers/staging/slicoss/slicoss.c index 50ece29..f35fa3d 100644 --- a/drivers/staging/slicoss/slicoss.c +++ b/drivers/staging/slicoss/slicoss.c @@ -1191,18 +1191,15 @@ static int slic_rspqueue_init(struct adapter *adapter) rspq->num_pages = SLIC_RSPQ_PAGES_GB; for (i = 0; i < rspq->num_pages; i++) { - rspq->vaddr[i] = pci_alloc_consistent(adapter->pcidev, - PAGE_SIZE, - &rspq->paddr[i]); + rspq->vaddr[i] = pci_zalloc_consistent(adapter->pcidev, + PAGE_SIZE, + &rspq->paddr[i]); if (!rspq->vaddr[i]) { dev_err(&adapter->pcidev->dev, "pci_alloc_consistent failed\n"); slic_rspqueue_free(adapter); return -ENOMEM; } - /* FIXME: - * do we really need this assertions (4K PAGE_SIZE aligned addr)? */ - memset(rspq->vaddr[i], 0, PAGE_SIZE); if (paddrh == 0) { slic_reg32_write(&slic_regs->slic_rbar, diff --git a/drivers/staging/vt6655/device_main.c b/drivers/staging/vt6655/device_main.c index c78d06e..0b583a3 100644 --- a/drivers/staging/vt6655/device_main.c +++ b/drivers/staging/vt6655/device_main.c @@ -1111,25 +1111,17 @@ static bool device_init_rings(PSDevice pDevice) void *vir_pool; /*allocate all RD/TD rings a single pool*/ - vir_pool = pci_alloc_consistent(pDevice->pcid, - pDevice->sOpts.nRxDescs0 * sizeof(SRxDesc) + - pDevice->sOpts.nRxDescs1 * sizeof(SRxDesc) + - pDevice->sOpts.nTxDescs[0] * sizeof(STxDesc) + - pDevice->sOpts.nTxDescs[1] * sizeof(STxDesc), - &pDevice->pool_dma); - + vir_pool = pci_zalloc_consistent(pDevice->pcid, + pDevice->sOpts.nRxDescs0 * sizeof(SRxDesc) + + pDevice->sOpts.nRxDescs1 * sizeof(SRxDesc) + + pDevice->sOpts.nTxDescs[0] * sizeof(STxDesc) + + pDevice->sOpts.nTxDescs[1] * sizeof(STxDesc), + &pDevice->pool_dma); if (vir_pool == NULL) { DBG_PRT(MSG_LEVEL_ERR, KERN_ERR "%s : allocate desc dma memory failed\n", pDevice->dev->name); return false; } - memset(vir_pool, 0, - pDevice->sOpts.nRxDescs0 * sizeof(SRxDesc) + - pDevice->sOpts.nRxDescs1 * sizeof(SRxDesc) + - pDevice->sOpts.nTxDescs[0] * sizeof(STxDesc) + - pDevice->sOpts.nTxDescs[1] * sizeof(STxDesc) - ); - pDevice->aRD0Ring = vir_pool; pDevice->aRD1Ring = vir_pool + pDevice->sOpts.nRxDescs0 * sizeof(SRxDesc); @@ -1138,13 +1130,12 @@ static bool device_init_rings(PSDevice pDevice) pDevice->rd1_pool_dma = pDevice->rd0_pool_dma + pDevice->sOpts.nRxDescs0 * sizeof(SRxDesc); - pDevice->tx0_bufs = pci_alloc_consistent(pDevice->pcid, - pDevice->sOpts.nTxDescs[0] * PKT_BUF_SZ + - pDevice->sOpts.nTxDescs[1] * PKT_BUF_SZ + - CB_BEACON_BUF_SIZE + - CB_MAX_BUF_SIZE, - &pDevice->tx_bufs_dma0); - + pDevice->tx0_bufs = pci_zalloc_consistent(pDevice->pcid, + pDevice->sOpts.nTxDescs[0] * PKT_BUF_SZ + + pDevice->sOpts.nTxDescs[1] * PKT_BUF_SZ + + CB_BEACON_BUF_SIZE + + CB_MAX_BUF_SIZE, + &pDevice->tx_bufs_dma0); if (pDevice->tx0_bufs == NULL) { DBG_PRT(MSG_LEVEL_ERR, KERN_ERR "%s: allocate buf dma memory failed\n", pDevice->dev->name); pci_free_consistent(pDevice->pcid, @@ -1157,13 +1148,6 @@ static bool device_init_rings(PSDevice pDevice) return false; } - memset(pDevice->tx0_bufs, 0, - pDevice->sOpts.nTxDescs[0] * PKT_BUF_SZ + - pDevice->sOpts.nTxDescs[1] * PKT_BUF_SZ + - CB_BEACON_BUF_SIZE + - CB_MAX_BUF_SIZE - ); - pDevice->td0_pool_dma = pDevice->rd1_pool_dma + pDevice->sOpts.nRxDescs1 * sizeof(SRxDesc); -- cgit v0.10.2 From d54d7796c5f86eea0748a185b56ffb74ebc050e6 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:51 -0700 Subject: synclink_gt: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Acked-by: Greg Kroah-Hartman Cc: Jiri Slaby Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/tty/synclink_gt.c b/drivers/tty/synclink_gt.c index ba1dbcd..0e8c39b 100644 --- a/drivers/tty/synclink_gt.c +++ b/drivers/tty/synclink_gt.c @@ -3383,12 +3383,11 @@ static int alloc_desc(struct slgt_info *info) unsigned int pbufs; /* allocate memory to hold descriptor lists */ - info->bufs = pci_alloc_consistent(info->pdev, DESC_LIST_SIZE, &info->bufs_dma_addr); + info->bufs = pci_zalloc_consistent(info->pdev, DESC_LIST_SIZE, + &info->bufs_dma_addr); if (info->bufs == NULL) return -ENOMEM; - memset(info->bufs, 0, DESC_LIST_SIZE); - info->rbufs = (struct slgt_desc*)info->bufs; info->tbufs = ((struct slgt_desc*)info->bufs) + info->rbuf_count; -- cgit v0.10.2 From 88b2608c49b7fe3d9131d5a1d4a03438a589997c Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:53 -0700 Subject: vme: bridges: use pci_zalloc_consistent Remove the now unnecessary memset too. Signed-off-by: Joe Perches Cc: Martyn Welch Cc: Manohar Vanga Acked-by: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/vme/bridges/vme_ca91cx42.c b/drivers/vme/bridges/vme_ca91cx42.c index bfb2d3f..18078ec 100644 --- a/drivers/vme/bridges/vme_ca91cx42.c +++ b/drivers/vme/bridges/vme_ca91cx42.c @@ -1555,16 +1555,14 @@ static int ca91cx42_crcsr_init(struct vme_bridge *ca91cx42_bridge, } /* Allocate mem for CR/CSR image */ - bridge->crcsr_kernel = pci_alloc_consistent(pdev, VME_CRCSR_BUF_SIZE, - &bridge->crcsr_bus); + bridge->crcsr_kernel = pci_zalloc_consistent(pdev, VME_CRCSR_BUF_SIZE, + &bridge->crcsr_bus); if (bridge->crcsr_kernel == NULL) { dev_err(&pdev->dev, "Failed to allocate memory for CR/CSR " "image\n"); return -ENOMEM; } - memset(bridge->crcsr_kernel, 0, VME_CRCSR_BUF_SIZE); - crcsr_addr = slot * (512 * 1024); iowrite32(bridge->crcsr_bus - crcsr_addr, bridge->base + VCSR_TO); diff --git a/drivers/vme/bridges/vme_tsi148.c b/drivers/vme/bridges/vme_tsi148.c index 61e706c0..e07cfa8 100644 --- a/drivers/vme/bridges/vme_tsi148.c +++ b/drivers/vme/bridges/vme_tsi148.c @@ -2275,16 +2275,14 @@ static int tsi148_crcsr_init(struct vme_bridge *tsi148_bridge, bridge = tsi148_bridge->driver_priv; /* Allocate mem for CR/CSR image */ - bridge->crcsr_kernel = pci_alloc_consistent(pdev, VME_CRCSR_BUF_SIZE, - &bridge->crcsr_bus); + bridge->crcsr_kernel = pci_zalloc_consistent(pdev, VME_CRCSR_BUF_SIZE, + &bridge->crcsr_bus); if (bridge->crcsr_kernel == NULL) { dev_err(tsi148_bridge->parent, "Failed to allocate memory for " "CR/CSR image\n"); return -ENOMEM; } - memset(bridge->crcsr_kernel, 0, VME_CRCSR_BUF_SIZE); - reg_split(bridge->crcsr_bus, &crcsr_bus_high, &crcsr_bus_low); iowrite32be(crcsr_bus_high, bridge->base + TSI148_LCSR_CROU); -- cgit v0.10.2 From e03aec1686e4169d208881e10078d668ad6adb2b Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:55 -0700 Subject: drivers/net/ethernet/amd/pcnet32.c: neaten and remove unnecessary OOM messages Make the code flow a little better for 80 columns. Use a consistent style for the RX and TX rings allocation. Use BIT macro. Use a temporary unsiged int entries for (1< Acked-by: Don Fry Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/drivers/net/ethernet/amd/pcnet32.c b/drivers/net/ethernet/amd/pcnet32.c index 8099fdc..4a8fdc4 100644 --- a/drivers/net/ethernet/amd/pcnet32.c +++ b/drivers/net/ethernet/amd/pcnet32.c @@ -481,36 +481,32 @@ static void pcnet32_realloc_tx_ring(struct net_device *dev, dma_addr_t *new_dma_addr_list; struct pcnet32_tx_head *new_tx_ring; struct sk_buff **new_skb_list; + unsigned int entries = BIT(size); pcnet32_purge_tx_ring(dev); - new_tx_ring = pci_zalloc_consistent(lp->pci_dev, - sizeof(struct pcnet32_tx_head) * - (1 << size), - &new_ring_dma_addr); - if (new_tx_ring == NULL) { - netif_err(lp, drv, dev, "Consistent memory allocation failed\n"); + new_tx_ring = + pci_zalloc_consistent(lp->pci_dev, + sizeof(struct pcnet32_tx_head) * entries, + &new_ring_dma_addr); + if (new_tx_ring == NULL) return; - } - new_dma_addr_list = kcalloc(1 << size, sizeof(dma_addr_t), - GFP_ATOMIC); + new_dma_addr_list = kcalloc(entries, sizeof(dma_addr_t), GFP_ATOMIC); if (!new_dma_addr_list) goto free_new_tx_ring; - new_skb_list = kcalloc(1 << size, sizeof(struct sk_buff *), - GFP_ATOMIC); + new_skb_list = kcalloc(entries, sizeof(struct sk_buff *), GFP_ATOMIC); if (!new_skb_list) goto free_new_lists; kfree(lp->tx_skbuff); kfree(lp->tx_dma_addr); pci_free_consistent(lp->pci_dev, - sizeof(struct pcnet32_tx_head) * - lp->tx_ring_size, lp->tx_ring, - lp->tx_ring_dma_addr); + sizeof(struct pcnet32_tx_head) * lp->tx_ring_size, + lp->tx_ring, lp->tx_ring_dma_addr); - lp->tx_ring_size = (1 << size); + lp->tx_ring_size = entries; lp->tx_mod_mask = lp->tx_ring_size - 1; lp->tx_len_bits = (size << 12); lp->tx_ring = new_tx_ring; @@ -523,8 +519,7 @@ free_new_lists: kfree(new_dma_addr_list); free_new_tx_ring: pci_free_consistent(lp->pci_dev, - sizeof(struct pcnet32_tx_head) * - (1 << size), + sizeof(struct pcnet32_tx_head) * entries, new_tx_ring, new_ring_dma_addr); } @@ -548,16 +543,14 @@ static void pcnet32_realloc_rx_ring(struct net_device *dev, struct pcnet32_rx_head *new_rx_ring; struct sk_buff **new_skb_list; int new, overlap; - unsigned int entries = 1 << size; + unsigned int entries = BIT(size); - new_rx_ring = pci_zalloc_consistent(lp->pci_dev, - sizeof(struct pcnet32_rx_head) * - entries, - &new_ring_dma_addr); - if (new_rx_ring == NULL) { - netif_err(lp, drv, dev, "Consistent memory allocation failed\n"); + new_rx_ring = + pci_zalloc_consistent(lp->pci_dev, + sizeof(struct pcnet32_rx_head) * entries, + &new_ring_dma_addr); + if (new_rx_ring == NULL) return; - } new_dma_addr_list = kcalloc(entries, sizeof(dma_addr_t), GFP_ATOMIC); if (!new_dma_addr_list) -- cgit v0.10.2 From 73d425fd1dda72bbc198e23ccb7c259d15fe49d8 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:57 -0700 Subject: MAINTAINERS: update microcode patterns Commit bad5fa631fca ("x86, microcode: Move to a proper location") moved the files, update the pattern. Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 17f5f9f..a306473 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -621,7 +621,7 @@ AMD MICROCODE UPDATE SUPPORT M: Andreas Herrmann L: amd64-microcode@amd64.org S: Maintained -F: arch/x86/kernel/microcode_amd.c +F: arch/x86/kernel/cpu/microcode/amd* AMD XGBE DRIVER M: Tom Lendacky @@ -4692,8 +4692,8 @@ F: drivers/platform/x86/intel_menlow.c INTEL IA32 MICROCODE UPDATE SUPPORT M: Tigran Aivazian S: Maintained -F: arch/x86/kernel/microcode_core.c -F: arch/x86/kernel/microcode_intel.c +F: arch/x86/kernel/cpu/microcode/core* +F: arch/x86/kernel/cpu/microcode/intel* INTEL I/OAT DMA DRIVER M: Dan Williams -- cgit v0.10.2 From ec421a7144d450e306232ed568a8edf328a06d6a Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:24:59 -0700 Subject: MAINTAINERS: update cifs location Commit 30706a545417 ("cifs: create a new Documentation/ directory and move docfiles into it") moved the files, update the pattern. Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index a306473..12e8cfa 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2415,7 +2415,7 @@ W: http://linux-cifs.samba.org/ Q: http://patchwork.ozlabs.org/project/linux-cifs-client/list/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6.git S: Supported -F: Documentation/filesystems/cifs.txt +F: Documentation/filesystems/cifs/ F: fs/cifs/ COMPACTPCI HOTPLUG CORE -- cgit v0.10.2 From fb2efb5ce83c30aa4687251024c648e099f426ae Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:01 -0700 Subject: MAINTAINERS: use the correct efi-stub location Commit 4171fe2f8a47 ("EFI stub documentation updates") moved the file, update the pattern. Signed-off-by: Joe Perches Acked-by: Roy Franz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 12e8cfa..4031bb8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3419,7 +3419,7 @@ M: Matt Fleming L: linux-efi@vger.kernel.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git S: Maintained -F: Documentation/x86/efi-stub.txt +F: Documentation/efi-stub.txt F: arch/ia64/kernel/efi.c F: arch/x86/boot/compressed/eboot.[ch] F: arch/x86/include/asm/efi.h -- cgit v0.10.2 From 4a9c44f15a73846ae059285daec07d16c6fe81dd Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:03 -0700 Subject: MAINTAINERS: update clk/sirf patterns Commit 7bf21bc81f28 ("clk: sirf: re-arch to make the codes support both prima2 and atlas6") moved the files, update the patterns. Signed-off-by: Joe Perches Cc: Barry Song Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 4031bb8..18dddd8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -911,7 +911,7 @@ L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux.git S: Maintained F: arch/arm/mach-prima2/ -F: drivers/clk/clk-prima2.c +F: drivers/clk/sirf/ F: drivers/clocksource/timer-prima2.c F: drivers/clocksource/timer-marco.c N: [^a-z]sirf -- cgit v0.10.2 From 0a759c6ead8470ca6c8ef000615d167db9715626 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:05 -0700 Subject: MAINTAINERS: fix ssbi pattern Incorrect pattern used, it's not a directory, it's a file. Fix it. Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 18dddd8..4d5790c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1178,7 +1178,7 @@ F: drivers/mmc/host/msm_sdcc.h F: drivers/tty/serial/msm_serial.h F: drivers/tty/serial/msm_serial.c F: drivers/*/pm8???-* -F: drivers/mfd/ssbi/ +F: drivers/mfd/ssbi.c F: include/linux/mfd/pm8xxx/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/davidb/linux-msm.git S: Maintained -- cgit v0.10.2 From e4ef47f2fe33906be2a03e5b91c68c84d2613af4 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:08 -0700 Subject: MAINTAINERS: use correct filename for sdhci-bcm-kona Use dashes not underscores. Signed-off-by: Joe Perches Cc: Christian Daudt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 4d5790c..c453de3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1983,7 +1983,7 @@ F: arch/arm/boot/dts/bcm113* F: arch/arm/boot/dts/bcm216* F: arch/arm/boot/dts/bcm281* F: arch/arm/configs/bcm_defconfig -F: drivers/mmc/host/sdhci_bcm_kona.c +F: drivers/mmc/host/sdhci-bcm-kona.c F: drivers/clocksource/bcm_kona_timer.c BROADCOM BCM2835 ARM ARCHICTURE -- cgit v0.10.2 From 9a67f099c802879c1dd7952114faee4adcf0a5b4 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:10 -0700 Subject: MAINTAINERS: fix PXA3xx NAND FLASH DRIVER pattern Use underscore, not dash Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index c453de3..13dbfe9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7245,7 +7245,7 @@ PXA3xx NAND FLASH DRIVER M: Ezequiel Garcia L: linux-mtd@lists.infradead.org S: Maintained -F: drivers/mtd/nand/pxa3xx-nand.c +F: drivers/mtd/nand/pxa3xx_nand.c MMP SUPPORT M: Eric Miao -- cgit v0.10.2 From b87339877e0f15ecf61d34b41ab442209c1ca085 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:12 -0700 Subject: MAINTAINERS: update picoxcell patterns Fix the picoxcell patterns, add the dts directory too. Signed-off-by: Joe Perches Acked-by: Jamie Iles Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 13dbfe9..b824adc3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6959,9 +6959,9 @@ M: Jamie Iles L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) T: git git://github.com/jamieiles/linux-2.6-ji.git S: Supported +F: arch/arm/boot/dts/picoxcell* F: arch/arm/mach-picoxcell/ -F: drivers/*/picoxcell* -F: drivers/*/*/picoxcell* +F: drivers/crypto/picoxcell* PIN CONTROL SUBSYSTEM M: Linus Walleij -- cgit v0.10.2 From 988636a8bed6c7b0ccdb1ccfdf7264e953ee4299 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:14 -0700 Subject: MAINTAINERS: remove section CIRRUS LOGIC EP93XX OHCI USB HOST DRIVER Commit e55f7cd24676 ("usb: ohci: remove ep93xx bus glue platform driver") removed the file, remove the section. Signed-off-by: Joe Perches Cc: H Hartley Sweeten Cc: Lennert Buytenhek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index b824adc3..8791bc1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2325,12 +2325,6 @@ L: netdev@vger.kernel.org S: Maintained F: drivers/net/ethernet/cirrus/ep93xx_eth.c -CIRRUS LOGIC EP93XX OHCI USB HOST DRIVER -M: Lennert Buytenhek -L: linux-usb@vger.kernel.org -S: Maintained -F: drivers/usb/host/ohci-ep93xx.c - CIRRUS LOGIC AUDIO CODEC DRIVERS M: Brian Austin M: Paul Handrigan -- cgit v0.10.2 From d656143a1d2363aa8dbef2797edf0c91c9e891fc Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:16 -0700 Subject: MAINTAINERS: remove METAG imgdafs pattern This never made it into the kernel tree. Remove it. Signed-off-by: Joe Perches Acked-by: James Hogan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 8791bc1..73bb906 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5867,7 +5867,6 @@ F: drivers/clocksource/metag_generic.c F: drivers/irqchip/irq-metag.c F: drivers/irqchip/irq-metag-ext.c F: drivers/tty/metag_da.c -F: fs/imgdafs/ MICROBLAZE ARCHITECTURE M: Michal Simek -- cgit v0.10.2 From eb231527b54b47f6ce61a8771813e7cd69510d82 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:18 -0700 Subject: MAINTAINERS: remove unused radeon drm pattern Commit 8dcedd7e87f4 ("UAPI: (Scripted) Disintegrate include/drm") moved the file, remove the pattern. Signed-off-by: Joe Perches Cc: David Howells Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 73bb906..527cc95 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3039,7 +3039,6 @@ L: dri-devel@lists.freedesktop.org T: git git://people.freedesktop.org/~agd5f/linux S: Supported F: drivers/gpu/drm/radeon/ -F: include/drm/radeon* F: include/uapi/drm/radeon* DRM PANEL DRIVERS -- cgit v0.10.2 From 1db22e8b837246608a467e7f0a7d1b2ff0c92a19 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:20 -0700 Subject: MAINTAINERS: remove unusd ARM/QUALCOMM MSM pattern Commit 87933a68dce6 ("mfd: pm8921: Remove pm8xxx API now that sub-devices use regmap") removed the file, remove the pattern. Signed-off-by: Joe Perches Reviewed-by: Stephen Boyd Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 527cc95..f0cf109 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1179,7 +1179,6 @@ F: drivers/tty/serial/msm_serial.h F: drivers/tty/serial/msm_serial.c F: drivers/*/pm8???-* F: drivers/mfd/ssbi.c -F: include/linux/mfd/pm8xxx/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/davidb/linux-msm.git S: Maintained -- cgit v0.10.2 From 935e9f02e798051d2923d59f6025cd74f59aa0e1 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:25:22 -0700 Subject: MAINTAINERS: remove unused NFSD pattern A series of commits by Christoph Hellwig removed all the files in this directory, remove the pattern. Signed-off-by: Joe Perches Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index f0cf109..f49eb67 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5144,7 +5144,6 @@ L: linux-nfs@vger.kernel.org W: http://nfs.sourceforge.net/ S: Supported F: fs/nfsd/ -F: include/linux/nfsd/ F: include/uapi/linux/nfsd/ F: fs/lockd/ F: fs/nfs_common/ -- cgit v0.10.2 From 4bb5f5d9395bc112d93a134d8f5b05611eddc9c0 Mon Sep 17 00:00:00 2001 From: David Herrmann Date: Fri, 8 Aug 2014 14:25:25 -0700 Subject: mm: allow drivers to prevent new writable mappings This patch (of 6): The i_mmap_writable field counts existing writable mappings of an address_space. To allow drivers to prevent new writable mappings, make this counter signed and prevent new writable mappings if it is negative. This is modelled after i_writecount and DENYWRITE. This will be required by the shmem-sealing infrastructure to prevent any new writable mappings after the WRITE seal has been set. In case there exists a writable mapping, this operation will fail with EBUSY. Note that we rely on the fact that iff you already own a writable mapping, you can increase the counter without using the helpers. This is the same that we do for i_writecount. Signed-off-by: David Herrmann Acked-by: Hugh Dickins Cc: Michael Kerrisk Cc: Ryan Lortie Cc: Lennart Poettering Cc: Daniel Mack Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/inode.c b/fs/inode.c index 5938f39..26753ba 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -165,6 +165,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode) mapping->a_ops = &empty_aops; mapping->host = inode; mapping->flags = 0; + atomic_set(&mapping->i_mmap_writable, 0); mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE); mapping->private_data = NULL; mapping->backing_dev_info = &default_backing_dev_info; diff --git a/include/linux/fs.h b/include/linux/fs.h index 1ab6c69..f0890e4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -387,7 +387,7 @@ struct address_space { struct inode *host; /* owner: inode, block_device */ struct radix_tree_root page_tree; /* radix tree of all pages */ spinlock_t tree_lock; /* and lock protecting it */ - unsigned int i_mmap_writable;/* count VM_SHARED mappings */ + atomic_t i_mmap_writable;/* count VM_SHARED mappings */ struct rb_root i_mmap; /* tree of private and shared mappings */ struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */ struct mutex i_mmap_mutex; /* protect tree, count, list */ @@ -470,10 +470,35 @@ static inline int mapping_mapped(struct address_space *mapping) * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap_pgoff * marks vma as VM_SHARED if it is shared, and the file was opened for * writing i.e. vma may be mprotected writable even if now readonly. + * + * If i_mmap_writable is negative, no new writable mappings are allowed. You + * can only deny writable mappings, if none exists right now. */ static inline int mapping_writably_mapped(struct address_space *mapping) { - return mapping->i_mmap_writable != 0; + return atomic_read(&mapping->i_mmap_writable) > 0; +} + +static inline int mapping_map_writable(struct address_space *mapping) +{ + return atomic_inc_unless_negative(&mapping->i_mmap_writable) ? + 0 : -EPERM; +} + +static inline void mapping_unmap_writable(struct address_space *mapping) +{ + atomic_dec(&mapping->i_mmap_writable); +} + +static inline int mapping_deny_writable(struct address_space *mapping) +{ + return atomic_dec_unless_positive(&mapping->i_mmap_writable) ? + 0 : -EBUSY; +} + +static inline void mapping_allow_writable(struct address_space *mapping) +{ + atomic_inc(&mapping->i_mmap_writable); } /* diff --git a/kernel/fork.c b/kernel/fork.c index fa91243..1380d8a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -429,7 +429,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) atomic_dec(&inode->i_writecount); mutex_lock(&mapping->i_mmap_mutex); if (tmp->vm_flags & VM_SHARED) - mapping->i_mmap_writable++; + atomic_inc(&mapping->i_mmap_writable); flush_dcache_mmap_lock(mapping); /* insert tmp into the share list, just after mpnt */ if (unlikely(tmp->vm_flags & VM_NONLINEAR)) diff --git a/mm/mmap.c b/mm/mmap.c index 64c9d73..c1f2ea4 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -221,7 +221,7 @@ static void __remove_shared_vm_struct(struct vm_area_struct *vma, if (vma->vm_flags & VM_DENYWRITE) atomic_inc(&file_inode(file)->i_writecount); if (vma->vm_flags & VM_SHARED) - mapping->i_mmap_writable--; + mapping_unmap_writable(mapping); flush_dcache_mmap_lock(mapping); if (unlikely(vma->vm_flags & VM_NONLINEAR)) @@ -622,7 +622,7 @@ static void __vma_link_file(struct vm_area_struct *vma) if (vma->vm_flags & VM_DENYWRITE) atomic_dec(&file_inode(file)->i_writecount); if (vma->vm_flags & VM_SHARED) - mapping->i_mmap_writable++; + atomic_inc(&mapping->i_mmap_writable); flush_dcache_mmap_lock(mapping); if (unlikely(vma->vm_flags & VM_NONLINEAR)) @@ -1577,6 +1577,17 @@ munmap_back: if (error) goto free_vma; } + if (vm_flags & VM_SHARED) { + error = mapping_map_writable(file->f_mapping); + if (error) + goto allow_write_and_free_vma; + } + + /* ->mmap() can change vma->vm_file, but must guarantee that + * vma_link() below can deny write-access if VM_DENYWRITE is set + * and map writably if VM_SHARED is set. This usually means the + * new file must not have been exposed to user-space, yet. + */ vma->vm_file = get_file(file); error = file->f_op->mmap(file, vma); if (error) @@ -1616,8 +1627,12 @@ munmap_back: vma_link(mm, vma, prev, rb_link, rb_parent); /* Once vma denies write, undo our temporary denial count */ - if (vm_flags & VM_DENYWRITE) - allow_write_access(file); + if (file) { + if (vm_flags & VM_SHARED) + mapping_unmap_writable(file->f_mapping); + if (vm_flags & VM_DENYWRITE) + allow_write_access(file); + } file = vma->vm_file; out: perf_event_mmap(vma); @@ -1646,14 +1661,17 @@ out: return addr; unmap_and_free_vma: - if (vm_flags & VM_DENYWRITE) - allow_write_access(file); vma->vm_file = NULL; fput(file); /* Undo any partial mapping done by a device driver. */ unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end); charged = 0; + if (vm_flags & VM_SHARED) + mapping_unmap_writable(file->f_mapping); +allow_write_and_free_vma: + if (vm_flags & VM_DENYWRITE) + allow_write_access(file); free_vma: kmem_cache_free(vm_area_cachep, vma); unacct_error: diff --git a/mm/swap_state.c b/mm/swap_state.c index e160151..3e0ec83 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -39,6 +39,7 @@ static struct backing_dev_info swap_backing_dev_info = { struct address_space swapper_spaces[MAX_SWAPFILES] = { [0 ... MAX_SWAPFILES - 1] = { .page_tree = RADIX_TREE_INIT(GFP_ATOMIC|__GFP_NOWARN), + .i_mmap_writable = ATOMIC_INIT(0), .a_ops = &swap_aops, .backing_dev_info = &swap_backing_dev_info, } -- cgit v0.10.2 From 40e041a2c858b3caefc757e26cb85bfceae5062b Mon Sep 17 00:00:00 2001 From: David Herrmann Date: Fri, 8 Aug 2014 14:25:27 -0700 Subject: shm: add sealing API If two processes share a common memory region, they usually want some guarantees to allow safe access. This often includes: - one side cannot overwrite data while the other reads it - one side cannot shrink the buffer while the other accesses it - one side cannot grow the buffer beyond previously set boundaries If there is a trust-relationship between both parties, there is no need for policy enforcement. However, if there's no trust relationship (eg., for general-purpose IPC) sharing memory-regions is highly fragile and often not possible without local copies. Look at the following two use-cases: 1) A graphics client wants to share its rendering-buffer with a graphics-server. The memory-region is allocated by the client for read/write access and a second FD is passed to the server. While scanning out from the memory region, the server has no guarantee that the client doesn't shrink the buffer at any time, requiring rather cumbersome SIGBUS handling. 2) A process wants to perform an RPC on another process. To avoid huge bandwidth consumption, zero-copy is preferred. After a message is assembled in-memory and a FD is passed to the remote side, both sides want to be sure that neither modifies this shared copy, anymore. The source may have put sensible data into the message without a separate copy and the target may want to parse the message inline, to avoid a local copy. While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide ways to achieve most of this, the first one is unproportionally ugly to use in libraries and the latter two are broken/racy or even disabled due to denial of service attacks. This patch introduces the concept of SEALING. If you seal a file, a specific set of operations is blocked on that file forever. Unlike locks, seals can only be set, never removed. Hence, once you verified a specific set of seals is set, you're guaranteed that no-one can perform the blocked operations on this file, anymore. An initial set of SEALS is introduced by this patch: - SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced in size. This affects ftruncate() and open(O_TRUNC). - GROW: If SEAL_GROW is set, the file in question cannot be increased in size. This affects ftruncate(), fallocate() and write(). - WRITE: If SEAL_WRITE is set, no write operations (besides resizing) are possible. This affects fallocate(PUNCH_HOLE), mmap() and write(). - SEAL: If SEAL_SEAL is set, no further seals can be added to a file. This basically prevents the F_ADD_SEAL operation on a file and can be set to prevent others from adding further seals that you don't want. The described use-cases can easily use these seals to provide safe use without any trust-relationship: 1) The graphics server can verify that a passed file-descriptor has SEAL_SHRINK set. This allows safe scanout, while the client is allowed to increase buffer size for window-resizing on-the-fly. Concurrent writes are explicitly allowed. 2) For general-purpose IPC, both processes can verify that SEAL_SHRINK, SEAL_GROW and SEAL_WRITE are set. This guarantees that neither process can modify the data while the other side parses it. Furthermore, it guarantees that even with writable FDs passed to the peer, it cannot increase the size to hit memory-limits of the source process (in case the file-storage is accounted to the source). The new API is an extension to fcntl(), adding two new commands: F_GET_SEALS: Return a bitset describing the seals on the file. This can be called on any FD if the underlying file supports sealing. F_ADD_SEALS: Change the seals of a given file. This requires WRITE access to the file and F_SEAL_SEAL may not already be set. Furthermore, the underlying file must support sealing and there may not be any existing shared mapping of that file. Otherwise, EBADF/EPERM is returned. The given seals are _added_ to the existing set of seals on the file. You cannot remove seals again. The fcntl() handler is currently specific to shmem and disabled on all files. A file needs to explicitly support sealing for this interface to work. A separate syscall is added in a follow-up, which creates files that support sealing. There is no intention to support this on other file-systems. Semantics are unclear for non-volatile files and we lack any use-case right now. Therefore, the implementation is specific to shmem. Signed-off-by: David Herrmann Acked-by: Hugh Dickins Cc: Michael Kerrisk Cc: Ryan Lortie Cc: Lennart Poettering Cc: Daniel Mack Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/fs/fcntl.c b/fs/fcntl.c index 72c82f6..22d1c3d 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include @@ -336,6 +337,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, case F_GETPIPE_SZ: err = pipe_fcntl(filp, cmd, arg); break; + case F_ADD_SEALS: + case F_GET_SEALS: + err = shmem_fcntl(filp, cmd, arg); + break; default: break; } diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 4d1771c..50777b5 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -1,6 +1,7 @@ #ifndef __SHMEM_FS_H #define __SHMEM_FS_H +#include #include #include #include @@ -11,6 +12,7 @@ struct shmem_inode_info { spinlock_t lock; + unsigned int seals; /* shmem seals */ unsigned long flags; unsigned long alloced; /* data pages alloced to file */ union { @@ -65,4 +67,19 @@ static inline struct page *shmem_read_mapping_page( mapping_gfp_mask(mapping)); } +#ifdef CONFIG_TMPFS + +extern int shmem_add_seals(struct file *file, unsigned int seals); +extern int shmem_get_seals(struct file *file); +extern long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg); + +#else + +static inline long shmem_fcntl(struct file *f, unsigned int c, unsigned long a) +{ + return -EINVAL; +} + +#endif + #endif diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 074b886..beed138 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -28,6 +28,21 @@ #define F_GETPIPE_SZ (F_LINUX_SPECIFIC_BASE + 8) /* + * Set/Get seals + */ +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9) +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10) + +/* + * Types of seals + */ +#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */ +#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ +#define F_SEAL_GROW 0x0004 /* prevent file from growing */ +#define F_SEAL_WRITE 0x0008 /* prevent writes */ +/* (1U << 31) is reserved for signed error codes */ + +/* * Types of directory notifications that may be requested. */ #define DN_ACCESS 0x00000001 /* File accessed */ diff --git a/mm/shmem.c b/mm/shmem.c index 6dc80d2..8b43bb7 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -66,6 +66,7 @@ static struct vfsmount *shm_mnt; #include #include #include +#include #include #include @@ -547,6 +548,7 @@ EXPORT_SYMBOL_GPL(shmem_truncate_range); static int shmem_setattr(struct dentry *dentry, struct iattr *attr) { struct inode *inode = dentry->d_inode; + struct shmem_inode_info *info = SHMEM_I(inode); int error; error = inode_change_ok(inode, attr); @@ -557,6 +559,11 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr) loff_t oldsize = inode->i_size; loff_t newsize = attr->ia_size; + /* protected by i_mutex */ + if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) || + (newsize > oldsize && (info->seals & F_SEAL_GROW))) + return -EPERM; + if (newsize != oldsize) { error = shmem_reacct_size(SHMEM_I(inode)->flags, oldsize, newsize); @@ -1412,6 +1419,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode info = SHMEM_I(inode); memset(info, 0, (char *)inode - (char *)info); spin_lock_init(&info->lock); + info->seals = F_SEAL_SEAL; info->flags = flags & VM_NORESERVE; INIT_LIST_HEAD(&info->swaplist); simple_xattrs_init(&info->xattrs); @@ -1470,7 +1478,17 @@ shmem_write_begin(struct file *file, struct address_space *mapping, struct page **pagep, void **fsdata) { struct inode *inode = mapping->host; + struct shmem_inode_info *info = SHMEM_I(inode); pgoff_t index = pos >> PAGE_CACHE_SHIFT; + + /* i_mutex is held by caller */ + if (unlikely(info->seals)) { + if (info->seals & F_SEAL_WRITE) + return -EPERM; + if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size) + return -EPERM; + } + return shmem_getpage(inode, index, pagep, SGP_WRITE, NULL); } @@ -1808,11 +1826,125 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence) return offset; } +static int shmem_wait_for_pins(struct address_space *mapping) +{ + return 0; +} + +#define F_ALL_SEALS (F_SEAL_SEAL | \ + F_SEAL_SHRINK | \ + F_SEAL_GROW | \ + F_SEAL_WRITE) + +int shmem_add_seals(struct file *file, unsigned int seals) +{ + struct inode *inode = file_inode(file); + struct shmem_inode_info *info = SHMEM_I(inode); + int error; + + /* + * SEALING + * Sealing allows multiple parties to share a shmem-file but restrict + * access to a specific subset of file operations. Seals can only be + * added, but never removed. This way, mutually untrusted parties can + * share common memory regions with a well-defined policy. A malicious + * peer can thus never perform unwanted operations on a shared object. + * + * Seals are only supported on special shmem-files and always affect + * the whole underlying inode. Once a seal is set, it may prevent some + * kinds of access to the file. Currently, the following seals are + * defined: + * SEAL_SEAL: Prevent further seals from being set on this file + * SEAL_SHRINK: Prevent the file from shrinking + * SEAL_GROW: Prevent the file from growing + * SEAL_WRITE: Prevent write access to the file + * + * As we don't require any trust relationship between two parties, we + * must prevent seals from being removed. Therefore, sealing a file + * only adds a given set of seals to the file, it never touches + * existing seals. Furthermore, the "setting seals"-operation can be + * sealed itself, which basically prevents any further seal from being + * added. + * + * Semantics of sealing are only defined on volatile files. Only + * anonymous shmem files support sealing. More importantly, seals are + * never written to disk. Therefore, there's no plan to support it on + * other file types. + */ + + if (file->f_op != &shmem_file_operations) + return -EINVAL; + if (!(file->f_mode & FMODE_WRITE)) + return -EPERM; + if (seals & ~(unsigned int)F_ALL_SEALS) + return -EINVAL; + + mutex_lock(&inode->i_mutex); + + if (info->seals & F_SEAL_SEAL) { + error = -EPERM; + goto unlock; + } + + if ((seals & F_SEAL_WRITE) && !(info->seals & F_SEAL_WRITE)) { + error = mapping_deny_writable(file->f_mapping); + if (error) + goto unlock; + + error = shmem_wait_for_pins(file->f_mapping); + if (error) { + mapping_allow_writable(file->f_mapping); + goto unlock; + } + } + + info->seals |= seals; + error = 0; + +unlock: + mutex_unlock(&inode->i_mutex); + return error; +} +EXPORT_SYMBOL_GPL(shmem_add_seals); + +int shmem_get_seals(struct file *file) +{ + if (file->f_op != &shmem_file_operations) + return -EINVAL; + + return SHMEM_I(file_inode(file))->seals; +} +EXPORT_SYMBOL_GPL(shmem_get_seals); + +long shmem_fcntl(struct file *file, unsigned int cmd, unsigned long arg) +{ + long error; + + switch (cmd) { + case F_ADD_SEALS: + /* disallow upper 32bit */ + if (arg > UINT_MAX) + return -EINVAL; + + error = shmem_add_seals(file, arg); + break; + case F_GET_SEALS: + error = shmem_get_seals(file); + break; + default: + error = -EINVAL; + break; + } + + return error; +} + static long shmem_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { struct inode *inode = file_inode(file); struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb); + struct shmem_inode_info *info = SHMEM_I(inode); struct shmem_falloc shmem_falloc; pgoff_t start, index, end; int error; @@ -1828,6 +1960,12 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1; DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq); + /* protected by i_mutex */ + if (info->seals & F_SEAL_WRITE) { + error = -EPERM; + goto out; + } + shmem_falloc.waitq = &shmem_falloc_waitq; shmem_falloc.start = unmap_start >> PAGE_SHIFT; shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT; @@ -1854,6 +1992,11 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, if (error) goto out; + if ((info->seals & F_SEAL_GROW) && offset + len > inode->i_size) { + error = -EPERM; + goto out; + } + start = offset >> PAGE_CACHE_SHIFT; end = (offset + len + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; /* Try to avoid a swapstorm if len is impossible to satisfy */ -- cgit v0.10.2 From 9183df25fe7b194563db3fec6dc3202a5855839c Mon Sep 17 00:00:00 2001 From: David Herrmann Date: Fri, 8 Aug 2014 14:25:29 -0700 Subject: shm: add memfd_create() syscall memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor that you can pass to mmap(). It can support sealing and avoids any connection to user-visible mount-points. Thus, it's not subject to quotas on mounted file-systems, but can be used like malloc()'ed memory, but with a file-descriptor to it. memfd_create() returns the raw shmem file, so calls like ftruncate() can be used to modify the underlying inode. Also calls like fstat() will return proper information and mark the file as regular file. If you want sealing, you can specify MFD_ALLOW_SEALING. Otherwise, sealing is not supported (like on all other regular files). Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not subject to a filesystem size limit. It is still properly accounted to memcg limits, though, and to the same overcommit or no-overcommit accounting as all user memory. Signed-off-by: David Herrmann Acked-by: Hugh Dickins Cc: Michael Kerrisk Cc: Ryan Lortie Cc: Lennart Poettering Cc: Daniel Mack Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl index d1b4a11..028b781 100644 --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -362,3 +362,4 @@ 353 i386 renameat2 sys_renameat2 354 i386 seccomp sys_seccomp 355 i386 getrandom sys_getrandom +356 i386 memfd_create sys_memfd_create diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 252c804..ca2b9aa 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -325,6 +325,7 @@ 316 common renameat2 sys_renameat2 317 common seccomp sys_seccomp 318 common getrandom sys_getrandom +319 common memfd_create sys_memfd_create # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 701daff..15a0694 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -802,6 +802,7 @@ asmlinkage long sys_timerfd_settime(int ufd, int flags, asmlinkage long sys_timerfd_gettime(int ufd, struct itimerspec __user *otmr); asmlinkage long sys_eventfd(unsigned int count); asmlinkage long sys_eventfd2(unsigned int count, int flags); +asmlinkage long sys_memfd_create(const char __user *uname_ptr, unsigned int flags); asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len); asmlinkage long sys_old_readdir(unsigned int, struct old_linux_dirent __user *, unsigned int); asmlinkage long sys_pselect6(int, fd_set __user *, fd_set __user *, diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h new file mode 100644 index 0000000..534e364 --- /dev/null +++ b/include/uapi/linux/memfd.h @@ -0,0 +1,8 @@ +#ifndef _UAPI_LINUX_MEMFD_H +#define _UAPI_LINUX_MEMFD_H + +/* flags for memfd_create(2) (unsigned int) */ +#define MFD_CLOEXEC 0x0001U +#define MFD_ALLOW_SEALING 0x0002U + +#endif /* _UAPI_LINUX_MEMFD_H */ diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 2904a21..1f79e37 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -197,6 +197,7 @@ cond_syscall(compat_sys_timerfd_settime); cond_syscall(compat_sys_timerfd_gettime); cond_syscall(sys_eventfd); cond_syscall(sys_eventfd2); +cond_syscall(sys_memfd_create); /* performance counters: */ cond_syscall(sys_perf_event_open); diff --git a/mm/shmem.c b/mm/shmem.c index 8b43bb7..4a54987 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -66,7 +66,9 @@ static struct vfsmount *shm_mnt; #include #include #include +#include #include +#include #include #include @@ -2732,6 +2734,77 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root) shmem_show_mpol(seq, sbinfo->mpol); return 0; } + +#define MFD_NAME_PREFIX "memfd:" +#define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1) +#define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN) + +#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING) + +SYSCALL_DEFINE2(memfd_create, + const char __user *, uname, + unsigned int, flags) +{ + struct shmem_inode_info *info; + struct file *file; + int fd, error; + char *name; + long len; + + if (flags & ~(unsigned int)MFD_ALL_FLAGS) + return -EINVAL; + + /* length includes terminating zero */ + len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1); + if (len <= 0) + return -EFAULT; + if (len > MFD_NAME_MAX_LEN + 1) + return -EINVAL; + + name = kmalloc(len + MFD_NAME_PREFIX_LEN, GFP_TEMPORARY); + if (!name) + return -ENOMEM; + + strcpy(name, MFD_NAME_PREFIX); + if (copy_from_user(&name[MFD_NAME_PREFIX_LEN], uname, len)) { + error = -EFAULT; + goto err_name; + } + + /* terminating-zero may have changed after strnlen_user() returned */ + if (name[len + MFD_NAME_PREFIX_LEN - 1]) { + error = -EFAULT; + goto err_name; + } + + fd = get_unused_fd_flags((flags & MFD_CLOEXEC) ? O_CLOEXEC : 0); + if (fd < 0) { + error = fd; + goto err_name; + } + + file = shmem_file_setup(name, 0, VM_NORESERVE); + if (IS_ERR(file)) { + error = PTR_ERR(file); + goto err_fd; + } + info = SHMEM_I(file_inode(file)); + file->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE; + file->f_flags |= O_RDWR | O_LARGEFILE; + if (flags & MFD_ALLOW_SEALING) + info->seals &= ~F_SEAL_SEAL; + + fd_install(fd, file); + kfree(name); + return fd; + +err_fd: + put_unused_fd(fd); +err_name: + kfree(name); + return error; +} + #endif /* CONFIG_TMPFS */ static void shmem_put_super(struct super_block *sb) -- cgit v0.10.2 From 4f5ce5e8d7e2da3c714df8a7fa42edb9f992fc52 Mon Sep 17 00:00:00 2001 From: David Herrmann Date: Fri, 8 Aug 2014 14:25:32 -0700 Subject: selftests: add memfd_create() + sealing tests Some basic tests to verify sealing on memfds works as expected and guarantees the advertised semantics. Signed-off-by: David Herrmann Acked-by: Hugh Dickins Cc: Michael Kerrisk Cc: Ryan Lortie Cc: Lennart Poettering Cc: Daniel Mack Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index d10f95c..6fd2a44 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -2,6 +2,7 @@ TARGETS = breakpoints TARGETS += cpu-hotplug TARGETS += efivarfs TARGETS += kcmp +TARGETS += memfd TARGETS += memory-hotplug TARGETS += mqueue TARGETS += net diff --git a/tools/testing/selftests/memfd/.gitignore b/tools/testing/selftests/memfd/.gitignore new file mode 100644 index 0000000..bcc8ee2 --- /dev/null +++ b/tools/testing/selftests/memfd/.gitignore @@ -0,0 +1,2 @@ +memfd_test +memfd-test-file diff --git a/tools/testing/selftests/memfd/Makefile b/tools/testing/selftests/memfd/Makefile new file mode 100644 index 0000000..36653b9 --- /dev/null +++ b/tools/testing/selftests/memfd/Makefile @@ -0,0 +1,29 @@ +uname_M := $(shell uname -m 2>/dev/null || echo not) +ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/) +ifeq ($(ARCH),i386) + ARCH := X86 +endif +ifeq ($(ARCH),x86_64) + ARCH := X86 +endif + +CFLAGS += -I../../../../arch/x86/include/generated/uapi/ +CFLAGS += -I../../../../arch/x86/include/uapi/ +CFLAGS += -I../../../../include/uapi/ +CFLAGS += -I../../../../include/ + +all: +ifeq ($(ARCH),X86) + gcc $(CFLAGS) memfd_test.c -o memfd_test +else + echo "Not an x86 target, can't build memfd selftest" +endif + +run_tests: all +ifeq ($(ARCH),X86) + gcc $(CFLAGS) memfd_test.c -o memfd_test +endif + @./memfd_test || echo "memfd_test: [FAIL]" + +clean: + $(RM) memfd_test diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c new file mode 100644 index 0000000..3634c90 --- /dev/null +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -0,0 +1,913 @@ +#define _GNU_SOURCE +#define __EXPORTED_HEADERS__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MFD_DEF_SIZE 8192 +#define STACK_SIZE 65535 + +static int sys_memfd_create(const char *name, + unsigned int flags) +{ + return syscall(__NR_memfd_create, name, flags); +} + +static int mfd_assert_new(const char *name, loff_t sz, unsigned int flags) +{ + int r, fd; + + fd = sys_memfd_create(name, flags); + if (fd < 0) { + printf("memfd_create(\"%s\", %u) failed: %m\n", + name, flags); + abort(); + } + + r = ftruncate(fd, sz); + if (r < 0) { + printf("ftruncate(%llu) failed: %m\n", (unsigned long long)sz); + abort(); + } + + return fd; +} + +static void mfd_fail_new(const char *name, unsigned int flags) +{ + int r; + + r = sys_memfd_create(name, flags); + if (r >= 0) { + printf("memfd_create(\"%s\", %u) succeeded, but failure expected\n", + name, flags); + close(r); + abort(); + } +} + +static __u64 mfd_assert_get_seals(int fd) +{ + long r; + + r = fcntl(fd, F_GET_SEALS); + if (r < 0) { + printf("GET_SEALS(%d) failed: %m\n", fd); + abort(); + } + + return r; +} + +static void mfd_assert_has_seals(int fd, __u64 seals) +{ + __u64 s; + + s = mfd_assert_get_seals(fd); + if (s != seals) { + printf("%llu != %llu = GET_SEALS(%d)\n", + (unsigned long long)seals, (unsigned long long)s, fd); + abort(); + } +} + +static void mfd_assert_add_seals(int fd, __u64 seals) +{ + long r; + __u64 s; + + s = mfd_assert_get_seals(fd); + r = fcntl(fd, F_ADD_SEALS, seals); + if (r < 0) { + printf("ADD_SEALS(%d, %llu -> %llu) failed: %m\n", + fd, (unsigned long long)s, (unsigned long long)seals); + abort(); + } +} + +static void mfd_fail_add_seals(int fd, __u64 seals) +{ + long r; + __u64 s; + + r = fcntl(fd, F_GET_SEALS); + if (r < 0) + s = 0; + else + s = r; + + r = fcntl(fd, F_ADD_SEALS, seals); + if (r >= 0) { + printf("ADD_SEALS(%d, %llu -> %llu) didn't fail as expected\n", + fd, (unsigned long long)s, (unsigned long long)seals); + abort(); + } +} + +static void mfd_assert_size(int fd, size_t size) +{ + struct stat st; + int r; + + r = fstat(fd, &st); + if (r < 0) { + printf("fstat(%d) failed: %m\n", fd); + abort(); + } else if (st.st_size != size) { + printf("wrong file size %lld, but expected %lld\n", + (long long)st.st_size, (long long)size); + abort(); + } +} + +static int mfd_assert_dup(int fd) +{ + int r; + + r = dup(fd); + if (r < 0) { + printf("dup(%d) failed: %m\n", fd); + abort(); + } + + return r; +} + +static void *mfd_assert_mmap_shared(int fd) +{ + void *p; + + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ | PROT_WRITE, + MAP_SHARED, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + + return p; +} + +static void *mfd_assert_mmap_private(int fd) +{ + void *p; + + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ, + MAP_PRIVATE, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + + return p; +} + +static int mfd_assert_open(int fd, int flags, mode_t mode) +{ + char buf[512]; + int r; + + sprintf(buf, "/proc/self/fd/%d", fd); + r = open(buf, flags, mode); + if (r < 0) { + printf("open(%s) failed: %m\n", buf); + abort(); + } + + return r; +} + +static void mfd_fail_open(int fd, int flags, mode_t mode) +{ + char buf[512]; + int r; + + sprintf(buf, "/proc/self/fd/%d", fd); + r = open(buf, flags, mode); + if (r >= 0) { + printf("open(%s) didn't fail as expected\n"); + abort(); + } +} + +static void mfd_assert_read(int fd) +{ + char buf[16]; + void *p; + ssize_t l; + + l = read(fd, buf, sizeof(buf)); + if (l != sizeof(buf)) { + printf("read() failed: %m\n"); + abort(); + } + + /* verify PROT_READ *is* allowed */ + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ, + MAP_PRIVATE, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + munmap(p, MFD_DEF_SIZE); + + /* verify MAP_PRIVATE is *always* allowed (even writable) */ + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ | PROT_WRITE, + MAP_PRIVATE, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + munmap(p, MFD_DEF_SIZE); +} + +static void mfd_assert_write(int fd) +{ + ssize_t l; + void *p; + int r; + + /* verify write() succeeds */ + l = write(fd, "\0\0\0\0", 4); + if (l != 4) { + printf("write() failed: %m\n"); + abort(); + } + + /* verify PROT_READ | PROT_WRITE is allowed */ + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ | PROT_WRITE, + MAP_SHARED, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + *(char *)p = 0; + munmap(p, MFD_DEF_SIZE); + + /* verify PROT_WRITE is allowed */ + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_WRITE, + MAP_SHARED, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + *(char *)p = 0; + munmap(p, MFD_DEF_SIZE); + + /* verify PROT_READ with MAP_SHARED is allowed and a following + * mprotect(PROT_WRITE) allows writing */ + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ, + MAP_SHARED, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + + r = mprotect(p, MFD_DEF_SIZE, PROT_READ | PROT_WRITE); + if (r < 0) { + printf("mprotect() failed: %m\n"); + abort(); + } + + *(char *)p = 0; + munmap(p, MFD_DEF_SIZE); + + /* verify PUNCH_HOLE works */ + r = fallocate(fd, + FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + 0, + MFD_DEF_SIZE); + if (r < 0) { + printf("fallocate(PUNCH_HOLE) failed: %m\n"); + abort(); + } +} + +static void mfd_fail_write(int fd) +{ + ssize_t l; + void *p; + int r; + + /* verify write() fails */ + l = write(fd, "data", 4); + if (l != -EPERM) { + printf("expected EPERM on write(), but got %d: %m\n", (int)l); + abort(); + } + + /* verify PROT_READ | PROT_WRITE is not allowed */ + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ | PROT_WRITE, + MAP_SHARED, + fd, + 0); + if (p != MAP_FAILED) { + printf("mmap() didn't fail as expected\n"); + abort(); + } + + /* verify PROT_WRITE is not allowed */ + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_WRITE, + MAP_SHARED, + fd, + 0); + if (p != MAP_FAILED) { + printf("mmap() didn't fail as expected\n"); + abort(); + } + + /* Verify PROT_READ with MAP_SHARED with a following mprotect is not + * allowed. Note that for r/w the kernel already prevents the mmap. */ + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ, + MAP_SHARED, + fd, + 0); + if (p != MAP_FAILED) { + r = mprotect(p, MFD_DEF_SIZE, PROT_READ | PROT_WRITE); + if (r >= 0) { + printf("mmap()+mprotect() didn't fail as expected\n"); + abort(); + } + } + + /* verify PUNCH_HOLE fails */ + r = fallocate(fd, + FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + 0, + MFD_DEF_SIZE); + if (r >= 0) { + printf("fallocate(PUNCH_HOLE) didn't fail as expected\n"); + abort(); + } +} + +static void mfd_assert_shrink(int fd) +{ + int r, fd2; + + r = ftruncate(fd, MFD_DEF_SIZE / 2); + if (r < 0) { + printf("ftruncate(SHRINK) failed: %m\n"); + abort(); + } + + mfd_assert_size(fd, MFD_DEF_SIZE / 2); + + fd2 = mfd_assert_open(fd, + O_RDWR | O_CREAT | O_TRUNC, + S_IRUSR | S_IWUSR); + close(fd2); + + mfd_assert_size(fd, 0); +} + +static void mfd_fail_shrink(int fd) +{ + int r; + + r = ftruncate(fd, MFD_DEF_SIZE / 2); + if (r >= 0) { + printf("ftruncate(SHRINK) didn't fail as expected\n"); + abort(); + } + + mfd_fail_open(fd, + O_RDWR | O_CREAT | O_TRUNC, + S_IRUSR | S_IWUSR); +} + +static void mfd_assert_grow(int fd) +{ + int r; + + r = ftruncate(fd, MFD_DEF_SIZE * 2); + if (r < 0) { + printf("ftruncate(GROW) failed: %m\n"); + abort(); + } + + mfd_assert_size(fd, MFD_DEF_SIZE * 2); + + r = fallocate(fd, + 0, + 0, + MFD_DEF_SIZE * 4); + if (r < 0) { + printf("fallocate(ALLOC) failed: %m\n"); + abort(); + } + + mfd_assert_size(fd, MFD_DEF_SIZE * 4); +} + +static void mfd_fail_grow(int fd) +{ + int r; + + r = ftruncate(fd, MFD_DEF_SIZE * 2); + if (r >= 0) { + printf("ftruncate(GROW) didn't fail as expected\n"); + abort(); + } + + r = fallocate(fd, + 0, + 0, + MFD_DEF_SIZE * 4); + if (r >= 0) { + printf("fallocate(ALLOC) didn't fail as expected\n"); + abort(); + } +} + +static void mfd_assert_grow_write(int fd) +{ + static char buf[MFD_DEF_SIZE * 8]; + ssize_t l; + + l = pwrite(fd, buf, sizeof(buf), 0); + if (l != sizeof(buf)) { + printf("pwrite() failed: %m\n"); + abort(); + } + + mfd_assert_size(fd, MFD_DEF_SIZE * 8); +} + +static void mfd_fail_grow_write(int fd) +{ + static char buf[MFD_DEF_SIZE * 8]; + ssize_t l; + + l = pwrite(fd, buf, sizeof(buf), 0); + if (l == sizeof(buf)) { + printf("pwrite() didn't fail as expected\n"); + abort(); + } +} + +static int idle_thread_fn(void *arg) +{ + sigset_t set; + int sig; + + /* dummy waiter; SIGTERM terminates us anyway */ + sigemptyset(&set); + sigaddset(&set, SIGTERM); + sigwait(&set, &sig); + + return 0; +} + +static pid_t spawn_idle_thread(unsigned int flags) +{ + uint8_t *stack; + pid_t pid; + + stack = malloc(STACK_SIZE); + if (!stack) { + printf("malloc(STACK_SIZE) failed: %m\n"); + abort(); + } + + pid = clone(idle_thread_fn, + stack + STACK_SIZE, + SIGCHLD | flags, + NULL); + if (pid < 0) { + printf("clone() failed: %m\n"); + abort(); + } + + return pid; +} + +static void join_idle_thread(pid_t pid) +{ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +/* + * Test memfd_create() syscall + * Verify syscall-argument validation, including name checks, flag validation + * and more. + */ +static void test_create(void) +{ + char buf[2048]; + int fd; + + /* test NULL name */ + mfd_fail_new(NULL, 0); + + /* test over-long name (not zero-terminated) */ + memset(buf, 0xff, sizeof(buf)); + mfd_fail_new(buf, 0); + + /* test over-long zero-terminated name */ + memset(buf, 0xff, sizeof(buf)); + buf[sizeof(buf) - 1] = 0; + mfd_fail_new(buf, 0); + + /* verify "" is a valid name */ + fd = mfd_assert_new("", 0, 0); + close(fd); + + /* verify invalid O_* open flags */ + mfd_fail_new("", 0x0100); + mfd_fail_new("", ~MFD_CLOEXEC); + mfd_fail_new("", ~MFD_ALLOW_SEALING); + mfd_fail_new("", ~0); + mfd_fail_new("", 0x80000000U); + + /* verify MFD_CLOEXEC is allowed */ + fd = mfd_assert_new("", 0, MFD_CLOEXEC); + close(fd); + + /* verify MFD_ALLOW_SEALING is allowed */ + fd = mfd_assert_new("", 0, MFD_ALLOW_SEALING); + close(fd); + + /* verify MFD_ALLOW_SEALING | MFD_CLOEXEC is allowed */ + fd = mfd_assert_new("", 0, MFD_ALLOW_SEALING | MFD_CLOEXEC); + close(fd); +} + +/* + * Test basic sealing + * A very basic sealing test to see whether setting/retrieving seals works. + */ +static void test_basic(void) +{ + int fd; + + fd = mfd_assert_new("kern_memfd_basic", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + + /* add basic seals */ + mfd_assert_has_seals(fd, 0); + mfd_assert_add_seals(fd, F_SEAL_SHRINK | + F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_SHRINK | + F_SEAL_WRITE); + + /* add them again */ + mfd_assert_add_seals(fd, F_SEAL_SHRINK | + F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_SHRINK | + F_SEAL_WRITE); + + /* add more seals and seal against sealing */ + mfd_assert_add_seals(fd, F_SEAL_GROW | F_SEAL_SEAL); + mfd_assert_has_seals(fd, F_SEAL_SHRINK | + F_SEAL_GROW | + F_SEAL_WRITE | + F_SEAL_SEAL); + + /* verify that sealing no longer works */ + mfd_fail_add_seals(fd, F_SEAL_GROW); + mfd_fail_add_seals(fd, 0); + + close(fd); + + /* verify sealing does not work without MFD_ALLOW_SEALING */ + fd = mfd_assert_new("kern_memfd_basic", + MFD_DEF_SIZE, + MFD_CLOEXEC); + mfd_assert_has_seals(fd, F_SEAL_SEAL); + mfd_fail_add_seals(fd, F_SEAL_SHRINK | + F_SEAL_GROW | + F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_SEAL); + close(fd); +} + +/* + * Test SEAL_WRITE + * Test whether SEAL_WRITE actually prevents modifications. + */ +static void test_seal_write(void) +{ + int fd; + + fd = mfd_assert_new("kern_memfd_seal_write", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_has_seals(fd, 0); + mfd_assert_add_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_WRITE); + + mfd_assert_read(fd); + mfd_fail_write(fd); + mfd_assert_shrink(fd); + mfd_assert_grow(fd); + mfd_fail_grow_write(fd); + + close(fd); +} + +/* + * Test SEAL_SHRINK + * Test whether SEAL_SHRINK actually prevents shrinking + */ +static void test_seal_shrink(void) +{ + int fd; + + fd = mfd_assert_new("kern_memfd_seal_shrink", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_has_seals(fd, 0); + mfd_assert_add_seals(fd, F_SEAL_SHRINK); + mfd_assert_has_seals(fd, F_SEAL_SHRINK); + + mfd_assert_read(fd); + mfd_assert_write(fd); + mfd_fail_shrink(fd); + mfd_assert_grow(fd); + mfd_assert_grow_write(fd); + + close(fd); +} + +/* + * Test SEAL_GROW + * Test whether SEAL_GROW actually prevents growing + */ +static void test_seal_grow(void) +{ + int fd; + + fd = mfd_assert_new("kern_memfd_seal_grow", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_has_seals(fd, 0); + mfd_assert_add_seals(fd, F_SEAL_GROW); + mfd_assert_has_seals(fd, F_SEAL_GROW); + + mfd_assert_read(fd); + mfd_assert_write(fd); + mfd_assert_shrink(fd); + mfd_fail_grow(fd); + mfd_fail_grow_write(fd); + + close(fd); +} + +/* + * Test SEAL_SHRINK | SEAL_GROW + * Test whether SEAL_SHRINK | SEAL_GROW actually prevents resizing + */ +static void test_seal_resize(void) +{ + int fd; + + fd = mfd_assert_new("kern_memfd_seal_resize", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_has_seals(fd, 0); + mfd_assert_add_seals(fd, F_SEAL_SHRINK | F_SEAL_GROW); + mfd_assert_has_seals(fd, F_SEAL_SHRINK | F_SEAL_GROW); + + mfd_assert_read(fd); + mfd_assert_write(fd); + mfd_fail_shrink(fd); + mfd_fail_grow(fd); + mfd_fail_grow_write(fd); + + close(fd); +} + +/* + * Test sharing via dup() + * Test that seals are shared between dupped FDs and they're all equal. + */ +static void test_share_dup(void) +{ + int fd, fd2; + + fd = mfd_assert_new("kern_memfd_share_dup", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_has_seals(fd, 0); + + fd2 = mfd_assert_dup(fd); + mfd_assert_has_seals(fd2, 0); + + mfd_assert_add_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd2, F_SEAL_WRITE); + + mfd_assert_add_seals(fd2, F_SEAL_SHRINK); + mfd_assert_has_seals(fd, F_SEAL_WRITE | F_SEAL_SHRINK); + mfd_assert_has_seals(fd2, F_SEAL_WRITE | F_SEAL_SHRINK); + + mfd_assert_add_seals(fd, F_SEAL_SEAL); + mfd_assert_has_seals(fd, F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_SEAL); + mfd_assert_has_seals(fd2, F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_SEAL); + + mfd_fail_add_seals(fd, F_SEAL_GROW); + mfd_fail_add_seals(fd2, F_SEAL_GROW); + mfd_fail_add_seals(fd, F_SEAL_SEAL); + mfd_fail_add_seals(fd2, F_SEAL_SEAL); + + close(fd2); + + mfd_fail_add_seals(fd, F_SEAL_GROW); + close(fd); +} + +/* + * Test sealing with active mmap()s + * Modifying seals is only allowed if no other mmap() refs exist. + */ +static void test_share_mmap(void) +{ + int fd; + void *p; + + fd = mfd_assert_new("kern_memfd_share_mmap", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_has_seals(fd, 0); + + /* shared/writable ref prevents sealing WRITE, but allows others */ + p = mfd_assert_mmap_shared(fd); + mfd_fail_add_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd, 0); + mfd_assert_add_seals(fd, F_SEAL_SHRINK); + mfd_assert_has_seals(fd, F_SEAL_SHRINK); + munmap(p, MFD_DEF_SIZE); + + /* readable ref allows sealing */ + p = mfd_assert_mmap_private(fd); + mfd_assert_add_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_WRITE | F_SEAL_SHRINK); + munmap(p, MFD_DEF_SIZE); + + close(fd); +} + +/* + * Test sealing with open(/proc/self/fd/%d) + * Via /proc we can get access to a separate file-context for the same memfd. + * This is *not* like dup(), but like a real separate open(). Make sure the + * semantics are as expected and we correctly check for RDONLY / WRONLY / RDWR. + */ +static void test_share_open(void) +{ + int fd, fd2; + + fd = mfd_assert_new("kern_memfd_share_open", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_has_seals(fd, 0); + + fd2 = mfd_assert_open(fd, O_RDWR, 0); + mfd_assert_add_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd2, F_SEAL_WRITE); + + mfd_assert_add_seals(fd2, F_SEAL_SHRINK); + mfd_assert_has_seals(fd, F_SEAL_WRITE | F_SEAL_SHRINK); + mfd_assert_has_seals(fd2, F_SEAL_WRITE | F_SEAL_SHRINK); + + close(fd); + fd = mfd_assert_open(fd2, O_RDONLY, 0); + + mfd_fail_add_seals(fd, F_SEAL_SEAL); + mfd_assert_has_seals(fd, F_SEAL_WRITE | F_SEAL_SHRINK); + mfd_assert_has_seals(fd2, F_SEAL_WRITE | F_SEAL_SHRINK); + + close(fd2); + fd2 = mfd_assert_open(fd, O_RDWR, 0); + + mfd_assert_add_seals(fd2, F_SEAL_SEAL); + mfd_assert_has_seals(fd, F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_SEAL); + mfd_assert_has_seals(fd2, F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_SEAL); + + close(fd2); + close(fd); +} + +/* + * Test sharing via fork() + * Test whether seal-modifications work as expected with forked childs. + */ +static void test_share_fork(void) +{ + int fd; + pid_t pid; + + fd = mfd_assert_new("kern_memfd_share_fork", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_has_seals(fd, 0); + + pid = spawn_idle_thread(0); + mfd_assert_add_seals(fd, F_SEAL_SEAL); + mfd_assert_has_seals(fd, F_SEAL_SEAL); + + mfd_fail_add_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_SEAL); + + join_idle_thread(pid); + + mfd_fail_add_seals(fd, F_SEAL_WRITE); + mfd_assert_has_seals(fd, F_SEAL_SEAL); + + close(fd); +} + +int main(int argc, char **argv) +{ + pid_t pid; + + printf("memfd: CREATE\n"); + test_create(); + printf("memfd: BASIC\n"); + test_basic(); + + printf("memfd: SEAL-WRITE\n"); + test_seal_write(); + printf("memfd: SEAL-SHRINK\n"); + test_seal_shrink(); + printf("memfd: SEAL-GROW\n"); + test_seal_grow(); + printf("memfd: SEAL-RESIZE\n"); + test_seal_resize(); + + printf("memfd: SHARE-DUP\n"); + test_share_dup(); + printf("memfd: SHARE-MMAP\n"); + test_share_mmap(); + printf("memfd: SHARE-OPEN\n"); + test_share_open(); + printf("memfd: SHARE-FORK\n"); + test_share_fork(); + + /* Run test-suite in a multi-threaded environment with a shared + * file-table. */ + pid = spawn_idle_thread(CLONE_FILES | CLONE_FS | CLONE_VM); + printf("memfd: SHARE-DUP (shared file-table)\n"); + test_share_dup(); + printf("memfd: SHARE-MMAP (shared file-table)\n"); + test_share_mmap(); + printf("memfd: SHARE-OPEN (shared file-table)\n"); + test_share_open(); + printf("memfd: SHARE-FORK (shared file-table)\n"); + test_share_fork(); + join_idle_thread(pid); + + printf("memfd: DONE\n"); + + return 0; +} -- cgit v0.10.2 From 87b2d44026e0e315a7401551e95b189ac4b28217 Mon Sep 17 00:00:00 2001 From: David Herrmann Date: Fri, 8 Aug 2014 14:25:34 -0700 Subject: selftests: add memfd/sealing page-pinning tests Setting SEAL_WRITE is not possible if there're pending GUP users. This commit adds selftests for memfd+sealing that use FUSE to create pending page-references. FUSE is very helpful here in that it allows us to delay direct-IO operations for an arbitrary amount of time. This way, we can force the kernel to pin pages and then run our normal selftests. Signed-off-by: David Herrmann Acked-by: Hugh Dickins Cc: Michael Kerrisk Cc: Ryan Lortie Cc: Lennart Poettering Cc: Daniel Mack Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/tools/testing/selftests/memfd/.gitignore b/tools/testing/selftests/memfd/.gitignore index bcc8ee2..afe87c4 100644 --- a/tools/testing/selftests/memfd/.gitignore +++ b/tools/testing/selftests/memfd/.gitignore @@ -1,2 +1,4 @@ +fuse_mnt +fuse_test memfd_test memfd-test-file diff --git a/tools/testing/selftests/memfd/Makefile b/tools/testing/selftests/memfd/Makefile index 36653b9..6816c49 100644 --- a/tools/testing/selftests/memfd/Makefile +++ b/tools/testing/selftests/memfd/Makefile @@ -7,6 +7,7 @@ ifeq ($(ARCH),x86_64) ARCH := X86 endif +CFLAGS += -D_FILE_OFFSET_BITS=64 CFLAGS += -I../../../../arch/x86/include/generated/uapi/ CFLAGS += -I../../../../arch/x86/include/uapi/ CFLAGS += -I../../../../include/uapi/ @@ -25,5 +26,16 @@ ifeq ($(ARCH),X86) endif @./memfd_test || echo "memfd_test: [FAIL]" +build_fuse: +ifeq ($(ARCH),X86) + gcc $(CFLAGS) fuse_mnt.c `pkg-config fuse --cflags --libs` -o fuse_mnt + gcc $(CFLAGS) fuse_test.c -o fuse_test +else + echo "Not an x86 target, can't build memfd selftest" +endif + +run_fuse: build_fuse + @./run_fuse_test.sh || echo "fuse_test: [FAIL]" + clean: - $(RM) memfd_test + $(RM) memfd_test fuse_test diff --git a/tools/testing/selftests/memfd/fuse_mnt.c b/tools/testing/selftests/memfd/fuse_mnt.c new file mode 100644 index 0000000..feacf12 --- /dev/null +++ b/tools/testing/selftests/memfd/fuse_mnt.c @@ -0,0 +1,110 @@ +/* + * memfd test file-system + * This file uses FUSE to create a dummy file-system with only one file /memfd. + * This file is read-only and takes 1s per read. + * + * This file-system is used by the memfd test-cases to force the kernel to pin + * pages during reads(). Due to the 1s delay of this file-system, this is a + * nice way to test race-conditions against get_user_pages() in the kernel. + * + * We use direct_io==1 to force the kernel to use direct-IO for this + * file-system. + */ + +#define FUSE_USE_VERSION 26 + +#include +#include +#include +#include +#include +#include + +static const char memfd_content[] = "memfd-example-content"; +static const char memfd_path[] = "/memfd"; + +static int memfd_getattr(const char *path, struct stat *st) +{ + memset(st, 0, sizeof(*st)); + + if (!strcmp(path, "/")) { + st->st_mode = S_IFDIR | 0755; + st->st_nlink = 2; + } else if (!strcmp(path, memfd_path)) { + st->st_mode = S_IFREG | 0444; + st->st_nlink = 1; + st->st_size = strlen(memfd_content); + } else { + return -ENOENT; + } + + return 0; +} + +static int memfd_readdir(const char *path, + void *buf, + fuse_fill_dir_t filler, + off_t offset, + struct fuse_file_info *fi) +{ + if (strcmp(path, "/")) + return -ENOENT; + + filler(buf, ".", NULL, 0); + filler(buf, "..", NULL, 0); + filler(buf, memfd_path + 1, NULL, 0); + + return 0; +} + +static int memfd_open(const char *path, struct fuse_file_info *fi) +{ + if (strcmp(path, memfd_path)) + return -ENOENT; + + if ((fi->flags & 3) != O_RDONLY) + return -EACCES; + + /* force direct-IO */ + fi->direct_io = 1; + + return 0; +} + +static int memfd_read(const char *path, + char *buf, + size_t size, + off_t offset, + struct fuse_file_info *fi) +{ + size_t len; + + if (strcmp(path, memfd_path) != 0) + return -ENOENT; + + sleep(1); + + len = strlen(memfd_content); + if (offset < len) { + if (offset + size > len) + size = len - offset; + + memcpy(buf, memfd_content + offset, size); + } else { + size = 0; + } + + return size; +} + +static struct fuse_operations memfd_ops = { + .getattr = memfd_getattr, + .readdir = memfd_readdir, + .open = memfd_open, + .read = memfd_read, +}; + +int main(int argc, char *argv[]) +{ + return fuse_main(argc, argv, &memfd_ops, NULL); +} diff --git a/tools/testing/selftests/memfd/fuse_test.c b/tools/testing/selftests/memfd/fuse_test.c new file mode 100644 index 0000000..67908b1 --- /dev/null +++ b/tools/testing/selftests/memfd/fuse_test.c @@ -0,0 +1,311 @@ +/* + * memfd GUP test-case + * This tests memfd interactions with get_user_pages(). We require the + * fuse_mnt.c program to provide a fake direct-IO FUSE mount-point for us. This + * file-system delays _all_ reads by 1s and forces direct-IO. This means, any + * read() on files in that file-system will pin the receive-buffer pages for at + * least 1s via get_user_pages(). + * + * We use this trick to race ADD_SEALS against a write on a memfd object. The + * ADD_SEALS must fail if the memfd pages are still pinned. Note that we use + * the read() syscall with our memory-mapped memfd object as receive buffer to + * force the kernel to write into our memfd object. + */ + +#define _GNU_SOURCE +#define __EXPORTED_HEADERS__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MFD_DEF_SIZE 8192 +#define STACK_SIZE 65535 + +static int sys_memfd_create(const char *name, + unsigned int flags) +{ + return syscall(__NR_memfd_create, name, flags); +} + +static int mfd_assert_new(const char *name, loff_t sz, unsigned int flags) +{ + int r, fd; + + fd = sys_memfd_create(name, flags); + if (fd < 0) { + printf("memfd_create(\"%s\", %u) failed: %m\n", + name, flags); + abort(); + } + + r = ftruncate(fd, sz); + if (r < 0) { + printf("ftruncate(%llu) failed: %m\n", (unsigned long long)sz); + abort(); + } + + return fd; +} + +static __u64 mfd_assert_get_seals(int fd) +{ + long r; + + r = fcntl(fd, F_GET_SEALS); + if (r < 0) { + printf("GET_SEALS(%d) failed: %m\n", fd); + abort(); + } + + return r; +} + +static void mfd_assert_has_seals(int fd, __u64 seals) +{ + __u64 s; + + s = mfd_assert_get_seals(fd); + if (s != seals) { + printf("%llu != %llu = GET_SEALS(%d)\n", + (unsigned long long)seals, (unsigned long long)s, fd); + abort(); + } +} + +static void mfd_assert_add_seals(int fd, __u64 seals) +{ + long r; + __u64 s; + + s = mfd_assert_get_seals(fd); + r = fcntl(fd, F_ADD_SEALS, seals); + if (r < 0) { + printf("ADD_SEALS(%d, %llu -> %llu) failed: %m\n", + fd, (unsigned long long)s, (unsigned long long)seals); + abort(); + } +} + +static int mfd_busy_add_seals(int fd, __u64 seals) +{ + long r; + __u64 s; + + r = fcntl(fd, F_GET_SEALS); + if (r < 0) + s = 0; + else + s = r; + + r = fcntl(fd, F_ADD_SEALS, seals); + if (r < 0 && errno != EBUSY) { + printf("ADD_SEALS(%d, %llu -> %llu) didn't fail as expected with EBUSY: %m\n", + fd, (unsigned long long)s, (unsigned long long)seals); + abort(); + } + + return r; +} + +static void *mfd_assert_mmap_shared(int fd) +{ + void *p; + + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ | PROT_WRITE, + MAP_SHARED, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + + return p; +} + +static void *mfd_assert_mmap_private(int fd) +{ + void *p; + + p = mmap(NULL, + MFD_DEF_SIZE, + PROT_READ | PROT_WRITE, + MAP_PRIVATE, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + + return p; +} + +static int global_mfd = -1; +static void *global_p = NULL; + +static int sealing_thread_fn(void *arg) +{ + int sig, r; + + /* + * This thread first waits 200ms so any pending operation in the parent + * is correctly started. After that, it tries to seal @global_mfd as + * SEAL_WRITE. This _must_ fail as the parent thread has a read() into + * that memory mapped object still ongoing. + * We then wait one more second and try sealing again. This time it + * must succeed as there shouldn't be anyone else pinning the pages. + */ + + /* wait 200ms for FUSE-request to be active */ + usleep(200000); + + /* unmount mapping before sealing to avoid i_mmap_writable failures */ + munmap(global_p, MFD_DEF_SIZE); + + /* Try sealing the global file; expect EBUSY or success. Current + * kernels will never succeed, but in the future, kernels might + * implement page-replacements or other fancy ways to avoid racing + * writes. */ + r = mfd_busy_add_seals(global_mfd, F_SEAL_WRITE); + if (r >= 0) { + printf("HURRAY! This kernel fixed GUP races!\n"); + } else { + /* wait 1s more so the FUSE-request is done */ + sleep(1); + + /* try sealing the global file again */ + mfd_assert_add_seals(global_mfd, F_SEAL_WRITE); + } + + return 0; +} + +static pid_t spawn_sealing_thread(void) +{ + uint8_t *stack; + pid_t pid; + + stack = malloc(STACK_SIZE); + if (!stack) { + printf("malloc(STACK_SIZE) failed: %m\n"); + abort(); + } + + pid = clone(sealing_thread_fn, + stack + STACK_SIZE, + SIGCHLD | CLONE_FILES | CLONE_FS | CLONE_VM, + NULL); + if (pid < 0) { + printf("clone() failed: %m\n"); + abort(); + } + + return pid; +} + +static void join_sealing_thread(pid_t pid) +{ + waitpid(pid, NULL, 0); +} + +int main(int argc, char **argv) +{ + static const char zero[MFD_DEF_SIZE]; + int fd, mfd, r; + void *p; + int was_sealed; + pid_t pid; + + if (argc < 2) { + printf("error: please pass path to file in fuse_mnt mount-point\n"); + abort(); + } + + /* open FUSE memfd file for GUP testing */ + printf("opening: %s\n", argv[1]); + fd = open(argv[1], O_RDONLY | O_CLOEXEC); + if (fd < 0) { + printf("cannot open(\"%s\"): %m\n", argv[1]); + abort(); + } + + /* create new memfd-object */ + mfd = mfd_assert_new("kern_memfd_fuse", + MFD_DEF_SIZE, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + + /* mmap memfd-object for writing */ + p = mfd_assert_mmap_shared(mfd); + + /* pass mfd+mapping to a separate sealing-thread which tries to seal + * the memfd objects with SEAL_WRITE while we write into it */ + global_mfd = mfd; + global_p = p; + pid = spawn_sealing_thread(); + + /* Use read() on the FUSE file to read into our memory-mapped memfd + * object. This races the other thread which tries to seal the + * memfd-object. + * If @fd is on the memfd-fake-FUSE-FS, the read() is delayed by 1s. + * This guarantees that the receive-buffer is pinned for 1s until the + * data is written into it. The racing ADD_SEALS should thus fail as + * the pages are still pinned. */ + r = read(fd, p, MFD_DEF_SIZE); + if (r < 0) { + printf("read() failed: %m\n"); + abort(); + } else if (!r) { + printf("unexpected EOF on read()\n"); + abort(); + } + + was_sealed = mfd_assert_get_seals(mfd) & F_SEAL_WRITE; + + /* Wait for sealing-thread to finish and verify that it + * successfully sealed the file after the second try. */ + join_sealing_thread(pid); + mfd_assert_has_seals(mfd, F_SEAL_WRITE); + + /* *IF* the memfd-object was sealed at the time our read() returned, + * then the kernel did a page-replacement or canceled the read() (or + * whatever magic it did..). In that case, the memfd object is still + * all zero. + * In case the memfd-object was *not* sealed, the read() was successfull + * and the memfd object must *not* be all zero. + * Note that in real scenarios, there might be a mixture of both, but + * in this test-cases, we have explicit 200ms delays which should be + * enough to avoid any in-flight writes. */ + + p = mfd_assert_mmap_private(mfd); + if (was_sealed && memcmp(p, zero, MFD_DEF_SIZE)) { + printf("memfd sealed during read() but data not discarded\n"); + abort(); + } else if (!was_sealed && !memcmp(p, zero, MFD_DEF_SIZE)) { + printf("memfd sealed after read() but data discarded\n"); + abort(); + } + + close(mfd); + close(fd); + + printf("fuse: DONE\n"); + + return 0; +} diff --git a/tools/testing/selftests/memfd/run_fuse_test.sh b/tools/testing/selftests/memfd/run_fuse_test.sh new file mode 100644 index 0000000..69b930e --- /dev/null +++ b/tools/testing/selftests/memfd/run_fuse_test.sh @@ -0,0 +1,14 @@ +#!/bin/sh + +if test -d "./mnt" ; then + fusermount -u ./mnt + rmdir ./mnt +fi + +set -e + +mkdir mnt +./fuse_mnt ./mnt +./fuse_test ./mnt/memfd +fusermount -u ./mnt +rmdir ./mnt -- cgit v0.10.2 From 05f65b5c70909ef686f865f0a85406d74d75f70f Mon Sep 17 00:00:00 2001 From: David Herrmann Date: Fri, 8 Aug 2014 14:25:36 -0700 Subject: shm: wait for pins to be released when sealing If we set SEAL_WRITE on a file, we must make sure there cannot be any ongoing write-operations on the file. For write() calls, we simply lock the inode mutex, for mmap() we simply verify there're no writable mappings. However, there might be pages pinned by AIO, Direct-IO and similar operations via GUP. We must make sure those do not write to the memfd file after we set SEAL_WRITE. As there is no way to notify GUP users to drop pages or to wait for them to be done, we implement the wait ourself: When setting SEAL_WRITE, we check all pages for their ref-count. If it's bigger than 1, we know there's some user of the page. We then mark the page and wait for up to 150ms for those ref-counts to be dropped. If the ref-counts are not dropped in time, we refuse the seal operation. Signed-off-by: David Herrmann Acked-by: Hugh Dickins Cc: Michael Kerrisk Cc: Ryan Lortie Cc: Lennart Poettering Cc: Daniel Mack Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/shmem.c b/mm/shmem.c index 4a54987..a42add1 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1828,9 +1828,117 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence) return offset; } +/* + * We need a tag: a new tag would expand every radix_tree_node by 8 bytes, + * so reuse a tag which we firmly believe is never set or cleared on shmem. + */ +#define SHMEM_TAG_PINNED PAGECACHE_TAG_TOWRITE +#define LAST_SCAN 4 /* about 150ms max */ + +static void shmem_tag_pins(struct address_space *mapping) +{ + struct radix_tree_iter iter; + void **slot; + pgoff_t start; + struct page *page; + + lru_add_drain(); + start = 0; + rcu_read_lock(); + +restart: + radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { + page = radix_tree_deref_slot(slot); + if (!page || radix_tree_exception(page)) { + if (radix_tree_deref_retry(page)) + goto restart; + } else if (page_count(page) - page_mapcount(page) > 1) { + spin_lock_irq(&mapping->tree_lock); + radix_tree_tag_set(&mapping->page_tree, iter.index, + SHMEM_TAG_PINNED); + spin_unlock_irq(&mapping->tree_lock); + } + + if (need_resched()) { + cond_resched_rcu(); + start = iter.index + 1; + goto restart; + } + } + rcu_read_unlock(); +} + +/* + * Setting SEAL_WRITE requires us to verify there's no pending writer. However, + * via get_user_pages(), drivers might have some pending I/O without any active + * user-space mappings (eg., direct-IO, AIO). Therefore, we look at all pages + * and see whether it has an elevated ref-count. If so, we tag them and wait for + * them to be dropped. + * The caller must guarantee that no new user will acquire writable references + * to those pages to avoid races. + */ static int shmem_wait_for_pins(struct address_space *mapping) { - return 0; + struct radix_tree_iter iter; + void **slot; + pgoff_t start; + struct page *page; + int error, scan; + + shmem_tag_pins(mapping); + + error = 0; + for (scan = 0; scan <= LAST_SCAN; scan++) { + if (!radix_tree_tagged(&mapping->page_tree, SHMEM_TAG_PINNED)) + break; + + if (!scan) + lru_add_drain_all(); + else if (schedule_timeout_killable((HZ << scan) / 200)) + scan = LAST_SCAN; + + start = 0; + rcu_read_lock(); +restart: + radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, + start, SHMEM_TAG_PINNED) { + + page = radix_tree_deref_slot(slot); + if (radix_tree_exception(page)) { + if (radix_tree_deref_retry(page)) + goto restart; + + page = NULL; + } + + if (page && + page_count(page) - page_mapcount(page) != 1) { + if (scan < LAST_SCAN) + goto continue_resched; + + /* + * On the last scan, we clean up all those tags + * we inserted; but make a note that we still + * found pages pinned. + */ + error = -EBUSY; + } + + spin_lock_irq(&mapping->tree_lock); + radix_tree_tag_clear(&mapping->page_tree, + iter.index, SHMEM_TAG_PINNED); + spin_unlock_irq(&mapping->tree_lock); +continue_resched: + if (need_resched()) { + cond_resched_rcu(); + start = iter.index + 1; + goto restart; + } + } + rcu_read_unlock(); + } + + return error; } #define F_ALL_SEALS (F_SEAL_SEAL | \ -- cgit v0.10.2 From 8370edea81e321b8a976969753d6b2811e6d5ed6 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:38 -0700 Subject: bin2c: move bin2c in scripts/basic This patch series does not do kernel signature verification yet. I plan to post another patch series for that. Now distributions are already signing PE/COFF bzImage with PKCS7 signature I plan to parse and verify those signatures. Primary goal of this patchset is to prepare groundwork so that kernel image can be signed and signatures be verified during kexec load. This should help with two things. - It should allow kexec/kdump on secureboot enabled machines. - In general it can help even without secureboot. By being able to verify kernel image signature in kexec, it should help with avoiding module signing restrictions. Matthew Garret showed how to boot into a custom kernel, modify first kernel's memory and then jump back to old kernel and bypass any policy one wants to. This patch (of 15): Kexec wants to use bin2c and it wants to use it really early in the build process. See arch/x86/purgatory/ code in later patches. So move bin2c in scripts/basic so that it can be built very early and be usable by arch/x86/purgatory/ Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/Makefile b/kernel/Makefile index 0026cf5..dc5c775 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -105,7 +105,7 @@ targets += config_data.gz $(obj)/config_data.gz: $(KCONFIG_CONFIG) FORCE $(call if_changed,gzip) - filechk_ikconfiggz = (echo "static const char kernel_config_data[] __used = MAGIC_START"; cat $< | scripts/bin2c; echo "MAGIC_END;") + filechk_ikconfiggz = (echo "static const char kernel_config_data[] __used = MAGIC_START"; cat $< | scripts/basic/bin2c; echo "MAGIC_END;") targets += config_data.h $(obj)/config_data.h: $(obj)/config_data.gz FORCE $(call filechk,ikconfiggz) diff --git a/scripts/.gitignore b/scripts/.gitignore index fb070fa..5ecfe93 100644 --- a/scripts/.gitignore +++ b/scripts/.gitignore @@ -4,7 +4,6 @@ conmakehash kallsyms pnmtologo -bin2c unifdef ihex2fw recordmcount diff --git a/scripts/Makefile b/scripts/Makefile index 890df5c..72902b5 100644 --- a/scripts/Makefile +++ b/scripts/Makefile @@ -13,7 +13,6 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include hostprogs-$(CONFIG_KALLSYMS) += kallsyms hostprogs-$(CONFIG_LOGO) += pnmtologo hostprogs-$(CONFIG_VT) += conmakehash -hostprogs-$(CONFIG_IKCONFIG) += bin2c hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable hostprogs-$(CONFIG_ASN1) += asn1_compiler diff --git a/scripts/basic/.gitignore b/scripts/basic/.gitignore index a776371..9528ec9 100644 --- a/scripts/basic/.gitignore +++ b/scripts/basic/.gitignore @@ -1 +1,2 @@ fixdep +bin2c diff --git a/scripts/basic/Makefile b/scripts/basic/Makefile index 4fcef87..afbc1cd 100644 --- a/scripts/basic/Makefile +++ b/scripts/basic/Makefile @@ -9,6 +9,7 @@ # fixdep: Used to generate dependency information during build process hostprogs-y := fixdep +hostprogs-$(CONFIG_IKCONFIG) += bin2c always := $(hostprogs-y) # fixdep is needed to compile other host programs diff --git a/scripts/basic/bin2c.c b/scripts/basic/bin2c.c new file mode 100644 index 0000000..af187e6 --- /dev/null +++ b/scripts/basic/bin2c.c @@ -0,0 +1,35 @@ +/* + * Unloved program to convert a binary on stdin to a C include on stdout + * + * Jan 1999 Matt Mackall + * + * This software may be used and distributed according to the terms + * of the GNU General Public License, incorporated herein by reference. + */ + +#include + +int main(int argc, char *argv[]) +{ + int ch, total = 0; + + if (argc > 1) + printf("const char %s[] %s=\n", + argv[1], argc > 2 ? argv[2] : ""); + + do { + printf("\t\""); + while ((ch = getchar()) != EOF) { + total++; + printf("\\x%02x", ch); + if (total % 16 == 0) + break; + } + printf("\"\n"); + } while (ch != EOF); + + if (argc > 1) + printf("\t;\n\nconst int %s_size = %d;\n", argv[1], total); + + return 0; +} diff --git a/scripts/bin2c.c b/scripts/bin2c.c deleted file mode 100644 index 96dd2bc..0000000 --- a/scripts/bin2c.c +++ /dev/null @@ -1,36 +0,0 @@ -/* - * Unloved program to convert a binary on stdin to a C include on stdout - * - * Jan 1999 Matt Mackall - * - * This software may be used and distributed according to the terms - * of the GNU General Public License, incorporated herein by reference. - */ - -#include - -int main(int argc, char *argv[]) -{ - int ch, total=0; - - if (argc > 1) - printf("const char %s[] %s=\n", - argv[1], argc > 2 ? argv[2] : ""); - - do { - printf("\t\""); - while ((ch = getchar()) != EOF) - { - total++; - printf("\\x%02x",ch); - if (total % 16 == 0) - break; - } - printf("\"\n"); - } while (ch != EOF); - - if (argc > 1) - printf("\t;\n\nconst int %s_size = %d;\n", argv[1], total); - - return 0; -} -- cgit v0.10.2 From de5b56ba51f63973ceb5c184ee0855f0c8a13fc9 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:41 -0700 Subject: kernel: build bin2c based on config option CONFIG_BUILD_BIN2C currently bin2c builds only if CONFIG_IKCONFIG=y. But bin2c will now be used by kexec too. So make it compilation dependent on CONFIG_BUILD_BIN2C and this config option can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c915cc6..98fe3df 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1582,6 +1582,7 @@ source kernel/Kconfig.hz config KEXEC bool "kexec system call" + select BUILD_BIN2C ---help--- kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/init/Kconfig b/init/Kconfig index a291b7e..44f9ed3 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -783,8 +783,13 @@ endchoice endmenu # "RCU Subsystem" +config BUILD_BIN2C + bool + default n + config IKCONFIG tristate "Kernel .config support" + select BUILD_BIN2C ---help--- This option enables the complete Linux kernel ".config" file contents to be saved in the kernel. It provides documentation diff --git a/scripts/basic/Makefile b/scripts/basic/Makefile index afbc1cd..ec10d93 100644 --- a/scripts/basic/Makefile +++ b/scripts/basic/Makefile @@ -9,7 +9,7 @@ # fixdep: Used to generate dependency information during build process hostprogs-y := fixdep -hostprogs-$(CONFIG_IKCONFIG) += bin2c +hostprogs-$(CONFIG_BUILD_BIN2C) += bin2c always := $(hostprogs-y) # fixdep is needed to compile other host programs -- cgit v0.10.2 From 7d3e2bca22feb1f4a624009ff6c15e6f724cb4e7 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:43 -0700 Subject: kexec: rename unusebale_pages to unusable_pages Let's use the more common "unusable". This patch was originally written and posted by Boris. I am including it in this patch series. Signed-off-by: Borislav Petkov Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/kexec.h b/include/linux/kexec.h index a756419..d9bb0a5 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -100,7 +100,7 @@ struct kimage { struct list_head control_pages; struct list_head dest_pages; - struct list_head unuseable_pages; + struct list_head unusable_pages; /* Address of next control page to allocate for crash kernels. */ unsigned long control_page; diff --git a/kernel/kexec.c b/kernel/kexec.c index 4b8f0c9..c7cc2a0 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -154,7 +154,7 @@ static int do_kimage_alloc(struct kimage **rimage, unsigned long entry, INIT_LIST_HEAD(&image->dest_pages); /* Initialize the list of unusable pages */ - INIT_LIST_HEAD(&image->unuseable_pages); + INIT_LIST_HEAD(&image->unusable_pages); /* Read in the segments */ image->nr_segments = nr_segments; @@ -609,7 +609,7 @@ static void kimage_free_extra_pages(struct kimage *image) kimage_free_page_list(&image->dest_pages); /* Walk through and free any unusable pages I have cached */ - kimage_free_page_list(&image->unuseable_pages); + kimage_free_page_list(&image->unusable_pages); } static void kimage_terminate(struct kimage *image) @@ -732,7 +732,7 @@ static struct page *kimage_alloc_page(struct kimage *image, /* If the page cannot be used file it away */ if (page_to_pfn(page) > (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { - list_add(&page->lru, &image->unuseable_pages); + list_add(&page->lru, &image->unusable_pages); continue; } addr = page_to_pfn(page) << PAGE_SHIFT; -- cgit v0.10.2 From dabe78628dd886c4b71971d1d78f1cecc674b760 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:45 -0700 Subject: kexec: move segment verification code in a separate function Previously do_kimage_alloc() will allocate a kimage structure, copy segment list from user space and then do the segment list sanity verification. Break down this function in 3 parts. do_kimage_alloc_init() to do actual allocation and basic initialization of kimage structure. copy_user_segment_list() to copy segment list from user space and sanity_check_segment_list() to verify the sanity of segment list as passed by user space. In later patches, I need to only allocate kimage and not copy segment list from user space. So breaking down in smaller functions enables re-use of code at other places. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/kexec.c b/kernel/kexec.c index c7cc2a0..062e556 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -125,45 +125,27 @@ static struct page *kimage_alloc_page(struct kimage *image, gfp_t gfp_mask, unsigned long dest); -static int do_kimage_alloc(struct kimage **rimage, unsigned long entry, - unsigned long nr_segments, - struct kexec_segment __user *segments) +static int copy_user_segment_list(struct kimage *image, + unsigned long nr_segments, + struct kexec_segment __user *segments) { + int ret; size_t segment_bytes; - struct kimage *image; - unsigned long i; - int result; - - /* Allocate a controlling structure */ - result = -ENOMEM; - image = kzalloc(sizeof(*image), GFP_KERNEL); - if (!image) - goto out; - - image->head = 0; - image->entry = &image->head; - image->last_entry = &image->head; - image->control_page = ~0; /* By default this does not apply */ - image->start = entry; - image->type = KEXEC_TYPE_DEFAULT; - - /* Initialize the list of control pages */ - INIT_LIST_HEAD(&image->control_pages); - - /* Initialize the list of destination pages */ - INIT_LIST_HEAD(&image->dest_pages); - - /* Initialize the list of unusable pages */ - INIT_LIST_HEAD(&image->unusable_pages); /* Read in the segments */ image->nr_segments = nr_segments; segment_bytes = nr_segments * sizeof(*segments); - result = copy_from_user(image->segment, segments, segment_bytes); - if (result) { - result = -EFAULT; - goto out; - } + ret = copy_from_user(image->segment, segments, segment_bytes); + if (ret) + ret = -EFAULT; + + return ret; +} + +static int sanity_check_segment_list(struct kimage *image) +{ + int result, i; + unsigned long nr_segments = image->nr_segments; /* * Verify we have good destination addresses. The caller is @@ -185,9 +167,9 @@ static int do_kimage_alloc(struct kimage **rimage, unsigned long entry, mstart = image->segment[i].mem; mend = mstart + image->segment[i].memsz; if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK)) - goto out; + return result; if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT) - goto out; + return result; } /* Verify our destination addresses do not overlap. @@ -208,7 +190,7 @@ static int do_kimage_alloc(struct kimage **rimage, unsigned long entry, pend = pstart + image->segment[j].memsz; /* Do the segments overlap ? */ if ((mend > pstart) && (mstart < pend)) - goto out; + return result; } } @@ -220,18 +202,61 @@ static int do_kimage_alloc(struct kimage **rimage, unsigned long entry, result = -EINVAL; for (i = 0; i < nr_segments; i++) { if (image->segment[i].bufsz > image->segment[i].memsz) - goto out; + return result; } - result = 0; -out: - if (result == 0) - *rimage = image; - else - kfree(image); + /* + * Verify we have good destination addresses. Normally + * the caller is responsible for making certain we don't + * attempt to load the new image into invalid or reserved + * areas of RAM. But crash kernels are preloaded into a + * reserved area of ram. We must ensure the addresses + * are in the reserved area otherwise preloading the + * kernel could corrupt things. + */ - return result; + if (image->type == KEXEC_TYPE_CRASH) { + result = -EADDRNOTAVAIL; + for (i = 0; i < nr_segments; i++) { + unsigned long mstart, mend; + mstart = image->segment[i].mem; + mend = mstart + image->segment[i].memsz - 1; + /* Ensure we are within the crash kernel limits */ + if ((mstart < crashk_res.start) || + (mend > crashk_res.end)) + return result; + } + } + + return 0; +} + +static struct kimage *do_kimage_alloc_init(void) +{ + struct kimage *image; + + /* Allocate a controlling structure */ + image = kzalloc(sizeof(*image), GFP_KERNEL); + if (!image) + return NULL; + + image->head = 0; + image->entry = &image->head; + image->last_entry = &image->head; + image->control_page = ~0; /* By default this does not apply */ + image->type = KEXEC_TYPE_DEFAULT; + + /* Initialize the list of control pages */ + INIT_LIST_HEAD(&image->control_pages); + + /* Initialize the list of destination pages */ + INIT_LIST_HEAD(&image->dest_pages); + + /* Initialize the list of unusable pages */ + INIT_LIST_HEAD(&image->unusable_pages); + + return image; } static void kimage_free_page_list(struct list_head *list); @@ -244,10 +269,19 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry, struct kimage *image; /* Allocate and initialize a controlling structure */ - image = NULL; - result = do_kimage_alloc(&image, entry, nr_segments, segments); + image = do_kimage_alloc_init(); + if (!image) + return -ENOMEM; + + image->start = entry; + + result = copy_user_segment_list(image, nr_segments, segments); if (result) - goto out; + goto out_free_image; + + result = sanity_check_segment_list(image); + if (result) + goto out_free_image; /* * Find a location for the control code buffer, and add it @@ -259,22 +293,21 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry, get_order(KEXEC_CONTROL_PAGE_SIZE)); if (!image->control_code_page) { pr_err("Could not allocate control_code_buffer\n"); - goto out_free; + goto out_free_image; } image->swap_page = kimage_alloc_control_pages(image, 0); if (!image->swap_page) { pr_err("Could not allocate swap buffer\n"); - goto out_free; + goto out_free_control_pages; } *rimage = image; return 0; - -out_free: +out_free_control_pages: kimage_free_page_list(&image->control_pages); +out_free_image: kfree(image); -out: return result; } @@ -284,19 +317,17 @@ static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry, { int result; struct kimage *image; - unsigned long i; - image = NULL; /* Verify we have a valid entry point */ - if ((entry < crashk_res.start) || (entry > crashk_res.end)) { - result = -EADDRNOTAVAIL; - goto out; - } + if ((entry < crashk_res.start) || (entry > crashk_res.end)) + return -EADDRNOTAVAIL; /* Allocate and initialize a controlling structure */ - result = do_kimage_alloc(&image, entry, nr_segments, segments); - if (result) - goto out; + image = do_kimage_alloc_init(); + if (!image) + return -ENOMEM; + + image->start = entry; /* Enable the special crash kernel control page * allocation policy. @@ -304,25 +335,13 @@ static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry, image->control_page = crashk_res.start; image->type = KEXEC_TYPE_CRASH; - /* - * Verify we have good destination addresses. Normally - * the caller is responsible for making certain we don't - * attempt to load the new image into invalid or reserved - * areas of RAM. But crash kernels are preloaded into a - * reserved area of ram. We must ensure the addresses - * are in the reserved area otherwise preloading the - * kernel could corrupt things. - */ - result = -EADDRNOTAVAIL; - for (i = 0; i < nr_segments; i++) { - unsigned long mstart, mend; + result = copy_user_segment_list(image, nr_segments, segments); + if (result) + goto out_free_image; - mstart = image->segment[i].mem; - mend = mstart + image->segment[i].memsz - 1; - /* Ensure we are within the crash kernel limits */ - if ((mstart < crashk_res.start) || (mend > crashk_res.end)) - goto out_free; - } + result = sanity_check_segment_list(image); + if (result) + goto out_free_image; /* * Find a location for the control code buffer, and add @@ -334,15 +353,14 @@ static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry, get_order(KEXEC_CONTROL_PAGE_SIZE)); if (!image->control_code_page) { pr_err("Could not allocate control_code_buffer\n"); - goto out_free; + goto out_free_image; } *rimage = image; return 0; -out_free: +out_free_image: kfree(image); -out: return result; } -- cgit v0.10.2 From 255aedd90e3e804fb52e1a71636a3b22cf12f81b Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:48 -0700 Subject: kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc() kimage_normal_alloc() and kimage_crash_alloc() are doing lot of similar things and differ only little. So instead of having two separate functions create a common function kimage_alloc_init() and pass it the "flags" argument which tells whether it is normal kexec or kexec_on_panic. And this function should be able to deal with both the cases. This consolidation also helps later where we can use a common function kimage_file_alloc_init() to handle normal and crash cases for new file based kexec syscall. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/kexec.c b/kernel/kexec.c index 062e556..bfdda31 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -261,12 +261,20 @@ static struct kimage *do_kimage_alloc_init(void) static void kimage_free_page_list(struct list_head *list); -static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry, - unsigned long nr_segments, - struct kexec_segment __user *segments) +static int kimage_alloc_init(struct kimage **rimage, unsigned long entry, + unsigned long nr_segments, + struct kexec_segment __user *segments, + unsigned long flags) { - int result; + int ret; struct kimage *image; + bool kexec_on_panic = flags & KEXEC_ON_CRASH; + + if (kexec_on_panic) { + /* Verify we have a valid entry point */ + if ((entry < crashk_res.start) || (entry > crashk_res.end)) + return -EADDRNOTAVAIL; + } /* Allocate and initialize a controlling structure */ image = do_kimage_alloc_init(); @@ -275,20 +283,26 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry, image->start = entry; - result = copy_user_segment_list(image, nr_segments, segments); - if (result) + ret = copy_user_segment_list(image, nr_segments, segments); + if (ret) goto out_free_image; - result = sanity_check_segment_list(image); - if (result) + ret = sanity_check_segment_list(image); + if (ret) goto out_free_image; + /* Enable the special crash kernel control page allocation policy. */ + if (kexec_on_panic) { + image->control_page = crashk_res.start; + image->type = KEXEC_TYPE_CRASH; + } + /* * Find a location for the control code buffer, and add it * the vector of segments so that it's pages will also be * counted as destination pages. */ - result = -ENOMEM; + ret = -ENOMEM; image->control_code_page = kimage_alloc_control_pages(image, get_order(KEXEC_CONTROL_PAGE_SIZE)); if (!image->control_code_page) { @@ -296,10 +310,12 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry, goto out_free_image; } - image->swap_page = kimage_alloc_control_pages(image, 0); - if (!image->swap_page) { - pr_err("Could not allocate swap buffer\n"); - goto out_free_control_pages; + if (!kexec_on_panic) { + image->swap_page = kimage_alloc_control_pages(image, 0); + if (!image->swap_page) { + pr_err("Could not allocate swap buffer\n"); + goto out_free_control_pages; + } } *rimage = image; @@ -308,60 +324,7 @@ out_free_control_pages: kimage_free_page_list(&image->control_pages); out_free_image: kfree(image); - return result; -} - -static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry, - unsigned long nr_segments, - struct kexec_segment __user *segments) -{ - int result; - struct kimage *image; - - /* Verify we have a valid entry point */ - if ((entry < crashk_res.start) || (entry > crashk_res.end)) - return -EADDRNOTAVAIL; - - /* Allocate and initialize a controlling structure */ - image = do_kimage_alloc_init(); - if (!image) - return -ENOMEM; - - image->start = entry; - - /* Enable the special crash kernel control page - * allocation policy. - */ - image->control_page = crashk_res.start; - image->type = KEXEC_TYPE_CRASH; - - result = copy_user_segment_list(image, nr_segments, segments); - if (result) - goto out_free_image; - - result = sanity_check_segment_list(image); - if (result) - goto out_free_image; - - /* - * Find a location for the control code buffer, and add - * the vector of segments so that it's pages will also be - * counted as destination pages. - */ - result = -ENOMEM; - image->control_code_page = kimage_alloc_control_pages(image, - get_order(KEXEC_CONTROL_PAGE_SIZE)); - if (!image->control_code_page) { - pr_err("Could not allocate control_code_buffer\n"); - goto out_free_image; - } - - *rimage = image; - return 0; - -out_free_image: - kfree(image); - return result; + return ret; } static int kimage_is_destination_range(struct kimage *image, @@ -1004,16 +967,16 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, /* Loading another kernel to reboot into */ if ((flags & KEXEC_ON_CRASH) == 0) - result = kimage_normal_alloc(&image, entry, - nr_segments, segments); + result = kimage_alloc_init(&image, entry, nr_segments, + segments, flags); /* Loading another kernel to switch to if this one crashes */ else if (flags & KEXEC_ON_CRASH) { /* Free any current crash dump kernel before * we corrupt it. */ kimage_free(xchg(&kexec_crash_image, NULL)); - result = kimage_crash_alloc(&image, entry, - nr_segments, segments); + result = kimage_alloc_init(&image, entry, nr_segments, + segments, flags); crash_map_reserved_pages(); } if (result) -- cgit v0.10.2 From 8c86e70acead629aacb4afcd818add66bf6844d9 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:50 -0700 Subject: resource: provide new functions to walk through resources I have added two more functions to walk through resources. Currently walk_system_ram_range() deals with pfn and /proc/iomem can contain partial pages. By dealing in pfn, callback function loses the info that last page of a memory range is a partial page and not the full page. So I implemented walk_system_ram_res() which returns u64 values to callback functions and now it properly return start and end address. walk_system_ram_range() uses find_next_system_ram() to find the next ram resource. This in turn only travels through siblings of top level child and does not travers through all the nodes of the resoruce tree. I also need another function where I can walk through all the resources, for example figure out where "GART" aperture is. Figure out where ACPI memory is. So I wrote another function walk_iomem_res() which walks through all /proc/iomem resources and returns matches as asked by caller. Caller can specify "name" of resource, start and end and flags. Got rid of find_next_system_ram_res() and instead implemented more generic find_next_iomem_res() which can be used to traverse top level children only based on an argument. Signed-off-by: Vivek Goyal Cc: Yinghai Lu Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 5e3a906..142ec54 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -237,6 +237,12 @@ extern int iomem_is_exclusive(u64 addr); extern int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages, void *arg, int (*func)(unsigned long, unsigned long, void *)); +extern int +walk_system_ram_res(u64 start, u64 end, void *arg, + int (*func)(u64, u64, void *)); +extern int +walk_iomem_res(char *name, unsigned long flags, u64 start, u64 end, void *arg, + int (*func)(u64, u64, void *)); /* True if any part of r1 overlaps r2 */ static inline bool resource_overlaps(struct resource *r1, struct resource *r2) diff --git a/kernel/resource.c b/kernel/resource.c index 3c2237a..da14b8d 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -59,10 +59,12 @@ static DEFINE_RWLOCK(resource_lock); static struct resource *bootmem_resource_free; static DEFINE_SPINLOCK(bootmem_resource_lock); -static void *r_next(struct seq_file *m, void *v, loff_t *pos) +static struct resource *next_resource(struct resource *p, bool sibling_only) { - struct resource *p = v; - (*pos)++; + /* Caller wants to traverse through siblings only */ + if (sibling_only) + return p->sibling; + if (p->child) return p->child; while (!p->sibling && p->parent) @@ -70,6 +72,13 @@ static void *r_next(struct seq_file *m, void *v, loff_t *pos) return p->sibling; } +static void *r_next(struct seq_file *m, void *v, loff_t *pos) +{ + struct resource *p = v; + (*pos)++; + return (void *)next_resource(p, false); +} + #ifdef CONFIG_PROC_FS enum { MAX_IORES_LEVEL = 5 }; @@ -322,16 +331,19 @@ int release_resource(struct resource *old) EXPORT_SYMBOL(release_resource); -#if !defined(CONFIG_ARCH_HAS_WALK_MEMORY) /* - * Finds the lowest memory reosurce exists within [res->start.res->end) + * Finds the lowest iomem reosurce exists with-in [res->start.res->end) * the caller must specify res->start, res->end, res->flags and "name". * If found, returns 0, res is overwritten, if not found, returns -1. + * This walks through whole tree and not just first level children + * until and unless first_level_children_only is true. */ -static int find_next_system_ram(struct resource *res, char *name) +static int find_next_iomem_res(struct resource *res, char *name, + bool first_level_children_only) { resource_size_t start, end; struct resource *p; + bool sibling_only = false; BUG_ON(!res); @@ -340,8 +352,14 @@ static int find_next_system_ram(struct resource *res, char *name) BUG_ON(start >= end); read_lock(&resource_lock); - for (p = iomem_resource.child; p ; p = p->sibling) { - /* system ram is just marked as IORESOURCE_MEM */ + + if (first_level_children_only) { + p = iomem_resource.child; + sibling_only = true; + } else + p = &iomem_resource; + + while ((p = next_resource(p, sibling_only))) { if (p->flags != res->flags) continue; if (name && strcmp(p->name, name)) @@ -353,6 +371,7 @@ static int find_next_system_ram(struct resource *res, char *name) if ((p->end >= start) && (p->start < end)) break; } + read_unlock(&resource_lock); if (!p) return -1; @@ -365,6 +384,70 @@ static int find_next_system_ram(struct resource *res, char *name) } /* + * Walks through iomem resources and calls func() with matching resource + * ranges. This walks through whole tree and not just first level children. + * All the memory ranges which overlap start,end and also match flags and + * name are valid candidates. + * + * @name: name of resource + * @flags: resource flags + * @start: start addr + * @end: end addr + */ +int walk_iomem_res(char *name, unsigned long flags, u64 start, u64 end, + void *arg, int (*func)(u64, u64, void *)) +{ + struct resource res; + u64 orig_end; + int ret = -1; + + res.start = start; + res.end = end; + res.flags = flags; + orig_end = res.end; + while ((res.start < res.end) && + (!find_next_iomem_res(&res, name, false))) { + ret = (*func)(res.start, res.end, arg); + if (ret) + break; + res.start = res.end + 1; + res.end = orig_end; + } + return ret; +} + +/* + * This function calls callback against all memory range of "System RAM" + * which are marked as IORESOURCE_MEM and IORESOUCE_BUSY. + * Now, this function is only for "System RAM". This function deals with + * full ranges and not pfn. If resources are not pfn aligned, dealing + * with pfn can truncate ranges. + */ +int walk_system_ram_res(u64 start, u64 end, void *arg, + int (*func)(u64, u64, void *)) +{ + struct resource res; + u64 orig_end; + int ret = -1; + + res.start = start; + res.end = end; + res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; + orig_end = res.end; + while ((res.start < res.end) && + (!find_next_iomem_res(&res, "System RAM", true))) { + ret = (*func)(res.start, res.end, arg); + if (ret) + break; + res.start = res.end + 1; + res.end = orig_end; + } + return ret; +} + +#if !defined(CONFIG_ARCH_HAS_WALK_MEMORY) + +/* * This function calls callback against all memory range of "System RAM" * which are marked as IORESOURCE_MEM and IORESOUCE_BUSY. * Now, this function is only for "System RAM". @@ -382,7 +465,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages, res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; orig_end = res.end; while ((res.start < res.end) && - (find_next_system_ram(&res, "System RAM") >= 0)) { + (find_next_iomem_res(&res, "System RAM", true) >= 0)) { pfn = (res.start + PAGE_SIZE - 1) >> PAGE_SHIFT; end_pfn = (res.end + 1) >> PAGE_SHIFT; if (end_pfn > pfn) -- cgit v0.10.2 From 815d5704a337a662bf960757edbff7a0680d40fd Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:52 -0700 Subject: kexec: make kexec_segment user buffer pointer a union So far kexec_segment->buf was always a user space pointer as user space passed the array of kexec_segment structures and kernel copied it. But with new system call, list of kexec segments will be prepared by kernel and kexec_segment->buf will point to a kernel memory. So while I was adding code where I made assumption that ->buf is pointing to kernel memory, sparse started giving warning. Make ->buf a union. And where a user space pointer is expected, access it using ->buf and where a kernel space pointer is expected, access it using ->kbuf. That takes care of sparse warnings. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/kexec.h b/include/linux/kexec.h index d9bb0a5..66d56ac 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -69,7 +69,18 @@ typedef unsigned long kimage_entry_t; #define IND_SOURCE 0x8 struct kexec_segment { - void __user *buf; + /* + * This pointer can point to user memory if kexec_load() system + * call is used or will point to kernel memory if + * kexec_file_load() system call is used. + * + * Use ->buf when expecting to deal with user memory and use ->kbuf + * when expecting to deal with kernel memory. + */ + union { + void __user *buf; + void *kbuf; + }; size_t bufsz; unsigned long mem; size_t memsz; -- cgit v0.10.2 From f0895685c7fd8c938c91a9d8a6f7c11f22df58d2 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:55 -0700 Subject: kexec: new syscall kexec_file_load() declaration This is the new syscall kexec_file_load() declaration/interface. I have reserved the syscall number only for x86_64 so far. Other architectures (including i386) can reserve syscall number when they enable the support for this new syscall. Signed-off-by: Vivek Goyal Cc: Michael Kerrisk Cc: Borislav Petkov Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index ca2b9aa..35dd922 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -326,6 +326,7 @@ 317 common seccomp sys_seccomp 318 common getrandom sys_getrandom 319 common memfd_create sys_memfd_create +320 common kexec_file_load sys_kexec_file_load # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 15a0694..0f86d85 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -317,6 +317,10 @@ asmlinkage long sys_restart_syscall(void); asmlinkage long sys_kexec_load(unsigned long entry, unsigned long nr_segments, struct kexec_segment __user *segments, unsigned long flags); +asmlinkage long sys_kexec_file_load(int kernel_fd, int initrd_fd, + unsigned long cmdline_len, + const char __user *cmdline_ptr, + unsigned long flags); asmlinkage long sys_exit(int error_code); asmlinkage long sys_exit_group(int error_code); diff --git a/kernel/kexec.c b/kernel/kexec.c index bfdda31..ec4386c 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1058,6 +1058,13 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry, } #endif +SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, + unsigned long, cmdline_len, const char __user *, cmdline_ptr, + unsigned long, flags) +{ + return -ENOSYS; +} + void crash_kexec(struct pt_regs *regs) { /* Take the kexec_mutex here to prevent sys_kexec_load diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 1f79e37..391d4dd 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -25,6 +25,7 @@ cond_syscall(sys_swapon); cond_syscall(sys_swapoff); cond_syscall(sys_kexec_load); cond_syscall(compat_sys_kexec_load); +cond_syscall(sys_kexec_file_load); cond_syscall(sys_init_module); cond_syscall(sys_finit_module); cond_syscall(sys_delete_module); -- cgit v0.10.2 From cb1052581e2bddd6096544f3f944f4e7fdad4c7f Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:57 -0700 Subject: kexec: implementation of new syscall kexec_file_load Previous patch provided the interface definition and this patch prvides implementation of new syscall. Previously segment list was prepared in user space. Now user space just passes kernel fd, initrd fd and command line and kernel will create a segment list internally. This patch contains generic part of the code. Actual segment preparation and loading is done by arch and image specific loader. Which comes in next patch. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 679cef0..c8875b5 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -22,6 +22,10 @@ #include #include +static struct kexec_file_ops *kexec_file_loaders[] = { + NULL, +}; + static void free_transition_pgtable(struct kimage *image) { free_page((unsigned long)image->arch.pud); @@ -283,3 +287,44 @@ void arch_crash_save_vmcoreinfo(void) (unsigned long)&_text - __START_KERNEL); } +/* arch-dependent functionality related to kexec file-based syscall */ + +int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, + unsigned long buf_len) +{ + int i, ret = -ENOEXEC; + struct kexec_file_ops *fops; + + for (i = 0; i < ARRAY_SIZE(kexec_file_loaders); i++) { + fops = kexec_file_loaders[i]; + if (!fops || !fops->probe) + continue; + + ret = fops->probe(buf, buf_len); + if (!ret) { + image->fops = fops; + return ret; + } + } + + return ret; +} + +void *arch_kexec_kernel_image_load(struct kimage *image) +{ + if (!image->fops || !image->fops->load) + return ERR_PTR(-ENOEXEC); + + return image->fops->load(image, image->kernel_buf, + image->kernel_buf_len, image->initrd_buf, + image->initrd_buf_len, image->cmdline_buf, + image->cmdline_buf_len); +} + +int arch_kimage_file_post_load_cleanup(struct kimage *image) +{ + if (!image->fops || !image->fops->cleanup) + return 0; + + return image->fops->cleanup(image); +} diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 66d56ac..8e80901 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -121,13 +121,57 @@ struct kimage { #define KEXEC_TYPE_DEFAULT 0 #define KEXEC_TYPE_CRASH 1 unsigned int preserve_context : 1; + /* If set, we are using file mode kexec syscall */ + unsigned int file_mode:1; #ifdef ARCH_HAS_KIMAGE_ARCH struct kimage_arch arch; #endif + + /* Additional fields for file based kexec syscall */ + void *kernel_buf; + unsigned long kernel_buf_len; + + void *initrd_buf; + unsigned long initrd_buf_len; + + char *cmdline_buf; + unsigned long cmdline_buf_len; + + /* File operations provided by image loader */ + struct kexec_file_ops *fops; + + /* Image loader handling the kernel can store a pointer here */ + void *image_loader_data; }; +/* + * Keeps track of buffer parameters as provided by caller for requesting + * memory placement of buffer. + */ +struct kexec_buf { + struct kimage *image; + char *buffer; + unsigned long bufsz; + unsigned long memsz; + unsigned long buf_align; + unsigned long buf_min; + unsigned long buf_max; + bool top_down; /* allocate from top of memory hole */ +}; +typedef int (kexec_probe_t)(const char *kernel_buf, unsigned long kernel_size); +typedef void *(kexec_load_t)(struct kimage *image, char *kernel_buf, + unsigned long kernel_len, char *initrd, + unsigned long initrd_len, char *cmdline, + unsigned long cmdline_len); +typedef int (kexec_cleanup_t)(struct kimage *image); + +struct kexec_file_ops { + kexec_probe_t *probe; + kexec_load_t *load; + kexec_cleanup_t *cleanup; +}; /* kexec interface functions */ extern void machine_kexec(struct kimage *image); @@ -138,6 +182,11 @@ extern asmlinkage long sys_kexec_load(unsigned long entry, struct kexec_segment __user *segments, unsigned long flags); extern int kernel_kexec(void); +extern int kexec_add_buffer(struct kimage *image, char *buffer, + unsigned long bufsz, unsigned long memsz, + unsigned long buf_align, unsigned long buf_min, + unsigned long buf_max, bool top_down, + unsigned long *load_addr); extern struct page *kimage_alloc_control_pages(struct kimage *image, unsigned int order); extern void crash_kexec(struct pt_regs *); @@ -188,6 +237,10 @@ extern int kexec_load_disabled; #define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT) #endif +/* List of defined/legal kexec file flags */ +#define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \ + KEXEC_FILE_NO_INITRAMFS) + #define VMCOREINFO_BYTES (4096) #define VMCOREINFO_NOTE_NAME "VMCOREINFO" #define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4) diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index d6629d4..6925f5b 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -13,6 +13,17 @@ #define KEXEC_PRESERVE_CONTEXT 0x00000002 #define KEXEC_ARCH_MASK 0xffff0000 +/* + * Kexec file load interface flags. + * KEXEC_FILE_UNLOAD : Unload already loaded kexec/kdump image. + * KEXEC_FILE_ON_CRASH : Load/unload operation belongs to kdump image. + * KEXEC_FILE_NO_INITRAMFS : No initramfs is being loaded. Ignore the initrd + * fd field. + */ +#define KEXEC_FILE_UNLOAD 0x00000001 +#define KEXEC_FILE_ON_CRASH 0x00000002 +#define KEXEC_FILE_NO_INITRAMFS 0x00000004 + /* These values match the ELF architecture values. * Unless there is a good reason that should continue to be the case. */ diff --git a/kernel/kexec.c b/kernel/kexec.c index ec4386c..9b46219 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -6,6 +6,8 @@ * Version 2. See the file COPYING for more details. */ +#define pr_fmt(fmt) "kexec: " fmt + #include #include #include @@ -327,6 +329,221 @@ out_free_image: return ret; } +static int copy_file_from_fd(int fd, void **buf, unsigned long *buf_len) +{ + struct fd f = fdget(fd); + int ret; + struct kstat stat; + loff_t pos; + ssize_t bytes = 0; + + if (!f.file) + return -EBADF; + + ret = vfs_getattr(&f.file->f_path, &stat); + if (ret) + goto out; + + if (stat.size > INT_MAX) { + ret = -EFBIG; + goto out; + } + + /* Don't hand 0 to vmalloc, it whines. */ + if (stat.size == 0) { + ret = -EINVAL; + goto out; + } + + *buf = vmalloc(stat.size); + if (!*buf) { + ret = -ENOMEM; + goto out; + } + + pos = 0; + while (pos < stat.size) { + bytes = kernel_read(f.file, pos, (char *)(*buf) + pos, + stat.size - pos); + if (bytes < 0) { + vfree(*buf); + ret = bytes; + goto out; + } + + if (bytes == 0) + break; + pos += bytes; + } + + if (pos != stat.size) { + ret = -EBADF; + vfree(*buf); + goto out; + } + + *buf_len = pos; +out: + fdput(f); + return ret; +} + +/* Architectures can provide this probe function */ +int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf, + unsigned long buf_len) +{ + return -ENOEXEC; +} + +void * __weak arch_kexec_kernel_image_load(struct kimage *image) +{ + return ERR_PTR(-ENOEXEC); +} + +void __weak arch_kimage_file_post_load_cleanup(struct kimage *image) +{ +} + +/* + * Free up memory used by kernel, initrd, and comand line. This is temporary + * memory allocation which is not needed any more after these buffers have + * been loaded into separate segments and have been copied elsewhere. + */ +static void kimage_file_post_load_cleanup(struct kimage *image) +{ + vfree(image->kernel_buf); + image->kernel_buf = NULL; + + vfree(image->initrd_buf); + image->initrd_buf = NULL; + + kfree(image->cmdline_buf); + image->cmdline_buf = NULL; + + /* See if architecture has anything to cleanup post load */ + arch_kimage_file_post_load_cleanup(image); +} + +/* + * In file mode list of segments is prepared by kernel. Copy relevant + * data from user space, do error checking, prepare segment list + */ +static int +kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, + const char __user *cmdline_ptr, + unsigned long cmdline_len, unsigned flags) +{ + int ret = 0; + void *ldata; + + ret = copy_file_from_fd(kernel_fd, &image->kernel_buf, + &image->kernel_buf_len); + if (ret) + return ret; + + /* Call arch image probe handlers */ + ret = arch_kexec_kernel_image_probe(image, image->kernel_buf, + image->kernel_buf_len); + + if (ret) + goto out; + + /* It is possible that there no initramfs is being loaded */ + if (!(flags & KEXEC_FILE_NO_INITRAMFS)) { + ret = copy_file_from_fd(initrd_fd, &image->initrd_buf, + &image->initrd_buf_len); + if (ret) + goto out; + } + + if (cmdline_len) { + image->cmdline_buf = kzalloc(cmdline_len, GFP_KERNEL); + if (!image->cmdline_buf) { + ret = -ENOMEM; + goto out; + } + + ret = copy_from_user(image->cmdline_buf, cmdline_ptr, + cmdline_len); + if (ret) { + ret = -EFAULT; + goto out; + } + + image->cmdline_buf_len = cmdline_len; + + /* command line should be a string with last byte null */ + if (image->cmdline_buf[cmdline_len - 1] != '\0') { + ret = -EINVAL; + goto out; + } + } + + /* Call arch image load handlers */ + ldata = arch_kexec_kernel_image_load(image); + + if (IS_ERR(ldata)) { + ret = PTR_ERR(ldata); + goto out; + } + + image->image_loader_data = ldata; +out: + /* In case of error, free up all allocated memory in this function */ + if (ret) + kimage_file_post_load_cleanup(image); + return ret; +} + +static int +kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, + int initrd_fd, const char __user *cmdline_ptr, + unsigned long cmdline_len, unsigned long flags) +{ + int ret; + struct kimage *image; + + image = do_kimage_alloc_init(); + if (!image) + return -ENOMEM; + + image->file_mode = 1; + + ret = kimage_file_prepare_segments(image, kernel_fd, initrd_fd, + cmdline_ptr, cmdline_len, flags); + if (ret) + goto out_free_image; + + ret = sanity_check_segment_list(image); + if (ret) + goto out_free_post_load_bufs; + + ret = -ENOMEM; + image->control_code_page = kimage_alloc_control_pages(image, + get_order(KEXEC_CONTROL_PAGE_SIZE)); + if (!image->control_code_page) { + pr_err("Could not allocate control_code_buffer\n"); + goto out_free_post_load_bufs; + } + + image->swap_page = kimage_alloc_control_pages(image, 0); + if (!image->swap_page) { + pr_err(KERN_ERR "Could not allocate swap buffer\n"); + goto out_free_control_pages; + } + + *rimage = image; + return 0; +out_free_control_pages: + kimage_free_page_list(&image->control_pages); +out_free_post_load_bufs: + kimage_file_post_load_cleanup(image); + kfree(image->image_loader_data); +out_free_image: + kfree(image); + return ret; +} + static int kimage_is_destination_range(struct kimage *image, unsigned long start, unsigned long end) @@ -644,6 +861,16 @@ static void kimage_free(struct kimage *image) /* Free the kexec control pages... */ kimage_free_page_list(&image->control_pages); + + kfree(image->image_loader_data); + + /* + * Free up any temporary buffers allocated. This might hit if + * error occurred much later after buffer allocation. + */ + if (image->file_mode) + kimage_file_post_load_cleanup(image); + kfree(image); } @@ -772,10 +999,14 @@ static int kimage_load_normal_segment(struct kimage *image, unsigned long maddr; size_t ubytes, mbytes; int result; - unsigned char __user *buf; + unsigned char __user *buf = NULL; + unsigned char *kbuf = NULL; result = 0; - buf = segment->buf; + if (image->file_mode) + kbuf = segment->kbuf; + else + buf = segment->buf; ubytes = segment->bufsz; mbytes = segment->memsz; maddr = segment->mem; @@ -807,7 +1038,11 @@ static int kimage_load_normal_segment(struct kimage *image, PAGE_SIZE - (maddr & ~PAGE_MASK)); uchunk = min(ubytes, mchunk); - result = copy_from_user(ptr, buf, uchunk); + /* For file based kexec, source pages are in kernel memory */ + if (image->file_mode) + memcpy(ptr, kbuf, uchunk); + else + result = copy_from_user(ptr, buf, uchunk); kunmap(page); if (result) { result = -EFAULT; @@ -815,7 +1050,10 @@ static int kimage_load_normal_segment(struct kimage *image, } ubytes -= uchunk; maddr += mchunk; - buf += mchunk; + if (image->file_mode) + kbuf += mchunk; + else + buf += mchunk; mbytes -= mchunk; } out: @@ -1062,7 +1300,72 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, unsigned long, cmdline_len, const char __user *, cmdline_ptr, unsigned long, flags) { - return -ENOSYS; + int ret = 0, i; + struct kimage **dest_image, *image; + + /* We only trust the superuser with rebooting the system. */ + if (!capable(CAP_SYS_BOOT) || kexec_load_disabled) + return -EPERM; + + /* Make sure we have a legal set of flags */ + if (flags != (flags & KEXEC_FILE_FLAGS)) + return -EINVAL; + + image = NULL; + + if (!mutex_trylock(&kexec_mutex)) + return -EBUSY; + + dest_image = &kexec_image; + if (flags & KEXEC_FILE_ON_CRASH) + dest_image = &kexec_crash_image; + + if (flags & KEXEC_FILE_UNLOAD) + goto exchange; + + /* + * In case of crash, new kernel gets loaded in reserved region. It is + * same memory where old crash kernel might be loaded. Free any + * current crash dump kernel before we corrupt it. + */ + if (flags & KEXEC_FILE_ON_CRASH) + kimage_free(xchg(&kexec_crash_image, NULL)); + + ret = kimage_file_alloc_init(&image, kernel_fd, initrd_fd, cmdline_ptr, + cmdline_len, flags); + if (ret) + goto out; + + ret = machine_kexec_prepare(image); + if (ret) + goto out; + + for (i = 0; i < image->nr_segments; i++) { + struct kexec_segment *ksegment; + + ksegment = &image->segment[i]; + pr_debug("Loading segment %d: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", + i, ksegment->buf, ksegment->bufsz, ksegment->mem, + ksegment->memsz); + + ret = kimage_load_segment(image, &image->segment[i]); + if (ret) + goto out; + } + + kimage_terminate(image); + + /* + * Free up any temporary buffers allocated which are not needed + * after image has been loaded + */ + kimage_file_post_load_cleanup(image); +exchange: + image = xchg(dest_image, image); +out: + mutex_unlock(&kexec_mutex); + kimage_free(image); + return ret; } void crash_kexec(struct pt_regs *regs) @@ -1620,6 +1923,176 @@ static int __init crash_save_vmcoreinfo_init(void) subsys_initcall(crash_save_vmcoreinfo_init); +static int __kexec_add_segment(struct kimage *image, char *buf, + unsigned long bufsz, unsigned long mem, + unsigned long memsz) +{ + struct kexec_segment *ksegment; + + ksegment = &image->segment[image->nr_segments]; + ksegment->kbuf = buf; + ksegment->bufsz = bufsz; + ksegment->mem = mem; + ksegment->memsz = memsz; + image->nr_segments++; + + return 0; +} + +static int locate_mem_hole_top_down(unsigned long start, unsigned long end, + struct kexec_buf *kbuf) +{ + struct kimage *image = kbuf->image; + unsigned long temp_start, temp_end; + + temp_end = min(end, kbuf->buf_max); + temp_start = temp_end - kbuf->memsz; + + do { + /* align down start */ + temp_start = temp_start & (~(kbuf->buf_align - 1)); + + if (temp_start < start || temp_start < kbuf->buf_min) + return 0; + + temp_end = temp_start + kbuf->memsz - 1; + + /* + * Make sure this does not conflict with any of existing + * segments + */ + if (kimage_is_destination_range(image, temp_start, temp_end)) { + temp_start = temp_start - PAGE_SIZE; + continue; + } + + /* We found a suitable memory range */ + break; + } while (1); + + /* If we are here, we found a suitable memory range */ + __kexec_add_segment(image, kbuf->buffer, kbuf->bufsz, temp_start, + kbuf->memsz); + + /* Success, stop navigating through remaining System RAM ranges */ + return 1; +} + +static int locate_mem_hole_bottom_up(unsigned long start, unsigned long end, + struct kexec_buf *kbuf) +{ + struct kimage *image = kbuf->image; + unsigned long temp_start, temp_end; + + temp_start = max(start, kbuf->buf_min); + + do { + temp_start = ALIGN(temp_start, kbuf->buf_align); + temp_end = temp_start + kbuf->memsz - 1; + + if (temp_end > end || temp_end > kbuf->buf_max) + return 0; + /* + * Make sure this does not conflict with any of existing + * segments + */ + if (kimage_is_destination_range(image, temp_start, temp_end)) { + temp_start = temp_start + PAGE_SIZE; + continue; + } + + /* We found a suitable memory range */ + break; + } while (1); + + /* If we are here, we found a suitable memory range */ + __kexec_add_segment(image, kbuf->buffer, kbuf->bufsz, temp_start, + kbuf->memsz); + + /* Success, stop navigating through remaining System RAM ranges */ + return 1; +} + +static int locate_mem_hole_callback(u64 start, u64 end, void *arg) +{ + struct kexec_buf *kbuf = (struct kexec_buf *)arg; + unsigned long sz = end - start + 1; + + /* Returning 0 will take to next memory range */ + if (sz < kbuf->memsz) + return 0; + + if (end < kbuf->buf_min || start > kbuf->buf_max) + return 0; + + /* + * Allocate memory top down with-in ram range. Otherwise bottom up + * allocation. + */ + if (kbuf->top_down) + return locate_mem_hole_top_down(start, end, kbuf); + return locate_mem_hole_bottom_up(start, end, kbuf); +} + +/* + * Helper function for placing a buffer in a kexec segment. This assumes + * that kexec_mutex is held. + */ +int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz, + unsigned long memsz, unsigned long buf_align, + unsigned long buf_min, unsigned long buf_max, + bool top_down, unsigned long *load_addr) +{ + + struct kexec_segment *ksegment; + struct kexec_buf buf, *kbuf; + int ret; + + /* Currently adding segment this way is allowed only in file mode */ + if (!image->file_mode) + return -EINVAL; + + if (image->nr_segments >= KEXEC_SEGMENT_MAX) + return -EINVAL; + + /* + * Make sure we are not trying to add buffer after allocating + * control pages. All segments need to be placed first before + * any control pages are allocated. As control page allocation + * logic goes through list of segments to make sure there are + * no destination overlaps. + */ + if (!list_empty(&image->control_pages)) { + WARN_ON(1); + return -EINVAL; + } + + memset(&buf, 0, sizeof(struct kexec_buf)); + kbuf = &buf; + kbuf->image = image; + kbuf->buffer = buffer; + kbuf->bufsz = bufsz; + + kbuf->memsz = ALIGN(memsz, PAGE_SIZE); + kbuf->buf_align = max(buf_align, PAGE_SIZE); + kbuf->buf_min = buf_min; + kbuf->buf_max = buf_max; + kbuf->top_down = top_down; + + /* Walk the RAM ranges and allocate a suitable range for the buffer */ + ret = walk_system_ram_res(0, -1, kbuf, locate_mem_hole_callback); + if (ret != 1) { + /* A suitable memory range could not be found for buffer */ + return -EADDRNOTAVAIL; + } + + /* Found a suitable memory range */ + ksegment = &image->segment[image->nr_segments - 1]; + *load_addr = ksegment->mem; + return 0; +} + + /* * Move into place and start executing a preloaded standalone * executable. If nothing was preloaded return an error. -- cgit v0.10.2 From daeba0641a707626f3674db71016f333edf64395 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:25:59 -0700 Subject: purgatory/sha256: provide implementation of sha256 in purgaotory context Next two patches provide code for purgatory. This is a code which does not link against the kernel and runs stand alone. This code runs between two kernels. One of the primary purpose of this code is to verify the digest of newly loaded kernel and making sure it matches the digest computed at kernel load time. We use sha256 for calculating digest of kexec segmetns. Purgatory can't use stanard crypto API as that API is not available in purgatory context. Hence, I have copied code from crypto/sha256_generic.c and compiled it with purgaotry code so that it could be used. I could not #include sha256_generic.c file here as some of the function signature requiered little tweaking. Original functions work with crypto API but these ones don't So instead of doing #include on sha256_generic.c I just copied relevant portions of code into arch/x86/purgatory/sha256.c. Now we shouldn't have to touch this code at all. Do let me know if there are better ways to handle it. This patch does not enable compiling of this code. That happens in next patch. I wanted to highlight this change in a separate patch for easy review. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/purgatory/sha256.c b/arch/x86/purgatory/sha256.c new file mode 100644 index 0000000..548ca67 --- /dev/null +++ b/arch/x86/purgatory/sha256.c @@ -0,0 +1,283 @@ +/* + * SHA-256, as specified in + * http://csrc.nist.gov/groups/STM/cavp/documents/shs/sha256-384-512.pdf + * + * SHA-256 code by Jean-Luc Cooke . + * + * Copyright (c) Jean-Luc Cooke + * Copyright (c) Andrew McDonald + * Copyright (c) 2002 James Morris + * Copyright (c) 2014 Red Hat Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + */ + +#include +#include +#include "sha256.h" +#include "../boot/string.h" + +static inline u32 Ch(u32 x, u32 y, u32 z) +{ + return z ^ (x & (y ^ z)); +} + +static inline u32 Maj(u32 x, u32 y, u32 z) +{ + return (x & y) | (z & (x | y)); +} + +#define e0(x) (ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22)) +#define e1(x) (ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25)) +#define s0(x) (ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3)) +#define s1(x) (ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10)) + +static inline void LOAD_OP(int I, u32 *W, const u8 *input) +{ + W[I] = __be32_to_cpu(((__be32 *)(input))[I]); +} + +static inline void BLEND_OP(int I, u32 *W) +{ + W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16]; +} + +static void sha256_transform(u32 *state, const u8 *input) +{ + u32 a, b, c, d, e, f, g, h, t1, t2; + u32 W[64]; + int i; + + /* load the input */ + for (i = 0; i < 16; i++) + LOAD_OP(i, W, input); + + /* now blend */ + for (i = 16; i < 64; i++) + BLEND_OP(i, W); + + /* load the state into our registers */ + a = state[0]; b = state[1]; c = state[2]; d = state[3]; + e = state[4]; f = state[5]; g = state[6]; h = state[7]; + + /* now iterate */ + t1 = h + e1(e) + Ch(e, f, g) + 0x428a2f98 + W[0]; + t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2; + t1 = g + e1(d) + Ch(d, e, f) + 0x71374491 + W[1]; + t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2; + t1 = f + e1(c) + Ch(c, d, e) + 0xb5c0fbcf + W[2]; + t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2; + t1 = e + e1(b) + Ch(b, c, d) + 0xe9b5dba5 + W[3]; + t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2; + t1 = d + e1(a) + Ch(a, b, c) + 0x3956c25b + W[4]; + t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2; + t1 = c + e1(h) + Ch(h, a, b) + 0x59f111f1 + W[5]; + t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2; + t1 = b + e1(g) + Ch(g, h, a) + 0x923f82a4 + W[6]; + t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2; + t1 = a + e1(f) + Ch(f, g, h) + 0xab1c5ed5 + W[7]; + t2 = e0(b) + Maj(b, c, d); e += t1; a = t1 + t2; + + t1 = h + e1(e) + Ch(e, f, g) + 0xd807aa98 + W[8]; + t2 = e0(a) + Maj(a, b, c); d += t1; h = t1 + t2; + t1 = g + e1(d) + Ch(d, e, f) + 0x12835b01 + W[9]; + t2 = e0(h) + Maj(h, a, b); c += t1; g = t1 + t2; + t1 = f + e1(c) + Ch(c, d, e) + 0x243185be + W[10]; + t2 = e0(g) + Maj(g, h, a); b += t1; f = t1 + t2; + t1 = e + e1(b) + Ch(b, c, d) + 0x550c7dc3 + W[11]; + t2 = e0(f) + Maj(f, g, h); a += t1; e = t1 + t2; + t1 = d + e1(a) + Ch(a, b, c) + 0x72be5d74 + W[12]; + t2 = e0(e) + Maj(e, f, g); h += t1; d = t1 + t2; + t1 = c + e1(h) + Ch(h, a, b) + 0x80deb1fe + W[13]; + t2 = e0(d) + Maj(d, e, f); g += t1; c = t1 + t2; + t1 = b + e1(g) + Ch(g, h, a) + 0x9bdc06a7 + W[14]; + t2 = e0(c) + Maj(c, d, e); f += t1; b = t1 + t2; + t1 = a + e1(f) + Ch(f, g, h) + 0xc19bf174 + W[15]; + t2 = e0(b) + Maj(b, c, d); e += t1; a = t1+t2; + + t1 = h + e1(e) + Ch(e, f, g) + 0xe49b69c1 + W[16]; + t2 = e0(a) + Maj(a, b, c); d += t1; h = t1+t2; + t1 = g + e1(d) + Ch(d, e, f) + 0xefbe4786 + W[17]; + t2 = e0(h) + Maj(h, a, b); c += t1; g = t1+t2; + t1 = f + e1(c) + Ch(c, d, e) + 0x0fc19dc6 + W[18]; + t2 = e0(g) + Maj(g, h, a); b += t1; f = t1+t2; + t1 = e + e1(b) + Ch(b, c, d) + 0x240ca1cc + W[19]; + t2 = e0(f) + Maj(f, g, h); a += t1; e = t1+t2; + t1 = d + e1(a) + Ch(a, b, c) + 0x2de92c6f + W[20]; + t2 = e0(e) + Maj(e, f, g); h += t1; d = t1+t2; + t1 = c + e1(h) + Ch(h, a, b) + 0x4a7484aa + W[21]; + t2 = e0(d) + Maj(d, e, f); g += t1; c = t1+t2; + t1 = b + e1(g) + Ch(g, h, a) + 0x5cb0a9dc + W[22]; + t2 = e0(c) + Maj(c, d, e); f += t1; b = t1+t2; + t1 = a + e1(f) + Ch(f, g, h) + 0x76f988da + W[23]; + t2 = e0(b) + Maj(b, c, d); e += t1; a = t1+t2; + + t1 = h + e1(e) + Ch(e, f, g) + 0x983e5152 + W[24]; + t2 = e0(a) + Maj(a, b, c); d += t1; h = t1+t2; + t1 = g + e1(d) + Ch(d, e, f) + 0xa831c66d + W[25]; + t2 = e0(h) + Maj(h, a, b); c += t1; g = t1+t2; + t1 = f + e1(c) + Ch(c, d, e) + 0xb00327c8 + W[26]; + t2 = e0(g) + Maj(g, h, a); b += t1; f = t1+t2; + t1 = e + e1(b) + Ch(b, c, d) + 0xbf597fc7 + W[27]; + t2 = e0(f) + Maj(f, g, h); a += t1; e = t1+t2; + t1 = d + e1(a) + Ch(a, b, c) + 0xc6e00bf3 + W[28]; + t2 = e0(e) + Maj(e, f, g); h += t1; d = t1+t2; + t1 = c + e1(h) + Ch(h, a, b) + 0xd5a79147 + W[29]; + t2 = e0(d) + Maj(d, e, f); g += t1; c = t1+t2; + t1 = b + e1(g) + Ch(g, h, a) + 0x06ca6351 + W[30]; + t2 = e0(c) + Maj(c, d, e); f += t1; b = t1+t2; + t1 = a + e1(f) + Ch(f, g, h) + 0x14292967 + W[31]; + t2 = e0(b) + Maj(b, c, d); e += t1; a = t1+t2; + + t1 = h + e1(e) + Ch(e, f, g) + 0x27b70a85 + W[32]; + t2 = e0(a) + Maj(a, b, c); d += t1; h = t1+t2; + t1 = g + e1(d) + Ch(d, e, f) + 0x2e1b2138 + W[33]; + t2 = e0(h) + Maj(h, a, b); c += t1; g = t1+t2; + t1 = f + e1(c) + Ch(c, d, e) + 0x4d2c6dfc + W[34]; + t2 = e0(g) + Maj(g, h, a); b += t1; f = t1+t2; + t1 = e + e1(b) + Ch(b, c, d) + 0x53380d13 + W[35]; + t2 = e0(f) + Maj(f, g, h); a += t1; e = t1+t2; + t1 = d + e1(a) + Ch(a, b, c) + 0x650a7354 + W[36]; + t2 = e0(e) + Maj(e, f, g); h += t1; d = t1+t2; + t1 = c + e1(h) + Ch(h, a, b) + 0x766a0abb + W[37]; + t2 = e0(d) + Maj(d, e, f); g += t1; c = t1+t2; + t1 = b + e1(g) + Ch(g, h, a) + 0x81c2c92e + W[38]; + t2 = e0(c) + Maj(c, d, e); f += t1; b = t1+t2; + t1 = a + e1(f) + Ch(f, g, h) + 0x92722c85 + W[39]; + t2 = e0(b) + Maj(b, c, d); e += t1; a = t1+t2; + + t1 = h + e1(e) + Ch(e, f, g) + 0xa2bfe8a1 + W[40]; + t2 = e0(a) + Maj(a, b, c); d += t1; h = t1+t2; + t1 = g + e1(d) + Ch(d, e, f) + 0xa81a664b + W[41]; + t2 = e0(h) + Maj(h, a, b); c += t1; g = t1+t2; + t1 = f + e1(c) + Ch(c, d, e) + 0xc24b8b70 + W[42]; + t2 = e0(g) + Maj(g, h, a); b += t1; f = t1+t2; + t1 = e + e1(b) + Ch(b, c, d) + 0xc76c51a3 + W[43]; + t2 = e0(f) + Maj(f, g, h); a += t1; e = t1+t2; + t1 = d + e1(a) + Ch(a, b, c) + 0xd192e819 + W[44]; + t2 = e0(e) + Maj(e, f, g); h += t1; d = t1+t2; + t1 = c + e1(h) + Ch(h, a, b) + 0xd6990624 + W[45]; + t2 = e0(d) + Maj(d, e, f); g += t1; c = t1+t2; + t1 = b + e1(g) + Ch(g, h, a) + 0xf40e3585 + W[46]; + t2 = e0(c) + Maj(c, d, e); f += t1; b = t1+t2; + t1 = a + e1(f) + Ch(f, g, h) + 0x106aa070 + W[47]; + t2 = e0(b) + Maj(b, c, d); e += t1; a = t1+t2; + + t1 = h + e1(e) + Ch(e, f, g) + 0x19a4c116 + W[48]; + t2 = e0(a) + Maj(a, b, c); d += t1; h = t1+t2; + t1 = g + e1(d) + Ch(d, e, f) + 0x1e376c08 + W[49]; + t2 = e0(h) + Maj(h, a, b); c += t1; g = t1+t2; + t1 = f + e1(c) + Ch(c, d, e) + 0x2748774c + W[50]; + t2 = e0(g) + Maj(g, h, a); b += t1; f = t1+t2; + t1 = e + e1(b) + Ch(b, c, d) + 0x34b0bcb5 + W[51]; + t2 = e0(f) + Maj(f, g, h); a += t1; e = t1+t2; + t1 = d + e1(a) + Ch(a, b, c) + 0x391c0cb3 + W[52]; + t2 = e0(e) + Maj(e, f, g); h += t1; d = t1+t2; + t1 = c + e1(h) + Ch(h, a, b) + 0x4ed8aa4a + W[53]; + t2 = e0(d) + Maj(d, e, f); g += t1; c = t1+t2; + t1 = b + e1(g) + Ch(g, h, a) + 0x5b9cca4f + W[54]; + t2 = e0(c) + Maj(c, d, e); f += t1; b = t1+t2; + t1 = a + e1(f) + Ch(f, g, h) + 0x682e6ff3 + W[55]; + t2 = e0(b) + Maj(b, c, d); e += t1; a = t1+t2; + + t1 = h + e1(e) + Ch(e, f, g) + 0x748f82ee + W[56]; + t2 = e0(a) + Maj(a, b, c); d += t1; h = t1+t2; + t1 = g + e1(d) + Ch(d, e, f) + 0x78a5636f + W[57]; + t2 = e0(h) + Maj(h, a, b); c += t1; g = t1+t2; + t1 = f + e1(c) + Ch(c, d, e) + 0x84c87814 + W[58]; + t2 = e0(g) + Maj(g, h, a); b += t1; f = t1+t2; + t1 = e + e1(b) + Ch(b, c, d) + 0x8cc70208 + W[59]; + t2 = e0(f) + Maj(f, g, h); a += t1; e = t1+t2; + t1 = d + e1(a) + Ch(a, b, c) + 0x90befffa + W[60]; + t2 = e0(e) + Maj(e, f, g); h += t1; d = t1+t2; + t1 = c + e1(h) + Ch(h, a, b) + 0xa4506ceb + W[61]; + t2 = e0(d) + Maj(d, e, f); g += t1; c = t1+t2; + t1 = b + e1(g) + Ch(g, h, a) + 0xbef9a3f7 + W[62]; + t2 = e0(c) + Maj(c, d, e); f += t1; b = t1+t2; + t1 = a + e1(f) + Ch(f, g, h) + 0xc67178f2 + W[63]; + t2 = e0(b) + Maj(b, c, d); e += t1; a = t1+t2; + + state[0] += a; state[1] += b; state[2] += c; state[3] += d; + state[4] += e; state[5] += f; state[6] += g; state[7] += h; + + /* clear any sensitive info... */ + a = b = c = d = e = f = g = h = t1 = t2 = 0; + memset(W, 0, 64 * sizeof(u32)); +} + +int sha256_init(struct sha256_state *sctx) +{ + sctx->state[0] = SHA256_H0; + sctx->state[1] = SHA256_H1; + sctx->state[2] = SHA256_H2; + sctx->state[3] = SHA256_H3; + sctx->state[4] = SHA256_H4; + sctx->state[5] = SHA256_H5; + sctx->state[6] = SHA256_H6; + sctx->state[7] = SHA256_H7; + sctx->count = 0; + + return 0; +} + +int sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len) +{ + unsigned int partial, done; + const u8 *src; + + partial = sctx->count & 0x3f; + sctx->count += len; + done = 0; + src = data; + + if ((partial + len) > 63) { + if (partial) { + done = -partial; + memcpy(sctx->buf + partial, data, done + 64); + src = sctx->buf; + } + + do { + sha256_transform(sctx->state, src); + done += 64; + src = data + done; + } while (done + 63 < len); + + partial = 0; + } + memcpy(sctx->buf + partial, src, len - done); + + return 0; +} + +int sha256_final(struct sha256_state *sctx, u8 *out) +{ + __be32 *dst = (__be32 *)out; + __be64 bits; + unsigned int index, pad_len; + int i; + static const u8 padding[64] = { 0x80, }; + + /* Save number of bits */ + bits = cpu_to_be64(sctx->count << 3); + + /* Pad out to 56 mod 64. */ + index = sctx->count & 0x3f; + pad_len = (index < 56) ? (56 - index) : ((64+56) - index); + sha256_update(sctx, padding, pad_len); + + /* Append length (before padding) */ + sha256_update(sctx, (const u8 *)&bits, sizeof(bits)); + + /* Store state in digest */ + for (i = 0; i < 8; i++) + dst[i] = cpu_to_be32(sctx->state[i]); + + /* Zeroize sensitive information. */ + memset(sctx, 0, sizeof(*sctx)); + + return 0; +} diff --git a/arch/x86/purgatory/sha256.h b/arch/x86/purgatory/sha256.h new file mode 100644 index 0000000..bd15a41 --- /dev/null +++ b/arch/x86/purgatory/sha256.h @@ -0,0 +1,22 @@ +/* + * Copyright (C) 2014 Red Hat Inc. + * + * Author: Vivek Goyal + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#ifndef SHA256_H +#define SHA256_H + + +#include +#include + +extern int sha256_init(struct sha256_state *sctx); +extern int sha256_update(struct sha256_state *sctx, const u8 *input, + unsigned int length); +extern int sha256_final(struct sha256_state *sctx, u8 *hash); + +#endif /* SHA256_H */ -- cgit v0.10.2 From 8fc5b4d4121c95482b2583a07863c6b0aba2d9e1 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:26:02 -0700 Subject: purgatory: core purgatory functionality Create a stand alone relocatable object purgatory which runs between two kernels. This name, concept and some code has been taken from kexec-tools. Idea is that this code runs after a crash and it runs in minimal environment. So keep it separate from rest of the kernel and in long term we will have to practically do no maintenance of this code. This code also has the logic to do verify sha256 hashes of various segments which have been loaded into memory. So first we verify that the kernel we are jumping to is fine and has not been corrupted and make progress only if checsums are verified. This code also takes care of copying some memory contents to backup region. [sfr@canb.auug.org.au: run host built programs from objtree] Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index e5287d8..61b6d51 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -16,3 +16,7 @@ obj-$(CONFIG_IA32_EMULATION) += ia32/ obj-y += platform/ obj-y += net/ + +ifeq ($(CONFIG_X86_64),y) +obj-$(CONFIG_KEXEC) += purgatory/ +endif diff --git a/arch/x86/Makefile b/arch/x86/Makefile index c65fd96..c1aa368 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -183,6 +183,14 @@ archscripts: scripts_basic archheaders: $(Q)$(MAKE) $(build)=arch/x86/syscalls all +archprepare: +ifeq ($(CONFIG_KEXEC),y) +# Build only for 64bit. No loaders for 32bit yet. + ifeq ($(CONFIG_X86_64),y) + $(Q)$(MAKE) $(build)=arch/x86/purgatory arch/x86/purgatory/kexec-purgatory.c + endif +endif + ### # Kernel objects diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile new file mode 100644 index 0000000..7fde9ee --- /dev/null +++ b/arch/x86/purgatory/Makefile @@ -0,0 +1,30 @@ +purgatory-y := purgatory.o stack.o setup-x86_$(BITS).o sha256.o entry64.o string.o + +targets += $(purgatory-y) +PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y)) + +LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib -z nodefaultlib +targets += purgatory.ro + +# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That +# in turn leaves some undefined symbols like __fentry__ in purgatory and not +# sure how to relocate those. Like kexec-tools, use custom flags. + +KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes -fno-zero-initialized-in-bss -fno-builtin -ffreestanding -c -MD -Os -mcmodel=large + +$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE + $(call if_changed,ld) + +targets += kexec-purgatory.c + +quiet_cmd_bin2c = BIN2C $@ + cmd_bin2c = cat $(obj)/purgatory.ro | $(objtree)/scripts/basic/bin2c kexec_purgatory > $(obj)/kexec-purgatory.c + +$(obj)/kexec-purgatory.c: $(obj)/purgatory.ro FORCE + $(call if_changed,bin2c) + + +# No loaders for 32bits yet. +ifeq ($(CONFIG_X86_64),y) + obj-$(CONFIG_KEXEC) += kexec-purgatory.o +endif diff --git a/arch/x86/purgatory/entry64.S b/arch/x86/purgatory/entry64.S new file mode 100644 index 0000000..be3249d --- /dev/null +++ b/arch/x86/purgatory/entry64.S @@ -0,0 +1,101 @@ +/* + * Copyright (C) 2003,2004 Eric Biederman (ebiederm@xmission.com) + * Copyright (C) 2014 Red Hat Inc. + + * Author(s): Vivek Goyal + * + * This code has been taken from kexec-tools. + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + + .text + .balign 16 + .code64 + .globl entry64, entry64_regs + + +entry64: + /* Setup a gdt that should be preserved */ + lgdt gdt(%rip) + + /* load the data segments */ + movl $0x18, %eax /* data segment */ + movl %eax, %ds + movl %eax, %es + movl %eax, %ss + movl %eax, %fs + movl %eax, %gs + + /* Setup new stack */ + leaq stack_init(%rip), %rsp + pushq $0x10 /* CS */ + leaq new_cs_exit(%rip), %rax + pushq %rax + lretq +new_cs_exit: + + /* Load the registers */ + movq rax(%rip), %rax + movq rbx(%rip), %rbx + movq rcx(%rip), %rcx + movq rdx(%rip), %rdx + movq rsi(%rip), %rsi + movq rdi(%rip), %rdi + movq rsp(%rip), %rsp + movq rbp(%rip), %rbp + movq r8(%rip), %r8 + movq r9(%rip), %r9 + movq r10(%rip), %r10 + movq r11(%rip), %r11 + movq r12(%rip), %r12 + movq r13(%rip), %r13 + movq r14(%rip), %r14 + movq r15(%rip), %r15 + + /* Jump to the new code... */ + jmpq *rip(%rip) + + .section ".rodata" + .balign 4 +entry64_regs: +rax: .quad 0x0 +rbx: .quad 0x0 +rcx: .quad 0x0 +rdx: .quad 0x0 +rsi: .quad 0x0 +rdi: .quad 0x0 +rsp: .quad 0x0 +rbp: .quad 0x0 +r8: .quad 0x0 +r9: .quad 0x0 +r10: .quad 0x0 +r11: .quad 0x0 +r12: .quad 0x0 +r13: .quad 0x0 +r14: .quad 0x0 +r15: .quad 0x0 +rip: .quad 0x0 + .size entry64_regs, . - entry64_regs + + /* GDT */ + .section ".rodata" + .balign 16 +gdt: + /* 0x00 unusable segment + * 0x08 unused + * so use them as gdt ptr + */ + .word gdt_end - gdt - 1 + .quad gdt + .word 0, 0, 0 + + /* 0x10 4GB flat code segment */ + .word 0xFFFF, 0x0000, 0x9A00, 0x00AF + + /* 0x18 4GB flat data segment */ + .word 0xFFFF, 0x0000, 0x9200, 0x00CF +gdt_end: +stack: .quad 0, 0 +stack_init: diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c new file mode 100644 index 0000000..25e068b --- /dev/null +++ b/arch/x86/purgatory/purgatory.c @@ -0,0 +1,72 @@ +/* + * purgatory: Runs between two kernels + * + * Copyright (C) 2014 Red Hat Inc. + * + * Author: + * Vivek Goyal + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#include "sha256.h" +#include "../boot/string.h" + +struct sha_region { + unsigned long start; + unsigned long len; +}; + +unsigned long backup_dest = 0; +unsigned long backup_src = 0; +unsigned long backup_sz = 0; + +u8 sha256_digest[SHA256_DIGEST_SIZE] = { 0 }; + +struct sha_region sha_regions[16] = {}; + +/* + * On x86, second kernel requries first 640K of memory to boot. Copy + * first 640K to a backup region in reserved memory range so that second + * kernel can use first 640K. + */ +static int copy_backup_region(void) +{ + if (backup_dest) + memcpy((void *)backup_dest, (void *)backup_src, backup_sz); + + return 0; +} + +int verify_sha256_digest(void) +{ + struct sha_region *ptr, *end; + u8 digest[SHA256_DIGEST_SIZE]; + struct sha256_state sctx; + + sha256_init(&sctx); + end = &sha_regions[sizeof(sha_regions)/sizeof(sha_regions[0])]; + for (ptr = sha_regions; ptr < end; ptr++) + sha256_update(&sctx, (uint8_t *)(ptr->start), ptr->len); + + sha256_final(&sctx, digest); + + if (memcmp(digest, sha256_digest, sizeof(digest))) + return 1; + + return 0; +} + +void purgatory(void) +{ + int ret; + + ret = verify_sha256_digest(); + if (ret) { + /* loop forever */ + for (;;) + ; + } + copy_backup_region(); +} diff --git a/arch/x86/purgatory/setup-x86_64.S b/arch/x86/purgatory/setup-x86_64.S new file mode 100644 index 0000000..fe3c91b --- /dev/null +++ b/arch/x86/purgatory/setup-x86_64.S @@ -0,0 +1,58 @@ +/* + * purgatory: setup code + * + * Copyright (C) 2003,2004 Eric Biederman (ebiederm@xmission.com) + * Copyright (C) 2014 Red Hat Inc. + * + * This code has been taken from kexec-tools. + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + + .text + .globl purgatory_start + .balign 16 +purgatory_start: + .code64 + + /* Load a gdt so I know what the segment registers are */ + lgdt gdt(%rip) + + /* load the data segments */ + movl $0x18, %eax /* data segment */ + movl %eax, %ds + movl %eax, %es + movl %eax, %ss + movl %eax, %fs + movl %eax, %gs + + /* Setup a stack */ + leaq lstack_end(%rip), %rsp + + /* Call the C code */ + call purgatory + jmp entry64 + + .section ".rodata" + .balign 16 +gdt: /* 0x00 unusable segment + * 0x08 unused + * so use them as the gdt ptr + */ + .word gdt_end - gdt - 1 + .quad gdt + .word 0, 0, 0 + + /* 0x10 4GB flat code segment */ + .word 0xFFFF, 0x0000, 0x9A00, 0x00AF + + /* 0x18 4GB flat data segment */ + .word 0xFFFF, 0x0000, 0x9200, 0x00CF +gdt_end: + + .bss + .balign 4096 +lstack: + .skip 4096 +lstack_end: diff --git a/arch/x86/purgatory/stack.S b/arch/x86/purgatory/stack.S new file mode 100644 index 0000000..3cefba1 --- /dev/null +++ b/arch/x86/purgatory/stack.S @@ -0,0 +1,19 @@ +/* + * purgatory: stack + * + * Copyright (C) 2014 Red Hat Inc. + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + + /* A stack for the loaded kernel. + * Seperate and in the data section so it can be prepopulated. + */ + .data + .balign 4096 + .globl stack, stack_end + +stack: + .skip 4096 +stack_end: diff --git a/arch/x86/purgatory/string.c b/arch/x86/purgatory/string.c new file mode 100644 index 0000000..d886b1f --- /dev/null +++ b/arch/x86/purgatory/string.c @@ -0,0 +1,13 @@ +/* + * Simple string functions. + * + * Copyright (C) 2014 Red Hat Inc. + * + * Author: + * Vivek Goyal + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#include "../boot/string.c" -- cgit v0.10.2 From 12db5562e0352986a265841638482b84f3a6899b Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:26:04 -0700 Subject: kexec: load and relocate purgatory at kernel load time Load purgatory code in RAM and relocate it based on the location. Relocation code has been inspired by module relocation code and purgatory relocation code in kexec-tools. Also compute the checksums of loaded kexec segments and store them in purgatory. Arch independent code provides this functionality so that arch dependent bootloaders can make use of it. Helper functions are provided to get/set symbol values in purgatory which are used by bootloaders later to set things like stack and entry point of second kernel etc. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 8e9dbcb..cacc8d5 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -2065,6 +2065,8 @@ config XIP_PHYS_ADDR config KEXEC bool "Kexec system call (EXPERIMENTAL)" depends on (!SMP || PM_SLEEP_SMP) + select CRYPTO + select CRYPTO_SHA256 help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index c84c88b..64aefb7 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -549,6 +549,8 @@ source "drivers/sn/Kconfig" config KEXEC bool "kexec system call" depends on !IA64_HP_SIM && (!SMP || HOTPLUG_CPU) + select CRYPTO + select CRYPTO_SHA256 help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 87b7c75..3ff8c9a 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -91,6 +91,8 @@ config MMU_SUN3 config KEXEC bool "kexec system call" depends on M68KCLASSIC + select CRYPTO + select CRYPTO_SHA256 help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 900c7e5..df51e78 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -2396,6 +2396,8 @@ source "kernel/Kconfig.preempt" config KEXEC bool "Kexec system call" + select CRYPTO + select CRYPTO_SHA256 help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 4bc7b62..a577609f 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -399,6 +399,8 @@ config PPC64_SUPPORTS_MEMORY_FAILURE config KEXEC bool "kexec system call" depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) + select CRYPTO + select CRYPTO_SHA256 help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 05c78bb..ab39ceb8 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -48,6 +48,8 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC config KEXEC def_bool y + select CRYPTO + select CRYPTO_SHA256 config AUDIT_ARCH def_bool y diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index aa2df3e..453fa5c 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -595,6 +595,8 @@ source kernel/Kconfig.hz config KEXEC bool "kexec system call (EXPERIMENTAL)" depends on SUPERH32 && MMU + select CRYPTO + select CRYPTO_SHA256 help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig index 7fcd492..a3ffe2d 100644 --- a/arch/tile/Kconfig +++ b/arch/tile/Kconfig @@ -191,6 +191,8 @@ source "kernel/Kconfig.hz" config KEXEC bool "kexec system call" + select CRYPTO + select CRYPTO_SHA256 ---help--- kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 98fe3df..9558b9f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1583,6 +1583,8 @@ source kernel/Kconfig.hz config KEXEC bool "kexec system call" select BUILD_BIN2C + select CRYPTO + select CRYPTO_SHA256 ---help--- kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index c8875b5..88404c4 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -6,6 +6,8 @@ * Version 2. See the file COPYING for more details. */ +#define pr_fmt(fmt) "kexec: " fmt + #include #include #include @@ -328,3 +330,143 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image) return image->fops->cleanup(image); } + +/* + * Apply purgatory relocations. + * + * ehdr: Pointer to elf headers + * sechdrs: Pointer to section headers. + * relsec: section index of SHT_RELA section. + * + * TODO: Some of the code belongs to generic code. Move that in kexec.c. + */ +int arch_kexec_apply_relocations_add(const Elf64_Ehdr *ehdr, + Elf64_Shdr *sechdrs, unsigned int relsec) +{ + unsigned int i; + Elf64_Rela *rel; + Elf64_Sym *sym; + void *location; + Elf64_Shdr *section, *symtabsec; + unsigned long address, sec_base, value; + const char *strtab, *name, *shstrtab; + + /* + * ->sh_offset has been modified to keep the pointer to section + * contents in memory + */ + rel = (void *)sechdrs[relsec].sh_offset; + + /* Section to which relocations apply */ + section = &sechdrs[sechdrs[relsec].sh_info]; + + pr_debug("Applying relocate section %u to %u\n", relsec, + sechdrs[relsec].sh_info); + + /* Associated symbol table */ + symtabsec = &sechdrs[sechdrs[relsec].sh_link]; + + /* String table */ + if (symtabsec->sh_link >= ehdr->e_shnum) { + /* Invalid strtab section number */ + pr_err("Invalid string table section index %d\n", + symtabsec->sh_link); + return -ENOEXEC; + } + + strtab = (char *)sechdrs[symtabsec->sh_link].sh_offset; + + /* section header string table */ + shstrtab = (char *)sechdrs[ehdr->e_shstrndx].sh_offset; + + for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) { + + /* + * rel[i].r_offset contains byte offset from beginning + * of section to the storage unit affected. + * + * This is location to update (->sh_offset). This is temporary + * buffer where section is currently loaded. This will finally + * be loaded to a different address later, pointed to by + * ->sh_addr. kexec takes care of moving it + * (kexec_load_segment()). + */ + location = (void *)(section->sh_offset + rel[i].r_offset); + + /* Final address of the location */ + address = section->sh_addr + rel[i].r_offset; + + /* + * rel[i].r_info contains information about symbol table index + * w.r.t which relocation must be made and type of relocation + * to apply. ELF64_R_SYM() and ELF64_R_TYPE() macros get + * these respectively. + */ + sym = (Elf64_Sym *)symtabsec->sh_offset + + ELF64_R_SYM(rel[i].r_info); + + if (sym->st_name) + name = strtab + sym->st_name; + else + name = shstrtab + sechdrs[sym->st_shndx].sh_name; + + pr_debug("Symbol: %s info: %02x shndx: %02x value=%llx size: %llx\n", + name, sym->st_info, sym->st_shndx, sym->st_value, + sym->st_size); + + if (sym->st_shndx == SHN_UNDEF) { + pr_err("Undefined symbol: %s\n", name); + return -ENOEXEC; + } + + if (sym->st_shndx == SHN_COMMON) { + pr_err("symbol '%s' in common section\n", name); + return -ENOEXEC; + } + + if (sym->st_shndx == SHN_ABS) + sec_base = 0; + else if (sym->st_shndx >= ehdr->e_shnum) { + pr_err("Invalid section %d for symbol %s\n", + sym->st_shndx, name); + return -ENOEXEC; + } else + sec_base = sechdrs[sym->st_shndx].sh_addr; + + value = sym->st_value; + value += sec_base; + value += rel[i].r_addend; + + switch (ELF64_R_TYPE(rel[i].r_info)) { + case R_X86_64_NONE: + break; + case R_X86_64_64: + *(u64 *)location = value; + break; + case R_X86_64_32: + *(u32 *)location = value; + if (value != *(u32 *)location) + goto overflow; + break; + case R_X86_64_32S: + *(s32 *)location = value; + if ((s64)value != *(s32 *)location) + goto overflow; + break; + case R_X86_64_PC32: + value -= (u64)address; + *(u32 *)location = value; + break; + default: + pr_err("Unknown rela relocation: %llu\n", + ELF64_R_TYPE(rel[i].r_info)); + return -ENOEXEC; + } + } + return 0; + +overflow: + pr_err("Overflow in relocation type %d value 0x%lx\n", + (int)ELF64_R_TYPE(rel[i].r_info), value); + return -ENOEXEC; +} diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 8e80901..84f09e9 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -10,6 +10,7 @@ #include #include #include +#include #include /* Verify architecture specific macros are defined */ @@ -95,6 +96,27 @@ struct compat_kexec_segment { }; #endif +struct kexec_sha_region { + unsigned long start; + unsigned long len; +}; + +struct purgatory_info { + /* Pointer to elf header of read only purgatory */ + Elf_Ehdr *ehdr; + + /* Pointer to purgatory sechdrs which are modifiable */ + Elf_Shdr *sechdrs; + /* + * Temporary buffer location where purgatory is loaded and relocated + * This memory can be freed post image load + */ + void *purgatory_buf; + + /* Address where purgatory is finally loaded and is executed from */ + unsigned long purgatory_load_addr; +}; + struct kimage { kimage_entry_t head; kimage_entry_t *entry; @@ -143,6 +165,9 @@ struct kimage { /* Image loader handling the kernel can store a pointer here */ void *image_loader_data; + + /* Information for loading purgatory */ + struct purgatory_info purgatory_info; }; /* @@ -189,6 +214,14 @@ extern int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long *load_addr); extern struct page *kimage_alloc_control_pages(struct kimage *image, unsigned int order); +extern int kexec_load_purgatory(struct kimage *image, unsigned long min, + unsigned long max, int top_down, + unsigned long *load_addr); +extern int kexec_purgatory_get_set_symbol(struct kimage *image, + const char *name, void *buf, + unsigned int size, bool get_value); +extern void *kexec_purgatory_get_symbol_addr(struct kimage *image, + const char *name); extern void crash_kexec(struct pt_regs *); int kexec_should_crash(struct task_struct *); void crash_save_cpu(struct pt_regs *regs, int cpu); diff --git a/kernel/kexec.c b/kernel/kexec.c index 9b46219..669e331 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -42,6 +42,9 @@ #include #include +#include +#include + /* Per cpu memory for storing cpu states in case of system crash. */ note_buf_t __percpu *crash_notes; @@ -54,6 +57,15 @@ size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); /* Flag to indicate we are going to kexec a new kernel */ bool kexec_in_progress = false; +/* + * Declare these symbols weak so that if architecture provides a purgatory, + * these will be overridden. + */ +char __weak kexec_purgatory[0]; +size_t __weak kexec_purgatory_size = 0; + +static int kexec_calculate_store_digests(struct kimage *image); + /* Location of the reserved area for the crash kernel */ struct resource crashk_res = { .name = "Crash kernel", @@ -404,6 +416,24 @@ void __weak arch_kimage_file_post_load_cleanup(struct kimage *image) { } +/* Apply relocations of type RELA */ +int __weak +arch_kexec_apply_relocations_add(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, + unsigned int relsec) +{ + pr_err("RELA relocation unsupported.\n"); + return -ENOEXEC; +} + +/* Apply relocations of type REL */ +int __weak +arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, + unsigned int relsec) +{ + pr_err("REL relocation unsupported.\n"); + return -ENOEXEC; +} + /* * Free up memory used by kernel, initrd, and comand line. This is temporary * memory allocation which is not needed any more after these buffers have @@ -411,6 +441,8 @@ void __weak arch_kimage_file_post_load_cleanup(struct kimage *image) */ static void kimage_file_post_load_cleanup(struct kimage *image) { + struct purgatory_info *pi = &image->purgatory_info; + vfree(image->kernel_buf); image->kernel_buf = NULL; @@ -420,6 +452,12 @@ static void kimage_file_post_load_cleanup(struct kimage *image) kfree(image->cmdline_buf); image->cmdline_buf = NULL; + vfree(pi->purgatory_buf); + pi->purgatory_buf = NULL; + + vfree(pi->sechdrs); + pi->sechdrs = NULL; + /* See if architecture has anything to cleanup post load */ arch_kimage_file_post_load_cleanup(image); } @@ -1105,7 +1143,7 @@ static int kimage_load_crash_segment(struct kimage *image, } ubytes -= uchunk; maddr += mchunk; - buf += mchunk; + buf += mchunk; mbytes -= mchunk; } out: @@ -1340,6 +1378,10 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, if (ret) goto out; + ret = kexec_calculate_store_digests(image); + if (ret) + goto out; + for (i = 0; i < image->nr_segments; i++) { struct kexec_segment *ksegment; @@ -2092,6 +2134,506 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz, return 0; } +/* Calculate and store the digest of segments */ +static int kexec_calculate_store_digests(struct kimage *image) +{ + struct crypto_shash *tfm; + struct shash_desc *desc; + int ret = 0, i, j, zero_buf_sz, sha_region_sz; + size_t desc_size, nullsz; + char *digest; + void *zero_buf; + struct kexec_sha_region *sha_regions; + struct purgatory_info *pi = &image->purgatory_info; + + zero_buf = __va(page_to_pfn(ZERO_PAGE(0)) << PAGE_SHIFT); + zero_buf_sz = PAGE_SIZE; + + tfm = crypto_alloc_shash("sha256", 0, 0); + if (IS_ERR(tfm)) { + ret = PTR_ERR(tfm); + goto out; + } + + desc_size = crypto_shash_descsize(tfm) + sizeof(*desc); + desc = kzalloc(desc_size, GFP_KERNEL); + if (!desc) { + ret = -ENOMEM; + goto out_free_tfm; + } + + sha_region_sz = KEXEC_SEGMENT_MAX * sizeof(struct kexec_sha_region); + sha_regions = vzalloc(sha_region_sz); + if (!sha_regions) + goto out_free_desc; + + desc->tfm = tfm; + desc->flags = 0; + + ret = crypto_shash_init(desc); + if (ret < 0) + goto out_free_sha_regions; + + digest = kzalloc(SHA256_DIGEST_SIZE, GFP_KERNEL); + if (!digest) { + ret = -ENOMEM; + goto out_free_sha_regions; + } + + for (j = i = 0; i < image->nr_segments; i++) { + struct kexec_segment *ksegment; + + ksegment = &image->segment[i]; + /* + * Skip purgatory as it will be modified once we put digest + * info in purgatory. + */ + if (ksegment->kbuf == pi->purgatory_buf) + continue; + + ret = crypto_shash_update(desc, ksegment->kbuf, + ksegment->bufsz); + if (ret) + break; + + /* + * Assume rest of the buffer is filled with zero and + * update digest accordingly. + */ + nullsz = ksegment->memsz - ksegment->bufsz; + while (nullsz) { + unsigned long bytes = nullsz; + + if (bytes > zero_buf_sz) + bytes = zero_buf_sz; + ret = crypto_shash_update(desc, zero_buf, bytes); + if (ret) + break; + nullsz -= bytes; + } + + if (ret) + break; + + sha_regions[j].start = ksegment->mem; + sha_regions[j].len = ksegment->memsz; + j++; + } + + if (!ret) { + ret = crypto_shash_final(desc, digest); + if (ret) + goto out_free_digest; + ret = kexec_purgatory_get_set_symbol(image, "sha_regions", + sha_regions, sha_region_sz, 0); + if (ret) + goto out_free_digest; + + ret = kexec_purgatory_get_set_symbol(image, "sha256_digest", + digest, SHA256_DIGEST_SIZE, 0); + if (ret) + goto out_free_digest; + } + +out_free_digest: + kfree(digest); +out_free_sha_regions: + vfree(sha_regions); +out_free_desc: + kfree(desc); +out_free_tfm: + kfree(tfm); +out: + return ret; +} + +/* Actually load purgatory. Lot of code taken from kexec-tools */ +static int __kexec_load_purgatory(struct kimage *image, unsigned long min, + unsigned long max, int top_down) +{ + struct purgatory_info *pi = &image->purgatory_info; + unsigned long align, buf_align, bss_align, buf_sz, bss_sz, bss_pad; + unsigned long memsz, entry, load_addr, curr_load_addr, bss_addr, offset; + unsigned char *buf_addr, *src; + int i, ret = 0, entry_sidx = -1; + const Elf_Shdr *sechdrs_c; + Elf_Shdr *sechdrs = NULL; + void *purgatory_buf = NULL; + + /* + * sechdrs_c points to section headers in purgatory and are read + * only. No modifications allowed. + */ + sechdrs_c = (void *)pi->ehdr + pi->ehdr->e_shoff; + + /* + * We can not modify sechdrs_c[] and its fields. It is read only. + * Copy it over to a local copy where one can store some temporary + * data and free it at the end. We need to modify ->sh_addr and + * ->sh_offset fields to keep track of permanent and temporary + * locations of sections. + */ + sechdrs = vzalloc(pi->ehdr->e_shnum * sizeof(Elf_Shdr)); + if (!sechdrs) + return -ENOMEM; + + memcpy(sechdrs, sechdrs_c, pi->ehdr->e_shnum * sizeof(Elf_Shdr)); + + /* + * We seem to have multiple copies of sections. First copy is which + * is embedded in kernel in read only section. Some of these sections + * will be copied to a temporary buffer and relocated. And these + * sections will finally be copied to their final destination at + * segment load time. + * + * Use ->sh_offset to reflect section address in memory. It will + * point to original read only copy if section is not allocatable. + * Otherwise it will point to temporary copy which will be relocated. + * + * Use ->sh_addr to contain final address of the section where it + * will go during execution time. + */ + for (i = 0; i < pi->ehdr->e_shnum; i++) { + if (sechdrs[i].sh_type == SHT_NOBITS) + continue; + + sechdrs[i].sh_offset = (unsigned long)pi->ehdr + + sechdrs[i].sh_offset; + } + + /* + * Identify entry point section and make entry relative to section + * start. + */ + entry = pi->ehdr->e_entry; + for (i = 0; i < pi->ehdr->e_shnum; i++) { + if (!(sechdrs[i].sh_flags & SHF_ALLOC)) + continue; + + if (!(sechdrs[i].sh_flags & SHF_EXECINSTR)) + continue; + + /* Make entry section relative */ + if (sechdrs[i].sh_addr <= pi->ehdr->e_entry && + ((sechdrs[i].sh_addr + sechdrs[i].sh_size) > + pi->ehdr->e_entry)) { + entry_sidx = i; + entry -= sechdrs[i].sh_addr; + break; + } + } + + /* Determine how much memory is needed to load relocatable object. */ + buf_align = 1; + bss_align = 1; + buf_sz = 0; + bss_sz = 0; + + for (i = 0; i < pi->ehdr->e_shnum; i++) { + if (!(sechdrs[i].sh_flags & SHF_ALLOC)) + continue; + + align = sechdrs[i].sh_addralign; + if (sechdrs[i].sh_type != SHT_NOBITS) { + if (buf_align < align) + buf_align = align; + buf_sz = ALIGN(buf_sz, align); + buf_sz += sechdrs[i].sh_size; + } else { + /* bss section */ + if (bss_align < align) + bss_align = align; + bss_sz = ALIGN(bss_sz, align); + bss_sz += sechdrs[i].sh_size; + } + } + + /* Determine the bss padding required to align bss properly */ + bss_pad = 0; + if (buf_sz & (bss_align - 1)) + bss_pad = bss_align - (buf_sz & (bss_align - 1)); + + memsz = buf_sz + bss_pad + bss_sz; + + /* Allocate buffer for purgatory */ + purgatory_buf = vzalloc(buf_sz); + if (!purgatory_buf) { + ret = -ENOMEM; + goto out; + } + + if (buf_align < bss_align) + buf_align = bss_align; + + /* Add buffer to segment list */ + ret = kexec_add_buffer(image, purgatory_buf, buf_sz, memsz, + buf_align, min, max, top_down, + &pi->purgatory_load_addr); + if (ret) + goto out; + + /* Load SHF_ALLOC sections */ + buf_addr = purgatory_buf; + load_addr = curr_load_addr = pi->purgatory_load_addr; + bss_addr = load_addr + buf_sz + bss_pad; + + for (i = 0; i < pi->ehdr->e_shnum; i++) { + if (!(sechdrs[i].sh_flags & SHF_ALLOC)) + continue; + + align = sechdrs[i].sh_addralign; + if (sechdrs[i].sh_type != SHT_NOBITS) { + curr_load_addr = ALIGN(curr_load_addr, align); + offset = curr_load_addr - load_addr; + /* We already modifed ->sh_offset to keep src addr */ + src = (char *) sechdrs[i].sh_offset; + memcpy(buf_addr + offset, src, sechdrs[i].sh_size); + + /* Store load address and source address of section */ + sechdrs[i].sh_addr = curr_load_addr; + + /* + * This section got copied to temporary buffer. Update + * ->sh_offset accordingly. + */ + sechdrs[i].sh_offset = (unsigned long)(buf_addr + offset); + + /* Advance to the next address */ + curr_load_addr += sechdrs[i].sh_size; + } else { + bss_addr = ALIGN(bss_addr, align); + sechdrs[i].sh_addr = bss_addr; + bss_addr += sechdrs[i].sh_size; + } + } + + /* Update entry point based on load address of text section */ + if (entry_sidx >= 0) + entry += sechdrs[entry_sidx].sh_addr; + + /* Make kernel jump to purgatory after shutdown */ + image->start = entry; + + /* Used later to get/set symbol values */ + pi->sechdrs = sechdrs; + + /* + * Used later to identify which section is purgatory and skip it + * from checksumming. + */ + pi->purgatory_buf = purgatory_buf; + return ret; +out: + vfree(sechdrs); + vfree(purgatory_buf); + return ret; +} + +static int kexec_apply_relocations(struct kimage *image) +{ + int i, ret; + struct purgatory_info *pi = &image->purgatory_info; + Elf_Shdr *sechdrs = pi->sechdrs; + + /* Apply relocations */ + for (i = 0; i < pi->ehdr->e_shnum; i++) { + Elf_Shdr *section, *symtab; + + if (sechdrs[i].sh_type != SHT_RELA && + sechdrs[i].sh_type != SHT_REL) + continue; + + /* + * For section of type SHT_RELA/SHT_REL, + * ->sh_link contains section header index of associated + * symbol table. And ->sh_info contains section header + * index of section to which relocations apply. + */ + if (sechdrs[i].sh_info >= pi->ehdr->e_shnum || + sechdrs[i].sh_link >= pi->ehdr->e_shnum) + return -ENOEXEC; + + section = &sechdrs[sechdrs[i].sh_info]; + symtab = &sechdrs[sechdrs[i].sh_link]; + + if (!(section->sh_flags & SHF_ALLOC)) + continue; + + /* + * symtab->sh_link contain section header index of associated + * string table. + */ + if (symtab->sh_link >= pi->ehdr->e_shnum) + /* Invalid section number? */ + continue; + + /* + * Respective archicture needs to provide support for applying + * relocations of type SHT_RELA/SHT_REL. + */ + if (sechdrs[i].sh_type == SHT_RELA) + ret = arch_kexec_apply_relocations_add(pi->ehdr, + sechdrs, i); + else if (sechdrs[i].sh_type == SHT_REL) + ret = arch_kexec_apply_relocations(pi->ehdr, + sechdrs, i); + if (ret) + return ret; + } + + return 0; +} + +/* Load relocatable purgatory object and relocate it appropriately */ +int kexec_load_purgatory(struct kimage *image, unsigned long min, + unsigned long max, int top_down, + unsigned long *load_addr) +{ + struct purgatory_info *pi = &image->purgatory_info; + int ret; + + if (kexec_purgatory_size <= 0) + return -EINVAL; + + if (kexec_purgatory_size < sizeof(Elf_Ehdr)) + return -ENOEXEC; + + pi->ehdr = (Elf_Ehdr *)kexec_purgatory; + + if (memcmp(pi->ehdr->e_ident, ELFMAG, SELFMAG) != 0 + || pi->ehdr->e_type != ET_REL + || !elf_check_arch(pi->ehdr) + || pi->ehdr->e_shentsize != sizeof(Elf_Shdr)) + return -ENOEXEC; + + if (pi->ehdr->e_shoff >= kexec_purgatory_size + || (pi->ehdr->e_shnum * sizeof(Elf_Shdr) > + kexec_purgatory_size - pi->ehdr->e_shoff)) + return -ENOEXEC; + + ret = __kexec_load_purgatory(image, min, max, top_down); + if (ret) + return ret; + + ret = kexec_apply_relocations(image); + if (ret) + goto out; + + *load_addr = pi->purgatory_load_addr; + return 0; +out: + vfree(pi->sechdrs); + vfree(pi->purgatory_buf); + return ret; +} + +static Elf_Sym *kexec_purgatory_find_symbol(struct purgatory_info *pi, + const char *name) +{ + Elf_Sym *syms; + Elf_Shdr *sechdrs; + Elf_Ehdr *ehdr; + int i, k; + const char *strtab; + + if (!pi->sechdrs || !pi->ehdr) + return NULL; + + sechdrs = pi->sechdrs; + ehdr = pi->ehdr; + + for (i = 0; i < ehdr->e_shnum; i++) { + if (sechdrs[i].sh_type != SHT_SYMTAB) + continue; + + if (sechdrs[i].sh_link >= ehdr->e_shnum) + /* Invalid strtab section number */ + continue; + strtab = (char *)sechdrs[sechdrs[i].sh_link].sh_offset; + syms = (Elf_Sym *)sechdrs[i].sh_offset; + + /* Go through symbols for a match */ + for (k = 0; k < sechdrs[i].sh_size/sizeof(Elf_Sym); k++) { + if (ELF_ST_BIND(syms[k].st_info) != STB_GLOBAL) + continue; + + if (strcmp(strtab + syms[k].st_name, name) != 0) + continue; + + if (syms[k].st_shndx == SHN_UNDEF || + syms[k].st_shndx >= ehdr->e_shnum) { + pr_debug("Symbol: %s has bad section index %d.\n", + name, syms[k].st_shndx); + return NULL; + } + + /* Found the symbol we are looking for */ + return &syms[k]; + } + } + + return NULL; +} + +void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name) +{ + struct purgatory_info *pi = &image->purgatory_info; + Elf_Sym *sym; + Elf_Shdr *sechdr; + + sym = kexec_purgatory_find_symbol(pi, name); + if (!sym) + return ERR_PTR(-EINVAL); + + sechdr = &pi->sechdrs[sym->st_shndx]; + + /* + * Returns the address where symbol will finally be loaded after + * kexec_load_segment() + */ + return (void *)(sechdr->sh_addr + sym->st_value); +} + +/* + * Get or set value of a symbol. If "get_value" is true, symbol value is + * returned in buf otherwise symbol value is set based on value in buf. + */ +int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name, + void *buf, unsigned int size, bool get_value) +{ + Elf_Sym *sym; + Elf_Shdr *sechdrs; + struct purgatory_info *pi = &image->purgatory_info; + char *sym_buf; + + sym = kexec_purgatory_find_symbol(pi, name); + if (!sym) + return -EINVAL; + + if (sym->st_size != size) { + pr_err("symbol %s size mismatch: expected %lu actual %u\n", + name, (unsigned long)sym->st_size, size); + return -EINVAL; + } + + sechdrs = pi->sechdrs; + + if (sechdrs[sym->st_shndx].sh_type == SHT_NOBITS) { + pr_err("symbol %s is in a bss section. Cannot %s\n", name, + get_value ? "get" : "set"); + return -EINVAL; + } + + sym_buf = (unsigned char *)sechdrs[sym->st_shndx].sh_offset + + sym->st_value; + + if (get_value) + memcpy((void *)buf, sym_buf, size); + else + memcpy((void *)sym_buf, buf, size); + + return 0; +} /* * Move into place and start executing a preloaded standalone -- cgit v0.10.2 From 27f48d3e633be23656a097baa3be336e04a82d84 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:26:06 -0700 Subject: kexec-bzImage64: support for loading bzImage using 64bit entry This is loader specific code which can load bzImage and set it up for 64bit entry. This does not take care of 32bit entry or real mode entry. 32bit mode entry can be implemented if somebody needs it. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/include/asm/kexec-bzimage64.h b/arch/x86/include/asm/kexec-bzimage64.h new file mode 100644 index 0000000..d1b5d19 --- /dev/null +++ b/arch/x86/include/asm/kexec-bzimage64.h @@ -0,0 +1,6 @@ +#ifndef _ASM_KEXEC_BZIMAGE64_H +#define _ASM_KEXEC_BZIMAGE64_H + +extern struct kexec_file_ops kexec_bzImage64_ops; + +#endif /* _ASM_KEXE_BZIMAGE64_H */ diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 17483a4..0dfccce 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -23,6 +23,7 @@ #include #include +#include /* * KEXEC_SOURCE_MEMORY_LIMIT maximum page get_free_page can return. @@ -161,6 +162,26 @@ struct kimage_arch { pmd_t *pmd; pte_t *pte; }; + +struct kexec_entry64_regs { + uint64_t rax; + uint64_t rbx; + uint64_t rcx; + uint64_t rdx; + uint64_t rsi; + uint64_t rdi; + uint64_t rsp; + uint64_t rbp; + uint64_t r8; + uint64_t r9; + uint64_t r10; + uint64_t r11; + uint64_t r12; + uint64_t r13; + uint64_t r14; + uint64_t r15; + uint64_t rip; +}; #endif typedef void crash_vmclear_fn(void); diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index bde3993..b5ea75c 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -118,4 +118,5 @@ ifeq ($(CONFIG_X86_64),y) obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o obj-y += vsmp_64.o + obj-$(CONFIG_KEXEC) += kexec-bzimage64.o endif diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c new file mode 100644 index 0000000..bcedd10 --- /dev/null +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -0,0 +1,375 @@ +/* + * Kexec bzImage loader + * + * Copyright (C) 2014 Red Hat Inc. + * Authors: + * Vivek Goyal + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#define pr_fmt(fmt) "kexec-bzImage64: " fmt + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +/* + * Defines lowest physical address for various segments. Not sure where + * exactly these limits came from. Current bzimage64 loader in kexec-tools + * uses these so I am retaining it. It can be changed over time as we gain + * more insight. + */ +#define MIN_PURGATORY_ADDR 0x3000 +#define MIN_BOOTPARAM_ADDR 0x3000 +#define MIN_KERNEL_LOAD_ADDR 0x100000 +#define MIN_INITRD_LOAD_ADDR 0x1000000 + +/* + * This is a place holder for all boot loader specific data structure which + * gets allocated in one call but gets freed much later during cleanup + * time. Right now there is only one field but it can grow as need be. + */ +struct bzimage64_data { + /* + * Temporary buffer to hold bootparams buffer. This should be + * freed once the bootparam segment has been loaded. + */ + void *bootparams_buf; +}; + +static int setup_initrd(struct boot_params *params, + unsigned long initrd_load_addr, unsigned long initrd_len) +{ + params->hdr.ramdisk_image = initrd_load_addr & 0xffffffffUL; + params->hdr.ramdisk_size = initrd_len & 0xffffffffUL; + + params->ext_ramdisk_image = initrd_load_addr >> 32; + params->ext_ramdisk_size = initrd_len >> 32; + + return 0; +} + +static int setup_cmdline(struct boot_params *params, + unsigned long bootparams_load_addr, + unsigned long cmdline_offset, char *cmdline, + unsigned long cmdline_len) +{ + char *cmdline_ptr = ((char *)params) + cmdline_offset; + unsigned long cmdline_ptr_phys; + uint32_t cmdline_low_32, cmdline_ext_32; + + memcpy(cmdline_ptr, cmdline, cmdline_len); + cmdline_ptr[cmdline_len - 1] = '\0'; + + cmdline_ptr_phys = bootparams_load_addr + cmdline_offset; + cmdline_low_32 = cmdline_ptr_phys & 0xffffffffUL; + cmdline_ext_32 = cmdline_ptr_phys >> 32; + + params->hdr.cmd_line_ptr = cmdline_low_32; + if (cmdline_ext_32) + params->ext_cmd_line_ptr = cmdline_ext_32; + + return 0; +} + +static int setup_memory_map_entries(struct boot_params *params) +{ + unsigned int nr_e820_entries; + + nr_e820_entries = e820_saved.nr_map; + + /* TODO: Pass entries more than E820MAX in bootparams setup data */ + if (nr_e820_entries > E820MAX) + nr_e820_entries = E820MAX; + + params->e820_entries = nr_e820_entries; + memcpy(¶ms->e820_map, &e820_saved.map, + nr_e820_entries * sizeof(struct e820entry)); + + return 0; +} + +static int setup_boot_parameters(struct boot_params *params) +{ + unsigned int nr_e820_entries; + unsigned long long mem_k, start, end; + int i; + + /* Get subarch from existing bootparams */ + params->hdr.hardware_subarch = boot_params.hdr.hardware_subarch; + + /* Copying screen_info will do? */ + memcpy(¶ms->screen_info, &boot_params.screen_info, + sizeof(struct screen_info)); + + /* Fill in memsize later */ + params->screen_info.ext_mem_k = 0; + params->alt_mem_k = 0; + + /* Default APM info */ + memset(¶ms->apm_bios_info, 0, sizeof(params->apm_bios_info)); + + /* Default drive info */ + memset(¶ms->hd0_info, 0, sizeof(params->hd0_info)); + memset(¶ms->hd1_info, 0, sizeof(params->hd1_info)); + + /* Default sysdesc table */ + params->sys_desc_table.length = 0; + + setup_memory_map_entries(params); + nr_e820_entries = params->e820_entries; + + for (i = 0; i < nr_e820_entries; i++) { + if (params->e820_map[i].type != E820_RAM) + continue; + start = params->e820_map[i].addr; + end = params->e820_map[i].addr + params->e820_map[i].size - 1; + + if ((start <= 0x100000) && end > 0x100000) { + mem_k = (end >> 10) - (0x100000 >> 10); + params->screen_info.ext_mem_k = mem_k; + params->alt_mem_k = mem_k; + if (mem_k > 0xfc00) + params->screen_info.ext_mem_k = 0xfc00; /* 64M*/ + if (mem_k > 0xffffffff) + params->alt_mem_k = 0xffffffff; + } + } + + /* Setup EDD info */ + memcpy(params->eddbuf, boot_params.eddbuf, + EDDMAXNR * sizeof(struct edd_info)); + params->eddbuf_entries = boot_params.eddbuf_entries; + + memcpy(params->edd_mbr_sig_buffer, boot_params.edd_mbr_sig_buffer, + EDD_MBR_SIG_MAX * sizeof(unsigned int)); + + return 0; +} + +int bzImage64_probe(const char *buf, unsigned long len) +{ + int ret = -ENOEXEC; + struct setup_header *header; + + /* kernel should be atleast two sectors long */ + if (len < 2 * 512) { + pr_err("File is too short to be a bzImage\n"); + return ret; + } + + header = (struct setup_header *)(buf + offsetof(struct boot_params, hdr)); + if (memcmp((char *)&header->header, "HdrS", 4) != 0) { + pr_err("Not a bzImage\n"); + return ret; + } + + if (header->boot_flag != 0xAA55) { + pr_err("No x86 boot sector present\n"); + return ret; + } + + if (header->version < 0x020C) { + pr_err("Must be at least protocol version 2.12\n"); + return ret; + } + + if (!(header->loadflags & LOADED_HIGH)) { + pr_err("zImage not a bzImage\n"); + return ret; + } + + if (!(header->xloadflags & XLF_KERNEL_64)) { + pr_err("Not a bzImage64. XLF_KERNEL_64 is not set.\n"); + return ret; + } + + if (!(header->xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G)) { + pr_err("XLF_CAN_BE_LOADED_ABOVE_4G is not set.\n"); + return ret; + } + + /* I've got a bzImage */ + pr_debug("It's a relocatable bzImage64\n"); + ret = 0; + + return ret; +} + +void *bzImage64_load(struct kimage *image, char *kernel, + unsigned long kernel_len, char *initrd, + unsigned long initrd_len, char *cmdline, + unsigned long cmdline_len) +{ + + struct setup_header *header; + int setup_sects, kern16_size, ret = 0; + unsigned long setup_header_size, params_cmdline_sz; + struct boot_params *params; + unsigned long bootparam_load_addr, kernel_load_addr, initrd_load_addr; + unsigned long purgatory_load_addr; + unsigned long kernel_bufsz, kernel_memsz, kernel_align; + char *kernel_buf; + struct bzimage64_data *ldata; + struct kexec_entry64_regs regs64; + void *stack; + unsigned int setup_hdr_offset = offsetof(struct boot_params, hdr); + + header = (struct setup_header *)(kernel + setup_hdr_offset); + setup_sects = header->setup_sects; + if (setup_sects == 0) + setup_sects = 4; + + kern16_size = (setup_sects + 1) * 512; + if (kernel_len < kern16_size) { + pr_err("bzImage truncated\n"); + return ERR_PTR(-ENOEXEC); + } + + if (cmdline_len > header->cmdline_size) { + pr_err("Kernel command line too long\n"); + return ERR_PTR(-EINVAL); + } + + /* + * Load purgatory. For 64bit entry point, purgatory code can be + * anywhere. + */ + ret = kexec_load_purgatory(image, MIN_PURGATORY_ADDR, ULONG_MAX, 1, + &purgatory_load_addr); + if (ret) { + pr_err("Loading purgatory failed\n"); + return ERR_PTR(ret); + } + + pr_debug("Loaded purgatory at 0x%lx\n", purgatory_load_addr); + + /* Load Bootparams and cmdline */ + params_cmdline_sz = sizeof(struct boot_params) + cmdline_len; + params = kzalloc(params_cmdline_sz, GFP_KERNEL); + if (!params) + return ERR_PTR(-ENOMEM); + + /* Copy setup header onto bootparams. Documentation/x86/boot.txt */ + setup_header_size = 0x0202 + kernel[0x0201] - setup_hdr_offset; + + /* Is there a limit on setup header size? */ + memcpy(¶ms->hdr, (kernel + setup_hdr_offset), setup_header_size); + + ret = kexec_add_buffer(image, (char *)params, params_cmdline_sz, + params_cmdline_sz, 16, MIN_BOOTPARAM_ADDR, + ULONG_MAX, 1, &bootparam_load_addr); + if (ret) + goto out_free_params; + pr_debug("Loaded boot_param and command line at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + bootparam_load_addr, params_cmdline_sz, params_cmdline_sz); + + /* Load kernel */ + kernel_buf = kernel + kern16_size; + kernel_bufsz = kernel_len - kern16_size; + kernel_memsz = PAGE_ALIGN(header->init_size); + kernel_align = header->kernel_alignment; + + ret = kexec_add_buffer(image, kernel_buf, + kernel_bufsz, kernel_memsz, kernel_align, + MIN_KERNEL_LOAD_ADDR, ULONG_MAX, 1, + &kernel_load_addr); + if (ret) + goto out_free_params; + + pr_debug("Loaded 64bit kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + kernel_load_addr, kernel_memsz, kernel_memsz); + + /* Load initrd high */ + if (initrd) { + ret = kexec_add_buffer(image, initrd, initrd_len, initrd_len, + PAGE_SIZE, MIN_INITRD_LOAD_ADDR, + ULONG_MAX, 1, &initrd_load_addr); + if (ret) + goto out_free_params; + + pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + initrd_load_addr, initrd_len, initrd_len); + + setup_initrd(params, initrd_load_addr, initrd_len); + } + + setup_cmdline(params, bootparam_load_addr, sizeof(struct boot_params), + cmdline, cmdline_len); + + /* bootloader info. Do we need a separate ID for kexec kernel loader? */ + params->hdr.type_of_loader = 0x0D << 4; + params->hdr.loadflags = 0; + + /* Setup purgatory regs for entry */ + ret = kexec_purgatory_get_set_symbol(image, "entry64_regs", ®s64, + sizeof(regs64), 1); + if (ret) + goto out_free_params; + + regs64.rbx = 0; /* Bootstrap Processor */ + regs64.rsi = bootparam_load_addr; + regs64.rip = kernel_load_addr + 0x200; + stack = kexec_purgatory_get_symbol_addr(image, "stack_end"); + if (IS_ERR(stack)) { + pr_err("Could not find address of symbol stack_end\n"); + ret = -EINVAL; + goto out_free_params; + } + + regs64.rsp = (unsigned long)stack; + ret = kexec_purgatory_get_set_symbol(image, "entry64_regs", ®s64, + sizeof(regs64), 0); + if (ret) + goto out_free_params; + + setup_boot_parameters(params); + + /* Allocate loader specific data */ + ldata = kzalloc(sizeof(struct bzimage64_data), GFP_KERNEL); + if (!ldata) { + ret = -ENOMEM; + goto out_free_params; + } + + /* + * Store pointer to params so that it could be freed after loading + * params segment has been loaded and contents have been copied + * somewhere else. + */ + ldata->bootparams_buf = params; + return ldata; + +out_free_params: + kfree(params); + return ERR_PTR(ret); +} + +/* This cleanup function is called after various segments have been loaded */ +int bzImage64_cleanup(void *loader_data) +{ + struct bzimage64_data *ldata = loader_data; + + if (!ldata) + return 0; + + kfree(ldata->bootparams_buf); + ldata->bootparams_buf = NULL; + + return 0; +} + +struct kexec_file_ops kexec_bzImage64_ops = { + .probe = bzImage64_probe, + .load = bzImage64_load, + .cleanup = bzImage64_cleanup, +}; diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 88404c4..18d0f9e 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -23,9 +23,10 @@ #include #include #include +#include static struct kexec_file_ops *kexec_file_loaders[] = { - NULL, + &kexec_bzImage64_ops, }; static void free_transition_pgtable(struct kimage *image) @@ -328,7 +329,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image) if (!image->fops || !image->fops->cleanup) return 0; - return image->fops->cleanup(image); + return image->fops->cleanup(image->image_loader_data); } /* diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 84f09e9..9481703 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -190,7 +190,7 @@ typedef void *(kexec_load_t)(struct kimage *image, char *kernel_buf, unsigned long kernel_len, char *initrd, unsigned long initrd_len, char *cmdline, unsigned long cmdline_len); -typedef int (kexec_cleanup_t)(struct kimage *image); +typedef int (kexec_cleanup_t)(void *loader_data); struct kexec_file_ops { kexec_probe_t *probe; diff --git a/kernel/kexec.c b/kernel/kexec.c index 669e331..0926f2a 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -460,6 +460,14 @@ static void kimage_file_post_load_cleanup(struct kimage *image) /* See if architecture has anything to cleanup post load */ arch_kimage_file_post_load_cleanup(image); + + /* + * Above call should have called into bootloader to free up + * any data stored in kimage->image_loader_data. It should + * be ok now to free it up. + */ + kfree(image->image_loader_data); + image->image_loader_data = NULL; } /* @@ -576,7 +584,6 @@ out_free_control_pages: kimage_free_page_list(&image->control_pages); out_free_post_load_bufs: kimage_file_post_load_cleanup(image); - kfree(image->image_loader_data); out_free_image: kfree(image); return ret; @@ -900,8 +907,6 @@ static void kimage_free(struct kimage *image) /* Free the kexec control pages... */ kimage_free_page_list(&image->control_pages); - kfree(image->image_loader_data); - /* * Free up any temporary buffers allocated. This might hit if * error occurred much later after buffer allocation. -- cgit v0.10.2 From dd5f726076cc7639d9713b334c8c133f77c6757a Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:26:09 -0700 Subject: kexec: support for kexec on panic using new system call This patch adds support for loading a kexec on panic (kdump) kernel usning new system call. It prepares ELF headers for memory areas to be dumped and for saved cpu registers. Also prepares the memory map for second kernel and limits its boot to reserved areas only. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h new file mode 100644 index 0000000..f498411 --- /dev/null +++ b/arch/x86/include/asm/crash.h @@ -0,0 +1,9 @@ +#ifndef _ASM_X86_CRASH_H +#define _ASM_X86_CRASH_H + +int crash_load_segments(struct kimage *image); +int crash_copy_backup_region(struct kimage *image); +int crash_setup_memmap_entries(struct kimage *image, + struct boot_params *params); + +#endif /* _ASM_X86_CRASH_H */ diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 0dfccce..d2434c1 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -25,6 +25,8 @@ #include #include +struct kimage; + /* * KEXEC_SOURCE_MEMORY_LIMIT maximum page get_free_page can return. * I.e. Maximum page that is mapped directly into kernel memory, @@ -62,6 +64,10 @@ # define KEXEC_ARCH KEXEC_ARCH_X86_64 #endif +/* Memory to backup during crash kdump */ +#define KEXEC_BACKUP_SRC_START (0UL) +#define KEXEC_BACKUP_SRC_END (640 * 1024UL) /* 640K */ + /* * CPU does not save ss and sp on stack if execution is already * running in kernel mode at the time of NMI occurrence. This code @@ -161,17 +167,35 @@ struct kimage_arch { pud_t *pud; pmd_t *pmd; pte_t *pte; + /* Details of backup region */ + unsigned long backup_src_start; + unsigned long backup_src_sz; + + /* Physical address of backup segment */ + unsigned long backup_load_addr; + + /* Core ELF header buffer */ + void *elf_headers; + unsigned long elf_headers_sz; + unsigned long elf_load_addr; }; +#endif /* CONFIG_X86_32 */ +#ifdef CONFIG_X86_64 +/* + * Number of elements and order of elements in this structure should match + * with the ones in arch/x86/purgatory/entry64.S. If you make a change here + * make an appropriate change in purgatory too. + */ struct kexec_entry64_regs { uint64_t rax; - uint64_t rbx; uint64_t rcx; uint64_t rdx; - uint64_t rsi; - uint64_t rdi; + uint64_t rbx; uint64_t rsp; uint64_t rbp; + uint64_t rsi; + uint64_t rdi; uint64_t r8; uint64_t r9; uint64_t r10; diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 507de80..0553a34 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -4,9 +4,14 @@ * Created by: Hariprasad Nellitheertha (hari@in.ibm.com) * * Copyright (C) IBM Corporation, 2004. All rights reserved. + * Copyright (C) Red Hat Inc., 2014. All rights reserved. + * Authors: + * Vivek Goyal * */ +#define pr_fmt(fmt) "kexec: " fmt + #include #include #include @@ -16,6 +21,7 @@ #include #include #include +#include #include #include @@ -28,6 +34,45 @@ #include #include +/* Alignment required for elf header segment */ +#define ELF_CORE_HEADER_ALIGN 4096 + +/* This primarily represents number of split ranges due to exclusion */ +#define CRASH_MAX_RANGES 16 + +struct crash_mem_range { + u64 start, end; +}; + +struct crash_mem { + unsigned int nr_ranges; + struct crash_mem_range ranges[CRASH_MAX_RANGES]; +}; + +/* Misc data about ram ranges needed to prepare elf headers */ +struct crash_elf_data { + struct kimage *image; + /* + * Total number of ram ranges we have after various adjustments for + * GART, crash reserved region etc. + */ + unsigned int max_nr_ranges; + unsigned long gart_start, gart_end; + + /* Pointer to elf header */ + void *ehdr; + /* Pointer to next phdr */ + void *bufp; + struct crash_mem mem; +}; + +/* Used while preparing memory map entries for second kernel */ +struct crash_memmap_data { + struct boot_params *params; + /* Type of memory */ + unsigned int type; +}; + int in_crash_kexec; /* @@ -39,6 +84,7 @@ int in_crash_kexec; */ crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss = NULL; EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss); +unsigned long crash_zero_bytes; static inline void cpu_crash_vmclear_loaded_vmcss(void) { @@ -135,3 +181,520 @@ void native_machine_crash_shutdown(struct pt_regs *regs) #endif crash_save_cpu(regs, safe_smp_processor_id()); } + +#ifdef CONFIG_X86_64 + +static int get_nr_ram_ranges_callback(unsigned long start_pfn, + unsigned long nr_pfn, void *arg) +{ + int *nr_ranges = arg; + + (*nr_ranges)++; + return 0; +} + +static int get_gart_ranges_callback(u64 start, u64 end, void *arg) +{ + struct crash_elf_data *ced = arg; + + ced->gart_start = start; + ced->gart_end = end; + + /* Not expecting more than 1 gart aperture */ + return 1; +} + + +/* Gather all the required information to prepare elf headers for ram regions */ +static void fill_up_crash_elf_data(struct crash_elf_data *ced, + struct kimage *image) +{ + unsigned int nr_ranges = 0; + + ced->image = image; + + walk_system_ram_range(0, -1, &nr_ranges, + get_nr_ram_ranges_callback); + + ced->max_nr_ranges = nr_ranges; + + /* + * We don't create ELF headers for GART aperture as an attempt + * to dump this memory in second kernel leads to hang/crash. + * If gart aperture is present, one needs to exclude that region + * and that could lead to need of extra phdr. + */ + walk_iomem_res("GART", IORESOURCE_MEM, 0, -1, + ced, get_gart_ranges_callback); + + /* + * If we have gart region, excluding that could potentially split + * a memory range, resulting in extra header. Account for that. + */ + if (ced->gart_end) + ced->max_nr_ranges++; + + /* Exclusion of crash region could split memory ranges */ + ced->max_nr_ranges++; + + /* If crashk_low_res is not 0, another range split possible */ + if (crashk_low_res.end != 0) + ced->max_nr_ranges++; +} + +static int exclude_mem_range(struct crash_mem *mem, + unsigned long long mstart, unsigned long long mend) +{ + int i, j; + unsigned long long start, end; + struct crash_mem_range temp_range = {0, 0}; + + for (i = 0; i < mem->nr_ranges; i++) { + start = mem->ranges[i].start; + end = mem->ranges[i].end; + + if (mstart > end || mend < start) + continue; + + /* Truncate any area outside of range */ + if (mstart < start) + mstart = start; + if (mend > end) + mend = end; + + /* Found completely overlapping range */ + if (mstart == start && mend == end) { + mem->ranges[i].start = 0; + mem->ranges[i].end = 0; + if (i < mem->nr_ranges - 1) { + /* Shift rest of the ranges to left */ + for (j = i; j < mem->nr_ranges - 1; j++) { + mem->ranges[j].start = + mem->ranges[j+1].start; + mem->ranges[j].end = + mem->ranges[j+1].end; + } + } + mem->nr_ranges--; + return 0; + } + + if (mstart > start && mend < end) { + /* Split original range */ + mem->ranges[i].end = mstart - 1; + temp_range.start = mend + 1; + temp_range.end = end; + } else if (mstart != start) + mem->ranges[i].end = mstart - 1; + else + mem->ranges[i].start = mend + 1; + break; + } + + /* If a split happend, add the split to array */ + if (!temp_range.end) + return 0; + + /* Split happened */ + if (i == CRASH_MAX_RANGES - 1) { + pr_err("Too many crash ranges after split\n"); + return -ENOMEM; + } + + /* Location where new range should go */ + j = i + 1; + if (j < mem->nr_ranges) { + /* Move over all ranges one slot towards the end */ + for (i = mem->nr_ranges - 1; i >= j; i--) + mem->ranges[i + 1] = mem->ranges[i]; + } + + mem->ranges[j].start = temp_range.start; + mem->ranges[j].end = temp_range.end; + mem->nr_ranges++; + return 0; +} + +/* + * Look for any unwanted ranges between mstart, mend and remove them. This + * might lead to split and split ranges are put in ced->mem.ranges[] array + */ +static int elf_header_exclude_ranges(struct crash_elf_data *ced, + unsigned long long mstart, unsigned long long mend) +{ + struct crash_mem *cmem = &ced->mem; + int ret = 0; + + memset(cmem->ranges, 0, sizeof(cmem->ranges)); + + cmem->ranges[0].start = mstart; + cmem->ranges[0].end = mend; + cmem->nr_ranges = 1; + + /* Exclude crashkernel region */ + ret = exclude_mem_range(cmem, crashk_res.start, crashk_res.end); + if (ret) + return ret; + + ret = exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end); + if (ret) + return ret; + + /* Exclude GART region */ + if (ced->gart_end) { + ret = exclude_mem_range(cmem, ced->gart_start, ced->gart_end); + if (ret) + return ret; + } + + return ret; +} + +static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg) +{ + struct crash_elf_data *ced = arg; + Elf64_Ehdr *ehdr; + Elf64_Phdr *phdr; + unsigned long mstart, mend; + struct kimage *image = ced->image; + struct crash_mem *cmem; + int ret, i; + + ehdr = ced->ehdr; + + /* Exclude unwanted mem ranges */ + ret = elf_header_exclude_ranges(ced, start, end); + if (ret) + return ret; + + /* Go through all the ranges in ced->mem.ranges[] and prepare phdr */ + cmem = &ced->mem; + + for (i = 0; i < cmem->nr_ranges; i++) { + mstart = cmem->ranges[i].start; + mend = cmem->ranges[i].end; + + phdr = ced->bufp; + ced->bufp += sizeof(Elf64_Phdr); + + phdr->p_type = PT_LOAD; + phdr->p_flags = PF_R|PF_W|PF_X; + phdr->p_offset = mstart; + + /* + * If a range matches backup region, adjust offset to backup + * segment. + */ + if (mstart == image->arch.backup_src_start && + (mend - mstart + 1) == image->arch.backup_src_sz) + phdr->p_offset = image->arch.backup_load_addr; + + phdr->p_paddr = mstart; + phdr->p_vaddr = (unsigned long long) __va(mstart); + phdr->p_filesz = phdr->p_memsz = mend - mstart + 1; + phdr->p_align = 0; + ehdr->e_phnum++; + pr_debug("Crash PT_LOAD elf header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n", + phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz, + ehdr->e_phnum, phdr->p_offset); + } + + return ret; +} + +static int prepare_elf64_headers(struct crash_elf_data *ced, + void **addr, unsigned long *sz) +{ + Elf64_Ehdr *ehdr; + Elf64_Phdr *phdr; + unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz; + unsigned char *buf, *bufp; + unsigned int cpu; + unsigned long long notes_addr; + int ret; + + /* extra phdr for vmcoreinfo elf note */ + nr_phdr = nr_cpus + 1; + nr_phdr += ced->max_nr_ranges; + + /* + * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping + * area on x86_64 (ffffffff80000000 - ffffffffa0000000). + * I think this is required by tools like gdb. So same physical + * memory will be mapped in two elf headers. One will contain kernel + * text virtual addresses and other will have __va(physical) addresses. + */ + + nr_phdr++; + elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr); + elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN); + + buf = vzalloc(elf_sz); + if (!buf) + return -ENOMEM; + + bufp = buf; + ehdr = (Elf64_Ehdr *)bufp; + bufp += sizeof(Elf64_Ehdr); + memcpy(ehdr->e_ident, ELFMAG, SELFMAG); + ehdr->e_ident[EI_CLASS] = ELFCLASS64; + ehdr->e_ident[EI_DATA] = ELFDATA2LSB; + ehdr->e_ident[EI_VERSION] = EV_CURRENT; + ehdr->e_ident[EI_OSABI] = ELF_OSABI; + memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD); + ehdr->e_type = ET_CORE; + ehdr->e_machine = ELF_ARCH; + ehdr->e_version = EV_CURRENT; + ehdr->e_phoff = sizeof(Elf64_Ehdr); + ehdr->e_ehsize = sizeof(Elf64_Ehdr); + ehdr->e_phentsize = sizeof(Elf64_Phdr); + + /* Prepare one phdr of type PT_NOTE for each present cpu */ + for_each_present_cpu(cpu) { + phdr = (Elf64_Phdr *)bufp; + bufp += sizeof(Elf64_Phdr); + phdr->p_type = PT_NOTE; + notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu)); + phdr->p_offset = phdr->p_paddr = notes_addr; + phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t); + (ehdr->e_phnum)++; + } + + /* Prepare one PT_NOTE header for vmcoreinfo */ + phdr = (Elf64_Phdr *)bufp; + bufp += sizeof(Elf64_Phdr); + phdr->p_type = PT_NOTE; + phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note(); + phdr->p_filesz = phdr->p_memsz = sizeof(vmcoreinfo_note); + (ehdr->e_phnum)++; + +#ifdef CONFIG_X86_64 + /* Prepare PT_LOAD type program header for kernel text region */ + phdr = (Elf64_Phdr *)bufp; + bufp += sizeof(Elf64_Phdr); + phdr->p_type = PT_LOAD; + phdr->p_flags = PF_R|PF_W|PF_X; + phdr->p_vaddr = (Elf64_Addr)_text; + phdr->p_filesz = phdr->p_memsz = _end - _text; + phdr->p_offset = phdr->p_paddr = __pa_symbol(_text); + (ehdr->e_phnum)++; +#endif + + /* Prepare PT_LOAD headers for system ram chunks. */ + ced->ehdr = ehdr; + ced->bufp = bufp; + ret = walk_system_ram_res(0, -1, ced, + prepare_elf64_ram_headers_callback); + if (ret < 0) + return ret; + + *addr = buf; + *sz = elf_sz; + return 0; +} + +/* Prepare elf headers. Return addr and size */ +static int prepare_elf_headers(struct kimage *image, void **addr, + unsigned long *sz) +{ + struct crash_elf_data *ced; + int ret; + + ced = kzalloc(sizeof(*ced), GFP_KERNEL); + if (!ced) + return -ENOMEM; + + fill_up_crash_elf_data(ced, image); + + /* By default prepare 64bit headers */ + ret = prepare_elf64_headers(ced, addr, sz); + kfree(ced); + return ret; +} + +static int add_e820_entry(struct boot_params *params, struct e820entry *entry) +{ + unsigned int nr_e820_entries; + + nr_e820_entries = params->e820_entries; + if (nr_e820_entries >= E820MAX) + return 1; + + memcpy(¶ms->e820_map[nr_e820_entries], entry, + sizeof(struct e820entry)); + params->e820_entries++; + return 0; +} + +static int memmap_entry_callback(u64 start, u64 end, void *arg) +{ + struct crash_memmap_data *cmd = arg; + struct boot_params *params = cmd->params; + struct e820entry ei; + + ei.addr = start; + ei.size = end - start + 1; + ei.type = cmd->type; + add_e820_entry(params, &ei); + + return 0; +} + +static int memmap_exclude_ranges(struct kimage *image, struct crash_mem *cmem, + unsigned long long mstart, + unsigned long long mend) +{ + unsigned long start, end; + int ret = 0; + + cmem->ranges[0].start = mstart; + cmem->ranges[0].end = mend; + cmem->nr_ranges = 1; + + /* Exclude Backup region */ + start = image->arch.backup_load_addr; + end = start + image->arch.backup_src_sz - 1; + ret = exclude_mem_range(cmem, start, end); + if (ret) + return ret; + + /* Exclude elf header region */ + start = image->arch.elf_load_addr; + end = start + image->arch.elf_headers_sz - 1; + return exclude_mem_range(cmem, start, end); +} + +/* Prepare memory map for crash dump kernel */ +int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) +{ + int i, ret = 0; + unsigned long flags; + struct e820entry ei; + struct crash_memmap_data cmd; + struct crash_mem *cmem; + + cmem = vzalloc(sizeof(struct crash_mem)); + if (!cmem) + return -ENOMEM; + + memset(&cmd, 0, sizeof(struct crash_memmap_data)); + cmd.params = params; + + /* Add first 640K segment */ + ei.addr = image->arch.backup_src_start; + ei.size = image->arch.backup_src_sz; + ei.type = E820_RAM; + add_e820_entry(params, &ei); + + /* Add ACPI tables */ + cmd.type = E820_ACPI; + flags = IORESOURCE_MEM | IORESOURCE_BUSY; + walk_iomem_res("ACPI Tables", flags, 0, -1, &cmd, + memmap_entry_callback); + + /* Add ACPI Non-volatile Storage */ + cmd.type = E820_NVS; + walk_iomem_res("ACPI Non-volatile Storage", flags, 0, -1, &cmd, + memmap_entry_callback); + + /* Add crashk_low_res region */ + if (crashk_low_res.end) { + ei.addr = crashk_low_res.start; + ei.size = crashk_low_res.end - crashk_low_res.start + 1; + ei.type = E820_RAM; + add_e820_entry(params, &ei); + } + + /* Exclude some ranges from crashk_res and add rest to memmap */ + ret = memmap_exclude_ranges(image, cmem, crashk_res.start, + crashk_res.end); + if (ret) + goto out; + + for (i = 0; i < cmem->nr_ranges; i++) { + ei.size = cmem->ranges[i].end - cmem->ranges[i].start + 1; + + /* If entry is less than a page, skip it */ + if (ei.size < PAGE_SIZE) + continue; + ei.addr = cmem->ranges[i].start; + ei.type = E820_RAM; + add_e820_entry(params, &ei); + } + +out: + vfree(cmem); + return ret; +} + +static int determine_backup_region(u64 start, u64 end, void *arg) +{ + struct kimage *image = arg; + + image->arch.backup_src_start = start; + image->arch.backup_src_sz = end - start + 1; + + /* Expecting only one range for backup region */ + return 1; +} + +int crash_load_segments(struct kimage *image) +{ + unsigned long src_start, src_sz, elf_sz; + void *elf_addr; + int ret; + + /* + * Determine and load a segment for backup area. First 640K RAM + * region is backup source + */ + + ret = walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END, + image, determine_backup_region); + + /* Zero or postive return values are ok */ + if (ret < 0) + return ret; + + src_start = image->arch.backup_src_start; + src_sz = image->arch.backup_src_sz; + + /* Add backup segment. */ + if (src_sz) { + /* + * Ideally there is no source for backup segment. This is + * copied in purgatory after crash. Just add a zero filled + * segment for now to make sure checksum logic works fine. + */ + ret = kexec_add_buffer(image, (char *)&crash_zero_bytes, + sizeof(crash_zero_bytes), src_sz, + PAGE_SIZE, 0, -1, 0, + &image->arch.backup_load_addr); + if (ret) + return ret; + pr_debug("Loaded backup region at 0x%lx backup_start=0x%lx memsz=0x%lx\n", + image->arch.backup_load_addr, src_start, src_sz); + } + + /* Prepare elf headers and add a segment */ + ret = prepare_elf_headers(image, &elf_addr, &elf_sz); + if (ret) + return ret; + + image->arch.elf_headers = elf_addr; + image->arch.elf_headers_sz = elf_sz; + + ret = kexec_add_buffer(image, (char *)elf_addr, elf_sz, elf_sz, + ELF_CORE_HEADER_ALIGN, 0, -1, 0, + &image->arch.elf_load_addr); + if (ret) { + vfree((void *)image->arch.elf_headers); + return ret; + } + pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + image->arch.elf_load_addr, elf_sz, elf_sz); + + return ret; +} + +#endif /* CONFIG_X86_64 */ diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index bcedd10..a8e6464 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -21,6 +21,9 @@ #include #include +#include + +#define MAX_ELFCOREHDR_STR_LEN 30 /* elfcorehdr=0x<64bit-value> */ /* * Defines lowest physical address for various segments. Not sure where @@ -58,18 +61,24 @@ static int setup_initrd(struct boot_params *params, return 0; } -static int setup_cmdline(struct boot_params *params, +static int setup_cmdline(struct kimage *image, struct boot_params *params, unsigned long bootparams_load_addr, unsigned long cmdline_offset, char *cmdline, unsigned long cmdline_len) { char *cmdline_ptr = ((char *)params) + cmdline_offset; - unsigned long cmdline_ptr_phys; + unsigned long cmdline_ptr_phys, len; uint32_t cmdline_low_32, cmdline_ext_32; memcpy(cmdline_ptr, cmdline, cmdline_len); + if (image->type == KEXEC_TYPE_CRASH) { + len = sprintf(cmdline_ptr + cmdline_len - 1, + " elfcorehdr=0x%lx", image->arch.elf_load_addr); + cmdline_len += len; + } cmdline_ptr[cmdline_len - 1] = '\0'; + pr_debug("Final command line is: %s\n", cmdline_ptr); cmdline_ptr_phys = bootparams_load_addr + cmdline_offset; cmdline_low_32 = cmdline_ptr_phys & 0xffffffffUL; cmdline_ext_32 = cmdline_ptr_phys >> 32; @@ -98,11 +107,12 @@ static int setup_memory_map_entries(struct boot_params *params) return 0; } -static int setup_boot_parameters(struct boot_params *params) +static int setup_boot_parameters(struct kimage *image, + struct boot_params *params) { unsigned int nr_e820_entries; unsigned long long mem_k, start, end; - int i; + int i, ret = 0; /* Get subarch from existing bootparams */ params->hdr.hardware_subarch = boot_params.hdr.hardware_subarch; @@ -125,7 +135,13 @@ static int setup_boot_parameters(struct boot_params *params) /* Default sysdesc table */ params->sys_desc_table.length = 0; - setup_memory_map_entries(params); + if (image->type == KEXEC_TYPE_CRASH) { + ret = crash_setup_memmap_entries(image, params); + if (ret) + return ret; + } else + setup_memory_map_entries(params); + nr_e820_entries = params->e820_entries; for (i = 0; i < nr_e820_entries; i++) { @@ -153,7 +169,7 @@ static int setup_boot_parameters(struct boot_params *params) memcpy(params->edd_mbr_sig_buffer, boot_params.edd_mbr_sig_buffer, EDD_MBR_SIG_MAX * sizeof(unsigned int)); - return 0; + return ret; } int bzImage64_probe(const char *buf, unsigned long len) @@ -241,6 +257,22 @@ void *bzImage64_load(struct kimage *image, char *kernel, } /* + * In case of crash dump, we will append elfcorehdr= to + * command line. Make sure it does not overflow + */ + if (cmdline_len + MAX_ELFCOREHDR_STR_LEN > header->cmdline_size) { + pr_debug("Appending elfcorehdr= to command line exceeds maximum allowed length\n"); + return ERR_PTR(-EINVAL); + } + + /* Allocate and load backup region */ + if (image->type == KEXEC_TYPE_CRASH) { + ret = crash_load_segments(image); + if (ret) + return ERR_PTR(ret); + } + + /* * Load purgatory. For 64bit entry point, purgatory code can be * anywhere. */ @@ -254,7 +286,8 @@ void *bzImage64_load(struct kimage *image, char *kernel, pr_debug("Loaded purgatory at 0x%lx\n", purgatory_load_addr); /* Load Bootparams and cmdline */ - params_cmdline_sz = sizeof(struct boot_params) + cmdline_len; + params_cmdline_sz = sizeof(struct boot_params) + cmdline_len + + MAX_ELFCOREHDR_STR_LEN; params = kzalloc(params_cmdline_sz, GFP_KERNEL); if (!params) return ERR_PTR(-ENOMEM); @@ -303,8 +336,8 @@ void *bzImage64_load(struct kimage *image, char *kernel, setup_initrd(params, initrd_load_addr, initrd_len); } - setup_cmdline(params, bootparam_load_addr, sizeof(struct boot_params), - cmdline, cmdline_len); + setup_cmdline(image, params, bootparam_load_addr, + sizeof(struct boot_params), cmdline, cmdline_len); /* bootloader info. Do we need a separate ID for kexec kernel loader? */ params->hdr.type_of_loader = 0x0D << 4; @@ -332,7 +365,9 @@ void *bzImage64_load(struct kimage *image, char *kernel, if (ret) goto out_free_params; - setup_boot_parameters(params); + ret = setup_boot_parameters(image, params); + if (ret) + goto out_free_params; /* Allocate loader specific data */ ldata = kzalloc(sizeof(struct bzimage64_data), GFP_KERNEL); diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 18d0f9e..9330434 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -178,6 +178,38 @@ static void load_segments(void) ); } +/* Update purgatory as needed after various image segments have been prepared */ +static int arch_update_purgatory(struct kimage *image) +{ + int ret = 0; + + if (!image->file_mode) + return 0; + + /* Setup copying of backup region */ + if (image->type == KEXEC_TYPE_CRASH) { + ret = kexec_purgatory_get_set_symbol(image, "backup_dest", + &image->arch.backup_load_addr, + sizeof(image->arch.backup_load_addr), 0); + if (ret) + return ret; + + ret = kexec_purgatory_get_set_symbol(image, "backup_src", + &image->arch.backup_src_start, + sizeof(image->arch.backup_src_start), 0); + if (ret) + return ret; + + ret = kexec_purgatory_get_set_symbol(image, "backup_sz", + &image->arch.backup_src_sz, + sizeof(image->arch.backup_src_sz), 0); + if (ret) + return ret; + } + + return ret; +} + int machine_kexec_prepare(struct kimage *image) { unsigned long start_pgtable; @@ -191,6 +223,11 @@ int machine_kexec_prepare(struct kimage *image) if (result) return result; + /* update purgatory as needed */ + result = arch_update_purgatory(image); + if (result) + return result; + return 0; } @@ -315,6 +352,9 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, void *arch_kexec_kernel_image_load(struct kimage *image) { + vfree(image->arch.elf_headers); + image->arch.elf_headers = NULL; + if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); diff --git a/arch/x86/purgatory/entry64.S b/arch/x86/purgatory/entry64.S index be3249d..d1a4291 100644 --- a/arch/x86/purgatory/entry64.S +++ b/arch/x86/purgatory/entry64.S @@ -61,13 +61,13 @@ new_cs_exit: .balign 4 entry64_regs: rax: .quad 0x0 -rbx: .quad 0x0 rcx: .quad 0x0 rdx: .quad 0x0 -rsi: .quad 0x0 -rdi: .quad 0x0 +rbx: .quad 0x0 rsp: .quad 0x0 rbp: .quad 0x0 +rsi: .quad 0x0 +rdi: .quad 0x0 r8: .quad 0x0 r9: .quad 0x0 r10: .quad 0x0 diff --git a/kernel/kexec.c b/kernel/kexec.c index 0926f2a..f18c780 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -548,6 +548,7 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, { int ret; struct kimage *image; + bool kexec_on_panic = flags & KEXEC_FILE_ON_CRASH; image = do_kimage_alloc_init(); if (!image) @@ -555,6 +556,12 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, image->file_mode = 1; + if (kexec_on_panic) { + /* Enable special crash kernel control page alloc policy. */ + image->control_page = crashk_res.start; + image->type = KEXEC_TYPE_CRASH; + } + ret = kimage_file_prepare_segments(image, kernel_fd, initrd_fd, cmdline_ptr, cmdline_len, flags); if (ret) @@ -572,10 +579,12 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, goto out_free_post_load_bufs; } - image->swap_page = kimage_alloc_control_pages(image, 0); - if (!image->swap_page) { - pr_err(KERN_ERR "Could not allocate swap buffer\n"); - goto out_free_control_pages; + if (!kexec_on_panic) { + image->swap_page = kimage_alloc_control_pages(image, 0); + if (!image->swap_page) { + pr_err(KERN_ERR "Could not allocate swap buffer\n"); + goto out_free_control_pages; + } } *rimage = image; @@ -1113,10 +1122,14 @@ static int kimage_load_crash_segment(struct kimage *image, unsigned long maddr; size_t ubytes, mbytes; int result; - unsigned char __user *buf; + unsigned char __user *buf = NULL; + unsigned char *kbuf = NULL; result = 0; - buf = segment->buf; + if (image->file_mode) + kbuf = segment->kbuf; + else + buf = segment->buf; ubytes = segment->bufsz; mbytes = segment->memsz; maddr = segment->mem; @@ -1139,7 +1152,12 @@ static int kimage_load_crash_segment(struct kimage *image, /* Zero the trailing part of the page */ memset(ptr + uchunk, 0, mchunk - uchunk); } - result = copy_from_user(ptr, buf, uchunk); + + /* For file based kexec, source pages are in kernel memory */ + if (image->file_mode) + memcpy(ptr, kbuf, uchunk); + else + result = copy_from_user(ptr, buf, uchunk); kexec_flush_icache_page(page); kunmap(page); if (result) { @@ -1148,7 +1166,10 @@ static int kimage_load_crash_segment(struct kimage *image, } ubytes -= uchunk; maddr += mchunk; - buf += mchunk; + if (image->file_mode) + kbuf += mchunk; + else + buf += mchunk; mbytes -= mchunk; } out: @@ -2127,7 +2148,14 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz, kbuf->top_down = top_down; /* Walk the RAM ranges and allocate a suitable range for the buffer */ - ret = walk_system_ram_res(0, -1, kbuf, locate_mem_hole_callback); + if (image->type == KEXEC_TYPE_CRASH) + ret = walk_iomem_res("Crash kernel", + IORESOURCE_MEM | IORESOURCE_BUSY, + crashk_res.start, crashk_res.end, kbuf, + locate_mem_hole_callback); + else + ret = walk_system_ram_res(0, -1, kbuf, + locate_mem_hole_callback); if (ret != 1) { /* A suitable memory range could not be found for buffer */ return -EADDRNOTAVAIL; -- cgit v0.10.2 From 6a2c20e7d8900ed273dc34a9af9bf02fc478e427 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:26:11 -0700 Subject: kexec: support kexec/kdump on EFI systems This patch does two things. It passes EFI run time mappings to second kernel in bootparams efi_info. Second kernel parse this info and create new mappings in second kernel. That means mappings in first and second kernel will be same. This paves the way to enable EFI in kexec kernel. This patch also prepares and passes EFI setup data through bootparams. This contains bunch of information about various tables and their addresses. These information gathering and passing has been written along the lines of what current kexec-tools is doing to make kexec work with UEFI. [akpm@linux-foundation.org: s/get_efi/efi_get/g, per Matt] Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Cc: Matt Fleming Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index a8e6464..623e6c5 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -18,10 +18,12 @@ #include #include #include +#include #include #include #include +#include #define MAX_ELFCOREHDR_STR_LEN 30 /* elfcorehdr=0x<64bit-value> */ @@ -90,7 +92,7 @@ static int setup_cmdline(struct kimage *image, struct boot_params *params, return 0; } -static int setup_memory_map_entries(struct boot_params *params) +static int setup_e820_entries(struct boot_params *params) { unsigned int nr_e820_entries; @@ -107,8 +109,93 @@ static int setup_memory_map_entries(struct boot_params *params) return 0; } -static int setup_boot_parameters(struct kimage *image, - struct boot_params *params) +#ifdef CONFIG_EFI +static int setup_efi_info_memmap(struct boot_params *params, + unsigned long params_load_addr, + unsigned int efi_map_offset, + unsigned int efi_map_sz) +{ + void *efi_map = (void *)params + efi_map_offset; + unsigned long efi_map_phys_addr = params_load_addr + efi_map_offset; + struct efi_info *ei = ¶ms->efi_info; + + if (!efi_map_sz) + return 0; + + efi_runtime_map_copy(efi_map, efi_map_sz); + + ei->efi_memmap = efi_map_phys_addr & 0xffffffff; + ei->efi_memmap_hi = efi_map_phys_addr >> 32; + ei->efi_memmap_size = efi_map_sz; + + return 0; +} + +static int +prepare_add_efi_setup_data(struct boot_params *params, + unsigned long params_load_addr, + unsigned int efi_setup_data_offset) +{ + unsigned long setup_data_phys; + struct setup_data *sd = (void *)params + efi_setup_data_offset; + struct efi_setup_data *esd = (void *)sd + sizeof(struct setup_data); + + esd->fw_vendor = efi.fw_vendor; + esd->runtime = efi.runtime; + esd->tables = efi.config_table; + esd->smbios = efi.smbios; + + sd->type = SETUP_EFI; + sd->len = sizeof(struct efi_setup_data); + + /* Add setup data */ + setup_data_phys = params_load_addr + efi_setup_data_offset; + sd->next = params->hdr.setup_data; + params->hdr.setup_data = setup_data_phys; + + return 0; +} + +static int +setup_efi_state(struct boot_params *params, unsigned long params_load_addr, + unsigned int efi_map_offset, unsigned int efi_map_sz, + unsigned int efi_setup_data_offset) +{ + struct efi_info *current_ei = &boot_params.efi_info; + struct efi_info *ei = ¶ms->efi_info; + + if (!current_ei->efi_memmap_size) + return 0; + + /* + * If 1:1 mapping is not enabled, second kernel can not setup EFI + * and use EFI run time services. User space will have to pass + * acpi_rsdp= on kernel command line to make second kernel boot + * without efi. + */ + if (efi_enabled(EFI_OLD_MEMMAP)) + return 0; + + ei->efi_loader_signature = current_ei->efi_loader_signature; + ei->efi_systab = current_ei->efi_systab; + ei->efi_systab_hi = current_ei->efi_systab_hi; + + ei->efi_memdesc_version = current_ei->efi_memdesc_version; + ei->efi_memdesc_size = efi_get_runtime_map_desc_size(); + + setup_efi_info_memmap(params, params_load_addr, efi_map_offset, + efi_map_sz); + prepare_add_efi_setup_data(params, params_load_addr, + efi_setup_data_offset); + return 0; +} +#endif /* CONFIG_EFI */ + +static int +setup_boot_parameters(struct kimage *image, struct boot_params *params, + unsigned long params_load_addr, + unsigned int efi_map_offset, unsigned int efi_map_sz, + unsigned int efi_setup_data_offset) { unsigned int nr_e820_entries; unsigned long long mem_k, start, end; @@ -140,7 +227,7 @@ static int setup_boot_parameters(struct kimage *image, if (ret) return ret; } else - setup_memory_map_entries(params); + setup_e820_entries(params); nr_e820_entries = params->e820_entries; @@ -161,6 +248,12 @@ static int setup_boot_parameters(struct kimage *image, } } +#ifdef CONFIG_EFI + /* Setup EFI state */ + setup_efi_state(params, params_load_addr, efi_map_offset, efi_map_sz, + efi_setup_data_offset); +#endif + /* Setup EDD info */ memcpy(params->eddbuf, boot_params.eddbuf, EDDMAXNR * sizeof(struct edd_info)); @@ -214,6 +307,15 @@ int bzImage64_probe(const char *buf, unsigned long len) return ret; } + /* + * Can't handle 32bit EFI as it does not allow loading kernel + * above 4G. This should be handled by 32bit bzImage loader + */ + if (efi_enabled(EFI_RUNTIME_SERVICES) && !efi_enabled(EFI_64BIT)) { + pr_debug("EFI is 32 bit. Can't load kernel above 4G.\n"); + return ret; + } + /* I've got a bzImage */ pr_debug("It's a relocatable bzImage64\n"); ret = 0; @@ -229,7 +331,7 @@ void *bzImage64_load(struct kimage *image, char *kernel, struct setup_header *header; int setup_sects, kern16_size, ret = 0; - unsigned long setup_header_size, params_cmdline_sz; + unsigned long setup_header_size, params_cmdline_sz, params_misc_sz; struct boot_params *params; unsigned long bootparam_load_addr, kernel_load_addr, initrd_load_addr; unsigned long purgatory_load_addr; @@ -239,6 +341,7 @@ void *bzImage64_load(struct kimage *image, char *kernel, struct kexec_entry64_regs regs64; void *stack; unsigned int setup_hdr_offset = offsetof(struct boot_params, hdr); + unsigned int efi_map_offset, efi_map_sz, efi_setup_data_offset; header = (struct setup_header *)(kernel + setup_hdr_offset); setup_sects = header->setup_sects; @@ -285,12 +388,29 @@ void *bzImage64_load(struct kimage *image, char *kernel, pr_debug("Loaded purgatory at 0x%lx\n", purgatory_load_addr); - /* Load Bootparams and cmdline */ + + /* + * Load Bootparams and cmdline and space for efi stuff. + * + * Allocate memory together for multiple data structures so + * that they all can go in single area/segment and we don't + * have to create separate segment for each. Keeps things + * little bit simple + */ + efi_map_sz = efi_get_runtime_map_size(); + efi_map_sz = ALIGN(efi_map_sz, 16); params_cmdline_sz = sizeof(struct boot_params) + cmdline_len + MAX_ELFCOREHDR_STR_LEN; - params = kzalloc(params_cmdline_sz, GFP_KERNEL); + params_cmdline_sz = ALIGN(params_cmdline_sz, 16); + params_misc_sz = params_cmdline_sz + efi_map_sz + + sizeof(struct setup_data) + + sizeof(struct efi_setup_data); + + params = kzalloc(params_misc_sz, GFP_KERNEL); if (!params) return ERR_PTR(-ENOMEM); + efi_map_offset = params_cmdline_sz; + efi_setup_data_offset = efi_map_offset + efi_map_sz; /* Copy setup header onto bootparams. Documentation/x86/boot.txt */ setup_header_size = 0x0202 + kernel[0x0201] - setup_hdr_offset; @@ -298,13 +418,13 @@ void *bzImage64_load(struct kimage *image, char *kernel, /* Is there a limit on setup header size? */ memcpy(¶ms->hdr, (kernel + setup_hdr_offset), setup_header_size); - ret = kexec_add_buffer(image, (char *)params, params_cmdline_sz, - params_cmdline_sz, 16, MIN_BOOTPARAM_ADDR, + ret = kexec_add_buffer(image, (char *)params, params_misc_sz, + params_misc_sz, 16, MIN_BOOTPARAM_ADDR, ULONG_MAX, 1, &bootparam_load_addr); if (ret) goto out_free_params; - pr_debug("Loaded boot_param and command line at 0x%lx bufsz=0x%lx memsz=0x%lx\n", - bootparam_load_addr, params_cmdline_sz, params_cmdline_sz); + pr_debug("Loaded boot_param, command line and misc at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + bootparam_load_addr, params_misc_sz, params_misc_sz); /* Load kernel */ kernel_buf = kernel + kern16_size; @@ -365,7 +485,9 @@ void *bzImage64_load(struct kimage *image, char *kernel, if (ret) goto out_free_params; - ret = setup_boot_parameters(image, params); + ret = setup_boot_parameters(image, params, bootparam_load_addr, + efi_map_offset, efi_map_sz, + efi_setup_data_offset); if (ret) goto out_free_params; diff --git a/drivers/firmware/efi/runtime-map.c b/drivers/firmware/efi/runtime-map.c index 97cdd16..018c29a 100644 --- a/drivers/firmware/efi/runtime-map.c +++ b/drivers/firmware/efi/runtime-map.c @@ -138,6 +138,27 @@ add_sysfs_runtime_map_entry(struct kobject *kobj, int nr) return entry; } +int efi_get_runtime_map_size(void) +{ + return nr_efi_runtime_map * efi_memdesc_size; +} + +int efi_get_runtime_map_desc_size(void) +{ + return efi_memdesc_size; +} + +int efi_runtime_map_copy(void *buf, size_t bufsz) +{ + size_t sz = efi_get_runtime_map_size(); + + if (sz > bufsz) + sz = bufsz; + + memcpy(buf, efi_runtime_map, sz); + return 0; +} + void efi_runtime_map_setup(void *map, int nr_entries, u32 desc_size) { efi_runtime_map = map; diff --git a/include/linux/efi.h b/include/linux/efi.h index efc681f..45cb4ff 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1156,6 +1156,9 @@ int efivars_sysfs_init(void); #ifdef CONFIG_EFI_RUNTIME_MAP int efi_runtime_map_init(struct kobject *); void efi_runtime_map_setup(void *, int, u32); +int efi_get_runtime_map_size(void); +int efi_get_runtime_map_desc_size(void); +int efi_runtime_map_copy(void *buf, size_t bufsz); #else static inline int efi_runtime_map_init(struct kobject *kobj) { @@ -1164,6 +1167,22 @@ static inline int efi_runtime_map_init(struct kobject *kobj) static inline void efi_runtime_map_setup(void *map, int nr_entries, u32 desc_size) {} + +static inline int efi_get_runtime_map_size(void) +{ + return 0; +} + +static inline int efi_get_runtime_map_desc_size(void) +{ + return 0; +} + +static inline int efi_runtime_map_copy(void *buf, size_t bufsz) +{ + return 0; +} + #endif /* prototypes shared between arch specific and generic stub code */ -- cgit v0.10.2 From 8e7d838103feac320baf9e68d73f954840ac1eea Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Fri, 8 Aug 2014 14:26:13 -0700 Subject: kexec: verify the signature of signed PE bzImage This is the final piece of the puzzle of verifying kernel image signature during kexec_file_load() syscall. This patch calls into PE file routines to verify signature of bzImage. If signature are valid, kexec_file_load() succeeds otherwise it fails. Two new config options have been introduced. First one is CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be validly signed otherwise kernel load will fail. If this option is not set, no signature verification will be done. Only exception will be when secureboot is enabled. In that case signature verification should be automatically enforced when secureboot is enabled. But that will happen when secureboot patches are merged. Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option enables signature verification support on bzImage. If this option is not set and previous one is set, kernel image loading will fail because kernel does not have support to verify signature of bzImage. I tested these patches with both "pesign" and "sbsign" signed bzImages. I used signing_key.priv key and signing_key.x509 cert for signing as generated during kernel build process (if module signing is enabled). Used following method to sign bzImage. pesign ====== - Convert DER format cert to PEM format cert openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform PEM - Generate a .p12 file from existing cert and private key file openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in signing_key.x509.PEM - Import .p12 file into pesign db pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign - Sign bzImage pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign -c "Glacier signing key - Magrathea" -s sbsign ====== sbsign --key signing_key.priv --cert signing_key.x509.PEM --output /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+ Patch details: Well all the hard work is done in previous patches. Now bzImage loader has just call into that code and verify whether bzImage signature are valid or not. Also create two config options. First one is CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be validly signed otherwise kernel load will fail. If this option is not set, no signature verification will be done. Only exception will be when secureboot is enabled. In that case signature verification should be automatically enforced when secureboot is enabled. But that will happen when secureboot patches are merged. Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option enables signature verification support on bzImage. If this option is not set and previous one is set, kernel image loading will fail because kernel does not have support to verify signature of bzImage. Signed-off-by: Vivek Goyal Cc: Borislav Petkov Cc: Michael Kerrisk Cc: Yinghai Lu Cc: Eric Biederman Cc: H. Peter Anvin Cc: Matthew Garrett Cc: Greg Kroah-Hartman Cc: Dave Young Cc: WANG Chao Cc: Baoquan He Cc: Andy Lutomirski Cc: Matt Fleming Cc: David Howells Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 9558b9f..4aafd32 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1599,6 +1599,28 @@ config KEXEC interface is strongly in flux, so no good recommendation can be made. +config KEXEC_VERIFY_SIG + bool "Verify kernel signature during kexec_file_load() syscall" + depends on KEXEC + ---help--- + This option makes kernel signature verification mandatory for + kexec_file_load() syscall. If kernel is signature can not be + verified, kexec_file_load() will fail. + + This option enforces signature verification at generic level. + One needs to enable signature verification for type of kernel + image being loaded to make sure it works. For example, enable + bzImage signature verification option to be able to load and + verify signatures of bzImage. Otherwise kernel loading will fail. + +config KEXEC_BZIMAGE_VERIFY_SIG + bool "Enable bzImage signature verification support" + depends on KEXEC_VERIFY_SIG + depends on SIGNED_PE_FILE_VERIFICATION + select SYSTEM_TRUSTED_KEYRING + ---help--- + Enable bzImage signature verification support. + config CRASH_DUMP bool "kernel crash dumps" depends on X86_64 || (X86_32 && HIGHMEM) diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index 623e6c5..9642b9b 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -19,6 +19,8 @@ #include #include #include +#include +#include #include #include @@ -525,8 +527,27 @@ int bzImage64_cleanup(void *loader_data) return 0; } +#ifdef CONFIG_KEXEC_BZIMAGE_VERIFY_SIG +int bzImage64_verify_sig(const char *kernel, unsigned long kernel_len) +{ + bool trusted; + int ret; + + ret = verify_pefile_signature(kernel, kernel_len, + system_trusted_keyring, &trusted); + if (ret < 0) + return ret; + if (!trusted) + return -EKEYREJECTED; + return 0; +} +#endif + struct kexec_file_ops kexec_bzImage64_ops = { .probe = bzImage64_probe, .load = bzImage64_load, .cleanup = bzImage64_cleanup, +#ifdef CONFIG_KEXEC_BZIMAGE_VERIFY_SIG + .verify_sig = bzImage64_verify_sig, +#endif }; diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 9330434..8b04018 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -372,6 +372,17 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image) return image->fops->cleanup(image->image_loader_data); } +int arch_kexec_kernel_verify_sig(struct kimage *image, void *kernel, + unsigned long kernel_len) +{ + if (!image->fops || !image->fops->verify_sig) { + pr_debug("kernel loader does not support signature verification."); + return -EKEYREJECTED; + } + + return image->fops->verify_sig(kernel, kernel_len); +} + /* * Apply purgatory relocations. * diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 9481703..4b2a0e1 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -191,11 +191,14 @@ typedef void *(kexec_load_t)(struct kimage *image, char *kernel_buf, unsigned long initrd_len, char *cmdline, unsigned long cmdline_len); typedef int (kexec_cleanup_t)(void *loader_data); +typedef int (kexec_verify_sig_t)(const char *kernel_buf, + unsigned long kernel_len); struct kexec_file_ops { kexec_probe_t *probe; kexec_load_t *load; kexec_cleanup_t *cleanup; + kexec_verify_sig_t *verify_sig; }; /* kexec interface functions */ diff --git a/kernel/kexec.c b/kernel/kexec.c index f18c780..0b49a0a 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -416,6 +416,12 @@ void __weak arch_kimage_file_post_load_cleanup(struct kimage *image) { } +int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, + unsigned long buf_len) +{ + return -EKEYREJECTED; +} + /* Apply relocations of type RELA */ int __weak arch_kexec_apply_relocations_add(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, @@ -494,6 +500,15 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, if (ret) goto out; +#ifdef CONFIG_KEXEC_VERIFY_SIG + ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf, + image->kernel_buf_len); + if (ret) { + pr_debug("kernel signature verification failed.\n"); + goto out; + } + pr_debug("kernel signature verification successful.\n"); +#endif /* It is possible that there no initramfs is being loaded */ if (!(flags & KEXEC_FILE_NO_INITRAMFS)) { ret = copy_file_from_fd(initrd_fd, &image->initrd_buf, -- cgit v0.10.2 From e46d12c65998c7ac6a8544fc356e2297228b207a Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:26:16 -0700 Subject: MAINTAINERS: update DMA BUFFER SHARING patterns One pattern per F: line please... Signed-off-by: Joe Perches Acked-by: Sumit Semwal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index f49eb67..d53afcb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2943,7 +2943,9 @@ L: linux-media@vger.kernel.org L: dri-devel@lists.freedesktop.org L: linaro-mm-sig@lists.linaro.org F: drivers/dma-buf/ -F: include/linux/dma-buf* include/linux/reservation.h include/linux/*fence.h +F: include/linux/dma-buf* +F: include/linux/reservation.h +F: include/linux/*fence.h F: Documentation/dma-buf-sharing.txt T: git git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git -- cgit v0.10.2 From faf2e1dbd8744a48dd6c5666aaa9db80373a61b4 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:26:18 -0700 Subject: MAINTAINERS: update usb/gadget patterns Several commits have moved files around, update the section patterns. Signed-off-by: Joe Perches Cc: Thomas Dahlmann Cc: Nicolas Ferre Cc: Li Yang Cc: Eric Miao Cc: Russell King Cc: Haojian Zhuang Acked-by: Laurent Pinchart Cc: Andrzej Pietrasiewicz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index d53afcb..9c28a5a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -597,7 +597,7 @@ AMD GEODE CS5536 USB DEVICE CONTROLLER DRIVER M: Thomas Dahlmann L: linux-geode@lists.infradead.org (moderated for non-subscribers) S: Supported -F: drivers/usb/gadget/amd5536udc.* +F: drivers/usb/gadget/udc/amd5536udc.* AMD GEODE PROCESSOR/CHIPSET SUPPORT P: Andres Salomon @@ -1690,7 +1690,7 @@ ATMEL USBA UDC DRIVER M: Nicolas Ferre L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Supported -F: drivers/usb/gadget/atmel_usba_udc.* +F: drivers/usb/gadget/udc/atmel_usba_udc.* ATMEL WIRELESS DRIVER M: Simon Kelley @@ -3800,7 +3800,7 @@ M: Li Yang L: linux-usb@vger.kernel.org L: linuxppc-dev@lists.ozlabs.org S: Maintained -F: drivers/usb/gadget/fsl* +F: drivers/usb/gadget/udc/fsl* FREESCALE QUICC ENGINE UCC ETHERNET DRIVER M: Li Yang @@ -7228,7 +7228,7 @@ S: Maintained F: arch/arm/mach-pxa/ F: drivers/pcmcia/pxa2xx* F: drivers/spi/spi-pxa2xx* -F: drivers/usb/gadget/pxa2* +F: drivers/usb/gadget/udc/pxa2* F: include/sound/pxa2xx-lib.h F: sound/arm/pxa* F: sound/soc/pxa/ @@ -9582,8 +9582,8 @@ USB WEBCAM GADGET M: Laurent Pinchart L: linux-usb@vger.kernel.org S: Maintained -F: drivers/usb/gadget/*uvc*.c -F: drivers/usb/gadget/webcam.c +F: drivers/usb/gadget/function/*uvc*.c +F: drivers/usb/gadget/legacy/webcam.c USB WIRELESS RNDIS DRIVER (rndis_wlan) M: Jussi Kivilinna -- cgit v0.10.2 From ecc265fe9e09e32a3573b2ba26e79b2099eb8bbb Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 8 Aug 2014 14:26:20 -0700 Subject: MAINTAINERS: update nomadik patterns Commit 3a19805920f1 ("pinctrl: nomadik: move all Nomadik drivers to subdir") move the files, update the patterns Signed-off-by: Joe Perches Cc: Linus Walleij Cc: Alessandro Rubini Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 9c28a5a..101445b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1157,6 +1157,7 @@ M: Linus Walleij L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained F: arch/arm/mach-nomadik/ +F: drivers/pinctrl/nomadik/ F: drivers/i2c/busses/i2c-nomadik.c T: git git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik.git @@ -1434,7 +1435,8 @@ F: drivers/mfd/abx500* F: drivers/mfd/ab8500* F: drivers/mfd/dbx500* F: drivers/mfd/db8500* -F: drivers/pinctrl/pinctrl-nomadik* +F: drivers/pinctrl/nomadik/pinctrl-ab* +F: drivers/pinctrl/nomadik/pinctrl-nomadik* F: drivers/rtc/rtc-ab8500.c F: drivers/rtc/rtc-pl031.c T: git git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson.git -- cgit v0.10.2