summaryrefslogtreecommitdiff
path: root/fs/logfs
AgeCommit message (Collapse)Author
2013-03-23block: Remove bi_idx referencesKent Overstreet
For immutable bvecs, all bi_idx usage needs to be audited - so here we're removing all the unnecessary uses. Most of these are places where it was being initialized on a bio that was just allocated, a few others are conversions to standard macros. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk>
2013-03-04fs: Limit sys_mount to only request filesystem modules.Eric W. Biederman
Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-23new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-01-21fs/logfs: remove depends on CONFIG_EXPERIMENTALKees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Joern Engel <joern@logfs.org> CC: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-20logfs: drop vmtruncateMarco Stornelli
Removed vmtruncate Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-19Fix misspellings of "whether" in comments.Adam Buchbinder
"Whether" is misspelled in various comments across the tree; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-10-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...
2012-10-03fs: push rcu_barrier() from deactivate_locked_super() to filesystemsKirill A. Shutemov
There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fast paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...
2012-09-21userns: Convert logfs to use kuid/kgid where appropriateEric W. Biederman
Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-26Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstreamLinus Torvalds
Pull LogFS bugfixes from Prasad Joshi: - "logfs: query block device for number of pages to send with bio" This BUG was found when LogFS was used on KVM. The patch fixes the problem by asking for underlaying block device the number of pages to send with each BIO. - "logfs: maintain the ordering of meta-inode destruction" LogFS maintains file system meta-data in special inodes. These inodes are releated to each other, therefore they must be destroyed in a proper order. - "logfs: initialize the number of iovecs in bio" LogFS used to panic when it was created on an encrypted LVM volume. The patch fixes the problem by properly initializing the BIO. Plus a couple more: - logfs: create a pagecache page if it is not present - logfs: destroy the reserved inodes while unmounting * tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream: logfs: query block device for number of pages to send with bio logfs: maintain the ordering of meta-inode destruction logfs: create a pagecache page if it is not present logfs: initialize the number of iovecs in bio logfs: destroy the reserved inodes while unmounting
2012-07-23logfs: query block device for number of pages to send with bioPrasad Joshi
The block device driver puts a limit on maximum number of pages that can be sent with the bio. Not all block devices can handle BIO_MAX_PAGES number of pages in bio. Specifically the virtio-blk diriver limits it to 126. When the LogFS file system was excersized in KVM, the following bug from do_virtblk_request() was observed static void do_virtblk_request(struct request_queue *q) { .... .... while ((req = blk_peek_request(q)) != NULL) { BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); .... .... } .... } The patch fixes the problem by querring the maximum number of pages in bio allowed from block device driver and then using those many pages during submit_bio. Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-07-23logfs: maintain the ordering of meta-inode destructionPrasad Joshi
LogFS does not use a specialized area to maintain the inodes. The inodes information is kept in a specialized file called inode file. Similarly, the segment information is kept in a segment file. Since the segment file also has an inode which is kept in the inode file, the inode for segment file must be evicted before the inode for inode file. The change fixes the following BUG during unmount Pid: 2057, comm: umount Not tainted 3.5.0-rc6+ #25 Bochs Bochs RIP: 0010:[<ffffffffa005c5f2>] [<ffffffffa005c5f2>] move_page_to_btree+0x32/0x1f0 [logfs] Process umount (pid: 2057, threadinfo ...) Call Trace: [<ffffffff8112adca>] ? find_get_pages+0x2a/0x180 [<ffffffffa00549f5>] logfs_invalidatepage+0x85/0x90 [logfs] [<ffffffff81136c51>] truncate_inode_page+0xb1/0xd0 [<ffffffff81136dcf>] truncate_inode_pages_range+0x15f/0x490 [<ffffffff81558549>] ? printk+0x78/0x7a [<ffffffff81137185>] truncate_inode_pages+0x15/0x20 [<ffffffffa005b7fc>] logfs_evict_inode+0x6c/0x190 [logfs] [<ffffffff8155c75b>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff8119e3d7>] evict+0xa7/0x1b0 [<ffffffff8119ea6e>] dispose_list+0x3e/0x60 [<ffffffff8119f1c4>] evict_inodes+0xf4/0x110 [<ffffffff81185b53>] generic_shutdown_super+0x53/0xf0 [<ffffffffa005d8f2>] logfs_kill_sb+0x52/0xf0 [logfs] [<ffffffff81185ec5>] deactivate_locked_super+0x45/0x80 [<ffffffff81186a4a>] deactivate_super+0x4a/0x70 [<ffffffff811a228e>] mntput_no_expire+0xde/0x140 [<ffffffff811a30ff>] sys_umount+0x6f/0x3a0 [<ffffffff8155d8e9>] system_call_fastpath+0x16/0x1b ---[ end trace 45f7752082cefafd ]--- Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-07-23logfs: create a pagecache page if it is not presentPrasad Joshi
While writing the partial journal entries we assumed that the page associated with the journal would always in locatable. This incorrect assumption resulted in the following BUG kernel BUG at /home/benixon/WD_SMR/kernels/linux-3.3.7-logfs/fs/logfs/journal.c:569! EIP is at logfs_write_area+0xb6/0x109 [logfs] EAX: 00000000 EBX: 00000000 ECX: ef6efea4 EDX: 00000000 ESI: 001b9000 EDI: f009e000 EBP: c3c13f14 ESP: c3c13ef0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process sync (pid: 1799, ti=c3c12000 task=f07825b0 task.ti=c3c12000) Stack: 01001000 c3c13f26 781b9000 00000000 f009e000 f7286000 f1f83400 f8445071 f1f83400 c3c13f30 f8445ae9 c3c13f20 0000100a 000ee000 f009e000 00000001 c3c13f5c f8445d17 c05eb0ee 00000000 f1f83400 ef718000 f009e25c ea9c3d80 Call Trace: [<f8445071>] ? account_shadow+0x16d/0x16d [logfs] [<f8445ae9>] logfs_write_je+0x2a/0x44 [logfs] [<f8445d17>] logfs_write_anchor+0x114/0x228 [logfs] [<c05eb0ee>] ? empty+0x5/0x5 [<f8444522>] logfs_sync_fs+0x1e/0x31 [logfs] [<c051be66>] __sync_filesystem+0x5d/0x6f [<c051be8d>] sync_one_sb+0x15/0x17 [<c04ff8b0>] iterate_supers+0x59/0x9a [<c051be78>] ? __sync_filesystem+0x6f/0x6f [<c051befc>] sys_sync+0x29/0x4f [<c084285f>] sysenter_do_call+0x12/0x28 EIP: [<f8445127>] logfs_write_area+0xb6/0x109 [logfs] SS:ESP 0068:c3c13ef0 ---[ end trace ef6e9ef52601a945 ]--- The fix is to create the pagecache page if it is not locatable. Reported-and-tested-by: Benixon Dhas <Benixon.Dhas@wdc.com> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-07-14VFS: Pass mount flags to sget()David Howells
Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14don't pass nameidata to ->create()Al Viro
boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14stop passing nameidata to ->lookup()Al Viro
Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-06vfs: Rename end_writeback() to clear_inode()Jan Kara
After we moved inode_sync_wait() from end_writeback() it doesn't make sense to call the function end_writeback() anymore. Rename it to clear_inode() which well says what the function really does - set I_CLEAR flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-04-02logfs: initialize the number of iovecs in bioPrasad Joshi
This fixes the following crash when a LogFS file system, created on a encrypted LVM volume, was mounted [ 526.548034] BUG: unable to handle kernel NULL pointer dereference at [ 526.550106] IP: [<ffffffff8131ecab>] memcpy+0xb/0x120 [ 526.551008] PGD bd60067 PUD 1778d067 PMD 0 [ 526.551783] Oops: 0000 [#1] SMP <d>Pid: 2043, comm: mount <d>RIP: 0010:[<ffffffff8131ecab>] [<ffffffff8131ecab>] memcpy+0xb/0x120 Call Trace: kcryptd_io_read+0xdb/0x100 crypt_map+0xfd/0x190 __map_bio+0x48/0x150 __split_and_process_bio+0x51b/0x630 dm_request+0x138/0x230 generic_make_request+0xca/0x100 submit_bio+0x87/0x110 sync_request+0xdd/0x120 [logfs] bdev_readpage+0x2e/0x70 [logfs] do_read_cache_page+0x82/0x180 logfs_mount+0x2ad/0x770 [logfs] mount_fs+0x47/0x1c0 vfs_kern_mount+0x72/0x110 do_kern_mount+0x54/0x110 do_mount+0x520/0x7f0 sys_mount+0x90/0xe0 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42292 Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-04-02logfs: destroy the reserved inodes while unmountingPrasad Joshi
We were assuming that the evict_inode() would never be called on reserved inodes. However, (after the commit 8e22c1a4e logfs: get rid of magical inodes) while unmounting the file system, in put_super, we call iput() on all of the reserved inodes. The following simple test used to cause a kernel panic on LogFS: 1. Mount a LogFS file system on /mnt 2. Create a file $ touch /mnt/a 3. Try to unmount the FS $ umount /mnt The simple fix would be to drop the assumption and properly destroy the reserved inodes. Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-03-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "This is _not_ all; in particular, Miklos' and Jan's stuff is not there yet." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits) ext4: initialization of ext4_li_mtx needs to be done earlier debugfs-related mode_t whack-a-mole hfsplus: add an ioctl to bless files hfsplus: change finder_info to u32 hfsplus: initialise userflags qnx4: new helper - try_extent() qnx4: get rid of qnx4_bread/qnx4_getblk take removal of PF_FORKNOEXEC to flush_old_exec() trim includes in inode.c um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it um: embed ->stub_pages[] into mmu_context gadgetfs: list_for_each_safe() misuse ocfs2: fix leaks on failure exits in module_init ecryptfs: make register_filesystem() the last potential failure exit ntfs: forgets to unregister sysctls on register_filesystem() failure logfs: missing cleanup on register_filesystem() failure jfs: mising cleanup on register_filesystem() failure make configfs_pin_fs() return root dentry on success configfs: configfs_create_dir() has parent dentry in dentry->d_parent configfs: sanitize configfs_create() ...
2012-03-21logfs: missing cleanup on register_filesystem() failureAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-21switch open-coded instances of d_make_root() to new helperAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-21vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link}Al Viro
New field of struct super_block - ->s_max_links. Maximal allowed value of ->i_nlink or 0; in the latter case all checks still need to be done in ->link/->mkdir/->rename instances. Note that this limit applies both to directoris and to non-directories. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20logfs: remove the second argument of k[un]map_atomic()Cong Wang
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-02-01mtd: fix merge conflict resolution breakageArtem Bityutskiy
This patch fixes merge conflict resolution breakage introduced by merge d3712b9dfcf4 ("Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream"). The commit changed 'mtd_can_have_bb()' function and made it always return zero, which is incorrect. Instead, we need it to return whether the underlying flash device can have bad eraseblocks or not. UBI needs this information because it affects how it handles the underlying flash. E.g., if the underlying flash is NOR, it cannot have bad blocks and any write or erase error is fatal, and all we can do is to switch to R/O mode. We do not need to reserve a pool of good eraseblocks for bad eraseblocks handling, and so on. This patch also removes 'mtd_can_have_bb()' invocations from Logfs to ensure correct Logfs behavior. I've tested that with this patch UBI works on top of NOR and NAND flashes emulated by mtdram and nandsim correspondingly. This patch is based on patch from Linus Torvalds. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Jörn Engel <joern@logfs.org> Acked-by: Prasad Joshi <prasadjoshi.linux@gmail.com> Acked-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-31Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstreamLinus Torvalds
There are few important bug fixes for LogFS * tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream: Logfs: Allow NULL block_isbad() methods logfs: Grow inode in delete path logfs: Free areas before calling generic_shutdown_super() logfs: remove useless BUG_ON MAINTAINERS: Add Prasad Joshi in LogFS maintiners logfs: Propagate page parameter to __logfs_write_inode logfs: set superblock shutdown flag after generic sb shutdown logfs: take write mutex lock during fsync and sync logfs: Prevent memory corruption logfs: update page reference count for pined pages Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what "mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL block_isbad() methods" clashing with the abstraction changes in the commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly". This resolution takes the semantics from commit f2933e86ad93, and just makes mtd_block_isbad() return zero (false) if the 'block_isbad' function is NULL. But that also means that now "mtd_can_have_bb()" always returns 0. Now, "mtd_block_markbad()" will obviously return an error if the low-level driver doesn't support bad blocks, so this is somewhat non-symmetric, but it actually makes sense if a NULL "block_isbad" function is considered to mean "I assume that all my blocks are always good".
2012-01-28Logfs: Allow NULL block_isbad() methodsJoern Engel
Not all mtd drivers define block_isbad(). Let's assume no bad blocks instead of refusing to mount. Signed-off-by: Joern Engel <joern@logfs.org>
2012-01-28logfs: Grow inode in delete pathJoern Engel
Can be necessary if an inode gets deleted (through -ENOSPC) before being written. Might be better to move this into logfs_write_rec(), but for now go with the stupid&safe patch. Signed-off-by: Joern Engel <joern@logfs.org>
2012-01-28logfs: Free areas before calling generic_shutdown_super()Joern Engel
Or hit an assertion in map_invalidatepage() instead. Signed-off-by: Joern Engel <joern@logfs.org>
2012-01-28logfs: remove useless BUG_ONJoern Engel
It prevents write sizes >4k. Signed-off-by: Joern Engel <joern@logfs.org>
2012-01-28logfs: Propagate page parameter to __logfs_write_inodePrasad Joshi
During GC LogFS has to rewrite each valid block to a separate segment. Rewrite operation reads data from an old segment and writes it to a newly allocated segment. Since every write operation changes data block pointers maintained in inode, inode should also be rewritten. In GC path to avoid AB-BA deadlock LogFS marks a page with PG_pre_locked in addition to locking the page (PG_locked). The page lock is ignored iff the page is pre-locked. LogFS uses a special file called segment file. The segment file maintains an 8 bytes entry for every segment. It keeps track of erase count, level etc. for every segment. Bad things happen with a segment belonging to the segment file is GCed ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/readwrite.c:297! invalid opcode: 0000 [#1] SMP Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4 serio_raw [last unloaded: logfs] Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH VirtualBox EIP: 0060:[<f809132a>] EFLAGS: 00010292 CPU: 0 EIP is at logfs_lock_write_page+0x6a/0x70 [logfs] EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094 ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000) Stack: f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4 00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5 00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900 Call Trace: [<f8091f6d>] logfs_get_write_page.clone.16+0xdd/0x100 [logfs] [<f80935e5>] logfs_mod_segment_entry+0x55/0x110 [logfs] [<f809460d>] logfs_get_segment_entry+0x1d/0x20 [logfs] [<f8091060>] ? logfs_cleanup_journal+0x50/0x50 [logfs] [<f809521b>] ostore_get_erase_count+0x1b/0x40 [logfs] [<f80965b8>] logfs_open_area+0xc8/0x150 [logfs] [<c141a7ec>] ? kmemleak_alloc+0x2c/0x60 [<f809668e>] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs] [<c10dd563>] ? mempool_kmalloc+0x13/0x20 [<c10dd563>] ? mempool_kmalloc+0x13/0x20 [<f809696f>] logfs_segment_write+0x17f/0x1d0 [logfs] [<f8092e8c>] logfs_write_i0+0x11c/0x180 [logfs] [<f8092f35>] logfs_write_direct+0x45/0x90 [logfs] [<f80934cd>] __logfs_write_buf+0xbd/0xf0 [logfs] [<c102900e>] ? kmap_atomic_prot+0x4e/0xe0 [<f809424b>] logfs_write_buf+0x3b/0x60 [logfs] [<f80947a9>] __logfs_write_inode+0xa9/0x110 [logfs] [<f8094cb0>] logfs_rewrite_block+0xc0/0x110 [logfs] [<f8095300>] ? get_mapping_page+0x10/0x60 [logfs] [<f8095aa0>] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs] [<f808e57d>] logfs_gc_segment+0x2ad/0x310 [logfs] [<f808e62a>] __logfs_gc_once+0x4a/0x80 [logfs] [<f808ed43>] logfs_gc_pass+0x683/0x6a0 [logfs] [<f8097a89>] logfs_mount+0x5a9/0x680 [logfs] [<c1126b21>] mount_fs+0x21/0xd0 [<c10f6f6f>] ? __alloc_percpu+0xf/0x20 [<c113da41>] ? alloc_vfsmnt+0xb1/0x130 [<c113db4b>] vfs_kern_mount+0x4b/0xa0 [<c113e06e>] do_kern_mount+0x3e/0xe0 [<c113f60d>] do_mount+0x34d/0x670 [<c10f2749>] ? strndup_user+0x49/0x70 [<c113fcab>] sys_mount+0x6b/0xa0 [<c142d87c>] syscall_call+0x7/0xb Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 <0f> 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09 EIP: [<f809132a>] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18 ---[ end trace 96e67d5b3aa3d6ca ]--- The patch passes locked page to __logfs_write_inode. It calls function logfs_get_wblocks() to pre-lock the page. This ensures any further attempts to lock the page are ignored (esp from get_erase_count). Acked-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-28logfs: set superblock shutdown flag after generic sb shutdownPrasad Joshi
While unmounting the file system LogFS calls generic_shutdown_super. The function does file system independent superblock shutdown. However, it might result in call file system specific inode eviction. LogFS marks FS shutting down by setting bit LOGFS_SB_FLAG_SHUTDOWN in super->s_flags. Since, inode eviction might call truncate on inode, following BUG is observed when file system is unmounted: ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/segment.c:362! invalid opcode: 0000 [#1] PREEMPT SMP CPU 3 Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp parport psmouse floppy virtio_pci serio_raw virtio_ring virtio Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs RIP: 0010:[<ffffffffa008c841>] [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs] RSP: 0018:ffff880062d7b9e8 EFLAGS: 00010202 RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000 RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000 RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000 R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40 R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000 FS: 00007f25d50ea760(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount (pid: 1933, threadinfo ffff880062d7a000, task ffff880070b44500) Stack: ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000 8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460 ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000 Call Trace: [<ffffffffa0088fee>] logfs_write_i0+0x12e/0x190 [logfs] [<ffffffffa0089360>] __logfs_write_rec+0x140/0x220 [logfs] [<ffffffffa0089312>] __logfs_write_rec+0xf2/0x220 [logfs] [<ffffffffa00894a4>] logfs_write_rec+0x64/0xd0 [logfs] [<ffffffffa0089616>] __logfs_write_buf+0x106/0x110 [logfs] [<ffffffffa008a19e>] logfs_write_buf+0x4e/0x80 [logfs] [<ffffffffa008a6b8>] __logfs_write_inode+0x98/0x110 [logfs] [<ffffffffa008a7c4>] logfs_truncate+0x54/0x290 [logfs] [<ffffffffa008abfc>] logfs_evict_inode+0xdc/0x190 [logfs] [<ffffffff8115eef5>] evict+0x85/0x170 [<ffffffff8115f126>] iput+0xe6/0x1b0 [<ffffffff8115b4a8>] shrink_dcache_for_umount_subtree+0x218/0x280 [<ffffffff8115ce91>] shrink_dcache_for_umount+0x51/0x90 [<ffffffff8114796c>] generic_shutdown_super+0x2c/0x100 [<ffffffffa008cc47>] logfs_kill_sb+0x57/0xf0 [logfs] [<ffffffff81147de5>] deactivate_locked_super+0x45/0x70 [<ffffffff811487ea>] deactivate_super+0x4a/0x70 [<ffffffff81163934>] mntput_no_expire+0xa4/0xf0 [<ffffffff8116469f>] sys_umount+0x6f/0x380 [<ffffffff814dd46b>] system_call_fastpath+0x16/0x1b Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55 b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff <0f> 0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66 RIP [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs] RSP <ffff880062d7b9e8> ---[ end trace fe6b040cea952290 ]--- Therefore, move super->s_flags setting after the fs-indenpendent work has been finished. Reviewed-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-28logfs: take write mutex lock during fsync and syncPrasad Joshi
LogFS uses super->s_write_mutex while writing data to disk. Taking the same mutex lock in sync and fsync code path solves the following BUG: ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/dev_bdev.c:134! Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs RIP: 0010:[<ffffffffa007deed>] [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs] Call Trace: [<ffffffffa007c381>] logfs_open_area+0x91/0x150 [logfs] [<ffffffff8128dcb2>] ? find_level.clone.9+0x62/0x100 [<ffffffffa007c49c>] __logfs_segment_write.clone.20+0x5c/0x190 [logfs] [<ffffffff810ef005>] ? mempool_kmalloc+0x15/0x20 [<ffffffff810ef383>] ? mempool_alloc+0x53/0x130 [<ffffffffa007c7a4>] logfs_segment_write+0x1d4/0x230 [logfs] [<ffffffffa0078f8e>] logfs_write_i0+0x12e/0x190 [logfs] [<ffffffffa0079300>] __logfs_write_rec+0x140/0x220 [logfs] [<ffffffffa0079444>] logfs_write_rec+0x64/0xd0 [logfs] [<ffffffffa00795b6>] __logfs_write_buf+0x106/0x110 [logfs] [<ffffffffa007a13e>] logfs_write_buf+0x4e/0x80 [logfs] [<ffffffffa0073e33>] __logfs_writepage+0x23/0x80 [logfs] [<ffffffffa007410c>] logfs_writepage+0xdc/0x110 [logfs] [<ffffffff810f5ba7>] __writepage+0x17/0x40 [<ffffffff810f6208>] write_cache_pages+0x208/0x4f0 [<ffffffff810f5b90>] ? set_page_dirty+0x70/0x70 [<ffffffff810f653a>] generic_writepages+0x4a/0x70 [<ffffffff810f75d1>] do_writepages+0x21/0x40 [<ffffffff8116b9d1>] writeback_single_inode+0x101/0x250 [<ffffffff8116bdbd>] writeback_sb_inodes+0xed/0x1c0 [<ffffffff8116c5fb>] writeback_inodes_wb+0x7b/0x1e0 [<ffffffff8116cc23>] wb_writeback+0x4c3/0x530 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff8116cd6b>] wb_do_writeback+0xdb/0x290 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff814d6208>] ? _raw_spin_unlock_irqrestore+0x18/0x40 [<ffffffff8105aa5a>] ? del_timer+0x8a/0x120 [<ffffffff8116cfac>] bdi_writeback_thread+0x8c/0x2e0 [<ffffffff8116cf20>] ? wb_do_writeback+0x290/0x290 [<ffffffff8106d2e6>] kthread+0x96/0xa0 [<ffffffff814de514>] kernel_thread_helper+0x4/0x10 [<ffffffff8106d250>] ? kthread_worker_fn+0x190/0x190 [<ffffffff814de510>] ? gs_change+0xb/0xb RIP [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs] ---[ end trace 0211ad60a57657c4 ]--- Reviewed-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-28logfs: Prevent memory corruptionJoern Engel
This is a bad one. I wonder whether we were so far protected by no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS. Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch. Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-28logfs: update page reference count for pined pagesPrasad Joshi
LogFS sets PG_private flag to indicate a pined page. We assumed that marking a page as private is enough to ensure its existence. But instead it is necessary to hold a reference count to the page. The change resolves the following BUG BUG: Bad page state in process flush-253:16 pfn:6a6d0 page flags: 0x100000000000808(uptodate|private) Suggested-and-Acked-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-10Merge tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6Linus Torvalds
MTD pull for 3.3 * tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6: (113 commits) mtd: Fix dependency for MTD_DOC200x mtd: do not use mtd->block_markbad directly logfs: do not use 'mtd->block_isbad' directly mtd: introduce mtd_can_have_bb helper mtd: do not use mtd->suspend and mtd->resume directly mtd: do not use mtd->lock, unlock and is_locked directly mtd: do not use mtd->sync directly mtd: harmonize mtd_writev usage mtd: do not use mtd->lock_user_prot_reg directly mtd: mtd->write_user_prot_reg directly mtd: do not use mtd->read_*_prot_reg directly mtd: do not use mtd->get_*_prot_info directly mtd: do not use mtd->read_oob directly mtd: mtdoops: do not use mtd->panic_write directly romfs: do not use mtd->get_unmapped_area directly mtd: do not use mtd->get_unmapped_area directly mtd: do use mtd->point directly mtd: introduce mtd_has_oob helper mtd: mtdcore: export symbols cleanup mtd: clean-up the default_mtd_writev function ... Fix up trivial edit/remove conflict in drivers/staging/spectra/lld_mtd.c
2012-01-09logfs: do not use 'mtd->block_isbad' directlyArtem Bityutskiy
Instead, use the new 'mtd_can_have_bb()' helper. Cc: Jörn Engel <joern@logfs.org> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09mtd: do not use mtd->sync directlyArtem Bityutskiy
This patch teaches 'mtd_sync()' to do nothing when the MTD driver does not have the '->sync()' method, which allows us to remove all direct 'mtd->sync' accesses. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09mtd: introduce mtd_block_isbad interfaceArtem Bityutskiy
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09mtd: introduce mtd_sync interfaceArtem Bityutskiy
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09mtd: introduce mtd_write interfaceArtem Bityutskiy
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09mtd: introduce mtd_read interfaceArtem Bityutskiy
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09mtd: introduce mtd_erase interfaceArtem Bityutskiy
This patch is part of a patch-set which changes the MTD interface from 'mtd->func()' form to 'mtd_func()' form. We need this because we want to add common code to to all drivers in the mtd core level, which is impossible with the current interface when MTD clients call driver functions like 'read()' or 'write()' directly. At this point we just introduce a new inline wrapper function, but later some of them are expected to gain more code. E.g., the input parameters check should be moved to the wrappers rather than be duplicated at many drivers. This particular patch introduced the 'mtd_erase()' interface. The following patches add all the other interfaces one by one. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09logfs: rename functions starting with mtd_Artem Bityutskiy
We are going to re-work the MTD interface and change 'mtd->write()' to 'mtd_write()', 'mtd->read()' to 'mtd_read()' and so forth for all functions in the 'struct mtd_info' structure. However, logfs has its own 'mtd_read()', 'mtd_write()', etc functions which collide with our changes. This patch renames these logfs functions to 'logfs_mtd_read()', 'logfs_mtd_write()', etc. Additionally, to make the 'fs/logfs/dev_mtd.c' file look consistent, rename similarly all the other functions starting with 'mtd_'. Cc: Jörn Engel <joern@logfs.org> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-04logfs: propagate umode_tAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04switch ->mknod() to umode_tAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04switch ->create() to umode_tAl Viro
vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04switch vfs_mkdir() and ->mkdir() to umode_tAl Viro
vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>