summaryrefslogtreecommitdiff
path: root/fs/gfs2/quota.c
AgeCommit message (Collapse)Author
2015-02-13list_lru: add helpers to isolate itemsVladimir Davydov
Currently, the isolate callback passed to the list_lru_walk family of functions is supposed to just delete an item from the list upon returning LRU_REMOVED or LRU_REMOVED_RETRY, while nr_items counter is fixed by __list_lru_walk_one after the callback returns. Since the callback is allowed to drop the lock after removing an item (it has to return LRU_REMOVED_RETRY then), the nr_items can be less than the actual number of elements on the list even if we check them under the lock. This makes it difficult to move items from one list_lru_one to another, which is required for per-memcg list_lru reparenting - we can't just splice the lists, we have to move entries one by one. This patch therefore introduces helpers that must be used by callback functions to isolate items instead of raw list_del/list_move. These are list_lru_isolate and list_lru_isolate_move. They not only remove the entry from the list, but also fix the nr_items counter, making sure nr_items always reflects the actual number of elements on the list if checked under the appropriate lock. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13list_lru: introduce list_lru_shrink_{count,walk}Vladimir Davydov
Kmem accounting of memcg is unusable now, because it lacks slab shrinker support. That means when we hit the limit we will get ENOMEM w/o any chance to recover. What we should do then is to call shrink_slab, which would reclaim old inode/dentry caches from this cgroup. This is what this patch set is intended to do. Basically, it does two things. First, it introduces the notion of per-memcg slab shrinker. A shrinker that wants to reclaim objects per cgroup should mark itself as SHRINKER_MEMCG_AWARE. Then it will be passed the memory cgroup to scan from in shrink_control->memcg. For such shrinkers shrink_slab iterates over the whole cgroup subtree under the target cgroup and calls the shrinker for each kmem-active memory cgroup. Secondly, this patch set makes the list_lru structure per-memcg. It's done transparently to list_lru users - everything they have to do is to tell list_lru_init that they want memcg-aware list_lru. Then the list_lru will automatically distribute objects among per-memcg lists basing on which cgroup the object is accounted to. This way to make FS shrinkers (icache, dcache) memcg-aware we only need to make them use memcg-aware list_lru, and this is what this patch set does. As before, this patch set only enables per-memcg kmem reclaim when the pressure goes from memory.limit, not from memory.kmem.limit. Handling memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and it is still unclear whether we will have this knob in the unified hierarchy. This patch (of 9): NUMA aware slab shrinkers use the list_lru structure to distribute objects coming from different NUMA nodes to different lists. Whenever such a shrinker needs to count or scan objects from a particular node, it issues commands like this: count = list_lru_count_node(lru, sc->nid); freed = list_lru_walk_node(lru, sc->nid, isolate_func, isolate_arg, &sc->nr_to_scan); where sc is an instance of the shrink_control structure passed to it from vmscan. To simplify this, let's add special list_lru functions to be used by shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which consolidate the nid and nr_to_scan arguments in the shrink_control structure. This will also allow us to avoid patching shrinkers that use list_lru when we make shrink_slab() per-memcg - all we will have to do is extend the shrink_control structure to include the target memcg and make list_lru_shrink_{count,walk} handle this appropriately. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Suggested-by: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Glauber Costa <glommer@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-28quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space unitsJan Kara
Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which tracks space limits and usage in 512-byte blocks. However VFS quotas track usage in bytes (as some filesystems require that) and we need to somehow pass this information. Upto now it wasn't a problem because we didn't do any unit conversion (thus VFS quota routines happily stuck number of bytes into d_bcount field of struct fd_disk_quota). Only if you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone tried this but reportedly some Samba users hit the problem in practice. So when we want interfaces compatible we need to fix this. We bite the bullet and define another quota structure used for passing information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have to have more conversion routines in fs/quota/quota.c and another copying of quota structure slows down getting of quota information by about 2% but it seems cleaner than overloading e.g. units of d_bcount to bytes. CC: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-11-20GFS2: use kvfree() instead of open-coding itAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-05-14GFS2: remove transaction glockBenjamin Marzinski
GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In it's place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery. When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out the and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesytem simply involes dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time. However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-04-17GFS2: quotas not being refreshed in gfs2_adjust_quotaAbhi Das
Old values of user quota limits were being used and could allow users to exceed their allotted quotas. This patch refreshes the limits to the latest values so that quotas are enforced correctly. Resolves: rhbz#1077463 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-31GFS2: Fix return value in slot_get()Abhi Das
ENOSPC was being returned in slot_get inspite of successful execution of the function. This patch fixes this return code. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-07GFS2: Use fs_<level> more oftenJoe Perches
Convert a couple of uses of pr_<level> to fs_<level> Add and use fs_emerg. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-07GFS2: Use pr_<level> more consistentlyJoe Perches
Add pr_fmt, remove embedded "GFS2: " prefixes. This now consistently emits lower case "gfs2: " for each message. Other miscellanea around these changes: o Add missing newlines o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-06GFS2: global conversion to pr_foo()Fabian Frederick
-All printk(KERN_foo converted to pr_foo(). -Messages updated to fit in 80 columns. -fs_macros converted as well. -fs_printk removed. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-27GFS2: replace kmalloc - __vmalloc / memset 0Fabian Frederick
Use kzalloc and __vmalloc __GFP_ZERO for clean sd_quota_bitmap allocation. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-15GFS2: Fix kbuild test robot reported warningSteven Whitehouse
Well I don't get the same warning locally as the kbuild robot, but I guess this should fix the problem, anyway. Here is the warning: head: 2d9e72303d538024627fb1fe2cbde48aec12acc0 commit: ee2411a8db49a21bc55dc124e1b434ba194c8903 [19/20] GFS2: Clean up quota slot allocation config: make ARCH=powerpc allmodconfig All error/warnings: fs/gfs2/quota.c: In function 'gfs2_quota_init': >> fs/gfs2/quota.c:1246:3: error: implicit declaration of function '__vmalloc' [-Werror=implicit-function-declaration] sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL); ^ >> fs/gfs2/quota.c:1246:24: warning: assignment makes pointer from integer without a cast [enabled by default] sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL); ^ fs/gfs2/quota.c: In function 'gfs2_quota_cleanup': >> fs/gfs2/quota.c:1361:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] vfree(sdp->sd_quota_bitmap); Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-14GFS2: Move quota bitmap operations under their own lockSteven Whitehouse
Gradually, the global qd_lock is being used for less and less. After this patch it will only be used for the per super block list whose purpose is to allow syncing of changes back to the master quota file from the local quota changes file. Fixing up that process to make it more efficient will be the subject of a later patch, however this patch removes another barrier to doing that. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2014-01-14GFS2: Clean up quota slot allocationSteven Whitehouse
Quota slot allocation has historically used a vector of pages and a set of homegrown find/test/set/clear bit functions. Since the size of the bitmap is likely to be based on the default qc file size, thats a couple of pages at most. So we ought to be able to allocate that as a single chunk, with a vmalloc fallback, just in case of memory fragmentation. We are then able to use the kernel's own find/test/set/clear bit functions, rather than rolling our own. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2014-01-14GFS2: Only run logd and quota when mounted read/writeSteven Whitehouse
While investigating a rather strange bit of code in the quota clean up function, I spotted that the reason for its existence was that when remounting read only, we were not stopping the quotad thread, and thus it was possible for it to still have a reference to some of the quotas in that case. This patch moves the logd and quota thread start and stop into the make_fs_rw/ro functions, so that we now stop those threads when mounted read only. This means that quotad will always be stopped before we call the quota clean up function, and we can thus dispose of the (rather hackish) code that waits for it to give up its reference on the quotas. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2014-01-14GFS2: Use RCU/hlist_bl based hash for quotasSteven Whitehouse
Prior to this patch, GFS2 kept all the quotas for each super block in a single linked list. This is rather slow when there are large numbers of quotas. This patch introduces a hlist_bl based hash table, similar to the one used for glocks. The initial look up of the quota is now lockless in the case where it is already cached, although we still have to take the per quota spinlock in order to bump the ref count. Either way though, this is a big improvement on what was there before. The qd_lock and the per super block list is preserved, for the time being. However it is intended that since this is no longer used for its original role, it should be possible to shrink the number of items on that list in due course and remove the requirement to take qd_lock in qd_get. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-01-03GFS2: Remove gfs2_quota_change_host structureSteven Whitehouse
There is only one place this is used, when reading in the quota changes at mount time. It is not really required and much simpler to just convert the fields from the on-disk structure as required. There should be no functional change as a result of this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-11-16gfs2: endianness misannotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-04GFS2: Use generic list_lru for quotaSteven Whitehouse
By using the generic list_lru code, we can now separate the per sb quota list locking from the lru locking. The lru lock is made into the inner-most lock. As a result of this new lock order, we may occasionally see items on the per-sb quota list which are "dead" so that the two places where we traverse that list are updated to take account of that. As a result of this patch, the gfs2 quota shrinker is now NUMA zone aware, and we are also laying the foundations for further improvments in due course. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>
2013-11-04GFS2: Rename quota qd_lru_lock qd_lockSteven Whitehouse
This is a straight forward rename which is in preparation for introducing the generic list_lru infrastructure in the following patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>
2013-11-04GFS2: Use reflink for quota data cacheSteven Whitehouse
This patch adds reflink support to the quota data cache. It looks a bit strange because we still don't have a sensible split in the lookup by id and the lru list. That is coming in later patches though. The intent here is just to swap the current ref count for reflinks in all cases with as little as possible other change. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>
2013-10-04GFS2: Protect quota sync generationSteven Whitehouse
Now that gfs2_quota_sync can be potentially called from multiple threads, we should protect this bit of code, and the sync generation number in particular in order to ensure that there are no races when syncing quotas. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2013-10-04GFS2: Inline qd_trylock into gfs2_quota_unlockSteven Whitehouse
The function qd_trylock was not a trylock despite its name and can be inlined into gfs2_quota_unlock in order to make the code a bit clearer. There should be no functional change as a result of this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2013-10-04GFS2: Make two similar quota code fragments into a functionSteven Whitehouse
There should be no functional change bar the removal of a test of the MS_READONLY flag which would never be reachable. This merges the common code from qd_fish and qd_trylock into a single function and calls it from both those places. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2013-10-04GFS2: Remove obsolete quota tunableSteven Whitehouse
There is no need for a paramater which relates to the internals of quota to be exposed to users. The only possible use would be to turn it up so large that the memory allocation fails. So lets remove it and set it to a sensible value which ensures that we don't ask for multipage allocations. Currently the size of struct gfs2_holder means that the caluclated value is identical to the previous default value, so there should be no functional change. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2013-10-02GFS2: Move gfs2_icbit_munge into quota.cSteven Whitehouse
This function is only called twice, and both callers are quota related, so lets move this function into quota.c and make it static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-02GFS2: Add allocation parameters structureSteven Whitehouse
This patch adds a structure to contain allocation parameters with the intention of future expansion of this structure. The idea is that we should be able to add more information about the allocation in the future in order to allow the allocator to make a better job of placing the requests on-disk. There is no functional difference from applying this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-10fs: convert fs shrinkers to new scan/count APIDave Chinner
Convert the filesystem shrinkers to use the new API, and standardise some of the behaviours of the shrinkers at the same time. For example, nr_to_scan means the number of objects to scan, not the number of objects to free. I refactored the CIFS idmap shrinker a little - it really needs to be broken up into a shrinker per tree and keep an item count with the tree root so that we don't need to walk the tree every time the shrinker needs to count the number of objects in the tree (i.e. all the time under memory pressure). [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree] [assorted fixes folded in] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10super: fix calculation of shrinkable objects for small numbersGlauber Costa
The sysctl knob sysctl_vfs_cache_pressure is used to determine which percentage of the shrinkable objects in our cache we should actively try to shrink. It works great in situations in which we have many objects (at least more than 100), because the aproximation errors will be negligible. But if this is not the case, specially when total_objects < 100, we may end up concluding that we have no objects at all (total / 100 = 0, if total < 100). This is certainly not the biggest killer in the world, but may matter in very low kernel memory situations. Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-05GFS2: Remove no-op wrapper functionSteven Whitehouse
This wrapper function is no longer required, so get rid of it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24GFS2: two minor quota fixupsBob Peterson
This patch fixes two regression problems that Abhi found in the GFS2 quota code. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...
2013-02-13gfs2: Use uid_eq and gid_eq where appropriateEric W. Biederman
Where kuid_t values are compared use uid_eq and where kgid_t values are compared use gid_eq. This is unfortunately necessary because of the type safety that keeps someone from accidentally mixing kuids and kgids with other types. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Use kuid_t and kgid_t types where appropriate.Eric W. Biederman
Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Remove the QUOTA_USER and QUOTA_GROUP definesEric W. Biederman
Remove the QUOTA_USER and QUOTA_GRUP defines. Remove the last vestigal users of QUOTA_USER and QUOTA_GROUP. Now that struct kqid is used throughout the gfs2 quota code the need there is to use QUOTA_USER and QUOTA_GROUP and the defines are just extraneous and confusing. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Store qd_id in struct gfs2_quota_data as a struct kqidEric W. Biederman
- Change qd_id in struct gfs2_qutoa_data to struct kqid. - Remove the now unnecessary QDF_USER bit field in qd_flags. - Propopoage this change through the code generally making things simpler along the way. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Convert gfs2_quota_refresh to take a kqidEric W. Biederman
- In quota_refresh_user_store convert the user supplied uid into a kqid and pass it to gfs2_quota_refresh. - In quota_refresh_group_store convert the user supplied gid into a kqid and pass it to gfs2_quota_refresh. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Modify qdsb_get to take a struct kqidEric W. Biederman
Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Modify struct gfs2_quota_change_host to use struct kqidEric W. Biederman
Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Introduce qd2indexEric W. Biederman
Both qd_alloc and qd2offset perform the exact same computation to get an index from a gfs2_quota_data. Make life a little simpler and factor out this index computation. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Report quotas in the caller's user namespace.Eric W. Biederman
When a quota is queried return the uid or the gid in the mapped into the caller's user namespace. In addition perform the munged version of the mapping so that instead of -1 a value that does not map is reported as the overflowuid or the overflowgid. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Split NO_QUOTA_CHANGE inot NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGEEric W. Biederman
Split NO_QUOTA_CHANGE into NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE so the constants may be well typed. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13gfs2: Remove improper checks in gfs2_set_dqblk.Eric W. Biederman
In set_dqblk it is an error to look at fdq->d_id or fdq->d_flags. Userspace quota applications do not set these fields when calling quotactl(Q_XSETQLIM,...), and the kernel does not set those fields when quota_setquota calls set_dqblk. gfs2 never looks at fdq->d_id or fdq->d_flags after checking to see if they match the id and type supplied to set_dqblk. No other linux filesystem in set_dqblk looks at either fdq->d_id or fdq->d_flags. Therefore remove these bogus checks from gfs2 and allow normal quota setting applications to work. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-01-29GFS2: Split gfs2_trans_add_bh() into twoSteven Whitehouse
There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-15GFS2: remove redundant lvb pointerDavid Teigland
The lksb struct already contains a pointer to the lvb, so another directly from the glock struct is not needed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07GFS2: Add Orlov allocatorSteven Whitehouse
Just like ext3, this works on the root directory and any directory with the +T flag set. Also, just like ext3, any subdirectory created in one of the just mentioned cases will be allocated to a random resource group (GFS2 equivalent of a block group). If you are creating a set of directories, each of which will contain a job running on a different node, then by setting +T on the parent directory before creating the subdirectories, each will land up in a different resource group, and thus resource group contention between nodes will be kept to a minimum. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07GFS2: Fix an unchecked error from gfs2_rs_allocAndrew Price
Check the return value of gfs2_rs_alloc(ip) and avoid a possible null pointer dereference. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-10-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...
2012-09-24GFS2: Get rid of I_MUTEX_QUOTA usageJan Kara
GFS2 uses i_mutex on its system quota inode to synchronize writes to quota file. Since this is an internal inode to GFS2 (not part of directory hiearchy or visible by user) we are safe to define locking rules for it. So let's just get it its own locking class to make it clear. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24GFS2: Remove rs_requested field from reservationsSteven Whitehouse
The rs_requested field is left over from the original allocation code, however this should have been a parameter passed to the various functions from gfs2_inplace_reserve() and not a member of the reservation structure as the value is not required after the initial allocation. This also helps simplify the code since we no longer need to set the rs_requested to zero. Also the gfs2_inplace_release() function can also be simplified since the reservation structure will always be defined when it is called, and the only remaining task is to unlock the rgrp if required. It can also now be called unconditionally too, resulting in a further simplification. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>