summaryrefslogtreecommitdiff
path: root/fs/fcntl.c
AgeCommit message (Collapse)Author
2006-04-02BUG_ON() Conversion in fs/fcntl.cEric Sesterhenn
this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26[PATCH] Use __read_mostly on some hot fs variablesEric Dumazet
I discovered on oprofile hunting on a SMP platform that dentry lookups were slowed down because d_hash_mask, d_hash_shift and dentry_hashtable were in a cache line that contained inodes_stat. So each time inodes_stats is changed by a cpu, other cpus have to refill their cache line. This patch moves some variables to the __read_mostly section, in order to avoid false sharing. RCU dentry lookups can go full speed. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23[PATCH] Shrinks sizeof(files_struct) and better layoutEric Dumazet
1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits platforms, lowering kmalloc() allocated space by 50%. 2) Reduce the size of (files_struct), using a special 32 bits (or 64bits) embedded_fd_set, instead of a 1024 bits fd_set for the close_on_exec_init and open_fds_init fields. This save some ram (248 bytes per task) as most tasks dont open more than 32 files. D-Cache footprint for such tasks is also reduced to the minimum. 3) Reduce size of allocated fdset. Currently two full pages are allocated, that is 32768 bits on x86 for example, and way too much. The minimum is now L1_CACHE_BYTES. UP and SMP should benefit from this patch, because most tasks will touch only one cache line when open()/close() stdin/stdout/stderr (0/1/2), (next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the same cache line) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03[PATCH] fcntl F_SETFL and read-only IS_APPEND filesdean gaudet
There is code in setfl() which attempts to preserve the O_APPEND flag on IS_APPEND files... however IS_APPEND files could also be opened O_RDONLY and in that case setfl() should not require O_APPEND... coreutils 5.93 tail -f attempts to set O_NONBLOCK even on regular files... unfortunately if you try this on an append-only log file the result is this: fcntl64(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) fcntl64(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = -1 EPERM (Operation not permitted) I offer up the patch below as one way of fixing the problem... i've tested it fixes the problem with tail -f but haven't really tested beyond that. (I also reported the coreutils bug upstream... it shouldn't fail imho... <https://savannah.gnu.org/bugs/index.php?func=detailitem&item_id=15473>) Signed-off-by: dean gaudet <dean@arctic.org> Cc: Al Viro <viro@ftp.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-15[PATCH] Unlinline a bunch of other functionsArjan van de Ven
Remove the "inline" keyword from a bunch of big functions in the kernel with the goal of shrinking it by 30kb to 40kb Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12[PATCH] capable/capability.h (fs/)Randy Dunlap
fs: Use <linux/capability.h> where capable() is used. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Tim Schmielau <tim@physik3.uni-rostock.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09[PATCH] sigio: cleanup, don't take tasklist twiceOleg Nesterov
The only user of send_sigio_to_task() already holds tasklist_lock, so it is better not to send the signal via send_group_sig_info() (which takes tasklist recursively) but use group_send_sig_info(). The same change in send_sigurg()->send_sigurg_to_task(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] files: lock-free fd look-upDipankar Sarma
With the use of RCU in files structure, the look-up of files using fds can now be lock-free. The lookup is protected by rcu_read_lock()/rcu_read_unlock(). This patch changes the readers to use lock-free lookup. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Ravikiran Thirumalai <kiran_th@gmail.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] files: files struct with RCUDipankar Sarma
Patch to eliminate struct files_struct.file_lock spinlock on the reader side and use rcu refcounting rcuref_xxx api for the f_count refcounter. The updates to the fdtable are done by allocating a new fdtable structure and setting files->fdt to point to the new structure. The fdtable structure is protected by RCU thereby allowing lock-free lookup. For fd arrays/sets that are vmalloced, we use keventd to free them since RCU callbacks can't sleep. A global list of fdtable to be freed is not scalable, so we use a per-cpu list. If keventd is already handling the current cpu's work, we use a timer to defer queueing of that work. Since the last publication, this patch has been re-written to avoid using explicit memory barriers and use rcu_assign_pointer(), rcu_dereference() premitives instead. This required that the fd information is kept in a separate structure (fdtable) and updated atomically. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] files: break up files structDipankar Sarma
In order for the RCU to work, the file table array, sets and their sizes must be updated atomically. Instead of ensuring this through too many memory barriers, we put the arrays and their sizes in a separate structure. This patch takes the first step of putting the file table elements in a separate structure fdtable that is embedded withing files_struct. It also changes all the users to refer to the file table using files_fdtable() macro. Subsequent applciation of RCU becomes easier after this. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27[PATCH] stale POSIX lock handlingPeter Staubach
I believe that there is a problem with the handling of POSIX locks, which the attached patch should address. The problem appears to be a race between fcntl(2) and close(2). A multithreaded application could close a file descriptor at the same time as it is trying to acquire a lock using the same file descriptor. I would suggest that that multithreaded application is not providing the proper synchronization for itself, but the OS should still behave correctly. SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that when a file descriptor is closed, that all POSIX locks on the file, owned by the process which closed the file descriptor, should be released. The trick here is when those locks are released. The current code releases all locks which exist when close is processing, but any locks in progress are handled when the last reference to the open file is released. There are three cases to consider. One is the simple case, a multithreaded (mt) process has a file open and races to close it and acquire a lock on it. In this case, the close will release one reference to the open file and when the fcntl is done, it will release the other reference. For this situation, no locks should exist on the file when both the close and fcntl operations are done. The current system will handle this case because the last reference to the open file is being released. The second case is when the mt process has dup(2)'d the file descriptor. The close will release one reference to the file and the fcntl, when done, will release another, but there will still be at least one more reference to the open file. One could argue that the existence of a lock on the file after the close has completed is okay, because it was acquired after the close operation and there is still a way for the application to release the lock on the file, using an existing file descriptor. The third case is when the mt process has forked, after opening the file and either before or after becoming an mt process. In this case, each process would hold a reference to the open file. For each process, this degenerates to first case above. However, the lock continues to exist until both processes have released their references to the open file. This lock could block other lock requests. The changes to release the lock when the last reference to the open file aren't quite right because they would allow the lock to exist as long as there was a reference to the open file. This is too long. The new proposed solution is to add support in the fcntl code path to detect a race with close and then to release the lock which was just acquired when such as race is detected. This causes locks to be released in a timely fashion and for the system to conform to the POSIX semantic specification. This was tested by instrumenting a kernel to detect the handling locks and then running a program which generates case #3 above. A dangling lock could be reliably generated. When the changes to detect the close/fcntl race were added, a dangling lock could no longer be generated. Cc: Matthew Wilcox <willy@debian.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01[PATCH] convert that currently tests _NSIG directly to use valid_signal()Jesper Juhl
Convert most of the current code that uses _NSIG directly to instead use valid_signal(). This avoids gcc -W warnings and off-by-one errors. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16[PATCH] AYSNC IO using singals other than SIGIOBharath Ramesh
A question on sigwaitinfo based IO mechanism in multithreaded applications. I am trying to use RT signals to notify me of IO events using RT signals instead of SIGIO in a multithreaded applications. I noticed that there was some discussion on lkml during november 1999 with the subject of the discussion as "Signal driven IO". In the thread I noticed that RT signals were being delivered to the worker thread. I am running 2.6.10 kernel and I am trying to use the very same mechanism and I find that only SIGIO being propogated to the worker threads and RT signals only being propogated to the main thread and not the worker threads where I actually want them to be propogated too. On further inspection I found that the following patch which I have attached solves the problem. I am not sure if this is a bug or feature in the kernel. Roland McGrath <roland@redhat.com> said: This relates only to fcntl F_SETSIG, which is a Linux extension. So there is no POSIX issue. When changing various things like the normal SIGIO signalling to do group signals, I was concerned strictly with the POSIX semantics and generally avoided touching things in the domain of Linux inventions. That's why I didn't change this when I changed the call right next to it. There is no reason I can see that F_SETSIG-requested signals shouldn't use a group signal like normal SIGIO does. I'm happy to ACK this patch, there is nothing wrong with its change to the semantics in my book. But neither POSIX nor I care a whit what F_SETSIG does. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!