summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-09-24xen: Do not enable spinlocks before jump_label_init() has executedKonrad Rzeszutek Wilk
xen_init_spinlocks() currently calls static_key_slow_inc() before jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is the case) the effect of this static_key_slow_inc() is deferred until after jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in which case the key is set immediately. Thus, depending on the value of config option, we may observe different behavior. In addition, when we come to __jump_label_transform() from jump_label_init(), the key (paravirt_ticketlocks_enabled) is already enabled. On processors where ideal_nop is not the same as default_nop this will cause a BUG() since it is expected that before a key is enabled the latter is replaced by the former during initialization. To address this problem we need to move static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called after jump_label_init(). We also need to make sure that this is done before other cpus start to boot. early_initcall appears to be a good place to do so. (Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops need to be set before alternative_instructions() runs.) Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Added extra comments in the code] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-24tpm: xen-tpmfront: Remove the locality sysfs attributeJason Gunthorpe
Upon deeper review it was agreed to remove the driver-unique 'locality' sysfs attribute before it is present in a released kernel. The attribute was introduced in e2683957fb268c6b29316fd9e7191e13239a30a5 during the 3.12 merge window, so this patch needs to go in before 3.12 is released. The hope is to have a well defined locality API that all the other locality aware drivers can use, perhaps in 3.13. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
2013-09-24tpm: xen-tpmfront: Fix default durationsJason Gunthorpe
All the default durations were being set to 10 minutes which is way too long for the timeouts. Normal values for the longest duration are around 5 mins, and short duration ar around .5s. Further, these are just the default, tpm_get_timeouts will set them to values from the TPM (or throw an error). Just remove them. Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-24drm/i2c: tda998x: fix audio mutingRussell King
Fix a bug that was introduced in commit c4c11dd160a8 ("drm/i2c: tda998x: add video and audio input configuration") when Sebastian cleaned up my original patch. Without this being fixed, audio is muted when the display is turned off, never to be re-enabled. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Cc: Darren Etheridge <detheridge@ti.com> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24ipc: fix race with LSMsDavidlohr Bueso
Currently, IPC mechanisms do security and auditing related checks under RCU. However, since security modules can free the security structure, for example, through selinux_[sem,msg_queue,shm]_free_security(), we can race if the structure is freed before other tasks are done with it, creating a use-after-free condition. Manfred illustrates this nicely, for instance with shared mem and selinux: -> do_shmat calls rcu_read_lock() -> do_shmat calls shm_object_check(). Checks that the object is still valid - but doesn't acquire any locks. Then it returns. -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat) -> selinux_shm_shmat calls ipc_has_perm() -> ipc_has_perm accesses ipc_perms->security shm_close() -> shm_close acquires rw_mutex & shm_lock -> shm_close calls shm_destroy -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security) -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm) -> ipc_free_security calls kfree(ipc_perms->security) This patch delays the freeing of the security structures after all RCU readers are done. Furthermore it aligns the security life cycle with that of the rest of IPC - freeing them based on the reference counter. For situations where we need not free security, the current behavior is kept. Linus states: "... the old behavior was suspect for another reason too: having the security blob go away from under a user sounds like it could cause various other problems anyway, so I think the old code was at least _prone_ to bugs even if it didn't have catastrophic behavior." I have tested this patch with IPC testcases from LTP on both my quad-core laptop and on a 64 core NUMA server. In both cases selinux is enabled, and tests pass for both voluntary and forced preemption models. While the mentioned races are theoretical (at least no one as reported them), I wanted to make sure that this new logic doesn't break anything we weren't aware of. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24MIPS: cpu-features.h: s/MIPS53/MIPS64/Maciej W. Rozycki
No support for MIPS53 processors yet. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/5876/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-09-23Linux 3.12-rc2Linus Torvalds
2013-09-23Merge tag 'staging-3.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging fixes from Greg KH: "Here are a number of small staging tree and iio driver fixes. Nothing major, just lots of little things" * tag 'staging-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (34 commits) iio:buffer_cb: Add missing iio_buffer_init() iio: Prevent race between IIO chardev opening and IIO device free iio: fix: Keep a reference to the IIO device for open file descriptors iio: Stop sampling when the device is removed iio: Fix crash when scan_bytes is computed with active_scan_mask == NULL iio: Fix mcp4725 dev-to-indio_dev conversion in suspend/resume iio: Fix bma180 dev-to-indio_dev conversion in suspend/resume iio: Fix tmp006 dev-to-indio_dev conversion in suspend/resume iio: iio_device_add_event_sysfs() bugfix staging: iio: ade7854-spi: Fix return value staging:iio:hmc5843: Fix measurement conversion iio: isl29018: Fix uninitialized value staging:iio:dummy fix kfifo_buf kconfig dependency issue if kfifo modular and buffer enabled for built in dummy driver. iio: at91: fix adc_clk overflow staging: line6: add bounds check in snd_toneport_source_put() Staging: comedi: Fix dependencies for drivers misclassified as PCI staging: r8188eu: Adjust RX gain staging: r8188eu: Fix smatch warning in core/rtw_ieee80211. staging: r8188eu: Fix smatch error in core/rtw_mlme_ext.c staging: r8188eu: Fix Smatch off-by-one warning in hal/rtl8188e_hal_init.c ...
2013-09-23Merge tag 'usb-3.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of small USB fixes for 3.12-rc2. One is a revert of a EHCI change that isn't quite ready for 3.12. Others are minor things, gadget fixes, Kconfig fixes, and some quirks and documentation updates. All have been in linux-next for a bit" * tag 'usb-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: pl2303: distinguish between original and cloned HX chips USB: Faraday fotg210: fix email addresses USB: fix typo in usb serial simple driver Kconfig Revert "USB: EHCI: support running URB giveback in tasklet context" usb: s3c-hsotg: do not disconnect gadget when receiving ErlySusp intr usb: s3c-hsotg: fix unregistration function usb: gadget: f_mass_storage: reset endpoint driver data when disabled usb: host: fsl-mph-dr-of: Staticize local symbols usb: gadget: f_eem: Staticize eem_alloc usb: gadget: f_ecm: Staticize ecm_alloc usb: phy: omap-usb3: Fix return value usb: dwc3: gadget: avoid memory leak when failing to allocate all eps usb: dwc3: remove extcon dependency usb: gadget: add '__ref' for rndis_config_register() and cdc_config_register() usb: dwc3: pci: add support for BayTrail usb: gadget: cdc2: fix conversion to new interface of f_ecm usb: gadget: fix a bug and a WARN_ON in dummy-hcd usb: gadget: mv_u3d_core: fix violation of locking discipline in mv_u3d_ep_disable()
2013-09-23dm: add reserved_bio_based_ios module parameterMike Snitzer
Allow user to change the number of IOs that are reserved by bio-based DM's mempools by writing to this file: /sys/module/dm_mod/parameters/reserved_bio_based_ios The default value is RESERVED_BIO_BASED_IOS (16). The maximum allowed value is RESERVED_MAX_IOS (1024). Export dm_get_reserved_bio_based_ios() for use by DM targets and core code. Switch to sizing dm-io's mempool and bioset using DM core's configurable 'reserved_bio_based_ios'. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Frank Mayhar <fmayhar@google.com>
2013-09-23dm: add reserved_rq_based_ios module parameterMike Snitzer
Allow user to change the number of IOs that are reserved by request-based DM's mempools by writing to this file: /sys/module/dm_mod/parameters/reserved_rq_based_ios The default value is RESERVED_REQUEST_BASED_IOS (256). The maximum allowed value is RESERVED_MAX_IOS (1024). Export dm_get_reserved_rq_based_ios() for use by DM targets and core code. Switch to sizing dm-mpath's mempool using DM core's configurable 'reserved_rq_based_ios'. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Frank Mayhar <fmayhar@google.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com>
2013-09-23dm: lower bio-based mempool reservationMike Snitzer
Bio-based device mapper processing doesn't need larger mempools (like request-based DM does), so lower the number of reserved entries for bio-based operation. 16 was already used for bio-based DM's bioset but mistakenly wasn't used for it's _io_cache. Formalize difference between bio-based and request-based defaults by introducing RESERVED_BIO_BASED_IOS and RESERVED_REQUEST_BASED_IOS. (based on older code from Mikulas Patocka) Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Frank Mayhar <fmayhar@google.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com>
2013-09-23dm thin: do not expose non-zero discard limits if discards disabledMike Snitzer
Fix issue where the block layer would stack the discard limits of the pool's data device even if the "ignore_discard" pool feature was specified. The pool and thin device(s) still had discards disabled because the QUEUE_FLAG_DISCARD request_queue flag wasn't set. But to avoid user confusion when "ignore_discard" is used: both the pool device and the thin device(s) have zeroes for all discard limits. Also, always set discard_zeroes_data_unsupported in targets because they should never advertise the 'discard_zeroes_data' capability (even if the pool's data device supports it). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
2013-09-23x86/reboot: Add quirk to make Dell C6100 use reboot=pci automaticallyMasoud Sharbiani
Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time. Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com> Signed-off-by: Vinson Lee <vlee@twitter.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23perf/x86/intel: Add model number for Avoton SilvermontYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Cc: a.p.zijlstra@chello.nl Cc: eranian@google.com Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: - some small fixes for msm and exynos - a regression revert affecting nouveau users with old userspace - intel pageflip deadlock and gpu hang fixes, hsw modesetting hangs * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits) Revert "drm: mark context support as a legacy subsystem" drm/i915: Don't enable the cursor on a disable pipe drm/i915: do not update cursor in crtc mode set drm/exynos: fix return value check in lowlevel_buffer_allocate() drm/exynos: Fix address space warnings in exynos_drm_fbdev.c drm/exynos: Fix address space warning in exynos_drm_buf.c drm/exynos: Remove redundant OF dependency drm/msm: drop unnecessary set_need_resched() drm/i915: kill set_need_resched drm/msm: fix potential NULL pointer dereference drm/i915/dvo: set crtc timings again for panel fixed modes drm/i915/sdvo: Robustify the dtd<->drm_mode conversions drm/msm: workaround for missing irq drm/msm: return -EBUSY if bo still active drm/msm: fix return value check in ERR_PTR() drm/msm: fix cmdstream size check drm/msm: hangcheck harder drm/msm: handle read vs write fences drm/i915/sdvo: Fully translate sync flags in the dtd->mode conversion drm/i915: Use proper print format for debug prints ...
2013-09-22Merge branch 'for-3.12/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO fixes from Jens Axboe: "After merge window, no new stuff this time only a collection of neatly confined and simple fixes" * 'for-3.12/core' of git://git.kernel.dk/linux-block: cfq: explicitly use 64bit divide operation for 64bit arguments block: Add nr_bios to block_rq_remap tracepoint If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up. bio-integrity: Fix use of bs->bio_integrity_pool after free blkcg: relocate root_blkg setting and clearing block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...) block: trace all devices plug operation
2013-09-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are mostly bug fixes and a two small performance fixes. The most important of the bunch are Josef's fix for a snapshotting regression and Mark's update to fix compile problems on arm" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits) Btrfs: create the uuid tree on remount rw btrfs: change extent-same to copy entire argument struct Btrfs: dir_inode_operations should use btrfs_update_time also btrfs: Add btrfs: prefix to kernel log output btrfs: refuse to remount read-write after abort Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0 Btrfs: don't leak transaction in btrfs_sync_file() Btrfs: add the missing mutex unlock in write_all_supers() Btrfs: iput inode on allocation failure Btrfs: remove space_info->reservation_progress Btrfs: kill delay_iput arg to the wait_ordered functions Btrfs: fix worst case calculator for space usage Revert "Btrfs: rework the overcommit logic to be based on the total size" Btrfs: improve replacing nocow extents Btrfs: drop dir i_size when adding new names on replay Btrfs: replay dir_index items before other items Btrfs: check roots last log commit when checking if an inode has been logged Btrfs: actually log directory we are fsync()'ing Btrfs: actually limit the size of delalloc range Btrfs: allocate the free space by the existed max extent size when ENOSPC ...
2013-09-22cfq: explicitly use 64bit divide operation for 64bit argumentsAnatol Pomozov
'samples' is 64bit operant, but do_div() second parameter is 32. do_div silently truncates high 32 bits and calculated result is invalid. In case if low 32bit of 'samples' are zeros then do_div() produces kernel crash. Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-09-21Merge tag 'iio-fixes-for-3.12a' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: First round of IIO fixes for 3.12 A series of wrong 'struct dev' assumptions in suspend/resume callbacks following on from this issue being identified in a new driver review. One to watch out for in future. A number of driver specific fixes 1) at91 - fix a overflow in clock rate computation 2) dummy - Kconfig dependency issue 3) isl29018 - uninitialized value 4) hmc5843 - measurement conversion bug introduced by recent cleanup. 5) ade7854-spi - wrong return value. Some IIO core fixes 1) Wrong value picked up for event code creation for a modified channel 2) A null dereference on failure to initialize a buffer after no buffer has been in use, when using the available_scan_masks approach. 3) Sampling not stopped when a device is removed. Effects forced removal such as hot unplugging. 4) Prevent device going away if a chrdev is still open in userspace. 5) Prevent race on chardev opening and device being freed. 6) Add a missing iio_buffer_init in the call back buffer. These last few are the first part of a set from Lars-Peter Clausen who has been taking a closer look at our removal paths and buffer handling than anyone has for quite some time.
2013-09-21Merge tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfix from Trond Myklebust: "Fix a regression due to incorrect sharing of gss auth caches" * tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: RPCSEC_GSS: fix crash on destroying gss auth
2013-09-21block: Add nr_bios to block_rq_remap tracepointJun'ichi Nomura
Adding the number of bios in a remapped request to 'block_rq_remap' tracepoint. Request remapper clones bios in a request to track the completion status of each bio. So the number of bios can be useful information for investigation. Related discussions: http://www.redhat.com/archives/dm-devel/2013-August/msg00084.html http://www.redhat.com/archives/dm-devel/2013-September/msg00024.html Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-09-21Btrfs: create the uuid tree on remount rwJosef Bacik
Users have been complaining of the uuid tree stuff warning that there is no uuid root when trying to do snapshot operations. This is because if you mount -o ro we will not create the uuid tree. But then if you mount -o rw,remount we will still not create it and then any subsequent snapshot/subvol operations you try to do will fail gloriously. Fix this by creating the uuid_root on remount rw if it was not already there. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21btrfs: change extent-same to copy entire argument structMark Fasheh
btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data back to it's argument struct. Unfortunately, not all architectures provide __put_user_unaligned(), so compiles break on them if btrfs is selected. Instead, just copy the whole struct in / out at the start and end of operations, respectively. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: dir_inode_operations should use btrfs_update_time alsoGuangyu Sun
Commit 2bc5565286121d2a77ccd728eb3484dff2035b58 (Btrfs: don't update atime on RO subvolumes) ensures that the access time of an inode is not updated when the inode lives in a read-only subvolume. However, if a directory on a read-only subvolume is accessed, the atime is updated. This results in a write operation to a read-only subvolume. I believe that access times should never be updated on read-only subvolumes. To reproduce: # mkfs.btrfs -f /dev/dm-3 (...) # mount /dev/dm-3 /mnt # btrfs subvol create /mnt/sub Create subvolume '/mnt/sub' # mkdir /mnt/sub/dir # echo "abc" > /mnt/sub/dir/file # btrfs subvol snapshot -r /mnt/sub /mnt/rosnap Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap' # stat /mnt/rosnap/dir File: `/mnt/rosnap/dir' Size: 8 Blocks: 0 IO Block: 4096 directory Device: 16h/22d Inode: 257 Links: 1 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2013-09-11 07:21:49.389157126 -0400 Modify: 2013-09-11 07:22:02.330156079 -0400 Change: 2013-09-11 07:22:02.330156079 -0400 # ls /mnt/rosnap/dir file # stat /mnt/rosnap/dir File: `/mnt/rosnap/dir' Size: 8 Blocks: 0 IO Block: 4096 directory Device: 16h/22d Inode: 257 Links: 1 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2013-09-11 07:22:56.797151670 -0400 Modify: 2013-09-11 07:22:02.330156079 -0400 Change: 2013-09-11 07:22:02.330156079 -0400 Reported-by: Koen De Wit <koen.de.wit@oracle.com> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21btrfs: Add btrfs: prefix to kernel log outputFrank Holton
The kernel log entries for device label %s and device fsid %pU are missing the btrfs: prefix. Add those here. Signed-off-by: Frank Holton <fholton@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21btrfs: refuse to remount read-write after abortDavid Sterba
It's still possible to flip the filesystem into RW mode after it's remounted RO due to an abort. There are lots of places that check for the superblock error bit and will not write data, but we should not let the filesystem appear read-write. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when ↵chandan
arg is 0 This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default subvolume by passing a subvolume id of 0. Signed-off-by: chandan <chandan@linux.vnet.ibm.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: don't leak transaction in btrfs_sync_file()Filipe David Borba Manana
In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns a negative error (for e.g. -ENOMEM via btrfs_log_inode()), we would return without ending/freeing the transaction. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: add the missing mutex unlock in write_all_supers()Stefan Behrens
The BUG() was replaced by btrfs_error() and return -EIO with the patch "get rid of one BUG() in write_all_supers()", but the missing mutex_unlock() was overlooked. The 0-DAY kernel build service from Intel reported the missing unlock which was found by the coccinelle tool: fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374 Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: iput inode on allocation failureJosef Bacik
We don't do the iput when we fail to allocate our delayed delalloc work in __start_delalloc_inodes, fix this. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: remove space_info->reservation_progressJosef Bacik
This isn't used for anything anymore, just remove it. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: kill delay_iput arg to the wait_ordered functionsJosef Bacik
This is a left over of how we used to wait for ordered extents, which was to grab the inode and then run filemap flush on it. However if we have an ordered extent then we already are holding a ref on the inode, and we just use btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on the inode to start work on the ordered extent. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: fix worst case calculator for space usageJosef Bacik
Forever ago I made the worst case calculator say that we could potentially split into 3 blocks for every level on the way down, which isn't right. If we split we're only going to get two new blocks, the one we originally cow'ed and the new one we're going to split. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Revert "Btrfs: rework the overcommit logic to be based on the total size"Josef Bacik
This reverts commit 70afa3998c9baed4186df38988246de1abdab56d. It is causing performance issues and wasn't actually correct. There were problems with the way we flushed delalloc and that was the real cause of the early enospc. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: improve replacing nocow extentsJosef Bacik
Various people have hit a deadlock when running btrfs/011. This is because when replacing nocow extents we will take the i_mutex to make sure nobody messes with the file while we are replacing the extent. The problem is we are already holding a transaction open, which is a locking inversion, so instead we need to save these inodes we find and then process them outside of the transaction. Further we can't just lock the inode and assume we are good to go. We need to lock the extent range and then read back the extent cache for the inode to make sure the extent really still points at the physical block we want. If it doesn't we don't have to copy it. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: drop dir i_size when adding new names on replayJosef Bacik
So if we have dir_index items in the log that means we also have the inode item as well, which means that the inode's i_size is correct. However when we process dir_index'es we call btrfs_add_link() which will increase the directory's i_size for the new entry. To fix this we need to just set the dir items i_size to 0, and then as we find dir_index items we adjust the i_size. btrfs_add_link() will do it for new entries, and if the entry already exists we can just add the name_len to the i_size ourselves. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: replay dir_index items before other itemsJosef Bacik
A user reported a bug where his log would not replay because he was getting -EEXIST back. This was because he had a file moved into a directory that was logged. What happens is the file had a lower inode number, and so it is processed first when replaying the log, and so we add the inode ref in for the directory it was moved to. But then we process the directories DIR_INDEX item and try to add the inode ref for that inode and it fails because we already added it when we replayed the inode. To solve this problem we need to just process any DIR_INDEX items we have in the log first so this all is taken care of, and then we can replay the rest of the items. With this patch my reproducer can remount the file system properly instead of erroring out. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: check roots last log commit when checking if an inode has been loggedJosef Bacik
Liu introduced a local copy of the last log commit for an inode to make sure we actually log an inode even if a log commit has already taken place. In order to make sure we didn't relog the same inode multiple times he set this local copy to the current trans when we log the inode, because usually we log the inode and then sync the log. The exception to this is during rename, we will relog an inode if the name changed and it is already in the log. The problem with this is then we go to sync the inode, and our check to see if the inode has already been logged is tripped and we don't sync the log. To fix this we need to _also_ check against the roots last log commit, because it could be less than what is in our local copy of the log commit. This fixes a bug where we rename a file into a directory and then fsync the directory and then on remount the directory is no longer there. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: actually log directory we are fsync()'ingJosef Bacik
If you just create a directory and then fsync that directory and then pull the power plug you will come back up and the directory will not be there. That is because we won't actually create directories if we've logged files inside of them since they will be created on replay, but in this check we will set our logged_trans of our current directory if it happens to be a directory, making us think it doesn't need to be logged. Fix the logic to only do this to parent directories. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: actually limit the size of delalloc rangeJosef Bacik
So forever we have had this thing to limit the amount of delalloc pages we'll setup to be written out to 128mb. This is because we have to lock all the pages in this range, so anything above this gets a bit unweildly, and also without a limit we'll happily allocate gigantic chunks of disk space. Turns out our check for this wasn't quite right, we wouldn't actually limit the chunk we wanted to write out, we'd just stop looking for more space after we went over the limit. So if you do a giant 20gb dd on my box with lots of ram I could get 2gig extents. This is fine normally, except when you go to relocate these extents and we can't find enough space to relocate these moster extents, since we have to be able to allocate exactly the same sized extent to move it around. So fix this by actually enforcing the limit. With this patch I'm no longer seeing giant 1.5gb extents. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: allocate the free space by the existed max extent size when ENOSPCMiao Xie
By the current code, if the requested size is very large, and all the extents in the free space cache are small, we will waste lots of the cpu time to cut the requested size in half and search the cache again and again until it gets down to the size the allocator can return. In fact, we can know the max extent size in the cache after the first search, so we needn't cut the size in half repeatedly, and just use the max extent size directly. This way can save lots of cpu time and make the performance grow up when there are only fragments in the free space cache. According to my test, if there are only 4KB free space extents in the fs, and the total size of those extents are 256MB, we can reduce the execute time of the following test from 5.4s to 1.4s. dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync Changelog v2 -> v3: - fix the problem that we skip the block group with the space which is less than we need. Changelog v1 -> v2: - address the problem that we return a wrong start position when searching the free space in a bitmap. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21btrfs: add lockdep and tracing annotations for uuid treeDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21btrfs: show compiled-in config features at module load timeStefan Behrens
We want to know if there are debugging features compiled in, this may affect performance. The message is printed before the sanity checks. (This commit message is a copy of David Sterba's commit message when he introduced btrfs_print_info()). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: more efficient inode tree replace operationFilipe David Borba Manana
Instead of removing the current inode from the red black tree and then add the new one, just use the red black tree replace operation, which is more efficient. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: do not add replace target to the alloc_listIlya Dryomov
If replace was suspended by the umount, replace target device is added to the fs_devices->alloc_list during a later mount. This is obviously wrong. ->is_tgtdev_for_dev_replace is supposed to guard against that, but ->is_tgtdev_for_dev_replace is (and can only ever be) initialized *after* everything is opened and fs_devices lists are populated. Fix this by checking the devid instead: for replace targets it's always equal to BTRFS_DEV_REPLACE_DEVID. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Btrfs: fixup error handling in btrfs_reloc_cowJosef Bacik
If we failed to actually allocate the correct size of the extent to relocate we will end up in an infinite loop because we won't return an error, we'll just move on to the next extent. So fix this up by returning an error, and then fix all the callers to return an error up the stack rather than BUG_ON()'ing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21Merge tag 'v3.11' into for-linusChris Mason
Linux 3.11
2013-09-21iio:buffer_cb: Add missing iio_buffer_init()Lars-Peter Clausen
Make sure to properly initialize the IIO buffer data structure. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2013-09-21iio: Prevent race between IIO chardev opening and IIO device freeLars-Peter Clausen
Set the IIO device as the parent for the character device We need to make sure that the IIO device is not freed while the character device exists, otherwise the freeing of the IIO device might race against the file open callback. Do this by setting the character device's parent to the IIO device, this will cause the character device to grab a reference to the IIO device and only release it once the character device itself has been removed. Also move the registration of the character device before the registration of the IIO device to avoid the (rather theoretical case) that the IIO device is already freed again before we can add the character device and grab a reference to the IIO device. We also need to move the call to cdev_del() from iio_dev_release() to iio_device_unregister() (where it should have been in the first place anyway) to avoid a reference cycle. As iio_dev_release() is only called once all reference are dropped, but the character device holds a reference to the IIO device. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org>