Age | Commit message (Collapse) | Author |
|
commit 2b1e1a7cd0a615d57455567a549f9965023321b5 upstream.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e15fd0a11db00fc7f470a9fc804657ec3f6d04a5 upstream.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d03857c63bb036edff0aa7a107276360173aca4e upstream.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4eb4517ce7c9c573b6c823de403aeccb40018cfc upstream.
- replace an ad-hoc array with a struct
- rename to calc_signature() for consistency
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7882a26d2e2e520099e2961d5e2e870f8e4172dc upstream.
It's going to be used as a temporary buffer for in-place en/decryption
with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a45f795c65b479b4ba107b6ccde29b896d51ee98 upstream.
Starting with 4.9, kernel stacks may be vmalloced and therefore not
guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
option is enabled by default on x86. This makes it invalid to use
on-stack buffers with the crypto scatterlist API, as sg_set_buf()
expects a logical address and won't work with vmalloced addresses.
There isn't a different (e.g. kvec-based) crypto API we could switch
net/ceph/crypto.c to and the current scatterlist.h API isn't getting
updated to accommodate this use case. Allocating a new header and
padding for each operation is a non-starter, so do the en/decryption
in-place on a single pre-assembled (header + data + padding) heap
buffer. This is explicitly supported by the crypto API:
"... the caller may provide the same scatter/gather list for the
plaintext and cipher text. After the completion of the cipher
operation, the plaintext data is replaced with the ciphertext data
in case of an encryption and vice versa for a decryption."
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 55d9cc834f933698fc864f0d36f3cca533d30a8d upstream.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 462e650451c577d15eeb4d883d70fa9e4e529fad upstream.
Since commit 0a990e709356 ("ceph: clean up service ticket decoding"),
th->session_key isn't assigned until everything is decoded.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 36721ece1e84a25130c4befb930509b3f96de020 upstream.
Pass what's going to be encrypted - that's msg_b, not ticket_blob.
ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
the maxlen calculation, but makes it a bit clearer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5c056fdc5b474329037f2aa18401bd73033e0ce0 upstream.
After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
the client gets back a ceph_x_authorize_reply, which it is supposed to
verify to ensure the authenticity and protect against replay attacks.
The code for doing this is there (ceph_x_verify_authorizer_reply(),
ceph_auth_verify_authorizer_reply() + plumbing), but it is never
invoked by the the messenger.
AFAICT this goes back to 2009, when ceph authentication protocols
support was added to the kernel client in 4e7a5dcd1bba ("ceph:
negotiate authentication protocol; implement AUTH_NONE protocol").
The second param of ceph_connection_operations::verify_authorizer_reply
is unused all the way down. Pass 0 to facilitate backporting, and kill
it in the next commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
osdc->last_linger_id is a counter for lreq->linger_id, which is used
for watch cookies. Starting with a large integer should ease the task
of telling apart kernel and userspace clients.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
If your data pool was pool 0, ceph_file_layout_from_legacy()
transform that to -1 unconditionally, which broke upgrades.
We only want do that for a fully zeroed ceph_file_layout,
so that it still maps to a file_layout_t. If any fields
are set, though, we trust the fl_pgpool to be a valid pool.
Fixes: 7627151ea30bc ("libceph: define new ceph_file_layout structure")
Link: http://tracker.ceph.com/issues/17825
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove extra x1 variable, it's just temporary placeholder that
clutters the code unnecessarily.
Reflects ceph.git commit 0d19408d91dd747340d70287b4ef9efd89e95c6b.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Use __builtin_clz() supported by GCC and Clang to figure out
how many bits we should shift instead of shifting by a bit
in a loop until the value gets normalized. Improves performance
of this function by up to 3x in worst-case scenario and overall
straw2 performance by ~10%.
Reflects ceph.git commit 110de33ca497d94fc4737e5154d3fe781fa84a0a.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
A static bug finder (EBA) on Linux 4.7:
Double lock in net/ceph/auth.c
second lock at 108: mutex_lock(& ac->mutex); [ceph_auth_build_hello]
after calling from 263: ret = ceph_auth_build_hello(ac, msg_buf, msg_len);
if ! ac->protocol -> true at 262
first lock at 261: mutex_lock(& ac->mutex); [ceph_build_auth]
ceph_auth_build_hello() is never called, because the protocol is always
initialized, whether we are checking existing tickets (in delayed_work())
or getting new ones after invalidation (in invalidate_authorizer()).
Reported-by: Iago Abal <iari@itu.dk>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Export client addr/nonce, so userspace can check if a image is being
blacklisted.
Signed-off-by: Mike Christie <mchristi@redhat.com>
[idryomov@gmail.com: ceph_client_addr(), endianess fix]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Add basic support for RBD_FEATURE_EXCLUSIVE_LOCK feature. Maintenance
operations (resize, snapshot create, etc) are offloaded to librbd via
returning -EOPNOTSUPP - librbd should request the lock and execute the
operation.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
|
|
Revamp watch code to support retrying watch re-registration:
- add rbd_dev->watch_state for more robust errcb handling
- store watch cookie separately to avoid dereferencing watch_handle
which is set to NULL on unwatch
- move re-register code into a delayed work and retry re-registration
every second, unless the client is blacklisted
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
|
|
It's gid / global_id in other places.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Reuse ceph_mon_generic_request infrastructure for sending monitor
commands. In particular, add support for 'blacklist add' to prevent
other, non-responsive clients from making further updates.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
[idryomov@gmail.com: refactor, misc fixes throughout]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Add an interface for the Ceph OSD lock.lock_info method and associated
data structures.
Based heavily on code by Mike Christie <michaelc@cs.wisc.edu>.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
[idryomov@gmail.com: refactor, misc fixes throughout]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
This patch adds support for rados lock, unlock and break lock.
Based heavily on code by Mike Christie <michaelc@cs.wisc.edu>.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Add a convenience function to osd_client to send Ceph OSD
'class' ops. The interface assumes that the request and
reply data each consist of single pages.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Add support for this Ceph OSD op, needed to support the RBD exclusive
lock feature.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
[idryomov@gmail.com: refactor, misc fixes throughout]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Clear up EntityName vs entity_name_t confusion.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Fixes the following sparse warning:
net/ceph/mon_client.c:577:6: warning:
symbol 'cancel_generic_request' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In case of error, the function ceph_alloc_page_vector() returns
ERR_PTR() and never returns NULL. The NULL test in the return value
check should be replaced with IS_ERR().
Fixes: 1907920324f1 ('libceph: support for sending notifies')
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Add pool namesapce pointer to struct ceph_file_layout and struct
ceph_object_locator. Pool namespace is used by when mapping object
to PG, it's also used when composing OSD request.
The namespace pointer in struct ceph_file_layout is RCU protected.
So libceph can read namespace without taking lock.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The data structure is for storing namesapce string. It allows namespace
string to be shared between cephfs inodes with same layout. This data
structure can also be referenced by OSD request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
|
Define new ceph_file_layout structure and rename old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding namespace
to ceph_file_layout structure.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
|
- decode.h needs slab.h for kmalloc()
- osd_client.h needs msgpool.h for struct ceph_msgpool
- msgpool.h doesn't need messenger.h
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Currently, osd_weight and osd_state fields are updated in the encoding
order. This is wrong, because an incremental map may look like e.g.
new_up_client: { osd=6, addr=... } # set osd_state and addr
new_state: { osd=6, xorstate=EXISTS } # clear osd_state
Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down). After
applying new_up_client, osd_state is changed to EXISTS | UP. Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state. A non-existent OSD is considered down by the
mapping code
2087 for (i = 0; i < pg->pg_temp.len; i++) {
2088 if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
2089 if (ceph_can_shift_osds(pi))
2090 continue;
2091
2092 temp->osds[temp->size++] = CRUSH_ITEM_NONE;
and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:
[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
and hung rbds on the client:
[ 493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[ 493.566805] rbd: rbd0: result -6 xferred 400000
[ 493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed
Fixes: http://tracker.ceph.com/issues/14901
Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
|
|
Commit d30291b985d1 ("libceph: variable-sized ceph_object_id") changed
dout()s in what is now encode_request() and ceph_object_locator_to_pg()
to use %pE, mostly to document that, although all rbd and cephfs object
names are NULL-terminated strings, ceph_object_id will handle any RADOS
object name, including the one containing NULs, just fine.
However, it turns out that vbin_printf() can't handle anything but ints
and %s - all %p suffixes are ignored. The buffer %p** points to isn't
recorded, resulting in trash in the messages if the buffer had been
reused by the time bstr_printf() got to it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
handle_reply() may be called twice on the same request: on ack and then
on commit. This occurs on btrfs-formatted OSDs or if cephfs sync write
path is triggered - CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK.
handle_reply() handles this with the help of done_request().
Fixes: 5aea3dcd5021 ("libceph: a major OSD client update")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
For the benefit of every single caller, take osdc instead of map.
Also, now that osdc->osdmap can't ever be NULL, drop the check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Ceph_osdc_wait_request() is used when cephfs issues sync IO. In most
cases, the sync IO should be uninterruptible. The fix is use killale
wait function in ceph_osdc_wait_request().
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
|
This patch makes serverl logical caculation functions return bool to
improve readability due to these particular functions only using 0/1
as their return value.
No functional change.
Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
|
|
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
... with a wrapper around maybe_request_map() - no need for two
osdmap-specific functions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
There is now about a dozen CEPH_OSDMAP_* flags. This is a debugging
interface, so just dump in hex instead of spelling each flag out.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This adds the "map check" infrastructure for sending osdmap version
checks on CALC_TARGET_POOL_DNE and completing in-flight requests with
-ENOENT if the target pool doesn't exist or has just been deleted.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
messages asynchronously and get a callback on completion. Refactor MON
client to allow firing off generic requests asynchronously and add an
async variant of ceph_monc_get_version(). ceph_monc_do_statfs() is
switched over and remains sync.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Implement ceph_osdc_watch_check() to be able to check on status of
watch. Note that the time it takes for a watch/notify event to get
delivered through the notify_wq is taken into account.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Implement ceph_osdc_notify() for sending notifies.
Due to the fact that the current messenger can't do read-in into
pagelists (it can only do write-out from them), I had to go with a page
vector for a NOTIFY_COMPLETE payload, for now.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This adds support and switches rbd to a new, more reliable version of
watch/notify protocol. As with the OSD client update, this is mostly
about getting the right structures linked into the right places so that
reconnects are properly sent when needed. watch/notify v2 also
requires sending regular pings to the OSDs - send_linger_ping().
A major change from the old watch/notify implementation is the
introduction of ceph_osd_linger_request - linger requests no longer
piggy back on ceph_osd_request. ceph_osd_event has been merged into
ceph_osd_linger_request.
All the details are now hidden within libceph, the interface consists
of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
the lifetime management simple.
ceph_osdc_notify_ack() accepts an optional data payload, which is
relayed back to the notifier.
Portions of this patch are loosely based on work by Douglas Fuller
<dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|