summaryrefslogtreecommitdiff
path: root/net/ceph/messenger.c
AgeCommit message (Collapse)Author
2013-05-02libceph: don't clear bio_iter in prepare_write_message()Alex Elder
At one time it was necessary to clear a message's bio_iter field to avoid a bad pointer dereference in write_partial_msg_pages(). That no longer seems to be the case. Here's why. The message's bio fields represent (in this case) outgoing data. Between where the bio_iter is made NULL in prepare_write_message() and the call in that function to prepare_message_data(), the bio fields are never used. In prepare_message_data(), init-bio_iter() is called, and the result of that overwrites the value in the message's bio_iter field. Because it gets overwritten anyway, there is no need to set it to NULL. So don't do it. This resolves: http://tracker.ceph.com/issues/4402 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: activate message data assignment checksAlex Elder
The mds client no longer tries to assign zero-length message data, and the osd client no longer sets its data info more than once. This allows us to activate assertions in the messenger to verify these things never happen. This resolves both of these: http://tracker.ceph.com/issues/4263 http://tracker.ceph.com/issues/4284 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-02libceph: record message data byte lengthAlex Elder
Record the number of bytes of data in a page array rather than the number of pages in the array. It can be assumed that the page array is of sufficient size to hold the number of bytes indicated (and offset by the indicated alignment). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02ceph: only set message data pointers if non-emptyAlex Elder
Change it so we only assign outgoing data information for messages if there is outgoing data to send. This then allows us to add a few more (currently commented-out) assertions. This is related to: http://tracker.ceph.com/issues/4284 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-02libceph: isolate other message data fieldsAlex Elder
Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and ceph_msg_data_set_trail() to clearly abstract the assignment of the remaining data-related fields in a ceph message structure. Use the new functions in the osd client and mds client. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: set page info with byte lengthAlex Elder
When setting page array information for message data, provide the byte length rather than the page count ceph_msg_data_set_pages(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: isolate message page field manipulationAlex Elder
Define a function ceph_msg_data_set_pages(), which more clearly abstracts the assignment page-related fields for data in a ceph message structure. Use this new function in the osd client and mds client. Ideally, these fields would never be set more than once (with BUG_ON() calls to guarantee that). At the moment though the osd client sets these every time it receives a message, and in the event of a communication problem this can happen more than once. (This will be resolved shortly, but setting up these helpers first makes it all a bit easier to work with.) Rearrange the field order in a ceph_msg structure to group those that are used to define the possible data payloads. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: simplify new message initializationAlex Elder
Rather than explicitly initializing many fields to 0, NULL, or false in a newly-allocated message, just use kzalloc() for allocating new messages. This will become a much more convenient way of doing things anyway for upcoming patches that abstract the data field. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: advance pagelist with list_rotate_left()Alex Elder
While processing an outgoing pagelist (either the data pagelist or trail) in a ceph message, the messenger cycles through each of the pages on the list. This is accomplished in out_msg_pos_next(), if the end of the first page on the list is reached, the first page is moved to the end of the list. There is a list operation, list_rotate_left(), which performs exactly this operation, and by using it, what's really going on becomes more obvious. So replace these two list_move_tail() calls with list_rotate_left(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: define and use in_msg_pos_next()Alex Elder
Define a new function in_msg_pos_next() to match out_msg_pos_next(), and use it in place of code at the end of read_partial_message_pages() and read_partial_message_bio(). Note that the page number is incremented and offset reset under slightly different conditions from before. The result is equivalent, however, as explained below. Each time an incoming message is going to arrive, we find out how much room is left--not surpassing the current page--and provide that as the number of bytes to receive. So the amount we'll use is the lesser of: all that's left of the entire request; and all that's left in the current page. If we received exactly how many were requested, we either reached the end of the request or the end of the page. In the first case, we're done, in the second, we move onto the next page in the array. In all cases but (possibly) on the last page, after adding the number of bytes received, page_pos == PAGE_SIZE. On the last page, it doesn't really matter whether we increment the page number and reset the page position, because we're done and we won't come back here again. The code previously skipped over that last case, basically. The new code handles that case the same as the others, incrementing and resetting. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: kill args in read_partial_message_bio()Alex Elder
There is only one caller for read_partial_message_bio(), and it always passes &msg->bio_iter and &bio_seg as the second and third arguments. Furthermore, the message in question is always the connection's in_msg, and we can get that inside the called function. So drop those two parameters and use their derived equivalents. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: change type of ceph_tcp_sendpage() "more"Alex Elder
Change the type of the "more" parameter from int to bool. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: minor byte order problems in read_partial_message()Alex Elder
Some values printed are not (necessarily) in CPU order. We already have a copy of the converted versions, so use them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: define CEPH_MSG_MAX_MIDDLE_LENAlex Elder
This is probably unnecessary but the code read as if it were wrong in read_partial_message(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: clean up skipped message logicAlex Elder
In ceph_con_in_msg_alloc() it is possible for a connection's alloc_msg method to indicate an incoming message should be skipped. By default, read_partial_message() initializes the skip variable to 0 before it gets provided to ceph_con_in_msg_alloc(). The osd client, mon client, and mds client each supply an alloc_msg method. The mds client always assigns skip to be 0. The other two leave the skip value of as-is, or assigns it to zero, except: - if no (osd or mon) request having the given tid is found, in which case skip is set to 1 and NULL is returned; or - in the osd client, if the data of the reply message is not adequate to hold the message to be read, it assigns skip value 1 and returns NULL. So the returned message pointer will always be NULL if skip is ever non-zero. Clean up the logic a bit in ceph_con_in_msg_alloc() to make this state of affairs more obvious. Add a comment explaining how a null message pointer can mean either a message that should be skipped or a problem allocating a message. This resolves: http://tracker.ceph.com/issues/4324 Reported-by: Greg Farnum <greg@inktank.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-02libceph: define mds_alloc_msg() methodAlex Elder
The only user of the ceph messenger that doesn't define an alloc_msg method is the mds client. Define one, such that it works just like it did before, and simplify ceph_con_in_msg_alloc() by assuming the alloc_msg method is always present. This and the next patch resolve: http://tracker.ceph.com/issues/4322 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-02libceph: drop mutex while allocating a messageAlex Elder
In ceph_con_in_msg_alloc(), if no alloc_msg method is defined for a connection a new message is allocated with ceph_msg_new(). Drop the mutex before making this call, and make sure we're still connected when we get it back again. This is preparing for the next patch, which ensures all connections define an alloc_msg method, and then handles them all the same way. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-02libceph: kill ceph_msg->pagelist_countAlex Elder
The pagelist_count field is never actually used, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: distinguish page array and pagelist countAlex Elder
Use distinct fields for tracking the number of pages in a message's page array and in a message's page list. Currently only one or the other is used at a time, but that will be changing soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02libceph: make ceph_msg->bio_seg be unsignedAlex Elder
The bio_seg field is used by the ceph messenger in iterating through a bio. It should never have a negative value, so make it an unsigned. (I contemplated making it unsigned short to match the struct bio definition, but it offered no benefit.) Change variables used to hold bio_seg values to all be unsigned as well. Change two variable names in init_bio_iter() to match the convention used everywhere else. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25libceph: use a do..while loop in con_work()Alex Elder
This just converts a manually-implemented loop into a do..while loop in con_work(). It also moves handling of EAGAIN inside the blocks where it's already been determined an error code was returned. Also update a few dout() calls near the affected code for consistency. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25libceph: use a flag to indicate a fault has occurredAlex Elder
This just rearranges the logic in con_work() a little bit so that a flag is used to indicate a fault has occurred. This allows both the fault and non-fault case to be handled the same way and avoids a couple of nearly consecutive gotos. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25libceph: separate non-locked fault handlingAlex Elder
An error occurring on a ceph connection is treated as a fault, causing the connection to be reset. The initial part of this fault handling has to be done while holding the connection mutex, but it must then be dropped for the last part. Separate the part of this fault handling that executes without the lock into its own function, con_fault_finish(). Move the call to this new function, as well as call that drops the connection mutex, into ceph_fault(). Rename that function con_fault() to reflect that it's only handling the connection part of the fault handling. The motivation for this was a warning from sparse about the locking being done here. Rearranging things this way keeps all the mutex manipulation within ceph_fault(), and this stops sparse from complaining. This partially resolves: http://tracker.ceph.com/issues/4184 Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25libceph: encapsulate connection backoffAlex Elder
Collect the code that tests for and implements a backoff delay for a ceph connection into a new function, ceph_backoff(). Make the debug output messages in that part of the code report things consistently by reporting a message in the socket closed case, and by making the one for PREOPEN state report the connection pointer like the rest. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25libceph: eliminate sparse warningsAlex Elder
Eliminate most of the problems in the libceph code that cause sparse to issue warnings. - Convert functions that are never referenced externally to have static scope. - Pass NULL rather than 0 for a pointer argument in one spot in ceph_monc_delete_snapid() This partially resolves: http://tracker.ceph.com/issues/4184 Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25libceph: define connection flag helpersAlex Elder
Define and use functions that encapsulate operations performed on a connection's flags. This resolves: http://tracker.ceph.com/issues/4234 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-14libceph: fix messenger CONFIG_BLOCK dependenciesAlex Elder
The ceph messenger has a few spots that are only used when bio messages are supported, and that's only when CONFIG_BLOCK is defined. This surrounds a couple of spots with #ifdef's that would cause a problem if CONFIG_BLOCK were not present in the kernel configuration. This resolves: http://tracker.ceph.com/issues/3976 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-28libceph: fix protocol feature mismatch failure pathSage Weil
We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-12-28libceph: WARN, don't BUG on unexpected connection statesAlex Elder
A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point so eliminate that assertion. Reported-by: Ugis <ugis22@gmail.com> Tested-by: Ugis <ugis22@gmail.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-20libceph: report connection fault with warningAlex Elder
When a connection's socket disconnects, or if there's a protocol error of some kind on the connection, a fault is signaled and the connection is reset (closed and reopened, basically). We currently get an error message on the log whenever this occurs. A ceph connection will attempt to reestablish a socket connection repeatedly if a fault occurs. This means that these error messages will get repeatedly added to the log, which is undesirable. Change the error message to be a warning, so they don't get logged by default. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-17libceph: socket can close in any connection stateAlex Elder
A connection's socket can close for any reason, independent of the state of the connection (and without irrespective of the connection mutex). As a result, the connectino can be in pretty much any state at the time its socket is closed. Handle those other cases at the top of con_work(). Pull this whole block of code into a separate function to reduce the clutter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-26libceph: avoid NULL kref_put from NULL alloc_msg returnSage Weil
The ceph_on_in_msg_alloc() method calls the ->alloc_msg() helper which may return NULL. It also drops con->mutex while it allocates a message, which means that the connection state may change (e.g., get closed). If that happens, we clean up and bail out. Avoid calling ceph_msg_put() on a NULL return value and triggering a crash. This was observed when an ->alloc_msg() call races with a timeout that resends a zillion messages and resets the connection, and ->alloc_msg() returns NULL (because the request was resent to another target). Fixes http://tracker.newdream.net/issues/3342 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-10-10rbd: define common queue_con_delay()Alex Elder
This patch defines a single function, queue_con_delay() to call queue_delayed_work() for a connection. It basically generalizes what was previously queue_con() by adding the delay argument. queue_con() is now a simple helper that passes 0 for its delay. queue_con_delay() returns 0 if it queued work or an errno if it did not for some reason. If con_work() finds the BACKOFF flag set for a connection, it now calls queue_con_delay() to handle arranging to start again after a delay. Note about connection reference counts: con_work() only ever gets called as a work item function. At the time that work is scheduled, a reference to the connection is acquired, and the corresponding con_work() call is then responsible for dropping that reference before it returns. Previously, the backoff handling inside con_work() silently handed off its reference to delayed work it scheduled. Now that queue_con_delay() is used, a new reference is acquired for the newly-scheduled work, and the original reference is dropped by the con->ops->put() call at the end of the function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-10rbd: let con_work() handle backoffAlex Elder
Both ceph_fault() and con_work() include handling for imposing a delay before doing further processing on a faulted connection. The latter is used only if ceph_fault() is unable to. Instead, just let con_work() always be responsible for implementing the delay. After setting up the delay value, set the BACKOFF flag on the connection unconditionally and call queue_con() to ensure con_work() will get called to handle it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-10rbd: reset BACKOFF if unable to re-queueAlex Elder
If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so. In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug. The first problem is simply that the intended behavior is to back off, and if we aren't able queue the work item to run after a delay we're not doing that. The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay. The second problem--the bug--is a leak of a reference count. If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called. This patch fixes both problems. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-09-22libceph: only kunmap kmapped pagesAlex Elder
In write_partial_msg_pages(), pages need to be kmapped in order to perform a CRC-32c calculation on them. As an artifact of the way this code used to be structured, the kunmap() call was separated from the kmap() call and both were done conditionally. But the conditions under which the kmap() and kunmap() calls were made differed, so there was a chance a kunmap() call would be done on a page that had not been mapped. The symptom of this was tripping a BUG() in kunmap_high() when pkmap_count[nr] became 0. Reported-by: Bryan K. Wright <bryan@virginia.edu> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-08-21libceph: avoid truncation due to racing bannersJim Schutt
Because the Ceph client messenger uses a non-blocking connect, it is possible for the sending of the client banner to race with the arrival of the banner sent by the peer. When ceph_sock_state_change() notices the connect has completed, it schedules work to process the socket via con_work(). During this time the peer is writing its banner, and arrival of the peer banner races with con_work(). If con_work() calls try_read() before the peer banner arrives, there is nothing for it to do, after which con_work() calls try_write() to send the client's banner. In this case Ceph's protocol negotiation can complete succesfully. The server-side messenger immediately sends its banner and addresses after accepting a connect request, *before* actually attempting to read or verify the banner from the client. As a result, it is possible for the banner from the server to arrive before con_work() calls try_read(). If that happens, try_read() will read the banner and prepare protocol negotiation info via prepare_write_connect(). prepare_write_connect() calls con_out_kvec_reset(), which discards the as-yet-unsent client banner. Next, con_work() calls try_write(), which sends the protocol negotiation info rather than the banner that the peer is expecting. The result is that the peer sees an invalid banner, and the client reports "negotiation failed". Fix this by moving con_out_kvec_reset() out of prepare_write_connect() to its callers at all locations except the one where the banner might still need to be sent. [elder@inktak.com: added note about server-side behavior] Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-31Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "Lots of stuff this time around: - lots of cleanup and refactoring in the libceph messenger code, and many hard to hit races and bugs closed as a result. - lots of cleanup and refactoring in the rbd code from Alex Elder, mostly in preparation for the layering functionality that will be coming in 3.7. - some misc rbd cleanups from Josh Durgin that are finally going upstream - support for CRUSH tunables (used by newer clusters to improve the data placement) - some cleanup in our use of d_parent that Al brought up a while back - a random collection of fixes across the tree There is another patch coming that fixes up our ->atomic_open() behavior, but I'm going to hammer on it a bit more before sending it." Fix up conflicts due to commits that were already committed earlier in drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits) rbd: create rbd_refresh_helper() rbd: return obj version in __rbd_refresh_header() rbd: fixes in rbd_header_from_disk() rbd: always pass ops array to rbd_req_sync_op() rbd: pass null version pointer in add_snap() rbd: make rbd_create_rw_ops() return a pointer rbd: have __rbd_add_snap_dev() return a pointer libceph: recheck con state after allocating incoming message libceph: change ceph_con_in_msg_alloc convention to be less weird libceph: avoid dropping con mutex before fault libceph: verify state after retaking con lock after dispatch libceph: revoke mon_client messages on session restart libceph: fix handling of immediate socket connect failure ceph: update MAINTAINERS file libceph: be less chatty about stray replies libceph: clear all flags on con_close libceph: clean up con flags libceph: replace connection state bits with states libceph: drop unnecessary CLOSED check in socket state change callback libceph: close socket directly from ceph_con_close() ...
2012-07-31libceph: recheck con state after allocating incoming messageSage Weil
We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-31libceph: change ceph_con_in_msg_alloc convention to be less weirdSage Weil
This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch). Instead, return an normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-31libceph: avoid dropping con mutex before faultSage Weil
The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock). Add annotation so that lockdep realizes we will drop the mutex before returning. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-31libceph: verify state after retaking con lock after dispatchSage Weil
We drop the con mutex when delivering a message. When we retake the lock, we need to verify we are still in the OPEN state before preparing to read the next tag, or else we risk stepping on a connection that has been closed. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-31libceph: fix handling of immediate socket connect failureSage Weil
If the connect() call immediately fails such that sock == NULL, we still need con_close_socket() to reset our socket state to CLOSED. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-31libceph: clear all flags on con_closeSage Weil
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-31libceph: clean up con flagsSage Weil
Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-31libceph: replace connection state bits with statesSage Weil
Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of. This appears to hold up well to stress testing both with and without socket failure injection on the server side. Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-31libceph: drop unnecessary CLOSED check in socket state change callbackSage Weil
If we are CLOSED, the socket is closed and we won't get these. Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-31libceph: close socket directly from ceph_con_close()Sage Weil
It is simpler to do this immediately, since we already hold the con mutex. It also avoids the need to deal with a not-quite-CLOSED socket in con_work. Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-31libceph: drop gratuitous socket close calls in con_workSage Weil
If the state is CLOSED or OPENING, we shouldn't have a socket. Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-31libceph: move ceph_con_send() closed check under the con mutexSage Weil
Take the con mutex before checking whether the connection is closed to avoid racing with someone else closing it. Signed-off-by: Sage Weil <sage@inktank.com>