summaryrefslogtreecommitdiff
path: root/net/tipc/node.h
AgeCommit message (Collapse)Author
2015-10-24tipc: clean up unused code and structuresJon Paul Maloy
After the previous changes in this series, we can now remove some unused code and structures, both in the broadcast, link aggregation and link code. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24tipc: let broadcast packet reception use new link receive functionJon Paul Maloy
The code path for receiving broadcast packets is currently distinct from the unicast path. This leads to unnecessary code and data duplication, something that can be avoided with some effort. We now introduce separate per-peer tipc_link instances for handling broadcast packet reception. Each receive link keeps a pointer to the common, single, broadcast link instance, and can hence handle release and retransmission of send buffers as if they belonged to the own instance. Furthermore, we let each unicast link instance keep a reference to both the pertaining broadcast receive link, and to the common send link. This makes it possible for the unicast links to easily access data for broadcast link synchronization, as well as for carrying acknowledges for received broadcast packets. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24tipc: introduce capability bit for broadcast synchronizationJon Paul Maloy
Until now, we have tried to support both the newer, dedicated broadcast synchronization mechanism along with the older, less safe, RESET_MSG/ ACTIVATE_MSG based one. The latter method has turned out to be a hazard in a highly dynamic cluster, so we find it safer to disable it completely when we find that the former mechanism is supported by the peer node. For this purpose, we now introduce a new capabability bit, TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast syncronization is supported by the present node. The new bit is conveyed between peers in the 'capabilities' field of neighbor discovery messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31tipc: remove implicit message delivery in node_unlock()Jon Paul Maloy
After the most recent changes, all access calls to a link which may entail addition of messages to the link's input queue are postpended by an explicit call to tipc_sk_rcv(), using a reference to the correct queue. This means that the potentially hazardous implicit delivery, using tipc_node_unlock() in combination with a binary flag and a cached queue pointer, now has become redundant. This commit removes this implicit delivery mechanism both for regular data messages and for binding table update messages. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31tipc: move received discovery data evaluation inside node.cJon Paul Maloy
The node lock is currently grabbed and and released in the function tipc_disc_rcv() in the file discover.c. As a preparation for the next commits, we need to move this node lock handling, along with the code area it is covering, to node.c. This commit introduces this change. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31tipc: move link synch and failover to link aggregation levelJon Paul Maloy
Link failover and synchronization have until now been handled by the links themselves, forcing them to have knowledge about and to access parallel links in order to make the two algorithms work correctly. In this commit, we move the control part of this functionality to the link aggregation level in node.c, which is the right location for this. As a result, the two algorithms become easier to follow, and the link implementation becomes simpler. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31tipc: extend node FSMJon Paul Maloy
In the next commit, we will move link synch/failover orchestration to the link aggregation level. In order to do this, we first need to extend the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER, plus four new events to enter and leave those states. This commit introduces this change, without yet making use of it. The node FSM now looks as follows: +-----------------------------------------+ | PEER_DOWN_EVT| | | +------------------------+----------------+ | |SELF_DOWN_EVT | | | | | | | | +-----------+ +-----------+ | | |NODE_ | |NODE_ | | | +----------|FAILINGOVER|<---------|SYNCHING |------------+ | | |SELF_ +-----------+ FAILOVER_+-----------+ PEER_ | | | |DOWN_EVT | A BEGIN_EVT A | DOWN_EVT| | | | | | | | | | | | | | | | | | | | |FAILOVER_|FAILOVER_ |SYNCH_ |SYNCH_ | | | | |END_EVT |BEGIN_EVT |BEGIN_EVT|END_EVT | | | | | | | | | | | | | | | | | | | | | +--------------+ | | | | | +------->| SELF_UP_ |<-------+ | | | | +----------------| PEER_UP |------------------+ | | | | |SELF_DOWN_EVT +--------------+ PEER_DOWN_EVT| | | | | | A A | | | | | | | | | | | | | | PEER_UP_EVT| |SELF_UP_EVT | | | | | | | | | | | V V V | | V V V +------------+ +-----------+ +-----------+ +------------+ |SELF_DOWN_ | |SELF_UP_ | |PEER_UP_ | |PEER_DOWN | |PEER_LEAVING|<------|PEER_COMING| |SELF_COMING|------>|SELF_LEAVING| +------------+ SELF_ +-----------+ +-----------+ PEER_ +------------+ | DOWN_EVT A A DOWN_EVT | | | | | | | | | | SELF_UP_EVT| |PEER_UP_EVT | | | | | | | | | |PEER_DOWN_EVT +--------------+ SELF_DOWN_EVT| +------------------->| SELF_DOWN_ |<--------------------+ | PEER_DOWN | +--------------+ Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31tipc: move all link_reset() calls to link aggregation levelJon Paul Maloy
In line with our effort to let the node level have full control over its links, we want to move all link reset calls from link.c to node.c. Some of the calls can be moved by simply moving the calling function, when this is the right thing to do. For the remaining calls we use the now established technique of returning a TIPC_LINK_DOWN_EVT flag from tipc_link_rcv(), whereafter we perform the reset call when the call returns. This change serves as a preparation for the coming commits. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21tipc: reduce locking scope during packet receptionJon Paul Maloy
We convert packet/message reception according to the same principle we have been using for message sending and timeout handling: We move the function tipc_rcv() to node.c, hence handling the initial packet reception at the link aggregation level. The function grabs the node lock, selects the receiving link, and accesses it via a new call tipc_link_rcv(). This function appends buffers to the input queue for delivery upwards, but it may also append outgoing packets to the xmit queue, just as we do during regular message sending. The latter will happen when buffers are forwarded from the link backlog, or when retransmission is requested. Upon return of this function, and after having released the node lock, tipc_rcv() delivers/tranmsits the contents of those queues, but it may also perform actions such as link activation or reset, as indicated by the return flags from the link. This reduces the number of cpu cycles spent inside the node spinlock, and reduces contention on that lock. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21tipc: introduce node contact FSMJon Paul Maloy
The logics for determining when a node is permitted to establish and maintain contact with its peer node becomes non-trivial in the presence of multiple parallel links that may come and go independently. A known failure scenario is that one endpoint registers both its links to the peer lost, cleans up it binding table, and prepares for a table update once contact is re-establihed, while the other endpoint may see its links reset and re-established one by one, hence seeing no need to re-synchronize the binding table. To avoid this, a node must not allow re-establishing contact until it has confirmation that even the peer has lost both links. Currently, the mechanism for handling this consists of setting and resetting two state flags from different locations in the code. This solution is hard to understand and maintain. A closer analysis even reveals that it is not completely safe. In this commit we do instead introduce an FSM that keeps track of the conditions for when the node can establish and maintain links. It has six states and four events, and is strictly based on explicit knowledge about the own node's and the peer node's contact states. Only events leading to state change are shown as edges in the figure below. +--------------+ | SELF_UP/ | +---------------->| PEER_COMING |-----------------+ SELF_ | +--------------+ |PEER_ ESTBL_ | | |ESTBL_ CONTACT| SELF_LOST_CONTACT | |CONTACT | v | | +--------------+ | | PEER_ | SELF_DOWN/ | SELF_ | | LOST_ +--| PEER_LEAVING |<--+ LOST_ v +-------------+ CONTACT | +--------------+ | CONTACT +-----------+ | SELF_DOWN/ |<----------+ +----------| SELF_UP/ | | PEER_DOWN |<----------+ +----------| PEER_UP | +-------------+ SELF_ | +--------------+ | PEER_ +-----------+ | LOST_ +--| SELF_LEAVING/|<--+ LOST_ A | CONTACT | PEER_DOWN | CONTACT | | +--------------+ | | A | PEER_ | PEER_LOST_CONTACT | |SELF_ ESTBL_ | | |ESTBL_ CONTACT| +--------------+ |CONTACT +---------------->| PEER_UP/ |-----------------+ | SELF_COMING | +--------------+ Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21tipc: move link supervision timer to node levelJon Paul Maloy
In our effort to move control of the links to the link aggregation layer, we move the perodic link supervision timer to struct tipc_node. The new timer is shared between all links belonging to the node, thus saving resources, while still kicking the FSM on both its pertaining links at each expiration. The current link timer and corresponding functions are removed. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21tipc: make media xmit call outside node spinlock contextJon Paul Maloy
Currently, message sending is performed through a deep call chain, where the node spinlock is grabbed and held during a significant part of the transmission time. This is clearly detrimental to overall throughput performance; it would be better if we could send the message after the spinlock has been released. In this commit, we do instead let the call revert on the stack after the buffer chain has been added to the transmission queue, whereafter clones of the buffers are transmitted to the device layer outside the spinlock scope. As a further step in our effort to separate the roles of the node and link entities we also move the function tipc_link_xmit() to node.c, and rename it to tipc_node_xmit(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21tipc: use bearer index when looking up active linksJon Paul Maloy
struct tipc_node currently holds two arrays of link pointers; one, indexed by bearer identity, which contains all links irrespective of current state, and one two-slot array for the currently active link or links. The latter array contains direct pointers into the elements of the former. This has the effect that we cannot know the bearer id of a link when accessing it via the "active_links[]" array without actually dereferencing the pointer, something we want to avoid in some cases. In this commit, we do instead store the bearer identity in the "active_links" array, and use this as an index to find the right element in the overall link entry array. This change should be seen as a preparation for the later commits in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21tipc: move link input queue to tipc_nodeJon Paul Maloy
At present, the link input queue and the name distributor receive queues are fields aggregated in struct tipc_link. This is a hazard, because a link might be deleted while a receiving socket still keeps reference to one of the queues. This commit fixes this bug. However, rather than adding yet another reference counter to the critical data path, we move the two queues to safe ground inside struct tipc_node, which is already protected, and let the link code only handle references to the queues. This is also in line with planned later changes in this area. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21tipc: move link creation from neighbor discoverer to nodeJon Paul Maloy
As a step towards turning links into node internal entities, we move the creation of links from the neighbor discovery logics to the node's link control logics. We also create an additional entry for the link's media address in the newly introduced struct tipc_link_entry, since this is where it is needed in the upcoming commits. The current copy in struct tipc_link is kept for now, but will be removed later. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21tipc: introduce link entry structure to struct tipc_nodeJon Paul Maloy
struct 'tipc_node' currently contains two arrays for link attributes, one for the link pointers, and one for the usable link MTUs. We now group those into a new struct 'tipc_link_entry', and intoduce one single array consisting of such enties. Apart from being a cosmetic improvement, this is a starting point for the strict master-slave relation between node and link that we will introduce in the following commits. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14tipc: simplify include dependenciesJon Paul Maloy
When we try to add new inline functions in the code, we sometimes run into circular include dependencies. The main problem is that the file core.h, which really should be at the root of the dependency chain, instead is a leaf. I.e., core.h includes a number of header files that themselves should be allowed to include core.h. In reality this is unnecessary, because core.h does not need to know the full signature of any of the structs it refers to, only their type declaration. In this commit, we remove all dependencies from core.h towards any other tipc header file. As a consequence of this change, we can now move the function tipc_own_addr(net) from addr.c to addr.h, and make it inline. There are no functional changes in this commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29tipc: involve reference counter for node structureYing Xue
TIPC node hash node table is protected with rcu lock on read side. tipc_node_find() is used to look for a node object with node address through iterating the hash node table. As the entire process of what tipc_node_find() traverses the table is guarded with rcu read lock, it's safe for us. However, when callers use the node object returned by tipc_node_find(), there is no rcu read lock applied. Therefore, this is absolutely unsafe for callers of tipc_node_find(). Now we introduce a reference counter for node structure. Before tipc_node_find() returns node object to its caller, it first increases the reference counter. Accordingly, after its caller used it up, it decreases the counter again. This can prevent a node being used by one thread from being freed by another thread. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29tipc: fix potential deadlock when all links are resetYing Xue
[ 60.988363] ====================================================== [ 60.988754] [ INFO: possible circular locking dependency detected ] [ 60.989152] 3.19.0+ #194 Not tainted [ 60.989377] ------------------------------------------------------- [ 60.989781] swapper/3/0 is trying to acquire lock: [ 60.990079] (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc] [ 60.990743] [ 60.990743] but task is already holding lock: [ 60.991106] (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.991738] [ 60.991738] which lock already depends on the new lock. [ 60.991738] [ 60.992174] [ 60.992174] the existing dependency chain (in reverse order) is: [ 60.992174] -> #1 (&(&bclink->lock)->rlock){+.-...}: [ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 60.992174] [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.992174] [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc] [ 60.992174] [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc] [ 60.992174] [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc] [ 60.992174] [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc] [ 60.992174] [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc] [ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc] [ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980 [ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70 [ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130 [ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0 [ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490 [ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990 [ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360 [ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360 [ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0 [ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110 [ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13 [ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20 [ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0 [ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150 [ 60.992174] -> #0 (&(&n_ptr->lock)->rlock){+.-...}: [ 60.992174] [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0 [ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 60.992174] [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc] [ 60.992174] [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc] [ 60.992174] [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc] [ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc] [ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980 [ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70 [ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130 [ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0 [ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490 [ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990 [ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360 [ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360 [ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0 [ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110 [ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13 [ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20 [ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0 [ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150 [ 60.992174] [ 60.992174] other info that might help us debug this: [ 60.992174] [ 60.992174] Possible unsafe locking scenario: [ 60.992174] [ 60.992174] CPU0 CPU1 [ 60.992174] ---- ---- [ 60.992174] lock(&(&bclink->lock)->rlock); [ 60.992174] lock(&(&n_ptr->lock)->rlock); [ 60.992174] lock(&(&bclink->lock)->rlock); [ 60.992174] lock(&(&n_ptr->lock)->rlock); [ 60.992174] [ 60.992174] *** DEADLOCK *** [ 60.992174] [ 60.992174] 3 locks held by swapper/3/0: [ 60.992174] #0: (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980 [ 60.992174] #1: (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc] [ 60.992174] #2: (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.992174] The correct the sequence of grabbing n_ptr->lock and bclink->lock should be that the former is first held and the latter is then taken, which exactly happened on CPU1. But especially when the retransmission of broadcast link is failed, bclink->lock is first held in tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure() called by tipc_link_retransmit() subsequently, which is demonstrated on CPU0. As a result, deadlock occurs. If the order of holding the two locks happening on CPU0 is reversed, the deadlock risk will be relieved. Therefore, the node lock taken in link_retransmit_failure() originally is moved to tipc_bclink_rcv() so that it's obtained before bclink lock. But the precondition of the adjustment of node lock is that responding to bclink reset event must be moved from tipc_bclink_unlock() to tipc_node_unlock(). Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14tipc: split link outqueueJon Paul Maloy
struct tipc_link contains one single queue for outgoing packets, where both transmitted and waiting packets are queued. This infrastructure is hard to maintain, because we need to keep a number of fields to keep track of which packets are sent or unsent, and the number of packets in each category. A lot of code becomes simpler if we split this queue into a transmission queue, where sent/unacknowledged packets are kept, and a backlog queue, where we keep the not yet sent packets. In this commit we do this separation. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-14tipc: add framework for node capabilities exchangeJon Paul Maloy
The TIPC protocol spec has defined a 13 bit capability bitmap in the neighbor discovery header, as a means to maintain compatibility between different code and protocol generations. Until now this field has been unused. We now introduce the basic framework for exchanging capabilities between nodes at first contact. After exchange, a peer node's capabilities are stored as a 16 bit bitmap in struct tipc_node. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl node dump to nl compatRichard Alpe
Convert TIPC_CMD_GET_NODES to compat dumpit and remove global node counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09tipc: convert legacy nl link dump to nl compatRichard Alpe
Convert TIPC_CMD_GET_LINKS to compat dumpit and remove global link counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06tipc: eliminate race condition at multicast receptionJon Paul Maloy
In a previous commit in this series we resolved a race problem during unicast message reception. Here, we resolve the same problem at multicast reception. We apply the same technique: an input queue serializing the delivery of arriving buffers. The main difference is that here we do it in two steps. First, the broadcast link feeds arriving buffers into the tail of an arrival queue, which head is consumed at the socket level, and where destination lookup is performed. Second, if the lookup is successful, the resulting buffer clones are fed into a second queue, the input queue. This queue is consumed at reception in the socket just like in the unicast case. Both queues are protected by the same lock, -the one of the input queue. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06tipc: resolve race problem at unicast message receptionJon Paul Maloy
TIPC handles message cardinality and sequencing at the link layer, before passing messages upwards to the destination sockets. During the upcall from link to socket no locks are held. It is therefore possible, and we see it happen occasionally, that messages arriving in different threads and delivered in sequence still bypass each other before they reach the destination socket. This must not happen, since it violates the sequentiality guarantee. We solve this by adding a new input buffer queue to the link structure. Arriving messages are added safely to the tail of that queue by the link, while the head of the queue is consumed, also safely, by the receiving socket. Sequentiality is secured per socket by only allowing buffers to be dequeued inside the socket lock. Since there may be multiple simultaneous readers of the queue, we use a 'filter' parameter to reduce the risk that they peek the same buffer from the queue, hence also reducing the risk of contention on the receiving socket locks. This solves the sequentiality problem, and seems to cause no measurable performance degradation. A nice side effect of this change is that lock handling in the functions tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that will enable future simplifications of those functions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12tipc: make tipc node table aware of net namespaceYing Xue
Global variables associated with node table are below: - node table list (node_htable) - node hash table list (tipc_node_list) - node table lock (node_list_lock) - node number counter (tipc_num_nodes) - node link number counter (tipc_num_links) To make node table support namespace, above global variables must be moved to tipc_net structure in order to keep secret for different namespaces. As a consequence, these variables are allocated and initialized when namespace is created, and deallocated when namespace is destroyed. After the change, functions associated with these variables have to utilize a namespace pointer to access them. So adding namespace pointer as a parameter of these functions is the major change made in the commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26tipc: use generic SKB list APIs to manage deferred queue of linkYing Xue
Use standard SKB list APIs associated with struct sk_buff_head to manage link's deferred queue, simplifying relevant code. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26tipc: remove node subscription infrastructureYing Xue
The node subscribe infrastructure represents a virtual base class, so its users, such as struct tipc_port and struct publication, can derive its implemented functionalities. However, after the removal of struct tipc_port, struct publication is left as its only single user now. So defining an abstract infrastructure for one user becomes no longer reasonable. If corresponding new functions associated with the infrastructure are moved to name_table.c file, the node subscription infrastructure can be removed as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21tipc: add node get/dump to new netlink apiRichard Alpe
Add TIPC_NL_NODE_GET to the new tipc netlink API. This command can dump the address and node status of all nodes in the tipc cluster. Netlink logical layout of returned node/address data: -> node -> address -> up flag Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-21tipc: fix a potential deadlockYing Xue
Locking dependency detected below possible unsafe locking scenario: CPU0 CPU1 T0: tipc_named_rcv() tipc_rcv() T1: [grab nametble write lock]* [grab node lock]* T2: tipc_update_nametbl() tipc_node_link_up() T3: tipc_nodesub_subscribe() tipc_nametbl_publish() T4: [grab node lock]* [grab nametble write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the the updating of the name table after link state named out of node lock, the reverse order of holding locks will be eliminated, and as a result, the deadlock risk. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07tipc: fix bug in multicast congestion handlingJon Maloy
One aim of commit 50100a5e39461b2a61d6040e73c384766c29975d ("tipc: use pseudo message to wake up sockets after link congestion") was to handle link congestion abatement in a uniform way for both unicast and multicast transmit. However, the latter doesn't work correctly, and has been broken since the referenced commit was applied. If a user now sends a burst of multicast messages that is big enough to cause broadcast link congestion, it will be put to sleep, and not be waked up when the congestion abates as it should be. This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS, is set correctly, but in the wrong field. Instead of setting it in the 'action_flags' field of the arrival node struct, it is by mistake set in the dummy node struct that is owned by the broadcast link, where it will never tested for. Second, we cannot use the same flag for waking up unicast and multicast users, since the function tipc_node_unlock() needs to pick the wakeup pseudo messages to deliver from different queues. It must hence be able to distinguish between the two cases. This commit solves this problem by adding a new flag TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user(). The latter is to be called by tipc_node_unlock() when the named flag, now set in the correct field, is encountered. v2: using explicit 'unsigned int' declaration instead of 'uint', as per comment from David Miller. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: use message to abort connections when losing contact to nodeJon Paul Maloy
In the current implementation, each 'struct tipc_node' instance keeps a linked list of those ports/sockets that are connected to the node represented by that struct. The purpose of this is to let the node object know which sockets to alert when it loses contact with its peer node, i.e., which sockets need to have their connections aborted. This entails an unwanted direct reference from the node structure back to the port/socket structure, and a need to grab port_lock when we have to make an upcall to the port. We want to get rid of this unecessary BH entry point into the socket, and also eliminate its use of port_lock. In this commit, we instead let the node struct keep list of "connected socket" structs, which each represents a connected socket, but is allocated independently by the node at the moment of connection. If the node loses contact with its peer node, the list is traversed, and a "connection abort" message is created for each entry in the list. The message is sent to it respective connected socket using the ordinary data path, and the receiving socket aborts its connections upon reception of the message. This enables us to get rid of the direct reference from 'struct node' to ´struct port', and another unwanted BH access point to the latter. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-23tipc: use pseudo message to wake up sockets after link congestionJon Paul Maloy
The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep list of a "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27tipc: make link mtu easily accessible from socketJon Paul Maloy
Message fragmentation is currently performed at link level, inside the protection of node_lock. This potentially binds up the sending link structure for a long time, instead of letting it do other tasks, such as handle reception of new packets. In this commit, we make the MTUs of each active link become easily accessible from the socket level, i.e., without taking any spinlock or dereferencing the target link pointer. This way, we make it possible to perform fragmentation in the sending socket, before sending the whole fragment chain to the link for transport. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14tipc: rename and move message reassembly functionJon Paul Maloy
The function tipc_link_frag_rcv() is in reality a re-entrant generic message reassemby function that has nothing in particular to do with the link, where it is defined now. This becomes obvious when we see the need to call the function from other places in the code. In this commit rename it to tipc_buf_append() and move it to the file msg.c. We also simplify its signature by moving the tail pointer to the control block of the head buffer, hence making the head buffer self-contained. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-09tipc: rename enum names of node flagsYing Xue
Rename node flags to action_flags as well as its enum names so that they can reflect its real meanings. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05tipc: avoid to asynchronously deliver name tables to peer nodeYing Xue
Postpone the actions of delivering name tables until after node lock is released, avoiding to do it under asynchronous context. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05tipc: remove TIPC_NAMES_GONE node flagYing Xue
Since previously what all publications pertaining to the lost node were removed from name table was finished in tasklet context asynchronously, we need to TIPC_NAMES_GONE flag indicating whether the node cleanup work is finished or not. But now as the cleanup work has been finished when node lock is released, the flag becomes meaningless for us. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05tipc: avoid to asynchronously notify subscriptionsYing Xue
Postpone the actions of notifying subscriptions until after node lock is released, avoiding to asynchronously execute registered handlers when node is lost. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05tipc: rename setup_blocked variable of node struct to flagsYing Xue
Rename setup_blocked variable of node struct to a more common name called "flags", which will be used to represent kinds of node states. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05tipc: adjust order of variables in tipc_node structureYing Xue
Move more frequently used variables up to the head of tipc_node structure, hopefully improving a bit performance. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-26tipc: add ioctl to fetch link namesErik Hugne
We add a new ioctl for AF_TIPC that can be used to fetch the logical name for a link to a remote node on a given bearer. This should be used in combination with link state subscriptions. The logical name size limit definitions are moved to tipc.h, as they are now also needed by the new ioctl. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27tipc: tipc: convert node list and node hlist to RCU listsYing Xue
Convert tipc_node_list list and node_htable hash list to RCU lists. On read side, the two lists are protected with RCU read lock, and on update side, node_list_lock is applied to them. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27tipc: rename node create lock to protect node list and hlistYing Xue
When a node is created, tipc_net_lock read lock is first held and then node_create_lock is grabbed in order to prevent the same node from being created and inserted into both node list and hlist twice. But when we query node from the two node lists, we only hold tipc_net_lock read lock without grabbing node_create_lock. Obviously this locking policy is unable to guarantee that the two node lists are always synchronized especially when the operation of changing and accessing them occurs in different contexts like currently doing. Therefore, rename node_create_lock to node_list_lock to protect the two node lists, that is, whenever node is inserted into them or node is queried from them, the node_list_lock should be always held. As a result, tipc_net_lock read lock becomes redundant and then can be removed from the node query functions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07tipc: remove 'has_redundant_link' flag from STATE link protocol messagesJon Paul Maloy
The flag 'has_redundant_link' is defined only in RESET and ACTIVATE protocol messages. Due to an ambiguity in the protocol specification it is currently also transferred in STATE messages. Its value is used to initialize a link state variable, 'permit_changeover', which is used to inhibit futile link failover attempts when it is known that the peer node has no working links at the moment, although the local node may still think it has one. The fact that 'has_redundant_link' incorrectly is read from STATE messages has the effect that 'permit_changeover' sometimes gets a wrong value, and permanently blocks any links from being re-established. Such failures can only occur in in dual-link systems, and are extremely rare. This bug seems to have always been present in the code. Furthermore, since commit b4b5610223f17790419b03eaa962b0e3ecf930d7 ("tipc: Ensure both nodes recognize loss of contact between them"), the 'permit_changeover' field serves no purpose any more. The task of enforcing 'lost contact' cycles at both peer endpoints is now taken by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN in struct tipc_node to abort unnecessary failover attempts. We therefore remove the 'has_redundant_link' flag from STATE messages, as well as the now redundant 'permit_changeover' variable. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05tipc: remove unused codestephen hemminger
Remove dead code; tipc_bearer_find_interface tipc_node_redundant_links This may break out of tree version of TIPC if there still is one. But that maybe a good thing :-) Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07tipc: message reassembly using fragment chainErik Hugne
When the first fragment of a long data data message is received on a link, a reassembly buffer large enough to hold the data from this and all subsequent fragments of the message is allocated. The payload of each new fragment is copied into this buffer upon arrival. When the last fragment is received, the reassembled message is delivered upwards to the port/socket layer. Not only is this an inefficient approach, but it may also cause bursts of reassembly failures in low memory situations. since we may fail to allocate the necessary large buffer in the first place. Furthermore, after 100 subsequent such failures the link will be reset, something that in reality aggravates the situation. To remedy this problem, this patch introduces a different approach. Instead of allocating a big reassembly buffer, we now append the arriving fragments to a reassembly chain on the link, and deliver the whole chain up to the socket layer once the last fragment has been received. This is safe because the retransmission layer of a TIPC link always delivers packets in strict uninterrupted order, to the reassembly layer as to all other upper layers. Hence there can never be more than one fragment chain pending reassembly at any given time in a link, and we can trust (but still verify) that the fragments will be chained up in the correct order. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-22tipc: rename supported flag to recv_permittedYing Xue
Rename the "supported" flag in bclink structure to "recv_permitted" to better reflect what it is used for. When this flag is set for a given node, we are permitted to receive and acknowledge broadcast messages from that node. Convert it to a bool at the same time, since it is not used to store any numerical values. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-22tipc: remove supportable flag from bclink structureYing Xue
The "supportable" flag in bclink structure is a compatibility flag indicating whether a peer node is capable of receiving TIPC broadcast messages. However, all TIPC versions since tipc-1.5, and after the inclusion in the upstream Linux kernel in 2006, support this capability. It is highly unlikely that anybody is still using such an old version of TIPC, let alone that they want to mix it with TIPC-2.0 nodes. Therefore, we now remove the "supportable" flag. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-30tipc: compress out gratuitous extra carriage returnsPaul Gortmaker
Some of the comment blocks are floating in limbo between two functions, or between blocks of code. Delete the extra line feeds between any comment and its associated following block of code, to be consistent with the majority of the rest of the kernel. Also delete trailing newlines at EOF and fix a couple trivial typos in existing comments. This is a 100% cosmetic change with no runtime impact. We get rid of over 500 lines of non-code, and being blank line deletes, they won't even show up as noise in git blame. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>