From 27915aa61060fd8954a68a86657784705955088a Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 2 Nov 2016 18:14:43 +0100 Subject: batman-adv: Revert "fix splat on disabling an interface" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The commit 9799c50372b2 ("batman-adv: fix splat on disabling an interface") fixed a warning but at the same time broke the rtnl function add_slave for devices which were temporarily removed. batadv_softif_slave_add requires soft_iface of and hard_iface to be NULL before it is allowed to be enslaved. But this resetting of soft_iface to NULL in batadv_hardif_disable_interface was removed with the aforementioned commit. Reported-by: Julian Labus Signed-off-by: Sven Eckelmann Acked-by: Linus Lüssing Signed-off-by: Simon Wunderlich diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c index e034afb..08ce361 100644 --- a/net/batman-adv/hard-interface.c +++ b/net/batman-adv/hard-interface.c @@ -652,6 +652,7 @@ void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface, batadv_softif_destroy_sysfs(hard_iface->soft_iface); } + hard_iface->soft_iface = NULL; batadv_hardif_put(hard_iface); out: -- cgit v0.10.2 From e13258f38e927b61cdb5f4ad25309450d3b127d1 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Sat, 29 Oct 2016 09:18:43 +0200 Subject: batman-adv: Detect missing primaryif during tp_send as error The throughput meter detects different situations as problems for the current test. It stops the test after these and reports it to userspace. This also has to be done when the primary interface disappeared during the test. Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Reported-by: Joe Perches Signed-off-by: Sven Eckelmann Signed-off-by: Simon Wunderlich diff --git a/net/batman-adv/tp_meter.c b/net/batman-adv/tp_meter.c index 2333777..8af1611 100644 --- a/net/batman-adv/tp_meter.c +++ b/net/batman-adv/tp_meter.c @@ -837,6 +837,7 @@ static int batadv_tp_send(void *arg) primary_if = batadv_primary_if_get_selected(bat_priv); if (unlikely(!primary_if)) { err = BATADV_TP_REASON_DST_UNREACHABLE; + tp_vars->reason = err; goto out; } -- cgit v0.10.2 From c8eaf3479e521e973eb2d4111b8ee8f5b7b564ab Mon Sep 17 00:00:00 2001 From: Filip Matusiak Date: Wed, 2 Nov 2016 10:04:26 +0100 Subject: mac80211: Ignore VHT IE from peer with wrong rx_mcs_map This is a workaround for VHT-enabled STAs which break the spec and have the VHT-MCS Rx map filled in with value 3 for all eight spacial streams, an example is AR9462 in AP mode. As per spec, in section 22.1.1 Introduction to the VHT PHY A VHT STA shall support at least single spactial stream VHT-MCSs 0 to 7 (transmit and receive) in all supported channel widths. Some devices in STA mode will get firmware assert when trying to associate, examples are QCA9377 & QCA6174. Packet example of broken VHT Cap IE of AR9462: Tag: VHT Capabilities (IEEE Std 802.11ac/D3.1) Tag Number: VHT Capabilities (IEEE Std 802.11ac/D3.1) (191) Tag length: 12 VHT Capabilities Info: 0x00000000 VHT Supported MCS Set Rx MCS Map: 0xffff .... .... .... ..11 = Rx 1 SS: Not Supported (0x0003) .... .... .... 11.. = Rx 2 SS: Not Supported (0x0003) .... .... ..11 .... = Rx 3 SS: Not Supported (0x0003) .... .... 11.. .... = Rx 4 SS: Not Supported (0x0003) .... ..11 .... .... = Rx 5 SS: Not Supported (0x0003) .... 11.. .... .... = Rx 6 SS: Not Supported (0x0003) ..11 .... .... .... = Rx 7 SS: Not Supported (0x0003) 11.. .... .... .... = Rx 8 SS: Not Supported (0x0003) ...0 0000 0000 0000 = Rx Highest Long GI Data Rate (in Mb/s, 0 = subfield not in use): 0x0000 Tx MCS Map: 0xffff ...0 0000 0000 0000 = Tx Highest Long GI Data Rate (in Mb/s, 0 = subfield not in use): 0x0000 Signed-off-by: Filip Matusiak Signed-off-by: Johannes Berg diff --git a/net/mac80211/vht.c b/net/mac80211/vht.c index ee71576..6832bf6 100644 --- a/net/mac80211/vht.c +++ b/net/mac80211/vht.c @@ -270,6 +270,22 @@ ieee80211_vht_cap_ie_to_sta_vht_cap(struct ieee80211_sub_if_data *sdata, vht_cap->vht_mcs.tx_mcs_map |= cpu_to_le16(peer_tx << i * 2); } + /* + * This is a workaround for VHT-enabled STAs which break the spec + * and have the VHT-MCS Rx map filled in with value 3 for all eight + * spacial streams, an example is AR9462. + * + * As per spec, in section 22.1.1 Introduction to the VHT PHY + * A VHT STA shall support at least single spactial stream VHT-MCSs + * 0 to 7 (transmit and receive) in all supported channel widths. + */ + if (vht_cap->vht_mcs.rx_mcs_map == cpu_to_le16(0xFFFF)) { + vht_cap->vht_supported = false; + sdata_info(sdata, "Ignoring VHT IE from %pM due to invalid rx_mcs_map\n", + sta->addr); + return; + } + /* finally set up the bandwidth */ switch (vht_cap->cap & IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_MASK) { case IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ: -- cgit v0.10.2 From 6c18a6b4e79953ba38bc110e1e42ac45a951b25f Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Thu, 3 Nov 2016 12:12:47 +0100 Subject: Revert "mac80211: allow using AP_LINK_PS with mac80211-generated TIM IE" This reverts commit c68df2e7be0c1238ea3c281fd744a204ef3b15a0. __sta_info_recalc_tim turns into a no-op if local->ops->set_tim is not set. This prevents the beacon TIM bit from being set for all drivers that do not implement this op (almost all of them), thus thoroughly essential AP mode powersave functionality. Cc: Emmanuel Grumbach Fixes: c68df2e7be0c ("mac80211: allow using AP_LINK_PS with mac80211-generated TIM IE") Signed-off-by: Felix Fietkau Signed-off-by: Johannes Berg diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 78e9ecb..8e05032 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -688,7 +688,7 @@ static void __sta_info_recalc_tim(struct sta_info *sta, bool ignore_pending) } /* No need to do anything if the driver does all */ - if (!local->ops->set_tim) + if (ieee80211_hw_check(&local->hw, AP_LINK_PS)) return; if (sta->dead) -- cgit v0.10.2 From 8fdd136f2200e6b7237e7e48453f4a591d768e3e Mon Sep 17 00:00:00 2001 From: "Pedersen, Thomas" Date: Mon, 31 Oct 2016 11:28:40 -0700 Subject: cfg80211: add bitrate for 20MHz MCS 9 Some drivers (ath10k) report MCS 9 @ 20MHz, which technically isn't defined. To get more meaningful value than 0 out of this however, just extrapolate a bitrate from ratio of MCS 7 and 9 in channels where it is allowed. Signed-off-by: Thomas Pedersen [add a comment about it in the code] Signed-off-by: Johannes Berg diff --git a/net/wireless/util.c b/net/wireless/util.c index 5ea12af..659b507 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -1158,7 +1158,8 @@ static u32 cfg80211_calculate_bitrate_vht(struct rate_info *rate) 58500000, 65000000, 78000000, - 0, + /* not in the spec, but some devices use this: */ + 86500000, }, { 13500000, 27000000, -- cgit v0.10.2 From c1f4c9ede3c799da9f920c1df9ce524145781637 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Fri, 4 Nov 2016 10:27:52 +0100 Subject: mac80211: update A-MPDU flag on tx dequeue MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The sequence number counter is used to derive the starting sequence number. Since that counter is updated on tx dequeue, the A-MPDU flag needs to be up to date at the tme of dequeue as well. This patch prevents sending more A-MPDU frames after the session has been terminated and also ensures that aggregation starts right after the session has been established Fixes: bb42f2d13ffc ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue") Signed-off-by: Felix Fietkau Acked-by: Toke Høiland-Jørgensen Signed-off-by: Johannes Berg diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 1c56abc..d08a849 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -3426,6 +3426,11 @@ begin: goto begin; } + if (test_bit(IEEE80211_TXQ_AMPDU, &txqi->flags)) + info->flags |= IEEE80211_TX_CTL_AMPDU; + else + info->flags &= ~IEEE80211_TX_CTL_AMPDU; + if (info->control.flags & IEEE80211_TX_CTRL_FAST_XMIT) { struct sta_info *sta = container_of(txq->sta, struct sta_info, sta); -- cgit v0.10.2 From fff712cbe38b6d4e211df9c22aabcfd9739c1c2a Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Fri, 4 Nov 2016 10:27:53 +0100 Subject: mac80211: remove bogus skb vif assignment MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The call to ieee80211_txq_enqueue overwrites the vif pointer with the codel enqueue time, so setting it just before that call makes no sense. Signed-off-by: Felix Fietkau Acked-by: Toke Høiland-Jørgensen Signed-off-by: Johannes Berg diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index d08a849..fb73e86 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -1501,7 +1501,6 @@ static bool ieee80211_queue_skb(struct ieee80211_local *local, struct sta_info *sta, struct sk_buff *skb) { - struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); struct fq *fq = &local->fq; struct ieee80211_vif *vif; struct txq_info *txqi; @@ -1526,8 +1525,6 @@ static bool ieee80211_queue_skb(struct ieee80211_local *local, if (!txqi) return false; - info->control.vif = vif; - spin_lock_bh(&fq->lock); ieee80211_txq_enqueue(local, txqi, skb); spin_unlock_bh(&fq->lock); -- cgit v0.10.2 From a786f96da0d657bf8bd56d8eebb3f31cc45605bb Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Fri, 4 Nov 2016 10:27:54 +0100 Subject: mac80211: fix A-MSDU aggregation with fast-xmit + txq MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A-MSDU aggregation alters the QoS header after a frame has been enqueued, so it needs to be ready before enqueue and not overwritten again afterwards Fixes: bb42f2d13ffc ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue") Signed-off-by: Felix Fietkau Acked-by: Toke Høiland-Jørgensen Signed-off-by: Johannes Berg diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index fb73e86..bd5f4be 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -3210,7 +3210,6 @@ static void ieee80211_xmit_fast_finish(struct ieee80211_sub_if_data *sdata, if (hdr->frame_control & cpu_to_le16(IEEE80211_STYPE_QOS_DATA)) { tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK; - *ieee80211_get_qos_ctl(hdr) = tid; hdr->seq_ctrl = ieee80211_tx_next_seq(sta, tid); } else { info->flags |= IEEE80211_TX_CTL_ASSIGN_SEQ; @@ -3335,6 +3334,11 @@ static bool ieee80211_xmit_fast(struct ieee80211_sub_if_data *sdata, (tid_tx ? IEEE80211_TX_CTL_AMPDU : 0); info->control.flags = IEEE80211_TX_CTRL_FAST_XMIT; + if (hdr->frame_control & cpu_to_le16(IEEE80211_STYPE_QOS_DATA)) { + tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK; + *ieee80211_get_qos_ctl(hdr) = tid; + } + __skb_queue_head_init(&tx.skbs); tx.flags = IEEE80211_TX_UNICAST; -- cgit v0.10.2 From 4fb7f8af1f4c14a2a6cee7c9ff0cf999d918c72d Mon Sep 17 00:00:00 2001 From: Benjamin Beichler Date: Fri, 11 Nov 2016 17:37:56 +0100 Subject: mac80211_hwsim: fix beacon delta calculation Due to the cast from uint32_t to int64_t, a wrong next beacon timing is calculated and effectively the beacon timer stops working. This is especially bad for 802.11s mesh networks, because discovery breaks without beacons. Signed-off-by: Benjamin Beichler Signed-off-by: Johannes Berg diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 431f13b..d3bad57 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -826,7 +826,7 @@ static void mac80211_hwsim_set_tsf(struct ieee80211_hw *hw, data->bcn_delta = do_div(delta, bcn_int); } else { data->tsf_offset -= delta; - data->bcn_delta = -do_div(delta, bcn_int); + data->bcn_delta = -(s64)do_div(delta, bcn_int); } } -- cgit v0.10.2 From f6c365fad1034c66f9969d1435ffad9102f966bb Mon Sep 17 00:00:00 2001 From: Jia Jie Ho Date: Mon, 14 Nov 2016 17:06:49 +0800 Subject: net: ethernet: Fix SGMII unable to switch speed and autonego failure TSE PCS SGMII ethernet has an issue where switching speed doesn't work caused by a faulty register macro offset. This fixes the issue. Signed-off-by: Jia Jie Ho Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c b/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c index 2920e2e..489ef14 100644 --- a/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c +++ b/drivers/net/ethernet/stmicro/stmmac/altr_tse_pcs.c @@ -63,8 +63,8 @@ #define TSE_PCS_SGMII_LINK_TIMER_0 0x0D40 #define TSE_PCS_SGMII_LINK_TIMER_1 0x0003 #define TSE_PCS_SW_RESET_TIMEOUT 100 -#define TSE_PCS_USE_SGMII_AN_MASK BIT(2) -#define TSE_PCS_USE_SGMII_ENA BIT(1) +#define TSE_PCS_USE_SGMII_AN_MASK BIT(1) +#define TSE_PCS_USE_SGMII_ENA BIT(0) #define SGMII_ADAPTER_CTRL_REG 0x00 #define SGMII_ADAPTER_DISABLE 0x0001 -- cgit v0.10.2 From 24803f38a5c0b6c57ed800b47e695f9ce474bc3a Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Mon, 14 Nov 2016 16:16:28 +0800 Subject: igmp: do not remove igmp souce list info when set link down In commit 24cf3af3fed5 ("igmp: call ip_mc_clear_src..."), we forgot to remove igmpv3_clear_delrec() in ip_mc_down(), which also called ip_mc_clear_src(). This make us clear all IGMPv3 source filter info after NETDEV_DOWN. Move igmpv3_clear_delrec() to ip_mc_destroy_dev() and then no need ip_mc_clear_src() in ip_mc_destroy_dev(). On the other hand, we should restore back instead of free all source filter info in igmpv3_del_delrec(). Or we will not able to restore IGMPv3 source filter info after NETDEV_UP and NETDEV_POST_TYPE_CHANGE. Fixes: 24cf3af3fed5 ("igmp: call ip_mc_clear_src() only when ...") Signed-off-by: Hangbin Liu Signed-off-by: David S. Miller diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 606cc3e..15db786 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -162,7 +162,7 @@ static int unsolicited_report_interval(struct in_device *in_dev) } static void igmpv3_add_delrec(struct in_device *in_dev, struct ip_mc_list *im); -static void igmpv3_del_delrec(struct in_device *in_dev, __be32 multiaddr); +static void igmpv3_del_delrec(struct in_device *in_dev, struct ip_mc_list *im); static void igmpv3_clear_delrec(struct in_device *in_dev); static int sf_setstate(struct ip_mc_list *pmc); static void sf_markstate(struct ip_mc_list *pmc); @@ -1130,10 +1130,15 @@ static void igmpv3_add_delrec(struct in_device *in_dev, struct ip_mc_list *im) spin_unlock_bh(&in_dev->mc_tomb_lock); } -static void igmpv3_del_delrec(struct in_device *in_dev, __be32 multiaddr) +/* + * restore ip_mc_list deleted records + */ +static void igmpv3_del_delrec(struct in_device *in_dev, struct ip_mc_list *im) { struct ip_mc_list *pmc, *pmc_prev; - struct ip_sf_list *psf, *psf_next; + struct ip_sf_list *psf; + struct net *net = dev_net(in_dev->dev); + __be32 multiaddr = im->multiaddr; spin_lock_bh(&in_dev->mc_tomb_lock); pmc_prev = NULL; @@ -1149,16 +1154,26 @@ static void igmpv3_del_delrec(struct in_device *in_dev, __be32 multiaddr) in_dev->mc_tomb = pmc->next; } spin_unlock_bh(&in_dev->mc_tomb_lock); + + spin_lock_bh(&im->lock); if (pmc) { - for (psf = pmc->tomb; psf; psf = psf_next) { - psf_next = psf->sf_next; - kfree(psf); + im->interface = pmc->interface; + im->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + im->sfmode = pmc->sfmode; + if (pmc->sfmode == MCAST_INCLUDE) { + im->tomb = pmc->tomb; + im->sources = pmc->sources; + for (psf = im->sources; psf; psf = psf->sf_next) + psf->sf_crcount = im->crcount; } in_dev_put(pmc->interface); - kfree(pmc); } + spin_unlock_bh(&im->lock); } +/* + * flush ip_mc_list deleted records + */ static void igmpv3_clear_delrec(struct in_device *in_dev) { struct ip_mc_list *pmc, *nextpmc; @@ -1366,7 +1381,7 @@ void ip_mc_inc_group(struct in_device *in_dev, __be32 addr) ip_mc_hash_add(in_dev, im); #ifdef CONFIG_IP_MULTICAST - igmpv3_del_delrec(in_dev, im->multiaddr); + igmpv3_del_delrec(in_dev, im); #endif igmp_group_added(im); if (!in_dev->dead) @@ -1626,8 +1641,12 @@ void ip_mc_remap(struct in_device *in_dev) ASSERT_RTNL(); - for_each_pmc_rtnl(in_dev, pmc) + for_each_pmc_rtnl(in_dev, pmc) { +#ifdef CONFIG_IP_MULTICAST + igmpv3_del_delrec(in_dev, pmc); +#endif igmp_group_added(pmc); + } } /* Device going down */ @@ -1648,7 +1667,6 @@ void ip_mc_down(struct in_device *in_dev) in_dev->mr_gq_running = 0; if (del_timer(&in_dev->mr_gq_timer)) __in_dev_put(in_dev); - igmpv3_clear_delrec(in_dev); #endif ip_mc_dec_group(in_dev, IGMP_ALL_HOSTS); @@ -1688,8 +1706,12 @@ void ip_mc_up(struct in_device *in_dev) #endif ip_mc_inc_group(in_dev, IGMP_ALL_HOSTS); - for_each_pmc_rtnl(in_dev, pmc) + for_each_pmc_rtnl(in_dev, pmc) { +#ifdef CONFIG_IP_MULTICAST + igmpv3_del_delrec(in_dev, pmc); +#endif igmp_group_added(pmc); + } } /* @@ -1704,13 +1726,13 @@ void ip_mc_destroy_dev(struct in_device *in_dev) /* Deactivate timers */ ip_mc_down(in_dev); +#ifdef CONFIG_IP_MULTICAST + igmpv3_clear_delrec(in_dev); +#endif while ((i = rtnl_dereference(in_dev->mc_list)) != NULL) { in_dev->mc_list = i->next_rcu; in_dev->mc_count--; - - /* We've dropped the groups in ip_mc_down already */ - ip_mc_clear_src(i); ip_ma_put(i); } } -- cgit v0.10.2 From d2042052a0aa6a54f01a0c9e14243ec040b100e2 Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Mon, 14 Nov 2016 09:27:28 +0100 Subject: stmmac: update the PTP header file This patch is to update this file by using BIT macros, removing not used defines and fixes some typos. Signed-off-by: Giuseppe Cavallaro Acked-by: Rayagond Kokatanur Acked-by: Alexandre TORGUE Acked-by: Richard Cochran Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h index 4535df3..c06938c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h @@ -22,51 +22,53 @@ Author: Rayagond Kokatanur ******************************************************************************/ -#ifndef __STMMAC_PTP_H__ -#define __STMMAC_PTP_H__ +#ifndef __STMMAC_PTP_H__ +#define __STMMAC_PTP_H__ -/* IEEE 1588 PTP register offsets */ -#define PTP_TCR 0x0700 /* Timestamp Control Reg */ -#define PTP_SSIR 0x0704 /* Sub-Second Increment Reg */ -#define PTP_STSR 0x0708 /* System Time – Seconds Regr */ -#define PTP_STNSR 0x070C /* System Time – Nanoseconds Reg */ -#define PTP_STSUR 0x0710 /* System Time – Seconds Update Reg */ -#define PTP_STNSUR 0x0714 /* System Time – Nanoseconds Update Reg */ -#define PTP_TAR 0x0718 /* Timestamp Addend Reg */ -#define PTP_TTSR 0x071C /* Target Time Seconds Reg */ -#define PTP_TTNSR 0x0720 /* Target Time Nanoseconds Reg */ -#define PTP_STHWSR 0x0724 /* System Time - Higher Word Seconds Reg */ -#define PTP_TSR 0x0728 /* Timestamp Status */ +#define PTP_GMAC4_OFFSET 0xb00 +#define PTP_GMAC3_X_OFFSET 0x700 -#define PTP_STNSUR_ADDSUB_SHIFT 31 +/* IEEE 1588 PTP register offsets */ +#define PTP_TCR 0x00 /* Timestamp Control Reg */ +#define PTP_SSIR 0x04 /* Sub-Second Increment Reg */ +#define PTP_STSR 0x08 /* System Time – Seconds Regr */ +#define PTP_STNSR 0x0c /* System Time – Nanoseconds Reg */ +#define PTP_STSUR 0x10 /* System Time – Seconds Update Reg */ +#define PTP_STNSUR 0x14 /* System Time – Nanoseconds Update Reg */ +#define PTP_TAR 0x18 /* Timestamp Addend Reg */ -/* PTP TCR defines */ -#define PTP_TCR_TSENA 0x00000001 /* Timestamp Enable */ -#define PTP_TCR_TSCFUPDT 0x00000002 /* Timestamp Fine/Coarse Update */ -#define PTP_TCR_TSINIT 0x00000004 /* Timestamp Initialize */ -#define PTP_TCR_TSUPDT 0x00000008 /* Timestamp Update */ -/* Timestamp Interrupt Trigger Enable */ -#define PTP_TCR_TSTRIG 0x00000010 -#define PTP_TCR_TSADDREG 0x00000020 /* Addend Reg Update */ -#define PTP_TCR_TSENALL 0x00000100 /* Enable Timestamp for All Frames */ -/* Timestamp Digital or Binary Rollover Control */ -#define PTP_TCR_TSCTRLSSR 0x00000200 +#define PTP_STNSUR_ADDSUB_SHIFT 31 +#define PTP_DIGITAL_ROLLOVER_MODE 0x3B9ACA00 /* 10e9-1 ns */ +#define PTP_BINARY_ROLLOVER_MODE 0x80000000 /* ~0.466 ns */ +/* PTP Timestamp control register defines */ +#define PTP_TCR_TSENA BIT(0) /* Timestamp Enable */ +#define PTP_TCR_TSCFUPDT BIT(1) /* Timestamp Fine/Coarse Update */ +#define PTP_TCR_TSINIT BIT(2) /* Timestamp Initialize */ +#define PTP_TCR_TSUPDT BIT(3) /* Timestamp Update */ +#define PTP_TCR_TSTRIG BIT(4) /* Timestamp Interrupt Trigger Enable */ +#define PTP_TCR_TSADDREG BIT(5) /* Addend Reg Update */ +#define PTP_TCR_TSENALL BIT(8) /* Enable Timestamp for All Frames */ +#define PTP_TCR_TSCTRLSSR BIT(9) /* Digital or Binary Rollover Control */ /* Enable PTP packet Processing for Version 2 Format */ -#define PTP_TCR_TSVER2ENA 0x00000400 +#define PTP_TCR_TSVER2ENA BIT(10) /* Enable Processing of PTP over Ethernet Frames */ -#define PTP_TCR_TSIPENA 0x00000800 +#define PTP_TCR_TSIPENA BIT(11) /* Enable Processing of PTP Frames Sent over IPv6-UDP */ -#define PTP_TCR_TSIPV6ENA 0x00001000 +#define PTP_TCR_TSIPV6ENA BIT(12) /* Enable Processing of PTP Frames Sent over IPv4-UDP */ -#define PTP_TCR_TSIPV4ENA 0x00002000 +#define PTP_TCR_TSIPV4ENA BIT(13) /* Enable Timestamp Snapshot for Event Messages */ -#define PTP_TCR_TSEVNTENA 0x00004000 +#define PTP_TCR_TSEVNTENA BIT(14) /* Enable Snapshot for Messages Relevant to Master */ -#define PTP_TCR_TSMSTRENA 0x00008000 +#define PTP_TCR_TSMSTRENA BIT(15) /* Select PTP packets for Taking Snapshots */ -#define PTP_TCR_SNAPTYPSEL_1 0x00010000 +#define PTP_TCR_SNAPTYPSEL_1 GENMASK(17, 16) /* Enable MAC address for PTP Frame Filtering */ -#define PTP_TCR_TSENMACADDR 0x00040000 +#define PTP_TCR_TSENMACADDR BIT(18) + +/* SSIR defines */ +#define PTP_SSIR_SSINC_MASK 0xff +#define GMAC4_PTP_SSIR_SSINC_SHIFT 16 -#endif /* __STMMAC_PTP_H__ */ +#endif /* __STMMAC_PTP_H__ */ -- cgit v0.10.2 From ba1ffd74df74a9efa5290f87632a0ed55f1aa387 Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Mon, 14 Nov 2016 09:27:29 +0100 Subject: stmmac: fix PTP support for GMAC4 Due to bad management of the descriptors, when use ptp4l, kernel panics as shown below: ----------------------------------------------------------- Unable to handle kernel NULL pointer dereference at virtual address 000001ac ... Internal error: Oops: 17 [#1] SMP ARM ... Hardware name: STi SoC with Flattened Device Tree task: c0c05e80 task.stack: c0c00000 PC is at dwmac4_wrback_get_tx_timestamp_status+0x0/0xc LR is at stmmac_tx_clean+0x2f8/0x4d4 ----------------------------------------------------------- In case of GMAC4 the extended descriptor pointers were used for getting the timestamp. These are NULL for this HW, and the normal ones must be used. The PTP also had problems on this chip due to the bad register management and issues on the algo adopted to setup the PTP and getting the timestamp values from the descriptors. Signed-off-by: Giuseppe Cavallaro Acked-by: Rayagond Kokatanur Acked-by: Alexandre TORGUE Acked-by: Richard Cochran Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index d3292c4..6fc214c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -482,11 +482,12 @@ struct stmmac_ops { /* PTP and HW Timer helpers */ struct stmmac_hwtimestamp { void (*config_hw_tstamping) (void __iomem *ioaddr, u32 data); - u32 (*config_sub_second_increment) (void __iomem *ioaddr, u32 clk_rate); + u32 (*config_sub_second_increment)(void __iomem *ioaddr, u32 ptp_clock, + int gmac4); int (*init_systime) (void __iomem *ioaddr, u32 sec, u32 nsec); int (*config_addend) (void __iomem *ioaddr, u32 addend); int (*adjust_systime) (void __iomem *ioaddr, u32 sec, u32 nsec, - int add_sub); + int add_sub, int gmac4); u64(*get_systime) (void __iomem *ioaddr); }; diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c index a1b17cd..2ef2f0c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c @@ -204,14 +204,18 @@ static void dwmac4_rd_enable_tx_timestamp(struct dma_desc *p) static int dwmac4_wrback_get_tx_timestamp_status(struct dma_desc *p) { - return (p->des3 & TDES3_TIMESTAMP_STATUS) - >> TDES3_TIMESTAMP_STATUS_SHIFT; + /* Context type from W/B descriptor must be zero */ + if (p->des3 & TDES3_CONTEXT_TYPE) + return -EINVAL; + + /* Tx Timestamp Status is 1 so des0 and des1'll have valid values */ + if (p->des3 & TDES3_TIMESTAMP_STATUS) + return 0; + + return 1; } -/* NOTE: For RX CTX bit has to be checked before - * HAVE a specific function for TX and another one for RX - */ -static u64 dwmac4_wrback_get_timestamp(void *desc, u32 ats) +static inline u64 dwmac4_get_timestamp(void *desc, u32 ats) { struct dma_desc *p = (struct dma_desc *)desc; u64 ns; @@ -223,12 +227,54 @@ static u64 dwmac4_wrback_get_timestamp(void *desc, u32 ats) return ns; } -static int dwmac4_context_get_rx_timestamp_status(void *desc, u32 ats) +static int dwmac4_rx_check_timestamp(void *desc) +{ + struct dma_desc *p = (struct dma_desc *)desc; + u32 own, ctxt; + int ret = 1; + + own = p->des3 & RDES3_OWN; + ctxt = ((p->des3 & RDES3_CONTEXT_DESCRIPTOR) + >> RDES3_CONTEXT_DESCRIPTOR_SHIFT); + + if (likely(!own && ctxt)) { + if ((p->des0 == 0xffffffff) && (p->des1 == 0xffffffff)) + /* Corrupted value */ + ret = -EINVAL; + else + /* A valid Timestamp is ready to be read */ + ret = 0; + } + + /* Timestamp not ready */ + return ret; +} + +static int dwmac4_wrback_get_rx_timestamp_status(void *desc, u32 ats) { struct dma_desc *p = (struct dma_desc *)desc; + int ret = -EINVAL; + + /* Get the status from normal w/b descriptor */ + if (likely(p->des3 & TDES3_RS1V)) { + if (likely(p->des1 & RDES1_TIMESTAMP_AVAILABLE)) { + int i = 0; + + /* Check if timestamp is OK from context descriptor */ + do { + ret = dwmac4_rx_check_timestamp(desc); + if (ret < 0) + goto exit; + i++; - return (p->des1 & RDES1_TIMESTAMP_AVAILABLE) - >> RDES1_TIMESTAMP_AVAILABLE_SHIFT; + } while ((ret == 1) || (i < 10)); + + if (i == 10) + ret = -EBUSY; + } + } +exit: + return ret; } static void dwmac4_rd_init_rx_desc(struct dma_desc *p, int disable_rx_ic, @@ -373,8 +419,8 @@ const struct stmmac_desc_ops dwmac4_desc_ops = { .get_rx_frame_len = dwmac4_wrback_get_rx_frame_len, .enable_tx_timestamp = dwmac4_rd_enable_tx_timestamp, .get_tx_timestamp_status = dwmac4_wrback_get_tx_timestamp_status, - .get_timestamp = dwmac4_wrback_get_timestamp, - .get_rx_timestamp_status = dwmac4_context_get_rx_timestamp_status, + .get_rx_timestamp_status = dwmac4_wrback_get_rx_timestamp_status, + .get_timestamp = dwmac4_get_timestamp, .set_tx_ic = dwmac4_rd_set_tx_ic, .prepare_tx_desc = dwmac4_rd_prepare_tx_desc, .prepare_tso_tx_desc = dwmac4_rd_prepare_tso_tx_desc, diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.h index 0902a2e..9736c50 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.h @@ -59,10 +59,13 @@ #define TDES3_CTXT_TCMSSV BIT(26) /* TDES3 Common */ +#define TDES3_RS1V BIT(26) +#define TDES3_RS1V_SHIFT 26 #define TDES3_LAST_DESCRIPTOR BIT(28) #define TDES3_LAST_DESCRIPTOR_SHIFT 28 #define TDES3_FIRST_DESCRIPTOR BIT(29) #define TDES3_CONTEXT_TYPE BIT(30) +#define TDES3_CONTEXT_TYPE_SHIFT 30 /* TDS3 use for both format (read and write back) */ #define TDES3_OWN BIT(31) @@ -117,6 +120,7 @@ #define RDES3_LAST_DESCRIPTOR BIT(28) #define RDES3_FIRST_DESCRIPTOR BIT(29) #define RDES3_CONTEXT_DESCRIPTOR BIT(30) +#define RDES3_CONTEXT_DESCRIPTOR_SHIFT 30 /* RDES3 (read format) */ #define RDES3_BUFFER1_VALID_ADDR BIT(24) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index b15fc55..4d2a759 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -129,6 +129,7 @@ struct stmmac_priv { int irq_wake; spinlock_t ptp_lock; void __iomem *mmcaddr; + void __iomem *ptpaddr; u32 rx_tail_addr; u32 tx_tail_addr; u32 mss; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c index a77f689..10d6059 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c @@ -34,21 +34,29 @@ static void stmmac_config_hw_tstamping(void __iomem *ioaddr, u32 data) } static u32 stmmac_config_sub_second_increment(void __iomem *ioaddr, - u32 ptp_clock) + u32 ptp_clock, int gmac4) { u32 value = readl(ioaddr + PTP_TCR); unsigned long data; - /* Convert the ptp_clock to nano second - * formula = (2/ptp_clock) * 1000000000 - * where, ptp_clock = 50MHz. + /* For GMAC3.x, 4.x versions, convert the ptp_clock to nano second + * formula = (1/ptp_clock) * 1000000000 + * where ptp_clock is 50MHz if fine method is used to update system */ - data = (2000000000ULL / ptp_clock); + if (value & PTP_TCR_TSCFUPDT) + data = (1000000000ULL / 50000000); + else + data = (1000000000ULL / ptp_clock); /* 0.465ns accuracy */ if (!(value & PTP_TCR_TSCTRLSSR)) data = (data * 1000) / 465; + data &= PTP_SSIR_SSINC_MASK; + + if (gmac4) + data = data << GMAC4_PTP_SSIR_SSINC_SHIFT; + writel(data, ioaddr + PTP_SSIR); return data; @@ -104,14 +112,30 @@ static int stmmac_config_addend(void __iomem *ioaddr, u32 addend) } static int stmmac_adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec, - int add_sub) + int add_sub, int gmac4) { u32 value; int limit; + if (add_sub) { + /* If the new sec value needs to be subtracted with + * the system time, then MAC_STSUR reg should be + * programmed with (2^32 – ) + */ + if (gmac4) + sec = (100000000ULL - sec); + + value = readl(ioaddr + PTP_TCR); + if (value & PTP_TCR_TSCTRLSSR) + nsec = (PTP_DIGITAL_ROLLOVER_MODE - nsec); + else + nsec = (PTP_BINARY_ROLLOVER_MODE - nsec); + } + writel(sec, ioaddr + PTP_STSUR); - writel(((add_sub << PTP_STNSUR_ADDSUB_SHIFT) | nsec), - ioaddr + PTP_STNSUR); + value = (add_sub << PTP_STNSUR_ADDSUB_SHIFT) | nsec; + writel(value, ioaddr + PTP_STNSUR); + /* issue command to initialize the system time value */ value = readl(ioaddr + PTP_TCR); value |= PTP_TCR_TSUPDT; @@ -134,8 +158,9 @@ static u64 stmmac_get_systime(void __iomem *ioaddr) { u64 ns; + /* Get the TSSS value */ ns = readl(ioaddr + PTP_STNSR); - /* convert sec time value to nanosecond */ + /* Get the TSS and convert sec time value to nanosecond */ ns += readl(ioaddr + PTP_STSR) * 1000000000ULL; return ns; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index e2c94ec..1f9ec02 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -340,18 +340,17 @@ out: /* stmmac_get_tx_hwtstamp - get HW TX timestamps * @priv: driver private structure - * @entry : descriptor index to be used. + * @p : descriptor pointer * @skb : the socket buffer * Description : * This function will read timestamp from the descriptor & pass it to stack. * and also perform some sanity checks. */ static void stmmac_get_tx_hwtstamp(struct stmmac_priv *priv, - unsigned int entry, struct sk_buff *skb) + struct dma_desc *p, struct sk_buff *skb) { struct skb_shared_hwtstamps shhwtstamp; u64 ns; - void *desc = NULL; if (!priv->hwts_tx_en) return; @@ -360,58 +359,55 @@ static void stmmac_get_tx_hwtstamp(struct stmmac_priv *priv, if (likely(!skb || !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))) return; - if (priv->adv_ts) - desc = (priv->dma_etx + entry); - else - desc = (priv->dma_tx + entry); - /* check tx tstamp status */ - if (!priv->hw->desc->get_tx_timestamp_status((struct dma_desc *)desc)) - return; + if (!priv->hw->desc->get_tx_timestamp_status(p)) { + /* get the valid tstamp */ + ns = priv->hw->desc->get_timestamp(p, priv->adv_ts); - /* get the valid tstamp */ - ns = priv->hw->desc->get_timestamp(desc, priv->adv_ts); + memset(&shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps)); + shhwtstamp.hwtstamp = ns_to_ktime(ns); - memset(&shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps)); - shhwtstamp.hwtstamp = ns_to_ktime(ns); - /* pass tstamp to stack */ - skb_tstamp_tx(skb, &shhwtstamp); + netdev_info(priv->dev, "get valid TX hw timestamp %llu\n", ns); + /* pass tstamp to stack */ + skb_tstamp_tx(skb, &shhwtstamp); + } return; } /* stmmac_get_rx_hwtstamp - get HW RX timestamps * @priv: driver private structure - * @entry : descriptor index to be used. + * @p : descriptor pointer + * @np : next descriptor pointer * @skb : the socket buffer * Description : * This function will read received packet's timestamp from the descriptor * and pass it to stack. It also perform some sanity checks. */ -static void stmmac_get_rx_hwtstamp(struct stmmac_priv *priv, - unsigned int entry, struct sk_buff *skb) +static void stmmac_get_rx_hwtstamp(struct stmmac_priv *priv, struct dma_desc *p, + struct dma_desc *np, struct sk_buff *skb) { struct skb_shared_hwtstamps *shhwtstamp = NULL; u64 ns; - void *desc = NULL; if (!priv->hwts_rx_en) return; - if (priv->adv_ts) - desc = (priv->dma_erx + entry); - else - desc = (priv->dma_rx + entry); - - /* exit if rx tstamp is not valid */ - if (!priv->hw->desc->get_rx_timestamp_status(desc, priv->adv_ts)) - return; + /* Check if timestamp is available */ + if (!priv->hw->desc->get_rx_timestamp_status(p, priv->adv_ts)) { + /* For GMAC4, the valid timestamp is from CTX next desc. */ + if (priv->plat->has_gmac4) + ns = priv->hw->desc->get_timestamp(np, priv->adv_ts); + else + ns = priv->hw->desc->get_timestamp(p, priv->adv_ts); - /* get valid tstamp */ - ns = priv->hw->desc->get_timestamp(desc, priv->adv_ts); - shhwtstamp = skb_hwtstamps(skb); - memset(shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps)); - shhwtstamp->hwtstamp = ns_to_ktime(ns); + netdev_info(priv->dev, "get valid RX hw timestamp %llu\n", ns); + shhwtstamp = skb_hwtstamps(skb); + memset(shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps)); + shhwtstamp->hwtstamp = ns_to_ktime(ns); + } else { + netdev_err(priv->dev, "cannot get RX hw timestamp\n"); + } } /** @@ -598,17 +594,18 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr) priv->hwts_tx_en = config.tx_type == HWTSTAMP_TX_ON; if (!priv->hwts_tx_en && !priv->hwts_rx_en) - priv->hw->ptp->config_hw_tstamping(priv->ioaddr, 0); + priv->hw->ptp->config_hw_tstamping(priv->ptpaddr, 0); else { value = (PTP_TCR_TSENA | PTP_TCR_TSCFUPDT | PTP_TCR_TSCTRLSSR | tstamp_all | ptp_v2 | ptp_over_ethernet | ptp_over_ipv6_udp | ptp_over_ipv4_udp | ts_event_en | ts_master_en | snap_type_sel); - priv->hw->ptp->config_hw_tstamping(priv->ioaddr, value); + priv->hw->ptp->config_hw_tstamping(priv->ptpaddr, value); /* program Sub Second Increment reg */ sec_inc = priv->hw->ptp->config_sub_second_increment( - priv->ioaddr, priv->clk_ptp_rate); + priv->ptpaddr, priv->clk_ptp_rate, + priv->plat->has_gmac4); temp = div_u64(1000000000ULL, sec_inc); /* calculate default added value: @@ -618,14 +615,14 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr) */ temp = (u64)(temp << 32); priv->default_addend = div_u64(temp, priv->clk_ptp_rate); - priv->hw->ptp->config_addend(priv->ioaddr, + priv->hw->ptp->config_addend(priv->ptpaddr, priv->default_addend); /* initialize system time */ ktime_get_real_ts64(&now); /* lower 32 bits of tv_sec are safe until y2106 */ - priv->hw->ptp->init_systime(priv->ioaddr, (u32)now.tv_sec, + priv->hw->ptp->init_systime(priv->ptpaddr, (u32)now.tv_sec, now.tv_nsec); } @@ -1340,7 +1337,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) priv->dev->stats.tx_packets++; priv->xstats.tx_pkt_n++; } - stmmac_get_tx_hwtstamp(priv, entry, skb); + stmmac_get_tx_hwtstamp(priv, p, skb); } if (likely(priv->tx_skbuff_dma[entry].buf)) { @@ -1486,10 +1483,13 @@ static void stmmac_mmc_setup(struct stmmac_priv *priv) unsigned int mode = MMC_CNTRL_RESET_ON_READ | MMC_CNTRL_COUNTER_RESET | MMC_CNTRL_PRESET | MMC_CNTRL_FULL_HALF_PRESET; - if (priv->synopsys_id >= DWMAC_CORE_4_00) + if (priv->synopsys_id >= DWMAC_CORE_4_00) { + priv->ptpaddr = priv->ioaddr + PTP_GMAC4_OFFSET; priv->mmcaddr = priv->ioaddr + MMC_GMAC4_OFFSET; - else + } else { + priv->ptpaddr = priv->ioaddr + PTP_GMAC3_X_OFFSET; priv->mmcaddr = priv->ioaddr + MMC_GMAC3_X_OFFSET; + } dwmac_mmc_intr_all_mask(priv->mmcaddr); @@ -2484,7 +2484,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit) if (netif_msg_rx_status(priv)) { void *rx_head; - pr_debug("%s: descriptor ring:\n", __func__); + pr_info(">>>>>> %s: descriptor ring:\n", __func__); if (priv->extend_desc) rx_head = (void *)priv->dma_erx; else @@ -2495,6 +2495,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit) while (count < limit) { int status; struct dma_desc *p; + struct dma_desc *np; if (priv->extend_desc) p = (struct dma_desc *)(priv->dma_erx + entry); @@ -2514,9 +2515,11 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit) next_entry = priv->cur_rx; if (priv->extend_desc) - prefetch(priv->dma_erx + next_entry); + np = (struct dma_desc *)(priv->dma_erx + next_entry); else - prefetch(priv->dma_rx + next_entry); + np = priv->dma_rx + next_entry; + + prefetch(np); if ((priv->extend_desc) && (priv->hw->desc->rx_extended_status)) priv->hw->desc->rx_extended_status(&priv->dev->stats, @@ -2568,7 +2571,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit) frame_len -= ETH_FCS_LEN; if (netif_msg_rx_status(priv)) { - pr_debug("\tdesc: %p [entry %d] buff=0x%x\n", + pr_info("\tdesc: %p [entry %d] buff=0x%x\n", p, entry, des); if (frame_len > ETH_FRAME_LEN) pr_debug("\tframe size %d, COE: %d\n", @@ -2625,13 +2628,13 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit) DMA_FROM_DEVICE); } - stmmac_get_rx_hwtstamp(priv, entry, skb); - if (netif_msg_pktdata(priv)) { pr_debug("frame received (%dbytes)", frame_len); print_pkt(skb->data, frame_len); } + stmmac_get_rx_hwtstamp(priv, p, np, skb); + stmmac_rx_vlan(priv->dev, skb); skb->protocol = eth_type_trans(skb, priv->dev); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c index 1477471..3eb281d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c @@ -54,7 +54,7 @@ static int stmmac_adjust_freq(struct ptp_clock_info *ptp, s32 ppb) spin_lock_irqsave(&priv->ptp_lock, flags); - priv->hw->ptp->config_addend(priv->ioaddr, addend); + priv->hw->ptp->config_addend(priv->ptpaddr, addend); spin_unlock_irqrestore(&priv->ptp_lock, flags); @@ -89,7 +89,8 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 delta) spin_lock_irqsave(&priv->ptp_lock, flags); - priv->hw->ptp->adjust_systime(priv->ioaddr, sec, nsec, neg_adj); + priv->hw->ptp->adjust_systime(priv->ptpaddr, sec, nsec, neg_adj, + priv->plat->has_gmac4); spin_unlock_irqrestore(&priv->ptp_lock, flags); @@ -114,7 +115,7 @@ static int stmmac_get_time(struct ptp_clock_info *ptp, struct timespec64 *ts) spin_lock_irqsave(&priv->ptp_lock, flags); - ns = priv->hw->ptp->get_systime(priv->ioaddr); + ns = priv->hw->ptp->get_systime(priv->ptpaddr); spin_unlock_irqrestore(&priv->ptp_lock, flags); @@ -141,7 +142,7 @@ static int stmmac_set_time(struct ptp_clock_info *ptp, spin_lock_irqsave(&priv->ptp_lock, flags); - priv->hw->ptp->init_systime(priv->ioaddr, ts->tv_sec, ts->tv_nsec); + priv->hw->ptp->init_systime(priv->ptpaddr, ts->tv_sec, ts->tv_nsec); spin_unlock_irqrestore(&priv->ptp_lock, flags); -- cgit v0.10.2 From ee112c12ebd22baca85812175008ef584250e415 Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Mon, 14 Nov 2016 09:27:30 +0100 Subject: stmmac: fix PTP type ethtool stats This patch fixes the ethtool stats for PTP frames; previous version does not take care about some message types: i.e. announce, management and signaling. It also provided a broken statistic in case of "No PTP message received". Signed-off-by: Giuseppe Cavallaro Acked-by: Rayagond Kokatanur Acked-by: Alexandre TORGUE Acked-by: Richard Cochran Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index 6fc214c..6d2de4e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -120,14 +120,17 @@ struct stmmac_extra_stats { unsigned long ip_csum_bypassed; unsigned long ipv4_pkt_rcvd; unsigned long ipv6_pkt_rcvd; - unsigned long rx_msg_type_ext_no_ptp; - unsigned long rx_msg_type_sync; - unsigned long rx_msg_type_follow_up; - unsigned long rx_msg_type_delay_req; - unsigned long rx_msg_type_delay_resp; - unsigned long rx_msg_type_pdelay_req; - unsigned long rx_msg_type_pdelay_resp; - unsigned long rx_msg_type_pdelay_follow_up; + unsigned long no_ptp_rx_msg_type_ext; + unsigned long ptp_rx_msg_type_sync; + unsigned long ptp_rx_msg_type_follow_up; + unsigned long ptp_rx_msg_type_delay_req; + unsigned long ptp_rx_msg_type_delay_resp; + unsigned long ptp_rx_msg_type_pdelay_req; + unsigned long ptp_rx_msg_type_pdelay_resp; + unsigned long ptp_rx_msg_type_pdelay_follow_up; + unsigned long ptp_rx_msg_type_announce; + unsigned long ptp_rx_msg_type_management; + unsigned long ptp_rx_msg_pkt_reserved_type; unsigned long ptp_frame_type; unsigned long ptp_ver; unsigned long timestamp_dropped; diff --git a/drivers/net/ethernet/stmicro/stmmac/descs.h b/drivers/net/ethernet/stmicro/stmmac/descs.h index 2e4c171..e3c86d4 100644 --- a/drivers/net/ethernet/stmicro/stmmac/descs.h +++ b/drivers/net/ethernet/stmicro/stmmac/descs.h @@ -155,14 +155,18 @@ #define ERDES4_L3_L4_FILT_NO_MATCH_MASK GENMASK(27, 26) /* Extended RDES4 message type definitions */ -#define RDES_EXT_NO_PTP 0 -#define RDES_EXT_SYNC 1 -#define RDES_EXT_FOLLOW_UP 2 -#define RDES_EXT_DELAY_REQ 3 -#define RDES_EXT_DELAY_RESP 4 -#define RDES_EXT_PDELAY_REQ 5 -#define RDES_EXT_PDELAY_RESP 6 -#define RDES_EXT_PDELAY_FOLLOW_UP 7 +#define RDES_EXT_NO_PTP 0x0 +#define RDES_EXT_SYNC 0x1 +#define RDES_EXT_FOLLOW_UP 0x2 +#define RDES_EXT_DELAY_REQ 0x3 +#define RDES_EXT_DELAY_RESP 0x4 +#define RDES_EXT_PDELAY_REQ 0x5 +#define RDES_EXT_PDELAY_RESP 0x6 +#define RDES_EXT_PDELAY_FOLLOW_UP 0x7 +#define RDES_PTP_ANNOUNCE 0x8 +#define RDES_PTP_MANAGEMENT 0x9 +#define RDES_PTP_SIGNALING 0xa +#define RDES_PTP_PKT_RESERVED_TYPE 0xf /* Basic descriptor structure for normal and alternate descriptors */ struct dma_desc { diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c index 2ef2f0c..a601f8d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c @@ -123,22 +123,29 @@ static int dwmac4_wrback_get_rx_status(void *data, struct stmmac_extra_stats *x, x->ipv4_pkt_rcvd++; if (rdes1 & RDES1_IPV6_HEADER) x->ipv6_pkt_rcvd++; - if (message_type == RDES_EXT_SYNC) - x->rx_msg_type_sync++; + + if (message_type == RDES_EXT_NO_PTP) + x->no_ptp_rx_msg_type_ext++; + else if (message_type == RDES_EXT_SYNC) + x->ptp_rx_msg_type_sync++; else if (message_type == RDES_EXT_FOLLOW_UP) - x->rx_msg_type_follow_up++; + x->ptp_rx_msg_type_follow_up++; else if (message_type == RDES_EXT_DELAY_REQ) - x->rx_msg_type_delay_req++; + x->ptp_rx_msg_type_delay_req++; else if (message_type == RDES_EXT_DELAY_RESP) - x->rx_msg_type_delay_resp++; + x->ptp_rx_msg_type_delay_resp++; else if (message_type == RDES_EXT_PDELAY_REQ) - x->rx_msg_type_pdelay_req++; + x->ptp_rx_msg_type_pdelay_req++; else if (message_type == RDES_EXT_PDELAY_RESP) - x->rx_msg_type_pdelay_resp++; + x->ptp_rx_msg_type_pdelay_resp++; else if (message_type == RDES_EXT_PDELAY_FOLLOW_UP) - x->rx_msg_type_pdelay_follow_up++; - else - x->rx_msg_type_ext_no_ptp++; + x->ptp_rx_msg_type_pdelay_follow_up++; + else if (message_type == RDES_PTP_ANNOUNCE) + x->ptp_rx_msg_type_announce++; + else if (message_type == RDES_PTP_MANAGEMENT) + x->ptp_rx_msg_type_management++; + else if (message_type == RDES_PTP_PKT_RESERVED_TYPE) + x->ptp_rx_msg_pkt_reserved_type++; if (rdes1 & RDES1_PTP_PACKET_TYPE) x->ptp_frame_type++; diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c index 38f19c9..e755493 100644 --- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c +++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c @@ -150,22 +150,30 @@ static void enh_desc_get_ext_status(void *data, struct stmmac_extra_stats *x, x->ipv4_pkt_rcvd++; if (rdes4 & ERDES4_IPV6_PKT_RCVD) x->ipv6_pkt_rcvd++; - if (message_type == RDES_EXT_SYNC) - x->rx_msg_type_sync++; + + if (message_type == RDES_EXT_NO_PTP) + x->no_ptp_rx_msg_type_ext++; + else if (message_type == RDES_EXT_SYNC) + x->ptp_rx_msg_type_sync++; else if (message_type == RDES_EXT_FOLLOW_UP) - x->rx_msg_type_follow_up++; + x->ptp_rx_msg_type_follow_up++; else if (message_type == RDES_EXT_DELAY_REQ) - x->rx_msg_type_delay_req++; + x->ptp_rx_msg_type_delay_req++; else if (message_type == RDES_EXT_DELAY_RESP) - x->rx_msg_type_delay_resp++; + x->ptp_rx_msg_type_delay_resp++; else if (message_type == RDES_EXT_PDELAY_REQ) - x->rx_msg_type_pdelay_req++; + x->ptp_rx_msg_type_pdelay_req++; else if (message_type == RDES_EXT_PDELAY_RESP) - x->rx_msg_type_pdelay_resp++; + x->ptp_rx_msg_type_pdelay_resp++; else if (message_type == RDES_EXT_PDELAY_FOLLOW_UP) - x->rx_msg_type_pdelay_follow_up++; - else - x->rx_msg_type_ext_no_ptp++; + x->ptp_rx_msg_type_pdelay_follow_up++; + else if (message_type == RDES_PTP_ANNOUNCE) + x->ptp_rx_msg_type_announce++; + else if (message_type == RDES_PTP_MANAGEMENT) + x->ptp_rx_msg_type_management++; + else if (message_type == RDES_PTP_PKT_RESERVED_TYPE) + x->ptp_rx_msg_pkt_reserved_type++; + if (rdes4 & ERDES4_PTP_FRAME_TYPE) x->ptp_frame_type++; if (rdes4 & ERDES4_PTP_VER) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index 1e06173..c5d0142 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -115,14 +115,17 @@ static const struct stmmac_stats stmmac_gstrings_stats[] = { STMMAC_STAT(ip_csum_bypassed), STMMAC_STAT(ipv4_pkt_rcvd), STMMAC_STAT(ipv6_pkt_rcvd), - STMMAC_STAT(rx_msg_type_ext_no_ptp), - STMMAC_STAT(rx_msg_type_sync), - STMMAC_STAT(rx_msg_type_follow_up), - STMMAC_STAT(rx_msg_type_delay_req), - STMMAC_STAT(rx_msg_type_delay_resp), - STMMAC_STAT(rx_msg_type_pdelay_req), - STMMAC_STAT(rx_msg_type_pdelay_resp), - STMMAC_STAT(rx_msg_type_pdelay_follow_up), + STMMAC_STAT(no_ptp_rx_msg_type_ext), + STMMAC_STAT(ptp_rx_msg_type_sync), + STMMAC_STAT(ptp_rx_msg_type_follow_up), + STMMAC_STAT(ptp_rx_msg_type_delay_req), + STMMAC_STAT(ptp_rx_msg_type_delay_resp), + STMMAC_STAT(ptp_rx_msg_type_pdelay_req), + STMMAC_STAT(ptp_rx_msg_type_pdelay_resp), + STMMAC_STAT(ptp_rx_msg_type_pdelay_follow_up), + STMMAC_STAT(ptp_rx_msg_type_announce), + STMMAC_STAT(ptp_rx_msg_type_management), + STMMAC_STAT(ptp_rx_msg_pkt_reserved_type), STMMAC_STAT(ptp_frame_type), STMMAC_STAT(ptp_ver), STMMAC_STAT(timestamp_dropped), -- cgit v0.10.2 From c7a4e3d8c0d43a4f31f8b2ccf476e5a26eb85142 Mon Sep 17 00:00:00 2001 From: Alexander Kochetkov Date: Mon, 14 Nov 2016 16:32:52 +0300 Subject: net: arc_emac: annonce IFF_MULTICAST support Multicast support was implemented by commit 775dd682e2b0ec7 ('arc_emac: implement promiscuous mode and multicast filtering'). It can be enabled explicity using 'ifconfig eth0 multicast'. The patch is needed in order to remove explicit configuration as most devices has multicast mode enabled by default. Signed-off-by: Alexander Kochetkov Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c index b0da969..2e4ee86 100644 --- a/drivers/net/ethernet/arc/emac_main.c +++ b/drivers/net/ethernet/arc/emac_main.c @@ -764,8 +764,6 @@ int arc_emac_probe(struct net_device *ndev, int interface) ndev->netdev_ops = &arc_emac_netdev_ops; ndev->ethtool_ops = &arc_emac_ethtool_ops; ndev->watchdog_timeo = TX_TIMEOUT; - /* FIXME :: no multicast support yet */ - ndev->flags &= ~IFF_MULTICAST; priv = netdev_priv(ndev); priv->dev = dev; -- cgit v0.10.2 From d0e3f65b34c528ec2b7d1ba9a620b483f71788d3 Mon Sep 17 00:00:00 2001 From: Alexander Kochetkov Date: Mon, 14 Nov 2016 16:32:53 +0300 Subject: net: arc_emac: don't pass multicast packets to kernel in non-multicast mode The patch disable capturing multicast packets when multicast mode disabled for ethernet ('ifconfig eth0 -multicast'). In that case no multicast packet will be passed to kernel. Signed-off-by: Alexander Kochetkov Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c index 2e4ee86..be865b4 100644 --- a/drivers/net/ethernet/arc/emac_main.c +++ b/drivers/net/ethernet/arc/emac_main.c @@ -460,7 +460,7 @@ static void arc_emac_set_rx_mode(struct net_device *ndev) if (ndev->flags & IFF_ALLMULTI) { arc_reg_set(priv, R_LAFL, ~0); arc_reg_set(priv, R_LAFH, ~0); - } else { + } else if (ndev->flags & IFF_MULTICAST) { struct netdev_hw_addr *ha; unsigned int filter[2] = { 0, 0 }; int bit; @@ -472,6 +472,9 @@ static void arc_emac_set_rx_mode(struct net_device *ndev) arc_reg_set(priv, R_LAFL, filter[0]); arc_reg_set(priv, R_LAFH, filter[1]); + } else { + arc_reg_set(priv, R_LAFL, 0); + arc_reg_set(priv, R_LAFH, 0); } } } -- cgit v0.10.2 From 73e2d5e34b6cdd1080038daf3d6d6d744a9eefe6 Mon Sep 17 00:00:00 2001 From: Pablo Neira Date: Mon, 14 Nov 2016 23:40:30 +0100 Subject: udp: restore UDPlite many-cast delivery Honor udptable parameter that is passed to __udp*_lib_mcast_deliver(), otherwise udplite broadcast/multicast use the wrong table and it breaks. Fixes: 2dc41cff7545 ("udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.") Signed-off-by: Pablo Neira Ayuso Acked-by: Eric Dumazet Signed-off-by: David S. Miller diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index d123d68..0de9d5d 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1652,10 +1652,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb, if (use_hash2) { hash2_any = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum) & - udp_table.mask; - hash2 = udp4_portaddr_hash(net, daddr, hnum) & udp_table.mask; + udptable->mask; + hash2 = udp4_portaddr_hash(net, daddr, hnum) & udptable->mask; start_lookup: - hslot = &udp_table.hash2[hash2]; + hslot = &udptable->hash2[hash2]; offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node); } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index b2ef061..e5056d4 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -706,10 +706,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb, if (use_hash2) { hash2_any = udp6_portaddr_hash(net, &in6addr_any, hnum) & - udp_table.mask; - hash2 = udp6_portaddr_hash(net, daddr, hnum) & udp_table.mask; + udptable->mask; + hash2 = udp6_portaddr_hash(net, daddr, hnum) & udptable->mask; start_lookup: - hslot = &udp_table.hash2[hash2]; + hslot = &udptable->hash2[hash2]; offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node); } -- cgit v0.10.2 From e88a2766143a27bfe6704b4493b214de4094cf29 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 14 Nov 2016 16:28:42 -0800 Subject: gro_cells: mark napi struct as not busy poll candidates Rolf Neugebauer reported very long delays at netns dismantle. Eric W. Biederman was kind enough to look at this problem and noticed synchronize_net() occurring from netif_napi_del() that was added in linux-4.5 Busy polling makes no sense for tunnels NAPI. If busy poll is used for sessions over tunnels, the poller will need to poll the physical device queue anyway. netif_tx_napi_add() could be used here, but function name is misleading, and renaming it is not stable material, so set NAPI_STATE_NO_BUSY_POLL bit directly. This will avoid inserting gro_cells napi structures in napi_hash[] and avoid the problematic synchronize_net() (per possible cpu) that Rolf reported. Fixes: 93d05d4a320c ("net: provide generic busy polling to all NAPI drivers") Signed-off-by: Eric Dumazet Reported-by: Rolf Neugebauer Reported-by: Eric W. Biederman Acked-by: Cong Wang Tested-by: Rolf Neugebauer Signed-off-by: David S. Miller diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h index d15214d..2a1abbf 100644 --- a/include/net/gro_cells.h +++ b/include/net/gro_cells.h @@ -68,6 +68,9 @@ static inline int gro_cells_init(struct gro_cells *gcells, struct net_device *de struct gro_cell *cell = per_cpu_ptr(gcells->cells, i); __skb_queue_head_init(&cell->napi_skbs); + + set_bit(NAPI_STATE_NO_BUSY_POLL, &cell->napi.state); + netif_napi_add(dev, &cell->napi, gro_cell_poll, 64); napi_enable(&cell->napi); } -- cgit v0.10.2 From 7e75f74a171a8146cc3ee92d5562878b40c25fb5 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Tue, 15 Nov 2016 10:39:03 +0100 Subject: rtnetlink: fix rtnl_vfinfo_size The size reported by rtnl_vfinfo_size doesn't match the space used by rtnl_fill_vfinfo. rtnl_vfinfo_size currently doesn't account for the nest attributes used by statistics (added in commit 3b766cd83232), nor for struct ifla_vf_tx_rate (since commit ed616689a3d9, which added ifla_vf_rate to the dump without removing ifla_vf_tx_rate, but replaced ifla_vf_tx_rate with ifla_vf_rate in the size computation). Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice") Fixes: ed616689a3d9 ("net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool") Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index db313ec..96f4bf2 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -840,18 +840,20 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev, if (dev->dev.parent && dev_is_pci(dev->dev.parent) && (ext_filter_mask & RTEXT_FILTER_VF)) { int num_vfs = dev_num_vf(dev->dev.parent); - size_t size = nla_total_size(sizeof(struct nlattr)); - size += nla_total_size(num_vfs * sizeof(struct nlattr)); + size_t size = nla_total_size(0); size += num_vfs * - (nla_total_size(sizeof(struct ifla_vf_mac)) + - nla_total_size(MAX_VLAN_LIST_LEN * - sizeof(struct nlattr)) + + (nla_total_size(0) + + nla_total_size(sizeof(struct ifla_vf_mac)) + + nla_total_size(sizeof(struct ifla_vf_vlan)) + + nla_total_size(0) + /* nest IFLA_VF_VLAN_LIST */ nla_total_size(MAX_VLAN_LIST_LEN * sizeof(struct ifla_vf_vlan_info)) + nla_total_size(sizeof(struct ifla_vf_spoofchk)) + + nla_total_size(sizeof(struct ifla_vf_tx_rate)) + nla_total_size(sizeof(struct ifla_vf_rate)) + nla_total_size(sizeof(struct ifla_vf_link_state)) + nla_total_size(sizeof(struct ifla_vf_rss_query_en)) + + nla_total_size(0) + /* nest IFLA_VF_STATS */ /* IFLA_VF_STATS_RX_PACKETS */ nla_total_size_64bit(sizeof(__u64)) + /* IFLA_VF_STATS_TX_PACKETS */ -- cgit v0.10.2 From b3cfaa31e3851c743d3f9d3441710f7ff6f7e868 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Tue, 15 Nov 2016 11:16:35 +0100 Subject: rtnetlink: fix rtnl message size computation for XDP rtnl_xdp_size() only considers the size of the actual payload attribute, and misses the space taken by the attribute used for nesting (IFLA_XDP). Fixes: d1fdd9138682 ("rtnl: add option for setting link xdp prog") Signed-off-by: Sabrina Dubroca Reviewed-by: Brenden Blanco Signed-off-by: David S. Miller diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 96f4bf2..a6529c5 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -901,7 +901,8 @@ static size_t rtnl_port_size(const struct net_device *dev, static size_t rtnl_xdp_size(const struct net_device *dev) { - size_t xdp_size = nla_total_size(1); /* XDP_ATTACHED */ + size_t xdp_size = nla_total_size(0) + /* nest IFLA_XDP */ + nla_total_size(1); /* XDP_ATTACHED */ if (!dev->netdev_ops->ndo_xdp) return 0; -- cgit v0.10.2 From f23cc643f9baec7f71f2b74692da3cf03abbbfda Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Mon, 14 Nov 2016 15:45:36 -0500 Subject: bpf: fix range arithmetic for bpf map access I made some invalid assumptions with BPF_AND and BPF_MOD that could result in invalid accesses to bpf map entries. Fix this up by doing a few things 1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real life and just adds extra complexity. 2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the minimum value to 0 for positive AND's. 3) Don't do operations on the ranges if they are set to the limits, as they are by definition undefined, and allowing arithmetic operations on those values could make them appear valid when they really aren't. This fixes the testcase provided by Jann as well as a few other theoretical problems. Reported-by: Jann Horn Signed-off-by: Josef Bacik Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 7035b99..6aaf425 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -14,7 +14,7 @@ * are obviously wrong for any sort of memory access. */ #define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024) -#define BPF_REGISTER_MIN_RANGE -(1024 * 1024 * 1024) +#define BPF_REGISTER_MIN_RANGE -1 struct bpf_reg_state { enum bpf_reg_type type; @@ -22,7 +22,8 @@ struct bpf_reg_state { * Used to determine if any memory access using this register will * result in a bad access. */ - u64 min_value, max_value; + s64 min_value; + u64 max_value; union { /* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */ s64 imm; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 99a7e5b..6a93615 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -216,8 +216,8 @@ static void print_verifier_state(struct bpf_verifier_state *state) reg->map_ptr->key_size, reg->map_ptr->value_size); if (reg->min_value != BPF_REGISTER_MIN_RANGE) - verbose(",min_value=%llu", - (unsigned long long)reg->min_value); + verbose(",min_value=%lld", + (long long)reg->min_value); if (reg->max_value != BPF_REGISTER_MAX_RANGE) verbose(",max_value=%llu", (unsigned long long)reg->max_value); @@ -758,7 +758,7 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off, * index'es we need to make sure that whatever we use * will have a set floor within our range. */ - if ((s64)reg->min_value < 0) { + if (reg->min_value < 0) { verbose("R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n", regno); return -EACCES; @@ -1468,7 +1468,8 @@ static void check_reg_overflow(struct bpf_reg_state *reg) { if (reg->max_value > BPF_REGISTER_MAX_RANGE) reg->max_value = BPF_REGISTER_MAX_RANGE; - if ((s64)reg->min_value < BPF_REGISTER_MIN_RANGE) + if (reg->min_value < BPF_REGISTER_MIN_RANGE || + reg->min_value > BPF_REGISTER_MAX_RANGE) reg->min_value = BPF_REGISTER_MIN_RANGE; } @@ -1476,7 +1477,8 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env, struct bpf_insn *insn) { struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg; - u64 min_val = BPF_REGISTER_MIN_RANGE, max_val = BPF_REGISTER_MAX_RANGE; + s64 min_val = BPF_REGISTER_MIN_RANGE; + u64 max_val = BPF_REGISTER_MAX_RANGE; bool min_set = false, max_set = false; u8 opcode = BPF_OP(insn->code); @@ -1512,22 +1514,43 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env, return; } + /* If one of our values was at the end of our ranges then we can't just + * do our normal operations to the register, we need to set the values + * to the min/max since they are undefined. + */ + if (min_val == BPF_REGISTER_MIN_RANGE) + dst_reg->min_value = BPF_REGISTER_MIN_RANGE; + if (max_val == BPF_REGISTER_MAX_RANGE) + dst_reg->max_value = BPF_REGISTER_MAX_RANGE; + switch (opcode) { case BPF_ADD: - dst_reg->min_value += min_val; - dst_reg->max_value += max_val; + if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE) + dst_reg->min_value += min_val; + if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE) + dst_reg->max_value += max_val; break; case BPF_SUB: - dst_reg->min_value -= min_val; - dst_reg->max_value -= max_val; + if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE) + dst_reg->min_value -= min_val; + if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE) + dst_reg->max_value -= max_val; break; case BPF_MUL: - dst_reg->min_value *= min_val; - dst_reg->max_value *= max_val; + if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE) + dst_reg->min_value *= min_val; + if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE) + dst_reg->max_value *= max_val; break; case BPF_AND: - /* & is special since it could end up with 0 bits set. */ - dst_reg->min_value &= min_val; + /* Disallow AND'ing of negative numbers, ain't nobody got time + * for that. Otherwise the minimum is 0 and the max is the max + * value we could AND against. + */ + if (min_val < 0) + dst_reg->min_value = BPF_REGISTER_MIN_RANGE; + else + dst_reg->min_value = 0; dst_reg->max_value = max_val; break; case BPF_LSH: @@ -1537,24 +1560,25 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env, */ if (min_val > ilog2(BPF_REGISTER_MAX_RANGE)) dst_reg->min_value = BPF_REGISTER_MIN_RANGE; - else + else if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE) dst_reg->min_value <<= min_val; if (max_val > ilog2(BPF_REGISTER_MAX_RANGE)) dst_reg->max_value = BPF_REGISTER_MAX_RANGE; - else + else if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE) dst_reg->max_value <<= max_val; break; case BPF_RSH: - dst_reg->min_value >>= min_val; - dst_reg->max_value >>= max_val; - break; - case BPF_MOD: - /* % is special since it is an unsigned modulus, so the floor - * will always be 0. + /* RSH by a negative number is undefined, and the BPF_RSH is an + * unsigned shift, so make the appropriate casts. */ - dst_reg->min_value = 0; - dst_reg->max_value = max_val - 1; + if (min_val < 0 || dst_reg->min_value < 0) + dst_reg->min_value = BPF_REGISTER_MIN_RANGE; + else + dst_reg->min_value = + (u64)(dst_reg->min_value) >> min_val; + if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE) + dst_reg->max_value >>= max_val; break; default: reset_reg_range_values(regs, insn->dst_reg); -- cgit v0.10.2 From 3b7093346b326e5d3590c7d49f6aefe6fa5b2c9a Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Tue, 15 Nov 2016 05:46:06 -0500 Subject: ipv4: Restore fib_trie_flush_external function and fix call ordering The patch that removed the FIB offload infrastructure was a bit too aggressive and also removed code needed to clean up us splitting the table if additional rules were added. Specifically the function fib_trie_flush_external was called at the end of a new rule being added to flush the foreign trie entries from the main trie. I updated the code so that we only call fib_trie_flush_external on the main table so that we flush the entries for local from main. This way we don't call it for every rule change which is what was happening previously. Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure") Reported-by: Eric Dumazet Cc: Jiri Pirko Signed-off-by: Alexander Duyck Acked-by: Eric Dumazet Signed-off-by: David S. Miller diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index b9314b4..f390c3b 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -243,6 +243,7 @@ int fib_table_dump(struct fib_table *table, struct sk_buff *skb, struct netlink_callback *cb); int fib_table_flush(struct net *net, struct fib_table *table); struct fib_table *fib_trie_unmerge(struct fib_table *main_tb); +void fib_table_flush_external(struct fib_table *table); void fib_free_table(struct fib_table *tb); #ifndef CONFIG_IP_MULTIPLE_TABLES diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index c3b8047..161fc0f 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -151,7 +151,7 @@ static void fib_replace_table(struct net *net, struct fib_table *old, int fib_unmerge(struct net *net) { - struct fib_table *old, *new; + struct fib_table *old, *new, *main_table; /* attempt to fetch local table if it has been allocated */ old = fib_get_table(net, RT_TABLE_LOCAL); @@ -162,11 +162,21 @@ int fib_unmerge(struct net *net) if (!new) return -ENOMEM; + /* table is already unmerged */ + if (new == old) + return 0; + /* replace merged table with clean table */ - if (new != old) { - fib_replace_table(net, old, new); - fib_free_table(old); - } + fib_replace_table(net, old, new); + fib_free_table(old); + + /* attempt to fetch main table if it has been allocated */ + main_table = fib_get_table(net, RT_TABLE_MAIN); + if (!main_table) + return 0; + + /* flush local entries from main table */ + fib_table_flush_external(main_table); return 0; } diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 4cff74d..735edc9 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1760,6 +1760,71 @@ out: return NULL; } +/* Caller must hold RTNL */ +void fib_table_flush_external(struct fib_table *tb) +{ + struct trie *t = (struct trie *)tb->tb_data; + struct key_vector *pn = t->kv; + unsigned long cindex = 1; + struct hlist_node *tmp; + struct fib_alias *fa; + + /* walk trie in reverse order */ + for (;;) { + unsigned char slen = 0; + struct key_vector *n; + + if (!(cindex--)) { + t_key pkey = pn->key; + + /* cannot resize the trie vector */ + if (IS_TRIE(pn)) + break; + + /* resize completed node */ + pn = resize(t, pn); + cindex = get_index(pkey, pn); + + continue; + } + + /* grab the next available node */ + n = get_child(pn, cindex); + if (!n) + continue; + + if (IS_TNODE(n)) { + /* record pn and cindex for leaf walking */ + pn = n; + cindex = 1ul << n->bits; + + continue; + } + + hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) { + /* if alias was cloned to local then we just + * need to remove the local copy from main + */ + if (tb->tb_id != fa->tb_id) { + hlist_del_rcu(&fa->fa_list); + alias_free_mem_rcu(fa); + continue; + } + + /* record local slen */ + slen = fa->fa_slen; + } + + /* update leaf slen */ + n->slen = slen; + + if (hlist_empty(&n->leaf)) { + put_child_root(pn, n->key, NULL); + node_free(n); + } + } +} + /* Caller must hold RTNL. */ int fib_table_flush(struct net *net, struct fib_table *tb) { -- cgit v0.10.2 From 3114cdfe66c156345b0ae34e2990472f277e0c1b Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Tue, 15 Nov 2016 05:46:12 -0500 Subject: ipv4: Fix memory leak in exception case for splitting tries Fix a small memory leak that can occur where we leak a fib_alias in the event of us not being able to insert it into the local table. Fixes: 0ddcf43d5d4a0 ("ipv4: FIB Local/MAIN table collapse") Reported-by: Eric Dumazet Signed-off-by: Alexander Duyck Acked-by: Eric Dumazet Signed-off-by: David S. Miller diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 735edc9..026f309 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1743,8 +1743,10 @@ struct fib_table *fib_trie_unmerge(struct fib_table *oldtb) local_l = fib_find_node(lt, &local_tp, l->key); if (fib_insert_alias(lt, local_tp, local_l, new_fa, - NULL, l->key)) + NULL, l->key)) { + kmem_cache_free(fn_alias_kmem, new_fa); goto out; + } } /* stop loop if key wrapped back to 0 */ -- cgit v0.10.2 From 612e94bd99912f3b2ac616c00c3dc7f166a98005 Mon Sep 17 00:00:00 2001 From: Radha Mohan Chintakuntla Date: Tue, 15 Nov 2016 17:37:16 +0530 Subject: net: thunderx: Introduce BGX_ID_MASK macro to extract bgx_id This patch fixes the 'bgx_id' determination on 83xx where there are 4 BGX blocks instead of 2 on other platforms. Signed-off-by: Radha Mohan Chintakuntla Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index 8bbaedb..050e21f 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c @@ -1242,8 +1242,8 @@ static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent) pci_read_config_word(pdev, PCI_DEVICE_ID, &sdevid); if (sdevid != PCI_DEVICE_ID_THUNDER_RGX) { - bgx->bgx_id = - (pci_resource_start(pdev, PCI_CFG_REG_BAR_NUM) >> 24) & 1; + bgx->bgx_id = (pci_resource_start(pdev, + PCI_CFG_REG_BAR_NUM) >> 24) & BGX_ID_MASK; bgx->bgx_id += nic_get_node_id(pdev) * MAX_BGX_PER_NODE; bgx->max_lmac = MAX_LMAC_PER_BGX; bgx_vnic[bgx->bgx_id] = bgx; diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h index d59c71e..01cc7c8 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h @@ -28,6 +28,8 @@ #define MAX_DMAC_PER_LMAC 8 #define MAX_FRAME_SIZE 9216 +#define BGX_ID_MASK 0x3 + #define MAX_DMAC_PER_LMAC_TNS_BYPASS_MODE 2 /* Registers */ -- cgit v0.10.2 From 712c3185344050c591d78584542bd945e4f6f778 Mon Sep 17 00:00:00 2001 From: Sunil Goutham Date: Tue, 15 Nov 2016 17:37:36 +0530 Subject: net: thunderx: Program LMAC credits based on MTU Programming LMAC credits taking 9K frame size by default is incorrect as for an interface which is one of the many on the same BGX/QLM no of credits available will be less as Tx FIFO will be divided across all interfaces. So let's say a BGX with 40G interface and another BGX with multiple 10G, bandwidth of 10G interfaces will be effected when traffic is running on both 40G and 10G interfaces simultaneously. This patch fixes this issue by programming credits based on netdev's MTU. Also fixed configuring MTU to HW and added CQE counter for pkts which exceed this value. Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h index 3042610..cd2d379 100644 --- a/drivers/net/ethernet/cavium/thunder/nic.h +++ b/drivers/net/ethernet/cavium/thunder/nic.h @@ -47,7 +47,7 @@ /* Min/Max packet size */ #define NIC_HW_MIN_FRS 64 -#define NIC_HW_MAX_FRS 9200 /* 9216 max packet including FCS */ +#define NIC_HW_MAX_FRS 9190 /* Excluding L2 header and FCS */ /* Max pkinds */ #define NIC_MAX_PKIND 16 @@ -282,7 +282,6 @@ struct nicvf { u8 node; u8 cpi_alg; - u16 mtu; bool link_up; u8 duplex; u32 speed; diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c index 2bbf4cb..85c9e62 100644 --- a/drivers/net/ethernet/cavium/thunder/nic_main.c +++ b/drivers/net/ethernet/cavium/thunder/nic_main.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "nic_reg.h" #include "nic.h" @@ -260,18 +261,31 @@ static void nic_get_bgx_stats(struct nicpf *nic, struct bgx_stats_msg *bgx) /* Update hardware min/max frame size */ static int nic_update_hw_frs(struct nicpf *nic, int new_frs, int vf) { - if ((new_frs > NIC_HW_MAX_FRS) || (new_frs < NIC_HW_MIN_FRS)) { - dev_err(&nic->pdev->dev, - "Invalid MTU setting from VF%d rejected, should be between %d and %d\n", - vf, NIC_HW_MIN_FRS, NIC_HW_MAX_FRS); + int bgx, lmac, lmac_cnt; + u64 lmac_credits; + + if ((new_frs > NIC_HW_MAX_FRS) || (new_frs < NIC_HW_MIN_FRS)) return 1; - } - new_frs += ETH_HLEN; - if (new_frs <= nic->pkind.maxlen) - return 0; - nic->pkind.maxlen = new_frs; - nic_reg_write(nic, NIC_PF_PKIND_0_15_CFG, *(u64 *)&nic->pkind); + bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]); + lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]); + lmac += bgx * MAX_LMAC_PER_BGX; + + new_frs += VLAN_ETH_HLEN + ETH_FCS_LEN + 4; + + /* Update corresponding LMAC credits */ + lmac_cnt = bgx_get_lmac_count(nic->node, bgx); + lmac_credits = nic_reg_read(nic, NIC_PF_LMAC_0_7_CREDIT + (lmac * 8)); + lmac_credits &= ~(0xFFFFFULL << 12); + lmac_credits |= (((((48 * 1024) / lmac_cnt) - new_frs) / 16) << 12); + nic_reg_write(nic, NIC_PF_LMAC_0_7_CREDIT + (lmac * 8), lmac_credits); + + /* Enforce MTU in HW + * This config is supported only from 88xx pass 2.0 onwards. + */ + if (!pass1_silicon(nic->pdev)) + nic_reg_write(nic, + NIC_PF_LMAC_0_7_CFG2 + (lmac * 8), new_frs); return 0; } @@ -464,7 +478,7 @@ static int nic_init_hw(struct nicpf *nic) /* PKIND configuration */ nic->pkind.minlen = 0; - nic->pkind.maxlen = NIC_HW_MAX_FRS + ETH_HLEN; + nic->pkind.maxlen = NIC_HW_MAX_FRS + VLAN_ETH_HLEN + ETH_FCS_LEN + 4; nic->pkind.lenerr_en = 1; nic->pkind.rx_hdr = 0; nic->pkind.hdr_sl = 0; diff --git a/drivers/net/ethernet/cavium/thunder/nic_reg.h b/drivers/net/ethernet/cavium/thunder/nic_reg.h index edf779f..80d4633 100644 --- a/drivers/net/ethernet/cavium/thunder/nic_reg.h +++ b/drivers/net/ethernet/cavium/thunder/nic_reg.h @@ -106,6 +106,7 @@ #define NIC_PF_MPI_0_2047_CFG (0x210000) #define NIC_PF_RSSI_0_4097_RQ (0x220000) #define NIC_PF_LMAC_0_7_CFG (0x240000) +#define NIC_PF_LMAC_0_7_CFG2 (0x240100) #define NIC_PF_LMAC_0_7_SW_XOFF (0x242000) #define NIC_PF_LMAC_0_7_CREDIT (0x244000) #define NIC_PF_CHAN_0_255_TX_CFG (0x400000) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 45a13f7..8f83361 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -1189,6 +1189,17 @@ int nicvf_stop(struct net_device *netdev) return 0; } +static int nicvf_update_hw_max_frs(struct nicvf *nic, int mtu) +{ + union nic_mbx mbx = {}; + + mbx.frs.msg = NIC_MBOX_MSG_SET_MAX_FRS; + mbx.frs.max_frs = mtu; + mbx.frs.vf_id = nic->vf_id; + + return nicvf_send_msg_to_pf(nic, &mbx); +} + int nicvf_open(struct net_device *netdev) { int err, qidx; @@ -1196,8 +1207,6 @@ int nicvf_open(struct net_device *netdev) struct queue_set *qs = nic->qs; struct nicvf_cq_poll *cq_poll = NULL; - nic->mtu = netdev->mtu; - netif_carrier_off(netdev); err = nicvf_register_misc_interrupt(nic); @@ -1248,9 +1257,12 @@ int nicvf_open(struct net_device *netdev) if (nic->sqs_mode) nicvf_get_primary_vf_struct(nic); - /* Configure receive side scaling */ - if (!nic->sqs_mode) + /* Configure receive side scaling and MTU */ + if (!nic->sqs_mode) { nicvf_rss_init(nic); + if (nicvf_update_hw_max_frs(nic, netdev->mtu)) + goto cleanup; + } err = nicvf_register_interrupts(nic); if (err) @@ -1297,17 +1309,6 @@ napi_del: return err; } -static int nicvf_update_hw_max_frs(struct nicvf *nic, int mtu) -{ - union nic_mbx mbx = {}; - - mbx.frs.msg = NIC_MBOX_MSG_SET_MAX_FRS; - mbx.frs.max_frs = mtu; - mbx.frs.vf_id = nic->vf_id; - - return nicvf_send_msg_to_pf(nic, &mbx); -} - static int nicvf_change_mtu(struct net_device *netdev, int new_mtu) { struct nicvf *nic = netdev_priv(netdev); @@ -1318,10 +1319,13 @@ static int nicvf_change_mtu(struct net_device *netdev, int new_mtu) if (new_mtu < NIC_HW_MIN_FRS) return -EINVAL; + netdev->mtu = new_mtu; + + if (!netif_running(netdev)) + return 0; + if (nicvf_update_hw_max_frs(nic, new_mtu)) return -EINVAL; - netdev->mtu = new_mtu; - nic->mtu = new_mtu; return 0; } diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c index a4fc501..f0e0ca6 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -1530,6 +1530,9 @@ int nicvf_check_cqe_tx_errs(struct nicvf *nic, case CQ_TX_ERROP_SUBDC_ERR: stats->tx.subdesc_err++; break; + case CQ_TX_ERROP_MAX_SIZE_VIOL: + stats->tx.max_size_exceeded++; + break; case CQ_TX_ERROP_IMM_SIZE_OFLOW: stats->tx.imm_size_oflow++; break; diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h index 869f338..8f4718e 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h @@ -158,6 +158,7 @@ enum CQ_TX_ERROP_E { CQ_TX_ERROP_DESC_FAULT = 0x10, CQ_TX_ERROP_HDR_CONS_ERR = 0x11, CQ_TX_ERROP_SUBDC_ERR = 0x12, + CQ_TX_ERROP_MAX_SIZE_VIOL = 0x13, CQ_TX_ERROP_IMM_SIZE_OFLOW = 0x80, CQ_TX_ERROP_DATA_SEQUENCE_ERR = 0x81, CQ_TX_ERROP_MEM_SEQUENCE_ERR = 0x82, @@ -177,6 +178,7 @@ struct cmp_queue_stats { u64 desc_fault; u64 hdr_cons_err; u64 subdesc_err; + u64 max_size_exceeded; u64 imm_size_oflow; u64 data_seq_err; u64 mem_seq_err; -- cgit v0.10.2 From cadcf95a4f70362c96a8fe39ff5d5df830d4db7f Mon Sep 17 00:00:00 2001 From: Sunil Goutham Date: Tue, 15 Nov 2016 17:37:54 +0530 Subject: net: thunderx: Fix configuration of L3/L4 length checking This patch fixes enabling of HW verification of L3/L4 length and TCP/UDP checksum which is currently being cleared. Also fixed VLAN stripping config which is being cleared when multiqset is enabled. Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c index f0e0ca6..f914eef 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -538,9 +538,12 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs, mbx.rq.cfg = (1ULL << 62) | (RQ_CQ_DROP << 8); nicvf_send_msg_to_pf(nic, &mbx); - nicvf_queue_reg_write(nic, NIC_QSET_RQ_GEN_CFG, 0, 0x00); - if (!nic->sqs_mode) + if (!nic->sqs_mode && (qidx == 0)) { + /* Enable checking L3/L4 length and TCP/UDP checksums */ + nicvf_queue_reg_write(nic, NIC_QSET_RQ_GEN_CFG, 0, + (BIT(24) | BIT(23) | BIT(21))); nicvf_config_vlan_stripping(nic, nic->netdev->features); + } /* Enable Receive queue */ memset(&rq_cfg, 0, sizeof(struct rq_cfg)); -- cgit v0.10.2 From 964cb69bdc9db255f7c3a80f6e1bed8a25e4c60e Mon Sep 17 00:00:00 2001 From: Sunil Goutham Date: Tue, 15 Nov 2016 17:38:16 +0530 Subject: net: thunderx: Fix VF driver's interface statistics This patch fixes multiple issues 1. Convert all driver statistics to percpu counters for accuracy. 2. To avoid multiple CQEs posted by a TSO packet appended to HW, TSO pkt's SQE has 'post_cqe' not set but a dummy SQE is added for getting HW transmit completion notification. This dummy SQE has 'dont_send' set and HW drops the pkt pointed to in this thus Tx drop counter increases. This patch fixes this by subtracting SW tx tso counter from HW Tx drop counter for actual packet drop counter. 3. Reset all individual queue's and VNIC HW stats when interface is going down. 4. Getrid off unnecessary counters in hot path. 5. Bringout all CQE error stats i.e both Rx and Tx. Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h index cd2d379..86bd93c 100644 --- a/drivers/net/ethernet/cavium/thunder/nic.h +++ b/drivers/net/ethernet/cavium/thunder/nic.h @@ -178,11 +178,11 @@ enum tx_stats_reg_offset { struct nicvf_hw_stats { u64 rx_bytes; + u64 rx_frames; u64 rx_ucast_frames; u64 rx_bcast_frames; u64 rx_mcast_frames; - u64 rx_fcs_errors; - u64 rx_l2_errors; + u64 rx_drops; u64 rx_drop_red; u64 rx_drop_red_bytes; u64 rx_drop_overrun; @@ -191,6 +191,19 @@ struct nicvf_hw_stats { u64 rx_drop_mcast; u64 rx_drop_l3_bcast; u64 rx_drop_l3_mcast; + u64 rx_fcs_errors; + u64 rx_l2_errors; + + u64 tx_bytes; + u64 tx_frames; + u64 tx_ucast_frames; + u64 tx_bcast_frames; + u64 tx_mcast_frames; + u64 tx_drops; +}; + +struct nicvf_drv_stats { + /* CQE Rx errs */ u64 rx_bgx_truncated_pkts; u64 rx_jabber_errs; u64 rx_fcs_errs; @@ -216,34 +229,30 @@ struct nicvf_hw_stats { u64 rx_l4_pclp; u64 rx_truncated_pkts; - u64 tx_bytes_ok; - u64 tx_ucast_frames_ok; - u64 tx_bcast_frames_ok; - u64 tx_mcast_frames_ok; - u64 tx_drops; -}; - -struct nicvf_drv_stats { - /* Rx */ - u64 rx_frames_ok; - u64 rx_frames_64; - u64 rx_frames_127; - u64 rx_frames_255; - u64 rx_frames_511; - u64 rx_frames_1023; - u64 rx_frames_1518; - u64 rx_frames_jumbo; - u64 rx_drops; - + /* CQE Tx errs */ + u64 tx_desc_fault; + u64 tx_hdr_cons_err; + u64 tx_subdesc_err; + u64 tx_max_size_exceeded; + u64 tx_imm_size_oflow; + u64 tx_data_seq_err; + u64 tx_mem_seq_err; + u64 tx_lock_viol; + u64 tx_data_fault; + u64 tx_tstmp_conflict; + u64 tx_tstmp_timeout; + u64 tx_mem_fault; + u64 tx_csum_overlap; + u64 tx_csum_overflow; + + /* driver debug stats */ u64 rcv_buffer_alloc_failures; - - /* Tx */ - u64 tx_frames_ok; - u64 tx_drops; u64 tx_tso; u64 tx_timeout; u64 txq_stop; u64 txq_wake; + + struct u64_stats_sync syncp; }; struct nicvf { @@ -297,7 +306,7 @@ struct nicvf { /* Stats */ struct nicvf_hw_stats hw_stats; - struct nicvf_drv_stats drv_stats; + struct nicvf_drv_stats __percpu *drv_stats; struct bgx_stats bgx_stats; /* MSI-X */ diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c index 85c9e62..6677b96 100644 --- a/drivers/net/ethernet/cavium/thunder/nic_main.c +++ b/drivers/net/ethernet/cavium/thunder/nic_main.c @@ -851,6 +851,7 @@ static int nic_reset_stat_counters(struct nicpf *nic, nic_reg_write(nic, reg_addr, 0); } } + return 0; } diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c index ad4fddb..432bf6b 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c @@ -36,11 +36,11 @@ struct nicvf_stat { static const struct nicvf_stat nicvf_hw_stats[] = { NICVF_HW_STAT(rx_bytes), + NICVF_HW_STAT(rx_frames), NICVF_HW_STAT(rx_ucast_frames), NICVF_HW_STAT(rx_bcast_frames), NICVF_HW_STAT(rx_mcast_frames), - NICVF_HW_STAT(rx_fcs_errors), - NICVF_HW_STAT(rx_l2_errors), + NICVF_HW_STAT(rx_drops), NICVF_HW_STAT(rx_drop_red), NICVF_HW_STAT(rx_drop_red_bytes), NICVF_HW_STAT(rx_drop_overrun), @@ -49,50 +49,59 @@ static const struct nicvf_stat nicvf_hw_stats[] = { NICVF_HW_STAT(rx_drop_mcast), NICVF_HW_STAT(rx_drop_l3_bcast), NICVF_HW_STAT(rx_drop_l3_mcast), - NICVF_HW_STAT(rx_bgx_truncated_pkts), - NICVF_HW_STAT(rx_jabber_errs), - NICVF_HW_STAT(rx_fcs_errs), - NICVF_HW_STAT(rx_bgx_errs), - NICVF_HW_STAT(rx_prel2_errs), - NICVF_HW_STAT(rx_l2_hdr_malformed), - NICVF_HW_STAT(rx_oversize), - NICVF_HW_STAT(rx_undersize), - NICVF_HW_STAT(rx_l2_len_mismatch), - NICVF_HW_STAT(rx_l2_pclp), - NICVF_HW_STAT(rx_ip_ver_errs), - NICVF_HW_STAT(rx_ip_csum_errs), - NICVF_HW_STAT(rx_ip_hdr_malformed), - NICVF_HW_STAT(rx_ip_payload_malformed), - NICVF_HW_STAT(rx_ip_ttl_errs), - NICVF_HW_STAT(rx_l3_pclp), - NICVF_HW_STAT(rx_l4_malformed), - NICVF_HW_STAT(rx_l4_csum_errs), - NICVF_HW_STAT(rx_udp_len_errs), - NICVF_HW_STAT(rx_l4_port_errs), - NICVF_HW_STAT(rx_tcp_flag_errs), - NICVF_HW_STAT(rx_tcp_offset_errs), - NICVF_HW_STAT(rx_l4_pclp), - NICVF_HW_STAT(rx_truncated_pkts), - NICVF_HW_STAT(tx_bytes_ok), - NICVF_HW_STAT(tx_ucast_frames_ok), - NICVF_HW_STAT(tx_bcast_frames_ok), - NICVF_HW_STAT(tx_mcast_frames_ok), + NICVF_HW_STAT(rx_fcs_errors), + NICVF_HW_STAT(rx_l2_errors), + NICVF_HW_STAT(tx_bytes), + NICVF_HW_STAT(tx_frames), + NICVF_HW_STAT(tx_ucast_frames), + NICVF_HW_STAT(tx_bcast_frames), + NICVF_HW_STAT(tx_mcast_frames), + NICVF_HW_STAT(tx_drops), }; static const struct nicvf_stat nicvf_drv_stats[] = { - NICVF_DRV_STAT(rx_frames_ok), - NICVF_DRV_STAT(rx_frames_64), - NICVF_DRV_STAT(rx_frames_127), - NICVF_DRV_STAT(rx_frames_255), - NICVF_DRV_STAT(rx_frames_511), - NICVF_DRV_STAT(rx_frames_1023), - NICVF_DRV_STAT(rx_frames_1518), - NICVF_DRV_STAT(rx_frames_jumbo), - NICVF_DRV_STAT(rx_drops), + NICVF_DRV_STAT(rx_bgx_truncated_pkts), + NICVF_DRV_STAT(rx_jabber_errs), + NICVF_DRV_STAT(rx_fcs_errs), + NICVF_DRV_STAT(rx_bgx_errs), + NICVF_DRV_STAT(rx_prel2_errs), + NICVF_DRV_STAT(rx_l2_hdr_malformed), + NICVF_DRV_STAT(rx_oversize), + NICVF_DRV_STAT(rx_undersize), + NICVF_DRV_STAT(rx_l2_len_mismatch), + NICVF_DRV_STAT(rx_l2_pclp), + NICVF_DRV_STAT(rx_ip_ver_errs), + NICVF_DRV_STAT(rx_ip_csum_errs), + NICVF_DRV_STAT(rx_ip_hdr_malformed), + NICVF_DRV_STAT(rx_ip_payload_malformed), + NICVF_DRV_STAT(rx_ip_ttl_errs), + NICVF_DRV_STAT(rx_l3_pclp), + NICVF_DRV_STAT(rx_l4_malformed), + NICVF_DRV_STAT(rx_l4_csum_errs), + NICVF_DRV_STAT(rx_udp_len_errs), + NICVF_DRV_STAT(rx_l4_port_errs), + NICVF_DRV_STAT(rx_tcp_flag_errs), + NICVF_DRV_STAT(rx_tcp_offset_errs), + NICVF_DRV_STAT(rx_l4_pclp), + NICVF_DRV_STAT(rx_truncated_pkts), + + NICVF_DRV_STAT(tx_desc_fault), + NICVF_DRV_STAT(tx_hdr_cons_err), + NICVF_DRV_STAT(tx_subdesc_err), + NICVF_DRV_STAT(tx_max_size_exceeded), + NICVF_DRV_STAT(tx_imm_size_oflow), + NICVF_DRV_STAT(tx_data_seq_err), + NICVF_DRV_STAT(tx_mem_seq_err), + NICVF_DRV_STAT(tx_lock_viol), + NICVF_DRV_STAT(tx_data_fault), + NICVF_DRV_STAT(tx_tstmp_conflict), + NICVF_DRV_STAT(tx_tstmp_timeout), + NICVF_DRV_STAT(tx_mem_fault), + NICVF_DRV_STAT(tx_csum_overlap), + NICVF_DRV_STAT(tx_csum_overflow), + NICVF_DRV_STAT(rcv_buffer_alloc_failures), - NICVF_DRV_STAT(tx_frames_ok), NICVF_DRV_STAT(tx_tso), - NICVF_DRV_STAT(tx_drops), NICVF_DRV_STAT(tx_timeout), NICVF_DRV_STAT(txq_stop), NICVF_DRV_STAT(txq_wake), @@ -278,8 +287,8 @@ static void nicvf_get_ethtool_stats(struct net_device *netdev, struct ethtool_stats *stats, u64 *data) { struct nicvf *nic = netdev_priv(netdev); - int stat; - int sqs; + int stat, tmp_stats; + int sqs, cpu; nicvf_update_stats(nic); @@ -289,9 +298,13 @@ static void nicvf_get_ethtool_stats(struct net_device *netdev, for (stat = 0; stat < nicvf_n_hw_stats; stat++) *(data++) = ((u64 *)&nic->hw_stats) [nicvf_hw_stats[stat].index]; - for (stat = 0; stat < nicvf_n_drv_stats; stat++) - *(data++) = ((u64 *)&nic->drv_stats) - [nicvf_drv_stats[stat].index]; + for (stat = 0; stat < nicvf_n_drv_stats; stat++) { + tmp_stats = 0; + for_each_possible_cpu(cpu) + tmp_stats += ((u64 *)per_cpu_ptr(nic->drv_stats, cpu)) + [nicvf_drv_stats[stat].index]; + *(data++) = tmp_stats; + } nicvf_get_qset_stats(nic, stats, &data); diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 8f83361..9dc79c05 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -69,25 +69,6 @@ static inline u8 nicvf_netdev_qidx(struct nicvf *nic, u8 qidx) return qidx; } -static inline void nicvf_set_rx_frame_cnt(struct nicvf *nic, - struct sk_buff *skb) -{ - if (skb->len <= 64) - nic->drv_stats.rx_frames_64++; - else if (skb->len <= 127) - nic->drv_stats.rx_frames_127++; - else if (skb->len <= 255) - nic->drv_stats.rx_frames_255++; - else if (skb->len <= 511) - nic->drv_stats.rx_frames_511++; - else if (skb->len <= 1023) - nic->drv_stats.rx_frames_1023++; - else if (skb->len <= 1518) - nic->drv_stats.rx_frames_1518++; - else - nic->drv_stats.rx_frames_jumbo++; -} - /* The Cavium ThunderX network controller can *only* be found in SoCs * containing the ThunderX ARM64 CPU implementation. All accesses to the device * registers on this platform are implicitly strongly ordered with respect @@ -514,7 +495,6 @@ static int nicvf_init_resources(struct nicvf *nic) } static void nicvf_snd_pkt_handler(struct net_device *netdev, - struct cmp_queue *cq, struct cqe_send_t *cqe_tx, int cqe_type, int budget, unsigned int *tx_pkts, unsigned int *tx_bytes) @@ -536,7 +516,7 @@ static void nicvf_snd_pkt_handler(struct net_device *netdev, __func__, cqe_tx->sq_qs, cqe_tx->sq_idx, cqe_tx->sqe_ptr, hdr->subdesc_cnt); - nicvf_check_cqe_tx_errs(nic, cq, cqe_tx); + nicvf_check_cqe_tx_errs(nic, cqe_tx); skb = (struct sk_buff *)sq->skbuff[cqe_tx->sqe_ptr]; if (skb) { /* Check for dummy descriptor used for HW TSO offload on 88xx */ @@ -630,8 +610,6 @@ static void nicvf_rcv_pkt_handler(struct net_device *netdev, return; } - nicvf_set_rx_frame_cnt(nic, skb); - nicvf_set_rxhash(netdev, cqe_rx, skb); skb_record_rx_queue(skb, rq_idx); @@ -703,7 +681,7 @@ loop: work_done++; break; case CQE_TYPE_SEND: - nicvf_snd_pkt_handler(netdev, cq, + nicvf_snd_pkt_handler(netdev, (void *)cq_desc, CQE_TYPE_SEND, budget, &tx_pkts, &tx_bytes); tx_done++; @@ -740,7 +718,7 @@ done: nic = nic->pnicvf; if (netif_tx_queue_stopped(txq) && netif_carrier_ok(netdev)) { netif_tx_start_queue(txq); - nic->drv_stats.txq_wake++; + this_cpu_inc(nic->drv_stats->txq_wake); if (netif_msg_tx_err(nic)) netdev_warn(netdev, "%s: Transmit queue wakeup SQ%d\n", @@ -1084,7 +1062,7 @@ static netdev_tx_t nicvf_xmit(struct sk_buff *skb, struct net_device *netdev) if (!netif_tx_queue_stopped(txq) && !nicvf_sq_append_skb(nic, skb)) { netif_tx_stop_queue(txq); - nic->drv_stats.txq_stop++; + this_cpu_inc(nic->drv_stats->txq_stop); if (netif_msg_tx_err(nic)) netdev_warn(netdev, "%s: Transmit ring full, stopping SQ%d\n", @@ -1202,7 +1180,7 @@ static int nicvf_update_hw_max_frs(struct nicvf *nic, int mtu) int nicvf_open(struct net_device *netdev) { - int err, qidx; + int cpu, err, qidx; struct nicvf *nic = netdev_priv(netdev); struct queue_set *qs = nic->qs; struct nicvf_cq_poll *cq_poll = NULL; @@ -1262,6 +1240,11 @@ int nicvf_open(struct net_device *netdev) nicvf_rss_init(nic); if (nicvf_update_hw_max_frs(nic, netdev->mtu)) goto cleanup; + + /* Clear percpu stats */ + for_each_possible_cpu(cpu) + memset(per_cpu_ptr(nic->drv_stats, cpu), 0, + sizeof(struct nicvf_drv_stats)); } err = nicvf_register_interrupts(nic); @@ -1288,9 +1271,6 @@ int nicvf_open(struct net_device *netdev) for (qidx = 0; qidx < qs->rbdr_cnt; qidx++) nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx); - nic->drv_stats.txq_stop = 0; - nic->drv_stats.txq_wake = 0; - return 0; cleanup: nicvf_disable_intr(nic, NICVF_INTR_MBOX, 0); @@ -1383,9 +1363,10 @@ void nicvf_update_lmac_stats(struct nicvf *nic) void nicvf_update_stats(struct nicvf *nic) { - int qidx; + int qidx, cpu; + u64 tmp_stats = 0; struct nicvf_hw_stats *stats = &nic->hw_stats; - struct nicvf_drv_stats *drv_stats = &nic->drv_stats; + struct nicvf_drv_stats *drv_stats; struct queue_set *qs = nic->qs; #define GET_RX_STATS(reg) \ @@ -1408,21 +1389,33 @@ void nicvf_update_stats(struct nicvf *nic) stats->rx_drop_l3_bcast = GET_RX_STATS(RX_DRP_L3BCAST); stats->rx_drop_l3_mcast = GET_RX_STATS(RX_DRP_L3MCAST); - stats->tx_bytes_ok = GET_TX_STATS(TX_OCTS); - stats->tx_ucast_frames_ok = GET_TX_STATS(TX_UCAST); - stats->tx_bcast_frames_ok = GET_TX_STATS(TX_BCAST); - stats->tx_mcast_frames_ok = GET_TX_STATS(TX_MCAST); + stats->tx_bytes = GET_TX_STATS(TX_OCTS); + stats->tx_ucast_frames = GET_TX_STATS(TX_UCAST); + stats->tx_bcast_frames = GET_TX_STATS(TX_BCAST); + stats->tx_mcast_frames = GET_TX_STATS(TX_MCAST); stats->tx_drops = GET_TX_STATS(TX_DROP); - drv_stats->tx_frames_ok = stats->tx_ucast_frames_ok + - stats->tx_bcast_frames_ok + - stats->tx_mcast_frames_ok; - drv_stats->rx_frames_ok = stats->rx_ucast_frames + - stats->rx_bcast_frames + - stats->rx_mcast_frames; - drv_stats->rx_drops = stats->rx_drop_red + - stats->rx_drop_overrun; - drv_stats->tx_drops = stats->tx_drops; + /* On T88 pass 2.0, the dummy SQE added for TSO notification + * via CQE has 'dont_send' set. Hence HW drops the pkt pointed + * pointed by dummy SQE and results in tx_drops counter being + * incremented. Subtracting it from tx_tso counter will give + * exact tx_drops counter. + */ + if (nic->t88 && nic->hw_tso) { + for_each_possible_cpu(cpu) { + drv_stats = per_cpu_ptr(nic->drv_stats, cpu); + tmp_stats += drv_stats->tx_tso; + } + stats->tx_drops = tmp_stats - stats->tx_drops; + } + stats->tx_frames = stats->tx_ucast_frames + + stats->tx_bcast_frames + + stats->tx_mcast_frames; + stats->rx_frames = stats->rx_ucast_frames + + stats->rx_bcast_frames + + stats->rx_mcast_frames; + stats->rx_drops = stats->rx_drop_red + + stats->rx_drop_overrun; /* Update RQ and SQ stats */ for (qidx = 0; qidx < qs->rq_cnt; qidx++) @@ -1436,18 +1429,17 @@ static struct rtnl_link_stats64 *nicvf_get_stats64(struct net_device *netdev, { struct nicvf *nic = netdev_priv(netdev); struct nicvf_hw_stats *hw_stats = &nic->hw_stats; - struct nicvf_drv_stats *drv_stats = &nic->drv_stats; nicvf_update_stats(nic); stats->rx_bytes = hw_stats->rx_bytes; - stats->rx_packets = drv_stats->rx_frames_ok; - stats->rx_dropped = drv_stats->rx_drops; + stats->rx_packets = hw_stats->rx_frames; + stats->rx_dropped = hw_stats->rx_drops; stats->multicast = hw_stats->rx_mcast_frames; - stats->tx_bytes = hw_stats->tx_bytes_ok; - stats->tx_packets = drv_stats->tx_frames_ok; - stats->tx_dropped = drv_stats->tx_drops; + stats->tx_bytes = hw_stats->tx_bytes; + stats->tx_packets = hw_stats->tx_frames; + stats->tx_dropped = hw_stats->tx_drops; return stats; } @@ -1460,7 +1452,7 @@ static void nicvf_tx_timeout(struct net_device *dev) netdev_warn(dev, "%s: Transmit timed out, resetting\n", dev->name); - nic->drv_stats.tx_timeout++; + this_cpu_inc(nic->drv_stats->tx_timeout); schedule_work(&nic->reset_task); } @@ -1594,6 +1586,12 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) goto err_free_netdev; } + nic->drv_stats = netdev_alloc_pcpu_stats(struct nicvf_drv_stats); + if (!nic->drv_stats) { + err = -ENOMEM; + goto err_free_netdev; + } + err = nicvf_set_qset_resources(nic); if (err) goto err_free_netdev; @@ -1652,6 +1650,8 @@ err_unregister_interrupts: nicvf_unregister_interrupts(nic); err_free_netdev: pci_set_drvdata(pdev, NULL); + if (nic->drv_stats) + free_percpu(nic->drv_stats); free_netdev(netdev); err_release_regions: pci_release_regions(pdev); @@ -1679,6 +1679,8 @@ static void nicvf_remove(struct pci_dev *pdev) unregister_netdev(pnetdev); nicvf_unregister_interrupts(nic); pci_set_drvdata(pdev, NULL); + if (nic->drv_stats) + free_percpu(nic->drv_stats); free_netdev(netdev); pci_release_regions(pdev); pci_disable_device(pdev); diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c index f914eef..bdce591 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -104,7 +104,8 @@ static inline int nicvf_alloc_rcv_buffer(struct nicvf *nic, gfp_t gfp, nic->rb_page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN, order); if (!nic->rb_page) { - nic->drv_stats.rcv_buffer_alloc_failures++; + this_cpu_inc(nic->pnicvf->drv_stats-> + rcv_buffer_alloc_failures); return -ENOMEM; } nic->rb_page_offset = 0; @@ -483,9 +484,12 @@ static void nicvf_reset_rcv_queue_stats(struct nicvf *nic) { union nic_mbx mbx = {}; - /* Reset all RXQ's stats */ + /* Reset all RQ/SQ and VF stats */ mbx.reset_stat.msg = NIC_MBOX_MSG_RESET_STAT_COUNTER; + mbx.reset_stat.rx_stat_mask = 0x3FFF; + mbx.reset_stat.tx_stat_mask = 0x1F; mbx.reset_stat.rq_stat_mask = 0xFFFF; + mbx.reset_stat.sq_stat_mask = 0xFFFF; nicvf_send_msg_to_pf(nic, &mbx); } @@ -1032,7 +1036,7 @@ nicvf_sq_add_hdr_subdesc(struct nicvf *nic, struct snd_queue *sq, int qentry, hdr->tso_max_paysize = skb_shinfo(skb)->gso_size; /* For non-tunneled pkts, point this to L2 ethertype */ hdr->inner_l3_offset = skb_network_offset(skb) - 2; - nic->drv_stats.tx_tso++; + this_cpu_inc(nic->pnicvf->drv_stats->tx_tso); } } @@ -1164,7 +1168,7 @@ static int nicvf_sq_append_tso(struct nicvf *nic, struct snd_queue *sq, nicvf_sq_doorbell(nic, skb, sq_num, desc_cnt); - nic->drv_stats.tx_tso++; + this_cpu_inc(nic->pnicvf->drv_stats->tx_tso); return 1; } @@ -1425,8 +1429,6 @@ void nicvf_update_sq_stats(struct nicvf *nic, int sq_idx) /* Check for errors in the receive cmp.queue entry */ int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cqe_rx_t *cqe_rx) { - struct nicvf_hw_stats *stats = &nic->hw_stats; - if (!cqe_rx->err_level && !cqe_rx->err_opcode) return 0; @@ -1438,76 +1440,76 @@ int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cqe_rx_t *cqe_rx) switch (cqe_rx->err_opcode) { case CQ_RX_ERROP_RE_PARTIAL: - stats->rx_bgx_truncated_pkts++; + this_cpu_inc(nic->drv_stats->rx_bgx_truncated_pkts); break; case CQ_RX_ERROP_RE_JABBER: - stats->rx_jabber_errs++; + this_cpu_inc(nic->drv_stats->rx_jabber_errs); break; case CQ_RX_ERROP_RE_FCS: - stats->rx_fcs_errs++; + this_cpu_inc(nic->drv_stats->rx_fcs_errs); break; case CQ_RX_ERROP_RE_RX_CTL: - stats->rx_bgx_errs++; + this_cpu_inc(nic->drv_stats->rx_bgx_errs); break; case CQ_RX_ERROP_PREL2_ERR: - stats->rx_prel2_errs++; + this_cpu_inc(nic->drv_stats->rx_prel2_errs); break; case CQ_RX_ERROP_L2_MAL: - stats->rx_l2_hdr_malformed++; + this_cpu_inc(nic->drv_stats->rx_l2_hdr_malformed); break; case CQ_RX_ERROP_L2_OVERSIZE: - stats->rx_oversize++; + this_cpu_inc(nic->drv_stats->rx_oversize); break; case CQ_RX_ERROP_L2_UNDERSIZE: - stats->rx_undersize++; + this_cpu_inc(nic->drv_stats->rx_undersize); break; case CQ_RX_ERROP_L2_LENMISM: - stats->rx_l2_len_mismatch++; + this_cpu_inc(nic->drv_stats->rx_l2_len_mismatch); break; case CQ_RX_ERROP_L2_PCLP: - stats->rx_l2_pclp++; + this_cpu_inc(nic->drv_stats->rx_l2_pclp); break; case CQ_RX_ERROP_IP_NOT: - stats->rx_ip_ver_errs++; + this_cpu_inc(nic->drv_stats->rx_ip_ver_errs); break; case CQ_RX_ERROP_IP_CSUM_ERR: - stats->rx_ip_csum_errs++; + this_cpu_inc(nic->drv_stats->rx_ip_csum_errs); break; case CQ_RX_ERROP_IP_MAL: - stats->rx_ip_hdr_malformed++; + this_cpu_inc(nic->drv_stats->rx_ip_hdr_malformed); break; case CQ_RX_ERROP_IP_MALD: - stats->rx_ip_payload_malformed++; + this_cpu_inc(nic->drv_stats->rx_ip_payload_malformed); break; case CQ_RX_ERROP_IP_HOP: - stats->rx_ip_ttl_errs++; + this_cpu_inc(nic->drv_stats->rx_ip_ttl_errs); break; case CQ_RX_ERROP_L3_PCLP: - stats->rx_l3_pclp++; + this_cpu_inc(nic->drv_stats->rx_l3_pclp); break; case CQ_RX_ERROP_L4_MAL: - stats->rx_l4_malformed++; + this_cpu_inc(nic->drv_stats->rx_l4_malformed); break; case CQ_RX_ERROP_L4_CHK: - stats->rx_l4_csum_errs++; + this_cpu_inc(nic->drv_stats->rx_l4_csum_errs); break; case CQ_RX_ERROP_UDP_LEN: - stats->rx_udp_len_errs++; + this_cpu_inc(nic->drv_stats->rx_udp_len_errs); break; case CQ_RX_ERROP_L4_PORT: - stats->rx_l4_port_errs++; + this_cpu_inc(nic->drv_stats->rx_l4_port_errs); break; case CQ_RX_ERROP_TCP_FLAG: - stats->rx_tcp_flag_errs++; + this_cpu_inc(nic->drv_stats->rx_tcp_flag_errs); break; case CQ_RX_ERROP_TCP_OFFSET: - stats->rx_tcp_offset_errs++; + this_cpu_inc(nic->drv_stats->rx_tcp_offset_errs); break; case CQ_RX_ERROP_L4_PCLP: - stats->rx_l4_pclp++; + this_cpu_inc(nic->drv_stats->rx_l4_pclp); break; case CQ_RX_ERROP_RBDR_TRUNC: - stats->rx_truncated_pkts++; + this_cpu_inc(nic->drv_stats->rx_truncated_pkts); break; } @@ -1515,56 +1517,52 @@ int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cqe_rx_t *cqe_rx) } /* Check for errors in the send cmp.queue entry */ -int nicvf_check_cqe_tx_errs(struct nicvf *nic, - struct cmp_queue *cq, struct cqe_send_t *cqe_tx) +int nicvf_check_cqe_tx_errs(struct nicvf *nic, struct cqe_send_t *cqe_tx) { - struct cmp_queue_stats *stats = &cq->stats; - switch (cqe_tx->send_status) { case CQ_TX_ERROP_GOOD: - stats->tx.good++; return 0; case CQ_TX_ERROP_DESC_FAULT: - stats->tx.desc_fault++; + this_cpu_inc(nic->drv_stats->tx_desc_fault); break; case CQ_TX_ERROP_HDR_CONS_ERR: - stats->tx.hdr_cons_err++; + this_cpu_inc(nic->drv_stats->tx_hdr_cons_err); break; case CQ_TX_ERROP_SUBDC_ERR: - stats->tx.subdesc_err++; + this_cpu_inc(nic->drv_stats->tx_subdesc_err); break; case CQ_TX_ERROP_MAX_SIZE_VIOL: - stats->tx.max_size_exceeded++; + this_cpu_inc(nic->drv_stats->tx_max_size_exceeded); break; case CQ_TX_ERROP_IMM_SIZE_OFLOW: - stats->tx.imm_size_oflow++; + this_cpu_inc(nic->drv_stats->tx_imm_size_oflow); break; case CQ_TX_ERROP_DATA_SEQUENCE_ERR: - stats->tx.data_seq_err++; + this_cpu_inc(nic->drv_stats->tx_data_seq_err); break; case CQ_TX_ERROP_MEM_SEQUENCE_ERR: - stats->tx.mem_seq_err++; + this_cpu_inc(nic->drv_stats->tx_mem_seq_err); break; case CQ_TX_ERROP_LOCK_VIOL: - stats->tx.lock_viol++; + this_cpu_inc(nic->drv_stats->tx_lock_viol); break; case CQ_TX_ERROP_DATA_FAULT: - stats->tx.data_fault++; + this_cpu_inc(nic->drv_stats->tx_data_fault); break; case CQ_TX_ERROP_TSTMP_CONFLICT: - stats->tx.tstmp_conflict++; + this_cpu_inc(nic->drv_stats->tx_tstmp_conflict); break; case CQ_TX_ERROP_TSTMP_TIMEOUT: - stats->tx.tstmp_timeout++; + this_cpu_inc(nic->drv_stats->tx_tstmp_timeout); break; case CQ_TX_ERROP_MEM_FAULT: - stats->tx.mem_fault++; + this_cpu_inc(nic->drv_stats->tx_mem_fault); break; case CQ_TX_ERROP_CK_OVERLAP: - stats->tx.csum_overlap++; + this_cpu_inc(nic->drv_stats->tx_csum_overlap); break; case CQ_TX_ERROP_CK_OFLOW: - stats->tx.csum_overflow++; + this_cpu_inc(nic->drv_stats->tx_csum_overflow); break; } diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h index 8f4718e..2e3c940 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h @@ -172,26 +172,6 @@ enum CQ_TX_ERROP_E { CQ_TX_ERROP_ENUM_LAST = 0x8a, }; -struct cmp_queue_stats { - struct tx_stats { - u64 good; - u64 desc_fault; - u64 hdr_cons_err; - u64 subdesc_err; - u64 max_size_exceeded; - u64 imm_size_oflow; - u64 data_seq_err; - u64 mem_seq_err; - u64 lock_viol; - u64 data_fault; - u64 tstmp_conflict; - u64 tstmp_timeout; - u64 mem_fault; - u64 csum_overlap; - u64 csum_overflow; - } tx; -} ____cacheline_aligned_in_smp; - enum RQ_SQ_STATS { RQ_SQ_STATS_OCTS, RQ_SQ_STATS_PKTS, @@ -243,7 +223,6 @@ struct cmp_queue { spinlock_t lock; /* lock to serialize processing CQEs */ void *desc; struct q_desc_mem dmem; - struct cmp_queue_stats stats; int irq; } ____cacheline_aligned_in_smp; @@ -338,6 +317,5 @@ u64 nicvf_queue_reg_read(struct nicvf *nic, void nicvf_update_rq_stats(struct nicvf *nic, int rq_idx); void nicvf_update_sq_stats(struct nicvf *nic, int sq_idx); int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cqe_rx_t *cqe_rx); -int nicvf_check_cqe_tx_errs(struct nicvf *nic, - struct cmp_queue *cq, struct cqe_send_t *cqe_tx); +int nicvf_check_cqe_tx_errs(struct nicvf *nic, struct cqe_send_t *cqe_tx); #endif /* NICVF_QUEUES_H */ -- cgit v0.10.2 From c94acf805d93e7beb5898ac97ff327ae0b6f04dd Mon Sep 17 00:00:00 2001 From: Sunil Goutham Date: Tue, 15 Nov 2016 17:38:29 +0530 Subject: net: thunderx: Fix memory leak and other issues upon interface toggle This patch fixes the following 1. When interface is being teardown and queues are being cleaned up, free pending SKBs that are in SQ which are either not transmitted or freed as NAPI is disabled by that time. 2. While interface initialization, delay CFG_DONE notification till the end to avoid corner cases where TXQs are enabled but CQ interrupts are not which results blocking transmission and kicking off watchdog. 3. Check for IFF_UP while re-enabling RBDR interrupts from tasklet. Signed-off-by: Sunil Goutham Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 9dc79c05..8a37012 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -473,9 +473,6 @@ int nicvf_set_real_num_queues(struct net_device *netdev, static int nicvf_init_resources(struct nicvf *nic) { int err; - union nic_mbx mbx = {}; - - mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE; /* Enable Qset */ nicvf_qset_config(nic, true); @@ -488,9 +485,6 @@ static int nicvf_init_resources(struct nicvf *nic) return err; } - /* Send VF config done msg to PF */ - nicvf_write_to_mbx(nic, &mbx); - return 0; } @@ -1184,6 +1178,7 @@ int nicvf_open(struct net_device *netdev) struct nicvf *nic = netdev_priv(netdev); struct queue_set *qs = nic->qs; struct nicvf_cq_poll *cq_poll = NULL; + union nic_mbx mbx = {}; netif_carrier_off(netdev); @@ -1271,6 +1266,10 @@ int nicvf_open(struct net_device *netdev) for (qidx = 0; qidx < qs->rbdr_cnt; qidx++) nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx); + /* Send VF config done msg to PF */ + mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE; + nicvf_write_to_mbx(nic, &mbx); + return 0; cleanup: nicvf_disable_intr(nic, NICVF_INTR_MBOX, 0); diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c index bdce591..747ef08 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -271,7 +271,8 @@ refill: rbdr_idx, new_rb); next_rbdr: /* Re-enable RBDR interrupts only if buffer allocation is success */ - if (!nic->rb_alloc_fail && rbdr->enable) + if (!nic->rb_alloc_fail && rbdr->enable && + netif_running(nic->pnicvf->netdev)) nicvf_enable_intr(nic, NICVF_INTR_RBDR, rbdr_idx); if (rbdr_idx) @@ -362,6 +363,8 @@ static int nicvf_init_snd_queue(struct nicvf *nic, static void nicvf_free_snd_queue(struct nicvf *nic, struct snd_queue *sq) { + struct sk_buff *skb; + if (!sq) return; if (!sq->dmem.base) @@ -372,6 +375,15 @@ static void nicvf_free_snd_queue(struct nicvf *nic, struct snd_queue *sq) sq->dmem.q_len * TSO_HEADER_SIZE, sq->tso_hdrs, sq->tso_hdrs_phys); + /* Free pending skbs in the queue */ + smp_rmb(); + while (sq->head != sq->tail) { + skb = (struct sk_buff *)sq->skbuff[sq->head]; + if (skb) + dev_kfree_skb_any(skb); + sq->head++; + sq->head &= (sq->dmem.q_len - 1); + } kfree(sq->skbuff); nicvf_free_q_desc_mem(nic, &sq->dmem); } -- cgit v0.10.2 From 963abe5c8a0273a1cf5913556da1b1189de0e57a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 15 Nov 2016 22:24:12 -0800 Subject: virtio-net: add a missing synchronize_net() It seems many drivers do not respect napi_hash_del() contract. When napi_hash_del() is used before netif_napi_del(), an RCU grace period is needed before freeing NAPI object. Fixes: 91815639d880 ("virtio-net: rx busy polling support") Signed-off-by: Eric Dumazet Cc: Jason Wang Cc: Michael S. Tsirkin Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index fd8b1e6..7276d5a 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1497,6 +1497,11 @@ static void virtnet_free_queues(struct virtnet_info *vi) netif_napi_del(&vi->rq[i].napi); } + /* We called napi_hash_del() before netif_napi_del(), + * we need to respect an RCU grace period before freeing vi->rq + */ + synchronize_net(); + kfree(vi->rq); kfree(vi->sq); } -- cgit v0.10.2 From ea339343d64a14594d882ccb52e8619d42defe5e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Nov 2016 06:12:42 -0800 Subject: be2net: do not call napi_hash_del() Calling napi_hash_del() before netif_napi_del() is dangerous if a synchronize_rcu() is not enforced before NAPI struct freeing. Lets leave this detail to core networking stack and feel more comfortable. Signed-off-by: Eric Dumazet Cc: Sathya Perla Cc: Ajit Khaparde Cc: Sriharsha Basavapatna Cc: Somnath Kotur Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index cece8a0..93aa293 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -2813,7 +2813,6 @@ static void be_evt_queues_destroy(struct be_adapter *adapter) if (eqo->q.created) { be_eq_clean(eqo); be_cmd_q_destroy(adapter, &eqo->q, QTYPE_EQ); - napi_hash_del(&eqo->napi); netif_napi_del(&eqo->napi); free_cpumask_var(eqo->affinity_mask); } -- cgit v0.10.2 From 5f00a8d8a2c2fd99528ab1a3632f0e77f4d25202 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Nov 2016 06:19:02 -0800 Subject: cxgb4: do not call napi_hash_del() Calling napi_hash_del() before netif_napi_del() is dangerous if a synchronize_rcu() is not enforced before NAPI struct freeing. Lets leave this detail to core networking stack and feel more comfortable. Signed-off-by: Eric Dumazet Cc: Hariprasad S Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c index 1e74fd6..e19a0ca 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c @@ -2951,7 +2951,6 @@ void free_rspq_fl(struct adapter *adap, struct sge_rspq *rq, rq->cntxt_id, fl_id, 0xffff); dma_free_coherent(adap->pdev_dev, (rq->size + 1) * rq->iqe_len, rq->desc, rq->phys_addr); - napi_hash_del(&rq->napi); netif_napi_del(&rq->napi); rq->netdev = NULL; rq->cntxt_id = rq->abs_id = 0; -- cgit v0.10.2 From 955e16026d08a601d02b961d13b6db9d6c13c8c9 Mon Sep 17 00:00:00 2001 From: Alex Date: Wed, 16 Nov 2016 01:02:33 -0800 Subject: net/phy/vitesse: Configure RGMII skew on VSC8601, if needed With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The VSC8601 can handle this internally. While the VSC8601 can set more fine-grained delays, the standard skew settings work out of the box. The same heuristic is used to determine when this skew should be enabled as in vsc824x_config_init(). Tested on custom board with AM3352 SOC and VSC801 PHY. Signed-off-by: Alexandru Gagniuc Signed-off-by: David S. Miller diff --git a/drivers/net/phy/vitesse.c b/drivers/net/phy/vitesse.c index 2e37eb3..24b4a09 100644 --- a/drivers/net/phy/vitesse.c +++ b/drivers/net/phy/vitesse.c @@ -62,6 +62,10 @@ /* Vitesse Extended Page Access Register */ #define MII_VSC82X4_EXT_PAGE_ACCESS 0x1f +/* Vitesse VSC8601 Extended PHY Control Register 1 */ +#define MII_VSC8601_EPHY_CTL 0x17 +#define MII_VSC8601_EPHY_CTL_RGMII_SKEW (1 << 8) + #define PHY_ID_VSC8234 0x000fc620 #define PHY_ID_VSC8244 0x000fc6c0 #define PHY_ID_VSC8514 0x00070670 @@ -111,6 +115,34 @@ static int vsc824x_config_init(struct phy_device *phydev) return err; } +/* This adds a skew for both TX and RX clocks, so the skew should only be + * applied to "rgmii-id" interfaces. It may not work as expected + * on "rgmii-txid", "rgmii-rxid" or "rgmii" interfaces. */ +static int vsc8601_add_skew(struct phy_device *phydev) +{ + int ret; + + ret = phy_read(phydev, MII_VSC8601_EPHY_CTL); + if (ret < 0) + return ret; + + ret |= MII_VSC8601_EPHY_CTL_RGMII_SKEW; + return phy_write(phydev, MII_VSC8601_EPHY_CTL, ret); +} + +static int vsc8601_config_init(struct phy_device *phydev) +{ + int ret = 0; + + if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) + ret = vsc8601_add_skew(phydev); + + if (ret < 0) + return ret; + + return genphy_config_init(phydev); +} + static int vsc824x_ack_interrupt(struct phy_device *phydev) { int err = 0; @@ -275,7 +307,7 @@ static struct phy_driver vsc82xx_driver[] = { .phy_id_mask = 0x000ffff0, .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_INTERRUPT, - .config_init = &genphy_config_init, + .config_init = &vsc8601_config_init, .config_aneg = &genphy_config_aneg, .read_status = &genphy_read_status, .ack_interrupt = &vsc824x_ack_interrupt, -- cgit v0.10.2 From e47112d9d6009bf6b7438cedc0270316d6b0370d Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 15 Nov 2016 15:58:15 -0800 Subject: net: dsa: b53: Fix VLAN usage and how we treat CPU port We currently have a fundamental problem in how we treat the CPU port and its VLAN membership. As soon as a second VLAN is configured to be untagged, the CPU automatically becomes untagged for that VLAN as well, and yet, we don't gracefully make sure that the CPU becomes tagged in the other VLANs it could be a member of. This results in only one VLAN being effectively usable from the CPU's perspective. Instead of having some pretty complex logic which tries to maintain the CPU port's default VLAN and its untagged properties, just do something very simple which consists in neither altering the CPU port's PVID settings, nor its untagged settings: - whenever a VLAN is added, the CPU is automatically a member of this VLAN group, as a tagged member - PVID settings for downstream ports do not alter the CPU port's PVID since it now is part of all VLANs in the system This means that a typical example where e.g: LAN ports are in VLAN1, and WAN port is in VLAN2, now require having two VLAN interfaces for the host to properly terminate and send traffic from/to. Fixes: Fixes: a2482d2ce349 ("net: dsa: b53: Plug in VLAN support") Reported-by: Hartmut Knaack Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index 7717b19..947adda 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -962,9 +962,10 @@ static void b53_vlan_add(struct dsa_switch *ds, int port, vl->members |= BIT(port) | BIT(cpu_port); if (untagged) - vl->untag |= BIT(port) | BIT(cpu_port); + vl->untag |= BIT(port); else - vl->untag &= ~(BIT(port) | BIT(cpu_port)); + vl->untag &= ~BIT(port); + vl->untag &= ~BIT(cpu_port); b53_set_vlan_entry(dev, vid, vl); b53_fast_age_vlan(dev, vid); @@ -973,8 +974,6 @@ static void b53_vlan_add(struct dsa_switch *ds, int port, if (pvid) { b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(port), vlan->vid_end); - b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(cpu_port), - vlan->vid_end); b53_fast_age_vlan(dev, vid); } } @@ -984,7 +983,6 @@ static int b53_vlan_del(struct dsa_switch *ds, int port, { struct b53_device *dev = ds->priv; bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED; - unsigned int cpu_port = dev->cpu_port; struct b53_vlan *vl; u16 vid; u16 pvid; @@ -997,8 +995,6 @@ static int b53_vlan_del(struct dsa_switch *ds, int port, b53_get_vlan_entry(dev, vid, vl); vl->members &= ~BIT(port); - if ((vl->members & BIT(cpu_port)) == BIT(cpu_port)) - vl->members = 0; if (pvid == vid) { if (is5325(dev) || is5365(dev)) @@ -1007,18 +1003,14 @@ static int b53_vlan_del(struct dsa_switch *ds, int port, pvid = 0; } - if (untagged) { + if (untagged) vl->untag &= ~(BIT(port)); - if ((vl->untag & BIT(cpu_port)) == BIT(cpu_port)) - vl->untag = 0; - } b53_set_vlan_entry(dev, vid, vl); b53_fast_age_vlan(dev, vid); } b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(port), pvid); - b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(cpu_port), pvid); b53_fast_age_vlan(dev, pvid); return 0; -- cgit v0.10.2 From e5f6f564fd191d365fcd775c06a732a488205588 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Nov 2016 06:31:52 -0800 Subject: bnxt: add a missing rcu synchronization Add a missing synchronize_net() call to avoid potential use after free, since we explicitly call napi_hash_del() to factorize the RCU grace period. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Eric Dumazet Cc: Michael Chan Acked-by: Michael Chan Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index c690966..e18635b 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -4934,6 +4934,10 @@ static void bnxt_del_napi(struct bnxt *bp) napi_hash_del(&bnapi->napi); netif_napi_del(&bnapi->napi); } + /* We called napi_hash_del() before netif_napi_del(), we need + * to respect an RCU grace period before freeing napi structures. + */ + synchronize_net(); } static void bnxt_init_napi(struct bnxt *bp) -- cgit v0.10.2 From cfc44a4d147ea605d66ccb917cc24467d15ff867 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Wed, 16 Nov 2016 10:27:02 -0800 Subject: net: check dead netns for peernet2id_alloc() Andrei reports we still allocate netns ID from idr after we destroy it in cleanup_net(). cleanup_net(): ... idr_destroy(&net->netns_ids); ... list_for_each_entry_reverse(ops, &pernet_list, list) ops_exit_list(ops, &net_exit_list); -> rollback_registered_many() -> rtmsg_ifinfo_build_skb() -> rtnl_fill_ifinfo() -> peernet2id_alloc() After that point we should not even access net->netns_ids, we should check the death of the current netns as early as we can in peernet2id_alloc(). For net-next we can consider to avoid sending rtmsg totally, it is a good optimization for netns teardown path. Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids") Reported-by: Andrei Vagin Cc: Nicolas Dichtel Signed-off-by: Cong Wang Acked-by: Andrei Vagin Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index f61c0e0..7001da9 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -219,6 +219,8 @@ int peernet2id_alloc(struct net *net, struct net *peer) bool alloc; int id; + if (atomic_read(&net->count) == 0) + return NETNSA_NSID_NOT_ASSIGNED; spin_lock_irqsave(&net->nsid_lock, flags); alloc = atomic_read(&peer->count) == 0 ? false : true; id = __peernet2id_alloc(net, peer, &alloc); -- cgit v0.10.2 From 48c1699d5335bc045b50989a06b1c526b17a25ff Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 16 Nov 2016 15:20:36 +0100 Subject: of_mdio: fix node leak in of_phy_register_fixed_link error path Make sure to drop the of_node reference also on failure to parse the speed property in of_phy_register_fixed_link(). Fixes: 3be2a49e5c08 ("of: provide a binding for fixed link PHYs") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c index b470f7e..8f46483 100644 --- a/drivers/of/of_mdio.c +++ b/drivers/of/of_mdio.c @@ -456,8 +456,11 @@ int of_phy_register_fixed_link(struct device_node *np) status.link = 1; status.duplex = of_property_read_bool(fixed_link_node, "full-duplex"); - if (of_property_read_u32(fixed_link_node, "speed", &status.speed)) + if (of_property_read_u32(fixed_link_node, "speed", + &status.speed)) { + of_node_put(fixed_link_node); return -EINVAL; + } status.pause = of_property_read_bool(fixed_link_node, "pause"); status.asym_pause = of_property_read_bool(fixed_link_node, "asym-pause"); -- cgit v0.10.2 From 3ae30f4ce65e9d4de274b1472169ab3c27f5c666 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 16 Nov 2016 15:20:37 +0100 Subject: of_mdio: fix device reference leak in of_phy_find_device Make sure to drop the reference taken by bus_find_device() before returning NULL from of_phy_find_device() when the found device is not a PHY. Fixes: 6ed742363b9c ("of: of_mdio: Ensure mdio device is a PHY") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c index 8f46483..5a3145a 100644 --- a/drivers/of/of_mdio.c +++ b/drivers/of/of_mdio.c @@ -292,6 +292,7 @@ struct phy_device *of_phy_find_device(struct device_node *phy_np) mdiodev = to_mdio_device(d); if (mdiodev->flags & MDIO_DEVICE_FLAG_PHY) return to_phy_device(d); + put_device(d); } return NULL; -- cgit v0.10.2 From 13c9d934a5a1d04f055c20c2253090e9afd9a5d1 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 16 Nov 2016 15:20:38 +0100 Subject: net: phy: fixed_phy: fix of_node leak in fixed_phy_unregister Make sure to drop the of_node reference taken in fixed_phy_register() when deregistering a PHY. Fixes: a75951217472 ("net: phy: extend fixed driver with fixed_phy_register()") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c index c649c10..eb51672 100644 --- a/drivers/net/phy/fixed_phy.c +++ b/drivers/net/phy/fixed_phy.c @@ -279,7 +279,7 @@ EXPORT_SYMBOL_GPL(fixed_phy_register); void fixed_phy_unregister(struct phy_device *phy) { phy_device_remove(phy); - + of_node_put(phy->mdio.dev.of_node); fixed_phy_del(phy->mdio.addr); } EXPORT_SYMBOL_GPL(fixed_phy_unregister); -- cgit v0.10.2 From b5c2d49544e5930c96e2632a7eece3f4325a1888 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Wed, 16 Nov 2016 16:26:46 +0100 Subject: ip6_tunnel: disable caching when the traffic class is inherited If an ip6 tunnel is configured to inherit the traffic class from the inner header, the dst_cache must be disabled or it will foul the policy routing. The issue is apprently there since at leat Linux-2.6.12-rc2. Reported-by: Liam McBirnie Cc: Liam McBirnie Acked-by: Hannes Frederic Sowa Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 8778456..0a4759b 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1034,6 +1034,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, int mtu; unsigned int psh_hlen = sizeof(struct ipv6hdr) + t->encap_hlen; unsigned int max_headroom = psh_hlen; + bool use_cache = false; u8 hop_limit; int err = -1; @@ -1066,7 +1067,15 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr)); neigh_release(neigh); - } else if (!fl6->flowi6_mark) + } else if (!(t->parms.flags & + (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) { + /* enable the cache only only if the routing decision does + * not depend on the current inner header value + */ + use_cache = true; + } + + if (use_cache) dst = dst_cache_get(&t->dst_cache); if (!ip6_tnl_xmit_ctl(t, &fl6->saddr, &fl6->daddr)) @@ -1150,7 +1159,7 @@ route_lookup: if (t->encap.type != TUNNEL_ENCAP_NONE) goto tx_err_dst_release; } else { - if (!fl6->flowi6_mark && ndst) + if (use_cache && ndst) dst_cache_set_ip6(&t->dst_cache, ndst, &fl6->saddr); } skb_dst_set(skb, dst); -- cgit v0.10.2 From 30a391a13ab9215d7569da4e1773c5bb4deed96d Mon Sep 17 00:00:00 2001 From: Roman Mashak Date: Wed, 16 Nov 2016 17:16:10 -0500 Subject: net sched filters: pass netlink message flags in event notification Userland client should be able to read an event, and reflect it back to the kernel, therefore it needs to extract complete set of netlink flags. For example, this will allow "tc monitor" to distinguish Add and Replace operations. Signed-off-by: Roman Mashak Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 2b2a797..8e93d4a 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -112,7 +112,7 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb, for (it_chain = chain; (tp = rtnl_dereference(*it_chain)) != NULL; it_chain = &tp->next) - tfilter_notify(net, oskb, n, tp, 0, event, false); + tfilter_notify(net, oskb, n, tp, n->nlmsg_flags, event, false); } /* Select new prio value from the range, managed by kernel. */ @@ -430,7 +430,8 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb, if (!skb) return -ENOBUFS; - if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq, 0, event) <= 0) { + if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq, + n->nlmsg_flags, event) <= 0) { kfree_skb(skb); return -EINVAL; } -- cgit v0.10.2 From 9853a55ef1bb66d7411136046060bbfb69c714fa Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 15 Nov 2016 12:05:11 +0100 Subject: cfg80211: limit scan results cache size It's possible to make scanning consume almost arbitrary amounts of memory, e.g. by sending beacon frames with random BSSIDs at high rates while somebody is scanning. Limit the number of BSS table entries we're willing to cache to 1000, limiting maximum memory usage to maybe 4-5MB, but lower in practice - that would be the case for having both full-sized beacon and probe response frames for each entry; this seems not possible in practice, so a limit of 1000 entries will likely be closer to 0.5 MB. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg diff --git a/net/wireless/core.h b/net/wireless/core.h index 08d2e94..f0c0c8a 100644 --- a/net/wireless/core.h +++ b/net/wireless/core.h @@ -71,6 +71,7 @@ struct cfg80211_registered_device { struct list_head bss_list; struct rb_root bss_tree; u32 bss_generation; + u32 bss_entries; struct cfg80211_scan_request *scan_req; /* protected by RTNL */ struct sk_buff *scan_msg; struct cfg80211_sched_scan_request __rcu *sched_scan_req; diff --git a/net/wireless/scan.c b/net/wireless/scan.c index b5bd58d..35ad69f 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -57,6 +57,19 @@ * also linked into the probe response struct. */ +/* + * Limit the number of BSS entries stored in mac80211. Each one is + * a bit over 4k at most, so this limits to roughly 4-5M of memory. + * If somebody wants to really attack this though, they'd likely + * use small beacons, and only one type of frame, limiting each of + * the entries to a much smaller size (in order to generate more + * entries in total, so overhead is bigger.) + */ +static int bss_entries_limit = 1000; +module_param(bss_entries_limit, int, 0644); +MODULE_PARM_DESC(bss_entries_limit, + "limit to number of scan BSS entries (per wiphy, default 1000)"); + #define IEEE80211_SCAN_RESULT_EXPIRE (30 * HZ) static void bss_free(struct cfg80211_internal_bss *bss) @@ -137,6 +150,10 @@ static bool __cfg80211_unlink_bss(struct cfg80211_registered_device *rdev, list_del_init(&bss->list); rb_erase(&bss->rbn, &rdev->bss_tree); + rdev->bss_entries--; + WARN_ONCE((rdev->bss_entries == 0) ^ list_empty(&rdev->bss_list), + "rdev bss entries[%d]/list[empty:%d] corruption\n", + rdev->bss_entries, list_empty(&rdev->bss_list)); bss_ref_put(rdev, bss); return true; } @@ -163,6 +180,40 @@ static void __cfg80211_bss_expire(struct cfg80211_registered_device *rdev, rdev->bss_generation++; } +static bool cfg80211_bss_expire_oldest(struct cfg80211_registered_device *rdev) +{ + struct cfg80211_internal_bss *bss, *oldest = NULL; + bool ret; + + lockdep_assert_held(&rdev->bss_lock); + + list_for_each_entry(bss, &rdev->bss_list, list) { + if (atomic_read(&bss->hold)) + continue; + + if (!list_empty(&bss->hidden_list) && + !bss->pub.hidden_beacon_bss) + continue; + + if (oldest && time_before(oldest->ts, bss->ts)) + continue; + oldest = bss; + } + + if (WARN_ON(!oldest)) + return false; + + /* + * The callers make sure to increase rdev->bss_generation if anything + * gets removed (and a new entry added), so there's no need to also do + * it here. + */ + + ret = __cfg80211_unlink_bss(rdev, oldest); + WARN_ON(!ret); + return ret; +} + void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev, bool send_message) { @@ -689,6 +740,7 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *rdev, const u8 *ie; int i, ssidlen; u8 fold = 0; + u32 n_entries = 0; ies = rcu_access_pointer(new->pub.beacon_ies); if (WARN_ON(!ies)) @@ -712,6 +764,12 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *rdev, /* This is the bad part ... */ list_for_each_entry(bss, &rdev->bss_list, list) { + /* + * we're iterating all the entries anyway, so take the + * opportunity to validate the list length accounting + */ + n_entries++; + if (!ether_addr_equal(bss->pub.bssid, new->pub.bssid)) continue; if (bss->pub.channel != new->pub.channel) @@ -740,6 +798,10 @@ static bool cfg80211_combine_bsses(struct cfg80211_registered_device *rdev, new->pub.beacon_ies); } + WARN_ONCE(n_entries != rdev->bss_entries, + "rdev bss entries[%d]/list[len:%d] corruption\n", + rdev->bss_entries, n_entries); + return true; } @@ -894,7 +956,14 @@ cfg80211_bss_update(struct cfg80211_registered_device *rdev, } } + if (rdev->bss_entries >= bss_entries_limit && + !cfg80211_bss_expire_oldest(rdev)) { + kfree(new); + goto drop; + } + list_add_tail(&new->list, &rdev->bss_list); + rdev->bss_entries++; rb_insert_bss(rdev, new); found = new; } -- cgit v0.10.2 From 06ba3b2133dc203e1e9bc36cee7f0839b79a9e8b Mon Sep 17 00:00:00 2001 From: Jeremy Linton Date: Thu, 17 Nov 2016 09:14:25 -0600 Subject: net: sky2: Fix shutdown crash The sky2 frequently crashes during machine shutdown with: sky2_get_stats+0x60/0x3d8 [sky2] dev_get_stats+0x68/0xd8 rtnl_fill_stats+0x54/0x140 rtnl_fill_ifinfo+0x46c/0xc68 rtmsg_ifinfo_build_skb+0x7c/0xf0 rtmsg_ifinfo.part.22+0x3c/0x70 rtmsg_ifinfo+0x50/0x5c netdev_state_change+0x4c/0x58 linkwatch_do_dev+0x50/0x88 __linkwatch_run_queue+0x104/0x1a4 linkwatch_event+0x30/0x3c process_one_work+0x140/0x3e0 worker_thread+0x60/0x44c kthread+0xdc/0xf0 ret_from_fork+0x10/0x50 This is caused by the sky2 being called after it has been shutdown. A previous thread about this can be found here: https://lkml.org/lkml/2016/4/12/410 An alternative fix is to assure that IFF_UP gets cleared by calling dev_close() during shutdown. This is similar to what the bnx2/tg3/xgene and maybe others are doing to assure that the driver isn't being called following _shutdown(). Signed-off-by: Jeremy Linton Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c index f05ea56..941c8e2 100644 --- a/drivers/net/ethernet/marvell/sky2.c +++ b/drivers/net/ethernet/marvell/sky2.c @@ -5220,6 +5220,19 @@ static SIMPLE_DEV_PM_OPS(sky2_pm_ops, sky2_suspend, sky2_resume); static void sky2_shutdown(struct pci_dev *pdev) { + struct sky2_hw *hw = pci_get_drvdata(pdev); + int port; + + for (port = 0; port < hw->ports; port++) { + struct net_device *ndev = hw->dev[port]; + + rtnl_lock(); + if (netif_running(ndev)) { + dev_close(ndev); + netif_device_detach(ndev); + } + rtnl_unlock(); + } sky2_suspend(&pdev->dev); pci_wake_from_d3(pdev, device_may_wakeup(&pdev->dev)); pci_set_power_state(pdev, PCI_D3hot); -- cgit v0.10.2 From c46ab7e08c79be7400f6d59edbc6f26a91941c5a Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 17 Nov 2016 17:39:58 +0100 Subject: net: ethernet: ti: cpsw: fix bad register access in probe error path Make sure to keep the platform device runtime-resumed throughout probe to avoid accessing the CPSW registers in the error path (e.g. for deferred probe) with clocks disabled: Unhandled fault: external abort on non-linefetch (0x1008) at 0xd0872d08 ... [] (cpsw_ale_control_set) from [] (cpsw_ale_destroy+0x2c/0x44) [] (cpsw_ale_destroy) from [] (cpsw_probe+0xbd0/0x10c4) [] (cpsw_probe) from [] (platform_drv_probe+0x5c/0xc0) Fixes: df828598a755 ("netdev: driver: ethernet: Add TI CPSW driver") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index c6cff3d..f60f8ab 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2641,13 +2641,12 @@ static int cpsw_probe(struct platform_device *pdev) goto clean_runtime_disable_ret; } cpsw->version = readl(&cpsw->regs->id_ver); - pm_runtime_put_sync(&pdev->dev); res = platform_get_resource(pdev, IORESOURCE_MEM, 1); cpsw->wr_regs = devm_ioremap_resource(&pdev->dev, res); if (IS_ERR(cpsw->wr_regs)) { ret = PTR_ERR(cpsw->wr_regs); - goto clean_runtime_disable_ret; + goto clean_pm_runtime_put_ret; } memset(&dma_params, 0, sizeof(dma_params)); @@ -2684,7 +2683,7 @@ static int cpsw_probe(struct platform_device *pdev) default: dev_err(priv->dev, "unknown version 0x%08x\n", cpsw->version); ret = -ENODEV; - goto clean_runtime_disable_ret; + goto clean_pm_runtime_put_ret; } for (i = 0; i < cpsw->data.slaves; i++) { struct cpsw_slave *slave = &cpsw->slaves[i]; @@ -2713,7 +2712,7 @@ static int cpsw_probe(struct platform_device *pdev) if (!cpsw->dma) { dev_err(priv->dev, "error initializing dma\n"); ret = -ENOMEM; - goto clean_runtime_disable_ret; + goto clean_pm_runtime_put_ret; } cpsw->txch[0] = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0); @@ -2815,12 +2814,16 @@ static int cpsw_probe(struct platform_device *pdev) } } + pm_runtime_put(&pdev->dev); + return 0; clean_ale_ret: cpsw_ale_destroy(cpsw->ale); clean_dma_ret: cpdma_ctlr_destroy(cpsw->dma); +clean_pm_runtime_put_ret: + pm_runtime_put_sync(&pdev->dev); clean_runtime_disable_ret: pm_runtime_disable(&pdev->dev); clean_ndev_ret: -- cgit v0.10.2 From 86e1d5adcef961eb383ce4eacbe0ef22f06e2045 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 17 Nov 2016 17:39:59 +0100 Subject: net: ethernet: ti: cpsw: fix mdio device reference leak Make sure to drop the reference taken by of_find_device_by_node() when looking up an mdio device from a phy_id property during probe. Fixes: 549985ee9c72 ("cpsw: simplify the setup of the register pointers") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index f60f8ab..84c5d21 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2397,6 +2397,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data, } snprintf(slave_data->phy_id, sizeof(slave_data->phy_id), PHY_ID_FMT, mdio->name, phyid); + put_device(&mdio->dev); } else { dev_err(&pdev->dev, "No slave[%d] phy_id, phy-handle, or fixed-link property\n", -- cgit v0.10.2 From a4e32b0d0a26ba2f2ba1c65bd403d06ccc1df29c Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 17 Nov 2016 17:40:00 +0100 Subject: net: ethernet: ti: cpsw: fix deferred probe Make sure to deregister all child devices also on probe errors to avoid leaks and to fix probe deferral: cpsw 4a100000.ethernet: omap_device: omap_device_enable() called from invalid state 1 cpsw 4a100000.ethernet: use pm_runtime_put_sync_suspend() in driver? cpsw: probe of 4a100000.ethernet failed with error -22 Add generic helper to undo the effects of cpsw_probe_dt(), which will also be used in a follow-on patch to fix further leaks that have been introduced more recently. Note that the platform device is now runtime-resumed before registering any child devices in order to make sure that it is synchronously suspended after having deregistered the children in the error path. Fixes: 1fb19aa730e4 ("net: cpsw: Add parent<->child relation support between cpsw and mdio") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 84c5d21..5d14abb 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2441,6 +2441,11 @@ no_phy_slave: return 0; } +static void cpsw_remove_dt(struct platform_device *pdev) +{ + of_platform_depopulate(&pdev->dev); +} + static int cpsw_probe_dual_emac(struct cpsw_priv *priv) { struct cpsw_common *cpsw = priv->cpsw; @@ -2585,10 +2590,19 @@ static int cpsw_probe(struct platform_device *pdev) /* Select default pin state */ pinctrl_pm_select_default_state(&pdev->dev); + /* Need to enable clocks with runtime PM api to access module + * registers + */ + ret = pm_runtime_get_sync(&pdev->dev); + if (ret < 0) { + pm_runtime_put_noidle(&pdev->dev); + goto clean_runtime_disable_ret; + } + if (cpsw_probe_dt(&cpsw->data, pdev)) { dev_err(&pdev->dev, "cpsw: platform data missing\n"); ret = -ENODEV; - goto clean_runtime_disable_ret; + goto clean_dt_ret; } data = &cpsw->data; cpsw->rx_ch_num = 1; @@ -2609,7 +2623,7 @@ static int cpsw_probe(struct platform_device *pdev) GFP_KERNEL); if (!cpsw->slaves) { ret = -ENOMEM; - goto clean_runtime_disable_ret; + goto clean_dt_ret; } for (i = 0; i < data->slaves; i++) cpsw->slaves[i].slave_num = i; @@ -2621,7 +2635,7 @@ static int cpsw_probe(struct platform_device *pdev) if (IS_ERR(clk)) { dev_err(priv->dev, "fck is not found\n"); ret = -ENODEV; - goto clean_runtime_disable_ret; + goto clean_dt_ret; } cpsw->bus_freq_mhz = clk_get_rate(clk) / 1000000; @@ -2629,25 +2643,17 @@ static int cpsw_probe(struct platform_device *pdev) ss_regs = devm_ioremap_resource(&pdev->dev, ss_res); if (IS_ERR(ss_regs)) { ret = PTR_ERR(ss_regs); - goto clean_runtime_disable_ret; + goto clean_dt_ret; } cpsw->regs = ss_regs; - /* Need to enable clocks with runtime PM api to access module - * registers - */ - ret = pm_runtime_get_sync(&pdev->dev); - if (ret < 0) { - pm_runtime_put_noidle(&pdev->dev); - goto clean_runtime_disable_ret; - } cpsw->version = readl(&cpsw->regs->id_ver); res = platform_get_resource(pdev, IORESOURCE_MEM, 1); cpsw->wr_regs = devm_ioremap_resource(&pdev->dev, res); if (IS_ERR(cpsw->wr_regs)) { ret = PTR_ERR(cpsw->wr_regs); - goto clean_pm_runtime_put_ret; + goto clean_dt_ret; } memset(&dma_params, 0, sizeof(dma_params)); @@ -2684,7 +2690,7 @@ static int cpsw_probe(struct platform_device *pdev) default: dev_err(priv->dev, "unknown version 0x%08x\n", cpsw->version); ret = -ENODEV; - goto clean_pm_runtime_put_ret; + goto clean_dt_ret; } for (i = 0; i < cpsw->data.slaves; i++) { struct cpsw_slave *slave = &cpsw->slaves[i]; @@ -2713,7 +2719,7 @@ static int cpsw_probe(struct platform_device *pdev) if (!cpsw->dma) { dev_err(priv->dev, "error initializing dma\n"); ret = -ENOMEM; - goto clean_pm_runtime_put_ret; + goto clean_dt_ret; } cpsw->txch[0] = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0); @@ -2823,7 +2829,8 @@ clean_ale_ret: cpsw_ale_destroy(cpsw->ale); clean_dma_ret: cpdma_ctlr_destroy(cpsw->dma); -clean_pm_runtime_put_ret: +clean_dt_ret: + cpsw_remove_dt(pdev); pm_runtime_put_sync(&pdev->dev); clean_runtime_disable_ret: pm_runtime_disable(&pdev->dev); @@ -2850,7 +2857,7 @@ static int cpsw_remove(struct platform_device *pdev) cpsw_ale_destroy(cpsw->ale); cpdma_ctlr_destroy(cpsw->dma); - of_platform_depopulate(&pdev->dev); + cpsw_remove_dt(pdev); pm_runtime_put_sync(&pdev->dev); pm_runtime_disable(&pdev->dev); if (cpsw->data.dual_emac) -- cgit v0.10.2 From 8cbcc466fd4abd38a14b9d9b76c63a2cb7006554 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 17 Nov 2016 17:40:01 +0100 Subject: net: ethernet: ti: cpsw: fix of_node and phydev leaks Make sure to drop references taken and deregister devices registered during probe on probe errors (including deferred probe) and driver unbind. Specifically, PHY of-node references were never released and fixed-link PHY devices were never deregistered. Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing") Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link PHY") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 5d14abb..c3b78bc 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2443,6 +2443,41 @@ no_phy_slave: static void cpsw_remove_dt(struct platform_device *pdev) { + struct net_device *ndev = platform_get_drvdata(pdev); + struct cpsw_common *cpsw = ndev_to_cpsw(ndev); + struct cpsw_platform_data *data = &cpsw->data; + struct device_node *node = pdev->dev.of_node; + struct device_node *slave_node; + int i = 0; + + for_each_available_child_of_node(node, slave_node) { + struct cpsw_slave_data *slave_data = &data->slave_data[i]; + + if (strcmp(slave_node->name, "slave")) + continue; + + if (of_phy_is_fixed_link(slave_node)) { + struct phy_device *phydev; + + phydev = of_phy_find_device(slave_node); + if (phydev) { + fixed_phy_unregister(phydev); + /* Put references taken by + * of_phy_find_device() and + * of_phy_register_fixed_link(). + */ + phy_device_free(phydev); + phy_device_free(phydev); + } + } + + of_node_put(slave_data->phy_node); + + i++; + if (i == data->slaves) + break; + } + of_platform_depopulate(&pdev->dev); } -- cgit v0.10.2 From a7fe9d466f6a33558a38c7ca9d58bcc83512d577 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 17 Nov 2016 17:40:02 +0100 Subject: net: ethernet: ti: cpsw: fix secondary-emac probe error path Make sure to deregister the primary device in case the secondary emac fails to probe. kernel BUG at /home/johan/work/omicron/src/linux/net/core/dev.c:7743! ... [] (free_netdev) from [] (cpsw_probe+0x9cc/0xe50) [] (cpsw_probe) from [] (platform_drv_probe+0x5c/0xc0) Fixes: d9ba8f9e6298 ("driver: net: ethernet: cpsw: dual emac interface implementation") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index c3b78bc..11b2dae 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2852,7 +2852,7 @@ static int cpsw_probe(struct platform_device *pdev) ret = cpsw_probe_dual_emac(priv); if (ret) { cpsw_err(priv, probe, "error probe slave 2 emac interface\n"); - goto clean_ale_ret; + goto clean_unregister_netdev_ret; } } @@ -2860,6 +2860,8 @@ static int cpsw_probe(struct platform_device *pdev) return 0; +clean_unregister_netdev_ret: + unregister_netdev(ndev); clean_ale_ret: cpsw_ale_destroy(cpsw->ale); clean_dma_ret: -- cgit v0.10.2 From 3420ea88509f9d585b39f36e737022faf0286d9a Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 17 Nov 2016 17:40:03 +0100 Subject: net: ethernet: ti: cpsw: add missing sanity check Make sure to check for allocation failures before dereferencing a NULL-pointer during probe. Fixes: 649a1688c960 ("net: ethernet: ti: cpsw: create common struct to hold shared driver data") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 11b2dae..1387299 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2588,6 +2588,9 @@ static int cpsw_probe(struct platform_device *pdev) int irq; cpsw = devm_kzalloc(&pdev->dev, sizeof(struct cpsw_common), GFP_KERNEL); + if (!cpsw) + return -ENOMEM; + cpsw->dev = &pdev->dev; ndev = alloc_etherdev_mq(sizeof(struct cpsw_priv), CPSW_MAX_QUEUES); -- cgit v0.10.2 From 23a09873221c02106cf767a86743a55873f0d05b Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 17 Nov 2016 17:40:04 +0100 Subject: net: ethernet: ti: cpsw: fix fixed-link phy probe deferral Make sure to propagate errors from of_phy_register_fixed_link() which can fail with -EPROBE_DEFER. Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link PHY") Signed-off-by: Johan Hovold Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 1387299..58947aa 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2375,8 +2375,11 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data, * to the PHY is the Ethernet MAC DT node. */ ret = of_phy_register_fixed_link(slave_node); - if (ret) + if (ret) { + if (ret != -EPROBE_DEFER) + dev_err(&pdev->dev, "failed to register fixed-link phy: %d\n", ret); return ret; + } slave_data->phy_node = of_node_get(slave_node); } else if (parp) { u32 phyid; @@ -2637,11 +2640,10 @@ static int cpsw_probe(struct platform_device *pdev) goto clean_runtime_disable_ret; } - if (cpsw_probe_dt(&cpsw->data, pdev)) { - dev_err(&pdev->dev, "cpsw: platform data missing\n"); - ret = -ENODEV; + ret = cpsw_probe_dt(&cpsw->data, pdev); + if (ret) goto clean_dt_ret; - } + data = &cpsw->data; cpsw->rx_ch_num = 1; cpsw->tx_ch_num = 1; -- cgit v0.10.2 From 06a77b07e3b44aea2b3c0e64de420ea2cfdcbaa9 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Thu, 17 Nov 2016 15:55:26 -0800 Subject: af_unix: conditionally use freezable blocking calls in read Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read") converts schedule_timeout() to its freezable version, it was probably correct at that time, but later, commit 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets") breaks the strong requirement for a freezable sleep, according to commit 0f9548ca1091: We shouldn't try_to_freeze if locks are held. Holding a lock can cause a deadlock if the lock is later acquired in the suspend or hibernate path (e.g. by dpm). Holding a lock can also cause a deadlock in the case of cgroup_freezer if a lock is held inside a frozen cgroup that is later acquired by a process outside that group. The pipe_lock is still held at that point. So use freezable version only for the recvmsg call path, avoid impact for Android. Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets") Reported-by: Dmitry Vyukov Cc: Tejun Heo Cc: Colin Cross Cc: Rafael J. Wysocki Cc: Hannes Frederic Sowa Signed-off-by: Cong Wang Signed-off-by: David S. Miller diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 5d1c14a..2358f26 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2199,7 +2199,8 @@ out: * Sleep until more data has arrived. But check for races.. */ static long unix_stream_data_wait(struct sock *sk, long timeo, - struct sk_buff *last, unsigned int last_len) + struct sk_buff *last, unsigned int last_len, + bool freezable) { struct sk_buff *tail; DEFINE_WAIT(wait); @@ -2220,7 +2221,10 @@ static long unix_stream_data_wait(struct sock *sk, long timeo, sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); unix_state_unlock(sk); - timeo = freezable_schedule_timeout(timeo); + if (freezable) + timeo = freezable_schedule_timeout(timeo); + else + timeo = schedule_timeout(timeo); unix_state_lock(sk); if (sock_flag(sk, SOCK_DEAD)) @@ -2250,7 +2254,8 @@ struct unix_stream_read_state { unsigned int splice_flags; }; -static int unix_stream_read_generic(struct unix_stream_read_state *state) +static int unix_stream_read_generic(struct unix_stream_read_state *state, + bool freezable) { struct scm_cookie scm; struct socket *sock = state->socket; @@ -2330,7 +2335,7 @@ again: mutex_unlock(&u->iolock); timeo = unix_stream_data_wait(sk, timeo, last, - last_len); + last_len, freezable); if (signal_pending(current)) { err = sock_intr_errno(timeo); @@ -2472,7 +2477,7 @@ static int unix_stream_recvmsg(struct socket *sock, struct msghdr *msg, .flags = flags }; - return unix_stream_read_generic(&state); + return unix_stream_read_generic(&state, true); } static int unix_stream_splice_actor(struct sk_buff *skb, @@ -2503,7 +2508,7 @@ static ssize_t unix_stream_splice_read(struct socket *sock, loff_t *ppos, flags & SPLICE_F_NONBLOCK) state.flags = MSG_DONTWAIT; - return unix_stream_read_generic(&state); + return unix_stream_read_generic(&state, false); } static int unix_shutdown(struct socket *sock, int mode) -- cgit v0.10.2 From 0f5258cd91e9d78a1ee30696314bec3c33321a93 Mon Sep 17 00:00:00 2001 From: Stefan Hajnoczi Date: Fri, 18 Nov 2016 09:41:46 +0000 Subject: netns: fix get_net_ns_by_fd(int pid) typo The argument to get_net_ns_by_fd() is a /proc/$PID/ns/net file descriptor not a pid. Fix the typo. Signed-off-by: Stefan Hajnoczi Acked-by: Rami Rosen Signed-off-by: David S. Miller diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index fc4f757..0940598 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -170,7 +170,7 @@ static inline struct net *copy_net_ns(unsigned long flags, extern struct list_head net_namespace_list; struct net *get_net_ns_by_pid(pid_t pid); -struct net *get_net_ns_by_fd(int pid); +struct net *get_net_ns_by_fd(int fd); #ifdef CONFIG_SYSCTL void ipx_register_sysctl(void); -- cgit v0.10.2 From f82ef3e10a870acc19fa04f80ef5877eaa26f41e Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Fri, 18 Nov 2016 15:50:39 +0100 Subject: rtnetlink: fix FDB size computation Add missing NDA_VLAN attribute's size. Fixes: 1e53d5bb8878 ("net: Pass VLAN ID to rtnl_fdb_notify.") Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index a6529c5..2b9d7d0 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2852,7 +2852,10 @@ nla_put_failure: static inline size_t rtnl_fdb_nlmsg_size(void) { - return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(ETH_ALEN); + return NLMSG_ALIGN(sizeof(struct ndmsg)) + + nla_total_size(ETH_ALEN) + /* NDA_LLADDR */ + nla_total_size(sizeof(u16)) + /* NDA_VLAN */ + 0; } static void rtnl_fdb_notify(struct net_device *dev, u8 *addr, u16 vid, int type, -- cgit v0.10.2 From 178c7ae944444c198a1d9646477ab10d2d51f03e Mon Sep 17 00:00:00 2001 From: Alexey Khoroshilov Date: Sat, 19 Nov 2016 01:40:10 +0300 Subject: net: macb: add check for dma mapping error in start_xmit() at91ether_start_xmit() does not check for dma mapping errors. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index b32444a..533653b 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -2673,6 +2673,12 @@ static int at91ether_start_xmit(struct sk_buff *skb, struct net_device *dev) lp->skb_length = skb->len; lp->skb_physaddr = dma_map_single(NULL, skb->data, skb->len, DMA_TO_DEVICE); + if (dma_mapping_error(NULL, lp->skb_physaddr)) { + dev_kfree_skb_any(skb); + dev->stats.tx_dropped++; + netdev_err(dev, "%s: DMA mapping error\n", __func__); + return NETDEV_TX_OK; + } /* Set address of the data in the Transmit Address register */ macb_writel(lp, TAR, lp->skb_physaddr); -- cgit v0.10.2 From 32c231164b762dddefa13af5a0101032c70b50ef Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Fri, 18 Nov 2016 22:13:00 +0100 Subject: l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind(). Without lock, a concurrent call could modify the socket flags between the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way, a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it would then leave a stale pointer there, generating use-after-free errors when walking through the list or modifying adjacent entries. BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8 Write of size 8 by task syz-executor/10987 CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0 ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0 Call Trace: [] dump_stack+0xb3/0x118 lib/dump_stack.c:15 [] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156 [< inline >] print_address_description mm/kasan/report.c:194 [] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283 [< inline >] kasan_report mm/kasan/report.c:303 [] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329 [< inline >] __write_once_size ./include/linux/compiler.h:249 [< inline >] __hlist_del ./include/linux/list.h:622 [< inline >] hlist_del_init ./include/linux/list.h:637 [] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239 [] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415 [] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422 [] sock_release+0x8d/0x1d0 net/socket.c:570 [] sock_close+0x16/0x20 net/socket.c:1017 [] __fput+0x28c/0x780 fs/file_table.c:208 [] ____fput+0x15/0x20 fs/file_table.c:244 [] task_work_run+0xf9/0x170 [] do_exit+0x85e/0x2a00 [] do_group_exit+0x108/0x330 [] get_signal+0x617/0x17a0 kernel/signal.c:2307 [] do_signal+0x7f/0x18f0 [] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259 [] entry_SYSCALL_64_fastpath+0xc4/0xc6 Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448 Allocated: PID = 10987 [ 1116.897025] [] save_stack_trace+0x16/0x20 [ 1116.897025] [] save_stack+0x46/0xd0 [ 1116.897025] [] kasan_kmalloc+0xad/0xe0 [ 1116.897025] [] kasan_slab_alloc+0x12/0x20 [ 1116.897025] [< inline >] slab_post_alloc_hook mm/slab.h:417 [ 1116.897025] [< inline >] slab_alloc_node mm/slub.c:2708 [ 1116.897025] [< inline >] slab_alloc mm/slub.c:2716 [ 1116.897025] [] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721 [ 1116.897025] [] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326 [ 1116.897025] [] sk_alloc+0x38/0xae0 net/core/sock.c:1388 [ 1116.897025] [] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182 [ 1116.897025] [] __sock_create+0x37b/0x640 net/socket.c:1153 [ 1116.897025] [< inline >] sock_create net/socket.c:1193 [ 1116.897025] [< inline >] SYSC_socket net/socket.c:1223 [ 1116.897025] [] SyS_socket+0xef/0x1b0 net/socket.c:1203 [ 1116.897025] [] entry_SYSCALL_64_fastpath+0x23/0xc6 Freed: PID = 10987 [ 1116.897025] [] save_stack_trace+0x16/0x20 [ 1116.897025] [] save_stack+0x46/0xd0 [ 1116.897025] [] kasan_slab_free+0x71/0xb0 [ 1116.897025] [< inline >] slab_free_hook mm/slub.c:1352 [ 1116.897025] [< inline >] slab_free_freelist_hook mm/slub.c:1374 [ 1116.897025] [< inline >] slab_free mm/slub.c:2951 [ 1116.897025] [] kmem_cache_free+0xc8/0x330 mm/slub.c:2973 [ 1116.897025] [< inline >] sk_prot_free net/core/sock.c:1369 [ 1116.897025] [] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444 [ 1116.897025] [] sk_destruct+0x44/0x80 net/core/sock.c:1452 [ 1116.897025] [] __sk_free+0x53/0x220 net/core/sock.c:1460 [ 1116.897025] [] sk_free+0x23/0x30 net/core/sock.c:1471 [ 1116.897025] [] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589 [ 1116.897025] [] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243 [ 1116.897025] [] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415 [ 1116.897025] [] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422 [ 1116.897025] [] sock_release+0x8d/0x1d0 net/socket.c:570 [ 1116.897025] [] sock_close+0x16/0x20 net/socket.c:1017 [ 1116.897025] [] __fput+0x28c/0x780 fs/file_table.c:208 [ 1116.897025] [] ____fput+0x15/0x20 fs/file_table.c:244 [ 1116.897025] [] task_work_run+0xf9/0x170 [ 1116.897025] [] do_exit+0x85e/0x2a00 [ 1116.897025] [] do_group_exit+0x108/0x330 [ 1116.897025] [] get_signal+0x617/0x17a0 kernel/signal.c:2307 [ 1116.897025] [] do_signal+0x7f/0x18f0 [ 1116.897025] [] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156 [ 1116.897025] [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [ 1116.897025] [] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259 [ 1116.897025] [] entry_SYSCALL_64_fastpath+0xc4/0xc6 Memory state around the buggy address: ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table. Fixes: c51ce49735c1 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case") Reported-by: Baozeng Ding Reported-by: Andrey Konovalov Tested-by: Baozeng Ding Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c index fce25af..982f6c4 100644 --- a/net/l2tp/l2tp_ip.c +++ b/net/l2tp/l2tp_ip.c @@ -251,8 +251,6 @@ static int l2tp_ip_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) int ret; int chk_addr_ret; - if (!sock_flag(sk, SOCK_ZAPPED)) - return -EINVAL; if (addr_len < sizeof(struct sockaddr_l2tpip)) return -EINVAL; if (addr->l2tp_family != AF_INET) @@ -267,6 +265,9 @@ static int l2tp_ip_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) read_unlock_bh(&l2tp_ip_lock); lock_sock(sk); + if (!sock_flag(sk, SOCK_ZAPPED)) + goto out; + if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_l2tpip)) goto out; diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index ad3468c..9978d01 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -269,8 +269,6 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) int addr_type; int err; - if (!sock_flag(sk, SOCK_ZAPPED)) - return -EINVAL; if (addr->l2tp_family != AF_INET6) return -EINVAL; if (addr_len < sizeof(*addr)) @@ -296,6 +294,9 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) lock_sock(sk); err = -EINVAL; + if (!sock_flag(sk, SOCK_ZAPPED)) + goto out_unlock; + if (sk->sk_state != TCP_CLOSE) goto out_unlock; -- cgit v0.10.2 From 3f0ae05d6fea0ed5b19efdbc9c9f8e02685a3af3 Mon Sep 17 00:00:00 2001 From: Zhang Shengju Date: Sat, 19 Nov 2016 23:28:32 +0800 Subject: rtnl: fix the loop index update error in rtnl_dump_ifinfo() If the link is filtered out, loop index should also be updated. If not, loop index will not be correct. Fixes: dc599f76c22b0 ("net: Add support for filtering link dump by master device and kind") Signed-off-by: Zhang Shengju Acked-by: David Ahern Signed-off-by: David S. Miller diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 2b9d7d0..a99917b 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1609,7 +1609,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) head = &net->dev_index_head[h]; hlist_for_each_entry(dev, head, index_hlist) { if (link_dump_filtered(dev, master_idx, kind_ops)) - continue; + goto cont; if (idx < s_idx) goto cont; err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, -- cgit v0.10.2 From 51b9a31c42edcd089f5b229633477ab5128faf03 Mon Sep 17 00:00:00 2001 From: Jon Paul Maloy Date: Sat, 19 Nov 2016 14:47:07 -0500 Subject: tipc: eliminate obsolete socket locking policy description The comment block in socket.c describing the locking policy is obsolete, and does not reflect current reality. We remove it in this commit. Since the current locking policy is much simpler and follows a mainstream approach, we see no need to add a new description. Signed-off-by: Jon Maloy Signed-off-by: David S. Miller diff --git a/net/tipc/socket.c b/net/tipc/socket.c index f9f5f3c..db32777 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1,7 +1,7 @@ /* * net/tipc/socket.c: TIPC socket API * - * Copyright (c) 2001-2007, 2012-2015, Ericsson AB + * Copyright (c) 2001-2007, 2012-2016, Ericsson AB * Copyright (c) 2004-2008, 2010-2013, Wind River Systems * All rights reserved. * @@ -129,54 +129,8 @@ static const struct proto_ops packet_ops; static const struct proto_ops stream_ops; static const struct proto_ops msg_ops; static struct proto tipc_proto; - static const struct rhashtable_params tsk_rht_params; -/* - * Revised TIPC socket locking policy: - * - * Most socket operations take the standard socket lock when they start - * and hold it until they finish (or until they need to sleep). Acquiring - * this lock grants the owner exclusive access to the fields of the socket - * data structures, with the exception of the backlog queue. A few socket - * operations can be done without taking the socket lock because they only - * read socket information that never changes during the life of the socket. - * - * Socket operations may acquire the lock for the associated TIPC port if they - * need to perform an operation on the port. If any routine needs to acquire - * both the socket lock and the port lock it must take the socket lock first - * to avoid the risk of deadlock. - * - * The dispatcher handling incoming messages cannot grab the socket lock in - * the standard fashion, since invoked it runs at the BH level and cannot block. - * Instead, it checks to see if the socket lock is currently owned by someone, - * and either handles the message itself or adds it to the socket's backlog - * queue; in the latter case the queued message is processed once the process - * owning the socket lock releases it. - * - * NOTE: Releasing the socket lock while an operation is sleeping overcomes - * the problem of a blocked socket operation preventing any other operations - * from occurring. However, applications must be careful if they have - * multiple threads trying to send (or receive) on the same socket, as these - * operations might interfere with each other. For example, doing a connect - * and a receive at the same time might allow the receive to consume the - * ACK message meant for the connect. While additional work could be done - * to try and overcome this, it doesn't seem to be worthwhile at the present. - * - * NOTE: Releasing the socket lock while an operation is sleeping also ensures - * that another operation that must be performed in a non-blocking manner is - * not delayed for very long because the lock has already been taken. - * - * NOTE: This code assumes that certain fields of a port/socket pair are - * constant over its lifetime; such fields can be examined without taking - * the socket lock and/or port lock, and do not need to be re-read even - * after resuming processing after waiting. These fields include: - * - socket type - * - pointer to socket sk structure (aka tipc_sock structure) - * - pointer to port structure - * - port reference - */ - static u32 tsk_own_node(struct tipc_sock *tsk) { return msg_prevnode(&tsk->phdr); -- cgit v0.10.2 From 6bc5445c0180a0c7cc61a95d131c7eac66459692 Mon Sep 17 00:00:00 2001 From: Peter Robinson Date: Sun, 20 Nov 2016 17:22:38 +0000 Subject: ethernet: stmmac: make DWMAC_STM32 depend on it's associated SoC There's not much point, except compile test, enabling the stmmac platform drivers unless the STM32 SoC is enabled. It's not useful without it. Signed-off-by: Peter Robinson Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig index 3818c5e..4b78168 100644 --- a/drivers/net/ethernet/stmicro/stmmac/Kconfig +++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig @@ -107,7 +107,7 @@ config DWMAC_STI config DWMAC_STM32 tristate "STM32 DWMAC support" default ARCH_STM32 - depends on OF && HAS_IOMEM + depends on OF && HAS_IOMEM && (ARCH_STM32 || COMPILE_TEST) select MFD_SYSCON ---help--- Support for ethernet controller on STM32 SOCs. -- cgit v0.10.2 From 7c6ae610a1f0a9d3cebf790e0245b4e0f76aa86e Mon Sep 17 00:00:00 2001 From: Gao Feng Date: Mon, 21 Nov 2016 08:56:21 +0800 Subject: net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit The tc could return NET_XMIT_CN as one congestion notification, but it does not mean the packe is lost. Other modules like ipvlan, macvlan, and others treat NET_XMIT_CN as success too. So l2tp_eth_dev_xmit should add the NET_XMIT_CN check. Signed-off-by: Gao Feng Signed-off-by: David S. Miller diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c index 965f7e3..3dc97b4 100644 --- a/net/l2tp/l2tp_eth.c +++ b/net/l2tp/l2tp_eth.c @@ -97,7 +97,7 @@ static int l2tp_eth_dev_xmit(struct sk_buff *skb, struct net_device *dev) unsigned int len = skb->len; int ret = l2tp_xmit_skb(session, skb, session->hdr_len); - if (likely(ret == NET_XMIT_SUCCESS)) { + if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) { atomic_long_add(len, &priv->tx_bytes); atomic_long_inc(&priv->tx_packets); } else { -- cgit v0.10.2 From 7082c5c3f2407c52022507ffaf644dbbab97a883 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 21 Nov 2016 10:08:37 +0100 Subject: tcp: zero ca_priv area when switching cc algorithms We need to zero out the private data area when application switches connection to different algorithm (TCP_CONGESTION setsockopt). When congestion ops get assigned at connect time everything is already zeroed because sk_alloc uses GFP_ZERO flag. But in the setsockopt case this contains whatever previous cc placed there. Signed-off-by: Florian Westphal Signed-off-by: David S. Miller diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index 1294af4..f9038d6 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -200,8 +200,10 @@ static void tcp_reinit_congestion_control(struct sock *sk, icsk->icsk_ca_ops = ca; icsk->icsk_ca_setsockopt = 1; - if (sk->sk_state != TCP_CLOSE) + if (sk->sk_state != TCP_CLOSE) { + memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv)); tcp_init_congestion_control(sk); + } } /* Manage refcounts on socket close. */ -- cgit v0.10.2