path: root/net/sched
Age | Commit message | Author
2007-07-15 | [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE | Patrick McHardy
The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE, so remove the old code. The config option will be kept around to select the equivalent NET_CLS_ACT options for a short time to allow easier upgrades.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 | [NET_SCHED]: act_api: qdisc internal reclassify support | Patrick McHardy
The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return it to the qdisc, which could handle it internally or ignore it. With NET_CLS_ACT however, tc_classify starts over at the first classifier and never returns it to the qdisc. This makes it impossible to support qdisc-internal reclassification, which in turn makes it impossible to remove the old NET_CLS_POLICE code without breaking compatibility since we have two qdiscs (CBQ and ATM) that support this.
This patch adds a tc_classify_compat function that handles reclassification the old way and changes CBQ and ATM to use it. This again is of course not fully backwards compatible with the previous NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain compatibility *and* support qdisc internal reclassification with NET_CLS_ACT, but this seems like the better choice over keeping the two incompatible options around forever.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
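The two reclassification behaviours can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: the verdict names, the classifier function type and MAX_RECLASSIFY are hypothetical and only loosely modelled on the TC_ACT_* codes; they are not the kernel's tc_classify() or tc_classify_compat() signatures.

```c
#include <stddef.h>

/* Hypothetical verdicts, loosely modelled on the TC_ACT_* codes. */
enum cls_result { CLS_NOMATCH = -1, CLS_OK = 0, CLS_RECLASSIFY = 1, CLS_SHOT = 2 };

/* Hypothetical classifier: looks at a packet mark, may set a class id. */
typedef enum cls_result (*classifier_fn)(int mark, int *classid);

/* Old-style ("compat") walk: the first verdict other than NOMATCH is
 * returned as is, so RECLASSIFY reaches the qdisc, which may act on it. */
enum cls_result classify_compat(classifier_fn *chain, size_t n, int mark, int *classid)
{
    for (size_t i = 0; i < n; i++) {
        enum cls_result r = chain[i](mark, classid);
        if (r != CLS_NOMATCH)
            return r;
    }
    return CLS_NOMATCH;
}

/* act_api-style walk: RECLASSIFY restarts at the first classifier and is
 * never handed to the qdisc; after too many restarts the packet is dropped. */
#define MAX_RECLASSIFY 4 /* hypothetical limit */

enum cls_result classify_restart(classifier_fn *chain, size_t n, int mark, int *classid)
{
    for (int loops = 0; loops <= MAX_RECLASSIFY; loops++) {
        enum cls_result r = classify_compat(chain, n, mark, classid);
        if (r != CLS_RECLASSIFY)
            return r;
    }
    return CLS_SHOT;
}

/* Example classifier that always asks for reclassification. */
enum cls_result always_reclassify(int mark, int *classid)
{
    (void)mark;
    (void)classid;
    return CLS_RECLASSIFY;
}

/* Example classifier mapping mark 7 to class 0x10. */
enum cls_result match_mark7(int mark, int *classid)
{
    if (mark != 7)
        return CLS_NOMATCH;
    *classid = 0x10;
    return CLS_OK;
}
```

With a chain made only of always_reclassify, classify_compat() hands CLS_RECLASSIFY back to the caller so a qdisc such as CBQ or ATM could act on it, while classify_restart() loops until its limit and drops the packet.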
2007-07-15 | [NET_SCHED]: sch_dsmark: act_api support | Patrick McHardy
Handle act_api classification results.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 | [NET_SCHED]: sch_atm: act_api support | Patrick McHardy
Handle act_api classification results. The ATM scheduler behaves slightly differently from other schedulers in that it only handles policer results for successful classifications; this behaviour is retained for the act_api case.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 | [NET_SCHED]: sch_atm: Lindent | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 | [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization | Patrick McHardy
As noticed by Ranko Zivojnovic <ranko@spidernet.net>, calling qdisc_run from the timer handler can result in deadlock:
> CPU#0
>
> qdisc_watchdog() fires and gets dev->queue_lock
> qdisc_run()...qdisc_restart()...
> -> releases dev->queue_lock and enters dev_hard_start_xmit()
>
> CPU#1
>
> tc del qdisc dev ...
> qdisc_graft()...dev_graft_qdisc()...dev_deactivate()...
> -> grabs dev->queue_lock ...
>
> qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()...
> -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still
> holding dev->queue_lock
>
> CPU#0
>
> dev_hard_start_xmit() returns ...
> -> wants to get dev->queue_lock(!)
>
> DEADLOCK!
The entire optimization is a bit questionable IMO, it moves potentially large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ, which kind of defeats the separation of them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Ranko Zivojnovic <ranko@spidernet.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-12 | [NET_SCHED]: ematch: module autoloading | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 | [NET_SCHED]: Make HTB scheduler work with TSO. | Ranjit Manomohan
Currently the HTB scheduler does not correctly account for TSO packets, which causes large inaccuracies in the bandwidth control when using TSO. This patch allows the HTB scheduler to work with TSO enabled devices.
Signed-off-by: Ranjit Manomohan <ranjitm@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
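HTB turns a packet length into a transmission time through a 256-entry rate table indexed by length >> cell_log. A TSO frame can be far larger than the MTU the table was sized for, so its index runs past the last slot. The sketch contrasts clamping to the last slot with extrapolating from it. struct rate_table and the helper names are hypothetical, and extrapolation is shown as one plausible way to charge large packets, not as a verbatim copy of the patch.

```c
#include <stdint.h>

#define RTAB_SIZE 256

/* Hypothetical rate table: data[i] is the transmit time charged for a
 * packet whose length falls into slot i, where slot = len >> cell_log. */
struct rate_table {
    int cell_log;
    uint32_t data[RTAB_SIZE];
};

/* Clamping lookup: an oversized (e.g. TSO) packet is charged no more
 * than a packet in the last slot, so large bursts are under-accounted. */
uint32_t len2time_clamped(const struct rate_table *rt, unsigned int len)
{
    unsigned int slot = len >> rt->cell_log;
    if (slot > RTAB_SIZE - 1)
        slot = RTAB_SIZE - 1;
    return rt->data[slot];
}

/* Extrapolating lookup: charge whole multiples of the last slot's time
 * plus the time for the remaining slot index. */
uint32_t len2time_extrapolated(const struct rate_table *rt, unsigned int len)
{
    unsigned int slot = len >> rt->cell_log;
    if (slot > RTAB_SIZE - 1)
        return rt->data[RTAB_SIZE - 1] * (slot >> 8) + rt->data[slot & 0xFF];
    return rt->data[slot];
}

/* Fill a linear table: slot i costs (i + 1) cells of per_cell time each. */
void rate_table_init_linear(struct rate_table *rt, int cell_log, uint32_t per_cell)
{
    rt->cell_log = cell_log;
    for (int i = 0; i < RTAB_SIZE; i++)
        rt->data[i] = (uint32_t)(i + 1) * per_cell;
}
```

With a linear table built for 8-byte cells, a 1500-byte packet costs the same under both lookups, but a 64 KB frame is charged 256 units when clamped versus 8193 when extrapolated, which matches the linear cost of its 8192nd slot.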
2007-07-11 | [NET_SCHED]: Remove unnecessary includes | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 | [NET_SCHED]: sch_htb: use generic estimator | Patrick McHardy
Use the generic estimator instead of reimplementing (parts of) it. For compatibility, always create a default estimator for new classes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
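The generic estimator keeps an exponentially weighted moving average (EWMA) of the observed rate. Below is a minimal fixed-point sketch, assuming a fixed sampling interval and a weight of 1/2^ewma_log for each new sample. struct rate_est and rate_est_update() are hypothetical and do not reproduce gen_estimator's internals.

```c
#include <stdint.h>

/* Hypothetical estimator state, updated once per sampling interval. */
struct rate_est {
    uint64_t last_bytes;   /* cumulative byte counter at the previous sample */
    uint64_t avg_bps;      /* smoothed rate, in bytes per interval */
    unsigned int ewma_log; /* a new sample has weight 1 / 2^ewma_log */
};

/* avg += (sample - avg) / 2^ewma_log
 * Signed division is used instead of a right shift so that a falling
 * rate (negative difference) is handled without shifting negative values. */
void rate_est_update(struct rate_est *e, uint64_t total_bytes)
{
    int64_t sample = (int64_t)(total_bytes - e->last_bytes);
    int64_t avg = (int64_t)e->avg_bps;

    e->last_bytes = total_bytes;
    avg += (sample - avg) / (1LL << e->ewma_log);
    e->avg_bps = (uint64_t)avg;
}
```

With ewma_log = 1 and a steady 1000 bytes per interval, the average moves 500, 750, 875, halving the remaining gap each interval; a larger ewma_log gives a smoother but slower estimate.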
2007-07-11 | [NET_SCHED]: Remove unnecessary stats_lock pointers | Patrick McHardy
Remove stats_lock pointers from qdisc-internal structures; in all cases they point to dev->queue_lock. The only case where one is necessary is for top-level qdiscs, where it might also point to dev->ingress_lock in the case of the ingress qdisc. Also remove it from actions completely; it always points to the action's internal lock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 | [NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option | Patrick McHardy
The generic estimator is always built in anyway, and all the config option does is prevent including a minimal amount of code for setting it up. Additionally, the option is already automatically selected in most cases.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 | [SCHED]: Qdisc changes and sch_rr added for multiqueue | Peter P Waskiewicz Jr
Add the new sch_rr qdisc for multiqueue network device support. Allow sch_prio and sch_rr to be compiled with or without multiqueue hardware support. sch_rr is part of sch_prio, and is referenced from MODULE_ALIAS. This was done since sch_prio and sch_rr only differ in their dequeue routine.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
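Because sch_prio and sch_rr differ only in dequeue, the difference fits in two functions over the same per-band queues. In this sketch struct multiband, the fixed band count and capacity, and all function names are hypothetical; the real qdiscs hold child qdiscs per band and, with multiqueue support, may tie bands to hardware queues.

```c
#include <stddef.h>

#define NBANDS 3
#define QCAP 16

/* Hypothetical per-band FIFO of packet ids (ring buffer). */
struct band {
    int pkts[QCAP];
    size_t head, len;
};

struct multiband {
    struct band bands[NBANDS];
    size_t next_rr; /* round-robin cursor, used only by rr_dequeue() */
};

int band_enqueue(struct band *b, int pkt)
{
    if (b->len == QCAP)
        return -1; /* band full */
    b->pkts[(b->head + b->len) % QCAP] = pkt;
    b->len++;
    return 0;
}

static int band_dequeue(struct band *b, int *pkt)
{
    if (b->len == 0)
        return 0;
    *pkt = b->pkts[b->head];
    b->head = (b->head + 1) % QCAP;
    b->len--;
    return 1;
}

/* prio: always serve the lowest-numbered (highest-priority) non-empty band. */
int prio_dequeue(struct multiband *q, int *pkt)
{
    for (size_t i = 0; i < NBANDS; i++)
        if (band_dequeue(&q->bands[i], pkt))
            return 1;
    return 0;
}

/* rr: start at the cursor, take one packet from the next non-empty band,
 * then advance the cursor past that band. */
int rr_dequeue(struct multiband *q, int *pkt)
{
    for (size_t n = 0; n < NBANDS; n++) {
        size_t i = (q->next_rr + n) % NBANDS;
        if (band_dequeue(&q->bands[i], pkt)) {
            q->next_rr = (i + 1) % NBANDS;
            return 1;
        }
    }
    return 0;
}
```

With packets 1 and 2 in band 0, 10 in band 1 and 20 in band 2, prio_dequeue() drains band 0 first (1, 2, 10, 20) while rr_dequeue() alternates (1, 10, 20, 2).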
2007-07-11 | [CORE] Stack changes to add multiqueue hardware support API | Peter P Waskiewicz Jr
Add the multiqueue hardware device support API to the core network stack. Allow drivers to allocate multiple queues and manage them at the netdev level if they choose to do so. Added a new field to sk_buff, namely queue_mapping, for drivers to know which tx_ring to select based on OS classification of the flow.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 | [NET]: qdisc_restart - couple of optimizations. | Krishna Kumar
Changes:
- netif_queue_stopped need not be called inside qdisc_restart, as it has already been called in qdisc_run() before the first skb is sent, and in __qdisc_run() after each intermediate skb is sent (note: we are the only sender, so the queue cannot get stopped while the tx lock is held in the ~LLTX case).
- BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1 meant more packets are available, and __qdisc_run used to loop when qdisc_restart() returned -1. During those days, it was necessary to make sure that qlen is never less than zero, since __qdisc_run would get into an infinite loop if no packets are on the queue and this bug in qdisc was there (and worse, no more skbs could ever get queued as we hold the queue lock too). With Herbert's recent change to return values, this check is not required. Hopefully Herbert can validate this change. If it is required at all, it should be added to skb_dequeue (in the failure case), and not to qdisc_qlen.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 | [NET]: qdisc_restart - readability changes plus one bug fix. | Krishna Kumar
New changes:
- Incorporated Peter Waskiewicz's comments.
- Re-added one warning message (on driver returning wrong value).
Previous changes:
- Converted to use switch/case code, which looks neater.
- "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless check should be removed, since the driver will return NETDEV_TX_LOCKED only if lockless is true and the driver has to do the locking. In the original code as well as the latest code, this check can result in a bug where, if LLTX is not set for a driver (lockless == 0) but the driver is wrongly written to do a trylock (despite LLTX not being set), the driver returns LOCKED. But since lockless is zero, the packet is requeued instead of calling the collision code, which will issue a warning and free up the skb. Instead this skb will be retried with this driver next time, and the same result will ensue. Removing this check will catch these driver bugs instead of hiding the problem. I am keeping this change in the readability section since: a. it is confusing to check two things as it is; and b. it is difficult to keep this check in the changed 'switch' code.
- Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is the work being done and easier to understand) and do_dev_requeue to dev_requeue_skb; merged handle_dev_cpu_collision and tx_islocked into dev_handle_collision (handle_dev_cpu_collision is a small routine with only one caller, so there is no need to have two separate routines; this also gets rid of two macros, etc.).
- Removed an XXX comment, as it should never fail (I suspect this was related to batch skb WIP, Jamal?). Converted some functions to the original coding style of having the return values and the function name on the same line, e.g. prio2list.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
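The switch-based dispatch can be sketched as a pure function from the driver's return code to the caller's next step. The enum names are hypothetical stand-ins for NETDEV_TX_OK, NETDEV_TX_BUSY and NETDEV_TX_LOCKED, and cpu_owns_lock is a hypothetical flag for "this CPU already holds the driver's tx lock". Treating LOCKED as drop-and-warn only in that recursive case is an assumption of the sketch, not a transcription of qdisc_restart().

```c
/* Hypothetical driver return codes modelled on NETDEV_TX_OK/BUSY/LOCKED;
 * TX_BOGUS stands for any value a buggy driver might return. */
enum tx_ret { TX_OK, TX_BUSY, TX_LOCKED, TX_BOGUS };

/* Hypothetical follow-up actions for the caller. */
enum tx_action { ACT_CONTINUE, ACT_REQUEUE, ACT_DROP_AND_WARN, ACT_WARN_REQUEUE };

enum tx_action tx_result_action(enum tx_ret ret, int cpu_owns_lock)
{
    switch (ret) {
    case TX_OK:
        return ACT_CONTINUE;
    case TX_LOCKED:
        /* No "&& lockless" test: a driver returning LOCKED without LLTX is
         * buggy, and going through the collision path surfaces that.
         * If this CPU already owns the lock, requeueing would loop forever. */
        return cpu_owns_lock ? ACT_DROP_AND_WARN : ACT_REQUEUE;
    case TX_BUSY:
        return ACT_REQUEUE;
    default:
        /* Unexpected driver return value: warn, then requeue. */
        return ACT_WARN_REQUEUE;
    }
}
```

Keeping the decision in a single switch makes each return code's handling visible in one place, which is the readability argument made above.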
2007-07-11 | [NET_SCHED]: Cleanup readability of qdisc restart | Jamal Hadi Salim
Over the years this code has gotten hairier, resulting in many long discussions over long summer days and patches that get it wrong. This patch helps tame that code so normal people will understand it. Thanks to Thomas Graf, Peter J. Waskiewicz Jr, and Patrick McHardy for their valuable reviews.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 | [NET_SCHED]: Fix filter double free | Patrick McHardy
cbq and atm destroy their filters twice when destroying inner classes during qdisc destruction.
Reported-and-tested-by: Strobl Anton <a.strobl@aws-it.at>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-04 | [NET]: Fix comparisons of unsigned < 0. | Bill Nottingham
Recent gcc versions emit warnings when unsigned variables are compared < 0 or >= 0.
Signed-off-by: Bill Nottingham <notting@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-04 | [NET]: Make net watchdog timers 1 sec jiffy aligned. | Venkatesh Pallipadi
Use round_jiffies for the net dev watchdog timer.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24 | [NET_SCHED]: sch_htb: fix event cache time calculation | Patrick McHardy
The event cache time must be an absolute value; when no event exists, it is incorrectly set to 1s instead of 1s in the future.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
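A minimal sketch of the difference between a relative and an absolute fallback. The tick unit, TICKS_PER_SEC and the convention that 0 means "no pending event" are hypothetical choices for illustration; only the "now + 1s instead of 1s" distinction comes from the commit.

```c
#include <stdint.h>

/* Hypothetical time base: TICKS_PER_SEC ticks make one second. */
#define TICKS_PER_SEC 1000000ULL

/* next_event is the earliest pending event time (absolute), or 0 if none. */

/* Buggy: the fallback is a relative duration, i.e. a timestamp shortly
 * after time zero, which is already in the past for any realistic now. */
uint64_t next_event_cache_buggy(uint64_t now, uint64_t next_event)
{
    (void)now;
    return next_event ? next_event : TICKS_PER_SEC;
}

/* Fixed: the fallback is an absolute time one second after now. */
uint64_t next_event_cache_fixed(uint64_t now, uint64_t next_event)
{
    return next_event ? next_event : now + TICKS_PER_SEC;
}
```

At now = 5 s with no pending event, the buggy version caches the 1 s mark, which is already in the past, while the fixed version caches the 6 s mark.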
2007-05-24 | [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty | Herbert Xu
My previous patch that changed the return value of qdisc_restart incorrectly made the case where dequeue returns empty continue processing packets. This patch is based on diagnosis and fix by Patrick McHardy.
Reported-and-debugged-by: Anant Nitya <kernel@prachanda.info>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-14 | [NET_SCHED]: prio qdisc boundary condition | Jamal Hadi Salim
This fixes an out-of-boundary condition when the classified band equals q->bands. Caught by Alexey.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
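With q->bands bands, valid indices run from 0 to q->bands - 1, so the guard must use >= rather than >. In this sketch struct prio_sched and select_band() are hypothetical, and falling back to the band for priority 0 is an assumed policy for out-of-range results; the priority map in the test is sample data.

```c
#define PRIO_MAP_LEN 16 /* hypothetical: one entry per priority value */

/* Hypothetical prio qdisc state: band count and priority-to-band map. */
struct prio_sched {
    int bands;
    unsigned char prio2band[PRIO_MAP_LEN];
};

/* Valid bands are 0 .. bands - 1. A "band > q->bands" test would let
 * band == q->bands through and index one past the last queue. */
int select_band(const struct prio_sched *q, int band)
{
    if (band < 0 || band >= q->bands)
        return q->prio2band[0]; /* assumed fallback for out-of-range results */
    return band;
}
```

With three bands, index 2 passes through unchanged while index 3 falls back to prio2band[0].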
2007-05-11 | [NET_SCHED]: Avoid requeue warning on dev_deactivate | Herbert Xu
When we relinquish queue_lock in qdisc_restart and then retake it for requeueing, we might race against dev_deactivate and end up requeueing onto noop_qdisc. This causes a warning to be printed. This patch fixes this by checking this before we requeue.
As an added bonus, we can remove the same check in __qdisc_run which was added to prevent dev->gso_skb from being requeued when we're shutting down. Even though we've had to add a new conditional in its place, it's better because it only happens on requeues rather than every single time that qdisc_run is called.
For this to work we also need to move the clearing of gso_skb up in dev_deactivate as now qdisc_restart can occur even after we wait for __LINK_STATE_QDISC_RUNNING to clear (but it won't do anything as long as the queue and gso_skb is already clear).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-11 | [NET_SCHED]: Reread dev->qdisc for NETDEV_TX_OK | Herbert Xu
Now that we return the queue length after NETDEV_TX_OK, we had better make sure that we have the right queue. Otherwise we can cause a stall after a really quick dev_deactivate/dev_activate.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-11 | [NET_SCHED]: Rationalise return value of qdisc_restart | Herbert Xu
The current return value scheme and associated comment was invented back in the 20th century when we still had that tbusy flag. Things have changed quite a bit since then (even Tony Blair is moving on now, not to mention the new French president).
All we need to indicate now is whether the caller should continue processing the queue. Therefore it's sufficient if we return 0 if we want to stop and non-zero otherwise.
This is based on a patch by Krishna Kumar.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
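A minimal model of the simplified contract: the restart step returns 0 when the caller should stop and non-zero otherwise, and the run loop additionally stops once the device queue is stopped. struct fake_dev, restart_once() and run_queue() are hypothetical; the real qdisc_restart() and __qdisc_run() also take locks and call the driver, which is omitted here.

```c
/* Hypothetical device model. */
struct fake_dev {
    int qlen;         /* packets waiting in the qdisc */
    int stopped;      /* driver has stopped the queue */
    int stop_at_qlen; /* hypothetical: driver stops the queue at this qlen (-1 = never) */
    int sent;         /* packets handed to the "driver" */
};

/* One restart step: 0 means "stop", non-zero means "keep going".
 * Here the remaining queue length serves as the non-zero value. */
int restart_once(struct fake_dev *d)
{
    if (d->qlen == 0)
        return 0;               /* nothing dequeued: stop */
    d->qlen--;
    d->sent++;
    if (d->qlen == d->stop_at_qlen)
        d->stopped = 1;         /* e.g. the driver's tx ring filled up */
    return d->qlen;             /* 0 once the queue is drained */
}

/* Run loop: keep restarting until told to stop or the queue is stopped. */
void run_queue(struct fake_dev *d)
{
    if (d->stopped)
        return;
    do {
        if (!restart_once(d))
            break;
    } while (!d->stopped);
}
```

With five queued packets and no stop condition, the loop sends all five; if the driver stops the queue when two packets remain, it sends three and leaves the rest queued.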
2007-05-11 | [NET]: Fix dev->qdisc race for NETDEV_TX_LOCKED case | Thomas Graf
When transmit fails with NETDEV_TX_LOCKED, the skb is requeued to dev->qdisc again. The dev->qdisc pointer is protected by the queue lock, which needs to be dropped when attempting to transmit and acquired again before requeueing. The problem is that qdisc_restart() fetches the dev->qdisc pointer once and stores it in the `q' variable, which is invalidated when dropping the queue_lock; therefore the variable needs to be refreshed before requeueing.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
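The underlying rule is general: a pointer that is only valid under a lock must be re-read after the lock is reacquired. The pthread sketch contrasts the stale and refreshed patterns. struct queue, struct device and swap_qdisc_hook() are hypothetical; the hook stands in for another CPU replacing the device's qdisc while the lock is dropped, and is called inline so the example stays single-threaded and deterministic.

```c
#include <pthread.h>
#include <stddef.h>

struct queue { int pkts; int requeued; };

struct device {
    pthread_mutex_t lock;
    struct queue *qdisc; /* may only be read or replaced under lock */
};

/* Stand-in for the driver call made while the lock is dropped. */
typedef void (*xmit_fn)(struct device *dev);

struct queue replacement_q;

/* Hypothetical hook: another CPU swaps in a new queue meanwhile. */
void swap_qdisc_hook(struct device *dev)
{
    pthread_mutex_lock(&dev->lock);
    dev->qdisc = &replacement_q;
    pthread_mutex_unlock(&dev->lock);
}

/* Buggy: the pointer cached before dropping the lock is reused after it. */
struct queue *xmit_then_requeue_stale(struct device *dev, xmit_fn xmit)
{
    pthread_mutex_lock(&dev->lock);
    struct queue *q = dev->qdisc;
    q->pkts--;                       /* dequeue under the lock */
    pthread_mutex_unlock(&dev->lock);

    xmit(dev);                       /* "locked/busy": must requeue */

    pthread_mutex_lock(&dev->lock);
    q->pkts++;                       /* lands on a queue that may be detached */
    q->requeued++;
    pthread_mutex_unlock(&dev->lock);
    return q;
}

/* Fixed: re-read dev->qdisc once the lock is held again. */
struct queue *xmit_then_requeue_fresh(struct device *dev, xmit_fn xmit)
{
    pthread_mutex_lock(&dev->lock);
    dev->qdisc->pkts--;
    pthread_mutex_unlock(&dev->lock);

    xmit(dev);

    pthread_mutex_lock(&dev->lock);
    struct queue *q = dev->qdisc;    /* refreshed under the lock */
    q->pkts++;
    q->requeued++;
    pthread_mutex_unlock(&dev->lock);
    return q;
}
```

In the stale version the packet goes back onto the old queue even though the device now points at a different one; the fresh version requeues onto whatever queue is current.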
2007-05-11 | [NET_SCHED]: teql_enqueue can check limits before skb enqueue | Krishna Kumar
Optimize teql_enqueue so that it first checks limits before enqueueing.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 | [NET]: Rework dev_base via list_head (v3) | Pavel Emelianov
Cleanup of dev_base list use, with the aim to simplify making the device list per-namespace. In almost every occasion, use of the dev_base variable and dev->next pointer could be easily replaced by a for_each_netdev loop. The few most complicated places were converted to using first_netdev()/next_netdev().
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET]: cleanup extra semicolons | Stephen Hemminger
Spring cleaning time... There seem to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly it is a bogus semicolon after: switch() { }
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: ingress: switch back to using ingress_lock | Patrick McHardy
Switch ingress queueing back to use ingress_lock. qdisc_lock_tree now locks both the ingress and egress qdiscs on the device. All changes to data that might be used on both ingress and egress need to be protected by using qdisc_lock_tree instead of manually taking dev->queue_lock. Additionally, the qdisc stats_lock needs to be initialized to ingress_lock for ingress qdiscs.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: Eliminate qdisc_tree_lock | Patrick McHardy
Since we're now holding the rtnl during the entire dump operation, we can remove qdisc_tree_lock, whose only purpose is to protect dump callbacks from concurrent changes to the qdisc tree.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: qdisc: remove unnecessary memory barriers | Patrick McHardy
We're holding dev->queue_lock in qdisc_watchdog_schedule and qdisc_watchdog_cancel, so there is no need for the barriers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: Unline tcf_destroy | Patrick McHardy
Uninline tcf_destroy and add a helper function to destroy an entire filter chain.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: turn PSCHED_GET_TIME into inline function | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function | Patrick McHardy
Also rename to psched_tdiff_bounded.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
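One reason to turn such macros into inline functions: a function-like macro re-evaluates its arguments wherever they appear and does no type checking. The sketch contrasts a macro and an inline version of a bounded time difference. The typedefs, TDIFF_BOUNDED_MACRO, tdiff_bounded() and the counted() helper are illustrative stand-ins modelled on the idea of psched_tdiff_bounded(), not its actual definition.

```c
#include <stdint.h>

typedef uint64_t psched_time_t; /* stand-in type for this sketch */
typedef long psched_tdiff_t;    /* stand-in type for this sketch */

/* Macro form: (tv1) and (tv2) appear twice, so side effects repeat. */
#define TDIFF_BOUNDED_MACRO(tv1, tv2, bound) \
    (((tv1) - (tv2)) < (bound) ? ((tv1) - (tv2)) : (bound))

/* Inline form: each argument is evaluated once and has a checked type. */
static inline psched_tdiff_t
tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound)
{
    psched_time_t diff = tv1 - tv2;
    return (psched_tdiff_t)(diff < bound ? diff : bound);
}

/* Helper that counts how often an argument expression is evaluated. */
int eval_count;

psched_time_t counted(psched_time_t v)
{
    eval_count++;
    return v;
}
```

Passing counted(10) as the first argument evaluates it twice through the macro but once through the inline function; both still yield 6 for tv2 = 4 and a bound of 100.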
2007-04-26 | [NET_SCHED]: kill PSCHED_TDIFF | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT | Patrick McHardy
Use direct assignment and comparison instead.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: kill PSCHED_TLESS | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2 | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: kill PSCHED_AUDIT_TDIFF | Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED]: sch_netem: fix off-by-one in send time comparison | Patrick McHardy
netem checks PSCHED_TLESS(cb->time_to_send, now) to find out whether it is allowed to send a packet, which is equivalent to cb->time_to_send < now. Use !PSCHED_TLESS(now, cb->time_to_send) instead to properly handle cb->time_to_send == now.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
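The two comparisons differ only at equality. A minimal sketch with plain integer timestamps; the function names are hypothetical and PSCHED_TLESS(a, b) is modelled simply as a < b.

```c
#include <stdint.h>

/* Old check, PSCHED_TLESS(time_to_send, now): time_to_send < now.
 * A packet due exactly at "now" is held back. */
int may_send_buggy(uint64_t time_to_send, uint64_t now)
{
    return time_to_send < now;
}

/* New check, !PSCHED_TLESS(now, time_to_send): time_to_send <= now. */
int may_send_fixed(uint64_t time_to_send, uint64_t now)
{
    return !(now < time_to_send);
}
```

Both agree for packets that are early or overdue; only a packet whose send time equals now is treated differently.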
2007-04-26 | [NETEM]: spelling errors | Stephen Hemminger
Get rid of some of my creative spelling.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NET_SCHED] qdisc: avoid transmit softirq on watchdog wakeup | Stephen Hemminger
If possible, avoid having to do a transmit softirq when a qdisc watchdog decides to re-enable. The watchdog routine runs off a timer, so it is already in the same effective context as the softirq.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NETEM]: avoid excessive requeues | Stephen Hemminger
The netem code would call getnstimeofday() and dequeue/requeue after every packet, even if it was waiting. Avoid this overhead by using the throttled flag.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NETEM]: Optimize tfifo | Stephen Hemminger
In most cases, the next packet will be sent after the last one. So optimize that case.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
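netem keeps delayed packets ordered by send time. If a new packet usually belongs at the end, checking the tail before searching makes the common insert constant-time. The sketch uses a hypothetical doubly linked list (struct pkt, struct tfifo); the scans counter exists only to show which path was taken, and the kernel works on sk_buff queues rather than this list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue node, ordered by time_to_send. */
struct pkt {
    uint64_t time_to_send;
    struct pkt *prev, *next;
};

struct tfifo {
    struct pkt *head, *tail;
    unsigned long scans; /* nodes examined on the slow path */
};

/* Insert p keeping the list sorted; equal times keep arrival order. */
void tfifo_enqueue(struct tfifo *q, struct pkt *p)
{
    p->prev = p->next = NULL;

    /* Fast path: send time no earlier than the tail's, so append. */
    if (!q->tail || q->tail->time_to_send <= p->time_to_send) {
        p->prev = q->tail;
        if (q->tail)
            q->tail->next = p;
        else
            q->head = p;
        q->tail = p;
        return;
    }

    /* Slow path: walk back from the tail to the last node not later than p. */
    struct pkt *cur = q->tail;
    while (cur && cur->time_to_send > p->time_to_send) {
        q->scans++;
        cur = cur->prev;
    }
    if (cur) {
        /* Insert after cur; cur is not the tail, so cur->next exists. */
        p->prev = cur;
        p->next = cur->next;
        cur->next->prev = p;
        cur->next = p;
    } else {
        /* Earlier than everything queued: new head. */
        p->next = q->head;
        q->head->prev = p;
        q->head = p;
    }
}
```

Packets arriving in send-time order take only the fast path; an out-of-order packet costs a backward walk proportional to how far it must move.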
2007-04-26 | [NETEM]: use better types for time values | Stephen Hemminger
The random number generator always generates 32-bit values. The time values are limited by psched_tdiff_t.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [NETEM]: report reorder percent correctly. | Stephen Hemminger
If you set up netem to just delay packets, "tc qdisc ls" will report the reordering as 100%. Well, it's a lie: reorder isn't used unless gap is set, so just set the value to 0 so that the output of the utility is correct.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [PKT_SCHED] act: Use rtnl registration interface | Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 | [PKT_SCHED] cls: Use rtnl registration interface | Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>