Commits · 4.19.140-2008.2.0 · openeuler / Kernel

25 Aug, 2020 40 commits

can: j1939: add rxtimer for multipacket broadcast session · 1ca362cc

Committed by Zhang Changzhong on Aug 25, 2020

mainline inclusion
from mainline-v5.9-rc2
commit 0ae18a82
category: bugfix
bugzilla: 39990
CVE: NA

---------------------------

According to SAE J1939/21 (Chapter 5.12.3 and APPENDIX C), on the transmit
side the required time interval between packets of a multipacket broadcast
message is 50 to 200 ms, and the responder shall use a timeout of 250 ms
(which provides margin allowing for the maximum spacing of 200 ms). On the
receive side a timeout occurs when more than 750 ms elapses between two
message packets while more packets are expected.

So this patch fixes the handling and adds an rxtimer for multipacket
broadcast sessions.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1596599425-5534-5-git-send-email-zhangchangzhong@huawei.com
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
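
The timing rules quoted in this commit message can be sketched as follows. This is a minimal Python illustration, not the kernel code: the constant and function names are hypothetical, and only the millisecond values come from the commit message.

```python
# Timing values quoted from the commit message (SAE J1939/21).
# All names below are illustrative and do not exist in the kernel.
BAM_TX_INTERVAL_MIN_MS = 50    # minimum spacing between broadcast packets
BAM_TX_INTERVAL_MAX_MS = 200   # maximum spacing between broadcast packets
RESPONDER_TIMEOUT_MS = 250     # 200 ms maximum spacing plus margin
BAM_RX_TIMEOUT_MS = 750        # receive side gives up after this gap


def tx_interval_ok(gap_ms: int) -> bool:
    """Return True if a transmit-side packet gap lies within 50..200 ms."""
    return BAM_TX_INTERVAL_MIN_MS <= gap_ms <= BAM_TX_INTERVAL_MAX_MS


def rx_timed_out(gap_ms: int, packets_expected: int) -> bool:
    """Return True if the receiver should time out the broadcast session."""
    return packets_expected > 0 and gap_ms > BAM_RX_TIMEOUT_MS
```

A gap of exactly 750 ms with packets still outstanding is not yet a timeout in this sketch, because the commit message says "greater than 750 ms".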

can: j1939: abort multipacket broadcast session when timeout occurs · c81ffe3c

Committed by Zhang Changzhong on Aug 25, 2020

mainline inclusion
from mainline-v5.9-rc2
commit 2b8b2e31
category: bugfix
bugzilla: 39990
CVE: NA

---------------------------

If timeout occurs, j1939_tp_rxtimer() first calls hrtimer_start() to restart
rxtimer, and then calls __j1939_session_cancel() to set session->state =
J1939_SESSION_WAITING_ABORT. At next timeout expiration, because of the
J1939_SESSION_WAITING_ABORT session state j1939_tp_rxtimer() will call
j1939_session_deactivate_activate_next() to deactivate current session, and
rxtimer won't be set.

But for a multipacket broadcast session, __j1939_session_cancel() doesn't set
session->state = J1939_SESSION_WAITING_ABORT, thus the current session won't be
deactivated and hrtimer_start() is called to start a new rxtimer again and again.

So fix it by moving session->state = J1939_SESSION_WAITING_ABORT out of if
(!j1939_cb_is_broadcast(&session->skcb)) statement.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1596599425-5534-4-git-send-email-zhangchangzhong@huawei.com
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
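
The endless timer restart described in this commit can be reproduced with a small state-machine model. The sketch below is Python and only mirrors the control flow the commit message describes; the `Session` class, the `fixed` switch, and the `abort_sent` flag are assumptions made for this illustration, not kernel structures.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    WAITING_ABORT = auto()
    DONE = auto()


class Session:
    def __init__(self, broadcast: bool):
        self.broadcast = broadcast
        self.state = State.ACTIVE
        self.timer_restarts = 0
        self.abort_sent = False   # hypothetical: models notifying a unicast peer


def cancel(session: Session, fixed: bool) -> None:
    """Model of the cancel step; `fixed` selects the behaviour after the patch."""
    if not session.broadcast:
        session.abort_sent = True
        if not fixed:
            session.state = State.WAITING_ABORT   # old: only set for unicast
    if fixed:
        session.state = State.WAITING_ABORT       # new: set for every session


def rxtimer_expired(session: Session, fixed: bool) -> None:
    """Model of one rxtimer expiration."""
    if session.state is State.WAITING_ABORT:
        session.state = State.DONE                # deactivated, timer not re-armed
        return
    session.timer_restarts += 1                   # timer is started again
    cancel(session, fixed)


def run(session: Session, fixed: bool, expirations: int = 5) -> Session:
    """Fire the timer up to `expirations` times or until the session is done."""
    for _ in range(expirations):
        if session.state is State.DONE:
            break
        rxtimer_expired(session, fixed)
    return session
```

With `fixed=False`, a broadcast session restarts its timer on every expiration and never leaves `ACTIVE`; with `fixed=True` it is deactivated on the second expiration.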

can: j1939: cancel rxtimer on multipacket broadcast session complete · f36b2a73

Committed by Zhang Changzhong on Aug 25, 2020

mainline inclusion
from mainline-v5.9-rc2
commit e8b17653
category: bugfix
bugzilla: 39990
CVE: NA

---------------------------

If j1939_xtp_rx_dat_one() receives the last frame of a multipacket broadcast
message, j1939_session_timers_cancel() should be called to cancel rxtimer.

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1596599425-5534-3-git-send-email-zhangchangzhong@huawei.com
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

can: j1939: fix support for multipacket broadcast message · 7b4f9fe3

Committed by Zhang Changzhong on Aug 25, 2020

mainline inclusion
from mainline-v5.9-rc2
commit f4fd77fd
category: bugfix
bugzilla: 39990
CVE: NA

---------------------------

Currently j1939_tp_im_involved_anydir() in j1939_tp_recv() checks the previously
set flags J1939_ECU_LOCAL_DST and J1939_ECU_LOCAL_SRC of the incoming skb, thus
a multipacket broadcast message was aborted by the receive side because it may
come from remote ECUs and have no exact dst address. Similarly,
j1939_tp_cmd_recv() and j1939_xtp_rx_dat() didn't process broadcast messages.

So fix it by checking for and processing broadcast messages in j1939_tp_recv(),
j1939_tp_cmd_recv() and j1939_xtp_rx_dat().

Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1596599425-5534-2-git-send-email-zhangchangzhong@huawei.com
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blk-iocost: fix spin_lock won't release in sq · 08668f83

Committed by Yu Kuai on Aug 25, 2020

hulk inclusion
category: feature
bugzilla: 38688
CVE: NA

---------------------------

ioc_rqos_throttle() is called with 'q->queue_lock' held via
spin_lock_irq(q->queue_lock) in sq. However, ioc_rqos_throttle() calls
spin_lock_irq(&ioc->lock) and spin_unlock_irq(&ioc->lock) before
'q->queue_lock' is released. Thus, local irqs are re-enabled as soon as
'ioc->lock' is released, while 'q->queue_lock' is still held, and
'q->queue_lock' might never be released.

spin_lock_irq(q->queue_lock)        --> local irq will be disabled
    rq_qos_throttle
        spin_lock_irq(&ioc->lock)
        spin_unlock_irq(&ioc->lock) --> local irq will be enabled before
                                        'q->queue_lock' is released
        spin_unlock_irq(q->queue_lock)
        io_schedule()
        spin_lock_irq(q->queue_lock)

Fix the problem by using spin_lock_irqsave()/spin_unlock_irqrestore()
for other locks.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
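
The IRQ-state problem described in this commit can be illustrated with a toy model. The sketch below is Python, not kernel code: the `Cpu` class and the helper functions only mimic how the `_irq` and `_irqsave` lock variants treat the local interrupt flag, and every name in it is a stand-in chosen for this illustration.

```python
class Cpu:
    """Tiny model of one CPU's local interrupt flag."""

    def __init__(self):
        self.irqs_enabled = True


def spin_lock_irq(cpu: Cpu) -> None:
    cpu.irqs_enabled = False          # disables local irqs unconditionally


def spin_unlock_irq(cpu: Cpu) -> None:
    cpu.irqs_enabled = True           # enables local irqs unconditionally


def spin_lock_irqsave(cpu: Cpu) -> bool:
    flags = cpu.irqs_enabled          # remember the previous state
    cpu.irqs_enabled = False
    return flags


def spin_unlock_irqrestore(cpu: Cpu, flags: bool) -> None:
    cpu.irqs_enabled = flags          # restore whatever the caller had


def throttle_nested_irq(cpu: Cpu) -> bool:
    """Inner lock uses the _irq variants while the outer lock is held."""
    spin_lock_irq(cpu)                # outer lock (queue lock in the commit)
    spin_lock_irq(cpu)                # inner lock (ioc lock in the commit)
    spin_unlock_irq(cpu)
    return cpu.irqs_enabled           # irq state while the outer lock is still held


def throttle_irqsave(cpu: Cpu) -> bool:
    """Inner lock uses the _irqsave/_irqrestore variants instead."""
    spin_lock_irq(cpu)
    flags = spin_lock_irqsave(cpu)
    spin_unlock_irqrestore(cpu, flags)
    return cpu.irqs_enabled
```

In the nested `_irq` case the model reports interrupts enabled while the outer lock is still held, which is the situation the commit fixes; the `_irqsave` variant keeps them disabled.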

iocost: fix a deadlock in ioc_rqos_throttle() · ceb9b8c3

Committed by Jiufei Xue on Aug 25, 2020

hulk inclusion
category: feature
bugzilla: 38688
CVE: NA

---------------------------

ioc_rqos_throttle() may be called with queue_lock held; the lock should
be released before sleeping.

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Conflict: block/blk-iocost.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost: fix NULL pointer dereference in ioc_rqos_throttle · c076cd0f

Committed by Jiufei Xue on Aug 25, 2020

hulk inclusion
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Bios are not associated with a blkg before entering the iocost controller.
Do it in ioc_rqos_throttle() as well as ioc_rqos_merge().

Considering that there are so many chances to create the blkg before
ioc_rqos_merge(), we just look up the blkg here and, if the blkg does not
exist, just return rather than create it.

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Conflict: block/blk-iocost.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost: add cgroup V1 suport · a0e391d9

Committed by Yu Kuai on Aug 25, 2020

hulk inclusion
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Add definition of 'legacy_cftypes', so that iocost can be used in
cgroup V1.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blkcg: Fix multiple bugs in blkcg_activate_policy() · e937794e

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc4
commit 9d179b86
category: feature
bugzilla: 38688
CVE: NA

---------------------------

blkcg_activate_policy() has the following bugs.

* cf09a8ee ("blkcg: pass @q and @blkcg into
  blkcg_pol_alloc_pd_fn()") added @blkcg to ->pd_alloc_fn(); however,
  blkcg_activate_policy() ends up using pd's allocated for the root
  blkcg for all preallocations, so ->pd_init_fn() for non-root blkcgs
  can be passed in pd's which are allocated for the root blkcg.

  For blk-iocost, this means that ->pd_init_fn() can write beyond the
  end of the allocated object as it determines the length of the flex
  array at the end based on the blkcg's nesting level.

* Each pd is initialized as it gets allocated.  If alloc fails, the
  policy will get freed with pd's initialized on it.

* After the above partial failure, the partial pds are not freed.

This patch fixes all the above issues by

* Restructuring blkcg_activate_policy() so that alloc and init passes
  are separate.  Init takes place only after all allocs succeeded and
  on failure all allocated pds are freed.

* Unifying and fixing the cleanup of the remaining pd_prealloc.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: cf09a8ee ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict: block/blk-cgroup.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
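
The restructuring described in this commit (separate allocation and initialization passes, with full rollback on failure) follows a common pattern. The sketch below shows that pattern in Python under stated assumptions: `activate_policy` and its callback parameters are hypothetical names, and the real function works on kernel structures with locking that is omitted here.

```python
def activate_policy(blkgs, alloc, init, free):
    """Allocate a pd for every blkg first; initialise only if all allocations succeed.

    `alloc(blkg)` returns a pd or None on failure, `init(pd)` initialises it,
    and `free(pd)` releases it. All three are supplied by the caller.
    """
    pds = {}
    # Pass 1: allocate one pd per blkg (for that blkg, not for the root).
    for blkg in blkgs:
        pd = alloc(blkg)
        if pd is None:
            for allocated in pds.values():
                free(allocated)       # roll back every earlier allocation
            return None
        pds[blkg] = pd
    # Pass 2: initialise, which can no longer be interrupted by a failed alloc.
    for blkg in blkgs:
        init(pds[blkg])
    return pds
```

Because initialization only starts after the last allocation succeeded, a failure never leaves initialized pds behind, and nothing allocated so far is leaked.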

blkcg: blkcg_activate_policy() should initialize ancestors first · 9610e8ca

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.3-rc1
commit 71c81407
category: feature
bugzilla: 38688
CVE: NA

---------------------------

When blkcg_activate_policy() is creating blkg_policy_data for existing
blkgs, it did so in the wrong order - descendants first.  Fix it.  None
of the existing controllers seem affected by this.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict: block/blk-cgroup.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blkcg: blk-iocost: predeclare used structs · ae770f09

Committed by Stephen Rothwell on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 8d1c1560
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blk-iocost: fix incorrect vtime comparison in iocg_is_idle() · 2ee0cc79

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.6-rc6
commit dcd6589b
category: feature
bugzilla: 38688
CVE: NA

---------------------------

vtimes may wrap and time_before/after64() should be used to determine
whether a given vtime is before or after another. iocg_is_idle() was
incorrectly using plain "<" comparison to determine whether done_vtime
is before vtime. Here, the only thing we're interested in is whether
done_vtime matches vtime which indicates that there's nothing in
flight. Let's test for inequality instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
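
Why a plain "<" fails on a wrapping counter can be shown with a few lines of arithmetic. The sketch below models a 64-bit counter in Python: `time_before64` follows the usual signed-difference definition of the kernel helper of the same name, while `iocg_has_inflight` is a hypothetical name for the inequality test the commit switches to.

```python
MASK64 = (1 << 64) - 1


def time_before64(a: int, b: int) -> bool:
    """Wrap-safe 'a is before b' for 64-bit counters.

    Equivalent to checking that (a - b), read as a signed 64-bit value,
    is negative.
    """
    diff = (a - b) & MASK64
    return diff >= (1 << 63)


def iocg_has_inflight(done_vtime: int, vtime: int) -> bool:
    """Anything in flight? Only equality matters, so wrapping is irrelevant."""
    return done_vtime != vtime


# A counter that is about to wrap: vtime is 10 ticks ahead of done_vtime.
done_vtime = MASK64 - 5
vtime = (done_vtime + 10) & MASK64    # wraps around to 4
```

Here `done_vtime < vtime` evaluates to False even though `done_vtime` is logically earlier, whereas both `time_before64(done_vtime, vtime)` and the inequality test give the intended answer.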

blk-iocost: Fix error on iocost_ioc_vrate_adj · fa63e6e6

Committed by Waiman Long on Aug 25, 2020

mainline inclusion
from mainline-5.7-rc3
commit d6c8e949
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Systemtap 4.2 is unable to correctly interpret the "u32 (*missed_ppm)[2]"
argument of the iocost_ioc_vrate_adj trace entry defined in
include/trace/events/iocost.h leading to the following error:

  /tmp/stapAcz0G0/stap_c89c58b83cea1724e26395efa9ed4939_6321_aux_6.c:78:8:
  error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
   , u32[]* __tracepoint_arg_missed_ppm

That argument type is indeed rather complex and hard to read. Looking
at block/blk-iocost.c, it is just a 2-entry u32 array. By simplifying
the argument to a simple "u32 *missed_ppm" and adjusting the trace
entry accordingly, the compilation error is gone.

Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost: Fix iocost_monitor.py due to helper type mismatch · 235c6b1c

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.6-rc1
commit 9ea37e24
category: feature
bugzilla: 38688
CVE: NA

---------------------------

iocost_monitor.py broke with recent versions of drgn due to helper
being stricter about types.  Fix it so that it uses the correct type.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost: over-budget forced IOs should schedule async delay · 8bbf782b

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.5-rc3
commit d7bd15a1
category: feature
bugzilla: 38688
CVE: NA

---------------------------

When over-budget IOs are force-issued through root cgroup,
iocg_kick_delay() adjusts the async delay accordingly but doesn't
actually schedule async throttle for the issuing task.  This bug is
pretty well masked because sooner or later the offending threads are
gonna get directly throttled on regular IOs or have async delay
scheduled by mem_cgroup_throttle_swaprate().

However, it can affect control quality on filesystem metadata heavy
operations.  Let's fix it by invoking blkcg_schedule_throttle() when
iocg_kick_delay() says async delay is needed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost: check active_list of all the ancestors in iocg_activate() · a8dc8024

Committed by Jiufei Xue on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc8
commit 8b37bc27
category: feature
bugzilla: 38688
CVE: NA

---------------------------

iocg_activate() has a bug: it checks the same active_list over and over
again. The intention of the code was to check whether all the ancestors
and the iocg itself have already been activated. So fix it.

Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
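
The kind of loop bug this commit describes, iterating over ancestors while testing the same object each time, can be sketched as follows. This is a Python illustration only: the `Iocg` class and its `active` flag are stand-ins for the kernel structure and its active_list, chosen for this example.

```python
class Iocg:
    def __init__(self, name, parent=None, active=False):
        self.name = name
        self.parent = parent
        self.active = active          # stands in for a non-empty active_list


def path_to_root(iocg):
    """Yield the iocg itself and then each ancestor up to the root."""
    node = iocg
    while node is not None:
        yield node
        node = node.parent


def all_active_buggy(iocg) -> bool:
    # Walks the ancestors but tests the starting node on every iteration.
    return all(iocg.active for _ in path_to_root(iocg))


def all_active_fixed(iocg) -> bool:
    # Tests each node on the path, which is what the code intended.
    return all(node.active for node in path_to_root(iocg))
```

With an active child under an inactive root, the buggy check wrongly reports that everything is already activated, while the fixed check does not.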

iocost: don't nest spin_lock_irq in ioc_weight_write() · e07037c5

Committed by Dan Carpenter on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc6
commit 41591a51
category: feature
bugzilla: 38688
CVE: NA

---------------------------

This code causes a static analysis warning:

    block/blk-iocost.c:2113 ioc_weight_write() error: double lock 'irq'

We disable IRQs in blkg_conf_prep() and re-enable them in
blkg_conf_finish().  IRQ disable/enable should not be nested because
that means the IRQs will be enabled at the first unlock instead of the
second one.

Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost: bump up default latency targets for hard disks · 9f427cd7

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 7afcccaf
category: feature
bugzilla: 38688
CVE: NA

---------------------------

The default hard disk param sets latency targets at 50ms.  As the
default target percentiles are zero, these don't directly regulate
vrate; however, they're still used to calculate the period length -
100ms in this case.

This is excessively low.  A SATA drive with QD32 saturated with random
IOs can easily reach avg completion latency of several hundred msecs.
A period duration which is substantially lower than avg completion
latency can lead to wildly fluctuating vrate.

Let's bump up the default latency targets to 250ms so that the period
duration is sufficiently long.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost: improve nr_lagging handling · 16f8275d

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 7cd806a9
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Some IOs may span multiple periods.  As latencies are collected on
completion, the in-between periods won't register them and may
incorrectly decide to increase vrate.  nr_lagging tracks these IOs to
avoid those situations.  Currently, whenever there are IOs which are
spanning from the previous period, busy_level is reset to 0 if
negative thus suppressing vrate increase.

This has the following two problems.

* When latency target percentiles aren't set, vrate adjustment should
  only be governed by queue depth depletion; however, the current code
  keeps nr_lagging active which pulls in latency results and can keep
  down vrate unexpectedly.

* When lagging condition is detected, it resets the entire negative
  busy_level.  This turned out to be way too aggressive on some
  devices which sometimes experience extended latencies on a small
  subset of commands.  In addition, a lagging IO will be accounted as
  latency target miss on completion anyway and resetting busy_level
  amplifies its impact unnecessarily.

This patch fixes the above two problems by disabling nr_lagging
counting when latency target percentiles aren't set and blocking vrate
increases when there are lagging IOs while leaving busy_level as-is.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost: better trace vrate changes · 52decb35

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 25d41e4a
category: feature
bugzilla: 38688
CVE: NA

---------------------------

vrate_adj tracepoint traces vrate changes; however, it does so only
when busy_level is non-zero.  busy_level turning to zero can sometimes
be as interesting an event.  This patch also enables vrate_adj
tracepoint on other vrate related events - busy_level changes and
non-zero nr_lagging.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost_monitor: Report debt · d03796e5

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 7c1ee704
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Report debt and rename del_ms row to delay for consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

iocost_monitor: Report more info with higher accuracy · 6f098a3f

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit b06f2d35
category: feature
bugzilla: 38688
CVE: NA

---------------------------

When outputting json:

* Don't truncate numbers.

* Report address of iocg to ease drilling down further.

When outputting table:

* Use math.ceil() for delay_ms so that small delays don't read as 0.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
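
The effect of rounding up instead of truncating is easy to see in isolation. The sketch below is a minimal Python example; the helper names and the assumption that the raw delay is held in nanoseconds are made for illustration and are not taken from the script.

```python
import math

NSEC_PER_MSEC = 1_000_000


def delay_ms_truncated(delay_ns: int) -> int:
    """Integer division: any delay below 1 ms is shown as 0."""
    return delay_ns // NSEC_PER_MSEC


def delay_ms_ceil(delay_ns: int) -> int:
    """Round up: any non-zero delay is shown as at least 1 ms."""
    return math.ceil(delay_ns / NSEC_PER_MSEC)
```

A 0.3 ms delay is displayed as 0 by the truncating version and as 1 by the rounding-up version, while a delay of exactly zero still reads as 0 in both.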

iocost_monitor: Always use strings for json values · 61472106

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit e742bd5c
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Json has limited accuracy for numbers and can silently truncate 64bit
values, which can be extremely confusing.  Let's consistently use
string encapsulated values for json output.

While at it, convert an unnecessary f-string to str().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
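
The silent truncation mentioned in this commit typically happens in consumers that store JSON numbers as IEEE-754 doubles. The sketch below demonstrates the loss and the string workaround in Python; `as_double` and `encode_as_strings` are hypothetical helpers written for this example, not functions from iocost_monitor.py.

```python
import json

BIG = 2**63 + 1   # a 64-bit value that a double cannot represent exactly


def as_double(value: int) -> int:
    """What a consumer that keeps JSON numbers as doubles would end up with."""
    return int(float(value))


def encode_as_strings(stats: dict) -> str:
    """Serialise every value as a string so no consumer can round it."""
    return json.dumps({key: str(val) for key, val in stats.items()})


# Round trip through JSON with string values keeps the exact integer.
decoded = json.loads(encode_as_strings({"vtime": BIG}))
restored = int(decoded["vtime"])
```

Python's own `json` module keeps large integers exact, so the loss is modelled here with an explicit conversion to `float`; the string encoding protects readers written in other languages.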

blk-iocost: Don't let merges push vtime into the future · b645340f

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit e1518f63
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Merges have the same problem that forced-bios had which is fixed by
the previous patch.  The cost of a merge is calculated at the time of
issue and force-advances vtime into the future.  Until global vtime
catches up, how the cgroup's hweight changes in the meantime doesn't
matter and it often leads to situations where the cost is calculated
at one hweight and paid at a very different one.  See the previous
patch for more details.

Fix it by never advancing vtime into the future for merges.  If budget
is available, vtime is advanced.  Otherwise, the cost is charged as
debt.

This brings merge cost handling in line with issue cost handling in
ioc_rqos_throttle().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
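
The rule stated in this commit (advance vtime only when budget is available, otherwise record debt) can be sketched in a few lines. This Python version is a simplification under stated assumptions: `charge_merge` and its parameters are hypothetical names, hweight is treated as a fraction between 0 and 1, and the additional conditions the real code may check are left out.

```python
def charge_merge(vtime, now_vtime, abs_debt, abs_cost, hweight):
    """Charge the cost of a merge without pushing vtime past the global vtime.

    Returns the new (vtime, abs_debt) pair.
    """
    cost = abs_cost / hweight             # cost scaled by the current hweight
    if vtime + cost <= now_vtime:
        return vtime + cost, abs_debt     # budget available: advance vtime
    return vtime, abs_debt + abs_cost     # otherwise remember it as debt
```

For example, with a global vtime of 100 and an hweight of 0.5, a merge with absolute cost 10 advances a local vtime of 0 to 20, but leaves a local vtime of 90 untouched and adds 10 to the debt instead.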

blk-iocost: Account force-charged overage in absolute vtime · 38fdf69b

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 36a52481
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Currently, when a bio needs to be force-charged and there isn't enough
budget, vtime is simply pushed into the future.  This means that the
cost of the whole bio is scaled using the current hweight and then
charged immediately.  Until the global vtime advances beyond this
future vtime, the cgroup won't be allowed to issue normal IOs.

This is incorrect and can lead to, for example, exploding vrate or
extended stalls if vrate range is constrained.  Consider the following
scenario.

1. A cgroup with a very low hweight runs out of budget.

2. A storm of swap-out happens on it.  All of them are scaled
   according to the current low hweight and charged to vtime pushing
   it to a far future.

3. All other cgroups go idle and now the above cgroup has access to
   the whole device.  However, because vtime is already wound using
   the past low hweight, what its current hweight is doesn't matter
   until global vtime catches up to the local vtime.

4. As a result, either vrate gets ramped up extremely or the IOs stall
   while the underlying device is idle.

This is because the hweight the overage is calculated at is different
from the hweight that it's being paid at.

Fix it by remembering the overage in absolute vtime and continuously
paying with the actual budget according to the current hweight at each
period.

Note that non-forced bios which wait already remember the cost in
absolute vtime.  This brings forced-bio accounting in line.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
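
The mismatch between the hweight at which an overage is calculated and the hweight at which it is paid can be made concrete with a small calculation. The Python sketch below uses hypothetical helper names and treats hweight as a fraction of the device; the numbers are chosen only to keep the arithmetic exact.

```python
def scaled_cost(abs_cost: float, hweight: float) -> float:
    """Local vtime consumed by abs_cost while the cgroup owns `hweight` of the device."""
    return abs_cost / hweight


def pay_debt(abs_debt: float, period_vtime: float, hweight: float) -> float:
    """Pay absolute debt with one period's budget at the current hweight.

    Returns the debt that remains after the period.
    """
    budget_abs = period_vtime * hweight
    return max(0.0, abs_debt - budget_abs)


# Old behaviour: 100 units charged at hweight 1/64 push vtime 6400 units ahead,
# no matter how large the cgroup's share becomes afterwards.
pushed = scaled_cost(100.0, 1 / 64)

# New behaviour: the 100 units are kept as absolute debt. Once the cgroup has
# the whole device (hweight 1.0), a single 100-unit period clears it.
left_when_alone = pay_debt(100.0, 100.0, 1.0)
left_when_small = pay_debt(100.0, 100.0, 1 / 64)
```

Keeping the debt in absolute terms means the repayment speed follows the cgroup's current share instead of the share it had when the IOs were forced through.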

blk-iocost: Fix incorrect operation order during iocg free · 2263a666

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit e036c4ca
category: feature
bugzilla: 38688
CVE: NA

---------------------------

ioc_pd_free() first cancels the hrtimers and then deactivates the
iocg.  However, the iocg timer can run in between and reschedule the
hrtimers which will end up running after the iocg is freed leading to
crashes like the following.

  general protection fault: 0000 [#1] SMP
  ...
  RIP: 0010:iocg_kick_delay+0xbe/0x1b0
  RSP: 0018:ffffc90003598ea0 EFLAGS: 00010046
  RAX: 1cee00fd69512b54 RBX: ffff8881bba48400 RCX: 00000000000003e8
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881bba48400
  RBP: 0000000000004e20 R08: 0000000000000002 R09: 00000000000003e8
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003598ef0
  R13: 00979f3810ad461f R14: ffff8881bba4b400 R15: 25439f950d26e1d1
  FS:  0000000000000000(0000) GS:ffff88885f800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f64328c7e40 CR3: 0000000002409005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <IRQ>
   iocg_delay_timer_fn+0x3d/0x60
   __hrtimer_run_queues+0xfe/0x270
   hrtimer_interrupt+0xf4/0x210
   smp_apic_timer_interrupt+0x5e/0x120
   apic_timer_interrupt+0xf/0x20
   </IRQ>

Fix it by canceling hrtimers after deactivating the iocg.

Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blkcg: add missing NULL check in ioc_cpd_alloc() · 82baa338

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit e916ad29
category: feature
bugzilla: 38688
CVE: NA

---------------------------

ioc_cpd_alloc() forgot to check for a NULL return from kzalloc().  Add it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blkcg: fix missing free on error path of blk_iocost_init() · 04342ad9

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 3532e722
category: feature
bugzilla: 38688
CVE: NA

---------------------------

blk_iocost_init() forgot to free its percpu stat on the error path.
Fix it.

Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Reported-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blkcg: add tools/cgroup/iocost_coef_gen.py · 39ec6a90

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 8504dea7
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Add a script which can be used to generate device-specific iocost
linear model coefficients.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blkcg: add tools/cgroup/iocost_monitor.py · 53292d15

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 6954ff18
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Instead of mucking with debugfs and ->pd_stat(), add a drgn-based
monitoring script.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blkcg: implement blk-iocost · 9c441f73

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 7caa4715
category: feature
bugzilla: 38688
CVE: NA

---------------------------

This patchset implements IO cost model based work-conserving
proportional controller.

While io.latency provides the capability to comprehensively prioritize
and protect IOs depending on the cgroups, its protection is binary -
the lowest latency target cgroup which is suffering is protected at
the cost of all others.  In many use cases including stacking multiple
workload containers in a single system, it's necessary to distribute
IO capacity with better granularity.

One challenge of controlling IO resources is the lack of trivially
observable cost metric.  The most common metrics - bandwidth and iops
- can be off by orders of magnitude depending on the device type and
IO pattern.  However, the cost isn't a complete mystery.  Given
several key attributes, we can make fairly reliable predictions on how
expensive a given stream of IOs would be, at least compared to other
IO patterns.

The function which determines the cost of a given IO is the IO cost
model for the device.  This controller distributes IO capacity based
on the costs estimated by such a model.  The more accurate the cost
model the better, but the controller adapts based on IO completion
latency and, as long as the relative costs across different IO
patterns are consistent and sensible, it'll adapt to the actual
performance of the device.

Currently, the only implemented cost model is a simple linear one with
a few sets of default parameters for different classes of device.
This covers most common devices reasonably well.  All the
infrastructure to tune and add different cost models is already in
place and a later patch will also allow using bpf progs for cost
models.

Please see the top comment in blk-iocost.c and documentation for
more details.

v2: Rebased on top of RQ_ALLOC_TIME changes and folded in Rik's fix
    for a divide-by-zero bug in current_hweight() triggered by zero
    inuse_sum.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
  block/Kconfig
  block/Makefile
  include/linux/blk_types.h
  block/blk-iocost.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
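
To give a feel for what a "simple linear" cost model means, here is a minimal Python sketch. It assumes one plausible shape, a base cost that depends on whether the IO is sequential or random plus a per-page cost; the class, the field names, and the coefficient values are all hypothetical and are not the kernel's parameters.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    seq_base: float    # fixed cost of a sequential IO (hypothetical unit)
    rand_base: float   # fixed cost of a random IO
    per_page: float    # additional cost per page transferred


def io_cost(model: LinearModel, pages: int, sequential: bool) -> float:
    """Estimate the cost of one IO as base cost plus size-proportional cost."""
    base = model.seq_base if sequential else model.rand_base
    return base + model.per_page * pages


# Made-up coefficients for a device where random IOs are much more expensive.
example = LinearModel(seq_base=1.0, rand_base=8.0, per_page=0.5)
```

With these made-up numbers, a 4-page sequential IO costs 3.0 and a 4-page random IO costs 10.0; only the relative costs matter to a controller that adapts to completion latency as the commit message describes.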

blkcg: make ->cpd_init_fn() optional · d25cec00

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 86a5bba5
category: feature
bugzilla: 38688
CVE: NA

---------------------------

For policies which can do enough initialization from ->cpd_alloc_fn(),
make ->cpd_init_fn() optional.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn() · 990d8f58

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit cf09a8ee
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Instead of @node, pass in @q and @blkcg so that the alloc function has
more context.  This doesn't cause any behavior change and will be used
by the io.weight implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
  block/blk-iolatency.c
  block/blk-cgroup.c
  block/cfq-iosched.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

990d8f58

cgroup: Move cgroup_parse_float() implementation out of CONFIG_SYSFS · 62b6b2eb

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.3-rc1
commit 38cf3a68
category: feature
bugzilla: 38688
CVE: NA

---------------------------

a5e112e6 ("cgroup: add cgroup_parse_float()") accidentally added
cgroup_parse_float() inside the CONFIG_SYSFS block.  Move it outside so
that it doesn't cause failures on !CONFIG_SYSFS builds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: a5e112e6 ("cgroup: add cgroup_parse_float()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

62b6b2eb

cgroup: add cgroup_parse_float() · 8a7c2bc9

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.3-rc1
commit a5e112e6
category: feature
bugzilla: 38688
CVE: NA

---------------------------

cgroup already uses floating point for percent[ile] numbers and there
are several controllers which want to take them as input.  Add a
generic parse helper to handle inputs.
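
To make the idea of a generic parse helper concrete, here is a minimal userspace sketch that turns a decimal string such as "12.5" into a fixed-point integer scaled by a power of ten. The function name parse_fixed_point() and its exact behavior (non-negative input only, truncation of extra fractional digits, no overflow checking) are assumptions for illustration; this is not the kernel's cgroup_parse_float() implementation.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Hypothetical sketch: parse a non-negative decimal string into an
 * integer scaled by 10^dec_shift.  Fractional digits beyond dec_shift
 * are truncated.  Returns 0 on success, -1 on malformed input.
 * Overflow is not checked.
 */
static int parse_fixed_point(const char *s, unsigned dec_shift, int64_t *out)
{
    int64_t val = 0;
    unsigned frac_digits = 0;

    if (!isdigit((unsigned char)*s))
        return -1;
    while (isdigit((unsigned char)*s))
        val = val * 10 + (*s++ - '0');

    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s)) {
            if (frac_digits < dec_shift) {
                val = val * 10 + (*s - '0');
                frac_digits++;
            }
            s++;
        }
    }
    if (*s != '\0')
        return -1;

    /* Pad missing fractional digits with zeros. */
    for (; frac_digits < dec_shift; frac_digits++)
        val *= 10;

    *out = val;
    return 0;
}
```

With dec_shift set to 2, a percentage written as "12.5" is stored as the integer 1250, so a controller can work with two decimal places without floating-point arithmetic.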

Update the interface convention documentation about the use of
percentage numbers.  While at it, also clarify the default time unit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8a7c2bc9

blkcg: separate blkcg_conf_get_disk() out of blkg_conf_prep() · 311b868f

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 015d254c
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Separate out blkcg_conf_get_disk() so that it can be used by blkcg
policy interface file input parsers before the policy is actually
enabled.  This doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

311b868f

block/rq_qos: implement rq_qos_ops->queue_depth_changed() · 2ef177be

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 9677a3e0
category: feature
bugzilla: 38688
CVE: NA

---------------------------

wbt already receives queue-depth-change notifications through
wbt_set_queue_depth().  Generalize this into
rq_qos_ops->queue_depth_changed() so that other rq_qos policies can
easily hook into the events too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
  block/blk-rq-qos.c
  block/blk-rq-qos.h
  block/blk-wbt.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2ef177be
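
The generalization described in the commit above can be pictured as a walk over a chain of policies, calling an optional hook on each one that provides it. The types and names below (struct qos, struct qos_ops, qos_queue_depth_changed()) are simplified stand-ins invented for this sketch and do not reproduce the kernel's rq_qos definitions.

```c
#include <stddef.h>

struct qos;

/* Hypothetical ops table; the hook is optional and may be NULL. */
struct qos_ops {
    void (*queue_depth_changed)(struct qos *q);
};

/* Hypothetical policy node in a singly linked chain. */
struct qos {
    const struct qos_ops *ops;
    struct qos *next;
    int notified;  /* counts how often this policy was notified */
};

/* Notify every policy in the chain that implements the hook. */
static void qos_queue_depth_changed(struct qos *head)
{
    for (struct qos *q = head; q; q = q->next)
        if (q->ops->queue_depth_changed)
            q->ops->queue_depth_changed(q);
}

/* Example hook: just record that the notification arrived. */
static void demo_depth_changed(struct qos *q)
{
    q->notified++;
}

static const struct qos_ops demo_ops = { .queue_depth_changed = demo_depth_changed };
static const struct qos_ops silent_ops = { .queue_depth_changed = NULL };
```

A policy that does not care about queue depth simply leaves the hook NULL and is skipped by the walk.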

block/rq_qos: add rq_qos_merge() · 8843e68d

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit d3e65fff
category: feature
bugzilla: 38688
CVE: NA

---------------------------

Add a merge hook for rq_qos.  This will be used by io.weight.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
  block/blk-rq-qos.c
  block/blk-core.c
  block/blk-rq-qos.h
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8843e68d

blk-mq: add optional request->alloc_time_ns · 449de6b8

Committed by Tejun Heo on Aug 25, 2020

mainline inclusion
from mainline-5.4-rc1
commit 6f816b4b
category: feature
bugzilla: 38688
CVE: NA

---------------------------

There are currently two start time timestamps - start_time_ns and
io_start_time_ns.  The former marks the request allocation and the
latter the issue-to-device time.  The planned io.weight controller needs
to measure the total time bios take to execute after they leave rq_qos,
including the time spent waiting for a request to become available,
which can easily dominate on saturated devices.

This patch adds request->alloc_time_ns which records when the request
allocation attempt started.  As it isn't used for the usual stats,
make it optional behind CONFIG_BLK_RQ_ALLOC_TIME and
QUEUE_FLAG_RQ_ALLOC_TIME so that it can be compiled out when there are
no users and it's active only on queues which need it even when
compiled in.
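
The relationship between the three timestamps can be sketched as follows. The struct req_times and the two helper functions are hypothetical and exist only to show which intervals become measurable once alloc_time_ns is recorded; they are not kernel code.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a request's timestamps. */
struct req_times {
    uint64_t alloc_time_ns;     /* when the allocation attempt started */
    uint64_t start_time_ns;     /* when the request was allocated */
    uint64_t io_start_time_ns;  /* when the request was issued to the device */
};

/* Time spent waiting for a request to become available. */
static uint64_t alloc_wait_ns(const struct req_times *t)
{
    return t->start_time_ns - t->alloc_time_ns;
}

/* Total time from the start of the allocation attempt to completion. */
static uint64_t total_time_ns(const struct req_times *t, uint64_t done_ns)
{
    return done_ns - t->alloc_time_ns;
}
```

Without alloc_time_ns, the allocation wait is invisible: measuring from start_time_ns would miss the interval that, according to the description above, can dominate on saturated devices.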

v2: s/pre_start_time/alloc_time/ and add CONFIG_BLK_RQ_ALLOC_TIME
    gating as suggested by Jens.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
  include/linux/blkdev.h
  block/Kconfig
  block/blk-mq.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

449de6b8

config: set default value of haltpoll · 1f3268e2

Committed by Xiangyou Xie on Aug 25, 2020

hulk inclusion
category: config
bugzilla: NA
CVE: NA

Enable haltpoll by default to improve performance.
It is already supported on x86; this change also provides it on ARM.
Signed-off-by: Xiangyou Xie <xiexiangyou@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

1f3268e2

openeuler / Kernel: synced successfully 9 months ago