[lvc-project] [PATCH] block: bfq fix null pointer dereference of bfqg in bfq_bio_bfqg()
Anton Fadeev
anton.fadeev at red-soft.ru
Thu Jan 12 14:40:56 MSK 2023
12.01.2023 14:30, Yu Kuai wrote:
> CC Jan.
>
> On 2023/01/12 19:24, Artem Chernyshev wrote:
>> Hi,
>> On Thu, Jan 12, 2023 at 07:09:10PM +0800, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/01/12 17:43, Artem Chernyshev wrote:
>>>> bfqg can be NULL after being initialized from the result of
>>>> blkg_to_bfqg().
>>>>
>>>> We reproduced this on kernel 5.15.78, but the problem should exist in
>>>> mainline as well.
>>>
>>> The problem is already fixed in mainline by the following patch:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f02be9002c480cd3ec0fcf184ad27cf531bd6ece
>>>
>>> Thanks,
>>> Kuai
>>>>
>>>> host1 login: [ 460.855794] watchdog: watchdog0: watchdog did not stop!
>>>> [ 898.944512] BUG: kernel NULL pointer dereference, address: 0000000000000094
>>>> [ 899.285776] #PF: supervisor read access in kernel mode
>>>> [ 899.536511] #PF: error_code(0x0000) - not-present page
>>>> [ 899.647305] connection4:0: detected conn error (1020)
>>>> [ 899.786794] PGD 0 P4D 0
>>>> [ 899.786799] Oops: 0000 [#1] SMP PTI
>>>> [ 899.786802] CPU: 15 PID: 6073 Comm: ID iothread1 Not tainted 5.15.78-1.el7virt.x86_64 #1
>>>> [ 899.786804] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019
>>>> [ 899.786806] RIP: 0010:bfq_bio_bfqg+0x26/0x80
>>>> [ 901.325944] Code: 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 89 f7 53 48 8b 56 48 48 85 d2 74 32 48 63 05 83 7f 35 01 48 83 c0 16 48 8b 5c c2 08 <80> bb 94 00 00 00 00 00
>>>> [ 902.237825] RSP: 0018:ffffae2649437688 EFLAGS: 00010002
>>>> [ 902.493396] RAX: 0000000000000019 RBX: 0000000000000000 RCX: dead000000000122
>>>> [ 902.841529] RDX: ffff8b6012cb3a00 RSI: ffff8b71002bbed0 RDI: ffff8b71002bbed0
>>>> [ 903.189374] RBP: ffff8b601c46e800 R08: ffffae26494377c8 R09: 0000000000000000
>>>> [ 903.532985] R10: 0000000000000001 R11: 0000000000000008 R12: ffff8b6f844c5b30
>>>> [ 903.880809] R13: ffff8b601c46e800 R14: ffffae2649437760 R15: ffff8b601c46e800
>>>> [ 904.220054] FS: 00007fec2fc4a700(0000) GS:ffff8b7f7f640000(0000) knlGS:0000000000000000
>>>> [ 904.614349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 904.894717] CR2: 0000000000000094 CR3: 0000000111fd8002 CR4: 00000000003726e0
>>>> [ 905.243702] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [ 905.592493] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [ 905.936859] Call Trace:
>>>> [ 906.055955] <TASK>
>>>> [ 906.158109] bfq_bic_update_cgroup+0x2c/0x1f0
>>>> [ 906.371057] bfq_insert_requests+0x2c2/0x1fb0
>>>> [ 906.579207] blk_mq_sched_insert_request+0xc2/0x140
>>>> [ 906.817640] __blk_mq_try_issue_directly+0xe0/0x1f0
>>>> [ 907.055737] blk_mq_request_issue_directly+0x4e/0xa0
>>>> [ 907.298547] dm_mq_queue_rq+0x217/0x3e0
>>>> [ 907.485935] blk_mq_dispatch_rq_list+0x14b/0x860
>>>> [ 907.711973] ? sbitmap_get+0x87/0x1a0
>>>> [ 907.890370] blk_mq_do_dispatch_sched+0x350/0x3b0
>>>> [ 908.074869] NMI watchdog: Watchdog detected hard LOCKUP on cpu 40
>>>>
>>>> Fixes: 075a53b78b81 ("bfq: Make sure bfqg for which we are queueing requests is online")
>>>> Co-developed-by: Anton Fadeev <anton.fadeev at red-soft.ru>
>>>> Signed-off-by: Anton Fadeev <anton.fadeev at red-soft.ru>
>>>> Signed-off-by: Artem Chernyshev <artem.chernyshev at red-soft.ru>
>>>> ---
>>>> block/bfq-cgroup.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>>>> index 1b2829e99dad..d4e9428cdbe5 100644
>>>> --- a/block/bfq-cgroup.c
>>>> +++ b/block/bfq-cgroup.c
>>>> @@ -616,7 +616,7 @@ struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
>>>>  			continue;
>>>>  		}
>>>>  		bfqg = blkg_to_bfqg(blkg);
>>>> -		if (bfqg->online) {
>>>> +		if (bfqg && bfqg->online) {
>>>>  			bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
>>>>  			return bfqg;
>>>>  		}
>>>>
>>
>> Sorry, I forgot to mention that the behaviour was the same after we
>> applied this patch. The issue was resolved only when we added a NULL
>> check for bfqg.
>
> So you mean that blkg is still online while blkg_to_bfqg() returns
> NULL? Can you explain how that is possible? I can't figure it out...
>
> Thanks,
> Kuai
>>
>> Thanks,
>> Artem
>>
>>
>
I'll try to describe it. We have a four-node virtualization cluster based
on oVirt 4.4, running kernel 5.15.78 with the patch you mentioned. We also
have an iSCSI target connected to all nodes. LUNs are attached in mixed
mode, i.e. both as Storage Domains and as direct LUNs passed to VMs. With
all VMs in the UP state, we simply disconnect the iSCSI target from the
network or shut down the target service, and then we hit the bug
described above.
Thanks,
Anton.