[lvc-project] [PATCH v3] Fix srcu_struct node grpmask overflow on 64-bit systems

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue Sep 5 16:43:04 MSK 2023


On 9/5/23 09:38, Paul E. McKenney wrote:
> On Tue, Sep 05, 2023 at 08:57:53AM -0400, Mathieu Desnoyers wrote:
>> On 9/4/23 09:58, Paul E. McKenney wrote:
>>> On Mon, Sep 04, 2023 at 08:58:48AM -0400, Mathieu Desnoyers wrote:
>>>> On 9/4/23 08:42, Mathieu Desnoyers wrote:
>>>>> On 9/4/23 08:21, Denis Arefev wrote:
>>>>>> The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
>>>>>> is subject to overflow due to a failure to cast operands to a larger
>>>>>> data type before performing arithmetic.
>>>>>>
>>>>>> The maximum result of this subtraction is defined by the RCU_FANOUT
>>>>>> or other srcu level-spread values assigned by rcu_init_levelspread(),
>>>>>> which can indeed cause the signed 32-bit integer literal ("1") to overflow
>>>>>> when shifted by any value greater than 31.
>>>>>
>>>>> We could expand on this:
>>>>>
>>>>> The maximum result of this subtraction is defined by the RCU_FANOUT
>>>>> or other srcu level-spread values assigned by rcu_init_levelspread(),
>>>>> which can indeed cause the signed 32-bit integer literal ("1") to overflow
>>>>> when shifted by any value greater than 31 on a 64-bit system.
>>>>>
>>>>> Moreover, when the subtraction value is 31, the 1 << 31 expression results
>>>>> in 0xffffffff80000000 when the signed integer is promoted to unsigned long
>>>>> on 64-bit systems due to type promotion rules, which is certainly not the
>>>>> intended result.
>>>
>>> Thank you both!  Could you please also add something to the effect of:
>>> "Given default Kconfig options, this bug affects only systems with more
>>> than 512 CPUs."?
>>
>> Hi Paul,
>>
>> I'm trying to understand this "NR_CPUS > 512 CPUs" default Kconfig lower
>> bound from kernel/rcu/Kconfig and rcu_node_tree.h. Is that on a 32-bit or
>> 64-bit architecture? Also, I suspect that something like x86-64 MAXSMP (or
>> an explicit NR_CPUS) needs to be selected over a default Kconfig to support
>> that many CPUs.
> 
> 64-bit only.  I believe that 32-bit kernels are unaffected by this bug.
> 
> The trick is that RCU reshapes the rcu_node tree in rcu_init_geometry(),
> which is invoked during early boot from rcu_init().  This reshaping is
> based on nr_cpu_ids.  So if NR_CPUS is (say) 4096, there will be enough
> rcu_node structures allocated at build time to accommodate 4096 CPUs
> (261 of them: 256 leaf nodes, four internal nodes, and one root node),
> but only assuming dense numbering of CPUs.  If rcu_init_geometry() sees
> that nr_cpu_ids is (say) 64, it will use only five of them, that is,
> four leaf nodes and one root node.  The leaf nodes will need to shift
> by at most 16, and the root node by at most 4.
> 
> But the possibility of sparse CPU numbering (perhaps to your point)
> means that the bug can occur in 64-bit kernels booted on systems with
> 512 CPUs or fewer if that system has sparse CPU IDs.  For example,
> there have been systems that disable all but one hardware thread per
> core, but leave places in the CPU numbering for those disabled threads.
> Such a system with four hardware threads per core could have a CPU 516
> (and thus be affected by this bug) with as few as 129 CPUs.
> 
> So a better request would be for something like: "Given default Kconfig
> options, this bug affects only 64-bit systems having at least one CPU
> for which smp_processor_id() returns 512 or greater."
> 
> Does that help, or am I missing your point?

This is a good point, although not the one I was trying to make. See my
separate email explaining the impact of having exactly 512 with respect to
signed integer type promotion. So your last phrasing, "returns 512 or
greater", is better. The previous wording suggested that only systems with
_more than_ 512 CPUs were affected, which was off by one, given that
systems with exactly 512 CPUs are affected as well.

Thanks,

Mathieu


> 
> 							Thanx, Paul
> 
>> Thanks,
>>
>> Mathieu
>>
>>
>>>
>>> 							Thanx, Paul
>>>
>>>>>> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>>>>>
>>>>> With the commit message updated with my comment above, please also add:
>>>>>
>>>>> Fixes: c7e88067c1 ("srcu: Exact tracking of srcu_data structures containing callbacks")
>>>>> Cc: <stable at vger.kernel.org> # v4.11
>>>>
>>>> Sorry, the line above should read:
>>>>
>>>> Cc: <stable at vger.kernel.org> # v4.11+
>>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Mathieu
>>>>>
>>>>>>
>>>>>> Signed-off-by: Denis Arefev <arefev at swemel.ru>
>>>>>> ---
>>>>>> v3: Changed the name of the patch, as suggested by
>>>>>> Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>>>>>> v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
>>>>>> Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>>>>>>     kernel/rcu/srcutree.c | 4 ++--
>>>>>>     1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>>>>>> index 20d7a238d675..6c18e6005ae1 100644
>>>>>> --- a/kernel/rcu/srcutree.c
>>>>>> +++ b/kernel/rcu/srcutree.c
>>>>>> @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
>>>>>>                     snp->grplo = cpu;
>>>>>>                 snp->grphi = cpu;
>>>>>>             }
>>>>>> -        sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
>>>>>> +        sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
>>>>>>         }
>>>>>>         smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
>>>>>>         return true;
>>>>>> @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>>>>>>         int cpu;
>>>>>>         for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
>>>>>> -        if (!(mask & (1 << (cpu - snp->grplo))))
>>>>>> +        if (!(mask & (1UL << (cpu - snp->grplo))))
>>>>>>                 continue;
>>>>>>             srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>>>>>>         }
>>>>>
>>>>
>>>> -- 
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> https://www.efficios.com
>>>>
>>

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



