[lvc-project] [PATCH v2] The value may overflow

Paul E. McKenney paulmck at kernel.org
Tue Sep 5 22:45:36 MSK 2023


On Tue, Sep 05, 2023 at 03:36:40PM -0400, Mathieu Desnoyers wrote:
> On 9/5/23 15:27, Paul E. McKenney wrote:
> > On Tue, Sep 05, 2023 at 09:40:51AM -0700, Paul E. McKenney wrote:
> > > On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
> > > > On 9/5/23 10:15, David Laight wrote:
> > > > > ...
> > > > > > That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
> > > > > > only require a 31-bit shift, which should be OK, but 497 would require
> > > > > > a 32-bit shift, which would result in sign extension.  If it turns out
> > > > > > that sign extension is OK, then we should get in trouble at 513 CPUs,
> > > > > > which would result in a 33-bit shift (and is that even defined in C?).
> > > > > 
> > > > > Not quite right :-)
> > > > > 
> > > > > (1 << 31) is int and negative, that gets sign extended before
> > > > > being converted to 'unsigned long' - so has the top 33 bits set.
> > > 
> > > Good point, thank you for the correction.
> > > 
> > > > > (1 << 32) is undefined, the current x86 cpu ignore the high
> > > > > shift bits so it is (1 << 0).
> > > 
> > > Which would cause it to attempt to invoke SRCU callbacks on the
> > > lowest-numbered CPU associated with that srcu_node structure.
> > > 
> > > > Yes, I was about to reply the same thing. A shift of 31 is buggy,
> > > > because shifting 1 << 31 raises the sign bit, which sets the top 33
> > > > bits when cast to unsigned long. A shift of 1 << 32 is undefined,
> > > > with for instance x86 choosing to ignore the top bit.
> > > > 
> > > > But in order to have a 1 << 31 shift from this expression:
> > > > 
> > > >                  sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > > > 
> > > > I *think* we need the group to have 32 cpus or more (indexed within
> > > > the group from grplo to grplo + 31 (both inclusive)).
> > > > 
> > > > So as soon as we have one group with 32 cpus, the problem should show
> > > > up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
> > > > group of 32 cpus, for:
> > > > 
> > > >    15*31 + 32 = 497 cpus.
> > > > 
> > > > AFAIU, this would be the minimum value for smp_processor_id()+1 which
> > > > triggers this issue.
> > > 
> > > By default, there are 16 CPUs per leaf srcu_node structure.  Each leaf
> > > srcu_node structure takes up one bit in the parent srcu_node structures'
> > > srcu_data_have_cbs[] array.  Up to 30 bits is OK, 31 bits is questionable,
> > > and 32 bits and larger erroneous.
> > > 
> > > This is the situation as I see it (assuming dense CPU numbering):
> > > 
> > > 	# Leaf srcu_node		Largest
> > > 	structures	#CPUs		CPU #		Status
> > > 
> > > 	0-30		0-480		479		Good
> > > 	31		481-496		495		Questionable
> > > 	32-		497-		496+		Bad.
> > > 
> > > Tree SRCU differs from Tree RCU in its operation, so this bug should
> > > not hold up SRCU grace periods.  It might instead cause SRCU callbacks
> > > to be ignored (which would admittedly look quite similar to the user).
> > > 
> > > My attempts to cheat the numbering system ran up against the limited
> > > height of the srcu_node tree.
> > > 
> > > But there is still the question of why this has not been seen in the
> > > wild, given that there really are systems with more than 479 CPUs
> > > out there.  One possibility is the call to srcu_schedule_cbs_sdp()
> > > from srcu_funnel_gp_start().  But it does not seem likely that this
> > > would happen all that often, as it requires back-to-back grace periods
> > > and then some.
> > > 
> > > Maybe people with large systems boot with srcutree.convert_to_big=0?
> > > 
> > > Adding Laurent for his thoughts.
> > > 
> > > Aha!
> > > 
> > > Here is what makes it work, given David's description of 1<<32:
> > > 
> > > static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
> > > 				  unsigned long mask, unsigned long delay)
> > > {
> > > 	int cpu;
> > > 
> > > 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > > 		if (!(mask & (1 << (cpu - snp->grplo))))
> > > 			continue;
> > > 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> > > 	}
> > > }
> > > 
> > > As long as at least one bit is set in the result of 1<<N, and as long
> > > as the compiler always does the same thing for a given N, then this loop
> > > will make the right thing happen.
> > > 
> > > But even that is not relied upon, because the calling code looks like
> > > this:
> > > 
> > > 			spin_lock_irq_rcu_node(snp);
> > > 			cbs = false;
> > > 			last_lvl = snp >= sup->level[rcu_num_lvls - 1];
> > > 			if (last_lvl)
> > > 				cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
> > > 			snp->srcu_have_cbs[idx] = gpseq;
> > > 			rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
> > > 			sgsne = snp->srcu_gp_seq_needed_exp;
> > > 			if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
> > > 				WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
> > > 			if (ss_state < SRCU_SIZE_BIG)
> > > 				mask = ~0;
> > > 			else
> > > 				mask = snp->srcu_data_have_cbs[idx];
> > > 			snp->srcu_data_have_cbs[idx] = 0;
> > > 			spin_unlock_irq_rcu_node(snp);
> > > 			if (cbs)
> > > 				srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
> > > 
> > > So that last_lvl check means that the srcu_schedule_cbs_snp() function
> > > is invoked only for leaf srcu_node structures.  Which by default limit
> > > the shift to 16.
> > > 
> > > So this bug appears to have no effect for default RCU setups, even on very
> > > large 64-bit systems, which is consistent with field experience.  Even if
> > > this is the case, it still should be fixed, to avoid confusion if nothing
> > > else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
> > > Which actually happened the other day due to someone trusting ChatGPT's
> > > opinion about RCU Kconfig options...
> > 
> > And I therefore queued Denis's v3 patch with an edited commit log.
> > Of course, if anyone sees some other way that the bug could manifest
> > other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
> > than 30 on a system with at least 31 CPUs, please let me know so that
> > I can adjust.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit ed083b0e22f1396dee3599896249a3f218845298
> > Author: Denis Arefev <arefev at swemel.ru>
> > Date:   Mon Sep 4 15:21:14 2023 +0300
> > 
> >      Fix srcu_struct node grpmask overflow on 64-bit systems
> >      The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> 
> AFAIU, the overflow resides in the "bitwise expression" and not
> the arithmetic expression.

Rather than quibble about exactly what constitutes arithmetic, I
updated the first paragraph and added your Reviewed-by as shown
below.  ;-)

> Other than this, please add my
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>

Thank you all!!!

------------------------------------------------------------------------

commit 50477ff756ab99402b1523b7c6be8b5d790d05e7
Author: Denis Arefev <arefev at swemel.ru>
Date:   Mon Sep 4 15:21:14 2023 +0300

    Fix srcu_struct node grpmask overflow on 64-bit systems
    
    The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
    is subject to overflow due to a failure to cast operands to a larger
    data type before performing the bitwise operation.
    
    The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
    Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
    maximum shift of 15), but which can be set up as high as 64 (resulting
    in a maximum shift of 63).  A value of 31 can result in sign extension,
    resulting in 0xffffffff80000000 instead of the desired 0x80000000.
    A value of 31 or greater triggers undefined behavior per the C standard.
    
    This bug has not been known to cause issues because almost all kernels
    take the default CONFIG_RCU_FANOUT_LEAF=16.  Furthermore, as long as
    a given compiler gives a deterministic result for 1<<N for N>=32, the
    code correctly invokes all SRCU callbacks, albeit wasting CPU time along
    the way.
    
    This commit therefore substitutes the correct 1UL for the buggy 1.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Signed-off-by: Denis Arefev <arefev at swemel.ru>
    Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
    Cc: David Laight <David.Laight at aculab.com>
    Signed-off-by: Paul E. McKenney <paulmck at kernel.org>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 833a8f848a90..5602042856b1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
 				snp->grplo = cpu;
 			snp->grphi = cpu;
 		}
-		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
+		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
 	}
 	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
 	return true;
@@ -835,7 +835,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 	int cpu;
 
 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1 << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))))
 			continue;
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 	}



More information about the lvc-project mailing list