[lvc-project] [PATCH] goku_udc: Add check for NULL in goku_irq

Fri Mar 31 17:07:34 MSK 2023

15.03.2023 19:26, Mirsad Goran Todorovac пишет:
> On 15.3.2023. 15:26, Anastasia Belova wrote:
>>
>> 13.03.2023 16:49, Mirsad Goran Todorovac пишет:
>>> On 13.3.2023. 13:19, Anastasia Belova wrote:
>>>>
>>>> 11.03.2023 06:29, Mirsad Goran Todorovac пишет:
>>>>> On 15. 02. 2023. 14:48, Greg Kroah-Hartman wrote:
>>>>>> On Wed, Feb 15, 2023 at 04:39:56PM +0300, Анастасия Белова wrote:
>>>>>>> 03.02.2023 13:45, Greg Kroah-Hartman пишет:
>>>>>>>> On Fri, Feb 03, 2023 at 01:18:28PM +0300, Anastasia Belova wrote:
>>>>>>>>> Before dereferencing dev->driver check it for NULL.
>>>>>>>>>
>>>>>>>>> If an interrupt handler is called after assigning
>>>>>>>>> NULL to dev->driver, but before resetting dev->int_enable,
>>>>>>>>> NULL-pointer will be dereferenced.
>>>>>>>>>
>>>>>>>>> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>>>>>>>>>
>>>>>>>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>>>>>>>>> Signed-off-by: Anastasia Belova <abelova at astralinux.ru>
>>>>>>>>> ---
>>>>>>>>>    drivers/usb/gadget/udc/goku_udc.c | 5 +++--
>>>>>>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/usb/gadget/udc/goku_udc.c 
>>>>>>>>> b/drivers/usb/gadget/udc/goku_udc.c
>>>>>>>>> index bdc56b24b5c9..896bba8b47f1 100644
>>>>>>>>> --- a/drivers/usb/gadget/udc/goku_udc.c
>>>>>>>>> +++ b/drivers/usb/gadget/udc/goku_udc.c
>>>>>>>>> @@ -1616,8 +1616,9 @@ static irqreturn_t goku_irq(int irq, 
>>>>>>>>> void *_dev)
>>>>>>>>>    pm_next:
>>>>>>>>>            if (stat & INT_USBRESET) { /* hub reset done */
>>>>>>>>>                ACK(INT_USBRESET);
>>>>>>>>> -            INFO(dev, "USB reset done, gadget %s\n",
>>>>>>>>> -                dev->driver->driver.name);
>>>>>>>>> +            if (dev->driver)
>>>>>>>>> +                INFO(dev, "USB reset done, gadget %s\n",
>>>>>>>>> + dev->driver->driver.name);
>>>>>>>> How can this ever happen?  Can you trigger this somehow?  If 
>>>>>>>> not, I
>>>>>>>> don't think this is going to be possible (also what's up with 
>>>>>>>> printk
>>>>>>>> from an irq handler???)
>>>>>>> Unfortunately, I can't find the way to trigger this at the moment.
>>>>>> Then the change should not be made.
>>>>>>
>>>>>>> What about printk, should trace_printk be used instead?
>>>>>> Why?
>>>>>>
>>>>>>>> Odds are, no one actually has this hardware anymore, right?
>>>>>>> Despite of this, such vulnerability should be fixed because
>>>>>>> there is a possibility to exploit it.
>>>>>> How can this be "exploited" if it can not ever be triggered?
>>>>>>
>>>>>> Also, this would cause a NULL dereference in an irq handler, how 
>>>>>> can you
>>>>>> "exploit" that?
>>>>>>
>>>>>> Please only submit patches that actually do something. It is getting
>>>>>> very hard to want to even review patches from this "project" 
>>>>>> based on
>>>>>> the recent submissions.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> greg k-h
>>>>> Hi Greg, Anastasia,
>>>>
>>>> Hi Misrad,
>>>>
>>>>> Without any pros or cons, or taking sides, there appears to be a 
>>>>> similar check
>>>>> when using dev->driver->driver.name in
>>>>>
>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/usb/gadget/udc/goku_udc.c#L1158 
>>>>>
>>>>>
>>>>>     seq_printf(m,
>>>>>            "%s - %s\n"
>>>>>            "%s version: %s %s\n"
>>>>>            "Gadget driver: %s\n"
>>>>>            "Host %s, %s\n"
>>>>>            "\n",
>>>>>            pci_name(dev->pdev), driver_desc,
>>>>>            driver_name, DRIVER_VERSION, dmastr(),
>>>>>            dev->driver ? dev->driver->driver.name : "(none)",
>>>>>            is_usb_connected
>>>>>                ? ((tmp & PW_PULLUP) ? "full speed" : "powered")
>>>>>                : "disconnected",
>>>>>            udc_ep_state(dev->ep0state));
>>>>>
>>>>> On the other hand, where could dev->drivre be reset without 
>>>>> resetting dev->int_enable?
>>>>>
>>>>> dev->driver = NULL appears here:
>>>>>
>>>>> static int goku_udc_stop(struct usb_gadget *g)
>>>>> {
>>>>>     struct goku_udc    *dev = to_goku_udc(g);
>>>>>     unsigned long    flags;
>>>>>
>>>>>     spin_lock_irqsave(&dev->lock, flags);
>>>>>     dev->driver = NULL;
>>>>>     stop_activity(dev);
>>>>>     spin_unlock_irqrestore(&dev->lock, flags);
>>>>>
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> it is followed by stop_activity() calling udc_reset():
>>>>>
>>>>> static void udc_reset(struct goku_udc *dev)
>>>>> {
>>>>>     struct goku_udc_regs __iomem    *regs = dev->regs;
>>>>>
>>>>>     writel(0, &regs->power_detect);
>>>>>     writel(0, &regs->int_enable);
>>>>>     readl(&regs->int_enable);
>>>>>     dev->int_enable = 0;
>>>>> .
>>>>> .
>>>>> .
>>>>>
>>>>> ... but this happens in between spin_lock_irqsave() and 
>>>>> spin_unlock_irqsave(),
>>>>> which appears like a correct way to do it.
>>>>
>>>> Are you sure that spin_lock_irqsave makes the code safe? This 
>>>> function disables interrupts on
>>>> local processor only (Linux Device Drivers, Third Edition). So it 
>>>> doesn't seem to be
>>>> absolutely safe on multiprocessor systems.
>>>
>>> Hi, Anastasia,
>>>
>>> Looking from the Second Edition or the book and the source, I see that
>>> spin_lock_irqsave() expands to:
>>>
>>> static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t 
>>> *lock)
>>> {
>>>     unsigned long flags;
>>>
>>>     local_irq_save(flags);
>>>     preempt_disable();
>>>     spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>>>     LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
>>>     return flags;
>>> }
>>>
>>> if the multiple threads on multiple cores/SMTs contend for the same 
>>> lock,
>>> that with preempt_disable() should assure mutual exclusion.
>>>
>>> Can you please quote from the Third Edition of Linux Device Drivers 
>>> where
>>> it says otherwise?
>>>
>>
>> Hi, Mirsad,
>>
>>
>> If I get it right, preempt_disable blocks interrupts on all processors,
>> correct? This statement seems to make the code safe, but there is a 
>> quote
>> from Linux Device Drivers, Third Edition, CHAPTER 5, Concurrency and 
>> Race
>> Conditions: "...spin_lock_irqsave disables interrupts (on the local 
>> processor
>> only) before taking the spinlock...". These thoughts contradict, 
>> don't they?
>
> Hi, Anastasia,
>
> To quote the book:
>
> https://static.lwn.net/images/pdf/LDD3/ch05.pdf page 119:
>
> > We have already seen how spin_lock works. spin_lock_irqsave disables 
> interrupts (on
> > the local processor only) before taking the spinlock; the previous 
> interrupt state is
> > stored in flags.
>
> This is consistent with the source. To repeat, spin_lock_irqsave() 
> eventually
> expands to:
>
> https://elixir.bootlin.com/linux/v6.3-rc2/source/include/linux/spinlock_api_smp.h#L97 
>
>
> static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
> {
>     unsigned long flags;
>
>     local_irq_save(flags);
>     preempt_disable();
>     spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>     LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
>     return flags;
> }
>
> preempt_disable() disables interrupt only on local CPU, but this is 
> necessary
> so IRQ wouldn't attempt another asynchronous lock attempt on the lock 
> currently being
> acquired, which would of course block forever this CPU, core or SMT.
>
> The magic is in LOCK_CONTENTED, which eventually atomically sets a 
> global semaphore
> and executes a barrier() call, to ensure no compiler optimisation 
> would mess with that
> before all writes are synchronised to RAM.
>
> Additional info on barrier() is here:
>
> https://elixir.bootlin.com/linux/v6.3-rc2/source/include/linux/compiler.h#L83 
>
>
> This guarantees that the next read from the same or another CPU, core 
> or SMT will
> see lock (synchronised and "taken"), so the mutual exclusion between 
> the cores is
> OK. NOTE: the lock must be global, not local to thread for this to work.
>
> So, the key is to go to the Source ;-)
>
Hi, Mirsad,

Thank you so much for our discussion. This case is clearer for me now.

Best regards,

Anastasia

> This of course doesn't invalidate driver hardening against 
> "impossible" situations.
> In userland, assert() is good for this, but it is impractical for 
> kernel to die on
> each bug. Rather log the bug or oops?
>
> Best regards,
> Mirsad
>
>>> BTW, please also consider reading this article:
>>>
>>> https://docs.kernel.org/driver-api/io_ordering.html
>>>
>>> I saw they were using this readl() after writel() for 
>>> synchronisation, so
>>> please see if this clears your doubts. It says "readl() should flush 
>>> any pending
>>> writes".
>>>
>>> But I certainly see no harm in your proposal of guarding against 
>>> NULL pointer
>>> dereference of dev->driver, both in
>>>
>>> > -            INFO(dev, "USB reset done, gadget %s\n",
>>> > -                dev->driver->driver.name);
>>> > +            if (dev->driver)
>>> > +                INFO(dev, "USB reset done, gadget %s\n",
>>> > +                    dev->driver->driver.name);
>>>
>>> or use the construct as the one in another line of the driver:
>>>
>>> > -            INFO(dev, "USB reset done, gadget %s\n",
>>> > -                dev->driver->driver.name);
>>> > +            INFO(dev, "USB reset done, gadget %s\n",
>>> > +                dev->driver ? dev->driver->driver.name : "(none)");
>>>
>>> (This would IMHO enable detecting and logging when dev->driver 
>>> unexpectedly becomes NULL
>>> in a race condition, rather that just silently skipping and ignoring 
>>> the situation.)
>>>
>> Agree, the second construct looks better.
>>> but additionally also in:
>>>
>>> > -     spin_unlock (&dev->lock);
>>> > -     tmp = dev->driver->setup(&dev->gadget, &ctrl);
>>> > -     spin_lock (&dev->lock);
>>> > +     if (dev->driver && dev->driver->setup) {
>>> > +         spin_unlock (&dev->lock);
>>> > +         tmp = dev->driver->setup(&dev->gadget, &ctrl);
>>> > +         spin_lock (&dev->lock);
>>> > +     }
>>>
>>> for completeness and robustness sake.
>>>
>>> The author agrees that the race conditions in the device drivers are 
>>> very hard
>>> to reproduce:
>>>
>>> https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch09s08.html 
>>>
>>>
>>> But I am really not able to analyse all possible scenarios ATM.
>>> Maybe some ideas come after getting some oxygen.
>>>
>>> (This is not an authoritative answer on the matter, just an attempt 
>>> on analysis.)
>>>
>>> Regards,
>>> Mirsad
>>>
>> Regards,
>>
>> Anastasia
>>
>>>>> But second appearance is here:
>>>>>
>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/usb/gadget/udc/goku_udc.c#L1559 
>>>>>
>>>>>
>>>>>     spin_lock(&dev->lock);
>>>>>
>>>>> rescan:
>>>>>     stat = readl(&regs->int_status) & dev->int_enable;
>>>>>          if (!stat)
>>>>>         goto done;
>>>>>     dev->irqs++;
>>>>>
>>>>>     /* device-wide irqs */
>>>>>     if (unlikely(stat & INT_DEVWIDE)) {
>>>>>         if (stat & INT_SYSERROR) {
>>>>>             ERROR(dev, "system error\n");
>>>>>             stop_activity(dev);
>>>>>             stat = 0;
>>>>>             handled = 1;
>>>>>             // FIXME have a neater way to prevent re-enumeration
>>>>>             dev->driver = NULL;
>>>>>             goto done;
>>>>>         }
>>>>>
>>>>> goto done leads to:
>>>>>
>>>>> done:
>>>>>     (void)readl(&regs->int_enable);
>>>>>     spin_unlock(&dev->lock);
>>>>>
>>>>> This unlocks dev->lock before setting dev->int_enable to zero, or 
>>>>> calling writel(0, &regs->int_enable);
>>>>> which could be problematic. Unless it called stop_activity(dev) 
>>>>> four lines earlier. Which does
>>>>> bot of:
>>>>>
>>>>>     writel(0, &regs->int_enable);
>>>>>     dev->int_enable = 0;
>>>>>
>>>>> So, FWIW, we seem to be safe. Yet, there might be no harm in 
>>>>> printing "(null)" rather
>>>>> than having an NULL pointer dereference, it seems.
>>>>>
>>>>> Yet, there is another unprotected dereference of dev->driver:
>>>>>
>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/usb/gadget/udc/goku_udc.c#L1513 
>>>>>
>>>>>
>>>>>     spin_unlock (&dev->lock);
>>>>>     tmp = dev->driver->setup(&dev->gadget, &ctrl);
>>>>>     spin_lock (&dev->lock);
>>>>>
>>>>> All others (in goku_udc.c, at least) have triple safeguards like:
>>>>>
>>>>>                 if (dev->gadget.speed != USB_SPEED_UNKNOWN
>>>>>                         && dev->driver
>>>>>                         && dev->driver->suspend) {
>>>>>                     spin_unlock(&dev->lock);
>>>>> dev->driver->suspend(&dev->gadget);
>>>>>                     spin_lock(&dev->lock);
>>>>>                 }
>>>>>
>>>>> So the above should maybe put to:
>>>>>
>>>>>     if (dev->driver && dev->driver->setup) {
>>>>>         spin_unlock (&dev->lock);
>>>>>         tmp = dev->driver->setup(&dev->gadget, &ctrl);
>>>>>         spin_lock (&dev->lock);
>>>>>     }
>>>>>
>>>>> instead to be completely certain.
>>>>>
>>>>> Forgive me for this uninspired rant. Thank you if you've read this 
>>>>> far.
>>>>> I hope this helps.
>>>>>
>>>>> My $0.02.
>>>>>
>>>>> Regards,
>>>>> Mirsad
>>>>>
>>>> Thanks,
>>>>
>>>> Anastasia
>>>
>