[lvc-project] [PATCH] iommu: fix crash in report_iommu_fault()
Tian, Kevin
kevin.tian at intel.com
Thu Apr 10 08:29:51 MSK 2025
> From: Robin Murphy <robin.murphy at arm.com>
> Sent: Wednesday, April 9, 2025 7:30 PM
>
> On 2025-04-08 10:38 pm, Jason Gunthorpe wrote:
> > On Wed, Apr 09, 2025 at 12:33:41AM +0300, Fedor Pchelkin wrote:
> >> The following crash is observed while handling an IOMMU fault with a
> >> recent kernel:
> >>
> >> kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> >> BUG: unable to handle page fault for address: ffff8c708299f700
> >> PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3
> >> Oops: Oops: 0011 [#1] SMP NOPTI
> >> CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20
> PREEMPT(lazy)
> >> Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW
> 09/27/2024
> >> RIP: 0010:0xffff8c708299f700
> >> Call Trace:
> >> <TASK>
> >> ? report_iommu_fault+0x78/0xd3
> >> ? amd_iommu_report_page_fault+0x91/0x150
> >> ? amd_iommu_int_thread+0x77/0x180
> >> ? __pfx_irq_thread_fn+0x10/0x10
> >> ? irq_thread_fn+0x23/0x60
> >> ? irq_thread+0xf9/0x1e0
> >> ? __pfx_irq_thread_dtor+0x10/0x10
> >> ? __pfx_irq_thread+0x10/0x10
> >> ? kthread+0xfc/0x240
> >> ? __pfx_kthread+0x10/0x10
> >> ? ret_from_fork+0x34/0x50
> >> ? __pfx_kthread+0x10/0x10
> >> ? ret_from_fork_asm+0x1a/0x30
> >> </TASK>
> >>
> >> report_iommu_fault() checks for an installed handler comparing the
> >> corresponding field to NULL. It can (and could before) be called for a
> >> domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA,
> specifically.
> >> Cookie is represented as a union so we may end up with a garbage value
> >> treated there if this happens for a domain with another cookie type.
> >>
> >> Formerly there were two exclusive cookie types in the union.
> >> IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault().
> >>
> >> Call the fault handler only if the passed domain has a required cookie
> >> type.
> >>
> >> Found by Linux Verification Center (linuxtesting.org).
> >>
> >> Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data")
> >> Signed-off-by: Fedor Pchelkin <pchelkin at ispras.ru>
> >> ---
> >
> > Reviewed-by: Jason Gunthorpe <jgg at nvidia.com>
> >
> > This should go to rc
> >
> >>> iommu-dma itself isn't ever going to use a fault
> >>> handler because it expects the DMA API to be used correctly and thus no
> >>> faults to occur.
> >>
> >> My first thought about this is that iommu-dma is not interested in
> >> installing a fault handler ever, okay. But why does it suppose that no
> >> faults would occur? It is a matter of "chance", firmware bugs, abovesaid
> >> DMA API abusing, etc... isn't it?
> >
> > Yes, it should not happen, this driver is clearly buggy.
>
> Right, it's not that we assume no faults can occur at all, just that any
> faults are *unexpected* since they represent some device or driver doing
> something wrong in any number of ways, thus there is nothing a fault
> handler could reasonably do except print "a fault happened!", which
> every IOMMU driver is going to do anyway, therefore there is no need to
> support fault handlers on DMA domains.
>
> But indeed it is now erroneous to be dereferencing domain->handler
> without checking that it is in fact a handler, sorry I missed that.
>
btw an interesting observation. The comment of report_iommu_fault()
says:
* This function should be called by the low-level IOMMU implementations
* whenever IOMMU faults happen, to allow high-level users, that are
* interested in such events, to know about them.
It sounds a general requirement to all IOMMU drivers, but in reality
only a subset of iommu drivers call it (e.g. intel/smmuv3 don't). So
there seems to be an implicit assumption from drivers on whether
the underlying IOMMU provides such facility...
More information about the lvc-project
mailing list