28-Aug-2017: Finding bugs related to the data races in Linux kernel

Anton Volkov has finished his Google Summer of Code project "Finding bugs related to the data races in Linux kernel" for The Linux Foundation. The goals of the GSoC project were:

  1. searching for bugs related to data races in the Linux kernel;
  2. notifying authors and maintainers about them.

Data races are not necessarily bugs, but they may cause them. Bugs related to data races are among the most challenging kinds of bugs to detect. Common causes are incorrect use of synchronization primitives and the use of non-standard synchronization mechanisms.
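As an illustration of the kind of problem described above, here is a minimal user-space sketch in C with POSIX threads. It is not taken from the kernel or from any of the reported drivers. The names `shared_state`, `worker` and `run_increments` are hypothetical. With `use_lock` set to 0 the threads update the counter without synchronization, which is a data race and may lose updates. With `use_lock` set to 1 every update is done under a mutex.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state, loosely modelled on a driver's private data. */
struct shared_state {
    pthread_mutex_t lock;
    long counter;
    long iterations;
    int use_lock;   /* 0 = racy update, 1 = update under the mutex */
};

static void *worker(void *arg)
{
    struct shared_state *s = arg;

    for (long i = 0; i < s->iterations; i++) {
        if (s->use_lock) {
            pthread_mutex_lock(&s->lock);
            s->counter++;
            pthread_mutex_unlock(&s->lock);
        } else {
            /* Unsynchronized read-modify-write: a data race. */
            s->counter++;
        }
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter, or -1 on error. */
long run_increments(int nthreads, long iterations, int use_lock)
{
    enum { MAX_THREADS = 16 };
    pthread_t tids[MAX_THREADS];
    struct shared_state s = {
        .counter = 0,
        .iterations = iterations,
        .use_lock = use_lock,
    };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &s) != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&s.lock);
    return started == nthreads ? s.counter : -1;
}
```

Called from a `main` function and built with something like `cc -pthread`, `run_increments(4, 100000, 1)` returns 400000. With the last argument set to 0 the result is not predictable and is often smaller, because concurrent increments can overwrite each other.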

To find bugs related to data races, the Klever tool (a new version of LDV Tools) was used. It was run on the drivers/ subdirectory of the Linux kernel v4.2.6. Although a bit dated, this kernel version contained a number of bugs that were still present in the latest stable Linux kernel version.

Anton Volkov analyzed around 500 warnings about potential data races. 70% of them were false positives: the tool reported a data race because of analysis imprecision. The remaining 30% corresponded to feasible data races, which either led to undesired behaviour (bugs) such as NULL pointer dereferences or were benign. Several warnings usually had the same cause, so these warnings corresponded to only 28 distinct true errors.

During the project, Anton Volkov communicated with the authors and maintainers of the Linux kernel modules containing bugs related to data races and discussed possible solutions with them. One of the challenges of fixing such bugs is the complexity of the fix. Because of that, not all of the reports received corresponding patches: sometimes even the developers couldn’t suggest a proper fix. Below is information on the reports submitted and the patches proposed during the project.

Reported races:

  1. ks8851_mll: https://lkml.org/lkml/2017/8/15/484
  2. ibmasm: https://lkml.org/lkml/2017/8/18/621
  3. apds990x: https://lkml.org/lkml/2017/8/10/515
  4. bh1770glc: https://lkml.org/lkml/2017/8/8/536
  5. via-ircc: https://lkml.org/lkml/2017/8/8/777
  6. cafe_ccic: https://lkml.org/lkml/2017/8/22/569
  7. isp1760: https://lkml.org/lkml/2017/8/7/570
  8. iowarrior: https://lkml.org/lkml/2017/8/22/490

Reported races that maintainers considered infeasible because of the internal device structure:

  1. xilinx-xadc: https://lkml.org/lkml/2017/8/18/705
  2. c4: https://lkml.org/lkml/2017/8/15/369

Reported races which were confirmed as errors:

  1. rcar-dmac: https://lkml.org/lkml/2017/8/8/480
  2. spi-tegra114, spi-tegra20-slink, spi-tegra20-sflash: https://lkml.org/lkml/2017/7/24/253
  3. mlx5_ib (benign race): https://lkml.org/lkml/2017/8/18/709
  4. ucb1400_ts: https://lkml.org/lkml/2017/8/15/384
  5. adutux: https://lkml.org/lkml/2017/8/15/360
  6. loop: https://lkml.org/lkml/2017/7/28/482
  7. hysdn: https://lkml.org/lkml/2017/7/27/480
  8. pc87413_wdt: https://lkml.org/lkml/2017/8/7/454
  9. cypress_m8: https://lkml.org/lkml/2017/8/22/358
  10. nsc-ircc: https://lkml.org/lkml/2017/8/25/378

Patches:

  1. rcar-dmac: https://patchwork.kernel.org/patch/9911629/
  2. hysdn (applied in v4.13-rc5): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commi...
  3. loop (to appear in v4.14): https://patchwork.kernel.org/patch/9885335/
  4. spi-tegra114: https://patchwork.kernel.org/patch/9915305/