Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GPIO Kconfig bug & add BT host driver #1

Merged
merged 2 commits into from
Sep 29, 2015

Conversation

apopple
Copy link

@apopple apopple commented Sep 29, 2015

No description provided.

The Aspeed GPIO menu entry was added under "PCI GPIO expanders" which
depends on PCI which may not be enabled for Aspeed. Move the entry to
the "Memory mapped GPIO drivers" which describes the Aspeed GPIO
driver more correctly.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
This patch adds a simple device driver to expose the iBT interface on
Aspeed chips as a character device (/dev/bt).

Signed-off-by: Alistair Popple <alistair@popple.id.au>
@apopple apopple merged this pull request into openbmc:dev Sep 29, 2015
shenki pushed a commit that referenced this pull request Nov 13, 2015
…ap_device_late_init

Kernel fails to boot 50% of times (form build to build) with
RT-patchset applied due to the following race - on late boot
stages deferred_probe_work_func->omap_hsmmc_probe races with omap_device_late_ini.

The same issue has been reported now on linux-next (4.3) by Keerthy [1]

late_initcall
 - deferred_probe_initcal() tries to re-probe all pending driver's probe.

- later on, some driver is probing in this case It's cpsw.c
  (but could be any other drivers)
  cpsw_init
  - platform_driver_register
    - really_probe
       - driver_bound
         - driver_deferred_probe_trigger
  and boot proceed.
  So, at this moment we have deferred_probe_work_func scheduled.

late_initcall_sync
  - omap_device_late_init
    - omap_device_idle

CPU1					CPU2
  - deferred_probe_work_func
    - really_probe
      - omap_hsmmc_probe
	- pm_runtime_get_sync
					late_initcall_sync
					- omap_device_late_init
						if (od->_driver_status != BUS_NOTIFY_BOUND_DRIVER) {
							if (od->_state == OMAP_DEVICE_STATE_ENABLED) {
								- omap_device_idle [ops - IP is disabled]
	- [fail]
	- pm_runtime_put_sync
          - omap_hsmmc_runtime_suspend [ooops!]

== log ==
 omap_hsmmc 480b4000.mmc: unable to get vmmc regulator -517
 davinci_mdio 48485000.mdio: davinci mdio revision 1.6
 davinci_mdio 48485000.mdio: detected phy mask fffffff3
 libphy: 48485000.mdio: probed
 davinci_mdio 48485000.mdio: phy[2]: device 48485000.mdio:02, driver unknown
 davinci_mdio 48485000.mdio: phy[3]: device 48485000.mdio:03, driver unknown
 omap_hsmmc 480b4000.mmc: unable to get vmmc regulator -517
 cpsw 48484000.ethernet: Detected MACID = b4:99:4c:c7:d2:48
 cpsw 48484000.ethernet: cpsw: Detected MACID = b4:99:4c:c7:d2:49
 hctosys: unable to open rtc device (rtc0)
 omap_hsmmc 480b4000.mmc: omap_device_late_idle: enabled but no driver.  Idling
 ldousb: disabling
 Unhandled fault: imprecise external abort (0x1406) at 0x00000000
 [00000000] *pgd=00000000
 Internal error: : 1406 [#1] PREEMPT SMP ARM
 Modules linked in:
 CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 4.1.2-rt1-00467-g6da3c0a-dirty #5
 Hardware name: Generic DRA74X (Flattened Device Tree)
 Workqueue: deferwq deferred_probe_work_func
 task: ee6ddb00 ti: edd3c000 task.ti: edd3c000
 PC is at omap_hsmmc_runtime_suspend+0x1c/0x12c
 LR is at _od_runtime_suspend+0xc/0x24
 pc : [<c0471998>]    lr : [<c0029590>]    psr: a0000013
 sp : edd3dda0  ip : ee6ddb00  fp : c07be540
 r10: 00000000  r9 : c07be540  r8 : 00000008
 r7 : 00000000  r6 : ee646c10  r5 : ee646c10  r4 : edd79380
 r3 : fa0b4100  r2 : 00000000  r1 : 00000000  r0 : ee646c10
 Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
 Control: 10c5387d  Table: 8000406a  DAC: 00000015
 Process kworker/u4:1 (pid: 58, stack limit = 0xedd3c218)
 Stack: (0xedd3dda0 to 0xedd3e000)
 dda0: ee646c70 ee646c10 c0029584 00000000 00000008 c0029590 ee646c70 ee646c10
 ddc0: c0029584 c03adfb8 ee646c10 00000004 0000000c c03adff0 ee646c10 00000004
 dde0: 0000000c c03ae4ec 00000000 edd3c000 ee646c10 00000004 ee646c70 00000004
 de00: fa0b4000 c03aec20 ee6ddb00 ee646c10 00000004 ee646c70 ee646c10 fffffdfb
 de20: edd79380 00000000 fa0b4000 c03aee90 fffffdfb edd79000 ee646c00 c0474290
 de40: 00000000 edda24c0 edd79380 edc81f00 00000000 00000200 00000001 c06dd488
 de60: edda3960 ee646c10 ee646c10 c0824cc4 fffffdfb c0880c94 00000002 edc92600
 de80: c0836378 c03a7f84 ee646c10 c0824cc4 00000000 c0880c80 c0880c94 c03a6568
 dea0: 00000000 ee646c10 c03a66ac ee4f8000 00000000 00000001 edc92600 c03a4b40
 dec0: ee404c94 edc83c4c ee646c10 ee646c10 ee646c44 c03a63c4 ee646c10 ee646c10
 dee0: c0814448 c03a5aa8 ee646c10 c0814220 edd3c000 c03a5ec0 c0814250 ee6be400
 df00: edd3c000 c004e5bc ee6ddb01 00000078 ee6ddb00 ee4f8000 ee6be418 edd3c000
 df20: ee4f8028 00000088 c0836045 ee4f8000 ee6be400 c004e928 ee4f8028 00000000
 df40: c004e8ec 00000000 ee6bf1c0 ee6be400 c004e8ec 00000000 00000000 00000000
 df60: 00000000 c0053450 2e56fa97 00000000 afdffbd7 ee6be400 00000000 00000000
 df80: edd3df80 edd3df80 00000000 00000000 edd3df90 edd3df90 edd3dfac ee6bf1c0
 dfa0: c0053384 00000000 00000000 c000f668 00000000 00000000 00000000 00000000
 dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 f1fc9d7e febfbdff
 [<c0471998>] (omap_hsmmc_runtime_suspend) from [<c0029590>] (_od_runtime_suspend+0xc/0x24)
 [<c0029590>] (_od_runtime_suspend) from [<c03adfb8>] (__rpm_callback+0x24/0x3c)
 [<c03adfb8>] (__rpm_callback) from [<c03adff0>] (rpm_callback+0x20/0x80)
 [<c03adff0>] (rpm_callback) from [<c03ae4ec>] (rpm_suspend+0xe4/0x618)
 [<c03ae4ec>] (rpm_suspend) from [<c03aee90>] (__pm_runtime_idle+0x60/0x80)
 [<c03aee90>] (__pm_runtime_idle) from [<c0474290>] (omap_hsmmc_probe+0x6bc/0xa7c)
 [<c0474290>] (omap_hsmmc_probe) from [<c03a7f84>] (platform_drv_probe+0x44/0xa4)
 [<c03a7f84>] (platform_drv_probe) from [<c03a6568>] (driver_probe_device+0x170/0x2b4)
 [<c03a6568>] (driver_probe_device) from [<c03a4b40>] (bus_for_each_drv+0x64/0x98)
 [<c03a4b40>] (bus_for_each_drv) from [<c03a63c4>] (device_attach+0x70/0x88)
 [<c03a63c4>] (device_attach) from [<c03a5aa8>] (bus_probe_device+0x84/0xac)
 [<c03a5aa8>] (bus_probe_device) from [<c03a5ec0>] (deferred_probe_work_func+0x58/0x88)
 [<c03a5ec0>] (deferred_probe_work_func) from [<c004e5bc>] (process_one_work+0x134/0x464)
 [<c004e5bc>] (process_one_work) from [<c004e928>] (worker_thread+0x3c/0x4fc)
 [<c004e928>] (worker_thread) from [<c0053450>] (kthread+0xcc/0xe4)
 [<c0053450>] (kthread) from [<c000f668>] (ret_from_fork+0x14/0x2c)
 Code: e594302c e593202c e584205c e594302c (e5932128)
 ---[ end trace 0000000000000002 ]---

The issue happens because omap_device_late_init() do not take into
account that some drivers are present, but their probes were not
finished successfully and where deferred instead. This is the valid
case, and omap_device_late_init() should not idle such devices.

To fix this issue, the value of omap_device->_driver_status field
should be checked not only for BUS_NOTIFY_BOUND_DRIVER (driver is
present and has been bound to device successfully), but also checked
for BUS_NOTIFY_BIND_DRIVER (driver about to be bound) - which means
driver is present and there was try to bind it to device.

[1] http://www.spinics.net/lists/arm-kernel/msg441880.html
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Keerthy <j-keerthy@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
shenki pushed a commit that referenced this pull request Nov 13, 2015
On old ARM chips, unaligned accesses to memory are not trapped and
fixed.  On module load, symbols are relocated, and the relocation of
__bug_table symbols is done on a u32 basis. Yet the section is not
aligned to a multiple of 4 address, but to a multiple of 2.

This triggers an Oops on pxa architecture, where address 0xbf0021ea
is the first relocation in the __bug_table section :
  apply_relocate(): pxa3xx_nand: section 13 reloc 0 sym ''
  Unable to handle kernel paging request at virtual address bf0021ea
  pgd = e1cd0000
  [bf0021ea] *pgd=c1cce851, *pte=c1cde04f, *ppte=c1cde01f
  Internal error: Oops: 23 [#1] ARM
  Modules linked in:
  CPU: 0 PID: 606 Comm: insmod Not tainted 4.2.0-rc8-next-20150828-cm-x300+ torvalds#887
  Hardware name: CM-X300 module
  task: e1c68700 ti: e1c3e000 task.ti: e1c3e000
  PC is at apply_relocate+0x2f4/0x3d4
  LR is at 0xbf0021ea
  pc : [<c000e7c8>]    lr : [<bf0021ea>]    psr: 80000013
  sp : e1c3fe30  ip : 60000013  fp : e49e8c60
  r10: e49e8fa8  r9 : 00000000  r8 : e49e7c58
  r7 : e49e8c38  r6 : e49e8a58  r5 : e49e8920  r4 : e49e8918
  r3 : bf0021ea  r2 : bf007034  r1 : 00000000  r0 : bf000000
  Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
  Control: 0000397f  Table: c1cd0018  DAC: 00000051
  Process insmod (pid: 606, stack limit = 0xe1c3e198)
  [<c000e7c8>] (apply_relocate) from [<c005ce5c>] (load_module+0x1248/0x1f5c)
  [<c005ce5c>] (load_module) from [<c005dc54>] (SyS_init_module+0xe4/0x170)
  [<c005dc54>] (SyS_init_module) from [<c000a420>] (ret_fast_syscall+0x0/0x38)

Fix this by ensuring entries in __bug_table are all aligned to at least
of multiple of 4. This transforms a module section  __bug_table as :
-   [12] __bug_table       PROGBITS        00000000 002232 000018 00   A  0   0  1
+   [12] __bug_table       PROGBITS        00000000 002232 000018 00   A  0   0  4

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
shenki pushed a commit that referenced this pull request Nov 13, 2015
The renesas-irqc interrupt controller is cascaded to the GIC. Hence when
propagating wake-up settings to its parent interrupt controller, the
following lockdep warning is printed:

    =============================================
    [ INFO: possible recursive locking detected ]
    4.2.0-ape6evm-10725-g50fcd7643c034198 torvalds#280 Not tainted
    ---------------------------------------------
    s2ram/1072 is trying to acquire lock:
    (&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98

    but task is already holding lock:
    (&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98

    other info that might help us debug this:
    Possible unsafe locking scenario:

	  CPU0
	  ----
     lock(&irq_desc_lock_class);
     lock(&irq_desc_lock_class);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    6 locks held by s2ram/1072:
    #0:  (sb_writers#7){.+.+.+}, at: [<c012eb14>] __sb_start_write+0xa0/0xa8
    #1:  (&of->mutex){+.+.+.}, at: [<c019396c>] kernfs_fop_write+0x4c/0x1bc
    #2:  (s_active#24){.+.+.+}, at: [<c0193974>] kernfs_fop_write+0x54/0x1bc
    #3:  (pm_mutex){+.+.+.}, at: [<c008213c>] pm_suspend+0x10c/0x510
    #4:  (&dev->mutex){......}, at: [<c02af3c4>] __device_suspend+0xdc/0x2cc
    #5:  (&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98

    stack backtrace:
    CPU: 0 PID: 1072 Comm: s2ram Not tainted 4.2.0-ape6evm-10725-g50fcd7643c034198 torvalds#280
    Hardware name: Generic R8A73A4 (Flattened Device Tree)
    [<c0018078>] (unwind_backtrace) from [<c00144f0>] (show_stack+0x10/0x14)
    [<c00144f0>] (show_stack) from [<c0451f14>] (dump_stack+0x88/0x98)
    [<c0451f14>] (dump_stack) from [<c007b29c>] (__lock_acquire+0x15cc/0x20e4)
    [<c007b29c>] (__lock_acquire) from [<c007c6e0>] (lock_acquire+0xac/0x12c)
    [<c007c6e0>] (lock_acquire) from [<c0457c00>] (_raw_spin_lock_irqsave+0x40/0x54)
    [<c0457c00>] (_raw_spin_lock_irqsave) from [<c008d3fc>] (__irq_get_desc_lock+0x58/0x98)
    [<c008d3fc>] (__irq_get_desc_lock) from [<c008ebbc>] (irq_set_irq_wake+0x20/0xf8)
    [<c008ebbc>] (irq_set_irq_wake) from [<c0260770>] (irqc_irq_set_wake+0x20/0x4c)
    [<c0260770>] (irqc_irq_set_wake) from [<c008ec28>] (irq_set_irq_wake+0x8c/0xf8)
    [<c008ec28>] (irq_set_irq_wake) from [<c02cb8c0>] (gpio_keys_suspend+0x74/0xc0)
    [<c02cb8c0>] (gpio_keys_suspend) from [<c02ae8cc>] (dpm_run_callback+0x54/0x124)

Avoid this false positive by using a separate lockdep class for IRQC
interrupts.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1441798974-25716-2-git-send-email-geert%2Brenesas@glider.be
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
shenki pushed a commit that referenced this pull request Nov 13, 2015
The renesas-intc-irqpin interrupt controller is cascaded to the GIC.
Hence when propagating wake-up settings to its parent interrupt
controller, the following lockdep warning is printed:

    =============================================
    [ INFO: possible recursive locking detected ]
    4.2.0-armadillo-10725-g50fcd7643c034198 torvalds#781 Not tainted
    ---------------------------------------------
    s2ram/1179 is trying to acquire lock:
    (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94

    but task is already holding lock:
    (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94

    other info that might help us debug this:
    Possible unsafe locking scenario:

	  CPU0
	  ----
     lock(&irq_desc_lock_class);
     lock(&irq_desc_lock_class);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    7 locks held by s2ram/1179:
    #0:  (sb_writers#7){.+.+.+}, at: [<c00c9708>] __sb_start_write+0x64/0xb8
    #1:  (&of->mutex){+.+.+.}, at: [<c0125a00>] kernfs_fop_write+0x78/0x1a0
    #2:  (s_active#23){.+.+.+}, at: [<c0125a08>] kernfs_fop_write+0x80/0x1a0
    #3:  (autosleep_lock){+.+.+.}, at: [<c0058244>] pm_autosleep_lock+0x18/0x20
    #4:  (pm_mutex){+.+.+.}, at: [<c0057e50>] pm_suspend+0x54/0x248
    #5:  (&dev->mutex){......}, at: [<c0243a20>] __device_suspend+0xdc/0x240
    #6:  (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94

    stack backtrace:
    CPU: 0 PID: 1179 Comm: s2ram Not tainted 4.2.0-armadillo-10725-g50fcd7643c034198

    Hardware name: Generic R8A7740 (Flattened Device Tree)
    [<c00129f4>] (dump_backtrace) from [<c0012bec>] (show_stack+0x18/0x1c)
    [<c0012bd4>] (show_stack) from [<c03f5d94>] (dump_stack+0x20/0x28)
    [<c03f5d74>] (dump_stack) from [<c00514d4>] (__lock_acquire+0x67c/0x1b88)
    [<c0050e58>] (__lock_acquire) from [<c0052df8>] (lock_acquire+0x9c/0xbc)
    [<c0052d5c>] (lock_acquire) from [<c03fb068>] (_raw_spin_lock_irqsave+0x44/0x58)
    [<c03fb024>] (_raw_spin_lock_irqsave) from [<c005bb54>] (__irq_get_desc_lock+0x78/0x94
    [<c005badc>] (__irq_get_desc_lock) from [<c005c3d8>] (irq_set_irq_wake+0x28/0x100)
    [<c005c3b0>] (irq_set_irq_wake) from [<c01e50d0>] (intc_irqpin_irq_set_wake+0x24/0x4c)
    [<c01e50ac>] (intc_irqpin_irq_set_wake) from [<c005c17c>] (set_irq_wake_real+0x3c/0x50
    [<c005c140>] (set_irq_wake_real) from [<c005c414>] (irq_set_irq_wake+0x64/0x100)
    [<c005c3b0>] (irq_set_irq_wake) from [<c02a19b4>] (gpio_keys_suspend+0x60/0xa0)
    [<c02a1954>] (gpio_keys_suspend) from [<c023b750>] (platform_pm_suspend+0x3c/0x5c)

Avoid this false positive by using a separate lockdep class for INTC
External IRQ Pin interrupts.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1441798974-25716-3-git-send-email-geert%2Brenesas@glider.be
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
shenki pushed a commit that referenced this pull request Nov 13, 2015
The OPP list needs to be protected against concurrent accesses. Using
simple RCU read locks does the trick and gets rid of the following
lockdep warning:

	===============================
	[ INFO: suspicious RCU usage. ]
	4.2.0-next-20150908 #1 Not tainted
	-------------------------------
	drivers/base/power/opp.c:460 Missing rcu_read_lock() or dev_opp_list_lock protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 0
	4 locks held by kworker/u8:0/6:
	 #0:  ("%s""deferwq"){++++.+}, at: [<c0040d8c>] process_one_work+0x118/0x4bc
	 #1:  (deferred_probe_work){+.+.+.}, at: [<c0040d8c>] process_one_work+0x118/0x4bc
	 #2:  (&dev->mutex){......}, at: [<c03b8194>] __device_attach+0x20/0x118
	 #3:  (prepare_lock){+.+...}, at: [<c054bc08>] clk_prepare_lock+0x10/0xf8

	stack backtrace:
	CPU: 2 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.0-next-20150908 #1
	Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
	Workqueue: deferwq deferred_probe_work_func
	[<c001802c>] (unwind_backtrace) from [<c00135a4>] (show_stack+0x10/0x14)
	[<c00135a4>] (show_stack) from [<c02a8418>] (dump_stack+0x94/0xd4)
	[<c02a8418>] (dump_stack) from [<c03c6f6c>] (dev_pm_opp_find_freq_ceil+0x108/0x114)
	[<c03c6f6c>] (dev_pm_opp_find_freq_ceil) from [<c0551a3c>] (dfll_calculate_rate_request+0xb8/0x170)
	[<c0551a3c>] (dfll_calculate_rate_request) from [<c0551b10>] (dfll_clk_round_rate+0x1c/0x2c)
	[<c0551b10>] (dfll_clk_round_rate) from [<c054de2c>] (clk_calc_new_rates+0x1b8/0x228)
	[<c054de2c>] (clk_calc_new_rates) from [<c054e44c>] (clk_core_set_rate_nolock+0x44/0xac)
	[<c054e44c>] (clk_core_set_rate_nolock) from [<c054e4d8>] (clk_set_rate+0x24/0x34)
	[<c054e4d8>] (clk_set_rate) from [<c0512460>] (tegra124_cpufreq_probe+0x120/0x230)
	[<c0512460>] (tegra124_cpufreq_probe) from [<c03b9cbc>] (platform_drv_probe+0x44/0xac)
	[<c03b9cbc>] (platform_drv_probe) from [<c03b84c8>] (driver_probe_device+0x218/0x304)
	[<c03b84c8>] (driver_probe_device) from [<c03b69b0>] (bus_for_each_drv+0x60/0x94)
	[<c03b69b0>] (bus_for_each_drv) from [<c03b8228>] (__device_attach+0xb4/0x118)
	ata1: SATA link down (SStatus 0 SControl 300)
	[<c03b8228>] (__device_attach) from [<c03b77c8>] (bus_probe_device+0x88/0x90)
	[<c03b77c8>] (bus_probe_device) from [<c03b7be8>] (deferred_probe_work_func+0x58/0x8c)
	[<c03b7be8>] (deferred_probe_work_func) from [<c0040dfc>] (process_one_work+0x188/0x4bc)
	[<c0040dfc>] (process_one_work) from [<c004117c>] (worker_thread+0x4c/0x4f4)
	[<c004117c>] (worker_thread) from [<c0047230>] (kthread+0xe4/0xf8)
	[<c0047230>] (kthread) from [<c000f7d0>] (ret_from_fork+0x14/0x24)

Signed-off-by: Thierry Reding <treding@nvidia.com>
Fixes: c4fe70a ("clk: tegra: Add closed loop support for the DFLL")
[vince.h@nvidia.com: Unlock rcu on error path]
Signed-off-by: Vince Hsu <vince.h@nvidia.com>
[sboyd@codeaurora.org: Dropped second hunk that nested the rcu
read lock unnecessarily]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
shenki pushed a commit that referenced this pull request Nov 13, 2015
If filelayout_decode_layout fail, _filelayout_free_lseg will causes
a double freeing of fh_array.

[ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.281010] PGD 0
[ 1179.281443] Oops: 0000 [#1]
[ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
[ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G           OE   4.3.0-rc1-pnfs+ torvalds#244
[ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
[ 1179.285668] RIP: 0010:[<ffffffffa027222d>]  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.286612] RSP: 0018:ffff88003e3c77f8  EFLAGS: 00010202
[ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
[ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
[ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
[ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
[ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
[ 1179.290791] FS:  00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
[ 1179.291580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
[ 1179.292731] Stack:
[ 1179.293195]  ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
[ 1179.293676]  ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
[ 1179.294151]  00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
[ 1179.294623] Call Trace:
[ 1179.295092]  [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
[ 1179.295625]  [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0
[ 1179.296133]  [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4]
[ 1179.296632]  [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
[ 1179.297134]  [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4]
[ 1179.297632]  [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs]
[ 1179.298158]  [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
[ 1179.298834]  [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs]
[ 1179.299385]  [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
[ 1179.299872]  [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
[ 1179.300362]  [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs]
[ 1179.300907]  [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0
[ 1179.301391]  [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs]
[ 1179.301867]  [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs]
[ 1179.302330]  [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280
[ 1179.302784]  [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280
[ 1179.303413]  [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0
[ 1179.303855]  [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50
[ 1179.304286]  [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0
[ 1179.304711]  [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
[ 1179.305132]  [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs]
[ 1179.305540]  [<ffffffff811e343c>] __vfs_read+0xcc/0x100
[ 1179.305936]  [<ffffffff811e3d15>] vfs_read+0x85/0x130
[ 1179.306326]  [<ffffffff811e4a98>] SyS_read+0x58/0xd0
[ 1179.306708]  [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
[ 1179.308357] RIP  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.309177]  RSP <ffff88003e3c77f8>
[ 1179.309582] CR2: 0000000000000000

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
shenki pushed a commit that referenced this pull request Nov 13, 2015
If we didn't call ATMARP_MKIP before ATMARP_ENCAP the VCC descriptor is
non-existant and we'll end up dereferencing a NULL ptr:

[1033173.491930] kasan: GPF could be caused by NULL-ptr deref or user memory accessirq event stamp: 123386
[1033173.493678] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[1033173.493689] Modules linked in:
[1033173.493697] CPU: 9 PID: 23815 Comm: trinity-c64 Not tainted 4.2.0-next-20150911-sasha-00043-g353d875-dirty #2545
[1033173.493706] task: ffff8800630c4000 ti: ffff880063110000 task.ti: ffff880063110000
[1033173.493823] RIP: clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
[1033173.493826] RSP: 0018:ffff880063117a88  EFLAGS: 00010203
[1033173.493828] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 000000000000000c
[1033173.493830] RDX: 0000000000000002 RSI: ffffffffb3f10720 RDI: 0000000000000014
[1033173.493832] RBP: ffff880063117b80 R08: ffff88047574d9a4 R09: 0000000000000000
[1033173.493834] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1000c622f53
[1033173.493836] R13: ffff8800cb905500 R14: ffff8808d6da2000 R15: 00000000fffffdfd
[1033173.493840] FS:  00007fa56b92d700(0000) GS:ffff880478000000(0000) knlGS:0000000000000000
[1033173.493843] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1033173.493845] CR2: 0000000000000000 CR3: 00000000630e8000 CR4: 00000000000006a0
[1033173.493855] Stack:
[1033173.493862]  ffffffffb0b60444 000000000000eaea 0000000041b58ab3 ffffffffb3c3ce32
[1033173.493867]  ffffffffb0b6f3e0 ffffffffb0b60444 ffffffffb5ea2e50 1ffff1000c622f5e
[1033173.493873]  ffff8800630c4cd8 00000000000ee09a ffffffffb3ec4888 ffffffffb5ea2de8
[1033173.493874] Call Trace:
[1033173.494108] do_vcc_ioctl (net/atm/ioctl.c:170)
[1033173.494113] vcc_ioctl (net/atm/ioctl.c:189)
[1033173.494116] svc_ioctl (net/atm/svc.c:605)
[1033173.494200] sock_do_ioctl (net/socket.c:874)
[1033173.494204] sock_ioctl (net/socket.c:958)
[1033173.494244] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
[1033173.494290] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
[1033173.494295] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[1033173.494362] Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 50 09 00 00 49 8b 9e 60 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 14 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 14 09 00
All code

========
   0:   fa                      cli
   1:   48 c1 ea 03             shr    $0x3,%rdx
   5:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
   9:   0f 85 50 09 00 00       jne    0x95f
   f:   49 8b 9e 60 06 00 00    mov    0x660(%r14),%rbx
  16:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  1d:   fc ff df
  20:   48 8d 7b 14             lea    0x14(%rbx),%rdi
  24:   48 89 fa                mov    %rdi,%rdx
  27:   48 c1 ea 03             shr    $0x3,%rdx
  2b:*  0f b6 04 02             movzbl (%rdx,%rax,1),%eax               <-- trapping instruction
  2f:   48 89 fa                mov    %rdi,%rdx
  32:   83 e2 07                and    $0x7,%edx
  35:   38 d0                   cmp    %dl,%al
  37:   7f 08                   jg     0x41
  39:   84 c0                   test   %al,%al
  3b:   0f 85 14 09 00 00       jne    0x955

Code starting with the faulting instruction
===========================================
   0:   0f b6 04 02             movzbl (%rdx,%rax,1),%eax
   4:   48 89 fa                mov    %rdi,%rdx
   7:   83 e2 07                and    $0x7,%edx
   a:   38 d0                   cmp    %dl,%al
   c:   7f 08                   jg     0x16
   e:   84 c0                   test   %al,%al
  10:   0f 85 14 09 00 00       jne    0x92a
[1033173.494366] RIP clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
[1033173.494368]  RSP <ffff880063117a88>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Commit b6d3096 (Input: uinput - switch to
using for_each_set_bit()) switched driver to use for_each_set_bit().
However during initial write of the uinput structure that contains min/max
data for all possible axes none of them are reflected in dev->absbit yet
and so we were skipping over all of them and were not allocating absinfo
memory which caused crash later when driver tried to sens EV_ABS events:

<1>[   15.064330] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
<1>[   15.064336] IP: [<ffffffff8163f142>] input_handle_event+0x232/0x4e0
<4>[   15.064343] PGD 0
<4>[   15.064345] Oops: 0000 [#1] SMP

Fixes: b6d3096
Cc: stable@vger.kernel.org
Reported-by: Stephen Chandler Paul <cpaul@redhat.com>
Tested-by: Stephen Chandler Paul <cpaul@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
shenki pushed a commit that referenced this pull request Nov 13, 2015
A deadlock trace is seen in netcp driver with lockup detector enabled.
The trace log is provided below for reference. This patch fixes the
bug by removing the usage of netcp_modules_lock within ndo_ops functions.
ndo_{open/close/ioctl)() is already called with rtnl_lock held. So there
is no need to hold another mutex for serialization across processes on
multiple cores.  So remove use of netcp_modules_lock mutex from these
ndo ops functions.

ndo_set_rx_mode() shouldn't be using a mutex as it is called from atomic
context. In the case of ndo_set_rx_mode(), there can be call to this API
without rtnl_lock held from an atomic context. As the underlying modules
are expected to add address to a hardware table, it is to be protected
across concurrent updates and hence a spin lock is used to synchronize
the access. Same with ndo_vlan_rx_add_vid() & ndo_vlan_rx_kill_vid().

Probably the netcp_modules_lock is used to protect the module not being
removed as part of rmmod. Currently this is not fully implemented and
assumes the interface is brought down before doing rmmod of modules.
The support for rmmmod while interface is up is expected in a future
patch set when additional modules such as pa, qos are added. For now
all of the tests such as if up/down, reboot, iperf works fine with this
patch applied.

Deadlock trace seen with lockup detector enabled is shown below for
reference.

[   16.863014] ======================================================
[   16.869183] [ INFO: possible circular locking dependency detected ]
[   16.875441] 4.1.6-01265-gfb1e101 #1 Tainted: G        W
[   16.881176] -------------------------------------------------------
[   16.887432] ifconfig/1662 is trying to acquire lock:
[   16.892386]  (netcp_modules_lock){+.+.+.}, at: [<c03e8110>]
netcp_ndo_open+0x168/0x518
[   16.900321]
[   16.900321] but task is already holding lock:
[   16.906144]  (rtnl_mutex){+.+.+.}, at: [<c053a418>] devinet_ioctl+0xf8/0x7e4
[   16.913206]
[   16.913206] which lock already depends on the new lock.
[   16.913206]
[   16.921372]
[   16.921372] the existing dependency chain (in reverse order) is:
[   16.928844]
-> #1 (rtnl_mutex){+.+.+.}:
[   16.932865]        [<c06023f0>] mutex_lock_nested+0x68/0x4a8
[   16.938521]        [<c04c5758>] register_netdev+0xc/0x24
[   16.943831]        [<c03e65c0>] netcp_module_probe+0x214/0x2ec
[   16.949660]        [<c03e8a54>] netcp_register_module+0xd4/0x140
[   16.955663]        [<c089654c>] keystone_gbe_init+0x10/0x28
[   16.961233]        [<c000977c>] do_one_initcall+0xb8/0x1f8
[   16.966714]        [<c0867e04>] kernel_init_freeable+0x148/0x1e8
[   16.972720]        [<c05f9994>] kernel_init+0xc/0xe8
[   16.977682]        [<c0010038>] ret_from_fork+0x14/0x3c
[   16.982905]
-> #0 (netcp_modules_lock){+.+.+.}:
[   16.987619]        [<c006eab0>] lock_acquire+0x118/0x320
[   16.992928]        [<c06023f0>] mutex_lock_nested+0x68/0x4a8
[   16.998582]        [<c03e8110>] netcp_ndo_open+0x168/0x518
[   17.004064]        [<c04c48f0>] __dev_open+0xa8/0x10c
[   17.009112]        [<c04c4b74>] __dev_change_flags+0x94/0x144
[   17.014853]        [<c04c4c3c>] dev_change_flags+0x18/0x48
[   17.020334]        [<c053a9fc>] devinet_ioctl+0x6dc/0x7e4
[   17.025729]        [<c04a59ec>] sock_ioctl+0x1d0/0x2a8
[   17.030865]        [<c0142844>] do_vfs_ioctl+0x41c/0x688
[   17.036173]        [<c0142ae4>] SyS_ioctl+0x34/0x5c
[   17.041046]        [<c000ff60>] ret_fast_syscall+0x0/0x54
[   17.046441]
[   17.046441] other info that might help us debug this:
[   17.046441]
[   17.054434]  Possible unsafe locking scenario:
[   17.054434]
[   17.060343]        CPU0                    CPU1
[   17.064862]        ----                    ----
[   17.069381]   lock(rtnl_mutex);
[   17.072522]                                lock(netcp_modules_lock);
[   17.078875]                                lock(rtnl_mutex);
[   17.084532]   lock(netcp_modules_lock);
[   17.088366]
[   17.088366]  *** DEADLOCK ***
[   17.088366]
[   17.094279] 1 lock held by ifconfig/1662:
[   17.098278]  #0:  (rtnl_mutex){+.+.+.}, at: [<c053a418>]
devinet_ioctl+0xf8/0x7e4
[   17.105774]
[   17.105774] stack backtrace:
[   17.110124] CPU: 1 PID: 1662 Comm: ifconfig Tainted: G        W
4.1.6-01265-gfb1e101 #1
[   17.118637] Hardware name: Keystone
[   17.122123] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
(show_stack+0x10/0x14)
[   17.129862] [<c0013cbc>] (show_stack) from [<c05ff450>]
(dump_stack+0x84/0xc4)
[   17.137079] [<c05ff450>] (dump_stack) from [<c0068e34>]
(print_circular_bug+0x210/0x330)
[   17.145161] [<c0068e34>] (print_circular_bug) from [<c006ab7c>]
(validate_chain.isra.35+0xf98/0x13ac)
[   17.154372] [<c006ab7c>] (validate_chain.isra.35) from [<c006da60>]
(__lock_acquire+0x52c/0xcc0)
[   17.163149] [<c006da60>] (__lock_acquire) from [<c006eab0>]
(lock_acquire+0x118/0x320)
[   17.171058] [<c006eab0>] (lock_acquire) from [<c06023f0>]
(mutex_lock_nested+0x68/0x4a8)
[   17.179140] [<c06023f0>] (mutex_lock_nested) from [<c03e8110>]
(netcp_ndo_open+0x168/0x518)
[   17.187484] [<c03e8110>] (netcp_ndo_open) from [<c04c48f0>]
(__dev_open+0xa8/0x10c)
[   17.195133] [<c04c48f0>] (__dev_open) from [<c04c4b74>]
(__dev_change_flags+0x94/0x144)
[   17.203129] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
(dev_change_flags+0x18/0x48)
[   17.211560] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
(devinet_ioctl+0x6dc/0x7e4)
[   17.219729] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
(sock_ioctl+0x1d0/0x2a8)
[   17.227378] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
(do_vfs_ioctl+0x41c/0x688)
[   17.234939] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
(SyS_ioctl+0x34/0x5c)
[   17.242242] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
(ret_fast_syscall+0x0/0x54)
[   17.258855] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
control off
[   17.271282] BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:616
[   17.279712] in_atomic(): 1, irqs_disabled(): 0, pid: 1662, name: ifconfig
[   17.286500] INFO: lockdep is turned off.
[   17.290413] Preemption disabled at:[<  (null)>]   (null)
[   17.295728]
[   17.297214] CPU: 1 PID: 1662 Comm: ifconfig Tainted: G        W
4.1.6-01265-gfb1e101 #1
[   17.305735] Hardware name: Keystone
[   17.309223] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
(show_stack+0x10/0x14)
[   17.316970] [<c0013cbc>] (show_stack) from [<c05ff450>]
(dump_stack+0x84/0xc4)
[   17.324194] [<c05ff450>] (dump_stack) from [<c06023b0>]
(mutex_lock_nested+0x28/0x4a8)
[   17.332112] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
(netcp_set_rx_mode+0x160/0x210)
[   17.340724] [<c03e9840>] (netcp_set_rx_mode) from [<c04c483c>]
(dev_set_rx_mode+0x1c/0x28)
[   17.348982] [<c04c483c>] (dev_set_rx_mode) from [<c04c490c>]
(__dev_open+0xc4/0x10c)
[   17.356724] [<c04c490c>] (__dev_open) from [<c04c4b74>]
(__dev_change_flags+0x94/0x144)
[   17.364729] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
(dev_change_flags+0x18/0x48)
[   17.373166] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
(devinet_ioctl+0x6dc/0x7e4)
[   17.381344] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
(sock_ioctl+0x1d0/0x2a8)
[   17.388994] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
(do_vfs_ioctl+0x41c/0x688)
[   17.396563] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
(SyS_ioctl+0x34/0x5c)
[   17.403873] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
(ret_fast_syscall+0x0/0x54)
[   17.413772] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.20.2) started
Sending discover...
[   18.690666] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
control off
Sending discover...
[   22.250972] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
control off
[   22.258721] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   22.265458] BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:616
[   22.273896] in_atomic(): 1, irqs_disabled(): 0, pid: 342, name: kworker/1:1
[   22.280854] INFO: lockdep is turned off.
[   22.284767] Preemption disabled at:[<  (null)>]   (null)
[   22.290074]
[   22.291568] CPU: 1 PID: 342 Comm: kworker/1:1 Tainted: G        W
4.1.6-01265-gfb1e101 #1
[   22.300255] Hardware name: Keystone
[   22.303750] Workqueue: ipv6_addrconf addrconf_dad_work
[   22.308895] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
(show_stack+0x10/0x14)
[   22.316643] [<c0013cbc>] (show_stack) from [<c05ff450>]
(dump_stack+0x84/0xc4)
[   22.323867] [<c05ff450>] (dump_stack) from [<c06023b0>]
(mutex_lock_nested+0x28/0x4a8)
[   22.331786] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
(netcp_set_rx_mode+0x160/0x210)
[   22.340394] [<c03e9840>] (netcp_set_rx_mode) from [<c04c9d18>]
(__dev_mc_add+0x54/0x68)
[   22.348401] [<c04c9d18>] (__dev_mc_add) from [<c05ab358>]
(igmp6_group_added+0x168/0x1b4)
[   22.356580] [<c05ab358>] (igmp6_group_added) from [<c05ad2cc>]
(ipv6_dev_mc_inc+0x4f0/0x5a8)
[   22.365019] [<c05ad2cc>] (ipv6_dev_mc_inc) from [<c058f0d0>]
(addrconf_dad_work+0x21c/0x33c)
[   22.373460] [<c058f0d0>] (addrconf_dad_work) from [<c0042850>]
(process_one_work+0x214/0x8d0)
[   22.381986] [<c0042850>] (process_one_work) from [<c0042f54>]
(worker_thread+0x48/0x4bc)
[   22.390071] [<c0042f54>] (worker_thread) from [<c004868c>]
(kthread+0xf0/0x108)
[   22.397381] [<c004868c>] (kthread) from [<c0010038>]

Trace related to incorrect usage of mutex inside ndo_set_rx_mode

[   24.086066] BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:616
[   24.094506] in_atomic(): 1, irqs_disabled(): 0, pid: 1682, name: ifconfig
[   24.101291] INFO: lockdep is turned off.
[   24.105203] Preemption disabled at:[<  (null)>]   (null)
[   24.110511]
[   24.112005] CPU: 2 PID: 1682 Comm: ifconfig Tainted: G        W
4.1.6-01265-gfb1e101 #1
[   24.120518] Hardware name: Keystone
[   24.124018] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
(show_stack+0x10/0x14)
[   24.131772] [<c0013cbc>] (show_stack) from [<c05ff450>]
(dump_stack+0x84/0xc4)
[   24.138989] [<c05ff450>] (dump_stack) from [<c06023b0>]
(mutex_lock_nested+0x28/0x4a8)
[   24.146908] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
(netcp_set_rx_mode+0x160/0x210)
[   24.155523] [<c03e9840>] (netcp_set_rx_mode) from [<c04c483c>]
(dev_set_rx_mode+0x1c/0x28)
[   24.163787] [<c04c483c>] (dev_set_rx_mode) from [<c04c490c>]
(__dev_open+0xc4/0x10c)
[   24.171531] [<c04c490c>] (__dev_open) from [<c04c4b74>]
(__dev_change_flags+0x94/0x144)
[   24.179528] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
(dev_change_flags+0x18/0x48)
[   24.187966] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
(devinet_ioctl+0x6dc/0x7e4)
[   24.196145] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
(sock_ioctl+0x1d0/0x2a8)
[   24.203803] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
(do_vfs_ioctl+0x41c/0x688)
[   24.211373] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
(SyS_ioctl+0x34/0x5c)
[   24.218676] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
(ret_fast_syscall+0x0/0x54)
[   24.227156] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Jonathan Liu reports that the recent addition of CPU_SW_DOMAIN_PAN
causes wpa_supplicant to die due to the following kernel oops:

Unhandled fault: page domain fault (0x81b) at 0x001017a2
pgd = ee1b8000
[001017a2] *pgd=6ebee831, *pte=6c35475f, *ppte=6c354c7f
Internal error: : 81b [#1] SMP ARM
Modules linked in: rt2800usb rt2x00usb rt2800librt2x00lib crc_ccitt mac80211
CPU: 1 PID: 202 Comm: wpa_supplicant Not tainted 4.3.0-rc2 #1
Hardware name: Allwinner sun7i (A20) Family
task: ec872f80 ti: ee364000 task.ti: ee364000
PC is at do_alignment_ldmstm+0x1d4/0x238
LR is at 0x0
pc : [<c001d1d8>]    lr : [<00000000>]    psr: 600c0113
sp : ee365e18  ip : 00000000  fp : 00000002
r10: 001017a2  r9 : 00000002  r8 : 001017aa
r7 : ee365fb0  r6 : e8820018  r5 : 001017a2  r4 : 00000003
r3 : d49e30e0  r2 : 00000000  r1 : ee365fbc  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none[   34.393106] Control: 10c5387d  Table: 6e1b806a  DAC: 00000051
Process wpa_supplicant (pid: 202, stack limit = 0xee364210)
Stack: (0xee365e18 to 0xee366000)
...
[<c001d1d8>] (do_alignment_ldmstm) from [<c001d510>] (do_alignment+0x1f0/0x904)
[<c001d510>] (do_alignment) from [<c00092a0>] (do_DataAbort+0x38/0xb4)
[<c00092a0>] (do_DataAbort) from [<c0013d7c>] (__dabt_usr+0x3c/0x40)
Exception stack(0xee365fb0 to 0xee365ff8)
5fa0:                                     00000000 56c728c0 001017a2 d49e30e0
5fc0: 775448d2 597d4e74 00200800 7a9e1625 00802001 00000021 b6deec84 00000100
5fe0: 08020200 be9f4f20 0c0b0d0a b6d9b3e0 600c0010 ffffffff
Code: e1a0a005 e1a0000c 1affffe8 e5913000 (e4ea3001)
---[ end trace 0acd3882fcfdf9dd ]---

This is caused by the alignment handler not being fixed up for the
uaccess changes, and userspace issuing an unaligned LDM instruction.
So, fix the problem by adding the necessary fixups.

Reported-by: Jonathan Liu <net147@gmail.com>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
shenki pushed a commit that referenced this pull request Nov 13, 2015
kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the
memslots array (SRCU protected).  Using a mini SRCU critical section
is ugly, and adding it to kvm_arch_vcpu_create doesn't work because
the VMX vcpu_create callback calls synchronize_srcu.

Fixes this lockdep splat:

===============================
[ INFO: suspicious RCU usage. ]
4.3.0-rc1+ #1 Not tainted
-------------------------------
include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage!

other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-i38/17000:
 #0:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm]

[...]
Call Trace:
 dump_stack+0x4e/0x84
 lockdep_rcu_suspicious+0xfd/0x130
 kvm_zap_gfn_range+0x188/0x1a0 [kvm]
 kvm_set_cr0+0xde/0x1e0 [kvm]
 init_vmcb+0x760/0xad0 [kvm_amd]
 svm_create_vcpu+0x197/0x250 [kvm_amd]
 kvm_arch_vcpu_create+0x47/0x70 [kvm]
 kvm_vm_ioctl+0x302/0x7e0 [kvm]
 ? __lock_is_held+0x51/0x70
 ? __fget+0x101/0x210
 do_vfs_ioctl+0x2f4/0x560
 ? __fget_light+0x29/0x90
 SyS_ioctl+0x4c/0x90
 entry_SYSCALL_64_fastpath+0x16/0x73

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
shenki pushed a commit that referenced this pull request Nov 13, 2015
ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection.
ppp_create_interface() must then lock these mutexes in that same order
to avoid possible deadlock.

[  120.880011] ======================================================
[  120.880011] [ INFO: possible circular locking dependency detected ]
[  120.880011] 4.2.0 #1 Not tainted
[  120.880011] -------------------------------------------------------
[  120.880011] ppp-apitest/15827 is trying to acquire lock:
[  120.880011]  (&pn->all_ppp_mutex){+.+.+.}, at: [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
[  120.880011]
[  120.880011] but task is already holding lock:
[  120.880011]  (rtnl_mutex){+.+.+.}, at: [<ffffffff812e4255>] rtnl_lock+0x12/0x14
[  120.880011]
[  120.880011] which lock already depends on the new lock.
[  120.880011]
[  120.880011]
[  120.880011] the existing dependency chain (in reverse order) is:
[  120.880011]
[  120.880011] -> #1 (rtnl_mutex){+.+.+.}:
[  120.880011]        [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
[  120.880011]        [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
[  120.880011]        [<ffffffff812e4255>] rtnl_lock+0x12/0x14
[  120.880011]        [<ffffffff812d9d94>] register_netdev+0x11/0x27
[  120.880011]        [<ffffffffa0147b17>] ppp_ioctl+0x289/0xc98 [ppp_generic]
[  120.880011]        [<ffffffff8113b367>] do_vfs_ioctl+0x4ea/0x532
[  120.880011]        [<ffffffff8113b3fd>] SyS_ioctl+0x4e/0x7d
[  120.880011]        [<ffffffff813ad7d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[  120.880011]
[  120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}:
[  120.880011]        [<ffffffff8107334e>] __lock_acquire+0xb07/0xe76
[  120.880011]        [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
[  120.880011]        [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
[  120.880011]        [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
[  120.880011]        [<ffffffff812d5263>] rollback_registered_many+0x19e/0x252
[  120.880011]        [<ffffffff812d5381>] rollback_registered+0x29/0x38
[  120.880011]        [<ffffffff812d53fa>] unregister_netdevice_queue+0x6a/0x77
[  120.880011]        [<ffffffffa0146a94>] ppp_release+0x42/0x79 [ppp_generic]
[  120.880011]        [<ffffffff8112d9f6>] __fput+0xec/0x192
[  120.880011]        [<ffffffff8112dacc>] ____fput+0x9/0xb
[  120.880011]        [<ffffffff8105447a>] task_work_run+0x66/0x80
[  120.880011]        [<ffffffff81001801>] prepare_exit_to_usermode+0x8c/0xa7
[  120.880011]        [<ffffffff81001900>] syscall_return_slowpath+0xe4/0x104
[  120.880011]        [<ffffffff813ad931>] int_ret_from_sys_call+0x25/0x9f
[  120.880011]
[  120.880011] other info that might help us debug this:
[  120.880011]
[  120.880011]  Possible unsafe locking scenario:
[  120.880011]
[  120.880011]        CPU0                    CPU1
[  120.880011]        ----                    ----
[  120.880011]   lock(rtnl_mutex);
[  120.880011]                                lock(&pn->all_ppp_mutex);
[  120.880011]                                lock(rtnl_mutex);
[  120.880011]   lock(&pn->all_ppp_mutex);
[  120.880011]
[  120.880011]  *** DEADLOCK ***

Fixes: 8cb775b ("ppp: fix device unregistration upon netns deletion")
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Andrey reported a panic:

[ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
[ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
[ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
[ 7249.865637] Oops: 0000 [#1]
...
[ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.3.0-999-generic #201509220155
[ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014  08/02/2006
[ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
[ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
[ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
[ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
[ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
[ 7249.867077]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
[ 7249.867141] Stack:
[ 7249.867165]  320310ee 00000000 00000042 320310ee 00000000 c1aeca00
f3920240 f0c69180
[ 7249.867268]  f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000
e659266c f483ba54
[ 7249.867361]  8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0
c16b0e00 00000064
[ 7249.867448] Call Trace:
[ 7249.867494]  [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
[ 7249.867534]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867576]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867615]  [<c16b0e00>] ? icmp_send+0xa0/0x380
[ 7249.867648]  [<c16b102f>] icmp_send+0x2cf/0x380
[ 7249.867681]  [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
[ 7249.867714]  [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
[ 7249.867746]  [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
[ 7249.867780]  [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0
[nf_conntrack]
[ 7249.867838]  [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
[ 7249.867889]  [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
[ 7249.867933]  [<c16776a1>] nf_iterate+0x71/0x80
[ 7249.867970]  [<c1677715>] nf_hook_slow+0x65/0xc0
[ 7249.868002]  [<c1681811>] __ip_local_out_sk+0xc1/0xd0
[ 7249.868034]  [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
[ 7249.868066]  [<c1681836>] ip_local_out_sk+0x16/0x30
[ 7249.868097]  [<c1684054>] ip_send_skb+0x14/0x80
[ 7249.868129]  [<c16840f4>] ip_push_pending_frames+0x34/0x40
[ 7249.868163]  [<c16844a2>] ip_send_unicast_reply+0x282/0x310
[ 7249.868196]  [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
[ 7249.868227]  [<c16a1b63>] tcp_v4_rcv+0x323/0x990
[ 7249.868257]  [<c16776a1>] ? nf_iterate+0x71/0x80
[ 7249.868289]  [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
[ 7249.868322]  [<c167df4c>] ip_local_deliver+0x4c/0xa0
[ 7249.868353]  [<c167dba0>] ? ip_rcv_finish+0x390/0x390
[ 7249.868384]  [<c167d88c>] ip_rcv_finish+0x7c/0x390
[ 7249.868415]  [<c167e280>] ip_rcv+0x2e0/0x420
...

Prior to the VRF change the oif was not set in the flow struct, so the
VRF support should really have only added the vrf_master_ifindex lookup.

Fixes: 613d09b ("net: Use VRF device index for lookups on TX")
Cc: Andrey Melnikov <temnota.am@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
shenki pushed a commit that referenced this pull request Nov 13, 2015
it was broken by 35d5d20
"mtd: mxc_nand: cleanup copy_spare function"

else we get the following error :
[   22.709507] ubi0: attaching mtd3
[   23.613470] ubi0: scanning is finished
[   23.617278] ubi0: empty MTD device detected
[   23.623219] Unhandled fault: imprecise external abort (0x1c06) at 0x9e62f0ec
[   23.630291] pgd = 9df80000
[   23.633005] [9e62f0ec] *pgd=8e60041e(bad)
[   23.637064] Internal error: : 1c06 [#1] SMP ARM
[   23.641605] Modules linked in:
[   23.644687] CPU: 0 PID: 99 Comm: ubiattach Not tainted 4.2.0-dirty #22
[   23.651222] Hardware name: Freescale i.MX53 (Device Tree Support)
[   23.657322] task: 9e687300 ti: 9dcfc000 task.ti: 9dcfc000
[   23.662744] PC is at memcpy16_toio+0x4c/0x74
[   23.667026] LR is at mxc_nand_command+0x484/0x640
[   23.671739] pc : [<803f9c08>]    lr : [<803faeb0>]    psr: 60000013
[   23.671739] sp : 9dcfdb10  ip : 9e62f0ea  fp : 9dcfdb1c
[   23.683222] r10: a09c1000  r9 : 0000001a  r8 : ffffffff
[   23.688453] r7 : ffffffff  r6 : 9e674810  r5 : 9e674810  r4 : 000000b6
[   23.694985] r3 : a09c16a4  r2 : a09c16a4  r1 : a09c16a4  r0 : 0000ffff
[   23.701521] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   23.708662] Control: 10c5387d  Table: 8df80019  DAC: 00000015
[   23.714413] Process ubiattach (pid: 99, stack limit = 0x9dcfc210)
[   23.720514] Stack: (0x9dcfdb10 to 0x9dcfe000)
[   23.724881] db00:                                     9dcfdb6c 9dcfdb20 803faeb0 803f9bc8
[   23.733069] db20: 803f227c 803f9b74 ffffffff 9e674810 9e674810 9e674810 00000040 9e62f010
[   23.741255] db40: 803faa2c 9e674b40 9e674810 803faa2c 00000400 803faa2c 00000000 9df42800
[   23.749441] db60: 9dcfdb9c 9dcfdb70 803f2024 803faa38 9e4201cc 00000000 803f0a78 9e674b40
[   23.757627] db80: 803f1f80 9e674810 00000400 00000400 9dcfdc14 9dcfdba0 803f3bd8 803f1f8c
[   23.765814] dba0: 9e4201cc 00000000 00000580 00000000 00000000 800718c0 0000007f 00001000
[   23.774000] dbc0: 9df42800 000000e0 00000000 00000000 9e4201cc 00000000 00000000 00000000
[   23.782186] dbe0: 00000580 00000580 00000000 9e674810 9dcfdc20 9dcfdce8 9df42800 00580000
[   23.790372] dc00: 00000000 00000400 9dcfdc6c 9dcfdc18 803f3f94 803f39a4 9dcfdc20 00000000
[   23.798558] dc20: 00000000 00000400 00000000 00000000 00000000 00000000 9df42800 00000000
[   23.806744] dc40: 9dcfdd0c 00580000 00000000 00000400 00000000 9df42800 9dee1000 9d802000
[   23.814930] dc60: 9dcfdc94 9dcfdc70 803eb63c 803f3f38 00000400 9dcfdce8 9df42800 dead4ead
[   23.823116] dc80: 803eb5f4 00000000 9dcfdcc4 9dcfdc98 803e82ac 803eb600 00000400 9dcfdce8
[   23.831301] dca0: 9df42800 00000400 9dee0000 00000000 00000400 00000000 9dcfdd1c 9dcfdcc8
[   23.839488] dcc0: 80406048 803e8230 00000400 9dcfdce8 9df42800 9dcfdc78 00000008 00000000
[   23.847673] dce0: 00000000 00000000 00000000 00000004 00000000 9df42800 9dee0000 00000000
[   23.855859] dd00: 9d802030 00000000 9dc8b214 9d802000 9dcfdd44 9dcfdd20 804066cc 80405f50
[   23.864047] dd20: 00000400 9dc8b200 9d802030 9df42800 9dee0000 9dc8b200 9dcfdd84 9dcfdd48
[   23.872233] dd40: 8040a544 804065ac 9e401c80 000080d0 9dcfdd84 00000001 800fc828 9df42400
[   23.880418] dd60: 00000000 00000080 9dc8b200 9dc8b200 9dc8b200 9dee0000 9dcfdddc 9dcfdd8
[   23.888605] dd80: 803fb560 8040a440 9dcfddc4 9dcfdd98 800f1428 9dee1000 a0acf000 00000000
[   23.896792] dda0: 00000000 ffffffff 00000006 00000000 9dee0000 9dee0000 00005600 00000080
[   23.904979] ddc0: 9dc8b200 a0acf000 9dc8b200 8112514c 9dcfde24 9dcfdde0 803fc08c 803fb4f0
[   23.913165] dde0: 9e401c80 00000013 9dcfde04 9dcfddf8 8006bbf8 8006ba00 9dcfde24 00000000
[   23.921351] de00: 9dee0000 00000065 9dee0000 00000001 9dc8b200 8112514c 9dcfde84 9dcfde28
[   23.929538] de20: 8040afa0 803fb948 ffffffff 00000000 9dc8b214 9dcfde40 800f1428 800f11dc
[   23.937724] de40: 9dc8b21c 9dc8b20c 9dc8b204 9dee1000 9dc8b214 8069bb60 fffff000 fffff000
[   23.945911] de60: 9e7b5400 00000000 9dee0000 9dee1000 00001000 9e7b5400 9dcfdecc 9dcfde88
[   23.954097] de80: 803ff1bc 8040a630 9dcfdea4 9dcfde98 00000800 00000800 9dcfdecc 9dcfdea8
[   23.962284] dea0: 803e8f6c 00000000 7e87ab70 9e7b5400 80113e30 00000003 9dcfc000 00000000
[   23.970470] dec0: 9dcfdf04 9dcfded0 804008cc 803feb98 ffffffff 00000003 00000000 00000000
[   23.978656] dee0: 00000000 00000000 9e7cb000 9dc193e0 7e87ab70 9dd92140 9dcfdf7c 9dcfdf08
[   23.986842] df00: 80113b5c 8040080c 800fbed8 8006bbf0 9e7cb000 00000003 9e7cb000 9dd92140
[   23.995029] df20: 9dc193e0 9dd92148 9dcfdf4c 9dcfdf38 8011022c 800fbe78 8000f9cc 9e687300
[   24.003216] df40: 9dcfdf6c 9dcfdf50 8011f798 8007ffe8 7e87ab70 9dd92140 00000003 9dd92140
[   24.011402] df60: 40186f40 7e87ab70 9dcfc000 00000000 9dcfdfa4 9dcfdf80 80113e30 8011373c
[   24.019588] df80: 7e87ab70 7e87ab70 7e87aea9 00000036 8000fb8 9dcfc000 00000000 9dcfdfa8
[   24.027775] dfa0: 8000f9a0 80113e00 7e87ab70 7e87ab70 00000003 40186f40 7e87ab70 00000000
[   24.035962] dfc0: 7e87ab70 7e87ab70 7e87aea9 00000036 00000000 00000000 76fd1f70 00000000
[   24.044148] dfe0: 76f80f8c 7e87ab28 00009810 76f80fc4 60000010 00000003 00000000 00000000
[   24.052328] Backtrace:
[   24.054806] [<803f9bbc>] (memcpy16_toio) from [<803faeb0>] (mxc_nand_command+0x484/0x640)
[   24.062996] [<803faa2c>] (mxc_nand_command) from [<803f2024>] (nand_write_page+0xa4/0x154)
[   24.071264]  r10:9df42800 r9:00000000 r8:803faa2c r7:00000400 r6:803faa2c r5:9e674810
[   24.079180]  r4:9e674b40
[   24.081738] [<803f1f80>] (nand_write_page) from [<803f3bd8>] (nand_do_write_ops+0x240/0x444)
[   24.090180]  r8:00000400 r7:00000400 r6:9e674810 r5:803f1f80 r4:9e674b40
[   24.096970] [<803f3998>] (nand_do_write_ops) from [<803f3f94>] (nand_write+0x68/0x88)
[   24.104804]  r10:00000400 r9:00000000 r8:00580000 r7:9df42800 r6:9dcfdce8 r5:9dcfdc20
[   24.112719]  r4:9e674810
[   24.115287] [<803f3f2c>] (nand_write) from [<803eb63c>] (part_write+0x48/0x50)
[   24.122514]  r10:9d802000 r9:9dee1000 r8:9df42800 r7:00000000 r6:00000400 r5:00000000
[   24.130429]  r4:00580000
[   24.132989] [<803eb5f4>] (part_write) from [<803e82ac>] (mtd_write+0x88/0xa0)
[   24.140129]  r5:00000000 r4:803eb5f4
[   24.143748] [<803e8224>] (mtd_write) from [<80406048>] (ubi_io_write+0x104/0x65c)
[   24.151235]  r7:00000000 r6:00000400 r5:00000000 r4:9dee0000
[   24.156968] [<80405f44>] (ubi_io_write) from [<804066cc>] (ubi_io_write_ec_hdr+0x12c/0x190)
[   24.165323]  r10:9d802000 r9:9dc8b214 r8:00000000 r7:9d802030 r6:00000000 r5:9dee0000
[   24.173239]  r4:9df42800
[   24.175798] [<804065a0>] (ubi_io_write_ec_hdr) from [<8040a544>] (ubi_early_get_peb+0x110/0x1f0)
[   24.184587]  r6:9dc8b200 r5:9dee0000 r4:9df42800
[   24.189262] [<8040a434>] (ubi_early_get_peb) from [<803fb560>] (create_vtbl+0x7c/0x238)
[   24.197271]  r10:9dee0000 r9:9dc8b200 r8:9dc8b200 r7:9dc8b200 r6:00000080 r5:00000000
[   24.205187]  r4:9df42400
[   24.207746] [<803fb4e4>] (create_vtbl) from [<803fc08c>] (ubi_read_volume_table+0x750/0xa64)
[   24.216187]  r10:8112514c r9:9dc8b200 r8:a0acf000 r7:9dc8b200 r6:00000080 r5:00005600
[   24.224103]  r4:9dee0000
[   24.226662] [<803fb93c>] (ubi_read_volume_table) from [<8040afa0>] (ubi_attach+0x97c/0x152c)
[   24.235103]  r10:8112514c r9:9dc8b200 r8:00000001 r7:9dee0000 r6:00000065 r5:9dee0000
[   24.243018]  r4:00000000
[   24.245579] [<8040a624>] (ubi_attach) from [<803ff1bc>] (ubi_attach_mtd_dev+0x630/0xbac)
[   24.253673]  r10:9e7b5400 r9:00001000 r8:9dee1000 r7:9dee0000 r6:00000000 r5:9e7b5400
[   24.261588]  r4:fffff000
[   24.264148] [<803feb8c>] (ubi_attach_mtd_dev) from [<804008cc>] (ctrl_cdev_ioctl+0xcc/0x1cc)
[   24.272589]  r10:00000000 r9:9dcfc000 r8:00000003 r7:80113e30 r6:9e7b5400 r5:7e87ab70
[   24.280505]  r4:00000000
[   24.283070] [<80400800>] (ctrl_cdev_ioctl) from [<80113b5c>] (do_vfs_ioctl+0x42c/0x6c4)
[   24.291077]  r6:9dd92140 r5:7e87ab70 r4:9dc193e0
[   24.295753] [<80113730>] (do_vfs_ioctl) from [<80113e30>] (SyS_ioctl+0x3c/0x64)
[   24.303066]  r10:00000000 r9:9dcfc000 r8:7e87ab70 r7:40186f40 r6:9dd92140 r5:00000003
[   24.310981]  r4:9dd92140
[   24.313549] [<80113df4>] (SyS_ioctl) from [<8000f9a0>] (ret_fast_syscall+0x0/0x54)
[   24.321123]  r9:9dcfc000 r8:8000fb84 r7:00000036 r6:7e87aea9 r5:7e87ab70 r4:7e87ab70
[   24.328957] Code: e1c300b0 e1510002 e1a03001 1afffff9 (e89da800)
[   24.335066] ---[ end trace ab1cb17887f21bbb ]---
[   24.340249] Unhandled fault: imprecise external abort (0x1c06) at 0x7ee8bcf0
[   24.347310] pgd = 9df3c000
[   24.350023] [7ee8bcf0] *pgd=8dcbf831, *pte=8eb3334f, *ppte=8eb3383f
Segmentation fault

Fixes: 35d5d20 ("mtd: mxc_nand: cleanup copy_spare function")
Signed-off-by: Eric Bénard <eric@eukrea.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Baruch Siach <baruch@tkos.co.il>
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
shenki pushed a commit that referenced this pull request Nov 13, 2015
…t initialized.

In case something goes wrong with power well initialization we were calling
intel_prepare_ddi during boot while encoder list isnt't initilized.

[    9.618747] i915 0000:00:02.0: Invalid ROM contents
[    9.631446] [drm] failed to find VBIOS tables
[    9.720036] BUG: unable to handle kernel NULL pointer dereference at 00000000
00000058
[    9.721986] IP: [<ffffffffa014eb72>] ddi_get_encoder_port+0x82/0x190 [i915]
[    9.723736] PGD 0
[    9.724286] Oops: 0000 [#1] PREEMPT SMP
[    9.725386] Modules linked in: intel_powerclamp snd_hda_intel(+) coretemp crc
32c_intel snd_hda_codec snd_hda_core serio_raw snd_pcm snd_timer i915(+) parport
_pc parport pinctrl_sunrisepoint pinctrl_intel nfsd nfs_acl
[    9.730635] CPU: 0 PID: 497 Comm: systemd-udevd Not tainted 4.3.0-rc2-eywa-10
967-g72de2cfd-dirty #2
[    9.732785] Hardware name: Intel Corporation Cannonlake Client platform/Skyla
ke DT DDR4 RVP8, BIOS CNLSE2R1.R00.X021.B00.1508040310 08/04/2015
[    9.735785] task: ffff88008a704700 ti: ffff88016a1ac000 task.ti: ffff88016a1a
c000
[    9.737584] RIP: 0010:[<ffffffffa014eb72>]  [<ffffffffa014eb72>] ddi_get_enco
der_port+0x82/0x190 [i915]
[    9.739934] RSP: 0000:ffff88016a1af710  EFLAGS: 00010296
[    9.741184] RAX: 000000000000004e RBX: ffff88008a9edc98 RCX: 0000000000000001
[    9.742934] RDX: 000000000000004e RSI: ffffffff81fc1e82 RDI: 00000000ffffffff
[    9.744634] RBP: ffff88016a1af730 R08: 0000000000000000 R09: 0000000000000578
[    9.746333] R10: 0000000000001065 R11: 0000000000000578 R12: fffffffffffffff8
[    9.748033] R13: ffff88016a1af7a8 R14: ffff88016a1af794 R15: 0000000000000000
[    9.749733] FS:  00007eff2e1e07c0(0000) GS:ffff88016fc00000(0000) knlGS:00000
00000000000
[    9.751683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.753083] CR2: 0000000000000058 CR3: 000000016922b000 CR4: 00000000003406f0
[    9.754782] Stack:
[    9.755332]  ffff88008a9edc98 ffff88008a9ed800 ffffffffa01d07b0 00000000fffb9
09e
[    9.757232]  ffff88016a1af7d8 ffffffffa0154ea7 0000000000000246 ffff88016a370
080
[    9.759182]  ffff88016a370080 ffff88008a9ed800 0000000000000246 ffff88008a9ed
c98
[    9.761132] Call Trace:
[    9.761782]  [<ffffffffa0154ea7>] intel_prepare_ddi+0x67/0x860 [i915]
[    9.763332]  [<ffffffff81a56996>] ? _raw_spin_unlock_irqrestore+0x26/0x40
[    9.765031]  [<ffffffffa00fad01>] ? gen9_read32+0x141/0x360 [i915]
[    9.766531]  [<ffffffffa00b43e1>] skl_set_power_well+0x431/0xa80 [i915]
[    9.768181]  [<ffffffffa00b4a63>] skl_power_well_enable+0x13/0x20 [i915]
[    9.769781]  [<ffffffffa00b2188>] intel_power_well_enable+0x28/0x50 [i915]
[    9.771481]  [<ffffffffa00b4d52>] intel_display_power_get+0x92/0xc0 [i915]
[    9.773180]  [<ffffffffa00b4fcb>] intel_display_set_init_power+0x3b/0x40 [i91
5]
[    9.774980]  [<ffffffffa00b5170>] intel_power_domains_init_hw+0x120/0x520 [i9
15]
[    9.776780]  [<ffffffffa0194c61>] i915_driver_load+0xb21/0xf40 [i915]

So let's protect this case.

My first attempt was to remove the intel_prepare_ddi, but Daniel had pointed out
this is really needed to restore those registers values. And Imre pointed out
that this case was without the flag protection and this was actually where things
were going bad. So I've just checked and this indeed solves my issue.

The regressing intel_prepare_ddi call was added in

commit 1d2b952
Author: Damien Lespiau <damien.lespiau@intel.com>
Date:   Fri Mar 6 18:50:53 2015 +0000

    drm/i915/skl: Restore the DDI translation tables when enabling PW1

Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[Jani: regression reference]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
shenki pushed a commit that referenced this pull request Nov 13, 2015
…ut event

A case can occur when sctp_accept() is called by the user during
a heartbeat timeout event after the 4-way handshake.  Since
sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
the listening socket but released with the new association socket.
The result is a deadlock on any future attempts to take the listening
socket lock.

Note that this race can occur with other SCTP timeouts that take
the bh_lock_sock() in the event sctp_accept() is called.

 BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
 ...
 RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
 RSP: 0018:ffff880028323b20  EFLAGS: 00000206
 RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
 RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
 R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
 FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
 CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
 Stack:
 ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
 <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
 <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
 Call Trace:
 <IRQ>
 [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
 [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
 [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff81497255>] ? ip_rcv+0x275/0x350
 [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
 ...

With lockdep debugging:

 =====================================
 [ BUG: bad unlock balance detected! ]
 -------------------------------------
 CslRx/12087 is trying to release lock (slock-AF_INET) at:
 [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
 but there are no more locks to release!

 other info that might help us debug this:
 2 locks held by CslRx/12087:
 #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
 #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]

Ensure the socket taken is also the same one that is released by
saving a copy of the socket before entering the timeout event
critical section.

Signed-off-by: Karl Heiss <kheiss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Fixes the following lockdep splat:
[    1.244527] =============================================
[    1.245193] [ INFO: possible recursive locking detected ]
[    1.245193] 4.2.0-rc1+ #37 Not tainted
[    1.245193] ---------------------------------------------
[    1.245193] cp/742 is trying to acquire lock:
[    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[    1.245193]
[    1.245193] but task is already holding lock:
[    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[    1.245193]
[    1.245193] other info that might help us debug this:
[    1.245193]  Possible unsafe locking scenario:
[    1.245193]
[    1.245193]        CPU0
[    1.245193]        ----
[    1.245193]   lock(&sb->s_type->i_mutex_key#9);
[    1.245193]   lock(&sb->s_type->i_mutex_key#9);
[    1.245193]
[    1.245193]  *** DEADLOCK ***
[    1.245193]
[    1.245193]  May be due to missing lock nesting notation
[    1.245193]
[    1.245193] 2 locks held by cp/742:
[    1.245193]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ad37f>] mnt_want_write+0x1f/0x50
[    1.245193]  #1:  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[    1.245193]
[    1.245193] stack backtrace:
[    1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37
[    1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
[    1.245193]  ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
[    1.245193]  ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
[    1.245193]  000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
[    1.245193] Call Trace:
[    1.245193]  [<ffffffff814f6f49>] dump_stack+0x4c/0x65
[    1.245193]  [<ffffffff810b56c5>] ? console_unlock+0x1c5/0x510
[    1.245193]  [<ffffffff810a150d>] __lock_acquire+0x1a6d/0x1ea0
[    1.245193]  [<ffffffff8109fa78>] ? __lock_is_held+0x58/0x80
[    1.245193]  [<ffffffff810a1a93>] lock_acquire+0xd3/0x270
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff814fc83b>] mutex_lock_nested+0x6b/0x3a0
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff8128e286>] ubifs_create+0xa6/0x1f0
[    1.245193]  [<ffffffff81198e7f>] ? path_openat+0x3af/0x1280
[    1.245193]  [<ffffffff81195d15>] vfs_create+0x95/0xc0
[    1.245193]  [<ffffffff8119929c>] path_openat+0x7cc/0x1280
[    1.245193]  [<ffffffff8109ffe3>] ? __lock_acquire+0x543/0x1ea0
[    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[    1.245193]  [<ffffffff81088c00>] ? calc_global_load_tick+0x60/0x90
[    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[    1.245193]  [<ffffffff8119ac55>] do_filp_open+0x75/0xd0
[    1.245193]  [<ffffffff814ffd86>] ? _raw_spin_unlock+0x26/0x40
[    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[    1.245193]  [<ffffffff81189bd9>] do_sys_open+0x129/0x200
[    1.245193]  [<ffffffff81189cc9>] SyS_open+0x19/0x20
[    1.245193]  [<ffffffff81500717>] entry_SYSCALL_64_fastpath+0x12/0x6f

While the lockdep splat is a false positive, becuase path_openat holds i_mutex
of the parent directory and ubifs_init_security() tries to acquire i_mutex
of a new inode, it reveals that taking i_mutex in ubifs_init_security() is
in vain because it is only being called in the inode allocation path
and therefore nobody else can see the inode yet.

Cc: stable@vger.kernel.org # 3.20-
Reported-and-tested-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-and-tested-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: dedekind1@gmail.com
shenki pushed a commit that referenced this pull request Nov 13, 2015
My system keeps crashing with below message. vmstat_update() schedules a delayed
work in current cpu and expects the work runs in the cpu.
schedule_delayed_work() is expected to make delayed work run in local cpu. The
problem is timer can be migrated with NO_HZ. __queue_work() queues work in
timer handler, which could run in a different cpu other than where the delayed
work is scheduled. The end result is the delayed work runs in different cpu.
The patch makes __queue_delayed_work records local cpu earlier. Where the timer
runs doesn't change where the work runs with the change.

[   28.010131] ------------[ cut here ]------------
[   28.010609] kernel BUG at ../mm/vmstat.c:1392!
[   28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   28.011860] Modules linked in:
[   28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G        W4.3.0-rc3+ torvalds#634
[   28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[   28.014160] Workqueue: events vmstat_update
[   28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
[   28.015445] RIP: 0010:[<ffffffff8115f921>]  [<ffffffff8115f921>]vmstat_update+0x31/0x80
[   28.016282] RSP: 0018:ffff8800ba42fd80  EFLAGS: 00010297
[   28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX:0000000000000000
[   28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI:ffffffff81f4df8d
[   28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09:0000000000000000
[   28.019169] R10: 0000000000000000 R11: 0000000000000121 R12:ffff8800baa9f640
[   28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15:0000000000000000
[   28.020071] FS:  0000000000000000(0000) GS:ffff88011a800000(0000)knlGS:0000000000000000
[   28.020071] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4:00000000000006f0
[   28.020071] Stack:
[   28.020071]  ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00ffffffff8106bd88
[   28.020071]  ffffffff8106bd0b 0000000000000096 0000000000000000ffffffff82f9b1e8
[   28.020071]  ffffffff829f0b10 0000000000000000 ffffffff81f18460ffff88011a81e340
[   28.020071] Call Trace:
[   28.020071]  [<ffffffff8106bd88>] process_one_work+0x1c8/0x540
[   28.020071]  [<ffffffff8106bd0b>] ? process_one_work+0x14b/0x540
[   28.020071]  [<ffffffff8106c214>] worker_thread+0x114/0x460
[   28.020071]  [<ffffffff8106c100>] ? process_one_work+0x540/0x540
[   28.020071]  [<ffffffff81071bf8>] kthread+0xf8/0x110
[   28.020071]  [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200
[   28.020071]  [<ffffffff81a6522f>] ret_from_fork+0x3f/0x70
[   28.020071]  [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v2.6.31+
shenki pushed a commit that referenced this pull request Nov 13, 2015
Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).

[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444

The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:

 - The lower bound of the stack is not the start of the stack
   page. It's the start of the stack page plus sizeof (struct
   thread_info)

 - The upper bound must be:

       top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).

   The 2 * sizeof(unsigned long) is required because the stack pointer
   points at the frame pointer. The layout on the stack is: ... IP FP
   ... IP FP. So we need to make sure that both IP and FP are in the
   bounds.

Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.

Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.

Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.

Add proper comments while at it.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
shenki pushed a commit that referenced this pull request Nov 13, 2015
…fy a fault

SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

	I think I find a linux bug, I have the test cases is constructed. I
	can stable recurring problems in fedora22(4.0.4) kernel version,
	arch for x86_64.  I construct transparent huge page, when the parent
	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
	huge page area, it has the opportunity to lead to huge page copy on
	write failure, and then it will munmap the child corresponding mmap
	area, but then the child mmap area with VM_MAYSHARE attributes, child
	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
	functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g.  it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this

	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
	 pgoff 0 file ffff88106bdb9800 private_data           (null)
	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
	 ------------
	 kernel BUG at mm/hugetlb.c:462!
	 SMP
	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
	 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private.  During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered.  This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Since 9eb1e57
drm/dp/mst: make sure mst_primary mstb is valid in work function

we validate the mstb structs in the work function, and doing
that takes a reference. So we should never get here with the
work function running using the mstb device, only if the work
function hasn't run yet or is running for another mstb.

So we don't need to sync the work here, this was causing
lockdep spew as below.

[  +0.000160] =============================================
[  +0.000001] [ INFO: possible recursive locking detected ]
[  +0.000002] 3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1 Tainted: G        W      ------------
[  +0.000001] ---------------------------------------------
[  +0.000001] kworker/4:2/1262 is trying to acquire lock:
[  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b29a5>] flush_work+0x5/0x2e0
[  +0.000007]
but task is already holding lock:
[  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000004]
other info that might help us debug this:
[  +0.000001]  Possible unsafe locking scenario:

[  +0.000002]        CPU0
[  +0.000000]        ----
[  +0.000001]   lock((&mgr->work));
[  +0.000002]   lock((&mgr->work));
[  +0.000001]
 *** DEADLOCK ***

[  +0.000001]  May be due to missing lock nesting notation

[  +0.000002] 2 locks held by kworker/4:2/1262:
[  +0.000001]  #0:  (events_long){.+.+.+}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000004]  #1:  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000003]
stack backtrace:
[  +0.000003] CPU: 4 PID: 1262 Comm: kworker/4:2 Tainted: G        W      ------------   3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1
[  +0.000001] Hardware name: LENOVO 20EGS0R600/20EGS0R600, BIOS GNET71WW (2.19 ) 02/05/2015
[  +0.000008] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  +0.000001]  ffffffff82c26c90 00000000a527b914 ffff88046399bae8 ffffffff816fe04d
[  +0.000004]  ffff88046399bb58 ffffffff8110f47f ffff880461438000 0001009b840fc003
[  +0.000002]  ffff880461438a98 0000000000000000 0000000804dc26e1 ffffffff824a2c00
[  +0.000003] Call Trace:
[  +0.000004]  [<ffffffff816fe04d>] dump_stack+0x19/0x1b
[  +0.000004]  [<ffffffff8110f47f>] __lock_acquire+0x115f/0x1250
[  +0.000002]  [<ffffffff8110fd49>] lock_acquire+0x99/0x1e0
[  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
[  +0.000002]  [<ffffffff810b29ee>] flush_work+0x4e/0x2e0
[  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
[  +0.000004]  [<ffffffff81025905>] ? native_sched_clock+0x35/0x80
[  +0.000002]  [<ffffffff81025959>] ? sched_clock+0x9/0x10
[  +0.000002]  [<ffffffff810da1f5>] ? local_clock+0x25/0x30
[  +0.000002]  [<ffffffff8110dca9>] ? mark_held_locks+0xb9/0x140
[  +0.000003]  [<ffffffff810b4ed5>] ? __cancel_work_timer+0x95/0x160
[  +0.000002]  [<ffffffff810b4ee8>] __cancel_work_timer+0xa8/0x160
[  +0.000002]  [<ffffffff810b4fb0>] cancel_work_sync+0x10/0x20
[  +0.000007]  [<ffffffffa0160d17>] drm_dp_destroy_mst_branch_device+0x27/0x120 [drm_kms_helper]
[  +0.000006]  [<ffffffffa0163968>] drm_dp_mst_link_probe_work+0x78/0xa0 [drm_kms_helper]
[  +0.000002]  [<ffffffff810b5850>] process_one_work+0x220/0x710
[  +0.000002]  [<ffffffff810b57e4>] ? process_one_work+0x1b4/0x710
[  +0.000005]  [<ffffffff810b5e5b>] worker_thread+0x11b/0x3a0
[  +0.000003]  [<ffffffff810b5d40>] ? process_one_work+0x710/0x710
[  +0.000002]  [<ffffffff810beced>] kthread+0xed/0x100
[  +0.000003]  [<ffffffff810bec00>] ? insert_kthread_work+0x80/0x80
[  +0.000003]  [<ffffffff817121d8>] ret_from_fork+0x58/0x90

v2: add flush_work.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
shenki pushed a commit that referenced this pull request Nov 13, 2015
When function graph tracer is enabled, the following operation
will trigger panic:

mount -t debugfs nodev /sys/kernel
echo next_tgid > /sys/kernel/tracing/set_ftrace_filter
echo function_graph > /sys/kernel/tracing/current_tracer
ls /proc/

------------[ cut here ]------------
[  198.501417] Unable to handle kernel paging request at virtual address cb88537fdc8ba316
[  198.506126] pgd = ffffffc008f79000
[  198.509363] [cb88537fdc8ba316] *pgd=00000000488c6003, *pud=00000000488c6003, *pmd=0000000000000000
[  198.517726] Internal error: Oops: 94000005 [#1] SMP
[  198.518798] Modules linked in:
[  198.520582] CPU: 1 PID: 1388 Comm: ls Tainted: G
[  198.521800] Hardware name: linux,dummy-virt (DT)
[  198.522852] task: ffffffc0fa9e8000 ti: ffffffc0f9ab0000 task.ti: ffffffc0f9ab0000
[  198.524306] PC is at next_tgid+0x30/0x100
[  198.525205] LR is at return_to_handler+0x0/0x20
[  198.526090] pc : [<ffffffc0002a1070>] lr : [<ffffffc0000907c0>] pstate: 60000145
[  198.527392] sp : ffffffc0f9ab3d40
[  198.528084] x29: ffffffc0f9ab3d40 x28: ffffffc0f9ab0000
[  198.529406] x27: ffffffc000d6a000 x26: ffffffc000b786e8
[  198.530659] x25: ffffffc0002a1900 x24: ffffffc0faf16c00
[  198.531942] x23: ffffffc0f9ab3ea0 x22: 0000000000000002
[  198.533202] x21: ffffffc000d85050 x20: 0000000000000002
[  198.534446] x19: 0000000000000002 x18: 0000000000000000
[  198.535719] x17: 000000000049fa08 x16: ffffffc000242efc
[  198.537030] x15: 0000007fa472b54c x14: ffffffffff000000
[  198.538347] x13: ffffffc0fada84a0 x12: 0000000000000001
[  198.539634] x11: ffffffc0f9ab3d70 x10: ffffffc0f9ab3d70
[  198.540915] x9 : ffffffc0000907c0 x8 : ffffffc0f9ab3d40
[  198.542215] x7 : 0000002e330f08f0 x6 : 0000000000000015
[  198.543508] x5 : 0000000000000f08 x4 : ffffffc0f9835ec0
[  198.544792] x3 : cb88537fdc8ba316 x2 : cb88537fdc8ba306
[  198.546108] x1 : 0000000000000002 x0 : ffffffc000d85050
[  198.547432]
[  198.547920] Process ls (pid: 1388, stack limit = 0xffffffc0f9ab0020)
[  198.549170] Stack: (0xffffffc0f9ab3d40 to 0xffffffc0f9ab4000)
[  198.582568] Call trace:
[  198.583313] [<ffffffc0002a1070>] next_tgid+0x30/0x100
[  198.584359] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.585503] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.586574] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.587660] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.588896] Code: aa0003f5 2a0103f4 b4000102 91004043 (885f7c60)
[  198.591092] ---[ end trace 6a346f8f20949ac8 ]---

This is because when using function graph tracer, if the traced
function return value is in multi regs ([x0-x7]), return_to_handler
may corrupt them. So in return_to_handler, the parameter regs should
be protected properly.

Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Commit 1a3d595 ("MIPS: Tidy up FPU context switching") removed FP
context saving from the asm-written resume function in favour of reusing
existing code to perform the same task. However it only removed the FP
context saving code from the r4k_switch.S implementation of resume.
Octeon uses its own implementation in octeon_switch.S, so remove FP
context saving there too in order to prevent attempting to save context
twice. That formerly led to an exception from the second save as follows
because the FPU had already been disabled by the first save:

    do_cpu invoked from kernel context![#1]:
    CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.3.0-rc2-dirty #2
    task: 800000041f84a008 ti: 800000041f864000 task.ti: 800000041f864000
    $ 0   : 0000000000000000 0000000010008ce1 0000000000100000 ffffffffbfffffff
    $ 4   : 800000041f84a008 800000041f84ac08 800000041f84c000 0000000000000004
    $ 8   : 0000000000000001 0000000000000000 0000000000000000 0000000000000001
    $12   : 0000000010008ce3 0000000000119c60 0000000000000036 800000041f864000
    $16   : 800000041f84ac08 800000000792ce80 800000041f84a008 ffffffff81758b00
    $20   : 0000000000000000 ffffffff8175ae50 0000000000000000 ffffffff8176c740
    $24   : 0000000000000006 ffffffff81170300
    $28   : 800000041f864000 800000041f867d90 0000000000000000 ffffffff815f3fa0
    Hi    : 0000000000fa8257
    Lo    : ffffffffe15cfc00
    epc   : ffffffff8112821c resume+0x9c/0x200
    ra    : ffffffff815f3fa0 __schedule+0x3f0/0x7d8
    Status: 10008ce2        KX SX UX KERNEL EXL
    Cause : 1080002c (ExcCode 0b)
    PrId  : 000d0601 (Cavium Octeon+)
    Modules linked in:
    Process kthreadd (pid: 2, threadinfo=800000041f864000, task=800000041f84a008, tls=0000000000000000)
    Stack : ffffffff81604218 ffffffff815f7e08 800000041f84a008 ffffffff811681b0
              800000041f84a008 ffffffff817e9878 0000000000000000 ffffffff81770000
              ffffffff81768340 ffffffff81161398 0000000000000001 0000000000000000
              0000000000000000 ffffffff815f4424 0000000000000000 ffffffff81161d68
              ffffffff81161be8 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 ffffffff8111e16c
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              ...
    Call Trace:
    [<ffffffff8112821c>] resume+0x9c/0x200
    [<ffffffff815f3fa0>] __schedule+0x3f0/0x7d8
    [<ffffffff815f4424>] schedule+0x34/0x98
    [<ffffffff81161d68>] kthreadd+0x180/0x198
    [<ffffffff8111e16c>] ret_from_kernel_thread+0x14/0x1c

Tested using cavium_octeon_defconfig on an EdgeRouter Lite.

Fixes: 1a3d595 ("MIPS: Tidy up FPU context switching")
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Aleksey Makarov <aleksey.makarov@auriga.com>
Cc: linux-kernel@vger.kernel.org
Cc: Chandrakala Chavva <cchavva@caviumnetworks.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Leonid Rosenboim <lrosenboim@caviumnetworks.com>
Patchwork: https://patchwork.linux-mips.org/patch/11166/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
shenki pushed a commit that referenced this pull request Nov 13, 2015
My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty.  kernel stack for the stuck process looks like below.
 #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
 #1 [ffff88303d107bd0] schedule at ffffffff815c513e
 #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
 #5 [ffff88303d107dd0] tty_read at ffffffff81368013
 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
 #8 [ffff88303d107f00] sys_read at ffffffff811a4306
 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7

There seems to be two problems causing this issue.

First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active().  However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
   RELEASE may be completed before the
   RELEASE operation has completed */
                                        add_wait_queue(&tty->read_wait, &wait);
                                        ...
                                        if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
                                        spin_lock_irqsave(&q->lock, flags);
                                        /* from add_wait_queue() */
                                        ...
                                        if (!input_available_p(tty, 0)) {
                                        /* Memory operations issued after the
                                           RELEASE may be completed before the
                                           RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
if (waitqueue_active(&tty->read_wait))
                                        __add_wait_queue(q, wait);
                                        spin_unlock_irqrestore(&q->lock,flags);
                                        /* from add_wait_queue() */
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.

This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation).  Moreover, the resulting code is much simpler.

Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Since commit 2b018d5 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
following oops:

[  570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0
[  570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
[  570.144601] Oops: 0000 [#1] SMP
[  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
[  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
[  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[  570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000
[  570.144601] RIP: 0010:[<ffffffffa018c701>]  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] RSP: 0018:ffff880036b63e08  EFLAGS: 00010202
[  570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206
[  570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20
[  570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000
[  570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780
[  570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0
[  570.144601] FS:  00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[  570.144601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0
[  570.144601] Stack:
[  570.144601]  ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0
[  570.144601]  ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008
[  570.144601]  ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5
[  570.144601] Call Trace:
[  570.144601]  [<ffffffff812caabe>] sock_release+0x1a/0x75
[  570.144601]  [<ffffffff812cabad>] sock_close+0xd/0x11
[  570.144601]  [<ffffffff811347f5>] __fput+0xff/0x1a5
[  570.144601]  [<ffffffff811348cb>] ____fput+0x9/0xb
[  570.144601]  [<ffffffff81056682>] task_work_run+0x66/0x90
[  570.144601]  [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7
[  570.144601]  [<ffffffff81001a26>] syscall_return_slowpath+0x16d/0x19b
[  570.144601]  [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f
[  570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00
[  570.144601] RIP  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601]  RSP <ffff880036b63e08>
[  570.144601] CR2: 00000000000004e0
[  570.200518] ---[ end trace 46956baf17349563 ]---

pppoe_flush_dev() has no reason to override sk->sk_state with
PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
PPPOX_DEAD, which is the correct state given that sk is unbound and
po->pppoe_dev is NULL.

Fixes: 2b018d5 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Tested-by: Oleksii Berezhniak <core@irc.lg.ua>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
amboar pushed a commit that referenced this pull request Jan 24, 2025
commit d38e26e upstream.

Using the 'net' structure via 'current' is not recommended for different
reasons.

First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table->data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.

Another reason is that access to current->nsproxy->netns can oops if
attempted when current->nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
  CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206

  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
   __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
   __kernel_write+0xf6/0x140 fs/read_write.c:632
   do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
   acct_pin_kill+0x2d/0x100 kernel/acct.c:192
   pin_kill+0x194/0x7c0 fs/fs_pin.c:44
   mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
   cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
   task_work_run+0x14e/0x250 kernel/task_work.c:239
   exit_task_work include/linux/task_work.h:43 [inline]
   do_exit+0xad8/0x2d70 kernel/exit.c:938
   do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
   get_signal+0x2576/0x2610 kernel/signal.c:3017
   arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
   exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
   do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7fee3cb87a6a
  Code: Unable to access opcode bytes at 0x7fee3cb87a40.
  RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
  RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
  RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
  R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
  R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
   </TASK>
  Modules linked in:
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  ----------------
  Code disassembly (best guess), 1 bytes skipped:
     0:	42 80 3c 38 00       	cmpb   $0x0,(%rax,%r15,1)
     5:	0f 85 fe 02 00 00    	jne    0x309
     b:	4d 8b a4 24 08 09 00 	mov    0x908(%r12),%r12
    12:	00
    13:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
    1a:	fc ff df
    1d:	49 8d 7c 24 28       	lea    0x28(%r12),%rdi
    22:	48 89 fa             	mov    %rdi,%rdx
    25:	48 c1 ea 03          	shr    $0x3,%rdx
  * 29:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
    2d:	0f 85 cc 02 00 00    	jne    0x2ff
    33:	4d 8b 7c 24 28       	mov    0x28(%r12),%r15
    38:	48                   	rex.W
    39:	8d                   	.byte 0x8d
    3a:	84 24 c8             	test   %ah,(%rax,%rcx,8)

Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table->data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.

Fixes: 6963c50 ("mptcp: only allow set existing scheduler for net.mptcp.scheduler")
Cc: stable@vger.kernel.org
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-2-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Jan 24, 2025
commit 6a97f41 upstream.

die() can be called in exception handler, and therefore cannot sleep.
However, die() takes spinlock_t which can sleep with PREEMPT_RT enabled.
That causes the following warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 285, name: mutex
preempt_count: 110001, expected: 0
RCU nest depth: 0, expected: 0
CPU: 0 UID: 0 PID: 285 Comm: mutex Not tainted 6.12.0-rc7-00022-ge19049cf7d56-dirty #234
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
    dump_backtrace+0x1c/0x24
    show_stack+0x2c/0x38
    dump_stack_lvl+0x5a/0x72
    dump_stack+0x14/0x1c
    __might_resched+0x130/0x13a
    rt_spin_lock+0x2a/0x5c
    die+0x24/0x112
    do_trap_insn_illegal+0xa0/0xea
    _new_vmalloc_restore_context_a0+0xcc/0xd8
Oops - illegal instruction [#1]

Switch to use raw_spinlock_t, which does not sleep even with PREEMPT_RT
enabled.

Fixes: 76d2a04 ("RISC-V: Init and Halt Code")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20241118091333.1185288-1-namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Jan 24, 2025
[ Upstream commit 5641e82 ]

Clear the port select structure on error so no stale values left after
definers are destroyed. That's because the mlx5_lag_destroy_definers()
always try to destroy all lag definers in the tt_map, so in the flow
below lag definers get double-destroyed and cause kernel crash:

  mlx5_lag_port_sel_create()
    mlx5_lag_create_definers()
      mlx5_lag_create_definer()     <- Failed on tt 1
        mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed
  mlx5_lag_port_sel_create()
    mlx5_lag_create_definers()
      mlx5_lag_create_definer()     <- Failed on tt 0
        mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
 Mem abort info:
   ESR = 0x0000000096000005
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x05: level 1 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 user pgtable: 64k pages, 48-bit VAs, pgdp=0000000112ce2e00
 [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
 Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)]
  CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G           OE      6.11.0+ #2
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
  lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
  sp : ffff800085fafb00
  x29: ffff800085fafb00 x28: ffff0000da0c8000 x27: 0000000000000000
  x26: ffff0000da0c8000 x25: ffff0000da0c8000 x24: ffff0000da0c8000
  x23: ffff0000c31f81a0 x22: 0400000000000000 x21: ffff0000da0c8000
  x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8b0c9350
  x14: 0000000000000000 x13: ffff800081390d18 x12: ffff800081dc3cc0
  x11: 0000000000000001 x10: 0000000000000b10 x9 : ffff80007ab7304c
  x8 : ffff0000d00711f0 x7 : 0000000000000004 x6 : 0000000000000190
  x5 : ffff00027edb3010 x4 : 0000000000000000 x3 : 0000000000000000
  x2 : ffff0000d39b8000 x1 : ffff0000d39b8000 x0 : 0400000000000000
  Call trace:
   mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
   mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
   mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core]
   mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core]
   mlx5_activate_lag+0x60c/0x6f8 [mlx5_core]
   mlx5_do_bond_work+0x284/0x5c8 [mlx5_core]
   process_one_work+0x170/0x3e0
   worker_thread+0x2d8/0x3e0
   kthread+0x11c/0x128
   ret_from_fork+0x10/0x20
  Code: a9025bf5 aa0003f6 a90363f7 f90023f9 (f9400400)
  ---[ end trace 0000000000000000 ]---

Fixes: dc48516 ("net/mlx5: Lag, add support to create definers for LAG")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Jan 24, 2025
[ Upstream commit 2c36880 ]

Attempt to enable IPsec packet offload in tunnel mode in debug kernel
generates the following kernel panic, which is happening due to two
issues:
1. In SA add section, the should be _bh() variant when marking SA mode.
2. There is not needed flush_workqueue in SA delete routine. It is not
needed as at this stage as it is removed from SADB and the running work
will be canceled later in SA free.

 =====================================================
 WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
 6.12.0+ #4 Not tainted
 -----------------------------------------------------
 charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
 ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]

 and this task is already holding:
 ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
 which would create a new lock dependency:
  (&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}

 but this new dependency connects a SOFTIRQ-irq-safe lock:
  (&x->lock){+.-.}-{3:3}

 ... which became SOFTIRQ-irq-safe at:
   lock_acquire+0x1be/0x520
   _raw_spin_lock_bh+0x34/0x40
   xfrm_timer_handler+0x91/0xd70
   __hrtimer_run_queues+0x1dd/0xa60
   hrtimer_run_softirq+0x146/0x2e0
   handle_softirqs+0x266/0x860
   irq_exit_rcu+0x115/0x1a0
   sysvec_apic_timer_interrupt+0x6e/0x90
   asm_sysvec_apic_timer_interrupt+0x16/0x20
   default_idle+0x13/0x20
   default_idle_call+0x67/0xa0
   do_idle+0x2da/0x320
   cpu_startup_entry+0x50/0x60
   start_secondary+0x213/0x2a0
   common_startup_64+0x129/0x138

 to a SOFTIRQ-irq-unsafe lock:
  (&xa->xa_lock#24){+.+.}-{3:3}

 ... which became SOFTIRQ-irq-unsafe at:
 ...
   lock_acquire+0x1be/0x520
   _raw_spin_lock+0x2c/0x40
   xa_set_mark+0x70/0x110
   mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
   xfrm_dev_state_add+0x3bb/0xd70
   xfrm_add_sa+0x2451/0x4a90
   xfrm_user_rcv_msg+0x493/0x880
   netlink_rcv_skb+0x12e/0x380
   xfrm_netlink_rcv+0x6d/0x90
   netlink_unicast+0x42f/0x740
   netlink_sendmsg+0x745/0xbe0
   __sock_sendmsg+0xc5/0x190
   __sys_sendto+0x1fe/0x2c0
   __x64_sys_sendto+0xdc/0x1b0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

 other info that might help us debug this:

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xa->xa_lock#24);
                                local_irq_disable();
                                lock(&x->lock);
                                lock(&xa->xa_lock#24);
   <Interrupt>
     lock(&x->lock);

  *** DEADLOCK ***

 2 locks held by charon/1337:
  #0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90
  #1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30

 the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
 -> (&x->lock){+.-.}-{3:3} ops: 29 {
    HARDIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     xfrm_alloc_spi+0xc0/0xe60
                     xfrm_alloc_userspi+0x5f6/0xbc0
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    IN-SOFTIRQ-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     xfrm_timer_handler+0x91/0xd70
                     __hrtimer_run_queues+0x1dd/0xa60
                     hrtimer_run_softirq+0x146/0x2e0
                     handle_softirqs+0x266/0x860
                     irq_exit_rcu+0x115/0x1a0
                     sysvec_apic_timer_interrupt+0x6e/0x90
                     asm_sysvec_apic_timer_interrupt+0x16/0x20
                     default_idle+0x13/0x20
                     default_idle_call+0x67/0xa0
                     do_idle+0x2da/0x320
                     cpu_startup_entry+0x50/0x60
                     start_secondary+0x213/0x2a0
                     common_startup_64+0x129/0x138
    INITIAL USE at:
                    lock_acquire+0x1be/0x520
                    _raw_spin_lock_bh+0x34/0x40
                    xfrm_alloc_spi+0xc0/0xe60
                    xfrm_alloc_userspi+0x5f6/0xbc0
                    xfrm_user_rcv_msg+0x493/0x880
                    netlink_rcv_skb+0x12e/0x380
                    xfrm_netlink_rcv+0x6d/0x90
                    netlink_unicast+0x42f/0x740
                    netlink_sendmsg+0x745/0xbe0
                    __sock_sendmsg+0xc5/0x190
                    __sys_sendto+0x1fe/0x2c0
                    __x64_sys_sendto+0xdc/0x1b0
                    do_syscall_64+0x6d/0x140
                    entry_SYSCALL_64_after_hwframe+0x4b/0x53
  }
  ... key      at: [<ffffffff87f9cd20>] __key.18+0x0/0x40

 the dependencies between the lock to be acquired
  and SOFTIRQ-irq-unsafe lock:
 -> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 {
    HARDIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
                     xfrm_dev_state_add+0x3bb/0xd70
                     xfrm_add_sa+0x2451/0x4a90
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    SOFTIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock+0x2c/0x40
                     xa_set_mark+0x70/0x110
                     mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
                     xfrm_dev_state_add+0x3bb/0xd70
                     xfrm_add_sa+0x2451/0x4a90
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    INITIAL USE at:
                    lock_acquire+0x1be/0x520
                    _raw_spin_lock_bh+0x34/0x40
                    mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
                    xfrm_dev_state_add+0x3bb/0xd70
                    xfrm_add_sa+0x2451/0x4a90
                    xfrm_user_rcv_msg+0x493/0x880
                    netlink_rcv_skb+0x12e/0x380
                    xfrm_netlink_rcv+0x6d/0x90
                    netlink_unicast+0x42f/0x740
                    netlink_sendmsg+0x745/0xbe0
                    __sock_sendmsg+0xc5/0x190
                    __sys_sendto+0x1fe/0x2c0
                    __x64_sys_sendto+0xdc/0x1b0
                    do_syscall_64+0x6d/0x140
                    entry_SYSCALL_64_after_hwframe+0x4b/0x53
  }
  ... key      at: [<ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core]
  ... acquired at:
    __lock_acquire+0x30a0/0x5040
    lock_acquire+0x1be/0x520
    _raw_spin_lock_bh+0x34/0x40
    mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
    xfrm_dev_state_delete+0x90/0x160
    __xfrm_state_delete+0x662/0xae0
    xfrm_state_delete+0x1e/0x30
    xfrm_del_sa+0x1c2/0x340
    xfrm_user_rcv_msg+0x493/0x880
    netlink_rcv_skb+0x12e/0x380
    xfrm_netlink_rcv+0x6d/0x90
    netlink_unicast+0x42f/0x740
    netlink_sendmsg+0x745/0xbe0
    __sock_sendmsg+0xc5/0x190
    __sys_sendto+0x1fe/0x2c0
    __x64_sys_sendto+0xdc/0x1b0
    do_syscall_64+0x6d/0x140
    entry_SYSCALL_64_after_hwframe+0x4b/0x53

 stack backtrace:
 CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x74/0xd0
  check_irq_usage+0x12e8/0x1d90
  ? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0
  ? check_chain_key+0x1bb/0x4c0
  ? __lockdep_reset_lock+0x180/0x180
  ? check_path.constprop.0+0x24/0x50
  ? mark_lock+0x108/0x2fb0
  ? print_circular_bug+0x9b0/0x9b0
  ? mark_lock+0x108/0x2fb0
  ? print_usage_bug.part.0+0x670/0x670
  ? check_prev_add+0x1c4/0x2310
  check_prev_add+0x1c4/0x2310
  __lock_acquire+0x30a0/0x5040
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  lock_acquire+0x1be/0x520
  ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? __xfrm_state_delete+0x5f0/0xae0
  ? lock_downgrade+0x6b0/0x6b0
  _raw_spin_lock_bh+0x34/0x40
  ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  xfrm_dev_state_delete+0x90/0x160
  __xfrm_state_delete+0x662/0xae0
  xfrm_state_delete+0x1e/0x30
  xfrm_del_sa+0x1c2/0x340
  ? xfrm_get_sa+0x250/0x250
  ? check_chain_key+0x1bb/0x4c0
  xfrm_user_rcv_msg+0x493/0x880
  ? copy_sec_ctx+0x270/0x270
  ? check_chain_key+0x1bb/0x4c0
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  netlink_rcv_skb+0x12e/0x380
  ? copy_sec_ctx+0x270/0x270
  ? netlink_ack+0xd90/0xd90
  ? netlink_deliver_tap+0xcd/0xb60
  xfrm_netlink_rcv+0x6d/0x90
  netlink_unicast+0x42f/0x740
  ? netlink_attachskb+0x730/0x730
  ? lock_acquire+0x1be/0x520
  netlink_sendmsg+0x745/0xbe0
  ? netlink_unicast+0x740/0x740
  ? __might_fault+0xbb/0x170
  ? netlink_unicast+0x740/0x740
  __sock_sendmsg+0xc5/0x190
  ? fdget+0x163/0x1d0
  __sys_sendto+0x1fe/0x2c0
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_user_addr_fault+0x856/0xe30
  ? lock_acquire+0x1be/0x520
  ? __task_pid_nr_ns+0x117/0x410
  ? lock_downgrade+0x6b0/0x6b0
  __x64_sys_sendto+0xdc/0x1b0
  ? lockdep_hardirqs_on_prepare+0x284/0x400
  do_syscall_64+0x6d/0x140
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 RIP: 0033:0x7f7d31291ba4
 Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45
 RSP: 002b:00007f7d2ccd94f0 EFLAGS: 00000297 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7d31291ba4
 RDX: 0000000000000028 RSI: 00007f7d2ccd96a0 RDI: 000000000000000a
 RBP: 00007f7d2ccd9530 R08: 00007f7d2ccd9598 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000028
 R13: 00007f7d2ccd9598 R14: 00007f7d2ccd96a0 R15: 00000000000000e1
  </TASK>

Fixes: 4c24272 ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Jan 24, 2025
[ Upstream commit eb09fbe ]

syzkaller reported a corrupted list in ieee802154_if_remove. [1]

Remove an IEEE 802.15.4 network interface after unregister an IEEE 802.15.4
hardware device from the system.

CPU0					CPU1
====					====
genl_family_rcv_msg_doit		ieee802154_unregister_hw
ieee802154_del_iface			ieee802154_remove_interfaces
rdev_del_virtual_intf_deprecated	list_del(&sdata->list)
ieee802154_if_remove
list_del_rcu

The net device has been unregistered, since the rcu grace period,
unregistration must be run before ieee802154_if_remove.

To avoid this issue, add a check for local->interfaces before deleting
sdata list.

[1]
kernel BUG at lib/list_debug.c:58!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 6277 Comm: syz-executor157 Not tainted 6.12.0-rc6-syzkaller-00005-g557329bcecc2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:__list_del_entry_valid_or_report+0xf4/0x140 lib/list_debug.c:56
Code: e8 a1 7e 00 07 90 0f 0b 48 c7 c7 e0 37 60 8c 4c 89 fe e8 8f 7e 00 07 90 0f 0b 48 c7 c7 40 38 60 8c 4c 89 fe e8 7d 7e 00 07 90 <0f> 0b 48 c7 c7 a0 38 60 8c 4c 89 fe e8 6b 7e 00 07 90 0f 0b 48 c7
RSP: 0018:ffffc9000490f3d0 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: d211eee56bb28d00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff88805b278dd8 R08: ffffffff8174a12c R09: 1ffffffff2852f0d
R10: dffffc0000000000 R11: fffffbfff2852f0e R12: dffffc0000000000
R13: dffffc0000000000 R14: dead000000000100 R15: ffff88805b278cc0
FS:  0000555572f94380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056262e4a3000 CR3: 0000000078496000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __list_del_entry_valid include/linux/list.h:124 [inline]
 __list_del_entry include/linux/list.h:215 [inline]
 list_del_rcu include/linux/rculist.h:157 [inline]
 ieee802154_if_remove+0x86/0x1e0 net/mac802154/iface.c:687
 rdev_del_virtual_intf_deprecated net/ieee802154/rdev-ops.h:24 [inline]
 ieee802154_del_iface+0x2c0/0x5c0 net/ieee802154/nl-phy.c:323
 genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
 genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2551
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
 netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
 netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:729 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:744
 ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2607
 ___sys_sendmsg net/socket.c:2661 [inline]
 __sys_sendmsg+0x292/0x380 net/socket.c:2690
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported-and-tested-by: syzbot+985f827280dc3a6e7e92@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=985f827280dc3a6e7e92
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/20241113095129.1457225-1-lizhi.xu@windriver.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Jan 24, 2025
commit f6abafc upstream.

Some of the core functions can only be called if the transport
has been assigned.

As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:

    BUG: kernel NULL pointer dereference, address: 00000000000000a0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
    Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
    RIP: 0010:vsock_connectible_has_data+0x1f/0x40
    Call Trace:
     vsock_bpf_recvmsg+0xca/0x5e0
     sock_recvmsg+0xb9/0xc0
     __sys_recvfrom+0xb3/0x130
     __x64_sys_recvfrom+0x20/0x30
     do_syscall_64+0x93/0x180
     entry_SYSCALL_64_after_hwframe+0x76/0x7e

So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().

Fixes: 634f1a7 ("vsock: support sockmap")
Cc: stable@vger.kernel.org
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Tested-by: Michal Luczaj <mhal@rbox.co>
Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Jan 24, 2025
commit 9860370 upstream.

irq_chip functions may be called in raw spinlock context. Therefore, we
must also use a raw spinlock for our own internal locking.

This fixes the following lockdep splat:

[    5.349336] =============================
[    5.353349] [ BUG: Invalid wait context ]
[    5.357361] 6.13.0-rc5+ #69 Tainted: G        W
[    5.363031] -----------------------------
[    5.367045] kworker/u17:1/44 is trying to lock:
[    5.371587] ffffff88018b02c0 (&chip->gpio_lock){....}-{3:3}, at: xgpio_irq_unmask (drivers/gpio/gpio-xilinx.c:433 (discriminator 8))
[    5.380079] other info that might help us debug this:
[    5.385138] context-{5:5}
[    5.387762] 5 locks held by kworker/u17:1/44:
[    5.392123] #0: ffffff8800014958 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:3204)
[    5.402260] #1: ffffffc082fcbdd8 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:3205)
[    5.411528] #2: ffffff880172c900 (&dev->mutex){....}-{4:4}, at: __device_attach (drivers/base/dd.c:1006)
[    5.419929] #3: ffffff88039c8268 (request_class#2){+.+.}-{4:4}, at: __setup_irq (kernel/irq/internals.h:156 kernel/irq/manage.c:1596)
[    5.428331] #4: ffffff88039c80c8 (lock_class#2){....}-{2:2}, at: __setup_irq (kernel/irq/manage.c:1614)
[    5.436472] stack backtrace:
[    5.439359] CPU: 2 UID: 0 PID: 44 Comm: kworker/u17:1 Tainted: G        W          6.13.0-rc5+ #69
[    5.448690] Tainted: [W]=WARN
[    5.451656] Hardware name: xlnx,zynqmp (DT)
[    5.455845] Workqueue: events_unbound deferred_probe_work_func
[    5.461699] Call trace:
[    5.464147] show_stack+0x18/0x24 C
[    5.467821] dump_stack_lvl (lib/dump_stack.c:123)
[    5.471501] dump_stack (lib/dump_stack.c:130)
[    5.474824] __lock_acquire (kernel/locking/lockdep.c:4828 kernel/locking/lockdep.c:4898 kernel/locking/lockdep.c:5176)
[    5.478758] lock_acquire (arch/arm64/include/asm/percpu.h:40 kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851 kernel/locking/lockdep.c:5814)
[    5.482429] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[    5.486797] xgpio_irq_unmask (drivers/gpio/gpio-xilinx.c:433 (discriminator 8))
[    5.490737] irq_enable (kernel/irq/internals.h:236 kernel/irq/chip.c:170 kernel/irq/chip.c:439 kernel/irq/chip.c:432 kernel/irq/chip.c:345)
[    5.494060] __irq_startup (kernel/irq/internals.h:241 kernel/irq/chip.c:180 kernel/irq/chip.c:250)
[    5.497645] irq_startup (kernel/irq/chip.c:270)
[    5.501143] __setup_irq (kernel/irq/manage.c:1807)
[    5.504728] request_threaded_irq (kernel/irq/manage.c:2208)

Fixes: a32c7ca ("gpio: gpio-xilinx: Add interrupt support")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250110163354.2012654-1-sean.anderson@linux.dev
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit to amboar/linux that referenced this pull request Feb 3, 2025
Packets handled by hardware have added secpath as a way to inform XFRM
core code that this path was already handled. That secpath is not needed
at all after policy is checked and it is removed later in the stack.

However, in the case of IP forwarding is enabled (/proc/sys/net/ipv4/ip_forward),
that secpath is not removed and packets which already were handled are reentered
to the driver TX path with xfrm_offload set.

The following kernel panic is observed in mlx5 in such case:

 mlx5_core 0000:04:00.0 enp4s0f0np0: Link up
 mlx5_core 0000:04:00.1 enp4s0f1np1: Link up
 Initializing XFRM netlink socket
 IPsec XFRM device driver
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor instruction fetch in kernel mode
 #PF: error_code(0x0010) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0010 [openbmc#1] PREEMPT SMP
 CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc1-alex openbmc#3
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 RIP: 0010:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
 RSP: 0018:ffffb87380003800 EFLAGS: 00010206
 RAX: ffff8df004e02600 RBX: ffffb873800038d8 RCX: 00000000ffff98cf
 RDX: ffff8df00733e108 RSI: ffff8df00521fb80 RDI: ffff8df001661f00
 RBP: ffffb87380003850 R08: ffff8df013980000 R09: 0000000000000010
 R10: 0000000000000002 R11: 0000000000000002 R12: ffff8df001661f00
 R13: ffff8df00521fb80 R14: ffff8df00733e108 R15: ffff8df011faf04e
 FS:  0000000000000000(0000) GS:ffff8df46b800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffffffffd6 CR3: 0000000106384000 CR4: 0000000000350ef0
 Call Trace:
  <IRQ>
  ? show_regs+0x63/0x70
  ? __die_body+0x20/0x60
  ? __die+0x2b/0x40
  ? page_fault_oops+0x15c/0x550
  ? do_user_addr_fault+0x3ed/0x870
  ? exc_page_fault+0x7f/0x190
  ? asm_exc_page_fault+0x27/0x30
  mlx5e_ipsec_handle_tx_skb+0xe7/0x2f0 [mlx5_core]
  mlx5e_xmit+0x58e/0x1980 [mlx5_core]
  ? __fib_lookup+0x6a/0xb0
  dev_hard_start_xmit+0x82/0x1d0
  sch_direct_xmit+0xfe/0x390
  __dev_queue_xmit+0x6d8/0xee0
  ? __fib_lookup+0x6a/0xb0
  ? internal_add_timer+0x48/0x70
  ? mod_timer+0xe2/0x2b0
  neigh_resolve_output+0x115/0x1b0
  __neigh_update+0x26a/0xc50
  neigh_update+0x14/0x20
  arp_process+0x2cb/0x8e0
  ? __napi_build_skb+0x5e/0x70
  arp_rcv+0x11e/0x1c0
  ? dev_gro_receive+0x574/0x820
  __netif_receive_skb_list_core+0x1cf/0x1f0
  netif_receive_skb_list_internal+0x183/0x2a0
  napi_complete_done+0x76/0x1c0
  mlx5e_napi_poll+0x234/0x7a0 [mlx5_core]
  __napi_poll+0x2d/0x1f0
  net_rx_action+0x1a6/0x370
  ? atomic_notifier_call_chain+0x3b/0x50
  ? irq_int_handler+0x15/0x20 [mlx5_core]
  handle_softirqs+0xb9/0x2f0
  ? handle_irq_event+0x44/0x60
  irq_exit_rcu+0xdb/0x100
  common_interrupt+0x98/0xc0
  </IRQ>
  <TASK>
  asm_common_interrupt+0x27/0x40
 RIP: 0010:pv_native_safe_halt+0xb/0x10
 Code: 09 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 22
 0f 1f 84 00 00 00 00 00 90 eb 07 0f 00 2d 7f e9 36 00 fb
40 00 83 ff 07 77 21 89 ff ff 24 fd 88 3d a1 bd 0f 21 f8
 RSP: 0018:ffffffffbe603de8 EFLAGS: 00000202
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000f92f46680
 RDX: 0000000000000037 RSI: 00000000ffffffff RDI: 00000000000518d4
 RBP: ffffffffbe603df0 R08: 000000cd42e4dffb R09: ffffffffbe603d70
 R10: 0000004d80d62680 R11: 0000000000000001 R12: ffffffffbe60bf40
 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffbe60aff8
  ? default_idle+0x9/0x20
  arch_cpu_idle+0x9/0x10
  default_idle_call+0x29/0xf0
  do_idle+0x1f2/0x240
  cpu_startup_entry+0x2c/0x30
  rest_init+0xe7/0x100
  start_kernel+0x76b/0xb90
  x86_64_start_reservations+0x18/0x30
  x86_64_start_kernel+0xc0/0x110
  ? setup_ghcb+0xe/0x130
  common_startup_64+0x13e/0x141
  </TASK>
 Modules linked in: esp4_offload esp4 xfrm_interface
xfrm6_tunnel tunnel4 tunnel6 xfrm_user xfrm_algo binfmt_misc
intel_rapl_msr intel_rapl_common kvm_amd ccp kvm input_leds serio_raw
qemu_fw_cfg sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc
scsi_dh_alua efi_pstore ip_tables x_tables autofs4 raid10 raid456
async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx
libcrc32c raid1 raid0 mlx5_core crct10dif_pclmul crc32_pclmul
polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3
sha1_ssse3 ahci mlxfw i2c_i801 libahci i2c_mux i2c_smbus psample
virtio_rng pci_hyperv_intf aesni_intel crypto_simd cryptd
 CR2: 0000000000000000
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
 RSP: 0018:ffffb87380003800 EFLAGS: 00010206
 RAX: ffff8df004e02600 RBX: ffffb873800038d8 RCX: 00000000ffff98cf
 RDX: ffff8df00733e108 RSI: ffff8df00521fb80 RDI: ffff8df001661f00
 RBP: ffffb87380003850 R08: ffff8df013980000 R09: 0000000000000010
 R10: 0000000000000002 R11: 0000000000000002 R12: ffff8df001661f00
 R13: ffff8df00521fb80 R14: ffff8df00733e108 R15: ffff8df011faf04e
 FS:  0000000000000000(0000) GS:ffff8df46b800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffffffffd6 CR3: 0000000106384000 CR4: 0000000000350ef0
 Kernel panic - not syncing: Fatal exception in interrupt
 Kernel Offset: 0x3b800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: 5958372 ("xfrm: add RX datapath protection for IPsec packet offload mode")
Signed-off-by: Alexandre Cassen <acassen@corp.free.fr>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
amboar pushed a commit to amboar/linux that referenced this pull request Feb 3, 2025
On some systems, the same CPU (with the same APIC ID) is assigned a
different logical CPU id after commit ec9aedb ("x86/acpi: Ignore
invalid x2APIC entries").

This means that Linux enumerates the CPUs in a different order, which
violates ACPI specification[1] that states:

  "OSPM should initialize processors in the order that they appear in
   the MADT"

The problematic commit parses all LAPIC entries before any x2APIC
entries, aiming to ignore x2APIC entries with APIC ID < 255 when valid
LAPIC entries exist. However, it disrupts the CPU enumeration order on
systems where x2APIC entries precede LAPIC entries in the MADT.

Fix this problem by:

 1) Parsing LAPIC entries first without registering them in the
    topology to evaluate whether valid LAPIC entries exist.

 2) Restoring the MADT in order parser which invokes either the LAPIC
    or the X2APIC parser function depending on the entry type.

The X2APIC parser still ignores entries < 0xff in case that openbmc#1 found
valid LAPIC entries independent of their position in the MADT table.

Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#madt-processor-local-apic-sapic-structure-entry-order
Cc: All applicable <stable@vger.kernel.org>
Reported-by: Jim Mattson <jmattson@google.com>
Closes: https://lore.kernel.org/all/20241010213136.668672-1-jmattson@google.com/
Fixes: ec9aedb ("x86/acpi: Ignore invalid x2APIC entries")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250117081420.4046737-1-rui.zhang@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
amboar pushed a commit to amboar/linux that referenced this pull request Feb 3, 2025
Add read memory barrier to ensure the order of operations when accessing
control queue descriptors. Specifically, we want to avoid cases where loads
can be reordered:

1. Load openbmc#1 is dispatched to read descriptor flags.
2. Load openbmc#2 is dispatched to read some other field from the descriptor.
3. Load openbmc#2 completes, accessing memory/cache at a point in time when the DD
   flag is zero.
4. NIC DMA overwrites the descriptor, now the DD flag is one.
5. Any fields loaded before step 4 are now inconsistent with the actual
   descriptor state.

Add read memory barrier between steps 1 and 2, so that load openbmc#2 is not
executed until load openbmc#1 has completed.

Fixes: 8077c72 ("idpf: add controlq init and reset checks")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Suggested-by: Lance Richardson <rlance@google.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
amboar pushed a commit to amboar/linux that referenced this pull request Feb 3, 2025
In "one-shot" mode, turbostat
1. takes a counter snapshot
2. forks and waits for a child
3. takes the end counter snapshot and prints the result.

But turbostat counter snapshots currently use affinity to travel
around the system so that counter reads are "local", and this
affinity must be cleared between openbmc#1 and openbmc#2 above.

The offending commit removed that reset that allowed the child
to run on cpu_present_set.

Fix that issue, and improve upon the original by using
cpu_possible_set for the child.  This allows the child
to also run on CPUs that hotplug online during its runtime.

Reported-by: Zhang Rui <rui.zhang@intel.com>
Fixes: 7bb3fe2 ("tools/power/turbostat: Obey allowed CPUs during startup")
Signed-off-by: Len Brown <len.brown@intel.com>
amboar pushed a commit to amboar/linux that referenced this pull request Feb 3, 2025
libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    openbmc#1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    openbmc#2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    openbmc#3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    openbmc#4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    openbmc#5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    openbmc#6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    openbmc#7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    openbmc#8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    openbmc#9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    openbmc#10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    openbmc#11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    openbmc#12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250122025519.361873-1-howardchu95@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
amboar pushed a commit to amboar/linux that referenced this pull request Feb 3, 2025
This fixes the following hard lockup in isolate_lru_folios() during memory
reclaim.  If the LRU mostly contains ineligible folios this may trigger
watchdog.

watchdog: Watchdog detected hard LOCKUP on cpu 173
RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
Call Trace:
	_raw_spin_lock_irqsave+0x31/0x40
	folio_lruvec_lock_irqsave+0x5f/0x90
	folio_batch_move_lru+0x91/0x150
	lru_add_drain_per_cpu+0x1c/0x40
	process_one_work+0x17d/0x350
	worker_thread+0x27b/0x3a0
	kthread+0xe8/0x120
	ret_from_fork+0x34/0x50
	ret_from_fork_asm+0x1b/0x30

lruvec->lru_lock owner:

PID: 2865     TASK: ffff888139214d40  CPU: 40   COMMAND: "kswapd0"
 #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555
 openbmc#1 [fffffe0000945e68] nmi_handle at ffffffffa563b171
 openbmc#2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920
 openbmc#3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4
 openbmc#4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde
    [exception RIP: isolate_lru_folios+403]
    RIP: ffffffffa597df53  RSP: ffffc90006fb7c28  RFLAGS: 00000002
    RAX: 0000000000000001  RBX: ffffc90006fb7c60  RCX: ffffea04a2196f88
    RDX: ffffc90006fb7c60  RSI: ffffc90006fb7c60  RDI: ffffea04a2197048
    RBP: ffff88812cbd3010   R8: ffffea04a2197008   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000001  R12: ffffea04a2197008
    R13: ffffea04a2197048  R14: ffffc90006fb7de8  R15: 0000000003e3e937
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <NMI exception stack>
 openbmc#5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
 openbmc#6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788
 openbmc#7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0
 openbmc#8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354
 openbmc#9 [ffffc90006fb7ef8] kthread at ffffffffa5748238
crash>

Scenario:
User processe are requesting a large amount of memory and keep page active.
Then a module continuously requests memory from ZONE_DMA32 area.
Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached.
However pages in the LRU(active_anon) list are mostly from
the ZONE_NORMAL area.

Reproduce:
Terminal 1: Construct to continuously increase pages active(anon).
mkdir /tmp/memory
mount -t tmpfs -o size=1024000M tmpfs /tmp/memory
dd if=/dev/zero of=/tmp/memory/block bs=4M
tail /tmp/memory/block

Terminal 2:
vmstat -a 1
active will increase.
procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ...
 r  b   swpd   free  inact active   si   so    bi    bo
 1  0   0 1445623076 45898836 83646008    0    0     0
 1  0   0 1445623076 43450228 86094616    0    0     0
 1  0   0 1445623076 41003480 88541364    0    0     0
 1  0   0 1445623076 38557088 90987756    0    0     0
 1  0   0 1445623076 36109688 93435156    0    0     0
 1  0   0 1445619552 33663256 95881632    0    0     0
 1  0   0 1445619804 31217140 98327792    0    0     0
 1  0   0 1445619804 28769988 100774944    0    0     0
 1  0   0 1445619804 26322348 103222584    0    0     0
 1  0   0 1445619804 23875592 105669340    0    0     0

cat /proc/meminfo | head
Active(anon) increase.
MemTotal:       1579941036 kB
MemFree:        1445618500 kB
MemAvailable:   1453013224 kB
Buffers:            6516 kB
Cached:         128653956 kB
SwapCached:            0 kB
Active:         118110812 kB
Inactive:       11436620 kB
Active(anon):   115345744 kB
Inactive(anon):   945292 kB

When the Active(anon) is 115345744 kB, insmod module triggers
the ZONE_DMA32 watermark.

perf record -e vmscan:mm_vmscan_lru_isolate -aR
perf script
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2
nr_skipped=2 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29
nr_skipped=29 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon

See nr_scanned=28835844.
28835844 * 4k = 115343376KB approximately equal to 115345744 kB.

If increase Active(anon) to 1000G then insmod module triggers
the ZONE_DMA32 watermark. hard lockup will occur.

In my device nr_scanned = 0000000003e3e937 when hard lockup.
Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB.

   [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
    ffffc90006fb7c30: 0000000000000020 0000000000000000
    ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000
    ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8
    ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48
    ffffc90006fb7c70: 0000000000000000 0000000000000000
    ffffc90006fb7c80: 0000000000000000 0000000000000000
    ffffc90006fb7c90: 0000000000000000 0000000000000000
    ffffc90006fb7ca0: 0000000000000000 0000000003e3e937
    ffffc90006fb7cb0: 0000000000000000 0000000000000000
    ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000

About the Fixes:
Why did it take eight years to be discovered?

The problem requires the following conditions to occur:
1. The device memory should be large enough.
2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area.
3. The memory in ZONE_DMA32 needs to reach the watermark.

If the memory is not large enough, or if the usage design of ZONE_DMA32
area memory is reasonable, this problem is difficult to detect.

notes:
The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL,
but other suitable scenarios may also trigger the problem.

Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn
Fixes: b2e1875 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Signed-off-by: liuye <liuye@kylinos.cn>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
amboar pushed a commit that referenced this pull request Feb 3, 2025
commit 6e64d6b upstream.

In commit e4b5ccd ("drm/v3d: Ensure job pointer is set to NULL
after job completion"), we introduced a change to assign the job pointer
to NULL after completing a job, indicating job completion.

However, this approach created a race condition between the DRM
scheduler workqueue and the IRQ execution thread. As soon as the fence is
signaled in the IRQ execution thread, a new job starts to be executed.
This results in a race condition where the IRQ execution thread sets the
job pointer to NULL simultaneously as the `run_job()` function assigns
a new job to the pointer.

This race condition can lead to a NULL pointer dereference if the IRQ
execution thread sets the job pointer to NULL after `run_job()` assigns
it to the new job. When the new job completes and the GPU emits an
interrupt, `v3d_irq()` is triggered, potentially causing a crash.

[  466.310099] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0
[  466.318928] Mem abort info:
[  466.321723]   ESR = 0x0000000096000005
[  466.325479]   EC = 0x25: DABT (current EL), IL = 32 bits
[  466.330807]   SET = 0, FnV = 0
[  466.333864]   EA = 0, S1PTW = 0
[  466.337010]   FSC = 0x05: level 1 translation fault
[  466.341900] Data abort info:
[  466.344783]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  466.350285]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  466.355350]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  466.360677] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000089772000
[  466.367140] [00000000000000c0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  466.375875] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[  466.382163] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep binfmt_misc vc4 snd_soc_hdmi_codec drm_display_helper cec brcmfmac_wcc spidev rpivid_hevc(C) drm_client_lib brcmfmac hci_uart drm_dma_helper pisp_be btbcm brcmutil snd_soc_core aes_ce_blk v4l2_mem2mem bluetooth aes_ce_cipher snd_compress videobuf2_dma_contig ghash_ce cfg80211 gf128mul snd_pcm_dmaengine videobuf2_memops ecdh_generic sha2_ce ecc videobuf2_v4l2 snd_pcm v3d sha256_arm64 rfkill videodev snd_timer sha1_ce libaes gpu_sched snd videobuf2_common sha1_generic drm_shmem_helper mc rp1_pio drm_kms_helper raspberrypi_hwmon spi_bcm2835 gpio_keys i2c_brcmstb rp1 raspberrypi_gpiomem rp1_mailbox rp1_adc nvmem_rmem uio_pdrv_genirq uio i2c_dev drm ledtrig_pattern drm_panel_orientation_quirks backlight fuse dm_mod ip_tables x_tables ipv6
[  466.458429] CPU: 0 UID: 1000 PID: 2008 Comm: chromium Tainted: G         C         6.13.0-v8+ #18
[  466.467336] Tainted: [C]=CRAP
[  466.470306] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  466.476157] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  466.483143] pc : v3d_irq+0x118/0x2e0 [v3d]
[  466.487258] lr : __handle_irq_event_percpu+0x60/0x228
[  466.492327] sp : ffffffc080003ea0
[  466.495646] x29: ffffffc080003ea0 x28: ffffff80c0c94200 x27: 0000000000000000
[  466.502807] x26: ffffffd08dd81d7b x25: ffffff80c0c94200 x24: ffffff8003bdc200
[  466.509969] x23: 0000000000000001 x22: 00000000000000a7 x21: 0000000000000000
[  466.517130] x20: ffffff8041bb0000 x19: 0000000000000001 x18: 0000000000000000
[  466.524291] x17: ffffffafadfb0000 x16: ffffffc080000000 x15: 0000000000000000
[  466.531452] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  466.538613] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffffd08c527eb0
[  466.545777] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[  466.552941] x5 : ffffffd08c4100d0 x4 : ffffffafadfb0000 x3 : ffffffc080003f70
[  466.560102] x2 : ffffffc0829e8058 x1 : 0000000000000001 x0 : 0000000000000000
[  466.567263] Call trace:
[  466.569711]  v3d_irq+0x118/0x2e0 [v3d] (P)
[  466.573826]  __handle_irq_event_percpu+0x60/0x228
[  466.578546]  handle_irq_event+0x54/0xb8
[  466.582391]  handle_fasteoi_irq+0xac/0x240
[  466.586498]  generic_handle_domain_irq+0x34/0x58
[  466.591128]  gic_handle_irq+0x48/0xd8
[  466.594798]  call_on_irq_stack+0x24/0x58
[  466.598730]  do_interrupt_handler+0x88/0x98
[  466.602923]  el0_interrupt+0x44/0xc0
[  466.606508]  __el0_irq_handler_common+0x18/0x28
[  466.611050]  el0t_64_irq_handler+0x10/0x20
[  466.615156]  el0t_64_irq+0x198/0x1a0
[  466.618740] Code: 52800035 3607faf3 f9442e80 52800021 (f9406018)
[  466.624853] ---[ end trace 0000000000000000 ]---
[  466.629483] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  466.636384] SMP: stopping secondary CPUs
[  466.640320] Kernel Offset: 0x100c400000 from 0xffffffc080000000
[  466.646259] PHYS_OFFSET: 0x0
[  466.649141] CPU features: 0x100,00000170,00901250,0200720b
[  466.654644] Memory Limit: none
[  466.657706] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Fix the crash by assigning the job pointer to NULL before signaling the
fence. This ensures that the job pointer is cleared before any new job
starts execution, preventing the race condition and the NULL pointer
dereference crash.

Cc: stable@vger.kernel.org
Fixes: e4b5ccd ("drm/v3d: Ensure job pointer is set to NULL after job completion")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Phil Elwell <phil@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123012403.20447-1-mcanal@igalia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Feb 12, 2025
[ Upstream commit 1068568 ]

The current implementation does not work correctly with a limit of
1. iproute2 actually checks for this and this patch adds the check in
kernel as well.

This fixes the following syzkaller reported crash:

UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:210:6
index 65535 is out of range for type 'struct sfq_head[128]'
CPU: 0 PID: 2569 Comm: syz-executor101 Not tainted 5.10.0-smp-DEV #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
  __dump_stack lib/dump_stack.c:79 [inline]
  dump_stack+0x125/0x19f lib/dump_stack.c:120
  ubsan_epilogue lib/ubsan.c:148 [inline]
  __ubsan_handle_out_of_bounds+0xed/0x120 lib/ubsan.c:347
  sfq_link net/sched/sch_sfq.c:210 [inline]
  sfq_dec+0x528/0x600 net/sched/sch_sfq.c:238
  sfq_dequeue+0x39b/0x9d0 net/sched/sch_sfq.c:500
  sfq_reset+0x13/0x50 net/sched/sch_sfq.c:525
  qdisc_reset+0xfe/0x510 net/sched/sch_generic.c:1026
  tbf_reset+0x3d/0x100 net/sched/sch_tbf.c:319
  qdisc_reset+0xfe/0x510 net/sched/sch_generic.c:1026
  dev_reset_queue+0x8c/0x140 net/sched/sch_generic.c:1296
  netdev_for_each_tx_queue include/linux/netdevice.h:2350 [inline]
  dev_deactivate_many+0x6dc/0xc20 net/sched/sch_generic.c:1362
  __dev_close_many+0x214/0x350 net/core/dev.c:1468
  dev_close_many+0x207/0x510 net/core/dev.c:1506
  unregister_netdevice_many+0x40f/0x16b0 net/core/dev.c:10738
  unregister_netdevice_queue+0x2be/0x310 net/core/dev.c:10695
  unregister_netdevice include/linux/netdevice.h:2893 [inline]
  __tun_detach+0x6b6/0x1600 drivers/net/tun.c:689
  tun_detach drivers/net/tun.c:705 [inline]
  tun_chr_close+0x104/0x1b0 drivers/net/tun.c:3640
  __fput+0x203/0x840 fs/file_table.c:280
  task_work_run+0x129/0x1b0 kernel/task_work.c:185
  exit_task_work include/linux/task_work.h:33 [inline]
  do_exit+0x5ce/0x2200 kernel/exit.c:931
  do_group_exit+0x144/0x310 kernel/exit.c:1046
  __do_sys_exit_group kernel/exit.c:1057 [inline]
  __se_sys_exit_group kernel/exit.c:1055 [inline]
  __x64_sys_exit_group+0x3b/0x40 kernel/exit.c:1055
 do_syscall_64+0x6c/0xd0
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7fe5e7b52479
Code: Unable to access opcode bytes at RIP 0x7fe5e7b5244f.
RSP: 002b:00007ffd3c800398 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe5e7b52479
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 00007fe5e7bcd2d0 R08: ffffffffffffffb8 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe5e7bcd2d0
R13: 0000000000000000 R14: 00007fe5e7bcdd20 R15: 00007fe5e7b24270

The crash can be also be reproduced with the following (with a tc
recompiled to allow for sfq limits of 1):

tc qdisc add dev dummy0 handle 1: root tbf rate 1Kbit burst 100b lat 1s
../iproute2-6.9.0/tc/tc qdisc add dev dummy0 handle 2: parent 1:10 sfq limit 1
ifconfig dummy0 up
ping -I dummy0 -f -c2 -W0.1 8.8.8.8
sleep 1

Scenario that triggers the crash:

* the first packet is sent and queued in TBF and SFQ; qdisc qlen is 1

* TBF dequeues: it peeks from SFQ which moves the packet to the
  gso_skb list and keeps qdisc qlen set to 1. TBF is out of tokens so
  it schedules itself for later.

* the second packet is sent and TBF tries to queues it to SFQ. qdisc
  qlen is now 2 and because the SFQ limit is 1 the packet is dropped
  by SFQ. At this point qlen is 1, and all of the SFQ slots are empty,
  however q->tail is not NULL.

At this point, assuming no more packets are queued, when sch_dequeue
runs again it will decrement the qlen for the current empty slot
causing an underflow and the subsequent out of bounds access.

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Octavian Purdila <tavip@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241204030520.2084663-2-tavip@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 12, 2025
[ Upstream commit 95fc45d ]

syzbot found a lockdep issue [1].

We should remove ax25 RTNL dependency in ax25_setsockopt()

This should also fix a variety of possible UAF in ax25.

[1]

WARNING: possible circular locking dependency detected
6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0 Not tainted
------------------------------------------------------
syz.5.1818/12806 is trying to acquire lock:
 ffffffff8fcb3988 (rtnl_mutex){+.+.}-{4:4}, at: ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680

but task is already holding lock:
 ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1618 [inline]
 ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: ax25_setsockopt+0x209/0xe90 net/ax25/af_ax25.c:574

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_AX25){+.+.}-{0:0}:
        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
        lock_sock_nested+0x48/0x100 net/core/sock.c:3642
        lock_sock include/net/sock.h:1618 [inline]
        ax25_kill_by_device net/ax25/af_ax25.c:101 [inline]
        ax25_device_event+0x24d/0x580 net/ax25/af_ax25.c:146
        notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
       __dev_notify_flags+0x207/0x400
        dev_change_flags+0xf0/0x1a0 net/core/dev.c:9026
        dev_ifsioc+0x7c8/0xe70 net/core/dev_ioctl.c:563
        dev_ioctl+0x719/0x1340 net/core/dev_ioctl.c:820
        sock_do_ioctl+0x240/0x460 net/socket.c:1234
        sock_ioctl+0x626/0x8e0 net/socket.c:1339
        vfs_ioctl fs/ioctl.c:51 [inline]
        __do_sys_ioctl fs/ioctl.c:906 [inline]
        __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (rtnl_mutex){+.+.}-{4:4}:
        check_prev_add kernel/locking/lockdep.c:3161 [inline]
        check_prevs_add kernel/locking/lockdep.c:3280 [inline]
        validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
        __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
        __mutex_lock_common kernel/locking/mutex.c:585 [inline]
        __mutex_lock+0x1ac/0xee0 kernel/locking/mutex.c:735
        ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2324
        __sys_setsockopt net/socket.c:2349 [inline]
        __do_sys_setsockopt net/socket.c:2355 [inline]
        __se_sys_setsockopt net/socket.c:2352 [inline]
        __x64_sys_setsockopt+0x1ee/0x280 net/socket.c:2352
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_AX25);
                               lock(rtnl_mutex);
                               lock(sk_lock-AF_AX25);
  lock(rtnl_mutex);

 *** DEADLOCK ***

1 lock held by syz.5.1818/12806:
  #0: ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1618 [inline]
  #0: ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: ax25_setsockopt+0x209/0xe90 net/ax25/af_ax25.c:574

stack backtrace:
CPU: 1 UID: 0 PID: 12806 Comm: syz.5.1818 Not tainted 6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:94 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
  print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074
  check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206
  check_prev_add kernel/locking/lockdep.c:3161 [inline]
  check_prevs_add kernel/locking/lockdep.c:3280 [inline]
  validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
  __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
  lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
  __mutex_lock_common kernel/locking/mutex.c:585 [inline]
  __mutex_lock+0x1ac/0xee0 kernel/locking/mutex.c:735
  ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680
  do_sock_setsockopt+0x3af/0x720 net/socket.c:2324
  __sys_setsockopt net/socket.c:2349 [inline]
  __do_sys_setsockopt net/socket.c:2355 [inline]
  __se_sys_setsockopt net/socket.c:2352 [inline]
  __x64_sys_setsockopt+0x1ee/0x280 net/socket.c:2352
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7b62385d29

Fixes: c433570 ("ax25: fix a use-after-free in ax25_fillin_cb()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250103210514.87290-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 12, 2025
[ Upstream commit be7a6a7 ]

It isn't guaranteed that NETWORK_INTERFACE_INFO::LinkSpeed will always
be set by the server, so the client must handle any values and then
prevent oopses like below from happening:

Oops: divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 1323 Comm: cat Not tainted 6.13.0-rc7 #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41
04/01/2014
RIP: 0010:cifs_debug_data_proc_show+0xa45/0x1460 [cifs] Code: 00 00 48
89 df e8 3b cd 1b c1 41 f6 44 24 2c 04 0f 84 50 01 00 00 48 89 ef e8
e7 d0 1b c1 49 8b 44 24 18 31 d2 49 8d 7c 24 28 <48> f7 74 24 18 48 89
c3 e8 6e cf 1b c1 41 8b 6c 24 28 49 8d 7c 24
RSP: 0018:ffffc90001817be0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88811230022c RCX: ffffffffc041bd99
RDX: 0000000000000000 RSI: 0000000000000567 RDI: ffff888112300228
RBP: ffff888112300218 R08: fffff52000302f5f R09: ffffed1022fa58ac
R10: ffff888117d2c566 R11: 00000000fffffffe R12: ffff888112300200
R13: 000000012a15343f R14: 0000000000000001 R15: ffff888113f2db58
FS: 00007fe27119e740(0000) GS:ffff888148600000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe2633c5000 CR3: 0000000124da0000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body.cold+0x19/0x27
 ? die+0x2e/0x50
 ? do_trap+0x159/0x1b0
 ? cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
 ? do_error_trap+0x90/0x130
 ? cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
 ? exc_divide_error+0x39/0x50
 ? cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
 ? asm_exc_divide_error+0x1a/0x20
 ? cifs_debug_data_proc_show+0xa39/0x1460 [cifs]
 ? cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
 ? seq_read_iter+0x42e/0x790
 seq_read_iter+0x19a/0x790
 proc_reg_read_iter+0xbe/0x110
 ? __pfx_proc_reg_read_iter+0x10/0x10
 vfs_read+0x469/0x570
 ? do_user_addr_fault+0x398/0x760
 ? __pfx_vfs_read+0x10/0x10
 ? find_held_lock+0x8a/0xa0
 ? __pfx_lock_release+0x10/0x10
 ksys_read+0xd3/0x170
 ? __pfx_ksys_read+0x10/0x10
 ? __rcu_read_unlock+0x50/0x270
 ? mark_held_locks+0x1a/0x90
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe271288911
Code: 00 48 8b 15 01 25 10 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8
20 ad 01 00 f3 0f 1e fa 80 3d b5 a7 10 00 00 74 13 31 c0 0f 05 <48> 3d
00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
RSP: 002b:00007ffe87c079d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000040000 RCX: 00007fe271288911
RDX: 0000000000040000 RSI: 00007fe2633c6000 RDI: 0000000000000003
RBP: 00007ffe87c07a00 R08: 0000000000000000 R09: 00007fe2713e6380
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000040000
R13: 00007fe2633c6000 R14: 0000000000000003 R15: 0000000000000000
 </TASK>

Fix this by setting cifs_server_iface::speed to a sane value (1Gbps)
by default when link speed is unset.

Cc: Shyam Prasad N <nspmangalore@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Fixes: a6d8fb5 ("cifs: distribute channels across interfaces based on speed")
Reported-by: Frank Sorenson <sorenson@redhat.com>
Reported-by: Jay Shin <jaeshin@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 12, 2025
[ Upstream commit 600258d ]

Packets handled by hardware have added secpath as a way to inform XFRM
core code that this path was already handled. That secpath is not needed
at all after policy is checked and it is removed later in the stack.

However, in the case of IP forwarding is enabled (/proc/sys/net/ipv4/ip_forward),
that secpath is not removed and packets which already were handled are reentered
to the driver TX path with xfrm_offload set.

The following kernel panic is observed in mlx5 in such case:

 mlx5_core 0000:04:00.0 enp4s0f0np0: Link up
 mlx5_core 0000:04:00.1 enp4s0f1np1: Link up
 Initializing XFRM netlink socket
 IPsec XFRM device driver
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor instruction fetch in kernel mode
 #PF: error_code(0x0010) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0010 [#1] PREEMPT SMP
 CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc1-alex #3
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 RIP: 0010:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
 RSP: 0018:ffffb87380003800 EFLAGS: 00010206
 RAX: ffff8df004e02600 RBX: ffffb873800038d8 RCX: 00000000ffff98cf
 RDX: ffff8df00733e108 RSI: ffff8df00521fb80 RDI: ffff8df001661f00
 RBP: ffffb87380003850 R08: ffff8df013980000 R09: 0000000000000010
 R10: 0000000000000002 R11: 0000000000000002 R12: ffff8df001661f00
 R13: ffff8df00521fb80 R14: ffff8df00733e108 R15: ffff8df011faf04e
 FS:  0000000000000000(0000) GS:ffff8df46b800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffffffffd6 CR3: 0000000106384000 CR4: 0000000000350ef0
 Call Trace:
  <IRQ>
  ? show_regs+0x63/0x70
  ? __die_body+0x20/0x60
  ? __die+0x2b/0x40
  ? page_fault_oops+0x15c/0x550
  ? do_user_addr_fault+0x3ed/0x870
  ? exc_page_fault+0x7f/0x190
  ? asm_exc_page_fault+0x27/0x30
  mlx5e_ipsec_handle_tx_skb+0xe7/0x2f0 [mlx5_core]
  mlx5e_xmit+0x58e/0x1980 [mlx5_core]
  ? __fib_lookup+0x6a/0xb0
  dev_hard_start_xmit+0x82/0x1d0
  sch_direct_xmit+0xfe/0x390
  __dev_queue_xmit+0x6d8/0xee0
  ? __fib_lookup+0x6a/0xb0
  ? internal_add_timer+0x48/0x70
  ? mod_timer+0xe2/0x2b0
  neigh_resolve_output+0x115/0x1b0
  __neigh_update+0x26a/0xc50
  neigh_update+0x14/0x20
  arp_process+0x2cb/0x8e0
  ? __napi_build_skb+0x5e/0x70
  arp_rcv+0x11e/0x1c0
  ? dev_gro_receive+0x574/0x820
  __netif_receive_skb_list_core+0x1cf/0x1f0
  netif_receive_skb_list_internal+0x183/0x2a0
  napi_complete_done+0x76/0x1c0
  mlx5e_napi_poll+0x234/0x7a0 [mlx5_core]
  __napi_poll+0x2d/0x1f0
  net_rx_action+0x1a6/0x370
  ? atomic_notifier_call_chain+0x3b/0x50
  ? irq_int_handler+0x15/0x20 [mlx5_core]
  handle_softirqs+0xb9/0x2f0
  ? handle_irq_event+0x44/0x60
  irq_exit_rcu+0xdb/0x100
  common_interrupt+0x98/0xc0
  </IRQ>
  <TASK>
  asm_common_interrupt+0x27/0x40
 RIP: 0010:pv_native_safe_halt+0xb/0x10
 Code: 09 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 22
 0f 1f 84 00 00 00 00 00 90 eb 07 0f 00 2d 7f e9 36 00 fb
40 00 83 ff 07 77 21 89 ff ff 24 fd 88 3d a1 bd 0f 21 f8
 RSP: 0018:ffffffffbe603de8 EFLAGS: 00000202
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000f92f46680
 RDX: 0000000000000037 RSI: 00000000ffffffff RDI: 00000000000518d4
 RBP: ffffffffbe603df0 R08: 000000cd42e4dffb R09: ffffffffbe603d70
 R10: 0000004d80d62680 R11: 0000000000000001 R12: ffffffffbe60bf40
 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffbe60aff8
  ? default_idle+0x9/0x20
  arch_cpu_idle+0x9/0x10
  default_idle_call+0x29/0xf0
  do_idle+0x1f2/0x240
  cpu_startup_entry+0x2c/0x30
  rest_init+0xe7/0x100
  start_kernel+0x76b/0xb90
  x86_64_start_reservations+0x18/0x30
  x86_64_start_kernel+0xc0/0x110
  ? setup_ghcb+0xe/0x130
  common_startup_64+0x13e/0x141
  </TASK>
 Modules linked in: esp4_offload esp4 xfrm_interface
xfrm6_tunnel tunnel4 tunnel6 xfrm_user xfrm_algo binfmt_misc
intel_rapl_msr intel_rapl_common kvm_amd ccp kvm input_leds serio_raw
qemu_fw_cfg sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc
scsi_dh_alua efi_pstore ip_tables x_tables autofs4 raid10 raid456
async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx
libcrc32c raid1 raid0 mlx5_core crct10dif_pclmul crc32_pclmul
polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3
sha1_ssse3 ahci mlxfw i2c_i801 libahci i2c_mux i2c_smbus psample
virtio_rng pci_hyperv_intf aesni_intel crypto_simd cryptd
 CR2: 0000000000000000
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
 RSP: 0018:ffffb87380003800 EFLAGS: 00010206
 RAX: ffff8df004e02600 RBX: ffffb873800038d8 RCX: 00000000ffff98cf
 RDX: ffff8df00733e108 RSI: ffff8df00521fb80 RDI: ffff8df001661f00
 RBP: ffffb87380003850 R08: ffff8df013980000 R09: 0000000000000010
 R10: 0000000000000002 R11: 0000000000000002 R12: ffff8df001661f00
 R13: ffff8df00521fb80 R14: ffff8df00733e108 R15: ffff8df011faf04e
 FS:  0000000000000000(0000) GS:ffff8df46b800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffffffffd6 CR3: 0000000106384000 CR4: 0000000000350ef0
 Kernel panic - not syncing: Fatal exception in interrupt
 Kernel Offset: 0x3b800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: 5958372 ("xfrm: add RX datapath protection for IPsec packet offload mode")
Signed-off-by: Alexandre Cassen <acassen@corp.free.fr>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 12, 2025
[ Upstream commit c7b87ce ]

libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    #6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    #7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    #8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    #9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    #10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    #12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250122025519.361873-1-howardchu95@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 12, 2025
commit c79a39d upstream.

On a board running ntpd and gpsd, I'm seeing a consistent use-after-free
in sys_exit() from gpsd when rebooting:

    pps pps1: removed
    ------------[ cut here ]------------
    kobject: '(null)' (00000000db4bec24): is not initialized, yet kobject_put() is being called.
    WARNING: CPU: 2 PID: 440 at lib/kobject.c:734 kobject_put+0x120/0x150
    CPU: 2 UID: 299 PID: 440 Comm: gpsd Not tainted 6.11.0-rc6-00308-gb31c44928842 #1
    Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
    pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : kobject_put+0x120/0x150
    lr : kobject_put+0x120/0x150
    sp : ffffffc0803d3ae0
    x29: ffffffc0803d3ae0 x28: ffffff8042dc9738 x27: 0000000000000001
    x26: 0000000000000000 x25: ffffff8042dc9040 x24: ffffff8042dc9440
    x23: ffffff80402a4620 x22: ffffff8042ef4bd0 x21: ffffff80405cb600
    x20: 000000000008001b x19: ffffff8040b3b6e0 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000000 x15: 696e6920746f6e20
    x14: 7369203a29343263 x13: 205d303434542020 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
    x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
    x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
    x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
    Call trace:
     kobject_put+0x120/0x150
     cdev_put+0x20/0x3c
     __fput+0x2c4/0x2d8
     ____fput+0x1c/0x38
     task_work_run+0x70/0xfc
     do_exit+0x2a0/0x924
     do_group_exit+0x34/0x90
     get_signal+0x7fc/0x8c0
     do_signal+0x128/0x13b4
     do_notify_resume+0xdc/0x160
     el0_svc+0xd4/0xf8
     el0t_64_sync_handler+0x140/0x14c
     el0t_64_sync+0x190/0x194
    ---[ end trace 0000000000000000 ]---

...followed by more symptoms of corruption, with similar stacks:

    refcount_t: underflow; use-after-free.
    kernel BUG at lib/list_debug.c:62!
    Kernel panic - not syncing: Oops - BUG: Fatal exception

This happens because pps_device_destruct() frees the pps_device with the
embedded cdev immediately after calling cdev_del(), but, as the comment
above cdev_del() notes, fops for previously opened cdevs are still
callable even after cdev_del() returns. I think this bug has always
been there: I can't explain why it suddenly started happening every time
I reboot this particular board.

In commit d953e0e ("pps: Fix a use-after free bug when
unregistering a source."), George Spelvin suggested removing the
embedded cdev. That seems like the simplest way to fix this, so I've
implemented his suggestion, using __register_chrdev() with pps_idr
becoming the source of truth for which minor corresponds to which
device.

But now that pps_idr defines userspace visibility instead of cdev_add(),
we need to be sure the pps->dev refcount can't reach zero while
userspace can still find it again. So, the idr_remove() call moves to
pps_unregister_cdev(), and pps_idr now holds a reference to pps->dev.

    pps_core: source serial1 got cdev (251:1)
    <...>
    pps pps1: removed
    pps_core: unregistering pps1
    pps_core: deallocating pps1

Fixes: d953e0e ("pps: Fix a use-after free bug when unregistering a source.")
Cc: stable@vger.kernel.org
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/a17975fd5ae99385791929e563f72564edbcf28f.1731383727.git.calvin@wbinvd.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Feb 12, 2025
commit ee1b504 upstream.

The following kernel oops is thrown when trying to remove the max96712
module:

Unable to handle kernel paging request at virtual address 00007375746174db
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af89000
[00007375746174db] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in: crct10dif_ce polyval_ce mxc_jpeg_encdec flexcan
    snd_soc_fsl_sai snd_soc_fsl_asoc_card snd_soc_fsl_micfil dwc_mipi_csi2
    imx_csi_formatter polyval_generic v4l2_jpeg imx_pcm_dma can_dev
    snd_soc_imx_audmux snd_soc_wm8962 snd_soc_imx_card snd_soc_fsl_utils
    max96712(C-) rpmsg_ctrl rpmsg_char pwm_fan fuse
    [last unloaded: imx8_isi]
CPU: 0 UID: 0 PID: 754 Comm: rmmod
	    Tainted: G         C    6.12.0-rc6-06364-g327fec852c31 #17
Tainted: [C]=CRAP
Hardware name: NXP i.MX95 19X19 board (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : led_put+0x1c/0x40
lr : v4l2_subdev_put_privacy_led+0x48/0x58
sp : ffff80008699bbb0
x29: ffff80008699bbb0 x28: ffff00008ac233c0 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
x23: ffff000080cf1170 x22: ffff00008b53bd00 x21: ffff8000822ad1c8
x20: ffff000080ff5c00 x19: ffff00008b53be40 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000004 x13: ffff0000800f8010 x12: 0000000000000000
x11: ffff000082acf5c0 x10: ffff000082acf478 x9 : ffff0000800f8010
x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
x2 : ffff00008ac233c0 x1 : ffff00008ac233c0 x0 : ff00737574617473
Call trace:
 led_put+0x1c/0x40
 v4l2_subdev_put_privacy_led+0x48/0x58
 v4l2_async_unregister_subdev+0x2c/0x1a4
 max96712_remove+0x1c/0x38 [max96712]
 i2c_device_remove+0x2c/0x9c
 device_remove+0x4c/0x80
 device_release_driver_internal+0x1cc/0x228
 driver_detach+0x4c/0x98
 bus_remove_driver+0x6c/0xbc
 driver_unregister+0x30/0x60
 i2c_del_driver+0x54/0x64
 max96712_i2c_driver_exit+0x18/0x1d0 [max96712]
 __arm64_sys_delete_module+0x1a4/0x290
 invoke_syscall+0x48/0x10c
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x34/0xd8
 el0t_64_sync_handler+0x120/0x12c
 el0t_64_sync+0x190/0x194
Code: f9000bf3 aa0003f3 f9402800 f9402000 (f9403400)
---[ end trace 0000000000000000 ]---

This happens because in v4l2_i2c_subdev_init(), the i2c_set_cliendata()
is called again and the data is overwritten to point to sd, instead of
priv. So, in remove(), the wrong pointer is passed to
v4l2_async_unregister_subdev(), leading to a crash.

Fixes: 5814f32 ("media: staging: max96712: Add basic support for MAX96712 GMSL2 deserializer")
Signed-off-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com>
Cc: stable@vger.kernel.org
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Feb 18, 2025
…saction abort

[ Upstream commit 0d85f5c ]

If while we are doing a direct IO write a transaction abort happens, we
mark all existing ordered extents with the BTRFS_ORDERED_IOERR flag (done
at btrfs_destroy_ordered_extents()), and then after that if we enter
btrfs_split_ordered_extent() and the ordered extent has bytes left
(meaning we have a bio that doesn't cover the whole ordered extent, see
details at btrfs_extract_ordered_extent()), we will fail on the following
assertion at btrfs_split_ordered_extent():

   ASSERT(!(flags & ~BTRFS_ORDERED_TYPE_FLAGS));

because the BTRFS_ORDERED_IOERR flag is set and the definition of
BTRFS_ORDERED_TYPE_FLAGS is just the union of all flags that identify the
type of write (regular, nocow, prealloc, compressed, direct IO, encoded).

Fix this by returning an error from btrfs_extract_ordered_extent() if we
find the BTRFS_ORDERED_IOERR flag in the ordered extent. The error will
be the error that resulted in the transaction abort or -EIO if no
transaction abort happened.

This was recently reported by syzbot with the following trace:

   FAULT_INJECTION: forcing a failure.
   name failslab, interval 1, probability 0, space 0, times 1
   CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller #0
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
   Call Trace:
    <TASK>
    __dump_stack lib/dump_stack.c:94 [inline]
    dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
    fail_dump lib/fault-inject.c:53 [inline]
    should_fail_ex+0x3b0/0x4e0 lib/fault-inject.c:154
    should_failslab+0xac/0x100 mm/failslab.c:46
    slab_pre_alloc_hook mm/slub.c:4072 [inline]
    slab_alloc_node mm/slub.c:4148 [inline]
    __do_kmalloc_node mm/slub.c:4297 [inline]
    __kmalloc_noprof+0xdd/0x4c0 mm/slub.c:4310
    kmalloc_noprof include/linux/slab.h:905 [inline]
    kzalloc_noprof include/linux/slab.h:1037 [inline]
    btrfs_chunk_alloc_add_chunk_item+0x244/0x1100 fs/btrfs/volumes.c:5742
    reserve_chunk_space+0x1ca/0x2c0 fs/btrfs/block-group.c:4292
    check_system_chunk fs/btrfs/block-group.c:4319 [inline]
    do_chunk_alloc fs/btrfs/block-group.c:3891 [inline]
    btrfs_chunk_alloc+0x77b/0xf80 fs/btrfs/block-group.c:4187
    find_free_extent_update_loop fs/btrfs/extent-tree.c:4166 [inline]
    find_free_extent+0x42d1/0x5810 fs/btrfs/extent-tree.c:4579
    btrfs_reserve_extent+0x422/0x810 fs/btrfs/extent-tree.c:4672
    btrfs_new_extent_direct fs/btrfs/direct-io.c:186 [inline]
    btrfs_get_blocks_direct_write+0x706/0xfa0 fs/btrfs/direct-io.c:321
    btrfs_dio_iomap_begin+0xbb7/0x1180 fs/btrfs/direct-io.c:525
    iomap_iter+0x697/0xf60 fs/iomap/iter.c:90
    __iomap_dio_rw+0xeb9/0x25b0 fs/iomap/direct-io.c:702
    btrfs_dio_write fs/btrfs/direct-io.c:775 [inline]
    btrfs_direct_write+0x610/0xa30 fs/btrfs/direct-io.c:880
    btrfs_do_write_iter+0x2a0/0x760 fs/btrfs/file.c:1397
    do_iter_readv_writev+0x600/0x880
    vfs_writev+0x376/0xba0 fs/read_write.c:1050
    do_pwritev fs/read_write.c:1146 [inline]
    __do_sys_pwritev2 fs/read_write.c:1204 [inline]
    __se_sys_pwritev2+0x196/0x2b0 fs/read_write.c:1195
    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
    do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
    entry_SYSCALL_64_after_hwframe+0x77/0x7f
   RIP: 0033:0x7f1281f85d29
   RSP: 002b:00007f12819fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
   RAX: ffffffffffffffda RBX: 00007f1282176080 RCX: 00007f1281f85d29
   RDX: 0000000000000001 RSI: 0000000020000240 RDI: 0000000000000005
   RBP: 00007f12819fe090 R08: 0000000000000000 R09: 0000000000000003
   R10: 0000000000007000 R11: 0000000000000246 R12: 0000000000000002
   R13: 0000000000000000 R14: 00007f1282176080 R15: 00007ffcb9e23328
    </TASK>
   BTRFS error (device loop0 state A): Transaction aborted (error -12)
   BTRFS: error (device loop0 state A) in btrfs_chunk_alloc_add_chunk_item:5745: errno=-12 Out of memory
   BTRFS info (device loop0 state EA): forced readonly
   assertion failed: !(flags & ~BTRFS_ORDERED_TYPE_FLAGS), in fs/btrfs/ordered-data.c:1234
   ------------[ cut here ]------------
   kernel BUG at fs/btrfs/ordered-data.c:1234!
   Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
   CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller #0
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
   RIP: 0010:btrfs_split_ordered_extent+0xd8d/0xe20 fs/btrfs/ordered-data.c:1234
   RSP: 0018:ffffc9000d1df2b8 EFLAGS: 00010246
   RAX: 0000000000000057 RBX: 000000000006a000 RCX: 9ce21886c4195300
   RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
   RBP: 0000000000000091 R08: ffffffff817f0a3c R09: 1ffff92001a3bdf4
   R10: dffffc0000000000 R11: fffff52001a3bdf5 R12: 1ffff1100a45f401
   R13: ffff8880522fa018 R14: dffffc0000000000 R15: 000000000006a000
   FS:  00007f12819fe6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 0000557750bd7da8 CR3: 00000000400ea000 CR4: 0000000000352ef0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   Call Trace:
    <TASK>
    btrfs_extract_ordered_extent fs/btrfs/direct-io.c:702 [inline]
    btrfs_dio_submit_io+0x4be/0x6d0 fs/btrfs/direct-io.c:737
    iomap_dio_submit_bio fs/iomap/direct-io.c:85 [inline]
    iomap_dio_bio_iter+0x1022/0x1740 fs/iomap/direct-io.c:447
    __iomap_dio_rw+0x13b7/0x25b0 fs/iomap/direct-io.c:703
    btrfs_dio_write fs/btrfs/direct-io.c:775 [inline]
    btrfs_direct_write+0x610/0xa30 fs/btrfs/direct-io.c:880
    btrfs_do_write_iter+0x2a0/0x760 fs/btrfs/file.c:1397
    do_iter_readv_writev+0x600/0x880
    vfs_writev+0x376/0xba0 fs/read_write.c:1050
    do_pwritev fs/read_write.c:1146 [inline]
    __do_sys_pwritev2 fs/read_write.c:1204 [inline]
    __se_sys_pwritev2+0x196/0x2b0 fs/read_write.c:1195
    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
    do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
    entry_SYSCALL_64_after_hwframe+0x77/0x7f
   RIP: 0033:0x7f1281f85d29
   RSP: 002b:00007f12819fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
   RAX: ffffffffffffffda RBX: 00007f1282176080 RCX: 00007f1281f85d29
   RDX: 0000000000000001 RSI: 0000000020000240 RDI: 0000000000000005
   RBP: 00007f12819fe090 R08: 0000000000000000 R09: 0000000000000003
   R10: 0000000000007000 R11: 0000000000000246 R12: 0000000000000002
   R13: 0000000000000000 R14: 00007f1282176080 R15: 00007ffcb9e23328
    </TASK>
   Modules linked in:
   ---[ end trace 0000000000000000 ]---
   RIP: 0010:btrfs_split_ordered_extent+0xd8d/0xe20 fs/btrfs/ordered-data.c:1234
   RSP: 0018:ffffc9000d1df2b8 EFLAGS: 00010246
   RAX: 0000000000000057 RBX: 000000000006a000 RCX: 9ce21886c4195300
   RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
   RBP: 0000000000000091 R08: ffffffff817f0a3c R09: 1ffff92001a3bdf4
   R10: dffffc0000000000 R11: fffff52001a3bdf5 R12: 1ffff1100a45f401
   R13: ffff8880522fa018 R14: dffffc0000000000 R15: 000000000006a000
   FS:  00007f12819fe6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 0000557750bd7da8 CR3: 00000000400ea000 CR4: 0000000000352ef0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

In this case the transaction abort was due to (an injected) memory
allocation failure when attempting to allocate a new chunk.

Reported-by: syzbot+f60d8337a5c8e8d92a77@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/6777f2dd.050a0220.178762.0045.GAE@google.com/
Fixes: 52b1fdc ("btrfs: handle completed ordered extents in btrfs_split_ordered_extent")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 18, 2025
…ex()

[ Upstream commit 082d9e2 ]

Somewhen between 6.10 and 6.11 the driver started to crash on my
MacBookPro14,3. The property doesn't exist and 'tmp' remains
uninitialized, so we pass a random pointer to devm_kstrdup().

The crash I am getting looks like this:

BUG: unable to handle page fault for address: 00007f033c669379
PF: supervisor read access in kernel mode
PF: error_code(0x0001) - permissions violation
PGD 8000000101341067 P4D 8000000101341067 PUD 101340067 PMD 1013bb067 PTE 800000010aee9025
Oops: Oops: 0001 [#1] SMP PTI
CPU: 4 UID: 0 PID: 827 Comm: (udev-worker) Not tainted 6.11.8-gentoo #1
Hardware name: Apple Inc. MacBookPro14,3/Mac-551B86E5744E2388, BIOS 529.140.2.0.0 06/23/2024
RIP: 0010:strlen+0x4/0x30
Code: f7 75 ec 31 c0 c3 cc cc cc cc 48 89 f8 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <80> 3f 00 74 14 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 cc
RSP: 0018:ffffb4aac0683ad8 EFLAGS: 00010202
RAX: 00000000ffffffea RBX: 00007f033c669379 RCX: 0000000000000001
RDX: 0000000000000cc0 RSI: 00007f033c669379 RDI: 00007f033c669379
RBP: 00000000ffffffea R08: 0000000000000000 R09: 00000000c0ba916a
R10: ffffffffffffffff R11: ffffffffb61ea260 R12: ffff91f7815b50c8
R13: 0000000000000cc0 R14: ffff91fafefffe30 R15: ffffb4aac0683b30
FS:  00007f033ccbe8c0(0000) GS:ffff91faeed00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f033c669379 CR3: 0000000107b1e004 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die+0x23/0x70
 ? page_fault_oops+0x149/0x4c0
 ? raw_spin_rq_lock_nested+0xe/0x20
 ? sched_balance_newidle+0x22b/0x3c0
 ? update_load_avg+0x78/0x770
 ? exc_page_fault+0x6f/0x150
 ? asm_exc_page_fault+0x26/0x30
 ? __pfx_pci_conf1_write+0x10/0x10
 ? strlen+0x4/0x30
 devm_kstrdup+0x25/0x70
 brcmf_of_probe+0x273/0x350 [brcmfmac]

Signed-off-by: Stefan Dösinger <stefan@codeweavers.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/20250106170958.3595-1-stefan@codeweavers.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 18, 2025
commit a3a860b upstream.

The following failure was reported on HPE ProLiant D320:

[   10.693310][    T1] tpm_tis STM0925:00: 2.0 TPM (device-id 0x3, rev-id 0)
[   10.848132][    T1] ------------[ cut here ]------------
[   10.853559][    T1] WARNING: CPU: 59 PID: 1 at mm/page_alloc.c:4727 __alloc_pages_noprof+0x2ca/0x330
[   10.862827][    T1] Modules linked in:
[   10.866671][    T1] CPU: 59 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-lp155.2.g52785e2-default #1 openSUSE Tumbleweed (unreleased) 588cd98293a7c9eba9013378d807364c088c9375
[   10.882741][    T1] Hardware name: HPE ProLiant DL320 Gen12/ProLiant DL320 Gen12, BIOS 1.20 10/28/2024
[   10.892170][    T1] RIP: 0010:__alloc_pages_noprof+0x2ca/0x330
[   10.898103][    T1] Code: 24 08 e9 4a fe ff ff e8 34 36 fa ff e9 88 fe ff ff 83 fe 0a 0f 86 b3 fd ff ff 80 3d 01 e7 ce 01 00 75 09 c6 05 f8 e6 ce 01 01 <0f> 0b 45 31 ff e9 e5 fe ff ff f7 c2 00 00 08 00 75 42 89 d9 80 e1
[   10.917750][    T1] RSP: 0000:ffffb7cf40077980 EFLAGS: 00010246
[   10.923777][    T1] RAX: 0000000000000000 RBX: 0000000000040cc0 RCX: 0000000000000000
[   10.931727][    T1] RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000040cc0

The above transcript shows that ACPI pointed a 16 MiB buffer for the log
events because RSI maps to the 'order' parameter of __alloc_pages_noprof().
Address the bug by moving from devm_kmalloc() to devm_add_action() and
kvmalloc() and devm_add_action().

Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org # v2.6.16+
Fixes: 55a82ab ("[PATCH] tpm: add bios measurement log")
Reported-by: Andy Liang <andy.liang@hpe.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219495
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Andy Liang <andy.liang@hpe.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Feb 18, 2025
commit b0fce54 upstream.

syz reports an out of bounds read:

==================================================================
BUG: KASAN: slab-out-of-bounds in ocfs2_match fs/ocfs2/dir.c:334
[inline]
BUG: KASAN: slab-out-of-bounds in ocfs2_search_dirblock+0x283/0x6e0
fs/ocfs2/dir.c:367
Read of size 1 at addr ffff88804d8b9982 by task syz-executor.2/14802

CPU: 0 UID: 0 PID: 14802 Comm: syz-executor.2 Not tainted 6.13.0-rc4 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
04/01/2014
Sched_ext: serialise (enabled+all), task: runnable_at=-10ms
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x229/0x350 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x164/0x530 mm/kasan/report.c:489
kasan_report+0x147/0x180 mm/kasan/report.c:602
ocfs2_match fs/ocfs2/dir.c:334 [inline]
ocfs2_search_dirblock+0x283/0x6e0 fs/ocfs2/dir.c:367
ocfs2_find_entry_id fs/ocfs2/dir.c:414 [inline]
ocfs2_find_entry+0x1143/0x2db0 fs/ocfs2/dir.c:1078
ocfs2_find_files_on_disk+0x18e/0x530 fs/ocfs2/dir.c:1981
ocfs2_lookup_ino_from_name+0xb6/0x110 fs/ocfs2/dir.c:2003
ocfs2_lookup+0x30a/0xd40 fs/ocfs2/namei.c:122
lookup_open fs/namei.c:3627 [inline]
open_last_lookups fs/namei.c:3748 [inline]
path_openat+0x145a/0x3870 fs/namei.c:3984
do_filp_open+0xe9/0x1c0 fs/namei.c:4014
do_sys_openat2+0x135/0x1d0 fs/open.c:1402
do_sys_open fs/open.c:1417 [inline]
__do_sys_openat fs/open.c:1433 [inline]
__se_sys_openat fs/open.c:1428 [inline]
__x64_sys_openat+0x15d/0x1c0 fs/open.c:1428
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf6/0x210 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f01076903ad
Code: c3 e8 a7 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f01084acfc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f01077cbf80 RCX: 00007f01076903ad
RDX: 0000000000105042 RSI: 0000000020000080 RDI: ffffffffffffff9c
RBP: 00007f01077cbf80 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000001ff R11: 0000000000000246 R12: 0000000000000000
R13: 00007f01077cbf80 R14: 00007f010764fc90 R15: 00007f010848d000
</TASK>
==================================================================

And a general protection fault in ocfs2_prepare_dir_for_insert:

==================================================================
loop0: detected capacity change from 0 to 32768
JBD2: Ignoring recovery information on journal
ocfs2: Mounting device (7,0) on (node local, slot 0) with ordered data
mode.
Oops: general protection fault, probably for non-canonical address
0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 5096 Comm: syz-executor792 Not tainted
6.11.0-rc4-syzkaller-00002-gb0da640826ba #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:ocfs2_find_dir_space_id fs/ocfs2/dir.c:3406 [inline]
RIP: 0010:ocfs2_prepare_dir_for_insert+0x3309/0x5c70 fs/ocfs2/dir.c:4280
Code: 00 00 e8 2a 25 13 fe e9 ba 06 00 00 e8 20 25 13 fe e9 4f 01 00 00
e8 16 25 13 fe 49 8d 7f 08 49 8d 5f 09 48 89 f8 48 c1 e8 03 <42> 0f b6
04 20 84 c0 0f 85 bd 23 00 00 48 89 d8 48 c1 e8 03 42 0f
RSP: 0018:ffffc9000af9f020 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000009 RCX: ffff88801e27a440
RDX: 0000000000000000 RSI: 0000000000000400 RDI: 0000000000000008
RBP: ffffc9000af9f830 R08: ffffffff8380395b R09: ffffffff838090a7
R10: 0000000000000002 R11: ffff88801e27a440 R12: dffffc0000000000
R13: ffff88803c660878 R14: f700000000000088 R15: 0000000000000000
FS:  000055555a677380(0000) GS:ffff888020800000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560bce569178 CR3: 000000001de5a000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ocfs2_mknod+0xcaf/0x2b40 fs/ocfs2/namei.c:292
vfs_mknod+0x36d/0x3b0 fs/namei.c:4088
do_mknodat+0x3ec/0x5b0
__do_sys_mknodat fs/namei.c:4166 [inline]
__se_sys_mknodat fs/namei.c:4163 [inline]
__x64_sys_mknodat+0xa7/0xc0 fs/namei.c:4163
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f2dafda3a99
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 17 00 00 90 48 89
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08
0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8
64 89 01 48
RSP: 002b:00007ffe336a6658 EFLAGS: 00000246 ORIG_RAX:
0000000000000103
RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
00007f2dafda3a99
RDX: 00000000000021c0 RSI: 0000000020000040 RDI:
00000000ffffff9c
RBP: 00007f2dafe1b5f0 R08: 0000000000004480 R09:
000055555a6784c0
R10: 0000000000000103 R11: 0000000000000246 R12:
00007ffe336a6680
R13: 00007ffe336a68a8 R14: 431bde82d7b634db R15:
00007f2dafdec03b
</TASK>
==================================================================

The two reports are all caused invalid negative i_size of dir inode.  For
ocfs2, dir_inode can't be negative or zero.

Here add a check in which is called by ocfs2_check_dir_for_entry().  It
fixes the second report as ocfs2_check_dir_for_entry() must be called
before ocfs2_prepare_dir_for_insert().  Also set a up limit for dir with
OCFS2_INLINE_DATA_FL.  The i_size can't be great than blocksize.

Link: https://lkml.kernel.org/r/20250106140640.92260-1-glass.su@suse.com
Reported-by: Jiacheng Xu <stitch@zju.edu.cn>
Link: https://lore.kernel.org/ocfs2-devel/17a04f01.1ae74.19436d003fc.Coremail.stitch@zju.edu.cn/T/#u
Reported-by: syzbot+5a64828fcc4c2ad9b04f@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000005894f3062018caf1@google.com/T/
Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Feb 24, 2025
[ Upstream commit 5805402 ]

vxlan_init() must check vxlan_vnigroup_init() success
otherwise a crash happens later, spotted by syzbot.

Oops: general protection fault, probably for non-canonical address 0xdffffc000000002c: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000160-0x0000000000000167]
CPU: 0 UID: 0 PID: 7313 Comm: syz-executor147 Not tainted 6.14.0-rc1-syzkaller-00276-g69b54314c975 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
 RIP: 0010:vxlan_vnigroup_uninit+0x89/0x500 drivers/net/vxlan/vxlan_vnifilter.c:912
Code: 00 48 8b 44 24 08 4c 8b b0 98 41 00 00 49 8d 86 60 01 00 00 48 89 c2 48 89 44 24 10 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 04 00 00 49 8b 86 60 01 00 00 48 ba 00 00 00
RSP: 0018:ffffc9000cc1eea8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000001 RCX: ffffffff8672effb
RDX: 000000000000002c RSI: ffffffff8672ecb9 RDI: ffff8880461b4f18
RBP: ffff8880461b4ef4 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000020000
R13: ffff8880461b0d80 R14: 0000000000000000 R15: dffffc0000000000
FS:  00007fecfa95d6c0(0000) GS:ffff88806a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fecfa95cfb8 CR3: 000000004472c000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  vxlan_uninit+0x1ab/0x200 drivers/net/vxlan/vxlan_core.c:2942
  unregister_netdevice_many_notify+0x12d6/0x1f30 net/core/dev.c:11824
  unregister_netdevice_many net/core/dev.c:11866 [inline]
  unregister_netdevice_queue+0x307/0x3f0 net/core/dev.c:11736
  register_netdevice+0x1829/0x1eb0 net/core/dev.c:10901
  __vxlan_dev_create+0x7c6/0xa30 drivers/net/vxlan/vxlan_core.c:3981
  vxlan_newlink+0xd1/0x130 drivers/net/vxlan/vxlan_core.c:4407
  rtnl_newlink_create net/core/rtnetlink.c:3795 [inline]
  __rtnl_newlink net/core/rtnetlink.c:3906 [inline]

Fixes: f9c4bb0 ("vxlan: vni filtering support on collect metadata device")
Reported-by: syzbot+6a9624592218c2c5e7aa@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67a9d9b4.050a0220.110943.002d.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250210105242.883482-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 24, 2025
…faces

commit 2240fed upstream.

Robert Morris created a test program which can cause
usb_hub_to_struct_hub() to dereference a NULL or inappropriate
pointer:

Oops: general protection fault, probably for non-canonical address
0xcccccccccccccccc: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
CPU: 7 UID: 0 PID: 117 Comm: kworker/7:1 Not tainted 6.13.0-rc3-00017-gf44d154d6e3d #14
Hardware name: FreeBSD BHYVE/BHYVE, BIOS 14.0 10/17/2021
Workqueue: usb_hub_wq hub_event
RIP: 0010:usb_hub_adjust_deviceremovable+0x78/0x110
...
Call Trace:
 <TASK>
 ? die_addr+0x31/0x80
 ? exc_general_protection+0x1b4/0x3c0
 ? asm_exc_general_protection+0x26/0x30
 ? usb_hub_adjust_deviceremovable+0x78/0x110
 hub_probe+0x7c7/0xab0
 usb_probe_interface+0x14b/0x350
 really_probe+0xd0/0x2d0
 ? __pfx___device_attach_driver+0x10/0x10
 __driver_probe_device+0x6e/0x110
 driver_probe_device+0x1a/0x90
 __device_attach_driver+0x7e/0xc0
 bus_for_each_drv+0x7f/0xd0
 __device_attach+0xaa/0x1a0
 bus_probe_device+0x8b/0xa0
 device_add+0x62e/0x810
 usb_set_configuration+0x65d/0x990
 usb_generic_driver_probe+0x4b/0x70
 usb_probe_device+0x36/0xd0

The cause of this error is that the device has two interfaces, and the
hub driver binds to interface 1 instead of interface 0, which is where
usb_hub_to_struct_hub() looks.

We can prevent the problem from occurring by refusing to accept hub
devices that violate the USB spec by having more than one
configuration or interface.

Reported-and-tested-by: Robert Morris <rtm@csail.mit.edu>
Cc: stable <stable@kernel.org>
Closes: https://lore.kernel.org/linux-usb/95564.1737394039@localhost/
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/c27f3bf4-63d8-4fb5-ac82-09e3cd19f61c@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amboar pushed a commit that referenced this pull request Feb 28, 2025
[ Upstream commit 071ed42 ]

tcf_exts_miss_cookie_base_alloc() calls xa_alloc_cyclic() which can
return 1 if the allocation succeeded after wrapping. This was treated as
an error, with value 1 returned to caller tcf_exts_init_ex() which sets
exts->actions to NULL and returns 1 to caller fl_change().

fl_change() treats err == 1 as success, calling tcf_exts_validate_ex()
which calls tcf_action_init() with exts->actions as argument, where it
is dereferenced.

Example trace:

BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 114 PID: 16151 Comm: handler114 Kdump: loaded Not tainted 5.14.0-503.16.1.el9_5.x86_64 #1
RIP: 0010:tcf_action_init+0x1f8/0x2c0
Call Trace:
 tcf_action_init+0x1f8/0x2c0
 tcf_exts_validate_ex+0x175/0x190
 fl_change+0x537/0x1120 [cls_flower]

Fixes: 80cd22c ("net/sched: cls_api: Support hardware miss to tc action")
Signed-off-by: Pierre Riteau <pierre@stackhpc.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250213223610.320278-1-pierre@stackhpc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 28, 2025
[ Upstream commit 4ccacf8 ]

Brad Spengler reported the list_del() corruption splat in
gtp_net_exit_batch_rtnl(). [0]

Commit eb28fd7 ("gtp: Destroy device along with udp socket's netns
dismantle.") added the for_each_netdev() loop in gtp_net_exit_batch_rtnl()
to destroy devices in each netns as done in geneve and ip tunnels.

However, this could trigger ->dellink() twice for the same device during
->exit_batch_rtnl().

Say we have two netns A & B and gtp device B that resides in netns B but
whose UDP socket is in netns A.

  1. cleanup_net() processes netns A and then B.

  2. gtp_net_exit_batch_rtnl() finds the device B while iterating
     netns A's gn->gtp_dev_list and calls ->dellink().

  [ device B is not yet unlinked from netns B
    as unregister_netdevice_many() has not been called. ]

  3. gtp_net_exit_batch_rtnl() finds the device B while iterating
     netns B's for_each_netdev() and calls ->dellink().

gtp_dellink() cleans up the device's hash table, unlinks the dev from
gn->gtp_dev_list, and calls unregister_netdevice_queue().

Basically, calling gtp_dellink() multiple times is fine unless
CONFIG_DEBUG_LIST is enabled.

Let's remove for_each_netdev() in gtp_net_exit_batch_rtnl() and
delegate the destruction to default_device_exit_batch() as done
in bareudp.

[0]:
list_del corruption, ffff8880aaa62c00->next (autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object]) is LIST_POISON1 (ffffffffffffff02) (prev is 0xffffffffffffff04)
kernel BUG at lib/list_debug.c:58!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 UID: 0 PID: 1804 Comm: kworker/u8:7 Tainted: G                T   6.12.13-grsec-full-20250211091339 #1
Tainted: [T]=RANDSTRUCT
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:[<ffffffff84947381>] __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58
Code: c2 76 91 31 c0 e8 9f b1 f7 fc 0f 0b 4d 89 f0 48 c7 c1 02 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 e0 c2 76 91 31 c0 e8 7f b1 f7 fc <0f> 0b 4d 89 e8 48 c7 c1 04 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 60
RSP: 0018:fffffe8040b4fbd0 EFLAGS: 00010283
RAX: 00000000000000cc RBX: dffffc0000000000 RCX: ffffffff818c4054
RDX: ffffffff84947381 RSI: ffffffff818d1512 RDI: 0000000000000000
RBP: ffff8880aaa62c00 R08: 0000000000000001 R09: fffffbd008169f32
R10: fffffe8040b4f997 R11: 0000000000000001 R12: a1988d84f24943e4
R13: ffffffffffffff02 R14: ffffffffffffff04 R15: ffff8880aaa62c08
RBX: kasan shadow of 0x0
RCX: __wake_up_klogd.part.0+0x74/0xe0 kernel/printk/printk.c:4554
RDX: __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58
RSI: vprintk+0x72/0x100 kernel/printk/printk_safe.c:71
RBP: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object]
RSP: process kstack fffffe8040b4fbd0+0x7bd0/0x8000 [kworker/u8:7+netns 1804 ]
R09: kasan shadow of process kstack fffffe8040b4f990+0x7990/0x8000 [kworker/u8:7+netns 1804 ]
R10: process kstack fffffe8040b4f997+0x7997/0x8000 [kworker/u8:7+netns 1804 ]
R15: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc08/0x1000 [slab object]
FS:  0000000000000000(0000) GS:ffff888116000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000748f5372c000 CR3: 0000000015408000 CR4: 00000000003406f0 shadow CR4: 00000000003406f0
Stack:
 0000000000000000 ffffffff8a0c35e7 ffffffff8a0c3603 ffff8880aaa62c00
 ffff8880aaa62c00 0000000000000004 ffff88811145311c 0000000000000005
 0000000000000001 ffff8880aaa62000 fffffe8040b4fd40 ffffffff8a0c360d
Call Trace:
 <TASK>
 [<ffffffff8a0c360d>] __list_del_entry_valid include/linux/list.h:131 [inline] fffffe8040b4fc28
 [<ffffffff8a0c360d>] __list_del_entry include/linux/list.h:248 [inline] fffffe8040b4fc28
 [<ffffffff8a0c360d>] list_del include/linux/list.h:262 [inline] fffffe8040b4fc28
 [<ffffffff8a0c360d>] gtp_dellink+0x16d/0x360 drivers/net/gtp.c:1557 fffffe8040b4fc28
 [<ffffffff8a0d0404>] gtp_net_exit_batch_rtnl+0x124/0x2c0 drivers/net/gtp.c:2495 fffffe8040b4fc88
 [<ffffffff8e705b24>] cleanup_net+0x5a4/0xbe0 net/core/net_namespace.c:635 fffffe8040b4fcd0
 [<ffffffff81754c97>] process_one_work+0xbd7/0x2160 kernel/workqueue.c:3326 fffffe8040b4fd88
 [<ffffffff81757195>] process_scheduled_works kernel/workqueue.c:3407 [inline] fffffe8040b4fec0
 [<ffffffff81757195>] worker_thread+0x6b5/0xfa0 kernel/workqueue.c:3488 fffffe8040b4fec0
 [<ffffffff817782a0>] kthread+0x360/0x4c0 kernel/kthread.c:397 fffffe8040b4ff78
 [<ffffffff814d8594>] ret_from_fork+0x74/0xe0 arch/x86/kernel/process.c:172 fffffe8040b4ffb8
 [<ffffffff8110f509>] ret_from_fork_asm+0x29/0xc0 arch/x86/entry/entry_64.S:399 fffffe8040b4ffe8
 </TASK>
Modules linked in:

Fixes: eb28fd7 ("gtp: Destroy device along with udp socket's netns dismantle.")
Reported-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250217203705.40342-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this pull request Feb 28, 2025
commit 07b598c upstream.

Syzkaller reports the following bug:

BUG: spinlock bad magic on CPU#1, syz-executor.0/7995
 lock: 0xffff88805303f3e0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 1 PID: 7995 Comm: syz-executor.0 Tainted: G            E     5.10.209+ #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x119/0x179 lib/dump_stack.c:118
 debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
 do_raw_spin_lock+0x1f6/0x270 kernel/locking/spinlock_debug.c:112
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline]
 _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159
 reset_per_cpu_data+0xe6/0x240 [drop_monitor]
 net_dm_cmd_trace+0x43d/0x17a0 [drop_monitor]
 genl_family_rcv_msg_doit+0x22f/0x330 net/netlink/genetlink.c:739
 genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
 genl_rcv_msg+0x341/0x5a0 net/netlink/genetlink.c:800
 netlink_rcv_skb+0x14d/0x440 net/netlink/af_netlink.c:2497
 genl_rcv+0x29/0x40 net/netlink/genetlink.c:811
 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
 netlink_unicast+0x54b/0x800 net/netlink/af_netlink.c:1348
 netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916
 sock_sendmsg_nosec net/socket.c:651 [inline]
 __sock_sendmsg+0x157/0x190 net/socket.c:663
 ____sys_sendmsg+0x712/0x870 net/socket.c:2378
 ___sys_sendmsg+0xf8/0x170 net/socket.c:2432
 __sys_sendmsg+0xea/0x1b0 net/socket.c:2461
 do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x62/0xc7
RIP: 0033:0x7f3f9815aee9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3f972bf0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f3f9826d050 RCX: 00007f3f9815aee9
RDX: 0000000020000000 RSI: 0000000020001300 RDI: 0000000000000007
RBP: 00007f3f981b63bd R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f3f9826d050 R15: 00007ffe01ee6768

If drop_monitor is built as a kernel module, syzkaller may have time
to send a netlink NET_DM_CMD_START message during the module loading.
This will call the net_dm_monitor_start() function that uses
a spinlock that has not yet been initialized.

To fix this, let's place resource initialization above the registration
of a generic netlink family.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.

Fixes: 9a8afc8 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250213152054.2785669-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant