Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intel-m10-bmc-hwmon N6000 board power value in milli Watts instead of micro-Watts required by HWMON SYSFS #25

Closed
umairsiddiqui-digitek opened this issue May 15, 2024 · 7 comments
Assignees

Comments

@umairsiddiqui-digitek
Copy link

According to
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface

HWMon expects board power in microwatts units

power[1-*]_input		Instantaneous power use
				Unit: microWatt
				RO
				

However the BMC of N6001 provides board power value in milli Watts. therefore following line should be

{ 0x724, 0x0, 0x0, 0x0, 0x0, 1, "Board Power" },

static const struct m10bmc_sdata n6000bmc_power_tbl[] = {
	{ 0x724, 0x0, 0x0, 0x0, 0x0, 1000, "Board Power" },
};

Now the

fpgainfo bmc

shows the correct value, because OPAE SDK reads from sysfs and treats the value in milli Watts

https://github.com/OFS/opae-sdk/blob/31d4aba7e9b456f29234a97ce5c215f00d82e340/libraries/plugins/xfpga/metrics/metrics_max10.c#L387

But it is incorrect from hwmon sysfs prospective

@pcolberg pcolberg self-assigned this May 15, 2024
@pcolberg
Copy link
Contributor

pcolberg commented May 15, 2024

Thanks for the report; I could confirm the behaviour on an N6001.

# strace -o fpgainfo_bmc_strace.log fpgainfo bmc | grep -i power
(45) Board Power                                        : 27.32 Watts

# grep -A2 hwmon.*power fpgainfo_bmc_strace.log
lstat("/sys/class/fpga_region/region0/dfl-fme.0/dfl_dev.6/n6000bmc-hwmon.2.auto/hwmon/hwmon4/power1_label", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/sys/class/fpga_region/region0/dfl-fme.0/dfl_dev.6/n6000bmc-hwmon.2.auto/hwmon/hwmon4/power1_label", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(4, "Board Power\n", 4096)          = 12
--
openat(AT_FDCWD, "/sys/class/fpga_region/region0/dfl-fme.0/dfl_dev.6/n6000bmc-hwmon.2.auto/hwmon/hwmon4/power1_input", O_RDONLY) = 4
lseek(4, 0, SEEK_SET)                   = 0
read(4, "27319\n", 256)                 = 6

# sensors | grep -i 'board power'
Board Power:                                   27.32 mW 

The issue was introduced with upstream commit e198322.

@pcolberg
Copy link
Contributor

I will submit a fix to the upstream kernel and to OPAE SDK. I briefly considered whether the fix for OPAE SDK should include a heuristic to work with both broken and fixed kernels, but that would likely make things worse when the hardware (BMC) is actually reporting wrong values, which might then be shadowed by the heuristic.

@pcolberg
Copy link
Contributor

Submitted OFS/linux-dfl-backport#118 to Linux DFL backport driver.

@umairsiddiqui-digitek, please let me know your name and email address if you would like to be acknowledged publicly in the kernel commit message.

@umairsiddiqui-digitek
Copy link
Author

umairsiddiqui-digitek commented May 18, 2024

after this update opae sdk need to be updated as well

https://github.com/OFS/opae-sdk

because

fpgainfo bmc

assumes power in milli watts

https://github.com/OFS/opae-sdk/blob/31d4aba7e9b456f29234a97ce5c215f00d82e340/libraries/plugins/xfpga/metrics/metrics_max10.c#L387

@pcolberg
Copy link
Contributor

pcolberg added a commit to OFS/opae-sdk that referenced this issue May 21, 2024
The hwmon sysfs interface provides power values in microwatt.

Link: OFS/linux-dfl#25
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
pcolberg added a commit to OFS/opae-sdk that referenced this issue May 21, 2024
The hwmon sysfs interface provides power values in microwatt.

Link: OFS/linux-dfl#25
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
pcolberg added a commit to OFS/opae-sdk that referenced this issue May 22, 2024
The hwmon sysfs interface provides power values in microwatt.

Link: OFS/linux-dfl#25
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
@pcolberg
Copy link
Contributor

pcolberg pushed a commit that referenced this issue May 31, 2024
Klara Modin reported warnings for a kernel configured with BPF_JIT but
without MODULES:

[   44.131296] Trying to vfree() bad address (000000004a17c299)
[   44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a0082 #25
[   44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
[   44.164433] Workqueue: events bpf_prog_free_deferred
[   44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.188772] sp : ffff800082a13c70
[   44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
[   44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
[   44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
[   44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
[   44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
[   44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
[   44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
[   44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
[   44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
[   44.275703] Call trace:
[   44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.283858] vfree (mm/vmalloc.c:3322)
[   44.287835] execmem_free (mm/execmem.c:70)
[   44.292347] bpf_jit_free_exec+0x10/0x1c
[   44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
[   44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
[   44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
[   44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
[   44.317785] process_one_work (kernel/workqueue.c:3273)
[   44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
[   44.327292] kthread (kernel/kthread.c:388)
[   44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)

The problem is because bpf_arch_text_copy() silently fails to write to the
read-only area as a result of patch_map() faulting and the resulting
-EFAULT being chucked away.

Update patch_map() to use CONFIG_EXECMEM instead of
CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.

Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
Fixes: 2c9e5d4 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reported-by: Klara Modin <klarasmodin@gmail.com>
Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com
Tested-by: Klara Modin <klarasmodin@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
@pcolberg
Copy link
Contributor

pcolberg commented Jun 5, 2024

The kernel fix was upstreamed to Linux 6.10 as commit 027a44fedd55 ("hwmon: (intel-m10-bmc-hwmon) Fix multiplier for N6000 board power sensor").

pcolberg pushed a commit that referenced this issue Oct 21, 2024
A dentry leak may be caused when a lookup cookie and a cull are concurrent:

            P1             |             P2
-----------------------------------------------------------
cachefiles_lookup_cookie
  cachefiles_look_up_object
    lookup_one_positive_unlocked
     // get dentry
                            cachefiles_cull
                              inode->i_flags |= S_KERNEL_FILE;
    cachefiles_open_file
      cachefiles_mark_inode_in_use
        __cachefiles_mark_inode_in_use
          can_use = false
          if (!(inode->i_flags & S_KERNEL_FILE))
            can_use = true
	  return false
        return false
        // Returns an error but doesn't put dentry

After that the following WARNING will be triggered when the backend folder
is umounted:

==================================================================
BUG: Dentry 000000008ad87947{i=7a,n=Dx_1_1.img}  still in use (1) [unmount of ext4 sda]
WARNING: CPU: 4 PID: 359261 at fs/dcache.c:1767 umount_check+0x5d/0x70
CPU: 4 PID: 359261 Comm: umount Not tainted 6.6.0-dirty #25
RIP: 0010:umount_check+0x5d/0x70
Call Trace:
 <TASK>
 d_walk+0xda/0x2b0
 do_one_tree+0x20/0x40
 shrink_dcache_for_umount+0x2c/0x90
 generic_shutdown_super+0x20/0x160
 kill_block_super+0x1a/0x40
 ext4_kill_sb+0x22/0x40
 deactivate_locked_super+0x35/0x80
 cleanup_mnt+0x104/0x160
==================================================================

Whether cachefiles_open_file() returns true or false, the reference count
obtained by lookup_positive_unlocked() in cachefiles_look_up_object()
should be released.

Therefore release that reference count in cachefiles_look_up_object() to
fix the above issue and simplify the code.

Fixes: 1f08c92 ("cachefiles: Implement backing file wrangling")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240829083409.3788142-1-libaokun@huaweicloud.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants