[pull] master from torvalds:master #166

pull · 2020-10-13T20:08:49Z

See Commits and Changes for more details.

Created by pull[bot].

Set rw->free_iovec to @iovec; that gives an identical result and stresses that the @iovec param and rw->free_iovec play the same role. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

When io_req_map_rw() is called from io_rw_prep_async(), it memcpy()s iorw->iter into itself. Even though it doesn't lead to an error, such a violation of memcpy()'s aliasing rules is considered bad practice. Inline io_req_map_rw() into io_rw_prep_async(). We don't really need any remapping there, so it's much simpler than the generic implementation. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
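
The aliasing concern can be shown with a minimal userspace sketch. This is not the kernel code: `struct iter_state` and `save_iter()` are hypothetical names used only for illustration. memcpy() requires that source and destination do not overlap, so a copy of an object onto itself is handled explicitly rather than passed through.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the iterator state kept in the async context. */
struct iter_state {
    size_t count;
    size_t offset;
};

/*
 * Copy *src into *dst. memcpy() requires non-overlapping regions, so the
 * self-copy case (dst == src) is skipped instead of being passed through.
 */
static void save_iter(struct iter_state *dst, const struct iter_state *src)
{
    if (dst == src)
        return; /* already in place, nothing to remap */
    memcpy(dst, src, sizeof(*dst));
}
```

The commit itself takes the simpler route of inlining the caller so that no copy happens at all; the guard above is just one way to make the rule visible.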

Testing ctx->user_bufs for NULL in io_import_fixed() is not necessary, because in that case ctx->nr_user_bufs would be zero, and the following check would fail. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

There's really no point in having this union, it just means that we're always allocating enough room to cater to any command. But that's pointless, as the ->io field is request type private anyway. This gets rid of the io_async_ctx structure, and fills in the required size in the io_op_defs[] instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>

In the spirit of fairness, cap the max number of SQ entries we'll submit for SQPOLL if we have multiple rings. If we don't do that, we could be submitting tons of entries for one ring, while others are waiting to get service. The value of 8 is somewhat arbitrarily chosen as something that allows a fair bit of batching, without using an excessive time per ring. Signed-off-by: Jens Axboe <axboe@kernel.dk>
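
A minimal sketch of the capping rule described above, assuming a hypothetical helper `cap_submit()` and a hypothetical constant name; only the value 8 and the "multiple rings" condition come from the commit message.

```c
/* Assumed name for the per-ring cap; the commit message gives the value 8. */
#define SQPOLL_CAP_ENTRIES 8u

/*
 * Return how many SQ entries to submit for one ring in this pass.
 * With a single ring there is nobody to be fair to, so no cap applies.
 */
static unsigned int cap_submit(unsigned int to_submit, unsigned int nr_rings)
{
    if (nr_rings > 1 && to_submit > SQPOLL_CAP_ENTRIES)
        return SQPOLL_CAP_ENTRIES;
    return to_submit;
}
```

With two rings and 100 pending entries, each pass would submit 8 entries before moving on to the other ring.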

Fixes coccicheck warning: fs/io_uring.c:4242:13-14: Unneeded semicolon Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring accounts any registered buffer as pinned/locked memory, checks the limit, and fails if the given user doesn't have a big enough limit to register the ranges specified. However, if huge pages are used, we are potentially under-accounting the memory in terms of what gets pinned on the vm side. This patch rectifies that by ensuring that we account the full size of a compound page, regardless of how much of it is being registered. Huge pages are not accounted multiple times - if multiple sections of a huge page are registered, then the page is only accounted once. Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
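
The accounting rule can be modelled in userspace as follows. This is a sketch under stated assumptions, not the kernel implementation: `struct pinned_page`, its fields, and `pages_to_account()` are hypothetical, and a real implementation would use a cheaper duplicate check than the quadratic scan shown here.

```c
#include <stddef.h>

/*
 * Hypothetical model of a pinned page: which compound page it belongs to
 * and how many base pages that compound page spans (1 for a normal page).
 */
struct pinned_page {
    unsigned long head_id;
    unsigned long compound_nr;
};

/*
 * Return the number of base pages to charge. Each compound page is charged
 * in full, but only the first time its head is seen.
 */
static unsigned long pages_to_account(const struct pinned_page *pages, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        for (size_t j = 0; j < i; j++) {
            if (pages[j].head_id == pages[i].head_id) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            total += pages[i].compound_nr;
    }
    return total;
}
```

Registering two sections of the same 512-page huge page plus one normal page would therefore be charged as 513 base pages, not 2 or 1025.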

There are a few operations that are offloaded to the worker threads. In this case, we lose process context and end up in kthread context. This results in I/Os not being accounted to the issuing cgroup and consequently ending up as issued by root. Just like others, adopt the personality of the blkcg too when issuing via the workqueues. For the SQPOLL thread, it will live and attach in the inited cgroup's context. Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>

We do this for CQ ring wait, in case task_work completions come in. We should do the same in io_uring_register(), to avoid spurious -EINTR if the ring quiescing ends up having to process task_work to complete the operation. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

In most cases we'll specify IORING_SETUP_SQPOLL and run multiple io_uring instances in a host. Since all sqthreads are named "io_uring-sq", it's hard to tell which sqthread belongs to which application process. With this patch, an application can get its corresponding sqthread pid and cpu through show_fdinfo.
Steps:
1. Get io_uring fd first.
   $ ls -l /proc/<pid>/fd | grep -w io_uring
2. Then get io_uring instance related info, including corresponding sqthread pid and cpu.
   $ cat /proc/<pid>/fdinfo/<io_uring_fd>
   pos: 0
   flags: 02000002
   mnt_id: 13
   SqThread: 6929
   SqThreadCpu: 2
   UserFiles: 1
       0: testfile
   UserBufs: 0
   PollList:
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> [axboe: fixed for new shared SQPOLL] Signed-off-by: Jens Axboe <axboe@kernel.dk>
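
A tool could pull the two new fields out of the fdinfo text along these lines. This is a hedged sketch: `fdinfo_get_long()` is a hypothetical helper, and it works on a string already read from /proc/<pid>/fdinfo/<io_uring_fd> so that it stays independent of any particular process.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan fdinfo-style text for a line starting with "<key>:" and store the
 * number that follows. Returns 0 on success, -1 if the key is not found
 * or is not followed by a number.
 */
static int fdinfo_get_long(const char *text, const char *key, long *out)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p && *p) {
        /* Require the ':' right after the key so "SqThread" does not
         * match the "SqThreadCpu" line. */
        if (strncmp(p, key, klen) == 0 && p[klen] == ':')
            return sscanf(p + klen + 1, "%ld", out) == 1 ? 0 : -1;
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}
```

Calling it with the keys "SqThread" and "SqThreadCpu" on the output shown in the commit message would yield the sqthread pid and cpu.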

The smart syzbot has found a reproducer for the following issue: ================================================================== BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline] BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613 Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771 CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:192 instrument_atomic_write include/linux/instrumented.h:71 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] io_wqe_inc_running fs/io-wq.c:301 [inline] io_wq_worker_running+0xde/0x110 fs/io-wq.c:613 schedule_timeout+0x148/0x250 kernel/time/timer.c:1879 io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Allocated by task 7768: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594 kmalloc_node include/linux/slab.h:572 [inline] kzalloc_node include/linux/slab.h:677 [inline] io_wq_create+0x57b/0xa10 fs/io-wq.c:1064 io_init_wq_offload fs/io_uring.c:7432 [inline] io_sq_offload_start fs/io_uring.c:7504 [inline] io_uring_create 
fs/io_uring.c:8625 [inline] io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 21: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3418 [inline] kfree+0x10e/0x2b0 mm/slab.c:3756 __io_wq_destroy fs/io-wq.c:1138 [inline] io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146 io_finish_async fs/io_uring.c:6836 [inline] io_ring_ctx_free fs/io_uring.c:7870 [inline] io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 The buggy address belongs to the object at ffff8882183db000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 140 bytes inside of 1024-byte region [ffff8882183db000, ffff8882183db400) The buggy address belongs to the page: page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db flags: 0x57ffe0000000200(slab) raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700 raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== which is down to the comment below, /* all workers gone, wq exit can proceed */ if (!nr_workers && 
refcount_dec_and_test(&wqe->wq->refs)) complete(&wqe->wq->done); because there might be multiple instances of wqe in a wq, and we would have to wait for every worker in every wqe to exit before releasing the wq's resources on destroy. To that end, rework the wq's refcount by making it independent of the tracking of workers, because after all they are two different things, and keep it balanced when workers come and go. Note that the manager kthread, like other workers, now holds a reference to the wq during its lifetime. Finally, to help destroy the wq, check IO_WQ_BIT_EXIT upon creating a worker and do nothing for an exiting wq. Cc: stable@vger.kernel.org # v5.5+ Reported-by: syzbot+45fa0a195b941764e0f0@syzkaller.appspotmail.com Reported-by: syzbot+9af99580130003da82b1@syzkaller.appspotmail.com Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

This flag is no longer used, remove it. Signed-off-by: Jens Axboe <axboe@kernel.dk>

Extract common code from if/else branches. That is cleaner and optimised even better. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Put brackets around bitwise ops in a complex expression. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
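
The patch is described only as a bracketing change, so the following is a general illustration of why explicit brackets around bitwise operators are worth having in C, not a reconstruction of the patched expression. In C, `==` binds tighter than `&`, so `flags & mask == want` is parsed as `flags & (mask == want)`. Both helper names are hypothetical.

```c
/* What the compiler sees when the brackets are left out:
 * the comparison happens first and yields 0 or 1. */
static unsigned int test_parsed_without_brackets(unsigned int flags,
                                                 unsigned int mask,
                                                 unsigned int want)
{
    return flags & (mask == want);
}

/* What is usually meant: mask the flags first, then compare. */
static unsigned int test_with_brackets(unsigned int flags,
                                       unsigned int mask,
                                       unsigned int want)
{
    return (flags & mask) == want;
}
```

For flags = 0x2, mask = 0x2, want = 0x2 the bracketed form reports a match, while the unbracketed parse evaluates 0x2 & 1 and reports none.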

REQ_F_NEED_CLEANUP is set only by io_*_prep() and they're guaranteed to be called only once, so there is no one who may have set the flag before. Kill REQ_F_NEED_CLEANUP check in these *prep() handlers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Move setting IOCB_NOWAIT from io_prep_rw() into io_read()/io_write(), so it's set/cleared in a single place. Also remove @force_nonblock parameter from io_prep_rw(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

All io_*_prep() functions including io_{read,write}_prep() are called only during submission where @force_nonblock is always true. Don't keep propagating it and instead remove the @force_nonblock argument from prep() altogether. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_issue_sqe() does two things at once: preparing a request and issuing it. Split it in two and deduplicate with io_defer_prep(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

All request preparations are done only during submission; reflect that in the code by moving io_req_prep() much earlier into io_queue_sqe(). That's much cleaner, because it doesn't expose bits to async code which it won't ever use. Also it makes the interface harder to misuse, and there are potential places for bugs. For instance, __io_queue() doesn't clear @sqe before proceeding to the next linked request, that could have been disastrous, but hopefully there are linked requests IFF sqe==NULL, so not actually a bug. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

We always use &req->task_work anyway, no point in passing it in. Signed-off-by: Jens Axboe <axboe@kernel.dk>

There's no need to have a check when setting timing.past_jiffies, as we can simply do the initialization earlier at vidtv_mux_init(). Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Changeset 870e350 ("media: vidtv: get rid of ENDIAN_BITFIELD nonsense") was incomplete. There is still some wrong endianness logic in the driver. Get rid of it. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

This field should point to the network ID, and has different ranges for cable, terrestrial or satellite. It also has a special range for temporary private usage. For now, let's use the temporary private one. Once the Network Information Table (NIT) gets added, this should be reviewed. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

There's no reason to use static vars to store PSI version numbers. Also, version numbers currently start at 0x01, because all table initializer code calls a function that increments the version, as the code may support dynamic changes to the PS tables in the future. So, let's just initialize them to 0x1f, in order for the versions to be reported as starting from 0. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
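
The reason 0x1f works as an initial value is that the PSI version_number field is 5 bits wide, so the increment wraps back to 0. A minimal sketch, with `psi_next_version()` and the mask name as hypothetical stand-ins for whatever the driver actually calls:

```c
#include <stdint.h>

/* PSI version_number is a 5-bit field, so it wraps from 0x1f back to 0. */
#define PSI_VERSION_MASK 0x1f

/* Return the version that follows 'version', wrapping within 5 bits. */
static uint8_t psi_next_version(uint8_t version)
{
    return (uint8_t)((version + 1) & PSI_VERSION_MASK);
}
```

Starting from 0x1f, the first increment done by a table initializer produces version 0; starting from 0 it would produce 1, which is the behaviour the commit message describes as the old one.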

optinal -> optional Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Instead of passing struct pes_ts_header_write_args fields as function parameters, just pass a pointer to the struct. That would allow adding more args without needing to change the function prototype. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

When compared with a stream generated via ffmpeg, it can be noticed that the PES doesn't contain any PCR info. That could cause problems with userspace decoding. So, rewrite the logic that fills the adaptation info, in order to allow it to add PCR frames without breaking frame alignment. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Neither Vlc nor Gstreamer likes the PES_scrambling_control bits. In the case of GST, this can be seen with: $ GST_DEBUG=2 LANG=C gst-play-1.0 pcm_audio.ts ... 0:00:00.097973439 12308 0x55f7ddd155e0 WARN pesparser pesparse.c:411:mpegts_parse_pes_header: Wrong '0x10' marker before PES_scrambling_control (0x40) 0:00:00.097987026 12308 0x55f7ddd155e0 WARN tsdemux tsdemux.c:2314:gst_ts_demux_parse_pes_header: Error parsing PES header. pid: 0x111 stream_type: 0x6 ... So, change it. After such change, the stream now plays fine with Vlc, Gstreamer, ffmpeg - and with programs that use such libraries, like Kaffeine. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
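
The warning in the log can be reproduced with a small check. In the MPEG-2 PES header, the byte that follows the packet length starts with the fixed bits '10', followed by the 2-bit PES_scrambling_control field. The sketch below uses hypothetical helper names and only shows how a demuxer might validate that byte; it does not describe what the driver fix changed internally.

```c
#include <stdint.h>

/* The two most significant bits of the first PES flags byte must be '10'. */
static int pes_marker_ok(uint8_t flags1)
{
    return (flags1 & 0xC0) == 0x80;
}

/* PES_scrambling_control occupies the next two bits (bits 5..4). */
static unsigned int pes_scrambling_control(uint8_t flags1)
{
    return (flags1 >> 4) & 0x3;
}
```

The value 0x40 reported by Gstreamer has '01' in the top two bits, so `pes_marker_ok(0x40)` fails, whereas 0x80 carries a valid marker with scrambling control 0.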

Add proper control type to the recently introduced V4L2_CID_MPEG_VIDEO_FRAME_SKIP_MODE control. This will forward it to v4l2_ctrl_new_std_menu() rather than v4l2_ctrl_new_std(), which is what caused the failure. This fixes the following warning during driver initialization: s5p_mfc_enc_ctrls_setup:2671: Adding control (18) failed s5p_mfc_open:811: Failed to setup mfc controls Fixes: ef56b3e ("media: s5p-mfc: Use standard frame skip mode control") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

ctx->nr_user_files == 0 IFF ctx->file_data == NULL, and in that case fixed files are not used. Hence, verifying fds only against ctx->nr_user_files is enough. Remove the other check from hot path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
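
The argument is that a bounds check already covers the NULL case when the count and the table pointer are kept in sync. A userspace sketch with hypothetical names (`struct ring_ctx`, `get_fixed_file()`), assuming the invariant stated in the commit message holds:

```c
#include <stddef.h>

/* Hypothetical model of the ring context. Invariant assumed here:
 * nr_user_files == 0 if and only if file_data == NULL. */
struct ring_ctx {
    void **file_data;
    unsigned int nr_user_files;
};

/*
 * Look up a fixed file. No separate NULL test on file_data is needed:
 * when the table is absent, nr_user_files is 0 and every fd is rejected
 * by the bounds check.
 */
static void *get_fixed_file(const struct ring_ctx *ctx, unsigned int fd)
{
    if (fd >= ctx->nr_user_files)
        return NULL;
    return ctx->file_data[fd];
}
```

The sketch is only safe as long as every code path that sets or clears the table updates the count at the same time.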

Keep ->needs_file_no_error check out of io_file_get(), and let callers handle it. It makes it more straightforward. Also, as the only error it can hand back is -EBADF, make it return a file or NULL. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

state->ios_left isn't decremented for requests that don't need a file, so it might be larger than the number of SQEs left. In some circumstances that makes us grab more files than needed, imposing an extra put. Deaccount one ios_left for each request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Don't use struct io_timeout for both IORING_OP_TIMEOUT and IORING_OP_TIMEOUT_REMOVE, they're quite different. Split them in two, which allows removing an unused field in struct io_timeout, and btw kill ->flags not used by either. This is also easier to follow, especially for timeout remove. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Remove timeouts from ctx->timeout_list after hrtimer_try_to_cancel() successfully cancels it. With this we don't need to care whether there was a race and it was removed in io_timeout_fn(), and that will be handy for following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Kill extra if in io_issue_sqe() and place send/recv[msg] calls appropriately under switch's cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Don't postpone io_init_req() error checks and do that right after calling it. There are no control-flow statements or dependencies with sqe/submitted accounting, so do those earlier, that makes the code flow a bit more natural. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Keep file_data in a local var and use it to replace complex references such as ctx->file_data. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Don't keep repeating cleanup sequences in error paths, write them once at the end and use labels. It's less error prone and looks cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
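
The label-based cleanup pattern mentioned above looks roughly like this in plain C. The structure and function names are hypothetical and unrelated to the patched io_uring function; the point is only that each resource is released in exactly one place, in reverse order of acquisition.

```c
#include <stdlib.h>

/* Hypothetical object that owns two separately allocated buffers. */
struct two_bufs {
    char *a;
    char *b;
};

/* Allocate both buffers; on failure, unwind through the labels below. */
static int two_bufs_init(struct two_bufs *t, size_t na, size_t nb)
{
    t->a = malloc(na);
    if (!t->a)
        goto err;

    t->b = malloc(nb);
    if (!t->b)
        goto err_free_a;

    return 0;

err_free_a:
    /* Written once; any later failure point can jump here. */
    free(t->a);
    t->a = NULL;
err:
    return -1;
}
```

Adding a third resource then means adding one more label rather than copying the earlier cleanup calls into a new error branch.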

->cur_refs of struct fixed_file_data always points to percpu_ref embedded into struct fixed_file_ref_node. Don't overuse container_of() and offsetting, and point directly to fixed_file_ref_node. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
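
The difference between the two pointer styles can be sketched in userspace as follows. All type and field names here are simplified, hypothetical stand-ins for the kernel structures, and `container_of()` is redefined locally because the kernel macro is not available outside the kernel.

```c
#include <stddef.h>

/* Local userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct ref {
    long count;
};

struct ref_node {
    int id;
    struct ref refs; /* embedded reference counter */
};

/* Before: point at the embedded member and recover the node each time. */
struct data_old {
    struct ref *cur_refs;
};

static struct ref_node *node_from_old(struct data_old *d)
{
    return container_of(d->cur_refs, struct ref_node, refs);
}

/* After: point at the node itself; the counter is simply node->refs. */
struct data_new {
    struct ref_node *node;
};

static struct ref *refs_from_new(struct data_new *d)
{
    return &d->node->refs;
}
```

Both forms reach the same objects, but the second needs no pointer arithmetic and states the relationship between the two structures directly in the type.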

…t/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
 - Export SDIO revision and info strings to userspace
 - Add support for specifying mmc/mmcblk index via mmc aliases in DT
MMC host:
 - Enable support for async probe for all mmc host drivers
 - Enable compile testing of multiple host drivers
 - dw_mmc: Enable the Synopsys DesignWare driver for RISCV and CSKY
 - mtk-sd: Fixup support for CQHCI
 - owl-mmc: Add support for the actions,s700-mmc variant
 - renesas_sdhi: Fix regression (temporary) for re-insertion of SD cards
 - renesas_sdhi: Add support for the r8a774e1 variant
 - renesas_sdhi/tmio: Improvements for tunings
 - renesas_sdhi/tmio: Rework support for reset of controller
 - sdhci-acpi: Fix HS400 tuning for devices with invalid presets on AMDI0040
 - sdhci_am654: Improve support for tunings
 - sdhci_am654: Add support for input tap delays
 - sdhci_am654: Add workaround for card detect debounce timer
 - sdhci-am654: Add support for the TI's J7200 variants
 - sdhci-esdhc-imx: Fix support for manual tuning
 - sdhci-iproc: Enable support for eMMC DDR 3.3V for bcm2711
 - sdhci-msm: Fix stability issues with HS400 for sc7180
 - sdhci-of-sparx5: Add Sparx5 SoC eMMC driver
 - sdhci-of-esdhc: Fixup reference clock source selection
 - sdhci-pci: Add LTR support for some Intel BYT controllers
 - sdhci-pci-gli: Add CQHCI Support for GL9763E"
* tag 'mmc-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (91 commits)
  mmc: sdhci_am654: Fix module autoload
  mmc: renesas_sdhi: workaround a regression when reinserting SD cards
  mmc: sdhci-pci-gli: Add CQHCI Support for GL9763E
  mmc: sdhci-acpi: AMDI0040: Set SDHCI_QUIRK2_PRESET_VALUE_BROKEN
  mmc: sdhci_am654: Enable tuning for SDR50
  mmc: sdhci_am654: Add support for software tuning
  mmc: sdhci_am654: Add support for input tap delay
  mmc: sdhci_am654: Fix hard coded otap delay array size
  dt-bindings: mmc: sdhci-am654: Add documentation for input tap delay
  dt-bindings: mmc: sdhci-am654: Convert sdhci-am654 controller documentation to json schema
  mmc: sdhci-of-esdhc: fix reference clock source selection
  mmc: host: fix depends for MMC_MESON_GX w/ COMPILE_TEST
  mmc: sdhci-s3c: hide forward declaration of of_device_id behind CONFIG_OF
  mmc: sdhci: fix indentation mistakes
  mmc: moxart: remove unneeded check for drvdata
  mmc: renesas_sdhi: drop local flag for tuning
  mmc: rtsx_usb_sdmmc: simplify the return expression of sd_change_phase()
  mmc: core: document mmc_hw_reset()
  mmc: mediatek: Drop pointer to mmc_host from msdc_host
  dt-bindings: mmc: owl: add compatible string actions,s700-mmc
  ...

…l/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
 - the usbvision driver was dropped from staging
 - the Zoran driver was re-added to staging. It gained lots of improvements, and was converted to use the videobuf2 API
 - a new virtual driver (vidtv) was added in order to allow testing the digital TV framework and APIs
 - the media uAPI documentation gained a glossary with commonly used terms, helping to simplify some parts of the docs
 - more cleanups at the atomisp driver
 - Mediatek VPU gained support for MT8183
 - added support for codecs that support doing colorspace conversion (CSC)
 - support for CSC API was added at vivid and rkisp1 drivers
 - added a helper core support and uAPI for better supporting H.264 codecs
 - added support for Renesas R8A774E1
 - use the new SPDX GFDL-1.1-no-invariants-or-later license on media uAPI docs, instead of a license text
 - Venus driver has gained VP9 codec support
 - lots of other cleanups and driver improvements
* tag 'media/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (555 commits)
  media: dvb-frontends/drxk_hard.c: fix uninitialized variable warning
  media: tvp7002: fix uninitialized variable warning
  media: s5k5baf: drop 'data' field in struct s5k5baf_fw
  media: dt-bindings: media: venus: Add an optional power domain for perf voting
  media: rcar-vin: rcar-dma: Fix setting VNIS_REG for RAW8 formats
  media: staging: rkisp1: uapi: Do not use BIT() macro
  media: v4l2-mem2mem: Fix spurious v4l2_m2m_buf_done
  media: usbtv: Fix refcounting mixup
  media: zoran.rst: place it at the right place this time
  media: add Zoran cardlist
  media: admin-guide: update cardlists
  media: siano: rename a duplicated card string
  media: zoran: move documentation file to the right place
  media: atomisp: fixes build breakage for ISP2400 due to a cleanup
  media: zoran: fix mixed case on vars
  media: zoran: get rid of an unused var
  media: zoran: use upper case for card types
  media: zoran: fix sparse warnings
  media: zoran: fix smatch warning
  media: zoran: update TODO
  ...

…/git/broonie/regmap
Pull regmap updates from Mark Brown:
"Quite a busy release for regmap, mostly support for new features useful on fairly small subsets of devices. The user visible features are:
 - A new API for registering large numbers of regmap fields at once.
 - Support for Intel AVMM buses connected via SPI.
 - Support for 12/20 address/value layouts.
 - Support for yet another scheme for acknowledging interrupts used on some Qualcomm devices"
* tag 'regmap-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: irq: Add support to clear ack registers
  regmap: add support to regmap_field_bulk_alloc/free apis
  regmap: destroy mutex (if used) in regmap_exit()
  regmap: debugfs: use semicolons rather than commas to separate statements
  regmap: debugfs: Fix more error path regressions
  regmap: Add support for 12/20 register formatting
  regmap: Add can_sleep configuration option
  regmap: soundwire: remove unused header mod_devicetable.h
  regmap: Use flexible sleep
  regmap: add Intel SPI Slave to AVMM Bus Bridge support
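
A "12/20 address/value layout" packs a 12-bit register address and a 20-bit value into one 32-bit word. The sketch below shows one plausible packing, most significant bits first, as a userspace function; the function name is hypothetical and the byte order is an assumption made for illustration, not something stated in the pull message.

```c
#include <stdint.h>

/*
 * Pack a 12-bit register address and a 20-bit value into four bytes,
 * address first, most significant bits first (assumed ordering).
 */
static void format_12_20(uint8_t out[4], uint16_t reg, uint32_t val)
{
    out[0] = (uint8_t)(reg >> 4);                          /* reg[11:4] */
    out[1] = (uint8_t)((reg << 4) | ((val >> 16) & 0x0f)); /* reg[3:0], val[19:16] */
    out[2] = (uint8_t)(val >> 8);                          /* val[15:8] */
    out[3] = (uint8_t)val;                                 /* val[7:0] */
}
```

For register 0xABC and value 0xDEF12 this produces the bytes AB CD EF 12.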

…nel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"This is a fairly small release for the regulator API, there's quite a few new devices supported and some important improvements around coupled regulators in the core but mostly just small fixes and improvements otherwise.
Summary:
 - Fixes and cleanups around the handling of coupled regulators.
 - A special driver for some Raspberry Pi panels with some unusually custom stuff around them.
 - Support for Qualcomm PM660/PM660L, PM8950 and PM8953, Richtek RT4801 and RTMV20, Rohm BD9576MUF and BD9573MUF"
* tag 'regulator-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (89 commits)
  regulator: bd9576: Fix print
  regulator: bd9576: fix regulator binfdings dt node names
  dt-bindings: regulator: document pm8950 and pm8953 smd regulators
  regulator: qcom_smd: add pm8953 regulators
  regulator: Make constraint debug processing conditional on DEBUG
  regulator: qcom: labibb: Constify static structs
  regulator: dt-bindings: Document the PM660/PM660L PMICs entries
  regulator: qcom_smd: Add PM660/PM660L regulator support
  regulator: dt-bindings: Document the PM660/660L SPMI PMIC entries
  regulator: qcom_spmi: Add PM660/PM660L regulators
  regulator: qcom_spmi: Add support for new regulator types
  regulator: core: Enlarge max OF property name length to 64 chars
  regulator: tps65910: use regmap accessors
  regulator: rtmv20: Add missing regcache cache only before marked as dirty
  regulator: rtmv20: Update DT binding document and property name parsing
  regulator: rtmv20: Add DT-binding document for Richtek RTMV20
  regulator: rtmv20: Adds support for Richtek RTMV20 load switch regulator
  regulator: resolve supply after creating regulator
  regulator: print symbolic errors in kernel messages
  regulator: print state at boot
  ...

…t/broonie/spi
Pull spi updates from Mark Brown:
"There's quite a lot of changes for SPI in this release but none in the core, they're all mostly small driver updates and additions. Some of the more notable changes include:
 - A huge set of cleanups, optimizations and improvements for the DesignWare driver from Serge Semin finishing up the work started last release.
 - Conversion of the Zynq gqspi driver to spi-mem.
 - Support for Baikal T1, Broadcom BCMSTB 7445, and Renesas R8A7742"
* tag 'spi-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (137 commits)
  spi: cadence: Add SPI transfer delays
  spi: dw: Add Baikal-T1 SPI Controller bindings
  spi: dw: Add Baikal-T1 SPI Controller glue driver
  spi: dw: Add poll-based SPI transfers support
  spi: dw: Introduce max mem-ops SPI bus frequency setting
  spi: dw: Add memory operations support
  spi: dw: Add generic DW SSI status-check method
  spi: dw: Move num-of retries parameter to the header file
  spi: dw: Explicitly de-assert CS on SPI transfer completion
  spi: dw: De-assert chip-select on reset
  spi: dw: Discard chip enabling on DMA setup error
  spi: dw: Unmask IRQs after enabling the chip
  spi: dw: Perform IRQ setup in a dedicated function
  spi: dw: Refactor IRQ-based SPI transfer procedure
  spi: dw: Refactor data IO procedure
  spi: dw: Add DW SPI controller config structure
  spi: dw: Update Rx sample delay in the config function
  spi: dw: Simplify the SPI bus speed config procedure
  spi: dw: Update SPI bus speed in a config function
  spi: dw: Detach SPI device specific CR0 config method
  ...

…/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This time there are very few driver changes but lots of core changes. We have some interesting cooperative work for ARM and Intel alike, making the GPIO subsystem more and more suitable for industrial systems and the like, in addition to the in-kernel users. We touch driver core (device properties) and lib/* by adding one simple string array free function, these are authored by Andy Shevchenko who is a well known and recognized core helpers maintainer so this should be fine. We also see some Android GKI-related modularization in the MXC drivers.
Core changes:
 - The big core change is the updated (v2) userspace character device API. This corrects badly designed 64-bit alignment around the line events. We also add the debounce request feature. This echoes the often-quoted passage from Frederick Brooks' "The Mythical Man-Month" to always throw one away, which we have seen before in things such as V4L2. So we put in a new one and deprecate and obsolete the old one.
 - All example tools in tools/gpio/* are migrated to the new API to set a good example. The libgpiod userspace library has been augmented to use this new API pretty much from day 1.
 - Some misc API hardening by using strn* function calls has been added as well.
 - Use the simpler IDA interface for GPIO chip instance enumeration.
 - Add device core function for counting string arrays in device properties.
 - Provide a generic library function kfree_strarray() that can be used throughout the kernel.
Driver enhancements:
 - The DesignWare dwapb-gpio driver has been enhanced and now uses the IRQ handling in the gpiolib core.
 - The mockup and aggregator drivers have seen some substantial code clean-up and now use more of the core kernel infrastructure.
 - Misc cleanups using dev_err_probe().
 - The MXC drivers (Freescale/NXP) can now be built modularized, which makes modularized GKI Android kernels happy"
* tag 'gpio-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (73 commits)
  gpiolib: Update header block in gpiolib-cdev.h
  gpiolib: cdev: switch from kstrdup() to kstrndup()
  docs: gpio: add a new document to its index.rst
  gpio: pca953x: Add support for the NXP PCAL9554B/C
  tools: gpio: add debounce support to gpio-event-mon
  tools: gpio: add multi-line monitoring to gpio-event-mon
  tools: gpio: port gpio-event-mon to v2 uAPI
  tools: gpio: port gpio-hammer to v2 uAPI
  tools: gpio: rename nlines to num_lines
  tools: gpio: port gpio-watch to v2 uAPI
  tools: gpio: port lsgpio to v2 uAPI
  gpio: uapi: document uAPI v1 as deprecated
  gpiolib: cdev: support setting debounce
  gpiolib: cdev: support GPIO_V2_LINE_SET_VALUES_IOCTL
  gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL
  gpiolib: cdev: support edge detection for uAPI v2
  gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL
  gpiolib: cdev: support GPIO_V2_GET_LINE_IOCTL and GPIO_V2_LINE_GET_VALUES_IOCTL
  gpiolib: add build option for CDEV v1 ABI
  gpiolib: make cdev a build option
  ...

…nel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New driver and chip support:
 - Moortec MR75203 PVT controller
 - MPS Multi-phase mp2975 controller
 - ADM1266
 - Zen3 CPUs
 - Intel MAX 10 BMC
Enhancements:
 - Support for rated attributes in hwmon core
 - MAX20730:
    - Device monitoring via debugfs
    - VOUT reading adjustment via devicetree bindings
 - LM75:
    - Devicetree support
    - Regulator support
 - Improved accumulation logic in amd_energy driver
 - Added fan sensor to gsc-hwmon driver
 - Support for simplified I2C probing
Various other minor fixes and improvements"
* tag 'hwmon-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (64 commits)
  hwmon: (pmbus/max20730) adjust the vout reading given voltage divider
  dt-bindings: hwmon: max20730: adding device tree doc for max20730
  hwmon: Add hardware monitoring driver for Moortec MR75203 PVT controller
  hwmon: Add DT bindings schema for PVT controller
  dt-bindings: hwmon: Add the +vs supply to the lm75 bindings
  dt-bindings: hwmon: Convert lm75 bindings to yaml
  docs: hwmon: (ltc2945) update datasheet link
  hwmon: (mlxreg-fan) Fix double "Mellanox"
  hwmon: (pmbus/max20730) add device monitoring via debugfs
  hwmon: (pmbus/max34440) Fix OC fault limits
  hwmon: (bt1-pvt) Wait for the completion with timeout
  hwmon: (bt1-pvt) Cache current update timeout
  hwmon: (bt1-pvt) Test sensor power supply on probe
  hwmon: (lm75) Add regulator support
  hwmon: Add hwmon driver for Intel MAX 10 BMC
  dt-bindings: Add MP2975 voltage regulator device
  hwmon: (pmbus) Add support for MPS Multi-phase mp2975 controller
  hwmon: (tmp513) fix spelling typo in comments
  hwmon: (amd_energy) Update driver documentation
  hwmon: (amd_energy) Improve the accumulation logic
  ...

The conversion of #DE to the idtentry mechanism introduced a change in the Oops message which confuses tools which parse crash information in dmesg. Remove the underscore from 'divide_error' to restore previous behaviour. Fixes: 9d06c40 ("x86/entry: Convert Divide Error to IDTENTRY") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/CACT4Y+bTZFkuZd7+bPArowOv-7Die+WZpfOWnEO_Wgs3U59+oA@mail.gmail.com

Remove an unused variable. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201013154731.132565-1-mike.travis@hpe.com

…linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix the #DE oops message string format which confused tools parsing crash information (Thomas Gleixner) - Remove an unused variable in the UV5 code which was triggering a build warning with clang (Mike Travis) * tag 'x86_urgent_for_v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Remove unused variable in UV5 NMI handler x86/traps: Fix #DE Oops message regression

Pull block updates from Jens Axboe:

- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, separating the block device from the backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...

Pull io_uring updates from Jens Axboe: - Add blkcg accounting for io-wq offload (Dennis) - A use-after-free fix for io-wq (Hillf) - Cancelation fixes and improvements - Use proper files_struct references for offload - Cleanup of io_uring_get_socket() since that can now go into our own header - SQPOLL fixes and cleanups, and support for sharing the thread - Improvement to how page accounting is done for registered buffers and huge pages, accounting the real pinned state - Series cleaning up the xarray code (Willy) - Various cleanups, refactoring, and improvements (Pavel) - Use raw spinlock for io-wq (Sebastian) - Add support for ring restrictions (Stefano) * tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (62 commits) io_uring: keep a pointer ref_node in file_data io_uring: refactor *files_register()'s error paths io_uring: clean file_data access in files_register io_uring: don't delay io_init_req() error check io_uring: clean leftovers after splitting issue io_uring: remove timeout.list after hrtimer cancel io_uring: use a separate struct for timeout_remove io_uring: improve submit_state.ios_left accounting io_uring: simplify io_file_get() io_uring: kill extra check in fixed io_file_get() io_uring: clean up ->files grabbing io_uring: don't io_prep_async_work() linked reqs io_uring: Convert advanced XArray uses to the normal API io_uring: Fix XArray usage in io_uring_add_task_file io_uring: Fix use of XArray in __io_uring_files_cancel io_uring: fix break condition for __io_uring_register() waiting io_uring: no need to call xa_destroy() on empty xarray io_uring: batch account ->req_issue and task struct references io_uring: kill callback_head argument for io_req_task_work_add() io_uring: move req preps out of io_issue_sqe() ...

The IPA BCM resource ("IP0") on sc7180 was moved to the clk-rpmh driver in commit bcd63d2 ("clk: qcom: rpmh: Add IPA clock for SC7180") and modeled as a clk, but this interconnect driver still had it modeled as an interconnect. This was mostly OK because nobody used the interconnect definition, until the interconnect framework started dropping bandwidth requests on interconnects that aren't used via the sync_state callback in commit 7d3b0b0 ("interconnect: qcom: Use icc_sync_state"). Once that patch was applied the IP0 resource was going to be controlled from two places, the clk framework and the interconnect framework. Even then, things were probably going to be OK, because commit b95b668 ("interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate") was needed to actually drop bandwidth requests on unused interconnects, of which the IPA was one that wasn't getting dropped to zero.

Combining the three commits together leads to bad behavior where the interconnect framework is disabling the IP0 resource because it has no users while the clk framework thinks the IP0 resource is on because the only user, the IPA driver, has turned it on via clk_prepare_enable(). Depending on when sync_state is called, we can get into a situation like below:

IPA driver probes
IPA driver gets notified modem started
runtime PM get()
IPA clk enabled -> IP0 resource is ON
sync_state runs
interconnect zeroes out the IP0 resource -> IP0 resource is off
IPA driver tries to access a register and blows up

The crash is an unclocked access that manifests as an SError.
SError Interrupt on CPU0, code 0xbe000011 -- SError CPU: 0 PID: 3595 Comm: mmdata_mgr Not tainted 5.17.1+ #166 Hardware name: Google Lazor (rev1 - 2) with LTE (DT) pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mutex_lock+0x4c/0x80 lr : mutex_lock+0x30/0x80 sp : ffffffc00da9b9c0 x29: ffffffc00da9b9c0 x28: 0000000000000000 x27: 0000000000000000 x26: ffffffc00da9bc90 x25: ffffff80c2024010 x24: ffffff80c2024000 x23: ffffff8083100000 x22: ffffff80831000d0 x21: ffffff80831000a8 x20: ffffff80831000a8 x19: ffffff8083100070 x18: 00000000ffff0a00 x17: 000000002f7254f1 x16: 0000000000000100 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 000000000001f0b8 x10: ffffffc00931f0b8 x9 : 0000000000000000 x8 : 0000000000000000 x7 : fefefefefeff2f60 x6 : 0000808080808080 x5 : 0000000000000000 x4 : 8080808080800000 x3 : ffffff80d2d4ee28 x2 : ffffff808c1d6e40 x1 : 0000000000000000 x0 : ffffff8083100070 Kernel panic - not syncing: Asynchronous SError Interrupt CPU: 0 PID: 3595 Comm: mmdata_mgr Not tainted 5.17.1+ #166 Hardware name: Google Lazor (rev1 - 2) with LTE (DT) Call trace: dump_backtrace+0xf4/0x114 show_stack+0x24/0x30 dump_stack_lvl+0x64/0x7c dump_stack+0x18/0x38 panic+0x150/0x38c nmi_panic+0x88/0xa0 arm64_serror_panic+0x74/0x80 do_serror+0x0/0x80 do_serror+0x58/0x80 el1h_64_error_handler+0x34/0x4c el1h_64_error+0x78/0x7c mutex_lock+0x4c/0x80 __gsi_channel_start+0x50/0x17c gsi_channel_start+0x54/0x90 ipa_endpoint_enable_one+0x34/0xc0 ipa_open+0x4c/0x120 Remove all IP0 resource management from the interconnect driver so that clk-rpmh is the sole owner. This fixes the issue by preventing the interconnect driver from overwriting the IP0 resource data that the clk-rpmh driver wrote. 
Cc: Alex Elder <elder@linaro.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Taniya Das <quic_tdas@quicinc.com> Cc: Mike Tipton <quic_mdtipton@quicinc.com> Fixes: b95b668 ("interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate") Fixes: bcd63d2 ("clk: qcom: rpmh: Add IPA clock for SC7180") Fixes: 7d3b0b0 ("interconnect: qcom: Use icc_sync_state") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Tested-by: Alex Elder <elder@linaro.org> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220412220033.1273607-2-swboyd@chromium.org Signed-off-by: Georgi Djakov <djakov@kernel.org>
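The ordering problem described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct ip0_model` and the `model_*` helpers are hypothetical names invented for illustration and are not kernel APIs. It simply tracks what the hardware resource is doing versus what the clk side believes.

```c
#include <stdbool.h>

/* Hypothetical model: one shared resource plus the clk framework's view of it. */
struct ip0_model {
	bool hw_on;       /* actual state of the modeled IP0 resource */
	bool clk_enabled; /* state as tracked by the modeled clk framework */
};

/* Models the IPA driver turning the resource on through the clk side. */
static void model_clk_enable(struct ip0_model *r)
{
	r->clk_enabled = true;
	r->hw_on = true;
}

/*
 * Models sync_state dropping requests on unused interconnects.
 * icc_owns_ip0 is true before the fix (two owners) and false after it
 * (the clk side is the sole owner, so the resource is left alone).
 */
static void model_icc_sync_state(struct ip0_model *r, bool icc_owns_ip0)
{
	if (icc_owns_ip0)
		r->hw_on = false;
}

/* An access is "unclocked" when the clk side believes the resource is on but it is off. */
static bool model_access_is_unclocked(const struct ip0_model *r)
{
	return r->clk_enabled && !r->hw_on;
}
```

With two owners, enabling the clk and then running the modeled sync_state leaves the state inconsistent; with a single owner the same sequence stays consistent.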

The gpio_keys module can accept either GPIOs or interrupts. It initializes the delayed work only for GPIOs, and that work is used only when the debounce hrtimer is not, so make sure cancel_delayed_work_sync() is called only when the key is GPIO-backed and debounce_use_hrtimer is false. This fixes the issue seen below when the gpio_keys module is unloaded and an interrupt pin is used instead of GPIO:

[ 360.297569] ------------[ cut here ]------------
[ 360.302303] WARNING: CPU: 0 PID: 237 at kernel/workqueue.c:3066 __flush_work+0x414/0x470
[ 360.310531] Modules linked in: gpio_keys(-)
[ 360.314797] CPU: 0 PID: 237 Comm: rmmod Not tainted 5.18.0-rc5-arm64-renesas-00116-g73636105874d-dirty #166
[ 360.324662] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT)
[ 360.331270] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 360.338318] pc : __flush_work+0x414/0x470
[ 360.342385] lr : __cancel_work_timer+0x140/0x1b0
[ 360.347065] sp : ffff80000a7fba00
[ 360.350423] x29: ffff80000a7fba00 x28: ffff000012b9c5c0 x27: 0000000000000000
[ 360.357664] x26: ffff80000a7fbb80 x25: ffff80000954d0a8 x24: 0000000000000001
[ 360.364904] x23: ffff800009757000 x22: 0000000000000000 x21: ffff80000919b000
[ 360.372143] x20: ffff00000f5974e0 x19: ffff00000f5974e0 x18: ffff8000097fcf48
[ 360.379382] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000053f40
[ 360.386622] x14: ffff800009850e88 x13: 0000000000000002 x12: 000000000000a60c
[ 360.393861] x11: 000000000000a610 x10: 0000000000000000 x9 : 0000000000000008
[ 360.401100] x8 : 0101010101010101 x7 : 00000000a473c394 x6 : 0080808080808080
[ 360.408339] x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffff80000919b458
[ 360.415578] x2 : ffff8000097577f0 x1 : 0000000000000001 x0 : 0000000000000000
[ 360.422818] Call trace:
[ 360.425299] __flush_work+0x414/0x470
[ 360.429012] __cancel_work_timer+0x140/0x1b0
[ 360.433340] cancel_delayed_work_sync+0x10/0x18
[ 360.437931] gpio_keys_quiesce_key+0x28/0x58 [gpio_keys]
[ 360.443327] devm_action_release+0x10/0x18
[ 360.447481] release_nodes+0x8c/0x1a0
[ 360.451194] devres_release_all+0x90/0x100
[ 360.455346] device_unbind_cleanup+0x14/0x60
[ 360.459677] device_release_driver_internal+0xe8/0x168
[ 360.464883] driver_detach+0x4c/0x90
[ 360.468509] bus_remove_driver+0x54/0xb0
[ 360.472485] driver_unregister+0x2c/0x58
[ 360.476462] platform_driver_unregister+0x10/0x18
[ 360.481230] gpio_keys_exit+0x14/0x828 [gpio_keys]
[ 360.486088] __arm64_sys_delete_module+0x1e0/0x270
[ 360.490945] invoke_syscall+0x40/0xf8
[ 360.494661] el0_svc_common.constprop.3+0xf0/0x110
[ 360.499515] do_el0_svc+0x20/0x78
[ 360.502877] el0_svc+0x48/0xf8
[ 360.505977] el0t_64_sync_handler+0x88/0xb0
[ 360.510216] el0t_64_sync+0x148/0x14c
[ 360.513930] irq event stamp: 4306
[ 360.517288] hardirqs last enabled at (4305): [<ffff8000080b0300>] __cancel_work_timer+0x130/0x1b0
[ 360.526359] hardirqs last disabled at (4306): [<ffff800008d194fc>] el1_dbg+0x24/0x88
[ 360.534204] softirqs last enabled at (4278): [<ffff8000080104a0>] _stext+0x4a0/0x5e0
[ 360.542133] softirqs last disabled at (4267): [<ffff8000080932ac>] irq_exit_rcu+0x18c/0x1b0
[ 360.550591] ---[ end trace 0000000000000000 ]---

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20220524135822.14764-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
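The condition described in this commit message can be written down as a small predicate. This is a minimal sketch, assuming a simplified stand-in for the driver's per-button state: `struct model_button` and `model_must_cancel_delayed_work()` are hypothetical names, and only `debounce_use_hrtimer` is taken from the commit text.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's per-button state. */
struct model_button {
	bool gpio_backed;          /* true for a GPIO key, false for a bare interrupt */
	bool debounce_use_hrtimer; /* true when debouncing is done by an hrtimer */
};

/*
 * The delayed work is only set up for GPIO-backed keys, and only used when
 * the hrtimer is not, so those are the only keys whose work may be cancelled.
 */
static bool model_must_cancel_delayed_work(const struct model_button *b)
{
	return b->gpio_backed && !b->debounce_use_hrtimer;
}
```

For an interrupt-only key the predicate is false, which corresponds to skipping the cancel call that triggered the warning in the trace.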

When running with return thunks enabled under 32-bit EFI, the system crashes with: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle page fault for address: 000000005bc02900 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0011) - permissions violation PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063 Oops: 0011 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:0x5bc02900 Code: Unable to access opcode bytes at RIP 0x5bc028d6. RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048 RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28 R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710 R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033 CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0 Call Trace: ? efi_set_virtual_address_map+0x9c/0x175 efi_enter_virtual_mode+0x4a6/0x53e start_kernel+0x67c/0x71e x86_64_start_reservations+0x24/0x2a x86_64_start_kernel+0xe9/0xf4 secondary_startup_64_no_verify+0xe5/0xeb That's because it cannot jump to the return thunk from the 32-bit code. Using a naked RET and marking it as safe allows the system to proceed booting. Fixes: aa3d480 ("x86: Use return-thunk in asm code") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: <stable@vger.kernel.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

If a relocatable kernel is loaded at an address that is not 2MB aligned and told not to relocate to zero, the kernel can crash due to mark_rodata_ro() incorrectly changing some read-write data to read-only. Scenarios where the misalignment can occur are when the kernel is loaded by kdump or using the RELOCATABLE_TEST config option. Example crash with the kernel loaded at 5MB: Run /sbin/init as init process BUG: Unable to handle kernel data access on write at 0xc000000000452000 Faulting instruction address: 0xc0000000005b6730 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries NIP: c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380 REGS: c000000004503250 TRAP: 0300 Not tainted (6.2.0-rc1-00011-g349188be4841) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44288480 XER: 00000000 CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0 ... 
NIP memset+0x68/0x104 LR zero_user_segments.constprop.0+0xa8/0xf0 Call Trace: ext4_mpage_readpages+0x7f8/0x830 ext4_readahead+0x48/0x60 read_pages+0xb8/0x380 page_cache_ra_unbounded+0x19c/0x250 filemap_fault+0x58c/0xae0 __do_fault+0x60/0x100 __handle_mm_fault+0x1230/0x1a40 handle_mm_fault+0x120/0x300 ___do_page_fault+0x20c/0xa80 do_page_fault+0x30/0xc0 data_access_common_virt+0x210/0x220 This happens because mark_rodata_ro() tries to change permissions on the range _stext..__end_rodata, but _stext sits in the middle of the 2MB page from 4MB to 6MB: radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec) radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec) The logic that changes the permissions assumes the linear mapping was split correctly at boot, so it marks the entire 2MB page read-only. That leads to the write fault above. To fix it, the boot time mapping logic needs to consider that if the kernel is running at a non-zero address then _stext is a boundary where it must split the mapping. 
That leads to the mapping being split correctly, allowing the rodata permission change to happen correctly, with no spillover:

radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)

If the kernel is loaded at a 2MB aligned address, the mapping continues to use 2MB pages as before:

radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages

Fixes: c55d7b5 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
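The idea of treating _stext as a split boundary can be sketched with two small helpers. This is not the kernel's mapping code: `model_chunk_limit()` and `model_page_size()` are hypothetical names, and the sketch assumes only two page sizes (2 MiB and 64 KiB) as seen in the log output.

```c
#define MODEL_SZ_2M  0x200000UL
#define MODEL_SZ_64K 0x10000UL

/* Stop the current chunk at the boundary if it lies strictly ahead of addr. */
static unsigned long model_chunk_limit(unsigned long addr, unsigned long end,
				       unsigned long boundary)
{
	if (addr < boundary && boundary < end)
		return boundary;
	return end;
}

/* Use a 2 MiB page only when addr is 2 MiB aligned and the chunk has room for it. */
static unsigned long model_page_size(unsigned long addr, unsigned long limit)
{
	if ((addr % MODEL_SZ_2M) == 0 && limit - addr >= MODEL_SZ_2M)
		return MODEL_SZ_2M;
	return MODEL_SZ_64K;
}
```

With a boundary at 5MB, the chunk starting at 4MB is limited to 5MB and falls back to 64 KiB pages, the range from 5MB to 6MB also uses 64 KiB pages because 5MB is not 2 MiB aligned, and 2 MiB pages resume at 6MB. When the boundary is already 2 MiB aligned, no extra split occurs.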

…el.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA xfs: log intent item recovery should reconstruct defer work state Long Li reported a KASAN report from a UAF when intent recovery fails: ================================================================== BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0 Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103 CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166 Workqueue: xfs-cil/sda xlog_cil_push_work Call Trace: <TASK> dump_stack_lvl+0x50/0x70 print_report+0xc2/0x600 kasan_report+0xb6/0xe0 xfs_cui_release+0xb7/0xc0 xfs_cud_item_release+0x3c/0x90 xfs_trans_committed_bulk+0x2d5/0x7f0 xlog_cil_committed+0xaba/0xf20 xlog_cil_push_work+0x1a60/0x2360 process_one_work+0x78e/0x1140 worker_thread+0x58b/0xf60 kthread+0x2cd/0x3c0 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 531: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x55/0x60 kmem_cache_alloc+0x195/0x5f0 xfs_cui_init+0x198/0x1d0 xlog_recover_cui_commit_pass2+0x133/0x5f0 xlog_recover_items_pass2+0x107/0x230 xlog_recover_commit_trans+0x3e7/0x9c0 xlog_recovery_process_trans+0x140/0x1d0 xlog_recover_process_ophdr+0x1a0/0x3d0 xlog_recover_process_data+0x108/0x2d0 xlog_recover_process+0x1f6/0x280 xlog_do_recovery_pass+0x609/0xdb0 xlog_do_log_recovery+0x84/0xe0 xlog_do_recover+0x7d/0x470 xlog_recover+0x25f/0x490 xfs_log_mount+0x2dd/0x6f0 xfs_mountfs+0x11ce/0x1e70 xfs_fs_fill_super+0x10ec/0x1b20 get_tree_bdev+0x3c8/0x730 vfs_get_tree+0x89/0x2c0 path_mount+0xecf/0x1800 do_mount+0xf3/0x110 __x64_sys_mount+0x154/0x1f0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 531: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x40 __kasan_slab_free+0x114/0x1b0 kmem_cache_free+0xf8/0x510 xfs_cui_item_free+0x95/0xb0 xfs_cui_release+0x86/0xc0 xlog_recover_cancel_intents.isra.0+0xf8/0x210 
xlog_recover_finish+0x7e7/0x980 xfs_log_mount_finish+0x2bb/0x4a0 xfs_mountfs+0x14bf/0x1e70 xfs_fs_fill_super+0x10ec/0x1b20 get_tree_bdev+0x3c8/0x730 vfs_get_tree+0x89/0x2c0 path_mount+0xecf/0x1800 do_mount+0xf3/0x110 __x64_sys_mount+0x154/0x1f0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888012575dc8 which belongs to the cache xfs_cui_item of size 432 The buggy address is located 152 bytes inside of freed 432-byte region [ffff888012575dc8, ffff888012575f78) The buggy address belongs to the physical page: page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574 head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150 raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc ================================================================== "If process intents fails, intent items left in AIL will be delete from AIL and freed in error handling, even intent items that have been recovered and created done items. After this, uaf will be triggered when done item committed, because at this point the released intent item will be accessed. 
xlog_recover_finish xlog_cil_push_work ---------------------------- --------------------------- xlog_recover_process_intents xfs_cui_item_recover//cui_refcount == 1 xfs_trans_get_cud xfs_trans_commit <add cud item to cil> xfs_cui_item_recover <error occurred and return> xlog_recover_cancel_intents xfs_cui_release //cui_refcount == 0 xfs_cui_item_free //free cui <release other intent items> xlog_force_shutdown //shutdown <...> <push items in cil> xlog_cil_committed xfs_cud_item_release xfs_cui_release // UAF

"Intent log items are created with a reference count of 2, one for the creator, and one for the intent done object. Log recovery explicitly drops the creator reference after it is inserted into the AIL, but it then processes the log item as if it also owns the intent-done reference.

"The code in ->iop_recovery should assume that it passes the reference to the done intent, we can remove the intent item from the AIL after creating the done-intent, but if that code fails before creating the done-intent then it needs to release the intent reference by log recovery itself.

"That way when we go to cancel the intent, the only intents we find in the AIL are the ones we know have not been processed yet and hence we can safely drop both the creator and the intent done reference from xlog_recover_cancel_intents().

"Hence if we remove the intent from the list of intents that need to be recovered after we have done the initial recovery, we achieve two things:

"1. the tail of the log can be moved forward with the commit of the done intent or new intent to continue the operation, and

"2. We avoid the problem of trying to determine how many reference counts we need to drop from intent recovery cancelling because we never come across intents we've actually attempted recovery on."

Restated: The cause of the UAF is that xlog_recover_cancel_intents thinks that it owns the refcount on any intent item in the AIL, and that it's always safe to release these intent items.
This is not true after the recovery function creates a log intent done item and points it at the log intent item, because releasing the done item always releases the intent item. The runtime defer ops code avoids all this by tracking both the log intent and the intent done items, and releasing only the intent done item if both have been created.

Long Li proposed fixing this by adding state flags, but I have a more comprehensive fix. First, observe that the latter half of the intent _recover functions are nearly open-coded versions of the corresponding _finish_one function that uses an onstack deferred work item to single-step through the item. Second, notice that the recover function is not an exact match because of the odd behavior that unfinished recovered work items are relogged with separate log intent items instead of a single new log intent item, which is what the defer ops machinery does.

Dave and I have long suspected that recovery should be reconstructing the defer work state from what's in the recovered intent item. Now we finally have an excuse to refactor the code to do that.

This series starts by fixing a resource leak in LARP recovery. We fix the bug that Long Li reported by switching the intent recovery code to construct chains of xfs_defer_pending objects and then using the defer pending objects to track the intent/done item ownership. Finally, we clean up the code to reconstruct the exact incore state, which means we can remove all the opencoded _recover code, which makes maintaining log items much easier.

v2: minor changes per review comments
v3: pick up more rvb tags, fix build errors

This has been lightly tested with fstests. Enjoy!

Signed-off-by: Darrick J.
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'reconstruct-defer-work-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: move ->iop_recover to xfs_defer_op_type xfs: use xfs_defer_finish_one to finish recovered work items xfs: dump the recovered xattri log item if corruption happens xfs: recreate work items when recovering intent items xfs: transfer recovered intent item ownership in ->iop_recover xfs: pass the xfs_defer_pending object to iop_recover xfs: use xfs_defer_pending objects to recover intent items xfs: don't leak recovered attri intent items
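The ownership rule quoted above (track both the intent and the done item, and release only through the done item once both exist) can be modelled in a few lines. This is a toy sketch, not XFS code: `struct model_intent`, `struct model_done`, `struct model_pending` and the helper functions are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical intent item: a refcount plus a counter of how often it was freed. */
struct model_intent {
	int refcount;
	int free_count;
};

/* Hypothetical done item: it holds the remaining reference on its intent. */
struct model_done {
	struct model_intent *intent;
};

/* Hypothetical pending-work record that tracks both items together. */
struct model_pending {
	struct model_intent *intent;
	struct model_done *done;
};

/* Drop one reference; the item is freed when the count reaches zero. */
static void model_intent_put(struct model_intent *it)
{
	if (--it->refcount == 0)
		it->free_count++;
}

/*
 * Cancel a pending record. If a done item exists it owns the intent
 * reference, so the release goes through it exactly once; otherwise the
 * intent reference is released directly.
 */
static void model_pending_cancel(struct model_pending *p)
{
	if (p->done)
		model_intent_put(p->done->intent);
	else if (p->intent)
		model_intent_put(p->intent);
	p->intent = NULL;
	p->done = NULL;
}
```

Because the record forgets both pointers after cancelling, a second cancel cannot drop the same reference again, which is the double release that the model is meant to rule out.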


With recent additions in BPF such as cpu v4 instructions, the test_bpf module exhibits the following failures:

test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd eBPF filter opcode 00d7 (@2) unsupported jited:0 301 PASS
test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 555 PASS
test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301 eBPF filter opcode 00d7 (@2) unsupported jited:0 268 PASS
test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 269 PASS
test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032 eBPF filter opcode 00d7 (@2) unsupported jited:0 460 PASS
test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 320 PASS
test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe eBPF filter opcode 00d7 (@2) unsupported jited:0 222 PASS
test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 273 PASS
test_bpf: #344 BPF_LDX_MEMSX | BPF_B eBPF filter opcode 0091 (@5) unsupported jited:0 432 PASS
test_bpf: #345 BPF_LDX_MEMSX | BPF_H eBPF filter opcode 0089 (@5) unsupported jited:0 381 PASS
test_bpf: #346 BPF_LDX_MEMSX | BPF_W eBPF filter opcode 0081 (@5) unsupported jited:0 505 PASS
test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1 eBPF filter opcode 0006 (@1) unsupported jited:0 261 PASS
test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
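The SDIV and SMOD values in the failing tests are consistent with the operands having been treated as unsigned: 0x7ffffffd is what unsigned 32-bit division of the bit pattern of -6 by 2 yields, and 1 is the unsigned remainder of the bit pattern of -7 divided by 2. The sketch below shows that arithmetic in plain C, along with a byte sign extension of the kind a MOVSX-style instruction performs. The `model_*` helpers are hypothetical names for illustration and say nothing about how the JIT itself is written.

```c
#include <stdint.h>

/* Unsigned 32-bit division and remainder on raw bit patterns. */
static uint32_t model_udiv32(uint32_t a, uint32_t b) { return a / b; }
static uint32_t model_umod32(uint32_t a, uint32_t b) { return a % b; }

/*
 * Signed 32-bit division and remainder. The caller must avoid b == 0 and
 * INT32_MIN / -1, which are undefined in C.
 */
static int32_t model_sdiv32(int32_t a, int32_t b) { return a / b; }
static int32_t model_smod32(int32_t a, int32_t b) { return a % b; }

/* Sign-extend the low 8 bits of src to 32 bits using only unsigned arithmetic. */
static uint32_t model_sext8(uint32_t src)
{
	uint32_t low = src & 0xffu;

	return (low ^ 0x80u) - 0x80u;
}
```

Comparing `model_udiv32((uint32_t)-6, 2)` with `model_sdiv32(-6, 2)` reproduces the pair of values reported for tests #165 and #166, and the remainder helpers do the same for the SMOD tests.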

isilence and others added 30 commits September 30, 2020 20:32

io_uring: refactor io_req_map_rw()

afb8765

Set rw->free_iovec to @iovec, which gives an identical result and stresses that the @iovec param and rw->free_iovec play the same role. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove unneeded semicolon

14db841

Fixes coccicheck warning: fs/io_uring.c:4242:13-14: Unneeded semicolon Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io-wq: kill unused IO_WORKER_F_EXITING

145cc8c

This flag is no longer used, remove it. Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: simplify io_alloc_req()

291b282

Extract common code from if/else branches. That is cleaner and optimised even better. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: io_kiocb_ppos() style change

5b09e37

Put brackets around bitwise ops in a complex expression Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: decouple issuing and req preparation

bfe7655

io_issue_sqe() does two things at once, preparing requests and issuing them. Split it in two and deduplicate with io_defer_prep(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill callback_head argument for io_req_task_work_add()

87c4311

We always use &req->task_work anyway, no point in passing it in. Signed-off-by: Jens Axboe <axboe@kernel.dk>

media: vidtv: simplify PCR logic to get jiffies

880a8fc

There's no need to have a check when setting timing.past_jiffies, as we can simply do the initialization earlier at vidtv_mux_init(). Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

media: vidtv: remove more ENDIAN_BITFIELD nonsense

02578bd

Changeset 870e350 ("media: vidtv: get rid of ENDIAN_BITFIELD nonsense") was incomplete. There is still some wrong endianness logic in the driver. Get rid of it. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

media: vidtv: fix a typo

d6a36ed

optinal -> optional Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

isilence and others added 22 commits October 10, 2020 12:49

io_uring: clean leftovers after splitting issue

062d04d

Kill extra if in io_issue_sqe() and place send/recv[msg] calls appropriately under switch's cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: clean file_data access in files_register

5398ae6

Keep file_data in a local var and use it to replace complex references such as ctx->file_data. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: refactor *files_register()'s error paths

600cf3f

Don't keep repeating cleanup sequences in error paths; write them once and use labels. It's less error prone and looks cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

x86/platform/uv: Remove unused variable in UV5 NMI handler

081dd68

Remove an unused variable. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201013154731.132565-1-mike.travis@hpe.com

pull bot added the ⤵️ pull label Oct 13, 2020

pull bot merged commit 6ad4bf6 into vchong:master Oct 13, 2020
