Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] master from torvalds:master #166

Merged
merged 1,323 commits into from
Oct 13, 2020
Merged

[pull] master from torvalds:master #166

merged 1,323 commits into from
Oct 13, 2020

Conversation

pull[bot]
Copy link

@pull pull bot commented Oct 13, 2020

See Commits and Changes for more details.


Created by pull[bot]. Want to support this open source service? Please star it : )

isilence and others added 30 commits September 30, 2020 20:32
Set rw->free_iovec to @IoveC, that gives an identical result and stresses
that @IoveC param rw->free_iovec play the same role.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When io_req_map_rw() is called from io_rw_prep_async(), it memcpy()
iorw->iter into itself. Even though it doesn't lead to an error, such a
memcpy()'s aliasing rules violation is considered to be a bad practise.

Inline io_req_map_rw() into io_rw_prep_async(). We don't really need any
remapping there, so it's much simpler than the generic implementation.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Testing ctx->user_bufs for NULL in io_import_fixed() is not neccessary,
because in that case ctx->nr_user_bufs would be zero, and the following
check would fail.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There's really no point in having this union, it just means that we're
always allocating enough room to cater to any command. But that's
pointless, as the ->io field is request type private anyway.

This gets rid of the io_async_ctx structure, and fills in the required
size in the io_op_defs[] instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the spirit of fairness, cap the max number of SQ entries we'll submit
for SQPOLL if we have multiple rings. If we don't do that, we could be
submitting tons of entries for one ring, while others are waiting to get
service.

The value of 8 is somewhat arbitrarily chosen as something that allows
a fair bit of batching, without using an excessive time per ring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fixes coccicheck warning:

fs/io_uring.c:4242:13-14: Unneeded semicolon

Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring does account any registered buffer as pinned/locked memory, and
checks limit and fails if the given user doesn't have a big enough limit
to register the ranges specified. However, if huge pages are used, we
are potentially under-accounting the memory in terms of what gets pinned
on the vm side.

This patch rectifies that, by ensuring that we account the full size of
a compound page, regardless of how much of it is being registered. Huge
pages are not accounted mulitple times - if multiple sections of a huge
page is registered, then the page is only accounted once.

Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are a few operations that are offloaded to the worker threads. In
this case, we lose process context and end up in kthread context. This
results in ios to be not accounted to the issuing cgroup and
consequently end up as issued by root. Just like others, adopt the
personality of the blkcg too when issuing via the workqueues.

For the SQPOLL thread, it will live and attach in the inited cgroup's
context.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We do this for CQ ring wait, in case task_work completions come in. We
should do the same in io_uring_register(), to avoid spurious -EINTR
if the ring quiescing ends up having to process task_work to complete
the operation

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In most cases we'll specify IORING_SETUP_SQPOLL and run multiple
io_uring instances in a host. Since all sqthreads are named
"io_uring-sq", it's hard to distinguish the relations between
application process and its io_uring sqthread.
With this patch, application can get its corresponding sqthread pid
and cpu through show_fdinfo.
Steps:
1. Get io_uring fd first.
$ ls -l /proc/<pid>/fd | grep -w io_uring
2. Then get io_uring instance related info, including corresponding
sqthread pid and cpu.
$ cat /proc/<pid>/fdinfo/<io_uring_fd>

pos:	0
flags:	02000002
mnt_id:	13
SqThread:	6929
SqThreadCpu:	2
UserFiles:	1
    0: testfile
UserBufs:	0
PollList:

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
[axboe: fixed for new shared SQPOLL]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The smart syzbot has found a reproducer for the following issue:

 ==================================================================
 BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
 BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
 BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline]
 BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
 Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771

 CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x198/0x1fd lib/dump_stack.c:118
  print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
  __kasan_report mm/kasan/report.c:513 [inline]
  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
  check_memory_region_inline mm/kasan/generic.c:186 [inline]
  check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
  instrument_atomic_write include/linux/instrumented.h:71 [inline]
  atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
  io_wqe_inc_running fs/io-wq.c:301 [inline]
  io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
  schedule_timeout+0x148/0x250 kernel/time/timer.c:1879
  io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580
  kthread+0x3b5/0x4a0 kernel/kthread.c:292
  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

 Allocated by task 7768:
  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
  kasan_set_track mm/kasan/common.c:56 [inline]
  __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
  kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594
  kmalloc_node include/linux/slab.h:572 [inline]
  kzalloc_node include/linux/slab.h:677 [inline]
  io_wq_create+0x57b/0xa10 fs/io-wq.c:1064
  io_init_wq_offload fs/io_uring.c:7432 [inline]
  io_sq_offload_start fs/io_uring.c:7504 [inline]
  io_uring_create fs/io_uring.c:8625 [inline]
  io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

 Freed by task 21:
  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
  kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
  kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
  __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
  __cache_free mm/slab.c:3418 [inline]
  kfree+0x10e/0x2b0 mm/slab.c:3756
  __io_wq_destroy fs/io-wq.c:1138 [inline]
  io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146
  io_finish_async fs/io_uring.c:6836 [inline]
  io_ring_ctx_free fs/io_uring.c:7870 [inline]
  io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954
  process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
  worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
  kthread+0x3b5/0x4a0 kernel/kthread.c:292
  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

 The buggy address belongs to the object at ffff8882183db000
  which belongs to the cache kmalloc-1k of size 1024
 The buggy address is located 140 bytes inside of
  1024-byte region [ffff8882183db000, ffff8882183db400)
 The buggy address belongs to the page:
 page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db
 flags: 0x57ffe0000000200(slab)
 raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700
 raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
  ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

which is down to the comment below,

	/* all workers gone, wq exit can proceed */
	if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs))
		complete(&wqe->wq->done);

because there might be multiple cases of wqe in a wq and we would wait
for every worker in every wqe to go home before releasing wq's resources
on destroying.

To that end, rework wq's refcount by making it independent of the tracking
of workers because after all they are two different things, and keeping
it balanced when workers come and go. Note the manager kthread, like
other workers, now holds a grab to wq during its lifetime.

Finally to help destroy wq, check IO_WQ_BIT_EXIT upon creating worker
and do nothing for exiting wq.

Cc: stable@vger.kernel.org # v5.5+
Reported-by: syzbot+45fa0a195b941764e0f0@syzkaller.appspotmail.com
Reported-by: syzbot+9af99580130003da82b1@syzkaller.appspotmail.com
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This flag is no longer used, remove it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Extract common code from if/else branches. That is cleaner and optimised
even better.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Put brackets around bitwise ops in a complex expression

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
REQ_F_NEED_CLEANUP is set only by io_*_prep() and they're guaranteed to
be called only once, so there is no one who may have set the flag
before. Kill REQ_F_NEED_CLEANUP check in these *prep() handlers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move setting IOCB_NOWAIT from io_prep_rw() into io_read()/io_write(), so
it's set/cleared in a single place. Also remove @force_nonblock
parameter from io_prep_rw().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All io_*_prep() functions including io_{read,write}_prep() are called
only during submission where @force_nonblock is always true. Don't keep
propagating it and instead remove the @force_nonblock argument
from prep() altogether.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_issue_sqe() does two things at once, trying to prepare request and
issuing them. Split it in two and deduplicate with io_defer_prep().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All request preparations are done only during submission, reflect it in
the code by moving io_req_prep() much earlier into io_queue_sqe().

That's much cleaner, because it doen't expose bits to async code which
it won't ever use. Also it makes the interface harder to misuse, and
there are potential places for bugs.

For instance, __io_queue() doesn't clear @sqe before proceeding to a
next linked request, that could have been disastrous, but hopefully
there are linked requests IFF sqe==NULL, so not actually a bug.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We always use &req->task_work anyway, no point in passing it in.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
There's no need to have a check when setting timing.past_jiffies,
as we can simply do the initialization earlier at vidtv_mux_init().

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Changeset 870e350 ("media: vidtv: get rid of ENDIAN_BITFIELD nonsense")
was incomplete. There are still some wrong endannes logic at
the driver. Get rid of them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
This field should point to the network ID, and has different
ranges for cable, terrestrial or satellite. It also has
an special range for temporary private usage.

For now, let's use the temporary private one. Once the
Network Information Table (NIT) gets added, this should be
reviewed.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
There's no reason to use static vars to store PSI version
numbers.

Also, currently, version numbers are starting with 0x01,
because there's a code being called that increases it to
1 for all table initializer code, as the code may support
dynamic changes at the PS tables on some future.

So, let's just initialize them to 0x1f, in order for the
versions to be reported as starting from 0.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
In order to better test userspace applications, add an encoding
for the strings. The original channel string is also too big.
As we're just playing the first compasses of the 5th Beethoven
Symphony, change the service name to just "Beethoven". Also,
add a provider's name. For such purpose, let's point it to
the Linux media community website.

With such changes, the SDT table now looks more like one got
from a real digital TV channel:

SDT
| table_id         0x42
| section_length      44
| one                 2
| zero                1
| syntax              1
| transport_stream_id 1860
| current_next        1
| version             0
| one2                3
| section_number      0
| last_section_number 0
| network_id          65281
| reserved            255
|\
|- service 0x0880
|   EIT schedule          0
|   EIT present following 0
|   free CA mode          0
|   running status        4
|   descriptor length     27
|        0x48: service_descriptor
|           service type  1
|           provider      'LinuxTV.org'
|           name          'Beethoven'
|_  1 services

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
optinal -> optional

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Instead of passing struct pes_ts_header_write_args fields as
function parameters, just pass a pointer to the struct.

That would allow adding more args without needing to change
the function prototype.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
When compared with a stream generated via ffmpeg, it can be
noticed that the PES doesn't contain any PCR info. That could
cause problems with userspace decoding. So, rewrite the
logic that fills the adaptation info, in order to allow it
to add PCR frames without breaking frame alignment.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Neither Vlc nor Gstreamer likes the PES_scrambling_control bits.

In the case of GST, this can be seen with:

	$ GST_DEBUG=2 LANG=C gst-play-1.0 pcm_audio.ts
	...
	0:00:00.097973439 12308 0x55f7ddd155e0 WARN               pesparser pesparse.c:411:mpegts_parse_pes_header: Wrong '0x10' marker before PES_scrambling_control (0x40)
	0:00:00.097987026 12308 0x55f7ddd155e0 WARN                 tsdemux tsdemux.c:2314:gst_ts_demux_parse_pes_header: Error parsing PES header. pid: 0x111 stream_type: 0x6
	...

So, change, it. After such change, the stream now plays
fine with Vlc, Gstreamer, ffmpeg - and with programs
that use such libraries, like Kaffeine.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Add proper control type to the recently introduced
V4L2_CID_MPEG_VIDEO_FRAME_SKIP_MODE control. This will forward it to
v4l2_ctrl_new_std_menu() not v4l2_ctrl_new_std(), what causes the
failure. This fixes the following warning during driver initialization:

s5p_mfc_enc_ctrls_setup:2671: Adding control (18) failed
s5p_mfc_open:811: Failed to setup mfc controls

Fixes: ef56b3e ("media: s5p-mfc: Use standard frame skip mode control")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
isilence and others added 22 commits October 10, 2020 12:49
ctx->nr_user_files == 0 IFF ctx->file_data == NULL and there fixed files
are not used. Hence, verifying fds only against ctx->nr_user_files is
enough. Remove the other check from hot path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keep ->needs_file_no_error check out of io_file_get(), and let callers
handle it. It makes it more straightforward. Also, as the only error it
can hand back -EBADF, make it return a file or NULL.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
state->ios_left isn't decremented for requests that don't need a file,
so it might be larger than number of SQEs left. That in some
circumstances makes us to grab more files that is needed so imposing
extra put.
Deaccount one ios_left for each request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't use struct io_timeout for both IORING_OP_TIMEOUT and
IORING_OP_TIMEOUT_REMOVE, they're quite different. Split them in two,
that allows to remove an unused field in struct io_timeout, and btw kill
->flags not used by either. This also easier to follow, especially for
timeout remove.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove timeouts from ctx->timeout_list after hrtimer_try_to_cancel()
successfully cancels it. With this we don't need to care whether there
was a race and it was removed in io_timeout_fn(), and that will be handy
for following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kill extra if in io_issue_sqe() and place send/recv[msg] calls
appropriately under switch's cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't postpone io_init_req() error checks and do that right after
calling it. There is no control-flow statements or dependencies with
sqe/submitted accounting, so do those earlier, that makes the code flow
a bit more natural.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keep file_data in a local var and replace with it complex references
such as ctx->file_data.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't keep repeating cleaning sequences in error paths, write it once
in the and use labels. It's less error prone and looks cleaner.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
->cur_refs of struct fixed_file_data always points to percpu_ref
embedded into struct fixed_file_ref_node. Don't overuse container_of()
and offsetting, and point directly to fixed_file_ref_node.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
…t/ulfh/mmc

Pull MMC updates from Ulf Hansson:
 "MMC core:
   - Export SDIO revision and info strings to userspace
   - Add support for specifying mmc/mmcblk index via mmc aliases in DT

 MMC host:
   - Enable support for async probe for all mmc host drivers
   - Enable compile testing of multiple host drivers
   - dw_mmc: Enable the Synopsys DesignWare driver for RISCV and CSKY
   - mtk-sd: Fixup support for CQHCI
   - owl-mmc: Add support for the actions,s700-mmc variant
   - renesas_sdhi: Fix regression (temporary) for re-insertion of SD cards
   - renesas_sdhi: Add support for the r8a774e1 variant
   - renesas_sdhi/tmio: Improvements for tunings
   - renesas_sdhi/tmio: Rework support for reset of controller
   - sdhci-acpi: Fix HS400 tuning for devices with invalid presets on AMDI0040
   - sdhci_am654: Improve support for tunings
   - sdhci_am654: Add support for input tap delays
   - sdhci_am654: Add workaround for card detect debounce timer
   - sdhci-am654: Add support for the TI's J7200 variants
   - sdhci-esdhc-imx: Fix support for manual tuning
   - sdhci-iproc: Enable support for eMMC DDR 3.3V for bcm2711
   - sdhci-msm: Fix stability issues with HS400 for sc7180
   - sdhci-of-sparx5: Add Sparx5 SoC eMMC driver
   - sdhci-of-esdhc: Fixup reference clock source selection
   - sdhci-pci: Add LTR support for some Intel BYT controllers
   - sdhci-pci-gli: Add CQHCI Support for GL9763E"

* tag 'mmc-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (91 commits)
  mmc: sdhci_am654: Fix module autoload
  mmc: renesas_sdhi: workaround a regression when reinserting SD cards
  mmc: sdhci-pci-gli: Add CQHCI Support for GL9763E
  mmc: sdhci-acpi: AMDI0040: Set SDHCI_QUIRK2_PRESET_VALUE_BROKEN
  mmc: sdhci_am654: Enable tuning for SDR50
  mmc: sdhci_am654: Add support for software tuning
  mmc: sdhci_am654: Add support for input tap delay
  mmc: sdhci_am654: Fix hard coded otap delay array size
  dt-bindings: mmc: sdhci-am654: Add documentation for input tap delay
  dt-bindings: mmc: sdhci-am654: Convert sdhci-am654 controller documentation to json schema
  mmc: sdhci-of-esdhc: fix reference clock source selection
  mmc: host: fix depends for MMC_MESON_GX w/ COMPILE_TEST
  mmc: sdhci-s3c: hide forward declaration of of_device_id behind CONFIG_OF
  mmc: sdhci: fix indentation mistakes
  mmc: moxart: remove unneeded check for drvdata
  mmc: renesas_sdhi: drop local flag for tuning
  mmc: rtsx_usb_sdmmc: simplify the return expression of sd_change_phase()
  mmc: core: document mmc_hw_reset()
  mmc: mediatek: Drop pointer to mmc_host from msdc_host
  dt-bindings: mmc: owl: add compatible string actions,s700-mmc
  ...
…l/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - the usbvision driver was dropped from staging

 - the Zoran driver were re-added at staging. It gained lots of
   improvements, and was converted to use videobuf2 API

 - a new virtual driver (vidtv) was added in order to allow testing the
   digital TV framework and APIs

 - the media uAPI documentation gained a glossary with commonly used
   terms, helping to simplify some parts of the docs

 - more cleanups at the atomisp driver

 - Mediatek VPU gained support for MT8183

 - added support for codecs with supports doing colorspace conversion
   (CSC)

 - support for CSC API was added at vivid and rksip1 drivers

 - added a helper core support and uAPI for better supporting H.264
   codecs

 - added support for Renesas R8A774E1

 - use the new SPDX GFDL-1.1-no-invariants-or-later license on media
   uAPI docs, instead of a license text

 - Venus driver has gained VP9 codec support

 - lots of other cleanups and driver improvements

* tag 'media/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (555 commits)
  media: dvb-frontends/drxk_hard.c: fix uninitialized variable warning
  media: tvp7002: fix uninitialized variable warning
  media: s5k5baf: drop 'data' field in struct s5k5baf_fw
  media: dt-bindings: media: venus: Add an optional power domain for perf voting
  media: rcar-vin: rcar-dma: Fix setting VNIS_REG for RAW8 formats
  media: staging: rkisp1: uapi: Do not use BIT() macro
  media: v4l2-mem2mem: Fix spurious v4l2_m2m_buf_done
  media: usbtv: Fix refcounting mixup
  media: zoran.rst: place it at the right place this time
  media: add Zoran cardlist
  media: admin-guide: update cardlists
  media: siano: rename a duplicated card string
  media: zoran: move documentation file to the right place
  media: atomisp: fixes build breakage for ISP2400 due to a cleanup
  media: zoran: fix mixed case on vars
  media: zoran: get rid of an unused var
  media: zoran: use upper case for card types
  media: zoran: fix sparse warnings
  media: zoran: fix smatch warning
  media: zoran: update TODO
  ...
…/git/broonie/regmap

Pull regmap updates from Mark Brown:
 "Quite a busy release for regmap, mostly support for new features
  useful on fairly small subsets of devices. The user visible features
  are:

   - A new API for registering large numbers of regmap fields at once.

   - Support for Intel AVMM buses connected via SPI.

   - Support for 12/20 address/value layouts.

   - Support for yet another scheme for acknowledging interrupts used on
     some Qualcomm devices"

* tag 'regmap-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: irq: Add support to clear ack registers
  regmap: add support to regmap_field_bulk_alloc/free apis
  regmap: destroy mutex (if used) in regmap_exit()
  regmap: debugfs: use semicolons rather than commas to separate statements
  regmap: debugfs: Fix more error path regressions
  regmap: Add support for 12/20 register formatting
  regmap: Add can_sleep configuration option
  regmap: soundwire: remove unused header mod_devicetable.h
  regmap: Use flexible sleep
  regmap: add Intel SPI Slave to AVMM Bus Bridge support
…nel/git/broonie/regulator

Pull regulator updates from Mark Brown:
 "This is a fairly small release for the regulator API, there's quite a
  few new devices supported and some important improvements around
  coupled regulators in the core but mostly just small fixes and
  improvements otherwise.

  Summary:

   - Fixes and cleanups around the handling of coupled regulators.

   - A special driver for some Raspberry Pi panels with some unusually
     custom stuff around them.

   - Support for Qualcomm PM660/PM660L, PM8950 and PM8953, Richtek
     RT4801 and RTMV20, Rohm BD9576MUF and BD9573MUF"

* tag 'regulator-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (89 commits)
  regulator: bd9576: Fix print
  regulator: bd9576: fix regulator binfdings dt node names
  dt-bindings: regulator: document pm8950 and pm8953 smd regulators
  regulator: qcom_smd: add pm8953 regulators
  regulator: Make constraint debug processing conditional on DEBUG
  regulator: qcom: labibb: Constify static structs
  regulator: dt-bindings: Document the PM660/PM660L PMICs entries
  regulator: qcom_smd: Add PM660/PM660L regulator support
  regulator: dt-bindings: Document the PM660/660L SPMI PMIC entries
  regulator: qcom_spmi: Add PM660/PM660L regulators
  regulator: qcom_spmi: Add support for new regulator types
  regulator: core: Enlarge max OF property name length to 64 chars
  regulator: tps65910: use regmap accessors
  regulator: rtmv20: Add missing regcache cache only before marked as dirty
  regulator: rtmv20: Update DT binding document and property name parsing
  regulator: rtmv20: Add DT-binding document for Richtek RTMV20
  regulator: rtmv20: Adds support for Richtek RTMV20 load switch regulator
  regulator: resolve supply after creating regulator
  regulator: print symbolic errors in kernel messages
  regulator: print state at boot
  ...
…t/broonie/spi

Pull spi updates from Mark Brown:
 "There's quite a lot of changes for SPI in this release but none in the
  core, they're all mostly small driver updates and additions. Some of
  the more notable changes include:

   - A huge set of cleanups, optimizations and improvements for the
     DesignWare driver from Serge Semin finishing up the work started
     last release.

   - Conversion of the Zynq gqspi driver to spi-mem.

   - Support for Baikal T1, Broadcom BCMSTB 7445, and Renesas R8A7742"

* tag 'spi-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (137 commits)
  spi: cadence: Add SPI transfer delays
  spi: dw: Add Baikal-T1 SPI Controller bindings
  spi: dw: Add Baikal-T1 SPI Controller glue driver
  spi: dw: Add poll-based SPI transfers support
  spi: dw: Introduce max mem-ops SPI bus frequency setting
  spi: dw: Add memory operations support
  spi: dw: Add generic DW SSI status-check method
  spi: dw: Move num-of retries parameter to the header file
  spi: dw: Explicitly de-assert CS on SPI transfer completion
  spi: dw: De-assert chip-select on reset
  spi: dw: Discard chip enabling on DMA setup error
  spi: dw: Unmask IRQs after enabling the chip
  spi: dw: Perform IRQ setup in a dedicated function
  spi: dw: Refactor IRQ-based SPI transfer procedure
  spi: dw: Refactor data IO procedure
  spi: dw: Add DW SPI controller config structure
  spi: dw: Update Rx sample delay in the config function
  spi: dw: Simplify the SPI bus speed config procedure
  spi: dw: Update SPI bus speed in a config function
  spi: dw: Detach SPI device specific CR0 config method
  ...
…/git/linusw/linux-gpio

Pull GPIO updates from Linus Walleij:
 "This time very little driver changes but lots of core changes.

  We have some interesting cooperative work for ARM and Intel alike,
  making the GPIO subsystem more and more suitable for industrial
  systems and the like, in addition to the in-kernel users.

  We touch driver core (device properties) and lib/* by adding one
  simple string array free function, these are authored by Andy
  Shevchenko who is a well known and recognized core helpers maintainers
  so this should be fine.

  We also see some Android GKI-related modularization in the MXC
  drivers.

  Core changes:

   - The big core change is the updated (v2) userspace character device
     API.

     This corrects badly designed 64-bit alignment around the line
     events. We also add the debounce request feature. This echoes the
     often quotes passage from Frederick Brooks "The mythical man-month"
     to always throw one away, which we have seen before in things such
     as V4L2. So we put in a new one and deprecate and obsolete the old
     one.

   - All example tools in tools/gpio/* are migrated to the new API to
     set a good example. The libgpiod userspace library has been
     augmented to use this new API pretty much from day 1.

   - Some misc API hardening by using strn* function calls has been
     added as well.

   - Use the simpler IDA interface for GPIO chip instance enumeration.

   - Add device core function for counting string arrays in device
     properties.

   - Provide a generic library function kfree_strarray() that can be
     used throughout the kernel.

  Driver enhancements:

   - The DesignWare dwapb-gpio driver has been enhanced and now uses the
     IRQ handling in the gpiolib core.

   - The mockup and aggregator drivers have seen some substantial code
     clean-up and now use more of the core kernel inftrastructure.

   - Misc cleanups using dev_err_probe().

   - The MXC drivers (Freescale/NXP) can now be built modularized, which
     makes modularized GKI Android kernels happy"

* tag 'gpio-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (73 commits)
  gpiolib: Update header block in gpiolib-cdev.h
  gpiolib: cdev: switch from kstrdup() to kstrndup()
  docs: gpio: add a new document to its index.rst
  gpio: pca953x: Add support for the NXP PCAL9554B/C
  tools: gpio: add debounce support to gpio-event-mon
  tools: gpio: add multi-line monitoring to gpio-event-mon
  tools: gpio: port gpio-event-mon to v2 uAPI
  tools: gpio: port gpio-hammer to v2 uAPI
  tools: gpio: rename nlines to num_lines
  tools: gpio: port gpio-watch to v2 uAPI
  tools: gpio: port lsgpio to v2 uAPI
  gpio: uapi: document uAPI v1 as deprecated
  gpiolib: cdev: support setting debounce
  gpiolib: cdev: support GPIO_V2_LINE_SET_VALUES_IOCTL
  gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL
  gpiolib: cdev: support edge detection for uAPI v2
  gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL
  gpiolib: cdev: support GPIO_V2_GET_LINE_IOCTL and GPIO_V2_LINE_GET_VALUES_IOCTL
  gpiolib: add build option for CDEV v1 ABI
  gpiolib: make cdev a build option
  ...
…nel/git/groeck/linux-staging

Pull hwmon updates from Guenter Roeck:
 "New driver and chip support:
   - Moortec MR75203 PVT controller
   - MPS Multi-phase mp2975 controller
   - ADM1266
   - Zen3 CPUs
   - Intel MAX 10 BMC

  Enhancements:
   - Support for rated attributes in hwmon core
   - MAX20730:
      - Device monitoring via debugfs
      - VOUT readin adjustment vie devicetree bindings
   - LM75:
      - Devicetree support
      - Regulator support
   - Improved accumulationm logic in amd_energy driver
   - Added fan sensor to gsc-hwmon driver
   - Support for simplified I2C probing

  Various other minor fixes and improvements"

* tag 'hwmon-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (64 commits)
  hwmon: (pmbus/max20730) adjust the vout reading given voltage divider
  dt-bindings: hwmon: max20730: adding device tree doc for max20730
  hwmon: Add hardware monitoring driver for Moortec MR75203 PVT controller
  hwmon: Add DT bindings schema for PVT controller
  dt-bindings: hwmon: Add the +vs supply to the lm75 bindings
  dt-bindings: hwmon: Convert lm75 bindings to yaml
  docs: hwmon: (ltc2945) update datasheet link
  hwmon: (mlxreg-fan) Fix double "Mellanox"
  hwmon: (pmbus/max20730) add device monitoring via debugfs
  hwmon: (pmbus/max34440) Fix OC fault limits
  hwmon: (bt1-pvt) Wait for the completion with timeout
  hwmon: (bt1-pvt) Cache current update timeout
  hwmon: (bt1-pvt) Test sensor power supply on probe
  hwmon: (lm75) Add regulator support
  hwmon: Add hwmon driver for Intel MAX 10 BMC
  dt-bindings: Add MP2975 voltage regulator device
  hwmon: (pmbus) Add support for MPS Multi-phase mp2975 controller
  hwmon: (tmp513) fix spelling typo in comments
  hwmon: (amd_energy) Update driver documentation
  hwmon: (amd_energy) Improve the accumulation logic
  ...
The conversion of #DE to the idtentry mechanism introduced a change in the
Ooops message which confuses tools which parse crash information in dmesg.

Remove the underscore from 'divide_error' to restore previous behaviour.

Fixes: 9d06c40 ("x86/entry: Convert Divide Error to IDTENTRY")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/CACT4Y+bTZFkuZd7+bPArowOv-7Die+WZpfOWnEO_Wgs3U59+oA@mail.gmail.com
Remove an unused variable.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201013154731.132565-1-mike.travis@hpe.com
…linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Fix the #DE oops message string format which confused tools parsing
   crash information (Thomas Gleixner)

 - Remove an unused variable in the UV5 code which was triggering a
   build warning with clang (Mike Travis)

* tag 'x86_urgent_for_v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Remove unused variable in UV5 NMI handler
  x86/traps: Fix #DE Oops message regression
Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, seperating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
Pull io_uring updates from Jens Axboe:

 - Add blkcg accounting for io-wq offload (Dennis)

 - A use-after-free fix for io-wq (Hillf)

 - Cancelation fixes and improvements

 - Use proper files_struct references for offload

 - Cleanup of io_uring_get_socket() since that can now go into our own
   header

 - SQPOLL fixes and cleanups, and support for sharing the thread

 - Improvement to how page accounting is done for registered buffers and
   huge pages, accounting the real pinned state

 - Series cleaning up the xarray code (Willy)

 - Various cleanups, refactoring, and improvements (Pavel)

 - Use raw spinlock for io-wq (Sebastian)

 - Add support for ring restrictions (Stefano)

* tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (62 commits)
  io_uring: keep a pointer ref_node in file_data
  io_uring: refactor *files_register()'s error paths
  io_uring: clean file_data access in files_register
  io_uring: don't delay io_init_req() error check
  io_uring: clean leftovers after splitting issue
  io_uring: remove timeout.list after hrtimer cancel
  io_uring: use a separate struct for timeout_remove
  io_uring: improve submit_state.ios_left accounting
  io_uring: simplify io_file_get()
  io_uring: kill extra check in fixed io_file_get()
  io_uring: clean up ->files grabbing
  io_uring: don't io_prep_async_work() linked reqs
  io_uring: Convert advanced XArray uses to the normal API
  io_uring: Fix XArray usage in io_uring_add_task_file
  io_uring: Fix use of XArray in __io_uring_files_cancel
  io_uring: fix break condition for __io_uring_register() waiting
  io_uring: no need to call xa_destroy() on empty xarray
  io_uring: batch account ->req_issue and task struct references
  io_uring: kill callback_head argument for io_req_task_work_add()
  io_uring: move req preps out of io_issue_sqe()
  ...
@pull pull bot added the ⤵️ pull label Oct 13, 2020
@pull pull bot merged commit 6ad4bf6 into vchong:master Oct 13, 2020
pull bot pushed a commit that referenced this pull request Apr 30, 2022
The IPA BCM resource ("IP0") on sc7180 was moved to the clk-rpmh driver
in commit bcd63d2 ("clk: qcom: rpmh: Add IPA clock for SC7180") and
modeled as a clk, but this interconnect driver still had it modeled as
an interconnect. This was mostly OK because nobody used the interconnect
definition, until the interconnect framework started dropping bandwidth
requests on interconnects that aren't used via the sync_state callback
in commit 7d3b0b0 ("interconnect: qcom: Use icc_sync_state"). Once
that patch was applied the IP0 resource was going to be controlled from
two places, the clk framework and the interconnect framework.

Even then, things were probably going to be OK, because commit
b95b668 ("interconnect: qcom: icc-rpmh: Add BCMs to commit list in
pre_aggregate") was needed to actually drop bandwidth requests on unused
interconnects, of which the IPA was one of the interconnect that wasn't
getting dropped to zero. Combining the three commits together leads to
bad behavior where the interconnect framework is disabling the IP0
resource because it has no users while the clk framework thinks the IP0
resource is on because the only user, the IPA driver, has turned it on
via clk_prepare_enable(). Depending on when sync_state is called, we can
get into a situation like below:

  IPA driver probes
  IPA driver gets notified modem started
   runtime PM get()
    IPA clk enabled -> IP0 resource is ON
  sync_state runs
   interconnect zeroes out the IP0 resource -> IP0 resource is off
  IPA driver tries to access a register and blows up

The crash is an unclocked access that manifest as an SError.

 SError Interrupt on CPU0, code 0xbe000011 -- SError
 CPU: 0 PID: 3595 Comm: mmdata_mgr Not tainted 5.17.1+ #166
 Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : mutex_lock+0x4c/0x80
 lr : mutex_lock+0x30/0x80
 sp : ffffffc00da9b9c0
 x29: ffffffc00da9b9c0 x28: 0000000000000000 x27: 0000000000000000
 x26: ffffffc00da9bc90 x25: ffffff80c2024010 x24: ffffff80c2024000
 x23: ffffff8083100000 x22: ffffff80831000d0 x21: ffffff80831000a8
 x20: ffffff80831000a8 x19: ffffff8083100070 x18: 00000000ffff0a00
 x17: 000000002f7254f1 x16: 0000000000000100 x15: 0000000000000000
 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
 x11: 000000000001f0b8 x10: ffffffc00931f0b8 x9 : 0000000000000000
 x8 : 0000000000000000 x7 : fefefefefeff2f60 x6 : 0000808080808080
 x5 : 0000000000000000 x4 : 8080808080800000 x3 : ffffff80d2d4ee28
 x2 : ffffff808c1d6e40 x1 : 0000000000000000 x0 : ffffff8083100070
 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 0 PID: 3595 Comm: mmdata_mgr Not tainted 5.17.1+ #166
 Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
 Call trace:
  dump_backtrace+0xf4/0x114
  show_stack+0x24/0x30
  dump_stack_lvl+0x64/0x7c
  dump_stack+0x18/0x38
  panic+0x150/0x38c
  nmi_panic+0x88/0xa0
  arm64_serror_panic+0x74/0x80
  do_serror+0x0/0x80
  do_serror+0x58/0x80
  el1h_64_error_handler+0x34/0x4c
  el1h_64_error+0x78/0x7c
  mutex_lock+0x4c/0x80
  __gsi_channel_start+0x50/0x17c
  gsi_channel_start+0x54/0x90
  ipa_endpoint_enable_one+0x34/0xc0
  ipa_open+0x4c/0x120

Remove all IP0 resource management from the interconnect driver so that
clk-rpmh is the sole owner. This fixes the issue by preventing the
interconnect driver from overwriting the IP0 resource data that the
clk-rpmh driver wrote.

Cc: Alex Elder <elder@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Taniya Das <quic_tdas@quicinc.com>
Cc: Mike Tipton <quic_mdtipton@quicinc.com>
Fixes: b95b668 ("interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate")
Fixes: bcd63d2 ("clk: qcom: rpmh: Add IPA clock for SC7180")
Fixes: 7d3b0b0 ("interconnect: qcom: Use icc_sync_state")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Alex Elder <elder@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220412220033.1273607-2-swboyd@chromium.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
pull bot pushed a commit that referenced this pull request May 29, 2022
gpio_keys module can either accept gpios or interrupts. The module
initializes delayed work in case of gpios only and is only used if
debounce timer is not used, so make sure cancel_delayed_work_sync()
is called only when its gpio-backed and debounce_use_hrtimer is false.

This fixes the issue seen below when the gpio_keys module is unloaded and
an interrupt pin is used instead of GPIO:

[  360.297569] ------------[ cut here ]------------
[  360.302303] WARNING: CPU: 0 PID: 237 at kernel/workqueue.c:3066 __flush_work+0x414/0x470
[  360.310531] Modules linked in: gpio_keys(-)
[  360.314797] CPU: 0 PID: 237 Comm: rmmod Not tainted 5.18.0-rc5-arm64-renesas-00116-g73636105874d-dirty #166
[  360.324662] Hardware name: Renesas SMARC EVK based on r9a07g054l2 (DT)
[  360.331270] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  360.338318] pc : __flush_work+0x414/0x470
[  360.342385] lr : __cancel_work_timer+0x140/0x1b0
[  360.347065] sp : ffff80000a7fba00
[  360.350423] x29: ffff80000a7fba00 x28: ffff000012b9c5c0 x27: 0000000000000000
[  360.357664] x26: ffff80000a7fbb80 x25: ffff80000954d0a8 x24: 0000000000000001
[  360.364904] x23: ffff800009757000 x22: 0000000000000000 x21: ffff80000919b000
[  360.372143] x20: ffff00000f5974e0 x19: ffff00000f5974e0 x18: ffff8000097fcf48
[  360.379382] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000053f40
[  360.386622] x14: ffff800009850e88 x13: 0000000000000002 x12: 000000000000a60c
[  360.393861] x11: 000000000000a610 x10: 0000000000000000 x9 : 0000000000000008
[  360.401100] x8 : 0101010101010101 x7 : 00000000a473c394 x6 : 0080808080808080
[  360.408339] x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffff80000919b458
[  360.415578] x2 : ffff8000097577f0 x1 : 0000000000000001 x0 : 0000000000000000
[  360.422818] Call trace:
[  360.425299]  __flush_work+0x414/0x470
[  360.429012]  __cancel_work_timer+0x140/0x1b0
[  360.433340]  cancel_delayed_work_sync+0x10/0x18
[  360.437931]  gpio_keys_quiesce_key+0x28/0x58 [gpio_keys]
[  360.443327]  devm_action_release+0x10/0x18
[  360.447481]  release_nodes+0x8c/0x1a0
[  360.451194]  devres_release_all+0x90/0x100
[  360.455346]  device_unbind_cleanup+0x14/0x60
[  360.459677]  device_release_driver_internal+0xe8/0x168
[  360.464883]  driver_detach+0x4c/0x90
[  360.468509]  bus_remove_driver+0x54/0xb0
[  360.472485]  driver_unregister+0x2c/0x58
[  360.476462]  platform_driver_unregister+0x10/0x18
[  360.481230]  gpio_keys_exit+0x14/0x828 [gpio_keys]
[  360.486088]  __arm64_sys_delete_module+0x1e0/0x270
[  360.490945]  invoke_syscall+0x40/0xf8
[  360.494661]  el0_svc_common.constprop.3+0xf0/0x110
[  360.499515]  do_el0_svc+0x20/0x78
[  360.502877]  el0_svc+0x48/0xf8
[  360.505977]  el0t_64_sync_handler+0x88/0xb0
[  360.510216]  el0t_64_sync+0x148/0x14c
[  360.513930] irq event stamp: 4306
[  360.517288] hardirqs last  enabled at (4305): [<ffff8000080b0300>] __cancel_work_timer+0x130/0x1b0
[  360.526359] hardirqs last disabled at (4306): [<ffff800008d194fc>] el1_dbg+0x24/0x88
[  360.534204] softirqs last  enabled at (4278): [<ffff8000080104a0>] _stext+0x4a0/0x5e0
[  360.542133] softirqs last disabled at (4267): [<ffff8000080932ac>] irq_exit_rcu+0x18c/0x1b0
[  360.550591] ---[ end trace 0000000000000000 ]---

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20220524135822.14764-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
pull bot pushed a commit that referenced this pull request Jul 16, 2022
When running with return thunks enabled under 32-bit EFI, the system
crashes with:

  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
  BUG: unable to handle page fault for address: 000000005bc02900
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0011) - permissions violation
  PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
  Oops: 0011 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:0x5bc02900
  Code: Unable to access opcode bytes at RIP 0x5bc028d6.
  RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
  RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
  RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
  R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
  R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0018 ES: 0018 CR0: 0000000080050033
  CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
  Call Trace:
   ? efi_set_virtual_address_map+0x9c/0x175
   efi_enter_virtual_mode+0x4a6/0x53e
   start_kernel+0x67c/0x71e
   x86_64_start_reservations+0x24/0x2a
   x86_64_start_kernel+0xe9/0xf4
   secondary_startup_64_no_verify+0xe5/0xeb

That's because it cannot jump to the return thunk from the 32-bit code.

Using a naked RET and marking it as safe allows the system to proceed
booting.

Fixes: aa3d480 ("x86: Use return-thunk in asm code")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: <stable@vger.kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pull bot pushed a commit that referenced this pull request Feb 5, 2023
If a relocatable kernel is loaded at an address that is not 2MB aligned
and told not to relocate to zero, the kernel can crash due to
mark_rodata_ro() incorrectly changing some read-write data to read-only.

Scenarios where the misalignment can occur are when the kernel is
loaded by kdump or using the RELOCATABLE_TEST config option.

Example crash with the kernel loaded at 5MB:

  Run /sbin/init as init process
  BUG: Unable to handle kernel data access on write at 0xc000000000452000
  Faulting instruction address: 0xc0000000005b6730
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
  NIP:  c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
  REGS: c000000004503250 TRAP: 0300   Not tainted  (6.2.0-rc1-00011-g349188be4841)
  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44288480  XER: 00000000
  CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
  ...
  NIP memset+0x68/0x104
  LR  zero_user_segments.constprop.0+0xa8/0xf0
  Call Trace:
    ext4_mpage_readpages+0x7f8/0x830
    ext4_readahead+0x48/0x60
    read_pages+0xb8/0x380
    page_cache_ra_unbounded+0x19c/0x250
    filemap_fault+0x58c/0xae0
    __do_fault+0x60/0x100
    __handle_mm_fault+0x1230/0x1a40
    handle_mm_fault+0x120/0x300
    ___do_page_fault+0x20c/0xa80
    do_page_fault+0x30/0xc0
    data_access_common_virt+0x210/0x220

This happens because mark_rodata_ro() tries to change permissions on the
range _stext..__end_rodata, but _stext sits in the middle of the 2MB
page from 4MB to 6MB:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)

The logic that changes the permissions assumes the linear mapping was
split correctly at boot, so it marks the entire 2MB page read-only. That
leads to the write fault above.

To fix it, the boot time mapping logic needs to consider that if the
kernel is running at a non-zero address then _stext is a boundary where
it must split the mapping.

That leads to the mapping being split correctly, allowing the rodata
permission change to take happen correctly, with no spillover:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
  radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
  radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)

If the kernel is loaded at a 2MB aligned address, the mapping continues
to use 2MB pages as before:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages

Fixes: c55d7b5 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
pull bot pushed a commit that referenced this pull request Jan 11, 2024
…el.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA

xfs: log intent item recovery should reconstruct defer work state

Long Li reported a KASAN report from a UAF when intent recovery fails:

 ==================================================================
 BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0
 Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103
 CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166
 Workqueue: xfs-cil/sda xlog_cil_push_work
 Call Trace:
  <TASK>
  dump_stack_lvl+0x50/0x70
  print_report+0xc2/0x600
  kasan_report+0xb6/0xe0
  xfs_cui_release+0xb7/0xc0
  xfs_cud_item_release+0x3c/0x90
  xfs_trans_committed_bulk+0x2d5/0x7f0
  xlog_cil_committed+0xaba/0xf20
  xlog_cil_push_work+0x1a60/0x2360
  process_one_work+0x78e/0x1140
  worker_thread+0x58b/0xf60
  kthread+0x2cd/0x3c0
  ret_from_fork+0x1f/0x30
  </TASK>

 Allocated by task 531:
  kasan_save_stack+0x22/0x40
  kasan_set_track+0x25/0x30
  __kasan_slab_alloc+0x55/0x60
  kmem_cache_alloc+0x195/0x5f0
  xfs_cui_init+0x198/0x1d0
  xlog_recover_cui_commit_pass2+0x133/0x5f0
  xlog_recover_items_pass2+0x107/0x230
  xlog_recover_commit_trans+0x3e7/0x9c0
  xlog_recovery_process_trans+0x140/0x1d0
  xlog_recover_process_ophdr+0x1a0/0x3d0
  xlog_recover_process_data+0x108/0x2d0
  xlog_recover_process+0x1f6/0x280
  xlog_do_recovery_pass+0x609/0xdb0
  xlog_do_log_recovery+0x84/0xe0
  xlog_do_recover+0x7d/0x470
  xlog_recover+0x25f/0x490
  xfs_log_mount+0x2dd/0x6f0
  xfs_mountfs+0x11ce/0x1e70
  xfs_fs_fill_super+0x10ec/0x1b20
  get_tree_bdev+0x3c8/0x730
  vfs_get_tree+0x89/0x2c0
  path_mount+0xecf/0x1800
  do_mount+0xf3/0x110
  __x64_sys_mount+0x154/0x1f0
  do_syscall_64+0x39/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

 Freed by task 531:
  kasan_save_stack+0x22/0x40
  kasan_set_track+0x25/0x30
  kasan_save_free_info+0x2b/0x40
  __kasan_slab_free+0x114/0x1b0
  kmem_cache_free+0xf8/0x510
  xfs_cui_item_free+0x95/0xb0
  xfs_cui_release+0x86/0xc0
  xlog_recover_cancel_intents.isra.0+0xf8/0x210
  xlog_recover_finish+0x7e7/0x980
  xfs_log_mount_finish+0x2bb/0x4a0
  xfs_mountfs+0x14bf/0x1e70
  xfs_fs_fill_super+0x10ec/0x1b20
  get_tree_bdev+0x3c8/0x730
  vfs_get_tree+0x89/0x2c0
  path_mount+0xecf/0x1800
  do_mount+0xf3/0x110
  __x64_sys_mount+0x154/0x1f0
  do_syscall_64+0x39/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

 The buggy address belongs to the object at ffff888012575dc8
  which belongs to the cache xfs_cui_item of size 432
 The buggy address is located 152 bytes inside of
  freed 432-byte region [ffff888012575dc8, ffff888012575f78)

 The buggy address belongs to the physical page:
 page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574
 head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
 page_type: 0xffffffff()
 raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150
 raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
  ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
 >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
 ==================================================================

"If process intents fails, intent items left in AIL will be delete
from AIL and freed in error handling, even intent items that have been
recovered and created done items. After this, uaf will be triggered when
done item committed, because at this point the released intent item will
be accessed.

xlog_recover_finish                     xlog_cil_push_work
----------------------------            ---------------------------
xlog_recover_process_intents
  xfs_cui_item_recover//cui_refcount == 1
    xfs_trans_get_cud
    xfs_trans_commit
      <add cud item to cil>
  xfs_cui_item_recover
    <error occurred and return>
xlog_recover_cancel_intents
  xfs_cui_release     //cui_refcount == 0
    xfs_cui_item_free //free cui
  <release other intent items>
xlog_force_shutdown   //shutdown
                               <...>
                                        <push items in cil>
                                        xlog_cil_committed
                                          xfs_cud_item_release
                                            xfs_cui_release // UAF

"Intent log items are created with a reference count of 2, one for the
creator, and one for the intent done object. Log recovery explicitly
drops the creator reference after it is inserted into the AIL, but it
then processes the log item as if it also owns the intent-done reference.

"The code in ->iop_recovery should assume that it passes the reference
to the done intent, we can remove the intent item from the AIL after
creating the done-intent, but if that code fails before creating the
done-intent then it needs to release the intent reference by log recovery
itself.

"That way when we go to cancel the intent, the only intents we find in
the AIL are the ones we know have not been processed yet and hence we
can safely drop both the creator and the intent done reference from
xlog_recover_cancel_intents().

"Hence if we remove the intent from the list of intents that need to
be recovered after we have done the initial recovery, we acheive two
things:

"1. the tail of the log can be moved forward with the commit of the
done intent or new intent to continue the operation, and

"2. We avoid the problem of trying to determine how many reference
counts we need to drop from intent recovery cancelling because we
never come across intents we've actually attempted recovery on."

Restated: The cause of the UAF is that xlog_recover_cancel_intents
thinks that it owns the refcount on any intent item in the AIL, and that
it's always safe to release these intent items.  This is not true after
the recovery function creates an log intent done item and points it at
the log intent item because releasing the done item always releases the
intent item.

The runtime defer ops code avoids all this by tracking both the log
intent and the intent done items, and releasing only the intent done
item if both have been created.  Long Li proposed fixing this by adding
state flags, but I have a more comprehensive fix.

First, observe that the latter half of the intent _recover functions are
nearly open-coded versions of the corresponding _finish_one function
that uses an onstack deferred work item to single-step through the item.

Second, notice that the recover function is not an exact match because
of the odd behavior that unfinished recovered work items are relogged
with separate log intent items instead of a single new log intent item,
which is what the defer ops machinery does.

Dave and I have long suspected that recovery should be reconstructing
the defer work state from what's in the recovered intent item.  Now we
finally have an excuse to refactor the code to do that.

This series starts by fixing a resource leak in LARP recovery.  We fix
the bug that Long Li reported by switching the intent recovery code to
construct chains of xfs_defer_pending objects and then using the defer
pending objects to track the intent/done item ownership.  Finally, we
clean up the code to reconstruct the exact incore state, which means we
can remove all the opencoded _recover code, which makes maintaining log
items much easier.

v2: minor changes per review comments
v3: pick up more rvb tags, fix build errors

This has been lightly tested with fstests.  Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'reconstruct-defer-work-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: move ->iop_recover to xfs_defer_op_type
  xfs: use xfs_defer_finish_one to finish recovered work items
  xfs: dump the recovered xattri log item if corruption happens
  xfs: recreate work items when recovering intent items
  xfs: transfer recovered intent item ownership in ->iop_recover
  xfs: pass the xfs_defer_pending object to iop_recover
  xfs: use xfs_defer_pending objects to recover intent items
  xfs: don't leak recovered attri intent items
pull bot pushed a commit that referenced this pull request May 17, 2024
Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: #344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: #345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: #346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.