
resource tracker destroys shared memory segments when other processes should still have valid access #82300

Open
1 of 3 tasks
applio opened this issue Sep 11, 2019 · 36 comments
Assignees
Labels
docs Documentation in the Doc dir stdlib Python modules in the Lib dir topic-multiprocessing type-bug An unexpected behavior, bug, or error

Comments

@applio
Member

applio commented Sep 11, 2019

BPO 38119
Nosy @pitrou, @applio, @pablogsal, @maggyero, @vinay0410, @JSmith-BitFlipper, @keven425, @TKaluza
PRs
  • bpo-38119: Fix shmem resource tracking #15989
  • gh-82300: Disable resource tracking & auto-destruction of SharedMemory blocks on POSIX #21516
  • bpo-38119: fix shared memory's resource tracking #23174

Files
  • mprt_monkeypatch.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/applio'
    closed_at = None
    created_at = <Date 2019-09-11.15:58:26.817>
    labels = ['3.8', 'type-bug', 'library', '3.9', '3.10']
    title = 'resource tracker destroys shared memory segments when other processes should still have valid access'
    updated_at = <Date 2022-02-10.20:53:56.777>
    user = 'https://github.com/applio'

    bugs.python.org fields:

    activity = <Date 2022-02-10.20:53:56.777>
    actor = 'maggyero'
    assignee = 'davin'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-09-11.15:58:26.817>
    creator = 'davin'
    dependencies = []
    files = ['49859']
    hgrepos = []
    issue_num = 38119
    keywords = ['patch']
    message_count = 14.0
    messages = ['351960', '352050', '374434', '374939', '374944', '374951', '374954', '374956', '374958', '374974', '387606', '388287', '390198', '412440']
    nosy_count = 12.0
    nosy_names = ['pitrou', 'steve.newcomb', 'turicas', 'davin', 'pablogsal', 'maggyero', 'vinay0410', 'damian.barabonkov', 'keven425', 'davfelsen', 'jdogzz-g5', 'timka']
    pr_nums = ['15989', '21516', '23174']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue38119'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    TODO list:

    Linked PRs

    @applio
    Member Author

    applio commented Sep 11, 2019

    The resource tracker currently destroys (via _posixshmem.shm_unlink) shared memory segments on posix systems when any independently created Python process with a handle on a shared memory segment exits (gracefully or otherwise). This breaks the expected cross-platform behavior that a shared memory segment persists at least as long as any running process has a handle on that segment.

    As described with an example scenario in bpo-37754:
    Let's say three processes, P1, P2 and P3, are trying to communicate using shared memory.
    --> P1 creates the shared memory block, and waits for P2 and P3 to access it.
    --> P2 starts and attaches this shared memory segment, writes some data to it and exits.
    --> Now in case of Unix, shm_unlink is called as soon as P2 exits. (This is by action of the resource tracker.)
    --> Now, P3 starts and tries to attach the shared memory segment.
    --> P3 will not be able to attach the shared memory segment in Unix, because shm_unlink has been called on that segment.
    --> Whereas, P3 will be able to attach to the shared memory segment in Windows.

    Another key scenario we expect to work but does not currently:

    1. A multiprocessing.managers.SharedMemoryManager is instantiated and started in process A.
    2. A shared memory segment is created using that manager in process A.
    3. A serialized representation of that shared memory segment is deserialized in process B.
    4. Process B does work with the shared memory segment that is also still visible to process A.
    5. Process B exits cleanly.
    6. Process A reads data from the shared memory segment after process B is gone. (This currently fails.)

    The SharedMemoryManager provides a flexible means for ensuring cleanup of shared memory segments. The current resource tracker attempts to treat shared memory segments as equivalent to semaphore references, which is too narrow of an interpretation. As such, the current resource tracker should not be attempting to enforce cleanup of shared memory segments because it breaks expected behavior and significantly limits functionality.
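The expected lifetime semantics can be sketched in a few lines. This is a single-process illustration of the attach-by-name API (the bug itself only manifests when the handles live in separate processes, where the resource tracker unlinks the segment on process exit):

```python
from multiprocessing import shared_memory

# Create a named segment and write to it (the role of P1/P2 above).
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:4] = b"data"

# A second handle attaches by name (the role of P3). On Windows this
# keeps working after other handles close; on POSIX the resource
# tracker can shm_unlink the segment first, which is the bug at hand.
attached = shared_memory.SharedMemory(name=shm.name)
payload = bytes(attached.buf[:4])

attached.close()
shm.close()
shm.unlink()  # explicit cleanup by the last user
```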

    @applio applio added 3.8 (EOL) end of life 3.9 only security fixes labels Sep 11, 2019
    @applio applio self-assigned this Sep 11, 2019
    @applio applio added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 11, 2019
    @vinay0410
    Mannequin

    vinay0410 mannequin commented Sep 12, 2019

    Hi Davin,
    This PR would fix the issues mentioned by you, by not prematurely unlinking the shared memory segment. And, therefore it would make shared memory useful in a lot of use cases.

    But, this would still not make Unix's implementation consistent with Windows.
    Windows uses a reference counting mechanism to count the number of processes using a shared memory segment. When all of them are done using it, Windows simply unlinks and frees the memory allocated to the shared memory segment.

    I know that you already know this. I am commenting to find out what the next steps would be to fix the above inconsistency. You could see my last comment (msg351445) in bpo-37754, where I have listed some ways to implement the above reference counting mechanism.

    If you could have a look and see which one would be the best way, I would be happy to make a PR for it.

    @gvanrossum
    Member

    @davin, could you merge one or the other of the PRs that fix this? Presumably also backport to 3.9 and 3.8 (but that's up to you and the release manager).

    @gvanrossum gvanrossum added the 3.10 only security fixes label Jul 27, 2020
    @JSmith-BitFlipper
    Mannequin

    JSmith-BitFlipper mannequin commented Aug 6, 2020

    As per Guido's comment (#21516 (comment)), I'm going to use this space to discuss ways to go forward with resource tracking and SharedMemory.

    Taking inspiration from Vinay (https://bugs.python.org/issue37754#msg351445), I think the simplest and best way forward is to use a small section of the shared memory at the start as a reference counter.

    Every time a process latches onto a shared memory block, it does an atomic increment of the reference counter, and if it detaches, it does an atomic decrement. These atomic operations are available in C via hardware-specific instructions. This would require modifying the C code in posixshmem.c. It should not be a difficult change.

    This would then change the SharedMemory API such that a call to close() could check the reference count at the end and automatically unlink if it reaches 0. Basically, the purpose of the explicit unlink() call is dissolved.

    I think this would also play nicely with the current implementation of the resource_tracker. A small change would be needed so that it calls close() instead of unlink() as the cleanup function. Nonetheless, it would keep track of whether all attachments of shared memory call close() at the end, which they should, and issue a warning if they do not. It would do this with the current code; no need to change anything.
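A pure-Python sketch of the header-counter idea (illustrative only: the read-modify-write below is not atomic, and the 4-byte header layout is an assumption; the actual proposal would use hardware atomic instructions in posixshmem.c):

```python
import struct
from multiprocessing import shared_memory

HEADER_FMT = "i"  # hypothetical 4-byte reference counter at offset 0
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def incref(buf):
    # NOT atomic: a real implementation needs an atomic fetch-and-add in C.
    (n,) = struct.unpack_from(HEADER_FMT, buf, 0)
    struct.pack_into(HEADER_FMT, buf, 0, n + 1)

def decref(buf):
    (n,) = struct.unpack_from(HEADER_FMT, buf, 0)
    struct.pack_into(HEADER_FMT, buf, 0, n - 1)
    return n - 1  # 0 would mean: last user detached, unlink now

# Fresh segments are zero-filled, so the counter starts at 0.
shm = shared_memory.SharedMemory(create=True, size=HEADER_SIZE + 12)
incref(shm.buf)               # creator attaches
incref(shm.buf)               # a second process attaches
remaining = decref(shm.buf)   # second process detaches
shm.unlink()
shm.close()
```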

    @gvanrossum
    Member

    I recommend bringing this new proposal up on python-dev or python-ideas, to get more eyeballs on the ideas before attempting implementation. One immediate worry I have is that if the reference counter is maintained in the shared memory segment, every process has to participate, and if a process crashes (segfaults) the shared refcount will be forever wrong and the segment will leak. Distributed GC is hard!

    @vinay0410
    Mannequin

    vinay0410 mannequin commented Aug 6, 2020

    That's a valid point, Guido. But I guess this can easily be handled by the resource tracker. At the moment, the resource tracker unlinks shared memory if the process which created it dies without unlinking it.

    Therefore, resource tracker handles cleaning up resources which the respective processes couldn't or didn't do.

    So, instead of unlinking the shared memory segment, the resource tracker can decrement the reference count if the process failed to do so, and if the reference count becomes 0, unlink the shared memory segment.

    This approach will ensure that even if the respective processes died unexpectedly, there are no leaks.

    @JSmith-BitFlipper
    Mannequin

    JSmith-BitFlipper mannequin commented Aug 6, 2020

    Unless the resource_tracker also dies along with the process, in which case I'm not sure what there is to do.

    I believe the resource_tracker actually spawns a process alongside the process that uses it. So if the parent process seg-faults, the resource_tracker should still be alive.

    @vinay0410
    Mannequin

    vinay0410 mannequin commented Aug 6, 2020

    Well, the chances of the resource tracker dying abruptly are very low because it's thoroughly tested, and there are mechanisms to re-spawn the resource_tracker process if you look at the code; there is a function called ensure_running.

    The resource tracker stays alive even if the process for which it was created dies. It also handles cleaning up shared semaphores. So I guess this is something we can rely on for cleaning things up, because at the end of the day that's what it was made for.

    @JSmith-BitFlipper
    Mannequin

    JSmith-BitFlipper mannequin commented Aug 6, 2020

    Agreed.

    @vinay0410
    Mannequin

    vinay0410 mannequin commented Aug 7, 2020

    As suggested by Guido, I have floated this solution on the python-dev mailing list.
    Link to archive: https://mail.python.org/archives/list/python-dev@python.org/thread/O67CR7QWEOJ7WDAJEBKSY74NQ2C4W3AI/

    @keven425
    Mannequin

    keven425 mannequin commented Feb 24, 2021

    Agree w/ PR here to remove resource tracker unlinking as a quick fix: #15989

    This will at least make the unlink behavior more controllable, which is not the case currently (on macOS and Linux).

    Would love to have this merged.

    @turicas
    Mannequin

    turicas mannequin commented Mar 8, 2021

    Based on the changes in #15989, I've monkey-patched multiprocessing.resource_tracker so my current applications (running on Python 3.9.2) won't be affected. The code may be useful to others while the PR is not merged and we don't have a new release; you just need to call the remove_shm_from_resource_tracker function inside each Process target function.

    ----- >8 -----

    from multiprocessing import resource_tracker
    
    
    def remove_shm_from_resource_tracker():
        """Monkey-patch multiprocessing.resource_tracker so SharedMemory won't be tracked.

        More details at: https://bugs.python.org/issue38119
        """

        def fix_register(name, rtype):
            # Skip tracking for shared memory; delegate everything else.
            if rtype == "shared_memory":
                return
            return resource_tracker._resource_tracker.register(name, rtype)
        resource_tracker.register = fix_register

        def fix_unregister(name, rtype):
            if rtype == "shared_memory":
                return
            return resource_tracker._resource_tracker.unregister(name, rtype)
        resource_tracker.unregister = fix_unregister

        if "shared_memory" in resource_tracker._CLEANUP_FUNCS:
            del resource_tracker._CLEANUP_FUNCS["shared_memory"]
    ----- 8< -----

    There's a working example in the attached file mprt_monkeypatch.py (if you comment out the function calls on lines 28 and 37, the warnings will be shown).

    @stevenewcomb
    Mannequin

    stevenewcomb mannequin commented Apr 4, 2021

    Sometimes a leak is exactly what's wanted, i.e. a standing block of shared memory that allows sharing processes to come and go ad libitum. I mention this because I haven't seen anyone mention it explicitly.

    While turicas's monkeypatch covers the use case in which a manually-named block of shared memory is intended to remain indefinitely, it would be best if future versions of shared_memory allowed for such a use case, too. It shouldn't be necessary to monkeypatch in order to have a standing block of shared memory remain ready and waiting even when nobody's using it.

    @TKaluza
    Mannequin

    TKaluza mannequin commented Feb 3, 2022

    In the long run: maybe this could solve some win32api problems?

    https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @root-11

    root-11 commented Apr 11, 2022

    Is the use case included where the shared memory object lives longer than the process that created it, and where gc respects that the shm shall not be gc'ed?

    Outline:

    1. main starts a subprocess with a queue for inter-proc message exchange.
    2. main sleeps.
    3. subprocess creates a shared memory block
    4. subprocess posts the shm name (str) on the inter-proc queue
    5. subprocess finishes and exits.
    6. main is still busy.
    7. main reads the name from the queue
    8. main accesses the shared_memory block
    9. main closes the shared_memory.

    If not, then I can add the test case here?

    @root-11

    root-11 commented Apr 13, 2022

    I've added a post here: https://discuss.python.org/t/multiprocessing-with-shared-memory-potential-fix-for-bug-82300/15012
    that works on Windows & Linux.

    I hope it's of some use.
    

    devmessias added a commit to devmessias/fury that referenced this issue Apr 13, 2022
    devmessias added a commit to devmessias/fury that referenced this issue Apr 13, 2022
    @buzmeg

    buzmeg commented Oct 19, 2022

    Context: This is all on Linux (Ubuntu 22.04.1 LTS) so I'm going to avoid talking about Windows

    Dumb question: Isn't SharedMemory supposed to work if a completely different language does the shm_open()?

    I do an shm_open() in a zig process and then try to attach and close a SharedMemory() from Python. The resource tracker then complains with:

    λ /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
    

    and unlinks the shared memory.

    That's a touch ... surprising.

    If multiprocessing isn't supposed to handle this case, could somebody point me at a more cooperative Python module?

    Thanks.

    @ambv
    Contributor

    ambv commented Dec 3, 2022

    Both removing the resource tracking as well as introducing the reference counting are backwards incompatible. In the former case, the danger of breaking users' software is much lesser though because:

    1. if the parent process Python code was using SharedMemoryManager anyway, the blocks will get cleaned up anyway;
    2. if the parent process Python code was exposing the shared memory block to other programming languages, it will continue working as expected;
    3. if the parent process Python code was using an explicit name for the shared memory block, restarting the process will fail with a clear error that the name is used;
    4. if the parent process Python code creates a new implicitly named shared memory block every time, you have a memory leak in the system.

    Only cases 3. and 4. are disruptive to the user, both in trivially fixable ways. If we adopted the solution from GH-23174, the situation would be more tricky because:

    1. processes unaware that the first 4 bytes are special would misuse the shared memory block created with the patched multiprocessing.shared_memory;
    2. and vice versa: Python would be unable to access (via multiprocessing.shared_memory) shared memory blocks created by other programming languages;
    3. a shared memory block must be able to outlive its creating process, this is not only expected but also currently documented in multiprocessing.shared_memory as a valid use case;
    4. the user can still say track_resource=False and thus cause memory leaks.

    Therefore, I think what we should do is the following:

    1. Redo bpo-38119: Fix shmem resource tracking #15989 (superseded by gh-82300: Disable resource tracking & auto-destruction of SharedMemory blocks on POSIX #21516) to disable resource tracking of shared memory blocks by default but add an optional track_resource= as bpo-38119: fix shared memory's resource tracking #23174 is doing. The default should be false.
    2. The documentation needs updating to address how to write code that is executable on Python versions before the patch and after the patch.

    Note that the documentation is already recommending use of SharedMemoryManager and already says that "As a resource for sharing data across processes, shared memory blocks may outlive the original process that created them. When one process no longer needs access to a shared memory block that might still be needed by other processes, the close() method should be called. When a shared memory block is no longer needed by any process, the unlink() method should be called to ensure proper cleanup.", implying that this process is not happening by default.

    @buzmeg

    buzmeg commented Oct 12, 2023

    Can we please get a "track=True" default on the initializer and fix the semantics on close when "track=False"?

    This issue has been open almost 4 years. I commented on it almost a year ago.

    Adding "track=True" in the init and having those of us who don't want tracking set it to False is:

    1. an effective solution for those of us who need this
    2. backward compatible
    3. Simple enough that it should be able to get approved (nothing more complicated seems to be able to get through discussion)

    Thanks.

    @gvanrossum
    Member

    This issue has been open almost 4 years.

    It looks like no core dev exists who feels confident they understand this API well enough to change this API without accidentally breaking existing code. If someone here is still listening who can submit an implementation, please go ahead! It's much easier as a core dev to find time for a code review than to find time to implement an idea from scratch (even if the idea is about changing existing code).

    @pitrou
    Member

    pitrou commented Oct 12, 2023

    @pan324 @buzmeg A PR would be welcome.

    @buzmeg

    buzmeg commented Oct 15, 2023

    @pan324 Thanks for putting in the PR. You saved me the time and effort. Let me know if there is something you need me to do.

    @gvanrossum Thanks for breaking the deadlock on this. What would be an appropriate timeframe to start jumping up and down about this not getting reviewed and committed?

    @gvanrossum
    Member

    Start jumping when alpha 2 is released without this.

    @buzmeg

    buzmeg commented Nov 30, 2023

    @gvanrossum You said I should start jumping up and down when Alpha 2 is released without this.

    Alpha 2 has been released without this. I'm jumping up and down. 😄

    @pitrou @pan324

    @gpshead gpshead moved this to In Progress in Multiprocessing issues Nov 30, 2023
    gpshead added a commit that referenced this issue Dec 5, 2023
    Add a track parameter to shared memory to allow resource tracking via the side-launched resource tracker process to be disabled on platforms that use it (POSIX).
    
    This allows people who do not want automated cleanup at process exit because they are using the shared memory with processes not participating in Python's resource tracking to use the shared_memory API.
    
    Co-authored-by: Łukasz Langa <lukasz@langa.pl>
    Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
    Co-authored-by: Antoine Pitrou <pitrou@free.fr>
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    @gpshead
    Member

    gpshead commented Dec 5, 2023

    The PR adding the track=True keyword argument feature to multiprocessing.shared_memory.SharedMemory to allow tracking to be disabled by passing False is in for 3.13.

    TODO list (I'll put this in the opening comment up top for visibility):

    Given the way POSIX "works" (or doesn't, depending on your point of view) I think this is all we can meaningfully do given the platform behavior.
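Opting out on 3.13+ looks like this (guarded so the snippet also runs on older versions, where the keyword does not exist):

```python
import sys
from multiprocessing import shared_memory

kwargs = {}
if sys.version_info >= (3, 13):
    kwargs["track"] = False  # 3.13+: don't register with the resource tracker

shm = shared_memory.SharedMemory(create=True, size=16, **kwargs)
shm.buf[:5] = b"hello"
data = bytes(shm.buf[:5])

shm.close()
shm.unlink()  # with tracking disabled, cleanup is entirely the caller's job
```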

    @gpshead gpshead added the docs Documentation in the Doc dir label Dec 5, 2023
    @gpshead
    Member

    gpshead commented Dec 5, 2023

    Thanks for all the discussion & proposed PRs everybody.

    @buzmeg

    buzmeg commented Dec 11, 2023

    @pan324 Thanks for continuing to bang on the PR until it got through.

    @gpshead Thanks for shepherding this in.

    @gvanrossum
    Member

    Thanks Greg. I was out of my depth here…

    aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
    …ython#110778)
    
    Add a track parameter to shared memory to allow resource tracking via the side-launched resource tracker process to be disabled on platforms that use it (POSIX).
    
    This allows people who do not want automated cleanup at process exit because they are using the shared memory with processes not participating in Python's resource tracking to use the shared_memory API.
    
    Co-authored-by: Łukasz Langa <lukasz@langa.pl>
    Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
    Co-authored-by: Antoine Pitrou <pitrou@free.fr>
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    facebook-github-bot pushed a commit to facebookincubator/cinder that referenced this issue Jun 2, 2024
    Summary:
    Fix resource tracker destroys shared memory segments when other processes should still have valid access
    
    Upstream issue: python/cpython#82300
    Upstream PR: python/cpython@81ee026
    
    Reviewed By: itamaro
    
    Differential Revision: D58048029
    
    fbshipit-source-id: 087e18caea2a106a1d4d30eec102fcf2c919cb7b
    @tanrbobanr

    tanrbobanr commented Jun 15, 2024

    Glad to see this get resolved. For a backward-compatible fix, I have found that the code below works well in case anybody needs this in earlier Python versions:

    from __future__ import annotations  # allow "str | None" hints on Python < 3.10

    import sys
    import threading
    from multiprocessing import resource_tracker as _mprt
    from multiprocessing import shared_memory as _mpshm
    
    
    if sys.version_info >= (3, 13):
        SharedMemory = _mpshm.SharedMemory
    else:
        class SharedMemory(_mpshm.SharedMemory):
            __lock = threading.Lock()
    
            def __init__(
                self, name: str | None = None, create: bool = False,
                size: int = 0, *, track: bool = True
            ) -> None:
                self._track = track
    
                # if tracking, normal init will suffice
                if track:
                    return super().__init__(name=name, create=create, size=size)
    
                # lock so that other threads don't attempt to use the
                # register function during this time
                with self.__lock:
                    # temporarily disable registration during initialization
                    orig_register = _mprt.register
                    _mprt.register = self.__tmp_register
    
                    # initialize; ensure original register function is
                    # re-instated
                    try:
                        super().__init__(name=name, create=create, size=size)
                    finally:
                        _mprt.register = orig_register
    
            @staticmethod
            def __tmp_register(*args, **kwargs) -> None:
                return
    
            def unlink(self) -> None:
                if _mpshm._USE_POSIX and self._name:
                    _mpshm._posixshmem.shm_unlink(self._name)
                    if self._track:
                        _mprt.unregister(self._name, "shared_memory")

    This is likely slightly less efficient than simply copying the 3.13 version of SharedMemory, but it is a lot smaller (especially given that only a few things need to happen in order for tracking to be avoided).

    Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
    …ython#110778)
    
    Add a track parameter to shared memory to allow resource tracking via the side-launched resource tracker process to be disabled on platforms that use it (POSIX).
    
    This allows people who do not want automated cleanup at process exit because they are using the shared memory with processes not participating in Python's resource tracking to use the shared_memory API.
    
    Co-authored-by: Łukasz Langa <lukasz@langa.pl>
    Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
    Co-authored-by: Antoine Pitrou <pitrou@free.fr>
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    sjj-star pushed a commit to sjj-star/parallel_simulator that referenced this issue Dec 26, 2024
    See: python/cpython#82300
    the patch overloads 'register' & 'unregister' of 'resource_tracker'
    kernc added a commit to kernc/backtesting.py that referenced this issue Feb 18, 2025
    kernc added a commit to kernc/backtesting.py that referenced this issue Feb 18, 2025
    kernc added a commit to kernc/backtesting.py that referenced this issue Feb 19, 2025
    @mbrinz

    mbrinz commented Mar 2, 2025

    My application uses ShareableList(), which makes use of SharedMemory() (shared_memory.py). I was glad to see that the track parameter got added to SharedMemory() as of 3.13 onwards, but I was wondering why this parameter was not added to ShareableList() as well. Doing this would help standalone client processes close without destroying the ShareableList() of the server process.
    Below you see the necessary adaptations in the code for this:

    class ShareableList:

        def __init__(self, sequence=None, *, name=None, track=True):
            if name is None or sequence is not None:
                self.shm = SharedMemory(name, create=True, size=requested_size, track=track)
            else:
                self.shm = SharedMemory(name, track=track)
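Until such a parameter exists, a ShareableList's underlying segment can still be managed explicitly, since the backing SharedMemory is exposed as its shm attribute (works on 3.8+):

```python
from multiprocessing import shared_memory

sl = shared_memory.ShareableList(["hello", 42])
value = sl[1]

# The backing SharedMemory is exposed as sl.shm, so the usual
# close()/unlink() contract applies to it directly.
sl.shm.close()
sl.shm.unlink()
```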
