Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear external tileset skeletons from tile tree to save memory usage #1107

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

azrogers
Copy link
Contributor

Closes #739. Currently, though the content of tiles are unloaded when no longer used, the "skeletons" - the Tile objects - created by loading external tilesets are never unloaded. This can cause memory usage to steadily increase. This change implements a _doNotUnloadCount number on each tile that tracks situations where the tile's pointers are still in use. When a tile is in a situation where its pointer is being used - the tile is being loaded, for example - it increments this count on the tile and each of its parent tiles, and when the pointer is no longer needed, this counter is decremented up the tree as well. This means we can clear the children of external tilesets when their _doNotUnloadCount number is 0. This implementation also includes a TileDoNotUnloadCountTracker class, enabled with the CESIUM_DEBUG_TILE_UNLOADING switch, that tracks the source of every modification to a tile's _doNotUnloadCount.

Copy link
Member

@kring kring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @azrogers, this is so good! I tried flying around with Google Photorealistic 3D Tiles in Cesium for Unreal, and memory usage stays extremely steady. Previously I believe it would have gone up quite quickly. I didn't see any crashes or other dodgy behavior either. This will be a major improvement for our users!

Mostly small comments here, but I did notice a couple of cases where I think there's potential for (rare) bad behavior.

I think it's also worth taking a bit of time to think through whether there could be any other gotchas like this, and generally do everything we can to thoroughly test everything. Perhaps bring back the soak test from #1415?

@azrogers
Copy link
Contributor Author

@kring Looks like reversing the direction of iteration, as well as unloading empty tiles, worked great! I'll take a look at that soak test and think of some other ways we could verify correct behavior here.

Copy link
Member

@kring kring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more small things here. Also please update CHANGES.md.

@azrogers
Copy link
Contributor Author

Bringing the soak test up-to-date (now in CesiumGS/cesium-unreal#1615 for reasons detailed in that PR) did identify some crashes, including issues with sampleHeightMostDetailed. I'll try to fix those now.

@azrogers
Copy link
Contributor Author

I realize now that sampleHeightMostDetailed never showed up on your list of Tile pointer references in the previous PR because the functionality hadn't been implemented at that point 😅

@azrogers
Copy link
Contributor Author

It's now tracking most of the tile pointer usages in TilesetHeightQuery, though it still crashes when running the test in Unreal. However, it also crashes when running the soak test from CesiumGS/cesium-unreal#1615, so I might in fact be correctly tracking all the usages in TilesetHeightQuery and it's crashing from an unrelated pointer use that I haven't found yet. Still looking into it, but might need to finish up solving this on Monday.

@azrogers
Copy link
Contributor Author

Almost fixed all the issues. First up, the fix for the issue where external tilesets would sometimes be unloaded while some of their children were ContentLoading. This confused me, since it's something we're explicitly flagging to avoid. After a lot of log statements and trial and error, I found out that the issue is here:

setTileContent(tile, std::move(pair.result), pair.pRenderResources);
tile.decrementDoNotUnloadCount("TilesetContentManager::loadTileContent done loading");

The issue with these two lines is that, despite the decrementDoNotUnloadCount call coming after the call to setTileContent, which updates the tile out of the ContentLoading state, decrementDoNotUnloadCount would reliably run before the tile's state was updated. How is this happening? setTileContent isn't a future, it blocks until completion. setState isn't doing anything funky either. It gets even more confusing when the fix turned out to be just adding decrementDoNotUnloadCount after every setState call in setTileContent instead of calling it in the loadTileContent future. How does this change anything? I have no idea. But it works, the _doNotUnloadCount is now decremented after the tile leaves the ContentLoading state.

The other fix at least makes sense. Turns out, while _loadedTiles is reliably in order from top to bottom of the tree after the root tile, this isn't a guarantee before the root tile. I'm fairly sure this is because some of a tile's children can be visited for rendering without visiting the others, meaning those tiles and their ancestors will be dragged to the end of _loadedTiles, but the ones that are no longer rendering will stay in their original position. Then, when cleaning up those more recently rendered tiles, we will end up cleaning up the ancestors before the children, resulting in those pointers still in the _loadedTiles list turning into junk data.

The easiest solution is just to make sure we're unloading the entirety of _loadedTiles at once, ensuring that every parent and child is removed from _loadedTiles before we clear any children. But the time limit is there for a reason! So the best solution I came up with is a _clearChildrenRecursively method that allows us to bail out at any time:

bool Tileset::_clearChildrenRecursively(Tile* pTile) noexcept {
  // Iterate through all children, calling this method recursively unless the
  // child is still in _loadedTiles. If the child is still in _loadedTiles, or
  // any of its children (or children's children and so on) are in _loadedTiles,
  // we return false so we don't clear the children prematurely.
  for (Tile& child : pTile->getChildren()) {
    if (this->_loadedTiles.contains(child) ||
        !_clearChildrenRecursively(&child)) {
      return false;
    }
  }

  pTile->clearChildren();

  return true;
}

That way, if there are children that still haven't been unloaded, we will hold off on clearing the children. And if this method returns false, we also hold off on removing the external tileset Tile from _loadedTiles so that we can handle it next frame, or the frame after that, or however long it takes for all of its children to be unloaded. It's not ideal that we have to walk the tree every time when we're potentially still in a state where we can't unload the children, but it's the best I've come up with so far.

The remaining issue is a result of this fix - because the external tileset Tile can remain Unloaded for an indefinite amount of time without clearing its children, this opens us up to "Children already created" errors. We can't just set the Tile to Unloading either while it's pending, because this will mean that if we need the tile and its children to render, we will have to wait for all the tile's previous children to unload and be cleared before we can get it to an Unloaded state to be able to start the loading process again. So, I either need to figure out a way to make sure the external tileset has all of its children cleared on the same frame as it is unloaded, or I need to figure out how to re-use those children that haven't been cleared yet when the tile is picked back up for rendering.

@azrogers
Copy link
Contributor Author

On the first ordering issue: you can test this out by checking out 140a7d9 and adding log statements to both Tile::decrementDoNotUnloadCount and Tile::setState. It was occurring for me in Unreal building with MSVC - results may vary in other engines and compilers.

@kring
Copy link
Member

kring commented Feb 19, 2025

The issue with these two lines is that, despite the decrementDoNotUnloadCount call coming after the call to setTileContent, which updates the tile out of the ContentLoading state, decrementDoNotUnloadCount would reliably run before the tile's state was updated. How is this happening?

Do you have an insight into why this matters? Only the main thread should ever look at the doNotUnload count, and also only the main thread should look at the tile state. And finally, only the main thread should modify either of these. Therefore I can't see how it would matter in the slightest whether the count was updated before or after the load state was updated, because nothing should ever be able to observe one updated without the other, regardless of their order. In fact, the compiler is free to reorder these if it feels like it.

If any of those assumptions are violated (i.e., if any Tile property is read or written off the main thread), then that's the real problem.

We do, in some cases, create Tile instances in a worker thread, and then transfer ownership of them to the main thread. This should be fine because the transfer itself will create the necessary memory barrier to avoid partial updates of the Tile being visible. Are there any other cases where we might have thread-unsafe access to a Tile?

@azrogers
Copy link
Contributor Author

Ok, I believe I've finally solved it! Turns out, there's two reasons that we do in fact need to remove children from _loadedTiles recursively:

  1. It's possible for a tile to be created Unloaded (byTile::createChildTiles), get visited and added to _loadedTiles, but not get loaded by the time the parent external tileset is unloaded and has its children cleared. Because the tile never had to go through unloadTileContent to become Unloaded, it doesn't count towards the count of tiles still yet to be unloaded, but it nevertheless is in _loadedTiles.
  2. The root tiles of loaded external tilesets, which are created as empty tiles with children, will be considered unconditionally refined unloaded external tilesets if we set them to Unloaded and remove their content. This is what was responsible for the Children already created issues. The fix for this is simply not letting empty tiles count towards the "still not unloaded" count of their parents, and cleaning them up as we're clearing their external tileset parent's children. You can read my full reasoning here.

I ran some quick comparisons with the tile loading soak test (CesiumGS/cesium-unreal#1615) to show that this does indeed clean up external tilesets:
Total Unloaded Tiles Over Time (main)

Total Unloaded Tiles Over Time (unload-external-tilesets-2)

@kring kring added this to the March 2025 Release milestone Feb 21, 2025
@kring
Copy link
Member

kring commented Feb 24, 2025

For my own reference, I created an inventory of how the two counts in a Tile are used:

_doNotUnloadCount

  • Increment
    • When the value in a child tile is incremented, the parent tile is, too.
    • When a tile is added to the tilesFadingOut list.
    • When a tile starts async loading.
    • When a tile is added to candidateTiles / additiveCandidateTiles lists for height queries.
    • When a tile is added to _heightQueryLoadQueue
  • Decrement
    • When the value in a child tile is decremented, the parent tile is, too.
    • When a tile is removed from the tilesFadingOut list.
    • When a tile finishes async loading.
    • When a tile is removed from candidatedTiles / additiveCandidateTiles lists for height queries.
    • When a tile is remove from _heightQueryLoadQueue
  • Usage
    • In TilesetContentManager::unloadTileContent, if the tile has external content, it is not unloaded if this count is greater than 0.

Summary:
This counter tracks the number of outstanding pointers to this Tile or its children. When there are any outstanding pointers at all, we can't destroy the tile.

_tilesStillNotUnloadedCount

  • Increment:
    • When the value in a child tile is incremented, the parent tile is, too.
    • In createChildTiles, tilesStillNotUnloadedCount of new children is added to current tile and propagated up the tree.
    • In Tile constructor if the tile starts in a state other than Unloaded, and it's not UnknownContent, and it's not EmptyContent.
    • When TilesetContentManager::setTileContent transitions the tile to the ContentLoaded state and it's not UnknownContent and it's not EmptyContent.
  • Decrement:
    • When the value in a child tile is decremented, the parent tile is, too.
    • In TilesetContentManager::unloadTileContent when it transitions the tile back to the Unloaded state.
  • Usage:
    • In TilesetContentManager::unloadTileContent, if the tile has external content, it is not unloaded if this count is greater than 1.

Summary:
This counter tracks the number of tile's in this subtree (counting itself) that have content that needs to be unloaded. All content in the subtree must be unloaded before the external tileset can be unloaded and the children can be cleared.

@kring
Copy link
Member

kring commented Feb 24, 2025

Based on the above, I think it's possible to collapse this back down to a single count, _doNotUnloadCount. Basically, everywhere we would increment/decrement _tilesStillNotUnloadedCount, we instead increment/decrement _doNotUnloadCount but starting with the parent tile instead of the current one. The logic is that when a tile has loaded content, the parent tile's subtree cannot yet be unloaded.

Also, let's rename _doNotUnloadCount to _doNotUnloadSubtreeCount. A tile's renderable content can be unloaded no matter the value of this count, it's only the subtree that cannot be unloaded.

Copy link
Member

@kring kring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! A few suggestions that might help clean things up a bit.

@@ -122,9 +163,19 @@ void Tile::createChildTiles(std::vector<Tile>&& children) {
throw std::runtime_error("Children already created.");
}

int32_t prevLoadedContentsCount = this->_tilesStillNotUnloadedCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's possible for any of these children to have loaded content at this point. I can't think of how they would, at least. Am I mistaken?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out it is possible - in TilesetJsonLoader::createLoader when we call createChildTiles with the root tile from parseTilesetJson. When parseTilesetJson is called with an implicit tileset, that root tile is created with external content which gives it the ContentLoaded state. Checking in createChildTiles if any children have the ContentLoaded state and ticking up the _doNotUnloadSubtreeCount for each one solves the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting! Implicit tiling has been a bit of a blind spot for me in reviewing this so far. We should definitely make sure implicit tilesets are still working well in this PR, if you haven't already.

It doesn't need to be part of this PR, but we should eventually support unloading implicit tilesets, too. The rules there are a bit different, though. With an external tileset, once we're sure the entire thing is unused, we can unload all of it. It's all or nothing. However, with implicit tiling, we can recreate explicit tiles for any part of the implicit tree, so it's valid to unload any unused subtree.

@azrogers
Copy link
Contributor Author

Removing _tilesStillNotUnloadedCount and just incrementing the _doNotUnloadSubtreeCount on the tile's parent worked great! Unfortunately I have yet to figure out exactly how to resolve the empty tile content issue in TilesetJsonLoader as you described - will experiment further.

@azrogers
Copy link
Contributor Author

@kring Successfully got the fix for empty tile handling implemented. I believe that's everything!

Copy link
Member

@kring kring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking better and better, but I still have a few comments. Thanks for your patience and ongoing work on this @azrogers!

@@ -185,6 +185,7 @@ option(CESIUM_TESTS_ENABLED "Whether to enable tests" ON)
option(CESIUM_GLM_STRICT_ENABLED "Whether to force strict GLM compile definitions." ON)
option(CESIUM_DISABLE_DEFAULT_ELLIPSOID "Whether to disable the WGS84 default value for ellipsoid parameters across cesium-native." OFF)
option(CESIUM_MSVC_STATIC_RUNTIME_ENABLED "Whether to enable static linking for MSVC runtimes" OFF)
option(CESIUM_DEBUG_TILE_UNLOADING "Whether to enable tracking of tile _doNotUnloadCount modifications for tile unloading debugging." OFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_doNotUnloadSubtreeCount

*
* This function is not supposed to be called by clients.
*/
int32_t getDoNotUnloadCount() const noexcept {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getDoNotUnloadSubtreeCount

Comment on lines 17 to 23
#include <optional>
#include <span>
#include <string>
#ifdef CESIUM_DEBUG_TILE_UNLOADING
#include <unordered_map>
#endif
#include <vector>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-format won't reorder around an ifdef (for good reason), so let's put the ifdef'd header at the end.

Suggested change
#include <optional>
#include <span>
#include <string>
#ifdef CESIUM_DEBUG_TILE_UNLOADING
#include <unordered_map>
#endif
#include <vector>
#include <optional>
#include <span>
#include <string>
#include <vector>
#ifdef CESIUM_DEBUG_TILE_UNLOADING
#include <unordered_map>
#endif

Comment on lines 14 to 21
#include <cstdint>
#include <memory>
#include <stdexcept>
#ifdef CESIUM_DEBUG_TILE_UNLOADING
#include <unordered_map>
#endif
#include <utility>
#include <vector>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here.

Suggested change
#include <cstdint>
#include <memory>
#include <stdexcept>
#ifdef CESIUM_DEBUG_TILE_UNLOADING
#include <unordered_map>
#endif
#include <utility>
#include <vector>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>
#ifdef CESIUM_DEBUG_TILE_UNLOADING
#include <unordered_map>
#endif

Comment on lines +169 to +171
if (tile.getState() == TileLoadState::ContentLoaded) {
++this->_doNotUnloadSubtreeCount;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still to handle the root tile of implicit tilesets, which are constructed with TileExternalContent, right?

If so, I don't have high confidence it will work correctly in all cases. For one thing, an implicit root need not be at the root tile of the tileset.json (though it often is). Second, this code is only incrementing the count on this tile, not propagating it up the tree.

Comment on lines +268 to +269
(*tileIt)->decrementDoNotUnloadSubtreeCount(
"Tileset::_updateLodTransitions in render list");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Comment on lines +281 to +282
(*tileIt)->decrementDoNotUnloadSubtreeCount(
"Tileset::_updateLodTransitions done fading out");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

if (!tilesNeedingChildrenCleared.empty()) {
for (Tile* pTileToClear : tilesNeedingChildrenCleared) {
CESIUM_ASSERT(pTileToClear->getDoNotUnloadCount() == 0);
_clearChildrenRecursively(pTileToClear);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_clearChildrenRecursively(pTileToClear);
this->_clearChildrenRecursively(pTileToClear);

Comment on lines +1662 to +1664
// The root tile marks the beginning of the tiles that were used for rendering
// last frame. By iterating backwards starting from this position, we ensure
// that descendants will always be unloaded before their ancestors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change to flip the order of traversal still needed? The problem with it, which I should have realized when I first suggested it, is that we're no longer unloading the least-recently-used tiles first.

I think reverting this will mean that external tileset loading can be delayed by a frame. The tile with external content will often be considered first and deemed not unloadable, but then other tiles will be unloaded, which would allow the tile with external content to be unloaded. But this won't be noticed until the next frame.

I think this is fine. From a performance perspective, this may even be a good thing, because needing to (re-)load external tilesets before loading the content of tiles in the subtree adds quite a bit of latency.

However, it reminds me that external tilesets are currently not counted at all in getTotalDataBytes. They always have a size of zero. I don't think this is catastrophic, and I don't think it's essential to fix in this PR or in the next release, but we should write an issue and fix it as a followup.

return loadInput.asyncSystem.createResolvedFuture<TileLoadResult>(
TileLoadResult{
TileEmptyContent{},
_upAxis,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_upAxis,
this->_upAxis,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak when using external tilesets.
2 participants