Memory issues while using Hybrid Engine #1516
Comments
I ran your test code and I can confirm that there is a memory leak in the hybrid NDManager. On every inference, one NDArray is leaked. I will debug further and see how to fix it.
The root cause is that we lose track of alternativeManager when a new NDManager is attached to the NDArray: https://github.com/deepjavalibrary/djl/blob/master/api/src/main/java/ai/djl/ndarray/NDArrayAdapter.java#L70 Once a new NDManager is attached to an NDArray, all NDArrays it creates should live under the new NDManager. For OrtNDArray, since all NDArrays are created by alternativeManager, the alternativeManager should also be updated: it should either be a child of the new NDManager or point to the new NDManager itself.
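For reference, a minimal sketch of that fix idea, assuming the adapter keeps manager and alternativeManager fields roughly as described above (this is a fragment meant to live inside NDArrayAdapter, not the actual DJL patch):

```java
// Sketch only: when a new NDManager is attached, re-point alternativeManager
// as well, so that NDArrays later created by the fallback engine end up under
// the new manager's scope instead of the stale one.
@Override
public void attach(NDManager manager) {
    detach();
    this.manager = manager;
    manager.attachInternal(getUid(), this);
    if (alternativeManager != null) {
        // assumption: make it a child of the new manager; pointing it directly
        // at the new manager would also work, per the comment above
        alternativeManager = manager.newSubManager();
    }
}
```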
When I use the ONNX model together with PyTorch's PtNDArray, this issue still seems to arise.
Do you have code that can reproduce your issue?
A code example:
Environment information:
What's the error message and stacktrace? I don't think this error is related to the memory leak. Can you print out the shape of
Description
We see increased memory consumption when using ONNX Runtime together with another engine (PyTorch or MXNet). In the example code I made, with fixed inputs and a single-threaded predictor, memory creeps up slowly in increments of roughly 20 MB.
Over about 40 minutes, memory grows by roughly 1 GB. The Java heap is fine, but the RSS memory of the Java process keeps increasing. Memory grows much faster in our production environment with variable inputs than in the example I made, possibly because of the higher rate of predictions.
This also happens exclusively while using the hybrid engine (which, as I understand it, kicks in when I use an operation that is not currently supported by the ONNX engine). If I don't use any unsupported operations, I don't have any problems.
We've also noticed that the memory problem happens when I keep the model loaded with
ModelZoo.loadModel
for a while. If I load the model and close it around every prediction, I don't have any memory problems, although inference seems slower. It seems that some residual memory is attached to the model's resources with every prediction and is only released when the model is released.
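To illustrate that observation, a rough sketch of the per-prediction load/close pattern that keeps memory flat (the model path and translator are placeholders, and the checked exceptions from loadModel/predict are omitted for brevity):

```java
// Rough sketch (not the exact reproduction code): reloading and closing the
// model around each prediction keeps RSS flat, at the cost of slower inference.
Criteria<float[], float[]> criteria =
        Criteria.builder()
                .setTypes(float[].class, float[].class)
                .optModelPath(Paths.get("model.onnx"))  // placeholder path
                .optEngine("OnnxRuntime")
                .optTranslator(new MyTranslator())      // placeholder translator
                .build();

for (float[] input : inputs) {
    // load, predict, close: no residual memory accumulates on the model
    try (ZooModel<float[], float[]> model = ModelZoo.loadModel(criteria);
            Predictor<float[], float[]> predictor = model.newPredictor()) {
        float[] output = predictor.predict(input);
    }
}
```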
Expected Behavior
That memory would not increase over time when doing predictions.
How to Reproduce?
I made an example App here
The critical part is in the translator:
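The original snippet is not captured here; a hypothetical translator of roughly this shape triggers the hybrid path, because softmax is not implemented by the ORT engine and falls back to the secondary engine (class and method details are illustrative, not the code from the linked example app):

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDList;
import ai.djl.ndarray.NDManager;
import ai.djl.translate.Batchifier;
import ai.djl.translate.Translator;
import ai.djl.translate.TranslatorContext;

/** Hypothetical translator, not the one from the linked example app. */
public class ExampleTranslator implements Translator<float[], float[]> {

    @Override
    public NDList processInput(TranslatorContext ctx, float[] input) {
        NDManager manager = ctx.getNDManager();
        return new NDList(manager.create(input));
    }

    @Override
    public float[] processOutput(TranslatorContext ctx, NDList list) {
        NDArray logits = list.singletonOrThrow();
        // softmax is not implemented by the ORT engine, so this call is routed
        // through the hybrid engine (PyTorch/MXNet) -- the path where the
        // memory growth is observed
        return logits.softmax(-1).toFloatArray();
    }

    @Override
    public Batchifier getBatchifier() {
        return Batchifier.STACK;
    }
}
```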
Model checkpoint:
https://drive.google.com/file/d/1Hm8Q4CnAjpychj3L3c4C4p6pQ517ZLef/view?usp=sharing
What have you tried to solve it?
We were able to work around this issue by controlling the NDManager of the alternative engine ourselves, for example in our translator:
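A hedged sketch of what such a workaround can look like (illustrative, not our exact code): the fallback computation is run on a manager we create and close ourselves, so its native memory is not parked on the model's manager.

```java
// Inside processOutput: copy the ORT output onto a manager we own, run the
// unsupported op there, and close the manager so the memory is freed per call.
NDArray logits = list.singletonOrThrow();
try (NDManager ptManager = Engine.getEngine("PyTorch").newBaseManager()) {
    NDArray copy = ptManager.create(logits.toFloatArray(), logits.getShape());
    return copy.softmax(-1).toFloatArray();
}
```

Closing ptManager after each prediction is what keeps the native memory bounded.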
Environment Info