lockup while running files cp
#7844
It isn't a database issue, since the problem is fixed as soon as the daemon is restarted. It happened again; this time I collected the debug information and killed the daemon with SIGABRT to get the stack trace, but it was too long for the system log. Hope that's helpful.
@RubenKelevra We are looking into this, and I have a couple of questions.
If GC were to run between:
Thanks!
Normal operation. I've rebooted the machine because I did a minor kernel update.
GC isn't expected to run, since plenty of free space is configured for IPFS; auto-GC only triggers when 90% of the storage is filled. The GC was run manually after the daemon was upgraded, while the cluster daemon was stopped, so no operations should have been running against the IPFS client. After the update and the GC run, normal cluster operation resumed. I can't say exactly what the cluster was doing when this happened, but the daemon got stuck multiple times when this command was run. So I'm not sure how the block can disappear between the two operations. Tbh I think the MFS is corrupted, since I'm now no longer able to start the daemon, see #7845. I've removed the startup timeout (which was 15 minutes), which had killed the daemon multiple times when the timeout was reached. Now the daemon won't start within 24 hours, even on a pretty high-performance machine with flatfs storage on a fast SSD.
The network cannot provide this block, since I've just added the file with this content and am trying to move it in the MFS before recursively sharing the folder containing it in the cluster.
I forgot to make clear that I currently see three possible scenarios:
Edit: Since IPFS runs on ZFS, we can snapshot the current state and run some commands to try to work around it. Additionally, it's all publicly available data, so I'm happy to share the IPFS database and the flatfs content as a tar.gz if that helps the debugging efforts, too.
So far, I have not been able to reproduce this problem. It does appear that the cause is that MFS is somehow corrupted, particularly given the related issues. At this point, I think it would be useful to get your db and flatfs content -- if possible, a minimal dataset that still exhibits the problem. Hopefully, the nature of any corruption found will give some indication of the possible cause.
@gammazero wrote:
I've packed the whole IPFS folder and just removed the key files and the identity from the config. The server providing it is a bit slow; hope this works for you. /ipfs/QmVx4BqSsQnhiYdnLbqA3zCXzteXBb7hvj6rQXDfyqxRJ8
Will pin it to my cluster to help deliver :)
@RubenKelevra could you see if you can get a gateway to see it? My node's been searching for yours for ages now.
Just connect to the node, I guess with
I'll also run
Got it. Pinned!
@RubenKelevra correct me if I'm wrong, but is your repo 140 GB?! If so, I may have to pin it on only one machine, and temporarily at best.
Yeah, it's around that size; that's why I had to put it on a slower server ;D You can just pin it and unpin it again, we just need it to be provided a bit faster :)
I just hit this problem as well, trying to copy files from libgen to my local storage using

EDIT: for clarification, if I let the daemon restart and wait a couple of minutes before making a copy, it hangs. Copies apparently only work for me right after the restart.

EDIT 2: after some tests, I've noticed that sometimes copies are impossible even right after a daemon restart. Maybe this is a problem with the transport protocols? Is there a way to deactivate protocols like QUIC so I can test and report back? Thank you!
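Disabling QUIC for such a test can be done via the transport config (a sketch, assuming go-ipfs 0.7 or later, where the `Swarm.Transports` section exists; the daemon must be restarted for the change to take effect):

```shell
# Turn the QUIC transport off, restart the daemon, retest the copy,
# then turn it back on afterwards.
ipfs config --json Swarm.Transports.Network.QUIC false
# ...restart the daemon and retry `ipfs files cp`...
# Re-enable when done:
# ipfs config --json Swarm.Transports.Network.QUIC true
```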
Probably related to #6113. |
@schomatis maybe just block any
I decided to remove the automatic operation and instead run it after completing my loop of tasks. So if the repo gets too big, I'll run a GC:
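A minimal sketch of such a conditional GC, assuming the repo size is read from `ipfs repo stat --size-only` (the threshold, the parsing, and the helper name are illustrative, not the author's actual script):

```shell
#!/bin/sh
# should_gc SIZE THRESHOLD: succeed when the repo size (bytes) exceeds
# the threshold, i.e. when a GC run is wanted.
should_gc() {
    [ "$1" -gt "$2" ]
}

# Illustrative usage (assumes a running go-ipfs daemon):
#   repo_size=$(ipfs repo stat --size-only | awk '/RepoSize/ {print $2}')
#   should_gc "$repo_size" $((100 * 1024 * 1024 * 1024)) && ipfs repo gc
```

Running GC only between task batches, rather than automatically, avoids the GC racing against in-flight MFS operations.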
Version information:
go-ipfs version: 0.9.0-dev-2ed925442
Repo version: 11
System version: amd64/linux
Golang version: go1.15.6
Description:
I have a script that had run successfully for 160 days. I then updated to 0.8rc1 and ran the garbage collection, and it seems the garbage collection damaged the MFS.
When I run the following commands, the IPFS API call just gets stuck:
I aborted it after 3 minutes.
Btw: it would be nice if there were a timeout for such operations, so that they at least fail with a timeout error themselves.
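Until such a built-in timeout exists, the hang can be bounded from the shell with GNU coreutils `timeout` (a workaround sketch; the wrapper name and the 180-second limit are illustrative):

```shell
#!/bin/sh
# run_with_deadline LIMIT CMD...: run CMD under a hard time limit.
# GNU coreutils `timeout` exits with status 124 when the limit is hit.
run_with_deadline() {
    limit="$1"
    shift
    timeout "$limit" "$@"
}

# Illustrative usage against the stuck operation:
#   run_with_deadline 180 ipfs files cp /ipfs/<cid> /some/mfs/path
```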