This repository was archived by the owner on Apr 29, 2020. It is now read-only.

v0.4.0 - Unable to pin large directories #19

Closed
sdockray opened this issue Feb 2, 2016 · 34 comments

sdockray commented Feb 2, 2016

I have been experimenting with adding a large directory on one node (origin) and then attempting to pin that directory on another node (destination). When I simply run ipfs pin add HASH on the destination, few if any hashes get pinned, CPU and memory load are extremely high, and ipfs eventually crashes.

As an alternative, I captured all of the hashes that came from the original add operation (one per line) and wrote this simple script to pin the hashes one by one (essentially the loop sketched below): https://github.com/sdockray/ipfs_pin_remote_hashes/blob/master/ipfs_pin_remote_hashes.sh
It is successful for a few hundred hashes (about 4-7 GB). Along the way a single hash sometimes takes several minutes, occasionally close to an hour, and eventually I start seeing Error: pin: merkledag: not found and the daemon has silently gone away.
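In essence it is just a loop along these lines (a simplified sketch, not the exact script; the hashes.txt and failed_hashes.txt names are placeholders):

#!/bin/sh
# Read one hash per line from hashes.txt and pin each one individually,
# logging failures instead of aborting the whole run.
while read -r hash; do
    echo "pinning $hash"
    if ! ipfs pin add "$hash"; then
        echo "$hash" >> failed_hashes.txt
    fi
done < hashes.txt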

The origin is Debian Jessie. I am still collecting data about the destinations, but they are typically Debian as well, and all are running 0.4.0-dev. I don't see much in the logs that is helpful.

jbenet (Contributor) commented Feb 3, 2016

Sorry you're having this problem, @sdockray. What are the specs of the machines? (The Go runtime, and our use of it, is far more expensive than it should be.)

cc @whyrusleeping @diasdavid

sdockray (Author) commented Feb 3, 2016

I'm not sure what to put exactly, but the basics are below.

Origin:
Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
MemTotal: 65953248 kB
MemFree: 30195284 kB
Debian GNU/Linux jessie/sid

Destination:
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
MemTotal: 264049788 kB
MemFree: 36578076 kB
Debian GNU/Linux 8 (jessie)

jbenet (Contributor) commented Feb 3, 2016

The 65 GB and 264 GB of memory look like virtual memory to me. I'm wondering how much RAM the machines actually have.

sdockray (Author) commented Feb 3, 2016

On the origin, dmidecode shows 4 slots like this one:

Handle 0x004B, DMI type 17, 34 bytes
Memory Device
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Speed: 1066 MHz

Also:

sudo cat /var/log/dmesg | grep Memory
[ 0.000000] Memory: 65933388K/67079912K available (5693K kernel code, 1119K rwdata, 2812K rodata, 1304K init, 844K bss, 1146524K reserved, 0K cma-reserved)

On the destination I unfortunately don't have root access, so the best I can do right now is:

free
total used free shared buffers cached
Mem: 264049788 227437292 36612496 4091940 834408 137124540
-/+ buffers/cache: 89478344 174571444
Swap: 0 0 0

ghost commented Feb 24, 2016

There's a PR under way which should improve pinning quite a bit: ipfs/kubo#2384

RichardLitt (Collaborator) commented:

That PR has been merged. @sdockray, can you try the new version and see if this is still a problem?

sdockray (Author) commented:

I have been using the distributed ipfs binaries, so I just now did the Go install (with gx, etc.) and tried again; within a couple of minutes (before actually pinning anything) the pin crashes ipfs:

[x ~] ipfs pin add -r Qmaw9UT8gsgYGZVnFXgqD82x1moha4EPpngrq6vHi1tSq4
Error: Post http://127.0.0.1:5001/api/v0/pin/add?arg=Qmaw9UT8gsgYGZVnFXgqD82x1moha4EPpngrq6vHi1tSq4&encoding=json&r=true&stream-channels=true: EOF
[1]+ Exit 2 nohup ipfs daemon

I will keep trying with somewhat smaller directories and see whether there is some limit on size or something.

daviddias (Contributor) commented:

@sdockray, does go-ipfs give any interesting error log? What is the size of the file we are talking about? And before the ipfs pin add -r, where had you added it to IPFS?

sdockray (Author) commented:

The error log from the crash is 56M and I can't see anything particularly noteworthy in it, but I'm also not sure what I'm looking for; at the beginning and end I didn't see anything eye-catching. Here is a clip of the beginning and the end.

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f557aee6067 m=2

goroutine 0 [idle]:

goroutine 1 [chan receive, 7 minutes]:
main.daemonFunc(0x7f55785ec0a8, 0xc8203760c0, 0x7f55785ec260, 0xc820388070)
        /pathToHome/work/src/github.com/ipfs/go-ipfs/cmd/ipfs/daemon.go:308 +0x1121
github.com/ipfs/go-ipfs/commands.(*Command).Call(0x152e420, 0x7f55785ec0a8, 0xc8203760c0, 0x0, 0x0)
        /pathToHome/work/src/github.com/ipfs/go-ipfs/commands/command.go:110 +0x471
main.callCommand(0x7f55785ec180, 0xc820144800, 0x7f55785ec0a8, 0xc8203760c0, 0x152e420, 0x15106a0, 0x0, 0x0, 0x0, 0x0)
        /pathToHome/work/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:344 +0xa84
main.(*cmdInvocation).Run(0xc8201447c0, 0x7f55785ec180, 0xc820144800, 0x0, 0x0, 0x0, 0x0)
        /pathToHome/work/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:187 +0x1e5
main.main()
        /pathToHome/work/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:152 +0x65b

goroutine 17 [syscall, 7 minutes, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1721 +0x1

goroutine 6 [chan receive]:
github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/codahale/metrics.init.1.func2()
        /pathToHome/work/src/github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/codahale/metrics/metrics.go:321 +0x73
created by github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/codahale/metrics.init.1
        /pathToHome/work/src/github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/codahale/metrics/metrics.go:328 +0x7b

goroutine 7 [syscall, 7 minutes]:
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x37

goroutine 34 [select, 7 minutes]:
...

goroutine 2194930 [semacquire]:
sync.runtime_Semacquire(0xc82027933c)
        /usr/local/go/src/runtime/sema.go:43 +0x26
sync.(*Mutex).Lock(0xc820279338)
        /usr/local/go/src/sync/mutex.go:82 +0x1c4
github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).updateReceiveCounters(0xc8202792c0, 0xc8ac2dd200, 0x0, 0x0)
        /pathToHome/work/src/github.com/ipfs/go-ipfs/exchange/bitswap/bitswap.go:354 +0x52
github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).ReceiveMessage.func1(0xc8247ceb60, 0xc8202792c0, 0x7f55785ec000, 0xc82016a0d0, 0xc82307bf80, 0x22, 0xc8ac2dd200)
        /pathToHome/work/src/github.com/ipfs/go-ipfs/exchange/bitswap/bitswap.go:335 +0x6b
created by github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).ReceiveMessage
        /pathToHome/work/src/github.com/ipfs/go-ipfs/exchange/bitswap/bitswap.go:346 +0x485

rax    0x0
rbx    0x7f557b257708
rcx    0xffffffffffffffff
rdx    0x6
rdi    0x8409
rsi    0x840a
rbp    0x10d86a8
rsp    0x7f557ae6f948
r8     0xa
r9     0x7f557ae70700
r10    0x8
r11    0x206
r12    0x2c
r13    0x10cd860
r14    0x0
r15    0x8
rip    0x7f557aee6067
rflags 0x206
cs     0x33
fs     0x0
gs     0x0

And as for the size...

Size of the large directory that was added to ipfs (its root hash is the one in the pin command):
x% du -sh processed
385G processed
Count of files and directories:
x% find processed -type f | wc -l
52593
x% find processed -type d | wc -l
82933

I tried to pin some of the large subdirectories (about 1G each) inside the big target directory, and they are recursively pinned just fine.

daviddias (Contributor) commented:

Hi @sdockray, we've recently released go-ipfs 0.4.0, which brought a lot of performance and memory improvements. Could you check whether you still have this issue with the latest version?

sdockray (Author) commented Apr 26, 2016

Updated:

I was able to finish adding the directory. The first pass got 99% of the contents added; then I kept getting "blockservice is closed" until I stopped the ipfs daemon and tried again, after which it completed the whole directory. Moving on to testing the pinning now.
....

Hi @diasdavid - all of my comments on and after March 26 were with 0.4.0 (before that they were with 0.4.0-dev). I tried (a week ago, and again just now) to start over, re-adding the large directory before trying to re-pin it, and unfortunately under 0.4.0 I can no longer add the directory, as I consistently get the following error (though at completely different stages of the process: 1%, 3%, 99%):

ERROR commands/h: blockservice is closed client.go:247

So unfortunately, I am stuck trying to recursively add this large directory (which never gave me problems in 0.4.0-dev) before I can test pinning it.

sdockray (Author) commented:

OK, the pin failed again on a freshly added directory (407G), running 0.4.0 everywhere.

ipfs pin add -r QmQr8LVFKWDMAR186oi6sEKhfVzoQPig3zcfLFWWkx9yrZ

outputs (after a few minutes):

Error: Post http://127.0.0.1:5001/api/v0/pin/add?arg=QmQr8LVFKWDMAR186oi6sEKhfVzoQPig3zcfLFWWkx9yrZ&encoding=json&r=true&stream-channels=true: EOF

The pin process and the daemon both exit. The rest of the error log is like the one above.

whyrusleeping commented:

@sdockray could I get the full output log from when it crashes? That would be very helpful.

Also, a few things I'd like to have you try (combined into a single session sketch after this list):

  • Run the daemon with the env var IPFS_REUSEPORT set to false
    • this is generally a source of excess thread resource consumption, so it may help.
  • Also try: ipfs config --json Datastore.NoSync true
    • this disables a few fsync calls that our storage engine makes. If your system crashes during an add you will have fewer guarantees that data entered into ipfs is still there (the hashes printed to your screen might not be fully synced to disk). On the other hand, it greatly speeds up the add process.
  • After starting the add, in another terminal, run: curl localhost:5001/debug/pprof/profile > ipfs.cpuprof
    • this will hang for 30 seconds collecting profiling data. If you can provide me with that file and a copy of the running ipfs binary, it will let me more easily investigate what it's working on.
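Put together, the whole session might look something like this (the nohup daemon invocation and the <root-hash> are just placeholders; run the config command before starting the daemon so it picks up the change):

export IPFS_REUSEPORT=false
ipfs config --json Datastore.NoSync true
nohup ipfs daemon &
# once the daemon is up, start the big pin:
ipfs pin add -r <root-hash>
# meanwhile, in another terminal, grab ~30 seconds of CPU profile:
curl localhost:5001/debug/pprof/profile > ipfs.cpuprof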

sdockray (Author) commented:

@whyrusleeping

The full log is here: https://hostb.org/AW
ipfs.cpuprof is here: https://hostb.org/E5
The ipfs binary: https://hostb.org/N1

I haven't tried changing the Datastore.NoSync config item yet. I have set the IPFS_REUSEPORT env var.

whyrusleeping commented Apr 27, 2016

I'm not sure what the cause of your specific failure is, but it looks like there's an issue in the bitswap provider connector. It's basically not working fast enough, and that's causing a buildup of extra goroutines all the way back up the line. I have an open PR (ipfs/kubo#2437) that improves that codepath, so I will put effort into landing it sooner and report back.

spinda commented Jul 28, 2016

@sdockray

It looks like @whyrusleeping managed to land ipfs/kubo#2437. Does it help with your issue? I'm also interested in pinning large directories.

sdockray (Author) commented:

@spinda - I tried with version 0.4.2 and got the same behavior, but maybe the fix is in 0.4.3-rc? I can't see the change specifically cited in the changelog, but it does mention that it "improves the performance of Storage, Bitswap, as well as Content and Peer Routing", so maybe?

In the next few days I'll try that version and report back.

Kubuxu commented Jul 29, 2016

The major improvements are in 0.4.3-rc1.

sdockray (Author) commented Jul 29, 2016

Thanks for the info; I will try to get it updated soon.

% ipfs update --verbose install v0.4.3-rc1
fetching go-ipfs version v0.4.3-rc1
  - using GOOS=linux and GOARCH=amd64
  - fetching "/ipns/dist.ipfs.io/go-ipfs/v0.4.3-rc1/go-ipfs_v0.4.3-rc1_linux-amd64.tar.gz"
  - using local ipfs daemon for transfer
  - writing to /tmp/ipfs-update776635258/go-ipfs_v0.4.3-rc1_linux-amd64.tar.gz
  - extracting binary to tempdir:  /tmp/ipfs-update032086215/ipfs-new
binary downloaded, verifying...
  - running init in '/dir/ipfs/update-staging/test495205521' with new binary
  - checking new binary outputs correct version
install failed, reverting changes...
ERROR: version didnt match 

Kubuxu commented Jul 29, 2016

That's a huge mistake on our part. We didn't bump the go-ipfs version in the binary from 0.4.3-dev to 0.4.3-rc1. For now you can try updating manually from https://dist.ipfs.io

cc @whyrusleeping

sdockray (Author) commented:

FYI, I downloaded the RC manually and:

% tar xvfz go-ipfs_v0.4.3-rc1_linux-amd64.tar.gz
% mv go-ipfs/ipfs /dir/work/bin/
% ipfs version
ipfs version 0.4.3-dev

I will just work with the dev version for now.

Kubuxu commented Jul 30, 2016

It really is rc1; we just made a mistake and didn't update the name of the version in the code.

sdockray (Author) commented Aug 2, 2016

I've re-added everything and then tried to pin the root directory again, and unfortunately it is still failing:

ipfs pin add QmevLmheGTxdZSQ9Vwwnb2SkY4n9SNFiznYWjRKY5z3KEK > ipfs_pin.log &
... after 30-60 minutes ...
Error: Post http://127.0.0.1:5001/api/v0/pin/add?arg=QmevLmheGTxdZSQ9Vwwnb2SkY4n9SNFiznYWjRKY5z3KEK&encoding=json&stream-channels=true: EOF

whyrusleeping commented:

@sdockray when that failure happens, does the daemon print anything?

sdockray (Author) commented Aug 5, 2016

@whyrusleeping: No, the daemon exits without outputting anything.

sdockray (Author) commented Aug 5, 2016

As a general question, what is the largest directory anyone has pinned? Or would it have more to do with the number of items to be pinned (during recursion)? If you have about 1/2 TB free you can test with the hash above (although it seems unlikely that you'll actually pull more than a few GB before it fails).

whyrusleeping commented:

@lgierth recently pinned the CCC archives, which I think was over 2 TB.

ghost commented Dec 9, 2016

3.7 TB to be precise, but that was a plain add with the default of --pin=true, as opposed to a fetch + pin.
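That is, roughly the difference between the following two commands (the path and hash are placeholders):

ipfs add -r /path/to/archive     # data is already local and gets pinned as it is added (--pin defaults to true)
ipfs pin add -r <root-hash>      # data first has to be fetched from the network, then pinned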

whyrusleeping commented:

I think the problem here is merkledag.FetchGraph; it has unbounded concurrency IIRC. I'll push a quick fix for this...

xloem commented Jan 9, 2017

I'm still having this issue with 0.4.5-dev. When trying to pin a large folder, ipfs uses up over 3 GB of RAM and is then killed by the oom_killer.

xloem commented Jan 9, 2017

I guess this is pretty much ipfs/kubo#3318.

xloem commented Jan 9, 2017

I gave my system a ton of virtual RAM and let my pin run overnight. It's still working on the pin! And using 8.7 GB of RAM. I took the profiling stats, which are at https://gateway.ipfs.io/ipfs//QmZmseT7n9MptemPcxkuWh9FeAPK7rPeTNMDiPQWTft4Qo . Writing them out was very slow. I had to forcibly kill and reboot the daemon to complete the add.

xloem commented Jan 9, 2017

I don't know Go, but it seems notable that there are over 4500 goroutines in ipfs.stack that were created by gx/ipfs/QmVwFjMdejJ8mGVmgyR2mKcUHrvNBDtDsKRT99soVbkFhA/go-peerstream.(*Swarm).notifyAll. That doesn't quite sound like normal network app behavior.
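For anyone wanting to reproduce that count on their own node, assuming the daemon exposes the standard Go net/http/pprof endpoints on the API port (the /debug/pprof/profile endpoint used earlier in this thread suggests it does), a rough tally of goroutines grouped by the function that created them is:

# dump full goroutine stacks and count them by their "created by" frame
curl -s 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2' \
  | grep '^created by ' \
  | sort | uniq -c | sort -rn | head -20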
