Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make npm api requests in parallel #181

Open
Mic92 opened this issue May 9, 2020 · 13 comments
Open

Make npm api requests in parallel #181

Mic92 opened this issue May 9, 2020 · 13 comments

Comments

@Mic92
Copy link
Contributor

Mic92 commented May 9, 2020

It seems that node2nix currently performs blocking requests:

diff --git a/lib/sources/NPMRegistrySource.js b/lib/sources/NPMRegistrySource.js
index 57d99be..81c627f 100644
--- a/lib/sources/NPMRegistrySource.js
+++ b/lib/sources/NPMRegistrySource.js
@@ -79,7 +79,9 @@ NPMRegistrySource.prototype.fetch = function(callback) {
               };
             }
 
+            console.log(`start downloading ${url}`)
             client.get(url, clientParams, function(err, data, raw, res) {
+                console.log(`finished downloading ${url}`)
                 if(err) {
                     callback(err);
                 } else if(data == undefined || data.versions === undefined) {
$ ./result/bin/node2nix
fetching local directory: ./. from .
start downloading https://registry.npmjs.org/optparse
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/optparse
http 200 https://registry.npmjs.org/optparse
finished downloading https://registry.npmjs.org/optparse
start downloading https://registry.npmjs.org/semver
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/semver
http 200 https://registry.npmjs.org/semver
finished downloading https://registry.npmjs.org/semver
start downloading https://registry.npmjs.org/npm-registry-client
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/npm-registry-client
http 200 https://registry.npmjs.org/npm-registry-client
finished downloading https://registry.npmjs.org/npm-registry-client
start downloading https://registry.npmjs.org/npmconf
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/npmconf
http 200 https://registry.npmjs.org/npmconf
finished downloading https://registry.npmjs.org/npmconf
start downloading https://registry.npmjs.org/tar
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/tar
http 200 https://registry.npmjs.org/tar
finished downloading https://registry.npmjs.org/tar
start downloading https://registry.npmjs.org/temp
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/temp
http 200 https://registry.npmjs.org/temp
finished downloading https://registry.npmjs.org/temp
start downloading https://registry.npmjs.org/fs.extra
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/fs.extra
http 200 https://registry.npmjs.org/fs.extra
finished downloading https://registry.npmjs.org/fs.extra
start downloading https://registry.npmjs.org/findit

Maybe you can explain why this happens as you also seem to maintain the asynchronous abstraction behind it: https://github.com/svanderburg/slasp

@Mic92
Copy link
Contributor Author

Mic92 commented May 14, 2020

Would agree to rewrite node2nix to use async/await builtins instead of slasp?

@svanderburg
Copy link
Owner

It is not necessarily blocking, but it currently only performs one requests at the time and was not made to do any downloads a parallel.

It could be done with promises/async/await, but there also other ways.

I have not done any experiments yet, but having the ability to have multiple downloads in parallel could be beneficial to the generation speed.

@Mic92
Copy link
Contributor Author

Mic92 commented May 14, 2020

Is the api client that only performs one request at the time? The rest looks like it should be asynchronous.

@raboof
Copy link

raboof commented Jun 10, 2020

nixpkgs generate.sh takes quite a while, and seems like it could potentially benefit from this. Do you have a particular direction in mind how you would like to see this implemented?

@svanderburg
Copy link
Owner

@raboof Although having parallel downloads might shave off a bit of processing time, I believe what we really want is the ability to partially regenerate the expression for only the package that needs to be added or changed.

I'm currently investigating options to make that possible.

Not this feature is not useful, but what I expect to happen is that if we make the regeneration process faster, people will add more packages eventually nullifying its effect.

With partial regenerations, we can have faster updates, and better commits on a per package basis.

@Mic92
Copy link
Contributor Author

Mic92 commented Jun 11, 2020

@raboof Although having parallel downloads might shave off a bit of processing time, I believe what we really want is the ability to partially regenerate the expression for only the package that needs to be added or changed.

That would be really great. Also to enable backports

@raboof
Copy link

raboof commented Jun 28, 2020

With partial regenerations, we can have faster updates, and better commits on a per package basis.

That sounds great. I didn't see an issue for this yet, so I filed #192.

@oxalica
Copy link

oxalica commented Sep 21, 2020

I tried to replace slasp.fromEach with a parallel version. (See repo)

It has speed up fetching about 2x in good network environment but makes the output non-deterministic, which are differences like A-0.2, B // { A-0.1 }, C vs A-0.1, B, C // { A-0.2 } while A is a common transparent dependency. This is related to the order of package traversal, since we currently choose the first version seen to be in outer layer and mark other version as override.
Related code:

var parentDependency = self.findMatchingProvidedDependencyByParent(dependencyName, versionSpec);

@Mic92
Copy link
Contributor Author

Mic92 commented Sep 21, 2020

Is there a way to force newest versions instead of the first version in the outer layer, assuming that newer versions are better? That should make it deterministic as well.

@oxalica
Copy link

oxalica commented Sep 21, 2020

@Mic92

Is there a way to force newest versions instead of the first version in the outer layer, assuming that newer versions are better?

I think the order affects not only the output layout, but also the version selection result.
Consider that A requires C@^1.0 and B requires C@=1.5, assuming C has version 1.5 and 1.9. Resolving order A, B results in two package, C@1.9 for A and C@1.5 for B, while resolving order B, A results in a single package C@1.5 for both A and B.

The version resolution algorithm seems to be quite naive now.

@Mic92
Copy link
Contributor Author

Mic92 commented Sep 21, 2020

@Mic92

Is there a way to force newest versions instead of the first version in the outer layer, assuming that newer versions are better?

I think the order affects not only the output layout, but also the version selection result.
Consider that A requires C@^1.0 and B requires C@=1.5, assuming C has version 1.5 and 1.9. Resolving order A, B results in two package, C@1.9 for A and C@1.5 for B, while resolving order B, A results in a single package C@1.5 for both A and B.

The version resolution algorithm seems to be quite naive now.

Also there determinism could be achieved by going for the newest possible version for every constraint, but I am not sure if this is what the users want.

@svanderburg
Copy link
Owner

@Mic92 Always picking the newest version is not going to work -- the fact that this does not always happen is intentional. If we would choose to alter this behaviour, then compatibility with certain packages is most likely going to break.

Sadly, NPM always tries to "reuse" dependencies that fit within a certain version range in a node_modules/ folders that reside in the parent directories. For certain packages, this means that a version range will resolve to an older version than the latest. I don't like this behaviour either, but it is how NPM works and instrumental to ensure compatibility.

I had several commercial projects that would fail to deploy if I would not replicate this very odd and illogical behaviour. To discover this, I did quite a few intensive comparisons between node2nix and the vanilla NPM.

Furthermore, this is also NPM's means to break out of cyclic dependencies. If we always resolve to the latest version, then certain deployments will remain stuck in an infinite loop.

@oxalica Thanks for trying to optimize this. Although we have to give it some more thought, it is good to hear that there is a speed up possible.

@LamprosPitsillos
Copy link

Sorry to bump this old issue , but has there been any advancements in solving this issue ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants