Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scalability meta issue #121

Open
4 of 16 tasks
luceos opened this issue Mar 22, 2021 · 13 comments
Open
4 of 16 tasks

Scalability meta issue #121

luceos opened this issue Mar 22, 2021 · 13 comments

Comments

@luceos
Copy link
Member

luceos commented Mar 22, 2021

This issue is meant to provide actionable tasks to improve scalability in specialised, highly volatile hosting environments.

  • The renderer is stored and cached to local disk. TextFormatter cache does not change drivers #154
  • the users.avatar_url string length is too short disallowing the use of CDN's
  • migrations table has no auto increment primary key ID on some DB's this causes errors and fails migration completely, I also remember faintly some db tools dislike this and remove checkboxes (phpmyadmin?)
  • The DatabaseSessionHandler requires knowledge of the actor. As user nor request isn't available from the container one cannot override the session.handler easily; right now fixed by binding the actor into the container.
  • Overriding any of the compilers (because you want to override the revision generation), requires you to override the Assets class which you cannot override easily. It is possible using extend but then you also need to hit the original FrontendServiceProvider to retrieve addBaseCss twice (css, localeCss).
  • logoUrl and favIconUrl both need to start using a similar implementation like the avatar_url of User. Right now the only way to override these is by mutating the attributes in the ForumSerializer.
  • It would be helpful to be able to add migrations from the local extend.php. Right now my default go to is to create a custom command that replicates our own migration command, but with a different path.
  • cache clearing won't work across nodes
  • posts.number calculated on php backend PDOException : Duplicate entry '.....' for key 'posts_discussion_id_number_unique' framework#3350
  • locales catalogue is stored on disk to storage/locale, on scaled nodes this file isn't available and thus references are not loaded
  • queue jobs $queue property is ignored, only setting the queue name through dispatching works
  • the DatabaseSettingsRepository is wrapped inside a MemoryCacheSettingsRepository that keeps settings in memory, it would be better to inject the configured cache and use that instead of the static property
  • the queue daemon/worker does not have the cache store injected and as such cannot restart based on the RestartCommand, which also isn't yet part of Flarum
  • last_used/last_seen updated on all API requests framework#3231
  • AccessToken DB updates duplicated when 'Remember me' is selected framework#3234
  • The js (forum, admin and locale)/css are stored using something-<hash>.filetype, when invalidating cache on one node and that node stores a new version into a CDN, users on other nodes will require to reload the page to retrieve the latest rev-manifest to load the right files. In the meantime their pages might break. A solution is to use the revision/hash as query param thus auto invalidating client-side cached files, but also providing a fallback in case the revision is incorrect.
@askvortsov1
Copy link
Member

askvortsov1 commented Mar 22, 2021

the users.avatar_url string length is too short disallowing the use of CDN's

Wouldn't we want this to be the path relative to the CDN's root? That would give more flexibility in changing avatar filesystems.

On the other hand, storing the full URL would allow distributing assets across several filesystem backends.

migrations table has no auto increment primary key ID on some DB's this causes errors and fails migration completely, I also remember faintly some db tools dislike this and remove checkboxes (phpmyadmin?)

Yeah, we should add this. I suppose we'll want to do something like what Clark did in flarum/framework#2651. Luckily I don't think we'll ever have tens of thousands of migrations, so that should make life quite a bit easier.

The DatabaseSessionHandler requires knowledge of the actor. As user nor request isn't available from the container one cannot override the session.handler easily; right now fixed by binding the actor into the container.

I don't think we'll be able to use Laravel's default provided one, that assumes way too much Laravel stuff is present. That being said, I don't know why we wouldn't be able to make our own version. I'm also not sure why it would require the user globally: the only places where it seems to be used in core are the CollectGarbage and StartSession middlewares; in both cases, we can pass the current request in through a setRequest method.

A solution is to use the revision/hash as query param thus auto invalidating client-side cached files, but also providing a fallback in case the revision is incorrect.

This seems simple enough, and I don't see any downsides off the top of my head. In favor!

Overriding any of the compilers (because you want to override the revision generation), requires you to override the Assets class which you cannot override easily.

Not sure how we'd want to approach this one; we could pass in a container instance and resolve an abstract string binding (which could be set via service providers)? Not sure I like giving it a container instance though. Why not make the compilers singletons on the service provider level, instead of relying on Assets?

logoUrl and favIconUrl both need to start using a similar implementation like the avatar_url of User. Right now the only way to override these is by mutating the attributes in the ForumSerializer.

If we implement flarum/framework#1783, wouldn't we have the base path of the assets filesystem, regardless if it's stored with us or on S3? I do agree that consistency would be nice though.

posts.number calculated on php backend flarum/framework#3350

What's the alternative?

It would be helpful to be able to add migrations from the local extend.php. Right now my default go to is to create a custom command that replicates our own migration command, but with a different path.

Seems sensible, although I'd prefer to discourage this for non-advanced use.

cache clearing won't work across nodes

I'll need to do a bit more research on distributed cache theory before opining on this one.

@luceos
Copy link
Member Author

luceos commented Mar 23, 2021

I don't think we'll be able to use Laravel's default provided one, that assumes way too much Laravel stuff is present. That being said, I don't know why we wouldn't be able to make our own version. I'm also not sure why it would require the user globally: the only places where it seems to be used in core are the CollectGarbage and StartSession middlewares; in both cases, we can pass the current request in through a setRequest method.

The alternate solution would be to override the StartSession middleware to instantiate the other session handler there and inject the actor in the some run.

The problem is you cannot globally define a new session handler into the ioc because it expects in its own logic to be able to retrieve the actor on whim. As you suggested a custom solution might solve that, but I think that any distance you create between the framework of the components you use has to be deliberate. Because the hurdle of easily porting packages from Laravel will only become greater this way. Horizon is a great example of that.

Not sure how we'd want to approach this one; we could pass in a container instance and resolve an abstract string binding (which could be set via service providers)? Not sure I like giving it a container instance though. Why not make the compilers singletons on the service provider level, instead of relying on Assets?

Binding the compiler into ioc the same way the other assets related binding works would already help a ton. This allows use of extend and resolving for instance.

If we implement flarum/framework#1783, wouldn't we have the base path of the assets filesystem, regardless if it's stored with us or on S3? I do agree that consistency would be nice though.

I think consistency is an extensible system is very important. We already are all over the place in terms of some implementations.

What's the alternative?

Check the issue ;)

Seems sensible, although I'd prefer to discourage this for non-advanced use.

Maybe something for flarum-cli perhaps? flarum-cli migrate --local which would set up the command in app, update composer.json with the autoload.psr4.app key?

I'll need to do a bit more research on distributed cache theory before opining on this one.

The problem is that clearing a cache from the dashboard or cli executes the ClearingCache event only on that node. As a consequence the other nodes are not flushed. A possible solution would be to remember the last flushed timestamp in the local filesystem cache and periodically check for that while flushing the cache if the timestamp does not match against the database stored one.

Alternatively a way to trigger an endpoint on every node could be an idea too.

Both suggestions would not make sense for core though, but maybe as extension...

@askvortsov1
Copy link
Member

The problem is you cannot globally define a new session handler into the ioc because it expects in its own logic to be able to retrieve the actor on whim. As you suggested a custom solution might solve that, but I think that any distance you create between the framework of the components you use has to be deliberate. Because the hurdle of easily porting packages from Laravel will only become greater this way. Horizon is a great example of that.

I took another look at the DatabaseSessionHandler implementation, and I'm not sure why we wouldn't be able to immediately use it. There are several places where it attempts to read from a guard and a request bound into the container, but it checks that they are actually bound before proceeding:

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Session/DatabaseSessionHandler.php#L224
https://github.com/laravel/framework/blob/8.x/src/Illuminate/Session/DatabaseSessionHandler.php#L199

So these seem to be completely optional. And we don't really need them because we store all this ourselves on the access token level.

I think consistency is an extensible system is very important. We already are all over the place in terms of some implementations.

My attempt to refactor this in flarum/framework#2729 is coming together extremely nicely: URLs to logo/favicon and avatar are being generated from the filesystem driver. Those now work the same way Assets does:

https://github.com/flarum/core/blob/783c563305f5d1f809e2bdc388f9e0cc785d8f4a/src/Frontend/FrontendServiceProvider.php#L29-L29
https://github.com/flarum/core/blob/783c563305f5d1f809e2bdc388f9e0cc785d8f4a/src/Frontend/Compiler/RevisionCompiler.php#L140-L140

Both suggestions would not make sense for core though, but maybe as extension...

We'd just need to make sure it's possible without having to completely replace the skeleton.

@luceos
Copy link
Member Author

luceos commented Mar 24, 2021

Point added:

  • locales catalogue is stored on disk to storage/locale, on scaled nodes this file isn't available and thus references are not loaded

@askvortsov1
Copy link
Member

One potential solution to caching:

The filesystem refactor I linked above unifies and simplifies our disk filesystem use for both internals and extensions. We might want to consider doing a similar thing with cache. Laravel supports many cache drivers out of the box, so we could relatively easily add laravel caching for:

  • locales
  • views
  • Anything else I missed

Formatter already uses a Laravel-based cache, but there's also stuff in storage/formatter. However the fix @luceos has been working on should fix this, so that we don't need to worry about clearing the storage/formatter cache ourselves. We also don't have control over this cache, it's entirely in TextFormatter land.

A system like this would allow one solution for all caching concerns, make it simpler (and documented!) to add caching to new things, and support various cache drivers (file as default, database, memcached, redis, etc) out of the box.

@franzliedke
Copy link

As far as I can tell, sessions are still saved to disk instead of the database. That would be a big problem with scalability. Are you planning to keep this to extension territory, is this part of flarum/framework#2074 or should it be added to this list? 😀

@askvortsov1
Copy link
Member

As far as I can tell, sessions are still saved to disk instead of the database. That would be a big problem with scalability. Are you planning to keep this to extension territory, is this part of flarum/framework#2074 or should it be added to this list?

At least personally, I'd like to see direct use of Laravel service providers (and the accompanying config from https://github.com/flarum/core/blob/12f6b1b3750a0e04286a1535dab8074046413218/src/Foundation/InstalledSite.php#L159-L159) moved completely into Flarum service providers, like flarum/framework#2732 does. Sessions would eventually be a part of that. For now though, I believe it's possible to support a DB driver via custom container bindings and a migration to add the necessary table.

@franzliedke
Copy link

I don't know of any special strengths of the file-based driver that warrant keeping it as default, but yes, it can be exchanged at runtime. Your choice. 👍🏼

@askvortsov1
Copy link
Member

I don't know of any special strengths of the file-based driver that warrant keeping it as default, but yes, it can be exchanged at runtime. Your choice. 👍🏼

Needs more research! Just like everything, now that I think about it. Someone should really get around to inventing a time machine already 🤔

@luceos
Copy link
Member Author

luceos commented Apr 7, 2021

One of the reasons this issue does not include the session driver is because it can indeed be easily replaced. This issue is mostly about the real hardships after having tried to replace or extend.

@luceos
Copy link
Member Author

luceos commented Oct 14, 2021

Coming back to database session handling. When using database session handling all requests will always switch to write connections. This removes the ability to set up read/write connections. For API GET requests (without the update last seen logic) in theory connections could remain read only the whole time, with database session handling this is no longer the case, because we use sessions in our API middleware.

To prevent these write connections, file based sessions are great; but they don't (horizontally) scale unless you mount the sessions directory or add session stickiness to your loadbalancer with its own risks involved.

What would make sense, I think, is to ship Flarum with both session and database session handling. And add a select in the (new) advanced pane which should not be visible by default. Then anyone interested can easily toggle between the two (and invalidate all user sessions 🙈). Alternatively it really isn't too much to have a dedicated extension become a drop-in replacement for file based session handling.

@tankerkiller125
Copy link

I think the whole point of the advanced panel was to add in things for scalability, in which I think database sessions fits quite neatly.

@askvortsov1
Copy link
Member

I think the whole point of the advanced panel was to add in things for scalability, in which I think database sessions fits quite neatly.

Well not just for scalability, but for minor tweaks of various constant values / config. I'd like to avoid becoming Discourse, so we should still use use-case-designed settings UIs when possible, but there should be a place to put all the constants / minor things / assorted knobs that should be tweakable.

@davwheat davwheat pinned this issue Dec 27, 2021
@askvortsov1 askvortsov1 transferred this issue from flarum/framework Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants