Splitting the monolithic server into multiple microservices #506
Comments
I've become increasingly a fan of the first alternative of having a PyPI library, even though I was also hesitant about the friction it introduced. In our case, I've separated the data processing into its own library (https://github.com/TRIP-Lab/itinerum-tripkit) and reference it when needed. That has allowed me both to offer it as a CLI tool and to pull it into services as needed. With the ease of pinning, it has also given me more confidence in using it in new projects I hadn't anticipated, or in projects that have a short shelf-life now but that I want to keep working for future reference. That said, I've been very careful about keeping large dependencies low. One example I've wanted to emulate is Sentry.io's deprecated raven client. It's probably my own experience with Git, but I'd rather deal with resolving an errant pip dependency than with tracking down an issue with Git submodules. I prefer Git submodules in the case of some monolithic repo where I'm truly just gluing together complete components at a very high level.
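To make the pinning point concrete, here is roughly what it looks like for a downstream service once the processing code is on PyPI; the service name and version pin below are purely illustrative, not a real release.

```python
# setup.py for a hypothetical downstream service that reuses the library.
from setuptools import setup

setup(
    name="my-analysis-service",   # hypothetical service name
    version="0.1.0",
    # Pin the shared processing library to an exact, known-good release so the
    # service keeps working even as the library continues to evolve.
    install_requires=["itinerum-tripkit==0.1.2"],  # version is illustrative
)
```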
@kafitz thanks for the feedback! For the record, the raven documentation is here. Let me see if I can figure out how they accomplished that.
Here's the answer. I am not sure we will need this, but it is good to know how to set it up.
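I have not dug into exactly how raven structured this, but a common way for a Python library to keep heavy dependencies optional is setuptools extras, where large integrations are only installed when explicitly requested. A minimal sketch, with illustrative package and extras names:

```python
# setup.py sketch: optional heavy dependencies via setuptools "extras".
from setuptools import setup, find_packages

setup(
    name="emission-core",            # hypothetical package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["requests"],   # small, always-needed dependencies
    extras_require={
        # Heavy integrations are only pulled in when explicitly requested,
        # e.g. `pip install emission-core[analysis]`
        "analysis": ["scikit-learn", "geopandas"],
    },
)
```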
@kafitz good points about the pip install. After I wrote out the options, I realized that in terms of modularity, we don't actually want other services to add new data models to the core. The core data models are the ones for the incoming data and the basic trip diary. Any new service should have its own data models in its own module. There might be a way for the new service to register its data model with the core module at runtime for greater interoperability; I need to think through this some more.
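For what it's worth, one possible shape of such a runtime registration is sketched below. This is not the actual e-mission API; the module, function, and model names are hypothetical.

```python
# core/registry.py: hypothetical sketch of a runtime data-model registry.
# The core keeps a mapping from wrapper keys to wrapper classes, so a new
# service can expose its own models without modifying the core code.
_WRAPPER_REGISTRY = {}

def register_wrapper(key, wrapper_cls):
    """Called by a service at import time to make its data model visible to the core."""
    if key in _WRAPPER_REGISTRY:
        raise ValueError(f"a wrapper is already registered for {key}")
    _WRAPPER_REGISTRY[key] = wrapper_cls

def get_wrapper(key):
    """Used by the core (and other services) to look up a registered model."""
    return _WRAPPER_REGISTRY[key]


# In a hypothetical new service, at import time:
class BikeshareTrip:
    """Example data model owned by the new service, not by the core."""
    def __init__(self, doc):
        self.doc = doc

register_wrapper("analysis/bikeshare_trip", BikeshareTrip)
```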
I am going to think through this in two separate steps: first, how the shared core modules are currently used, and second, how they can be extended.
I wrote out a big answer to this question, but then I ran out of memory and had to reboot and lost it. Here I go again.

**List of core modules**

The main core modules that are used across multiple services are the data model wrappers and the database/storage calls.
**Current usage**

**Data model wrappers**

The data model wrappers are used as classes to wrap existing information, so it makes more sense to include them as a library. This is a good candidate for pulling out into PyPI, although we need to figure out how it can be extended (see below).

**Database/storage calls**

This makes sense to model as a microservice. The service will accept an encryption key, decrypt the information, and make it available via a standard interface to other services. This makes the database layer very similar to the existing XBOS/SMAP layer, which focuses on efficient storage and data accessibility, and it should make it easier to merge the projects in the future. I believe that the current prototype for the UPC already has the database as a separate service - @njriasan can you confirm?

**Extension**

One of the big differences between e-mission and other platforms is that e-mission is designed to be extensible. This means that people can add new functionality, both on the analysis side and on the sensing side.

**Data model wrappers**

We should note that the server-side data model wrappers only represent the server-side representation. There is also a similar client-side representation, currently in the data collection plugin. Over the long term, we should really restructure this so that the server-side and client-side representations are kept consistent with a single shared definition.
However, this is fairly complicated, and I am afraid of overengineering a solution in the time we have left. Instead, when a user wants to add a new data type, they will add it in the plugin and in the repository for the wrappers. The Python version of the wrappers will be a library, installable via pip.

**Database/storage calls**

I will refactor this officially into a microservice. This should be fairly straightforward, since all access to the database should already go through the defined interfaces. @njriasan any thoughts on this?
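As a very rough sketch of what the storage microservice's interface could look like (the endpoint shape, the Flask choice, and the helper functions below are assumptions, not the actual design):

```python
# storage_service.py: hypothetical sketch of the database/storage microservice.
# It accepts an encryption key with each request, decrypts the stored documents,
# and returns them over a standard HTTP interface to other services.
from flask import Flask, jsonify, request

app = Flask(__name__)

def load_encrypted_docs(user_id, key_name):
    # Placeholder for reading encrypted documents from the underlying datastore.
    return []

def decrypt(doc, encryption_key):
    # Placeholder for the actual decryption step.
    return doc

@app.route("/timeseries/<user_id>", methods=["POST"])
def get_timeseries(user_id):
    body = request.get_json()
    encryption_key = body["encryption_key"]  # supplied by the calling service
    key_name = body["key_name"]              # identifies the requested data stream
    docs = [decrypt(d, encryption_key)
            for d in load_encrypted_docs(user_id, key_name)]
    return jsonify(docs)

if __name__ == "__main__":
    # Listens on an internal port only; other services reach it inside the cluster.
    app.run(port=8081)
```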
Another component that we should really split out is the webapp component, which contains the HTML and JavaScript for the server UI, including the aggregate heatmap. That should really be served completely separately from the API, which makes calls into Python. The obvious fix would be to have two servers - one for the server UI (web tier) and the other for the API layer (app tier). But that would involve having two separate ports, which opens up multiple holes in the firewall and is generally worse from a security perspective. An alternative would be to have one front-facing server that serves up the presentation layer and forwards other connections to the underlying API layer. The front-facing server can include a list of calls that should be forwarded, or it can forward everything that it doesn't handle and let the API layer reject it.
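To illustrate the "forward everything it doesn't handle" variant, here is a minimal sketch of a single front-facing server; the ports, paths, and the Flask + requests choice are assumptions, not the actual implementation.

```python
# gateway.py: sketch of a single front-facing server. It serves the webapp's
# presentation layer directly and forwards everything else to the API tier,
# which listens only on an internal port behind the firewall.
import requests
from flask import Flask, Response, request

# Serve the built webapp assets (HTML/JavaScript) under /static/
app = Flask(__name__, static_folder="webapp/dist", static_url_path="/static")

API_TIER = "http://localhost:8080"  # internal-only; not opened in the firewall

@app.route("/")
def index():
    # Presentation layer: entry point for the server UI, e.g. the aggregate heatmap.
    return app.send_static_file("index.html")

@app.route("/<path:endpoint>", methods=["GET", "POST"])
def forward_to_api(endpoint):
    # Anything the gateway doesn't handle itself is forwarded to the API tier,
    # which can reject calls it does not recognize.
    resp = requests.request(
        method=request.method,
        url=f"{API_TIER}/{endpoint}",
        headers={k: v for k, v in request.headers if k.lower() != "host"},
        data=request.get_data(),
    )
    return Response(resp.content, status=resp.status_code)

if __name__ == "__main__":
    # Only this port needs to be reachable from outside; the API tier stays internal.
    app.run(port=8000)
```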
Although we don't want to get bogged down in this, it is also worthwhile considering how all this will work in the decentralized world. In the decentralized world, each API call (e.g. ...) will be handled by its own microservice. This seems to imply that, of the multiple microservices running in parallel, each will need to listen on a separate port. @njriasan what is the firewall story wrt the cloud cluster? Will we even be able to have a regular firewall, given that each user will have multiple services, each of which will need its own port?
Also, in order to reduce maintenance costs, it would be good if we could standardize on microservices and run them in both decentralized and centralized environments. In the centralized environment, something like https://www.express-gateway.io/ (open sourced at https://github.com/ExpressGateway) seems like a good solution. Of course, since our API proxy is fairly basic, we could just implement something simple in Python if it turns out that switching to the MEAN stack is too complicated.
A related Python project is https://github.com/Miserlou/Zappa, which appears to be a wrapper around the AWS API Gateway and Lambda services. That is great, but I am not sure I want to build in that kind of dependency.
Sorry I didn't see this until just now. The database is currently extracted into a separate docker container and run in conjunction with the server in the UPC architecture. I think a service which interacts with a user's database/storage layer is reasonable. I think the alternative approach could be that there is a central UPC database service, and that the database/storage cells, rather than being fed the private key, send a query to a core UPC component. Similarly, I think you could design the database/storage cells to request the data from the UPC instance rather than assuming some internal docker database (unless that's what you mean). These are just other options to consider, because I think what you listed sounds great. I think the firewall situation is complicated at best, and it's probably not feasible to give each user their own firewall. The exact authentication protocol probably needs to be discussed in more detail, but aside from only making the services addressable from inside the cluster, I'm not sure how we can have a true firewall with dynamic ports.
In case anybody is tracking progress here, I got sidetracked by a conda regression (#511), but I have worked around it now, dockerized the testing infrastructure, and got it to work with GitHub Actions and Travis CI (e-mission/e-mission-server#731). Now that I have testing in place, I can move out the first part, which is the simulation code, and create a dockerized setup with the OTP instance from Helsinki. Onward!
Removed all the old webapp code from the server, pending creation of a separate, modular webapp. This should make the server a lot smaller and ensure that the Dependabot alerts disappear, since all of them were related to the JavaScript code.
The phone native code is already modularized into several plugins. But the server code is currently in one monolithic repository. While this is a simple design to begin with, it is problematic for several reasons:
Further, especially as we move towards the UPC architecture, we really want to have a microservices architecture. Many of the services share common functionality, as in the diagram below.
So the next question is: what are the best tools to split up the architecture? Some choices that I have considered, along with their limitations, are:
@atton16 @jf87 @kafitz @PatGendre @stephhuerre any thoughts on this?