
feature request: use non-MongoDB backend as an option #46

Open · Tagar opened this issue Jul 28, 2015 · 3 comments

Tagar commented Jul 28, 2015

Feature request.

It would be great to have a non-MongoDB backend as an option.

We use Cloudera CDH, so anything that is packaged with it would be an option for us, like HBase, Solr or Hive.

@ryan-williams (Member)

Cool, good idea @Tagar. The big forcing function for using Mongo is that it's what Meteor uses, which is what gives Spree all of its live-updating-ness!

If you just wanted to write Spark events to a database and not use the reactive web app, it would be possible to modify slim, or to make another server similar to it that writes to [DB of your choice] (rough sketch below).
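For concreteness, here is a rough sketch of that idea: a JsonRelay-style listener that JSON-serializes events and hands them straight to whatever store you like, skipping slim entirely. None of this is part of Spree or JsonRelay; the EventSink trait and DbEventForwarder name are invented for illustration, and the package declaration mirrors JsonRelay's (which lives in org.apache.spark) so the sketch can reach Spark's private[spark] JsonProtocol.

// Hypothetical sketch, not part of Spree/JsonRelay.
// Lives in org.apache.spark so it can use the private[spark] JsonProtocol.
package org.apache.spark

import org.apache.spark.scheduler._
import org.apache.spark.util.JsonProtocol
import org.json4s.jackson.JsonMethods.{compact, render}

// Invented abstraction: anything that can persist a JSON blob
// (an HBase table, a Solr collection, a Hive-backed log, ...).
trait EventSink {
  def write(json: String): Unit
}

// Forwards a handful of event types to the sink; extend with whichever
// SparkListener callbacks you care about.
class DbEventForwarder(sink: EventSink) extends SparkListener {
  private def forward(event: SparkListenerEvent): Unit =
    sink.write(compact(render(JsonProtocol.sparkEventToJson(event))))

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = forward(jobStart)
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = forward(jobEnd)
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = forward(stageCompleted)
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = forward(taskEnd)
}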

Otherwise, you'll have to wait for Meteor's fabled release of a SQL engine (later this year, I think I heard?) if you want Spree's live-updating bits to work on data stored in not-Mongo.

Let me know which of those options more closely matches what you're after.


Tagar commented Jul 31, 2015

Yep, it seems Meteor's ability to work with not-Mongo is what we're after.
Great visualizations - I hope to see them as part of core Spark or as a Spark package.
Thanks!

@ryan-williams (Member)

There are a few points relevant to this discussion that I'd like to dig into further:

Mongo

In my deployment of Spree, I just run meteor and it manages the Mongo instance for me. Even if CDH was managing Mongo on your cluster, or you were pointing Meteor at a CDH-managed HBase, you'd still need to run the Meteor server process (as well as Slim) somewhere, right? I'm missing why you can't just run meteor and slim somewhere that your driver can make network requests to, and let Meteor and its Mongo do their thing.

Related, see #44 about providing a DigitalOcean droplet that will fire up a Slim and Spree somewhere that your driver can send events to; such a setup would work for any Spark/cluster configuration, no?

Spark Package

You mentioned getting these as a Spark package. As it happens, JsonRelay (the SparkListener used by Spree) is a Spark package. However, in testing it I discovered that the spark.extraListeners flag, which JsonRelay uses to register itself, doesn't work with classes that live in Spark packages, due to a bug in Spark: the driver doesn't get the package's classes onto its classpath correctly when it is instantiating listeners.

This bug exists in Spark <= 1.4.1 but will be fixed in 1.5.0. Until then, the listener can't really be used as a package, which is why I've refrained from mentioning it in any of the Spree docs.

Once 1.5.0 is out, registering JsonRelay with the driver will be as simple as:

--packages hammerlab:spark-json-relay:1.0.0 
--conf spark.extraListeners=org.apache.spark.JsonRelay

whereas today you have to download the JsonRelay JAR and pass that to Spark:

--driver-class-path /path/to/spark-json-relay-1.0.0.jar
--conf spark.extraListeners=org.apache.spark.JsonRelay

Anyway, none of that helps to run Slim or Spree as a Spark package, which it sounds like you were requesting. Given that they're both Node servers, it's hard for me to see how that would be possible. Additionally, even if they were Java/Scala servers, I don't have a clear picture of how a Spark package could run a server inside the driver; maybe, after registering itself as a SparkListener, and with no formal life-cycle management, it would just start the server at construction time and never tear it down (roughly sketched below)?
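To make that last idea concrete, here is a purely hypothetical sketch (not something Spree or JsonRelay does today): a listener that spins up an embedded HTTP server when it is constructed. The SparkListener API has no formal teardown hook; onApplicationEnd is about the closest thing, so the sketch uses it for a best-effort stop. The class name and port are made up.

import java.net.InetSocketAddress
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Hypothetical: a SparkListener that embeds a tiny HTTP server in the driver.
// The server is started at construction time, since listeners have no formal
// life-cycle; onApplicationEnd serves as a best-effort teardown signal.
class EmbeddedServerListener extends SparkListener {
  private val server = HttpServer.create(new InetSocketAddress(8123), 0) // arbitrary port for illustration

  server.createContext("/", new HttpHandler {
    override def handle(exchange: HttpExchange): Unit = {
      val body = "hello from the driver".getBytes("UTF-8")
      exchange.sendResponseHeaders(200, body.length)
      exchange.getResponseBody.write(body)
      exchange.close()
    }
  })
  server.start() // serves requests on a background thread inside the driver JVM

  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit =
    server.stop(0)
}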

I'll have to think about that; let me know if you feel like that is possible / a desirable way to architect things, if you were thinking something else, etc.

One more relevant point: as I mentioned in my email to the dev list, pushing some of this work out to separate processes (Slim and Spree), while it incurs some maintenance overhead, seems more like a feature than a bug to me, since the alternative is to bog down the driver with work that isn't core to what it's supposed to be doing.

Curious to hear any thoughts you have about these topics!
