Cassandra integration #66
Conversation
…y to track progress in all partitions
```
// This is for DFSJarStore
"${PROG_HOME}/lib/yarn/*"
// "${PROG_HOME}/lib/yarn/*"
```
Had to comment this out to avoid runtime issues. When I try to submit an example job to Gearpump I get `java.lang.NoSuchMethodError: com.google.common.util.concurrent.Futures.withFallback(`. I believe this happens because Gearpump pulls in com.google.guava:guava version 11.0.2 from its Hadoop dependencies, but the Cassandra Java driver I am using needs version 16.0.1. I still need to figure out a solution to this.
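One possible workaround (a sketch, not a tested fix for this PR) is to shade the newer Guava with sbt-assembly, so that the relocated copy used by the Cassandra driver can coexist with Hadoop's 11.0.2. The shaded package name below is arbitrary, and this assumes the sbt-assembly plugin is on the build:

```scala
// Hypothetical build.sbt fragment: relocate Guava classes so the Cassandra
// driver's Guava 16.0.1 does not clash with Hadoop's Guava 11.0.2.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1")
    .inLibrary("com.google.guava" % "guava" % "16.0.1")
    .inProject
)
```

Alternatively, a `dependencyOverrides` entry forcing a single Guava version might work if one version satisfies both Hadoop and the driver, but that would need testing.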
I would appreciate help here, as I may not fully understand what my changes could cause elsewhere.
I'll fix this once 0.8.1 is out. Sorry, we may need to hold this for a while.
@zapletal-martin Thanks for your contribution. I'll pull your branch and try playing with it.
… more classdefnot found version issues.
Cassandra database integration
Reuses some Spark-Cassandra connector files and follows its approach. The intent is to allow the connector to be reused once versions for other processing systems become available. The Source looks up the token ranges of the desired table, splits them into independent sets of partitions, and assigns those sets to the available source tasks, allowing very good parallelism. All data fetches except the first are asynchronous. The Sink can be trivially parallelised by the user by assigning different writes to different tasks.
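The range-to-task assignment described above could be sketched like this; `TokenRange` and `assign` are illustrative names for this sketch, not the PR's actual API:

```scala
// Simplified stand-in for the Cassandra driver's token range type.
case class TokenRange(start: Long, end: Long)

object RangeAssignment {
  // Distribute token ranges round-robin over `taskCount` source tasks, so each
  // task owns an independent set of partitions it can scan in parallel.
  def assign(ranges: Seq[TokenRange], taskCount: Int): Map[Int, Seq[TokenRange]] =
    ranges.zipWithIndex
      .groupBy { case (_, idx) => idx % taskCount }
      .map { case (task, pairs) => task -> pairs.map(_._1) }
}
```

With 10 ranges and 3 tasks, task 0 would own ranges 0, 3, 6 and 9, and no range is assigned to more than one task.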
The Source scans a current snapshot of the table and does not currently honour updates (so it is not a continuous stream). The Source is also not time-replayable. There are options for handling both of these, but they must be properly thought through. The test coverage is poor at the moment, but this first attempt will allow iterating on the code, improving it continuously, and adding features.