Databricks spark #3

Open: wants to merge 33 commits into base: master

33 commits
adad5ea
[maven-release-plugin] prepare for next development iteration
elahrvivaz Apr 20, 2018
6911d7c
GEOMESA-2262 Add gs:// to list of remote prefixes (#1939)
elahrvivaz Apr 23, 2018
10d14a1
GEOMESA-2271 HBase - support query timeout for coprocessor calls (#1943)
elahrvivaz Apr 27, 2018
9eb42fb
GEOMESA-2267 Spark dataframe .show or .take throws an exception with …
atallahhezbor Apr 30, 2018
b9dff19
GEOMESA-2268 Adding info on gpg signatures, fixing javadoc comments r…
elahrvivaz May 2, 2018
3210be7
GEOMESA-2273 Fixing push-down spark sql date predicates (#1944)
elahrvivaz May 2, 2018
28c7f89
GEOMESA-2275 Disable speculative execution during ingestion (#1947)
elahrvivaz May 3, 2018
1e79399
GEOMESA-2282 Fix GTD converter date format (#1950)
aheyne May 7, 2018
64433db
GEOMESA-2281 SparkSQL - handling push-down casts (#1951)
elahrvivaz May 7, 2018
125e720
GEOMESA-2280 Fixing unbounded secondary date attribute ranges (#1949)
elahrvivaz May 7, 2018
bb9cb90
GEOMESA-2278 Docs - document non-Accumulo spatial RDD providers (#1952)
elahrvivaz May 9, 2018
cbfef8a
GEOMESA-2277 Docs - version compatibility (#1953)
elahrvivaz May 10, 2018
bbf7ed8
GEOMESA-2176 Create python conversion for JTS Geometry instances (#1936)
aheyne May 11, 2018
89cfd80
Updating release version
elahrvivaz May 11, 2018
f91d753
[maven-release-plugin] prepare release geomesa_2.11-2.0.1
elahrvivaz May 11, 2018
62a37af
[maven-release-plugin] prepare for next development iteration
elahrvivaz May 11, 2018
32e0a4b
GEOMESA-2241 HBase Spark queries can drop part of FID (#1958)
elahrvivaz May 21, 2018
ee1b259
GEOMESA-1730,GEOMESA-1479 GPX tutorial shouldn't depend on external c…
elahrvivaz May 21, 2018
5fb333f
GEOMESA-2133 JsonPath converter treats missing elements as null (#1963)
elahrvivaz May 21, 2018
6e6a559
GEOMESA-2225 Fixing converter parseMap example in docs, adding unit t…
elahrvivaz May 21, 2018
c7a4c3f
GEOMESA-2274 Supporting query reprojections (#1955)
elahrvivaz May 21, 2018
e8a9013
GEOMESA-2212 JSON list converter - handle nulls (#1960)
elahrvivaz May 21, 2018
88435a4
GEOMESA-2289 Change Recommended Accumulo/Hadoop versions (#1968)
jahhulbert-ccri Jun 7, 2018
a52cc29
GEOMESA-2291 CQEngine - handle filter functions, etc (#1969)
jnh5y Jun 7, 2018
1007ea4
GEOMESA-2292 Adding handling for 'attribute is not null' in stats-bas…
elahrvivaz Jun 7, 2018
e999549
GEOMESA-2293 Fixing query planning import statement in docs (#1971)
elahrvivaz Jun 7, 2018
430490d
GEOMESA-2294 Update ORC version to 1.4.4 (#1972)
elahrvivaz Jun 11, 2018
77765c9
Updating release version in README
elahrvivaz Jun 11, 2018
793e18b
GEOMESA-2299 Fixing Bigtable SpatialRDDProvider (#1974)
elahrvivaz Jun 11, 2018
4e59c09
[maven-release-plugin] prepare release geomesa_2.11-2.0.2
elahrvivaz Jun 11, 2018
e24ee22
Reverted scala logging and caffeine
aheyne Jul 25, 2018
46d18e9
Spark HBase connection caching
Jul 25, 2018
7c660e3
Fixing build
aheyne Jul 27, 2018
2 changes: 1 addition & 1 deletion .travisbuild.sh
@@ -21,7 +21,7 @@ if [[ $RESULT -ne 0 ]]; then
echo -e "[ERROR] Build failed!\n"
else
# now run tests - using the maven executable, as zinc uses too much memory
mvn -o surefire:test -DargLine="-Duser.timezone=UTC -Xmx4g -XX:-UseGCOverheadLimit -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Dgeomesa.scan.ranges.target=500" 2>&1 | tee -a $BUILD_OUTPUT | grep -e 'Building GeoMesa' -e '\(maven-surefire-plugin\|maven-jar-plugin\|scala-maven-plugin.*:compile\)'
mvn surefire:test -DargLine="-Duser.timezone=UTC -Xmx4g -XX:-UseGCOverheadLimit -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Dgeomesa.scan.ranges.target=500" 2>&1 | tee -a $BUILD_OUTPUT | grep -e 'Building GeoMesa' -e '\(maven-surefire-plugin\|maven-jar-plugin\|scala-maven-plugin.*:compile\)'
RESULT=${PIPESTATUS[0]} # capture the status of the maven build

# dump out the end of the build log, to show success or errors
42 changes: 32 additions & 10 deletions README.md
@@ -35,20 +35,42 @@ geospatial analytics.

## Downloads

**Current release: 2.0.0**
**Current release: 2.0.2**

    
[**HBase**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.0/geomesa-hbase_2.11-2.0.0-bin.tar.gz) |
[**Accumulo**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.0/geomesa-accumulo_2.11-2.0.0-bin.tar.gz) |
[**Cassandra**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.0/geomesa-cassandra_2.11-2.0.0-bin.tar.gz) |
[**Kafka**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.0/geomesa-kafka_2.11-2.0.0-bin.tar.gz) |
[**FileSystem**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.0/geomesa-fs_2.11-2.0.0-bin.tar.gz) |
[**Source**](https://github.com/locationtech/geomesa/archive/geomesa_2.11-2.0.0.tar.gz) |
[**CheckSums**](https://github.com/locationtech/geomesa/releases/geomesa_2.11-2.0.0)
[**HBase**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.2/geomesa-hbase_2.11-2.0.2-bin.tar.gz) |
[**Accumulo**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.2/geomesa-accumulo_2.11-2.0.2-bin.tar.gz) |
[**Cassandra**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.2/geomesa-cassandra_2.11-2.0.2-bin.tar.gz) |
[**Kafka**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.2/geomesa-kafka_2.11-2.0.2-bin.tar.gz) |
[**FileSystem**](https://github.com/locationtech/geomesa/releases/download/geomesa_2.11-2.0.2/geomesa-fs_2.11-2.0.2-bin.tar.gz) |
[**Source**](https://github.com/locationtech/geomesa/archive/geomesa_2.11-2.0.2.tar.gz) |
[**CheckSums**](https://github.com/locationtech/geomesa/releases/geomesa_2.11-2.0.2)

**Development version: 2.1.0-SNAPSHOT**  
[![Build Status](https://api.travis-ci.org/locationtech/geomesa.svg?branch=master)](https://travis-ci.org/locationtech/geomesa)

### Verifying Downloads

Downloads hosted on GitHub include SHA-256 hashes and gpg signatures (.asc files). To verify a download using gpg,
import the appropriate key:

```bash
$ gpg2 --keyserver hkp://pool.sks-keyservers.net --recv-keys CD24F317
```

Then verify the file:

```bash
$ gpg2 --verify geomesa-accumulo_2.11-2.0.0-bin.tar.gz.asc geomesa-accumulo_2.11-2.0.0-bin.tar.gz
```

The keys currently used for signing are:

| Key ID | Name |
| ------ | ---- |
| `CD24F317` | Emilio Lahr-Vivaz <elahrvivaz(-at-)ccri.com> |
| `1E679A56` | James Hughes <jnh5y(-at-)ccri.com> |
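
As a complement to the gpg check, the SHA-256 hash mentioned above can be compared by hand. The following is a minimal sketch and not part of the GeoMesa release tooling: the `verify_sha256` helper and the stand-in archive are hypothetical, and it assumes the coreutils `sha256sum` command is available (on macOS, `shasum -a 256` is the usual equivalent).

```bash
# Hypothetical stand-in for a downloaded archive, so the sketch runs offline.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
archive="$workdir/example-bin.tar.gz"
printf 'example archive contents\n' > "$archive"

# Assumption: in practice this value is copied from the release page.
# It is computed locally here only to keep the example self-contained.
expected=$(sha256sum "$archive" | cut -d' ' -f1)

verify_sha256() {
  # $1 = file to check, $2 = expected hex digest
  actual=$(sha256sum "$1" | cut -d' ' -f1)
  [ "$actual" = "$2" ]
}

if verify_sha256 "$archive" "$expected"; then
  echo "checksum OK"
else
  echo "checksum MISMATCH" >&2
fi
```

With a real download, `expected` would be the hash published alongside the release rather than a locally computed value. A matching hash only shows the file was not corrupted in transit; the gpg signature is what ties it to the signing keys listed above.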

### Upgrading

To upgrade between minor releases of GeoMesa, the versions of all GeoMesa components **must** match.
@@ -87,7 +109,7 @@ and then include the desired `geomesa-*` dependencies:
<dependency>
<groupId>org.locationtech.geomesa</groupId>
<artifactId>geomesa-utils_2.11</artifactId>
<version>2.0.0</version>
<version>2.0.2</version>
</dependency>
...
```
@@ -134,7 +156,7 @@ resolvers ++= Seq(

// Select desired modules
libraryDependencies ++= Seq(
"org.locationtech.geomesa" %% "geomesa-utils" % "2.0.0",
"org.locationtech.geomesa" %% "geomesa-utils" % "2.0.2",
...
)
```
22 changes: 22 additions & 0 deletions build/README.md
@@ -49,6 +49,28 @@ geospatial analytics.
**Development version: ${geomesa.devel.version}** &nbsp;
[![Build Status](https://api.travis-ci.org/locationtech/geomesa.svg?branch=master)](https://travis-ci.org/locationtech/geomesa)

### Verifying Downloads

Downloads hosted on GitHub include SHA-256 hashes and gpg signatures (.asc files). To verify a download using gpg,
import the appropriate key:

```bash
$ gpg2 --keyserver hkp://pool.sks-keyservers.net --recv-keys CD24F317
```

Then verify the file:

```bash
$ gpg2 --verify geomesa-accumulo_2.11-2.0.0-bin.tar.gz.asc geomesa-accumulo_2.11-2.0.0-bin.tar.gz
```

The keys currently used for signing are:

| Key ID | Name |
| ------ | ---- |
| `CD24F317` | Emilio Lahr-Vivaz &lt;elahrvivaz(-at-)ccri.com&gt; |
| `1E679A56` | James Hughes &lt;jnh5y(-at-)ccri.com&gt; |

### Upgrading

To upgrade between minor releases of GeoMesa, the versions of all GeoMesa components **must** match.
5 changes: 2 additions & 3 deletions build/cqs.tsv
@@ -68,7 +68,6 @@ de.topobyte:osm4j-utils 0.0.22 compile
de.topobyte:osm4j-xml 0.0.3 compile
eu.medsea.mimeutil:mime-util 2.1.3 compile
io.airlift:aircompressor 0.8 compile
io.airlift:slice 0.29 compile
io.dropwizard.metrics:metrics-core 3.1.2 compile
io.dropwizard.metrics:metrics-ganglia 3.1.2 compile
io.dropwizard.metrics:metrics-graphite 3.1.2 compile
@@ -148,8 +147,8 @@ org.apache.hive:hive-storage-api 2.2.1 compile
org.apache.htrace:htrace-core 3.1.0-incubating compile
org.apache.metamodel:MetaModel-core 4.3.6 compile
org.apache.metamodel:MetaModel-pojo 4.3.6 compile
org.apache.orc:orc-core 1.4.1 compile
org.apache.orc:orc-mapreduce 1.4.1 compile
org.apache.orc:orc-core 1.4.4 compile
org.apache.orc:orc-mapreduce 1.4.4 compile
org.apache.parquet:parquet-column 1.9.0 compile
org.apache.parquet:parquet-common 1.9.0 compile
org.apache.parquet:parquet-encoding 1.9.0 compile
2 changes: 1 addition & 1 deletion docs/README.md
@@ -27,7 +27,7 @@ You will also need ``make``.

To build HTML versions of the manuals:

$ mvn clean install -Pdocs
$ mvn clean install -Pdocs -pl docs

If you do not have Sphinx installed, the manual will not be built.
The output files are written to the ``target/html`` directory.
6 changes: 3 additions & 3 deletions docs/common.py
@@ -112,17 +112,17 @@

.. |release_1_2_source_tarball| replace:: %(url_github_archive)s/geomesa-%(release_1_2)s.tar.gz

.. |maven_version| replace:: 3.2.2 or later
.. |maven_version| replace:: 3.5.2 or later

.. |geoserver_version| replace:: 2.12.x

.. |geotools_version| replace:: 18.x

.. |accumulo_version| replace:: 1.7 or 1.8
.. |accumulo_version| replace:: 1.9.1 or later

.. |hbase_version| replace:: 1.3.x

.. |hadoop_version| replace:: 2.2 or later
.. |hadoop_version| replace:: 2.6 or later

.. |zookeeper_version| replace:: 3.4.5 or later

2 changes: 1 addition & 1 deletion docs/pom.xml
@@ -12,7 +12,7 @@
<parent>
<groupId>org.locationtech.geomesa</groupId>
<artifactId>geomesa_2.11</artifactId>
<version>2.0.0</version>
<version>2.0.2</version>
</parent>
<modelVersion>4.0.0</modelVersion>

4 changes: 2 additions & 2 deletions docs/tutorials/broadcast-join.rst
@@ -1,5 +1,5 @@
GeoMesa Spark: Aggregating Data
===============================
GeoMesa Spark: Broadcast Join and Aggregation
=============================================

This tutorial will show you how to:

4 changes: 2 additions & 2 deletions docs/tutorials/dwithin-join.rst
@@ -1,5 +1,5 @@
GeoMesa Spark: Aggregating Data
===============================
GeoMesa Spark: Spatial Join and Aggregation
===========================================

This tutorial will show you how to:

12 changes: 6 additions & 6 deletions docs/user/accumulo/configuration.rst
@@ -17,36 +17,36 @@ geomesa.batchwriter.latency

The latency is defined as a duration, e.g. ``60 seconds`` or ``100 millis``. See the `Accumulo API`__ for details.

__ https://accumulo.apache.org/1.8/apidocs/org/apache/accumulo/core/client/BatchWriterConfig.html#setMaxLatency(long,%20java.util.concurrent.TimeUnit)
__ https://accumulo.apache.org/1.9/apidocs/org/apache/accumulo/core/client/BatchWriterConfig.html#setMaxLatency(long,%20java.util.concurrent.TimeUnit)

geomesa.batchwriter.maxthreads
++++++++++++++++++++++++++++++

Determines the max threads used for writing. See the `Accumulo API`__ for details.

__ https://accumulo.apache.org/1.8/apidocs/org/apache/accumulo/core/client/BatchWriterConfig.html#setMaxWriteThreads(int)
__ https://accumulo.apache.org/1.9/apidocs/org/apache/accumulo/core/client/BatchWriterConfig.html#setMaxWriteThreads(int)

geomesa.batchwriter.memory
++++++++++++++++++++++++++

The memory is defined in bytes, e.g. ``10mb`` or ``100kb``. See the `Accumulo API`__ for details.

__ https://accumulo.apache.org/1.8/apidocs/org/apache/accumulo/core/client/BatchWriterConfig.html#setMaxMemory(long)
__ https://accumulo.apache.org/1.9/apidocs/org/apache/accumulo/core/client/BatchWriterConfig.html#setMaxMemory(long)
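
To make the ``10mb`` / ``100kb`` notation concrete, the sketch below converts such a value to a byte count. It is an illustration only: the `to_bytes` helper is hypothetical, and it assumes binary multiples (1 kb = 1024 bytes), which may not match how GeoMesa itself parses the property.

```bash
to_bytes() {
  # $1 = value such as "100kb" or "10mb"; a bare number is taken as bytes
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  number=${value%%[!0-9]*}      # leading digits
  unit=${value#"$number"}       # whatever follows the digits
  [ -n "$number" ] || { echo "missing number: $1" >&2; return 1; }
  case "$unit" in
    ''|b) echo "$number" ;;
    kb)   echo $((number * 1024)) ;;
    mb)   echo $((number * 1024 * 1024)) ;;
    gb)   echo $((number * 1024 * 1024 * 1024)) ;;
    *)    echo "unrecognized unit: $unit" >&2; return 1 ;;
  esac
}

to_bytes 10mb    # prints 10485760
to_bytes 100kb   # prints 102400
```

The resulting number is the kind of ``long`` value that the linked ``setMaxMemory(long)`` method accepts.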

geomesa.batchwriter.timeout.millis
++++++++++++++++++++++++++++++++++

The timeout is defined as a duration, e.g. ``60 seconds`` or ``100 millis``. See the `Accumulo API`__ for details.

__ https://accumulo.apache.org/1.8/apidocs/org/apache/accumulo/core/client/BatchWriterConfig.html#setTimeout(long,%20java.util.concurrent.TimeUnit)
__ https://accumulo.apache.org/1.9/apidocs/org/apache/accumulo/core/client/BatchWriterConfig.html#setTimeout(long,%20java.util.concurrent.TimeUnit)

Map Reduce Input Splits Properties
----------------------------------

The following properties control the number of input splits for a map reduce job. See the
`Accumulo User Manual`__ for details.

__ https://accumulo.apache.org/1.8/accumulo_user_manual#_splitting
__ https://accumulo.apache.org/1.9/accumulo_user_manual#_splitting

geomesa.mapreduce.splits.max
++++++++++++++++++++++++++++
@@ -74,4 +74,4 @@ instance.zookeeper.timeout
The Zookeeper timeout is defined in milliseconds, according to the Accumulo specification. See the
`Accumulo User Manual`__ for details.

__ https://accumulo.apache.org/1.8/accumulo_user_manual.html#_instance_zookeeper_timeout
__ https://accumulo.apache.org/1.9/accumulo_user_manual.html#_instance_zookeeper_timeout
120 changes: 91 additions & 29 deletions docs/user/accumulo/data_management.rst
@@ -78,34 +78,6 @@ Table sharing can be disabled by setting the user data ``geomesa.table.sharing``

See :ref:`set_sft_options` for more details on how to set user data values.

Moving and Migrating Data
-------------------------

If you want an offline copy of your data, or you want to move data between networks, you can
export compressed Avro files containing your simple features. To do this using the command line
tools, use the export command with the ``format`` and ``gzip`` options:

.. code-block:: bash

$ geomesa-accumulo export -c myTable -f mySft --format avro --gzip 6 -o myFeatures.avro

To re-import the data into another environment, you may use the ingest command. Because the Avro file
is self-describing, you do not need to specify any converter config or simple feature type definition:

.. code-block:: bash

$ geomesa-accumulo ingest -c myTable -f mySft myFeatures.avro

If your data is too large for a single file, you may run multiple exports and use CQL
filters to separate your data.

If you prefer to not use Avro files, you may do the same process with delimited text files:

.. code-block:: bash

$ geomesa-accumulo export -c myTable -f mySft --format tsv --gzip 6 -o myFeatures.tsv.gz
$ geomesa-accumulo ingest -c myTable -f mySft myFeatures.tsv.gz

.. _index_upgrades:

Upgrading Existing Indices
@@ -116,7 +88,8 @@ the index format for a given schema is fixed when it is first created. Updating
will provide bug fixes and new features, but will not update existing data to new index formats.

The exact version of an index used for each schema can be read from the ``SimpleFeatureType`` user data,
or by simply examining the name of the index tables created by GeoMesa.
or by simply examining the name of the index tables created by GeoMesa. See below for a description of
current index versions.

Using the GeoMesa command line tools, you can add or update an index to a newer version using ``add-index``.
For example, you could add the XZ3 index to replace the Z3 index for a feature type with non-point geometries.
@@ -125,3 +98,92 @@ only populate features matching a CQL filter (e.g. the last month), or choose to
data. The update is seamless, and clients can continue to query and ingest while it runs.

See :ref:`add_index_command` for more details on the command line tools.
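
As a rough illustration of reading an index version out of schema metadata, the sketch below looks up one index in a comma-separated list of ``name:version:mode`` entries. That layout, the sample values, and the `index_version` helper are assumptions made for the example; check the actual user data of your ``SimpleFeatureType`` for the real key and format.

```bash
# Assumed sample value: comma-separated name:version:mode entries.
indices='z3:5:3,z2:4:3,attr:7:3,id:3:3'

index_version() {
  # $1 = index spec string, $2 = index name to look up
  printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r name version mode; do
    if [ "$name" = "$2" ]; then
      printf '%s\n' "$version"
    fi
  done
}

index_version "$indices" z3   # prints 5
index_version "$indices" id   # prints 3
```

An index that is not present in the list produces no output, which a caller can treat as "not configured for this schema".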

.. _accumulo_index_versions:

Accumulo Index Versions
-----------------------

See :ref:`index_versioning` for an explanation of index versions. The following versions are available in Accumulo:

.. tabs::

    .. tab:: Z3

        ============= =============== =================================================================
        Index Version GeoMesa Version Notes
        ============= =============== =================================================================
        1             1.1.0           Initial implementation
        2             1.2.1           Support for non-point geometries

                                      Support for shards
        3             1.2.5           Removed support for non-point geometries in favor of xz

                                      Removed redundant feature ID in row value to reduce size on disk

                                      Support for per-attribute visibility
        4             1.3.1           Support for table sharing
        5             2.0.0           Uses fixed Z-curve implementation
        ============= =============== =================================================================

    .. tab:: Z2

        ============= =============== =================================================================
        Index Version GeoMesa Version Notes
        ============= =============== =================================================================
        1             1.2.2           Initial implementation
        2             1.2.5           Removed support for non-point geometries in favor of xz

                                      Removed redundant feature ID in row value to reduce size on disk

                                      Support for per-attribute visibility
        3             1.3.1           Optimized deletes
        4             2.0.0           Uses fixed Z-curve implementation
        ============= =============== =================================================================

    .. tab:: XZ3

        ============= =============== =================================================================
        Index Version GeoMesa Version Notes
        ============= =============== =================================================================
        1             1.2.5           Initial implementation
        ============= =============== =================================================================

    .. tab:: XZ2

        ============= =============== =================================================================
        Index Version GeoMesa Version Notes
        ============= =============== =================================================================
        1             1.2.5           Initial implementation
        ============= =============== =================================================================

    .. tab:: Attribute

        ============= =============== =================================================================
        Index Version GeoMesa Version Notes
        ============= =============== =================================================================
        1             1.0.0           Initial implementation
        2             1.1.0           Added secondary date index
        3             1.2.5           Removed redundant feature ID in row value to reduce size on disk

                                      Support for per-attribute visibility
        4             1.3.1           Added secondary Z index
        5             1.3.2           Support for shards
        6             2.0.0-m.1       Internal row layout change
        7             2.0.0           Uses fixed Z-curve implementation
        ============= =============== =================================================================

    .. tab:: ID

        ============= =============== =================================================================
        Index Version GeoMesa Version Notes
        ============= =============== =================================================================
        1             1.0.0           Initial implementation
        2             1.2.5           Removed redundant feature ID in row value to reduce size on disk

                                      Support for per-attribute visibility
        3             2.0.0           Standardized index identifier to 'id'
        ============= =============== =================================================================

Note that GeoMesa versions prior to 1.2.2 included a geohash index. That index has been replaced with
the Z indices and is no longer supported.