|
2 | 2 |
|
3 | 3 | > ...because it's magic
|
4 | 4 |
|
5 |
| -The basic idea of **mrlin** is to enable **M**ap **R**educe processing of **Lin**ked Data - hence the name. In the following I'm going to show you first to how to use HBase to store Linked Data with RDF and then how to use Hadoop to execute MapReduce jobs. |
| 5 | +The basic idea of **mrlin** is to enable **M**ap **R**educe processing of **Lin**ked Data - hence the name. In the following I'm going to show you first how to use HBase to store Linked Data with RDF, and then how to use Hadoop to run MapReduce jobs. |
6 | 6 |
|
7 | 7 | ## Background
|
8 | 8 |
|
9 | 9 | ### Dependencies
|
10 | 10 |
|
11 | 11 | * You'll need [Apache HBase](http://hbase.apache.org/) first. I downloaded [`hbase-0.94.2.tar.gz`](http://ftp.heanet.ie/mirrors/www.apache.org/dist/hbase/stable/hbase-0.94.2.tar.gz) and followed the [quickstart](http://hbase.apache.org/book/quickstart.html) up to section 1.2.3. to set it up.
|
12 |
| -* The mrlin Python scripts depend on [Happybase](https://github.com/wbolster/happybase). See also the [docs](http://happybase.readthedocs.org/en/latest/index.html) for further details. |
| 12 | +* The mrlin Python scripts depend on: |
| 13 | + * [Happybase](https://github.com/wbolster/happybase) to manage HBase; see also the [docs](http://happybase.readthedocs.org/en/latest/index.html) for further details. |
| 14 | + * [mrjob](https://github.com/Yelp/mrjob) to run MapReduce jobs; see also the [docs](http://packages.python.org/mrjob/) for further details. |
13 | 15 |
|
14 | 16 | ### Representing RDF triples in HBase
|
15 | 17 | Learn about how mrlin represents [RDF triples in HBase](https://github.com/mhausenblas/mrlin/wiki/RDF-in-HBase).
|
@@ -69,14 +71,18 @@ To reset the HBase table (and remove all triples from it), use the [`mrlin utils
|
69 | 71 | (hb)michau@~/Documents/dev/mrlin$ python mrlin_utils.py clear
|
70 | 72 |
|
71 | 73 | ### Query
|
72 |
| -In order to query the mrlin datastore, use the [`mrlin query`](https://raw.github.com/mhausenblas/mrlin/master/mrlin_query.py) script: |
| 74 | +In order to query the mrlin datastore in HBase, use the [`mrlin query`](https://raw.github.com/mhausenblas/mrlin/master/mrlin_query.py) script: |
73 | 75 |
|
74 | 76 | (hb)michau@~/Documents/dev/mrlin$ python mrlin_query.py Tribes
|
75 | 77 | 2012-10-30T04:01:22 Scanning table rdf with filter ValueFilter(=,'substring:Tribes')
|
76 | 78 | 2012-10-30T04:01:22 Key: http://dbpedia.org/resource/Galway - Value: {'O:148': 'u\'"City of the Tribes"\'', 'O:66': 'u\'"City of the Tribes"\'', ...}
|
77 | 79 | 2012-10-30T04:01:22 ============
|
78 | 80 | 2012-10-30T04:01:22 Query took me 0.01 seconds.
|
79 | 81 |
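The log output above suggests the query script scans the `rdf` table with an HBase `ValueFilter` doing a substring match. A minimal sketch of such a scan with Happybase might look like the following. The function names `substring_value_filter`, `query` and `main` are hypothetical, the `localhost` Thrift host is an assumption, and this is not necessarily how `mrlin_query.py` is implemented:

```python
def substring_value_filter(term):
    """Build a filter string of the form shown in the log output above."""
    # This sketch does not handle quoting, so reject terms containing
    # a single quote instead of producing a malformed filter string.
    if "'" in term:
        raise ValueError("search term must not contain a single quote")
    return "ValueFilter(=,'substring:%s')" % term


def query(table, term):
    """Yield (row_key, columns) pairs for cells whose value contains term.

    `table` is expected to behave like a happybase.Table, that is,
    to offer a scan() method accepting a `filter` keyword argument.
    """
    for key, columns in table.scan(filter=substring_value_filter(term)):
        yield key, columns


def main(host="localhost"):
    """Print all matches for 'Tribes' (needs a running HBase Thrift server)."""
    import happybase  # imported here so the helpers above work without it

    connection = happybase.Connection(host)  # host is an assumption
    table = connection.table("rdf")  # table name taken from the log output
    for key, columns in query(table, "Tribes"):
        print("Key: %s - Value: %s" % (key, columns))
```

Keeping the filter construction separate from the scan makes it possible to try out the filter string without an HBase instance; calling `main()` performs the actual scan.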
|
| 82 | +### Running MapReduce jobs |
| 83 | + |
| 84 | +*TBD* |
| 85 | + |
80 | 86 | ## License
|
81 | 87 |
|
82 | 88 | All artifacts in this repository are licensed under the [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0.html) Software License.
|