MIABIS converter tutorial
By following this tutorial you will learn:
- How to install, configure and run all the required software.
- How to format your data.
- How to index samples and access them through HTTP.
- How to install and test Kibana.
Please make sure that your system has a recent version of Java (1.8). You can download the latest Java version for your system from www.java.com.
In order to properly follow this tutorial you will need to download the following tools:
- MIABIS converter: A tool that facilitates sample indexing. It reads and processes the input files. (download it here)
- Elasticsearch: A search server that provides a distributed full-text search engine with an HTTP interface. (download it here)
- Kibana: An open source, browser-based analytics and search dashboard for Elasticsearch. Since Kibana is a web application, it requires a web server. For simplicity, you can download Kibana bundled with Jetty (a Java web server) from here.
For this tutorial, we have prepared a sample dataset (download it here). This dataset contains information about 2000 samples in 6 files:
- samples.txt: contains the samples to be indexed.
- biobanks.txt: contains all the fields required to describe a biobank.
- contacts.txt: contains all the fields required to describe a contact. A contact is a person responsible for a biobank, sample collection or study.
- sample_collections.txt: contains all the fields required to describe a sample collection.
- studies.txt: contains all the fields required to describe a study.
- map.properties: This file ensures that the column titles in each file point to a valid model attribute and entity. For example, a quick look into map.properties shows that the column Name in biobanks.txt points to the attribute biobank.name:
biobank.id = ID
biobank.acronym = Acronym
biobank.name = Name
biobank.url = URL
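Each entry in map.properties has the form `entity.attribute = Column Title`. The Python sketch below reads such a file and groups the mapping by entity, so you can check which model attribute a given column will be assigned to. It is a minimal sketch that assumes one-line entries. The escapes and line continuations that full Java properties files allow are not handled. `load_column_map` is a hypothetical helper and is not part of the MIABIS converter.

```python
from collections import defaultdict


def load_column_map(text: str) -> dict:
    """Parse map.properties text into {entity: {column title: attribute}}.

    Assumes simple one-line `entity.attribute = Column Title` entries;
    escapes and line continuations allowed in Java properties files are
    not handled (hypothetical helper, not part of the converter).
    """
    mapping = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and properties-style comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, column = line.partition("=")
        if not sep:
            continue  # not a "key = value" line
        entity, _, attribute = key.strip().partition(".")
        mapping[entity][column.strip()] = attribute
    return dict(mapping)


example = """
biobank.id = ID
biobank.acronym = Acronym
biobank.name = Name
biobank.url = URL
"""

columns = load_column_map(example)
print(columns["biobank"]["Name"])  # name
```

Keying by entity first matters if two files share a column title. In that case the entries stay separate instead of overwriting each other.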
Before you can continue, please unzip the downloaded file elasticsearch-1.7.3.zip.
Before we start Elasticsearch, we need to tweak the configuration file. In particular, we are interested in three configuration flags:
- cluster.name: The name you assign to your cluster. This flag is important because it is used to discover and auto-join other nodes. Make sure that your cluster name is unique and reflects the name of your biobank.
- node.name: The name you assign to your node. It is used to identify nodes within a cluster.
- http.cors.enabled: This flag enables CORS. For this tutorial we set it to true, since we will access the data in our index using Kibana.
To change the default configuration, open the file elasticsearch-1.7.3/config/elasticsearch.yml. Look for the properties cluster.name and node.name and replace the default values, then add http.cors.enabled: true. In this case, we are going to use elixir as our cluster name and node1 as our node name:
################################### Cluster ###################################
# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elixir
#################################### Node #####################################
# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: node1
...
############################## Network And HTTP ###############################
# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).
http.cors.enabled: true
Keep in mind that depending on your setup you may need/want to tweak other settings. Here is a list with all the configuration flags available.
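Before starting the node, you may want to double-check the three flags discussed above. The sketch below is a deliberately simple reader for flat `key: value` lines. It is not a YAML parser, so nested blocks, lists and quoted values are not handled. `read_flat_settings`, `check_settings` and `EXPECTED` are hypothetical helpers and are not part of Elasticsearch.

```python
# Values this tutorial expects in elasticsearch.yml (hypothetical helper data).
EXPECTED = {
    "cluster.name": "elixir",
    "node.name": "node1",
    "http.cors.enabled": "true",
}


def read_flat_settings(text: str) -> dict:
    """Collect flat `key: value` lines, dropping `#` comments.

    Not a YAML parser: nested blocks (e.g. "cluster:\\n  name: x"),
    lists and quoted values are not interpreted.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and banner lines
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank, comment-only, or "..." line
        settings[key.strip()] = value.strip()
    return settings


def check_settings(settings: dict) -> dict:
    """Return {flag: actual value} for each expected flag that is wrong or missing."""
    return {
        key: settings.get(key)
        for key, wanted in EXPECTED.items()
        if settings.get(key) != wanted
    }


snippet = """
# Cluster name identifies your cluster for auto-discovery.
cluster.name: elixir
node.name: node1
...
http.cors.enabled: true
"""
print(check_settings(read_flat_settings(snippet)))  # {}
```

To check the real file, pass it the contents of elasticsearch-1.7.3/config/elasticsearch.yml. An empty result means all three flags have the values used in this tutorial.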
To start Elasticsearch, open a console/terminal, navigate to elasticsearch-1.7.3/bin and execute the file elasticsearch. If you are using Windows, run the file elasticsearch.bat instead. You can test whether Elasticsearch is running by opening a browser and navigating to http://localhost:9200/. The response should look like this:
{
"status" : 200,
"name" : "node1",
"cluster_name" : "elixir",
"version" : {
"number" : "1.7.3",
"build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
"build_timestamp" : "2015-10-15T09:14:17Z",
"build_snapshot" : false,
"lucene_version" : "4.10.4"
},
"tagline" : "You Know, for Search"
}
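The same check can be scripted. Below is a minimal sketch in Python that uses only the standard library:
- `fetch_json` requests a URL and decodes the JSON body.
- `check_node` compares the `name` and `cluster_name` fields of the root response with the values chosen in elasticsearch.yml.

Both function names are hypothetical. The defaults `elixir` and `node1` are taken from this tutorial's setup.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET `url` and decode the JSON body (raises URLError if nothing listens)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_node(info: dict, cluster: str = "elixir", node: str = "node1") -> list:
    """Return a list of mismatches between the root response and the config."""
    problems = []
    if info.get("cluster_name") != cluster:
        problems.append(f"cluster_name is {info.get('cluster_name')!r}, expected {cluster!r}")
    if info.get("name") != node:
        problems.append(f"name is {info.get('name')!r}, expected {node!r}")
    return problems


# The response shown above, trimmed to a few fields.
sample = json.loads('{"status": 200, "name": "node1", "cluster_name": "elixir",'
                    ' "version": {"number": "1.7.3"}}')
print(check_node(sample))  # []
```

Against a running node you would call `check_node(fetch_json("http://localhost:9200/"))`. A non-empty list may mean that the node did not pick up your edited elasticsearch.yml.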
Once Elasticsearch is running, you can move on to indexing the sample dataset. First unzip the dataset (Elixir.zip) and copy miabis-converter-1.0.2-SNAPSHOT.jar into the unzipped folder. Open a terminal/console and navigate to the unzipped Elixir folder. Once there, run the following command (more info about the tool):
java -jar miabis-converter-1.0.2-SNAPSHOT.jar -i samples.txt biobanks.txt sample_collections.txt studies.txt contacts.txt -m map.properties -n elixir -z elixir
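If you run the indexing step more than once, it can be convenient to script it. The sketch below does two things:
- It rebuilds the command line above as an argument list.
- It refuses to start when a file the command names is missing from the folder.

It reproduces only the flags shown in this tutorial. The meaning of `-n` and `-z` is covered by the tool's own documentation. `build_command`, `missing_files` and `run_converter` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# File names exactly as used in the tutorial's command.
JAR = "miabis-converter-1.0.2-SNAPSHOT.jar"
INPUTS = ("samples.txt", "biobanks.txt", "sample_collections.txt",
          "studies.txt", "contacts.txt")
MAPPING = "map.properties"


def build_command() -> list:
    """The tutorial's command line, one argument per list element."""
    return ["java", "-jar", JAR, "-i", *INPUTS, "-m", MAPPING,
            "-n", "elixir", "-z", "elixir"]


def missing_files(folder: Path) -> list:
    """Names of the jar, data and mapping files that are absent from `folder`."""
    return [name for name in (JAR, *INPUTS, MAPPING)
            if not (folder / name).is_file()]


def run_converter(folder: Path) -> int:
    """Run the converter inside `folder` and return its exit code."""
    missing = missing_files(folder)
    if missing:
        raise FileNotFoundError(f"missing in {folder}: {', '.join(missing)}")
    return subprocess.run(build_command(), cwd=folder).returncode
```

For example, `run_converter(Path("Elixir"))` would run the converter from the directory that contains the unzipped folder. Passing a list to `subprocess.run` avoids shell quoting issues on both Windows and Unix-like systems.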
To test that the samples are indexed, go to http://localhost:9200/_count?pretty=1. The page should look like this:
{
"count" : 2000,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
The server reports that 2000 samples are indexed, which means that everything worked as expected.
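To automate this check, for example after re-running the converter, a minimal sketch can parse the `/_count` response and fail loudly when a shard did not answer. Note that `/_count` without an index name counts documents across all indices on the node. On a node that already holds other data, the number will therefore be higher than 2000. `indexed_count` and `fetch_count` are hypothetical helper names.

```python
import json
import urllib.request


def indexed_count(body: str) -> int:
    """Return `count` from a _count response, refusing partial answers."""
    data = json.loads(body)
    shards = data["_shards"]
    if shards["failed"] or shards["successful"] != shards["total"]:
        raise RuntimeError(f"not all shards answered: {shards}")
    return data["count"]


def fetch_count(url: str = "http://localhost:9200/_count") -> int:
    """Query a running node; raises URLError if Elasticsearch is not reachable."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return indexed_count(resp.read().decode("utf-8"))


# The response shown above.
sample = """{
  "count" : 2000,
  "_shards" : {"total" : 5, "successful" : 5, "failed" : 0}
}"""
print(indexed_count(sample))  # 2000
```

Shortly after indexing this dataset into an otherwise empty node, `fetch_count()` should return 2000. Newly indexed documents become visible after the next index refresh, which by default happens about once per second.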
To run Kibana, simply unzip the file Jetty_Kibana3.zip. Open a terminal/console, navigate to the unzipped folder and run the command java -jar start.jar. After that, go to http://localhost:8080/kibana-3.1.2 and explore Kibana.