MIABIS converter tutorial

Learning objectives

By following this tutorial you will learn:

  • How to install, configure and run all the required software.
  • How to format your data.
  • How to index samples and access them through HTTP.
  • How to install and test Kibana.

Before you begin

Please make sure that your system has a recent version of Java (1.8). You can check your installed Java version, and download the latest one, at www.java.com.
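You can also check which version is installed from a terminal; the command below should report a 1.8.x version:

java -version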

Tools

In order to properly follow this tutorial you will need to download the following tools:

  • MIABIS converter: A tool that facilitates sample indexing. It reads and processes the input files. (download it here)
  • Elasticsearch: A search server that provides a distributed full-text search engine with an HTTP interface. (download it here)
  • Kibana: An open source, browser-based analytics and search dashboard for Elasticsearch. Since Kibana is a web application, it requires a web server. For simplicity, you can download Kibana running on Jetty (a Java web server) from here.

Data

For this tutorial, we have prepared a sample dataset (download it here). This dataset contains information on 2000 samples spread across 6 files:

  • samples.txt: contains the samples themselves.
  • biobanks.txt: contains all the fields required to describe a biobank.
  • contacts.txt: contains all the fields required to describe a contact. A contact is a person responsible for a biobank, sample collection or study.
  • sample_collections.txt: contains all the fields required to describe a sample collection.
  • studies.txt: contains all the fields required to describe a study.
  • map.properties: This file ensures that the column titles in each file point to a valid model attribute and entity. For example, a quick look into map.properties shows that the column Name in biobanks.txt points to the attribute biobank.name:
biobank.id = ID
biobank.acronym = Acronym
biobank.name = Name
biobank.url = URL
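
As a purely hypothetical illustration (not taken from the actual dataset), a biobanks.txt using those column titles could then start like this, assuming tab-separated columns:

ID	Acronym	Name	URL
bb-01	EB	Example Biobank	http://biobank.example.org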

Elasticsearch

Before you can continue, please unzip the downloaded file elasticsearch-1.7.3.zip.

Configuration

Before we start Elasticsearch, we need to tweak the configuration file. In particular, we are interested in three configuration flags:

  1. cluster.name: This is the name you will assign to your cluster. This configuration flag is important since it is used to discover and auto-join other nodes. Make sure your cluster name is unique and reflects the name of your biobank.

  2. node.name: This is the name you will assign to your node. It is used to identify individual nodes in a cluster.

  3. http.cors.enabled: This flag enables CORS. For this tutorial we are going to set it to true, since we will access the data in our index using Kibana.

To change the default configuration, open the file elasticsearch-1.7.3/config/elasticsearch.yml. Look for the properties cluster.name and node.name and replace the default values. In this case, we are going to use elixir as our cluster name and node1 as our node name:

################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elixir


#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: node1

...

############################## Network And HTTP ###############################

# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).

http.cors.enabled: true

Keep in mind that, depending on your setup, you may need or want to tweak other settings. Here is a list of all the available configuration flags.

Run

In order to start Elasticsearch, open a console/terminal, navigate to elasticsearch-1.7.3/bin and execute the file elasticsearch. If you are using Windows, run the file elasticsearch.bat instead. You can test whether Elasticsearch is running by opening a browser and navigating to http://localhost:9200/. The response should look like this:

{
  "status" : 200,
  "name" : "node1",
  "cluster_name" : "elixir",
  "version" : {
    "number" : "1.7.3",
    "build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
    "build_timestamp" : "2015-10-15T09:14:17Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
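
If you prefer the command line, you can issue the same request with curl (assuming curl is installed on your system):

curl http://localhost:9200/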

Indexing samples

Once Elasticsearch is running, you can move on to indexing the sample dataset. First unzip the dataset (Elixir.zip) and copy miabis-converter-1.0.2-SNAPSHOT.jar into the unzipped folder. Open a terminal/console and navigate to the unzipped Elixir folder. Once there, run the following command (more info about the tool):

java -jar miabis-converter-1.0.2-SNAPSHOT.jar -i samples.txt biobanks.txt sample_collections.txt studies.txt contacts.txt -m map.properties -n elixir -z elixir

To test that the samples are indexed, go to http://localhost:9200/_count?pretty=1. The page should look like this:

{
  "count" : 2000,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

The server is telling us that there are 2000 samples indexed. That means that everything worked just fine.
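
Because the samples are exposed through Elasticsearch's HTTP interface, you can also retrieve indexed documents directly with the search API. The query below is a minimal sketch that returns a single document; it makes no assumption about the index name or field layout created by the converter:

curl "http://localhost:9200/_search?pretty&size=1"

You can append q=<term> to the URL to perform a simple free-text query (by default it searches the _all field).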

Kibana

In order to run Kibana, simply unzip the file Jetty_Kibana3.zip. Open a terminal/console, navigate to the unzipped folder and run the command java -jar start.jar. After that, go to http://localhost:8080/kibana-3.1.2 and explore Kibana.