MIABIS converter tutorial

José Villaveces edited this page Jan 15, 2016 · 14 revisions

Learning objectives

By following this tutorial you will learn:

  • How to install, configure and run all the required software.
  • How to format your data.
  • How to index samples and access them through HTTP.

Before you begin

Please make sure that a recent version of Java (1.8) is installed on your system. You can find the latest Java version for your system at www.java.com.

Tools

In order to properly follow this tutorial you will need to download the following tools:

  • MIABIS converter: A tool that facilitates sample indexing. It reads and processes the input files. (download it here)
  • Elasticsearch: A search server that provides a distributed full-text search engine with an HTTP interface. (download it here)

Data

For this tutorial, we have prepared a sample dataset. (download it here)

Elasticsearch

Before you can continue, please unzip the downloaded file elasticsearch-1.7.3.zip.

Configuration

Before we start Elasticsearch, we need to tweak the configuration file. In particular we are interested in two configuration flags:

  1. cluster.name: The name you assign to your cluster. This flag matters because nodes use it to discover and auto-join other nodes. Make sure that your cluster name is unique and reflects the name of your biobank.

  2. node.name: The name you assign to your node. It is used to identify individual nodes in a cluster.

To change the default configuration, open the file elasticsearch-1.7.3/config/elasticsearch.yml. Look for the properties cluster.name and node.name and replace the default values. In this case, we are going to use elixir as our cluster name and node1 as our node name:

################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elixir


#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: node1
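
If you want to double-check the edit before starting Elasticsearch, a small script can confirm that both properties are set and not commented out. The following is only a minimal sketch: the helper names read_settings and missing_settings are made up for this tutorial, and the parser only understands flat, top-level key: value lines like the two shown above (it does not handle nested YAML or trailing comments).

```python
def read_settings(yaml_text):
    """Collect top-level `key: value` pairs, skipping comments and blank lines."""
    settings = {}
    for line in yaml_text.splitlines():
        stripped = line.strip()
        # Ignore empty lines and commented-out properties.
        if not stripped or stripped.startswith("#"):
            continue
        # Ignore indented (nested) entries; only flat properties are supported.
        if line[0].isspace():
            continue
        key, separator, value = stripped.partition(":")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(settings, required=("cluster.name", "node.name")):
    """Return the required property names that are absent or empty."""
    return [name for name in required if not settings.get(name)]


# In-memory sample mirroring the edited part of elasticsearch.yml.
SAMPLE = """\
# Cluster name identifies your cluster for auto-discovery.
cluster.name: elixir

# You can tie this node to a specific name:
node.name: node1
"""

settings = read_settings(SAMPLE)
print(settings["cluster.name"], settings["node.name"])
print("missing:", missing_settings(settings))
```

To check your real file, pass the contents of elasticsearch-1.7.3/config/elasticsearch.yml to read_settings instead of SAMPLE.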

Keep in mind that depending on your setup you may need/want to tweak other settings. Here is a list with all the configuration flags available.

Run

To start Elasticsearch, open a console/terminal, navigate to elasticsearch-1.7.3/bin and execute the file elasticsearch. If you are using Windows, run the file elasticsearch.bat instead. You can test whether Elasticsearch is running by opening a browser and navigating to http://localhost:9200/. The response should look like this:

{
  "status" : 200,
  "name" : "node1",
  "cluster_name" : "elixir",
  "version" : {
    "number" : "1.7.3",
    "build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
    "build_timestamp" : "2015-10-15T09:14:17Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
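
The same check can be scripted instead of done in a browser. The sketch below assumes Elasticsearch is listening on http://localhost:9200/ and that you used elixir and node1 as in this tutorial; fetch_root and check_node are illustrative names, not part of Elasticsearch or the MIABIS converter.

```python
import json
from urllib.request import urlopen


def fetch_root(url="http://localhost:9200/", timeout=5):
    """GET the Elasticsearch root endpoint and return the decoded JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_node(info, cluster_name="elixir", node_name="node1"):
    """Compare the response with the configured names; return a list of problems."""
    problems = []
    if info.get("cluster_name") != cluster_name:
        problems.append(
            "cluster_name is %r, expected %r" % (info.get("cluster_name"), cluster_name)
        )
    if info.get("name") != node_name:
        problems.append("name is %r, expected %r" % (info.get("name"), node_name))
    return problems
```

With Elasticsearch running, check_node(fetch_root()) returns an empty list when the node reports the names you configured. If the server is not reachable, fetch_root raises urllib.error.URLError.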