| 1 | +# BERT Question and Answer |
| 2 | + |
| 3 | +```elixir |
| 4 | +Mix.install([ |
| 5 | + {:tflite_elixir, "~> 0.3.1"}, |
| 6 | + {:req, "~> 0.3.0"}, |
| 7 | + {:kino, "~> 0.9.0"} |
| 8 | +]) |
| 9 | +``` |
| 10 | + |
| 11 | +## How it works |
| 12 | + |
| 13 | +The model can be used to build a system that answers users’ questions in natural language. It was created by fine-tuning a pre-trained BERT model on the SQuAD 1.1 dataset. |
| 14 | + |
| 15 | +[BERT](https://github.com/google-research/bert), or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing tasks. |
| 16 | + |
| 17 | +This app uses MobileBERT, a compressed version of BERT that runs 4x faster and has a 4x smaller model size. |
| 18 | + |
| 19 | +[SQuAD](https://rajpurkar.github.io/SQuAD-explorer/), or Stanford Question Answering Dataset, is a reading comprehension dataset consisting of articles from Wikipedia and a set of question-answer pairs for each article. |
| 20 | + |
| 21 | +The model takes a passage and a question as input, then returns a segment of the passage that most likely answers the question. It requires semi-complex pre-processing including tokenization and post-processing steps that are described in the BERT [paper](https://arxiv.org/abs/1810.04805) and implemented in the sample app. |
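
The post-processing step mentioned above can be illustrated with a minimal sketch. A BERT question-answering head typically emits one start logit and one end logit per token, and the answer is the span whose start and end scores sum highest. The module `SpanSketch`, the function `best_span/4`, the `max_len` limit, and the toy logits are all hypothetical names and values chosen for illustration; this is not the code `TFLiteElixir.MobileBert` runs internally, and it uses whitespace tokens instead of WordPiece tokens.

```elixir
defmodule SpanSketch do
  # Hypothetical helper for illustration only; not part of tflite_elixir.
  # Expects a non-empty token list and one start/end logit per token.
  # Returns {score, answer_text} for the highest-scoring span.
  def best_span(tokens, start_logits, end_logits, max_len \\ 30) do
    n = length(tokens)
    starts = List.to_tuple(start_logits)
    ends = List.to_tuple(end_logits)

    # Enumerate every span that starts at s, ends at e >= s,
    # and covers at most max_len tokens.
    candidates =
      for s <- 0..(n - 1),
          e <- s..min(s + max_len - 1, n - 1) do
        {elem(starts, s) + elem(ends, e), s, e}
      end

    {score, s, e} = Enum.max_by(candidates, fn {score, _s, _e} -> score end)
    {score, tokens |> Enum.slice(s..e) |> Enum.join(" ")}
  end
end

# Toy values, not real model output.
tokens = ~w(sundar pichai was appointed ceo of google)
start_logits = [4.0, 0.5, 0.1, 0.2, 1.0, 0.0, 0.3]
end_logits = [0.5, 5.0, 0.1, 0.3, 0.8, 0.0, 1.2]

{score, answer} = SpanSketch.best_span(tokens, start_logits, end_logits)
IO.inspect({score, answer})
```

Requiring the end index to be at or after the start index, and capping the span length, rules out nonsensical answers that a naive independent argmax over the two logit vectors could produce.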
| 22 | + |
| 23 | +```elixir |
| 24 | +alias TFLiteElixir.MobileBert |
| 25 | +``` |
| 26 | + |
| 27 | +## Download model |
| 28 | + |
| 29 | +Download the pre-trained TensorFlow Lite MobileBert model. |
| 30 | + |
| 31 | +```elixir |
| 32 | +# /data is the writable portion of a Nerves system |
| 33 | +downloads_dir = |
| 34 | + if Code.ensure_loaded?(Nerves.Runtime), do: "/data/livebook", else: System.tmp_dir!() |
| 35 | + |
| 36 | +download = fn url, save_as -> |
| 37 | + save_as = Path.join(downloads_dir, save_as) |
| 38 | + unless File.exists?(save_as), do: Req.get!(url, output: save_as) |
| 39 | + save_as |
| 40 | +end |
| 41 | + |
| 42 | +data_files = |
| 43 | + [ |
| 44 | + mobile_bert: { |
| 45 | + "https://tfhub.dev/tensorflow/lite-model/mobilebert/1/metadata/1?lite-format=tflite", |
| 46 | + "mobilebert.tflite" |
| 47 | + }, |
| 48 | + ] |
| 49 | + |> Enum.map(fn {key, {url, save_as}} -> {key, download.(url, save_as)} end) |
| 50 | + |> Map.new() |
| 51 | + |
| 52 | +data_files |
| 53 | +|> Enum.map(fn {k, v} -> [name: k, location: v] end) |
| 54 | +|> Kino.DataTable.new(name: "Data files") |
| 55 | +``` |
| 56 | + |
| 57 | +## Load MobileBert |
| 58 | + |
| 59 | +```elixir |
| 60 | +alias TFLiteElixir.MobileBert |
| 61 | +{:ok, bert} = MobileBert.init(data_files.mobile_bert) |
| 62 | +``` |
| 63 | + |
| 64 | +## Example |
| 65 | + |
| 66 | +Passage (Input) |
| 67 | + |
| 68 | +```elixir |
| 69 | +content = """ |
| 70 | +Google LLC is an American multinational technology company |
| 71 | +that specializes in Internet-related services and products, |
| 72 | +which include online advertising technologies, search engine, |
| 73 | +cloud computing, software, and hardware. It is considered one |
| 74 | +of the Big Four technology companies, alongside Amazon, Apple, |
| 75 | +and Facebook. |
| 76 | + |
| 77 | +Google was founded in September 1998 by Larry Page and Sergey |
| 78 | +Brin while they were Ph.D. students at Stanford University in |
| 79 | +California. Together they own about 14 percent of its shares |
| 80 | +and control 56 percent of the stockholder voting power through |
| 81 | +supervoting stock. They incorporated Google as a California |
| 82 | +privately held company on September 4, 1998, in California. |
| 83 | +Google was then reincorporated in Delaware on October 22, 2002. |
| 84 | +An initial public offering (IPO) took place on August 19, 2004, |
| 85 | +and Google moved to its headquarters in Mountain View, California, |
| 86 | +nicknamed the Googleplex. In August 2015, Google announced plans |
| 87 | +to reorganize its various interests as a conglomerate called |
| 88 | +Alphabet Inc. Google is Alphabet's leading subsidiary and will |
| 89 | +continue to be the umbrella company for Alphabet's Internet |
| 90 | +interests. Sundar Pichai was appointed CEO of Google, replacing |
| 91 | +Larry Page who became the CEO of Alphabet. |
| 92 | +""" |
| 93 | + |
| 94 | +:ok |
| 95 | +``` |
| 96 | + |
| 97 | +Question (Input) |
| 98 | + |
| 99 | +```elixir |
| 100 | +query = "Who is the CEO of Google?" |
| 101 | +``` |
| 102 | + |
| 103 | +Answer (Output) |
| 104 | + |
| 105 | +```elixir |
| 106 | +MobileBert.run(bert, query, content) |
| 107 | +|> Enum.map(fn {score, answer} -> [score: Float.round(score, 6), answer: answer] end) |
| 108 | +|> Kino.DataTable.new(name: "Answer") |
| 109 | +``` |