Elastic Stack Lab04

Before beginning this lab, create the alias for curl discussed in the slides.

Open ~/.bashrc in vim or your favorite editor and add:

alias curl="/usr/bin/curl -H 'Content-type: application/json' "

Now source the updated ~/.bashrc file to apply the alias.

source ~/.bashrc
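
To confirm the alias took effect, a quick check along these lines may help. This is only a sketch: it repeats the alias definition so the snippet runs on its own, and the variable name alias_def is made up for illustration.

```shell
# Same alias as in ~/.bashrc, repeated so this snippet is self-contained.
alias curl="/usr/bin/curl -H 'Content-type: application/json' "

# "alias curl" prints the current definition, or fails if no alias is set.
alias_def=$(alias curl)
echo "$alias_def"
```

In your own terminal, running alias curl right after source ~/.bashrc should be enough.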

In this lab you will be importing data from the MovieLens dataset we downloaded earlier.

Import Single Document

Now that we have the mapping configured for ‘year’ we can use curl to insert a specific movie.

Remember that we are using curl for the purposes of this class. In most other environments you would interact with the Elasticsearch API through a programming language, and there are client libraries available for Python, Java, Go and others.

Insert movie

curl -XPUT 127.0.0.1:9200/movies/movie/109488 -d '
{
"genre" : ["IMAX","Sci-Fi"], "title" : "Interstellar", "year" :2014
}'

Let’s break down this command and go through each section. We start with the curl command, which sends a PUT request to Elasticsearch’s REST API. The path in the URL names the index (movies), the type (movie) and the document id (109488).

The data section (-d) contains genre, which is a list because the movie belongs to both the IMAX and Sci-Fi genres. We then give it a title and the year it was released.
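
The same request can be sketched as a pair of small shell functions, which makes the three parts of the URL explicit. The names ES_HOST, doc_url and put_doc are hypothetical helpers invented for this example, and the host and port are assumed to match the command above.

```shell
# Hypothetical helper names; host and port are taken from the lab command.
ES_HOST="127.0.0.1:9200"

# Build the document URL from its three parts: index, type and id.
doc_url() {
    printf '%s/%s/%s/%s\n' "$ES_HOST" "$1" "$2" "$3"
}

# PUT a JSON body (4th argument) to the document URL.
# /usr/bin/curl is called with the header spelled out because bash does not
# expand aliases in non-interactive scripts by default.
put_doc() {
    /usr/bin/curl -H 'Content-type: application/json' \
        -XPUT "$(doc_url "$1" "$2" "$3")" -d "$4"
}

# Show the URL the lab command targets (no request is sent here).
doc_url movies movie 109488
```

With Elasticsearch running, put_doc movies movie 109488 '{"genre":["IMAX","Sci-Fi"],"title":"Interstellar","year":2014}' should be equivalent to the command above.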

If this ran successfully you should see something like the following confirming it was added.

{"_index":"movies","_type":"movie","_id":"109488","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

Retrieve movie

We can now run a curl command with the GET verb to confirm the movie is actually in Elasticsearch. We are going to pass _search without any other values so that it will show us all documents inside of our movies index.

curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty

You should now see confirmation that the movie was added.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "movie",
        "_id" : "109488",
        "_score" : 1.0,
        "_source" : {
          "genre" : [
            "IMAX",
            "Sci-Fi"
          ],
          "title" : "Interstellar",
          "year" : 2014
        }
      }
    ]
  }
}

Awesome! You’ve successfully inserted your first document in Elasticsearch!

Congrats!

Import Many Documents

Elasticsearch has a bulk import endpoint that does exactly what you would think: it lets you import a batch of documents in a single request.

You will be downloading a JSON file that contains a few movies and using curl to import that data into Elasticsearch.

First let’s start by pulling down the data file.

wget http://bit.ly/es-movies-data -O movies.json

Now let’s look inside and confirm the data looks like we expect.

vi movies.json

You should now see the same list of movies from the slide.

{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "135569" } }
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "122886" } }
{ "id": "122886", "title" : "Star Wars: Episode VII - The Force Awakens", "year":2015 , "genre":["Action", "Adventure", "Fantasy", "Sci-Fi", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "109487" } }
{ "id": "109487", "title" : "Interstellar", "year":2014 , "genre":["Sci-Fi", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "58559" } }
{ "id": "58559", "title" : "Dark Knight, The", "year":2008 , "genre":["Action", "Crime", "Drama", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "1924" } }
{ "id": "1924", "title" : "Plan 9 from Outer Space", "year":1959 , "genre":["Horror", "Sci-Fi"] }


Let’s break this down a little.

Each movie takes up a pair of lines. The first line is the action line: create tells Elasticsearch to add a new document and names the index, type and id it should be stored under. A create only succeeds if no document with that id already exists in the index. Elasticsearch hashes the id to pick a shard and sends the document off to that shard.

The second line of each pair is the document itself, with the fields we want to be able to query.
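
Because the bulk format depends on this strict pairing of lines, it can be worth checking a file before sending it. The following is a rough sketch, not a real validator: check_bulk_pairs is a hypothetical helper that only looks for the text "create" on odd lines and "title" on even lines, and it runs against a two-movie sample written to a temporary file rather than the real movies.json.

```shell
# Two-entry sample in the same shape as movies.json, written to a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "135569" } }
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "1924" } }
{ "id": "1924", "title" : "Plan 9 from Outer Space", "year":1959 , "genre":["Horror", "Sci-Fi"] }
EOF

# Odd lines should be action lines, even lines should be documents.
check_bulk_pairs() {
    awk '
        NR % 2 == 1 && $0 !~ /"create"/ { bad++ }  # odd line is not an action
        NR % 2 == 0 && $0 !~ /"title"/  { bad++ }  # even line is not a movie
        END {
            if (NR % 2 != 0) bad++                 # trailing unpaired line
            printf "%d pairs, %d problems\n", NR / 2, bad
        }
    ' "$1"
}

result=$(check_bulk_pairs "$sample")
echo "$result"
rm -f "$sample"
```

Running check_bulk_pairs movies.json on the downloaded file should report 5 pairs if the file matches the listing above.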

We are now going to insert all 5 movies at once.

curl -XPUT 127.0.0.1:9200/_bulk?pretty --data-binary @movies.json

After running that command you should get something back stating whether all the movies were inserted successfully. Take a look at that and see if any of the imports failed.

Why or why not?

{
  "took" : 42,
  "errors" : false,
  "items" : [
    {
      "create" : {
        "_index" : "movies",
        "_type" : "movie",
        "_id" : "135569",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
..<snip>
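
For a long response, scanning every item by eye gets tedious. One possible shortcut is to look only at the top-level "errors" flag. The sketch below assumes the response has been captured in a shell variable; bulk_has_errors is a hypothetical helper, the sample response is a shortened stand-in for the output above, and the check is a plain text match rather than a real JSON parse.

```shell
# Shortened stand-in for a bulk response (the items list is left empty here).
response='{ "took" : 42, "errors" : false, "items" : [] }'

# Succeeds (exit status 0) when the "errors" flag in the response is true.
bulk_has_errors() {
    printf '%s' "$1" | grep -Eq '"errors"[[:space:]]*:[[:space:]]*true'
}

if bulk_has_errors "$response"; then
    echo "some items failed"
else
    echo "all items succeeded"
fi
```

If the flag is true, the individual entries under "items" show which documents failed and why.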

Now that we’ve imported these documents let’s see what we can do with them.

List all documents in the movies index

curl -XGET 127.0.0.1:9200/movies/_search?pretty

Search for all movies with the word ‘the’ in the title.

curl -XGET 127.0.0.1:9200/movies/_search?pretty -d '
{
    "query": {
        "query_string" : {
            "default_field" : "title",
            "query" : "the"
        }
    }
}
'

Now play around and search using other fields.

Can you find all the movies that are in the Action genre?

What other fields can you query?
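
To make experimenting a little quicker, the query body can be generated instead of retyped. This is a minimal sketch: query_body is a hypothetical helper, it reuses the query_string form from the search above, and it assumes the field name and search text contain no double quotes or backslashes.

```shell
# Print a query_string body for the given field (1st argument)
# and search text (2nd argument).
query_body() {
    printf '{ "query": { "query_string": { "default_field": "%s", "query": "%s" } } }\n' "$1" "$2"
}

# Show the body that would be sent for a title search.
query_body title star
```

With Elasticsearch running, the body can be passed to the search command used earlier, for example: curl -XGET 127.0.0.1:9200/movies/_search?pretty -d "$(query_body title star)"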

Lab Complete