Elastic Stack Lab07

In this lab we will experiment with different analyzers. Analyzers determine which results we get back when we run a full-text query.

There are two different field types for string data, and they behave very differently when searched.

Keyword: The keyword type is much more rigid. The value is indexed as-is and only matches a query for that exact value, using case-sensitive matching. This means that if I searched for is I wouldn’t get results for this or his. I also would not see text containing Is, IS or iS.

Text: The text type is much more flexible and lets analyzers process text fields however you want: partial matching, stemming, case-insensitive or case-sensitive matching, synonyms, etc.

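With the default analyzer, if I searched for is I would get back documents containing the word is in any case (is, Is, IS or iS). Matching inside other words such as this or his requires an analyzer configured for partial matching.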
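To see the difference between the two behaviors, the _analyze API can show which tokens a piece of text is broken into. The following is a minimal sketch, assuming Elasticsearch is listening on 127.0.0.1:9200 as in the rest of this lab. The sample text and the Content-Type header are assumptions (the header is required on newer versions of Elasticsearch), and the built-in keyword analyzer is used here only as a stand-in for how a keyword field keeps its value whole.

```shell
# Assumed host; adjust if your cluster is elsewhere.
ES_HOST="127.0.0.1:9200"
# Hypothetical sample text, chosen to match the examples above.
SAMPLE_TEXT="This Is his"

# Run the same text through the standard (default) and keyword analyzers.
for ANALYZER in standard keyword; do
  echo "=== analyzer: $ANALYZER ==="
  BODY="{\"analyzer\": \"$ANALYZER\", \"text\": \"$SAMPLE_TEXT\"}"
  curl -s -XGET "$ES_HOST/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$BODY" \
    || echo "Request failed: is Elasticsearch running at $ES_HOST?" >&2
done
```

Compare the tokens in the two responses: the standard analyzer should return several lowercased tokens, while the keyword analyzer should return the text as one unchanged token.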

Now let’s demonstrate this through a lab.

We are going to do a search for “Star Trek” across our movies index and review the results.

In a terminal, use curl to send a GET request to the _search endpoint of the movies index with the following body:

{
  "query": {
    "match": {
       "title": "Star Trek"
    }
  }
}
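One way to send this query is sketched below. It assumes the same 127.0.0.1:9200 address used for the other commands in this lab. The variable names and the Content-Type header are additions for illustration; the header is required on newer versions of Elasticsearch.

```shell
# Assumed host; adjust if your cluster is elsewhere.
ES_HOST="127.0.0.1:9200"

# The match query from the lab, stored in a variable for readability.
QUERY='{
  "query": {
    "match": {
      "title": "Star Trek"
    }
  }
}'

# Send the query to the movies index and pretty-print the response.
curl -s -XGET "$ES_HOST/movies/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d "$QUERY" \
  || echo "Request failed: is Elasticsearch running at $ES_HOST?" >&2
```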

Now review the results. Did you get what you were expecting? Pay attention to the first result: why is it different from what you searched for?

If the index has a small dataset but a large number of shards, this can lead to unexpected results. We searched for “Star Trek”, but because title uses the text mapping, the match query looked in every shard for documents containing either of the words Star or Trek. The inverse document frequency is computed per shard, and we had 2 matching documents with Star. On one of the shards the relevancy score came out higher, so Star Wars was returned first.

In production you will have more documents and larger datasets, so this is much less likely to be an issue.

Now let’s try another text search, but this time we are going to use the query type match_phrase and see what the result is.

In a terminal, use curl to send a GET request to the _search endpoint of the movies index with the following body:

{
  "query": {
    "match_phrase": {
       "genre": "sci"
    }
  }
}
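A possible curl invocation for the match_phrase query is sketched below, under the same assumptions as before: the 127.0.0.1:9200 address comes from this lab, while the variable names and the Content-Type header are illustrative additions.

```shell
# Assumed host; adjust if your cluster is elsewhere.
ES_HOST="127.0.0.1:9200"

# The match_phrase query from the lab.
QUERY='{
  "query": {
    "match_phrase": {
      "genre": "sci"
    }
  }
}'

# Send the phrase query to the movies index and pretty-print the response.
curl -s -XGET "$ES_HOST/movies/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d "$QUERY" \
  || echo "Request failed: is Elasticsearch running at $ES_HOST?" >&2
```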

What are the results? The default analyzer is being used for the genre field, so it performs a case-insensitive, full-text search.

Start over

Elasticsearch does not allow you to change the mapping of an existing field, so if you decide later on that you want different search behavior you’ll need to delete the index (or reindex into a new one) and start over.
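Before removing anything, it can be useful to look at the mapping the index currently has, so there is something to compare the new mapping against. This is a small sketch using the _mapping endpoint, assuming the same host as the other commands in this lab; the variable names are illustrative.

```shell
# Assumed host and index name, taken from the other commands in this lab.
ES_HOST="127.0.0.1:9200"
INDEX="movies"
MAPPING_URL="$ES_HOST/$INDEX/_mapping?pretty"

# Print the current field mappings for the index.
curl -s -XGET "$MAPPING_URL" \
  || echo "Request failed: is Elasticsearch running at $ES_HOST?" >&2
```

Note the type reported for the title and genre fields.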

Let’s go ahead and delete our movies index.

curl -XDELETE 127.0.0.1:9200/movies

Now we need to create a new movies index with some changes to the way it searches text.

curl -XPUT 127.0.0.1:9200/movies -H 'Content-Type: application/json' -d '
{
    "mappings" : {
        "movie": {
            "properties": {
                "id": {"type": "integer"},
                "year": {"type": "date"},
                "genre": {"type": "keyword"},
                "title": {"type": "text", "analyzer": "english"}
            }
        }
    }
}'

Notice we changed genre to keyword, so it will now only return exact matches, and we won’t get any results unless we type the entire keyword. We also set a specific analyzer on the title field: the english analyzer applies English-specific processing such as stop-word removal and stemming. This could lead to issues if we have foreign films in our index, but it does illustrate how we can specify an analyzer for a field.
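To get a feel for what the english analyzer does to a title, the _analyze API can be pointed at it directly. The sketch below assumes the same host as the rest of the lab; the sample text is hypothetical and the Content-Type header is an addition that newer versions of Elasticsearch require.

```shell
# Assumed host; adjust if your cluster is elsewhere.
ES_HOST="127.0.0.1:9200"

# Hypothetical sample text with plurals, an -ing form and common stop words.
BODY='{"analyzer": "english", "text": "The Stars are Trekking"}'

# Ask Elasticsearch which tokens the english analyzer produces.
curl -s -XGET "$ES_HOST/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d "$BODY" \
  || echo "Request failed: is Elasticsearch running at $ES_HOST?" >&2
```

You would likely see fewer tokens than words, with the remaining ones reduced to their stems. That is why a search for one form of a word can match titles containing another form.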

Now that we’ve deleted everything and created the new mapping we need to repopulate our index.

curl -XPUT '127.0.0.1:9200/movies/_bulk?pretty' -H 'Content-Type: application/json' --data-binary @movies.json

The response should show that our index was populated with the contents of movies.json.
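As an extra check, the _count endpoint reports how many documents the index now holds. This is a minimal sketch, assuming the same host and index name as above; the variable names are illustrative.

```shell
# Assumed host and index name, taken from the other commands in this lab.
ES_HOST="127.0.0.1:9200"
INDEX="movies"
COUNT_URL="$ES_HOST/$INDEX/_count?pretty"

# Report the number of documents currently in the index.
curl -s -XGET "$COUNT_URL" \
  || echo "Request failed: is Elasticsearch running at $ES_HOST?" >&2
```

The count in the response should match the number of movies in movies.json.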

Search again

Now that we’ve changed the mapping and re-populated our data let’s see if it responds as expected when we do a query for sci in the genre field.

Using curl with the GET verb, query the Elasticsearch API movies index with the following data

{ 
  "query": {
     "match": {
        "genre": "sci"
    }
  }
}
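The request could be sent as sketched below. The host comes from the earlier commands in this lab; the variable names and the Content-Type header are illustrative additions.

```shell
# Assumed host; adjust if your cluster is elsewhere.
ES_HOST="127.0.0.1:9200"

# Match query against the genre field, which is now mapped as keyword.
QUERY='{
  "query": {
    "match": {
      "genre": "sci"
    }
  }
}'

# Send the query to the movies index and pretty-print the response.
curl -s -XGET "$ES_HOST/movies/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d "$QUERY" \
  || echo "Request failed: is Elasticsearch running at $ES_HOST?" >&2
```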

Did the keyword type work? Do you get any partial or case-insensitive results?

Now let’s try to search for sci-fi and see what happens.

Using curl with the GET verb, query the Elasticsearch API movies index with the following data

{ 
  "query": {
     "match": {
        "genre": "sci-fi"
    }
  }
}

Are the results what you expected? Why or why not?

Now let’s try to query it one more time, paying attention to case.

{ 
  "query": {
     "match": {
        "genre": "Sci-Fi"
    }
  }
}

Spend some time querying different genres and see what happens.
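To try several values quickly, the query can be wrapped in a small loop. This is only a sketch: the list of genre values is an example to edit freely, and the host, variable names and Content-Type header are assumptions carried over from the earlier commands.

```shell
# Assumed host; adjust if your cluster is elsewhere.
ES_HOST="127.0.0.1:9200"

# Example values to try; replace them with any genres you want to test.
for GENRE in "sci" "sci-fi" "Sci-Fi"; do
  echo "=== genre: $GENRE ==="
  # Build the same match query used above, with the current value.
  BODY="{\"query\": {\"match\": {\"genre\": \"$GENRE\"}}}"
  curl -s -XGET "$ES_HOST/movies/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$BODY" \
    || echo "Request failed: is Elasticsearch running at $ES_HOST?" >&2
done
```

Compare the number of hits reported for each value.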

Lab Complete