Write search queries with OpenSearch® and Python#
Learn how to write and run search queries on your OpenSearch cluster using a Python OpenSearch client.
For our data, we use a food recipe dataset from Kaggle. After injecting this data into our cluster, we will write search queries to find different food recipes.
Pre-requisites#
GitHub repository#
The full code used here can be found in a GitHub repository. The files are organized according to their functions:
config.py, information to connect to the cluster
index.py, methods that manipulate the index
search.py, customized search query methods
helpers.py, response handler of search requests
We use the Typer Python library to create CLI commands to run from the terminal. Follow the instructions to get the code on your machine and try the commands:
Clone the repository and install the dependencies
git clone https://github.com/aiven/demo-opensearch-python
cd demo-opensearch-python
pip install -r requirements.txt
Download the dataset from Kaggle’s recipe dataset, and save the full_format_recipes.json file in the current folder of the demo repository.
Connect to the OpenSearch cluster with Python#
Make sure to update the SERVICE_URI in the .env file to your cluster's SERVICE_URI, as explained in the README.
Once the environment variables are set, create an OpenSearch Python client to connect to your OpenSearch cluster using the connection instructions. You can find the whole code sample in config.py:
import os
from dotenv import load_dotenv
from opensearchpy import OpenSearch
load_dotenv()
INDEX_NAME = "epicurious-recipes"
SERVICE_URI = os.getenv("SERVICE_URI")
client = OpenSearch(SERVICE_URI, use_ssl=True)
Tip
The SERVICE_URI value can be found in the Aiven Console dashboard.
After creating a client with a valid SERVICE_URI, you’re set to interact with your cluster.
Upload data to OpenSearch using Python#
Once you’re connected, the next step is to load data into the cluster. Our demo does this with the load_data function.
You can load the data into your cluster by running:
python index.py load-data
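To give a rough idea of what such a loader involves, here is a minimal sketch. It is not the demo's actual load_data implementation: the generate_actions helper and the sample records are hypothetical, and the sketch only builds the action dictionaries that the opensearch-py bulk helper (opensearchpy.helpers.bulk) accepts.

```python
import json
from typing import Iterable, Iterator

INDEX_NAME = "epicurious-recipes"  # same index name as in config.py


def generate_actions(recipes: Iterable[dict], index_name: str = INDEX_NAME) -> Iterator[dict]:
    """Yield one bulk action per non-empty recipe (hypothetical helper)."""
    for recipe in recipes:
        if not recipe:  # assumption: skip empty records in the raw data
            continue
        yield {"_index": index_name, "_source": recipe}


# Tiny stand-in for the contents of full_format_recipes.json
sample_recipes = json.loads('[{"title": "Spring Rolls ", "sodium": 10.0}, {}]')
actions = list(generate_actions(sample_recipes))
print(actions)

# With a live cluster, the actions could then be sent in one request:
#   from opensearchpy import helpers
#   helpers.bulk(client, generate_actions(recipes))
```

Using a generator keeps memory use low, because the bulk helper can consume the actions one at a time instead of needing the whole list up front.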
Once the data is loaded, we can retrieve the data mapping to explore the structure of the data, with their respective fields and types. You can find the code implementation in the get_mapping function.
Check the structure of your data by running:
python index.py get-mapping
You should be able to see the fields’ output:
[
'calories',
'categories',
'date',
'desc',
'directions',
'fat',
'ingredients',
'protein',
'rating',
'sodium',
'title'
]
You should also see the mapping, which lists each field with its respective type:
{'calories': {'type': 'float'},
'categories': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}},
'type': 'text'},
'date': {'type': 'date'},
'desc': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}},
'type': 'text'},
'directions': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}},
'type': 'text'},
'fat': {'type': 'float'},
'ingredients': {'fields': {'keyword': {'ignore_above': 256,
'type': 'keyword'}},
'type': 'text'},
'protein': {'type': 'float'},
'rating': {'type': 'float'},
'sodium': {'type': 'float'},
'title': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}},
'type': 'text'}}
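As an illustration of how the field list relates to the mapping, the sketch below reduces a mapping response to a field-to-type dictionary. The field_types helper is hypothetical and is not the demo's get_mapping function. It assumes the response shape returned by client.indices.get_mapping(), where the properties sit under the index name and the "mappings" key.

```python
def field_types(mapping_response: dict, index_name: str) -> dict:
    """Map each field name to its top-level type (hypothetical helper)."""
    properties = mapping_response[index_name]["mappings"]["properties"]
    return {field: spec.get("type") for field, spec in properties.items()}


# Shortened stand-in for client.indices.get_mapping(index="epicurious-recipes")
sample_response = {
    "epicurious-recipes": {
        "mappings": {
            "properties": {
                "calories": {"type": "float"},
                "date": {"type": "date"},
                "title": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
            }
        }
    }
}

types = field_types(sample_response, "epicurious-recipes")
print(sorted(types))  # field names only
print(types)          # field names with their types
```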
All set to start writing your search queries.
Query the data#
Use the search() method#
You have an OpenSearch client and data loaded in your cluster, so you can start writing search queries. The Python OpenSearch client has a handy method called search(), which we’ll use to run our queries.
We can check the method signature to understand the function and which parameters we’ll use. As the signature below shows, all the parameters of the search() method are optional:
client.search: (body=None, index=None, doc_type=None, params=None, headers=None)
To run the search queries, we’ll use two of these parameters, index and body:
index refers to the name of the index we used to load the data. Therefore, it does not change.
body refers to the search query specifications. We’ll modify it according to our query purpose.
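The sketch below shows one way these two parameters could be wired together and how recipe titles might be pulled out of the response. The search_titles helper and the FakeClient class are hypothetical. The fake client only imitates the shape of a search response (hits.hits with a _source per document) so that the example runs without a cluster.

```python
INDEX_NAME = "epicurious-recipes"  # same index name as in config.py


def search_titles(client, query_body: dict, index_name: str = INDEX_NAME) -> list:
    """Run a search and return the title of every hit (hypothetical helper)."""
    response = client.search(index=index_name, body=query_body)
    return [hit["_source"]["title"] for hit in response["hits"]["hits"]]


class FakeClient:
    """Stand-in for an OpenSearch client so the sketch runs offline."""

    def search(self, index=None, body=None):
        return {"hits": {"hits": [{"_source": {"title": "Spring Rolls "}}]}}


titles = search_titles(FakeClient(), {"query": {"match_all": {}}})
print(titles)
```

With a real cluster, you would pass the client created in config.py instead of FakeClient().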
Lucene query and query DSL#
OpenSearch supports the Lucene query syntax to perform searches by using the q parameter. The q parameter expects a string with your query specifications, for example:
client.search(
    index=INDEX_NAME,
    q="ingredients:broccoli AND calories:(>=100 AND <200)"
)
For users who prefer to work with nested objects and familiar structures like JSON (equivalent to Python dictionaries), OpenSearch supports the query domain-specific language (DSL).
For the Query DSL, the body parameter expects a dictionary object, which makes it easier to construct more complex queries depending on your use case, for example:
query_body = {
"query": {
"multi_match": {
"query": "Garlic-Lemon",
"fields": [
"title",
"ingredients"
]
}
}
}
In this example, we are searching for “Garlic-Lemon” across the title and ingredients fields. Try it out yourself using our demo:
python search.py multi-match title ingredients Garlic-Lemon
Check what comes out from this interesting combination 🧄 🍋 :
[
'Garlic-Lemon Potatoes ',
'Lemon Garlic Mayonnaise ',
'Lemon Garlic Mayonnaise ',
'Garlic-Lemon Croutons ',
'Lemon-Garlic Vinaigrette ',
'Lemon-Garlic Lamb Chops ',
'Lemon Pepper Garlic Vinaigrette ',
'Lemon-Garlic Baked Shrimp ',
'Lemon-Herb Turkey with Lemon-Garlic Gravy ',
'Garlic, Oregano, and Lemon Vinaigrette '
]
For this tutorial, we focus on the query DSL syntax and construct queries by modifying the body parameter. One of the optional parameters of the search() method is size, which defines the number of results returned by the search.
Note
The default value of size is 10, and we’re using the default value in this tutorial.
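If you want a different number of results, one option is to set size inside the request body. The sketch below assumes a hypothetical with_size helper, which the demo does not include, and copies the query so the original dictionary is left untouched.

```python
import copy


def with_size(query_body: dict, size: int) -> dict:
    """Return a copy of the query body that asks for `size` results (hypothetical helper)."""
    if size < 0:
        raise ValueError("size must not be negative")
    body = copy.deepcopy(query_body)
    body["size"] = size
    return body


query_body = {
    "query": {
        "multi_match": {
            "query": "Garlic-Lemon",
            "fields": ["title", "ingredients"],
        }
    }
}

top_three = with_size(query_body, 3)
print(top_three)
```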
Write common queries#
In the following sections, we cover some of the most common queries. Time to start querying 🔎
Create match query#
The match query helps you find the best matches for multiple search words. It is the default option for a full-text search.
You can build your match query based on the field you want to search in and the query that you are searching for. The operator parameter defaults to “or”.
query_body = {
"query": {
"match": {
field: {
"query": query,
"operator": operator
}
}
}
}
Consider how the match query behaves on a numeric field. If we run the query below, it returns matches. This can be confusing because in our cluster the fat field has the type float, not string.
query_body = {
"query": {
"match": {
"fat": {
"query": "0"
}
}
}
}
This is possible because full-text queries, such as the match query, use an analyzer to make the data optimized for search. As we have not specified an analyzer when we searched, the default standard analyzer is used:
query_body = {
"query": {
"match": {
"fat": {
"query": "0",
"analyzer": "standard",
}
}
}
}
The default standard analyzer drops most punctuation, breaks up text into individual words, and lowercases them to optimize the search. If you want to choose a different analyzer, check out the available ones in the OpenSearch documentation.
You can find out how a customized match query can be written with your Python OpenSearch client in the search_match() function. You can run the code yourself to explore the match query. For example, to find recipes with “Spring” in the title:
python search.py match title Spring
The search for “Spring” returns the following recipes:
[
'Spring Fever ',
'Spring Rolls ',
'Spring Feeling ',
'Spring Fever ',
'Spring Rolls ',
'Spring Feeling ',
'Spring Vegetable Sauté ',
'Spring-Onion Cocktail ',
'Braised Spring Legumes ',
'Asian Spring Rolls '
]
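The match template shown earlier can be wrapped in a small function to compare the two operators. This is only a sketch. build_match_query is a hypothetical name and not necessarily how the demo's search_match() is written. With “or”, a document matches if it contains any of the search words. With “and”, it must contain all of them.

```python
def build_match_query(field: str, query: str, operator: str = "or") -> dict:
    """Build a match query body (hypothetical helper)."""
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {"match": {field: {"query": query, "operator": operator}}}}


# Matches titles containing "spring" or "rolls"
any_word = build_match_query("title", "Spring Rolls")
# Matches only titles containing both "spring" and "rolls"
all_words = build_match_query("title", "Spring Rolls", operator="and")

print(any_word)
print(all_words)
```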
See also
Find out more about match queries.
Use a multi_match query#
The multi_match query is useful when you want the behavior of the match query but need to search in more than one field. List the fields in the fields property to search for the query string across all of them.
query_body = {
"query": {
"multi_match": {
"query": query,
"fields": [field1, field2]  # add more fields as needed
}
}
}
In our demo, the search_multi_match() function builds customized multi-match queries in Python. Use the multi-match command followed by the fields and the query to explore this type of query.
Suppose you are looking for citrus recipes 🍋, for example, recipes with lemon in the title or in the ingredients. You can run your query from our demo as:
python search.py multi-match title ingredients lemon
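The command above corresponds roughly to a body like the one built in this sketch. build_multi_match_query is a hypothetical helper, and the demo's search_multi_match() may differ in its details.

```python
def build_multi_match_query(query: str, fields: list) -> dict:
    """Build a multi_match query body over the given fields (hypothetical helper)."""
    if not fields:
        raise ValueError("at least one field is required")
    return {"query": {"multi_match": {"query": query, "fields": list(fields)}}}


lemon_query = build_multi_match_query("lemon", ["title", "ingredients"])
print(lemon_query)
```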
Match with phrases#
This query matches exact phrases in a field, where query is the phrase to search for in the given field:
query_body = {
"query": {
"match_phrase": {
field: {
"query": query
}
}
}
}
If you know exactly which phrases you’re looking for, you can try out our match-phrase command, which is implemented in the search_match_phrase() function.
Note
If you misspell the searched word, the query will not return any results, as the purpose is to look for exact phrases. Differences between lowercase and uppercase do not prevent a match, because the text is lowercased during analysis.
For example, try searching for pannacotta with lemon marmalade in the title:
python search.py match-phrase title "Pannacotta with lemon marmalade"
If you just have a rough idea of the phrase you’re looking for, you can make your match phrase query more flexible with the slop parameter, as explained in the Match phrases and add some slop section.
Match phrases and add some slop#
You can use the slop parameter to create more flexible searches. Suppose you’re searching for pannacotta marmalade with the match_phrase query, and no results are found. This happens because you are looking for exact phrases, as discussed in the Match with phrases section.
You can expand your searches by configuring the slop parameter. The default value for the slop parameter is 0.
The slop parameter lets you control the degree of disorder allowed in your search, as explained in the OpenSearch documentation for the slop feature:
slop is the number of other words allowed between words in the query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match.
You can construct a query and add some slop like this:
query_body = {
    "query": {
        "match_phrase": {
            field: {
                "query": query,
                "slop": slop  # non-negative integer
            }
        }
    }
}
In the demo, you can find the search_slop() function where this query is used. Suppose you’re looking for the phrase pannacotta marmalade. To find more results than exact phrase matches, you need to allow some slop. You can set slop to 2, so the query can match titles where up to two other words sit between the searched ones.
This is how you can run this query yourself:
python search.py slop "title" "pannacotta marmalade" 2
Your result should look like this:
['Lemon Pannacotta with Lemon Marmalade ']
So with the slop parameter adjusted, you may be able to find results even when other words sit between the ones you searched for.
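The exact-phrase query and the slop variant differ only in one value, which the following sketch makes explicit. build_match_phrase_query is a hypothetical helper, and the demo's search_match_phrase() and search_slop() functions may be organized differently.

```python
def build_match_phrase_query(field: str, phrase: str, slop: int = 0) -> dict:
    """Build a match_phrase query body. slop=0 requires an exact phrase (hypothetical helper)."""
    if slop < 0:
        raise ValueError("slop must not be negative")
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# Exact phrase only
exact = build_match_phrase_query("title", "pannacotta marmalade")
# Allows up to two position moves, e.g. "pannacotta with lemon marmalade"
relaxed = build_match_phrase_query("title", "pannacotta marmalade", slop=2)

print(exact)
print(relaxed)
```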
See also
Read more about the slop parameter in the OpenSearch project specifications.
Use a term query#
If you want results with a precise value in a field, the term query is the right choice. For example, the term query can find documents by a precise value such as a price or a product ID.
This query can be constructed as:
query_body = {
"query": {
"term": {
field: value
}
}
}
In this query, the term is matched as it is, which means that no analyzer is applied to the search term. If you are searching for text field values, it is recommended to use the match query instead.
You can look at the search_term() function, which uses this query to build customized term queries.
Run the search query yourself to find recipes with zero sodium, for example:
python search.py term sodium 0
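As a sketch of what the command above might send, the hypothetical build_term_query helper below builds the body for a numeric field. The second call is an assumption based on the mapping shown earlier: because the text fields have a keyword sub-field, a term query on title.keyword compares against the whole, unanalyzed title, including any trailing space.

```python
def build_term_query(field: str, value) -> dict:
    """Build a term query body for an exact value (hypothetical helper)."""
    return {"query": {"term": {field: value}}}


# Exact numeric value
zero_sodium = build_term_query("sodium", 0)
# Exact, unanalyzed title via the keyword sub-field from the mapping
exact_title = build_term_query("title.keyword", "Spring Rolls ")

print(zero_sodium)
print(exact_title)
```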
Search with a range query#
This query helps to find documents where the value of a field is within a provided range. This can be handy if you’re dealing with numerical values and are interested in ranges instead of specific values. The queries can be constructed as:
query_body = {
"query": {
"range": {
field: {
"gte": gte,
"lte": lte
}
}
}
}
You can construct range queries with combinations of inclusive and exclusive parameters, as shown in the table:
Parameter | Behavior |
---|---|
gte | Greater than or equal to |
gt | Greater than |
lt | Less than |
lte | Less than or equal to |
Try to find recipes in a certain range of sodium, for example:
python search.py range sodium 0 10
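The inclusive and exclusive bounds can be mixed freely. The sketch below uses a hypothetical build_range_query helper, which is not the demo's implementation, and includes only the bounds that are actually provided.

```python
def build_range_query(field: str, gte=None, gt=None, lte=None, lt=None) -> dict:
    """Build a range query body from whichever bounds are given (hypothetical helper)."""
    bounds = {"gte": gte, "gt": gt, "lte": lte, "lt": lt}
    # Compare against None so that a bound of 0 is still kept
    limits = {name: value for name, value in bounds.items() if value is not None}
    if not limits:
        raise ValueError("at least one bound is required")
    return {"query": {"range": {field: limits}}}


# 0 <= sodium <= 10, the same range as the command above
low_sodium = build_range_query("sodium", gte=0, lte=10)
# 100 <= calories < 200, mixing an inclusive and an exclusive bound
light_meals = build_range_query("calories", gte=100, lt=200)

print(low_sodium)
print(light_meals)
```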
See also
See more about the range query in the OpenSearch documentation.
Write fuzzy queries#
This query looks for documents that contain terms similar to the searched term. The similarity is measured by the Levenshtein edit distance, which is the minimum number of single-character edits needed to turn one word into the other. These edits include:
Change of a character: post → lost
Removal of a character: eggs → ggs
Insertion of a character: edi → edit
Transposition of two adjacent characters: act → cat
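To make the four edit types concrete, here is a small pure-Python sketch of an edit distance that counts a transposition of adjacent characters as a single edit (the optimal string alignment variant). It is an illustration only and makes no claim about how OpenSearch computes the distance internally.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance with substitution, deletion, insertion, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # removal
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # change (or match)
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for source, target in [("post", "lost"), ("eggs", "ggs"), ("edi", "edit"), ("act", "cat")]:
    print(source, "->", target, edit_distance(source, target))
```

Each of the four example pairs is one edit apart, which is why a fuzziness of 1 is enough to match them.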
The queries can be constructed as:
query_body = {
    "query": {
        "fuzzy": {
            field: {
                "value": value,
                "fuzziness": fuzziness
            }
        }
    }
}
We can try looking for a misspelled word while allowing some fuzziness. First, write a fuzzy query with a misspelled word, such as pinapple, and set fuzziness to zero. Running it returns no results:
python search.py fuzzy "title" "pinapple" 0
To correct pinapple → pineapple, we only need to insert one letter. So we can set fuzziness to one and run the search again:
python search.py fuzzy "title" "pinapple" 1
As you can see, this search returns results 🍍:
[
'Pineapple "Lasagna" ',
'Pineapple Bowl ',
'Pineapple Paletas ',
'Pineapple "Salsa" ',
'Pineapple Sangria ',
'Pineapple Tart ',
'Pineapple Split ',
'Roasted Pineapple with Star Anise Pineapple Sorbet ',
'Pineapple-Apricot Salsa ',
'Pineapple Papaya Relish '
]
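The two commands above differ only in the fuzziness value, as the following sketch shows. build_fuzzy_query is a hypothetical helper. The demo's own fuzzy search function may be written differently.

```python
def build_fuzzy_query(field: str, value: str, fuzziness) -> dict:
    """Build a fuzzy query body (hypothetical helper)."""
    return {"query": {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}}


# No edits allowed: the misspelled word finds nothing
strict = build_fuzzy_query("title", "pinapple", 0)
# One edit allowed: "pinapple" can match "pineapple"
tolerant = build_fuzzy_query("title", "pinapple", 1)

print(strict)
print(tolerant)
```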
Now it is your turn: try out more combinations to better understand the fuzzy query.