Exploring OpenSearch v2.5.0 using a Multipass managed virtual machine
curl to interact with OpenSearch
jq to wrangle unruly JSON strings
./opensearch_machine.bash to create the opensearch_machine virtual machine
multipass shell opensearch_machine to finish setting things up
opensearch-machine VM
01-setup-scripts.bash
07-add-opensearch.bash
.bashrc file.
Now we’re ready to start working with OpenSearch
We’ll be using …
sudo systemctl status opensearch.service
Check how OpenSearch is currently configured. By default, OpenSearch serves HTTPS with self-signed certificates and requires the “admin” account for access.
curl -k --user admin:admin \
https://localhost:9200/_settings?pretty
This should return JSON which shows the settings of our OpenSearch installation.
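Every request in this walkthrough repeats the same host, credentials and -k flag. A minimal sketch of a wrapper that collects them in one place, assuming a POSIX shell. The names OS_URL, OS_AUTH, os_url and os_curl are hypothetical helpers, not part of curl or OpenSearch.

```shell
# Hypothetical helpers; the defaults mirror the values used in this walkthrough.
OS_URL="${OS_URL:-https://localhost:9200}"
OS_AUTH="${OS_AUTH:-admin:admin}"

# os_url PATH: join the base URL and an API path with exactly one slash.
os_url() {
    printf '%s/%s\n' "${OS_URL%/}" "${1#/}"
}

# os_curl PATH [CURL_OPTIONS...]: run curl with the shared options.
# -k accepts the self-signed certificate, -s hides the progress meter.
os_curl() {
    os_curl_path="$1"
    shift
    curl -k -s --user "$OS_AUTH" "$@" "$(os_url "$os_curl_path")"
}
```

With these defined, the settings check could be written as os_curl '_settings?pretty', and extra curl options such as -X PUT go after the path.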
curl -k --user admin:admin \
-X PUT https://localhost:9200/contact-list?pretty
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "contact-list"
}
curl -k --user admin:admin \
-H 'Content-Type: application/json' \
--data '{"name": "Robert", "email": "rsdoiel@caltech.edu", "orcid": "0000-0003-0900-6903"}' \
-X POST https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty
{
"_index" : "contact-list",
"_id" : "0000-0003-0900-6903",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
curl -k --user admin:admin \
https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty
{
"_index" : "contact-list",
"_id" : "0000-0003-0900-6903",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Robert",
"email" : "rsdoiel@caltech.edu",
"orcid" : "0000-0003-0900-6903"
}
}
{INDEX_NAME}/_search?q=robert
curl -k --user admin:admin \
https://localhost:9200/contact-list/_search?q=robert | \
jq .
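NOTE: This URL already has a query string (“?q=robert”), so tacking on “?pretty” doesn’t work here; it would have to be appended as “&pretty” with the URL quoted. We have “jq .” to help us out instead.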
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "contact-list",
"_id": "0000-0003-0900-6903",
"_score": 0.2876821,
"_source": {
"name": "Robert",
"email": "rsdoiel@caltech.edu",
"orcid": "0000-0003-0900-6903"
}
}
]
}
}
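Because ?q=robert already opens the query string, any further parameter is appended with &, and the URL then needs quoting so the shell does not interpret the &. A minimal sketch, where search_url is a hypothetical helper and not part of curl or OpenSearch:

```shell
# search_url INDEX QUERY: hypothetical helper that builds a _search URL
# carrying both the q and pretty parameters.
# NOTE: QUERY is inserted as-is, so it must already be URL-safe.
search_url() {
    printf 'https://localhost:9200/%s/_search?q=%s&pretty\n' "$1" "$2"
}

# Quoting the expansion keeps the "&" away from the shell:
#   curl -k --user admin:admin "$(search_url contact-list robert)"
```

Piping through jq remains useful when you want to filter the response rather than just indent it.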
{INDEX_NAME}/_search
curl -k --user admin:admin -X GET https://localhost:9200/contact-list/_search?pretty
NOTE: When a search returns more results than fit on one “page”, you can iterate over the pages. If you collect the ids, you can then retrieve each document by its “_id”.
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "contact-list",
"_id" : "0000-0003-0900-6903",
"_score" : 1.0,
"_source" : {
"name" : "Robert",
"email" : "rsdoiel@caltech.edu",
"orcid" : "0000-0003-0900-6903"
}
}
]
}
}
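The paging mentioned in the note on this search can be sketched with the from and size query parameters of _search. This is an illustration under assumptions: page_url is a hypothetical helper, and the page size of 10 is an arbitrary choice.

```shell
# page_url INDEX PAGE SIZE: hypothetical helper. PAGE starts at 0;
# "from" is the offset of the first hit and "size" the hits per page.
page_url() {
    printf 'https://localhost:9200/%s/_search?from=%d&size=%d\n' \
        "$1" "$(( $2 * $3 ))" "$3"
}

# Print the URLs for the first three pages of ten hits each.
for page in 0 1 2; do
    page_url contact-list "$page" 10
done

# To collect the ids from one page (needs a running server and jq):
#   curl -k -s --user admin:admin "$(page_url contact-list 0 10)" | \
#     jq -r '.hits.hits[]._id'
```

Each collected id can then be fetched individually through the _doc endpoint.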
_id
curl -k --user admin:admin \
https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty
And the response:
{
"_index" : "contact-list",
"_id" : "0000-0003-0900-6903",
"_version" : 1,
"_seq_no" : 3,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Robert",
"email" : "rsdoiel@caltech.edu",
"orcid" : "0000-0003-0900-6903"
}
}
{INDEX_NAME}/_doc/{DOC_ID}
In this update we add a “url” field holding a website address.
curl -k --user admin:admin \
-H 'Content-Type: application/json' \
--data '{"name":"Robert","email":"rsdoiel@caltech.edu","orcid":"0000-0003-0900-6903","url":"https://rsdoiel.github.io"}' \
-X POST https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty
{
"_index" : "contact-list",
"_id" : "0000-0003-0900-6903",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1
}
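The update above resends the whole document. OpenSearch also offers an _update endpoint that merges a partial document, wrapped in a "doc" object, into the stored one. A minimal sketch that only builds the request pieces; update_url and update_body are hypothetical helpers, and the body builder assumes a plain string value that needs no JSON escaping.

```shell
# update_url INDEX DOC_ID: hypothetical helper for the _update endpoint.
update_url() {
    printf 'https://localhost:9200/%s/_update/%s?pretty\n' "$1" "$2"
}

# update_body FIELD VALUE: wrap one string field in the {"doc": ...}
# envelope. VALUE must not contain characters that need JSON escaping.
update_body() {
    printf '{"doc":{"%s":"%s"}}\n' "$1" "$2"
}

# Example request (not run here):
#   curl -k --user admin:admin \
#     -H 'Content-Type: application/json' \
#     --data "$(update_body url https://rsdoiel.github.io)" \
#     -X POST "$(update_url contact-list 0000-0003-0900-6903)"
```

Sending only the changed field avoids having to repeat name, email and orcid on every change.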
{INDEX_NAME}/_doc/{DOC_ID}
curl -k --user admin:admin \
-X DELETE https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty
{
"_index" : "contact-list",
"_id" : "0000-0003-0900-6903",
"_version" : 3,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
curl -k --user admin:admin \
-H 'Content-Type: application/json' \
--data '{"name": "Robert", "email": "rsdoiel@caltech.edu", "orcid": "0000-0003-0900-6903"}' \
-X POST https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty
{
"_index" : "contact-list",
"_id" : "0000-0003-0900-6903",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
NOTE: We can do an “add” with a POST after the delete; here the “_version” was set back to “1”.
https://inveniordm.docs.cern.ch/develop/howtos/backup_search_indices/#elasticdump
Dumping our index data:
env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
--input https://admin:admin@localhost:9200/contact-list \
--output contact-list.data.json \
--type data
If we had created an index mapping we’d want to dump that too.
For index mappings:
env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
--input https://admin:admin@localhost:9200/contact-list \
--output contact-list.mappings.json \
--type mapping
jq . contact-list.data.json
{
"_index": "contact-list",
"_id": "0000-0003-0900-6903",
"_score": 1,
"_source": {
"name": "Robert",
"email": "rsdoiel@caltech.edu",
"orcid": "0000-0003-0900-6903"
}
}
{INDEX_NAME}
curl -k --user admin:admin \
-X DELETE https://localhost:9200/contact-list
The response should look like
{"acknowledged":true}
For index data:
env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
--input contact-list.data.json \
--output https://admin:admin@localhost:9200/contact-list \
--type data
For index mapping:
env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
--input contact-list.mappings.json \
--output https://admin:admin@localhost:9200/contact-list \
--type mapping
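The two restore commands can be wrapped in one function. A minimal sketch, assuming the dump files follow the names used above and that the mapping should be in place before the documents are indexed; dump_file and restore_index are hypothetical helpers, not part of elasticdump.

```shell
# dump_file INDEX TYPE: map an elasticdump type to the file name
# used in this walkthrough.
dump_file() {
    case "$2" in
        data)    printf '%s.data.json\n' "$1" ;;
        mapping) printf '%s.mappings.json\n' "$1" ;;
        *)       echo "unknown type: $2" >&2; return 1 ;;
    esac
}

# restore_index INDEX: restore the mapping first, then the data.
# Stops at the first failing step.
restore_index() {
    for dump_type in mapping data; do
        dump_path=$(dump_file "$1" "$dump_type") || return 1
        env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
            --input "$dump_path" \
            --output "https://admin:admin@localhost:9200/$1" \
            --type "$dump_type" || return 1
    done
}
```

Calling restore_index contact-list would then run both steps against the local server.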
elasticdump
OpenSearch provides many additional features
Both are used heavily in Invenio RDM
_settings
_aliases
_search
_mapping
_source
_doc