
Part I: OpenSearch Machine (VM)

Exploring OpenSearch v2.5.0 using a Multipass managed virtual machine

Knowledge requirements

System Requirements (Host machine)

Software Requirements (Virtual machine)

What we’ll be doing (host and virtual machine)

Before you start

  1. Make sure Multipass is installed and working
  2. Know how to launch, start, stop, delete, purge your virtual machine
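The lifecycle commands named above can be sketched as a small dry-run helper. This is only an illustration: the `run` and `vm_lifecycle` functions and the `DRY_RUN` variable are hypothetical names, not part of the workshop scripts, and the VM name is whatever you pass in. By default it prints the commands without running them.

```shell
# Sketch only: prints the Multipass lifecycle commands for one VM.
# Set DRY_RUN=0 to actually run them (this would delete and purge the VM).

run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

vm_lifecycle() {
  # $1 = VM name (hypothetical; use the name `multipass list` reports)
  run multipass list                 # list VMs and their current state
  run multipass launch --name "$1"   # create and boot a new VM
  run multipass start "$1"           # boot a stopped VM
  run multipass stop "$1"            # shut it down; its disk is kept
  run multipass delete "$1"          # mark it deleted (still recoverable)
  run multipass purge                # permanently remove all deleted VMs
}
```

Calling `vm_lifecycle my-vm` prints the six commands in order, which is a quick way to check you know what each one does before running it by hand.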

Getting started

  1. Run ./opensearch_machine.bash to create the opensearch_machine virtual machine
  2. Run multipass shell opensearch_machine to finish setting things up

On the opensearch-machine VM

  1. Run 01-setup-scripts.bash
  2. Run 07-add-opensearch.bash
  3. Source the updated .bashrc file.
  4. Reboot your virtual machine

Now we’re ready to start working with OpenSearch


Part II: Working with OpenSearch

We’ll be using …

Make sure OpenSearch is up and running

sudo systemctl status opensearch.service
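systemctl only tells us the service process is running. A complementary check, sketched here under assumptions, is to ask OpenSearch itself through its cluster health endpoint. The helper names `health_ok` and `check_cluster` are hypothetical, and the sketch assumes the default admin:admin account and that jq is installed on the VM. On a single-node VM a "yellow" status is typically expected, because replica shards have no second node to live on.

```shell
# Hypothetical helper: treat "green" and "yellow" as usable on a single-node VM.
health_ok() {
  case "$1" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the cluster health API for its status and report it.
# Assumes the default admin:admin account and a self-signed certificate (-k).
check_cluster() {
  status="$(curl -s -k --user admin:admin \
    'https://localhost:9200/_cluster/health' | jq -r '.status')"
  echo "cluster status: ${status}"
  health_ok "$status"
}
```

Running `check_cluster` on the VM prints the status and returns a non-zero exit code when the cluster is "red" or not answering.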

Exploring OpenSearch

Check how OpenSearch is currently configured. By default OpenSearch runs over HTTPS with self-signed certificates and requires the “admin” account for access.

curl -k --user admin:admin \
  https://localhost:9200/_settings?pretty

This should return JSON which shows the settings of our OpenSearch installation.

Creating our first index

curl -k --user admin:admin \
  -X PUT https://localhost:9200/contact-list?pretty

Response from creating the index

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "contact-list"
}

Creating a document in our index

curl -k --user admin:admin \
  -H 'Content-Type: application/json' \
  --data '{"name": "Robert", "email": "rsdoiel@caltech.edu", "orcid": "0000-0003-0900-6903"}' \
  -X POST https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty

Response from creating our document

{
  "_index" : "contact-list",
  "_id" : "0000-0003-0900-6903",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Reading back our document

curl -k --user admin:admin \
  https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty

Read document response

{
  "_index" : "contact-list",
  "_id" : "0000-0003-0900-6903",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Robert",
    "email" : "rsdoiel@caltech.edu",
    "orcid" : "0000-0003-0900-6903"
  }
}

Searching for our document

curl -k --user admin:admin \
  https://localhost:9200/contact-list/_search?q=robert | \
  jq .

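NOTE: The URL already has a query string (“?q=robert”), so adding “?pretty” a second time doesn’t work. It would have to be appended as “&pretty”, and the URL quoted so the shell doesn’t interpret the “&”. Here we use “jq .” to format the output instead.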

Search results

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "contact-list",
        "_id": "0000-0003-0900-6903",
        "_score": 0.2876821,
        "_source": {
          "name": "Robert",
          "email": "rsdoiel@caltech.edu",
          "orcid": "0000-0003-0900-6903"
        }
      }
    ]
  }
}
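The search above matches “robert” in any field. A minimal sketch of a field-scoped search follows, using the query-string form “field:term” and “&pretty” in place of jq. The function name `search_url` is hypothetical, and no URL-encoding is done, so it only suits simple single-word terms.

```shell
# Build a _search URL that looks for one term in one field.
# $1 = index, $2 = field, $3 = single-word term (not URL-encoded here)
search_url() {
  printf 'https://localhost:9200/%s/_search?q=%s:%s&pretty\n' "$1" "$2" "$3"
}

# Usage on the VM (quote the URL so the shell leaves "?" and "&" alone):
#   curl -k --user admin:admin "$(search_url contact-list name robert)"
```

Because the URL is passed to curl inside double quotes, the “&” is not treated as a shell operator.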

Retrieving an index’s contents

curl -k --user admin:admin -X GET https://localhost:9200/contact-list/_search?pretty

NOTE: When a search returns more results than fit in one “page”, you can iterate over the pages. If you collect the ids, you can then retrieve the documents one at a time.

The retrieved index’s response

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "contact-list",
        "_id" : "0000-0003-0900-6903",
        "_score" : 1.0,
        "_source" : {
          "name" : "Robert",
          "email" : "rsdoiel@caltech.edu",
          "orcid" : "0000-0003-0900-6903"
        }
      }
    ]
  }
}
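Iterating over pages, as the note above mentions, can be sketched with the “from” and “size” search parameters. The function name `page_urls` is hypothetical, and the page size and total are supplied by the caller (the total would normally come from “hits.total.value” in a first response). By default “from” plus “size” cannot exceed 10,000, so this approach suits small indexes like ours.

```shell
# Print one _search URL per page of results.
# $1 = index, $2 = page size, $3 = total number of hits
page_urls() {
  [ "$2" -gt 0 ] || return 1      # a zero page size would never advance
  from=0
  while [ "$from" -lt "$3" ]; do
    printf 'https://localhost:9200/%s/_search?from=%d&size=%d\n' "$1" "$from" "$2"
    from=$((from + $2))
  done
}

# Usage on the VM: fetch every page and print only the document ids.
#   page_urls contact-list 10 25 | while read -r url; do
#     curl -s -k --user admin:admin "$url" | jq -r '.hits.hits[]._id'
#   done
```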

Retrieving the specific document with an _id

curl -k --user admin:admin \
  https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty

And the response:

{
  "_index" : "contact-list",
  "_id" : "0000-0003-0900-6903",
  "_version" : 1,
  "_seq_no" : 3,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Robert",
    "email" : "rsdoiel@caltech.edu",
    "orcid" : "0000-0003-0900-6903"
  }
}

Updating an index document

In this update we add a “url” field holding a website address. Note that the whole document is sent again, not just the new field.

curl -k --user admin:admin \
  -H 'Content-Type: application/json' \
  --data '{"name":"Robert","email":"rsdoiel@caltech.edu","orcid":"0000-0003-0900-6903","url":"https://rsdoiel.github.io"}' \
  -X POST https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty

Update response

{
  "_index" : "contact-list",
  "_id" : "0000-0003-0900-6903",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 1
}
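The update above replaces the whole document. As an alternative (not what these slides use), OpenSearch also has an “_update” endpoint that merges only the fields you send. A minimal sketch, assuming the default admin:admin account; the function name `partial_update` and the `CURL` override variable are hypothetical conveniences.

```shell
# Merge a few fields into an existing document with the _update API.
# $1 = index, $2 = document id, $3 = JSON object holding only the changed fields
# CURL can be overridden, e.g. CURL=echo, to preview the command without sending it.
partial_update() {
  ${CURL:-curl} -k --user admin:admin \
    -H 'Content-Type: application/json' \
    --data "{\"doc\": $3}" \
    -X POST "https://localhost:9200/$1/_update/$2?pretty"
}

# Usage on the VM:
#   partial_update contact-list 0000-0003-0900-6903 '{"url": "https://rsdoiel.github.io"}'
```

The fields not mentioned in the request (name, email, orcid) are left as they were.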

Dropping a document

curl -k --user admin:admin \
  -X DELETE https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty

Response from dropping the document

{
  "_index" : "contact-list",
  "_id" : "0000-0003-0900-6903",
  "_version" : 3,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

Adding back our deleted document

curl -k --user admin:admin \
  -H 'Content-Type: application/json' \
  --data '{"name": "Robert", "email": "rsdoiel@caltech.edu", "orcid": "0000-0003-0900-6903"}' \
  -X POST https://localhost:9200/contact-list/_doc/0000-0003-0900-6903?pretty

Response from adding back our document

{
  "_index" : "contact-list",
  "_id" : "0000-0003-0900-6903",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

NOTE: We can do an “add” with a POST to the same id. In the response above the “_version” is back at “1”.

Dumping our index with elasticdump

https://inveniordm.docs.cern.ch/develop/howtos/backup_search_indices/#elasticdump

Dumping our index data:

env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
   --input https://admin:admin@localhost:9200/contact-list \
   --output contact-list.data.json \
   --type data

If we had created an explicit index mapping, we’d want to dump that too.

For index mappings:

env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
   --input https://admin:admin@localhost:9200/contact-list \
   --output contact-list.mappings.json \
   --type mapping
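Even without an explicit mapping, OpenSearch infers one from the documents it has seen. A quick way to look at it, sketched with a hypothetical `show_mapping` helper and the default admin:admin account, is the “_mapping” endpoint. For string fields like ours the inferred type is typically “text” with a “keyword” sub-field.

```shell
# Show the mapping OpenSearch currently holds for an index.
# $1 = index name
# CURL can be overridden, e.g. CURL=echo, to preview the command.
show_mapping() {
  ${CURL:-curl} -k --user admin:admin \
    "https://localhost:9200/$1/_mapping?pretty"
}

# Usage on the VM:
#   show_mapping contact-list
```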

Here’s what our data dump looks like

jq . contact-list.data.json
{
  "_index": "contact-list",
  "_id": "0000-0003-0900-6903",
  "_score": 1,
  "_source": {
    "name": "Robert",
    "email": "rsdoiel@caltech.edu",
    "orcid": "0000-0003-0900-6903"
  }
}

Dropping an index

curl -k --user admin:admin \
  -X DELETE https://localhost:9200/contact-list

The response should look like

{"acknowledged":true}

Restoring an index with elasticdump

Restore the mapping first, so the field types are in place before any documents are indexed.

For index mapping:

env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
   --input contact-list.mappings.json \
   --output https://admin:admin@localhost:9200/contact-list \
   --type mapping

For index data:

env NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
   --input contact-list.data.json \
   --output https://admin:admin@localhost:9200/contact-list \
   --type data
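After a restore it is worth checking that everything came back. One way, sketched here under assumptions, is to compare the number of lines in the data dump with the index’s “_count”. This assumes elasticdump wrote one JSON document per line, that jq is installed, and the default admin:admin account; the function names are hypothetical. Newly indexed documents may take a moment to show up in the count.

```shell
# Number of documents in a dump file, assuming one JSON document per line.
# $1 = dump file
dump_count() {
  wc -l < "$1" | tr -d ' '
}

# Number of documents OpenSearch reports for an index.
# $1 = index name
index_count() {
  curl -s -k --user admin:admin \
    "https://localhost:9200/$1/_count" | jq -r '.count'
}

# Succeeds when the dump and the restored index hold the same number of documents.
# $1 = dump file, $2 = index name
verify_restore() {
  [ "$(dump_count "$1")" -eq "$(index_count "$2")" ]
}

# Usage on the VM:
#   verify_restore contact-list.data.json contact-list && echo "restore looks complete"
```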

What we’ve learned so far


Part III: Future topics to explore

OpenSearch provides many additional features

Both are used heavily in Invenio RDM

Future topics to explore (continued)

OpenSearch provides many additional features

Tooling for OpenSearch

Interesting API end points


Concluding

About