OpenSearch
OpenSearch is a community-driven, Apache 2.0-licensed open source search and analytics suite that makes it easy to ingest, search, visualize, and analyze data.
Persistent Mobile Foundation (PMF) Analytics uses OpenSearch for storing data and running search queries:
- PMF Analytics Server - stores all the mobile and server data in JSON format.
- PMF Analytics Console - populated by querying the OpenSearch instances in real time.
Prerequisites
This section outlines the minimum and recommended system requirements for running OpenSearch with PMF.
For official OpenSearch software requirements and supported versions, see OpenSearch 3.1 documentation - Install and upgrade OpenSearch.
Hardware requirements
The hardware requirements for OpenSearch depend heavily on the use case, data volume, and query load. The following are general guidelines for minimum requirements.
Development environment
A single-node or multi-node cluster in which each node has the following configuration:
Type | Configuration |
---|---|
RAM | 4 GB per node |
CPU | 2 cores per node |
Storage | 20 GB per node (SSD recommended) |
Production environment
A cluster with a minimum of 3 nodes (one cluster manager and two data nodes) is recommended, with each node having the following configuration:
Type | Configuration |
---|---|
RAM | 16 GB per node |
CPU | 4 cores per node |
Storage | 200 GB per node (SSD recommended) |
Note: For detailed information on sizing and hardware considerations, see Hardware sizing calculator.
Software requirements
Following are the prerequisite software requirements.
- OpenSearch 3.1
- Operating systems:
- Rocky Linux 8
- Alma Linux 8
- Amazon Linux 2/2023
- Ubuntu 24.04
- Windows Server 2019
- Ubuntu 22.04 LTS (Tested by PMF team)
For more information, see OpenSearch 3.1 documentation - Supported operating systems.
- OpenJDK 21.0.7 (Tested by PMF team)

  If you want to use a different OpenJDK version, specify the `OPENSEARCH_JAVA_HOME` or `JAVA_HOME` environment variable as follows:

  ```
  export OPENSEARCH_JAVA_HOME=/path/to/opensearch-3.1.0/jdk
  ```
File system recommendations
- Solid-State Drives (SSDs) installed on the host nodes are recommended.
- Network file systems (NFS) are not recommended.
Network requirements
Following are the network requirement details.
Port number | OpenSearch component |
---|---|
443 | OpenSearch Dashboards in AWS OpenSearch Service with encryption in transit (TLS) |
5601 | OpenSearch Dashboards |
9200 | OpenSearch REST API |
9300 | Node communication and transport (internal), cross cluster search |
9600 | Performance Analyzer |
For more information, see OpenSearch 3.1 documentation - Network requirements.
Important configurations
You need to set the following configurations.
- Memory map setting: For production workloads running on Linux, make sure that `vm.max_map_count` is set to at least 262144.
- Disable memory lock: Disable JVM heap memory swapping by setting the `bootstrap.memory_lock` property to `true`.
- Optional: Disable swap on a Linux system: Open `/etc/fstab` in a text editor with root privileges and comment out or remove any lines containing the word "swap".

  Note: Disabling swap on a system with limited RAM might cause issues if OpenSearch tries to allocate more memory than is available. Ensure that you have sufficient RAM to accommodate the OpenSearch process.
- Java heap size setting: Set the Java heap size (`OPENSEARCH_JAVA_OPTS`) to half of the node's RAM. For example, if your node has 16 GB RAM, allocate 8 GB to heap space. Although the default is `OPENSEARCH_JAVA_OPTS="-Xms1g -Xmx1g"`, it is recommended to change the values by updating the following lines in the `jvm.options` file located in the `config` directory:
  - `-Xms1g` to `-Xms8g`
  - `-Xmx1g` to `-Xmx8g`
- File descriptor limit: Set the `nofile` property to 65536 to allow a limit of 65536 open files for the OpenSearch user.
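The following is a minimal sketch of applying these settings on a Linux tarball installation. The installation path, the 8 GB heap value, and the `opensearch` user name are assumptions; adjust them for your environment.

```
# Memory map setting (takes effect immediately; persist it in /etc/sysctl.conf)
sudo sysctl -w vm.max_map_count=262144

# Disable JVM heap memory swapping (config/opensearch.yml)
echo "bootstrap.memory_lock: true" >> /path/to/opensearch-3.1.0/config/opensearch.yml

# Java heap size: half of an assumed 16 GB node (config/jvm.options)
sed -i 's/^-Xms1g/-Xms8g/; s/^-Xmx1g/-Xmx8g/' /path/to/opensearch-3.1.0/config/jvm.options

# File descriptor limit for the OpenSearch user (assumed to be "opensearch")
echo "opensearch - nofile 65536" | sudo tee -a /etc/security/limits.conf
```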
For more details on how to update these configurations, see OpenSearch 3.1 documentation - Important settings.
Installation
Following is a high-level overview of installing OpenSearch for PMF. We recommend a clustered deployment for production environments to ensure high availability and reliability.
Following are the types of available installations.
- Tarball/ZIP

  This is the most traditional method for self-hosting on a virtual machine or server, and it requires manual installation and configuration. For detailed information, see the Capacity Planning - OpenSearch sheet in the Hardware sizing calculator.

  For more information, see OpenSearch 3.1 documentation - Installing OpenSearch through Tarball.
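  The following is a minimal sketch of a tarball installation on Linux x64. The version, download URL, and password placeholder are illustrative; verify the artifact URL against the official OpenSearch downloads page.

  ```
  # Download and extract the OpenSearch 3.1.0 bundle
  curl -LO https://artifacts.opensearch.org/releases/bundle/opensearch/3.1.0/opensearch-3.1.0-linux-x64.tar.gz
  tar -xzf opensearch-3.1.0-linux-x64.tar.gz
  cd opensearch-3.1.0

  # Recent OpenSearch releases require a strong admin password for the bundled
  # demo security configuration before the first start
  export OPENSEARCH_INITIAL_ADMIN_PASSWORD='<a-strong-password>'
  ./bin/opensearch
  ```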
- Containerized

  For containerized environments, OpenSearch provides official Docker images. This is a popular choice for local development and for production deployments using container orchestration platforms like Kubernetes.
  - For more information on Docker-based deployment, see OpenSearch 3.1 documentation - Docker.
  - For more information on Helm-based deployment, see OpenSearch 3.1 documentation - Helm.
  - For more information on Kubernetes-based deployment, see OpenSearch 3.1 documentation - Kubernetes.
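  As a minimal sketch, a single-node container suitable for local development can be started as follows; the image tag and password placeholder are illustrative.

  ```
  docker run -d --name opensearch-node \
    -p 9200:9200 -p 9600:9600 \
    -e "discovery.type=single-node" \
    -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<a-strong-password>" \
    opensearchproject/opensearch:3.1.0
  ```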
- Managed service

  OpenSearch can be provisioned by using a managed service from a public cloud platform. One option is Amazon OpenSearch Service. For more information, see Amazon OpenSearch Service.
The following table shows which combinations of infrastructure, installation type, and platform have been tested.
Infrastructure | Installation type | Platform | Tested |
---|---|---|---|
Cloud | Helm charts/Operator-based | OpenShift | ✔ |
Cloud | Managed service | Amazon Web Services | ✖ |
On-premises | Helm charts/Operator-based | Kubernetes | ✔ |
Physical/virtual server | Tarball | Linux – Ubuntu | ✔ |
Physical/virtual server | Tarball | Linux – other distributions | ✖ |
Physical/virtual server | RPM | Linux – RHEL, CentOS | ✖ |
Physical/virtual server | APT | Linux – Debian | ✖ |
Post-installation instructions
You need to update the following.
- Install the `analysis-icu` plugin

  You must install the `analysis-icu` plugin for successful integration with PMF; otherwise, the following message appears in the OpenSearch logs:

  ```
  Custom Analyzer [normalization] failed to find tokenizer under name [icu_tokenizer]
  ```

  For more information, see OpenSearch 3.1 documentation - Installing plugins.
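  For example, on a tarball installation, the plugin can be installed from the OpenSearch home directory as follows; restart the node for the plugin to take effect.

  ```
  ./bin/opensearch-plugin install analysis-icu
  ```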
- Data retention policy

  The data retention policy is set through the Time to Live (TTL) period, in days, on the PMF Analytics Console. The default value is 30 days. Although you can change the default value, the PMF team recommends keeping it unless specific data retention needs require otherwise.

  Note: Some documents might be retained for an extended period because of the adjustment made to index-based document deletion. For more information, see the Index management section in Best practices.
Best practices
Following are the key maintenance and operational best practices for an OpenSearch cluster integrated with PMF.
- Monitoring

  Use a monitoring solution (such as OpenSearch Dashboards or a third-party tool) to track key metrics, including CPU usage, memory pressure, disk utilization, and cluster health status (Green, Yellow, or Red).
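  As a quick manual check, the cluster health status can be queried through the REST API. This sketch assumes the default port and the security plugin's basic authentication.

  ```
  curl -k -u admin:<password> "https://localhost:9200/_cluster/health?pretty"
  ```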
- Backups and snapshots

  Regularly take snapshots of your OpenSearch cluster to prevent data loss. For more information, see OpenSearch 3.1 documentation - Snapshots.
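  The following is a minimal sketch of registering a shared file system snapshot repository and taking a snapshot. The repository name, snapshot name, and location are illustrative; the location must also be listed under `path.repo` in `opensearch.yml` on every node.

  ```
  # Register a file system snapshot repository
  curl -k -u admin:<password> -X PUT "https://localhost:9200/_snapshot/pmf-backups" \
    -H 'Content-Type: application/json' \
    -d '{"type": "fs", "settings": {"location": "/mnt/snapshots"}}'

  # Take a snapshot of all indices
  curl -k -u admin:<password> -X PUT "https://localhost:9200/_snapshot/pmf-backups/snapshot-1"
  ```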
- Security

  Ensure that fine-grained access control is enabled and properly configured to secure your data. Avoid public access to your cluster. For more information, see OpenSearch 3.1 documentation - Security in OpenSearch.
- Upgrades

  Plan for and perform regular upgrades to stay on a supported version and to patch security vulnerabilities. For more information, see OpenSearch 3.1 documentation - Upgrade OpenSearch.
- Data migration

  The data migration process currently supports migrating data from PMF 9.0.x and 9.x releases to the PMF 10.x release. This includes migrating data from the Elasticsearch cluster (9.0.x and 9.x releases) to the OpenSearch cluster (10.x release). PMF bundles a Node.js-based utility for migrating data from Elasticsearch 1.0 to OpenSearch 3.1. For more information, see Using Analytics Data Migration tool.
- Index management

  PMF provides automated index management for data ingested through PMF.

  - Index rollover: Indices are rolled over when either of the following conditions is met:
    - The size of the index reaches 30 GB, OR
    - The rollover period, in days, for the index is reached.

    The following formula is used to calculate the index rollover period:

    Rollover period (days) = Math.ceil(TTL in days / 4)

    where TTL is the time period, in days, for which the Analytics data is retained in the OpenSearch cluster.

    Example: If you set the TTL as 30 days, the rollover period for indices is calculated as 8 days.

    Rollover period = Math.ceil(30 / 4) = 8 days
  - Index expiration and deletion: Old rolled-over indices are deleted when they reach the expiration period. The expiration period is calculated by adding the configured TTL to the index rollover period calculated above.

    Expiration period = Configured TTL + Index rollover period

    Example: If you set the configured TTL as 30 days and the index rollover period is calculated as 8 days, the index expires and is deleted on or after the 38th day from its date of creation.

    Expiration period = 30 + 8 = 38 days

    Note: PMF uses cron jobs to delete old indices that have reached the deletion period, which is counted from the date of creation of the index. Because an index can keep receiving documents until it rolls over, deleting it exactly at the TTL could remove recently added documents before their configured retention elapses; to prevent this data loss, PMF adds the index rollover period to the configured TTL. The arithmetic is illustrated in the sketch after this list.
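  The following is a hypothetical shell illustration of the rollover and expiration arithmetic described above; it is not a PMF tool.

  ```
  TTL=30                             # configured TTL in days
  ROLLOVER=$(( (TTL + 3) / 4 ))      # integer equivalent of Math.ceil(TTL / 4) = 8
  EXPIRATION=$(( TTL + ROLLOVER ))   # 30 + 8 = 38
  echo "Rollover every ${ROLLOVER} days; indices expire after ${EXPIRATION} days"
  ```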
- Tuning the operating system

  - Increase the allowed number of open file descriptors to 32k or 64k.
  - Increase the virtual memory map count.

  Note: Check the corresponding documentation for your operating system.
- Tuning the OpenSearch cluster

  - Set the Java Xms (minimum) and Xmx (maximum) heap sizes to the same value.
  - Maximum allowed heap size per JVM <= RAM size / 2.
  - Number of primary shards = number of nodes in the Analytics cluster.
  - Number of replicas per shard >= 2.

  Note: If there is only one node, there is no need for a replica. A sketch of creating an index with these shard and replica settings is shown below.
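  The following is a minimal sketch of creating an index whose shard and replica counts follow this guidance, assuming a three-node cluster and the default port and credentials; the index name is illustrative.

  ```
  curl -k -u admin:<password> -X PUT "https://localhost:9200/pmf-analytics-example" \
    -H 'Content-Type: application/json' \
    -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 2}}'
  ```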