OpenSearch
OpenSearch is a community-driven, Apache 2.0-licensed open source search and analytics suite that makes it easy to ingest, search, visualize, and analyze data.
Persistent Mobile Foundation (PMF) Analytics uses OpenSearch for storing data and running search queries:
- PMF Analytics Server - stores all the mobile and server data in JSON format.
- PMF Analytics Console - populated by querying the OpenSearch instances in real time.
Prerequisites
This section outlines the minimum and recommended system requirements for running OpenSearch with PMF.
For official OpenSearch software requirements and supported versions, see OpenSearch 3.1 documentation - Install and upgrade OpenSearch.
Hardware requirements
The hardware requirements for OpenSearch depend heavily on the use case, data volume, and query load. The following are general guidelines for minimum requirements.
Development environment
A single-node or multi-node cluster in which each node has the following configuration:
Type | Configuration |
---|---|
RAM | 4 GB per node |
CPU | 2 cores per node |
Storage | 20 GB per node (SSD recommended) |
Production environment
A cluster with a minimum of 3 nodes (one cluster manager and two data nodes) is recommended, with each node having the following configuration:
Type | Configuration |
---|---|
RAM | 16 GB per node |
CPU | 4 cores per node |
Storage | 200 GB per node (SSD recommended) |
Note: For detailed information on sizing and hardware considerations, see Hardware sizing calculator.
Software requirements
Following are the prerequisite software requirements.
- OpenSearch 3.1
- Operating systems:
- Rocky Linux 8
- Alma Linux 8
- Amazon Linux 2/2023
- Ubuntu 24.04
- Windows Server 2019
- Ubuntu 22.04 LTS (Tested by PMF team)
For more information, see OpenSearch 3.1 documentation - Supported operating systems.
- OpenJDK 21.0.7 (Tested by PMF team)

  If you want to use a different OpenJDK version, specify the `OPENSEARCH_JAVA_HOME` or `JAVA_HOME` environment variable as follows:

  ```
  export OPENSEARCH_JAVA_HOME=/path/to/opensearch-3.1.0/jdk
  ```
File system recommendations
- Solid-State Drives (SSDs) installed on the host nodes are recommended.
- Network file systems (NFS) are not recommended.
Network requirements
Following are the network requirement details.
Port number | OpenSearch component |
---|---|
443 | OpenSearch Dashboards in AWS OpenSearch Service with encryption in transit (TLS) |
5601 | OpenSearch Dashboards |
9200 | OpenSearch REST API |
9300 | Node communication and transport (internal), cross cluster search |
9600 | Performance Analyzer |
For more information, see OpenSearch 3.1 documentation - Network requirements.
Important configurations
You need to set the following configurations.
- Memory map setting: For production workloads running on Linux, make sure that `vm.max_map_count` is set to at least 262144.
- Disable memory lock: Disable JVM heap memory swapping by setting the `bootstrap.memory_lock` property to `true`.
- Optional: Disable swap on a Linux system: Open `/etc/fstab` in a text editor with root privileges and comment out or remove any lines containing the word "swap".

  Note: Disabling swap on a system with limited RAM might cause issues if OpenSearch tries to allocate more memory than is available. Ensure that you have sufficient RAM to accommodate the OpenSearch process.
- Java heap size setting: Set the Java heap size (`OPENSEARCH_JAVA_OPTS`) to half of the node's RAM. For example, if your node has 16 GB RAM, allocate 8 GB to heap space. Although the default is `OPENSEARCH_JAVA_OPTS="-Xms1g -Xmx1g"`, it is recommended to change the values by updating the following lines in the `jvm.options` file located in the `config` directory:
  - `-Xms1g` to `-Xms8g`
  - `-Xmx1g` to `-Xmx8g`
- File descriptor limit: Set the `nofile` property to 65536 to allow a limit of 65536 open files for the OpenSearch user.
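The following is a minimal sketch of applying these settings on a Linux tarball installation. The installation path, the 8 GB heap value, and the `opensearch` user name are assumptions; adjust them for your environment.

```
# Memory map setting (takes effect immediately; persist it in /etc/sysctl.conf)
sudo sysctl -w vm.max_map_count=262144

# Disable JVM heap memory swapping (config/opensearch.yml)
echo "bootstrap.memory_lock: true" >> /path/to/opensearch-3.1.0/config/opensearch.yml

# Java heap size: half of an assumed 16 GB node (config/jvm.options)
sed -i 's/^-Xms1g/-Xms8g/; s/^-Xmx1g/-Xmx8g/' /path/to/opensearch-3.1.0/config/jvm.options

# File descriptor limit for the OpenSearch user (assumed to be "opensearch")
echo "opensearch - nofile 65536" | sudo tee -a /etc/security/limits.conf
```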
For more details on how to update these configurations, see OpenSearch 3.1 documentation - Important settings.
Installation
Following is a high-level overview of installing OpenSearch for PMF. We recommend a clustered deployment for production environments to ensure high availability and reliability.
Following are the types of available installations.
- Tarball/ZIP

  This is the most traditional method for self-hosting on a virtual machine or server, and it requires manual installation and configuration. For detailed information, see the Capacity Planning - OpenSearch sheet in the Hardware sizing calculator.

  For more information, see OpenSearch 3.1 documentation - Installing OpenSearch through Tarball.
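  The following is a minimal sketch of a tarball installation on Linux x64. The version, download URL, and password placeholder are illustrative; verify the artifact URL against the official OpenSearch downloads page.

  ```
  # Download and extract the OpenSearch 3.1.0 bundle
  curl -LO https://artifacts.opensearch.org/releases/bundle/opensearch/3.1.0/opensearch-3.1.0-linux-x64.tar.gz
  tar -xzf opensearch-3.1.0-linux-x64.tar.gz
  cd opensearch-3.1.0

  # Recent OpenSearch releases require a strong admin password for the bundled
  # demo security configuration before the first start
  export OPENSEARCH_INITIAL_ADMIN_PASSWORD='<a-strong-password>'
  ./bin/opensearch
  ```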
- Containerized

  For containerized environments, OpenSearch provides official Docker images. This is a popular choice for local development and for production deployments using container orchestration platforms like Kubernetes.
  - For more information on Docker-based deployment, see OpenSearch 3.1 documentation - Docker.
  - For more information on Helm-based deployment, see OpenSearch 3.1 documentation - Helm.
  - For more information on Kubernetes-based deployment, see OpenSearch 3.1 documentation - Kubernetes.
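  As a minimal sketch, a single-node container suitable for local development can be started as follows; the image tag and password placeholder are illustrative.

  ```
  docker run -d --name opensearch-node \
    -p 9200:9200 -p 9600:9600 \
    -e "discovery.type=single-node" \
    -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<a-strong-password>" \
    opensearchproject/opensearch:3.1.0
  ```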
- Managed service

  OpenSearch can be provisioned by using a managed service from a public cloud platform. One option is Amazon OpenSearch Service. For more information, see Amazon OpenSearch Service.
The following table shows which combinations of infrastructure, installation type, and platform have been tested.
Infrastructure | Installation type | Platform | Tested |
---|---|---|---|
Cloud | Helm charts/Operator-based | OpenShift | ✔ |
Cloud | Managed service | Amazon Web Services | ✖ |
On-premises | Helm charts/Operator-based | Kubernetes | ✔ |
Physical/virtual server | Tarball | Linux – Ubuntu | ✔ |
Physical/virtual server | Tarball | Linux – other distributions | ✖ |
Physical/virtual server | RPM | Linux – RHEL, CentOS | ✖ |
Physical/virtual server | APT | Linux – Debian | ✖ |
Post-installation instructions
You need to update the following.
- Install the `analysis-icu` plugin

  You must install the `analysis-icu` plugin for successful integration with PMF; otherwise, the following message appears in the OpenSearch logs:

  ```
  Custom Analyzer [normalization] failed to find tokenizer under name [icu_tokenizer]
  ```

  For more information, see OpenSearch 3.1 documentation - Installing plugins.
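  For example, on a tarball installation, the plugin can be installed from the OpenSearch home directory as follows; restart the node for the plugin to take effect.

  ```
  ./bin/opensearch-plugin install analysis-icu
  ```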
- Data retention policy

  The data retention policy is set through the Time to Live (TTL) period, in days, on the PMF Analytics Console. The default value is 30 days. Although you can change the default value, the PMF team recommends keeping it unless specific data retention needs require otherwise.

  Note: Some documents might be retained for an extended period because of the adjustment made to index-based document deletion. For more information, see the Index management section in Best practices.
Best practices
Following are the key maintenance and operational best practices for an OpenSearch cluster integrated with PMF.
- Monitoring

  Use a monitoring solution (such as OpenSearch Dashboards or a third-party tool) to track key metrics, including CPU usage, memory pressure, disk utilization, and cluster health status (Green, Yellow, or Red).
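  As a quick manual check, the cluster health status can be queried through the REST API. This sketch assumes the default port and the security plugin's basic authentication.

  ```
  curl -k -u admin:<password> "https://localhost:9200/_cluster/health?pretty"
  ```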
- Backups and snapshots

  Regularly take snapshots of your OpenSearch cluster to prevent data loss. For more information, see OpenSearch 3.1 documentation - Snapshots.
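  The following is a minimal sketch of registering a shared file system snapshot repository and taking a snapshot. The repository name, snapshot name, and location are illustrative; the location must also be listed under `path.repo` in `opensearch.yml` on every node.

  ```
  # Register a file system snapshot repository
  curl -k -u admin:<password> -X PUT "https://localhost:9200/_snapshot/pmf-backups" \
    -H 'Content-Type: application/json' \
    -d '{"type": "fs", "settings": {"location": "/mnt/snapshots"}}'

  # Take a snapshot of all indices
  curl -k -u admin:<password> -X PUT "https://localhost:9200/_snapshot/pmf-backups/snapshot-1"
  ```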
- Security

  Ensure that fine-grained access control is enabled and properly configured to secure your data. Avoid public access to your cluster. For more information, see OpenSearch 3.1 documentation - Security in OpenSearch.
- Upgrades

  Plan for and perform regular upgrades to stay on a supported version and to patch security vulnerabilities. For more information, see OpenSearch 3.1 documentation - Upgrade OpenSearch.
- Data migration

  The data migration process currently supports migrating data from PMF 9.0.x and 9.x releases to the PMF 10.x release. This includes migrating data from the Elasticsearch cluster (9.0.x and 9.x releases) to the OpenSearch cluster (10.x release). PMF bundles a Node.js-based utility for migrating data from Elasticsearch 1.0 to OpenSearch 3.1. For more information, see Using Analytics Data Migration tool.
- Index management

  PMF provides automated index management for data ingested through PMF.

  - Index rollover: Indices are rolled over when either of the following conditions is met:
    - The size of the index reaches 30 GB, OR
    - The rollover period, in days, for the index is reached.

    The following formula is used to calculate the index rollover period:

    Rollover period (days) = Math.ceil(TTL in days / 4)

    where TTL is the time period, in days, for which the Analytics data is retained in the OpenSearch cluster.

    Example: If you set the TTL as 30 days, the rollover period for indices is calculated as 8 days.

    Rollover period = Math.ceil(30 / 4) = 8 days
  - Index expiration and deletion: Old rolled-over indices are deleted when they reach the expiration period. The expiration period is calculated by adding the configured TTL to the index rollover period calculated above.

    Expiration period = Configured TTL + Index rollover period

    Example: If you set the configured TTL as 30 days and the index rollover period is calculated as 8 days, the index expires and is deleted on or after the 38th day from its date of creation.

    Expiration period = 30 + 8 = 38 days

    Note: PMF uses cron jobs to delete old indices that have reached the deletion period, which is counted from the date of creation of the index. Because an index can keep receiving documents until it rolls over, deleting it exactly at the TTL could remove recently added documents before their configured retention elapses; to prevent this data loss, PMF adds the index rollover period to the configured TTL. The arithmetic is illustrated in the sketch after this list.
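  The following is a hypothetical shell illustration of the rollover and expiration arithmetic described above; it is not a PMF tool.

  ```
  TTL=30                             # configured TTL in days
  ROLLOVER=$(( (TTL + 3) / 4 ))      # integer equivalent of Math.ceil(TTL / 4) = 8
  EXPIRATION=$(( TTL + ROLLOVER ))   # 30 + 8 = 38
  echo "Rollover every ${ROLLOVER} days; indices expire after ${EXPIRATION} days"
  ```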
- Tuning the operating system

  - Increase the allowed number of open file descriptors to 32k or 64k.
  - Increase the virtual memory map count.

  Note: Check the corresponding documentation for your operating system.
- Tuning the OpenSearch cluster

  - Set the Java Xms (minimum) and Xmx (maximum) heap sizes to the same value.
  - Maximum allowed heap size per JVM <= RAM size / 2.
  - Number of primary shards = number of nodes in the Analytics cluster.
  - Number of replicas per shard >= 2.

  Note: If there is only one node, there is no need for a replica. A sketch of creating an index with these shard and replica settings is shown below.
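  The following is a minimal sketch of creating an index whose shard and replica counts follow this guidance, assuming a three-node cluster and the default port and credentials; the index name is illustrative.

  ```
  curl -k -u admin:<password> -X PUT "https://localhost:9200/pmf-analytics-example" \
    -H 'Content-Type: application/json' \
    -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 2}}'
  ```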