OpenSearch

OpenSearch is a community-driven, Apache 2.0-licensed open source search and analytics suite that makes it easy to ingest, search, visualize, and analyze data.

Persistent Mobile Foundation Analytics uses OpenSearch as follows.

  • Storing data and running search queries.
  • PMF Analytics Server - Storing all the mobile and server data in the JSON format.
  • PMF Analytics Console - Populated by querying the OpenSearch instances in real time.

Prerequisites

This section outlines the minimum and recommended system requirements for running OpenSearch with PMF.

For official OpenSearch software requirements and supported versions, see OpenSearch 3.1 documentation - Install and upgrade OpenSearch.

Hardware requirements

The hardware requirements for OpenSearch depend heavily on the use case, data volume, and query load. The following are general guidelines for minimum requirements.

Development environment

A single-node or multi-node cluster with the following configuration:

Type     Configuration
RAM      4 GB per node
CPU      2 cores per node
Storage  20 GB per node (SSD recommended)

Production environment

A cluster of at least three nodes (one cluster manager and two data nodes) is recommended, with each node configured as follows:

Type     Configuration
RAM      16 GB per node
CPU      4 cores per node
Storage  200 GB per node (SSD recommended)

Note: For detailed information on sizing and hardware considerations, see Hardware sizing calculator.

Software requirements

The following software is required.

  • OpenSearch 3.1
  • Operating systems -
    • Rocky Linux 8
    • Alma Linux 8
    • Amazon Linux 2/2023
    • Ubuntu 24.04
    • Windows Server 2019
    • Ubuntu 22.04 LTS (Tested by PMF team)

    For more information, see OpenSearch 3.1 documentation - Supported operating systems.

  • OpenJDK 21.0.7 (Tested by PMF team)

    If you want to use a different OpenJDK version, specify the OPENSEARCH_JAVA_HOME or JAVA_HOME environment variable as follows.

     export OPENSEARCH_JAVA_HOME=/path/to/opensearch-3.1.0/jdk
    

File system recommendations

  • Solid-State Drives (SSDs) installed on the host nodes are recommended.
  • Network file systems (NFS) are not recommended.

Network requirements

The following table lists the ports used by OpenSearch components.

Port number  OpenSearch component
443          OpenSearch Dashboards in AWS OpenSearch Service with encryption in transit (TLS)
5601         OpenSearch Dashboards
9200         OpenSearch REST API
9300         Node communication and transport (internal), cross-cluster search
9600         Performance Analyzer

For more information, see OpenSearch 3.1 documentation - Network requirements.

Important configurations

Set the following configurations.

  • Memory map setting: For production workloads running on Linux, make sure that vm.max_map_count is set to at least 262144.
  • Enable memory lock: Prevent JVM heap memory from being swapped out by setting the value of the bootstrap.memory_lock property to “true”.
  • Optional: Disable swap on a Linux system: Open /etc/fstab in a text editor with root privileges and comment out or remove any lines containing the word “swap”.

    Note: Disabling swap on a system with limited RAM might cause issues if OpenSearch tries to allocate more memory than is available. Ensure that you have sufficient RAM to accommodate the OpenSearch process.

  • Java heap size setting: Set the Java heap size to half of the node's RAM. For example, if your node has 16 GB of RAM, allocate 8 GB to heap space. The default value is OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g; to change it, update the following lines in the jvm.options file located in the config directory.

    • -Xms1g to -Xms8g
    • -Xmx1g to -Xmx8g
  • File descriptor limit: Set the value of the nofile property to “65536” to allow the OpenSearch user up to 65536 open files.

For more details on how to update these configurations, see OpenSearch 3.1 documentation - Important settings.
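The thresholds and the heap-sizing rule above can be checked with a short script. The following is a minimal sketch, not part of PMF or OpenSearch: the function names are hypothetical, it assumes a Linux host where /proc/sys/vm/max_map_count is readable, and it reports the file descriptor limit of the user running the script, which is not necessarily the OpenSearch user.

```python
from pathlib import Path

MIN_MAX_MAP_COUNT = 262144  # vm.max_map_count threshold from this section
MIN_NOFILE = 65536          # nofile limit from this section


def recommended_heap_gb(ram_gb: int) -> int:
    """Return half of the node's RAM in whole GB (at least 1 GB)."""
    return max(1, ram_gb // 2)


def jvm_heap_options(ram_gb: int) -> list[str]:
    """Build the two jvm.options lines, with Xms and Xmx set to the same value."""
    heap = recommended_heap_gb(ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


def check_limits(max_map_count: int, nofile: int) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            f"vm.max_map_count is {max_map_count}, expected at least {MIN_MAX_MAP_COUNT}"
        )
    if nofile < MIN_NOFILE:
        problems.append(f"nofile limit is {nofile}, expected at least {MIN_NOFILE}")
    return problems


if __name__ == "__main__":
    map_count_file = Path("/proc/sys/vm/max_map_count")
    if map_count_file.exists():  # Linux only
        import resource  # Unix-only module, imported here on purpose

        max_map_count = int(map_count_file.read_text())
        soft_nofile, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft_nofile == resource.RLIM_INFINITY:
            soft_nofile = MIN_NOFILE  # an unlimited value satisfies the check
        for problem in check_limits(max_map_count, soft_nofile):
            print("WARNING:", problem)
    # Example: heap lines for a node with 16 GB of RAM
    print("\n".join(jvm_heap_options(16)))
```

For a node with 16 GB of RAM, `jvm_heap_options(16)` returns `-Xms8g` and `-Xmx8g`, matching the example above.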

Installation

The following is a high-level overview of installing OpenSearch for PMF. We recommend a clustered deployment for production environments to ensure high availability and reliability.

The following table lists the available installation types and shows which combinations of infrastructure, installation type, and platform have been tested.

Infrastructure           Installation type           Platform               Tested (Y/N)
Cloud                    Helm charts/Operator-based  OpenShift
Cloud                    Managed Service             Amazon Web Services
On-Premises              Helm charts/Operator-based  Kubernetes
Physical/Virtual Server  Tarball                     Linux – Ubuntu
Physical/Virtual Server  Tarball                     Linux – Other Linux
Physical/Virtual Server  RPM                         Linux – RHEL, CentOS
Physical/Virtual Server  APT                         Linux – Debian

Post-installation instructions

Complete the following tasks after installation.

  • Install the analysis-icu plugin

    You must install the analysis-icu plugin for successful integration with PMF. Otherwise, the following message appears in the OpenSearch logs.

      Custom Analyzer [normalization] failed to find tokenizer under name [icu_tokenizer]
    

    For more information, see OpenSearch 3.1 documentation - Installing plugins.

  • Data retention policy

    The data retention policy is set through the Time to Live (TTL) period, in days, on the PMF Analytics console. The default value is 30 days. Although you can change the default value, the PMF team recommends keeping it unless you have specific data retention needs.

    Note: Some documents might be retained for longer than the TTL because of an adjustment made to the index-based document deletion. For more information, see the Index management section in Best practices.

Best practices

The following are the key maintenance and operational best practices for an OpenSearch cluster integrated with PMF.

  • Monitoring

    Use a monitoring solution (like OpenSearch Dashboards, or a third-party tool) to monitor key metrics, including CPU usage, memory pressure, disk utilization, and cluster health status (Green, Yellow, or Red).

  • Backups and snapshots

    Regularly take snapshots of your OpenSearch cluster to prevent data loss. For more information, see OpenSearch 3.1 documentation - Snapshots.

  • Security

    Ensure that the fine-grained access control is enabled and properly configured to secure your data. Avoid public access to your cluster. For more information, see OpenSearch 3.1 documentation - Security in OpenSearch.

  • Upgrades

    Plan for and perform regular upgrades to stay on a supported version and patch any security vulnerabilities. For more information, see OpenSearch 3.1 documentation - Upgrade OpenSearch.

  • Data migration

    The data migration process currently supports migrating data from the PMF 9.0.x and PMF 9.x releases to the PMF 10.x release. This includes migrating data from the Elasticsearch cluster (9.0.x and 9.x releases) to the OpenSearch cluster (10.x release). PMF bundles a Node.js-based utility for migrating data from Elasticsearch 1.0 to OpenSearch 3.1. For more information, see Using Analytics Data Migration tool.

  • Index management

    PMF provides automated index management for data ingested through PMF.

    • Index rollover - An index is rolled over when either of the following conditions is met:
      • The size of the index reaches 30 GB, OR
      • The rollover period, in days, for the index is reached.

      The following formula is used to calculate the index rollover period:

       Rollover period (Days) = Math.ceil (TTL in days/4)
      

      Where,

      TTL is the time period, in days, for which the Analytics data is retained in the OpenSearch cluster.

      Example

      If you set the TTL to 30 days, the rollover period for indices is 8 days, as per the following calculation.

       Rollover period = Math.ceil (30/4) = Math.ceil (7.5) = 8 days
      
    • Index expiration, and deletion

      Old rolled-over indices are deleted when they reach the expiration period. The expiration period is calculated by adding the configured TTL and the index rollover period calculated above.

       Expiration period = Configured TTL + Index rollover period
      

      Example

      If you set the configured TTL to 30 days, and the index rollover period is calculated as 8 days, then the index expires and is deleted on or after the 38th day from its date of creation.

       Expiration period = 30 + 8 = 38 days
      

      Note: PMF uses cron jobs to delete old indices that have reached the deletion period. The deletion period is counted from the creation date of the index. Deleting an index exactly at the configured TTL would therefore remove its most recently added documents before they reach the TTL, causing data loss. To prevent this, PMF adds the index rollover period to the configured TTL period.
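The rollover and expiration formulas under Index management can be reproduced for any TTL with a few lines of Python. This is an illustrative sketch of the arithmetic only, not PMF code. The function names are hypothetical, and `math.ceil` stands in for the `Math.ceil` used in the formula.

```python
import math

ROLLOVER_DIVISOR = 4  # from the formula: TTL in days / 4


def rollover_period_days(ttl_days: int) -> int:
    """Rollover period (days) = ceil(TTL in days / 4)."""
    if ttl_days <= 0:
        raise ValueError("TTL must be a positive number of days")
    return math.ceil(ttl_days / ROLLOVER_DIVISOR)


def expiration_period_days(ttl_days: int) -> int:
    """Expiration period (days) = configured TTL + index rollover period."""
    return ttl_days + rollover_period_days(ttl_days)


if __name__ == "__main__":
    # Print the rollover and expiration periods for a few sample TTL values.
    for ttl in (30, 60, 90):
        print(
            f"TTL={ttl} days: rollover={rollover_period_days(ttl)} days, "
            f"expiration={expiration_period_days(ttl)} days"
        )
```

With a TTL of 30 days, the functions return 8 and 38, matching the examples above.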

  • Tuning the operating system

    • Increase the allowed number of open file descriptors to 32k or 64k.
    • Increase the virtual memory map count.

    Note: Check the corresponding documentation for your operating system.

  • Tuning the OpenSearch cluster

    • Set the Java Xms (minimum) and Xmx (maximum) heap sizes to the same value.
    • Maximum allowed heap size per JVM <= RAM size / 2.
    • Number of primary shards = Number of nodes in the Analytics cluster.
    • Number of replicas per shard >= 2.

    Note: If there is only one node, there is no need for a replica.
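The shard and replica rules above can be expressed as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and `number_of_shards` and `number_of_replicas` are the standard OpenSearch index settings. This document does not state whether PMF lets you set them directly or manages them itself, so treat the output as a way to sanity-check a cluster plan, not as a configuration to apply.

```python
def suggested_index_settings(node_count: int, replicas: int = 2) -> dict:
    """Apply the tuning rules: primary shards = nodes, replicas >= 2 (0 on a single node)."""
    if node_count < 1:
        raise ValueError("A cluster needs at least one node")
    if node_count == 1:
        # Single node: there is no other node to hold a replica
        replicas = 0
    elif replicas < 2:
        raise ValueError("The guideline calls for at least 2 replicas per shard")
    return {
        "index": {
            "number_of_shards": node_count,
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    # Show the suggested settings for a single-node and a three-node cluster.
    for nodes in (1, 3):
        print(nodes, "node(s):", suggested_index_settings(nodes))
```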
