Migrate analytics data from MFP Cloud 8.0 to PMF Cloud 9.1 on Kubernetes

Assumption:

Analytics data is migrated from an existing MFP Cloud 8.0 deployment to PMF 9.1 as part of an in-place upgrade.

Prerequisite:

  • Ensure that a snapshot of the analytics data is taken into a directory
  • Note the number of shards configured in Elasticsearch in the on-premises environment

Steps:

Take a snapshot (backup) of the analytics data.

To take the snapshot, follow the steps below:

#1. Upgrade the es-operator to version 8.1.32. In your existing MFP 8.0 setup package, go to the es/deploy directory and modify the operator.yaml file, changing the es-operator image to perform the upgrade. Replace the existing operator image version with 8.1.32, and ensure you have updated the version information on the lines below (3 places) in operator.yaml:

Note: The PMF 9.1 package contains the es-operator:8.1.32 image; push it to your internal Docker registry before applying changes in the operator.yaml file. Refer to the PMF documentation for instructions on pushing images to a private Docker registry.
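
For example, a minimal sketch of tagging and pushing the image, assuming it is already loaded locally and that registry.example.com/pmf is a placeholder for your internal registry host and path:

# Replace registry.example.com/pmf with your internal registry (placeholder)
docker tag es-operator:8.1.32 registry.example.com/pmf/es-operator:8.1.32
docker push registry.example.com/pmf/es-operator:8.1.32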

      release: es-operator-8.1.32
      ...
      release: es-operator-8.1.32
      ...
      image: es-operator:8.1.32

#2. Apply operator.yaml changes

kubectl apply -f operator.yaml

#3. Wait until all ibm-es-* pods come up before proceeding further
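
For example, you can watch the Elasticsearch pods until they are all in the Running state (press Ctrl+C to stop watching):

kubectl get pods -w | grep ibm-es-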

#4. Check whether the ibm-es-es-configmap ConfigMap reflects the path.repo setting. Run the command below and look for 'path.repo':

kubectl edit configmap ibm-es-es-configmap

If path.repo is not reflected in the ibm-es-es-configmap ConfigMap, edit the ConfigMap manually and make the changes below:

kubectl edit configmap ibm-es-es-configmap

Update the elasticsearch.yml section to include path.repo: /es_backup\n

Example:

apiVersion: v1 
data: 
  elasticsearch.yml: "cluster:\n  name: ${CLUSTER_NAME}\nnode:\n  master: ${NODE_MASTER}\n 
    \ data: ${NODE_DATA}\n  name: ${NODE_NAME}\n  ingest: ${NODE_INGEST}\nindex:\n 
    \ number_of_replicas: 1\n  number_of_shards: 3\n  mapper.dynamic: true\npath:\n 
    \ data: /data/data_${staticname}\n  logs: /data/log_${staticname}\n  plugins: 
    /elasticsearch/plugins\n  work: /data/work_${staticname}      \nprocessors: ${PROCESSORS:1}\nbootstrap:\n 
    \ memory_lock: false\nhttp:\n  enabled: ${HTTP_ENABLE}\n  compression: true\n 
    \ cors:\n    enabled: true\n    allow-origin: \"*\"\ncloud:\n  k8s:\n    service: 
    ${DISCOVERY_SERVICE}\n    namespace: ${NAMESPACE}\ndiscovery:\n  type: io.fabric8.elasticsearch.discovery.k8s.K8sDiscoveryModule\n 
    \ zen:\n    ping.multicast.enabled: false\n    minimum_master_nodes: 1\nxpack.security.enabled: 
    false\nxpack.ml.enabled: false\ncompress.lzf.decoder: optimal\ndiscovery.zen.ping.multicast.enabled: 
    false\nbootstrap.mlockall: true\ncompress.lzf.decoder: safe\nscript.inline: true\npath.repo: 
    /es_backup\n"

#5. Create PersistentVolume (PV) and PersistentVolumeClaim (PVC)

a. Create PersistentVolume as below

# es-pv-volume.yaml 
apiVersion: v1 
kind: PersistentVolume 
metadata: 
  name: es-pv-volume 
  labels: 
    type: local 
spec: 
  storageClassName: manual 
  capacity: 
    storage: 2Gi 
  accessModes: 
    - ReadWriteMany 
  hostPath: 
    path: "/es_backup" 
kubectl apply -f es-pv-volume.yaml

b. Create PersistentVolumeClaim as below

# es-pv-claim.yaml 
apiVersion: v1 
kind: PersistentVolumeClaim 
metadata: 
  name: es-pv-claim 
spec: 
  storageClassName: manual 
  accessModes: 
    - ReadWriteMany 
  resources: 
    requests: 
      storage: 2Gi 
kubectl apply -f es-pv-claim.yaml

[Change the storage size as per your requirements]

kubectl get pv
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
es-pv-volume     2Gi        RWX            Retain           Bound    mfpmig2/es-pv-claim            manual         <unset>                          12s

kubectl get pvc
NAME                   STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
es-pv-claim            Bound    es-pv-volume     2Gi        RWX            manual         <unset>                 9s
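
If the PVC stays in the Pending state, you can inspect its events, for example:

kubectl describe pvc es-pv-claim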

#6. Make volumeMounts and volumes changes in the Elasticsearch deployments and statefulset

To see the deployments,

kubectl get deployments

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
es-operator              1/1     1            1           40h
ibm-es-esclient          1/1     1            1           40h
ibm-es-esmaster          1/1     1            1           40h
ibm-mf-analytics         1/1     1            1           41h
ibm-mf-analytics-recvr   1/1     1            1           41h
ibm-mf-server            1/1     1            1           41h
mf-operator              1/1     1            1           41h

To see the statefulsets,

 kubectl get statefulsets

NAME            READY   AGE
ibm-es-esdata   1/1     40h

Modify ibm-es-esclient and ibm-es-esmaster deployments to update volumeMounts and volumes as per below:

a. Modify ibm-es-esclient deployment

kubectl edit deployment ibm-es-esclient

Add below volumeMount:

        - mountPath: /es_backup 
          name: hp-volume 

and add below volume:

      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim  

Example after adding volume and volumeMount in ibm-es-esclient deployment:

        volumeMounts: 
        - mountPath: /es_backup 
          name: hp-volume 
        - mountPath: /data 
          name: storage 
        - mountPath: /elasticsearch/config/elasticsearch.yml 
          name: config 
          subPath: elasticsearch.yml 
        - mountPath: /elasticsearch/config/log4j2.properties 
          name: config 
          subPath: log4j2.properties 

...
...
...
      volumes: 
      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 
      - emptyDir: {} 
        name: storage 
      - configMap: 
          defaultMode: 420 
          name: ibm-es-es-configmap 
        name: config    

b. Modify ibm-es-esmaster deployment

kubectl edit deployment ibm-es-esmaster

Add below volumeMount:

        - mountPath: /es_backup 
          name: hp-volume 

Add below volume:

      - name: hp-volume
        persistentVolumeClaim: 
          claimName: es-pv-claim 

Example after adding volume and volumeMount in ibm-es-esmaster deployment:

        volumeMounts: 
        - mountPath: /es_backup 
          name: hp-volume 
        - mountPath: /data 
          name: storage 
        - mountPath: /elasticsearch/config/elasticsearch.yml 
          name: config 
          subPath: elasticsearch.yml 
        - mountPath: /elasticsearch/config/log4j2.properties 
          name: config 
          subPath: log4j2.properties 
           
...
...
...
      volumes: 
      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 
      - emptyDir: {} 
        name: storage 
      - configMap: 
          defaultMode: 420 
          name: ibm-es-es-configmap 
        name: config   

c. Modify ibm-es-esdata statefulset

kubectl edit statefulset ibm-es-esdata

Add below volumeMount:

        - mountPath: /es_backup 
          name: hp-volume 

Add below volume:

      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim  

Example after adding volume and volumeMount in ibm-es-esdata statefulset:

        volumeMounts: 
        - mountPath: /es_backup 
          name: hp-volume 
        - mountPath: /data 
          name: analytics-data 
        - mountPath: /elasticsearch/config/elasticsearch.yml 
          name: config 
          subPath: elasticsearch.yml 
        - mountPath: /elasticsearch/config/log4j2.properties 
          name: config 
          subPath: log4j2.properties 
           
...
...
...
      volumes: 
      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 
      - name: analytics-data 
        persistentVolumeClaim: 
          claimName: mfanalyticsvolclaim2 
      - configMap: 
          defaultMode: 420 
          name: ibm-es-es-configmap 
        name: config  

#7. Verify that the /es_backup path is mounted in each of the Elasticsearch pods

ibm-es-esclient-69f4974f8f-brczt          1/1     Running     0          42h
ibm-es-esdata-0                           1/1     Running     0          42h
ibm-es-esmaster-6548b4ddf9-l884h          1/1     Running     0          42h

a. Verify the /es_backup path in the ibm-es-esclient pod

kubectl exec -it ibm-es-esclient-7cb8768cf5-kv5lc -- bash
bash-4.4$ cat /elasticsearch/config/elasticsearch.yml 
cluster:
  name: ${CLUSTER_NAME}
...
...
...

xpack.ml.enabled: false
compress.lzf.decoder: optimal
discovery.zen.ping.multicast.enabled: false
bootstrap.mlockall: true
compress.lzf.decoder: safe
script.inline: true
path.repo: /es_backup

bash-4.4$ ls -ld /es_backup/

You should see path.repo: /es_backup in elasticsearch.yml, and the /es_backup directory should be present as a mounted path.

b. Similarly, verify the /es_backup path for the ibm-es-esdata-* and ibm-es-esmaster-* pods
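
As a convenience, a small sketch that checks all three Elasticsearch pods in one pass (replace the pod names with the ones from your cluster):

for p in ibm-es-esclient-69f4974f8f-brczt ibm-es-esdata-0 ibm-es-esmaster-6548b4ddf9-l884h; do
  kubectl exec "$p" -- grep path.repo /elasticsearch/config/elasticsearch.yml
  kubectl exec "$p" -- ls -ld /es_backup/
done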

#8. Take the snapshot using the Elasticsearch API

a. Get the Elasticsearch master pod IP address

kubectl get pods -o wide

NAME                                      READY   STATUS      RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
es-operator-658779fbb4-xbm76              1/1     Running     0          42h   172.16.77.139   master-node   <none>           <none>
ibm-es-esclient-69f4974f8f-brczt          1/1     Running     0          42h   172.16.77.138   master-node   <none>           <none>
ibm-es-esdata-0                           1/1     Running     0          42h   172.16.77.135   master-node   <none>           <none>
ibm-es-esmaster-6548b4ddf9-l884h          1/1     Running     0          42h   172.16.77.143   master-node   <none>           <none>
ibm-mf-analytics-58574df7d6-49bft         1/1     Running     0          42h   172.16.77.142   master-node   <none>           <none>
ibm-mf-analytics-recvr-7bf6857955-xw4ct   1/1     Running     0          42h   172.16.77.191   master-node   <none>           <none>
ibm-mf-defaultsecrets-job-2bcg9           0/2     Completed   0          42h   172.16.77.163   master-node   <none>           <none>
ibm-mf-server-867d785477-bmzmp            1/1     Running     0          42h   172.16.77.184   master-node   <none>           <none>
mf-operator-5c499fd5d8-84sfr              1/1     Running     0          42h   172.16.77.154   master-node   <none>           <none> 

The Elasticsearch master pod IP address in this example is: 172.16.77.143
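
As a shortcut, you can extract the master pod IP directly (assuming the default kubectl output, where the IP is the sixth column of the wide listing):

kubectl get pods -o wide | grep ibm-es-esmaster | awk '{print $6}'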

b. Execute the API below to verify that the Elasticsearch cluster is up and running:

curl --location --request GET 'http://172.16.77.143:9200/_nodes/process?pretty' 

c. Execute the API below to register the backup directory as a snapshot repository:

curl --location --request PUT 'http://172.16.77.143:9200/_snapshot/my_backup' --header 'Content-Type: application/json' --data '{"type": "fs","settings": {"location": "/es_backup"}}' 

Success response: {"acknowledged":true}

d. Execute the API below to take the snapshot:

curl --location --request PUT 'http://172.16.77.143:9200/_snapshot/my_backup/snapshot_1'              

Success response: {"accepted":true}
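
Because the snapshot request returns immediately, you can optionally confirm that the snapshot completed (state SUCCESS) before proceeding, for example:

curl --location --request GET 'http://172.16.77.143:9200/_snapshot/my_backup/snapshot_1?pretty'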

e. Verify that the snapshot is created [run this command on the VM/node where the /es_backup volume is created]

            ls /es_backup/
            
            index  indices  metadata-snapshot_1  snapshot-snapshot_1

Execute the steps below after upgrading to PMF 9.1

Restore MFP 8.0 analytics snapshot data in PMF 9.1

#1. Create a PersistentVolume (PV) and PersistentVolumeClaim (PVC) and point them to the MFP snapshot (the /es_backup directory located on the node/VM). Also ensure that elasticsearch.yml reflects the path.repo location, which is /es_backup.

a. Create PersistentVolume as below

# es-pv-volume.yaml 
apiVersion: v1 
kind: PersistentVolume 
metadata: 
  name: es-pv-volume 
  labels: 
    type: local 
spec: 
  storageClassName: manual 
  capacity: 
    storage: 2Gi 
  accessModes: 
    - ReadWriteMany 
  hostPath: 
    path: "/es_backup" 
kubectl apply -f es-pv-volume.yaml

b. Create PersistentVolumeClaim as below

# es-pv-claim.yaml 
apiVersion: v1 
kind: PersistentVolumeClaim 
metadata: 
  name: es-pv-claim 
spec: 
  storageClassName: manual 
  accessModes: 
    - ReadWriteMany 
  resources: 
    requests: 
      storage: 2Gi  
kubectl apply -f es-pv-claim.yaml 

[Change the storage size as per your requirements]

kubectl get pv
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
es-pv-volume     2Gi        RWX            Retain           Bound    mfpmig2/es-pv-claim            manual         <unset>                          12s
kubectl get pvc
NAME                   STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
es-pv-claim            Bound    es-pv-volume     2Gi        RWX            manual         <unset>                 9s

#2. Make volumeMounts and volumes changes in the Elasticsearch deployments and statefulset

To see the deployments,

kubectl get deployments

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
es-operator              1/1     1            1           40h
ibm-es-esclient          1/1     1            1           40h
ibm-es-esmaster          1/1     1            1           40h
ibm-mf-analytics         1/1     1            1           41h
ibm-mf-analytics-recvr   1/1     1            1           41h
ibm-mf-server            1/1     1            1           41h
mf-operator              1/1     1            1           41h

To see the statefulsets,

kubectl get statefulsets
NAME            READY   AGE
ibm-es-esdata   1/1     40h

Modify ibm-es-esclient and ibm-es-esmaster deployments to update volumeMounts and volumes as per below:

a. Modify ibm-es-esclient deployment

 kubectl edit deployment ibm-es-esclient

Add below volumeMount:

        - mountPath: /es_backup 
          name: hp-volume 

Add below volume:

      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 

Example after adding volume and volumeMount in ibm-es-esclient deployment:

        volumeMounts: 
        - mountPath: /es_backup 
          name: hp-volume 
        - mountPath: /data 
          name: storage 
        - mountPath: /elasticsearch/config/elasticsearch.yml 
          name: config 
          subPath: elasticsearch.yml 
        - mountPath: /elasticsearch/config/log4j2.properties 
          name: config 
          subPath: log4j2.properties 
           
...
...
...
      volumes: 
      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 
      - emptyDir: {} 
        name: storage 
      - configMap: 
          defaultMode: 420 
          name: ibm-es-es-configmap 
        name: config                         

b. Modify ibm-es-esmaster deployment

 kubectl edit deployment ibm-es-esmaster

Add below volumeMount:

        - mountPath: /es_backup 
          name: hp-volume 

Add below volume:

      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 

Example after adding volume and volumeMount in ibm-es-esmaster deployment:

        volumeMounts: 
        - mountPath: /es_backup 
          name: hp-volume 
        - mountPath: /data 
          name: storage 
        - mountPath: /elasticsearch/config/elasticsearch.yml 
          name: config 
          subPath: elasticsearch.yml 
        - mountPath: /elasticsearch/config/log4j2.properties 
          name: config 
          subPath: log4j2.properties 
...
...
...           

      volumes: 
      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 
      - emptyDir: {} 
        name: storage 
      - configMap: 
          defaultMode: 420 
          name: ibm-es-es-configmap 
        name: config    

c. Modify ibm-es-esdata statefulset

kubectl edit statefulset ibm-es-esdata 

Add below volumeMount:

        - mountPath: /es_backup
          name: hp-volume 

Add below volume:

      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 

Example after adding volume and volumeMount in ibm-es-esdata statefulset:

        volumeMounts: 
        - mountPath: /es_backup 
          name: hp-volume 
        - mountPath: /data 
          name: analytics-data 
        - mountPath: /elasticsearch/config/elasticsearch.yml 
          name: config 
          subPath: elasticsearch.yml 
        - mountPath: /elasticsearch/config/log4j2.properties 
          name: config 
          subPath: log4j2.properties 

...
...
...           

      volumes: 
      - name: hp-volume 
        persistentVolumeClaim: 
          claimName: es-pv-claim 
      - name: analytics-data 
        persistentVolumeClaim: 
          claimName: mfanalyticsvolclaim2 
      - configMap: 
          defaultMode: 420 
          name: ibm-es-es-configmap 
        name: config 

#3. Verify that the /es_backup path is mounted in each of the Elasticsearch pods

ibm-es-esclient-69f4974f8f-brczt          1/1     Running     0          42h
ibm-es-esdata-0                           1/1     Running     0          42h
ibm-es-esmaster-6548b4ddf9-l884h          1/1     Running     0          42h

a. Verify the /es_backup path in the ibm-es-esclient pod

kubectl exec -it ibm-es-esclient-7cb8768cf5-kv5lc -- bash
bash-4.4$ cat /elasticsearch/config/elasticsearch.yml 
cluster:
  name: ${CLUSTER_NAME}
...
...
...

xpack.ml.enabled: false
compress.lzf.decoder: optimal
discovery.zen.ping.multicast.enabled: false
bootstrap.mlockall: true
compress.lzf.decoder: safe
script.inline: true
path.repo: /es_backup


bash-4.4$ ls -ld /es_backup/

You should see path.repo: /es_backup in elasticsearch.yml, and the /es_backup directory should be present as a mounted path

b. Similarly, verify the /es_backup path for the ibm-es-esdata-* and ibm-es-esmaster-* pods
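
To confirm that /es_backup is an actual volume mount (and not just an empty directory in the image), you can also check the mount inside a pod, for example (assuming df is available in the container image):

kubectl exec ibm-es-esdata-0 -- df -h /es_backup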

c. Verify that the snapshot data is available inside the /es_backup directory

     $ ls /es_backup/ 

     index  indices  metadata-snapshot_1  snapshot-snapshot_1 

#4. Restore the snapshot using the Elasticsearch API

a. Get the Elasticsearch master pod IP address

kubectl get pods -o wide
NAME                                      READY   STATUS      RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
es-operator-658779fbb4-xbm76              1/1     Running     0          42h   172.16.77.139   master-node   <none>           <none>
ibm-es-esclient-69f4974f8f-brczt          1/1     Running     0          42h   172.16.77.138   master-node   <none>           <none>
ibm-es-esdata-0                           1/1     Running     0          42h   172.16.77.135   master-node   <none>           <none>
ibm-es-esmaster-6548b4ddf9-l884h          1/1     Running     0          42h   172.16.77.137   master-node   <none>           <none>
ibm-mf-analytics-58574df7d6-49bft         1/1     Running     0          42h   172.16.77.142   master-node   <none>           <none>
ibm-mf-analytics-recvr-7bf6857955-xw4ct   1/1     Running     0          42h   172.16.77.191   master-node   <none>           <none>
ibm-mf-defaultsecrets-job-2bcg9           0/2     Completed   0          42h   172.16.77.163   master-node   <none>           <none>
ibm-mf-server-867d785477-bmzmp            1/1     Running     0          42h   172.16.77.184   master-node   <none>           <none>
mf-operator-5c499fd5d8-84sfr              1/1     Running     0          42h   172.16.77.154   master-node   <none>           <none> 

The Elasticsearch master pod IP address in this example is: 172.16.77.137

b. Execute the API below to verify that the Elasticsearch cluster is up and running:

  curl --location --request GET 'http://172.16.77.137:9200/_nodes/process?pretty'

c. Execute the API below to register the backup directory as a snapshot repository:

  curl --location --request PUT 'http://172.16.77.137:9200/_snapshot/my_backup' --header 'Content-Type: application/json' --data '{"type": "fs","settings": {"location": "/es_backup"}}'  
 
  Success response: {"acknowledged":true}
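
You can optionally verify that the registered repository is accessible from the cluster nodes, for example:

  curl --location --request POST 'http://172.16.77.137:9200/_snapshot/my_backup/_verify'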

d. Execute the API below to restore the snapshot:

  curl --location --request POST 'http://172.16.77.137:9200/_snapshot/my_backup/snapshot_1/_restore'

  Success response: {"accepted":true}   

e. If you get an 'open index' error, close the index and try the restore command again

To close the index, execute the API below:

  curl --location --request POST 'http://172.16.77.137:9200/global_all_tenants/_close'

where 'global_all_tenants' is the index to be closed. For each 'open index' error, execute this API with the reported index name.
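
To see which indexes are open or closed, you can, for example, list them all with their status:

  curl --location --request GET 'http://172.16.77.137:9200/_cat/indices?v'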

f. Retry the restore after closing all the open indexes

    curl --location --request POST 'http://172.16.77.137:9200/_snapshot/my_backup/snapshot_1/_restore'
    
    Success Response: {"accepted":true}
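
You can optionally monitor the restore progress, for example through the recovery API:

    curl --location --request GET 'http://172.16.77.137:9200/_cat/recovery?v'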

g. Once the restore is successful, open all the closed indexes

To open an index, execute the API below:

  curl --location --request POST 'http://172.16.77.137:9200/global_all_tenants/_open'

h. Restart the analytics pod:

  kubectl delete pod ibm-mf-analytics-58574df7d6-49bft
