
Failover - Active/Passive

Introduction

This document describes how to manage the product when it is implemented in active/passive cluster mode.

Target

This procedure applies to the Nodeum Active–Passive implementation.

Architecture

Implementation Overview

The failover implementation is set up through an Ansible package during the initial Nodeum deployment.
The Ansible inventory definition contains the list of all members of the cluster and their associated services.
Once deployed, the system is designed to provide these main services:
  • Redundancy of the Nodeum services and associated components
  • Redundancy of the cache disk
  • Single namespace IP address
This means that if one of the nodes goes down, the contents remain automatically accessible from the second node.

Ansible inventory overview

The following definition describes the deployment of each service across the two servers.
20-main
[mongodb]
srv01
srv02
[web]
srv01
srv02
[core]
srv01
srv02
[scheduler]
srv01
srv02
[refparser]
srv01
srv02
[mount_point_scanning]
srv01
srv02
11-mariadb-cluster
[mariadb]
; When updating from a single node to a cluster, only the node that
; previously held the data should be set to `true`
srv01 galera_master=true
srv02
31-catalog-indexer
[zookeeper_nodes]
srv01
srv02
[solr]
srv01
srv02
[solr:vars]
; The product of these two variables should be lower than or equal to the number of hosts
; Number of shards to split the collection into.
; Default: 3
; solr_shards=3
; Number of copies of each document in the collection.
; Default: 1
; solr_replication_factor=1
[catalog_indexer]
srv01
srv02
31-catalog-indexer
[gluster_cache]
srv01 gluster_cache_device=/dev/sdb gluster_logical_size=100%FREE
srv02 gluster_cache_device=/dev/sdb gluster_logical_size=100%FREE
[gluster_cache:vars]
; Arbiter count
; gluster_arbiters=
; Disperse count
; gluster_disperses=
; Redundancy count
; gluster_redundancies=
; Replica count
; gluster_replicas=
; Stripe count
; gluster_stripes=

Resiliency level – Cluster – Failover

Definitions

Cluster
A service in cluster mode runs simultaneously on both nodes, without any interruption.
Failover
In this mode, when the active server goes down, services are automatically restarted on the passive server (and moved back once the active server returns). The failover is accompanied by the switch of the virtual IP address to the passive server.
Below is a list of all services with their associated level of resiliency:
Nodeum Services              Resiliency Level
Notification Manager         Failover
Core Manager                 Failover
Tape Library Manager         Failover
Data Mining                  Failover
File System Virtualization   Failover
Watchdog                     Failover
Ref. File Parsing            Failover
Scheduler                    Failover
File Listing Processing      Failover
Indexation Engine            Failover

System Services              Resiliency Level
CACHE Disk                   Cluster
Solr                         Cluster
NGINX                        Cluster
MariaDB                      Cluster
MongoDB                      Cluster
SMB                          Cluster
NFS                          Cluster
MinIO                        Not yet available

System troubleshooting

Status of services

The status of each service can be monitored through the web interface of each node. The active server has all Nodeum services up and running, while the passive server must have the “Core Manager”, “Scheduler”, “File Listing Processing” and “Indexation Engine” services stopped.
(Screenshots: service status pages of the “Active” and “Passive” servers.)

How to do maintenance on a cluster node?

Maintaining the cluster requires stopping each server separately while always keeping one server active. To shut down a server, use either the Nodeum Console of the server you want to shut down, or connect to the server over SSH and perform the shutdown from there.

How to verify which node is the active one?

The active node is always the one that has the clustered IP assigned. This can be displayed with the command ‘ip address show’, which lists the IP addresses defined on the connected network interface; on the active server, the clustered IP address appears in addition to the main IP address.
In this example:
- ens160 is the name of the network interface device on both servers
- IP address of the cluster: 10.3.1.153
- IP address of nodcluster01: 10.3.1.154
- IP address of nodcluster02: 10.3.1.155
On nodcluster01 (no clustered IP assigned, so it is the passive node):
$ ip address show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group
default qlen 1000
link/ether 00:50:56:be:f4:1d brd ff:ff:ff:ff:ff:ff
inet 10.3.1.154/24 brd 10.3.1.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet6 fe80::8216:6cb0:f936:9863/64 scope link noprefixroute
valid_lft forever preferred_lft forever
On nodcluster02 (the clustered IP 10.3.1.153 is assigned, so it is the active node):
$ ip address show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group
default qlen 1000
link/ether 00:50:56:be:cc:d4 brd ff:ff:ff:ff:ff:ff
inet 10.3.1.155/24 brd 10.3.1.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet 10.3.1.153/32 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::641e:f11a:fe5d:e531/64 scope link noprefixroute
valid_lft forever preferred_lft forever
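This check can be scripted by looking for the clustered IP in the output of `ip address show`. The helper below is an illustrative sketch (not part of Nodeum); it reads the command output on stdin, using the interface name and cluster IP from the example above:

```shell
# is_active: reads `ip address show <iface>` output on stdin and reports
# whether the clustered IP passed as the first argument is assigned.
# Usage: ip address show ens160 | is_active 10.3.1.153
is_active() {
    if grep -qF "inet $1/"; then
        echo "active"
    else
        echo "passive"
    fi
}
```

Run on each node, this prints `active` on the node holding the clustered IP and `passive` on the other.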

Situation 1: Unexpected stop of active node

List of cases:
- Power outage
- Virtual Server down
- Loss of Operating System
What happens:
- Nodeum switches to the second (passive) node automatically
- Services covered by failover are restarted on the second node.
Note: In this situation, the second (passive) node is not aware that a state transfer of the cluster has been completed. The cluster is suspected to be split and the node assumes it is in the smaller partition (for example, during a network glitch, when nodes temporarily lose each other). The node takes this measure to prevent data inconsistency.
Result:
The Nodeum Console is not accessible and returns an “internal 500 error”.
Determine the root cause:
Check the status of MariaDB with the command ‘systemctl status mariadb’. The status may display the error message 'WSREP has not yet prepared node for application use'.
Resolution:
This is a temporary state which can be detected by checking the wsrep_ready value. During this period the node only allows SHOW and SET commands.
$ mysql -u root -padmin
MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_ready';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_ready   | OFF   |
+---------------+-------+
On the server that has the issue:
$ mysql -u root -padmin
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 48
Server version: 10.4.18-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> SET GLOBAL wsrep_provider_options='pc.bootstrap=true';

Situation 2: Unexpected stop of both nodes at the same time

List of cases:
- Power outage on both nodes
- (Virtual) Servers down
- Virtual Cluster down
- Loss of Operating System
What happens:
- All servers are down and must be restarted once the systems are back online
- Once the servers are restarted, they need to elect the master that will handle the DB cluster service.
Note: If you shut down all nodes at the same time, you have effectively terminated the cluster. Of course, the cluster's data still exists, but the running cluster no longer exists.
Result :
MariaDB does not start correctly.
Resolution :
Once you restart the servers, you'll need to bootstrap the cluster again. If the cluster is not bootstrapped and MariaDB on the first node is just started normally, then the node will try to connect to at least one of the nodes listed in the wsrep_cluster_address option.
If no nodes are currently running, then this will fail. Bootstrapping the first node solves this problem. In some cases, Galera will refuse to bootstrap a node if it detects that it might not be the most advanced node in the cluster. Galera makes this determination if the node was not the last one in the cluster to be shut down or if the node crashed. In those cases, manual intervention is needed.
If you experience this issue, the galera_recovery command solves it:
$ /usr/bin/galera_recovery
If the cluster cannot be recovered with the galera_recovery command, it must be done manually: edit the file /var/lib/mysql/grastate.dat and change the value safe_to_bootstrap: 0 to safe_to_bootstrap: 1 on the server that you believe has the most up-to-date copy of the databases.
$ vi /var/lib/mysql/grastate.dat
Then, on the same server, execute the following command:
$ galera_new_cluster
Then start MariaDB normally on the other server:
$ systemctl restart mariadb
With this, the MariaDB cluster should be back to normal.
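To decide which node has the most up-to-date data, Galera records the sequence number (seqno) of the last committed transaction in grastate.dat; the node reporting the highest seqno is the one to bootstrap (a seqno of -1 indicates a crash, in which case galera_recovery is needed first). A small illustrative helper (not part of Nodeum) to extract it:

```shell
# grastate_seqno: reads a grastate.dat file on stdin and prints its seqno.
# Run `grastate_seqno < /var/lib/mysql/grastate.dat` on each node and
# bootstrap the node reporting the highest value.
grastate_seqno() {
    awk -F': *' '$1 == "seqno" { print $2 }'
}
```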

Situation 3: Loss of network connectivity on node 1

List of cases:
- Network equipment is down
- Network Cable(s) connected to the server are faulty
- Network interface of the server is faulty
What happens:
- The server is unreachable from a network point of view; the failover service of the cluster detects that the server is no longer reachable from the network.
- As a result, the system fails over to the second server and reassigns the clustered IP to it.
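When diagnosing a suspected interface problem on the node itself, the operational state reported by `ip link show` can be checked. The helper below is an illustrative sketch (not part of Nodeum) that parses the command output from stdin:

```shell
# link_state: reads `ip link show <iface>` output on stdin and prints the
# interface's operational state (e.g. UP or DOWN), to help confirm a
# faulty interface. Usage: ip link show ens160 | link_state
link_state() {
    grep -o 'state [A-Z]*' | head -n 1 | awk '{ print $2 }'
}
```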

Situation 4: Unexpected disconnection of the cache storage

List of cases:
- Network have been disconnected – flapped
- Network Cable(s) connected to the server are faulty
- Internal disk that serves as cache has been disconnected
What happens:
- The server is unreachable from a network point of view, or the internal volume serving the cache is not available.
- As a result, the Container contents cannot be operated properly.
- Task(s) can display some files with the status ‘NO FILE’.
Result :
Service ‘nodeum_file_system_virt’ does not start correctly.
Resolution :
On both servers, execute these actions:
Node 1: unmount the volume manually and remount it
$ umount /srv/gluster/nodeum_cache_brick
$ mount -a
Node 2: Unmount the volume manually and remount it
$ umount /srv/gluster/nodeum_cache_brick
$ mount -a
Afterwards, you can restart the GlusterFS daemon and the Nodeum File System virtualization service.
Node 1:
$ systemctl restart glusterd
$ systemctl restart nodeum_file_system_virt
Node 2:
$ systemctl restart glusterd
$ systemctl restart nodeum_file_system_virt
At this stage, on both servers, you will be able to list the contents behind each of these mount points:
$ ls /srv/gluster/nodeum_cache_brick
$ ls /mnt/CACHE
$ ls /mnt/FUSE
If tasks reported files with the ‘NO FILE’ status, restart those tasks; the problem should then be resolved, meaning all files are processed.
It is also important to use the following commands to verify the health of the Gluster file system.
Both servers must return the same results for the following commands:
$ gluster volume status nodeum_cache_brick clients
Client connections for volume nodeum_cache_brick
----------------------------------------------
Brick : 10.x.x.1:/srv/gluster/nodeum_cache_brick
Clients connected : 2
Hostname BytesRead BytesWritten OpVersion
-------- --------- ------------ ---------
10.x.x.1:49108 38043530114 230085103629 70200
10.x.x.2:49144 37815908 179832112 70200
----------------------------------------------
----------------------------------------------
$ gluster volume status nodeum_cache_brick
Status of volume: nodeum_cache_brick
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.x.x.1:/srv/gluster/nodeum_cache
_brick 49152 0 Y 110881
Brick 10.x.x.2:/srv/gluster/nodeum_cache
_brick N/A N/A Y 16258
Task Status of Volume nodeum_cache_brick
------------------------------------------------------------------------------
There are no active volume tasks
$ gluster volume status nodeum_cache_brick clients
Client connections for volume nodeum_cache_brick
----------------------------------------------
Brick : 10.x.x.1:/srv/gluster/nodeum_cache_brick
Clients connected : 1
Hostname BytesRead BytesWritten OpVersion
-------- --------- ------------ ---------
10.x.x.1:49108 38319242436 231731737617 70200
----------------------------------------------
Brick : 10.x.x.2:/srv/gluster/nodeum_cache_brick
Clients connected : 1
Hostname BytesRead BytesWritten OpVersion
-------- --------- ------------ ---------
10.x.x.2:49147 372933965 2066922877 70200
----------------------------------------------
$ gluster peer status
Number of Peers: 1
Hostname: 10.x.x.2
Uuid: a3b18da3-5a42-4399-9480-105f1f7032fb
State: Peer in Cluster (Connected)
$ gluster peer status
Number of Peers: 1
Hostname: 10.x.x.1
Uuid: a3b18da3-5a42-4399-9480-105f1f7032fb
State: Peer in Cluster (Connected)

Point of attention

Make sure that the “/” directory has enough space.
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 576K 16G 1% /dev/shm
tmpfs 16G 1.6G 15G 11% /run
tmpfs 16G 0 16G 0%
/dev/mapper/centos_nodeu-root 788G 29G 719G 4% /
/dev/sdb1 8.2T 7.5T 254G 97% /mnt/CACHE
/dev/sda1 1014M 275M 740M 28% /boot
tmpfs 3.2G 0 3.2G 0%
tmpfs 3.2G 0 3.2G 0% /run/user/0
core_fuse 102T 8.0T 94T 8% /mnt/FUSE
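This check can be automated; the sketch below warns when the root file system usage crosses an example threshold of 90% (an arbitrary value chosen for illustration, not a Nodeum requirement):

```shell
# Warn when "/" usage crosses a threshold. Uses GNU df's --output option.
THRESHOLD=90
# Extract the Use% value for "/" and strip everything but the digits.
usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "WARNING: / is ${usage}% full"
fi
```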

Backup Feature - Manual Execution

This procedure needs to be applied on each node of the cluster.

How to execute a backup manually?

A command line must be executed to start a manual backup or restore.
The shell script is "/opt/nodeum/tools/backup_restore.sh":
$ /opt/nodeum/tools/backup_restore.sh param1 param2
The first parameter: f / full backup or i / incremental backup.
The second parameter: the target path where the backup will be saved, or where the backup is located for a restore.
If the command line is configured to do an incremental backup and there is no full backup already done, it will perform a full backup.
The incremental option always increments an existing full backup. This means that the incremental backup is restorable.
Examples:
$ nohup /opt/nodeum/tools/backup_restore.sh full_backup /root/nodeum_bck_2302 &
"nohup" and "&" allow to run the backup script in daemon, there is a file named "nohup.out" ; this file contains the result of the executed command.

How to execute a restore manually?

A command line must be executed to restore a backup.
$ /opt/nodeum/tools/backup_restore.sh param1 param2
param1: r for restore
param2: the source path where the backup is located
Example :
$ nohup /opt/nodeum/tools/backup_restore.sh restore /root/nodeum_bck_2302 &
"nohup" and "&" allow to run the backup script in daemon, there is a file named "nohup.out"; this file contains the result of the executed command.

Point of Attention

By default, when the script is running, it uses a temporary folder, /tmp/bckp/, in the main file system; this temporary folder is used to store the backup before it is moved to the final location. The temporary folder can be changed by specifying another folder as the 3rd argument.
Default temp folder:
In this example, the backup will be stored in the folder …/nas/backupnodeum/ and the backup system will implicitly use /tmp/ as temporary cache:
/bin/bash ./backup_restore.sh full_backup /mnt/MOUNT_POINTS/nas/backupnodeum
Another temp folder:
In this example, the backup will be stored in the folder …/nas/backupnodeum/ and the backup system will use the directory /mnt/CACHE/tempbck as temporary cache:
/bin/bash ./backup_restore.sh full_backup /mnt/MOUNT_POINTS/nas/backupnodeum /mnt/CACHE/tempbck

Point of Attention

If the backup does not run and the console mentions that another backup_restore.sh script is already running, there are two things to review:
  • Use a "ps -aef" command to verify whether another process is already running
  • A lock file (nodeum_bkp_lock) may be left behind; this lock file is stored in the /tmp folder, even if the temporary folder location has been changed.
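The two checks can be combined in a small helper (illustrative, not shipped with Nodeum): it reports whether a backup_restore.sh process is running and, if not, whether the lock file exists:

```shell
# backup_lock_state: reports the state of the backup lock. The default
# /tmp/nodeum_bkp_lock path is the lock file named above; a lock file
# with no matching process is stale and can be removed with `rm`.
backup_lock_state() {
    lock="${1:-/tmp/nodeum_bkp_lock}"
    if ps -aef 2>/dev/null | grep -q '[b]ackup_restore.sh'; then
        echo "running"
    elif [ -e "$lock" ]; then
        # No process found but the lock remains: it is stale.
        echo "stale lock"
    else
        echo "free"
    fi
}
```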