IBM SmartCloud Control Desk application failover scenarios were tested by simulating database failover. Application availability was tested with DB2 in both HADR and shared disk configurations.
Database HADR failover testing
The following failover scenarios were tested with the DB2 HADR setup:
System failure
This scenario was tested by powering down the primary database server. The entire workload was transferred to the secondary database server by the cluster manager. The following steps were executed:
1. Run the lssam command as root from either the primary or the secondary database server. Figure 3-48 shows the output of the lssam command in the normal operating environment.
Figure 3-48 lssam output for a normal operating environment
2. Log on to the IBM SmartCloud Control Desk application and navigate to one of the application panels.
3. Shut down the primary database server. Power off the server from the console to simulate a crash. Run lssam on the secondary server to see the behavior of the system. Figure 3-49 displays the output of lssam in case of a server failure.
Figure 3-49 lssam output in case of a server failure
4. The IBM SmartCloud Control Desk session hangs for a short interval while the cluster manager transfers the workload to the secondary server. In one of the tests, the IBM SmartCloud Control Desk session was lost; in that case, log in to the application again and resume work. Any transactions that were not committed are lost or rolled back.
5. All the resources are now transferred to the secondary server. When the primary server comes back up, the old primary server will be added back to the cluster manager and monitored.
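The lssam checks in steps 1 and 3 can also be scripted. The following Python sketch is a minimal illustration, not part of the tested procedure: it parses lssam-style text and reports which nodes host Online resources. The sample text, the node names node1 and node2, the -rs resource name, and the function names are assumptions made for illustration; real lssam output varies with the cluster definition and product version, so the pattern may need adjusting.

```python
import re

# Illustrative lssam-style text (an assumption, not captured from the test system).
SAMPLE = """\
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_MAXDB75-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_db2inst1_MAXDB75-rs
                |- Failed offline IBM.Application:db2_db2inst1_db2inst1_MAXDB75-rs:node1 Node=Offline
                '- Online IBM.Application:db2_db2inst1_db2inst1_MAXDB75-rs:node2
"""

# Matches per-node lines of the form "<state> IBM.Application:<resource>:<node>".
NODE_LINE = re.compile(r"([A-Z][A-Za-z ]*?)\s+IBM\.Application:([^:\s]+):(\S+)")


def node_states(lssam_text):
    """Map (resource, node) to the state reported on each per-node line."""
    states = {}
    for line in lssam_text.splitlines():
        match = NODE_LINE.search(line)
        if match:
            state, resource, node = match.groups()
            states[(resource, node)] = state.strip()
    return states


def online_nodes(lssam_text):
    """Return the sorted list of nodes that host at least one Online resource."""
    states = node_states(lssam_text)
    return sorted({node for (_, node), state in states.items() if state == "Online"})


if __name__ == "__main__":
    print(online_nodes(SAMPLE))
```

In practice the text would come from running lssam as root on one of the database servers; comparing the result before and after the power-off shows whether the workload moved to the secondary server.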
Process failure
This scenario was tested by simulating the DB2 server process failure. The database server instance was shut down while the application was connected. The cluster manager detected that the DB2 server process was down and restarted the process. The following steps were executed:
1. Run the lssam command as root from either the primary or the secondary database server. The output should indicate normal operation.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one of the application screens.
3. Issue the db2_kill command to abruptly end all the DB2 server processes. Run lssam as the root user to list the status of the cluster. Figure 3-50 shows the lssam output during the DB2 server process failure.
Figure 3-50 lssam output during the DB2 server process failure
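After db2_kill, the cluster manager needs some time to restart the DB2 process, so a single lssam call may still show the resource as not Online. A polling loop such as the following Python sketch could be used to wait for recovery. The read_state callable is a hypothetical hook that returns the current state string (for example, by running lssam and parsing its output); the timeout and interval defaults are arbitrary assumptions, not measured values.

```python
import time


def wait_until_online(read_state, timeout_s=120.0, interval_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll read_state() until it returns "Online" or the timeout expires.

    Returns the number of polls made on success and raises TimeoutError
    if the state is still not "Online" once the deadline has passed.
    """
    deadline = clock() + timeout_s
    polls = 0
    while True:
        polls += 1
        if read_state() == "Online":
            return polls
        if clock() >= deadline:
            raise TimeoutError(f"resource not Online after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated state sequence standing in for repeated lssam checks.
    simulated = iter(["Offline", "Pending online", "Online"])
    print(wait_until_online(lambda: next(simulated), sleep=lambda _s: None))
```

The sleep and clock parameters are injectable only so that the loop can be exercised without waiting in real time.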
Graceful transfer to secondary server
This scenario was tested by manually transferring the resources to the secondary server. For a planned change, the application resources can be transferred to the secondary server while the primary server undergoes maintenance. The following steps were executed:
1. Run lssam as root from either the primary or the secondary database server. The output should indicate normal operation.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one of the application panels.
3. Issue the rgreq -o move db2_db2inst1_db2inst1_MAXDB75-rg command to move the resources over to the secondary server.
4. All the DB2 resources are transferred to the secondary node. The DB2 application or the server can now be taken down for maintenance or changes.
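For repeated maintenance windows, the move request in step 3 could be wrapped in a small script. The Python sketch below only assembles the rgreq invocation quoted above and, by default, prints it instead of running it. The function names and the dry-run behavior are assumptions for illustration; the sketch does not verify that the move succeeded, which still requires checking lssam.

```python
import subprocess


def move_command(resource_group):
    """Build the argument list for the move request quoted in step 3."""
    if not resource_group or any(ch.isspace() for ch in resource_group):
        raise ValueError("resource group name must be a single non-empty token")
    return ["rgreq", "-o", "move", resource_group]


def request_move(resource_group, dry_run=True):
    """Print the command in dry-run mode; otherwise run it and raise on a nonzero exit."""
    cmd = move_command(resource_group)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    # Must run as a user allowed to issue cluster requests (root in the tests above).
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    request_move("db2_db2inst1_db2inst1_MAXDB75-rg")
```

Keeping dry-run as the default means the command has to be enabled explicitly before anything is moved on a live cluster.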
Symptoms of failover
When a database failover occurs, the IBM SmartCloud Control Desk application will appear to hang until the database failover sequence is complete. When service is restored, the user interface may show a brief database error: The database connection failed and the record was not retrieved. Try the operation again. If you experience repeated failures, check the log files in the home directory or contact your system administrator.
Sometimes the user may receive a blank panel when using the application during failover. Refreshing the browser page often corrects this problem. If the browser session cannot be recovered, the user may need to navigate back to the login page and re-authenticate.
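Scripts or integration clients that call the application or database during a failover window see the same transient errors as interactive users. One common way to handle this, shown here as a hedged Python sketch rather than a product feature, is to retry with increasing delays. TransientDatabaseError is a hypothetical stand-in for whatever exception the client library raises, and the attempt count and delays are arbitrary assumptions.

```python
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for the error a client sees during failover."""


def retry_during_failover(operation, attempts=5, first_delay_s=2.0, sleep=time.sleep):
    """Call operation(); on TransientDatabaseError wait and retry, doubling the delay.

    The last failure is re-raised once all attempts are used up.
    """
    delay = first_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2


if __name__ == "__main__":
    outcomes = iter([TransientDatabaseError("failover in progress"), "record retrieved"])

    def flaky_read():
        # Simulated operation: fails once, then succeeds.
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(retry_during_failover(flaky_read, sleep=lambda _s: None))
```

Because uncommitted transactions are rolled back during failover, only operations that are safe to repeat should be retried this way.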
DB2 shared disk failover testing
The following failover scenarios should be tested with the DB2 shared disk setup.
System failure
This scenario can be tested by powering down the primary database server. The entire workload should be transferred to the secondary database server by the cluster manager.
1. Run lssam as root from either the primary or the secondary database server. Figure 3-51 shows the output of the lssam command in the normal operating environment.
Figure 3-51 lssam output for a normal operating environment
2. Log on to the IBM SmartCloud Control Desk application and navigate to one of the application panels.
3. Shut down the primary database server. Run lssam on the secondary server to see the behavior of the system. Figure 3-52 shows the lssam output in case of a server failure.
Figure 3-52 lssam output in case of a server failure
4. All the resources are now transferred to the secondary server. When the primary server comes back up, the old primary server will be added back to the cluster manager and monitored.
Process failure
This scenario simulates the DB2 server process failure. The cluster manager detects that the DB2 server process is down and restarts the process.
1. Run lssam as root from either the primary or the secondary database server. The output should indicate normal operation.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one of the application panels.
3. Issue the db2_kill command to abruptly end all the DB2 server processes. Run lssam as the root user to list the status of the cluster. Figure 3-53 displays the lssam output in case of DB2 process failure in the shared disk setup.
4. The DB2 process should restart and the application should continue processing as normal.
Graceful failover
This scenario can be tested by manually transferring the resources to the secondary server. For a planned change, the application resources can be transferred to the secondary server while the primary server undergoes maintenance.
1. Run lssam as root from either the primary or the secondary database server. The output should indicate normal operation.
2. Log on to the IBM SmartCloud Control Desk application and navigate to one of the application panels.
3. Issue the rgreq -o move db2_db2inst1_db2inst1_0-rg command to move the resources over to the secondary server.
4. All the DB2 resources are transferred to the secondary node. The DB2 application or the server can now be taken down for maintenance or changes.
Symptoms of failover
When a database failover occurs, the IBM SmartCloud Control Desk application will appear to hang until the database failover sequence is complete. When service is restored, the user interface may show a brief database error: The database connection failed and the record was not retrieved. Try the operation again. If you experience repeated failures, check the log files in the home directory or contact your system administrator.
Sometimes you may receive a blank panel when using the application during failover. Refreshing the browser page often corrects this problem. If the browser session cannot be recovered, you may need to navigate back to the login page and re-authenticate.
3.10 Conclusion
This chapter gave an overview and configuration examples for local high availability. It described how to eliminate single points of failure in an IBM SmartCloud Control Desk environment.