
A 5620 SAM server and database pair installed on Solaris can be deployed in a redundant configuration to provide greater fault tolerance. 5620 SAM redundancy uses extra server and database components to ensure that there is no single point of software failure within the 5620 SAM system. A redundant configuration that uses redundant physical links to the managed network ensures that the network remains visible to the 5620 SAM during a single network or hardware failure.

The state of a server or database defines the current role of the component, whether primary or standby. The primary server actively manages the network while the primary database is open in read/write mode. When a standby component detects a primary component failure, it automatically assumes the primary role. A server role change, whether automatically or manually performed, is called an activity switch.

An automatic database role change is called a failover, while a manually performed database role change is called a switchover.

The 5620 SAM supports collocated and distributed redundancy configurations (for VN2, a distributed system has been recommended). In a distributed configuration, the 5620 SAM server and the 5620 SAM database are installed on separate stations. The servers and databases are four independent entities, regardless of the physical or geographical deployment.

The 5620 SAM servers ping each other periodically to verify redundancy. If the standby server fails to reach the primary server over a 60-s period, the standby server activates and becomes the primary server. The 5620 SAM databases achieve redundancy through Oracle functionality.
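The 60 s rule can be sketched as a small state tracker. This is a minimal illustration only, not the product's implementation: the class name `StandbyMonitor`, its fields, and the idea of feeding it timestamped ping results are assumptions made for the example, and the text does not state the actual ping interval.

```python
from dataclasses import dataclass

# From the text: the standby activates after failing to reach the primary for 60 s.
FAILURE_WINDOW_S = 60.0


@dataclass
class StandbyMonitor:
    """Hypothetical model of the ping results seen by a standby server."""
    last_success_s: float      # time of the last successful ping, in seconds
    role: str = "standby"

    def record_ping(self, now_s: float, reached_primary: bool) -> str:
        """Record one ping result and return the role after evaluating it."""
        if reached_primary:
            self.last_success_s = now_s
        elif self.role == "standby" and now_s - self.last_success_s >= FAILURE_WINDOW_S:
            # No successful ping for the whole window: assume the primary role.
            self.role = "primary"
        return self.role
```

In this sketch a single successful ping resets the window, so only an uninterrupted 60 s of failed pings causes the role change.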

The 5620 SAM client GUIs always connect to the current primary server. After a server activity switch, the 5620 SAM client GUIs connect to the new primary server, which is the former standby server. Activity switches are transparent to a 5620 SAM client. OSS clients also connect to the primary server, but after an activity switch, the connection is lost. OSS clients must know which server is the new primary server before they can again interact with the 5620 SAM.
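One way an OSS client could locate the new primary server is to probe a list of known servers in turn. The sketch below is an assumption-laden illustration: `find_primary`, the `is_primary` probe callback, and the host names are hypothetical and are not part of the 5620 SAM-O interface, which defines the actual reconnection procedure.

```python
from typing import Callable, Optional, Sequence


def find_primary(servers: Sequence[str],
                 is_primary: Callable[[str], bool]) -> Optional[str]:
    """Return the first server whose probe reports the primary role, or None.

    `is_primary` is a caller-supplied (hypothetical) probe; it may raise
    OSError when a server cannot be reached.
    """
    for host in servers:
        try:
            if is_primary(host):
                return host
        except OSError:
            # Unreachable server: try the next candidate.
            continue
    return None
```

A client would call `find_primary(["sam-a", "sam-b"], probe)` after losing its connection and reconnect to the returned host, retrying later if the result is `None`.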

Both the primary and standby servers have visibility of the redundant databases. You can use the 5620 SAM GUI to do the following related to redundancy:

• Check the status of redundant 5620 SAM servers and databases

• Perform a switchover from the primary to the standby 5620 SAM database

You can use scripts on the server to perform a server activity switch.

Redundancy is configured during a 5620 SAM server or database component installation.

8.5.1 Activity switches, failovers, and switchovers

The figure below shows the three types of redundancy activities:

• Server activity switches

• Database switchovers

• Database failovers.

Figure 35: Server activity switch, and database switchover and failover

8.5.2 Server activity switches

A 5620 SAM server activity switch can be performed to:

• Recover from subnet failures or errors

• Test redundancy in case of a catastrophic failure

• Prepare a station for a software upgrade

There are two types of activity switches for servers:

• Automatic

• Manual

During a server activity switch transition period, a main server does not process SNMP traps from the network, and no regularly scheduled resynchronizations occur. For example, when an on-demand statistics request occurs during an activity switch, the 5620 SAM collects no statistics for the current collection period. An auxiliary server continues to process outstanding requests during an activity switch but does not communicate with a main server during this time.

The figures below show the relationship of the servers and databases before and after a server activity switch.

Figure 36: Before activity switch

Figure 37: After activity switch

During the activity switch shown in the figures above:

• Alarms are raised.

• Client GUIs receive a message about the redundancy change to indicate that the server is not available and they must connect after the activity switch completes.

After an activity switch:

• GUI clients communicate with the new primary server and are aware of the current redundancy status.

• OSS clients must reconnect to the primary server, as described in the 5620 SAM-O OSS Interface Developer Guide [4].

• The new primary server establishes communication and synchronizes information with the known auxiliary servers in the 5620 SAM domain.

• Auxiliary servers no longer exchange information with the former primary server.

• The Preferred or Reserved state of an auxiliary server may change, depending on the new primary main server configuration settings.

If the primary server does not complete a deployment request from a client before the activity switch, the new primary server attempts to redeploy the request.

8.5.3 Database switchovers

The figures show the relationship of the servers and databases before and after a successful database switchover.

Figure 38: Before Switchover

Figure 39: After Switchover

Before a database switchover, the primary server requests that each auxiliary server in the cluster release all connections to the current primary database.

When a switchover is successful:

• The primary and standby database roles are reversed.

• The primary server can establish connections to the new primary database.

• Archive logging begins on the new primary database.

• The primary server directs each auxiliary server to connect to the new primary database.

When a switchover is not successful, the databases remain in the original primary and standby roles.
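The two switchover outcomes described above can be captured in a short model. This is a sketch for illustration only; `DatabasePair`, `switchover`, and the `succeeded` flag are hypothetical names, and the example does not model archive logging or server connections.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatabasePair:
    """Hypothetical record of which database station holds which role."""
    primary: str
    standby: str


def switchover(pair: DatabasePair, succeeded: bool) -> DatabasePair:
    """Return the roles after a switchover attempt."""
    if succeeded:
        # Successful switchover: primary and standby roles are reversed.
        return replace(pair, primary=pair.standby, standby=pair.primary)
    # Unsuccessful switchover: the databases keep their original roles.
    return pair
```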

8.5.4 Database failovers

A failover is an automatic transition of a standby database to the primary database role, for example, when there is a catastrophic primary database failure. After a failover, database redundancy is inactive.

A database failover is a disaster-recovery mechanism. A failover occurs only when the primary server and the standby database lose visibility of the primary database.

When this happens, the primary server initiates a database switchover on the standby database. Failover functionality is enabled by default during 5620 SAM database installations.

A database failover occurs after one of the following events:

• The primary database station loses power.

• The primary database station becomes unreachable over the network.

• The primary database crashes or otherwise ceases to communicate with the primary server.

The following conditions must be present before a database failover occurs:

• The standby database is configured, operational, and reachable.

• The primary database is unavailable.

• All auxiliary server connections to the primary database are closed.

Before it initiates a failover, the primary server requests that each auxiliary server in the cluster release all connections to the current primary database. After a successful failover, the primary server directs each auxiliary server to connect to the new primary database.
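The three failover preconditions listed above can be expressed as a simple check. The following is a minimal sketch under stated assumptions: `FailoverState`, its field names, and `failover_blockers` are invented for this example and do not correspond to any 5620 SAM interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailoverState:
    """Hypothetical snapshot of the conditions named in the text."""
    standby_configured: bool
    standby_operational: bool
    standby_reachable: bool
    primary_available: bool
    open_auxiliary_connections: int  # auxiliary server connections to the primary database


def failover_blockers(state: FailoverState) -> List[str]:
    """Return the unmet preconditions; an empty list means a failover can proceed."""
    blockers = []
    if not (state.standby_configured and state.standby_operational
            and state.standby_reachable):
        blockers.append("standby database is not configured, operational, and reachable")
    if state.primary_available:
        blockers.append("primary database is still available")
    if state.open_auxiliary_connections > 0:
        blockers.append("auxiliary server connections to the primary database are still open")
    return blockers
```

Returning the list of unmet conditions rather than a single boolean makes it easier to see why a failover did not start.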

The figures below show the relationship of the server and the databases before and after a successful failover.

Figure 40: Before failover

Figure 41: After failover

When a failover is not successful, the primary server retries its connection to the former primary database. If the former primary database remains unavailable, the primary server again attempts to initiate a failover.

Communication between the primary server and the primary database must fail for a minimum of 5 min before an automatic failover occurs. If the failure also triggers a server activity switch, the time increases to a minimum of 15 min.

The table below lists the minimum failure periods for specific scenarios before a database failover occurs.

Scenario                                                       Minimum length of communication failure
Primary database crashes, or otherwise ceases to communicate   5 min
Primary database station loses power                           5 min
Primary database station becomes unreachable                   5 min

Table 18: Failover minimum timings
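The timing rule (5 min normally, 15 min when the failure also triggers a server activity switch) can be written as a small helper. This is an illustrative sketch; the function names and the boolean parameter are assumptions, and only the two minimum values come from the text.

```python
def minimum_failure_period_min(triggers_activity_switch: bool) -> int:
    """Minimum communication failure, in minutes, before an automatic failover."""
    # 5 min normally; 15 min if the failure also triggers a server activity switch.
    return 15 if triggers_activity_switch else 5


def failover_may_start(outage_min: float, triggers_activity_switch: bool) -> bool:
    """True once the outage has lasted at least the minimum failure period."""
    return outage_min >= minimum_failure_period_min(triggers_activity_switch)
```

These are minimum periods, so a `True` result means only that the timing condition is met, not that a failover has occurred.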

8.5.5 Events associated with an activity switch

Type: Database switchover

Description: A switchover is a manual operation that reverses the primary and standby database roles, for example, during primary database maintenance, or to realign database roles with specific database stations after a server activity switch.

For a switchover to occur, both the primary and standby databases must be functioning correctly and communicating with each other.

Notes: When the primary 5620 SAM server detects a communication failure between itself and the primary 5620 SAM database, the client GUIs are informed that the primary 5620 SAM database is not reachable.

When the primary 5620 SAM server detects a communication failure between itself and the standby 5620 SAM database, the client GUIs are notified that the standby 5620 SAM database is not reachable.

After the problem that caused the communication failure is resolved, the client GUIs are notified that database redundancy is operational.

Type: Database failover

Description: A failover is the automatic transition of a standby database to the primary database role, for example, after a catastrophic primary database failure.

A failover involves the following sequence of events.

• The primary server fails to establish communication with the primary database for a certain time period.

• The primary server asks each auxiliary server in the cluster to release all database connections.

If all database connections are not released within 15 min, the switchover fails.

• After all database connections are released, the primary server initiates a switchover to the standby database.

• The switchover completes; the former standby database is the new primary database.

• The primary server restarts to ensure proper synchronization of information with the new primary database.

• Depending on the duration of the primary server restart, the standby server may interpret peer unresponsiveness as a failure and attempt an activity switch to primary.

• The primary server directs each auxiliary server to connect to the new primary database.

After a failover, the former primary database can be restored to the 5620 SAM system as the new standby database. Database redundancy is not available until re-instantiation is complete.

A failover results in minimal data loss.

Type: Re-establishing database redundancy

Description: Re-establishing database redundancy is a user action that restores the primary database to the redundant 5620 SAM system as the new standby database.

After a failover, the former primary database does not participate in database redundancy until a 5620 SAM operator with the appropriate scope of command role re-instantiates it as the new standby database.

The following conditions must be met before database redundancy can be re-established.

• The failover completes successfully.

• The station that contains the primary database is operational.

• The former primary proxy port is configured and in service.
