StaQWare: High Availability for Cobalt RaQ3i Servers
Extending the Cobalt RaQ3i Server Clusters
StaQWare lets you quickly create a high-availability cluster of two RaQ3i servers, configured as an active/standby pair. Both RaQ3i's must have identical internal disks and dual NICs -- only RAM can vary. External storage and non-RaQ3i servers cannot be used with StaQWare. No hardware changes are required to your LAN or the active RaQ3i -- StaQWare is truly a "drop in" upgrade. Because the standby RaQ3i is wiped clean during installation, most customers will purchase a StaQWare license ($999) and a new standby RaQ3i (from $2,499) for each existing RaQ3i to be upgraded with HA.
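The per-server arithmetic is simple enough to sketch. This assumes each existing RaQ3i needs exactly one license plus one new standby bought at the quoted starting price of $2,499; actual configurations and prices may differ, and the constant and function names are hypothetical.

```python
# Minimum outlay for adding StaQWare HA, using the prices quoted above.
# Assumption: one license plus one new standby RaQ3i per existing (active) RaQ3i.
STAQWARE_LICENSE_USD = 999
STANDBY_RAQ3I_FROM_USD = 2499  # quoted starting price of a standby RaQ3i


def min_ha_upgrade_cost(existing_servers: int) -> int:
    """Return the lowest total cost (USD) to pair each existing RaQ3i with a standby."""
    if existing_servers < 0:
        raise ValueError("existing_servers must be non-negative")
    return existing_servers * (STAQWARE_LICENSE_USD + STANDBY_RAQ3I_FROM_USD)


print(min_ha_upgrade_cost(1))  # 3498
print(min_ha_upgrade_cost(3))  # 10494
```

In other words, each HA-protected server costs at least $3,498 on top of the existing machine.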
RaQ3i's have dual 10/100 NICs. "Network 1" NICs connect both servers to your production LAN for normal traffic and availability monitoring. "Network 2" NICs connect the pair back-to-back for real-time data synchronization (peer-to-peer RAID). Cobalt strongly recommends using a crossover cable for Network 2, and we found no reason not to: RaQ3i's with StaQWare do not listen to any other protocol on this NIC, and synchronization is much slower if a hub is used. Supplied diagrams clearly illustrate the cabling alternatives.
Installation is simple. Cable the units and power them on. Use the new (standby) RaQ3i's LCD control panel to configure its Network 1 address. Access the Server Management GUI on the existing (active) RaQ3i at http://activeIP:81/.cobalt/sysManage. Select Maintenance / Install Software and verify that the OS is version 2.0 or later. If not, upgrade the OS using the supplied CD. We subsequently upgraded to version 3.0 and strongly recommend that all users do so before installing StaQWare -- we'll explain why later.
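The pre-installation check boils down to a version comparison. The sketch below assumes the OS version is shown as a plain dotted number such as "2.0" (the GUI may format it differently), and the helper names are hypothetical; only the URL pattern and the 2.0/3.0 thresholds come from the text above.

```python
from typing import Tuple

MINIMUM_OS = (2, 0)      # StaQWare requires OS 2.0 or later
RECOMMENDED_OS = (3, 0)  # upgrading to 3.0 first is recommended


def sys_manage_url(active_ip: str) -> str:
    """Server Management GUI address on the active RaQ3i."""
    return f"http://{active_ip}:81/.cobalt/sysManage"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def os_readiness(version: str) -> str:
    """Classify an installed OS version against StaQWare's requirements."""
    v = parse_version(version)
    if v < MINIMUM_OS:
        return "upgrade required before installing StaQWare"
    if v < RECOMMENDED_OS:
        return "supported, but upgrading to 3.0 first is recommended"
    return "ready"


print(sys_manage_url("activeIP"))  # http://activeIP:81/.cobalt/sysManage
print(os_readiness("2.0"))         # supported, but upgrading to 3.0 first is recommended
```

Comparing tuples of integers rather than raw strings avoids surprises such as "10.0" sorting before "2.0".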
StaQWare is installed just like any other RaQ software upgrade: select the file supplied on the StaQWare CD using Maintenance / Install Software. After the reboot, use this page to jump to HA Cluster Management, or go to http://activeIP:81/.cobalt/cobalt-ha directly. There, select Cluster Network to launch a 3-step configuration wizard.
To configure StaQWare, you'll need the admin login/password and six (6) IP addresses:
- a) One pair for normal client/server traffic on Network 1,
- b) One pair for StaQWare availability monitoring on Network 1, and
- c) One pair for StaQWare synchronization on Network 2.
While c) can be private, a) and b) must be routable public addresses. Address a) on the active RaQ3i is assumed by the standby RaQ3i during failover -- this is the external address used by clients to reach the server cluster. The other addresses must be distinct from virtual site addresses hosted by this server.
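The addressing rules above can be checked before running the wizard. This is a minimal sketch using Python's standard `ipaddress` module; it assumes "routable public" means `is_global`, that all six addresses must be distinct, and that only address a) on the active RaQ3i may coincide with a virtual site. The function name and the 12.34.56.x addresses are arbitrary placeholders.

```python
import ipaddress
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (active, standby)


def check_address_plan(
    client: Pair,   # a) Network 1, normal client/server traffic
    monitor: Pair,  # b) Network 1, availability monitoring
    sync: Pair,     # c) Network 2, synchronization
    virtual_site_ips: Iterable[str] = (),
) -> List[str]:
    """Return a list of problems with the six-address plan (empty if none found)."""
    problems: List[str] = []
    plan = [("a)", client, True), ("b)", monitor, True), ("c)", sync, False)]
    addresses = []
    for label, pair, needs_public in plan:
        for role, text in zip(("active", "standby"), pair):
            ip = ipaddress.ip_address(text)
            addresses.append((label, role, ip))
            if needs_public and not ip.is_global:
                problems.append(f"{label} {role} address {ip} is not publicly routable")

    all_ips = [ip for _, _, ip in addresses]
    if len(set(all_ips)) != len(all_ips):
        problems.append("the six addresses are not all distinct")

    sites = {ipaddress.ip_address(s) for s in virtual_site_ips}
    for label, role, ip in addresses:
        # a) on the active is the cluster's client-facing address; every other
        # address must stay clear of the virtual-site addresses hosted here.
        if (label, role) != ("a)", "active") and ip in sites:
            problems.append(f"{label} {role} address {ip} collides with a virtual site")
    return problems


print(check_address_plan(
    client=("12.34.56.10", "12.34.56.11"),
    monitor=("12.34.56.12", "12.34.56.13"),
    sync=("10.0.0.1", "10.0.0.2"),          # private addresses are fine for c)
    virtual_site_ips=["12.34.56.10", "12.34.56.20"],
))  # []
```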
After initial configuration, the active RaQ3i contacts the standby RaQ3i and begins to mirror stored data to the standby, a process that can take up to 3 hours. During this time, the active RaQ3i remains in-service, supporting web, file, and mail clients.
Cluster Status and Control
Once the cluster is created, all HA Cluster Management takes place through the active RaQ3i. The standby cannot be managed through either NIC -- although it can be monitored by pinging its address b).
A Cluster Status page indicates whether HA services are available and, if not, why not. As a rule, HA services are unavailable ("degraded") whenever the standby RaQ3i is unreachable, data has not yet been (re)synchronized, or HA has been administratively disabled.
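The "degraded" rule is effectively a conjunction of three conditions. The sketch below models it with hypothetical names; StaQWare's internal logic and the exact wording on its Cluster Status page are not documented here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClusterState:
    standby_reachable: bool  # can the active still ping the standby?
    data_synchronized: bool  # has the (re)synchronization completed?
    ha_enabled: bool         # False while HA is administratively disabled


def cluster_status(state: ClusterState) -> Tuple[str, List[str]]:
    """Return ("available", []) or ("degraded", [reasons])."""
    reasons: List[str] = []
    if not state.standby_reachable:
        reasons.append("standby RaQ3i is unreachable")
    if not state.data_synchronized:
        reasons.append("data has not yet been (re)synchronized")
    if not state.ha_enabled:
        reasons.append("HA has been administratively disabled")
    return ("degraded" if reasons else "available", reasons)


print(cluster_status(ClusterState(True, False, True)))
# ('degraded', ['data has not yet been (re)synchronized'])
```

Collecting every failing condition, rather than stopping at the first, mirrors a status page that explains why HA is unavailable.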
When HA services are available, the active and standby continuously ping each other over Network 1, and synchronize data using peer-to-peer RAID over Network 2. If the standby cannot ping the active for a configurable period (by default, 5 seconds), it initiates failover. If the active cannot ping the standby for this period, it declares HA services unavailable. This "outage tolerance" can be adjusted up to 5 minutes to accommodate frequent-but-normal loss of reachability.
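The outage-tolerance logic can be sketched as a timer that resets on each successful ping. Time is passed in explicitly so the behaviour is easy to test. The class name, the treatment of exactly 5 seconds as an outage, and the lower bound on the tolerance are assumptions; only the 5-second default, the 5-minute ceiling, and the role-specific reactions come from the text.

```python
DEFAULT_TOLERANCE_S = 5.0  # default outage tolerance
MAX_TOLERANCE_S = 300.0    # adjustable up to 5 minutes


class PeerMonitor:
    """Tracks the last successful ping of the peer RaQ3i over Network 1."""

    def __init__(self, role: str, tolerance_s: float = DEFAULT_TOLERANCE_S, start: float = 0.0):
        if role not in ("active", "standby"):
            raise ValueError("role must be 'active' or 'standby'")
        if not 0 < tolerance_s <= MAX_TOLERANCE_S:
            raise ValueError("outage tolerance must be positive and at most 300 seconds")
        self.role = role
        self.tolerance_s = tolerance_s
        self.last_reply = start

    def ping_succeeded(self, now: float) -> None:
        self.last_reply = now

    def evaluate(self, now: float) -> str:
        if now - self.last_reply < self.tolerance_s:
            return "ok"
        # The standby takes over; the active only declares HA unavailable.
        return "initiate failover" if self.role == "standby" else "HA unavailable"


standby = PeerMonitor("standby")
standby.ping_succeeded(10.0)
print(standby.evaluate(14.0))  # ok
print(standby.evaluate(15.0))  # initiate failover
```

Raising the tolerance trades slower failover for fewer false alarms on a LAN that briefly drops pings.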
HA services can also be disabled for up to 24 hours to permit maintenance of the standby. For scheduled maintenance of the active, manually initiate failover, then disable HA services.
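The two maintenance procedures differ only in whether a manual failover comes first. This sketch encodes that ordering with a hypothetical function name; the 24-hour limit comes from the text, while rejecting zero or negative durations is an added assumption.

```python
from datetime import timedelta
from typing import List

MAX_HA_DISABLE = timedelta(hours=24)  # longest period HA services can be disabled


def maintenance_steps(target: str, duration: timedelta) -> List[str]:
    """Order of operations for scheduled maintenance on one RaQ3i of the pair."""
    if not timedelta(0) < duration <= MAX_HA_DISABLE:
        raise ValueError("duration must be positive and at most 24 hours")
    disable = f"disable HA services for {duration}"
    if target == "standby":
        return [disable]
    if target == "active":
        # Fail over first so the standby assumes address a) and keeps serving
        # clients, then disable HA for the maintenance window.
        return ["manually initiate failover", disable]
    raise ValueError("target must be 'active' or 'standby'")


print(maintenance_steps("active", timedelta(hours=2)))
# ['manually initiate failover', 'disable HA services for 2:00:00']
```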