Table of Contents
- Overview
- Key Terms and Definitions
- Supported HA Modes
- Common HA Topologies
- HA Synchronization Data
- HA Failover Triggers
- Troubleshooting Checklist
- Useful CLI Commands
- Sample HA Configuration (CLI)
- Conclusion
Overview: FortiGate Firewall High Availability (HA)
What is FortiGate High Availability (HA)?
FortiGate High Availability (HA) is a feature that allows multiple FortiGate firewall units to work together in a cluster to provide continuous network protection and service. If one unit fails or becomes unavailable, another unit in the cluster automatically takes over, ensuring uninterrupted security and connectivity for your network.
Why You Need to Know About FortiGate HA
- Maximized Uptime: HA minimizes downtime by providing automatic failover in the event of hardware, software, or network failures.
- Business Continuity: Essential services remain available even during maintenance or unexpected outages, which is critical for organizations that rely on constant network access.
- Simplified Management: Centralized configuration and synchronization across cluster members reduce administrative overhead and ensure consistency.
- Scalability: HA can be used to increase throughput and performance by distributing traffic across multiple units (especially in Active-Active mode).
- Regulatory Compliance: Many industries require high availability for security infrastructure to meet compliance standards.
How FortiGate HA Works
- Cluster Formation: Two or more FortiGate units are grouped into a cluster. One acts as the primary (master), while others serve as secondary (backup) units.
- Heartbeat Communication: Dedicated interfaces (heartbeat links) are used for cluster members to exchange health and status information.
- Session and Configuration Synchronization: The primary unit continuously synchronizes session tables, configuration files, and routing information with secondary units to ensure a seamless transition if failover occurs.
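- Failover Mechanism: If the primary unit fails (due to hardware, network, or software issues), a secondary unit automatically takes over and continues enforcing the same security policies. When session pickup is enabled, sessions that were already synchronized can continue through the failover.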
- HA Modes: FortiGate supports several HA modes:
  - Active-Passive: Only one unit handles traffic at a time; others are on standby.
  - Active-Active: Multiple units actively process traffic, sharing the load.
  - Virtual Clustering: Used with Virtual Domains (VDOMs) to distribute different VDOMs across cluster members for granular redundancy and load balancing.
In summary, FortiGate HA is a critical feature for organizations seeking robust, resilient, and highly available network security. By understanding and implementing HA, you can protect your network from outages, simplify management, and ensure your security infrastructure meets both operational and compliance needs.
Key Terms and Definitions
These are the foundational terms you need to understand FortiGate Firewall High Availability (HA):
- High Availability (HA): The ability of a FortiGate cluster to maintain network connectivity even if one device or link fails, by having another unit take over without service interruption. All units share session and configuration data to ensure seamless failover[2][3].
- Primary (Master) Unit: The FortiGate device currently controlling the cluster and handling all active traffic. It is the authoritative source for configuration and session information[5].
- Secondary (Slave) Unit: The backup device(s) that monitor the primary unit and take over if the primary fails, ensuring continuity of service[5].
- HA Cluster: A group of two or more FortiGate units operating together to provide redundancy and failover capabilities. The minimum is two units, and up to four are supported in a cluster[3][5].
- Heartbeat Interface: Dedicated network interfaces used for HA status communication and synchronization between cluster units. If one heartbeat interface fails, another can take over[2].
- FortiGate Clustering Protocol (FGCP): The protocol used by FortiGate devices to manage HA clustering, including discovery, health monitoring, and data synchronization[5].
- Session Synchronization: The process of replicating session information between HA units to ensure seamless failover and no session loss during device transitions[2].
- Interface Monitoring: Also called port monitoring; checks the health of specific interfaces and triggers failover if a monitored interface fails or disconnects[2].
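- HA Virtual MAC Address: Virtual MAC addresses that FGCP assigns to the cluster's interfaces and that are used by whichever unit is currently primary. After a failover, the new primary takes over the same addresses, which keeps addressing consistent for neighboring devices[2].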
- Active-Passive Mode: Only the primary unit handles traffic, while secondary units remain on standby, ready to take over if needed[5].
- Active-Active Mode: Multiple units in the cluster share the traffic load for improved performance and redundancy[5].
Supported HA Modes
FortiGate firewalls support several High Availability (HA) modes to ensure network resilience and performance. Each mode is designed for different operational needs:
- Active-Passive (A-P): In this mode, one FortiGate unit (the primary) actively processes all network traffic, while the secondary unit(s) remain on standby. If the primary unit fails, a secondary unit automatically takes over, ensuring minimal service disruption. This is the most commonly used HA mode for redundancy and failover protection[5][6][7].
- Active-Active (A-A): Multiple FortiGate units in the cluster share the traffic load, with all units actively processing sessions. This mode not only provides failover protection but also increases throughput and performance by distributing sessions across the cluster members. If one unit fails, the remaining units continue to handle the traffic[5][6][7].
- Virtual Clustering: This advanced mode is used when Virtual Domains (VDOMs) are enabled. It allows you to partition the cluster so that different VDOMs are handled by different FortiGate units, providing both load balancing and failover protection at the VDOM level. Each virtual cluster operates as its own HA group, and traffic for each VDOM can be processed by a different unit[8][12][16].
HA Mode | Description | Best Use Case |
---|---|---|
Active-Passive | Primary unit handles all traffic; secondary units are on standby and take over if the primary fails. | Redundancy and failover protection |
Active-Active | All units share the traffic load, increasing performance and providing failover. | High performance and redundancy |
Virtual Clustering | VDOMs are distributed among cluster units for load balancing and failover at the VDOM level. | Multi-tenant environments with VDOMs |
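
As a rough sketch of how the mode choice shows up in configuration, an Active-Active cluster might be set up along the following lines. The group name and the port3/port4 heartbeat ports are placeholders chosen for illustration, and the `#` lines are annotations for the reader, not something to paste into the CLI. The option names follow the `config system ha` syntax as we understand it, so check the CLI reference for your FortiOS release before relying on them.

```
config system ha
    set mode a-a
    # Assumption: placeholder group name and heartbeat ports.
    set group-name "FGT-HA-Cluster"
    set hbdev "port3" 50 "port4" 50
    # Schedule used to distribute sessions among the cluster units.
    set schedule round-robin
    # Extend load balancing to TCP sessions in general; with this disabled,
    # only sessions that need proxy-based inspection are balanced.
    set load-balance-all enable
end
```

Whether `load-balance-all` is worth enabling depends on your traffic mix, so treat it as a tuning choice and not a default.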
Common HA Topologies
FortiGate firewalls can be deployed in several High Availability (HA) topologies to maximize redundancy and minimize downtime. Here are the most common HA topologies:
- Full Mesh Topology: Every FortiGate unit in the HA cluster is directly connected to every network component (such as redundant switches and links). This design minimizes single points of failure and provides the highest level of redundancy. If any single device or connection fails, traffic is rerouted through alternate paths, ensuring continuous network operation[7][12][13].
- Partial Mesh Topology: Only some FortiGate units or interfaces are interconnected, or there are redundant links to critical components but not all. This approach increases redundancy compared to a simple topology but may still have a few single points of failure. It's often used when full mesh is not feasible due to cost or complexity[13].
- Single Heartbeat (Back-to-Back): Two FortiGate units are connected directly via a dedicated heartbeat interface (using a crossover cable or direct patch). This is the simplest HA topology and is typically used for basic active-passive clusters. While easy to deploy, it offers less redundancy than mesh designs[9][19].
- Dual Heartbeat: Two or more dedicated heartbeat interfaces are used between the FortiGate units, often connected through separate switches. This increases redundancy for HA synchronization and helps prevent split-brain scenarios if one heartbeat link fails[9][19].
- Daisy-Chained Topology: FortiGate units are connected in a chain, often used when integrating with stacked or chained FortiSwitches. While this setup can simplify cabling, it may introduce additional points of failure and is less robust than mesh designs[2].
Topology | Description | Redundancy Level | Best Use Case |
---|---|---|---|
Full Mesh | All units and network components are interconnected with redundant paths. | Highest | Critical environments needing maximum uptime |
Partial Mesh | Some, but not all, components are redundantly connected. | Moderate | Balanced redundancy and cost |
Single Heartbeat | Direct connection between HA units using one dedicated interface. | Basic | Simple, small deployments |
Dual Heartbeat | Two or more dedicated heartbeat links for HA synchronization. | High | Environments needing heartbeat redundancy |
Daisy-Chained | Units connected in a chain, often with stacked switches. | Variable | Switch stacking or limited cabling options |
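
To make the dual heartbeat design concrete, here is a minimal sketch of the heartbeat setting, assuming port3 and port4 are dedicated heartbeat ports cabled over separate paths. The port names and priority values are illustrative assumptions, and the `#` lines are annotations only.

```
config system ha
    # Assumption: port3 and port4 are dedicated heartbeat ports.
    # The interface with the higher priority value carries heartbeat traffic;
    # the other takes over if it fails.
    set hbdev "port3" 100 "port4" 50
end
```

Giving the two interfaces different priorities makes it predictable which link carries heartbeat traffic during normal operation.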
HA Synchronization Data
In a FortiGate High Availability (HA) cluster, synchronization ensures all units share the same critical data for seamless failover and consistent operation. The following types of data are synchronized across the HA cluster:
- Configuration Files: All configuration changes made on the primary unit are incrementally synchronized to subordinate units to keep settings consistent across the cluster. This includes firewall policies, objects, and most system settings, but excludes certain HA-specific parameters like device priority and hostnames[1][13].
- Session Tables: Active network sessions (such as TCP, UDP, and ICMP connections) are synchronized to ensure ongoing connections are not dropped during a failover. Session synchronization can be fine-tuned for performance, and dedicated interfaces can be used for this purpose[6][9].
- Routing Tables: Dynamic routing information is synchronized so that all cluster members have up-to-date route entries, ensuring uninterrupted packet forwarding after a failover event[1].
- ARP Tables: Address Resolution Protocol (ARP) tables are synchronized to maintain correct IP-to-MAC address mappings across the cluster, which is essential for seamless network communication[1].
- DHCP Lease Databases: DHCP server address lease information is kept in sync to prevent duplicate IP assignments and maintain client connectivity[1].
- IPsec SAs (Security Associations): VPN tunnel information is synchronized to ensure secure communications continue without interruption during failover[1].
- User Authentication Sessions: Information about authenticated users and their sessions is replicated, so users do not need to re-authenticate if a failover occurs[1].
Note: Some settings, such as device hostnames, HA priorities, and reserved management interface configurations, are not synchronized and must be manually configured on each unit if needed[13].
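
Session synchronization is largely controlled from `config system ha`. The sketch below shows the options we would expect to use for tuning it. It assumes port5 is a spare interface set aside for session sync traffic; that name is a placeholder. The `#` lines are annotations only, and option availability can vary by FortiOS release.

```
config system ha
    # Synchronize TCP sessions to the secondary units.
    set session-pickup enable
    # Also synchronize connectionless (UDP and ICMP) sessions.
    set session-pickup-connectionless enable
    # Only synchronize sessions that have lasted longer than 30 seconds.
    set session-pickup-delay enable
    # Assumption: port5 is reserved for session synchronization traffic.
    set session-sync-dev "port5"
end
```

Enabling the delay option reduces synchronization overhead, at the cost of losing short-lived sessions during a failover.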
HA Failover Triggers
FortiGate High Availability (HA) clusters are designed to maintain network uptime by automatically switching control to a backup unit when certain failures or conditions are detected. Below are the main triggers that can cause an HA failover event:
- Hardware Failure: If the primary FortiGate unit experiences a hardware malfunction—such as power supply, CPU, or memory failure—the cluster will initiate a failover to a healthy secondary unit[1][14].
- Network Interface Failure: Loss of connectivity or link failure on a monitored interface (such as WAN or LAN ports) can trigger a failover to ensure uninterrupted traffic flow. Monitored interfaces are continuously checked for link status and connectivity[6][8].
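- Heartbeat Loss: The heartbeat interfaces between cluster members are used to monitor the health and status of each unit. If a secondary unit stops receiving heartbeat packets from the primary (for example, because the primary has failed), it assumes the primary is down and takes over. If only the heartbeat links fail (due to a cable or switch fault) while both units keep running, each unit may act as primary, which is the split-brain scenario that redundant heartbeat links are meant to prevent[8][9].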
- System Software Crash: If the operating system or critical processes on the primary unit crash or become unresponsive, the cluster will fail over to a standby unit[1][14].
- Manual Intervention: Administrators can manually trigger a failover for maintenance, testing, or troubleshooting purposes using CLI or GUI commands[2][11].
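- Configuration Mismatch or Sync Failure: Significant configuration mismatches or synchronization failures between cluster members threaten cluster integrity. An out-of-sync secondary may not be able to take over cleanly, so treat a persistent mismatch as a failover risk and resolve it promptly[1].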
- Environmental Triggers: Events such as excessive temperature or power fluctuations detected by onboard sensors can also trigger a failover to protect hardware and maintain service availability[1].
Note: Proper monitoring and configuration of failover triggers are essential for reliable HA operation. Regularly test failover scenarios to ensure your cluster responds as expected in real-world conditions.
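
Two of the triggers above, interface failure and heartbeat loss, are tuned in `config system ha`. The following is a sketch under assumptions: port1 and port2 stand in for whichever WAN and LAN ports matter in your design, and the timer values are examples, not recommendations. The `#` lines are annotations only.

```
config system ha
    # Assumption: port1 and port2 are the interfaces worth failing over for.
    set monitor "port1" "port2"
    # Heartbeat interval, in units of 100 ms (2 = 200 ms).
    set hb-interval 2
    # Consecutive missed heartbeats before a peer is considered failed.
    set hb-lost-threshold 6
end
```

With these example values, a silent peer would be declared failed after roughly 2 x 100 ms x 6 = 1.2 seconds. Shorter timers detect failures faster but raise the risk of false failovers on a busy or lossy heartbeat link.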
Troubleshooting Checklist
Use this checklist to systematically diagnose and resolve FortiGate High Availability (HA) issues. Following these steps will help ensure your HA cluster remains healthy and resilient:
- Check Physical Connections: Ensure all HA heartbeat and monitored interfaces are securely connected and functioning. Inspect cables, switches, and interface LEDs for any faults or disconnections[3].
- Verify Firmware Versions: Confirm that all FortiGate units in the cluster are running the exact same firmware version. Mismatched firmware can prevent synchronization and cluster formation[2].
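- Check HA Configuration Consistency: Make sure the HA settings that define cluster membership (mode, group name, group ID, password, and heartbeat interfaces) are identical across all units. Even minor discrepancies can prevent units from joining the cluster. Device priority is the exception: it is set per unit and is expected to differ[3].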
- Monitor HA Status and Synchronization: Use CLI commands like `get system ha status` and `diagnose sys ha checksum cluster` to check cluster health, sync status, and identify any configuration mismatches[1][2][6].
- Review Cluster Logs and Events: Examine system logs and HA event history for recent failovers, errors, or warnings. This can reveal the cause of unexpected transitions or synchronization failures[1].
- Test Failover Functionality: Periodically perform controlled failover tests to ensure secondary units take over seamlessly and services remain uninterrupted[3].
- Check Disk and Storage Status: Verify that log disks and storage devices are healthy and available on all units. Mismatched or failed disks can cause sync problems[2].
- Validate Interface Monitoring: Ensure that all critical interfaces are being monitored for link status and that failover triggers are configured correctly[3].
- Manual Sync and Troubleshooting Commands: If the cluster is out of sync, try recalculating checksums or manually synchronizing using commands such as `diagnose sys ha checksum recalculate` and `execute ha synchronize start`[2][6].
- Isolate and Rebuild Units if Needed: If persistent sync issues occur, isolate the problematic unit, reset it to factory defaults, and rejoin it to the cluster after reconfiguration[2][13].
Note: Always back up configurations before making major changes or troubleshooting steps. Consult Fortinet documentation for advanced troubleshooting and support if issues persist.
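
A controlled failover test could follow a sequence like the one sketched below. It assumes a FortiOS release that provides the `execute ha failover` command, which is intended for testing, and it assumes cluster ID 1 (no virtual clustering). The `#` lines are annotations only. Run it in a maintenance window.

```
# 1. Record which unit is primary and confirm the cluster is in sync.
get system ha status
diagnose sys ha checksum cluster

# 2. On the current primary, force it into a failover state.
#    Assumption: cluster ID 1, used when virtual clustering is not enabled.
execute ha failover set 1

# 3. From the new primary, confirm that the roles changed.
get system ha status

# 4. Clear the forced failover state on the unit where it was set.
execute ha failover unset 1
```

Comparing the output of steps 1 and 3 shows whether the secondary actually took over and whether the cluster stayed in sync.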
Useful CLI Commands
Below are essential CLI commands for managing and troubleshooting FortiGate High Availability (HA) clusters. Use these commands to monitor HA status, manage cluster members, and resolve synchronization issues:
- Show HA Status: `get system ha status`
  Displays the current HA cluster status, including mode, group, member roles, and synchronization state[1][7][13].
- Show Detailed HA Information: `diagnose sys ha status`
  Provides detailed diagnostic information about the HA cluster, including health and event logs[4][7].
- Access Secondary Unit: `execute ha manage <unit-id> [username]`
  Allows you to log in to and manage a specific HA member by its unit ID. Useful for direct troubleshooting on backup units[6][15].
- Show Cluster History: `diagnose sys ha history read`
  Displays the history of HA events, including failovers and synchronization issues[4][7].
- Show and Recalculate Cluster Checksums: `diagnose sys ha checksum cluster` and `diagnose sys ha checksum recalculate`
  Compares configuration checksums across cluster members and recalculates them to resolve out-of-sync problems[7][9].
- Manually Start/Stop HA Synchronization: `execute ha synchronize start` and `execute ha synchronize stop`
  Initiates or halts manual synchronization between HA units[7].
- Reset HA Uptime (Test Failover): `diagnose sys ha reset-uptime`
  Resets the HA uptime counter, often used to test failover behavior[7][19].
- Debug HA Processes: `diagnose debug application hatalk -1`, `diagnose debug application hasync -1`, then `diagnose debug enable`
  Enables real-time debugging for HA-related processes[7].
Tip: Always use caution when running commands that affect cluster synchronization or failover. Review official documentation and back up configurations before making changes.
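
Debug output keeps running until it is switched off, so it helps to treat an HA debug session as a start-to-finish sequence. The sketch below wraps the debug commands listed above with the reset and disable steps we would normally add. The `#` lines are annotations only.

```
# Start from a clean debug state and add timestamps to the output.
diagnose debug reset
diagnose debug console timestamp enable

# Enable verbose output for the heartbeat (hatalk) and sync (hasync) daemons.
diagnose debug application hatalk -1
diagnose debug application hasync -1
diagnose debug enable

# ... reproduce the problem and capture the output ...

# Stop the output and clear the debug settings when finished.
diagnose debug disable
diagnose debug reset
```

Capturing the session to a log file from your terminal client makes the output easier to review afterward.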
Sample HA Configuration (CLI)
Below is a sample CLI configuration for setting up a basic Active-Passive High Availability (HA) cluster on FortiGate firewalls. This example demonstrates the essential commands and parameters to establish HA, synchronize sessions, and configure heartbeat interfaces:

    config system ha
        set mode a-p
        set group-id 1
        set group-name "FGT-HA-Cluster"
        set hbdev "port3" 50 "port4" 50
        set session-pickup enable
        set override enable
        set priority 255
    end

- set mode a-p: Configures the cluster in Active-Passive mode, where one unit is active and others are standby.
- set group-id 1: Assigns a unique group ID to identify the HA cluster. All units in the same cluster must use the same group ID.
- set group-name "FGT-HA-Cluster": Sets a descriptive name for the HA cluster.
- set hbdev "port3" 50 "port4" 50: Assigns port3 and port4 as heartbeat interfaces with a priority of 50 each. Heartbeat interfaces synchronize HA status and session data.
- set session-pickup enable: Ensures active sessions are synchronized and maintained during failover.
- set override enable: Allows the unit with the highest priority to become the primary unit, even if it rejoins the cluster after a failure.
- set priority 255: Sets the priority of this unit (higher value means higher priority for becoming primary).
Note: Adjust interface names, priorities, and other parameters to match your specific network environment and redundancy requirements. Always back up configurations before making changes and consult official Fortinet documentation for advanced options.
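
The sample above covers the unit intended to be primary. A matching configuration for the second unit might look like the sketch below: the cluster-wide settings are repeated unchanged, and the per-unit settings (hostname and priority, which are not synchronized) differ. The hostname and the priority value of 100 are assumptions for illustration, and the `#` lines are annotations only.

```
config system global
    # Assumption: hypothetical hostname for the second unit.
    set hostname "FGT-HA-2"
end
config system ha
    set mode a-p
    set group-id 1
    set group-name "FGT-HA-Cluster"
    set hbdev "port3" 50 "port4" 50
    set session-pickup enable
    set override enable
    # Lower than the 255 used above, so this unit becomes the secondary.
    set priority 100
end
```

Once both units are configured and the heartbeat links are connected, `get system ha status` should show whether the cluster has formed.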
Conclusion
Throughout this blog post, we've explored the essential components and best practices for implementing and managing FortiGate High Availability (HA). Whether you're deploying HA for the first time or optimizing an existing setup, understanding the key concepts and tools is crucial for ensuring network resilience and uptime.
Key Takeaways:
- High Availability (HA) ensures continuous service by allowing FortiGate units to take over automatically in the event of failure.
- FortiGate supports multiple HA modes—Active-Passive, Active-Active, and Virtual Clustering—each suited for different network needs.
- Common HA topologies like full mesh, dual heartbeat, and back-to-back setups help you design for redundancy and performance.
- Synchronization of configuration, session, and routing data is vital for seamless failover and uninterrupted user experience.
- A variety of failover triggers—from hardware issues to heartbeat loss—help maintain service continuity.
- A solid troubleshooting checklist and CLI command toolkit empower administrators to quickly diagnose and resolve HA issues.
- A sample CLI configuration provides a practical starting point for deploying HA in real-world environments.
FortiGate HA is a powerful feature that, when configured correctly, can dramatically improve your network's fault tolerance and reliability. Whether you're protecting a small office or a large enterprise, HA ensures your firewall infrastructure remains resilient and responsive.
Thanks for following along! We hope this guide has helped you better understand FortiGate HA and how to make the most of it in your environment. If you have questions or want to share your own HA tips, feel free to drop a comment or reach out.