Table of Contents
- Overview
- Key Terminology
- Architecture Details
- ISSU and Upgrade Best Practices
- Operational Workflow
- Conclusion
Overview of State Sharing and In-Service Software Upgrade (ISSU)
What Is It?
State Sharing and In-Service Software Upgrade (ISSU) are technologies built into Arista’s network switches. Together, they are designed to keep network devices operational and highly available, even during maintenance or software upgrades.
- State Sharing: Arista switches use a modular architecture where software processes (like protocol handlers and management agents) run independently but share the system’s critical state information through a centralized in-memory database.
- ISSU: On supported platforms and feature sets, this technology allows network administrators to upgrade the switch’s operating system (EOS) without restarting the device or interrupting data traffic.
Why You Need to Know About It
- Minimize Downtime: In high-availability environments such as data centers, network downtime can be costly. ISSU provides a way to keep devices online while applying software updates, meaning essential services and applications aren’t disrupted.
- Operational Flexibility: Upgrades and maintenance activities often require system reboots, but Arista’s approach lets you restart individual software processes or apply patches on the fly.
- Higher Reliability: State sharing ensures that when a process fails or is restarted, the switch can recover quickly without losing configuration, routing, or forwarding information.
How It Works
- Modular Process Architecture: Each main function of the switch (e.g., routing, management, monitoring) runs as an isolated software process. These processes interact through a centralized, high-speed database that retains the current ‘state’ of the switch.
- Independent Process Restarts: If one process needs to be restarted or upgraded (such as during an ISSU), it can do so independently. The in-memory database preserves the state, so the switch doesn’t lose track of its configuration or current network activity.
- In-Service Upgrade Flow:
  - The new software image is loaded onto the device while the old image is still running.
  - Processes are then gracefully restarted using the new software, one at a time, while the overall device remains up.
  - When using redundancy features like MLAG (Multi-Chassis Link Aggregation), traffic can be shifted automatically to a peer device during upgrades, further reducing disruption.
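The upgrade flow above can be illustrated with a toy model. This is only a sketch: the `Agent` and `Switch` classes, the version strings, and the rule that the device counts as "up" while at most one agent is down are all assumptions made for illustration, not Arista's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for one switch software process."""
    name: str
    version: str
    running: bool = True


@dataclass
class Switch:
    """Toy model: a set of agents plus a shared state store that outlives them."""
    agents: list
    state: dict = field(default_factory=dict)

    def is_up(self) -> bool:
        # Assumption: the device is "up" while at most one agent is down.
        return sum(not a.running for a in self.agents) <= 1


def restart_agents_on_new_image(switch: Switch, new_version: str) -> list:
    """Restart agents one at a time on the new version; return a log of steps."""
    log = []
    for agent in switch.agents:
        agent.running = False          # stop the old process
        assert switch.is_up()          # the rest of the device keeps running
        agent.version = new_version    # start it again from the new image
        agent.running = True
        log.append(f"{agent.name} -> {new_version}")
    return log


switch = Switch(
    agents=[Agent("routing", "1.0"), Agent("management", "1.0")],
    state={"routes": {"10.0.0.0/24": "Ethernet1"}},
)
log = restart_agents_on_new_image(switch, "1.1")
```

Because the shared `state` dictionary lives outside the agents, it is untouched by the restarts, which is the property the in-memory database provides in the description above.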
In summary:
State Sharing and ISSU make it possible for network operators to keep infrastructure robust and up-to-date without scheduling costly shutdowns or risking unexpected outages. These innovations are essential for organizations demanding uninterrupted connectivity and operational agility.
Key Terminology
Understanding the core terms related to Arista's State Sharing and In-Service Software Upgrade (ISSU) helps clarify how the technology functions:
- EOS (Extensible Operating System): Arista’s modular, Linux-based operating system that powers Arista switches. EOS separates protocol processing from the kernel for higher reliability and flexibility.
- State Sharing Architecture: A design where switch state information is stored in an in-memory database, enabling rapid and independent updates and restarts of processes without disrupting the system.
- ISSU (In-Service Software Upgrade): Technology that allows upgrading switch software without rebooting or causing significant traffic disruptions, ensuring continuous network availability.
- Smart System Upgrade (SSU): Arista’s method of managing network-wide upgrades leveraging system redundancy and modular architecture to minimize downtime and upgrade risk.
- MLAG (Multi-Chassis Link Aggregation): Technology allowing two Arista switches to operate as one logical unit, providing active-active connectivity and seamless traffic continuation during maintenance or upgrades.
Architecture Details
The architecture of Arista switches is designed to support high availability, scalability, and efficient software upgrades through modular components and state sharing.
Multi-Process State Sharing
- Modular Process Design: Protocol processing, management, and device drivers each run as separate processes in isolated memory spaces, outside the Linux kernel. This enhances system stability and fault isolation.
- In-Memory Database: Critical switch state information is maintained in a centralized in-memory database. Processes communicate state updates using a publish/subscribe model, enabling rapid synchronization and minimal overhead.
- Independent Process Restarts: The architecture allows individual processes to restart independently without requiring a full system reboot, facilitating quick recovery from faults and seamless upgrades.
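The publish/subscribe pattern described above can be sketched in a few lines. This is a generic illustration, not EOS code: the `StateDB` class, its method names, the key format, and the choice to replay the current value to a new subscriber are all assumptions.

```python
from collections import defaultdict


class StateDB:
    """Toy in-memory store with publish/subscribe notifications."""

    def __init__(self):
        self._data = {}                        # current state, keyed by path
        self._subscribers = defaultdict(list)  # key -> list of callbacks

    def subscribe(self, key, callback):
        self._subscribers[key].append(callback)
        # Assumption: replay the current value so a late or restarted
        # subscriber catches up without the publisher resending anything.
        if key in self._data:
            callback(key, self._data[key])

    def publish(self, key, value):
        self._data[key] = value
        for callback in self._subscribers[key]:
            callback(key, value)


db = StateDB()
key = "routes/10.0.0.0/24"

# A "forwarding" process subscribes before the route exists.
received = []
db.subscribe(key, lambda k, v: received.append((k, v)))

# A "routing" process publishes the route.
db.publish(key, "Ethernet1")

# A process that restarts later subscribes again and is brought up to date.
replayed = []
db.subscribe(key, lambda k, v: replayed.append((k, v)))
```

The publisher never needs to know which processes are listening, which is what lets a subscriber restart independently and resynchronize from the store.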
Self-Healing Mechanisms
- Software Fault Containment (SFC): Faults are contained within individual processes to prevent system-wide failures, maintaining overall switch stability.
- Stateful Fault Repair (SFR): Processes can be restarted while preserving state information, allowing the system to recover automatically and maintain continuous operation.
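A minimal sketch of the two mechanisms together, assuming a hypothetical `CounterAgent` whose working state lives in an external store and a simple supervisor loop that restarts it on failure. The names and the failure trigger are invented for illustration.

```python
class CounterAgent:
    """Hypothetical agent that keeps its working state in a shared store."""

    def __init__(self, store):
        self.store = store                       # state is owned outside the agent
        self.store.setdefault("packets_seen", 0)

    def handle(self, packet):
        if packet == "malformed":
            raise ValueError("agent crashed on malformed input")
        self.store["packets_seen"] += 1


def supervise(store, packets):
    """Run the agent over the packets, restarting it on failure.

    Returns the number of restarts that were needed.
    """
    agent = CounterAgent(store)
    restarts = 0
    for packet in packets:
        try:
            agent.handle(packet)
        except ValueError:
            # Fault containment: the failure stays inside this one agent.
            # Stateful repair: a fresh instance resumes from the preserved store.
            agent = CounterAgent(store)
            restarts += 1
    return restarts


store = {}
restarts = supervise(store, ["a", "b", "malformed", "c"])
```

After the run, the count of packets seen before the crash is still in `store`, and the replacement agent continues from it rather than from zero.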
ISSU and Upgrade Best Practices
Following best practices for In-Service Software Upgrade (ISSU) helps ensure minimal disruption and a smooth upgrade process on Arista switches.
Upgrade Method | Description |
---|---|
In-Service Software Upgrade (ISSU) | Allows upgrades without rebooting or interrupting active data traffic, ensuring high network availability. |
Smart System Upgrade (SSU) | Uses system redundancy and modular architecture to perform upgrades network-wide with minimal downtime and risk. |
MLAG-based Upgrades | Leverages Multi-Chassis Link Aggregation (MLAG) to upgrade one peer switch while the other actively handles traffic, minimizing downtime and packet loss. |
Standard Upgrades | For switches without redundancy or MLAG, upgrades typically require a reload. Scheduling maintenance windows is recommended to avoid service impact. |
Important Considerations:
- Not all hardware models or features (e.g., some 7050SX2/7280 variants, MACsec, VRRP, BFD, PTP, VXLAN routing) fully support ISSU or SSU. Verify support in the official documentation before proceeding.
- Though ISSU is designed for hitless upgrades, always prepare for fallback reloads and test upgrades in controlled environments where possible.
- Disable or carefully manage sensitive protocols and features during ISSU as recommended by Arista’s guidelines.
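A pre-upgrade review of enabled features could be scripted along these lines. This is a sketch only: the feature set below simply mirrors the examples named in the considerations above, the function name is hypothetical, and the authoritative compatibility list is Arista's documentation for your platform and EOS release.

```python
# Features the considerations above flag as not always ISSU/SSU-compatible.
# Illustrative only; not an authoritative or complete list.
FEATURES_TO_VERIFY = {"macsec", "vrrp", "bfd", "ptp", "vxlan routing"}


def features_needing_review(enabled_features):
    """Return the enabled features that should be checked before an ISSU."""
    normalized = {name.strip().lower() for name in enabled_features}
    return sorted(normalized & FEATURES_TO_VERIFY)


to_review = features_needing_review(["BGP", "BFD", "MLAG", "PTP"])
print(to_review)  # ['bfd', 'ptp']
```

An empty result does not prove the upgrade will be hitless; it only means none of the features listed here are enabled.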
Operational Workflow
A successful ISSU or SSU upgrade follows a structured workflow that keeps the network stable and minimizes downtime:
- Preparation: Review hardware compatibility and EOS version support for ISSU/SSU. Identify any features or protocols that may need to be disabled during the upgrade to avoid disruption.
- Redundancy Utilization: For networks configured with MLAG or redundant supervisor modules, plan rolling upgrades where one device is upgraded while the other maintains active traffic handling to ensure continuous operation.
- Maintenance Mode: Place devices into maintenance mode before starting the upgrade to safely remove them from the forwarding path where applicable.
- Perform Upgrade: Execute the ISSU or SSU upgrade commands following Arista’s documented procedures. Monitor system messages and protocol states throughout the process.
- Post-Upgrade Verification: After completion, verify that all processes are running normally, protocols are converged, and device state is fully synchronized with the network.
- Reintegration: Take devices out of maintenance mode and ensure they resume full forwarding duties without packet loss or traffic disruption.
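The workflow can be expressed as a small orchestration routine. The sketch below runs against an in-memory `FakeDevice`; every class, method, device name, and version string is hypothetical, and a real implementation would replace these methods with the documented EOS procedures for your platform.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDevice:
    """In-memory stand-in for a switch; every method here is hypothetical."""
    name: str
    version: str
    in_maintenance: bool = False
    history: list = field(default_factory=list)

    def enter_maintenance(self):
        self.in_maintenance = True
        self.history.append("enter_maintenance")

    def upgrade(self, target):
        self.version = target
        self.history.append(f"upgrade:{target}")

    def healthy(self):
        self.history.append("verify")
        return True

    def exit_maintenance(self):
        self.in_maintenance = False
        self.history.append("exit_maintenance")


def upgrade_one_at_a_time(devices, target, is_supported):
    """Upgrade devices sequentially, stopping at the first failed check."""
    for device in devices:
        if not is_supported(device):       # Preparation
            raise RuntimeError(f"{device.name}: not verified for ISSU/SSU")
        device.enter_maintenance()         # Maintenance Mode
        device.upgrade(target)             # Perform Upgrade
        if not device.healthy():           # Post-Upgrade Verification
            raise RuntimeError(f"{device.name}: failed verification")
        device.exit_maintenance()          # Reintegration


peers = [FakeDevice("leaf1a", "old"), FakeDevice("leaf1b", "old")]
upgrade_one_at_a_time(peers, "new", is_supported=lambda device: True)
```

Processing the peers strictly in sequence is what reflects the redundancy step: the second device is not touched until the first has been verified and returned to service. Raising on a failed check leaves the failing device in maintenance mode, which is a deliberate choice here so that it stays out of the forwarding path until someone investigates.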
Conclusion
Arista’s State Sharing architecture and In-Service Software Upgrade (ISSU) capabilities show how modular software design can deliver high availability and operational flexibility in modern data center networks. By separating protocol processing into independent processes and using an in-memory database for state sharing, Arista EOS enables seamless upgrades and rapid fault recovery with minimal impact on network traffic.
Key takeaways include:
- State Sharing allows independent process restarts without disrupting overall switch operations.
- ISSU and Smart System Upgrade (SSU) methods reduce downtime by enabling software upgrades without full system reloads.
- MLAG provides network redundancy that significantly smooths upgrade workflows by maintaining active traffic paths.
- Following best practices and carefully planning upgrades ensures smooth transitions with minimal risk.
- Understanding hardware and feature support limitations is essential before performing ISSU.
We hope this deep dive into Arista’s innovative approach to software upgrades and state management has been insightful and helps you confidently manage your network infrastructure. Thanks for reading, and happy networking!