Introduction
Backup and recovery capability is the ultimate security backstop. Preventive controls can fail. Detection can be delayed. But an organization with tested, current backups of its OT systems and practiced recovery procedures can survive even a catastrophic cyber incident — ransomware, destructive malware, or targeted sabotage — and restore operations with defined, predictable timelines.
OT backup and recovery is not the same as IT backup and recovery, and IT-focused backup programs do not adequately protect OT environments. An enterprise backup solution that correctly images every Windows server may completely ignore the PLC logic that actually runs the process, the HMI project files that define the operator interface, and the historian database that contains years of process history. When a ransomware attack encrypts the SCADA server and corrupts the PLC configurations, the IT backup program provides servers that boot correctly into an environment where the control systems no longer work.
This guide provides a complete framework for OT backup and recovery — from understanding what needs to be backed up and why, through backup methodology, offline storage requirements, recovery time objectives, procedure development, and testing.
What Needs to Be Backed Up in OT
The OT backup inventory goes far beyond the servers that enterprise backup tools typically manage.
Tier 1: Control System Logic and Configuration
These backups are irreplaceable if lost and are the foundation of any OT recovery.
PLC and RTU Logic Programs: The ladder logic, function block diagrams, and structured text programs running on every PLC, RTU, and DCS controller in the facility. These programs represent years of engineering effort — the accumulated logic of process optimization, safety interlock development, and operational refinement. Without current backups, recovery from a logic-modifying attack requires re-engineering the control logic from process documentation, which may take weeks and may not perfectly replicate the tuned parameters of the original.
DCS Controller Configurations: Distributed Control System controller databases, loop configurations, control module hierarchies, and faceplates. DCS vendors (Emerson DeltaV and Ovation, Honeywell Experion, Yokogawa CENTUM, ABB 800xA) each have vendor-specific backup procedures that export the controller configuration in a format that can be fully restored.
Safety Instrumented System (SIS) Logic and Configuration: SIS configurations are the most critical backup in the OT environment. A SIS that cannot be restored to its certified configuration following a cyber incident cannot be returned to service without extensive re-certification testing. Back up SIS configurations under version control and store them offline.
Network Device Configurations: Managed switch configurations, industrial firewall rule sets, router configurations, and wireless access point configurations. Network device configuration backup is frequently overlooked in OT backup programs, but a facility where every PLC is restored but the network is misconfigured cannot resume operations.
Field Instrument and Device Configurations: Process transmitters, analyzers, intelligent motor controllers, and variable frequency drives that have network-configurable parameters. These are often not backed up because they are considered "field instruments" rather than "control systems." In practice, reconfiguring dozens of field instruments after a cyber incident can add days to the recovery timeline.
Tier 2: Windows-Based OT Systems
SCADA Server Images: Full system images of SCADA servers, including operating system, SCADA application, tag database, alarm configuration, and historian connections. A SCADA server image backup allows restoration to a known-good state without re-installing and re-configuring from scratch.
HMI Workstation Images: Full images of HMI servers and operator workstations, including the HMI project files (process graphics, navigation structure, alarm displays), the HMI software installation, and configurations. HMI images allow restoration of the operator interface to an operational state.
Engineering Workstation Images: Full images of engineering workstations, including the vendor engineering software (TIA Portal, Studio 5000, etc.), project files, and configurations. Engineering workstation images are critical for recovery — the workstation is the tool needed to restore PLC configurations, and if it is compromised, the recovery process requires rebuilding or using a clean spare.
Historian Databases: Process historian databases (OSIsoft PI, Canary, Ignition Historian) contain the operational data record of the facility. The database itself can be restored from backup; the ongoing data collection can resume after server recovery. Historian database backup frequency should match the value of the historical data — daily backups for most facilities.
Tier 3: Documentation and Configuration Records
As-Built Network Diagrams: Current network diagrams showing all device connections, IP addresses, VLAN assignments, and zone boundaries. Without current network documentation, rebuilding the network topology after a destructive incident is complex and error-prone.
Firewall Rule Sets and Access Control Lists: Exported rule set configurations for all industrial firewalls and network ACLs. In many cases, firewall management systems maintain version-controlled rule sets; confirm that the management system's configuration database is also backed up.
Vendor Licenses and Keys: Software license keys, hardware keys (dongles), and vendor-issued certificates for OT software. A SCADA server restored from image backup may fail to start if the license management server is not also restored or the license is not available. Document and store all license keys offline.
Vendor Contact Information and Escalation Contacts: During an incident, the vendor emergency contact list is critical. Store a current copy offline and ensure it includes direct technical support contacts, not just sales or service desk numbers.
Backup Methodology by Component Type
PLC and Controller Logic Backup
Method: Use the vendor's engineering software to export PLC logic in a backup-compatible format:
- Siemens S7: TIA Portal > Project > Archive project as .zap17 (TIA Portal V17) or equivalent. Full project archive includes all PLC programs, hardware configurations, and network configurations.
- Rockwell Studio 5000: File > Save As creates .ACD or .L5K (text format) files. The .L5K format is human-readable and version-controllable in a code repository.
- Schneider EcoStruxure Control Expert: File > Save Archive creates .STA archive files; the project export function produces .XEF (XML) files.
- GE Proficy Machine Edition: File > Export creates backup archives.
Frequency: After every authorized change. PLC logic should be backed up as part of the change management process — take the backup before making changes (pre-change baseline) and after changes are validated (post-change authorized baseline). Automated backup tools can also perform periodic scheduled backups.
Automation tools:
- Rockwell FactoryTalk AssetCentre: Automated backup and change detection for Rockwell controller logic
- Siemens SINEMA Server (succeeded by SINEC NMS): Network management with configuration backup for Siemens networking infrastructure
- Platform-agnostic tools (Tenable OT Security, formerly Indegy; Claroty) can back up controller logic from multiple vendor platforms
Version control: Store PLC logic backups in a version-controlled system (Git or a document management system with version history). Version control allows comparison between current and previous logic versions to identify unauthorized changes.
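Where the export format is text, as with Rockwell's .L5K, a plain unified diff between the last authorized baseline and the newest automated backup surfaces unauthorized changes directly. A minimal sketch (the logic fragment below is illustrative, not real exported code):

```python
import difflib

def diff_logic_exports(baseline_text: str, current_text: str) -> list[str]:
    """Unified diff between two text-format logic exports (e.g. the
    contents of two Rockwell .L5K files). Empty list means no change."""
    return list(difflib.unified_diff(
        baseline_text.splitlines(), current_text.splitlines(),
        fromfile="authorized_baseline", tofile="latest_backup", lineterm=""))

# Illustrative fragment: a timer preset silently changed from 5000 ms to 500 ms
baseline = "TON(Timer1, 5000);\nOTE(Pump_Run);"
latest = "TON(Timer1, 500);\nOTE(Pump_Run);"
changes = diff_logic_exports(baseline, latest)
if changes:
    print(f"{len(changes)} diff lines -- review before accepting as new baseline")
```

Any non-empty diff should be routed to engineering review before the new export is accepted as the authorized baseline.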
Windows-Based System Backup
Image-based backup: Use image-based backup tools to capture full system images of SCADA servers, HMI servers, and engineering workstations:
- Veeam Backup & Replication: Full image backup with bare-metal restore capability. Widely used for OT Windows systems.
- Acronis Cyber Backup: Image-based backup with encrypted storage and offline backup capability.
- Windows Server Backup (native): Built-in to Windows Server; acceptable for basic image backup where third-party tools are not available.
Frequency:
- SCADA servers: Weekly full image backup; daily incremental if change frequency warrants it
- HMI servers: Weekly full image backup; after every significant HMI project change
- Engineering workstations: Weekly full image backup; after every authorized software or configuration change
Retention: Retain at least 4 weeks of image history. This provides the ability to restore to a pre-compromise state if an incident is discovered after some delay.
Isolation of backup storage: Backup storage must be isolated from both the OT network and the IT network. Ransomware that encrypts OT systems will also encrypt any backup storage it can reach. Backup storage isolation options:
- Offline storage (removable media or offline tape that is physically disconnected from all networks)
- Write-once storage (WORM media or object storage with object lock enabled)
- Backup infrastructure on a dedicated, isolated network segment with no access from OT or IT production networks
Historian Database Backup
Most historian platforms include native backup capabilities:
- OSIsoft PI: PI Backup utility creates full backups of the PI Data Archive, including all points and historical data. Schedule automatic backups and verify completion.
- Canary Labs: CSV export or native backup of the Canary data store.
- Ignition: The Gateway backup (.gwbk) captures projects and configuration; historian data stored in an external SQL database must be backed up with that database's own tools.
Historian database backups should be stored on isolated backup infrastructure, not on the OT network storage that is also the target of potential ransomware.
For historian databases with years of process history, the backup volume may be large. Verify that the backup infrastructure has sufficient capacity and that restore time from backup is within the recovery time objective.
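Whether a multi-terabyte restore fits the RTO is simple arithmetic that should be done before the incident, not during it. An illustrative calculation (the 1.5x overhead factor for verification, decompression, and index rebuild is an assumption, not a benchmark):

```python
def restore_hours(backup_size_gb: float, throughput_mb_s: float,
                  overhead_factor: float = 1.5) -> float:
    """Estimated wall-clock restore time: raw transfer time scaled by an
    overhead factor for verification, decompression, and index rebuild."""
    seconds = (backup_size_gb * 1024) / throughput_mb_s
    return seconds * overhead_factor / 3600

# e.g. a 2 TB historian backup restored at a sustained 100 MB/s
print(f"Estimated restore: {restore_hours(2048, 100):.1f} h")
```

If the estimate exceeds the historian RTO, the options are faster restore infrastructure, a tiered restore (server first, historical data backfilled later), or a revised RTO.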
Offline Storage Requirements
The most critical backup media must be isolated from all networks to ensure ransomware cannot encrypt or corrupt backups.
Storage Architecture
Tier 1 — Offline / Air-Gapped Storage: PLC logic backups, SIS configurations, and "golden images" of critical OT servers should be stored on media that is physically disconnected from all networks. Options:
- External hard drives stored in a secure location (fireproof safe, operations building safe)
- Write-protected USB drives stored offline
- Offline NAS device that is powered off and physically disconnected between backup events
Tier 2 — Isolated Network Storage: Regular backup data (weekly images, daily historian backups) may be stored on backup infrastructure that is network-connected but isolated from both OT and IT networks. The backup network should have no routing to OT or IT; the only connections should be the scheduled backup agents that push data to the backup system.
Tier 3 — Offsite Storage: A copy of the most critical backup data should be maintained offsite — either at a secondary facility, a secure storage service, or an encrypted cloud backup. Offsite storage protects against physical events (fire, flood) that could destroy all on-site backup media.
Backup Media Management
- Label all backup media with the backup date, the system backed up, and the backup type
- Maintain a backup media log: what is stored on each piece of media, its location, and its retention period
- Rotate backup media: do not overwrite the most recent backup until the previous backup is verified as complete and readable
- Test media periodically: confirm that backup media is readable and that the backup data is restorable before you need it in an emergency
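The periodic media test can be partly automated by recording a checksum in the media log when the backup is written and recomputing it at each test. A minimal sketch (the log format and file names are hypothetical):

```python
import hashlib
from pathlib import Path

def verify_media_entry(backup_file: Path, logged_sha256: str) -> bool:
    """Re-hash a backup file and compare against the checksum recorded
    in the media log when the backup was taken. False means the file is
    unreadable or corrupt and the backup must be retaken."""
    digest = hashlib.sha256()
    with backup_file.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == logged_sha256
```

A checksum match confirms the data is intact; it does not replace an actual restore test, which also exercises the restore tooling and procedures.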
Recovery Time Objectives
Define Recovery Time Objectives (RTO) for each OT system category:
| System Category | Typical RTO | Recovery Method |
|---|---|---|
| Safety Instrumented System | 4-24 hours | Logic restore from backup + SIS functional testing before restart |
| Primary DCS Controllers | 8-24 hours | Controller restore from backup + process checkout |
| SCADA Server | 2-8 hours | Bare-metal restore from image backup |
| HMI Server | 1-4 hours | Bare-metal restore from image backup |
| Engineering Workstation | 4-8 hours | Bare-metal restore or spare EWS deployment |
| Field Instruments | 1-48 hours (scales with quantity) | Configuration restore using engineering workstation |
| Historian Server | 4-24 hours (data recovery ongoing) | Server restore from image; historical data restore from DB backup |
| Network Infrastructure | 2-8 hours | Configuration restore to spare hardware |
RTOs are targets, not guarantees. Actual recovery times depend on:
- Whether backup media is current and accessible
- Whether recovery procedures are documented and practiced
- Whether spare hardware is available for failed equipment
- Whether vendor support is available for complex restorations
Test your RTOs annually against actual recovery exercises. Untested RTOs are estimates, not commitments.
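Measured exercise times can be checked against the RTO table mechanically. A sketch with hypothetical exercise results (RTO values taken from the table above):

```python
RTO_HOURS = {"SCADA server": 8, "HMI server": 4, "Engineering workstation": 8}

def rto_gaps(measured_hours: dict[str, float]) -> dict[str, float]:
    """Map each system whose measured recovery time exceeded its RTO
    to the overrun in hours; an empty dict means all targets were met."""
    return {system: hours - RTO_HOURS[system]
            for system, hours in measured_hours.items()
            if hours > RTO_HOURS[system]}

# Illustrative exercise results
print(rto_gaps({"SCADA server": 11.5, "HMI server": 3.0}))  # -> {'SCADA server': 3.5}
```

Each overrun becomes an action item: either close the gap (better procedures, spares, vendor support) or formally revise the RTO.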
Recovery Procedure Development
Recovery procedures must be documented at sufficient detail that personnel who were not involved in the original system configuration can execute them correctly under pressure.
Structure of an OT Recovery Procedure
Each recovery procedure should include:
1. Scope and Applicability: What system(s) does this procedure cover? Under what circumstances should it be invoked?
2. Prerequisites: What must be in place before starting? (Backup media available, spare hardware in place, vendor contact list available, process in safe state, operations team notified)
3. Recovery Team: Who must be involved? (OT security engineer, control system engineer, vendor contact, operations supervisor) What are their roles and communication contacts?
4. Step-by-Step Procedure: Each action described in sufficient detail for execution by a qualified engineer without additional reference material. For complex steps, include screenshots or configuration excerpts.
5. Verification Steps: After each major step, what checks confirm the step completed successfully before proceeding?
6. Process Restart Criteria: What conditions must be verified before the process can be restarted? This section is typically written jointly with the operations team and may include physical inspections, instrument calibration checks, and loop functional tests.
7. Rollback Options: If the recovery procedure fails at a specific step, what options exist?
8. Escalation Contacts: Who to call at each stage if problems arise that the team cannot resolve. Include vendor emergency contacts with direct phone numbers, not just general support lines.
Procedure Testing
Recovery procedures must be tested before they are needed in an emergency. Testing options:
Tabletop Exercise: Walk through the procedure with the full recovery team. Identify gaps, ambiguities, and missing prerequisites. A tabletop exercise reveals most procedure problems without requiring a production system to be taken offline.
Component Restoration Test: Restore a non-critical OT system component (a decommissioned HMI server, a development PLC, a spare engineering workstation) from backup in a test environment. Verify that the restored system comes up correctly and the backup data is complete and valid.
Full Recovery Exercise: At planned maintenance or shutdown, execute a complete recovery of a section of the OT environment from backup. This provides the highest confidence that the procedures work but requires coordination with operations and a production maintenance window.
Conduct tabletop exercises at least annually. Component restoration tests should be performed at least semi-annually. Full recovery exercises should occur at least once every 18-24 months for critical OT systems.
Ransomware-Specific Recovery Considerations
Ransomware attacks on OT environments present specific recovery challenges:
Assessment before restoration: Before restoring from backup, assess whether the adversary still has access. Restoring encrypted systems to a network where the attacker maintains persistence allows re-encryption. The restoration sequence should include re-establishing the network architecture from a clean state, not simply restoring systems to the existing compromised network.
Clean restore environment: Restore OT systems to a clean, rebuilt network environment. If the IT network is compromised, keep the OT restoration network completely isolated from IT until the IT investigation is complete.
PLC and controller logic verification: After restoring PLC logic from backup, verify the restored logic against the backup record using hash comparison or automated logic comparison tools. If the backup was made before the attacker modified the logic, restoring it restores the correct logic. If the backup was made after modification, it may contain the attacker's changes.
Recovery sequencing: Restore and verify each layer in sequence:
- Network infrastructure (switches, firewalls) — the network must be correct before systems are connected
- Engineering workstations — needed to restore and verify controller configurations
- Safety systems — must be restored and verified before any process restart
- DCS and SCADA systems — restore before returning operators to automated control
- Historians and reporting systems — restore after the process is stable
Preserve evidence: Before restoring, capture forensic images of affected systems where possible. The encrypted systems and ransom notes contain forensic evidence relevant to the investigation, insurance claim, and regulatory reporting.
Business Continuity: Operating Through a Cyber Incident
For extended incidents where full OT restoration takes days or weeks, business continuity planning addresses how to maintain safe plant operation during the recovery period.
Manual operations procedures: Document manual operating procedures for each process area — how operators maintain safe control of the process if SCADA visibility is lost or HMIs are unavailable. Manual operations are cognitively demanding and error-prone; practiced procedures significantly reduce the risk of operator error during a crisis.
Degraded mode operations: Define acceptable reduced-capacity operations that can be maintained with degraded automation. For example, a process that normally runs at 100% capacity under automated control may be maintainable at 60% under manual control until the SCADA system is restored.
Communication alternatives: If normal OT communications are disrupted, what alternatives exist? Panel-mounted instruments (local gauges, local indicators) that are independent of the SCADA network provide operator visibility into process parameters without relying on the SCADA network.
Customer and stakeholder communication: Prepare communication templates for notifying customers of production delays, regulatory bodies of incidents as required, and internal leadership of status and recovery timelines.
Conclusion
OT backup and recovery is not an insurance policy that is set up and forgotten. It requires ongoing maintenance — keeping backups current, testing restoration procedures, updating RTOs as systems change, and practicing the coordination required for an effective recovery response.
The organizations that recover fastest from OT cyber incidents are those that have made the investment before the incident: current, tested backups; practiced recovery procedures; and a recovery team that knows its roles without having to figure it out under pressure. That preparation is achievable, and its value is proven every time it converts a potential catastrophe into a manageable recovery event.
Beacon Security provides OT backup program design, recovery procedure development, restoration testing, and ransomware readiness assessments for industrial organizations. Contact us to assess the completeness and test status of your OT backup and recovery capability.

