
Building an OT Security Operations Center: Design, Staff, and Operate

Introduction

A Security Operations Center (SOC) provides the continuous monitoring, detection, and response capability that converts security investments into operational security outcomes. Without a SOC function — some combination of people, processes, and technology capable of receiving security alerts, investigating them, and acting on findings — the monitoring infrastructure that organizations deploy in OT environments produces alerts that nobody reviews and incidents that nobody responds to in time.

OT SOC design is substantially more complex than enterprise IT SOC design. The technology is different: OT monitoring tools analyze industrial protocols rather than IT network traffic. The threat landscape is different: OT-targeted attacks have different techniques, timelines, and consequences than IT-focused attacks. The operational context is different: an alert about anomalous communication on a control network cannot be investigated with the same actions as an alert on a corporate endpoint. And the people are different: OT SOC analysts need a combination of cybersecurity skills and industrial process knowledge that is genuinely scarce.

This guide provides a structured framework for designing, staffing, and operating an OT SOC — from the initial model selection through technology deployment, staffing, playbook development, and maturity measurement.

SOC Model Selection

The first decision in OT SOC design is the operational model: who delivers the SOC function, and how is it structured?

Model 1: Dedicated Internal OT SOC

A dedicated internal OT SOC employs full-time security analysts and engineers focused exclusively on OT security monitoring and response. This model provides:

  • Maximum operational knowledge: analysts who work exclusively with OT environments develop deep familiarity with the specific processes, systems, and normal behavior of the facilities they monitor
  • Fastest response integration with operations: an internal OT SOC has direct relationships with the operations team, enabling rapid coordination during incidents
  • Complete data control: all OT monitoring data stays on premises under the organization's control

The dedicated model is appropriate for large industrial operators with multiple facilities, stringent OT security regulatory obligations (for example, NERC CIP medium- or high-impact BES Cyber Systems), or specifically elevated threat profiles. It requires the most investment in staffing, tooling, and process development. Recruiting OT SOC analysts is a genuine challenge given the talent shortage described in our workforce development guidance.

Model 2: Hybrid OT/IT SOC with OT Specialization

The most common model for large industrial organizations integrates OT monitoring into an existing enterprise SOC with OT-specialist staff embedded within or closely aligned to the enterprise SOC team.

In this model:

  • OT monitoring platforms feed alerts into the enterprise SIEM
  • A subset of SOC analysts with OT training handle OT-specific alerts
  • OT-specific playbooks guide investigation and response for industrial alerts
  • Tier 1 analysts handle initial triage of OT alerts; OT specialists handle escalation

This model benefits from enterprise SOC infrastructure investment and provides OT visibility within the broader organizational security picture. The risk is that OT specialists can be pulled into IT work and that IT-focused SOC leadership may not adequately prioritize OT monitoring capability development.

Model 3: Outsourced OT SOC / Managed Detection and Response

For organizations that cannot or choose not to build internal OT SOC capability, outsourced OT Managed Detection and Response (MDR) services provide continuous monitoring, detection, and response support through a specialist provider.

OT MDR providers include:

  • Dragos Managed Services (monitoring using the Dragos platform)
  • Claroty Managed Threat Detection
  • Nozomi Vantage as a Service
  • Sector-specific MSSPs with demonstrated OT capability

When evaluating OT MDR providers, assess:

  • Actual OT protocol coverage and industrial sector experience — not generic MSSP services rebranded as OT
  • The investigation and response process for OT alerts, including how they coordinate with your operations team
  • Data residency and security of OT monitoring data
  • SLAs for alert response times and escalation

Model 4: Co-Managed OT SOC

A co-managed model combines internal OT security staff with external specialist support, with a defined split of responsibilities:

  • Internal team: asset management, tuning, context provision, operations coordination, Level 2 investigation
  • External provider: 24/7 monitoring coverage, Level 1 triage, threat intelligence integration, Level 3 specialist support

This model is increasingly popular for mid-market industrial organizations that have some internal OT security capability but cannot sustain 24/7 coverage or deep OT threat intelligence expertise internally.

Technology Stack

OT Monitoring Platform

The cornerstone of the OT SOC is a platform that understands industrial protocols and can monitor OT network traffic for threats, anomalies, and policy violations without actively probing OT devices.

Platform selection criteria:

  • Protocol coverage: does the platform natively support the industrial protocols in your environment? (Modbus, DNP3, EtherNet/IP, OPC UA, IEC 104, PROFINET, BACnet, etc.)
  • Asset discovery: can it automatically build and maintain an OT asset inventory from passive traffic analysis?
  • Behavioral analytics: does it establish communication baselines and detect deviations?
  • Threat detection: does it include ICS-specific signatures and detection logic for known OT malware families and attack techniques?
  • ATT&CK for ICS alignment: are detections mapped to the ATT&CK for ICS framework?
  • SIEM integration: can it forward normalized events to your enterprise SIEM?
  • Scalability: can it support the number of sites and devices in your environment?

Leading platforms:

  • Dragos: Strong threat intelligence integration and ATT&CK for ICS alignment. Protocol coverage across most industrial sectors.
  • Claroty: Broad protocol support, strong asset inventory capability, enterprise SIEM integration.
  • Nozomi Networks: Good scalability for large multi-site deployments. Strong OT/IoT combined coverage.
  • Microsoft Defender for IoT (formerly CyberX): Native integration with Microsoft Sentinel for organizations already in the Microsoft security stack.
  • Tenable OT Security (formerly Indegy): Strong vulnerability management integration alongside monitoring.

SIEM Integration Architecture

OT monitoring events should flow into the enterprise SIEM alongside IT security events. This integration enables correlation: an IT security event (suspicious login to an engineering workstation) correlated with an OT monitoring event (new connection from that workstation to a PLC) is far more significant than either event in isolation.
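The cross-domain correlation described above can be sketched in a few lines. This is a minimal illustration, not any SIEM's actual rule syntax; the event field names, hostnames, and the 30-minute window are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Hypothetical normalized events; field names are illustrative,
# not a specific SIEM's data model.
it_event = {
    "time": datetime(2024, 5, 1, 2, 14),
    "type": "suspicious_login",
    "host": "ews-07",            # engineering workstation
}
ot_event = {
    "time": datetime(2024, 5, 1, 2, 21),
    "type": "plc_write",
    "source": "ews-07",
    "dest": "plc-line3",
}

def correlate(it_ev, ot_ev, window=timedelta(minutes=30)):
    """Flag an OT write that follows a suspicious login on the same host."""
    same_host = it_ev["host"] == ot_ev["source"]
    in_window = timedelta(0) <= ot_ev["time"] - it_ev["time"] <= window
    return same_host and in_window

if correlate(it_event, ot_event):
    print(f"High severity: {ot_event['source']} wrote to "
          f"{ot_event['dest']} shortly after a suspicious login")
```

Either event alone would sit low in the triage queue; the joined condition is what justifies an immediate escalation.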

Integration architecture considerations:

  • Log normalization: OT monitoring platform events must be translated into the SIEM's data model. Most major platforms provide native connectors for Splunk, Microsoft Sentinel, IBM QRadar, and others.
  • Context enrichment: OT asset context (device type, process area, criticality, zone) should be available to SIEM analysts investigating OT alerts
  • Alert routing: OT-specific alerts should route to OT-trained analysts, not the general IT analyst pool
  • Retention: OT log retention requirements should align with regulatory requirements (NERC CIP requires 90 days minimum; longer retention is valuable for incident investigation)

OT-Specific SIEM Content

Generic SIEM content — correlation rules, dashboards, and reports designed for IT security — has limited value for OT environments. Develop or procure OT-specific SIEM content:

OT-relevant correlation rules:

  • New device communicating on the OT network (potential unauthorized device)
  • First-time communication between two devices that have not previously communicated
  • Control command from an unexpected source (PLC write from a non-SCADA server)
  • Protocol anomaly (Modbus function code outside the permitted set)
  • Engineering software connection outside of authorized maintenance windows
  • Authentication failure followed by successful authentication on an OT system
  • Firmware update command on a PLC (potential unauthorized firmware modification)
  • Communication with known malicious IP addresses from IT-connected OT systems
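As one concrete instance of the rules above, a protocol-anomaly check for Modbus can compare each frame's function code to a per-device allowlist. The sketch below assumes the standard Modbus/TCP layout (7-byte MBAP header, then the function code); the device names and permitted code sets are illustrative assumptions.

```python
# Per-device allowlists of Modbus function codes (assumed policy).
ALLOWED_FUNCTION_CODES = {
    "plc-line3": {3, 4},          # read holding/input registers only
    "plc-packaging": {3, 4, 16},  # writes permitted on this path
}

def check_modbus_frame(device, frame):
    """Return an alert string if the function code is outside policy."""
    if len(frame) < 8:
        return f"{device}: malformed frame ({len(frame)} bytes)"
    function_code = frame[7]      # byte after the 7-byte MBAP header
    if function_code not in ALLOWED_FUNCTION_CODES.get(device, set()):
        return f"{device}: unexpected Modbus function code {function_code}"
    return None

# Function code 0x06 (write single register) against a read-only device:
frame = bytes.fromhex("000100000006") + bytes([0x11, 0x06, 0x00, 0x01])
print(check_modbus_frame("plc-line3", frame))
```

In practice the monitoring platform performs this inspection; the value of the sketch is showing that the rule is a simple set-membership test once the policy is written down.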

OT-specific dashboards:

  • Active connections per zone and across zone boundaries
  • Alert volume by zone and by protocol
  • Asset inventory changes (new devices discovered, devices no longer communicating)
  • Alert trend over time with comparison to baseline

Ticketing and Case Management

OT security events require dedicated case management that captures OT context alongside standard security case information. Ensure your ticketing system or SOAR platform can capture:

  • The OT asset(s) involved, with their zone, process area, and operational criticality
  • The operational status at the time of the alert (normal operations, maintenance, startup/shutdown)
  • The operations team contact notified and their assessment
  • The containment decision and operations team concurrence

Many organizations use their enterprise ticketing system (ServiceNow, Jira) for OT security cases with an OT-specific ticket template. Others maintain separate OT case management. Either works; the key requirement is that OT context is captured in the case record.
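One way to make the "OT context is captured" requirement concrete is a case template with the fields listed above. The field names and status values below are illustrative assumptions, not any ticketing product's schema.

```python
# Illustrative OT security case template; field names are assumptions.
OT_CASE_TEMPLATE = {
    "alert_id": None,
    "ot_assets": [],              # each: name, zone, process_area, criticality
    "operational_status": None,   # "normal" | "maintenance" | "startup_shutdown"
    "operations_contact": None,
    "operations_assessment": None,
    "containment_decision": None,
    "operations_concurrence": False,
}

def open_ot_case(alert_id, assets, status):
    """Create a new case record pre-populated with OT context."""
    case = dict(OT_CASE_TEMPLATE)
    case.update(alert_id=alert_id, ot_assets=assets, operational_status=status)
    return case

case = open_ot_case(
    "OT-2024-0113",
    [{"name": "plc-line3", "zone": "Zone 3A",
      "process_area": "Filling", "criticality": "high"}],
    "maintenance",
)
```

Whatever system holds the record, enforcing a template like this is what keeps operational context from being lost between the analyst and the operations team.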

Staffing Requirements

The OT SOC Analyst Profile

The OT SOC analyst role requires a combination of skills that is genuinely scarce: cybersecurity analysis capabilities combined with sufficient industrial process knowledge to understand the operational context of the events they are investigating.

Core skills:

  • Network traffic analysis: ability to read and interpret packet captures, identify anomalous communication patterns
  • Security alert triage: familiarity with attacker TTPs, ability to distinguish true positives from false positives
  • Industrial protocol knowledge: understanding of Modbus, DNP3, EtherNet/IP, and other protocols present in the environment well enough to recognize anomalies
  • OT architecture understanding: knowledge of Purdue model, zone/conduit concepts, the role of historians, engineering workstations, and SCADA servers

Desirable skills:

  • Experience with at least one major OT monitoring platform (Dragos, Claroty, Nozomi)
  • Familiarity with industrial control system vendors and their common platforms
  • Incident response experience in OT environments
  • ATT&CK for ICS knowledge

Career backgrounds that produce good OT SOC analysts:

  • IT security analysts with structured OT training and process plant exposure
  • Control system engineers who have developed cybersecurity skills
  • Former ICS vendor technical staff with security training

Staffing Model by SOC Model

  • Dedicated Internal: minimum staffing of 5-8 analysts plus 1-2 engineers. Key roles: OT SOC Lead, Senior OT Analysts, OT Security Engineers.
  • Hybrid IT/OT: 2-3 OT specialists embedded in the IT SOC. Key roles: OT SOC Specialist, OT Security Engineer.
  • Outsourced MDR: 1-2 internal OT security engineers. Key roles: OT Security Program Manager, internal liaison.
  • Co-Managed: 2-3 internal staff plus the external team. Key roles: OT Security Lead, internal context providers.

Shift Coverage for OT

OT security monitoring requirements do not map neatly to business hours. Industrial processes run continuously, and threat actors do not schedule attacks during business hours. 24/7 monitoring coverage for OT requires planning:

  • Internal 24/7: Requires three shifts of analysts plus backup coverage. Expensive and difficult to staff with OT-specialist roles.
  • Follow-the-sun: If the organization has OT operations in multiple time zones, distributing monitoring across regions provides coverage without triple-staffing a single location.
  • After-hours outsourcing: Internal team covers business hours; external MDR provider covers nights, weekends, and holidays.
  • Tiered alerting: Critical alerts (safety system anomaly, new device in control zone) page an on-call OT analyst 24/7; lower-priority alerts queue for business hours review.

Most organizations with internal OT SOC capability use a tiered alerting model: true 24/7 coverage for critical alerts through an on-call rotation, with non-critical review during business hours. This requires clear definitions of which alert types require immediate 24/7 response and which can wait for the next business day.
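The tiered model above reduces to a routing decision per alert. The sketch below assumes a set of alert types designated critical; the type names are illustrative, and a real implementation would page through whatever on-call tooling the organization runs.

```python
# Alert types designated for 24/7 paging (illustrative assumption).
CRITICAL_TYPES = {
    "safety_system_anomaly",
    "new_device_control_zone",
    "unauthorized_control_command",
}

def route_alert(alert):
    """Page on-call 24/7 for critical types; queue everything else."""
    if alert["type"] in CRITICAL_TYPES:
        return "page_oncall"
    return "business_hours_queue"

print(route_alert({"type": "safety_system_anomaly"}))   # page_oncall
print(route_alert({"type": "asset_inventory_change"}))  # business_hours_queue
```

The hard part is not the code but the policy: agreeing in advance, with operations, which alert types belong in the critical set.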

Playbook Development for OT Alerts

Playbook Structure

OT security playbooks should follow a consistent structure that guides analysts through investigation and response without requiring expert-level OT knowledge at every step:

  1. Alert description: What triggered this alert? What does it mean in OT context?
  2. Initial triage questions: The first three to five questions to determine whether this is a true positive requiring response or a false positive requiring tuning
  3. Evidence gathering: What additional data to collect (PCAP, logs, asset information, operations team input)
  4. Escalation criteria: At what point does this escalate to the OT security engineer or operations team?
  5. Operations team notification script: What to say to operations, in operational language, not security jargon
  6. Containment options: Available response actions ordered from least to most disruptive
  7. Resolution and documentation: How to close the case with appropriate documentation
  8. False positive feedback: How to report a false positive for tuning
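Keeping playbooks in a machine-readable structure lets a SOAR platform or even a simple runbook tool render and enforce the eight sections above. The dataclass below is one possible shape; the field names and the sample content are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class OTPlaybook:
    """Mirrors the eight playbook sections; field names are illustrative."""
    alert_description: str
    triage_questions: list
    evidence_to_gather: list
    escalation_criteria: str
    operations_notification_script: str
    containment_options: list  # ordered least to most disruptive
    resolution_steps: str
    false_positive_feedback: str

playbook = OTPlaybook(
    alert_description="New device communicating on the OT network",
    triage_questions=[
        "Is this device authorized?",
        "Was maintenance underway that could explain it?",
    ],
    evidence_to_gather=["PCAP of first communications", "asset records"],
    escalation_criteria="Escalate if the device cannot be attributed",
    operations_notification_script=(
        "We see an unrecognized device on the Line 3 network; can you "
        "confirm whether any new equipment was connected today?"
    ),
    containment_options=["monitor only", "block at switch port"],
    resolution_steps="Add to inventory if authorized; document the owner",
    false_positive_feedback="Tag the case FP and note the inventory gap",
)
```

Ordering `containment_options` from least to most disruptive in the data itself nudges analysts toward the least-disruptive action first, which matters in OT.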

Essential OT Playbooks

Develop playbooks for the alert types most likely to occur in your environment. Priority playbooks for most OT environments:

New Device Discovery: Triggered when an OT monitoring platform detects a device communicating on the OT network that was not previously in the asset inventory. Triage questions: Is this device authorized? Was a maintenance activity underway that could explain a new connection? Is this an attacker-controlled device or a legitimate asset that was not inventoried? Response: verify with the responsible operations engineer, add to inventory if authorized, investigate if unauthorized.
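At its core this detection is a set difference between observed devices and the known inventory. A minimal sketch, with illustrative device identifiers:

```python
# Known asset inventory and devices observed communicating today
# (identifiers are illustrative).
known_inventory = {"plc-line3", "scada-01", "hist-01", "ews-07"}
observed_today = {"plc-line3", "scada-01", "hist-01", "ews-07", "10.20.3.99"}

# Anything observed but not inventoried triggers the playbook.
new_devices = observed_today - known_inventory
for device in sorted(new_devices):
    print(f"New device alert: {device} not in asset inventory")
```

The quality of this detection is bounded by the quality of the inventory: a stale inventory turns legitimate assets into a stream of false positives.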

Unauthorized Control Command: Triggered when a control command (Modbus write, EtherNet/IP Set_Attribute) is received by a PLC from a source other than an authorized SCADA server or engineering workstation. This is a high-priority alert requiring rapid investigation. Triage: verify the source address, confirm whether a maintenance activity was underway, review recent access logs for the source system.

Engineering Software Connection Outside Maintenance Window: Triggered when a PLC programming tool (TIA Portal, Studio 5000, RSLogix) initiates a connection to a PLC outside of scheduled maintenance windows. Could indicate an unauthorized logic modification attempt. Immediate escalation to the OT security engineer and operations notification required.

First-Time Communication Between Two OT Devices: A device that has never previously communicated with another specific device begins doing so. Could be a new integration or could indicate lateral movement. Triage: verify against recent change records, review the nature of the communication (protocol, data volume, frequency).
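This detection amounts to maintaining a baseline of observed conversation tuples and flagging any tuple not yet seen. A minimal sketch under that assumption, with illustrative device names:

```python
# Baseline of observed (source, destination, protocol) conversations.
baseline = {
    ("scada-01", "plc-line3", "modbus"),
    ("ews-07", "plc-line3", "modbus"),
}

def check_flow(src, dst, proto):
    """Alert on a never-before-seen conversation, then learn it."""
    flow = (src, dst, proto)
    if flow not in baseline:
        baseline.add(flow)  # learn after alerting so repeats stay quiet
        return f"First-time communication: {src} -> {dst} ({proto})"
    return None

print(check_flow("hist-01", "plc-line3", "modbus"))  # alerts
print(check_flow("hist-01", "plc-line3", "modbus"))  # now baselined: None
```

Commercial platforms do this at scale with a learning period; the design question is how long that period runs, since a baseline learned during abnormal operations will treat the abnormal as normal.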

Protocol Anomaly: A Modbus function code outside the permitted set, a DNP3 request with an unusual or malformed structure, an OPC UA connection attempt with an invalid certificate. These may indicate reconnaissance, exploitation attempts, or misconfigured systems. Triage and categorize per the specific protocol and anomaly type.

Communication to/from Known Malicious Infrastructure: An OT-connected system (historian, engineering workstation, DMZ server) initiating communication to or receiving communication from an IP address associated with known malicious infrastructure. Treat as a potential active incident. Immediate escalation.

Integration with Enterprise SOC

Defining the Handoff

The boundary between the enterprise (IT) SOC and the OT SOC requires explicit definition. Common models:

  • OT SOC handles all OT alerts independently: The OT SOC receives OT monitoring events directly. The IT SOC receives a summary feed of significant OT events for situational awareness but does not triage or investigate OT alerts.
  • IT SOC triages OT alerts at Tier 1: Enterprise SOC analysts handle initial triage of all alerts, including OT, and escalate to OT specialists for anything requiring industrial process context.
  • Fully integrated investigation with shared tooling: Both IT and OT events flow into the same SIEM, with role-based access and routing that directs OT alerts to OT-trained analysts.

The fully integrated model provides the best correlation capability but requires the most investment in OT-specific SIEM content and analyst training. The independent model is simpler to operate but risks missing cross-domain attack patterns.

Joint Investigation Procedures

For incidents that cross the IT/OT boundary — which is the most common pattern for significant OT incidents — a joint investigation process is essential. Define:

  • Who leads: is the OT SOC lead or the IT SOC lead the incident commander for cross-domain investigations?
  • Communication channel: how do the two teams communicate during an active joint investigation?
  • Evidence sharing: how is evidence collected from IT systems made available to OT investigators and vice versa?
  • Decision authority: who authorizes containment actions on IT systems that may affect OT? Who authorizes OT containment actions?

SOC Maturity Measurement

Key Metrics

Track and report these metrics to demonstrate SOC effectiveness and drive improvement:

Detection metrics:

  • Mean Time to Detect (MTTD): average time from incident occurrence to alert generation
  • Alert coverage: percentage of OT network covered by active monitoring
  • Detection rate: estimated percentage of malicious activity in the environment that generates alerts (assessed through purple teaming and simulation)

Response metrics:

  • Mean Time to Acknowledge (MTTA): time from alert generation to analyst acknowledgment
  • Mean Time to Investigate (MTTI): time from acknowledgment to investigation conclusion
  • Mean Time to Contain (MTTC): time from confirmed true positive to containment action
  • False positive rate: percentage of alerts that are determined to be false positives after investigation

Operational metrics:

  • Playbook coverage: percentage of alert types with documented playbooks
  • Analyst competency: training completion rates, exercise performance scores
  • Backlog rate: number of alerts older than SLA without resolution
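The response-time metrics above are straightforward to compute once cases carry consistent timestamps. A sketch under that assumption; the record fields and the sample times are illustrative, not a specific case-management schema.

```python
from datetime import datetime
from statistics import mean

# Illustrative closed cases with alert, acknowledgment, and
# containment timestamps.
cases = [
    {"alerted": datetime(2024, 5, 1, 2, 0),
     "acknowledged": datetime(2024, 5, 1, 2, 12),
     "contained": datetime(2024, 5, 1, 4, 0)},
    {"alerted": datetime(2024, 5, 2, 9, 0),
     "acknowledged": datetime(2024, 5, 2, 9, 4),
     "contained": datetime(2024, 5, 2, 10, 30)},
]

def mean_minutes(cases, start_field, end_field):
    """Average elapsed minutes between two timestamp fields."""
    deltas = [(c[end_field] - c[start_field]).total_seconds() / 60
              for c in cases]
    return mean(deltas)

mtta = mean_minutes(cases, "alerted", "acknowledged")    # 8.0 minutes
mttc = mean_minutes(cases, "acknowledged", "contained")  # 97.0 minutes
print(f"MTTA {mtta:.0f} min, MTTC {mttc:.0f} min")
```

Trending these monthly, split by alert severity, is more informative than any single number: MTTC for critical alerts is the figure leadership should watch.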

Maturity Model

Use a structured maturity model to assess current state and plan improvement:

  • Level 1 — Ad Hoc: Monitoring exists but no consistent alert review process. Response is reactive and undocumented.
  • Level 2 — Developing: Alerts reviewed but inconsistently. Some playbooks exist. No formal metrics. SOC staffed but OT expertise limited.
  • Level 3 — Defined: Consistent alert review with documented playbooks for common scenarios. Metrics tracked. OT and IT SOC integration defined.
  • Level 4 — Managed: Metrics-driven improvement. Regular exercises. Playbook coverage for all significant alert types. Threat intelligence integrated.
  • Level 5 — Optimizing: Continuous improvement through red/purple team exercises. Predictive analytics. Leading indicator metrics beyond detection speed.

Most organizations entering this journey are at Level 1 or 2. The goal for year one should be reaching Level 3: consistent, documented, metric-tracked SOC operations for OT. Levels 4 and 5 are multi-year development objectives.

Conclusion

An OT SOC is the operational expression of the security program's monitoring and response capabilities. Without it, alerts go unreviewed, incidents go undetected, and the security investment in sensors and monitoring platforms delivers no operational value.

The model that is right for your organization depends on your size, risk profile, regulatory requirements, and available resources. But some version of the SOC function — defined, staffed, equipped, and operated — is a requirement for any industrial organization that takes its OT security posture seriously.


Beacon Security helps industrial organizations design, build, and operate OT Security Operations Centers, from initial model selection through technology deployment, playbook development, and ongoing operational support. Contact us to assess your current OT monitoring and response capabilities.
