
Building an OT Security Operations Center: Design, Staff, and Operate

Introduction

A Security Operations Center (SOC) provides the continuous monitoring, detection, and response capability that converts security investments into operational security outcomes. Without a SOC function — some combination of people, processes, and technology capable of receiving security alerts, investigating them, and acting on findings — the monitoring infrastructure that organizations deploy in OT environments produces alerts that nobody reviews and incidents that nobody responds to in time.

OT SOC design is substantially more complex than enterprise IT SOC design. The technology is different: OT monitoring tools analyze industrial protocols rather than IT network traffic. The threat landscape is different: OT-targeted attacks have different techniques, timelines, and consequences than IT-focused attacks. The operational context is different: an alert about anomalous communication on a control network cannot be investigated with the same actions as an alert on a corporate endpoint. And the people are different: OT SOC analysts need a combination of cybersecurity skills and industrial process knowledge that is genuinely scarce.

This guide provides a structured framework for designing, staffing, and operating an OT SOC — from the initial model selection through technology deployment, staffing, playbook development, and maturity measurement.

SOC Model Selection

The first decision in OT SOC design is the operational model: who delivers the SOC function, and how is it structured?

Model 1: Dedicated Internal OT SOC

A dedicated internal OT SOC employs full-time security analysts and engineers focused exclusively on OT security monitoring and response. This model provides:

  • Maximum operational knowledge: analysts who work exclusively with OT environments develop deep familiarity with the specific processes, systems, and normal behavior of the facilities they monitor
  • Fastest response integration with operations: an internal OT SOC has direct relationships with the operations team, enabling rapid coordination during incidents
  • Complete data control: all OT monitoring data stays on premises under the organization's control

The dedicated model is appropriate for large industrial operators with multiple facilities, stringent OT security regulatory obligations (for example, NERC CIP medium- or high-impact BES Cyber Systems), or specifically elevated threat profiles. It requires the most investment in staffing, tooling, and process development. Recruiting OT SOC analysts is a genuine challenge given the talent shortage described in our workforce development guidance.

Model 2: Hybrid OT/IT SOC with OT Specialization

The most common model for large industrial organizations integrates OT monitoring into an existing enterprise SOC with OT-specialist staff embedded within or closely aligned to the enterprise SOC team.

In this model:

  • OT monitoring platforms feed alerts into the enterprise SIEM
  • A subset of SOC analysts with OT training handle OT-specific alerts
  • OT-specific playbooks guide investigation and response for industrial alerts
  • Tier 1 analysts handle initial triage of OT alerts; OT specialists handle escalation

This model benefits from enterprise SOC infrastructure investment and provides OT visibility within the broader organizational security picture. The risk is that OT specialists can be pulled into IT work and that IT-focused SOC leadership may not adequately prioritize OT monitoring capability development.

Model 3: Outsourced OT SOC / Managed Detection and Response

For organizations that cannot or choose not to build internal OT SOC capability, outsourced OT Managed Detection and Response (MDR) services provide continuous monitoring, detection, and response support through a specialist provider.

OT MDR providers include:

  • Dragos Managed Services (monitoring using the Dragos platform)
  • Claroty Managed Threat Detection
  • Nozomi Vantage as a Service
  • Sector-specific MSSPs with demonstrated OT capability

When evaluating OT MDR providers, assess:

  • Actual OT protocol coverage and industrial sector experience — not generic MSSP services rebranded as OT
  • The investigation and response process for OT alerts, including how they coordinate with your operations team
  • Data residency and security of OT monitoring data
  • SLAs for alert response times and escalation

Model 4: Co-Managed OT SOC

A co-managed model combines internal OT security staff with external specialist support, with a defined split of responsibilities:

  • Internal team: asset management, tuning, context provision, operations coordination, Level 2 investigation
  • External provider: 24/7 monitoring coverage, Level 1 triage, threat intelligence integration, Level 3 specialist support

This model is increasingly popular for mid-market industrial organizations that have some internal OT security capability but cannot sustain 24/7 coverage or deep OT threat intelligence expertise internally.

Technology Stack

OT Monitoring Platform

The cornerstone of the OT SOC is a platform that understands industrial protocols and can monitor OT network traffic for threats, anomalies, and policy violations without actively probing OT devices.

Platform selection criteria:

  • Protocol coverage: does the platform natively support the industrial protocols in your environment? (Modbus, DNP3, EtherNet/IP, OPC UA, IEC 104, PROFINET, BACnet, etc.)
  • Asset discovery: can it automatically build and maintain an OT asset inventory from passive traffic analysis?
  • Behavioral analytics: does it establish communication baselines and detect deviations?
  • Threat detection: does it include ICS-specific signatures and detection logic for known OT malware families and attack techniques?
  • ATT&CK for ICS alignment: are detections mapped to the ATT&CK for ICS framework?
  • SIEM integration: can it forward normalized events to your enterprise SIEM?
  • Scalability: can it support the number of sites and devices in your environment?

Leading platforms:

  • Dragos: Strong threat intelligence integration and ATT&CK for ICS alignment. Protocol coverage across most industrial sectors.
  • Claroty: Broad protocol support, strong asset inventory capability, enterprise SIEM integration.
  • Nozomi Networks: Good scalability for large multi-site deployments. Strong OT/IoT combined coverage.
  • Microsoft Defender for IoT (formerly CyberX): Native integration with Microsoft Sentinel for organizations already in the Microsoft security stack.
  • Tenable OT Security (formerly Indegy): Strong vulnerability management integration alongside monitoring.

SIEM Integration Architecture

OT monitoring events should flow into the enterprise SIEM alongside IT security events. This integration enables correlation: an IT security event (suspicious login to an engineering workstation) correlated with an OT monitoring event (new connection from that workstation to a PLC) is far more significant than either event in isolation.
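The cross-domain correlation described above can be sketched in a few lines. This is a minimal illustration, not any SIEM's actual rule syntax; the event field names, hostnames, and the 30-minute window are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Hypothetical normalized events; field names are illustrative,
# not a specific SIEM's data model.
it_event = {
    "time": datetime(2024, 5, 1, 2, 14),
    "type": "suspicious_login",
    "host": "ews-07",            # engineering workstation
}
ot_event = {
    "time": datetime(2024, 5, 1, 2, 21),
    "type": "plc_write",
    "source": "ews-07",
    "dest": "plc-line3",
}

def correlate(it_ev, ot_ev, window=timedelta(minutes=30)):
    """Flag an OT write that follows a suspicious login on the same host."""
    same_host = it_ev["host"] == ot_ev["source"]
    in_window = timedelta(0) <= ot_ev["time"] - it_ev["time"] <= window
    return same_host and in_window

if correlate(it_event, ot_event):
    print(f"High severity: {ot_event['source']} wrote to "
          f"{ot_event['dest']} shortly after a suspicious login")
```

Either event alone would sit low in the triage queue; the joined condition is what justifies an immediate escalation.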

Integration architecture considerations:

  • Log normalization: OT monitoring platform events must be translated into the SIEM's data model. Most major platforms provide native connectors for Splunk, Microsoft Sentinel, IBM QRadar, and others.
  • Context enrichment: OT asset context (device type, process area, criticality, zone) should be available to SIEM analysts investigating OT alerts
  • Alert routing: OT-specific alerts should route to OT-trained analysts, not the general IT analyst pool
  • Retention: OT log retention requirements should align with regulatory requirements (NERC CIP requires 90 days minimum; longer retention is valuable for incident investigation)

OT-Specific SIEM Content

Generic SIEM content — correlation rules, dashboards, and reports designed for IT security — has limited value for OT environments. Develop or procure OT-specific SIEM content:

OT-relevant correlation rules:

  • New device communicating on the OT network (potential unauthorized device)
  • First-time communication between two devices that have not previously communicated
  • Control command from an unexpected source (PLC write from a non-SCADA server)
  • Protocol anomaly (Modbus function code outside the permitted set)
  • Engineering software connection outside of authorized maintenance windows
  • Authentication failure followed by successful authentication on an OT system
  • Firmware update command on a PLC (potential unauthorized firmware modification)
  • Communication with known malicious IP addresses from IT-connected OT systems
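As one concrete instance of the rules above, a protocol-anomaly check for Modbus can compare each frame's function code to a per-device allowlist. The sketch below assumes the standard Modbus/TCP layout (7-byte MBAP header, then the function code); the device names and permitted code sets are illustrative assumptions.

```python
# Per-device allowlists of Modbus function codes (assumed policy).
ALLOWED_FUNCTION_CODES = {
    "plc-line3": {3, 4},          # read holding/input registers only
    "plc-packaging": {3, 4, 16},  # writes permitted on this path
}

def check_modbus_frame(device, frame):
    """Return an alert string if the function code is outside policy."""
    if len(frame) < 8:
        return f"{device}: malformed frame ({len(frame)} bytes)"
    function_code = frame[7]      # byte after the 7-byte MBAP header
    if function_code not in ALLOWED_FUNCTION_CODES.get(device, set()):
        return f"{device}: unexpected Modbus function code {function_code}"
    return None

# Function code 0x06 (write single register) against a read-only device:
frame = bytes.fromhex("000100000006") + bytes([0x11, 0x06, 0x00, 0x01])
print(check_modbus_frame("plc-line3", frame))
```

In practice the monitoring platform performs this inspection; the value of the sketch is showing that the rule is a simple set-membership test once the policy is written down.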

OT-specific dashboards:

  • Active connections per zone and across zone boundaries
  • Alert volume by zone and by protocol
  • Asset inventory changes (new devices discovered, devices no longer communicating)
  • Alert trend over time with comparison to baseline

Ticketing and Case Management

OT security events require dedicated case management that captures OT context alongside standard security case information. Ensure your ticketing system or SOAR platform can capture:

  • The OT asset(s) involved, with their zone, process area, and operational criticality
  • The operational status at the time of the alert (normal operations, maintenance, startup/shutdown)
  • The operations team contact notified and their assessment
  • The containment decision and operations team concurrence

Many organizations use their enterprise ticketing system (ServiceNow, Jira) for OT security cases with an OT-specific ticket template. Others maintain separate OT case management. Either works; the key requirement is that OT context is captured in the case record.
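One way to make the "OT context is captured" requirement concrete is a case template with the fields listed above. The field names and status values below are illustrative assumptions, not any ticketing product's schema.

```python
# Illustrative OT security case template; field names are assumptions.
OT_CASE_TEMPLATE = {
    "alert_id": None,
    "ot_assets": [],              # each: name, zone, process_area, criticality
    "operational_status": None,   # "normal" | "maintenance" | "startup_shutdown"
    "operations_contact": None,
    "operations_assessment": None,
    "containment_decision": None,
    "operations_concurrence": False,
}

def open_ot_case(alert_id, assets, status):
    """Create a new case record pre-populated with OT context."""
    case = dict(OT_CASE_TEMPLATE)
    case.update(alert_id=alert_id, ot_assets=assets, operational_status=status)
    return case

case = open_ot_case(
    "OT-2024-0113",
    [{"name": "plc-line3", "zone": "Zone 3A",
      "process_area": "Filling", "criticality": "high"}],
    "maintenance",
)
```

Whatever system holds the record, enforcing a template like this is what keeps operational context from being lost between the analyst and the operations team.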

Staffing Requirements

The OT SOC Analyst Profile

The OT SOC analyst role requires a combination of skills that is genuinely scarce: cybersecurity analysis capabilities combined with sufficient industrial process knowledge to understand the operational context of the events they are investigating.

Core skills:

  • Network traffic analysis: ability to read and interpret packet captures, identify anomalous communication patterns
  • Security alert triage: familiarity with attacker TTPs, ability to distinguish true positives from false positives
  • Industrial protocol knowledge: understanding of Modbus, DNP3, EtherNet/IP, and other protocols present in the environment well enough to recognize anomalies
  • OT architecture understanding: knowledge of Purdue model, zone/conduit concepts, the role of historians, engineering workstations, and SCADA servers

Desirable skills:

  • Experience with at least one major OT monitoring platform (Dragos, Claroty, Nozomi)
  • Familiarity with industrial control system vendors and their common platforms
  • Incident response experience in OT environments
  • ATT&CK for ICS knowledge

Career backgrounds that produce good OT SOC analysts:

  • IT security analysts with structured OT training and process plant exposure
  • Control system engineers who have developed cybersecurity skills
  • Former ICS vendor technical staff with security training

Staffing Model by SOC Model

  • Dedicated Internal: minimum staffing of 5-8 analysts plus 1-2 engineers. Key roles: OT SOC Lead, Senior OT Analysts, OT Security Engineers.
  • Hybrid IT/OT: 2-3 OT specialists embedded in the IT SOC. Key roles: OT SOC Specialist, OT Security Engineer.
  • Outsourced MDR: 1-2 internal OT security engineers. Key roles: OT Security Program Manager, internal liaison.
  • Co-Managed: 2-3 internal staff plus the external team. Key roles: OT Security Lead, internal context providers.

Shift Coverage for OT

OT security monitoring requirements do not map neatly to business hours. Industrial processes run continuously, and threat actors do not schedule attacks during business hours. 24/7 monitoring coverage for OT requires planning:

  • Internal 24/7: Requires three shifts of analysts plus backup coverage. Expensive and difficult to staff with OT-specialist roles.
  • Follow-the-sun: If the organization has OT operations in multiple time zones, distributing monitoring across regions provides coverage without triple-staffing a single location.
  • After-hours outsourcing: Internal team covers business hours; external MDR provider covers nights, weekends, and holidays.
  • Tiered alerting: Critical alerts (safety system anomaly, new device in control zone) page an on-call OT analyst 24/7; lower-priority alerts queue for business hours review.

Most organizations with internal OT SOC capability use a tiered alerting model: true 24/7 coverage for critical alerts through an on-call rotation, with non-critical review during business hours. This requires clear definitions of which alert types require immediate 24/7 response and which can wait for the next business day.
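The tiered model above reduces to a routing decision per alert. The sketch below assumes a set of alert types designated critical; the type names are illustrative, and a real implementation would page through whatever on-call tooling the organization runs.

```python
# Alert types designated for 24/7 paging (illustrative assumption).
CRITICAL_TYPES = {
    "safety_system_anomaly",
    "new_device_control_zone",
    "unauthorized_control_command",
}

def route_alert(alert):
    """Page on-call 24/7 for critical types; queue everything else."""
    if alert["type"] in CRITICAL_TYPES:
        return "page_oncall"
    return "business_hours_queue"

print(route_alert({"type": "safety_system_anomaly"}))   # page_oncall
print(route_alert({"type": "asset_inventory_change"}))  # business_hours_queue
```

The hard part is not the code but the policy: agreeing in advance, with operations, which alert types belong in the critical set.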

Playbook Development for OT Alerts

Playbook Structure

OT security playbooks should follow a consistent structure that guides analysts through investigation and response without requiring expert-level OT knowledge at every step:

  1. Alert description: What triggered this alert? What does it mean in OT context?
  2. Initial triage questions: The first three to five questions to determine whether this is a true positive requiring response or a false positive requiring tuning
  3. Evidence gathering: What additional data to collect (PCAP, logs, asset information, operations team input)
  4. Escalation criteria: At what point does this escalate to the OT security engineer or operations team?
  5. Operations team notification script: What to say to operations, in operational language, not security jargon
  6. Containment options: Available response actions ordered from least to most disruptive
  7. Resolution and documentation: How to close the case with appropriate documentation
  8. False positive feedback: How to report a false positive for tuning
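Keeping playbooks in a machine-readable structure lets a SOAR platform or even a simple runbook tool render and enforce the eight sections above. The dataclass below is one possible shape; the field names and the sample content are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class OTPlaybook:
    """Mirrors the eight playbook sections; field names are illustrative."""
    alert_description: str
    triage_questions: list
    evidence_to_gather: list
    escalation_criteria: str
    operations_notification_script: str
    containment_options: list  # ordered least to most disruptive
    resolution_steps: str
    false_positive_feedback: str

playbook = OTPlaybook(
    alert_description="New device communicating on the OT network",
    triage_questions=[
        "Is this device authorized?",
        "Was maintenance underway that could explain it?",
    ],
    evidence_to_gather=["PCAP of first communications", "asset records"],
    escalation_criteria="Escalate if the device cannot be attributed",
    operations_notification_script=(
        "We see an unrecognized device on the Line 3 network; can you "
        "confirm whether any new equipment was connected today?"
    ),
    containment_options=["monitor only", "block at switch port"],
    resolution_steps="Add to inventory if authorized; document the owner",
    false_positive_feedback="Tag the case FP and note the inventory gap",
)
```

Ordering `containment_options` from least to most disruptive in the data itself nudges analysts toward the least-disruptive action first, which matters in OT.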

Essential OT Playbooks

Develop playbooks for the alert types most likely to occur in your environment. Priority playbooks for most OT environments:

New Device Discovery: Triggered when an OT monitoring platform detects a device communicating on the OT network that was not previously in the asset inventory. Triage questions: Is this device authorized? Was a maintenance activity underway that could explain a new connection? Is this an attacker-controlled device or a legitimate asset that was not inventoried? Response: verify with the responsible operations engineer, add to inventory if authorized, investigate if unauthorized.
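At its core this detection is a set difference between observed devices and the known inventory. A minimal sketch, with illustrative device identifiers:

```python
# Known asset inventory and devices observed communicating today
# (identifiers are illustrative).
known_inventory = {"plc-line3", "scada-01", "hist-01", "ews-07"}
observed_today = {"plc-line3", "scada-01", "hist-01", "ews-07", "10.20.3.99"}

# Anything observed but not inventoried triggers the playbook.
new_devices = observed_today - known_inventory
for device in sorted(new_devices):
    print(f"New device alert: {device} not in asset inventory")
```

The quality of this detection is bounded by the quality of the inventory: a stale inventory turns legitimate assets into a stream of false positives.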

Unauthorized Control Command: Triggered when a control command (Modbus write, EtherNet/IP Set_Attribute) is received by a PLC from a source other than an authorized SCADA server or engineering workstation. This is a high-priority alert requiring rapid investigation. Triage: verify the source address, confirm whether a maintenance activity was underway, review recent access logs for the source system.

Engineering Software Connection Outside Maintenance Window: Triggered when a PLC programming tool (TIA Portal, Studio 5000, RSLogix) initiates a connection to a PLC outside of scheduled maintenance windows. Could indicate an unauthorized logic modification attempt. Immediate escalation to the OT security engineer and operations notification required.

First-Time Communication Between Two OT Devices: A device that has never previously communicated with another specific device begins doing so. Could be a new integration or could indicate lateral movement. Triage: verify against recent change records, review the nature of the communication (protocol, data volume, frequency).
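This detection amounts to maintaining a baseline of observed conversation tuples and flagging any tuple not yet seen. A minimal sketch under that assumption, with illustrative device names:

```python
# Baseline of observed (source, destination, protocol) conversations.
baseline = {
    ("scada-01", "plc-line3", "modbus"),
    ("ews-07", "plc-line3", "modbus"),
}

def check_flow(src, dst, proto):
    """Alert on a never-before-seen conversation, then learn it."""
    flow = (src, dst, proto)
    if flow not in baseline:
        baseline.add(flow)  # learn after alerting so repeats stay quiet
        return f"First-time communication: {src} -> {dst} ({proto})"
    return None

print(check_flow("hist-01", "plc-line3", "modbus"))  # alerts
print(check_flow("hist-01", "plc-line3", "modbus"))  # now baselined: None
```

Commercial platforms do this at scale with a learning period; the design question is how long that period runs, since a baseline learned during abnormal operations will treat the abnormal as normal.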

Protocol Anomaly: A Modbus function code outside the permitted set, a DNP3 request with an unusual or malformed structure, an OPC UA connection attempt with an invalid certificate. These may indicate reconnaissance, exploitation attempts, or misconfigured systems. Triage and categorize per the specific protocol and anomaly type.

Communication to/from Known Malicious Infrastructure: An OT-connected system (historian, engineering workstation, DMZ server) initiating communication to or receiving communication from an IP address associated with known malicious infrastructure. Treat as a potential active incident. Immediate escalation.

Integration with Enterprise SOC

Defining the Handoff

The boundary between the enterprise (IT) SOC and the OT SOC requires explicit definition. Common models:

  • OT SOC handles all OT alerts independently: The OT SOC receives OT monitoring events directly. The IT SOC receives a summary feed of significant OT events for situational awareness but does not triage or investigate OT alerts.
  • IT SOC triages OT alerts at Tier 1: Enterprise SOC analysts handle initial triage of all alerts, including OT, and escalate to OT specialists for anything requiring industrial process context.
  • Fully integrated investigation with shared tooling: Both IT and OT events flow into the same SIEM, with role-based access and routing that directs OT alerts to OT-trained analysts.

The fully integrated model provides the best correlation capability but requires the most investment in OT-specific SIEM content and analyst training. The independent model is simpler to operate but risks missing cross-domain attack patterns.

Joint Investigation Procedures

For incidents that cross the IT/OT boundary — which is the most common pattern for significant OT incidents — a joint investigation process is essential. Define:

  • Who leads: is the OT SOC lead or the IT SOC lead the incident commander for cross-domain investigations?
  • Communication channel: how do the two teams communicate during an active joint investigation?
  • Evidence sharing: how is evidence collected from IT systems made available to OT investigators and vice versa?
  • Decision authority: who authorizes containment actions on IT systems that may affect OT? Who authorizes OT containment actions?

SOC Maturity Measurement

Key Metrics

Track and report these metrics to demonstrate SOC effectiveness and drive improvement:

Detection metrics:

  • Mean Time to Detect (MTTD): average time from incident occurrence to alert generation
  • Alert coverage: percentage of OT network covered by active monitoring
  • Detection rate: estimated percentage of malicious activity in the environment that generates alerts (assessed through purple teaming and simulation)

Response metrics:

  • Mean Time to Acknowledge (MTTA): time from alert generation to analyst acknowledgment
  • Mean Time to Investigate (MTTI): time from acknowledgment to investigation conclusion
  • Mean Time to Contain (MTTC): time from confirmed true positive to containment action
  • False positive rate: percentage of alerts that are determined to be false positives after investigation

Operational metrics:

  • Playbook coverage: percentage of alert types with documented playbooks
  • Analyst competency: training completion rates, exercise performance scores
  • Backlog rate: number of alerts older than SLA without resolution
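The response-time metrics above are straightforward to compute once cases carry consistent timestamps. A sketch under that assumption; the record fields and the sample times are illustrative, not a specific case-management schema.

```python
from datetime import datetime
from statistics import mean

# Illustrative closed cases with alert, acknowledgment, and
# containment timestamps.
cases = [
    {"alerted": datetime(2024, 5, 1, 2, 0),
     "acknowledged": datetime(2024, 5, 1, 2, 12),
     "contained": datetime(2024, 5, 1, 4, 0)},
    {"alerted": datetime(2024, 5, 2, 9, 0),
     "acknowledged": datetime(2024, 5, 2, 9, 4),
     "contained": datetime(2024, 5, 2, 10, 30)},
]

def mean_minutes(cases, start_field, end_field):
    """Average elapsed minutes between two timestamp fields."""
    deltas = [(c[end_field] - c[start_field]).total_seconds() / 60
              for c in cases]
    return mean(deltas)

mtta = mean_minutes(cases, "alerted", "acknowledged")    # 8.0 minutes
mttc = mean_minutes(cases, "acknowledged", "contained")  # 97.0 minutes
print(f"MTTA {mtta:.0f} min, MTTC {mttc:.0f} min")
```

Trending these monthly, split by alert severity, is more informative than any single number: MTTC for critical alerts is the figure leadership should watch.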

Maturity Model

Use a structured maturity model to assess current state and plan improvement:

  • Level 1 — Ad Hoc: Monitoring exists but no consistent alert review process. Response is reactive and undocumented.
  • Level 2 — Developing: Alerts reviewed but inconsistently. Some playbooks exist. No formal metrics. SOC staffed but OT expertise limited.
  • Level 3 — Defined: Consistent alert review with documented playbooks for common scenarios. Metrics tracked. OT and IT SOC integration defined.
  • Level 4 — Managed: Metrics-driven improvement. Regular exercises. Playbook coverage for all significant alert types. Threat intelligence integrated.
  • Level 5 — Optimizing: Continuous improvement through red/purple team exercises. Predictive analytics. Leading indicator metrics beyond detection speed.

Most organizations entering this journey are at Level 1 or 2. The goal for year one should be reaching Level 3: consistent, documented, metric-tracked SOC operations for OT. Levels 4 and 5 are multi-year development objectives.

Conclusion

An OT SOC is the operational expression of the security program's monitoring and response capabilities. Without it, alerts go unreviewed, incidents go undetected, and the security investment in sensors and monitoring platforms delivers no operational value.

The model that is right for your organization depends on your size, risk profile, regulatory requirements, and available resources. But some version of the SOC function — defined, staffed, equipped, and operated — is a requirement for any industrial organization that takes its OT security posture seriously.


Beacon Security helps industrial organizations design, build, and operate OT Security Operations Centers, from initial model selection through technology deployment, playbook development, and ongoing operational support. Contact us to assess your current OT monitoring and response capabilities.
