Mid-market factories have plenty of tools: EDR, firewalls, collectors, scanners, “single panes of glass.” Yet outages, unsafe changes, and unresolved alerts still crush overall equipment effectiveness (OEE) and erode trust. The blunt truth: tools without a 24×7 NOC/SOC, a clear playbook, and disciplined operations don’t produce outcomes. This article details why always-on operations are non-negotiable in manufacturing, and how Surya’s Manufacturing Platform couples an OT-first stack with follow-the-sun services to turn your tooling into real resilience.
The uncomfortable truth: Tools ≠ outcomes
- Signal volume is not the problem. Conversion is. Factories drown in alerts that never convert into action because no one owns triage end-to-end, especially off-hours.
- OT risk is asymmetric. A single bad change to a PLC network or a mistimed patch window can idle lines, spoil batches, or create physical hazards.
- Mean time to notice (MTTN) is the silent killer. If nobody’s watching at 2:17 a.m., your noon stand-up is already nearly ten hours late.
Conclusion: Without a 24×7 NOC/SOC with authority, playbooks, and service-level discipline, your investments won’t move OEE, safety, or cyber risk.
Manufacturing reality check (OT is not IT)
- Safety and availability beat everything. Recovery plans must be tested in the context of human safety and quality, not just uptime SLAs.
- Legacy + proprietary protocols. Many assets are decades old; visibility and change control must adapt to Modbus, Profinet, EtherNet/IP, and vendor-specific quirks.
- Frozen change windows. Maintenance windows are narrow and seasonal; your SOC/NOC must plan months ahead, not days.
- Constrained bandwidth and remote sites. Edge-heavy designs, offline plants, and brownfield networks demand thoughtful telemetry and store-and-forward strategies.
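To make the store-and-forward idea concrete, here is a minimal sketch of an edge-side buffer that persists readings locally and drains them oldest-first once the uplink returns. The class name `StoreAndForwardBuffer`, the `outbox` table, and the `send` callback are all hypothetical choices for illustration; a real plant deployment would also need retention limits and backpressure, which are omitted here.

```python
import json
import sqlite3
import time
from typing import Callable


class StoreAndForwardBuffer:
    """Durable FIFO outbox for telemetry when the uplink is unreliable (illustrative)."""

    def __init__(self, path: str = ":memory:") -> None:
        # Use a file path on the edge node for durability; ":memory:" is for demos only.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, ts REAL, payload TEXT)"
        )
        self.db.commit()

    def store(self, reading: dict) -> None:
        """Persist a reading locally, whether or not the uplink is available."""
        self.db.execute(
            "INSERT INTO outbox (ts, payload) VALUES (?, ?)",
            (time.time(), json.dumps(reading)),
        )
        self.db.commit()

    def pending(self) -> int:
        """Number of readings still waiting to be forwarded."""
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def forward(self, send: Callable[[dict], bool], batch_size: int = 100) -> int:
        """Send oldest-first; stop at the first failure so ordering is preserved."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            if not send(json.loads(payload)):
                break  # uplink is down; keep this and later rows for the next attempt
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        self.db.commit()
        return sent
```

A reading is deleted only after `send` reports success, so a dropped link delays telemetry rather than losing it. The trade-off is at-least-once delivery: the receiving side should tolerate an occasional duplicate.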
What “always-on” actually means
Follow-the-sun coverage with accountable handoffs, not “best-effort” on-call.
| Layer | Function | Examples of Responsibilities |
| --- | --- | --- |
| L1 (24×7) | Event Intake & Triage | Dedup, enrich, and correlate alerts; validate against work orders & planned maintenance; escalate by runbook. |
| L2 | Incident Response & Remediation | Contain account/endpoint, isolate VLAN/segment, rollback config, coordinate with plant ops, verify safety interlocks & QA checks. |
| L3/SME | Root Cause & Engineering | Chronic problems, network/OT architecture fixes, PLC/SCADA vendor coordination, playbook automation. |
| Duty Manager | Command & Communications | Severity classification, executive comms, regulator/customer notifications, post-incident review. |
Shift design: 24×7 L1, regional L2/L3 with on-call rotation, daily global stand-up, and mandatory written handoffs.
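The L1 triage responsibilities listed above (dedup, then validate against planned maintenance, then escalate) can be sketched in a few lines. This is an illustration under stated assumptions, not Surya's implementation: the `Alert` and `MaintenanceWindow` types, the five-minute dedup window, and the asset identifiers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Alert:
    asset_id: str
    signal: str
    at: datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    asset_id: str
    start: datetime
    end: datetime


def triage(
    alerts: List[Alert],
    windows: List[MaintenanceWindow],
    dedup_window: timedelta = timedelta(minutes=5),  # assumed value, tune per site
) -> Tuple[List[Alert], List[Tuple[Alert, str]]]:
    """Split alerts into those to escalate and those suppressed, with a reason."""
    last_seen = {}
    escalate, suppressed = [], []
    for alert in sorted(alerts, key=lambda a: a.at):
        key = (alert.asset_id, alert.signal)
        previous = last_seen.get(key)
        last_seen[key] = alert.at  # rolling window: a continuous burst stays suppressed
        if previous is not None and alert.at - previous <= dedup_window:
            suppressed.append((alert, "duplicate"))
            continue
        in_window = any(
            w.asset_id == alert.asset_id and w.start <= alert.at <= w.end
            for w in windows
        )
        if in_window:
            suppressed.append((alert, "planned maintenance"))
            continue
        escalate.append(alert)
    return escalate, suppressed


t0 = datetime(2024, 1, 1, 2, 17)
alerts = [
    Alert("plc-7", "link_loss", t0),
    Alert("plc-7", "link_loss", t0 + timedelta(minutes=1)),
    Alert("hmi-2", "cpu_high", t0),
    Alert("plc-7", "link_loss", t0 + timedelta(minutes=30)),
]
windows = [MaintenanceWindow("hmi-2", t0 - timedelta(hours=1), t0 + timedelta(hours=1))]
escalate, suppressed = triage(alerts, windows)
```

Recording a reason for every suppressed alert keeps the queue auditable: a reviewer can see why an alert did not escalate instead of guessing.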
The playbook is the product
If your “playbook” is a wiki nobody reads, you don’t have a playbook—you have risk. Surya treats playbooks as living code: versioned, tested, measured.
Core manufacturing playbooks (examples):
- Loss of visibility to a PLC/robot cell (Profinet/EtherNet/IP)...
- Unauthorized remote access attempt to an OT segment...
- Ransomware in IT with potential OT blast radius...
- Vendor firmware upgrade window...
- Quality anomaly tied to control loop drift...
Each playbook has: trigger conditions, data sources, owners, comms templates, guardrails, rollback, and success criteria.
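One way to treat a playbook as code is to give it a schema that a CI check can validate. The sketch below mirrors the seven sections named above; the `Playbook` class, its field names, and the sample values are hypothetical illustrations rather than the platform's actual format.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Playbook:
    name: str
    version: str
    trigger_conditions: List[str]
    data_sources: List[str]
    owners: List[str]
    comms_templates: List[str]
    guardrails: List[str]
    rollback: List[str]
    success_criteria: List[str]

    def missing_sections(self) -> List[str]:
        """Return the names of any empty sections, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative content only; every value below is an assumption.
draft = Playbook(
    name="Loss of visibility to a PLC/robot cell",
    version="0.1.0",
    trigger_conditions=["no telemetry from cell for 60 seconds"],
    data_sources=["passive OT monitoring", "switch port status"],
    owners=["L1 analyst on shift", "plant operations liaison"],
    comms_templates=["sev2-initial-notice"],
    guardrails=["no remote restart without plant lead approval"],
    rollback=[],  # intentionally empty to show the check failing
    success_criteria=["telemetry restored and verified by plant ops"],
)
print(draft.missing_sections())  # prints ['rollback']
```

Running a check like this on every change to the playbook repository would block a playbook with no rollback section before it reaches the floor.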
Metrics that matter (tie to OEE and risk)
- MTTN/MTTD/MTTR: measured by site and line; trend weekly.
- % Known OT assets (OT CMDB coverage) and signal-to-action conversion rate.
- Change success rate and unauthorized change prevention.
- Backup integrity (restore tests, not just job success).
- Vuln backlog burn-down for crown jewels...
- OEE lift attributable to stability work (baseline before/after).
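As a worked example, the time-based metrics and the conversion rate reduce to simple arithmetic over incident records. The sketch assumes MTTN runs from occurrence to first human notice and MTTR from occurrence to resolution; both definitions, the `Incident` fields, and the sample numbers are assumptions. The OEE function uses the standard definition (availability × performance × quality).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List


@dataclass
class Incident:
    site: str
    occurred: datetime
    noticed: datetime
    resolved: datetime


def mean_minutes(incidents: List[Incident], start_attr: str, end_attr: str) -> float:
    """Average elapsed minutes between two timestamps across incidents."""
    return mean(
        (getattr(i, end_attr) - getattr(i, start_attr)).total_seconds() / 60
        for i in incidents
    )


def conversion_rate(alerts_received: int, alerts_actioned: int) -> float:
    """Signal-to-action conversion: share of alerts that led to a tracked action."""
    return alerts_actioned / alerts_received if alerts_received else 0.0


def oee(availability: float, performance: float, quality: float) -> float:
    """Standard OEE: the product of three ratios, each between 0 and 1."""
    return availability * performance * quality


t0 = datetime(2024, 1, 1, 2, 17)
incidents = [
    Incident("plant-a", t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=60)),
    Incident("plant-a", t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=120)),
]
mttn = mean_minutes(incidents, "occurred", "noticed")   # 5.0 minutes
mttr = mean_minutes(incidents, "occurred", "resolved")  # 90.0 minutes
```

Grouping the incident list by site and line before calling `mean_minutes` gives the per-site, per-line trend described above.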
Reference operating model (people • process • tech)
People
- OT NOC/SOC analysts with industrial protocol literacy...
- Plant operations liaison in every incident; safety officer on speed dial.
- Named vendor contacts (PLC/robotics/network) with SLAs.
Process
- Single intake queue (ServiceNow). No side channels.
- Standard severities; duty manager on every Sev1/Sev2.
- Daily change advisory for OT; weekly risk review...
Tech
- Asset discovery & deep protocol monitoring at the cell level.
- Telemetry consolidation...
- Declarative configs and versioned runbooks; drift detection...
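Drift detection against a declarative config can be as simple as a keyed comparison between what is declared and what is observed. The following is a minimal sketch: the `detect_drift` function, the `ABSENT` marker, and the sample keys (`firmware`, `vlan`, `remote_access`) are hypothetical, and a real OT deployment would collect the observed state passively from monitoring rather than by polling controllers.

```python
from typing import Any, Dict, Tuple

ABSENT = "<absent>"  # marker for a key present on only one side


def detect_drift(
    declared: Dict[str, Any], observed: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (declared_value, observed_value)} for every mismatch."""
    drift = {}
    for key in sorted(declared.keys() | observed.keys()):
        want = declared.get(key, ABSENT)
        have = observed.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Illustrative values only.
declared = {"firmware": "2.4.1", "vlan": 110, "remote_access": False}
observed = {"firmware": "2.4.1", "vlan": 120, "remote_access": False, "ftp": True}
drift = detect_drift(declared, observed)
# drift == {"ftp": ("<absent>", True), "vlan": (110, 120)}
```

Reporting undeclared keys as drift matters as much as reporting changed values: an unexpected service on a controller is often the more interesting finding.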
The Surya Manufacturing Platform: tools and services
OT Security & Asset Intelligence — Claroty
- Passive asset discovery, risk scoring...
- Threat detection tuned to OT realities...
Observability & AIOps — Datadog
- Plant-aware dashboards...
- Network performance monitoring...
Service Management & OT CMDB — ServiceNow
- OT CMDB classes for PLCs, HMIs, historians, robots...
- Incident, change, problem workflows...
Identity & Zero Trust — Microsoft 365 E5 / Entra
- Identity-driven segmentation...
Edge & Hybrid — Azure and AWS
- Azure: Arc/Stack HCI for plant compute...
- AWS: Outposts / Snow family for ruggedized edge...
Glue: Event bus + normalization → correlation in Datadog and ServiceNow → governed actions via identity and network controls → documented outcomes and learning.
Landing the program (fast)
30-Day OT Assessment (PoV Appliance)
- Deploy passive discovery and observability...
90-Day Stabilize
- Stand up the OT CMDB, core playbooks...
12-Month Mature
- Automate top playbooks; integrate vendor workflows...
Commercials
Device-based pricing aligned to asset count and sites...
Sample incident flow (Sev2: PLC cell intermittent drops)
- L1 receives burst loss alarms...
- L1 escalates to L2...
- Duty manager engages plant lead...
Governance that sticks
- Change control: no ServiceNow ticket, no change. Period.
- Tabletop exercises...
- After-action reviews...
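The "no ticket, no change" rule is easiest to enforce when it is a gate in the change tooling rather than a policy document. The sketch below works on in-memory records and does not call any ServiceNow API; the `ChangeTicket` fields, the `authorize_change` function, and the ticket ID are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class ChangeTicket:
    ticket_id: str
    asset_id: str
    approved: bool
    window_start: datetime
    window_end: datetime


def authorize_change(
    asset_id: str, now: datetime, tickets: List[ChangeTicket]
) -> Optional[ChangeTicket]:
    """Return the ticket that authorizes a change right now, or None to block it."""
    for ticket in tickets:
        if (
            ticket.asset_id == asset_id
            and ticket.approved
            and ticket.window_start <= now <= ticket.window_end
        ):
            return ticket
    return None  # no approved ticket covering this asset and time: block the change


start = datetime(2024, 3, 2, 22, 0)
tickets = [ChangeTicket("CHG-EXAMPLE-1", "plc-7", True, start, start + timedelta(hours=4))]

inside = authorize_change("plc-7", start + timedelta(hours=1), tickets)
outside = authorize_change("plc-7", start + timedelta(hours=5), tickets)
other_asset = authorize_change("hmi-2", start + timedelta(hours=1), tickets)
```

Returning the authorizing ticket, not just a boolean, lets the tooling stamp the ticket ID onto the change record for the after-action review.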
What “good” looks like in 6 months
- 95% OT asset census...
- MTTN < 5 minutes...
- Measurable OEE lift...
Bottom line
If you run plants, a 24×7 NOC/SOC with real playbooks is not optional—it’s the difference between “we have tools” and “we hit our schedule.” Surya doesn’t just sell software or staffing. We operate your resilience. Tools + services, aligned to the factory floor, measured against OEE and risk. Start with a 30-day assessment. End with a multi-site, always-on operation you actually trust.
Call to action
- Book a 30-minute executive briefing...
- Run the 30-day PoV at one line or site...
- Pick your expansion plan...