
Why Beacon Built the Agentic Data and Context Layer for Modern Security Operations

AI SecOps has a dirty secret: the agents work fine. The data doesn't.

Or Mattatia

March 2026

There's a surge of energy around security agents right now. Detection agents. Triage agents. Investigation agents. The question everyone's asking is how far automation can go inside SecOps. The uncomfortable answer: farther than the data allows.

Agents are becoming more capable. They can plug into SIEM and automation platforms, ticketing systems, and communication tools. But integration is not intelligence. The last mile, actually letting agents apply that intelligence and act autonomously, is blocked by fragmented, low-quality telemetry. That is why real-world adoption stays behind the hype.

An agent that can pull alerts from a SIEM and then query the identity and endpoint security platforms is seeing only the tip of the iceberg. The real job is reasoning accurately across identity, endpoint, cloud, email, SaaS, and AI activity simultaneously, at scale.

And in most environments, that data is siloed, noisy, and far higher volume than an agent can process. Agents inherit the same data problems that have plagued human analysts for years, except they fail fast, fail visibly, and erode the trust organizations need to let automation take on real work.

Here's what happens when an investigation agent meets real-world data.

The Attack That Sailed Through Triage

AiTM phishing is one of the most common and noisy alert patterns SOC teams deal with. It's also a good way to show exactly where investigation agents fall short.

The case starts with a weak but plausible signal: Okta flags a risky sign-in. New IP, new geo. Enough to open triage, not enough to prove compromise. CrowdStrike reports the endpoint as clean because there's no malware to find. In reality, the attacker used an AiTM reverse proxy to steal the session cookie. From the agent's perspective? Clean login, real user, valid session.

Meanwhile, the attacker replays the stolen session through a residential proxy and moves through SSO into Salesforce, Glean, and AWS. Glean helps them map sensitive projects and internal context. Salesforce gives them customer data. AWS gives them a foothold to enumerate the environment and move toward sensitive data stores.

The evidence to catch this existed. It just wasn't accessible to the agent.

Salesforce logs were already landing in Snowflake for the GTM team, but security didn't know they were there. Glean wasn't connected to the SIEM or the AI SOC tooling at all. In AWS, only CloudTrail management events were collected. The logs needed to understand the real scope (VPC Flow Logs, S3 data events, DNS logs, load balancer telemetry) were excluded because ingesting them at full volume would have tripled the bill. This is a common tradeoff, and it almost always resolves in favor of cost over coverage.

The agent tried anyway. It attempted to query the VPC Flow Logs bucket directly, burning through API tokens and LLM context scanning terabytes of unpartitioned, unenriched data. It never surfaced the actual attack story. So the agent reasoned on a partial picture and let the attack sail through triage.

This Pattern Repeats Across Security Operations

This isn't specific to AiTM investigations. The same failure mode shows up in detection engineering, threat hunting, compliance, and incident response. Anywhere a workflow depends on correlating telemetry across domains that are collected and governed separately.

For example, a detection engineering agent designing rules to catch unauthorized access to sensitive data in Azure Blob Storage may need to correlate three signals: role elevation or PIM activation in Entra ID or Azure RBAC, Storage resource logs showing blob data plane reads and writes, and network flow telemetry showing connectivity to storage endpoints.

It does not matter whether a human or an agent writes the rule. The rule can only be as good as the data available.
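To make the three-signal correlation concrete, here is a minimal sketch in Python. The event shapes, field names, and 30-minute window are illustrative assumptions, not a real detection schema or Beacon's implementation:

```python
from datetime import datetime, timedelta

# Hypothetical normalized events; all field names here are assumptions.
WINDOW = timedelta(minutes=30)

def parse(ts):
    return datetime.fromisoformat(ts)

def correlate(role_elevations, blob_reads, net_flows):
    """Flag identities that elevate privileges and then read blob data,
    with a matching network flow to the storage endpoint, within WINDOW."""
    findings = []
    for elev in role_elevations:
        t0 = parse(elev["time"])
        for read in blob_reads:
            t1 = parse(read["time"])
            if read["identity"] != elev["identity"] or not t0 <= t1 <= t0 + WINDOW:
                continue
            # Require a flow from the same source IP to the storage endpoint.
            if any(f["src"] == read["src_ip"] and f["dst"] == read["endpoint"]
                   for f in net_flows):
                findings.append({"identity": elev["identity"],
                                 "elevated_at": elev["time"],
                                 "blob_read_at": read["time"],
                                 "endpoint": read["endpoint"]})
    return findings

elevations = [{"identity": "svc-etl", "time": "2026-03-01T10:00:00"}]
reads = [{"identity": "svc-etl", "time": "2026-03-01T10:12:00",
          "endpoint": "acctprod.blob.core.windows.net", "src_ip": "10.0.4.7"}]
flows = [{"src": "10.0.4.7", "dst": "acctprod.blob.core.windows.net"}]

print(correlate(elevations, reads, flows))
```

Note what happens if either `blob_reads` or `net_flows` is empty, which is exactly the situation when data plane logging is disabled or flow logs are unqueryable: the rule silently degrades to matching nothing beyond the elevation itself.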

In many environments, the SIEM primarily ingests control plane telemetry such as Azure Activity Logs and sometimes Entra audit logs, which cover role changes and resource modifications. But blob data plane logging is often not enabled, or is not routed to the SIEM due to its volume and ingestion cost. Network flow logs may exist in storage, but are typically too noisy and too large to operationalize for detection at scale.

The result is that the rule gets reduced to flagging suspicious privilege elevation alone, while the actual blob access or exfiltration goes unseen.

The detection logic did not fail. The telemetry strategy underneath it did.

What's Missing 

From these examples, one thing becomes clear: security operations need an architectural layer that governs how data is understood, not just how it moves.

Entity resolution, so the same user stops appearing as three different identities across three different systems. Schema consistency, so fields mean the same thing regardless of source. Contextual enrichment, so an agent knows whether a role assumption is routine or anomalous before it starts reasoning. Security meaning preserved at every event, so optimization doesn't strip the data an investigation will eventually need.
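As a toy illustration of entity resolution and schema consistency, the sketch below maps three sources' raw fields onto one canonical shape and collapses three identifiers to one identity. Every field name, alias, and mapping rule here is hypothetical:

```python
# Toy sketch: the same person appears under three different identifiers
# across Okta, CrowdStrike, and AWS CloudTrail. All mappings are illustrative.

FIELD_MAP = {  # per-source raw field -> canonical field
    "okta":        {"actor.alternateId": "user", "client.ipAddress": "src_ip"},
    "crowdstrike": {"UserName": "user", "LocalIP": "src_ip"},
    "cloudtrail":  {"userIdentity.userName": "user", "sourceIPAddress": "src_ip"},
}

IDENTITY_MAP = {  # known aliases resolved to one canonical identity
    "jdoe@example.com": "jane.doe",
    "EXAMPLE\\jdoe": "jane.doe",
    "jdoe": "jane.doe",
}

def normalize(source, event):
    out = {"source": source}
    for raw_field, canon in FIELD_MAP[source].items():
        out[canon] = event.get(raw_field)
    out["user"] = IDENTITY_MAP.get(out["user"], out["user"])
    return out

events = [
    normalize("okta", {"actor.alternateId": "jdoe@example.com",
                       "client.ipAddress": "203.0.113.9"}),
    normalize("crowdstrike", {"UserName": "EXAMPLE\\jdoe", "LocalIP": "10.0.4.7"}),
    normalize("cloudtrail", {"userIdentity.userName": "jdoe",
                             "sourceIPAddress": "203.0.113.9"}),
]

# After normalization, all three events resolve to a single identity.
print({e["user"] for e in events})
```

Without this step, an agent querying "everything jane.doe touched" sees three unrelated actors and misses the cross-domain pattern entirely.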

Without that layer, adding more agents only increases the number of questions that can't be answered.

Beacon's Approach

This is what we built at Beacon: an agentic data and context layer that sits between raw telemetry and every downstream consumer, human or AI agent. And to make it dramatically faster, more adaptive, and less dependent on engineering effort, we embedded AI into every stage of the pipeline.

Collection. Agents build and maintain integrations. New sources are onboarded in hours rather than weeks or months, and integrations adapt automatically when APIs or schemas change.

Coverage. The Beacon Agent continuously monitors the telemetry estate, tracking new applications, flagging missing sources, alerting when critical data goes quiet, and resolving issues before they impact detection or investigation.

Normalization and enrichment. AI maps raw fields to standard schemas with contextual understanding of what each field represents. Identity, threat intelligence, and asset context are applied in-stream through two approaches: expert-vetted Recipes that deliver production-ready logic out of the box, and the AI Assistant, which lets teams describe custom enrichment and transformation needs in natural language.

Optimization. Security-driven Recipes reduce volume by 60–80% without losing detection fidelity. High-volume sources that were too expensive to collect become a no-brainer. The cost-coverage tradeoff that forced teams to fly blind disappears.

Sensitive Data Detection and Control. Beacon’s AI detects and classifies sensitive data in motion, including PII, financial data, credentials, and secrets. Policies are enforced in-stream, ensuring data is masked, redacted, or rerouted before reaching downstream systems. This reduces exposure and compliance risk while maintaining full telemetry visibility.
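The enrichment and optimization stages above can be sketched as a single in-stream pass: enrich each event with context, then keep only events that carry security signal. The reputation set, field names, and drop rule below are assumptions for illustration, not Beacon's actual logic:

```python
# Illustrative in-stream pipeline stage: enrich, then reduce volume.
# The threat-intel set, field names, and keep/drop rule are hypothetical.

BAD_IPS = {"198.51.100.23"}  # stand-in for a threat-intelligence feed

def enrich(event):
    event["ip_reputation"] = "malicious" if event["src_ip"] in BAD_IPS else "clean"
    return event

def keep(event):
    # Keep anything with security signal; drop routine allowed traffic.
    return event["ip_reputation"] == "malicious" or event.get("action") != "allow"

raw = [
    {"src_ip": "10.0.4.7", "action": "allow"},        # routine: dropped
    {"src_ip": "198.51.100.23", "action": "allow"},   # bad IP: kept
    {"src_ip": "10.0.4.8", "action": "deny"},         # deny: kept
]

out = [e for e in map(enrich, raw) if keep(e)]
print(len(raw), "->", len(out))
```

The ordering matters: enriching before filtering is what lets the allowed connection to a known-bad IP survive the cut, which is the fidelity that naive volume-based sampling would lose.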

The result: every downstream consumer (analyst, detection rule, or AI agent) works with data that is structured, complete, and governed.

The Same Attack, Different Data Layer

Let's go back to the AiTM case: same alert, same attacker, same environment. The difference is what the agent has to work with.

The Okta alert still opens the investigation. But this time, the agent has access to Beacon's live map of the organization's telemetry: what exists, where it lives, what schema it follows, and what security value it carries.

The pipeline has already enriched the Okta event at ingestion. Reverse DNS on the source IP resolves to known phishing infrastructure. DNS and IP reputation are applied in-stream. The alert reaches the agent flagged as likely AiTM, not just a generic risky sign-in.

Salesforce activity exists in Snowflake under a GTM-owned pipeline. Beacon knows it's there. Glean was onboarded automatically as part of continuous coverage (Beacon tracks new applications entering the environment and keeps their telemetry usable for security without waiting for a ticket). Data that was invisible before is now part of the investigation from the start.

In AWS, the agent no longer has to choose between blindness and terabytes of raw logs. Beacon has already filtered, partitioned, and summarized high-volume telemetry so the relevant evidence is queryable in seconds, at a fraction of the cost. A second detection fires in-stream on VPC Flow Logs (connections to the same malicious IP) without anyone shipping the full raw volume into the SIEM.

Firewall events arriving over syslog are parsed, structured, and normalized so the agent can bound the blast radius: what was touched, what wasn't, and where to focus.

The result: the agent works on top of a continuously governed telemetry estate instead of piecing together a partial story from stale, disconnected data.

Where This Is Going

The agents are ready. The models are improving fast. The future of autonomous security operations is real, and the teams that get there first will be the ones that solved the data problem underneath. 

We built Beacon to be that foundation.

Ready to see it in practice? Talk to our team.
