Beacon Security

Research

The Battle of Schemas: ECS, UDM, and OCSF Compared

There's a category of work in security that doesn't get much attention but everything depends on. It's not detection engineering and it's not threat intel. It's getting your security data to speak the same language across every source you collect.

Roman Gurevich

March 2026

There's a category of work in security that doesn't get much attention but everything depends on.

‍

It's not detection engineering and it's not threat intel. It's getting your security data to speak the same language across every source you collect.

‍

Every security tool, cloud platform, and SaaS vendor structures its logs differently. Different field names, different nesting, different conventions, all describing the same things. And if you want to correlate across any of them, someone has to translate.

‍

There are plenty of schemas and formats in the security industry trying to solve this. Some are vendor-specific, some are community-driven. In this post we're focusing on three that keep coming up in the conversations we have with security teams: ECS, UDM, and OCSF.

‍

ECS is native to the Elastic ecosystem, and UDM is native to Google Chronicle/SecOps. OCSF is a community-led standard that gained major traction through AWS Security Lake, and is now used beyond it. In this post we’ll focus on these three; other common frameworks exist (e.g., Splunk CIM and Microsoft Sentinel ASIM), but they’re out of scope for this comparison.

‍

We'll look at what each one optimizes for, where it creates friction, and why the mapping quality ends up mattering more than which schema you're on.

‍

To keep things concrete, we'll map the same raw Zoom event into all three schemas side by side. If you've been following our series on Zoom as a security data source (Part 1 explored the telemetry, Part 2 built a location detection on top of it), this is the layer underneath. But you don't need to have read them to follow along.

The Babel Problem

The problem shows up the moment you try to correlate across sources. Take a single Zoom meeting event and try to correlate it with sign-in activity from Okta and endpoint telemetry from CrowdStrike.

‍

Zoom gives you user_id, email, ip_address, os, qos[].details.avg_latency. Okta gives you actor.id, client.ipAddress, client.userAgent. CrowdStrike gives you UserName, aip , event_simpleName.

‍

Same concepts. Different field names. Different nesting. Different conventions.

‍

If you want to correlate across these sources ("show me everything this user did across Okta, Zoom, and CrowdStrike"), you have two choices: write custom joins every time, hunting through each vendor's schema for equivalent fields. Or translate everything into a common language first and query once. The rest of this post is about the second option.

Meet the Three Schemas

We're going to map a single Zoom event into ECS, UDM, and OCSF. But before we get to the mapping, it's worth understanding what makes each schema different. Not the spec details, but the philosophy behind them and the tradeoffs that come with each approach.

ECS: Search-First and Flexible

Cares most about: "Put it where analysts will look for it."

‍

ECS uses flat dot-notation. Fields live in families: user.* for identity, source.* for network origin, host.* for device, event.* for classification. The philosophy is search-first: put fields where analysts will look for them. If you're used to typing user.email: in Kibana, ECS makes that work across any data source.

‍

ECS is also flexible. You pick the event classification. You decide whether to mirror fields (putting IP in both source.ip and client.ip for different query patterns). There's no schema police rejecting your data if you make unconventional choices.

‍

That flexibility comes with a tradeoff. Because multiple fields can validly represent the same concept, different teams or vendors can map the same source in different ways and both be "correct" by ECS standards. When that happens, you end up writing queries that account for multiple possible field locations for the same data, which is exactly the kind of complexity a schema is supposed to eliminate.

UDM: Role-Based and Structured

Cares most about: "Who did what to whom?"

‍

UDM thinks in nouns: principal (who did it), target (what was affected), src, dst, observer. Every event is fundamentally an actor doing something to a target.

‍

The philosophy is role-based: before you map a field, you have to answer "is this the actor or the target?" For a Zoom meeting, that's straightforward, the participant is the principal. For a firewall log, it gets more interesting.

‍

UDM is more constrained than ECS. Event types come from an enum. Field paths are deeper and more verbose. The upside is consistency: once you learn the principal/target model, you can navigate any UDM data.

‍

That structure forces you to think carefully about every mapping, which can slow down initial onboarding but pays off downstream. The tradeoff is verbosity: field paths get long, and events that don't fit a clean actor/target model (like system health checks or passive telemetry) can require awkward workarounds.

OCSF: Classification-First and Validated

Cares more about: "What category of event is this?"

‍

OCSF thinks in classes. Every event belongs to exactly one class: Authentication (3002), Network Activity (4001), API Activity (6003), etc. The class you pick determines which fields exist and whether they're required, recommended, or optional.

‍

The philosophy is classification-first: before you can map anything, you have to answer "what category of event is this?" That commitment unlocks structure. OCSF can validate that you've filled in required fields for your chosen class.

‍

OCSF also uses integer enums where ECS uses strings. activity_id: 1 instead of event.action: "logon". More precise, less readable.

‍

OCSF is newer and still evolving. Its class system gives you strong structure, but it gets tricky when one source spans multiple event types. With Zoom, for example, you may need to map participation data and QoS telemetry into different classes, which can lead to duplicated context (user/device repeated across events) and higher volume (sometimes up to ~3×). Since the class choice dictates the event shape and required fields, a wrong choice can force a remap later. And because many values are integer enums, the raw data is harder to interpret without reference tables.

Same Zoom Event, Three Translations

Let's come back to our Zoom example. Here's a single Zoom participant record from the QoS Summary API. It's a composed event that combines meeting participation data with network quality metrics. The QoS fields might look unusual for security data, but they turn out to be surprisingly useful for detection. Latency, jitter, and packet loss encode physical distance in ways that IP geolocation can't, which is how we used them to build a location anomaly detection.

‍

{
  "user_id": "abc123",
  "user_name": "Alice Smith",
  "email": "alice@corp.com",
  "ip_address": "203.0.113.42",
  "internal_ip_addresses": ["10.0.1.50"],
  "os": "Win",
  "os_version": "10.0.19045",
  "pc_name": "ALICE-LAPTOP",
  "mac_addr": "00:1A:2B:3C:4D:5E",
  "join_time": "2024-01-15T14:30:00Z",
  "leave_time": "2024-01-15T15:45:00Z",
  "health": "good",
  "qos": [
    {
      "type": "audio_input",
      "details": {
        "avg_latency": "126 ms",
        "avg_jitter": "12 ms",
        "avg_loss": "0.03%",
        "avg_bitrate": "27.15 kbps"
      }
    }
  ],
  "meeting": {
    "id": 98765432101,
    "uuid": "abc123xyz",
    "topic": "Weekly Sync"
  }
}

‍

Note: Zoom's QoS Summary returns quality metrics as strings with units (e.g., "126 ms", "0.03%"). We parse these into proper numerics so aggregations, thresholds, and percentiles work correctly.

‍

Now let's see where each field lands in each schema. The table below maps core concepts across ECS, UDM, and OCSF. It's not exhaustive, it's meant to show where the same data ends up in each schema and where the differences get interesting.

‍

One caveat: not all fields are always available. Zoom's API may not return participant emails for guests or external users, and some fields depend on account-level privacy settings.

Category	Zoom Raw	ECS	UDM	OCSF	Notes
Identity	`email`	`user.email`	`principal.user.email_addresses[0]`	`actor.user.email_addr`	Zoom may not return participant emails, especially for guests or external users. We anchor on email when present, otherwise fall back to stable participant identifiers.
	`user_id`	`user.id`	`principal.user.userid`	`actor.user.uid`
	`user_name`	`user.name` `user.full_name`	`principal.user.user_display_name`	`actor.user.name`
Network	`ip_address`	`source.ip`	`principal.ip[0]`	`src_endpoint.ip`	UDM puts IP in principal because this is the actor's IP, not a target's.
	`internal_ip_addresses`	`host.ip`	`principal.asset.ip`	`device.ip`
Device	`os ("Win")`	`host.os.type: "windows"`	`principal.asset.platform_software.platform: "WINDOWS"`	`device.os.type_id: 100`	Value normalization: Zoom says "Win", ECS wants lowercase, UDM wants uppercase, OCSF wants an integer enum.
	`os_version`	`host.os.version`	`principal.asset.platform_software.platform_version`	`device.os.version`
	`pc_name`	`host.name` `host.hostname`	`principal.hostname`	`device.hostname`
	`mac_addr`	`host.mac`	`principal.mac`	`device.mac`
Event	`event type`	`event.action: "meeting.participant.summary"`	`metadata.event_type: "USER_RESOURCE_ACCESS"`	`class_uid: 6003` `activity_id: 1`	ECS is flexible, UDM constrains to an enum, and OCSF uses API Activity as a best-fit class for this collaboration telemetry.
	`category`	`event.category: ["session"]`	`(implied)`	`category_uid: 6`
QoS	`qos[].details.avg_latency`	`zoom.qos.audio.latency_ms.avg`	`additional.fields["qos_audio_latency_ms"]`	`unmapped.qos.audio.latency_ms`	No schema has native QoS fields. ECS uses custom namespaces, UDM uses key-value fields, and OCSF uses unmapped.
	`qos[].details.avg_jitter`	`zoom.qos.audio.jitter_ms.avg`	`additional.fields["qos_audio_jitter_ms"]`	`unmapped.qos.audio.jitter_ms`	`"12 ms"` → `12.0` normalization applies.

Why This Matters: Cross-Source Detection

The schema comparison above might feel academic: who cares whether the email lives in user.email or principal.user.email_addresses? Here's why it matters.

‍

It matters the moment you try to combine sources. Say you want to add Okta logs alongside your Zoom data. Okta captures something Zoom doesn't: where users authenticated from. Now you can ask a question neither source can answer alone: does the user's Zoom network behavior match where they logged in?

‍

That question only works if both sources share common anchors. Without a unified schema, here's what that looks like:

‍

-- Without unified schema: different field names, different paths
SELECT *
FROM okta_logs o
JOIN zoom_sessions z
  ON o.actor.alternateId = z.participant_email
WHERE o.client.ipAddress != z.ip_address
  AND o.published BETWEEN z.join_time AND z.leave_time

‍

You're doing the translation work inside the query itself. Every new source means learning another vendor's naming conventions. Now imagine doing that across five, ten, twenty sources.

‍

With unified schema (ECS in this case):

‍

-- With unified schema: same anchors across sources
SELECT *
FROM events
WHERE user.email = 'alice@corp.com'
  AND event.module IN ('okta', 'zoom')
  AND @timestamp BETWEEN '2024-01-15T09:00:00Z' AND '2024-01-15T10:00:00Z'

‍

The join is implicit. Both sources use user.email for identity, source.ip for network origin, @timestamp for time.

‍

Now, you can keep layering sources to make the investigation stronger: HR data for expected work location, VPN logs to check whether an IP is a known tunnel endpoint, endpoint telemetry for the device's actual network path. Each source adds a signal, but only if they land in a schema where user.email means the same thing everywhere, where source.ip is always the originating IP, where timestamps are in the same format.

What about AI?

It's tempting to think that AI agents can just work directly on raw vendor logs and figure out the schema differences on the fly. And in theory, they can. But in practice, if identity, timestamps, IP roles, device context, and event classification aren't consistent and correctly typed across sources, the agent ends up re-deriving semantics every time it runs. That's where you get the failure modes people associate with unreliable agent tooling: investigations that produce different results each time, conclusions that are hard to reproduce or audit.

Agents get dramatically more reliable when the data layer underneath them is stable.
Clean, normalized, well-typed data means the agent can spend its intelligence on reasoning and prioritization instead of repeatedly figuring out what a field probably means.

We'll go deeper on what "agent-ready" data quality looks like in a follow-up post.

So Which Schema Should You Be On?

In practice, you rarely choose a schema. You inherit one from the platform you build on. Every detection, dashboard, enrichment pipeline, and correlation rule you build is coupled to that schema's field paths and conventions.

That coupling also goes deeper than field names. UDM and OCSF enforce type correctness at ingestion: send a string where an integer is expected and the event gets rejected or dropped. ECS is more forgiving, it'll accept the wrong type without complaint, but your query will silently return no results. Either way, the failure mode is the same: detections that look right but don't fire.

Understanding these tradeoffs matters in two places. First, if you're evaluating a SIEM or data lake, the schema should be part of the decision. Prefer one with broad security coverage, an active community, and a track record of keeping up with new source types. Second, consider whether you want to own the mapping layer instead of depending on your SIEM vendor's defaults. Most out-of-the-box mappings are incomplete or inconsistent across sources, and for a source like Zoom, there's usually no default mapping at all.

Bridging the Gap

This is where a security data pipeline changes the equation. Instead of being locked into one schema and one vendor's mapping quality, you control the translation layer.

Beacon builds and maintains schema mappings from any source to any destination (ECS, UDM, OCSF, or whatever comes next) including sources your SIEM doesn't natively support. Zoom is a good example: SIEMs don't ship with native Zoom mappings, so teams either skip the data or do the mapping work themselves. With Beacon, the data lands directly in your SIEM, in its native format, with consistent anchors and correct types.

That mapping layer also decouples you from your SIEM choice. If you're happy with your SIEM, you get better data quality without changing anything else. If you're migrating, you re-target the mappings instead of rebuilding every detection and dashboard from scratch. And if you're routing data to multiple destinations, a SIEM and a data lake, for example, each one gets the data in its native format.

Want to see it in action? Get in touch to take it for a test drive.

‍

Go Back

See what your security data can become

Schedule a demo