RESEARCH

Inside the Beacon Research Lab: Trimming Security Logs Without Going Blind

Security teams often reduce log volume by sampling or filtering, only to discover gaps when a detection fires or an investigation begins. This piece outlines a security-engineering approach to log optimization that starts from real detection and IR workflows, with concrete examples across web, endpoint, and cloud telemetry showing how to cut noise without losing signal.

Doron Karmi

January 2025

Security teams today face a flood of data from network devices, endpoints, cloud platforms, identity systems, SaaS tools, and more. These sources generate a continuous stream of logs at volumes that often outpace teams’ ability to use them effectively.

This explosion of telemetry creates serious operational pressure across security teams. Below, we focus on three challenges that severely impact detection, investigation, and overall security effectiveness.

The first challenge is noise: when everything is logged, genuinely suspicious behavior is surrounded by vast amounts of routine activity. Detections become harder to tune, investigations take longer, and analysts spend more time filtering than analyzing.

The second is cost: SIEMs and data lakes charge based on ingestion and query volume. Ingesting logs indiscriminately means teams end up paying premium rates to store and search data that may rarely contribute to detection or investigation.

The third is exclusion due to quota limits. Storage and query budgets consumed by low-value telemetry leave less room for other data sources that could materially improve security coverage. 

At Beacon Research, we believe that tackling these challenges requires a new approach: one grounded in strong data engineering and AI practices, deep cybersecurity expertise, and field-tested security data management. Achieving this requires understanding how attackers operate, how each log type is used, how security telemetry feeds detection logic, investigation workflows, and security tools, which events and fields carry meaningful security signal, and where redundancy adds cost without improving security value.

This paper presents several examples of optimizing security telemetry through log-specific data transformation Recipes. It walks through cases where low-value data is identified and handled, resulting in lower data volumes, reduced costs, and clearer security signal, without sacrificing detection capability or the ability to reconstruct attacks during investigations.

Important context: The examples in this paper intentionally focus on individual, log-specific optimization techniques to illustrate how Beacon engineers security telemetry safely and precisely. In real-world deployments, customers do not rely on a single method in isolation. Beacon combines multiple complementary optimization strategies tailored to each environment, data source, and security objective. When applied holistically, these combined approaches routinely deliver far greater overall reductions in security data volume, often exceeding 70%, while preserving full detection and investigation fidelity. (For example, Lemonade achieved a 75% total reduction across its security data pipeline using Beacon’s multi-layered optimization approach.)

Why Broad Approaches Don't Work

Most volume reduction strategies used today by vendors and security teams fall into one of two categories. Some rely on sampling or other statistical techniques. Others apply broad filtering based on attributes such as event type or severity. While effective at reducing volume, both approaches share the same limitation: they optimize for data reduction without considering how the data is actually used in real security operations.

In practice, different log sources serve very different purposes. Web proxy logs are great for spotting reconnaissance and authentication abuse, like large-scale scanning or brute-force login attempts. Endpoint telemetry underpins process-tree reconstruction and allows responders to trace how code actually executed on a host. Cloud audit trails show who changed what in your environment and are essential for investigating configuration drift and privilege escalation. Each data source can play a distinct role in security operations, and each relies on a specific set of events, fields, and relationships. In many cases, the most critical security signal is not obvious from the raw log structure or default schemas.

That is why, instead of asking what can be discarded, we start by asking how attackers operate and what detections and investigations actually require. Framing the problem this way leads to fundamentally different outcomes – enabling far greater reduction, without destroying security fidelity.

Three Case Studies

We chose to highlight three example optimization strategies from the hundreds developed by Beacon. In live environments, Beacon combines multiple strategies into an adaptive Recipe to ensure customers meet both security objectives and data volume targets. If teams ever need a raw copy of the data, the Beacon platform also bifurcates the original stream and stores the full dataset in ultra-affordable cloud storage.

Cloudflare HTTP Logs: Intelligent Aggregation with RayID Awareness

Cloudflare HTTP logs capture web traffic at the edge: URLs, methods, status codes, source IPs, user agents, and dozens of other attributes. Security teams use these logs to detect web attacks, correlate WAF alerts with traffic patterns, and investigate suspicious clients. Even modest environments generate enormous volumes of these events, with massive repetition across identical or near-identical requests.

The key to optimizing Cloudflare logs is understanding how Cloudflare structures request handling. Every request gets a RayID, an identifier that appears in logs and can be used to trace a request through Cloudflare's infrastructure. According to Cloudflare's documentation, Ray IDs help evaluate security events for patterns and false positives. But the documentation also notes that Ray IDs are not guaranteed to be unique for every request.

This non-uniqueness matters. In production environments, we see two distinct patterns:

  1. Many events share identical values for all security-relevant attributes within short time windows (same source IP, same path, same method, same status code, same user agent). These represent genuine repetition that can be safely collapsed.

  2. Some RayIDs appear across multiple events that differ in meaningful ways, potentially representing different security rules firing or different internal processing steps for the same underlying request.

The optimization needs to preserve the second pattern while consolidating the first.

Our approach:

Our technology calculates how many times each RayID appears in the dataset. Events are then split into two groups: those with RayIDs that appear exactly once, and those with RayIDs that appear multiple times.

For the single-occurrence group, we apply aggregation based on security-relevant attributes. Events are grouped by important fields such as source IP, path, HTTP method, status code, and user agent, within a defined short time window. The timestamp is rounded to the window boundary. A request count field records how many individual events were consolidated into each aggregated row.

For the multiple-occurrence group, we make no changes to maintain security fidelity. Each event is retained as-is with a request count of one, preserving the full per-event context that might represent different rules or processing steps.

The two datasets are recombined through a union operation. In real deployments, this Recipe reduces Cloudflare HTTP volume by approximately 12%, with variation depending on traffic patterns and environment characteristics.
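
To make the mechanics concrete, here is a minimal sketch of the transformation in Python with pandas. The field names follow Cloudflare's standard HTTP request (Logpush) schema, but the grouping keys, the one-minute window, and the DataFrame layout are illustrative assumptions rather than the exact production Recipe.

    import pandas as pd

    # Illustrative grouping keys and window; field names assume Cloudflare's
    # standard HTTP request dataset and may need adjusting per log configuration.
    GROUP_KEYS = ["ClientIP", "ClientRequestPath", "ClientRequestMethod",
                  "EdgeResponseStatus", "ClientRequestUserAgent"]
    WINDOW = "1min"  # assumed short aggregation window

    def optimize_cloudflare(df: pd.DataFrame) -> pd.DataFrame:
        """Collapse repeated single-occurrence requests; keep multi-occurrence RayIDs as-is."""
        df = df.copy()
        df["EdgeStartTimestamp"] = pd.to_datetime(df["EdgeStartTimestamp"])

        # Count how many events carry each RayID, then split the stream in two.
        ray_counts = df["RayID"].map(df["RayID"].value_counts())
        single = df[ray_counts == 1].copy()
        multi = df[ray_counts > 1].copy()

        # Single-occurrence group: round timestamps to the window boundary and
        # aggregate identical requests, recording how many rows were collapsed.
        single["EdgeStartTimestamp"] = single["EdgeStartTimestamp"].dt.floor(WINDOW)
        aggregated = (
            single.groupby(GROUP_KEYS + ["EdgeStartTimestamp"], as_index=False)
                  .agg(RayID=("RayID", "first"), request_count=("RayID", "size"))
        )

        # Multi-occurrence group: every event is retained as-is with a count of one.
        multi["request_count"] = 1

        # Recombine the two groups; fields outside the grouping keys simply
        # remain empty on aggregated rows.
        return pd.concat([aggregated, multi], ignore_index=True)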

Why this preserves security fidelity:

Detection logic for web traffic usually operates on counts, timing, and context. How many requests came from this IP? When did they occur? What paths were accessed? What was the response pattern? All of these questions remain answerable after optimization. The request count is now explicit rather than implicit in row count. Time is preserved at reasonable resolution through short windowing. Key fields remain visible because they form the grouping criteria.
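
As a small illustration of how such a detection adapts, the sketch below (continuing from the pandas example above) answers the first question, how many requests came from a given IP in a window, against the optimized data. The only change from querying raw rows is summing the explicit request_count column instead of counting rows; the threshold is an arbitrary placeholder.

    import pandas as pd

    THRESHOLD = 500  # illustrative per-window request limit

    def noisy_clients(optimized: pd.DataFrame) -> pd.DataFrame:
        """Flag source IPs whose total requests in a window exceed the threshold."""
        per_ip = (
            optimized.groupby(["ClientIP", "EdgeStartTimestamp"], as_index=False)
                     ["request_count"].sum()
        )
        return per_ip[per_ip["request_count"] > THRESHOLD]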

For investigations, analysts can still pivot on RayID, IP, path, or user agent to understand client behavior. Instead of seeing hundreds of individual identical rows, they see aggregated summaries that communicate the same information more efficiently. 

For detection logic that requires each and every record, we have developed other types of Cloudflare optimization strategies.

CrowdStrike FDR: Canonical Event Selection

CrowdStrike Falcon Data Replicator provides detailed endpoint telemetry including comprehensive process execution data. This telemetry is central to process-based threat detection, building process trees during incident response, and correlating endpoint behavior with identity and network activity. Certain event types appear with extremely high frequency.

Analysis of production FDR deployments revealed three event types that dominated by volume: EndOfProcess, ProcessRollup2, and ProcessRollup2Stats. These events are not equally valuable for security operations.

EndOfProcess events log process termination with limited additional context. They record that a process ended but provide little information about what the process did or why it matters. Detection rules and investigation workflows reference them far less frequently than process creation events.

ProcessRollup2Stats events contain aggregated statistics about process behavior. They can be useful for performance analysis or capacity planning, but they rarely drive threat detections or appear as primary evidence in security investigations. They represent a summary view rather than detailed behavioral data.

ProcessRollup2 events contain rich process execution data: start time, parent-child relationships, executable path, command line arguments, user context, and related metadata. Detection rules overwhelmingly reference this event type when triggering on process-based behaviors. Incident responders depend on ProcessRollup2 data to reconstruct attack chains and understand how adversaries moved through an environment.

Given this disparity in security value, this optimization strategy treats ProcessRollup2 as the canonical process event for security use cases. ProcessRollup2 is routed to the security analytics destination, such as a SIEM or security data lake, while ProcessRollup2Stats is typically excluded. Key information from EndOfProcess can then be reintroduced in a more compact form. In real-world environments, this approach yields roughly a 25 percent reduction in process telemetry volume from FDR.
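
A minimal routing sketch is shown below. It keys on FDR's event_simpleName field; the destination split and the particular fields kept in the compact EndOfProcess summary (aid, TargetProcessId, timestamp) are illustrative assumptions, not Beacon's exact implementation.

    from typing import Iterable, Iterator

    # FDR event types discussed above, keyed on the event_simpleName field.
    CANONICAL = "ProcessRollup2"
    DROPPED = "ProcessRollup2Stats"
    COMPACTED = "EndOfProcess"

    def route_fdr_events(events: Iterable[dict]) -> Iterator[dict]:
        """Yield only the records destined for the security analytics tier."""
        for event in events:
            kind = event.get("event_simpleName")
            if kind == CANONICAL:
                # Rich process-creation context: forward unchanged to the SIEM / data lake.
                yield event
            elif kind == COMPACTED:
                # Keep only a compact termination summary (illustrative field selection).
                yield {
                    "event_simpleName": kind,
                    "aid": event.get("aid"),
                    "TargetProcessId": event.get("TargetProcessId"),
                    "timestamp": event.get("timestamp"),
                }
            elif kind == DROPPED:
                # Aggregated statistics: excluded from the analytics destination.
                continue
            else:
                yield event  # all other event types pass through untouched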

Why this preserves security fidelity:

Process-based detection rules typically trigger on process start events and execution context. They examine command line patterns, parent process relationships, user privileges, executable paths, and file hashes. All of this information lives in ProcessRollup2 events. Knowing that a process terminated, or seeing separate statistical summaries, adds little to the detection posture.
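
As a hedged example of the kind of rule that depends only on ProcessRollup2 context, the sketch below flags PowerShell launched with an encoded command by an Office parent process. The field names (ImageFileName, CommandLine, ParentBaseFileName) follow common FDR naming but should be treated as illustrative, and the heuristic itself is generic rather than a production detection.

    SUSPICIOUS_PARENTS = {"winword.exe", "excel.exe", "outlook.exe"}  # illustrative list

    def flag_encoded_powershell(event: dict) -> bool:
        """Generic heuristic: PowerShell with an encoded command spawned by an Office app."""
        if event.get("event_simpleName") != "ProcessRollup2":
            return False
        image = (event.get("ImageFileName") or "").lower()
        cmdline = (event.get("CommandLine") or "").lower()
        parent = (event.get("ParentBaseFileName") or "").lower()
        return (
            image.endswith("powershell.exe")
            and "-enc" in cmdline
            and parent in SUSPICIOUS_PARENTS
        )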

Similarly, incident response centers on reconstructing what happened. Responders follow process trees, identify suspicious executions, and trace lateral movement or privilege escalation. The critical questions are: What ran? Under what context? With which parent process? Spawning what children? These questions all find answers in ProcessRollup2 events. The precise termination time or separate statistical aggregations rarely change the investigative narrative.

By consolidating around a canonical event type, we retain the complete story of process execution while discarding low-utility noise that inflates costs without improving security outcomes.

AWS CloudTrail: Service-Linked Role Filtering

AWS CloudTrail records activity across AWS environments: configuration changes, resource creation and deletion, permission modifications. Security teams use CloudTrail to monitor control plane activity, detect suspicious configuration changes, and maintain audit trails for compliance. As AWS usage scales, CloudTrail volume grows correspondingly, with much of it generated by AWS services operating on behalf of the customer.

A key concept in AWS IAM is the service-linked role. These are special roles created and managed by AWS services themselves. Unlike customer-created IAM roles, service-linked roles are owned by AWS. They bypass service and resource control policies. The service defines their permissions and trust relationships. Customers cannot directly assume these roles or modify their permissions in the same way they can with ordinary IAM roles. Adversaries who compromise customer credentials similarly cannot leverage service-linked roles as attack vectors in the typical fashion. These roles exist to enable AWS services to perform necessary operations within customer accounts, and their names always begin with "AWSServiceRoleFor".

In production AWS environments, service-linked roles often generate substantial CloudTrail volume, particularly through read-only operations and KMS Decrypt calls.

Our optimization:

This strategy identifies CloudTrail events where the principal is a service-linked role based on the naming convention. For these events only, the optimization strategy excludes:

  • Read-only actions (Get, List, Describe prefixes)
  • Decrypt operations when they constitute a large share of volume

All modifying actions (Create, Update, Delete, Put, Attach, and similar prefixes) are retained, even when performed by service-linked roles.

The volume reduction varies by environment and usage patterns, but commonly removes a significant portion of CloudTrail traffic sent to the analytics destination.
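
A simplified version of this filter might look like the following sketch. The eventName and userIdentity.arn fields are part of the standard CloudTrail record schema, and service-linked role sessions carry the AWSServiceRoleFor prefix in their assumed-role ARN; the exact matching logic, including when Decrypt calls are dropped, is an illustrative assumption rather than the production Recipe.

    READ_ONLY_PREFIXES = ("Get", "List", "Describe")

    def is_service_linked_role(record: dict) -> bool:
        """Service-linked role sessions carry 'AWSServiceRoleFor' in the assumed-role ARN."""
        arn = record.get("userIdentity", {}).get("arn", "")
        return "AWSServiceRoleFor" in arn

    def keep_cloudtrail_event(record: dict, drop_decrypt: bool = True) -> bool:
        """Return True if the event should be forwarded to the analytics destination."""
        if not is_service_linked_role(record):
            return True  # customer principals are never filtered by this strategy

        event_name = record.get("eventName", "")
        if event_name.startswith(READ_ONLY_PREFIXES):
            return False  # read-only calls by service-linked roles are excluded
        if drop_decrypt and event_name == "Decrypt":
            return False  # optionally drop high-volume KMS Decrypt calls
        return True  # all modifying actions are retained, even for service-linked roles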

Why this preserves security fidelity:

Service-linked roles represent a constrained attack surface. They are AWS-owned and tightly scoped to specific service functions. If an attacker somehow managed to abuse a service or its integrations in a novel way, the modifying actions would still appear in the logs and remain visible for detection and investigation. 

By retaining all modifying actions from service-linked roles, this strategy preserves visibility into what changed in the environment, which service initiated the change, and how service-driven modifications interact with customer-owned identities and resources. If a service-linked role were compromised or a service abused in unexpected ways, the trail of modifications would still be available for investigation.

For compliance requirements that demand full log retention, Beacon routes the complete raw dataset to ultra-affordable storage. The analytics destination receives the higher-value subset that actually drives detections and investigations. This separates compliance retention from operational analytics, allowing each to be optimized appropriately.

Validation Methodology

Optimization Recipes are designed to be safe and to preserve security capabilities and business objectives. To achieve that, Beacon employs both static and dynamic validation to ensure that optimizations do not degrade security posture. In a follow-up blog post, we will detail the rigorous validation methods and red-team exercises we use to verify this.

Engineering and Optimizing Security Telemetry for Real Security Outcomes

Safe optimization of security telemetry must begin with security and operational requirements, not volume reduction targets. By grounding optimization in the security objectives and business outcomes each data source is meant to support, Beacon designs recipes that deliberately shape data to preserve security fidelity while materially reducing cost and volume.

This is a data engineering discipline that blends AI-powered analysis with deep human understanding of attackers, defenders, and the workflows that depend on security data. As security operations become increasingly automated, telemetry is no longer just something you collect. It is something you design. When designed correctly, it becomes a force multiplier that enables security teams to operate more efficiently, with less noise and greater precision and clarity.

See what your security data can become
Schedule a demo