Your new AI security analyst just confidently told you that
arn:aws:sts::123456789012:assumed-role/DataPipelineRole/session-xyz
exfiltrated 50GB from an S3 bucket containing customer PII. It cited three CloudTrail events, showed you a timeline, and recommended immediate incident response.
There's one problem: it's wrong. The role was assumed by a legitimate data engineer running a scheduled ETL job. The actual suspicious activity (a different session from a compromised credential) went unnoticed because the AI SOC analyst (driven by an LLM operating on messy, inconsistent log data) couldn’t untangle who did what.
Most teams deploying an agentic SOC do so because they want faster investigations and fewer false positives. But hallucinations work directly against both goals. Investigations that miss real activity (false negatives) force analysts back into heavy human-in-the-loop validation, often taking longer than a traditional manual workflow. And hallucinated correlations or incorrect entity mappings generate more false positives, not fewer – ironically increasing alert fatigue rather than reducing it.
This isn't a prompt engineering problem. It's a data problem. And it's getting worse as organizations rush to deploy AI SOC capabilities on top of the same raw, inconsistent log infrastructure that's plagued security teams for years.
Why LLMs Make Bad Data Worse
Large language models are pattern-matching machines trained on structured examples. When you feed them security logs, they're looking for patterns that match their training data. The problem is that raw security telemetry is structurally inconsistent across sources, and LLMs don't handle this gracefully – they hallucinate structure where none exists.
In your environment, the same person might appear as:
- john.doe@company.com in Okta logs
- jdoe in GitHub audit logs
- johndoe in AWS CloudTrail (if using IAM users)
- John Doe in Google Workspace
- A numeric ID in Salesforce
- DOMAIN\john.doe in Windows Event Logs
An AI SOC analyst asked to "trace all actions by this user" will fail to connect these dots unless the data has been normalized beforehand. It might confidently state that john.doe@company.com had no AWS activity because it never realized johndoe was the same person. The response sounds authoritative. The analysis is worthless.
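Solving this deterministically means maintaining an explicit mapping from each source-specific identifier to one canonical identity. Here's a minimal sketch of the idea, assuming a hand-maintained map (in practice it would be generated from your IdP, and the source and field names are illustrative):

# Illustrative identity map: (source, raw identifier) -> canonical identity.
# In practice this is generated from your IdP, not hand-maintained.
IDENTITY_MAP = {
    ("okta", "john.doe@company.com"): "john.doe@company.com",
    ("github", "jdoe"): "john.doe@company.com",
    ("aws_cloudtrail", "johndoe"): "john.doe@company.com",
    ("windows", r"DOMAIN\john.doe"): "john.doe@company.com",
}

def canonical_user(source: str, raw_identifier: str) -> str | None:
    """Resolve a source-specific identifier to one canonical identity."""
    return IDENTITY_MAP.get((source, raw_identifier))

# "Trace all actions by this user" only works if every event carries the
# canonical identity rather than the raw per-source identifier.
events = [
    {"source": "github", "user": "jdoe", "action": "repo.clone"},
    {"source": "aws_cloudtrail", "user": "johndoe", "action": "GetObject"},
]
for event in events:
    event["canonical_user"] = canonical_user(event["source"], event["user"])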
Timestamps present similar problems. Some sources use Unix epochs, others use ISO 8601, some use local time without timezone information. An AI SOC analyst asked to build a timeline across Palo Alto firewall logs (local time) and CloudTrail events (UTC) will produce temporal sequences that are simply incorrect. It won't tell you it's guessing about the timezone offset; it will just build the wrong timeline.
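A minimal sketch of timestamp normalization at ingestion, assuming you know each source's format and timezone up front (the source-to-timezone mapping and sample values below are illustrative):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Illustrative assumption: this firewall logs naive local time in US Eastern;
# CloudTrail is ISO 8601 in UTC; some sources send Unix epoch seconds.
SOURCE_TZ = {"paloalto_firewall": ZoneInfo("America/New_York")}

def to_utc(source: str, raw_ts) -> datetime:
    """Convert a source-specific timestamp to a timezone-aware UTC datetime."""
    if isinstance(raw_ts, (int, float)):              # Unix epoch seconds
        return datetime.fromtimestamp(raw_ts, tz=timezone.utc)
    ts = datetime.fromisoformat(str(raw_ts).replace("Z", "+00:00"))
    if ts.tzinfo is None:                             # naive local time
        ts = ts.replace(tzinfo=SOURCE_TZ[source])
    return ts.astimezone(timezone.utc)

# Both events land on one comparable UTC timeline.
print(to_utc("paloalto_firewall", "2024-05-01 09:15:00"))
print(to_utc("aws_cloudtrail", "2024-05-01T13:10:00Z"))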
The result is confident hallucinations that are harder to spot than traditional false positives, because they come wrapped in natural language explanations that seem plausible.
Example: The CloudTrail Assume-Role Problem
Let's look at a specific example that breaks most LLM investigation tools: AWS AssumeRole chains.
When someone uses an assumed role in AWS, CloudTrail generates events that look like this:
{
  "eventName": "AssumeRole",
  "userIdentity": {
    "type": "IAMUser",
    "principalId": "AIDAI23EXAMPLE",
    "userName": "alice@company.com"
  },
  "requestParameters": {
    "roleArn": "arn:aws:iam::123456789012:role/DataPipelineRole",
    "roleSessionName": "alice-session"
  }
}
Then, subsequent actions using that role appear as:
{
  "eventName": "RunInstances",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "AROAI67EXAMPLE:alice-session",
    "arn": "arn:aws:sts::123456789012:assumed-role/DataPipelineRole/alice-session"
  }
}
Notice the problem? The second event has no direct reference to alice@company.com. If you're investigating suspicious EC2 instances and you only have the RunInstances event, you see a role ARN – not a human.
Now add complexity: Alice assumes Role A, which has permission to assume Role B, which has permission to assume Role C. The actual CloudTrail event shows only Role C. The chain of assumptions exists in earlier logs, but connecting them requires temporal correlation across potentially thousands of events.
Ask an AI SOC analyst trained on raw logs to "tell me who launched these suspicious instances," and it will confidently tell you it was DataPipelineRole. Ask it to find all actions by Alice, and it will miss everything she did through assumed roles. The LLM isn't broken; the data structure is.
What's needed is enrichment at ingestion time: unravel the AssumeRole chain and add the original principal identity to every event in the assumed role session. This is deterministic correlation, not inference. But it needs to happen before the data reaches your AI SOC analyst, not as part of the prompt.
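Here's a minimal sketch of that correlation, assuming a single, roughly ordered CloudTrail stream and the field layout from the examples above. The original_principal field name is made up, and a production pipeline would also key sessions on account ID and handle session expiry:

import re

# Map "RoleName/SessionName" -> the original principal who created the session.
session_owner: dict[str, str] = {}

def _session_key(sts_arn: str) -> str:
    """'arn:aws:sts::123...:assumed-role/Role/Session' -> 'Role/Session'."""
    match = re.search(r"assumed-role/(.+)", sts_arn)
    return match.group(1) if match else ""

def enrich(event: dict) -> dict:
    """Stamp the original principal onto every event in an assumed-role session."""
    identity = event.get("userIdentity", {})

    if event.get("eventName") == "AssumeRole":
        params = event["requestParameters"]
        role_name = params["roleArn"].rsplit("/", 1)[-1]
        new_key = f'{role_name}/{params["roleSessionName"]}'
        # If the caller is itself an assumed role (chaining), reuse the owner
        # already resolved for the calling session; otherwise take the IAM user.
        caller = session_owner.get(_session_key(identity.get("arn", "")))
        session_owner[new_key] = caller or identity.get("userName", "unknown")

    elif identity.get("type") == "AssumedRole":
        owner = session_owner.get(_session_key(identity.get("arn", "")))
        if owner:
            event["original_principal"] = owner  # deterministic correlation, not inference
    return event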
Requirements for AI-Ready Security Data
Getting consistent, accurate output from an AI SOC analyst requires solving three problems: schema normalization, entity resolution, and contextual enrichment.
Schema normalization means translating vendor-specific log formats into consistent structures. For example, this means ensuring that "user identity" lives in a predictable field name regardless of whether the event came from Okta, AWS, or GitHub. Standards like OCSF (Open Cybersecurity Schema Framework) or ECS (Elastic Common Schema) exist for this reason, but most organizations don't apply them at ingestion time.
Without normalization, your LLM needs to learn that userIdentity.userName, actor.login, user, principal.email, and subject all mean roughly the same thing. It will get this wrong in edge cases, and edge cases are where attackers live.
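A minimal sketch of ingestion-time normalization, mapping a handful of vendor-specific field paths into one predictable field. The mapping table and output envelope are illustrative, nowhere near a full OCSF or ECS implementation:

# Illustrative source-field -> common-field mapping. A real pipeline maps full
# schemas to a standard like OCSF or ECS; this only handles the user field.
USER_FIELD_BY_SOURCE = {
    "aws_cloudtrail": ("userIdentity", "userName"),
    "github_audit": ("actor",),
    "okta": ("actor", "alternateId"),
}

def get_nested(event: dict, path: tuple[str, ...]):
    """Walk a nested dict along `path`; return None if any key is missing."""
    value = event
    for key in path:
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value

def normalize(source: str, raw_event: dict) -> dict:
    """Emit a normalized envelope with the user identity in one predictable place."""
    return {
        "source": source,
        "actor_user_name": get_nested(raw_event, USER_FIELD_BY_SOURCE[source]),
        "raw": raw_event,  # keep the original event for fidelity
    }

print(normalize("aws_cloudtrail", {"userIdentity": {"userName": "alice@company.com"}}))
print(normalize("github_audit", {"actor": "jdoe"}))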
Entity resolution means connecting the same logical entity across different representations. The john.doe@company.com/jdoe/johndoe problem isn't solved by better prompts—it's solved by maintaining entity mappings.
GitHub provides a concrete example. GitHub audit logs reference users by their GitHub username, not email. If you're investigating whether a leaked credential was used to access your repositories, you need to correlate GitHub activity with your IDP. This requires enriching GitHub events with email addresses by correlating the username with GitHub's user API or with earlier authentication events that contain both identifiers.
This type of enrichment can't happen at query time when you're asking an LLM to investigate. It needs to be baked into the data. Otherwise, the LLM is doing inference where you need facts.
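A minimal sketch of baking that mapping in at ingestion, assuming you learn username-to-email pairs from earlier events that carry both identifiers (the field names are illustrative, not any vendor's exact schema):

# Build {github_username: corporate_email} from events that contain both
# identifiers (e.g. an IdP SSO event into GitHub), then use it to enrich
# every GitHub audit event at ingestion.
github_identity: dict[str, str] = {}

def observe_auth_event(event: dict) -> None:
    """Learn username-to-email pairs from events that carry both identifiers."""
    username, email = event.get("github_username"), event.get("email")
    if username and email:
        github_identity[username] = email

def enrich_github_audit(event: dict) -> dict:
    """Add the corporate email as a fact, baked into the data before analysis."""
    email = github_identity.get(event.get("actor", ""))
    if email:
        event["actor_email"] = email
    else:
        event["actor_email_unresolved"] = True  # flag the gap instead of guessing
    return event

observe_auth_event({"github_username": "jdoe", "email": "john.doe@company.com"})
print(enrich_github_audit({"action": "repo.clone", "actor": "jdoe"}))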
Contextual enrichment means adding information that changes how events should be interpreted. GeoIP is the obvious example—an API call from an unexpected country is different from one from your office—but there are deeper examples.
Consider Okta events. A login attempt might show:
{
  "actor": {
    "id": "00u1abcdefg",
    "type": "User"
  }
}
Is this user an admin? Do they have MFA enrolled? Are they a service account? This context determines whether the event is routine or requires investigation, but it's not in the event itself. Enrichment at ingestion time—correlating with Okta's user API to add isAdmin: true or mfaEnrolled: false—gives LLMs the context they need to make accurate assessments.
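A minimal sketch of that enrichment, assuming a fetch_okta_user_context helper that wraps the relevant Okta lookups (users, roles, factors) or reads from a periodically synced snapshot. The helper and the output field names are illustrative:

from functools import lru_cache

@lru_cache(maxsize=10_000)
def fetch_okta_user_context(user_id: str) -> dict:
    """Hypothetical wrapper around Okta user, role, and factor lookups.
    A real pipeline would call Okta's APIs (or a synced snapshot) and
    handle rate limits; the static values here are placeholders."""
    return {"isAdmin": True, "mfaEnrolled": False, "isServiceAccount": False}

def enrich_okta_event(event: dict) -> dict:
    """Attach user context at ingestion so downstream analysis gets facts, not guesses."""
    actor_id = event.get("actor", {}).get("id")
    if actor_id:
        event["actor_context"] = fetch_okta_user_context(actor_id)
    return event

print(enrich_okta_event({"actor": {"id": "00u1abcdefg", "type": "User"}}))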
Without this enrichment, LLMs either ignore important context (treating all logins the same) or hallucinate it (guessing whether someone is an admin based on username patterns).
Data Infrastructure Comes First
AI security tool vendors will tell you their models are getting better at handling messy data and becoming more context-aware. They're training on more diverse datasets, using better prompts, and adding retrieval-augmented generation. All of this is true, and none of it solves the fundamental problem.
You can't prompt-engineer your way out of structurally inconsistent data. An AI SOC analyst doesn't know that it should correlate the AssumeRole event from 30 minutes ago with the current RunInstances event – that's not a reasoning problem, it's a data pipeline problem.
The organizations getting value from AI security tools aren't the ones with the best prompts. They're the ones with modern data pipelines that normalize, enrich, and contextualize security telemetry before it reaches any analysis layer (human or AI).
This means:
- Connectors that understand each source's schema quirks
- Transformation logic that maps to common schemas without losing fidelity
- Enrichment that adds context from authoritative sources (IDPs, CMDBs, threat feeds)
- Entity resolution that maintains consistent identities across sources
If you're evaluating AI SOC analysts, evaluate your data pipeline first. The best AI analyst in the world can't overcome garbage data – it will just explain its hallucinations more convincingly.
Start with the data layer. The AI will work better when you do.