Role Definition
| Field | Value |
|---|---|
| Job Title | Cybersecurity Data Scientist |
| Seniority Level | Mid-level |
| Primary Function | Builds ML models for security applications -- malware classification, anomaly detection, phishing detection, user behaviour analytics (UEBA), and network traffic analysis. Performs exploratory data analysis on security telemetry (logs, PCAP, endpoint data), engineers features from threat data, trains and validates models, and communicates findings to SOC and threat intelligence teams. Works at security vendors (CrowdStrike, Darktrace, Palo Alto Networks) or enterprise SOCs with dedicated data science teams. |
| What This Role Is NOT | NOT an AI/ML Engineer -- Cybersecurity (69.2 Green Accelerated) who builds production ML pipelines, deploys models at scale, and architects MLOps infrastructure. The data scientist focuses on research, analysis, model prototyping, and statistical validation rather than production engineering. NOT a generic Data Scientist (19.0 Red) lacking security domain expertise. NOT a Threat Intelligence Analyst (30.4 Yellow) who consumes ML model outputs rather than building them. NOT a SOC Analyst who triages alerts generated by these models. |
| Typical Experience | 3-7 years. Typically 2-4 years in data science/statistics plus 1-3 years in cybersecurity domain. Python, scikit-learn, PyTorch/TensorFlow, Pandas, SQL. Security knowledge: MITRE ATT&CK, network protocols, malware families, log analysis. Common certs: Security+, CySA+, AWS ML Specialty. |
Seniority note: Junior (0-2 years) would score Yellow -- executing existing notebooks and running pre-built pipelines without designing novel detection models. Senior/Lead (8+ years) would score deeper Green with research agenda ownership and strategic influence over detection architecture.
Protective Principles + AI Growth Correlation
| Principle | Score (0-3) | Rationale |
|---|---|---|
| Embodied Physicality | 0 | Fully digital. All work in Jupyter notebooks, ML platforms, and security analytics environments. |
| Deep Interpersonal Connection | 0 | Primarily analytical. Collaborates with SOC and threat intel teams but core value is statistical and ML modelling capability, not relationships. |
| Goal-Setting & Moral Judgment | 2 | Makes consequential decisions about detection model design -- acceptable false positive/negative trade-offs, which threat categories to prioritise, how to handle adversarial evasion. Does not set organisational strategy (that is senior/CISO), but exercises significant domain-specific analytical judgment about what to model and how to validate it. |
| Protective Total | 2/9 | |
| AI Growth Correlation | 2 | Dual recursive demand: (1) more AI adoption generates more AI-powered attacks requiring ML-based detection, and (2) security vendors and enterprise SOCs invest in data science teams to build next-generation detection. Every new attack vector creates a new modelling problem. |
Quick screen result: Protective 2 + Correlation 2 = Likely Green Zone (Accelerated). Proceed to confirm.
Task Decomposition (Agentic AI Scoring)
| Task | Time % | Score (1-5) | Weighted | Aug/Disp | Rationale |
|---|---|---|---|---|---|
| Exploratory data analysis on security telemetry | 15% | 4 | 0.60 | DISPLACEMENT | Standard EDA (distributions, correlations, outlier identification) on security logs, PCAP, and endpoint data. AI agents now perform EDA end-to-end with minimal oversight. The security context adds some complexity but the analytical workflow is largely automatable. |
| Feature engineering from threat data | 15% | 3 | 0.45 | AUGMENTATION | Extracting meaningful features from raw security data (PE headers, API call sequences, network flow statistics, user session patterns). Requires domain knowledge of what attackers do and what distinguishes malicious from benign. AI handles routine feature extraction but the human designs novel features for emerging threats. |
| Build and validate ML models (malware classification, anomaly detection, UEBA, phishing) | 25% | 2 | 0.50 | AUGMENTATION | Core modelling work against adversarial data. Each security environment has unique baselines, threat profiles, and evasion patterns. Off-the-shelf AutoML produces unacceptable false positive rates in adversarial settings. The data scientist designs model architectures tuned to specific threat landscapes, validates against adversarial examples, and handles concept drift from evolving attacker TTPs. AI assists with hyperparameter tuning and architecture search but cannot independently design robust detection for novel threats. |
| Research novel detection techniques and threat landscape analysis | 15% | 1 | 0.15 | NOT INVOLVED | Evaluating emerging ML approaches (graph neural networks for lateral movement, transformers for log sequences, foundation models for security telemetry) and mapping them to specific detection problems. Genuine novelty -- the threat landscape evolves continuously and no automated system can independently determine which ML technique addresses which emerging attack pattern. |
| Statistical validation and model performance analysis | 10% | 3 | 0.30 | AUGMENTATION | A/B testing detection models, statistical significance testing, ROC/PR curve analysis, cross-validation design. AI tools handle computation but the data scientist sets evaluation criteria, determines acceptable performance thresholds in operational context, and decides when a model is production-ready given adversarial constraints. |
| Communicate findings to SOC/IR/threat intel teams | 10% | 2 | 0.20 | AUGMENTATION | Translating model outputs into actionable intelligence for security operations. Explaining what the model detects, its limitations, expected false positive rates, and how to interpret its alerts. Requires security domain knowledge and the ability to bridge data science and security operations. AI drafts summaries but the human provides context and operational judgment. |
| Automate detection workflows and integrate models with SIEM/SOAR | 10% | 3 | 0.30 | AUGMENTATION | Building data pipelines and integrating trained models into security platforms. SOAR and SIEM platforms handle structured integration, but designing the intelligence layer and ensuring model outputs drive correct automated responses requires human judgment about security context. Overlaps with ML engineering but at a lighter, more analytical level. |
| Total | 100% | | 2.50 | | |
Task Resistance Score: 6.00 - 2.50 = 3.50/5.0
Assessor adjustment to 3.55/5.0: The raw 3.50 slightly underweights the adversarial dimension. Unlike generic data science, where model performance improves monotonically, security ML faces intelligent adversaries who actively evade detection models. This cat-and-mouse dynamic adds a persistent human requirement that task-level scoring captures individually but that compounds across the full role. The adjustment is minimal (+0.05) and keeps the score below AI/ML Engineer -- Cybersecurity (3.80), where it belongs: the engineering role carries stronger production system responsibilities.
Displacement/Augmentation split: 15% displacement, 70% augmentation, 15% not involved.
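The weighted arithmetic behind the decomposition table can be reproduced with a short sketch (time shares and scores are taken directly from the table; the dictionary keys are illustrative shorthand, and 6.00 is the inversion constant from the resistance formula above):

```python
# Reproduce the task-resistance arithmetic from the decomposition table.
# Each entry: (time share, agentic AI score on the 1-5 scale).
tasks = {
    "eda": (0.15, 4),
    "feature_engineering": (0.15, 3),
    "model_building": (0.25, 2),
    "detection_research": (0.15, 1),
    "statistical_validation": (0.10, 3),
    "communication": (0.10, 2),
    "workflow_automation": (0.10, 3),
}

# Time-weighted automatability across the whole role.
weighted_total = sum(share * score for share, score in tasks.values())

# Resistance inverts the 1-5 automatability scale.
task_resistance = 6.00 - weighted_total

print(round(weighted_total, 2))   # 2.5
print(round(task_resistance, 2))  # 3.5
```

This confirms the raw 3.50 before the assessor's +0.05 adversarial adjustment.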
Reinstatement check (Acemoglu): Yes -- AI creates new tasks: designing detection models for AI-generated phishing, building classifiers for deepfake social engineering, developing UEBA models for AI agent behaviour monitoring, adversarial robustness testing for security ML models, and foundation model adaptation for security telemetry. The threat landscape expands with every AI capability advance.
Evidence Score
| Dimension | Score (-2 to 2) | Evidence |
|---|---|---|
| Job Posting Trends | 2 | AI/ML postings surged 163% YoY to 49,200 (Lightcast 2025). Cybersecurity at 457,000+ US openings (CyberSeek 2025). The intersection -- data scientists with security expertise -- is acutely scarce. ZipRecruiter lists average cybersecurity data scientist salary at $165K (March 2026), indicating strong employer demand. LinkedIn ranked AI engineering the #1 fastest-growing job title for 2026. |
| Company Actions | 2 | Every major security vendor employs data science teams: CrowdStrike (Falcon ML), SentinelOne (Purple AI), Darktrace (autonomous response), Palo Alto (Cortex XSIAM), Exabeam (UEBA), Securonix, Gurucul. Startups raising heavily for AI-powered security (Abnormal Security, Vectra AI). No evidence of role cuts -- vendors are expanding ML/DS teams. |
| Wage Trends | 1 | Cybersecurity data scientist average $120K-$165K mid-level (Salary.com, ZipRecruiter 2026). Intersection premium: cybersecurity salaries growing 4.7% YoY (Motion Recruitment 2026) plus an AI premium of 28% (HeroHunt). Growing above inflation, but not as steeply as pure ML engineering roles because the generic data science market is sending mixed signals. |
| AI Tool Maturity | 1 | AutoML handles standard classification/regression but security-domain models require adversarial robustness that off-the-shelf tools cannot provide. Attackers actively evade detection models -- AutoML trained on historical data cannot adapt to novel evasion techniques. Platforms (SageMaker, MLflow) automate pipeline operations but the data scientist designs what to build. Anthropic observed exposure: Data Scientists 46.05%, Information Security Analysts 48.59% -- mixed automated/augmented. |
| Expert Consensus | 1 | ISC2 2025: AI is top-5 cybersecurity skill. Cisco Talos: LLMs are "sidekicks" that "complement rather than replace." Gartner: 45% of cybersecurity tasks automatable by 2028 -- creates demand for those who build the automation. However, generic data science consensus is more cautious -- AutoML is compressing mid-level DS roles. The cybersecurity domain adds protection but does not fully escape the DS compression narrative. |
| Total | 7 | |
Barrier Assessment
Reframed question: What prevents AI execution even when programmatically possible?
| Barrier | Score (0-2) | Rationale |
|---|---|---|
| Regulatory/Licensing | 1 | No formal licensing. EU AI Act mandates human oversight for high-risk AI systems used in security monitoring of critical infrastructure. NIST AI RMF requires documented human-in-the-loop. Creates structural demand for qualified humans who understand model behaviour. |
| Physical Presence | 0 | Fully remote capable. |
| Union/Collective Bargaining | 0 | Tech sector, at-will employment. |
| Liability/Accountability | 1 | Detection models that miss threats cause real harm -- breaches, data loss, regulatory penalties. If a malware classifier fails to catch an intrusion, someone is accountable. EU AI Act assigns liability to providers of high-risk AI. Mid-level data scientists share accountability with leadership. |
| Cultural/Ethical | 1 | Organisations require human validation that security models are robust, unbiased, and not susceptible to adversarial manipulation. The stakes of false negatives (missed breaches) and false positives (operational disruption) demand human oversight of model decisions. |
| Total | 3/10 | |
AI Growth Correlation Check
Confirmed at 2. Dual recursive demand:
- AI growth drives attack growth: 82.6% of phishing emails now contain AI content (KnowBe4 2025). AI-generated malware, deepfake social engineering, and automated exploitation chains create new detection problems requiring new ML models.
- AI growth drives defence investment: Security vendors invest heavily in data science teams to build detection capabilities into their platforms. Every new AI deployment creates new attack surfaces requiring ML-based monitoring.
- The adversarial feedback loop: Unlike generic data science, security ML operates against adversaries who adapt to evade detection. This creates perpetual demand for human data scientists who can design models that stay ahead.
This qualifies as Green Zone (Accelerated): AI Growth Correlation = 2 AND AIJRI >= 48.
JobZone Composite Score (AIJRI)
| Input | Value |
|---|---|
| Task Resistance Score | 3.55/5.0 |
| Evidence Modifier | 1.0 + (7 x 0.04) = 1.28 |
| Barrier Modifier | 1.0 + (3 x 0.02) = 1.06 |
| Growth Modifier | 1.0 + (2 x 0.05) = 1.10 |
Raw: 3.55 x 1.28 x 1.06 x 1.10 = 5.2988
JobZone Score: (5.2988 - 0.54) / 7.93 x 100 = 60.0/100
Zone: GREEN (Green >=48, Yellow 25-47, Red <25)
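The composite calculation and zone thresholds above can be expressed as a small sketch (coefficients, the 0.54 offset, the 7.93 divisor, and the zone cut-offs are all taken from this document; function and argument names are illustrative):

```python
# AIJRI composite: task resistance scaled by evidence, barrier, and
# growth modifiers (0.04, 0.02, 0.05 per point), then normalised.
def aijri(task_resistance, evidence, barriers, growth_correlation):
    evidence_mod = 1.0 + evidence * 0.04
    barrier_mod = 1.0 + barriers * 0.02
    growth_mod = 1.0 + growth_correlation * 0.05
    raw = task_resistance * evidence_mod * barrier_mod * growth_mod
    return (raw - 0.54) / 7.93 * 100

def zone(score):
    # Zone boundaries: Green >= 48, Yellow 25-47, Red < 25.
    if score >= 48:
        return "GREEN"
    if score >= 25:
        return "YELLOW"
    return "RED"

score = aijri(3.55, evidence=7, barriers=3, growth_correlation=2)
print(round(score, 1), zone(score))  # 60.0 GREEN
```

Plugging in this role's inputs (3.55, +7, 3, 2) reproduces the 60.0 formula score before the assessor's display adjustment.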
Sub-Label Determination
| Metric | Value |
|---|---|
| % of task time scoring 3+ | 50% |
| AI Growth Correlation | 2 |
| Sub-label | Green (Accelerated) -- Growth Correlation = 2 AND AIJRI >= 48 |
Assessor override: Formula score 60.0 accepted as the baseline. Final display adjusted to 60.7 (+0.7) to reflect the adversarial ML dimension that compounds across tasks beyond what individual task scoring captures. This role sits logically between Cyber Security Researcher (52.6) and AI/ML Engineer -- Cybersecurity (69.2): the data scientist is more analytical and less production-engineering focused than the ML engineer, warranting the gap. No zone boundary is affected.
Assessor Commentary
Score vs Reality Check
The 60.7 AIJRI is well-calibrated within the cybersecurity domain. It sits 8.5 points below AI/ML Engineer -- Cybersecurity (69.2) -- correct, because the ML engineer owns production pipeline architecture and deployment, adding structural task resistance the data scientist lacks. It sits above Cyber Security Researcher (52.6) because the data scientist's direct model-building for threat detection carries stronger evidence and growth correlation than pure research. Compared to generic Data Scientist (19.0 Red), the cybersecurity specialisation adds 41.7 points through the adversarial domain (including the +0.05 task-resistance adjustment), strongly positive evidence (+7 vs an implied negative), and AI Growth Correlation (+2 vs 0). No borderline risk -- 12.7 points above the Green threshold.
What the Numbers Don't Capture
- Supply shortage confound. The intersection of data science and cybersecurity is exceptionally rare -- most data scientists lack security domain knowledge and most security professionals lack statistical modelling depth. Premium salaries partly reflect scarcity rather than structural protection. If cross-training programmes close the gap, wage premiums could compress while the role itself remains Green.
- AutoML compression on the data science side. The generic data science market is under severe pressure from AutoML. The cybersecurity domain adds protection through adversarial complexity, but the EDA and standard modelling portions (15% displacement) are vulnerable to the same AutoML tools compressing generic DS roles. The adversarial moat must hold for the score to remain valid.
- Title rotation risk. "Cybersecurity Data Scientist" may not persist as a distinct title. As ML becomes embedded in security platforms, this work could fold into "Detection Engineer," "Security Researcher," or "ML Engineer" titles. The work persists; the title and its distinct premium may not.
- Function-spending vs people-spending. Security vendors invest in ML capability, but increasingly build it into their platforms. Enterprise teams that once hired in-house cybersecurity data scientists may instead consume vendor-built ML models, reducing the total addressable headcount outside vendor R&D teams.
Who Should Worry (and Who Shouldn't)
If you are building custom ML models for novel threat detection -- designing malware classifiers that resist evasion, building UEBA models that detect lateral movement in unique environments, or developing detection for AI-generated phishing -- you are in a strong position. The adversarial dimension of your work means AutoML cannot replace you, and both AI growth and cybersecurity growth feed your demand.
If your work is primarily running pre-built notebooks on vendor-supplied security datasets, tuning hyperparameters on existing models, or performing routine EDA on SIEM logs -- your risk profile is closer to generic Data Scientist (Red Zone). Platform vendors are automating this layer into their products.
The single biggest factor: whether you design novel detection models or operate existing ones. Building models that resist active evasion by human attackers is the moat. Running existing analytics pipelines is not.
What This Means
The role in 2028: The cybersecurity data scientist will focus on building detection systems for AI-powered attacks (deepfake social engineering, AI-generated malware variants, automated exploitation chains), developing UEBA models for AI agent behaviour monitoring, and designing adversarial robustness frameworks. Foundation models adapted for security telemetry will be standard tooling. EDA and standard modelling shrink further as AI agents handle these. The role becomes more specialised and more adversarial.
Survival strategy:
- Master adversarial ML. Adversarial examples, evasion attacks, model poisoning, concept drift in security contexts -- this is the moat AutoML cannot cross. It separates this role from generic data science.
- Build deep security domain expertise. MITRE ATT&CK fluency, threat intelligence integration, understanding of attacker TTPs. The $165K+ roles go to data scientists who understand both the models and the threats.
- Move toward LLM and agentic AI security applications. AI agent behaviour monitoring, LLM-powered threat analysis, foundation model adaptation for security -- these are the frontier applications where demand is accelerating.
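The evasion dynamic that makes adversarial ML the moat can be made concrete with a deliberately minimal sketch. Everything here is synthetic and hypothetical -- a single entropy feature and a static threshold, nothing like a production detector -- but it shows why validation must include perturbed inputs, not just held-out historical data:

```python
# Toy illustration of evasion against a static detector.
# Synthetic numbers only -- not a real malware feature set.
def detector(entropy, threshold=7.0):
    """Flag a sample as malicious when payload entropy is high.
    A fixed rule like this is trivial for an adversary to probe."""
    return entropy >= threshold

# Synthetic "malicious" samples: packed payloads with high entropy.
malicious = [7.8, 7.5, 7.2, 7.9, 7.4]
baseline_detections = sum(detector(e) for e in malicious)

# Evasion: the attacker pads payloads with low-entropy filler,
# pulling the measured entropy just under the threshold.
evaded = [e - 1.0 for e in malicious]
evaded_detections = sum(detector(e) for e in evaded)

print(baseline_detections, evaded_detections)  # 5 0
```

Detection collapses from 5/5 to 0/5 under a trivial perturbation. Real detectors use far richer feature sets, which is precisely why adversarial robustness testing -- perturbing inputs and checking for decision flips -- belongs in the validation loop rather than being left to AutoML.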
Timeline: Role strengthens over the next 5-10+ years. Dual growth drivers (AI adoption and cybersecurity threat expansion) create compounding demand. Those who maintain adversarial ML expertise and deep domain knowledge are well-positioned.