Role Definition
| Field | Value |
|---|---|
| Job Title | AI Safety Researcher |
| Seniority Level | Mid-Senior |
| Primary Function | Conducts original research in AI alignment, mechanistic interpretability, adversarial robustness, and scalable oversight at frontier AI labs. Designs novel safety techniques, publishes peer-reviewed research, red-teams frontier models, and develops methods to ensure advanced AI systems remain safe and controllable. This is pure research — designing the science of AI safety, not applying existing tools. |
| What This Role Is NOT | NOT an AI Security Engineer (who implements security controls for AI systems). NOT an ML Engineer (who builds production models). NOT an AI Governance Lead (who manages policy and compliance). NOT an applied researcher optimising model performance. |
| Typical Experience | 5-10+ years. PhD in ML, CS, mathematics, physics, or neuroscience typically required. Strong publication record at NeurIPS, ICML, ICLR. Prior work at frontier labs or safety-focused research organisations (MIRI, MATS, FAR.AI, Redwood Research). |
Seniority note: Junior safety researchers (post-PhD, 0-3 years) would still score Green but lower — more execution of established research agendas, less agenda-setting. The core work remains irreducibly human at all levels, but the Goal-Setting score drops from 3 to 2.
Protective Principles + AI Growth Correlation
| Principle | Score (0-3) | Rationale |
|---|---|---|
| Embodied Physicality | 0 | Fully digital. All work happens in compute environments, on whiteboards, and in research papers; no physical embodiment is required. |
| Deep Interpersonal Connection | 1 | Collaborative research with team members. Mentoring junior researchers. Some stakeholder communication on safety findings. But the core value is intellectual, not relational. |
| Goal-Setting & Moral Judgment | 3 | Defines what "safe AI" means. Sets research agendas for problems that have no precedent. Decides which alignment approaches to pursue and which to abandon. Every research direction is a judgment call about existential risk — there is no playbook to follow. |
| Protective Total | 4/9 | |
| AI Growth Correlation | 2 | Recursive dependency: more powerful AI → more alignment problems → more safety research needed. The role exists because AI is advancing. You cannot automate the work of ensuring AI is safe — that requires the kind of genuine novelty and moral reasoning that defines irreducibly human work. |
Quick screen result: Protective 4 + Correlation 2 = Likely Green Zone (Accelerated). Proceed to confirm.
Task Decomposition (Agentic AI Scoring)
| Task | Time % | Score (1-5) | Weighted | Aug/Disp | Rationale |
|---|---|---|---|---|---|
| Novel alignment & safety research (scalable oversight, model organisms of misalignment, reward hacking mitigation) | 30% | 1 | 0.30 | NOT INVOLVED | Irreducibly human. Inventing new safety techniques for unprecedented AI capabilities requires genuine novelty — no training data exists for problems that haven't been conceived yet. This is frontier science, not pattern-matching. |
| Mechanistic interpretability research (reverse-engineering neural networks, circuit analysis, feature mapping) | 20% | 1 | 0.20 | NOT INVOLVED | Irreducibly human. Understanding how neural networks think internally is analogous to neuroscience — it requires forming novel hypotheses about systems whose internal representations are poorly understood. AI cannot reverse-engineer itself in ways its creators haven't yet imagined. |
| Adversarial robustness & red-teaming (jailbreak research, specification gaming, adaptive defences) | 15% | 2 | 0.30 | AUGMENTATION | AI assists with executing known attack patterns and with automated fuzzing. But discovering novel attack surfaces on frontier models — the kind that break safety guarantees in unexpected ways — requires creative adversarial thinking that exceeds current AI capability. Human leads; AI accelerates sub-tasks. |
| Publishing, peer review & conference presentation (NeurIPS, ICML, ICLR papers) | 15% | 2 | 0.30 | AUGMENTATION | AI drafts sections, helps with literature reviews, and checks mathematical proofs. But the core intellectual contribution — the novel insight, the experimental design, the argumentation for why a safety approach works — is the researcher's. Academic peer review also requires human judgment about scientific merit. |
| AI control & monitoring system design (behavioural monitoring, activation monitoring, anomaly detection) | 10% | 2 | 0.20 | AUGMENTATION | AI assists with implementing monitoring systems and analysing activation patterns at scale. But designing what to monitor and why — choosing which internal representations signal deception or misalignment — requires theoretical understanding that the researcher provides. |
| Mentoring, collaboration & stakeholder communication | 10% | 1 | 0.10 | NOT INVOLVED | Training the next generation of safety researchers, collaborating across teams, communicating safety findings to leadership and policymakers. Human trust and intellectual mentorship cannot be delegated. |
| Total | 100% | | 1.40 | | |
Task Resistance Score: 6.00 - 1.40 = 4.60/5.0
Displacement/Augmentation split: 0% displacement, 40% augmentation, 60% not involved.
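These three figures can be reproduced directly from the task table. The sketch below is illustrative only: it assumes the weighted score is the time-weighted sum of the 1-5 task scores and that resistance is 6 minus that sum, as the line above implies. The variable names and Python packaging are not part of any AIJRI tooling.

```python
# Minimal sketch of the task-level arithmetic above (illustrative, not framework tooling).
# Each entry: (time share, automatability score 1-5, involvement label).
tasks = [
    (0.30, 1, "not involved"),   # novel alignment & safety research
    (0.20, 1, "not involved"),   # mechanistic interpretability
    (0.15, 2, "augmentation"),   # adversarial robustness & red-teaming
    (0.15, 2, "augmentation"),   # publishing & peer review
    (0.10, 2, "augmentation"),   # AI control & monitoring design
    (0.10, 1, "not involved"),   # mentoring & communication
]

weighted_score = sum(share * score for share, score, _ in tasks)   # 1.40
task_resistance = 6.0 - weighted_score                             # 4.60 on the 1-5 resistance scale

augmentation_share = sum(share for share, _, label in tasks if label == "augmentation")   # 40%
not_involved_share = sum(share for share, _, label in tasks if label == "not involved")   # 60%

print(f"Weighted task score: {weighted_score:.2f}")
print(f"Task resistance:     {task_resistance:.2f}/5.0")
print(f"Augmentation: {augmentation_share:.0%}, not involved: {not_involved_share:.0%}")
```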
Reinstatement check (Acemoglu): Strongly positive. AI creates entirely new research tasks for this role: mechanistic interpretability of novel architectures, alignment of agentic multi-model systems, safety evaluation of recursive self-improvement, machine unlearning techniques, multi-agent governance. The task portfolio expands with every capability advance. This role is not merely persisting — it is accelerating.
Evidence Score
| Dimension | Score (-2 to 2) | Evidence |
|---|---|---|
| Job Posting Trends | 2 | ~3,200 AI safety researcher postings with +78% YoY growth. Every frontier lab (Anthropic, OpenAI, DeepMind, Meta FAIR) actively hiring. Anthropic Fellows Program expanding to two 2026 cohorts. MATS Summer 2026 is the largest ever (120 fellows, 100 mentors). Google DeepMind's AGI Safety & Alignment Team posted dedicated hiring in Feb 2025. |
| Company Actions | 2 | All frontier labs expanding dedicated safety teams. Anthropic's Alignment Science team publishing recommended research directions (Feb 2025). 12 frontier AI companies published safety frameworks in 2025. No evidence of any company reducing safety headcount — the opposite. International AI Safety Report 2026 published Feb 3, reinforcing institutional commitment. |
| Wage Trends | 2 | $150K-$300K+ at frontier labs. Senior researchers at Anthropic/OpenAI/DeepMind commanding $200K-$400K+ total compensation. AI salary premium of 28% over equivalent traditional tech roles. Safety specialisation commands additional premium within AI. Wages surging faster than inflation. |
| AI Tool Maturity | 1 | AI assists with experiment infrastructure, automated evaluation, and scaling interpretability analysis. But novel safety research — inventing new alignment techniques, forming hypotheses about model cognition — has no viable AI replacement. The tools augment researcher productivity but cannot replace the creative research process. |
| Expert Consensus | 2 | Universal agreement. WEF ranks AI/ML specialists as the #1 fastest-growing role through 2030. Anthropic, OpenAI, and DeepMind leadership all publicly state safety research is their top priority. EU AI Act and US EO 14110 codify the need for safety research. Future of Life Institute AI Safety Index (2025) tracks growing global investment. |
| Total | 9 | |
Barrier Assessment
Reframed question: What prevents AI from executing these tasks even when it is programmatically possible?
| Barrier | Score (0-2) | Rationale |
|---|---|---|
| Regulatory/Licensing | 1 | No formal licensing, but a PhD is a de facto requirement. EU AI Act mandates human oversight for high-risk AI. US EO 14110 requires red-teaming by human researchers. These create structural demand but not a licensing barrier per se. |
| Physical Presence | 0 | Fully remote capable. Research is conducted computationally. |
| Union/Collective Bargaining | 0 | Tech sector, at-will employment. No collective bargaining protections. |
| Liability/Accountability | 1 | Growing accountability. Safety researcher sign-off on model safety has increasing legal weight under EU AI Act. If a frontier model causes harm due to inadequate safety research, accountability traces back to the safety team. Not yet at "someone goes to prison" level, but structurally increasing. |
| Cultural/Ethical | 1 | Society and regulators demand that humans — not AI — verify that AI systems are safe. The trust paradox: "can we trust AI to certify itself as safe?" is a core philosophical objection that creates structural demand for human safety researchers. However, this is a soft barrier, not a hard regulatory one. |
| Total | 3/10 | |
AI Growth Correlation Check
Confirmed at +2. This is the strongest possible position: the role has a recursive dependency on AI growth itself.
- Every advance in AI capability creates new alignment problems that require novel safety research.
- More powerful models are harder to interpret, creating more interpretability work.
- Agentic AI systems introduce multi-agent safety challenges that didn't exist two years ago.
- Regulatory frameworks (EU AI Act, US EO 14110) mandate safety research outputs as AI deployment scales.
- The "who watches the watchers?" problem is irreducible — AI cannot be trusted to certify its own safety.
This qualifies as Green Zone (Accelerated): AI Growth Correlation = 2 AND JobZone Score ≥ 48.
JobZone Composite Score (AIJRI)
| Input | Value |
|---|---|
| Task Resistance Score | 4.60/5.0 |
| Evidence Modifier | 1.0 + (9 × 0.04) = 1.36 |
| Barrier Modifier | 1.0 + (3 × 0.02) = 1.06 |
| Growth Modifier | 1.0 + (2 × 0.05) = 1.10 |
Raw: 4.60 × 1.36 × 1.06 × 1.10 = 7.2945
JobZone Score: (7.2945 - 0.54) / 7.93 × 100 = 85.2/100
Zone: GREEN (Green ≥48, Yellow 25-47, Red <25)
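For readers who want to verify the composite, a minimal sketch of the arithmetic follows. The constants (0.04, 0.02, 0.05, 0.54, 7.93), the zone thresholds, and the Accelerated criterion are taken from this section and the Sub-Label table below; the variable names and Python packaging are illustrative, not the framework's own implementation.

```python
# Minimal sketch of the AIJRI composite arithmetic (constants from this section).
task_resistance = 4.60       # from the task decomposition
evidence_total = 9           # evidence score total
barrier_total = 3            # barrier assessment total
growth_correlation = 2       # AI growth correlation

evidence_modifier = 1.0 + evidence_total * 0.04      # 1.36
barrier_modifier = 1.0 + barrier_total * 0.02        # 1.06
growth_modifier = 1.0 + growth_correlation * 0.05    # 1.10

raw = task_resistance * evidence_modifier * barrier_modifier * growth_modifier  # 7.2945
jobzone = (raw - 0.54) / 7.93 * 100                                             # 85.2

if jobzone >= 48:
    zone = "GREEN"
elif jobzone >= 25:
    zone = "YELLOW"
else:
    zone = "RED"

# Sub-label rule used in this assessment: Accelerated when correlation = 2 and score >= 48.
accelerated = growth_correlation == 2 and jobzone >= 48

print(f"Raw: {raw:.4f}  JobZone: {jobzone:.1f}/100  Zone: {zone}  Accelerated: {accelerated}")
```

Running this reproduces the figures above: Raw 7.2945, JobZone 85.2/100, Zone GREEN, Accelerated True.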
Sub-Label Determination
| Metric | Value |
|---|---|
| % of task time scoring 3+ | 0% |
| AI Growth Correlation | 2 |
| Sub-label | Green (Accelerated) — Growth Correlation = 2 AND JobZone Score ≥ 48 |
Assessor override: None — formula score accepted. At 85.2, this is the highest-scoring role in the project, which is warranted by the combination of irreducibly human task resistance (4.60) and strong recursive demand. The 4.60 Task Resistance — driven by 50% of time at Score 1 (irreducible) — reflects the genuine novelty required for frontier safety research.
Assessor Commentary
Score vs Reality Check
The 85.2 score is honest and the highest in the project. This is justified: AI safety research is the purest form of "irreducibly human intellectual work" in the AI domain. The task resistance (4.60) exceeds the CISO (4.25) and AI Security Engineer (4.15) because research into novel alignment techniques is more resistant to automation than applying security to AI systems. The barrier score (3/10) is modest — this role survives on the strength of its tasks and market demand, not on regulatory protection. That makes it a clean Green: no barrier dependency, no evidence masking.
What the Numbers Don't Capture
- Supply shortage confound. The surging wages and demand are partly driven by an extremely thin talent pool — perhaps a few hundred world-class alignment researchers globally. If university programmes and fellowship pipelines (MATS, Anthropic Fellows) succeed in scaling supply, wage premiums may compress even as the role remains Green. The work stays; the $300K+ premium may not.
- Field definition instability. "AI Safety Research" is a moving target. Five years ago it was theoretical alignment philosophy. Today it's empirical interpretability and red-teaming. The skills required shift faster than almost any other role. A researcher who doesn't continuously adapt their toolkit risks obsolescence even within a Green Zone role.
- Concentration risk. The majority of positions are at 5-6 frontier labs. If the industry consolidates or if AI development slows (a low-probability but non-zero scenario), the job market contracts dramatically. This role is less diversified across employers than CISO or AI Security Engineer.
- Function-spending vs people-spending. Frontier labs are investing heavily in safety infrastructure (automated evaluation, interpretability tooling) that could eventually reduce the number of researchers needed per safety insight, even as total safety investment grows.
Who Should Worry (and Who Shouldn't)
If you're publishing original research at frontier labs on alignment, interpretability, or adversarial robustness — you're in the strongest career position in the AI economy. Every capability advance creates more work for you. Your skills are globally scarce and regulatory demand is compounding on top of commercial demand.
If you're a junior safety researcher running established evaluation frameworks and benchmarks without contributing novel research — you're in a weaker position than the label suggests. The evaluation and benchmarking layer is where AI tooling will automate first. The researchers who survive long-term are those who design the safety techniques, not those who execute standardised tests.
The single biggest factor: originality of research contribution. The $300K+ roles go to researchers who define new safety problems and invent solutions to them. Running someone else's interpretability pipeline on a new model is useful work, but it's the first layer that automation will absorb.
What This Means
The role in 2028: AI Safety Researchers in 2028 will be tackling safety for increasingly autonomous multi-agent systems, recursive self-improvement, and AI systems that may exceed human performance in specific domains. Mechanistic interpretability will have matured from a nascent field to a core discipline. Automated evaluation tools will handle routine safety benchmarking, freeing researchers to focus on the hardest problems: ensuring alignment for systems whose capabilities are unprecedented.
Survival strategy:
- Maintain a frontier publication record. Papers at NeurIPS, ICML, ICLR on novel safety techniques are the primary career currency. The field moves fast — yesterday's alignment technique is tomorrow's baseline.
- Build deep expertise in one area while maintaining breadth. Specialise in interpretability, alignment theory, or adversarial robustness — but understand enough of the others to collaborate across the safety stack.
- Develop relationships across the safety ecosystem. The community is small. Cross-lab collaborations, conference presence, and mentoring junior researchers build the network that sustains a long career.
Timeline: This role strengthens over the next 10+ years. The driver is AI capability growth itself — more powerful systems require more sophisticated safety research. The only scenario where demand declines is if AI development slows or if AGI arrives and renders the question moot.