1. Introduction
The rapid advancement of large language models, multi-modal AI systems, and agentic software architectures has fundamentally altered the calculus of occupational risk. Unlike previous waves of automation — which targeted routine manual and cognitive tasks — current AI systems demonstrate competence across creative, analytical, and interpersonal domains that were previously considered resistant to machine substitution [5].
Existing frameworks for assessing occupational exposure to AI tend to fall into two categories: broad-brush probability estimates [8] that lack the granularity needed for individual career planning, and task-level decompositions [6] that catalogue exposure without weighting for real-world adoption barriers. Neither approach produces a single, interpretable risk score that accounts for both technical capability and deployment likelihood.
This paper introduces the AI Job Resistance Index (AIJRI), a composite scoring methodology designed to address this gap. AIJRI decomposes occupations into weighted tasks, scores each for resistance to agentic AI, then applies evidence-informed modifiers capturing deployment signals from the labour market, industry adoption patterns, and regulatory environments. The result is a normalised 0–100 score that maps to actionable risk zones.
Every assessment is scoped to a specific role at a specific seniority level. “Software Engineer” is not assessed — “Junior Software Developer (0–2 years)” and “Senior Software Engineer (7+ years)” are assessed separately. They land in different zones.
1.1 Key Assumptions
Every scoring framework rests on assumptions. AIJRI is explicit about its own:
- Sub-AGI AI. The methodology assumes increasingly capable, agentic AI — not artificial general intelligence (AGI). If AGI arrives, the question changes from “what can AI do?” to “what should AI be allowed to do?” and the entire framework would need reconsidering.
- 3–5 year horizon. Scores reflect near-to-medium term displacement risk based on current AI capability trajectories. They are not predictions about 2035 or beyond.
- Occupations, not individuals. AIJRI assesses roles as categories. A highly skilled practitioner in a Red-zone role may possess niche expertise not captured by the aggregate score. Conversely, a weak practitioner in a Green-zone role may face more personal risk than the score suggests.
- Western labour markets. Evidence sources are predominantly US/UK-centric (BLS, O*NET, Indeed, LinkedIn). Scores may not transfer to markets with different regulatory frameworks, union structures, or technology adoption rates. The core AI capability assessment (can AI do these tasks?) is largely geography-independent, but the modifiers — especially barriers and evidence — reflect Western conditions.
- Current regulatory environment. AIJRI models existing barriers (licensing, regulation, liability frameworks) but does not predict future AI legislation. Material regulatory changes (EU AI Act enforcement, potential US regulation) could shift scores significantly.
- No robotics discontinuity. Physical trades score highly partly because humanoid robotics has not achieved dexterity in unstructured environments. This is explicitly temporal — if embodied AI achieves breakthrough capability, trades scores would require revision.
- Adaptation is variable. Many Yellow-zone assessments carry an implicit assumption about whether the practitioner adapts to AI tools. A worker who actively integrates AI into their workflow faces less risk than one who resists. Scores generally assume a typical level of adaptation for the role’s seniority level.
- Task decomposition is representative. Each role is broken into 5–10 weighted constituent tasks. Different employers may structure the same role title differently, shifting which tasks dominate and potentially changing the score.
- Current AI capability trajectory continues. Scores assume continued incremental progress in current AI modalities (language models, vision, code generation, agentic orchestration). A discontinuous capability leap — or a sustained plateau — would invalidate many assessments.
- Displacement, not creation. AIJRI models displacement of existing roles. It does not model the creation of new roles that emerge because of AI deployment [1]. This means the framework has an inherently conservative bias — it captures the risk side of the ledger, not the opportunity side.
These assumptions are revisited in the Limitations section, where their practical consequences for score interpretation are discussed.
2. Theoretical Foundations & Related Work
2.1 Task-Based Approaches
The task-based approach to occupational analysis originates with Autor, Levy, and Murnane’s framework for categorising routine versus non-routine work [2]. This was subsequently extended by Frey and Osborne, who estimated the probability of computerisation for 702 occupations using a Gaussian process classifier trained on O*NET task descriptions [8]. While influential, the Frey-Osborne model predates the emergence of large language models and does not account for the non-routine cognitive tasks now demonstrably performed by AI systems.
More recent work by Eloundou et al. introduced the concept of “exposure” to LLMs at the task level [6], distinguishing between direct exposure and LLM-augmented exposure. AIJRI builds on this decomposition but adds a critical dimension: evidence of real-world deployment.
Massenkoff and McCrory [18] extended this by introducing Observed Exposure — a measure combining Eloundou et al.’s theoretical capability scores with actual AI usage data from Anthropic’s Claude platform. Their findings provide direct empirical validation of AIJRI’s approach: the four occupations with highest observed exposure (computer programmers 74.5%, customer service representatives 70.1%, data entry keyers 67.7%, medical records specialists 66.7%) all fall in AIJRI’s Red Zone. Critically, they quantify the gap between theoretical capability and actual deployment — Computer & Math occupations are 94% theoretically exposed but only 33% covered in practice — confirming that task analysis alone systematically overstates displacement risk. Their weighting of fully automated use at 100% and augmentative use at 50% mirrors AIJRI’s displacement vs augmentation classification in Step 2.
2.2 The Harvard Critique
Lichtinger and Hosseini Maasoum [10] argue that pure task analysis is speculative without market evidence. The fundamental problem: technical capability ≠ actual deployment. An AI might be able to do a task but never be permitted or deployed to do it. AIJRI’s Evidence Score (Step 3) and Barrier Assessment (Step 4) directly address this critique.
2.3 Augmentation vs Displacement
Acemoglu and Restrepo [1] model the tension between automation (displacing workers from existing tasks) and the creation of new tasks where labour has a comparative advantage. Brynjolfsson et al. [3][4] demonstrate that “exposure ≠ displacement” — high-exposure, high-complementarity roles saw increased labour demand [17].
The nurse paradox illustrates this: registered nurses are among the earliest and most active healthcare AI adopters (ambient AI documentation, clinical decision support, predictive analytics), yet nursing is one of the most AI-resistant roles in the economy (AIJRI 82.2, Green Stable). In AIJRI, every task is classified as DISPLACEMENT, AUGMENTATION, or NOT INVOLVED — not just “exposed.”
2.4 Seniority Divergence
Stanford research [4] found that employment of workers aged 22–25 in AI-exposed roles fell 13% since 2022, while employment of older workers in the same occupations grew 6–9%. Harvard [10] frames this as “seniority-biased technological change.”
| Job Family | Junior / Entry Level | Senior / Strategic Level |
|---|---|---|
| Software Developer | RED (9.3) | GREEN (55.4) |
| Accountant | RED (bookkeeper) | YELLOW (advisory, 47.3) |
| SOC Analyst | RED IMMINENT (T1, 5.4) | GREEN (CISO, 83.0) |
| Cloud | YELLOW (cloud engineer, 25.3) | GREEN (security architect, 62.7) |
2.5 Scoring Model Design
The UN Human Development Index switched from arithmetic mean to geometric mean in 2010 [13] to reduce substitutability between dimensions — preventing high income from fully compensating for poor health or education. CVSS [7] uses multiplicative elements within sub-score calculations, creating a non-compensatory structure where weakness in key dimensions dominates the output. This property — where weakness cannot be hidden by strength elsewhere — inspired AIJRI’s fully multiplicative composite.
3. Data Sources & Collection Infrastructure
The infrastructure was purpose-built to ground every assessment in verifiable market evidence rather than theoretical speculation.
| Source Category | Primary Data | What It Captures |
|---|---|---|
| Labour market | Indeed, LinkedIn, Glassdoor, Google Jobs, USAJobs, Reed | Job posting trends, salary data, skill requirements |
| Company actions | 79 curated RSS feeds, earnings calls, press releases | Hiring/layoff decisions citing AI, restructuring signals |
| AI tool maturity | Systematic product review, vendor documentation, Anthropic Economic Index [18] | Commercial AI tools targeting each role’s core tasks, cross-referenced with observed AI usage data at the task level |
| Academic research | Semantic Scholar (220M+ papers), industry reports | Expert consensus on displacement/augmentation |
| Regulatory | Licensing boards, EU AI Act, union agreements | Structural barriers to AI deployment |
| Occupational data | BLS, O*NET, Glassdoor salary percentiles | Baseline employment, task descriptions, wages |
Trust hierarchy (strict order):
1. Official sources (BLS, O*NET, government agencies)
2. Platform data (direct API returns from job boards)
3. Curated research (peer-reviewed papers)
4. Web intelligence (news, earnings calls; always validated against higher tiers)
3.1 Research Infrastructure
Evidence is gathered through a purpose-built multi-engine research system that runs parallel queries across multiple search and analysis platforms. Each assessment draws from some or all of the following tiers, depending on the complexity and sensitivity of the role:
| Tier | Engines | What It Provides |
|---|---|---|
| Quick | Web search + BrightData SERP | Fact checking, headline verification, breaking news |
| Wide | Perplexity Sonar + Gemini Flash + Tavily | Three independent search engines run in parallel for cross-validated general research |
| Deep | Perplexity Sonar Pro + Gemini Pro + Tavily Advanced | Upgraded models for complex multi-part queries requiring synthesis |
| Ultra | Perplexity Deep Research and/or Gemini Deep Research Agent | Agentic research systems that autonomously plan and execute multi-step investigations across 100+ sources over 2–5 minutes |
| Scholarly | Semantic Scholar (220M+ papers) | Peer-reviewed academic papers, citation networks, expert consensus |
| Labour market | Indeed, LinkedIn, Glassdoor, Google Jobs, USAJobs, Reed (via BrightData APIs) | Job posting counts, salary data, skill requirements, hiring trends across 7 countries |
| News & signals | 79 curated RSS feeds + BrightData scraping | Company announcements, earnings calls, restructuring signals, layoffs citing AI |
Multiple engines querying the same evidence domain reduces single-source bias. When Perplexity and Gemini return conflicting findings, the assessor investigates the discrepancy rather than averaging it — disagreement between engines is a signal, not noise.
3.2 Domain Research Layer
To prevent version drift (e.g., citing a workforce gap as “3.5M” in one assessment and “4.8M” in another), shared domain-level evidence is captured once per domain and refreshed on a 30-day cycle. Each of the 25 occupational domains maintains a research file covering: workforce and market data, salary benchmarks, production-deployed AI tools, expert consensus, regulatory landscape, and calibration anchors. Role-specific assessments inherit this baseline and only research what the domain file doesn’t cover.
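The 30-day refresh cycle reduces to a simple staleness check. A minimal sketch (the function name and dates are illustrative, not taken from the production system):

```python
from datetime import date, timedelta

def needs_refresh(last_refreshed: date, today: date, cycle_days: int = 30) -> bool:
    """Return True when a domain research file is older than the refresh cycle."""
    return (today - last_refreshed) > timedelta(days=cycle_days)

print(needs_refresh(date(2025, 1, 1), date(2025, 2, 5)))  # True (35 days old)
```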
3.3 Computational Verification
All numerical calculations — weighted task scores, evidence totals, barrier sums, and the final composite formula — are computed programmatically rather than by hand. The assessor enters raw scores into a calculation script that produces the modifiers, raw composite, normalised JobZone Score, and zone classification. This eliminates arithmetic errors and ensures every published score is reproducible from its inputs. The interactive calculator in Section 5.4 below implements the same formula for independent verification.
3.4 Evidence Validation — The TruthSeeker Protocol
Every factual claim cited in an assessment — layoff announcements, AI tool capabilities, workforce statistics, expert predictions — is subject to a structured fact-checking protocol adapted from professional journalism and intelligence analysis standards. The system applies methodologies from the IFCN (International Fact-Checking Network), the SIFT method (Stop, Investigate, Find, Trace) developed by Mike Caulfield, and Analysis of Competing Hypotheses (ACH) from intelligence tradecraft.
The protocol operates in 12 phases:
- Phase 0–1: Emotional check & classification. Is the claim factual, opinion, satire, or prediction? Is it designed to manipulate? Claims that trigger emotional reactions receive extra scrutiny.
- Phase 2–3: Existing fact-checkers & lateral reading. Check Snopes, PolitiFact, Reuters, and AP first. Then verify the source’s credibility — not just what they said, but what others say about them.
- Phase 4–6: Primary source hunt, competing hypotheses, steelmanning. Trace to the original document or data. Generate alternative explanations (ACH). Find the best evidence FOR the claim before testing it — avoiding confirmation bias.
- Phase 7–8: Multi-source verification & source authentication. Cross-validate across independent engines. Apply a 6-tier source credibility hierarchy from primary sources (Tier 1) to low-credibility outlets (Tier 6). Context gathering guards against claims that are technically true but misleading.
- Phase 9–10: Visual verification & manipulation detection. Reverse image search for visual claims. Scan for weasel words, emotional language, cherry-picking, and false dichotomies.
- Phase 11: Conflict resolution & self-review. When sources disagree, weight by credibility × independence × recency × methodology. A 22-point pre-publication checklist must pass before any claim is accepted into an assessment.
Crucially, when multiple search engines return the same underlying article, that counts as one source seen through multiple engines — not independent corroboration. True corroboration requires different organisations reaching the same conclusion independently. This distinction prevents the illusion of consensus from search result duplication.
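The one-source-through-many-engines rule can be approximated by deduplicating results on the publishing organisation before counting corroboration. A minimal sketch, assuming the publisher can be approximated by the URL's domain (the helper name and URLs are hypothetical; real publisher resolution would need more than domain matching):

```python
from urllib.parse import urlparse

def independent_sources(result_urls):
    """Count independent corroborating sources, treating the same publisher
    seen through multiple search engines as ONE source."""
    publishers = set()
    for url in result_urls:
        host = urlparse(url).netloc.lower()
        # Strip a leading "www." so www.reuters.com == reuters.com
        publishers.add(host.removeprefix("www."))
    return len(publishers)

results = [
    "https://www.reuters.com/tech/ai-layoffs",  # surfaced via Perplexity
    "https://reuters.com/tech/ai-layoffs",      # same article via Gemini
    "https://apnews.com/article/ai-layoffs",    # genuinely independent outlet
]
print(independent_sources(results))  # 2, not 3
```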
Each verified claim receives a confidence score (0–100) and a structured verdict (TRUE, MOSTLY_TRUE, MIXED, MOSTLY_FALSE, FALSE, UNPROVEN, OUTDATED, among others). Claims that cannot be verified to a high confidence threshold are either excluded from the assessment or explicitly flagged as low-confidence evidence.
3.5 Taxonomy Cross-Validation
To verify completeness of occupational coverage, the AIJRI role database is cross-referenced against five international job classification systems:
| Taxonomy | Jurisdiction | Occupations | Matched |
|---|---|---|---|
| O*NET | United States (Dept of Labor) | 1,016 | 100% |
| ESCO | European Union (European Commission) | 3,043 | 100% |
| UK SOC 2020 | United Kingdom (ONS) | 412 | 100% |
| NOC | Canada (Statistics Canada) | 516 | 100% |
| ANZSCO | Australia & New Zealand (ABS) | 1,440 | 100% |
| Total | — | 6,427 | 100% |
Matching uses a three-layer approach: exact slug matching against assessment filenames, alias matching against 8,000+ registered title aliases, and keyword overlap analysis using extracted term sets. Every occupation in every taxonomy either maps directly to an assessed role or to a registered alias that resolves to one.
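The three-layer matching could be sketched as follows. All index structures, the Jaccard overlap measure, and the 0.5 threshold are illustrative assumptions, not the published implementation:

```python
def match_occupation(title, slug_index, alias_index, keyword_index):
    """Three-layer matching sketch: exact slug, then alias, then keyword overlap."""
    slug = title.lower().replace(" ", "-")
    if slug in slug_index:            # Layer 1: exact slug match
        return slug_index[slug]
    if title.lower() in alias_index:  # Layer 2: registered title alias
        return alias_index[title.lower()]
    # Layer 3: keyword overlap (Jaccard similarity) against each role's term set
    terms = set(title.lower().split())
    best, best_score = None, 0.0
    for role, role_terms in keyword_index.items():
        overlap = len(terms & role_terms) / len(terms | role_terms)
        if overlap > best_score:
            best, best_score = role, overlap
    return best if best_score >= 0.5 else None  # threshold is an assumption

# Tiny hypothetical indexes
slug_index = {"software-engineer": "software-engineer"}
alias_index = {"programmer": "software-engineer"}
keyword_index = {"software-engineer": {"software", "engineer"}}

print(match_occupation("Software Engineer", slug_index, alias_index, keyword_index))
print(match_occupation("Programmer", slug_index, alias_index, keyword_index))
```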
4. The AIJRI Methodology — The 8-Step Assessment
Each occupation is assessed through a standardised 8-step pipeline. The assessment is always scoped to a specific role at a specific seniority level.
The 8-Step AIJRI Assessment Pipeline

| Step | Stage | Layer |
|---|---|---|
| 0 | Define | Scoping |
| 1 | Screen | Scoping |
| 2 | Tasks | Theoretical |
| 3 | Evidence | Empirical |
| 4 | Barriers | Empirical |
| 5 | Growth | Empirical |
| 6 | Composite | Calculation |
| 7 | Commentary | Judgment |

Steps 2–5 feed into Step 6 as the composite formula inputs.
Step 0: Role Definition
The assessment begins with a precise role definition: job title, seniority level, primary function, what the role is NOT, and typical experience. If an occupation exists at multiple seniority levels, each is assessed separately.
Step 1: Quick Screen
Three Protective Principles (Embodied Physicality, Deep Interpersonal Connection, Goal-Setting & Moral Judgment) scored 0–3 each, plus an AI Growth Correlation scored −2 to +2. The quick screen provides rapid triage but does NOT determine the zone.
Step 2: Task Decomposition (Theoretical Layer)
The role is decomposed into 5–10 constituent tasks, each weighted by percentage of total role time. Tasks are scored for agentic AI capability — not “can AI assist?” but “can an AI agent execute this entire workflow without a human?”
| Score | Automation Potential | Criteria |
|---|---|---|
| 1 | Irreducible Human | Protected by legal accountability, ethical judgment, trust/relationship, regulatory mandate |
| 2 | Barrier-Protected | Requires licensed professional judgment, high-stakes oversight, strategic thinking |
| 3 | Human-Led, AI-Accelerated | AI agents handle sub-workflows but a human leads, directs, and validates |
| 4 | Agent-Executable | Multi-step workflows an AI agent can execute end-to-end with minimal oversight |
| 5 | Fully Automatable | Deterministic, rule-based, or pattern-matching. AI agents already perform at scale. |
Task Resistance Score (TRS): TRS = 6.0 − Weighted Total, where Weighted Total is the time-weighted mean automation score. The inversion converts automation potential into resistance: higher scores on the task rubric (more automatable) produce a lower TRS (less resistant). Range: 1.0–5.0.
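The Step 2 arithmetic can be checked programmatically. A minimal sketch using the SOC Analyst Tier 1 decomposition from Section 5.4 (the function name is illustrative; the 1.20 result is the raw TRS before the assessor adjustment to 1.55 shown there):

```python
def task_resistance_score(tasks):
    """tasks: list of (time_weight_fraction, automation_score 1-5).
    Weighted Total is the time-weighted mean automation score;
    TRS = 6.0 - Weighted Total inverts it into resistance (range 1.0-5.0)."""
    assert abs(sum(w for w, _ in tasks) - 1.0) < 1e-9, "weights must sum to 100%"
    weighted_total = sum(w * s for w, s in tasks)
    return 6.0 - weighted_total

# SOC Analyst Tier 1: monitor 30%/5, triage 25%/5, playbooks 20%/5,
# tickets 15%/5, escalation 10%/3
soc_t1 = [(0.30, 5), (0.25, 5), (0.20, 5), (0.15, 5), (0.10, 3)]
print(round(task_resistance_score(soc_t1), 2))  # 1.2
```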
Step 3: Evidence Score (Real-World Layer)
Five evidence dimensions, each scored −2 to +2:
| Dimension | What It Measures |
|---|---|
| Job Posting Trends | YoY change in role-specific postings |
| Company Actions | Hiring/firing decisions explicitly citing AI |
| Wage Trends | Real-terms wage movement (inflation-adjusted) |
| AI Tool Maturity | Commercial AI products targeting core tasks, corroborated by Anthropic’s observed exposure metric [18] |
| Expert Consensus | Agreement across academics, analysts, practitioners |
Evidence Score: Sum of all five. Range: −10 to +10. Each dimension has published calibration thresholds with quantitative anchors.
Step 4: Barrier Assessment
Not “what slows automation down” but what prevents AI execution even when programmatically possible. Five structural barriers, each scored 0–2:
- Regulatory / Licensing: professional licensing, EU AI Act high-risk mandates
- Physical Presence: human body required in unstructured environments
- Union / Collective Bargaining: collective agreements, AI-specific protections
- Liability / Accountability: personal legal liability AI cannot assume
- Cultural / Trust: societal resistance to AI in health, freedom, education, care

Barrier Score: Sum of all five. Range: 0–10.
Step 5: AI Growth Correlation Check
Revisit and confirm/revise the Growth Correlation from Step 1, informed by evidence from Steps 3–4. This determines whether the role is Accelerated (demand grows with AI), Stable, or Transforming.
Step 6: Composite Scoring
Covered in full in Section 5.
Step 7: Assessor Commentary — The Honest Check
Three required sub-sections: Score vs Reality Check (does the zone label match the full picture?), What the Numbers Don’t Capture (structured blind-spot check), and Who Should Worry (and Who Shouldn’t) (plain language guidance).
5. The Composite Scoring Model
5.1 Design Rationale
The v3 composite is multiplicative and non-compensatory: all four dimensions contribute, and weakness in any dimension drags the composite down proportionally. A role with high task resistance but collapsing market evidence should NOT score Green. The market has spoken.
The AIJRI Composite Formula

Raw = TRS × Emod × Bmod × Gmod

| Term | Definition | Range |
|---|---|---|
| TRS | Task Resistance Score | 1.0–5.0 |
| Emod | 1.0 + (Evidence × 0.04) | 0.60–1.40 |
| Bmod | 1.0 + (Barriers × 0.02) | 1.00–1.20 |
| Gmod | 1.0 + (Growth × 0.05) | 0.90–1.10 |

Normalisation: JobZone Score = (Raw − 0.54) / 7.93 × 100, capped at 0 and 100.
5.2 Modifier Behaviour
| Modifier | Coefficient | Range | Design Intent |
|---|---|---|---|
| Evidence | 0.04/point | 0.60–1.40 | Most powerful. Market reality is the ultimate arbiter. ±40%. |
| Barriers | 0.02/point | 1.00–1.20 | Can only help, never hurt. Absence of barriers doesn’t accelerate displacement. +20% max. |
| Growth | 0.05/point | 0.90–1.10 | Modest trajectory signal. Strong enough to tip borderline cases. ±10%. |
Coefficient derivation. The coefficients were set through iterative calibration against known occupations with well-understood AI exposure profiles (e.g., SOC Analyst T1, CISO, Electrician, Receptionist). Starting from equal weighting, each coefficient was adjusted until the model produced zone classifications that matched expert consensus for benchmark roles. This is calibration by inspection — standard practice in composite scoring frameworks (CVSS v3.1 base metrics were similarly set by expert working groups, not empirical derivation). We publish the exact coefficients and invite sensitivity analysis: readers can use the interactive calculator below to test how coefficient changes affect outputs.
Methodological honesty. AIJRI is a structured expert assessment methodology, not an empirical predictive model. The task resistance score (the base) reflects expert judgment about AI capability against specific tasks — informed by evidence, but fundamentally a human assessment. The modifiers then adjust this base using empirical market signals. We describe the modifiers as “evidence-informed” rather than “evidence-based” because the underlying task scoring remains a judgment call. The formula provides mathematical discipline and transparency, not mathematical certainty.
5.3 Normalisation
The normalisation uses the v3.0 range. Minimum (0.54) = TRS 1.0 × Emod 0.6 × Bmod 1.0 × Gmod 0.9. Denominator (7.93) = v3.0 theoretical max (8.47) minus minimum (0.54). The v3.1 barrier coefficient recalibration raised the theoretical maximum to 9.24, but the denominator is deliberately not recalculated — doing so would absorb the barrier boost and no role would actually benefit. Scores above 100 are capped. Practical score range: real-world assessments produce scores from approximately 3 to 83.
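The full composite and normalisation reduce to a few lines of code. This sketch (function name illustrative) reproduces the worked examples in Section 5.4:

```python
def jobzone_score(trs, evidence, barriers, growth):
    """AIJRI v3.1 composite: multiplicative and non-compensatory.
    trs in 1.0-5.0, evidence in -10..+10, barriers in 0..10, growth in -2..+2."""
    e_mod = 1.0 + evidence * 0.04   # 0.60-1.40
    b_mod = 1.0 + barriers * 0.02   # 1.00-1.20
    g_mod = 1.0 + growth * 0.05     # 0.90-1.10
    raw = trs * e_mod * b_mod * g_mod
    # v3.0 normalisation constants, deliberately retained in v3.1
    score = (raw - 0.54) / 7.93 * 100
    return max(0.0, min(100.0, score))  # cap at 0 and 100

print(round(jobzone_score(1.55, -8, 1, -2), 1))  # 5.4  (SOC Analyst T1)
print(round(jobzone_score(3.75, -3, 2, -1), 1))  # 34.3 (Engineering Manager)
print(round(jobzone_score(4.10, 10, 9, 1), 1))   # 82.9 (Electrician)
```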
5.4 Worked Examples
Three real assessments illustrate how the composite model behaves across the spectrum — a Red Imminent role where every modifier compounds downward, a Yellow Moderate role where resistant tasks meet a declining market, and a Green Stable role where all modifiers reinforce the base.
Example 1: SOC Analyst Tier 1 (Entry-Level)
Full pipeline walkthrough — Red Imminent
Step 2: Task Decomposition
| Task | Time | Score | Wtd |
|---|---|---|---|
| Monitor alerts/dashboards | 30% | 5 | 1.50 |
| Triage alerts (true/false positive) | 25% | 5 | 1.25 |
| Follow incident playbooks | 20% | 5 | 1.00 |
| Write/update tickets | 15% | 5 | 0.75 |
| Escalate to L2/L3 | 10% | 3 | 0.30 |
Raw TRS: 6.00 − 4.80 = 1.20 → adjusted to 1.55/5.0
Steps 3–5: Modifiers
| Input | Raw | Modifier |
|---|---|---|
| Evidence | −8/10 | 0.68 |
| Barriers | 1/10 | 1.02 |
| Growth | −2/2 | 0.90 |
Evidence: CrowdStrike cut 500 roles citing AI. Carvana reports 100% of T1 alerts are now AI-handled. Multiple vendors market “AI SOC Analyst” as the product.
Step 6: Composite Calculation
Raw = 1.55 × 0.68 × 1.02 × 0.90 = 0.968
JobZone Score = (0.968 − 0.54) / 7.93 × 100 = 5.4
Sub-label: TRS 1.55 < 1.8 AND Evidence −8 ≤ −6 AND Barriers 1 ≤ 2 → Red (Imminent)
Example 2: Engineering Manager (Mid-Level)
Resistant tasks, declining market — Yellow Moderate
Inputs

| Input | Value |
|---|---|
| Task Resistance Score | 3.75/5.0 |
| Evidence Score | −3/10 |
| Barrier Score | 2/10 |
| AI Growth Correlation | −1/2 |

Modifiers

| Modifier | Calculation |
|---|---|
| Evidence | 1.0 + (−3 × 0.04) = 0.88 |
| Barrier | 1.0 + (2 × 0.02) = 1.04 |
| Growth | 1.0 + (−1 × 0.05) = 0.95 |
Composite Calculation
Raw = 3.75 × 0.88 × 1.04 × 0.95 = 3.26
JobZone Score = (3.26 − 0.54) / 7.93 × 100 = 34.3
Combined modifier effect: 0.88 × 1.04 × 0.95 = 0.869 — base score cut 13%. Core tasks (people management, judgment) ARE hard to automate, but teams are shrinking and AI is eliminating middle management layers. The composite captures both: resistant tasks in a declining market.
Example 3: Electrician (Mid-Level)
All modifiers reinforce the base — Green Stable
Inputs

| Input | Value |
|---|---|
| Task Resistance Score | 4.10/5.0 |
| Evidence Score | +10/10 |
| Barrier Score | 9/10 |
| AI Growth Correlation | +1/2 |

Modifiers

| Modifier | Calculation |
|---|---|
| Evidence | 1.0 + (10 × 0.04) = 1.40 |
| Barrier | 1.0 + (9 × 0.02) = 1.18 |
| Growth | 1.0 + (1 × 0.05) = 1.05 |
Composite Calculation
Raw = 4.10 × 1.40 × 1.18 × 1.05 = 7.11
JobZone Score = (7.11 − 0.54) / 7.93 × 100 = 82.9
Every modifier reinforces the base. Compare to Janitor (AIJRI 44.2) — same physical work category, similar task resistance, but evidence −2, barriers 3/10, growth 0. The composite reveals what a simpler model would hide: physically protected roles are NOT equally safe.
Verify It Yourself
Enter any values to see the composite formula in action. Try replicating the worked examples above, or experiment with your own inputs.
6. Zone Classification & Sub-Labels
6.1 The Three Zones
| AIJRI Score | Zone | Meaning | Time Horizon |
|---|---|---|---|
| 48–100 | GREEN | Role is protected or growing | Safe for 5+ years |
| 25–47 | YELLOW | Role is transforming | Adapt within 2–7 years |
| 0–24 | RED | Role is being displaced | Act now |
AIJRI Zone Classification (0–100); practical range observed in assessments: ~3 to ~83.
6.2 The 7-Tier Sub-Label System
| Label | Determination |
|---|---|
| Green (Accelerated) | AIJRI ≥ 48 AND Growth Correlation = +2 |
| Green (Stable) | AIJRI ≥ 48 AND <20% of task time scores 3+ |
| Green (Transforming) | AIJRI ≥ 48 AND ≥20% of task time scores 3+ |
| Yellow (Moderate) | AIJRI 25–47 AND <40% of task time scores 3+ |
| Yellow (Urgent) | AIJRI 25–47 AND ≥40% of task time scores 3+ |
| Red | AIJRI <25 AND (TRS ≥ 1.8 OR Evidence > −6 OR Barriers > 2) |
| Red (Imminent) | AIJRI <25 AND TRS < 1.8 AND Evidence ≤ −6 AND Barriers ≤ 2 |
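The sub-label rules in the table reduce to a small decision procedure. A sketch under the stated thresholds (the task-time percentages passed in the examples are illustrative, not published values):

```python
def classify(score, trs, evidence, barriers, growth, pct_time_scoring_3plus):
    """Zone and sub-label per Section 6. pct_time_scoring_3plus is the
    share of task time scoring 3 or higher on the automation rubric."""
    if score >= 48:
        if growth == 2:
            return "Green (Accelerated)"
        return "Green (Transforming)" if pct_time_scoring_3plus >= 20 else "Green (Stable)"
    if score >= 25:
        return "Yellow (Urgent)" if pct_time_scoring_3plus >= 40 else "Yellow (Moderate)"
    # Red zone: Imminent only when all three severity conditions hold
    if trs < 1.8 and evidence <= -6 and barriers <= 2:
        return "Red (Imminent)"
    return "Red"

print(classify(5.4, 1.55, -8, 1, -2, 100))  # Red (Imminent)
print(classify(82.9, 4.10, 10, 9, 1, 10))   # Green (Stable)
```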
6.3 Calibration Anchors
| Role | TRS | Evidence | Barriers | Growth | AIJRI | Zone |
|---|---|---|---|---|---|---|
| CISO (Executive) | 4.25 | +9 | 6 | +2 | 83.0 | Green Accel. |
| Electrician | 4.10 | +10 | 9 | +1 | 82.9 | Green Stable |
| Registered Nurse | 4.40 | +9 | 9 | 0 | 82.2 | Green Stable |
| Senior SW Engineer | 3.95 | +5 | 2 | 0 | 55.4 | Green Trans. |
| Pen Tester (Mid) | 2.80 | +1 | 5 | +1 | 35.6 | Yellow Urg. |
| Graphic Designer | 2.65 | −7 | 1 | −1 | 16.5 | Red |
| Junior SW Developer | 2.10 | −9 | 0 | −1 | 9.3 | Red |
| SOC Analyst T1 | 1.55 | −8 | 1 | −2 | 5.4 | Red Imm. |
7. Results & Distribution
The AIJRI corpus covers occupations across 28 domains and over 194 specialisms, assessed on a rolling basis with 100% US workforce coverage by employment volume.
Notable Findings
The Nurse Paradox: Registered Nurses score 82.2 (Green Stable) despite being among the earliest and most active healthcare AI adopters. AI augments nursing tasks — ambient documentation, clinical decision support, predictive analytics — without substituting for the nurse.
Physical Trades Resilience: Electricians (82.9), plumbers, and similar skilled trades in unstructured environments score among the highest in the corpus. Moravec’s Paradox provides decades of protection based on current robotics trajectories.
The AI-Accelerated Zone: A distinct cluster scores Green because of AI growth: AI Security Engineer, CISO, AI Governance Lead, ML/AI Engineer. You can’t fully automate securing AI because the attack surface IS AI.
The Yellow Majority: Yellow is consistently the largest zone. The dominant near-term impact of AI is transformation, not outright displacement. Most workers face a changing role, not an eliminated one.
7.1 International Workforce Estimates
AIJRI assessments are built on US labour market data — BLS employment projections, O*NET task descriptions, and US-centric job posting signals. The Data Monitor extends these zone distributions to 8 additional countries and regions using a sector-weighted adjustment model. This section documents the methodology and its limitations.
Rationale
Approximately 85% of the 3,649 assessed roles fall within services and knowledge-work occupations. Countries whose workforce is more heavily concentrated in agriculture and heavy manufacturing — sectors with physical, environmental, and spatial complexity that resists near-term automation — should logically have a larger share of workers in the “green” (protected) zone. A country with 27% agricultural employment (the global average) has far more workers in sectors that AIJRI domain scores rate overwhelmingly GREEN (Agriculture: 6% RED, Trades: 5% RED) compared to the US (1.6% agriculture). This is consistent with IMF estimates that low-income countries face roughly half the AI displacement risk of advanced economies.
Data Source
Sector employment shares are drawn from the World Bank Open Data platform, indicators SL.AGR.EMPL.ZS (agriculture), SL.IND.EMPL.ZS (industry), and SL.SRV.EMPL.ZS (services). These are ILO modelled estimates (November 2025 model release), not direct survey observations. For most high-income countries, the modelled values closely track national statistics; for developing economies, the model may diverge from ground truth.
| Country | Agriculture % | Industry % | Services % | Confidence |
|---|---|---|---|---|
| US | 1.6 | 19.0 | 79.4 | Direct (BLS) |
| UK | 0.9 | 16.1 | 83.0 | High |
| Canada | 1.1 | 18.9 | 79.9 | High |
| Australia | 2.2 | 19.4 | 78.4 | High |
| EU | 3.3 | 24.2 | 72.5 | Moderate |
| Germany | 1.1 | 26.3 | 72.6 | Moderate |
| Japan | 2.9 | 23.3 | 73.8 | Moderate |
| South Korea | 5.2 | 23.7 | 71.1 | Moderate |
| Global | 26.1 | 23.6 | 50.3 | Low |
Adjustment Formula
The US serves as the baseline (79.4% services employment). For each country, a services delta adjustment shifts zone percentages proportionally:
servicesDelta = 79.4 − country.servicesPct
greenAdjust = servicesDelta × 0.2
adjustedGreen = baseGreen + greenAdjust
adjustedRed = baseRed − greenAdjust
adjustedYellow = 100 − adjustedGreen − adjustedRed
The 0.2 factor means that 20% of the services delta shifts between the green and red zones. This is deliberately conservative — a larger factor would overstate the adjustment given the coarseness of the sector-level proxy. The base green and yellow percentages for each country are derived from IMF, OECD, PwC, and ILO labour market estimates.
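The adjustment formula above, expressed runnably (the base green/red split used in the example is hypothetical, not a published country estimate):

```python
US_SERVICES_PCT = 79.4
ADJUST_FACTOR = 0.2  # 20% of the services delta shifts between green and red

def adjust_zones(base_green, base_red, services_pct):
    """Sector-weighted adjustment from Section 7.1. Base zone percentages
    are the country's IMF/OECD/ILO-derived starting estimates."""
    services_delta = US_SERVICES_PCT - services_pct
    green_adjust = services_delta * ADJUST_FACTOR
    green = base_green + green_adjust
    red = base_red - green_adjust
    yellow = 100 - green - red  # yellow absorbs the remainder
    return round(green, 1), round(yellow, 1), round(red, 1)

# Hypothetical base split of 30/45/25 for the Global aggregate (50.3% services):
print(adjust_zones(30.0, 25.0, 50.3))  # green shifts up by ~5.8 points
```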
Confidence Tiers
Each country is assigned a confidence tier based on the structural similarity of its economy to the US (measured by services employment share):
- Direct (US only) — Actual per-role BLS employment data. No extrapolation.
- High (services within 5pp of US: UK, Canada, Australia) — Similar economic structure. Adjustment is <1 percentage point. Zone estimates are likely close to what per-role assessment would produce.
- Moderate (services 5–15pp below US: EU, Germany, Japan, South Korea) — More industrial economies. Adjustment is 1–2pp. Zone estimates are reasonable but less certain.
- Low (services >15pp below US: Global) — Very different economic structure. Adjustment is ~6pp. Zone estimates are indicative only.
Limitations of This Approach
- Sector-level proxy, not role-level assessment — The adjustment assumes that unassessed sectors (agriculture, heavy industry) would contribute more green-zone workers, based on their physical and environmental complexity. This aligns with AIJRI domain scores (Agriculture: 6% RED, Trades: 5% RED) and IMF exposure estimates, but is not verified by per-role assessment in those countries.
- Flat factor across all countries — The 0.2 adjustment factor does not vary by country. In practice, the composition of “services” differs significantly (e.g., Japan’s services sector includes a larger share of retail and hospitality than the US).
- ILO modelled estimates — The sector data are econometric model outputs, not direct survey measurements. They are updated annually and may lag structural changes.
- No within-sector variation — Two countries with identical services percentages may have very different AI exposure if one’s services sector is dominated by finance (high exposure) and the other by healthcare (lower exposure).
- Base zone percentages are themselves estimates — The starting green/yellow/red percentages for non-US countries come from IMF, OECD, and ILO estimates, not from per-role AIJRI assessments.
For these reasons, the Data Monitor displays a confidence badge and sector breakdown for every country, so users can assess for themselves how much weight to place on non-US estimates.
7.2 US State-Level Estimates
The Policymaker Briefing and Data Monitor break down AI displacement risk by US state. These are estimates, not direct per-state assessments. This section documents how they are constructed.
Data Source
State employment counts come from the BLS Occupational Employment and Wage Statistics (OEWS) survey, May 2024 release (state_M2024_dl.xlsx). OEWS reports the number of jobs per Standard Occupational Classification (SOC) code in each of the 50 states plus the District of Columbia.
Methodology
- SOC-to-zone mapping — Each of the ~840 SOC occupation codes is mapped to the AIJRI zone classification (Green, Yellow, or Red) of its corresponding JobZone role assessment. Where a single SOC code maps to multiple JobZone assessments (e.g., senior and junior variants), the primary assessment’s zone is used.
- State aggregation — For each state, the employment count of every SOC code is assigned to its mapped zone. The state’s Green, Yellow, and Red totals are the sum of all employment in SOC codes classified in that zone.
- Percentage calculation — Zone percentages are calculated as (zone employment / total state employment) × 100. These represent the share of jobs (not individuals) in each zone.
This process achieves 99.6% coverage — 838 of the ~840 SOC codes in the OEWS dataset are matched to an AIJRI zone classification. The remaining codes are excluded from the totals.
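The aggregation can be sketched as follows. The SOC codes shown are real BLS codes, but the zone assignments and employment counts here are illustrative placeholders, not actual AIJRI classifications or OEWS figures:

```python
from collections import defaultdict

# Hypothetical inputs: OEWS rows as (state, soc_code, employment) and a
# SOC -> zone map derived from each code's primary JobZone assessment.
soc_zone = {"15-1252": "Red", "29-1141": "Green", "43-4051": "Yellow"}
oews_rows = [
    ("CA", "15-1252", 500_000),
    ("CA", "29-1141", 300_000),
    ("IA", "29-1141", 40_000),
    ("IA", "43-4051", 25_000),
]

def state_zone_shares(rows, zone_map):
    """Aggregate employment into zones per state; convert to percentages."""
    totals = defaultdict(lambda: defaultdict(int))
    for state, soc, employment in rows:
        zone = zone_map.get(soc)
        if zone is None:
            continue  # unmatched SOC codes are excluded from the totals
        totals[state][zone] += employment
    shares = {}
    for state, zones in totals.items():
        state_total = sum(zones.values())
        shares[state] = {z: 100 * n / state_total for z, n in zones.items()}
    return shares

shares = state_zone_shares(oews_rows, soc_zone)
# shares["CA"]["Red"] -> 62.5 (500k of 800k matched CA jobs)
```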
What State Differences Reflect
States vary in their zone distributions primarily because of occupational mix, not because the same job is riskier in one state than another. A software developer receives the same AIJRI score whether they work in California or Iowa. The difference is that California has a proportionally larger share of knowledge-work occupations (many of which fall in Yellow or Red zones), while states with higher shares of skilled trades, healthcare, and agriculture workers tend to show more Green.
Limitations
- Job counts, not people — OEWS counts jobs (positions), not individuals. A person holding two jobs is counted twice. This is consistent with how BLS reports employment but means state totals slightly overcount the number of affected workers.
- National zone applied locally — The AIJRI assessment is national (predominantly US-focused). A role’s AI displacement risk may differ at the state level due to local regulation, industry concentration, or adoption rates, but these state-specific factors are not captured.
- SOC granularity limits — SOC codes are broad categories (e.g., “Software Developers” covers everything from web developers to embedded systems engineers). Within a single SOC code, actual AI exposure can vary substantially. The zone assignment reflects the most representative assessment.
- Self-employment excluded — OEWS covers payroll employment only. Self-employed workers, independent contractors, and gig workers are not included. States with large gig economies (e.g., California, Florida) may have understated totals.
Despite these limitations, the state estimates provide a useful approximation of how AI displacement risk is distributed geographically. The underlying data (BLS OEWS) is the most comprehensive and reliable source of state-level occupational employment available.
7.3 International Regional Estimates
The Data Monitor displays choropleth maps for subnational regions within each tracked country (UK regions, German states, Japanese prefectures, Canadian provinces, Australian states, South Korean provinces, and EU member states). These regional breakdowns are estimates derived from national-level data. This section documents how they are constructed.
Data Sources
Regional workforce totals come from each country’s official statistics agency:
- UK: Office for National Statistics (ONS) — 12 ITL1/NUTS1 regions
- Germany: Destatis — 16 Bundesländer
- Japan: Statistics Bureau Japan — 47 prefectures
- Canada: Statistics Canada — 13 provinces and territories
- Australia: ABS — 8 states and territories
- South Korea: KOSIS — 17 administrative divisions
- EU: Eurostat workforce totals; World Bank/ILO services employment percentages (indicator SL.SRV.EMPL.ZS)
Methodology
For EU member states, the same services-delta formula described in Section 7.1 is applied individually to each country using its World Bank/ILO services employment percentage. Countries with lower services employment (e.g., Romania at 47%) show larger GREEN zones and smaller RED zones than high-services economies (e.g., Luxembourg at 88%).
For within-country regions (UK, Germany, Japan, Canada, Australia, South Korea), the method uses three steps:
- National baseline: The country’s zone percentages from the global dataset serve as the target national average.
- Regional variation: Regions with higher services-sector concentration (typically capital and financial cities) receive higher RED percentages. Regions with more agriculture, manufacturing, or resource-extraction employment receive lower RED and higher GREEN percentages. This reflects the same logic as the country-level adjustment: sectors with physical, environmental, and spatial complexity resist near-term automation.
- Spread calibration: The magnitude of regional variation is calibrated against the observed spread in US state-level data (the only ground-truth within-country data available). US states show approximately a 6 percentage-point spread in RED zone percentages. International regional estimates are compressed to produce similar spreads.
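The spread-calibration step can be sketched as a rescaling of each region's deviation from the national average so that the max-min spread matches the ~6pp US benchmark. The function, the raw regional values, and the choice of an unweighted max-min spread are all illustrative assumptions, not the production implementation:

```python
TARGET_SPREAD_PP = 6.0  # max-min RED spread observed across US states

def calibrate_spread(raw_red: dict, national_red: float) -> dict:
    """Rescale each region's deviation from the national RED average so the
    overall max-min spread matches the US state-level benchmark."""
    values = raw_red.values()
    raw_spread = max(values) - min(values)
    scale = TARGET_SPREAD_PP / raw_spread if raw_spread > 0 else 0.0
    return {region: national_red + (red - national_red) * scale
            for region, red in raw_red.items()}

# Hypothetical raw estimates for three UK regions, modelled from services mix:
raw = {"London": 32.0, "Scotland": 24.0, "Wales": 20.0}
calibrated = calibrate_spread(raw, national_red=24.0)
# Spread compressed from 12pp to 6pp: London -> 28.0, Wales -> 22.0
```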
What Regional Differences Reflect
Regional variation in zone percentages reflects differences in economic structure, not differences in how the same job is scored. A financial analyst in London receives the same AIJRI score as one in Edinburgh. The difference is that London has a proportionally larger concentration of financial services, professional services, and administrative roles — occupations that tend to fall in the RED zone — while regions with more agriculture, manufacturing, and healthcare employment tend to show more GREEN.
Limitations
- No per-region occupational data: Unlike US states (which use actual BLS occupational employment by state), international regional estimates do not have per-region occupational breakdowns mapped to AIJRI zones. The regional variation is modelled from economic structure, not observed.
- Compressed variation: Real within-country variation may be larger or smaller than the ~6pp spread used. Countries with greater regional economic inequality (e.g., Japan, South Korea) likely have wider spreads in practice.
- Urban/rural proxy: The method assumes that services concentration is the primary driver of regional AI exposure differences. Other factors — regional regulation, industry-specific AI adoption rates, workforce demographics — are not modelled.
- EU member states as “regions”: The EU choropleth treats sovereign nations as regions within a bloc. Individual EU countries have far greater economic diversity than subnational regions, making per-country estimates less precise than the label suggests.
For these reasons, regional choropleth maps should be interpreted as indicative of direction (which regions face more or less exposure) rather than precise estimates of absolute levels. The Data Monitor displays data caveats alongside each regional map, plus contextual explanations (“Why does this breakdown look like this?”) for every country and for individual US states, EU countries, and UK regions — highlighting the specific industries and economic structures that drive each area’s zone distribution.
8. Limitations & Future Work
8.1 Known Limitations
We list these not as disclaimers buried in fine print, but as genuine constraints that users and critics should weigh when interpreting scores. If AIJRI has weaknesses, we would rather identify them ourselves than have them discovered by others.
- No inter-rater reliability data — All assessments are produced by the same assessor system. No independent assessors have scored the same role to test whether two people would reach the same result. Without inter-rater reliability data, we can claim internal consistency but not proven reproducibility. We publish the full methodology and rubrics specifically to enable independent replication and invite researchers to test this.
- Assessor override introduces subjectivity — The methodology permits adjustments of ±5 points on the composite (or equivalent at the task level) with documented rationale. In the SOC T1 worked example, the raw TRS of 1.20 was adjusted to 1.55 — a 29% increase. This is a deliberate design choice: purely formulaic scoring produces absurd edge cases that human judgment must correct. But it means the formula is advisory, not deterministic. Every override is documented, but the mechanism exists.
- Task scoring is inherently subjective — The 1–5 scale for “can an AI agent execute this task?” is a judgment call. Two reasonable assessors could disagree by 1 point per task. Across 5–7 weighted tasks, this compounds into TRS differences of 0.5–1.0, translating to roughly 6–12 points on the final score — enough to shift a zone boundary. The rubric with worked examples constrains this, but does not eliminate it.
- Evidence dimensions are coarse — Five dimensions scored −2 to +2 (a 5-point integer scale) compress complex labour market dynamics into integers. The trade-off is deliberate: finer granularity would create false precision that the underlying data does not support. But it means that the difference between “postings declining slightly” (−1) and “postings collapsing” (−2) is a single point producing a 4% swing.
- Coefficients were derived by calibration, not empirical measurement — The modifier coefficients (0.04, 0.02, 0.05) were set through iterative testing against benchmark roles, not derived from statistical analysis of outcomes. This is standard for composite scoring frameworks but means the relative weighting of evidence vs barriers vs growth reflects design judgment, not observed data.
- Occupations, not individuals — AIJRI assesses roles as categories. A highly skilled practitioner in a Red-zone role may possess niche expertise not captured by the aggregate score.
- Evidence signals lag — The trust hierarchy correctly places official sources (BLS, O*NET) at the top for factual claims, but these are lagging indicators for AI deployment — they will not reflect displacement until years after it occurs. For AI tool maturity, the most informative sources are vendor documentation and tech journalism, which carry lower weight in the trust hierarchy.
- Score validity is temporal — Each assessment reflects conditions at the date of scoring. AI capability changes quarter by quarter. We target a 6-month refresh cycle for high-volatility roles (Red, Yellow Urgent) and 12 months for stable roles (Green), but published scores may become stale between refresh cycles.
- Normalisation drift — The normalisation constants (0.54, 7.93) are frozen from v3.0. As the methodology evolves and coefficients are recalibrated, these may require updating, which would shift all published scores.
- Regulatory interventions not modelled — AI safety legislation (EU AI Act, potential US regulation) could materially alter displacement timelines. AIJRI models current barriers, not future policy.
- Second-order effects — AIJRI does not model new role creation from AI deployment, only displacement of existing roles.
- Physical barriers are temporal — Robotics is eroding physical barriers. Scores for trades reflect today’s reality, not the trajectory of embodied AI.
- LLM hallucination risk — Assessments are generated using large language models processing millions of tokens of evidence per role. Despite the TruthSeeker validation protocol, LLMs can fabricate plausible-sounding citations or statistics. Individual data points should be treated as best-effort synthesis, not independently verified fact.
- Western labour market bias — Evidence sources are overwhelmingly US/UK-centric. Scores may not transfer to labour markets with different regulatory frameworks, union structures, or technology adoption rates. The Data Monitor’s international view uses a sector-weighted extrapolation to partially address this; see Section 7.1 for the methodology and its own limitations.
- Practical score range — Real-world scores span approximately 3 to 83 out of a theoretical 0–100 range. No role currently scores above 83 or below 3, meaning roughly 17% of the top and 3% of the bottom of the scale are unused. This may indicate the normalisation could be tighter, though it also reflects that extreme combinations of inputs are rare in practice.
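The compounding of per-task disagreement described in the task-scoring limitation above can be illustrated with a small simulation. It assumes TRS is a weighted mean of task scores and uses the ~12-points-per-TRS-unit scaling implied by the 0.5–1.0 → 6–12 point mapping; both are simplifying assumptions for illustration, and the weights are hypothetical:

```python
import random

POINTS_PER_TRS = 12.0  # implied by the 0.5-1.0 TRS -> 6-12 point mapping

def trs_delta(weights, max_disagreement=1):
    """TRS shift if a second assessor moves each task score by up to ±1.
    Assumes TRS is a weighted mean of per-task scores (illustrative)."""
    return sum(w * random.randint(-max_disagreement, max_disagreement)
               for w in weights)

random.seed(0)
weights = [0.3, 0.25, 0.2, 0.15, 0.1]  # hypothetical task weights, sum to 1
deltas = [abs(trs_delta(weights)) for _ in range(10_000)]

worst_case = sum(weights)  # all tasks disagree in the same direction
median = sorted(deltas)[len(deltas) // 2]
print(f"worst case: {worst_case * POINTS_PER_TRS:.0f} points")  # 12 points
print(f"median disagreement: {median * POINTS_PER_TRS:.1f} points")
```

The worst case (every task shifted one point in the same direction) reproduces the ~12-point ceiling cited above; typical random disagreement lands well below it, which is why the rubric's worked examples matter.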
8.2 Purpose & Best-Efforts Notice
AIJRI was created with a simple purpose: to help people see what is coming. The AI displacement wave is already reshaping labour markets, and most workers have no structured way to assess their own exposure. Existing frameworks are either too academic to be actionable or too simplistic to be useful. AIJRI attempts to bridge that gap.
This methodology is not perfect. It is the best effort we could produce with the tools, data, and knowledge available in early 2026. We publish it openly not because we believe it is definitive, but because we believe the conversation matters more than the precision. If AIJRI provokes debate — about which roles are truly at risk, about whether our scoring weights are right, about what we’ve missed — then it has served its purpose.
We actively invite scrutiny. If our scores are wrong, show us where and we will correct them. The worst outcome is not that AIJRI is imperfect; it is that people sleepwalk into displacement because no one attempted to quantify the risk at all.