1. Introduction
The rapid advancement of large language models, multi-modal AI systems, and agentic software architectures has fundamentally altered the calculus of occupational risk. Unlike previous waves of automation — which targeted routine manual and cognitive tasks — current AI systems demonstrate competence across creative, analytical, and interpersonal domains that were previously considered resistant to machine substitution [5].
Existing frameworks for assessing occupational exposure to AI tend to fall into two categories: broad-brush probability estimates [8] that lack the granularity needed for individual career planning, and task-level decompositions [6] that catalogue exposure without weighting for real-world adoption barriers. Neither approach produces a single, interpretable risk score that accounts for both technical capability and deployment likelihood.
This paper introduces the AI Job Resistance Index (AIJRI), a composite scoring methodology designed to address this gap. AIJRI decomposes occupations into weighted tasks, scores each for resistance to agentic AI, then applies evidence-informed modifiers capturing deployment signals from the labour market, industry adoption patterns, and regulatory environments. The result is a normalised 0–100 score that maps to actionable risk zones.
Every assessment is scoped to a specific role at a specific seniority level. “Software Engineer” is not assessed — “Junior Software Developer (0–2 years)” and “Senior Software Engineer (7+ years)” are assessed separately. They land in different zones.
1.1 Key Assumptions
Every scoring framework rests on assumptions. AIJRI is explicit about its own:
- Sub-AGI AI. The methodology assumes increasingly capable, agentic AI — not artificial general intelligence (AGI). If AGI arrives, the question changes from “what can AI do?” to “what should AI be allowed to do?” and the entire framework would need reconsidering.
- 3–5 year horizon. Scores reflect near-to-medium term displacement risk based on current AI capability trajectories. They are not predictions about 2035 or beyond.
- Occupations, not individuals. AIJRI assesses roles as categories. A highly skilled practitioner in a Red-zone role may possess niche expertise not captured by the aggregate score. Conversely, a weak practitioner in a Green-zone role may face more personal risk than the score suggests.
- Western labour markets. Evidence sources are predominantly US/UK-centric (BLS, O*NET, Indeed, LinkedIn). Scores may not transfer to markets with different regulatory frameworks, union structures, or technology adoption rates. The core AI capability assessment (can AI do these tasks?) is largely geography-independent, but the modifiers — especially barriers and evidence — reflect Western conditions.
- Current regulatory environment. AIJRI models existing barriers (licensing, regulation, liability frameworks) but does not predict future AI legislation. Material regulatory changes (EU AI Act enforcement, potential US regulation) could shift scores significantly.
- No robotics discontinuity. Physical trades score highly partly because humanoid robotics has not achieved dexterity in unstructured environments. This is explicitly temporal — if embodied AI achieves breakthrough capability, trades scores would require revision.
- Adaptation is variable. Many Yellow-zone assessments carry an implicit assumption about whether the practitioner adapts to AI tools. A worker who actively integrates AI into their workflow faces less risk than one who resists. Scores generally assume a typical level of adaptation for the role’s seniority level.
- Task decomposition is representative. Each role is broken into 5–10 weighted constituent tasks. Different employers may structure the same role title differently, shifting which tasks dominate and potentially changing the score.
- Current AI capability trajectory continues. Scores assume continued incremental progress in current AI modalities (language models, vision, code generation, agentic orchestration). A discontinuous capability leap — or a sustained plateau — would invalidate many assessments.
- Displacement, not creation. AIJRI models displacement of existing roles. It does not model the creation of new roles that emerge because of AI deployment [1]. This means the framework has an inherently conservative bias — it captures the risk side of the ledger, not the opportunity side.
These assumptions are revisited in the Limitations section, where their practical consequences for score interpretation are discussed.
2. Theoretical Foundations & Related Work
2.1 Task-Based Approaches
The task-based approach to occupational analysis originates with Autor, Levy, and Murnane’s framework for categorising routine versus non-routine work [2]. This was subsequently extended by Frey and Osborne, who estimated the probability of computerisation for 702 occupations using a Gaussian process classifier trained on O*NET task descriptions [8]. While influential, the Frey-Osborne model predates the emergence of large language models and does not account for the non-routine cognitive tasks now demonstrably performed by AI systems.
More recent work by Eloundou et al. introduced the concept of “exposure” to LLMs at the task level [6], distinguishing between direct exposure and LLM-augmented exposure. AIJRI builds on this decomposition but adds a critical dimension: evidence of real-world deployment.
Massenkoff and McCrory [18] extended this by introducing Observed Exposure — a measure combining Eloundou et al.’s theoretical capability scores with actual AI usage data from Anthropic’s Claude platform. Their findings provide direct empirical validation of AIJRI’s approach: the four occupations with highest observed exposure (computer programmers 74.5%, customer service representatives 70.1%, data entry keyers 67.7%, medical records specialists 66.7%) all fall in AIJRI’s Red Zone. Critically, they quantify the gap between theoretical capability and actual deployment — Computer & Math occupations are 94% theoretically exposed but only 33% covered in practice — confirming that task analysis alone systematically overstates displacement risk. Their weighting of fully automated use at 100% and augmentative use at 50% mirrors AIJRI’s displacement vs augmentation classification in Step 2.
2.2 The Harvard Critique
Lichtinger and Hosseini Maasoum [10] argue that pure task analysis is speculative without market evidence. The fundamental problem: technical capability ≠ actual deployment. An AI might be able to do a task but never be permitted or deployed to do it. AIJRI’s Evidence Score (Step 3) and Barrier Assessment (Step 4) directly address this critique.
2.3 Augmentation vs Displacement
Acemoglu and Restrepo [1] model the tension between automation (displacing workers from existing tasks) and the creation of new tasks where labour has a comparative advantage. Brynjolfsson et al. [3][4] demonstrate that “exposure ≠ displacement” — high-exposure, high-complementarity roles saw increased labour demand [17].
The nurse paradox illustrates this: registered nurses are among the earliest and most active healthcare AI adopters (ambient AI documentation, clinical decision support, predictive analytics), yet nursing is one of the most AI-resistant roles in the economy (AIJRI 82.2, Green Stable). In AIJRI, every task is classified as DISPLACEMENT, AUGMENTATION, or NOT INVOLVED — not just “exposed.”
2.4 Seniority Divergence
Stanford research [4] found that employment of workers aged 22–25 in AI-exposed roles fell 13% since 2022, while employment of older workers in the same occupations grew 6–9%. Harvard [10] frames this as “seniority-biased technological change.”
| Job Family | Junior / Entry Level | Senior / Strategic Level |
|---|---|---|
| Software Developer | RED (9.3) | GREEN (55.4) |
| Accountant | RED (bookkeeper) | YELLOW (advisory, 47.3) |
| SOC Analyst | RED IMMINENT (T1, 5.4) | GREEN (CISO, 83.0) |
| Cloud | YELLOW (cloud engineer, 25.3) | GREEN (security architect, 62.7) |
2.5 Scoring Model Design
The UN Human Development Index switched from arithmetic mean to geometric mean in 2010 [13] to reduce substitutability between dimensions — preventing high income from fully compensating for poor health or education. CVSS [7] uses multiplicative elements within sub-score calculations, creating a non-compensatory structure where weakness in key dimensions dominates the output. This property — where weakness cannot be hidden by strength elsewhere — inspired AIJRI’s fully multiplicative composite.
3. Data Sources & Collection Infrastructure
The infrastructure was purpose-built to ground every assessment in verifiable market evidence rather than theoretical speculation.
| Source Category | Primary Data | What It Captures |
|---|---|---|
| Labour market | Indeed, LinkedIn, Glassdoor, Google Jobs, USAJobs, Reed | Job posting trends, salary data, skill requirements |
| Company actions | 79 curated RSS feeds, earnings calls, press releases | Hiring/layoff decisions citing AI, restructuring signals |
| AI tool maturity | Systematic product review, vendor documentation, Anthropic Economic Index [18] | Commercial AI tools targeting each role’s core tasks, cross-referenced with observed AI usage data at the task level |
| Academic research | Semantic Scholar (220M+ papers), industry reports | Expert consensus on displacement/augmentation |
| Regulatory | Licensing boards, EU AI Act, union agreements | Structural barriers to AI deployment |
| Occupational data | BLS, O*NET, Glassdoor salary percentiles | Baseline employment, task descriptions, wages |
Trust hierarchy (strict order):
1. Official sources (BLS, O*NET, government agencies)
2. Platform data (direct API returns from job boards)
3. Curated research (peer-reviewed papers)
4. Web intelligence (news, earnings calls; always validated against higher tiers)
3.1 Research Infrastructure
Evidence is gathered through a purpose-built multi-engine research system that runs parallel queries across multiple search and analysis platforms. Each assessment draws from some or all of the following tiers, depending on the complexity and sensitivity of the role:
| Tier | Engines | What It Provides |
|---|---|---|
| Quick | Web search + BrightData SERP | Fact checking, headline verification, breaking news |
| Wide | Perplexity Sonar + Gemini Flash + Tavily | Three independent search engines run in parallel for cross-validated general research |
| Deep | Perplexity Sonar Pro + Gemini Pro + Tavily Advanced | Upgraded models for complex multi-part queries requiring synthesis |
| Ultra | Perplexity Deep Research and/or Gemini Deep Research Agent | Agentic research systems that autonomously plan and execute multi-step investigations across 100+ sources over 2–5 minutes |
| Scholarly | Semantic Scholar (220M+ papers) | Peer-reviewed academic papers, citation networks, expert consensus |
| Labour market | Indeed, LinkedIn, Glassdoor, Google Jobs, USAJobs, Reed (via BrightData APIs) | Job posting counts, salary data, skill requirements, hiring trends across 7 countries |
| News & signals | 79 curated RSS feeds + BrightData scraping | Company announcements, earnings calls, restructuring signals, layoffs citing AI |
Multiple engines querying the same evidence domain reduces single-source bias. When Perplexity and Gemini return conflicting findings, the assessor investigates the discrepancy rather than averaging it — disagreement between engines is a signal, not noise.
3.2 Domain Research Layer
To prevent version drift (e.g., citing a workforce gap as “3.5M” in one assessment and “4.8M” in another), shared domain-level evidence is captured once per domain and refreshed on a 30-day cycle. Each of the 25 occupational domains maintains a research file covering: workforce and market data, salary benchmarks, production-deployed AI tools, expert consensus, regulatory landscape, and calibration anchors. Role-specific assessments inherit this baseline and only research what the domain file doesn’t cover.
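The 30-day refresh cycle reduces to a simple staleness check. A minimal sketch (the function name and dates are illustrative, not taken from the production system):

```python
from datetime import date, timedelta

def needs_refresh(last_refreshed: date, today: date, cycle_days: int = 30) -> bool:
    """Return True when a domain research file is older than the refresh cycle."""
    return (today - last_refreshed) > timedelta(days=cycle_days)

print(needs_refresh(date(2025, 1, 1), date(2025, 2, 5)))  # True (35 days old)
```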
3.3 Computational Verification
All numerical calculations — weighted task scores, evidence totals, barrier sums, and the final composite formula — are computed programmatically rather than by hand. The assessor enters raw scores into a calculation script that produces the modifiers, raw composite, normalised JobZone Score, and zone classification. This eliminates arithmetic errors and ensures every published score is reproducible from its inputs. The interactive calculator in Section 5.4 below implements the same formula for independent verification.
3.4 Evidence Validation — The TruthSeeker Protocol
Every factual claim cited in an assessment — layoff announcements, AI tool capabilities, workforce statistics, expert predictions — is subject to a structured fact-checking protocol adapted from professional journalism and intelligence analysis standards. The system applies methodologies from the IFCN (International Fact-Checking Network), the SIFT method (Stop, Investigate, Find, Trace) developed by Mike Caulfield, and Analysis of Competing Hypotheses (ACH) from intelligence tradecraft.
The protocol operates in 12 phases:
- Phase 0–1: Emotional check & classification. Is the claim factual, opinion, satire, or prediction? Is it designed to manipulate? Claims that trigger emotional reactions receive extra scrutiny.
- Phase 2–3: Existing fact-checkers & lateral reading. Check Snopes, PolitiFact, Reuters, and AP first. Then verify the source’s credibility — not just what they said, but what others say about them.
- Phase 4–6: Primary source hunt, competing hypotheses, steelmanning. Trace to the original document or data. Generate alternative explanations (ACH). Find the best evidence FOR the claim before testing it — avoiding confirmation bias.
- Phase 7–8: Multi-source verification & source authentication. Cross-validate across independent engines. Apply a 6-tier source credibility hierarchy from primary sources (Tier 1) to low-credibility outlets (Tier 6). Context gathering guards against claims that are technically true but misleading.
- Phase 9–10: Visual verification & manipulation detection. Reverse image search for visual claims. Scan for weasel words, emotional language, cherry-picking, and false dichotomies.
- Phase 11: Conflict resolution & self-review. When sources disagree, weight by credibility × independence × recency × methodology. A 22-point pre-publication checklist must pass before any claim is accepted into an assessment.
Crucially, when multiple search engines return the same underlying article, that counts as one source seen through multiple engines — not independent corroboration. True corroboration requires different organisations reaching the same conclusion independently. This distinction prevents the illusion of consensus from search result duplication.
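The one-source-through-many-engines rule can be approximated by deduplicating results on the publishing organisation before counting corroboration. A minimal sketch, assuming the publisher can be approximated by the URL's domain (the helper name and URLs are hypothetical; real publisher resolution would need more than domain matching):

```python
from urllib.parse import urlparse

def independent_sources(result_urls):
    """Count independent corroborating sources, treating the same publisher
    seen through multiple search engines as ONE source."""
    publishers = set()
    for url in result_urls:
        host = urlparse(url).netloc.lower()
        # Strip a leading "www." so www.reuters.com == reuters.com
        publishers.add(host.removeprefix("www."))
    return len(publishers)

results = [
    "https://www.reuters.com/tech/ai-layoffs",  # surfaced via Perplexity
    "https://reuters.com/tech/ai-layoffs",      # same article via Gemini
    "https://apnews.com/article/ai-layoffs",    # genuinely independent outlet
]
print(independent_sources(results))  # 2, not 3
```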
Each verified claim receives a confidence score (0–100) and a structured verdict (TRUE, MOSTLY_TRUE, MIXED, MOSTLY_FALSE, FALSE, UNPROVEN, OUTDATED, among others). Claims that cannot be verified to a high confidence threshold are either excluded from the assessment or explicitly flagged as low-confidence evidence.
3.5 Taxonomy Cross-Validation
To verify completeness of occupational coverage, the AIJRI role database is cross-referenced against five international job classification systems:
| Taxonomy | Jurisdiction | Occupations | Matched |
|---|---|---|---|
| O*NET | United States (Dept of Labor) | 1,016 | 100% |
| ESCO | European Union (European Commission) | 3,043 | 100% |
| UK SOC 2020 | United Kingdom (ONS) | 412 | 100% |
| NOC | Canada (Statistics Canada) | 516 | 100% |
| ANZSCO | Australia & New Zealand (ABS) | 1,440 | 100% |
| Total | — | 6,427 | 100% |
Matching uses a three-layer approach: exact slug matching against assessment filenames, alias matching against 8,000+ registered title aliases, and keyword overlap analysis using extracted term sets. Every occupation in every taxonomy either maps directly to an assessed role or to a registered alias that resolves to one.
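The three-layer matching could be sketched as follows. All index structures, the Jaccard overlap measure, and the 0.5 threshold are illustrative assumptions, not the published implementation:

```python
def match_occupation(title, slug_index, alias_index, keyword_index):
    """Three-layer matching sketch: exact slug, then alias, then keyword overlap."""
    slug = title.lower().replace(" ", "-")
    if slug in slug_index:            # Layer 1: exact slug match
        return slug_index[slug]
    if title.lower() in alias_index:  # Layer 2: registered title alias
        return alias_index[title.lower()]
    # Layer 3: keyword overlap (Jaccard similarity) against each role's term set
    terms = set(title.lower().split())
    best, best_score = None, 0.0
    for role, role_terms in keyword_index.items():
        overlap = len(terms & role_terms) / len(terms | role_terms)
        if overlap > best_score:
            best, best_score = role, overlap
    return best if best_score >= 0.5 else None  # threshold is an assumption

# Tiny hypothetical indexes
slug_index = {"software-engineer": "software-engineer"}
alias_index = {"programmer": "software-engineer"}
keyword_index = {"software-engineer": {"software", "engineer"}}

print(match_occupation("Software Engineer", slug_index, alias_index, keyword_index))
print(match_occupation("Programmer", slug_index, alias_index, keyword_index))
```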
4. The AIJRI Methodology — The 8-Step Assessment
Each occupation is assessed through a standardised 8-step pipeline. The assessment is always scoped to a specific role at a specific seniority level.
The 8-Step AIJRI Assessment Pipeline

| Step | Stage | Layer |
|---|---|---|
| 0 | Define | Scoping |
| 1 | Screen | Scoping |
| 2 | Tasks | Theoretical |
| 3 | Evidence | Empirical |
| 4 | Barriers | Empirical |
| 5 | Growth | Empirical |
| 6 | Composite | Calculation |
| 7 | Commentary | Judgment |

Steps 2–5 feed into Step 6 as the composite formula inputs.
Step 0: Role Definition
The assessment begins with a precise role definition: job title, seniority level, primary function, what the role is NOT, and typical experience. If an occupation exists at multiple seniority levels, each is assessed separately.
Step 1: Quick Screen
Three Protective Principles (Embodied Physicality, Deep Interpersonal Connection, Goal-Setting & Moral Judgment) scored 0–3 each, plus an AI Growth Correlation scored −2 to +2. The quick screen provides rapid triage but does NOT determine the zone.
Step 2: Task Decomposition (Theoretical Layer)
The role is decomposed into 5–10 constituent tasks, each weighted by percentage of total role time. Tasks are scored for agentic AI capability — not “can AI assist?” but “can an AI agent execute this entire workflow without a human?”
| Score | Automation Potential | Criteria |
|---|---|---|
| 1 | Irreducible Human | Protected by legal accountability, ethical judgment, trust/relationship, regulatory mandate |
| 2 | Barrier-Protected | Requires licensed professional judgment, high-stakes oversight, strategic thinking |
| 3 | Human-Led, AI-Accelerated | AI agents handle sub-workflows but a human leads, directs, and validates |
| 4 | Agent-Executable | Multi-step workflows an AI agent can execute end-to-end with minimal oversight |
| 5 | Fully Automatable | Deterministic, rule-based, or pattern-matching. AI agents already perform at scale. |
Task Resistance Score (TRS): TRS = 6.0 − Weighted Total, where Weighted Total is the time-weighted mean automation score. The inversion converts automation potential into resistance: higher scores on the task rubric (more automatable) produce a lower TRS (less resistant). Range: 1.0–5.0.
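The Step 2 arithmetic can be checked programmatically. A minimal sketch using the SOC Analyst Tier 1 decomposition from Section 5.4 (the function name is illustrative; the 1.20 result is the raw TRS before the assessor adjustment to 1.55 shown there):

```python
def task_resistance_score(tasks):
    """tasks: list of (time_weight_fraction, automation_score 1-5).
    Weighted Total is the time-weighted mean automation score;
    TRS = 6.0 - Weighted Total inverts it into resistance (range 1.0-5.0)."""
    assert abs(sum(w for w, _ in tasks) - 1.0) < 1e-9, "weights must sum to 100%"
    weighted_total = sum(w * s for w, s in tasks)
    return 6.0 - weighted_total

# SOC Analyst Tier 1: monitor 30%/5, triage 25%/5, playbooks 20%/5,
# tickets 15%/5, escalation 10%/3
soc_t1 = [(0.30, 5), (0.25, 5), (0.20, 5), (0.15, 5), (0.10, 3)]
print(round(task_resistance_score(soc_t1), 2))  # 1.2
```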
Step 3: Evidence Score (Real-World Layer)
Five evidence dimensions, each scored −2 to +2:
| Dimension | What It Measures |
|---|---|
| Job Posting Trends | YoY change in role-specific postings |
| Company Actions | Hiring/firing decisions explicitly citing AI |
| Wage Trends | Real-terms wage movement (inflation-adjusted) |
| AI Tool Maturity | Commercial AI products targeting core tasks, corroborated by Anthropic’s observed exposure metric [18] |
| Expert Consensus | Agreement across academics, analysts, practitioners |
Evidence Score: Sum of all five. Range: −10 to +10. Each dimension has published calibration thresholds with quantitative anchors.
Step 4: Barrier Assessment
Not “what slows automation down” but what prevents AI execution even when programmatically possible. Five structural barriers, each scored 0–2:
- Regulatory / Licensing: professional licensing, EU AI Act high-risk mandates
- Physical Presence: human body required in unstructured environments
- Union / Collective Bargaining: collective agreements, AI-specific protections
- Liability / Accountability: personal legal liability AI cannot assume
- Cultural / Trust: societal resistance to AI in health, freedom, education, care

Barrier Score: Sum of all five. Range: 0–10.
Step 5: AI Growth Correlation Check
Revisit and confirm/revise the Growth Correlation from Step 1, informed by evidence from Steps 3–4. This determines whether the role is Accelerated (demand grows with AI), Stable, or Transforming.
Step 6: Composite Scoring
Covered in full in Section 5.
Step 7: Assessor Commentary — The Honest Check
Three required sub-sections: Score vs Reality Check (does the zone label match the full picture?), What the Numbers Don’t Capture (structured blind-spot check), and Who Should Worry (and Who Shouldn’t) (plain language guidance).
5. The Composite Scoring Model
5.1 Design Rationale
The v3 composite is multiplicative and non-compensatory: all four dimensions contribute, and weakness in any dimension drags the composite down proportionally. A role with high task resistance but collapsing market evidence should NOT score Green. The market has spoken.
The AIJRI Composite Formula

Raw = TRS × Emod × Bmod × Gmod

| Term | Definition | Range |
|---|---|---|
| TRS | Task Resistance Score | 1.0–5.0 |
| Emod | 1.0 + (Evidence × 0.04) | 0.60–1.40 |
| Bmod | 1.0 + (Barriers × 0.02) | 1.00–1.20 |
| Gmod | 1.0 + (Growth × 0.05) | 0.90–1.10 |

Normalisation: JobZone Score = (Raw − 0.54) / 7.93 × 100, capped at 0 and 100.
5.2 Modifier Behaviour
| Modifier | Coefficient | Range | Design Intent |
|---|---|---|---|
| Evidence | 0.04/point | 0.60–1.40 | Most powerful. Market reality is the ultimate arbiter. ±40%. |
| Barriers | 0.02/point | 1.00–1.20 | Can only help, never hurt. Absence of barriers doesn’t accelerate displacement. +20% max. |
| Growth | 0.05/point | 0.90–1.10 | Modest trajectory signal. Strong enough to tip borderline cases. ±10%. |
Coefficient derivation. The coefficients were set through iterative calibration against known occupations with well-understood AI exposure profiles (e.g., SOC Analyst T1, CISO, Electrician, Receptionist). Starting from equal weighting, each coefficient was adjusted until the model produced zone classifications that matched expert consensus for benchmark roles. This is calibration by inspection — standard practice in composite scoring frameworks (CVSS v3.1 base metrics were similarly set by expert working groups, not empirical derivation). We publish the exact coefficients and invite sensitivity analysis: readers can use the interactive calculator below to test how coefficient changes affect outputs.
Methodological honesty. AIJRI is a structured expert assessment methodology, not an empirical predictive model. The task resistance score (the base) reflects expert judgment about AI capability against specific tasks — informed by evidence, but fundamentally a human assessment. The modifiers then adjust this base using empirical market signals. We describe the modifiers as “evidence-informed” rather than “evidence-based” because the underlying task scoring remains a judgment call. The formula provides mathematical discipline and transparency, not mathematical certainty.
5.3 Normalisation
The normalisation uses the v3.0 range. Minimum (0.54) = TRS 1.0 × Emod 0.6 × Bmod 1.0 × Gmod 0.9. Denominator (7.93) = v3.0 theoretical max (8.47) minus minimum (0.54). The v3.1 barrier coefficient recalibration raised the theoretical maximum to 9.24, but the denominator is deliberately not recalculated — doing so would absorb the barrier boost and no role would actually benefit. Scores above 100 are capped. Practical score range: real-world assessments produce scores from approximately 3 to 83.
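The full composite and normalisation reduce to a few lines of code. This sketch (function name illustrative) reproduces the worked examples in Section 5.4:

```python
def jobzone_score(trs, evidence, barriers, growth):
    """AIJRI v3.1 composite: multiplicative and non-compensatory.
    trs in 1.0-5.0, evidence in -10..+10, barriers in 0..10, growth in -2..+2."""
    e_mod = 1.0 + evidence * 0.04   # 0.60-1.40
    b_mod = 1.0 + barriers * 0.02   # 1.00-1.20
    g_mod = 1.0 + growth * 0.05     # 0.90-1.10
    raw = trs * e_mod * b_mod * g_mod
    # v3.0 normalisation constants, deliberately retained in v3.1
    score = (raw - 0.54) / 7.93 * 100
    return max(0.0, min(100.0, score))  # cap at 0 and 100

print(round(jobzone_score(1.55, -8, 1, -2), 1))  # 5.4  (SOC Analyst T1)
print(round(jobzone_score(3.75, -3, 2, -1), 1))  # 34.3 (Engineering Manager)
print(round(jobzone_score(4.10, 10, 9, 1), 1))   # 82.9 (Electrician)
```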
5.4 Worked Examples
Three real assessments illustrate how the composite model behaves across the spectrum — a Red Imminent role where every modifier compounds downward, a Yellow Moderate role where resistant tasks meet a declining market, and a Green Stable role where all modifiers reinforce the base.
Example 1: SOC Analyst Tier 1 (Entry-Level)
Full pipeline walkthrough — Red Imminent
Step 2: Task Decomposition
| Task | Time | Score | Wtd |
|---|---|---|---|
| Monitor alerts/dashboards | 30% | 5 | 1.50 |
| Triage alerts (true/false positive) | 25% | 5 | 1.25 |
| Follow incident playbooks | 20% | 5 | 1.00 |
| Write/update tickets | 15% | 5 | 0.75 |
| Escalate to L2/L3 | 10% | 3 | 0.30 |
Raw TRS: 6.00 − 4.80 = 1.20 → adjusted to 1.55/5.0
Steps 3–5: Modifiers
| Input | Raw | Modifier |
|---|---|---|
| Evidence | −8/10 | 0.68 |
| Barriers | 1/10 | 1.02 |
| Growth | −2/2 | 0.90 |
Evidence: CrowdStrike cut 500 roles citing AI. Carvana reports 100% of T1 alerts are now AI-handled. Multiple vendors market “AI SOC Analyst” as the product.
Step 6: Composite Calculation
Raw = 1.55 × 0.68 × 1.02 × 0.90 = 0.968
JobZone Score = (0.968 − 0.54) / 7.93 × 100 = 5.4
Sub-label: TRS 1.55 < 1.8 AND Evidence −8 ≤ −6 AND Barriers 1 ≤ 2 → Red (Imminent)
Example 2: Engineering Manager (Mid-Level)
Resistant tasks, declining market — Yellow Moderate
Inputs

| Input | Value |
|---|---|
| Task Resistance Score | 3.75/5.0 |
| Evidence Score | −3/10 |
| Barrier Score | 2/10 |
| AI Growth Correlation | −1/2 |

Modifiers

| Modifier | Calculation |
|---|---|
| Evidence | 1.0 + (−3 × 0.04) = 0.88 |
| Barrier | 1.0 + (2 × 0.02) = 1.04 |
| Growth | 1.0 + (−1 × 0.05) = 0.95 |
Composite Calculation
Raw = 3.75 × 0.88 × 1.04 × 0.95 = 3.26
JobZone Score = (3.26 − 0.54) / 7.93 × 100 = 34.3
Combined modifier effect: 0.88 × 1.04 × 0.95 = 0.869 — base score cut 13%. Core tasks (people management, judgment) ARE hard to automate, but teams are shrinking and AI is eliminating middle management layers. The composite captures both: resistant tasks in a declining market.
Example 3: Electrician (Mid-Level)
All modifiers reinforce the base — Green Stable
Inputs

| Input | Value |
|---|---|
| Task Resistance Score | 4.10/5.0 |
| Evidence Score | +10/10 |
| Barrier Score | 9/10 |
| AI Growth Correlation | +1/2 |

Modifiers

| Modifier | Calculation |
|---|---|
| Evidence | 1.0 + (10 × 0.04) = 1.40 |
| Barrier | 1.0 + (9 × 0.02) = 1.18 |
| Growth | 1.0 + (1 × 0.05) = 1.05 |
Composite Calculation
Raw = 4.10 × 1.40 × 1.18 × 1.05 = 7.11
JobZone Score = (7.11 − 0.54) / 7.93 × 100 = 82.9
Every modifier reinforces the base. Compare to Janitor (AIJRI 44.2) — same physical work category, similar task resistance, but evidence −2, barriers 3/10, growth 0. The composite reveals what a simpler model would hide: physically protected roles are NOT equally safe.
Verify It Yourself
Enter any values to see the composite formula in action. Try replicating the worked examples above, or experiment with your own inputs.
6. Zone Classification & Sub-Labels
6.1 The Three Zones
| AIJRI Score | Zone | Meaning | Time Horizon |
|---|---|---|---|
| 48–100 | GREEN | Role is protected or growing | Safe for 5+ years |
| 25–47 | YELLOW | Role is transforming | Adapt within 2–7 years |
| 0–24 | RED | Role is being displaced | Act now |
AIJRI Zone Classification (0–100); practical range observed in assessments: ~3 to ~83.
6.2 The 7-Tier Sub-Label System
| Label | Determination |
|---|---|
| Green (Accelerated) | AIJRI ≥ 48 AND Growth Correlation = +2 |
| Green (Stable) | AIJRI ≥ 48 AND <20% of task time scores 3+ |
| Green (Transforming) | AIJRI ≥ 48 AND ≥20% of task time scores 3+ |
| Yellow (Moderate) | AIJRI 25–47 AND <40% of task time scores 3+ |
| Yellow (Urgent) | AIJRI 25–47 AND ≥40% of task time scores 3+ |
| Red | AIJRI <25 AND (TRS ≥ 1.8 OR Evidence > −6 OR Barriers > 2) |
| Red (Imminent) | AIJRI <25 AND TRS < 1.8 AND Evidence ≤ −6 AND Barriers ≤ 2 |
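The sub-label rules in the table reduce to a small decision procedure. A sketch under the stated thresholds (the task-time percentages passed in the examples are illustrative, not published values):

```python
def classify(score, trs, evidence, barriers, growth, pct_time_scoring_3plus):
    """Zone and sub-label per Section 6. pct_time_scoring_3plus is the
    share of task time scoring 3 or higher on the automation rubric."""
    if score >= 48:
        if growth == 2:
            return "Green (Accelerated)"
        return "Green (Transforming)" if pct_time_scoring_3plus >= 20 else "Green (Stable)"
    if score >= 25:
        return "Yellow (Urgent)" if pct_time_scoring_3plus >= 40 else "Yellow (Moderate)"
    # Red zone: Imminent only when all three severity conditions hold
    if trs < 1.8 and evidence <= -6 and barriers <= 2:
        return "Red (Imminent)"
    return "Red"

print(classify(5.4, 1.55, -8, 1, -2, 100))  # Red (Imminent)
print(classify(82.9, 4.10, 10, 9, 1, 10))   # Green (Stable)
```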
6.3 Calibration Anchors
| Role | TRS | Evidence | Barriers | Growth | AIJRI | Zone |
|---|---|---|---|---|---|---|
| CISO (Executive) | 4.25 | +9 | 6 | +2 | 83.0 | Green Accel. |
| Electrician | 4.10 | +10 | 9 | +1 | 82.9 | Green Stable |
| Registered Nurse | 4.40 | +9 | 9 | 0 | 82.2 | Green Stable |
| Senior SW Engineer | 3.95 | +5 | 2 | 0 | 55.4 | Green Trans. |
| Pen Tester (Mid) | 2.80 | +1 | 5 | +1 | 35.6 | Yellow Urg. |
| Graphic Designer | 2.65 | −7 | 1 | −1 | 16.5 | Red |
| Junior SW Developer | 2.10 | −9 | 0 | −1 | 9.3 | Red |
| SOC Analyst T1 | 1.55 | −8 | 1 | −2 | 5.4 | Red Imm. |
7. Results & Distribution
The AIJRI corpus covers occupations across 28 domains and over 194 specialisms, assessed on a rolling basis with 100% US workforce coverage by employment volume.
Notable Findings
The Nurse Paradox: Registered Nurses score 82.2 (Green Stable) despite being among the earliest and most active healthcare AI adopters. AI augments nursing tasks — ambient documentation, clinical decision support, predictive analytics — without substituting for the nurse.
Physical Trades Resilience: Electricians (82.9), plumbers, and similar skilled trades in unstructured environments score among the highest in the corpus. Moravec’s Paradox provides decades of protection based on current robotics trajectories.
The AI-Accelerated Zone: A distinct cluster scores Green because of AI growth: AI Security Engineer, CISO, AI Governance Lead, ML/AI Engineer. You can’t fully automate securing AI because the attack surface IS AI.
The Yellow Majority: Yellow is consistently the largest zone. The dominant near-term impact of AI is transformation, not outright displacement. Most workers face a changing role, not an eliminated one.
7.1 International Workforce Estimates
AIJRI assessments are built on US labour market data — BLS employment projections, O*NET task descriptions, and US-centric job posting signals. The Data Monitor extends these zone distributions to 8 additional countries and regions using a sector-weighted adjustment model. This section documents the methodology and its limitations.
Rationale
Approximately 85% of the 3,649 assessed roles fall within services and knowledge-work occupations. Countries whose workforce is more heavily concentrated in agriculture and heavy manufacturing — sectors with physical, environmental, and spatial complexity that resists near-term automation — should logically have a larger share of workers in the “green” (protected) zone. A country with 27% agricultural employment (the global average) has far more workers in sectors that AIJRI domain scores rate overwhelmingly GREEN (Agriculture: 6% RED, Trades: 5% RED) compared to the US (1.6% agriculture). This is consistent with IMF estimates that low-income countries face roughly half the AI displacement risk of advanced economies.
Data Source
Sector employment shares are drawn from the World Bank Open Data platform, indicators SL.AGR.EMPL.ZS (agriculture), SL.IND.EMPL.ZS (industry), and SL.SRV.EMPL.ZS (services). These are ILO modelled estimates (November 2025 model release), not direct survey observations. For most high-income countries, the modelled values closely track national statistics; for developing economies, the model may diverge from ground truth.
| Country | Agriculture % | Industry % | Services % | Confidence |
|---|---|---|---|---|
| US | 1.6 | 19.0 | 79.4 | Direct (BLS) |
| UK | 0.9 | 16.1 | 83.0 | High |
| Canada | 1.1 | 18.9 | 79.9 | High |
| Australia | 2.2 | 19.4 | 78.4 | High |
| EU | 3.3 | 24.2 | 72.5 | Moderate |
| Germany | 1.1 | 26.3 | 72.6 | Moderate |
| Japan | 2.9 | 23.3 | 73.8 | Moderate |
| South Korea | 5.2 | 23.7 | 71.1 | Moderate |
| Global | 26.1 | 23.6 | 50.3 | Low |
Adjustment Formula
The US serves as the baseline (79.4% services employment). For each country, a services delta adjustment shifts zone percentages proportionally:
servicesDelta = 79.4 − country.servicesPct
greenAdjust = servicesDelta × 0.2
adjustedGreen = baseGreen + greenAdjust
adjustedRed = baseRed − greenAdjust
adjustedYellow = 100 − adjustedGreen − adjustedRed
The 0.2 factor means that 20% of the services delta shifts between the green and red zones. This is deliberately conservative — a larger factor would overstate the adjustment given the coarseness of the sector-level proxy. The base green and yellow percentages for each country are derived from IMF, OECD, PwC, and ILO labour market estimates.
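The adjustment formula above, expressed runnably (the base green/red split used in the example is hypothetical, not a published country estimate):

```python
US_SERVICES_PCT = 79.4
ADJUST_FACTOR = 0.2  # 20% of the services delta shifts between green and red

def adjust_zones(base_green, base_red, services_pct):
    """Sector-weighted adjustment from Section 7.1. Base zone percentages
    are the country's IMF/OECD/ILO-derived starting estimates."""
    services_delta = US_SERVICES_PCT - services_pct
    green_adjust = services_delta * ADJUST_FACTOR
    green = base_green + green_adjust
    red = base_red - green_adjust
    yellow = 100 - green - red  # yellow absorbs the remainder
    return round(green, 1), round(yellow, 1), round(red, 1)

# Hypothetical base split of 30/45/25 for the Global aggregate (50.3% services):
print(adjust_zones(30.0, 25.0, 50.3))  # green shifts up by ~5.8 points
```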
Confidence Tiers
Each country is assigned a confidence tier based on the structural similarity of its economy to the US (measured by services employment share):
- Direct (US only) — Actual per-role BLS employment data. No extrapolation.
- High (services within 5pp of US: UK, Canada, Australia) — Similar economic structure. Adjustment is <1 percentage point. Zone estimates are likely close to what per-role assessment would produce.
- Moderate (services 5–15pp below US: EU, Germany, Japan, South Korea) — More industrial economies. Adjustment is 1–2pp. Zone estimates are reasonable but less certain.
- Low (services >15pp below US: Global) — Very different economic structure. Adjustment is ~6pp. Zone estimates are indicative only.
Limitations of This Approach
- Sector-level proxy, not role-level assessment — The adjustment assumes that unassessed sectors (agriculture, heavy industry) would contribute more green-zone workers, based on their physical and environmental complexity. This aligns with AIJRI domain scores (Agriculture: 6% RED, Trades: 5% RED) and IMF exposure estimates, but is not verified by per-role assessment in those countries.
- Flat factor across all countries — The 0.2 adjustment factor does not vary by country. In practice, the composition of “services” differs significantly (e.g., Japan’s services sector includes a larger share of retail and hospitality than the US).
- ILO modelled estimates — The sector data are econometric model outputs, not direct survey measurements. They are updated annually and may lag structural changes.
- No within-sector variation — Two countries with identical services percentages may have very different AI exposure if one’s services sector is dominated by finance (high exposure) and the other by healthcare (lower exposure).
- Base zone percentages are themselves estimates — The starting green/yellow/red percentages for non-US countries come from IMF, OECD, and ILO estimates, not from per-role AIJRI assessments.
For these reasons, the Data Monitor displays a confidence badge and sector breakdown for every country, so users can assess for themselves how much weight to place on non-US estimates.
7.2 US State-Level Estimates
The Policymaker Briefing and Data Monitor break down AI displacement risk by US state. These are estimates, not direct per-state assessments. This section documents how they are constructed.
Data Source
State employment counts come from the BLS Occupational Employment and Wage Statistics (OEWS) survey, May 2024 release (state_M2024_dl.xlsx). OEWS reports the number of jobs per Standard Occupational Classification (SOC) code in each of the 50 states plus the District of Columbia.
Methodology
- SOC-to-zone mapping — Each of the ~840 SOC occupation codes is mapped to the AIJRI zone classification (Green, Yellow, or Red) of its corresponding JobZone role assessment. Where a single SOC code maps to multiple JobZone assessments (e.g., senior and junior variants), the primary assessment’s zone is used.
- State aggregation — For each state, the employment count of every SOC code is assigned to its mapped zone. The state’s Green, Yellow, and Red totals are the sum of all employment in SOC codes classified in that zone.
- Percentage calculation — Zone percentages are calculated as (zone employment / total state employment) × 100. These represent the share of jobs (not individuals) in each zone.
This process achieves 99.6% coverage — 838 of the ~840 SOC codes in the OEWS dataset are matched to an AIJRI zone classification. The remaining codes are excluded from the totals.
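The aggregation can be sketched as follows. The SOC codes shown are real BLS codes, but the zone assignments and employment counts here are illustrative placeholders, not actual AIJRI classifications or OEWS figures:

```python
from collections import defaultdict

# Hypothetical inputs: OEWS rows as (state, soc_code, employment) and a
# SOC -> zone map derived from each code's primary JobZone assessment.
soc_zone = {"15-1252": "Red", "29-1141": "Green", "43-4051": "Yellow"}
oews_rows = [
    ("CA", "15-1252", 500_000),
    ("CA", "29-1141", 300_000),
    ("IA", "29-1141", 40_000),
    ("IA", "43-4051", 25_000),
]

def state_zone_shares(rows, zone_map):
    """Aggregate employment into zones per state; convert to percentages."""
    totals = defaultdict(lambda: defaultdict(int))
    for state, soc, employment in rows:
        zone = zone_map.get(soc)
        if zone is None:
            continue  # unmatched SOC codes are excluded from the totals
        totals[state][zone] += employment
    shares = {}
    for state, zones in totals.items():
        state_total = sum(zones.values())
        shares[state] = {z: 100 * n / state_total for z, n in zones.items()}
    return shares

shares = state_zone_shares(oews_rows, soc_zone)
# shares["CA"]["Red"] -> 62.5 (500k of 800k matched CA jobs)
```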
What State Differences Reflect
States vary in their zone distributions primarily because of occupational mix, not because the same job is riskier in one state than another. A software developer receives the same AIJRI score whether they work in California or Iowa. The difference is that California has a proportionally larger share of knowledge-work occupations (many of which fall in Yellow or Red zones), while states with higher shares of skilled trades, healthcare, and agriculture workers tend to show more Green.
Limitations
- Job counts, not people — OEWS counts jobs (positions), not individuals. A person holding two jobs is counted twice. This is consistent with how BLS reports employment but means state totals slightly overcount the number of affected workers.
- National zone applied locally — The AIJRI assessment is national (predominantly US-focused). A role’s AI displacement risk may differ at the state level due to local regulation, industry concentration, or adoption rates, but these state-specific factors are not captured.
- SOC granularity limits — SOC codes are broad categories (e.g., “Software Developers” covers everything from web developers to embedded systems engineers). Within a single SOC code, actual AI exposure can vary substantially. The zone assignment reflects the most representative assessment.
- Self-employment excluded — OEWS covers payroll employment only. Self-employed workers, independent contractors, and gig workers are not included. States with large gig economies (e.g., California, Florida) may have understated totals.
Despite these limitations, the state estimates provide a useful approximation of how AI displacement risk is distributed geographically. The underlying data (BLS OEWS) is the most comprehensive and reliable source of state-level occupational employment available.
7.3 International Regional Estimates
The Data Monitor displays choropleth maps for subnational regions within each tracked country (UK regions, German states, Japanese prefectures, Canadian provinces, Australian states, South Korean provinces, and EU member states). These regional breakdowns are estimates derived from national-level data. This section documents how they are constructed.
Data Sources
Regional workforce totals come from each country’s official statistics agency:
- UK: Office for National Statistics (ONS) — 12 ITL1/NUTS1 regions
- Germany: Destatis — 16 Bundesländer
- Japan: Statistics Bureau Japan — 47 prefectures
- Canada: Statistics Canada — 13 provinces and territories
- Australia: ABS — 8 states and territories
- South Korea: KOSIS — 17 administrative divisions
- EU: Eurostat workforce totals; World Bank/ILO services employment percentages (indicator SL.SRV.EMPL.ZS)
Methodology
For EU member states, the same services-delta formula described in Section 7.1 is applied individually to each country using its World Bank/ILO services employment percentage. Countries with lower services employment (e.g., Romania at 47%) show larger GREEN zones and smaller RED zones than high-services economies (e.g., Luxembourg at 88%).
For within-country regions (UK, Germany, Japan, Canada, Australia, South Korea), the method uses three steps:
- National baseline: The country’s zone percentages from the global dataset serve as the target national average.
- Regional variation: Regions with higher services-sector concentration (typically capital and financial cities) receive higher RED percentages. Regions with more agriculture, manufacturing, or resource-extraction employment receive lower RED and higher GREEN percentages. This reflects the same logic as the country-level adjustment: sectors with physical, environmental, and spatial complexity resist near-term automation.
- Spread calibration: The magnitude of regional variation is calibrated against the observed spread in US state-level data (the only ground-truth within-country data available). US states show approximately a 6 percentage-point spread in RED zone percentages. International regional estimates are compressed to produce similar spreads.
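The spread-calibration step can be sketched as a rescaling of each region's deviation from the national average so that the max-min spread matches the ~6pp US benchmark. The function, the raw regional values, and the choice of an unweighted max-min spread are all illustrative assumptions, not the production implementation:

```python
TARGET_SPREAD_PP = 6.0  # max-min RED spread observed across US states

def calibrate_spread(raw_red: dict, national_red: float) -> dict:
    """Rescale each region's deviation from the national RED average so the
    overall max-min spread matches the US state-level benchmark."""
    values = raw_red.values()
    raw_spread = max(values) - min(values)
    scale = TARGET_SPREAD_PP / raw_spread if raw_spread > 0 else 0.0
    return {region: national_red + (red - national_red) * scale
            for region, red in raw_red.items()}

# Hypothetical raw estimates for three UK regions, modelled from services mix:
raw = {"London": 32.0, "Scotland": 24.0, "Wales": 20.0}
calibrated = calibrate_spread(raw, national_red=24.0)
# Spread compressed from 12pp to 6pp: London -> 28.0, Wales -> 22.0
```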
What Regional Differences Reflect
Regional variation in zone percentages reflects differences in economic structure, not differences in how the same job is scored. A financial analyst in London receives the same AIJRI score as one in Edinburgh. The difference is that London has a proportionally larger concentration of financial services, professional services, and administrative roles — occupations that tend to fall in the RED zone — while regions with more agriculture, manufacturing, and healthcare employment tend to show more GREEN.
Limitations
- No per-region occupational data: Unlike US states (which use actual BLS occupational employment by state), international regional estimates do not have per-region occupational breakdowns mapped to AIJRI zones. The regional variation is modelled from economic structure, not observed.
- Compressed variation: Real within-country variation may be larger or smaller than the ~6pp spread used. Countries with greater regional economic inequality (e.g., Japan, South Korea) likely have wider spreads in practice.
- Urban/rural proxy: The method assumes that services concentration is the primary driver of regional AI exposure differences. Other factors — regional regulation, industry-specific AI adoption rates, workforce demographics — are not modelled.
- EU member states as “regions”: The EU choropleth treats sovereign nations as regions within a bloc. Individual EU countries have far greater economic diversity than subnational regions, making per-country estimates less precise than the label suggests.
For these reasons, regional choropleth maps should be interpreted as indicative of direction (which regions face more or less exposure) rather than precise estimates of absolute levels. The Data Monitor displays data caveats alongside each regional map, plus contextual explanations (“Why does this breakdown look like this?”) for every country and for individual US states, EU countries, and UK regions — highlighting the specific industries and economic structures that drive each area’s zone distribution.
8. Limitations & Future Work
8.1 Known Limitations
We list these not as disclaimers buried in fine print, but as genuine constraints that users and critics should weigh when interpreting scores. If AIJRI has weaknesses, we would rather identify them ourselves than have them discovered by others.
- No inter-rater reliability data — All assessments are produced by the same assessor system. No independent assessors have scored the same role to test whether two people would reach the same result. Without inter-rater reliability data, we can claim internal consistency but not proven reproducibility. We publish the full methodology and rubrics specifically to enable independent replication and invite researchers to test this.
- Assessor override introduces subjectivity — The methodology permits adjustments of ±5 points on the composite (or equivalent at the task level) with documented rationale. In the SOC T1 worked example, the raw TRS of 1.20 was adjusted to 1.55 — a 29% increase. This is a deliberate design choice: purely formulaic scoring produces absurd edge cases that human judgment must correct. But it means the formula is advisory, not deterministic. Every override is documented, but the mechanism exists.
- Task scoring is inherently subjective — The 1–5 scale for “can an AI agent execute this task?” is a judgment call. Two reasonable assessors could disagree by 1 point per task. Across 5–7 weighted tasks, this compounds into TRS differences of 0.5–1.0, translating to roughly 6–12 points on the final score — enough to shift a zone boundary. The rubric with worked examples constrains this, but does not eliminate it.
- Evidence dimensions are coarse — Five dimensions scored −2 to +2 (a 5-point integer scale) compress complex labour market dynamics into integers. The trade-off is deliberate: finer granularity would create false precision that the underlying data does not support. But it means that the difference between “postings declining slightly” (−1) and “postings collapsing” (−2) is a single point producing a 4% swing.
- Coefficients were derived by calibration, not empirical measurement — The modifier coefficients (0.04, 0.02, 0.05) were set through iterative testing against benchmark roles, not derived from statistical analysis of outcomes. This is standard for composite scoring frameworks but means the relative weighting of evidence vs barriers vs growth reflects design judgment, not observed data.
- Occupations, not individuals — AIJRI assesses roles as categories. A highly skilled practitioner in a Red-zone role may possess niche expertise not captured by the aggregate score.
- Evidence signals lag — The trust hierarchy correctly places official sources (BLS, O*NET) at the top for factual claims, but these are lagging indicators for AI deployment — they will not reflect displacement until years after it occurs. For AI tool maturity, the most informative sources are vendor documentation and tech journalism, which carry lower weight in the trust hierarchy.
- Score validity is temporal — Each assessment reflects conditions at the date of scoring. AI capability changes quarter by quarter. We target a 6-month refresh cycle for high-volatility roles (Red, Yellow Urgent) and 12 months for stable roles (Green), but published scores may become stale between refresh cycles.
- Normalisation drift — The normalisation constants (0.54, 7.93) are frozen from v3.0. As the methodology evolves and coefficients are recalibrated, these may require updating, which would shift all published scores.
- Regulatory interventions not modelled — AI safety legislation (EU AI Act, potential US regulation) could materially alter displacement timelines. AIJRI models current barriers, not future policy.
- Second-order effects — AIJRI does not model new role creation from AI deployment, only displacement of existing roles.
- Physical barriers are temporal — Robotics is eroding physical barriers. Scores for trades reflect today’s reality, not the trajectory of embodied AI.
- LLM hallucination risk — Assessments are generated using large language models processing millions of tokens of evidence per role. Despite the TruthSeeker validation protocol, LLMs can fabricate plausible-sounding citations or statistics. Individual data points should be treated as best-effort synthesis, not independently verified fact.
- Western labour market bias — Evidence sources are overwhelmingly US/UK-centric. Scores may not transfer to labour markets with different regulatory frameworks, union structures, or technology adoption rates. The Data Monitor’s international view uses a sector-weighted extrapolation to partially address this; see Section 7.1 for the methodology and its own limitations.
- Practical score range — Real-world scores span approximately 3 to 83 out of a theoretical 0–100 range. No role currently scores above 83 or below 3, meaning roughly 17% of the top and 3% of the bottom of the scale are unused. This may indicate the normalisation could be tighter, though it also reflects that extreme combinations of inputs are rare in practice.
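The compounding of per-task disagreement described in the task-scoring limitation above can be illustrated with a small simulation. It assumes TRS is a weighted mean of task scores and uses the ~12-points-per-TRS-unit scaling implied by the 0.5–1.0 → 6–12 point mapping; both are simplifying assumptions for illustration, and the weights are hypothetical:

```python
import random

POINTS_PER_TRS = 12.0  # implied by the 0.5-1.0 TRS -> 6-12 point mapping

def trs_delta(weights, max_disagreement=1):
    """TRS shift if a second assessor moves each task score by up to ±1.
    Assumes TRS is a weighted mean of per-task scores (illustrative)."""
    return sum(w * random.randint(-max_disagreement, max_disagreement)
               for w in weights)

random.seed(0)
weights = [0.3, 0.25, 0.2, 0.15, 0.1]  # hypothetical task weights, sum to 1
deltas = [abs(trs_delta(weights)) for _ in range(10_000)]

worst_case = sum(weights)  # all tasks disagree in the same direction
median = sorted(deltas)[len(deltas) // 2]
print(f"worst case: {worst_case * POINTS_PER_TRS:.0f} points")  # 12 points
print(f"median disagreement: {median * POINTS_PER_TRS:.1f} points")
```

The worst case (every task shifted one point in the same direction) reproduces the ~12-point ceiling cited above; typical random disagreement lands well below it, which is why the rubric's worked examples matter.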
8.2 Purpose & Best-Efforts Notice
AIJRI was created with a simple purpose: to help people see what is coming. The AI displacement wave is already reshaping labour markets, and most workers have no structured way to assess their own exposure. Existing frameworks are either too academic to be actionable or too simplistic to be useful. AIJRI attempts to bridge that gap.
This methodology is not perfect. It is the best effort we could produce with the tools, data, and knowledge available in early 2026. We publish it openly not because we believe it is definitive, but because we believe the conversation matters more than the precision. If AIJRI provokes debate — about which roles are truly at risk, about whether our scoring weights are right, about what we’ve missed — then it has served its purpose.
We actively invite scrutiny. If our scores are wrong, show us where and we will correct them. The worst outcome is not that AIJRI is imperfect; it is that people sleepwalk into displacement because no one attempted to quantify the risk at all.