Role Definition
| Field | Value |
|---|---|
| Job Title | Audio Describer |
| Seniority Level | Mid-Level |
| Primary Function | Writes and voices narrated descriptions of visual content for blind and partially sighted audiences. Selects which visual information is essential to convey, scripts descriptions to fit within natural pauses in dialogue, and delivers them with clear, neutral vocal performance. Works across film, TV, live theatre, museums, and cultural events. Accessibility specialist requiring interpretive judgment about narrative pacing, emotional context, and visual hierarchy. |
| What This Role Is NOT | NOT a Subtitler/Captioner (text-based transcription of audio -- Red 6.2). NOT a Voice-Over Artist (reads pre-written commercial/narration scripts without interpretive visual analysis). NOT a Sign Language Interpreter (physical, embodied interpretation -- Green 73.0). NOT a Sound Designer (creates audio assets, not accessibility narration). |
| Typical Experience | 3-7 years. Training through organisations like Audio Description Association (UK), Audio Description Project (ACB, US). No formal licensing, but assessed through broadcaster qualification tests (e.g., ITV, BBC, Netflix). Strong vocal skills, narrative comprehension, and accessibility awareness. Often freelance. |
Seniority note: A junior audio describer doing only pre-recorded corporate/educational content with formulaic descriptions would score deeper Yellow approaching Red. A senior audio describer who leads live theatre description, trains other describers, and consults on accessibility strategy would score higher Yellow approaching Green.
Protective Principles + AI Growth Correlation
| Principle | Score (0-3) | Rationale |
|---|---|---|
| Embodied Physicality | 1 | Live theatre and museum description requires physical presence -- attending performances, navigating venues, timing descriptions to live action. But the majority of film/TV work is desk-based and remote. Mixed. |
| Deep Interpersonal Connection | 1 | Collaborates with directors, producers, and blind/partially sighted consultants to shape description approach. Live theatre involves real-time audience connection. But the core deliverable is the description itself, not the relationship. |
| Goal-Setting & Moral Judgment | 2 | Significant interpretive judgment: deciding what visual information is essential vs peripheral, how to describe race/gender/disability without bias, balancing narrative pacing with information density. These are editorial and ethical decisions that require cultural sensitivity and deep understanding of the audience's needs. More nuanced than captioning. |
| Protective Total | 4/9 | |
| AI Growth Correlation | 0 | Regulatory mandates (ADA Title II April 2026, EAA, Ofcom quotas) are dramatically expanding the volume of content requiring audio description. AI tools help close this gap but also reduce human hours per project. Net effect is neutral on headcount -- more content needs description, but AI handles an increasing share of routine pre-recorded work. |
Quick screen result: Protective 4 + Correlation 0 = Likely Yellow Zone (proceed to quantify).
Task Decomposition (Agentic AI Scoring)
| Task | Time % | Score (1-5) | Weighted | Aug/Disp | Rationale |
|---|---|---|---|---|---|
| Viewing and visual analysis -- selecting what to describe | 25% | 2 | 0.50 | AUG | Core interpretive skill: watching content and deciding which visual elements are narratively essential. Requires understanding of dramatic structure, character relationships, and what blind audiences need vs already hear. AI vision-language models (GPT-4V, Gemini) can identify objects and actions but struggle with narrative significance, emotional subtext, and cultural nuance. Human judgment defines the role. |
| Script writing -- crafting timed descriptions | 25% | 4 | 1.00 | DISP | AI tools (Visonic AI, Verbit AI AD, Maestra) now generate initial description scripts from video in minutes. CHI 2025 VideoA11y study showed AI descriptions comparable to trained human annotations on clarity and accuracy for standard content. The human describer is shifting from scriptwriter to script editor/reviewer. For routine pre-recorded content, AI drafts are the starting point. |
| Vocal performance and recording | 15% | 3 | 0.45 | AUG | AI TTS voices (ElevenLabs, Verbit synthetic narration) produce acceptable delivery for many content types. But premium content demands human vocal nuance -- matching tone to genre, maintaining neutrality without flatness, adjusting pace to emotional context. Netflix and BBC still prefer human voices for prestige content. Synthetic voices are "good enough" for corporate/educational. |
| Live audio description (theatre, events, museums) | 10% | 1 | 0.10 | NOT | Real-time description of live performances requires physical presence, split-second timing decisions, and adaptation to unpredictable stage action. No AI system performs live audio description for theatre. The describer watches rehearsals, prepares notes, then delivers live. Irreducibly human. |
| Quality review and editorial refinement | 10% | 3 | 0.30 | AUG | Reviewing AI-generated scripts for accuracy, hallucination, cultural sensitivity, and narrative coherence. Emerging as the primary human task in AI-assisted workflows. Requires deep domain knowledge but is augmented by AI flagging tools. |
| Client/stakeholder collaboration and accessibility consulting | 10% | 1 | 0.10 | NOT | Working with directors, producers, and blind consultants to establish description approach. Understanding specific audience needs, cultural context, and creative intent. The human relationship and interpretive dialogue are the value. |
| Technical integration and timing | 5% | 4 | 0.20 | DISP | Fitting descriptions into dialogue gaps, managing timecodes, audio mixing. AI tools auto-detect speech boundaries and generate timed output natively. Manual timecoding is increasingly obsolete for pre-recorded content. |
| Total | 100% | | 2.65 | | |
Task Resistance Score: 6.00 - 2.65 = 3.35/5.0
Displacement/Augmentation split: 30% displacement, 50% augmentation, 20% not involved.
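The 2.65 weighted total and the 30/50/20 split both fall out of the task table arithmetically; a minimal sketch (task labels abbreviated from the table above):

```python
# Time-weighted agentic AI scores from the task table: (weight, score, category).
tasks = {
    "visual analysis":        (0.25, 2, "AUG"),
    "script writing":         (0.25, 4, "DISP"),
    "vocal performance":      (0.15, 3, "AUG"),
    "live description":       (0.10, 1, "NOT"),
    "quality review":         (0.10, 3, "AUG"),
    "stakeholder consulting": (0.10, 1, "NOT"),
    "technical integration":  (0.05, 4, "DISP"),
}

# Weighted average of scores, then resistance = 6.00 minus the average.
weighted = sum(w * s for w, s, _ in tasks.values())   # 2.65
resistance = 6.00 - weighted                          # 3.35 out of 5.0

# Displacement/augmentation split: sum time weights per category.
split = {}
for w, _, cat in tasks.values():
    split[cat] = round(split.get(cat, 0) + w, 2)
# split -> {"AUG": 0.5, "DISP": 0.3, "NOT": 0.2}
```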
Reinstatement check (Acemoglu): Yes. AI creates new tasks: reviewing and refining AI-generated description scripts, quality-assuring AI output against accessibility standards, consulting on AI description deployment for large content libraries, and training AI models with domain-specific feedback. The role is transforming from creator to curator/editor for pre-recorded content, while live description remains unchanged.
Evidence Score
| Dimension | Score (-2 to 2) | Evidence |
|---|---|---|
| Job Posting Trends | 0 | No standalone BLS category for audio describers. Niche role -- estimated 500-2,000 active audio describers in the US, mostly freelance. Regulatory expansion (ADA Title II, EAA, Ofcom) is increasing demand, but AI tools absorb much of the new volume. UK Glassdoor average salary GBP 28,478. Job postings stable but not growing proportionally to content demand. |
| Company Actions | -1 | Netflix and Amazon Prime now use AI-generated audio description for some content (BCA Australia, Jan 2026). Verbit launched AI Audio Description product targeting scale compliance. Visonic AI, Maestra, and others offer automated AD pipelines. Streaming platforms shifting to AI-first for back-catalog description. No major layoffs reported (niche workforce), but new hiring is oriented toward AI post-editors rather than traditional describers. |
| Wage Trends | 0 | UK rates: GBP 27,500-35,000 full-time (ITV job posting). US: no reliable BLS data; accessibility specialist average $54,531-$64,757 (Salary.com/Glassdoor). Freelance rates privately negotiated, not widely published. No clear decline or surge. Rates stable but under pressure as AI reduces per-project hours. |
| AI Tool Maturity | -1 | Production-deployed tools: Verbit AI Audio Description, Visonic AI, Maestra AD generator, Amazon/Netflix in-house pipelines. CHI 2025 study: AI descriptions comparable to trained humans on standard content. But hallucination remains a known problem (BCA Australia, Curtin University research). Character misidentification, verbose narration, and cultural nuance gaps documented (YouDescribe/ACM 2025). Tools are pilot-to-production for routine content, still insufficient for complex narrative or live work. |
| Expert Consensus | -1 | Converging view: AI handles drafting and bulk pre-recorded AD; humans handle review, live, and premium content. BCA Australia (Jan 2026): "AI might cost jobs rather than create them. The worst outcome would be a huge amount of lower-quality audio description." Audio Description Project (ACB): humans remain essential for quality. Industry concern about race to the bottom on quality. No consensus that AI fully replaces mid-level describers within 3 years, but consensus that the role is transforming to post-editor/reviewer. |
| Total | -3 | |
Barrier Assessment
Reframed question: What prevents AI execution even when programmatically possible?
| Barrier | Score (0-2) | Rationale |
|---|---|---|
| Regulatory/Licensing | 1 | ADA, WCAG 2.1, EAA, and Ofcom mandate audio description quality and accuracy, but none require a human to produce it. WCAG Level AA requires AD for prerecorded video but does not specify production method. However, accuracy requirements (especially for accessibility-critical content) create a de facto quality floor that AI alone does not reliably meet for complex content. Thin but real barrier. |
| Physical Presence | 1 | Live theatre and museum description requires physical attendance at rehearsals and performances. Live events cannot be described remotely by AI. But live work is a minority of total AD volume -- most work is pre-recorded film/TV. Partial barrier. |
| Union/Collective Bargaining | 0 | Equity (UK) covers some audio description voice work, but the field is predominantly freelance with no collective bargaining protection. No union mandates requiring human describers. |
| Liability/Accountability | 0 | Low personal liability. If descriptions contain errors or hallucinations, the content publisher bears responsibility. No licensing to revoke. Accessibility lawsuits target organisations, not individual describers. |
| Cultural/Ethical | 1 | Blind and partially sighted communities have expressed concern about AI description quality (BCA Australia, ACB). There is cultural resistance to fully automated AD among disability advocacy organisations who view human interpretation as essential for dignity and accuracy. Premium broadcasters (BBC, Netflix prestige content) maintain human description for reputational reasons. But corporate, educational, and back-catalog content is shifting to AI without significant pushback. |
| Total | 3/10 | |
AI Growth Correlation Check
Confirmed at 0 (Neutral). Regulatory mandates are expanding the total volume of content requiring audio description -- ADA Title II (April 2026) alone creates massive new demand from public entities. The audio description services market is projected at $764M in 2026. But AI tools are absorbing most of the incremental volume. Visonic AI states a human describer takes 30-60 minutes to script 5 minutes of video; AI generates the same in minutes. The volume growth is real, but it flows primarily to AI tools, not to human headcount. Demand for human describers grows modestly for review, live, and premium work, offset by AI displacement of routine pre-recorded scripting. Net neutral.
JobZone Composite Score (AIJRI)
| Input | Value |
|---|---|
| Task Resistance Score | 3.35/5.0 |
| Evidence Modifier | 1.0 + (-3 x 0.04) = 0.88 |
| Barrier Modifier | 1.0 + (3 x 0.02) = 1.06 |
| Growth Modifier | 1.0 + (0 x 0.05) = 1.00 |
Raw: 3.35 x 0.88 x 1.06 x 1.00 = 3.125
JobZone Score: (3.125 - 0.54) / 7.93 x 100 = 32.6/100
Zone: YELLOW (Green >=48, Yellow 25-47, Red <25)
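The full composite chain can be reproduced in a few lines. The modifier weights (0.04, 0.02, 0.05) and the normalization constants (0.54, 7.93) are taken from the formulas above; the function name itself is illustrative:

```python
def jobzone_score(resistance, evidence, barrier, growth):
    """AIJRI composite: task resistance scaled by three modifiers,
    then normalized to a 0-100 scale."""
    evidence_mod = 1.0 + evidence * 0.04   # sum of five dimensions, each -2..2
    barrier_mod  = 1.0 + barrier * 0.02    # sum of five barriers, each 0..2
    growth_mod   = 1.0 + growth * 0.05     # growth correlation (0 = neutral here)
    raw = resistance * evidence_mod * barrier_mod * growth_mod
    return round((raw - 0.54) / 7.93 * 100, 1)

score = jobzone_score(3.35, -3, 3, 0)   # -> 32.6
zone = "GREEN" if score >= 48 else "YELLOW" if score >= 25 else "RED"
```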
Sub-Label Determination
| Metric | Value |
|---|---|
| % of task time scoring 3+ | 55% |
| AI Growth Correlation | 0 |
| Sub-label | Yellow (Urgent) -- >=40% task time scores 3+ |
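The 55% figure is the time share of tasks scoring 3 or higher in the decomposition table; a one-line check against the >=40% Urgent threshold:

```python
# (time weight, score) pairs from the task decomposition table.
tasks = [(0.25, 2), (0.25, 4), (0.15, 3), (0.10, 1),
         (0.10, 3), (0.10, 1), (0.05, 4)]

# Share of task time with agentic AI scores of 3+.
at_risk = round(sum(w for w, s in tasks if s >= 3), 2)  # 0.55
urgent = at_risk >= 0.40   # True -> Yellow (Urgent) sub-label
```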
Assessor override: Override applied. Formula yields 32.6 but the calibration context requires adjustment. Audio description is explicitly more nuanced than captioning (6.2 Red) due to interpretive judgment, but less physically protected than Boom Operator (42.0). The calibration note states audio description requires interpretive judgment about what to describe -- this is the core differentiator from captioning. However, the AI tools targeting AD specifically (Visonic AI, Verbit, Maestra) are advancing rapidly, and the CHI 2025 study showing AI matching trained human quality on standard content is a strong displacement signal. Adjusting to 27.3 to reflect that the role sits closer to the Red/Yellow boundary than Sound Designer (31.6), because: (1) no physical equipment operation protecting it, (2) AI script generation is more mature than AI sound design, and (3) the freelance-dominated workforce has no union protection. The 27.3 score is 2.3 points above Red, reflecting genuine but thin protection from interpretive judgment and live work.
Assessor Commentary
Score vs Reality Check
The 27.3 score places this 21 points above Subtitler/Captioner (6.2 Red) and 4.3 points below Sound Designer (31.6 Yellow). This spread is honest. Audio description requires meaningfully more interpretive judgment than captioning -- the describer must decide what to describe, not just transcribe what was said. But it has weaker structural protection than sound design (no physical equipment, no game engine middleware, no union coverage). The score is 2.3 points from Red, reflecting a role under genuine pressure.
What the Numbers Don't Capture
- The regulatory tailwind is real but misdirected. ADA Title II, EAA, and Ofcom quotas are creating enormous new demand for audio description. But the supply-side response is AI tools, not human hiring. Visonic AI's framing is telling: "AI audio description isn't replacing human describers. It's the only realistic way to close the gap." The gap gets closed by software, not by training more describers.
- Live description is a protected niche but a small market. Live theatre, museum, and event description is genuinely irreplaceable by AI -- real-time, physical, adaptive. But it represents perhaps 10-15% of total AD work. The bulk is pre-recorded film/TV/corporate/educational content, which is exactly where AI excels.
- The hallucination problem is the describer's lifeline. AI vision models fabricate visual details that are not present (YouDescribe/ACM 2025, BCA Australia research). For blind audiences who cannot verify descriptions independently, accuracy is existential. This creates a durable need for human review -- but "reviewer of AI output" is a smaller, lower-paid role than "audio description writer and performer."
- Quality vs quantity tension. Disability advocacy organisations warn that AI will produce "a huge amount of lower-quality audio description, which would undermine the value of creating it at all" (BCA Australia, Jan 2026). If platforms prioritise compliance checkboxes over genuine accessibility, the human describer's value proposition erodes.
Who Should Worry (and Who Shouldn't)
If you primarily write and voice pre-recorded descriptions for corporate, educational, or back-catalog content -- you are in the direct path of AI displacement. Visonic AI, Verbit, and Maestra generate scripts and synthetic narration for this content type at a fraction of the cost and time. One human reviewer checking AI output replaces several traditional describers.
If you specialise in live theatre, museum, or event description -- your work is genuinely protected. No AI system performs real-time description of unpredictable live action. Physical presence, rehearsal attendance, and split-second timing decisions are irreducibly human. This niche is small but durable.
If you work on premium narrative content (feature films, prestige TV) where broadcasters demand human quality -- you have more runway than the score suggests. BBC, Netflix, and major studios still commission human description for flagship content. But the boundary of "premium enough for human AD" will shrink as AI quality improves.
The single biggest separator: whether your work requires real-time interpretive judgment in unpredictable environments (live -- safe) or scripted description of pre-recorded content (desk-based -- vulnerable).
What This Means
The role in 2028: The mid-level audio describer reviews and refines AI-generated description scripts rather than writing from scratch. Live theatre and premium content remain human-described. The total volume of audio-described content is 5-10x higher than today (driven by regulation), but human hours per project are 70-80% lower. One human reviewer plus AI tools delivers what a 3-4 person team produced in 2024.
Survival strategy:
- Master AI description tools. Visonic AI, Verbit AI AD, Maestra, and emerging platforms are the new workflow. A describer who reviews and elevates AI output works roughly 3x faster than one writing from scratch -- and will win the remaining work.
- Specialise in live description. Theatre, museum, gallery, and live event description is AI-proof. Build relationships with venues and arts organisations. ADUK and ACB offer training specifically for live work.
- Move into accessibility consulting. WCAG compliance strategy, AI description quality assurance, training content teams on description standards. The strategic role grows as the production role shrinks.
Where to look next. If you are considering a career shift, these Green Zone roles share transferable skills with audio description:
- Sign Language Interpreter (AIJRI 73.0) -- accessibility expertise, real-time interpretation, and audience advocacy transfer directly; physical interpreting is irreplaceable by AI
- Stage Manager (Mid-Level) (AIJRI 49.4) -- live event coordination, real-time decision-making, and production communication skills transfer to theatre management
- Audiovisual Equipment Installer and Repairer (AIJRI 68.3) -- technical accessibility knowledge and AV system understanding transfer to hands-on installation work
Browse all scored roles at jobzonerisk.com to find the right fit for your skills and interests.
Timeline: 2-5 years for significant transformation of pre-recorded description work. Live description remains human-only for the foreseeable future. AI tool maturity and regulatory compliance deadlines (ADA Title II April 2026) are the primary drivers.