Task Taxonomy

50 standardized LLM tasks with scoring parameters

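Tasks represent common LLM use cases, categorized by cognitive demand, economic regime, and failure profile. Each task is calibrated against a specific Artificial Analysis benchmark at a defined anchor threshold: the benchmark score at which a model is estimated to succeed at the task 50% of the time. Sigmoid steepness and anchor thresholds are estimated parameters, not empirically measured values. Tasks with low estimation confidence (marked Directional only) should be read as rough directional guidance only. Each task card gives the task's difficulty rating, category, economic regime (Success Dominated, Mixed, or Volume Dominated), success-curve shape (Steep, Sigmoid, Linear, or Flat), anchor benchmark and threshold, calibration source (Benchmark or Implied Difficulty), and estimation confidence (Directional only, Est. confidence, or High confidence).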
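The anchor-threshold idea can be sketched as a logistic curve in the benchmark score s. The curve is centered on the anchor threshold t and has steepness k: p(s) = 1 / (1 + e^(-k(s - t))), so p is exactly 0.5 when s = t. This is a minimal illustration under stated assumptions, not the taxonomy's actual scoring code. The function name is hypothetical. The numeric k assigned to each curve label is also hypothetical, because the source publishes no steepness values. Treating Linear and Flat as shallow logistic curves is a further assumption.

```python
import math

# HYPOTHETICAL steepness values (per benchmark percentage point) for each
# curve label. The source does not publish these numbers; they exist only
# to illustrate how a steeper label sharpens the curve around the anchor.
HYPOTHETICAL_STEEPNESS = {
    "Steep": 0.30,
    "Sigmoid": 0.15,
    "Linear": 0.06,  # assumption: a shallow logistic is close to linear near the anchor
    "Flat": 0.02,    # assumption: nearly constant success probability
}


def success_probability(benchmark_score: float, anchor_threshold: float, steepness: float) -> float:
    """Logistic success curve; returns exactly 0.5 when the score equals the anchor."""
    return 1.0 / (1.0 + math.exp(-steepness * (benchmark_score - anchor_threshold)))


if __name__ == "__main__":
    # Agentic Workflow Orchestration: anchored at livecodebench 70%, labeled Steep.
    k = HYPOTHETICAL_STEEPNESS["Steep"]
    for score in (60, 70, 80):
        print(score, round(success_probability(score, 70, k), 3))
```

With the hypothetical Steep value, a model scoring 10 points below the anchor comes out near 5% success, and one scoring 10 points above comes out near 95%. A Flat label keeps both cases much closer to 50%.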

Tasks: 50
Avg Difficulty: 5.4
Avg Anchor Threshold: 54%
Benchmark-calibrated tasks: 35 (the other 15 use implied difficulty)
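The summary figures above can be rechecked against the cards. This minimal sketch uses counts tallied from the 50 task cards in this list. It assumes that a rating shown as 9+ is counted as its base value (9). That assumption is ours; the page does not say how the average was computed.

```python
# Counts tallied from the 50 task cards below.
# Assumption: a difficulty shown as "9+" is counted as 9.
DIFFICULTY_COUNTS = {9: 1, 8: 7, 7: 7, 6: 7, 5: 12, 4: 9, 3: 4, 2: 2, 1: 1}
# Anchor thresholds in percent, mapped to the number of tasks using each one.
THRESHOLD_COUNTS = {70: 2, 65: 6, 60: 5, 55: 15, 50: 15, 45: 4, 40: 2, 25: 1}


def weighted_mean(counts: dict[int, int]) -> float:
    """Mean of the values, where counts maps each value to how many tasks have it."""
    total = sum(counts.values())
    return sum(value * n for value, n in counts.items()) / total


if __name__ == "__main__":
    print(sum(DIFFICULTY_COUNTS.values()))             # 50 tasks
    print(round(weighted_mean(DIFFICULTY_COUNTS), 1))  # 5.4
    print(round(weighted_mean(THRESHOLD_COUNTS)))      # 54
```

The exact means are 269 / 50 = 5.38 and 2690 / 50 = 53.8%. These round to the reported 5.4 and 54%.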

Agentic Workflow Orchestration

Agentic (10 steps)

Orchestrate complex multi-step workflows: planning, tool execution, error handling, and iterative refinement.

Difficulty: 9+ | Agentic & Multi-Step | Success Dominated | Steep | livecodebench 70% | Benchmark | Directional only

Greenfield Feature Implementation

Agentic (8 steps)

Implement a new feature across multiple files, including tests, following project conventions and architecture.

Difficulty: 8+ | Code & Engineering | Success Dominated | Steep | livecodebench 65% | Benchmark | Est. confidence

Code Vulnerability Review

Agentic (5 steps)

Review code for security vulnerabilities (injection, auth issues, crypto flaws) and recommend fixes.

Difficulty: 8+ | Security & Compliance | Success Dominated | Steep | livecodebench 60% | Benchmark | Directional only

Financial Forecast from Messy Data

Agentic (6 steps)

Clean historical financial data, identify trends and seasonality, and build a forecast model with confidence intervals.

Difficulty: 8+ | Analytical Reasoning | Success Dominated | Steep | scicode 25% | Benchmark | Directional only

Research Synthesis (30 Sources)

Agentic (6 steps)

Synthesize 30 research sources into a coherent position paper with citations, identifying consensus and disagreements.

Difficulty: 8+ | Summarization & Synthesis | Success Dominated | Steep | gpqa 55% | Benchmark | Directional only

Browser Automation Agent

Agentic (8 steps)

Execute multi-step web tasks via tool use: navigation, form filling, data extraction.

Difficulty: 8+ | Agentic & Multi-Step | Success Dominated | Steep | livecodebench 65% | Benchmark | Directional only

Data Pipeline Orchestration

Agentic (7 steps)

Agentic ETL with schema inference, transformation, error recovery, and validation.

Difficulty: 8+ | Agentic & Multi-Step | Success Dominated | Steep | livecodebench 65% | Benchmark | Directional only

Multi-Tool Research Agent

Agentic (8 steps)

Research agent orchestrating search, calculation, and retrieval tools for complex queries.

Difficulty: 8+ | Agentic & Multi-Step | Success Dominated | Steep | livecodebench 65% | Benchmark | Directional only

Contract Clause Extraction & Flagging

Extract key clauses from legal contracts and flag non-standard or risky terms for attorney review.

Difficulty: 7+ | Analytical Reasoning | Success Dominated | Sigmoid | gpqa 40% | Benchmark | Directional only

Bug Diagnosis & Fix (Unfamiliar Codebase)

Agentic (5 steps)

Diagnose and fix bugs in an unfamiliar codebase by analyzing error messages, stack traces, and relevant code sections.

Difficulty: 7+ | Code & Engineering | Success Dominated | Sigmoid | livecodebench 55% | Benchmark | High confidence

SOC Alert Triage & Response

Agentic (4 steps)

Triage security operations center alerts, classify severity, correlate indicators, and recommend response actions.

Difficulty: 7+ | Security & Compliance | Success Dominated | Steep | gpqa 50% | Benchmark | Directional only

High-Stakes Persuasive Email

Draft negotiation or persuasive emails for high-stakes external communication (investor updates, contract negotiations, crisis response).

Difficulty: 7+ | Content Generation | Success Dominated | Linear | mmlu_pro 70% | Implied Difficulty | Directional only

Root Cause Analysis

Analyze incident reports and produce ranked root causes with supporting evidence.

Difficulty: 7+ | Analytical Reasoning | Success Dominated | Sigmoid | gpqa 55% | Benchmark | Directional only

Market Sizing (TAM/SAM/SOM)

Estimate Total Addressable, Serviceable Addressable, and Serviceable Obtainable market sizes with stated assumptions.

Difficulty: 7+ | Analytical Reasoning | Success Dominated | Sigmoid | gpqa 55% | Benchmark | Directional only

Compliance Violation Check

Flag regulatory violations (GDPR, HIPAA, SOX) in documents or policies, with citations.

Difficulty: 7+ | Security & Compliance | Success Dominated | Sigmoid | gpqa 50% | Benchmark | Directional only

Natural Language to SQL

Convert natural language questions into SQL queries for a known database schema.

Difficulty: 6+ | Code & Engineering | Success Dominated | Sigmoid | livecodebench 45% | Benchmark | High confidence

Data Visualization & Storytelling

Agentic (4 steps)

Analyze datasets, select appropriate chart types, generate visualization code, and craft narrative insights.

Difficulty: 6+ | Analytical Reasoning | Mixed | Linear | livecodebench 50% | Benchmark | Est. confidence

RAG-Graded Answering

Generate grounded answers from multi-chunk retrieval input with inline citations.

Difficulty: 6+ | Summarization & Synthesis | Success Dominated | Sigmoid | gpqa 50% | Benchmark | Est. confidence

Earnings Call Synthesis

Extract key takeaways, numbers, and forward guidance from earnings call transcripts.

Difficulty: 6+ | Summarization & Synthesis | Success Dominated | Sigmoid | gpqa 50% | Benchmark | Est. confidence

Code Translation

Translate code from one programming language to an equivalent implementation in another.

Difficulty: 6+ | Code & Engineering | Success Dominated | Sigmoid | livecodebench 55% | Benchmark | Est. confidence

Competitive Analysis

Produce a structured market and competitor breakdown covering positioning, strengths, and gaps.

Difficulty: 6+ | Analytical Reasoning | Success Dominated | Linear | gpqa 50% | Benchmark | Directional only

Log Anomaly Detection

Identify suspicious patterns in log streams and flag potential security incidents.

Difficulty: 6+ | Security & Compliance | Success Dominated | Sigmoid | livecodebench 50% | Benchmark | Directional only

Product Description Translation

Translate 5,000 product descriptions from English to 5 target languages, preserving marketing tone and technical accuracy.

Difficulty: 5+ | Content Generation | Mixed | Linear | mmlu_pro 65% | Implied Difficulty | Est. confidence

Meeting Summarization + Action Items

Summarize meeting transcripts and extract action items with owners and deadlines.

Difficulty: 5+ | Summarization & Synthesis | Mixed | Linear | mmlu_pro 60% | Benchmark | Est. confidence

First-Line Support Chatbot

Agentic (3 steps)

Handle initial customer inquiries: answer FAQs, collect issue details, route to specialists, or resolve simple tickets.

Difficulty: 5+ | Customer-Facing & Conversational | Mixed | Sigmoid | ifbench 55% | Benchmark | Est. confidence

Resume Screening & JD Matching

Score and rank candidate resumes against job descriptions, flagging key qualifications and gaps.

Difficulty: 5+ | Analytical Reasoning | Mixed | Linear | mmlu_pro 60% | Implied Difficulty | Directional only

Blog Draft from Outline

Create a long-form blog draft from an outline and research notes, maintaining a consistent voice.

Difficulty: 5+ | Content Generation | Mixed | Linear | mmlu_pro 60% | Benchmark | Est. confidence

Podcast Transcript Summary

Summarize long podcast/video transcripts into structured summaries with timestamps and key quotes.

Difficulty: 5+ | Summarization & Synthesis | Mixed | Linear | mmlu_pro 55% | Benchmark | Est. confidence

Regex Pattern Generation

Generate tested regex patterns from natural language descriptions with edge case handling.

Difficulty: 5+ | Code & Engineering | Success Dominated | Sigmoid | livecodebench 45% | Benchmark | High confidence

Unit Test Generation

Generate comprehensive unit tests for functions/classes including edge cases and mocks.

Difficulty: 5+ | Code & Engineering | Success Dominated | Sigmoid | livecodebench 50% | Benchmark | High confidence

API Documentation Drafting

Generate user-facing API documentation from code with examples and type signatures.

Difficulty: 5+ | Code & Engineering | Mixed | Linear | livecodebench 50% | Benchmark | Est. confidence

Phishing Email Detection

Classify emails as phishing or legitimate, with reasoning and confidence indicators.

Difficulty: 5+ | Security & Compliance | Success Dominated | Sigmoid | gpqa 50% | Benchmark | Est. confidence

Sales Assistant Chat

Agentic (4 steps)

Qualify leads, answer product questions, and guide prospects through the consideration phase.

Difficulty: 5+ | Customer-Facing & Conversational | Mixed | Sigmoid | ifbench 55% | Benchmark | Est. confidence

Escalation Router

Analyze conversation transcripts and decide if/how to escalate with reasoning.

Difficulty: 5+ | Customer-Facing & Conversational | Success Dominated | Sigmoid | ifbench 55% | Benchmark | Est. confidence

PII Redaction

Identify and redact personally identifiable information (names, SSNs, addresses, etc.) across a document corpus.

Difficulty: 4+ | Classification & Extraction | Success Dominated | Sigmoid | ifbench 65% | Implied Difficulty | Est. confidence

Invoice/Receipt Field Extraction

Extract structured data (vendor, date, line items, totals, tax) from scanned or digital invoices and receipts.

Difficulty: 4+ | Classification & Extraction | Mixed | Sigmoid | ifbench 60% | Benchmark | High confidence

Weekly Highlights Email Draft

Generate executive summary emails from dashboard metrics, highlighting key changes and anomalies.

Difficulty: 4+ | Summarization & Synthesis | Volume Dominated | Flat | mmlu_pro 55% | Benchmark | Est. confidence

Toxicity Detection

Flag harmful, abusive, or inappropriate content with severity tier classification.

Difficulty: 4+ | Classification & Extraction | Success Dominated | Sigmoid | ifbench 55% | Implied Difficulty | Est. confidence

Marketing Copy Variants

Generate 5 ad copy variants from a product brief with different angles, tones, and CTAs.

Difficulty: 4+ | Content Generation | Mixed | Linear | mmlu_pro 55% | Implied Difficulty | Directional only

Product Description Writing

Write SEO-aware product page copy with feature highlights, benefits, and specifications.

Difficulty: 4+ | Content Generation | Mixed | Linear | mmlu_pro 55% | Implied Difficulty | Est. confidence

Email Template Generation

Draft drip-campaign email sequences with personalization slots and clear CTAs.

Difficulty: 4+ | Content Generation | Mixed | Linear | mmlu_pro 55% | Implied Difficulty | Est. confidence

Document Q&A (Short Answer)

Answer a specific question from a single document with a concise, grounded response.

Difficulty: 4+ | Summarization & Synthesis | Mixed | Linear | gpqa 45% | Benchmark | High confidence

Interactive Onboarding Guide

Agentic (5 steps)

Guide new users through product onboarding with contextual help and progress tracking.

Difficulty: 4+ | Customer-Facing & Conversational | Mixed | Linear | ifbench 50% | Benchmark | Est. confidence

Email Categorization

Classify 50,000 emails into categories (spam, promotional, personal, work) for inbox organization.

Difficulty: 3+ | Classification & Extraction | Volume Dominated | Flat | mmlu_pro 50% | Implied Difficulty | High confidence

Support Ticket Routing

Route incoming support tickets to the correct team queue based on content analysis.

Difficulty: 3+ | Classification & Extraction | Volume Dominated | Flat | mmlu_pro 50% | Implied Difficulty | High confidence

Named Entity Extraction

Extract people, organizations, locations, dates, and other named entities from text.

Difficulty: 3+ | Classification & Extraction | Volume Dominated | Flat | ifbench 50% | Implied Difficulty | High confidence

Social Post Drafting

Draft platform-specific social media posts (LinkedIn, Twitter/X, Instagram) with appropriate tone and formatting.

Difficulty: 3+ | Content Generation | Volume Dominated | Flat | mmlu_pro 50% | Implied Difficulty | Directional only

Sentiment Analysis - Product Reviews

Classify sentiment (positive/negative/neutral) for 10,000 customer product reviews with optional aspect extraction.

Difficulty: 2+ | Classification & Extraction | Volume Dominated | Flat | mmlu_pro 55% | Implied Difficulty | High confidence

Intent Classification

Classify user queries into intent labels from a fixed taxonomy for routing or response selection.

Difficulty: 2+ | Classification & Extraction | Volume Dominated | Flat | ifbench 45% | Implied Difficulty | High confidence

Language Detection

Identify the language(s) present in text, including mixed-language content detection.

Difficulty: 1+ | Classification & Extraction | Volume Dominated | Flat | mmlu_pro 40% | Implied Difficulty | High confidence