Client Documents

📁

Drag & drop files here or click to browse

Maximum 50 MB per file

Supported formats: PDF, DOCX, DOC, XLSX, PPTX, PNG, JPG, TXT, CSV, XML/BPMN

Uploaded Files

Files already uploaded to this session. These will be processed when you run the pipeline.

Upload client documents in the Upload tab, then run the full AI analysis pipeline below. The pipeline ingests your documents, extracts the organizational structure, standardizes tasks to ESCO, scores AI exposure using the Anthropic Economic Index, and generates all output reports.

Recommended 🚀

Full Analysis

Complete end-to-end pipeline — ingest, extract org structure, ESCO standardization, AI scoring, and all output reports.

Stages: Ingest → Extract → ESCO → Score → Generate

🔍

Extract Only

Ingest documents and extract the org structure for manual review before running the full analysis.

Stages: Ingest → Extract

🔄

Re-run Analysis

Re-run AI scoring and report generation on an existing extraction — useful after manually editing the org map.

Stages: ESCO → Score → Generate

🧠 Company AI Usage Context (optional)

Describe which AI/LLM tools this client already uses and how deeply they're deployed. The scorer will use this to raise baseline adoption for affected roles and classify tasks more realistically. Examples: "M365 Copilot rolled out org-wide Q1 2026, used daily by ~80% of knowledge workers for email and document drafting.", "Engineering team uses GitHub Copilot. Marketing pilots Jasper for content. No agentic workflows deployed."

Saved to this session. Applied at the Score stage. Limit: 8000 characters.

⚠️ ESCO QA Review Required

Review the ESCO QA report before continuing.

💬

Ask questions about your organizational analysis.
Available context is automatically included when results are loaded.

OrgReview — User Guide

AI Organisational Impact Analysis · SIA

Sign In & Set Your Session Name

Enter the password when prompted on first load. Your session is saved in the browser — you will not need to re-enter it unless you clear your browser data.

Session ID (top of every page): The session name is the first thing to set — it's the identifier for your client's analysis. Use a short memorable name like acme-corp or q2-retail (letters, numbers, hyphens only). You can:

Type a name — Type your chosen name into the Session ID bar at the top and press Enter or click Use this ID. All uploads and results will be stored under that name.
Share — Copy the Session ID and give it to a colleague so they can load the same session in their browser.
Switch — Type or paste a different Session ID to load another client's analysis
New ID — Click New ID to generate a fresh random name and start a clean session
Delete — Remove all data for the current session (a warning appears for sessions older than 7 days)

⚠️ Set the Session ID before uploading files. Files are stored under whichever ID is active when you upload — changing it afterwards will not move existing files.
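The naming rule above (letters, numbers, hyphens only) can be checked before a name is shared with a colleague. The following is a minimal sketch, not the app's own validator: the function name is_valid_session_id and the exact pattern are assumptions, and the app may apply further limits (for example on length) that this guide does not state.

```python
import re

# Assumed pattern: one or more ASCII letters, digits, or hyphens.
# The guide does not state a length limit, so none is enforced here.
SESSION_ID_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_session_id(session_id: str) -> bool:
    """Return True if the name uses only letters, numbers, and hyphens."""
    return SESSION_ID_PATTERN.fullmatch(session_id) is not None


if __name__ == "__main__":
    for name in ("acme-corp", "q2-retail", "acme corp", "acme_corp"):
        print(name, is_valid_session_id(name))
```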

Upload Client Documents — Upload tab

Go to the Upload tab. Drag and drop or click to upload any combination of:

Org charts — Excel, CSV, or PDF
Job descriptions — Word, PDF, or plain text
Headcount reports — any spreadsheet format
Process documents — PDFs describing workflows
Process diagrams — BPMN (.bpmn) or image exports (PNG/JPG) of swimlane / flowchart process maps. The model detects swimlanes, tasks, gateways, and labelled arrows, and uses the upstream/downstream relationships to sharpen E-level classification (especially for spotting cross-role agentic-pipeline E3 opportunities).

Multiple uploads are supported: you can add more files at any time before running the analysis. Your files are stored under the Session ID shown in the bar at the top of the page. Share that name with a colleague so they can load the same session in their browser.

Run the Analysis — Analyse tab

Go to the Analyse tab and click Run Full Analysis. The pipeline automatically processes all files you uploaded in your session.

The pipeline runs these stages automatically:

Ingest — reads and extracts text from all uploaded files
Extract — identifies roles, departments, and headcount from the text
ESCO Lookup — standardises task names against the EU ESCO taxonomy
ESCO QA — flags any data quality issues for review
Approval checkpoint — the pipeline pauses here. Review the QA report (click View QA Report), then click Approve & Continue
Analyse — classifies every task E0–E3 and scores each role, using AWS Bedrock (Claude Opus 4.7) with AEI calibration
Aggregate — computes department and enterprise-level summaries
Generate — writes all output reports (Excel, HTML, JSON)
Use Cases — identifies 5–12 AI use cases and scores them on 6 KPIs

The full pipeline typically takes 3–8 minutes depending on the size of the organisation. A live progress bar shows each stage in real time.
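The stage order and the approval pause described above can be pictured as an ordered list with a gate. This is an illustrative sketch only: the stage names are taken from this guide, while the STAGES constant and the stages_to_run helper are hypothetical and do not describe how the pipeline is implemented internally.

```python
# Stage names as listed in this guide, in execution order.
STAGES = [
    "Ingest",
    "Extract",
    "ESCO Lookup",
    "ESCO QA",
    "Approval checkpoint",
    "Analyse",
    "Aggregate",
    "Generate",
    "Use Cases",
]


def stages_to_run(approved: bool) -> list[str]:
    """Return the stages that may run.

    Until the QA report is approved, nothing after the checkpoint runs.
    """
    if approved:
        return list(STAGES)
    checkpoint = STAGES.index("Approval checkpoint")
    return STAGES[: checkpoint + 1]


if __name__ == "__main__":
    print(stages_to_run(approved=False))
    print(stages_to_run(approved=True))
```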

Job History: Below the progress bar, you'll see a history of all pipeline runs for this session. You can:

Reconnect — Resume watching a running job if you refresh the page
Cancel — Stop a running job
View Results — Jump directly to results for completed jobs

Session Lock: Once a pipeline run completes, the session is locked. To run a new analysis, generate a new Session ID.

How the E-level is decided — the two core papers

Every task classification is produced by the following paper-grounded decision procedure, embedded verbatim in the Bedrock prompt:

Eloundou β (via Anthropic — Massenkoff & McCrory 2026, p. 5): can an LLM double the task's speed? β=1 (LLM alone) / β=0.5 (LLM + tools) / β=0 (not feasible, forces E0).
Anthropic's observed_exposure anchor (Massenkoff & McCrory 2026, p. 6–7): the closest SOC occupation's measured AEI exposure from job_exposure.csv (756 occupations). Sets whether β>0 tasks lean E1 (≥0.45, already widely automated) or E2/E3 (lower adoption today).
OpenAI's GDPval capability + failure-mode check (Patwardhan et al. 2025, §3.1–3.3): Claude Opus 4.1 reached parity-or-better on 47.6% of real deliverables (grounds E1 = 0.5). Tasks requiring accountability sign-off fall under the "try-n-times, then fix it yourself" workflow (§3.2) → E2. Tasks GDPval explicitly excludes — manual labour, tacit knowledge, interpersonal communication — map to E0.
OECD autonomy typology (OECD 2022, p. 53): E0 = no-action autonomy, E1 = operational automation, E2 = human-in-the-loop, E3 = human-on-the-loop (agentic).

Each justification field in the output Excel contains a concise (≤15 word) label naming the decisive factor — e.g. "β=1, report drafting → E1" or "GDPval §3.2 human review, compliance → E2". The full paper-grounded decision procedure is embedded in the Bedrock prompt; the short label is what surfaces in client-facing reports.
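As a reading aid, the rules above can be condensed into a small deterministic function. This is a deliberately simplified sketch and not the Bedrock prompt: the real classification is made by the LLM, the function name and boolean flags are hypothetical, and the order in which the sign-off rule and the 0.45 anchor are applied is an assumption. The sketch also does not attempt to separate E2 from E3, because the guide does not give a numeric rule for that.

```python
def lean_e_level(
    beta: float,
    observed_exposure: float,
    needs_signoff: bool = False,
    gdpval_excluded: bool = False,
) -> str:
    """Return the E-level a task leans towards under the stated rules.

    beta: Eloundou feasibility score (1, 0.5, or 0).
    observed_exposure: AEI exposure of the closest SOC occupation.
    needs_signoff: task requires accountability sign-off (hypothetical flag).
    gdpval_excluded: manual labour, tacit knowledge, or interpersonal
        communication (hypothetical flag).
    """
    if beta == 0 or gdpval_excluded:
        return "E0"       # not feasible, or outside GDPval's scope
    if needs_signoff:
        return "E2"       # human review cycle (assumed to take precedence)
    if observed_exposure >= 0.45:
        return "E1"       # already widely automated
    return "E2/E3"        # feasible, but lower adoption today


if __name__ == "__main__":
    print(lean_e_level(beta=1, observed_exposure=0.60))
    print(lean_e_level(beta=0.5, observed_exposure=0.20))
```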

How time allocation is decided — O*NET 28.3

Time-per-task percentages are not guessed by the LLM. They are computed deterministically from O*NET 28.3 Work Activities (U.S. Department of Labor, May 2024 release):

For each role, the pipeline looks up its closest SOC occupation in reference/job_exposure.csv.
For that SOC, it reads reference/onet-activities.csv — the top-5 Work Activities by Importance, pre-tagged with E-levels (e.g. "Analyzing Data or Information" → E1, "Coordinating Work" → E2, "Establishing Interpersonal Relationships" → E0).
The Importance scores are normalised so the five activities sum to 75% — the realistic working-day target (remaining ~25% is meetings/admin/interruptions, counted as zero AI impact).
After the LLM classifies each task's E-level, the pipeline distributes the SOC's E-level budget equally across tasks at that level. For example, in a role with an E1 budget of 30%, two E1 tasks each get 15%.

This means two re-runs on the same session produce identical time allocations — the source is a frozen O*NET release, not a stochastic LLM guess. Source: onetonline.org.
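The normalisation and equal-split steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline's code: the function names, the input shapes, and the sample Importance values are hypothetical. The sketch also assumes that a task whose level has no budget receives 0%, and that budget at a level with no tasks stays unallocated; the guide does not say how those cases are handled.

```python
from collections import Counter, defaultdict

WORKING_DAY_TARGET = 75.0  # percent of the working day spread over the top-5 activities


def level_budgets(activities):
    """Normalise Importance scores to 75% and sum them per E-level.

    activities: list of (importance, e_level) for a SOC's top-5 Work Activities.
    """
    total = sum(importance for importance, _ in activities)
    budgets = defaultdict(float)
    for importance, level in activities:
        budgets[level] += importance * WORKING_DAY_TARGET / total
    return dict(budgets)


def allocate_time(task_levels, budgets):
    """Split each E-level budget equally across the tasks at that level.

    task_levels: {task_name: e_level} as classified by the LLM.
    """
    counts = Counter(task_levels.values())
    return {
        task: budgets.get(level, 0.0) / counts[level]
        for task, level in task_levels.items()
    }


if __name__ == "__main__":
    # Made-up Importance values; equal scores give each activity 15%.
    activities = [(4.0, "E1"), (4.0, "E1"), (4.0, "E2"), (4.0, "E0"), (4.0, "E0")]
    budgets = level_budgets(activities)
    tasks = {"Draft report": "E1", "Summarise data": "E1", "Approve budget": "E2"}
    print(budgets)
    print(allocate_time(tasks, budgets))
```

With these sample inputs the E1 budget is 30%, so the two E1 tasks receive 15% each, matching the example in the text.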

📚 References for the E-level scoring model

The E0–E3 framework and the realistic time-saved weights (E1 = 0.5, E2 = 0.3, E3 = 0.2) are grounded in four papers, all embedded verbatim (with page citations) into the Bedrock system prompt at runtime:

Anthropic — Massenkoff, M. & McCrory, P. (2026). Labor market impacts of AI: A new measure and early evidence. Published 5 March 2026. (With companion Appendix.) The primary empirical source. Introduces the "observed exposure" measure that underlies reference/job_exposure.csv. Verbatim methodology from p. 6: "fully automated implementations receive full weight, while augmentative use receives half weight." This is the direct justification for our E1 = 0.5 weight. Also supplies the top-10 exposure table (Figure 3, p. 7) used as empirical anchors in calibration. The companion Appendix provides the full per-occupation SOC-level exposure table used to build job_exposure.csv, plus the methodological detail behind the automation/augmentation split.
OpenAI — Patwardhan, T. et al. (2025). GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks. Capability benchmark covering 1,320 tasks across 44 occupations and the 9 sectors contributing most to U.S. GDP. Headline result (§3.1, p. 5-6): "47.6% of deliverables by Claude Opus 4.1 were graded as better than (wins) or as good as (ties) the human deliverable." Directly supports E1 = 0.5 — parity on roughly half of real knowledge-work tasks. §3.2 on "try n times, if unsatisfactory, fix it yourself" grounds E2 = 0.3 (human-in-the-loop review cycles).
McKinsey Global Institute (2023). The economic potential of generative AI: The next productivity frontier. Chui, M., Hazan, E., Roberts, R., et al. June 2023. Sets the theoretical ceiling we deliberately discount. Key insight #4, p. 3: "Current generative AI and other technologies have the potential to automate work activities that absorb 60 to 70 percent of employees' time today." Our weights (E1=0.5, E2=0.3, E3=0.2) explicitly sit below this theoretical ceiling to reflect realistic, today-deployable savings. Customer-service case (p. 15) — AI can decrease productivity for highly skilled agents — is the concrete evidence that E1 ≠ 1.0.
OECD (2022). OECD Framework for the Classification of AI Systems. OECD Digital Economy Papers No. 323, February 2022. Source of the four-level autonomy typology (p. 53) that maps onto E0/E1/E2/E3. Defines human-support, human-in-the-loop, human-on-the-loop, and human-out-of-the-loop autonomy. Page 53 also establishes that high-autonomy AI is policy-gated in critical or rights-affecting contexts — directly justifies E3 = 0.2 (deployment-limited today even when capability is high).

Weight choice rationale: E1 = 0.5 (Anthropic — Massenkoff & McCrory half-weight rule for augmentation), E2 = 0.3 (human review keeps ~70% of task time), E3 = 0.2 (high theoretical capability but deployment-limited today per OECD autonomy framework).

Full calibration formula: Calibrated E-score = 0.7 × task-weighted score + 0.3 × Anthropic observed_exposure (Massenkoff & McCrory). The 70/30 blend keeps the model sensitive to role-specific task detail while anchoring to empirical data.
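The 70/30 blend is simple enough to check by hand. The sketch below restates the formula in code; the function name and the sample inputs are illustrative assumptions, and only the 0.7 and 0.3 coefficients come from this guide.

```python
def calibrated_e_score(task_weighted_score: float, observed_exposure: float) -> float:
    """Blend the role's task-weighted score with the AEI observed exposure."""
    return 0.7 * task_weighted_score + 0.3 * observed_exposure


if __name__ == "__main__":
    # Illustrative inputs: task-weighted score 0.20, observed exposure 0.40.
    print(round(calibrated_e_score(0.20, 0.40), 4))
```

For a role with a task-weighted score of 0.20 and an observed exposure of 0.40, the blend gives 0.14 + 0.12 = 0.26.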

Review Results — Results tab

Go to the Results tab. You will see:

Enterprise Summary card — overall exposure band, average E-score, role and headcount totals, with a headline stat showing percentage of tasks AI-impactable
E-level stacked bar chart — visual breakdown of E0/E1/E2/E3 distribution across the organisation
Department Breakdown table — E-score and exposure band per department with E-level bars. Click any department row to expand and see individual roles
Per-role drill-down — expanded department view shows each role with its own E-level stacked bar
AI Use Case Impact Assessment table — ranked use cases with 6-KPI scores and priority tiers (🔴 High / 🟡 Medium / 🟢 Low)
Excel download — full task-level analysis with a Department Summary sheet, ready for client delivery
HTML reports — heatmap, implementation matrix, FTE savings, ROI, impact assessment, QA report — all previewable in the browser

Click 👁 Preview on any HTML file to open it in a new tab. Click ⬇ Download to save any file locally. All file links include authentication tokens automatically.

AEI Calibration: E-scores are calibrated against the Anthropic Economic Index (756 SOC occupations) using a 70/30 blend of task-level analysis and empirical benchmark data. Confidence levels (high/medium/low) are reported in the FTE savings output.

📚 References for AI use case identification & the 6 KPIs

Use cases are generated by an LLM from the E1/E2/E3 task list and scored on six KPIs on a 1–5 scale. The scorecard draws on three of the four source papers (all embedded verbatim into the scoring prompts):

McKinsey Global Institute (2023). The economic potential of generative AI. June 2023. Supplies the four high-value functions driving strategic_goals_alignment and revenue_growth KPIs. Verbatim from p. 12: "Our analysis of 16 business functions identified just four — customer operations, marketing and sales, software engineering, and research and development — that could account for approximately 75 percent of the total annual value from generative AI use cases." Customer-service evidence (p. 15) grounds quality_errors: AI decreased productivity for skilled agents in one study — quality KPIs cannot be assumed positive.
OpenAI — Patwardhan, T. et al. (2025). GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks. Headline result (§3.1, p. 5-6): Claude Opus 4.1 achieved parity-or-better on 47.6% of real-world deliverables. Failure-mode analysis (§3.3, p. 7), covering instruction-following failures, formatting errors, and occasional hallucination, is the direct grounding for the quality_errors KPI.
Anthropic — Massenkoff & McCrory (2026). Labor market impacts of AI: A new measure and early evidence. Observed exposure scores per occupation (Figure 3, p. 7) feed the people_impacted KPI — the headcount-weighted spread of AI-touchable tasks across the role set.

Scalability KPI and quick_win flag — internal design choice, not externally cited. The scalability score favours use cases whose underlying E1 patterns repeat across multiple roles; the quick_win threshold (deployable in <4 weeks with standard LLM tooling) is a consulting heuristic baked into our prompt. Neither currently has a published-source anchor.

Priority tier thresholds: impact_total ≥ 3.5 → High, 2.5 ≤ impact_total < 3.5 → Medium, impact_total < 2.5 → Low. impact_total is recomputed server-side as the mean of the 6 KPI scores, so the LLM cannot fudge the arithmetic.
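The tier rule can be restated as a short function. This is a sketch under assumptions: the function names are hypothetical, the KPI scores are passed as a plain list of six numbers on the 1–5 scale, and any total of at least 2.5 that falls short of 3.5 is treated as Medium.

```python
from statistics import mean


def impact_total(kpi_scores):
    """Mean of the six KPI scores (each on a 1-5 scale)."""
    if len(kpi_scores) != 6:
        raise ValueError("expected exactly 6 KPI scores")
    return mean(kpi_scores)


def priority_tier(total: float) -> str:
    """Map impact_total to a priority tier using the thresholds above."""
    if total >= 3.5:
        return "High"
    if total >= 2.5:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    scores = [4, 4, 3, 3, 4, 3]  # illustrative scores; mean is 3.5
    print(impact_total(scores), priority_tier(impact_total(scores)))
```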

📚 References for the FTE savings calculation

The FTE savings headline uses: FTE saved = weighted_e_score × headcount, with an "incremental" variant that scales the score down by the baseline adoption already realised.

The potential vs. realised distinction motivating incremental_e_score = weighted × (1 − baseline_adoption) comes from Anthropic — Massenkoff & McCrory (2026), p. 5: "AI is far from reaching its theoretical capability: actual coverage remains a fraction of what's feasible." We don't double-count savings the client has already captured through existing tools.
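A short worked example may make the two FTE figures easier to compare. The sketch below restates the two formulas from this section; the function names and sample numbers are illustrative, and applying the incremental score to headcount in the same way as the headline score is an assumption.

```python
def fte_saved(weighted_e_score: float, headcount: float) -> float:
    """Headline figure: FTE saved = weighted_e_score x headcount."""
    return weighted_e_score * headcount


def incremental_e_score(weighted_e_score: float, baseline_adoption: float) -> float:
    """Discount the score by adoption already realised (0.0-1.0 scale)."""
    if not 0.0 <= baseline_adoption <= 1.0:
        raise ValueError("baseline_adoption must be between 0.0 and 1.0")
    return weighted_e_score * (1 - baseline_adoption)


if __name__ == "__main__":
    # Illustrative department: 40 people, weighted E-score 0.25, 20% baseline adoption.
    headline = fte_saved(0.25, 40)
    incremental = fte_saved(incremental_e_score(0.25, 0.2), 40)
    print(headline, round(incremental, 2))
```

For 40 people with a weighted E-score of 0.25, the headline figure is 10 FTE; with 20% baseline adoption the incremental figure drops to 8 FTE.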

Baseline-adoption detection — internal design choice, not externally cited. The keyword list used to detect current AI tool usage (ChatGPT, Copilot, Claude, Cursor, Jasper, Gemini, etc.) and the 0.0–1.0 adoption scale in analyze.py are our own heuristics. They are not drawn from a published adoption survey.

Framing note: FTE savings figures are presented as reallocatable capacity (time that can be redirected to higher-value work), not as headcount reductions. This framing is a consulting choice, not a citation-backed claim.

Ask Questions — Chat tab

Go to the Chat tab to ask natural-language questions about the analysis. Examples:

"Which department has the highest AI exposure?"
"What are the top 3 quick-win use cases?"
"Explain the E-score for the Finance team"
"Which roles should we prioritise for reskilling?"

The chat automatically uses the analysis context when results have been loaded.

E-Level Framework Reference

Level	Name	Weight	AI Type	Description
E0	No AI impact	0.0	—	Requires physical presence, licensed liability, or high emotional intelligence. AI cannot meaningfully assist.
E1	Direct automation	0.5	GenAI	LLM alone achieves ≥2× speedup. Weight reflects realistic 50% time saved — human still prompts and reviews.
E2	Augmented automation	0.3	GenAI + Human	AI assists but human review is required. Accountability and sign-off keep ~70% of the task time intact.
E3	Agentic pipeline	0.2	Agentic AI	Multi-step AI pipeline requiring deployment infrastructure. Theoretical ceiling is higher, but only ~20% realised today.

Exposure bands: E-score ≥ 0.30 = HIGH | ≥ 0.15 = MEDIUM | < 0.15 = LOW.
E-score = sum(E-level weight × time% / 100) across a role's tasks. Untouched portions of the week (meetings, admin) count as 0 AI impact.
AEI Calibration: Scores are calibrated using a 70/30 blend of task-level analysis and Anthropic Economic Index benchmark data (756 SOC occupations).
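The E-score formula and the exposure bands can be reproduced with the weights from the table. This is a minimal sketch: the function names and the sample task list are illustrative, and the guide does not say whether the bands are applied before or after AEI calibration, so the sketch bands whatever score it is given.

```python
# Weights from the E-Level Framework table.
WEIGHTS = {"E0": 0.0, "E1": 0.5, "E2": 0.3, "E3": 0.2}


def e_score(tasks):
    """Sum of E-level weight x time% / 100 across a role's tasks.

    tasks: list of (e_level, time_pct) pairs.
    """
    return sum(WEIGHTS[level] * time_pct / 100 for level, time_pct in tasks)


def exposure_band(score: float) -> str:
    """Map an E-score to its exposure band."""
    if score >= 0.30:
        return "HIGH"
    if score >= 0.15:
        return "MEDIUM"
    return "LOW"


if __name__ == "__main__":
    # Illustrative role: 30% E1, 20% E2, 25% E0; the remaining 25% is meetings/admin.
    role = [("E1", 30), ("E2", 20), ("E0", 25)]
    score = e_score(role)
    print(round(score, 2), exposure_band(score))
```

Here the score is 0.5 × 0.30 + 0.3 × 0.20 = 0.21, which falls in the MEDIUM band.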

📚 Why these four levels?

The E0/E1/E2/E3 levels map directly onto the OECD's four-level action-autonomy typology, which itself originates in Endsley (1987):

OECD (2022). Framework for the Classification of AI Systems. OECD Digital Economy Papers No. 323. Verbatim from page 53: "No-action autonomy (human support): System cannot act on its recommendations. Low-action autonomy (human-in-the-loop): System acts if the human agrees. Medium-action autonomy (human-on-the-loop): System acts unless the human vetoes. High-action autonomy (human-out-of-the-loop): System acts without human involvement." In order, these map to E0; E2; E1 (fast tasks) or E2 (accountability tasks); and E3.
Anthropic — Massenkoff & McCrory (2026). Labor market impacts of AI: A new measure and early evidence. Provides the observed-vs-theoretical distinction (p. 5) that justifies E3 as its own level: "AI is far from reaching its theoretical capability: actual coverage remains a fraction of what's feasible." Our E3 = 0.2 weight captures that deployment gap.

Tips for Best Results

Include headcount data wherever possible — it unlocks FTE savings calculations and weighted averages.
Upload job descriptions alongside the org chart — richer task data produces more accurate E-scores.
Use a consistent company name across Upload and Analyse tabs (lowercase, no spaces).
The ESCO QA checkpoint is your quality gate — review the QA report before approving, especially for niche or sector-specific roles.
All outputs are session-isolated — different Session IDs produce independent analyses.
If you refresh the page during a pipeline run, use the Reconnect button in the Job History to resume watching progress.
Delete old sessions periodically using the Delete session data button (sessions older than 7 days show a warning).
The system outputs are filtered — only primary deliverables (Excel, HTML reports) are shown. Intermediate files (.md, working JSON) are hidden for clarity.

Recent Updates (April 2026)

AEI Calibration: E-scores now calibrated against Anthropic Economic Index (756 SOC occupations) for more accurate, defensible estimates
Job History: View all pipeline runs for your session with Reconnect/Cancel/View Results controls
Session Lock: Completed sessions are locked to prevent accidental re-runs — generate a new Session ID for fresh analyses
Enhanced Results View: E-level stacked bars, per-role drill-down, and headline stats for better visualisation
QA Report: Pretty HTML QA report available at the approval checkpoint
Chat Improvements: Markdown table rendering and full context (FTE savings, ROI, use cases)
Session Age Warnings: Automatic alerts for sessions older than 7 days with quick delete option