
Excel Charting : AI-Powered Chart Design Recommendations


Introduction

Creating charts is easy. Making them GOOD is hard. Users struggled with chart type selection, styling decisions, and best practices—resulting in suboptimal visualizations even when data was correct.
As Lead Designer for Copilot Chart Design Recommendations, I designed an LLM-powered system that analyzes data shape, infers user intent, and suggests optimal chart configurations. This required deep collaboration with ML engineers to train and tune the RAG models with data visualization principles—essentially encoding expert knowledge into AI prompts.
The result bridges the gap between 'chart exists' and 'chart communicates effectively,' democratizing data visualization expertise for 400M users.
 
 
Results Overview
  • 99% Execution Success
  • 9 mins User Effort Saved
  • 172 Clicks Eliminated
  • FY26 H1 Shipping Timeline

The Problem: The Chart Design Expertise Gap

What Users Were Struggling With
"I created a chart but don't know if I picked the right type." — Intermediate User
"It looks boring. How do I make it presentation-ready?" — Enterprise Analyst
"I spend hours tweaking charts to look professional." — Power User
 
The pattern was clear: Users could INSERT charts (thanks to our P0 improvements), but they couldn't optimize them.
| Pain Point | User Behavior | Business Impact |
| --- | --- | --- |
| Chart Type Uncertainty | Try multiple types, delete, start over; a 5-10 minute cycle | Wasted time, user frustration, suboptimal final choices |
| Styling Paralysis | Don't know which formatting options matter; either over-style or under-style | Charts look unprofessional or cluttered |
| Best Practice Ignorance | Unaware of data viz principles (e.g., start the axis at zero, use direct labels) | Misleading visuals, poor communication |

Why Competitors Had the Advantage

Competitive analysis revealed sophisticated design assistance:
  • Tableau: 'Show Me' feature auto-recommends chart types based on field types
  • Power BI: Smart narrative and formatting suggestions
  • ChatGPT Code Interpreter: Generates Python code with matplotlib best practices baked in
  • Napkin AI: Fully automated design from text descriptions
 
CRITICAL GAP: Competitors democratized design expertise. Excel required users to LEARN design. We needed to embed expertise IN THE TOOL.
 
 

🎯 As a User (Functional & Emotional JTBDs)

Primary Hero JTBD

"When I insert a chart, help me create a visualization that tells my story effectively — without needing to be a data viz expert."
| Job Category | User Statement | Pain Point It Solves |
| --- | --- | --- |
| Chart Type Selection | "Help me figure out which chart best represents my data" | 5-10 minute trial-and-error cycles; users try multiple types, delete, start over |
| Visual Design | "Make my chart look professional and presentation-ready" | Charts described as "boring," "old-fashioned," "embarrassing to present" |
| Best Practice Application | "Tell me what I don't know about good data visualization" | Users unaware of principles like "start the Y-axis at zero" or "use direct labels" |

Supporting JTBDs

(from research)
| JTBD | Description | Copilot Intent Share |
| --- | --- | --- |
| Comparative Analysis | Compare values across categories, geographies, or periods to uncover insights | Part of 83% "Create Chart" intents |
| Presentation & Storytelling | Make complex information clear, engaging, and persuasive in meetings and reports | 9.6% of explicit intents |
| Trend Analysis | Visualize how metrics change over time to identify patterns | Primary use case for line charts |
| Answering Business Questions | Create ad-hoc visuals to answer specific questions quickly | Core Excel workflow |

The "Magic Wand" Quote

(from User Research)
"If you could wave a magic wand, what would you change?"
Users wanted three things:
  1. Automatic chart creation — "Based on my specific goal and storytelling needs, help me tell my story"
  2. Automatic beautification — "Make my charts look beautiful without me having to figure it out"
  3. Natural language customization — "Let me ask for customizations in plain English"

💼 As a Business (Strategic JTBDs)

Primary Hero JTBD

"Increase chart adoption and retention to keep users within the M365 ecosystem for their data visualization needs — preventing defection to competitors."
Business Metrics We're Driving
| Metric | Baseline Problem | Target Impact |
| --- | --- | --- |
| Chart Kept Rate | ~45% of charts deleted in the same session | Push toward >70% retention |
| Chart Create MAU | Only 2% of MAU on web create charts | Increase top-of-funnel creation |
| Net Chart Creation | Inserts minus deletes was too low | Turn net creation positive |
| Data Viz NPS | Charting issues dragging down Excel NPS | Measurable improvement |
| Copilot Tried/Enabled | Design Recommendations as a gateway | Lift adoption rate |

Strategic Business JTBDs

 
| Business Job | Why It Matters | How Design Recommendations Solves It |
| --- | --- | --- |
| Competitive Defense | Tableau, Power BI, ChatGPT Code Interpreter, and Napkin AI are democratizing design expertise | Embed expertise IN the tool; no learning curve required |
| Copilot Adoption | Only ~9% of Copilot users engaged with chart-related prompts | Proactive recommendations at insert become a gateway to Copilot |
| User Retention | Users looking outside M365 for data viz needs | A "wow moment" on the first chart creates sticky behavior |
| Unlock Latent Demand | 33% of commercial users want to create charts but don't | Remove friction to convert intent into action |

The Business Funnel Problem (from telemetry)

The massive drop from awareness → creation is where AI Design Recommendations lives. It attacks the -98% conversion gap.
 

My Role: Designing AI as Design Partner

  • Systems thinking — considered Charts across the complete M365 ecosystem
  • End-to-end UX strategy for Copilot-powered design recommendations
  • AI prompt engineering collaboration — co-designed LLM prompts with ML team for chart analysis, gave examples of visually stunning data viz.
  • Recommendation interaction patterns — preview, apply, undo flows
  • RAG model training & tuning — defined data viz properties that inform chart type recommendations
  • Multi-recommendation handling — when LLM suggests 3-5 improvements, how to present without overwhelming
  • Trust-building mechanisms — explainability, rationale, learn more links
 

Design recommendations should feel like:

  • A helpful colleague, not a know-it-all boss
  • Suggestions, not mandates — users always have final say
  • Educational — explain WHY, don't just say WHAT
  • Confidence-building — help users become better designers over time
 

Design Process: From Data Shape to Design Intelligence

Phase 1: Discovery & Problem Framing

What I did:
  • Analyzed telemetry showing 45% chart deletion rate in same session — users weren't getting what they needed on first try
  • Reviewed OCV/NPS feedback: users called charts "boring," "embarrassing," "not presentation-ready"
  • Mapped the 98% funnel drop from awareness → creation — the problem wasn't awareness, it was friction
  • Defined the Hero JTBD: "Help me create a visualization that tells my story — without being a data viz expert"
 
The OCV analysis showed 38% of chart complaints were about poor quality — tables instead of charts, wrong grouping, blank outputs. That's what we were solving.

Phase 2: Concept Exploration

What I did:
  • Explored 3 concepts: "Copilot Takeover" (auto-apply), "Inline Tooltips" (minimal), "Side-by-Side Preview" (balanced)
  • Ran quick concept tests — users rejected auto-apply ("let me see my chart first")
  • Debated with PM on Design-first vs. Insights-first — advocated for sequential trust model
  • Landed on story-first titles ("Quarterly trends" not "Line chart") — users think in intent, not chart types

Phase 3: Constraints Negotiation

What I did:
  • Partnered with engineering to understand hard limits: preview fidelity (~85%), LLM latency (2-4 sec), single Copilot pane
  • Made tradeoff calls: 4 recommendations (not 3), refresh button (explore more), dropdown on Replace (user controls placement)
  • Designed around constraints: loading state that feels "worth the wait," disclaimer for preview variance
  • Pushed back on auto-apply — insisted on preview-first, explicit commit

Phase 4: Training & Tuning the RAG Model

What I did:
  • Defined "what good looks like" — created quality criteria for recommendations (relevance, variety, actionability)
  • Built a golden dataset of 50+ data scenarios with ideal chart recommendations — used for model validation
  • Partnered with data science to craft prompt engineering — injected design principles into LLM instructions:
    • "Prioritize story-first titles over chart-type labels"
    • "Always include rationale explaining WHY this chart type fits the data"
    • "Limit to 4 diverse recommendations — avoid redundant suggestions"
  • Defined statistical signals for the Python preprocessing layer:
    • Time-series detection → recommend line/area charts
    • Part-to-whole patterns → recommend pie/donut
    • Category comparison → recommend bar/column
    • High cardinality → warn against pie charts
  • Reviewed model outputs weekly — flagged bad recommendations, added to negative examples
  • Created edge case library: sparse data, single column, dates as text, currency formatting — ensured graceful degradation
  • Tuned confidence thresholds — recommendations below 70% confidence get filtered out
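The statistical signals above can be sketched as simple pandas heuristics. This is a minimal illustration, assuming a tabular DataFrame input; the function name `detect_signals`, the signal keys, and the exact cardinality threshold are hypothetical, not the shipped implementation.

```python
import pandas as pd

# Illustrative heuristics for the statistical preprocessing layer.
# Threshold and key names are assumptions for this sketch.
PIE_CARDINALITY_LIMIT = 7

def detect_signals(df: pd.DataFrame) -> dict:
    """Derive coarse data-shape signals that steer chart recommendations."""
    signals = {
        "time_series": False,          # -> favor line/area charts
        "part_to_whole": False,        # -> favor pie/donut (low cardinality)
        "category_comparison": False,  # -> favor bar/column
        "high_cardinality": False,     # -> warn against pie charts
    }
    datetime_cols = df.select_dtypes(include="datetime").columns
    numeric_cols = df.select_dtypes(include="number").columns
    category_cols = [c for c in df.columns
                     if c not in datetime_cols and c not in numeric_cols]

    if len(datetime_cols) >= 1 and len(numeric_cols) >= 1:
        signals["time_series"] = True
    if category_cols and len(numeric_cols) >= 1:
        signals["category_comparison"] = True
        n_categories = df[category_cols[0]].nunique()
        signals["high_cardinality"] = n_categories > PIE_CARDINALITY_LIMIT
        # A single numeric measure across few categories reads as part-to-whole.
        signals["part_to_whole"] = (len(numeric_cols) == 1
                                    and not signals["high_cardinality"])
    return signals
```

A date column plus a numeric column would flag `time_series`; a text column with more than seven distinct values would flag `high_cardinality` and suppress pie recommendations.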
 

LLM Prompt Co-Design

Working with the ML team, I co-created prompts optimized for chart type selection. The key was encoding data visualization best practices into the prompt structure.

Chart Type Selection Logic (Encoded in Prompts)

I defined the decision tree that the model uses to recommend chart types:
| User Intent / Data Shape | Recommended Chart | Why This Works |
| --- | --- | --- |
| Time series with trend | Line Chart | Shows change over time; the eye follows the trajectory |
| Categorical comparison | Clustered Bar/Column | Easy side-by-side comparison; clear value differences |
| Part-to-whole (<7 categories) | Pie/Donut Chart | Intuitive percentage representation; limited categories |
| Part-to-whole (>7 categories) | Stacked Bar/Area | Handles many categories; shows composition |
| Correlation/distribution | Scatter/Bubble Chart | Reveals relationships; shows outliers clearly |
| Actual vs. Target | Combo Chart | Different visual encodings for different data types |
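The decision tree above can be expressed as a small lookup function. This is a sketch only: the shape labels and the `recommend_chart` helper are illustrative names, and in the shipped feature this logic lives in the LLM prompt rather than in hard-coded rules.

```python
# Illustrative encoding of the chart-type decision tree from the table above.
# Shape labels and the function name are assumptions, not a shipped schema.
def recommend_chart(data_shape: str, n_categories: int = 0) -> tuple[str, str]:
    """Map a detected data shape to a (chart type, rationale) pair."""
    if data_shape == "time_series":
        return ("Line Chart",
                "Shows change over time; the eye follows the trajectory")
    if data_shape == "part_to_whole":
        if n_categories < 7:
            return ("Pie/Donut Chart",
                    "Intuitive percentage view for a handful of categories")
        return ("Stacked Bar/Area",
                "Handles many categories while showing composition")
    if data_shape == "categorical_comparison":
        return ("Clustered Bar/Column",
                "Easy side-by-side comparison of values")
    if data_shape == "correlation":
        return ("Scatter/Bubble Chart",
                "Reveals relationships and shows outliers clearly")
    if data_shape == "actual_vs_target":
        return ("Combo Chart",
                "Different visual encodings for different measures")
    return ("Clustered Column", "Safe default for mixed or unknown shapes")
```

Returning the rationale alongside the chart type mirrors the "always explain WHY" principle: every recommendation carries its own justification.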

Prompt Iteration & Quality Tuning

Through iterative testing on 20+ sample datasets across industries (Telecom, Finance, Manufacturing, Retail), we tuned the prompts to:
  1. Maximize accuracy — >95% factually correct recommendations
  2. Improve analytical depth — avoid "obvious" suggestions, focus on insights
  3. Ensure clarity — plain-language explanations users understand
  4. Add a validation layer — constraint-based prompts plus validation before showing results to users
 
**Prompt Architecture**
"Given a chart with [data structure] and current type [X], analyze whether a better visualization exists. Consider: 1) data relationships, 2) storytelling intent, 3) visual clarity. Return the top 4 recommendations, each with an executable chart config and a brief rationale."
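To make the architecture concrete, here is a hedged sketch of the prompt template plus the post-hoc validation gate described above. The field names, the `SUPPORTED_TYPES` set, and the helper names are assumptions; the 70% confidence floor and the four-recommendation cap mirror the figures quoted in this case study.

```python
import json

# Hypothetical prompt template; placeholders are filled per chart.
PROMPT_TEMPLATE = """\
Given a chart with {data_structure} and current type {current_type},
analyze whether a better visualization exists.
Consider: 1) data relationships, 2) storytelling intent, 3) visual clarity.
Return up to 4 recommendations as JSON objects with keys:
"title" (story-first, not a chart-type label), "chart_type",
"rationale", "config" (executable chart configuration), "confidence".
"""

SUPPORTED_TYPES = {"column", "bar", "line", "scatter", "pie", "donut",
                   "area", "combo"}
CONFIDENCE_FLOOR = 0.70  # recommendations below this are filtered out

def validate_recommendations(raw_json: str) -> list[dict]:
    """Keep only well-formed, supported, confident recommendations."""
    try:
        candidates = json.loads(raw_json)
    except json.JSONDecodeError:
        return []  # malformed LLM output: show nothing rather than garbage
    valid = [
        r for r in candidates
        if {"title", "chart_type", "rationale", "config"} <= r.keys()
        and r["chart_type"] in SUPPORTED_TYPES
        and r.get("confidence", 0.0) >= CONFIDENCE_FLOOR
    ]
    return valid[:4]  # never surface more than four cards
```

The key design point: the model is asked for structured output, and a deterministic layer decides what users actually see, which is how impractical or low-confidence suggestions get caught before they reach the pane.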

Phase 5: Detailed Design

What I did:
  • Designed the List → Detail two-panel flow (discovery vs. commitment)
  • Specified the recommendation card anatomy: thumbnail, story-first title, rationale, actions, review changes
  • Added "Review changes" section with expandable details — transparency builds trust
  • Defined micro-interactions: hover states, dropdown behavior, back navigation

Phase 6: Validation & Iteration

What I did:
  • Ran usability sessions — validated story-first titles resonated, users understood rationale text
  • Iterated on "Review changes" — added "Show details" expander for power users
  • Tested feedback loop (👍👎) placement — moved to footer for less intrusion
  • Confirmed Copy action was critical for PPT workflow users
 

Initial direction, design and concepts

 
 
 
 

The MVP

  • 4 thumbnails — more options while staying scannable
  • Story-first titles — user decides based on WHAT they want to communicate, not HOW
  • Refresh button — get the next 4 recommendations without dismissing the pane
  • AI disclaimer + feedback — trust calibration plus quality-signal collection

 
  • Title + rationale — story-first title combined with a chart-type rationale
  • Large preview — "what you see is what you get" before committing
  • Replace Chart and other options — replace the existing chart, insert a new one on the grid, or copy to the clipboard and paste into PowerPoint or Word
  • Review changes — transparency; the user knows exactly what will change
 

Key Design Decisions & Trade-offs

Native Charts vs. AI-Generated Images
Choice: Generate NATIVE Excel charts, not PNG images
Why: Editable, data-bound, refreshable. Competitors' AI-generated images look good but can't be tweaked.
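The shipped feature generates charts through Excel's internal object model, which can't be reproduced here. As an external illustration of the "native, data-bound, editable" principle, this sketch uses the openpyxl library to attach a recommended line chart to worksheet data; all cell values and the story-first title are illustrative.

```python
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

# Illustrative only: a recommendation applied as a NATIVE chart object
# (editable, data-bound, refreshable) rather than a static PNG.
wb = Workbook()
ws = wb.active
rows = [("Month", "Sales"), ("Jan", 120), ("Feb", 135),
        ("Mar", 150), ("Apr", 170)]
for row in rows:
    ws.append(row)

chart = LineChart()
chart.title = "Quarterly trend"  # story-first title, not "Line chart"
data = Reference(ws, min_col=2, min_row=1, max_row=5)   # Sales column
cats = Reference(ws, min_col=1, min_row=2, max_row=5)   # Month labels
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
ws.add_chart(chart, "D2")
# The chart remains bound to the cells: edit the data, and it updates.
```

Because the chart references the underlying cells, the user keeps full control after applying the recommendation, which is exactly what AI-generated images cannot offer.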
Top 1 vs. Top 4 Recommendations
Choice: Show 1-4 recommendations, prioritized by confidence
Why: Balance guidance with choice. 1 felt prescriptive, 5+ overwhelmed. 4 was sweet spot.
 
In-Pane Preview vs. Live Preview
Choice: Thumbnail preview in pane, NOT live chart manipulation on hover
Why: Live preview felt overwhelming. Thumbnails gave control without distraction.
 
Explain Rationale vs. Just Apply
Choice: Always show WHY, not just WHAT to change
Why: Builds user understanding over time. Trust through transparency.
 
 

Impact & Results

 

79% Kept Rate (▲+8pp)

Kept/Tried Rate
Users who apply recommendation keep the chart
 

9.3% Tried/Enabled (▲+52%)

Copilot Tried/Enabled Lift
Uplift in users who try Copilot after seeing recommendations
 

18% Poor Quality (▼-20pp)

Poor Quality Rate
Down from the 38% baseline
 

Ready for 25% rollout

Chart Retention Improvement
Reduction in same-session chart deletions
The feature is successfully reaching Novice/Intermediate users who traditionally DON'T use Copilot for charts (0.3%-3.7% baseline). Their engagement rates (54-61%) far exceed their typical Copilot usage.
 
| Metric | Target | Month 1 | Month 2 | Trend |
| --- | --- | --- | --- | --- |
| Kept/Tried Rate | ≥80% | 71% | 79% | 📈 +8pp |
| Error-Free Load Rate | ≥80% | 74% | 85% | 📈 +11pp |
| Poor Quality Rate | <20% | 32% | 18% | 📉 -14pp |
| Pane Dismiss Rate | <30% | 41% | 28% | 📉 -13pp |
| Net Satisfaction (👍-👎) | >60% | 48% | 64% | 📈 +16pp |

💬 User Feedback

"This is like having a data viz expert sitting next to me."
— Power User, Internal Preview
"I learned more about charting from these suggestions than from any tutorial."
— Novice User, Usability Study
"This is like having a data analyst whispering in my ear when I make a chart."
— Financial Analyst, Early Adopter
Technical Success Metrics (Internal Testing)
✅  100% execution success rate — LLM-generated chart code worked every time in controlled tests
✅  9 mins and 172 clicks saved — measured against the manual chart-optimization workflow
✅  All common chart types supported — column, bar, line, scatter, pie, combo; full MVP coverage
User Testing Insights
👩🏼 "Finally! I don't have to guess if my chart is good." — Intermediate User
Key finding: Users didn't just apply recommendations — they LEARNED from them. Over time, they started making better initial choices.
Strategic Impact
  • Proved AI-native differentiation — Excel is only tool that combines native charts + AI recommendations + editability
  • Unlocked Copilot adoption — Design recommendations were #2 most-used Copilot feature after Insights
  • Elevated Excel's positioning — From 'spreadsheet tool' to 'intelligent design assistant'
  • Foundation for future — Opened door for AI assistance in formatting, tables, and beyond
 
 

Key Learnings: Designing for AI Collaboration

What Worked Exceptionally Well
  • Co-designing prompts with ML team — Designer + data scientist collaboration produced better outcomes than either alone
  • Explaining rationale — Educational approach built trust and improved user skill over time
  • Prioritizing native charts — Maintained editability vs. competitors' static AI outputs
  • Modular card pattern — Flexible system that scaled from 1 to N recommendations
Challenges & How We Solved Them
  • LLM sometimes suggested impractical changes — solved with constraint-based prompts and a validation layer before showing results to users
  • Preview thumbnails took too long to generate — cached common transformations and optimized the rendering pipeline
  • Users wanted to compare multiple recommendations side-by-side — added a comparison view in the Walk phase (deferred from MVP)
 

The Bigger Lesson for AI Product Design

AI features succeed when they:
  1. Augment, don't automate — user stays in control, AI provides options
  2. Explain the 'why' — black-box AI breeds mistrust
  3. Preserve human creativity — recommendations, not prescriptions
  4. Build expertise over time — users learn patterns and become less dependent on AI