
Excel Charting : AI-Powered Chart Design Recommendations


Introduction

Creating charts is easy. Making them GOOD is hard. Users struggled with chart type selection, styling decisions, and best practices—resulting in suboptimal visualizations even when data was correct.
As Lead Designer for Copilot Chart Design Recommendations, I designed an LLM-powered system that analyzes data shape, infers user intent, and suggests optimal chart configurations. This required deep collaboration with ML engineers to train and tune the RAG models with data visualization principles—essentially encoding expert knowledge into AI prompts.
The result bridges the gap between 'chart exists' and 'chart communicates effectively,' democratizing data visualization expertise for 400M users.
 
 
Results Overview
  • 99% Execution Success
  • 9 mins User Effort Saved
  • 172 Clicks Eliminated
  • FY26 H1 Shipping Timeline

The Problem: The Chart Design Expertise Gap

What Users Were Struggling With
"I created a chart but don't know if I picked the right type." — Intermediate User
"It looks boring. How do I make it presentation-ready?" — Enterprise Analyst
"I spend hours tweaking charts to look professional." — Power User
 
The pattern was clear: Users could INSERT charts (thanks to our P0 improvements), but they couldn't optimize them.
| Pain Point | User Behavior | Business Impact |
| --- | --- | --- |
| Chart Type Uncertainty | Try multiple types, delete, start over; a 5-10 minute cycle | Wasted time, user frustration, suboptimal final choices |
| Styling Paralysis | Don't know which formatting options matter; either over-style or under-style | Charts look unprofessional or cluttered |
| Best Practice Ignorance | Unaware of data viz principles (e.g., start the axis at zero, use direct labels) | Misleading visuals, poor communication |

Why Competitors Had the Advantage

Competitive analysis revealed sophisticated design assistance:
  • Tableau: 'Show Me' feature auto-recommends chart types based on field types
  • Power BI: Smart narrative and formatting suggestions
  • ChatGPT Code Interpreter: Generates Python code with matplotlib best practices baked in
  • Napkin AI: Fully automated design from text descriptions
 
CRITICAL GAP: Competitors democratized design expertise. Excel required users to LEARN design. We needed to embed expertise IN THE TOOL.
 
 

🎯 As a User (Functional & Emotional JTBDs)

Primary Hero JTBD

"When I insert a chart, help me create a visualization that tells my story effectively — without needing to be a data viz expert."
| Job Category | User Statement | Pain Point It Solves |
| --- | --- | --- |
| Chart Type Selection | "Help me figure out which chart best represents my data" | 5-10 minute trial-and-error cycles; users try multiple types, delete, start over |
| Visual Design | "Make my chart look professional and presentation-ready" | Charts described as "boring," "old-fashioned," "embarrassing to present" |
| Best Practice Application | "Tell me what I don't know about good data visualization" | Users unaware of principles like "start the Y-axis at zero" or "use direct labels" |

Supporting JTBDs

(from research)
| JTBD | Description | Copilot Intent Share |
| --- | --- | --- |
| Comparative Analysis | Compare values across categories, geographies, or periods to uncover insights | Part of 83% "Create Chart" intents |
| Presentation & Storytelling | Make complex information clear, engaging, and persuasive in meetings and reports | 9.6% of explicit intents |
| Trend Analysis | Visualize how metrics change over time to identify patterns | Primary use case for line charts |
| Answering Business Questions | Create ad-hoc visuals to answer specific questions quickly | Core Excel workflow |

The "Magic Wand" Quote

(from User Research)
"If you could wave a magic wand, what would you change?"
Users wanted three things:
  1. Automatic chart creation — "Based on my specific goal and storytelling needs, help me tell my story"
  2. Automatic beautification — "Make my charts look beautiful without me having to figure it out"
  3. Natural language customization — "Let me ask for customizations in plain English"

💼 As a Business (Strategic JTBDs)

Primary Hero JTBD

"Increase chart adoption and retention to keep users within the M365 ecosystem for their data visualization needs — preventing defection to competitors."
Business Metrics We're Driving
| Metric | Baseline Problem | Target Impact |
| --- | --- | --- |
| Chart Kept Rate | ~45% of charts deleted in the same session | Push toward >70% retention |
| Chart Create MAU | Only 2% of MAU on web create charts | Increase top-of-funnel creation |
| Net Chart Creation | Inserts minus deletes was too low | Turn net creation positive |
| Data Viz NPS | Charting issues dragging down Excel NPS | Measurable improvement |
| Copilot Tried/Enabled | Design Recommendations as a gateway | Lift adoption rate |

Strategic Business JTBDs

 
| Business Job | Why It Matters | How Design Recommendations Solves It |
| --- | --- | --- |
| Competitive Defense | Tableau, Power BI, ChatGPT Code Interpreter, and Napkin AI are democratizing design expertise | Embed expertise IN the tool; no learning curve required |
| Copilot Adoption | Only ~9% of Copilot users engaged with chart-related prompts | Proactive recommendations at insert become a gateway to Copilot |
| User Retention | Users looking outside M365 for data viz needs | A "wow moment" on the first chart creates sticky behavior |
| Unlock Latent Demand | 33% of commercial users want to create charts but don't | Remove friction to convert intent into action |

The Business Funnel Problem (from telemetry)

The massive drop from awareness → creation is where AI Design Recommendations lives. It attacks the -98% conversion gap.
 

My Role: Designing AI as Design Partner

  • Systems thinking — considered Charts across the complete M365 ecosystem
  • End-to-end UX strategy for Copilot-powered design recommendations
  • AI prompt engineering collaboration — co-designed LLM prompts with ML team for chart analysis, gave examples of visually stunning data viz.
  • Recommendation interaction patterns — preview, apply, undo flows
  • RAG model training & tuning — defined data viz properties that inform chart type recommendations
  • Multi-recommendation handling — when LLM suggests 3-5 improvements, how to present without overwhelming
  • Trust-building mechanisms — explainability, rationale, learn more links
 

Design recommendations should feel like:

  • A helpful colleague, not a know-it-all boss
  • Suggestions, not mandates — users always have final say
  • Educational — explain WHY, don't just say WHAT
  • Confidence-building — help users become better designers over time
 

Design Process: From Data Shape to Design Intelligence

Phase 1: Discovery & Problem Framing

What I did:
  • Analyzed telemetry showing 45% chart deletion rate in same session — users weren't getting what they needed on first try
  • Reviewed OCV/NPS feedback: users called charts "boring," "embarrassing," "not presentation-ready"
  • Mapped the 98% funnel drop from awareness → creation — the problem wasn't awareness, it was friction
  • Defined the Hero JTBD: "Help me create a visualization that tells my story — without being a data viz expert"
 
The OCV analysis showed 38% of chart complaints were about poor quality — tables instead of charts, wrong grouping, blank outputs. That's what we were solving.

Phase 2: Concept Exploration

What I did:
  • Explored 3 concepts: "Copilot Takeover" (auto-apply), "Inline Tooltips" (minimal), "Side-by-Side Preview" (balanced)
  • Ran quick concept tests — users rejected auto-apply ("let me see my chart first")
  • Debated with PM on Design-first vs. Insights-first — advocated for sequential trust model
  • Landed on story-first titles ("Quarterly trends" not "Line chart") — users think in intent, not chart types

Phase 3: Constraints Negotiation

What I did:
  • Partnered with engineering to understand hard limits: preview fidelity (~85%), LLM latency (2-4 sec), single Copilot pane
  • Made tradeoff calls: 4 recommendations (not 3), refresh button (explore more), dropdown on Replace (user controls placement)
  • Designed around constraints: loading state that feels "worth the wait," disclaimer for preview variance
  • Pushed back on auto-apply — insisted on preview-first, explicit commit

Phase 4: Training & Tuning the RAG Model

What I did:
  • Defined "what good looks like" — created quality criteria for recommendations (relevance, variety, actionability)
  • Built a golden dataset of 50+ data scenarios with ideal chart recommendations — used for model validation
  • Partnered with data science to craft prompt engineering — injected design principles into LLM instructions:
    • "Prioritize story-first titles over chart-type labels"
    • "Always include rationale explaining WHY this chart type fits the data"
    • "Limit to 4 diverse recommendations — avoid redundant suggestions"
  • Defined statistical signals for the Python preprocessing layer:
    • Time-series detection → recommend line/area charts
    • Part-to-whole patterns → recommend pie/donut
    • Category comparison → recommend bar/column
    • High cardinality → warn against pie charts
  • Reviewed model outputs weekly — flagged bad recommendations, added to negative examples
  • Created edge case library: sparse data, single column, dates as text, currency formatting — ensured graceful degradation
  • Tuned confidence thresholds — recommendations below 70% confidence get filtered out
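The statistical signals above can be sketched as simple pandas heuristics. This is a minimal illustration, assuming a tabular DataFrame input; the function name `detect_signals`, the signal keys, and the exact cardinality threshold are hypothetical, not the shipped implementation.

```python
import pandas as pd

# Illustrative heuristics for the statistical preprocessing layer.
# Threshold and key names are assumptions for this sketch.
PIE_CARDINALITY_LIMIT = 7

def detect_signals(df: pd.DataFrame) -> dict:
    """Derive coarse data-shape signals that steer chart recommendations."""
    signals = {
        "time_series": False,          # -> favor line/area charts
        "part_to_whole": False,        # -> favor pie/donut (low cardinality)
        "category_comparison": False,  # -> favor bar/column
        "high_cardinality": False,     # -> warn against pie charts
    }
    datetime_cols = df.select_dtypes(include="datetime").columns
    numeric_cols = df.select_dtypes(include="number").columns
    category_cols = [c for c in df.columns
                     if c not in datetime_cols and c not in numeric_cols]

    if len(datetime_cols) >= 1 and len(numeric_cols) >= 1:
        signals["time_series"] = True
    if category_cols and len(numeric_cols) >= 1:
        signals["category_comparison"] = True
        n_categories = df[category_cols[0]].nunique()
        signals["high_cardinality"] = n_categories > PIE_CARDINALITY_LIMIT
        # A single numeric measure across few categories reads as part-to-whole.
        signals["part_to_whole"] = (len(numeric_cols) == 1
                                    and not signals["high_cardinality"])
    return signals
```

A date column plus a numeric column would flag `time_series`; a text column with more than seven distinct values would flag `high_cardinality` and suppress pie recommendations.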
 

LLM Prompt Co-Design

Working with the ML team, I co-created prompts optimized for chart type selection. The key was encoding data visualization best practices into the prompt structure.

Chart Type Selection Logic (Encoded in Prompts)

I defined the decision tree that the model uses to recommend chart types:
| User Intent / Data Shape | Recommended Chart | Why This Works |
| --- | --- | --- |
| Time series with trend | Line Chart | Shows change over time; the eye follows the trajectory |
| Categorical comparison | Clustered Bar/Column | Easy side-by-side comparison; clear value differences |
| Part-to-whole (<7 categories) | Pie/Donut Chart | Intuitive percentage representation; limited categories |
| Part-to-whole (>7 categories) | Stacked Bar/Area | Handles many categories; shows composition |
| Correlation/distribution | Scatter/Bubble Chart | Reveals relationships; shows outliers clearly |
| Actual vs. Target | Combo Chart | Different visual encodings for different data types |
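The decision tree above can be expressed as a small lookup function. This is a sketch only: the shape labels and the `recommend_chart` helper are illustrative names, and in the shipped feature this logic lives in the LLM prompt rather than in hard-coded rules.

```python
# Illustrative encoding of the chart-type decision tree from the table above.
# Shape labels and the function name are assumptions, not a shipped schema.
def recommend_chart(data_shape: str, n_categories: int = 0) -> tuple[str, str]:
    """Map a detected data shape to a (chart type, rationale) pair."""
    if data_shape == "time_series":
        return ("Line Chart",
                "Shows change over time; the eye follows the trajectory")
    if data_shape == "part_to_whole":
        if n_categories < 7:
            return ("Pie/Donut Chart",
                    "Intuitive percentage view for a handful of categories")
        return ("Stacked Bar/Area",
                "Handles many categories while showing composition")
    if data_shape == "categorical_comparison":
        return ("Clustered Bar/Column",
                "Easy side-by-side comparison of values")
    if data_shape == "correlation":
        return ("Scatter/Bubble Chart",
                "Reveals relationships and shows outliers clearly")
    if data_shape == "actual_vs_target":
        return ("Combo Chart",
                "Different visual encodings for different measures")
    return ("Clustered Column", "Safe default for mixed or unknown shapes")
```

Returning the rationale alongside the chart type mirrors the "always explain WHY" principle: every recommendation carries its own justification.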

Prompt Iteration & Quality Tuning

Through iterative testing on 20+ sample datasets across industries (Telecom, Finance, Manufacturing, Retail), we tuned the prompts to:
  1. Maximize accuracy — >95% factually correct recommendations
  2. Improve analytical depth — avoid "obvious" suggestions, focus on insights
  3. Ensure clarity — plain-language explanations users understand
  4. Add a validation layer — constraint-based prompts plus validation before showing results to users
 
**Prompt Architecture**
"Given a chart with [data structure] and current type [X], analyze whether a better visualization exists. Consider: 1) data relationships, 2) storytelling intent, 3) visual clarity. Return the top 4 recommendations, each with an executable chart config and a brief rationale."
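To make the architecture concrete, here is a hedged sketch of the prompt template plus the post-hoc validation gate described above. The field names, the `SUPPORTED_TYPES` set, and the helper names are assumptions; the 70% confidence floor and the four-recommendation cap mirror the figures quoted in this case study.

```python
import json

# Hypothetical prompt template; placeholders are filled per chart.
PROMPT_TEMPLATE = """\
Given a chart with {data_structure} and current type {current_type},
analyze whether a better visualization exists.
Consider: 1) data relationships, 2) storytelling intent, 3) visual clarity.
Return up to 4 recommendations as JSON objects with keys:
"title" (story-first, not a chart-type label), "chart_type",
"rationale", "config" (executable chart configuration), "confidence".
"""

SUPPORTED_TYPES = {"column", "bar", "line", "scatter", "pie", "donut",
                   "area", "combo"}
CONFIDENCE_FLOOR = 0.70  # recommendations below this are filtered out

def validate_recommendations(raw_json: str) -> list[dict]:
    """Keep only well-formed, supported, confident recommendations."""
    try:
        candidates = json.loads(raw_json)
    except json.JSONDecodeError:
        return []  # malformed LLM output: show nothing rather than garbage
    valid = [
        r for r in candidates
        if {"title", "chart_type", "rationale", "config"} <= r.keys()
        and r["chart_type"] in SUPPORTED_TYPES
        and r.get("confidence", 0.0) >= CONFIDENCE_FLOOR
    ]
    return valid[:4]  # never surface more than four cards
```

The key design point: the model is asked for structured output, and a deterministic layer decides what users actually see, which is how impractical or low-confidence suggestions get caught before they reach the pane.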

Phase 5: Detailed Design

What I did:
  • Designed the List → Detail two-panel flow (discovery vs. commitment)
  • Specified the recommendation card anatomy: thumbnail, story-first title, rationale, actions, review changes
  • Added "Review changes" section with expandable details — transparency builds trust
  • Defined micro-interactions: hover states, dropdown behavior, back navigation

Phase 6: Validation & Iteration

What I did:
  • Ran usability sessions — validated story-first titles resonated, users understood rationale text
  • Iterated on "Review changes" — added "Show details" expander for power users
  • Tested feedback loop (👍👎) placement — moved to footer for less intrusion
  • Confirmed Copy action was critical for PPT workflow users
 

Initial direction, design and concepts

 
 
 
 

The MVP

  • 4 thumbnails — more options while staying scannable
  • Story-first titles — user decides based on WHAT they want to communicate, not HOW
  • Refresh button — get the next 4 recommendations without dismissing the pane
  • AI disclaimer + feedback — trust calibration plus quality-signal collection

 
  • Title + rationale — story-first title combined with a chart-type rationale
  • Large preview — "what you see is what you get" before committing
  • Replace Chart and other options — replace the existing chart, insert a new one on the grid, or copy to the clipboard and paste into PowerPoint or Word
  • Review changes — transparency; the user knows exactly what will change
 

Key Design Decisions & Trade-offs

Native Charts vs. AI-Generated Images
Choice: Generate NATIVE Excel charts, not PNG images
Why: Editable, data-bound, refreshable. Competitors' AI-generated images look good but can't be tweaked.
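The shipped feature generates charts through Excel's internal object model, which can't be reproduced here. As an external illustration of the "native, data-bound, editable" principle, this sketch uses the openpyxl library to attach a recommended line chart to worksheet data; all cell values and the story-first title are illustrative.

```python
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

# Illustrative only: a recommendation applied as a NATIVE chart object
# (editable, data-bound, refreshable) rather than a static PNG.
wb = Workbook()
ws = wb.active
rows = [("Month", "Sales"), ("Jan", 120), ("Feb", 135),
        ("Mar", 150), ("Apr", 170)]
for row in rows:
    ws.append(row)

chart = LineChart()
chart.title = "Quarterly trend"  # story-first title, not "Line chart"
data = Reference(ws, min_col=2, min_row=1, max_row=5)   # Sales column
cats = Reference(ws, min_col=1, min_row=2, max_row=5)   # Month labels
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
ws.add_chart(chart, "D2")
# The chart remains bound to the cells: edit the data, and it updates.
```

Because the chart references the underlying cells, the user keeps full control after applying the recommendation, which is exactly what AI-generated images cannot offer.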
Top 1 vs. Top 4 Recommendations
Choice: Show 1-4 recommendations, prioritized by confidence
Why: Balance guidance with choice. 1 felt prescriptive, 5+ overwhelmed. 4 was sweet spot.
 
In-Pane Preview vs. Live Preview
Choice: Thumbnail preview in pane, NOT live chart manipulation on hover
Why: Live preview felt overwhelming. Thumbnails gave control without distraction.
 
Explain Rationale vs. Just Apply
Choice: Always show WHY, not just WHAT to change
Why: Builds user understanding over time. Trust through transparency.
 
 

Impact & Results

 

79% Kept Rate (▲+8pp)

Kept/Tried Rate
Users who apply recommendation keep the chart
 

9.3% Tried/Enabled (▲+52%)

Copilot Tried/Enabled Lift
Uplift in users who try Copilot after seeing recommendations
 

18% Poor Quality (▼-20pp)

Poor Quality Rate
Down from the 38% baseline
 

Ready for 25% rollout

Chart Retention Improvement
Reduction in same-session chart deletions
The feature is successfully reaching Novice/Intermediate users who traditionally DON'T use Copilot for charts (0.3%-3.7% baseline). Their engagement rates (54-61%) far exceed their typical Copilot usage.
 
| Metric | Target | Month 1 | Month 2 | Trend |
| --- | --- | --- | --- | --- |
| Kept/Tried Rate | ≥80% | 71% | 79% | 📈 +8pp |
| Error-Free Load Rate | ≥80% | 74% | 85% | 📈 +11pp |
| Poor Quality Rate | <20% | 32% | 18% | 📉 -14pp |
| Pane Dismiss Rate | <30% | 41% | 28% | 📉 -13pp |
| Net Satisfaction (👍-👎) | >60% | 48% | 64% | 📈 +16pp |

💬 User Feedback

"This is like having a data viz expert sitting next to me."
— Power User, Internal Preview
"I learned more about charting from these suggestions than from any tutorial."
— Novice User, Usability Study
"This is like having a data analyst whispering in my ear when I make a chart."
— Financial Analyst, Early Adopter
Technical Success Metrics (Internal Testing)
✅  100% execution success rate — LLM-generated chart code worked every time in controlled tests
✅  9 mins and 172 clicks saved — measured against the manual chart-optimization workflow
✅  All common chart types supported — column, bar, line, scatter, pie, combo; full MVP coverage
User Testing Insights
👩🏼 "Finally! I don't have to guess if my chart is good." — Intermediate User
Key finding: Users didn't just apply recommendations — they LEARNED from them. Over time, they started making better initial choices.
Strategic Impact
  • Proved AI-native differentiation — Excel is only tool that combines native charts + AI recommendations + editability
  • Unlocked Copilot adoption — Design recommendations were #2 most-used Copilot feature after Insights
  • Elevated Excel's positioning — From 'spreadsheet tool' to 'intelligent design assistant'
  • Foundation for future — Opened door for AI assistance in formatting, tables, and beyond
 
 

Key Learnings: Designing for AI Collaboration

What Worked Exceptionally Well
  • Co-designing prompts with ML team — Designer + data scientist collaboration produced better outcomes than either alone
  • Explaining rationale — Educational approach built trust and improved user skill over time
  • Prioritizing native charts — Maintained editability vs. competitors' static AI outputs
  • Modular card pattern — Flexible system that scaled from 1 to N recommendations
Challenges & How We Solved Them
  • LLM sometimes suggested impractical changes — solved with constraint-based prompts and a validation layer before showing results to users
  • Preview thumbnails took too long to generate — cached common transformations and optimized the rendering pipeline
  • Users wanted to compare multiple recommendations side-by-side — added a comparison view in the Walk phase (deferred from MVP)
 

The Bigger Lesson for AI Product Design

AI features succeed when they:
  1. Augment, don't automate — user stays in control, AI provides options
  2. Explain the 'why' — black-box AI breeds mistrust
  3. Preserve human creativity — recommendations, not prescriptions
  4. Build expertise over time — users learn patterns and become less dependent on AI