Building Ellie: Lessons from Creating an Adaptive Language Learning App
The Journey of Building Ellie
Six months ago, I started building Ellie with two friends who shared my frustration with traditional language learning apps. We believed that learning a language shouldn’t feel like being forced through the same rigid curriculum as millions of other people.
Today, Ellie has over 1,000 beta users and a 67% 30-day retention rate - more than double the industry average. Here’s what we learned along the way.
The Core Problem
Most language learning apps treat personalization as an afterthought. They might let you choose your “level” (beginner, intermediate, advanced), but from there, everyone gets the same content in the same order.
This creates several problems:
- The content isn’t relevant - A business professional doesn’t need to know how to order ice cream, but they do need presentation vocabulary
- The difficulty is wrong - Some concepts are too easy (boring), others too hard (frustrating)
- The pacing doesn’t match - Some people can handle 30 minutes daily, others only have 5
- Learning styles are ignored - Visual learners get the same experience as auditory learners
We wanted to fix all of this with technology.
The Technical Challenge: Real-Time Adaptation
The hardest technical problem we faced was building an algorithm that could adapt difficulty in real-time without disrupting the learning flow.
Attempt 1: Rule-Based System
Our first approach used simple rules (a code sketch follows the list):
- If user gets 3 questions wrong in a row → decrease difficulty
- If user gets 10 questions right in a row → increase difficulty
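In code, this amounted to little more than streak counting. A minimal sketch of that kind of rule (the function name and the whole-level jump size are illustrative, not our exact implementation):

```typescript
// Streak-based rule: a whole-level jump triggered by runs of wrong or
// right answers. Names here are hypothetical.
function adjustByStreak(results: boolean[]): number {
  const lastN = (n: number) => results.slice(-n);
  if (results.length >= 3 && lastN(3).every(r => !r)) {
    return -1; // three wrong in a row: drop a level
  }
  if (results.length >= 10 && lastN(10).every(r => r)) {
    return +1; // ten right in a row: raise a level
  }
  return 0;
}
```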
This felt jarring. Users would suddenly jump from easy to hard content with no smooth transition. The feedback was immediate: “It feels broken.”
Attempt 2: Rolling Window Analysis
We implemented a rolling window that analyzed the last 20 interactions:
```typescript
interface Interaction {
  correct: boolean;
  timeMs: number; // response time in milliseconds
}

function calculateDifficultyAdjustment(recentInteractions: Interaction[]): number {
  const window = recentInteractions.slice(-20);
  if (window.length === 0) return 0; // no data yet: keep current level

  const accuracy = window.filter(i => i.correct).length / window.length;
  const avgResponseTime = window.reduce((sum, i) => sum + i.timeMs, 0) / window.length;

  // Fast + accurate = too easy
  if (accuracy > 0.85 && avgResponseTime < 3000) {
    return +0.1; // Increase difficulty
  }

  // Slow + inaccurate = too hard
  if (accuracy < 0.6 && avgResponseTime > 8000) {
    return -0.1; // Decrease difficulty
  }

  return 0; // Keep current level
}
```
This worked much better. The adjustments were subtle enough that users didn’t notice the mechanics, but effective enough that content stayed in the “sweet spot” of challenging but achievable.
Attempt 3: Machine Learning
For our third iteration, we introduced a TensorFlow model that predicted optimal difficulty based on the following signals (a featurization sketch follows the list):
- Historical performance patterns
- Time of day (users perform differently at 7am vs 10pm)
- Days since last session (retention correlation)
- Content type preferences
- Learning velocity trends
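To make the inputs concrete, here's a sketch of how signals like these can be packed into a numeric feature vector (a simplified subset; the names and encodings are illustrative, not our production code):

```typescript
// Hypothetical featurization: turn raw signals into a fixed-size vector
// the model can consume. Covers only a subset of the signals above.
const MS_PER_DAY = 86_400_000;

function buildFeatures(history: Interaction[], lastSession: Date, now: Date): number[] {
  const accuracy = history.length > 0
    ? history.filter(i => i.correct).length / history.length
    : 0.5; // neutral prior for brand-new users
  const hour = now.getHours();
  const daysSince = (now.getTime() - lastSession.getTime()) / MS_PER_DAY;
  return [
    accuracy,
    Math.sin((2 * Math.PI * hour) / 24), // time of day, encoded cyclically
    Math.cos((2 * Math.PI * hour) / 24), // so 11pm and 1am are "close"
    Math.min(daysSince / 30, 1),         // days since last session, capped
  ];
}
```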
The ML model improved retention by another 15%, but required significant infrastructure:
- Python backend for model training
- Daily batch processing of user data
- A/B testing framework to validate improvements
- Fallback to rule-based system when ML fails (see the sketch below)
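The last point deserves emphasis: the serving path has to degrade gracefully, because a lesson can't stall while a model times out. A sketch of that fallback wiring, reusing the rolling-window function from Attempt 2 (the `DifficultyModel` interface and the 200 ms budget are assumptions for illustration, not our actual API):

```typescript
// Hypothetical client interface to the served TensorFlow model.
interface DifficultyModel {
  predict(features: number[]): Promise<number>; // predicted difficulty level
}

async function nextDifficulty(
  model: DifficultyModel,
  features: number[],
  recent: Interaction[],
  currentLevel: number,
): Promise<number> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('model timeout')), 200));
  try {
    // Bound prediction latency so a slow model never blocks the lesson.
    return await Promise.race([model.predict(features), timeout]);
  } catch {
    // ML unavailable or too slow: fall back to the Attempt 2 rule.
    return currentLevel + calculateDifficultyAdjustment(recent);
  }
}
```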
The complexity was worth it. Users started saying things like “It’s like the app knows me.”
The Content Challenge: Personalization at Scale
Personalized learning requires personalized content. But creating unique content for thousands of users isn’t scalable if done manually.
Our Hybrid Approach
- Core Content Library - 50,000+ professionally curated phrases, dialogues, and scenarios
- Interest Tags - Every piece of content tagged with topics (travel, business, food, etc.)
- Template System - Generate variations using templates (“I’m traveling to [city]” → 1000s of cities; see the sketch after this list)
- AI Expansion - Use GPT to generate contextual variations while maintaining quality
- Native Speaker Review - Quality control pipeline for AI-generated content
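For the template step specifically, the mechanics can be as simple as slot substitution filtered by a user's interest tags. A minimal sketch (the data shapes are assumptions, and a real pipeline also has to handle grammatical agreement, which this ignores):

```typescript
// Hypothetical shapes for tagged templates and their fill-in values.
interface Template {
  text: string;    // e.g. "I'm traveling to [city]"
  slot: string;    // e.g. "city"
  tags: string[];  // e.g. ["travel"]
}

function expandTemplate(template: Template, values: string[], userTags: string[]): string[] {
  // Only expand templates that overlap the user's interests.
  if (!template.tags.some(t => userTags.includes(t))) return [];
  return values.map(v => template.text.replace(`[${template.slot}]`, v));
}

// One template times a city list yields thousands of personalized variants:
const travel: Template = { text: "I'm traveling to [city]", slot: "city", tags: ["travel"] };
expandTemplate(travel, ["Madrid", "Lima", "Bogotá"], ["travel", "food"]);
// → ["I'm traveling to Madrid", "I'm traveling to Lima", "I'm traveling to Bogotá"]
```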
This let us offer truly personalized learning paths without creating millions of individual lessons.
The Product Challenge: Balancing Automation vs. Control
One surprising insight: users want personalization, but they also want control.
Early versions of Ellie were fully automated - the algorithm decided everything. User research revealed this made people anxious:
- “I don’t know what I’m learning”
- “I can’t skip topics I don’t care about”
- “It feels like I have no control”
We added:
- Topic selection - Choose your interest areas upfront
- Goal setting - Define what you want to achieve
- Skip buttons - Opt out of content that isn’t relevant
- Progress visibility - See what you’ve learned and what’s coming
Retention improved by 20% after these changes. The lesson: personalization works best when users feel they’re in the driver’s seat, even if algorithms do most of the work.
Flutter: The Right Choice for Rapid Iteration
We chose Flutter for the mobile app, and it proved to be the right call:
Pros:
- Write once, deploy to iOS and Android
- Hot reload made iteration incredibly fast
- Rich widget library for complex UIs
- Great performance for animations
Cons:
- App size is larger than native
- Some platform-specific features required custom plugins
- Debugging iOS-specific issues on Windows was painful
The productivity gain from cross-platform development outweighed the downsides. Our small team of 3 developers shipped features that would’ve required 6 with separate iOS/Android codebases.
Measuring What Matters
We tracked dozens of metrics, but three proved most important:
1. First Lesson Completion Rate (95%)
If users finish their first lesson, they’re likely to come back. We optimized relentlessly for this:
- Made first lesson shorter (5 mins instead of 15)
- Reduced difficulty (easy wins build confidence)
- Added celebration animations
- Showed immediate progress
2. 7-Day Retention (78%)
The first week is critical. We focused on:
- Smart push notifications (based on optimal times for each user)
- Streak mechanics (but forgiving - you can “freeze” streaks)
- Quick wins (visible progress daily)
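For anyone reproducing the measurement, 7-day retention is plain cohort arithmetic. A sketch, assuming “retained” means at least one session 1–7 days after signup (definitions vary, and these event shapes are hypothetical):

```typescript
// Hypothetical event shapes for a signup cohort and its sessions.
interface User { id: string; signedUpAt: Date; }
interface Session { userId: string; at: Date; }

const DAY_MS = 86_400_000;

function sevenDayRetention(cohort: User[], sessions: Session[]): number {
  const retained = cohort.filter(u =>
    sessions.some(s => {
      const delta = s.at.getTime() - u.signedUpAt.getTime();
      return s.userId === u.id && delta >= 1 * DAY_MS && delta <= 7 * DAY_MS;
    }));
  return cohort.length > 0 ? retained.length / cohort.length : 0;
}
```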
3. Time to Value (< 10 minutes)
How fast can a new user experience the “aha moment”? We got it under 10 minutes:
- Skip lengthy onboarding
- Start with interactive content immediately
- Show personality early
- Defer account creation until after first lesson
What I’d Do Differently
1. Start with Content Variety Earlier
We launched with one language (Spanish) to perfect the algorithm. Users wanted more variety sooner. In retrospect, launching with 3 languages would’ve been better, even if the algorithm wasn’t perfect.
2. Build Community Features from Day One
Social learning became our most-requested feature. Users wanted to:
- Practice with other learners
- Share progress
- Compete on leaderboards
- Form study groups
We’re adding these now, but building them from the start would’ve improved retention.
3. More Aggressive Beta Testing
We were cautious with our beta, growing slowly from 30 → 200 → 1,000 users. Growing faster would’ve revealed issues sooner and given us more data for the ML model.
Key Takeaways
- Personalization is table stakes - Users expect it, but it’s hard to do well
- Algorithms should be invisible - Users notice when things feel off, not when they feel right
- Control + automation works best - Give users a steering wheel even if you’re doing most of the driving
- Retention > acquisition - Better to have fewer engaged users than many churned ones
- Measure early, measure often - Data reveals truth that user interviews miss
What’s Next
We’re now preparing for public launch in Q2 2025. The roadmap includes:
- Expanding to 10 languages
- Adding real-time conversation practice with AI tutors
- Building social learning features
- Creating a web app alongside mobile
Building Ellie has been the most challenging and rewarding project of my career. If you’re building a learning app or working on personalization problems, I’d love to chat. Reach out on Twitter or email.
This is part of a series on building Ellie. Next up: “The ML Pipeline Behind Adaptive Learning”