Building Ellie: Lessons from Creating an Adaptive Language Learning App
The Journey of Building Ellie
Six months ago, I started building Ellie with two friends who shared my frustration with traditional language learning apps. We believed that learning a language shouldn’t feel like being forced through the same rigid curriculum as millions of other people.
Today, Ellie has over 1,000 beta users and a 67% 30-day retention rate - more than double the industry average. Here’s what we learned along the way.
The Core Problem
Most language learning apps treat personalization as an afterthought. They might let you choose your “level” (beginner, intermediate, advanced), but from there, everyone gets the same content in the same order.
This creates several problems:
- The content isn’t relevant - A business professional doesn’t need to know how to order ice cream, but they do need presentation vocabulary
- The difficulty is wrong - Some concepts are too easy (boring), others too hard (frustrating)
- The pacing doesn’t match - Some people can handle 30 minutes daily, others only have 5
- Learning styles are ignored - Visual learners get the same experience as auditory learners
We wanted to fix all of this with technology.
The Technical Challenge: Real-Time Adaptation
The hardest technical problem we faced was building an algorithm that could adapt difficulty in real-time without disrupting the learning flow.
Attempt 1: Rule-Based System
Our first approach used simple rules (a code sketch follows the list):
- If user gets 3 questions wrong in a row → decrease difficulty
- If user gets 10 questions right in a row → increase difficulty
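In code, this amounted to little more than streak counting. A minimal sketch of that kind of rule (the function name and the whole-level jump size are illustrative, not our exact implementation):

```typescript
// Streak-based rule: a whole-level jump triggered by runs of wrong or
// right answers. Names here are hypothetical.
function adjustByStreak(results: boolean[]): number {
  const lastN = (n: number) => results.slice(-n);
  if (results.length >= 3 && lastN(3).every(r => !r)) {
    return -1; // three wrong in a row: drop a level
  }
  if (results.length >= 10 && lastN(10).every(r => r)) {
    return +1; // ten right in a row: raise a level
  }
  return 0;
}
```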
This felt jarring. Users would suddenly jump from easy to hard content with no smooth transition. The feedback was immediate: “It feels broken.”
Attempt 2: Rolling Window Analysis
We implemented a rolling window that analyzed the last 20 interactions:
```typescript
interface Interaction {
  correct: boolean;
  timeMs: number; // response time in milliseconds
}

function calculateDifficultyAdjustment(recentInteractions: Interaction[]): number {
  const window = recentInteractions.slice(-20);
  if (window.length === 0) return 0; // no data yet: keep current level

  const accuracy = window.filter(i => i.correct).length / window.length;
  const avgResponseTime = window.reduce((sum, i) => sum + i.timeMs, 0) / window.length;

  // Fast + accurate = too easy
  if (accuracy > 0.85 && avgResponseTime < 3000) {
    return +0.1; // Increase difficulty
  }

  // Slow + inaccurate = too hard
  if (accuracy < 0.6 && avgResponseTime > 8000) {
    return -0.1; // Decrease difficulty
  }

  return 0; // Keep current level
}
```
This worked much better. The adjustments were subtle enough that users didn’t notice the mechanics, but effective enough that content stayed in the “sweet spot” of challenging but achievable.
Attempt 3: Machine Learning
For our third iteration, we introduced a TensorFlow model that predicted optimal difficulty based on the following signals (a featurization sketch follows the list):
- Historical performance patterns
- Time of day (users perform differently at 7am vs 10pm)
- Days since last session (retention correlation)
- Content type preferences
- Learning velocity trends
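To make the inputs concrete, here's a sketch of how signals like these can be packed into a numeric feature vector (a simplified subset; the names and encodings are illustrative, not our production code):

```typescript
// Hypothetical featurization: turn raw signals into a fixed-size vector
// the model can consume. Covers only a subset of the signals above.
const MS_PER_DAY = 86_400_000;

function buildFeatures(history: Interaction[], lastSession: Date, now: Date): number[] {
  const accuracy = history.length > 0
    ? history.filter(i => i.correct).length / history.length
    : 0.5; // neutral prior for brand-new users
  const hour = now.getHours();
  const daysSince = (now.getTime() - lastSession.getTime()) / MS_PER_DAY;
  return [
    accuracy,
    Math.sin((2 * Math.PI * hour) / 24), // time of day, encoded cyclically
    Math.cos((2 * Math.PI * hour) / 24), // so 11pm and 1am are "close"
    Math.min(daysSince / 30, 1),         // days since last session, capped
  ];
}
```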
The ML model improved retention by another 15%, but required significant infrastructure:
- Python backend for model training
- Daily batch processing of user data
- A/B testing framework to validate improvements
- Fallback to rule-based system when ML fails (see the sketch below)
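The last point deserves emphasis: the serving path has to degrade gracefully, because a lesson can't stall while a model times out. A sketch of that fallback wiring, reusing the rolling-window function from Attempt 2 (the `DifficultyModel` interface and the 200 ms budget are assumptions for illustration, not our actual API):

```typescript
// Hypothetical client interface to the served TensorFlow model.
interface DifficultyModel {
  predict(features: number[]): Promise<number>; // predicted difficulty level
}

async function nextDifficulty(
  model: DifficultyModel,
  features: number[],
  recent: Interaction[],
  currentLevel: number,
): Promise<number> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('model timeout')), 200));
  try {
    // Bound prediction latency so a slow model never blocks the lesson.
    return await Promise.race([model.predict(features), timeout]);
  } catch {
    // ML unavailable or too slow: fall back to the Attempt 2 rule.
    return currentLevel + calculateDifficultyAdjustment(recent);
  }
}
```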
The complexity was worth it. Users started saying things like “It’s like the app knows me.”
The Content Challenge: Personalization at Scale
Personalized learning requires personalized content. But creating unique content for thousands of users isn’t scalable if done manually.
Our Hybrid Approach
- Core Content Library - 50,000+ professionally curated phrases, dialogues, and scenarios
- Interest Tags - Every piece of content tagged with topics (travel, business, food, etc.)
- Template System - Generate variations using templates (“I’m traveling to [city]” → 1000s of cities; see the sketch after this list)
- AI Expansion - Use GPT to generate contextual variations while maintaining quality
- Native Speaker Review - Quality control pipeline for AI-generated content
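For the template step specifically, the mechanics can be as simple as slot substitution filtered by a user's interest tags. A minimal sketch (the data shapes are assumptions, and a real pipeline also has to handle grammatical agreement, which this ignores):

```typescript
// Hypothetical shapes for tagged templates and their fill-in values.
interface Template {
  text: string;    // e.g. "I'm traveling to [city]"
  slot: string;    // e.g. "city"
  tags: string[];  // e.g. ["travel"]
}

function expandTemplate(template: Template, values: string[], userTags: string[]): string[] {
  // Only expand templates that overlap the user's interests.
  if (!template.tags.some(t => userTags.includes(t))) return [];
  return values.map(v => template.text.replace(`[${template.slot}]`, v));
}

// One template times a city list yields thousands of personalized variants:
const travel: Template = { text: "I'm traveling to [city]", slot: "city", tags: ["travel"] };
expandTemplate(travel, ["Madrid", "Lima", "Bogotá"], ["travel", "food"]);
// → ["I'm traveling to Madrid", "I'm traveling to Lima", "I'm traveling to Bogotá"]
```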
This let us offer truly personalized learning paths without creating millions of individual lessons.
The Product Challenge: Balancing Automation vs. Control
One surprising insight: users want personalization, but they also want control.
Early versions of Ellie were fully automated - the algorithm decided everything. User research revealed this made people anxious:
- “I don’t know what I’m learning”
- “I can’t skip topics I don’t care about”
- “It feels like I have no control”
We added:
- Topic selection - Choose your interest areas upfront
- Goal setting - Define what you want to achieve
- Skip buttons - Opt out of content that isn’t relevant
- Progress visibility - See what you’ve learned and what’s coming
Retention improved by 20% after these changes. The lesson: personalization works best when users feel they’re in the driver’s seat, even if algorithms do most of the work.
Flutter: The Right Choice for Rapid Iteration
We chose Flutter for the mobile app, and it proved to be the right call:
Pros:
- Write once, deploy to iOS and Android
- Hot reload made iteration incredibly fast
- Rich widget library for complex UIs
- Great performance for animations
Cons:
- App size is larger than native
- Some platform-specific features required custom plugins
- Debugging iOS-specific issues on Windows was painful
The productivity gain from cross-platform development outweighed the downsides. Our small team of 3 developers shipped features that would’ve required 6 with separate iOS/Android codebases.
Measuring What Matters
We tracked dozens of metrics, but three proved most important:
1. First Lesson Completion Rate (95%)
If users finish their first lesson, they’re likely to come back. We optimized relentlessly for this:
- Made first lesson shorter (5 mins instead of 15)
- Reduced difficulty (easy wins build confidence)
- Added celebration animations
- Showed immediate progress
2. 7-Day Retention (78%)
The first week is critical. We focused on:
- Smart push notifications (based on optimal times for each user)
- Streak mechanics (but forgiving - you can “freeze” streaks)
- Quick wins (visible progress daily)
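For anyone reproducing the measurement, 7-day retention is plain cohort arithmetic. A sketch, assuming “retained” means at least one session 1–7 days after signup (definitions vary, and these event shapes are hypothetical):

```typescript
// Hypothetical event shapes for a signup cohort and its sessions.
interface User { id: string; signedUpAt: Date; }
interface Session { userId: string; at: Date; }

const DAY_MS = 86_400_000;

function sevenDayRetention(cohort: User[], sessions: Session[]): number {
  const retained = cohort.filter(u =>
    sessions.some(s => {
      const delta = s.at.getTime() - u.signedUpAt.getTime();
      return s.userId === u.id && delta >= 1 * DAY_MS && delta <= 7 * DAY_MS;
    }));
  return cohort.length > 0 ? retained.length / cohort.length : 0;
}
```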
3. Time to Value (< 10 minutes)
How fast can a new user experience the “aha moment”? We got it under 10 minutes:
- Skip lengthy onboarding
- Start with interactive content immediately
- Show personality early
- Defer account creation until after first lesson
What I’d Do Differently
1. Start with Content Variety Earlier
We launched with one language (Spanish) to perfect the algorithm. Users wanted more variety sooner. In retrospect, launching with 3 languages would’ve been better, even if the algorithm wasn’t perfect.
2. Build Community Features from Day One
Social learning became our most-requested feature. Users wanted to:
- Practice with other learners
- Share progress
- Compete on leaderboards
- Form study groups
We’re adding these now, but building them from the start would’ve improved retention.
3. More Aggressive Beta Testing
We were cautious with our beta, growing slowly from 30 → 200 → 1,000 users. Growing faster would’ve revealed issues sooner and given us more data for the ML model.
Key Takeaways
- Personalization is table stakes - Users expect it, but it’s hard to do well
- Algorithms should be invisible - Users notice when things feel off, not when they feel right
- Control + automation works best - Give users a steering wheel even if you’re doing most of the driving
- Retention > acquisition - Better to have fewer engaged users than many churned ones
- Measure early, measure often - Data reveals truth that user interviews miss
What’s Next
We’re now preparing for public launch in Q2 2025. The roadmap includes:
- Expanding to 10 languages
- Adding real-time conversation practice with AI tutors
- Building social learning features
- Creating a web app alongside mobile
Building Ellie has been the most challenging and rewarding project of my career. If you’re building a learning app or working on personalization problems, I’d love to chat. Reach out on Twitter or email.
This is part of a series on building Ellie. Next up: “The ML Pipeline Behind Adaptive Learning”