How to build an app like Duolingo: features, tech stack, and cost
By Riya Thambiraj · Build & Ship

Summary
Building an app like Duolingo requires a gamification engine (streaks, XP, hearts, leaderboards), a lesson module system with multiple exercise types (multiple choice, fill-in, speaking, listening), speech recognition, adaptive learning algorithms, and push notification infrastructure. An MVP with one language costs $40,000-$80,000 and takes 16-20 weeks. A full version with three languages and all features costs $120,000-$250,000. The hardest parts are not the UI: they are the content creation pipeline (thousands of exercises per language), adaptive difficulty tuning, and reliable speech recognition for speaking exercises.
Key Takeaways
Duolingo's retention is built on streak mechanics and XP loops — 95% of language learners quit within a week, and streak systems directly cut that churn.
The hardest part of building a Duolingo-like app is not the gamification UI: it is the content pipeline. Each language requires thousands of exercises, audio recordings, and adaptive difficulty tuning.
An MVP with one language, core lesson types, and basic gamification costs $40,000-$80,000 and takes 16-20 weeks with a focused team.
Speech recognition for speaking exercises is a genuine engineering challenge — off-the-shelf APIs (Google Speech-to-Text, Azure Speech) handle the heavy lifting, but scoring pronunciation accurately still requires calibration.
Gamified apps see 40% higher engagement than traditional learning apps, but gamification that isn't tied to real learning progress feels hollow and churns users at the point where novelty wears off.
Duolingo has 500 million registered users and 37 million people who open it every day. That is not because French is suddenly exciting. It is because Duolingo made a habit loop so well-engineered that missing your streak feels like a real loss.
95% of language learners quit within a week. Streak mechanics, XP rewards, and a green owl with emotional range cut that number significantly. Gamified learning apps see 40% higher engagement than traditional formats. Duolingo did not discover this by accident — they built an entire behavioral design system on top of a language curriculum.
This post covers what actually goes into building something like it: the features that drive retention, the technical decisions that matter, and what a realistic build costs.
Why Duolingo actually works
Before talking about how to build it, it helps to understand why the product retains users at a rate that most edtech apps do not come close to.
Streaks: A count of consecutive days on which you completed at least one lesson. Once your streak hits 30, 60, or 100 days, losing it genuinely hurts. This is loss aversion engineered into a product. The streak is not a feature; it is the retention mechanism.
XP and levels: Experience points earned per lesson, visible on your profile. Levels give users a sense of progress that is separate from actual language competence. Users who are not yet conversational still feel accomplished because their XP is climbing.
Hearts: Duolingo's mistake limit per session. You get five hearts. Each wrong answer costs one. Run out and the session ends. This creates focus, since every answer matters. It also drives the Super Duolingo subscription (formerly Duolingo Plus), which gives users unlimited hearts.
Leaderboards: Weekly rankings against other learners in your "league." There are ten leagues, starting at Bronze and climbing through Silver, Gold, Sapphire, Ruby, Emerald, Amethyst, Pearl, and Obsidian up to Diamond. You can be promoted or relegated each week based on XP earned. This is competitive social engagement layered on top of individual habit-building.
Duo the owl: The emotional character of the app. Duo celebrates wins, looks disappointed at misses, and sends push notifications with escalating urgency ("It's been 5 days. I miss you."). The character is not a mascot — it is a psychological trigger for emotional engagement.
Bite-sized lessons: Each lesson takes 3-5 minutes. This removes the "I don't have time" objection. You have five minutes. The lesson fits.
None of this is accidental. Every one of these systems required deliberate product design, and every one of them needs to be built.
Core features to build
User onboarding and language selection
First-time setup is where Duolingo makes a key UX decision: before you see a single lesson, you pick your daily goal (from Casual at 5 min/day up to Intense at 20 min/day), your motivation, and your starting level. This personalizes the experience immediately.
Build: language selection with proficiency test option, daily goal setting, reason for learning (travel, work, school), and account creation with social login.
Lesson modules
This is the meat of the product. A lesson is a sequence of exercises, typically 15-20 per session. Exercise types to support:
Multiple choice — pick the correct translation or fill the blank
Translation — type a sentence in the target language
Fill-in-the-blank — complete a sentence with the missing word
Listening comprehension — hear audio, select or type what you heard
Speaking exercises — say a phrase into the microphone, scored against the expected pronunciation
Word matching — pair words in the native and target language
Each exercise type has its own component and scoring logic. The lesson engine sequences exercises based on the curriculum and the adaptive learning layer.
- ✓Multiple choice translation exercises
- ✓Text translation (typed input)
- ✓Fill-in-the-blank
- ✓Listening comprehension with audio playback
- ✓Speaking exercises with microphone input
- ✓Word bank tap-to-select (mobile)
- ✓Word matching pairs
- ✓Image-to-word association
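To make "each exercise type has its own scoring logic" concrete, here is a minimal sketch that covers three of the types listed above. It assumes each exercise object exposes a `check()` method that the lesson engine calls without caring about the concrete type. The class names and the lenient `normalize()` rules (case, punctuation and curly apostrophes ignored) are illustrative assumptions, not Duolingo's actual answer-matching, which also tolerates typos.

```python
import re
import unicodedata
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Lenient comparison form: NFC, lowercase, no punctuation, single spaces."""
    text = unicodedata.normalize("NFC", text).replace("\u2019", "'").lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep letters (incl. accents), digits, apostrophes
    return " ".join(text.split())


@dataclass
class MultipleChoice:
    options: list[str]
    correct_index: int

    def check(self, answer: int) -> bool:
        return answer == self.correct_index


@dataclass
class Translation:
    accepted: list[str]  # every translation a reviewer has approved

    def check(self, answer: str) -> bool:
        return normalize(answer) in {normalize(a) for a in self.accepted}


@dataclass
class WordMatching:
    pairs: dict[str, str]  # native word -> target word

    def check(self, answer: dict[str, str]) -> bool:
        return answer == self.pairs  # dict equality ignores the order pairs were tapped


# The lesson engine only relies on the shared .check() method.
lesson = [
    MultipleChoice(options=["le chat", "le chien"], correct_index=0),
    Translation(accepted=["I would like a coffee", "I'd like a coffee"]),
    WordMatching(pairs={"cat": "chat", "dog": "chien"}),
]
answers = [0, "i'd like a coffee!", {"dog": "chien", "cat": "chat"}]
mistakes = sum(not ex.check(a) for ex, a in zip(lesson, answers))
```

Storing a list of accepted translations per exercise pushes most of the "is this answer right?" judgment into the content pipeline, where linguists can review it, rather than into code.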
Streak system and notifications
The streak is architecturally simple: a daily completion boolean per user, a running count, and a background job that checks each morning. What makes it work is the notification layer.
Users who have not completed their daily lesson get a push notification in the evening — timed based on their usual activity window. The message escalates over days: a gentle reminder at day one, Duo looking worried at day two, Duo looking devastated at day three.
Build this with a scheduled job (AWS EventBridge or a cron job service), Firebase Cloud Messaging for Android, and Apple Push Notification Service for iOS. The notification content should be dynamic based on streak length and days since last activity.
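The content and timing decisions described above fit in two small functions. The following is a sketch under stated assumptions. The message wording is placeholder copy, and the tiers simply mirror the day-one, day-two, day-three escalation. "Usual activity window" is approximated as the median hour of past lessons in the user's local time, with a hypothetical 7 p.m. default. The actual delivery through FCM/APNs is left out.

```python
from statistics import median_low


def reminder_message(streak: int, days_since_last_lesson: int) -> str:
    """Escalating reminder copy (placeholder wording).

    days_since_last_lesson == 1 means the user practised yesterday but not yet today,
    so the streak can still be saved.
    """
    if days_since_last_lesson <= 1:
        if streak >= 7:
            return f"Keep your {streak}-day streak alive! One quick lesson is enough."
        return "Time for today's lesson. It only takes five minutes."
    if days_since_last_lesson == 2:
        return "Your mascot is getting worried. Come back for a quick lesson?"
    return f"It's been {days_since_last_lesson} days. We miss you."


def send_hour(past_lesson_hours: list[int], default: int = 19) -> int:
    """Local hour to send the reminder: the user's typical practice hour, else 7 p.m."""
    return median_low(past_lesson_hours) if past_lesson_hours else default
```

`median_low` always returns an hour the user actually practised at, which avoids scheduling for a fractional "average" hour.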
Streak freeze — the feature that protects a streak for one missed day — is a stored flag on the user record, consumed by the morning job before it resets the streak counter.
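The morning job and the freeze flag can be sketched in a few lines. The sketch assumes the three per-user fields named above (completion boolean, running count, freeze flag). It also assumes the job runs for each user after local midnight in that user's time zone, so "yesterday" means the user's yesterday. The field and function names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StreakState:
    streak: int                # running count of consecutive days
    completed_yesterday: bool  # set when a lesson finishes, cleared by the job
    freeze_equipped: bool      # the stored streak-freeze flag


def settle_day(state: StreakState) -> StreakState:
    """Morning job for one user: increment, spend the freeze, or reset."""
    if state.completed_yesterday:
        return replace(state, streak=state.streak + 1, completed_yesterday=False)
    if state.freeze_equipped and state.streak > 0:
        # Missed day covered: consume the freeze, keep (but do not grow) the streak
        return replace(state, freeze_equipped=False)
    return replace(state, streak=0)


def run_job(users: dict[str, StreakState]) -> dict[str, StreakState]:
    return {user_id: settle_day(state) for user_id, state in users.items()}
```

The freeze check sits between the "completed" and "reset" branches. That placement is what "consumed by the morning job before it resets the streak counter" means in practice.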
XP and level progression
XP is awarded per lesson completed, with bonuses for perfect sessions and longer streaks. Levels are XP thresholds — enough points and you level up. These numbers display on the user profile and feed the leaderboard.
The math does not have to be complex. A flat 10 XP per lesson with a 5 XP bonus for zero errors works for an MVP. Refine the economy later based on engagement data.
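As a sketch of that MVP economy: 10 XP per lesson plus a 5 XP perfect-session bonus, with levels as cumulative XP thresholds. The threshold values below are hypothetical placeholders to tune from engagement data. Streak bonuses are left out to match the flat MVP rule.

```python
from bisect import bisect_right

BASE_XP = 10
PERFECT_BONUS = 5
# Hypothetical cumulative XP needed to reach level 2, 3, 4, ... 10
LEVEL_THRESHOLDS = [60, 180, 360, 600, 900, 1260, 1680, 2160, 2700]


def lesson_xp(errors: int) -> int:
    """Flat award with a bonus for a zero-error session."""
    return BASE_XP + (PERFECT_BONUS if errors == 0 else 0)


def level_for(total_xp: int) -> int:
    """Level 1 below the first threshold, plus one for every threshold reached."""
    return 1 + bisect_right(LEVEL_THRESHOLDS, total_xp)
```

Keeping thresholds in a sorted list means rebalancing the economy later is a data change, not a code change.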
Hearts system
Five hearts per session. Wrong answer removes one heart. Zero hearts ends the session. Hearts regenerate over time (one per hour in Duolingo's default), or replenish instantly with a heart refill (premium feature) or by completing practice exercises from the review queue.
The hearts system introduces friction that improves focus. It also creates a premium upgrade path: unlimited hearts as a subscription benefit.
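Regeneration does not need a timer per user. A common approach is to store the heart count and a timestamp, and compute regenerated hearts lazily whenever the value is read. The sketch below assumes that approach, with the one-per-hour rate quoted above kept as a configurable constant and a hypothetical `unlimited` flag standing in for the premium check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_HEARTS = 5
REGEN_INTERVAL = timedelta(hours=1)  # assumption: the one-per-hour default quoted above


@dataclass(frozen=True)
class Hearts:
    count: int
    last_refill: datetime  # point from which the next heart is being earned

    def current(self, now: datetime) -> "Hearts":
        """Credit any hearts regenerated since last_refill, capped at MAX_HEARTS."""
        if self.count >= MAX_HEARTS:
            return Hearts(MAX_HEARTS, now)
        regained = int((now - self.last_refill) / REGEN_INTERVAL)
        count = min(MAX_HEARTS, self.count + regained)
        if count == MAX_HEARTS:
            return Hearts(MAX_HEARTS, now)
        # Carry partial progress toward the next heart forward
        return Hearts(count, self.last_refill + regained * REGEN_INTERVAL)


def wrong_answer(hearts: Hearts, now: datetime, unlimited: bool = False) -> Hearts:
    """Deduct one heart unless the user has unlimited hearts (premium)."""
    h = hearts.current(now)
    if unlimited:
        return h
    return Hearts(max(0, h.count - 1), h.last_refill)


def session_over(hearts: Hearts, now: datetime) -> bool:
    return hearts.current(now).count == 0
```

Because the regeneration clock starts when the first heart is lost, a user who drops from five to three hearts in one session gets the fourth back one interval after the first mistake, not after the second.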
Leaderboards
Weekly leaderboards showing XP earned against a group of other learners in your league. The group resets each week — top performers advance to the next league, bottom performers are relegated.
This is a read-heavy, write-light system. Users earn XP throughout the week, and the leaderboard ranking updates in near-real-time. Redis sorted sets are the natural data structure for this. ZINCRBY adds a lesson's XP to a user's score in O(log N) time. ZRANGE with REV (or the older ZREVRANGE) returns the top M users in O(log N + M) time.
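The weekly reset can be sketched as a pure function over one league group's final scores. In production those scores might come from a `ZREVRANGE ... WITHSCORES` read of the group's sorted set. Here a plain dict keeps the example runnable without a Redis server. The promotion and relegation counts are hypothetical parameters: pass `promote=0` for the top league and `demote=0` for the bottom one. Ties are broken by user id purely to make the result deterministic.

```python
def settle_league(
    weekly_xp: dict[str, int], promote: int, demote: int
) -> tuple[list[str], list[str], list[str]]:
    """Rank one league group by weekly XP; return (promoted, stayed, relegated)."""
    ranked = sorted(weekly_xp, key=lambda user: (-weekly_xp[user], user))
    promote = min(promote, len(ranked))
    demote = min(demote, len(ranked) - promote)  # never relegate someone already promoted
    promoted = ranked[:promote]
    stayed = ranked[promote:len(ranked) - demote]
    relegated = ranked[len(ranked) - demote:]
    return promoted, stayed, relegated


promoted, stayed, relegated = settle_league(
    {"ana": 120, "ben": 300, "cho": 45, "dev": 300, "eli": 0}, promote=2, demote=1
)
```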
Social features
Friends lists, following, and seeing a friend's streak or weekly XP on their profile. This is lightweight social compared to a full social network — more accountability layer than social media.
Offline mode
Lessons downloaded for offline use. This is important for users in areas with spotty connectivity and for commutes. The technical approach is to pre-download lesson content (exercise data, audio files) to local storage, queue XP and progress updates locally, and sync when connectivity returns.
React Native with a local SQLite database (via expo-sqlite or WatermelonDB) handles this well.
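The "queue locally, sync later" pattern is independent of the client framework. The sketch below uses Python's standard `sqlite3` module to show the shape of it. In the app, the same table would live in expo-sqlite or WatermelonDB. Each event gets a client-generated id so the server can ignore duplicates if a sync is retried. The `send` callback is a hypothetical stand-in for your API call and returns `True` once the server has acknowledged an event.

```python
import json
import sqlite3
import uuid
from typing import Callable


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_events ("
        " id TEXT PRIMARY KEY,"  # client-generated, lets the server deduplicate retries
        " payload TEXT NOT NULL,"
        " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, event: dict) -> str:
    event_id = str(uuid.uuid4())
    with conn:  # commits the insert
        conn.execute(
            "INSERT INTO pending_events (id, payload) VALUES (?, ?)",
            (event_id, json.dumps(event)),
        )
    return event_id


def sync(conn: sqlite3.Connection, send: Callable[[str, dict], bool]) -> int:
    """Replay queued events in insertion order; delete each one the server acknowledges."""
    rows = conn.execute("SELECT id, payload FROM pending_events ORDER BY rowid").fetchall()
    sent = 0
    for event_id, payload in rows:
        if not send(event_id, json.loads(payload)):
            break  # still offline or server error: keep the rest for the next attempt
        with conn:
            conn.execute("DELETE FROM pending_events WHERE id = ?", (event_id,))
        sent += 1
    return sent
```

Stopping at the first failure preserves ordering, which matters when later events (a streak update, say) depend on earlier ones (the lesson that earned it).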
Progress tracking
Per-skill completion percentage, overall course map progress, vocabulary mastered count, and historical XP/streak data. This feeds both the user-facing dashboard and the adaptive learning layer.
What makes this harder than it looks
Content creation pipeline
Every exercise requires content. A language course with 1,000 exercises per skill across 25 skills means 25,000 exercises for a single language pair. Each exercise needs:
Source sentence
Target sentence
Audio recording (native speaker, ideally multiple voices)
Difficulty classification
Skill and topic tags
Duolingo has a full content team and crowdsourced translation from their contributor community. For a custom build, you need a content management system where linguists can create, review, and publish exercises, plus a recording workflow for audio production.
This pipeline is often a larger investment than the app itself. Budget time and cost for it separately from the engineering work.
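One way to enforce the per-exercise checklist above inside a CMS is a publish gate: an exercise cannot go live until every required field is present and it has passed review. This is a minimal sketch. The field names, the 1-5 difficulty scale and the draft/review/published states are hypothetical choices, not a prescribed schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    PUBLISHED = "published"


@dataclass
class Exercise:
    source_sentence: str
    target_sentence: str
    audio_urls: list[str]  # one per recorded voice
    difficulty: int        # hypothetical 1-5 scale
    skill: str
    topics: list[str] = field(default_factory=list)
    status: Status = Status.DRAFT


def publish_errors(ex: Exercise) -> list[str]:
    """Everything that blocks publication, so reviewers see all problems at once."""
    errors = []
    if not ex.source_sentence.strip():
        errors.append("missing source sentence")
    if not ex.target_sentence.strip():
        errors.append("missing target sentence")
    if not ex.audio_urls:
        errors.append("no audio recording")
    if not 1 <= ex.difficulty <= 5:
        errors.append("difficulty must be 1-5")
    if not ex.skill:
        errors.append("no skill tag")
    return errors


def publish(ex: Exercise) -> None:
    errors = publish_errors(ex)
    if ex.status is not Status.IN_REVIEW:
        errors.append("must pass review before publishing")
    if errors:
        raise ValueError("; ".join(errors))
    ex.status = Status.PUBLISHED
```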
Adaptive learning algorithm
A static curriculum sends every user through exercises in the same order regardless of what they know. An adaptive system tracks performance and adjusts — surfacing vocabulary a user struggles with more frequently, skipping ahead in areas they already know.
The starting point is spaced repetition, specifically a variant of the SM-2 algorithm. It tracks each vocabulary item per user with a difficulty score and a next-review date. Items answered wrong are scheduled for review sooner. Items answered correctly are pushed out further.
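Here is a sketch of that per-item bookkeeping. It follows the commonly cited formulation of SM-2: grades 0-5, first intervals of 1 and 6 days, an easiness factor starting at 2.5 with a floor of 1.3, and a reset on any grade below 3. Exercises in a Duolingo-style app are usually just right or wrong, so the `grade()` mapping (correct = 4, wrong = 1) is a hypothetical simplification you would tune.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ReviewItem:
    repetitions: int = 0        # n: consecutive successful reviews
    easiness: float = 2.5       # EF: SM-2 starts every item at 2.5
    interval_days: int = 0      # I: days until the next review
    due: Optional[date] = None  # next-review date drives the review queue


def review(item: ReviewItem, quality: int, today: date) -> ReviewItem:
    """One SM-2 update. quality is a 0-5 grade; 3 or more counts as a pass."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality >= 3:
        if item.repetitions == 0:
            interval = 1
        elif item.repetitions == 1:
            interval = 6
        else:
            interval = round(item.interval_days * item.easiness)
        repetitions = item.repetitions + 1
    else:
        # Wrong answer: restart the schedule so the item returns tomorrow
        repetitions, interval = 0, 1
    easiness = item.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    easiness = max(1.3, easiness)  # SM-2's lower bound on EF
    return ReviewItem(repetitions, easiness, interval, today + timedelta(days=interval))


def grade(correct: bool) -> int:
    """Hypothetical mapping from a right/wrong exercise result to an SM-2 grade."""
    return 4 if correct else 1


def due_today(items: dict[str, ReviewItem], today: date) -> list[str]:
    """Vocabulary items to include in today's review queue."""
    return [word for word, it in items.items() if it.due is None or it.due <= today]
```

With three correct answers the intervals grow 1, 6, 15 days. One wrong answer drops the item back to tomorrow and lowers its easiness, so future intervals for that word grow more slowly.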
A more sophisticated version uses ML to predict per-user difficulty and optimal exercise sequencing. This requires training data (user response patterns across millions of sessions) and is a later-phase project for most teams.
Speech recognition for speaking exercises
Getting a user to say "Je voudrais un café" into a microphone and accurately scoring whether their pronunciation was correct is not trivial. Duolingo uses a combination of cloud speech-to-text (transcribing what was said) and a pronunciation scoring model (how close the pronunciation was to native speech).
For a custom build: Google Speech-to-Text or Azure Speech Services handles transcription. Scoring pronunciation accuracy on top of the transcript is an additional layer — either a custom model trained on language learner data or a third-party pronunciation API (SpeechSuper, Speechace).
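Before any pronunciation model is involved, you still need to decide whether the transcript returned by the speech-to-text service matches the expected phrase. The sketch below does that with a word-level edit distance. It only measures whether the recognizer heard the right words, not how native the pronunciation sounded; that second layer is what the pronunciation APIs above add. The pass threshold is a hypothetical value to calibrate per language.

```python
import re

PASS_THRESHOLD = 0.8  # hypothetical; calibrate per language and learner level


def words(text: str) -> list[str]:
    return re.findall(r"[\w']+", text.lower())


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over words, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # word missing from transcript
                           cur[j - 1] + 1,              # extra word in transcript
                           prev[j - 1] + (wa != wb)))   # substituted word
        prev = cur
    return prev[-1]


def transcript_score(expected: str, transcript: str) -> float:
    """1.0 for an exact word match, decreasing with each wrong, missing or extra word."""
    exp, got = words(expected), words(transcript)
    if not exp:
        return 0.0
    return max(0.0, 1 - word_edit_distance(exp, got) / len(exp))
```

For "Je voudrais un café" a transcript of "je voudrais café" scores 0.75, which would fail under the placeholder threshold.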
Expect this feature to take 4-6 weeks including integration, testing across device types and accents, and UX around what to do when the microphone doesn't pick up clearly.
Audio quality for listening exercises
Listening exercises require high-quality audio recordings for every sentence in the content library. Text-to-speech synthesis (Google TTS, Amazon Polly, or ElevenLabs) can generate audio from text at low cost, but quality varies significantly by language and some voices still sound noticeably synthetic.
For premium products: human-recorded audio from native speakers. For an MVP: TTS with carefully selected voices per language. Either way, audio files need to be stored (S3 or equivalent) and served via CDN with low latency.
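Because audio is generated or recorded once at content-creation time, a deterministic object key makes the pipeline idempotent. The same sentence and voice always map to the same file, duplicates across exercises share one recording, and the CDN can cache each key indefinitely. This is a sketch. The `audio/<language>/<voice>/<hash>.mp3` layout and the voice names are hypothetical, and the actual S3 upload is omitted.

```python
import hashlib
import unicodedata


def audio_key(language: str, voice: str, text: str) -> str:
    """Deterministic object key: same sentence and voice, same file."""
    normalized = unicodedata.normalize("NFC", " ".join(text.split()))
    digest = hashlib.sha256(f"{language}|{voice}|{normalized}".encode("utf-8")).hexdigest()
    return f"audio/{language}/{voice}/{digest[:32]}.mp3"


def missing_audio(
    sentences: list[str], language: str, voice: str, existing_keys: set[str]
) -> dict[str, str]:
    """Return {key: text} for sentences that still need audio generated or recorded."""
    todo = {}
    for text in sentences:
        key = audio_key(language, voice, text)
        if key not in existing_keys:
            todo[key] = text  # dict also collapses duplicate sentences
    return todo
```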
Tech stack
Mobile: React Native (iOS and Android from one codebase). Expo for faster development setup. Offline support via expo-sqlite or WatermelonDB.
Backend: Node.js or Python (FastAPI). REST API for lesson delivery and progress sync. Python is the better choice if you are building ML-based adaptive learning, since the ML ecosystem is Python-native.
Database: PostgreSQL for user data, lesson content, progress records, and streak state. Redis for leaderboard sorted sets, session state, and caching frequently accessed lesson data.
Speech recognition: Google Speech-to-Text API or Azure Cognitive Services Speech. Budget API costs per speaking exercise submitted.
Audio storage: S3 with CloudFront CDN. Pre-generate audio at content creation time, not at request time.
Adaptive learning: SM-2 spaced repetition algorithm in v1 (implement in-app, no separate service needed). ML-based adaptive model in v2, served via a Python FastAPI endpoint.
Push notifications: Firebase Cloud Messaging (Android) and APNs (iOS), abstracted via OneSignal or Expo Notifications.
Infrastructure: AWS or GCP. Containerized backend (ECS or GKE). Auto-scaling is important as lesson completion events are bursty (everyone opens the app at the same times of day).
Cost breakdown
The biggest cost variables are:
- Content creation — exercises, audio, and curriculum design per language. This often matches or exceeds the engineering cost.
- Speaking exercise infrastructure — speech recognition APIs are pay-per-use and add up quickly at scale.
- Adaptive learning — basic spaced repetition is cheap to build. ML-based adaptive learning requires training data and model infrastructure.
Build vs. buy
Off-the-shelf LMS platforms (Teachable, Thinkific, Kajabi) handle video-based courses and basic quizzes. They do not replicate gamification. There is no streak system, no XP economy, no hearts, and no leaderboard in a Teachable course.
Language learning SDKs and white-label platforms exist but are niche and limited in customization. If gamification is the core of your product, meaning the behavioral retention loop is why users return every day, you need a custom build.
Use an off-the-shelf LMS if: you are building a structured video course, you have under 1,000 users, and you do not need daily-habit mechanics.
Build custom if: retention is driven by gamification, you need specific exercise types (speaking, listening), you want control over the content model, or you are building a B2B product with custom branding and white-labeling.
Note
One common mistake: teams build the gamification layer (streaks, XP, leaderboards) before building enough content to sustain engagement. The mechanics work. Users open the app, earn XP, and protect their streak — for two weeks. Then they run out of content and churn. Build your content pipeline in parallel with the app, not after it.
How to start
Week 1-2: Define your content model. How many skills? How many exercises per skill? What exercise types? Map the curriculum before writing code — the database schema depends on it.
Week 3-4: Build the lesson engine. One exercise type (multiple choice) working end-to-end, from content in the database to a user completing a session and earning XP. Everything else builds on this.
Week 5-10: Add exercise types, streak system, push notifications, and XP/level progression. This is the gamification core.
Week 11-16: Speaking and listening exercises, leaderboards, progress dashboard, offline support. Polish and QA.
After launch: Watch lesson completion rates and day-7 retention. These numbers tell you whether the gamification loop is working. A day-7 retention above 25% is a strong signal. Below 15% and the habit loop is not forming — investigate whether the content is too hard, the onboarding is losing people, or the notifications are not landing.
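Day-7 retention has several definitions in practice. The sketch below assumes the strict one: the share of a signup cohort active exactly seven days after signing up. It excludes users whose day 7 has not fully elapsed, so recent signups do not drag the number down. The 25% and 15% cut-offs are the rule of thumb above. The data shapes (dicts of signup dates and active dates) are hypothetical stand-ins for an analytics query.

```python
from datetime import date, timedelta


def day_n_retention(
    signups: dict[str, date],
    active_days: dict[str, set[date]],
    n: int,
    today: date,
) -> float:
    """Share of a signup cohort that was active exactly n days after signing up.

    Users whose day n has not fully elapsed yet are left out of the denominator.
    """
    eligible = [u for u, d in signups.items() if d + timedelta(days=n) < today]
    if not eligible:
        return 0.0
    retained = sum(
        1 for u in eligible if signups[u] + timedelta(days=n) in active_days.get(u, set())
    )
    return retained / len(eligible)


def retention_signal(day7: float) -> str:
    """Thresholds from the rule of thumb above."""
    if day7 > 0.25:
        return "strong"
    if day7 < 0.15:
        return "habit loop not forming"
    return "in between"
```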
Building an edtech or gamified learning product? We build custom software for edtech companies and SaaS businesses. See our custom software development services or talk to a founder about your product.
Frequently Asked Questions
- An MVP with one language, core lesson types (multiple choice, fill-in-the-blank, translation), a streak system, and XP progression costs $40,000-$80,000 and takes 16-20 weeks. A full version with three languages, speaking and listening exercises, leaderboards, social features, and offline mode costs $120,000-$250,000 and takes 28-40 weeks. A Duolingo-scale platform with 50+ languages, ML-driven adaptive learning, and a full content management pipeline costs $500,000 and up. The biggest cost variables are content creation (exercises per language), speech recognition integration, and the adaptive learning algorithm.
- The standard stack for a Duolingo-like app is React Native for mobile (iOS and Android from one codebase), Node.js or Python on the backend, PostgreSQL for user data and lesson content, and Redis for session state and leaderboard rankings. For speech recognition, use Google Speech-to-Text or Azure Speech Services rather than building your own. For audio delivery, store recordings in S3 and serve via a CDN. ML-based adaptive learning can be added in a later phase using Python-based model serving (FastAPI + TensorFlow or PyTorch).
- The streak is a count of consecutive active days, stored per user. A background job checks each morning whether the user completed at least one lesson the previous day. If yes, the streak increments. If no, it resets to zero. Push notifications fire at a set time (usually evening) to remind users who have not yet completed their daily goal. The streak freeze feature (which protects a streak for one missed day) is a stored flag that the job checks before resetting. Technically this is not complex: the power is in the behavioral psychology, not the code.
- Adaptive learning adjusts lesson difficulty based on a user's past performance. A basic version uses spaced repetition: the system tracks which vocabulary items a user has answered correctly and incorrectly, and resurfaces hard items more frequently. This can be built in a few weeks with a variant of the SM-2 algorithm, which is published and well documented. A more advanced version uses a machine learning model trained on user performance data to predict which exercise type and difficulty level each user should see next. The ML version requires meaningful training data, so most teams start with rule-based spaced repetition and add ML in a later phase.
- Off-the-shelf LMS platforms (Teachable, Thinkific, Kajabi) handle video courses and basic quizzes well. They do not replicate gamification — no streaks, no XP system, no hearts, no leaderboards. If gamified engagement is the core of your product, you need a custom build. If you are building a structured language or skill course with some light quiz elements, a white-label LMS might be sufficient for an early version. The correct answer depends on whether gamification is your product's core differentiator or a secondary enhancement.


