How to Build an App Like Discord: Real-Time Community Platform Architecture and Real Costs
- Ashit Vora | Build & Ship

Summary
To build an app like Discord, you need a real-time messaging layer (WebSockets), a hierarchical data model (servers > channels > threads), a WebRTC-based voice system using a Selective Forwarding Unit (SFU), and a permission engine. A text-only MVP covering servers, text channels, and DMs takes 14-20 weeks and costs $80K-$150K; adding basic voice rooms brings that to 18-26 weeks and $120K-$220K. A full platform with video stage channels, bot APIs, and server monetization costs $180K-$350K. The hardest engineering problems are concurrent WebSocket connections at scale, message delivery ordering, and WebRTC SFU setup for multi-party voice.
Key Takeaways
Discord is a three-level hierarchy: servers (communities) contain channels (topics), which contain threads and messages. Get this data model right in v1 -- changing it later breaks everything.
WebRTC voice rooms require a Selective Forwarding Unit (SFU) -- not peer-to-peer -- once you have more than 3-4 participants. mediasoup and Janus are the open-source options; LiveKit is the managed alternative.
Presence (online/idle/do not disturb/offline) looks trivial but requires dedicated infrastructure. It is a separate service from messaging, not a field you add to the users table.
A text-only community MVP (servers, channels, DMs, roles, invite links) takes 14-20 weeks and $80K-$150K. Adding WebRTC voice and video adds another 8-14 weeks and $40K-$100K.
The bot and integration API is what made Discord the default for gaming communities. Skip it for v1. It is an entire developer platform, not a feature.
Most founders who want to build a Discord alternative think they are building a chat app. They are not. Discord is a persistent community operating system -- a place where communities live 24 hours a day, with always-on voice rooms, hierarchical channel structures, role-based access control, and a bot ecosystem that extends its functionality in every direction.
The chat UI is the easy part. The hard parts are concurrent WebSocket connections at scale, WebRTC voice rooms that work reliably for 20+ people, a permission system that handles complex role hierarchies, and presence infrastructure that does not collapse under load.
This guide covers the product architecture, the engineering problems that actually matter, and realistic costs for building a real-time community platform.
What Discord actually is
Discord is not just a messaging app. It is a hierarchy with three levels:
Servers are communities. A server can be a gaming clan, an open-source project, a startup team, or a hobby group. Servers have their own identity, rules, member roles, and channel structure. A user can be a member of dozens of servers simultaneously.
Channels are topics within a server. A server might have a #general text channel, a #announcements channel (read-only for most members), a #dev-talk channel for technical discussion, and several voice channels for different activities. Channels have individual permission settings that override server-wide role permissions.
Threads are sub-conversations within a channel. A message in #general can spawn a thread for a specific discussion without cluttering the main channel.
This three-level hierarchy is what makes Discord more than a chat app. It gives communities the ability to organize themselves at scale. A server with 50,000 members can route different conversations to different channels without chaos.
If you are building a community platform, this hierarchy is your core data model. Get it right in v1. Changing it after launch with real user data is a months-long migration project.
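As a rough illustration of that hierarchy, here is a minimal in-memory sketch. The class and field names are assumptions made for this example, not Discord's schema; in a real build these would be PostgreSQL tables with foreign keys rather than nested Python objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    message_id: int
    author_id: int
    content: str
    thread_id: Optional[int] = None  # None: the message sits directly in the channel


@dataclass
class Thread:
    thread_id: int
    parent_message_id: int  # the channel message that spawned this thread
    name: str


@dataclass
class Channel:
    channel_id: int
    name: str
    kind: str = "text"  # "text" or "voice"
    threads: dict = field(default_factory=dict)    # thread_id -> Thread
    messages: list = field(default_factory=list)   # all messages in this channel


@dataclass
class Server:
    server_id: int
    name: str
    channels: dict = field(default_factory=dict)   # channel_id -> Channel


def thread_messages(channel: Channel, thread_id: int) -> list:
    """Return only the messages that belong to one thread of a channel."""
    return [m for m in channel.messages if m.thread_id == thread_id]


# Usage: one server, one channel, one thread spawned from a channel message.
server = Server(1, "Open Source Project")
general = Channel(10, "general")
server.channels[general.channel_id] = general
general.messages.append(Message(100, author_id=7, content="Release plan?"))
general.threads[500] = Thread(500, parent_message_id=100, name="Release plan")
general.messages.append(Message(101, author_id=8, content="Ship Friday", thread_id=500))
```

The point of the sketch is the ownership chain: a message always belongs to a channel, and optionally to a thread inside that channel. Fixing that chain early is what keeps later migrations small.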
V1 features: Build only these
Server creation and management
Users create servers, invite others via shareable links, and configure basic settings. For v1, a server has a name, an icon, and a list of channels. Server discovery (browsing public servers) is a v2 feature -- it requires moderation infrastructure you do not have yet.
Text channels
Public and private text channels within a server. Members post messages, share files, and react with emoji. The channel shows message history from the beginning (or a configurable cutoff). Messages support basic formatting: bold, italic, code blocks, links.
Direct messages (DMs)
One-on-one messaging between users, independent of any server. DMs need the same real-time delivery guarantees as channel messages but are scoped to two users rather than potentially thousands.
Basic voice rooms
Always-on voice channels that users drop in and out of. When you are in a voice channel, other server members can see your avatar in the channel sidebar. No scheduling, no meeting links -- just join and talk.
For v1, support up to 20 concurrent users per voice room with audio only. Video can come in v2. The WebRTC setup for audio-only is significantly simpler than full video.
User roles and permissions
Servers have roles (Admin, Moderator, Member, Guest). Roles have permissions: can post in channels, can manage channels, can kick/ban members, can view private channels. Individual channels can restrict what a role may do -- a channel can be read-only for the Member role but writable for the Moderator role.
This is the minimum viable permission system. The full Discord permission model -- with separate allow and deny overrides for every role on every channel -- is a v2 project.
Invite links
Shareable links that let new members join a server. Links can be time-limited (expires in 24 hours) or use-limited (valid for 10 invites). When someone visits an invite link, they see the server name and member count before joining. This is the primary growth mechanism for Discord communities.
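The two limits described above (time-limited and use-limited) can be sketched as a small validity check. This is a minimal sketch under assumptions: the `Invite` class, its field names, and `create_invite` are hypothetical, and a production version would do the redeem step inside a database transaction so two simultaneous joins cannot both take the last use.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Invite:
    code: str
    server_id: int
    expires_at: Optional[datetime] = None  # None: never expires
    max_uses: Optional[int] = None         # None: unlimited uses
    uses: int = 0

    def is_valid(self, now: datetime) -> bool:
        """An invite is valid until it expires or runs out of uses."""
        if self.expires_at is not None and now >= self.expires_at:
            return False
        if self.max_uses is not None and self.uses >= self.max_uses:
            return False
        return True

    def redeem(self, now: datetime) -> bool:
        """Consume one use. Returns False if the invite is no longer valid."""
        if not self.is_valid(now):
            return False
        self.uses += 1
        return True


def create_invite(server_id: int, now: datetime,
                  ttl_hours: Optional[int] = None,
                  max_uses: Optional[int] = None) -> Invite:
    """Create an invite with an unguessable URL-safe code."""
    expires_at = now + timedelta(hours=ttl_hours) if ttl_hours is not None else None
    return Invite(secrets.token_urlsafe(6), server_id, expires_at, max_uses)


# Usage: a link that expires in 24 hours and is valid for 10 joins.
created = datetime(2025, 1, 1, tzinfo=timezone.utc)
invite = create_invite(server_id=1, now=created, ttl_hours=24, max_uses=10)
```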
Notifications
Users get notified of messages in channels they are subscribed to, DMs, and mentions (@username). Notification settings are configurable per server and per channel. Push notifications for mobile, in-app notifications for desktop.
What to skip in v1
Bot and integration API: This is an entire developer platform. Discord's bot ecosystem took years to build. Skip it entirely. Your v1 community platform does not need bots.
Video stage channels: Large-audience video broadcast (Discord Stage Channels). Complex to build, limited use in most community contexts. Audio voice rooms cover 90% of use cases.
Server boosting and monetization: Paid server upgrades for better audio quality, more emoji slots, custom server banners. This is a monetization layer, not a product feature. Build it after you have active communities.
Thread archiving and search: Full-text search across message history is a 4-6 week standalone project. Skip for v1, add it once you have user data proving search is a top need.
Complex role hierarchies: Multiple overlapping roles with inheritance and channel-level overrides. Start with the fixed roles listed above (Admin, Moderator, Member, Guest) and flat channel permissions.
The hard engineering problems
Real-time messaging at scale
Every connected user maintains a persistent WebSocket connection to receive messages in real time. A server with 5,000 members, 50 of them active at the same time, holds 50 open WebSocket connections, and every message posted must be broadcast to all 50.
At low scale, a single Node.js WebSocket server with Redis pub/sub works. At high scale -- thousands of concurrent users across hundreds of servers -- you need stateful gateway servers with sticky sessions, a message broker (Kafka or Redis pub/sub) routing messages to the correct gateway, and careful memory management for WebSocket state.
The key design decision: a user connected to Gateway Server A will receive messages routed through Gateway Server A. Your routing layer must know which gateway holds each user's connection.
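A minimal sketch of that routing layer, assuming an in-memory dictionary for illustration. `ConnectionRegistry` and the gateway IDs are hypothetical names; with several gateway processes the mapping would have to live in a shared store such as Redis.

```python
from collections import defaultdict


class ConnectionRegistry:
    """Tracks which gateway server holds each user's WebSocket connection."""

    def __init__(self):
        self._gateway_by_user = {}  # user_id -> gateway_id

    def connect(self, user_id: int, gateway_id: str) -> None:
        self._gateway_by_user[user_id] = gateway_id

    def disconnect(self, user_id: int) -> None:
        self._gateway_by_user.pop(user_id, None)

    def route(self, recipient_ids) -> dict:
        """Group online recipients by gateway so each gateway gets one batch."""
        batches = defaultdict(list)
        for user_id in recipient_ids:
            gateway_id = self._gateway_by_user.get(user_id)
            if gateway_id is not None:  # users with no connection are skipped
                batches[gateway_id].append(user_id)
        return dict(batches)


# Usage: three users online across two gateways, one recipient offline.
registry = ConnectionRegistry()
registry.connect(1, "gateway-a")
registry.connect(2, "gateway-b")
registry.connect(3, "gateway-a")
batches = registry.route([1, 2, 3, 4])
```

Grouping by gateway means a message is sent once per gateway rather than once per user, and each gateway then writes to its own local sockets.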
At Discord scale, Elixir/Phoenix is a strong fit. Phoenix's channels abstraction was built for this exact pattern, and the Phoenix team has benchmarked 2 million concurrent WebSocket connections on a single large server. Node.js with Socket.io works for smaller scale -- up to a few hundred thousand concurrent connections across a cluster.
Message ordering
When two users in the same channel send messages within milliseconds of each other, which message appears first? On a single server, it is insertion order. On a distributed system, it depends on which server received which message first -- which introduces race conditions.
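The fix is to let the server, not the clock, define order. Discord's message IDs are Snowflakes, 64-bit server-generated IDs that sort by creation time, and Slack identifies each message by a server-assigned timestamp that is unique within its channel. For a custom build, the simplest equivalent is a channel-level sequence number: every message in a channel gets a monotonically increasing ID assigned at a single point, and all clients sort by that sequence number, not by wall-clock timestamp. Clock skew across servers makes raw timestamps unreliable for ordering.
Implement this in v1. Skipping it is very hard to recover from: ordering conflicts corrupt conversation history and are difficult to fix once production data is in place.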
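A minimal single-process sketch of channel-level sequencing follows. The `ChannelSequencer` name is an assumption for illustration; across multiple servers the counter would need one atomic source per channel, for example Redis `INCR` or a database sequence.

```python
import threading
from collections import defaultdict


class ChannelSequencer:
    """Hands out a monotonically increasing sequence number per channel."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_seq = defaultdict(int)  # channel_id -> last number issued

    def assign(self, channel_id: int) -> int:
        # The lock makes increment-and-read atomic across threads.
        with self._lock:
            self._last_seq[channel_id] += 1
            return self._last_seq[channel_id]


def sort_for_display(messages: list) -> list:
    """Clients order by sequence number, never by timestamp."""
    return sorted(messages, key=lambda m: m["seq"])


# Usage: two messages arrive at a client out of order and are re-sorted.
sequencer = ChannelSequencer()
first = {"seq": sequencer.assign(42), "text": "hello"}
second = {"seq": sequencer.assign(42), "text": "hi back"}
ordered = sort_for_display([second, first])
```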
WebRTC voice rooms: Peer-to-peer versus SFU
Peer-to-peer WebRTC works for 2-3 participants. Beyond that, it breaks down. Each participant must send a separate stream to every other participant. With 10 people in a voice room, each person sends 9 outgoing streams, for 90 streams in total. Per-client bandwidth and CPU usage grow linearly with room size, and the total number of streams grows quadratically.
A Selective Forwarding Unit (SFU) solves this. Each participant sends one audio stream to the SFU server. The SFU forwards the streams to all other participants without decoding or re-encoding them. With 10 people in the room, each client uploads 1 stream instead of 9 -- 10 uplinks in total rather than 90 -- and the server handles the fan-out.
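The stream arithmetic for the 10-person room can be checked with a few lines. This is a counting sketch only (the function name is made up for the example); it ignores bitrate, simulcast, and muted participants.

```python
def stream_counts(participants: int, topology: str) -> dict:
    """Count audio streams for a room in a full mesh or behind an SFU."""
    others = participants - 1
    if topology == "mesh":
        uplinks_per_client = others   # one copy sent to every other peer
    elif topology == "sfu":
        uplinks_per_client = 1        # one copy sent to the SFU
    else:
        raise ValueError("topology must be 'mesh' or 'sfu'")
    return {
        "uplinks_per_client": uplinks_per_client,
        "downlinks_per_client": others,  # each client still hears everyone else
        "total_uplinks": uplinks_per_client * participants,
    }


mesh = stream_counts(10, "mesh")
sfu = stream_counts(10, "sfu")
```

Each client still receives nine streams either way. What the SFU removes is the nine-fold upload, which is usually the scarcer resource on home and mobile connections.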
For v1 voice rooms, use an SFU. Your options:
mediasoup: Open-source, written in Node.js and C++. High performance. Requires more infrastructure setup and expertise than managed options.
Janus: Open-source, written in C. Battle-tested, large community. More configuration overhead than mediasoup.
LiveKit: Open-source core with a managed cloud option. Fastest to get running. $0.002 per participant-minute on the cloud tier, or self-host for free.
For most teams building a v1, managed LiveKit is the right starting point. Move to self-hosted mediasoup when you have the scale to justify the infrastructure cost.
Presence and typing indicators
Presence (online, idle, do not disturb, offline) looks like a simple field on the user record. It is not.
At scale, presence is a separate service. The challenge: how do you know when a user goes offline? Browsers crash, network connections drop, phones lose signal. You cannot rely on an explicit disconnect event.
The standard approach: heartbeat pings every 30 seconds from each client. If 90 seconds pass with no ping, the user is marked offline. The presence service broadcasts the state change to all servers the user belongs to.
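A minimal sketch of the heartbeat-and-timeout logic, using the 30-second and 90-second values from the text. `PresenceTracker` is a hypothetical name, time is passed in explicitly so the logic is easy to test, and a real presence service would typically keep this state in Redis with key expiry instead of a Python dictionary.

```python
HEARTBEAT_INTERVAL = 30  # seconds between client pings
OFFLINE_AFTER = 90       # seconds of silence before a user counts as offline


class PresenceTracker:
    def __init__(self):
        self._last_ping = {}  # user_id -> time of last heartbeat, in seconds

    def heartbeat(self, user_id: int, now: float) -> None:
        self._last_ping[user_id] = now

    def is_online(self, user_id: int, now: float) -> bool:
        last = self._last_ping.get(user_id)
        return last is not None and now - last < OFFLINE_AFTER

    def sweep(self, now: float) -> list:
        """Drop users whose heartbeat timed out and return them,
        so the caller can broadcast the offline state change."""
        expired = [u for u, t in self._last_ping.items() if now - t >= OFFLINE_AFTER]
        for user_id in expired:
            del self._last_ping[user_id]
        return expired


# Usage: user 1 keeps pinging, user 2 goes silent after the first ping.
presence = PresenceTracker()
presence.heartbeat(1, now=0)
presence.heartbeat(2, now=0)
presence.heartbeat(1, now=30)
presence.heartbeat(1, now=60)
went_offline = presence.sweep(now=90)
```

Allowing three missed heartbeats before marking someone offline avoids flapping when a single ping is lost on a bad connection.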
Typing indicators (the "... is typing" notification) are ephemeral -- they do not need to be stored, only broadcast. Broadcast the typing event over WebSocket and let it expire after 5 seconds with no update.
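The expiry rule can be sketched as state each client keeps in memory. `TypingState` is an illustrative name, and nothing here touches the database: events older than 5 seconds simply stop being reported.

```python
TYPING_TTL = 5.0  # seconds a typing event stays visible without an update


class TypingState:
    """Ephemeral "is typing" state, kept in memory and never persisted."""

    def __init__(self):
        self._last_event = {}  # (channel_id, user_id) -> time of last typing event

    def on_typing_event(self, channel_id: int, user_id: int, now: float) -> None:
        self._last_event[(channel_id, user_id)] = now

    def who_is_typing(self, channel_id: int, now: float) -> list:
        """Users whose most recent typing event is younger than the TTL."""
        return sorted(
            user_id
            for (chan, user_id), seen_at in self._last_event.items()
            if chan == channel_id and now - seen_at < TYPING_TTL
        )


# Usage: user 7 starts typing, user 8 joins in, then user 7 stops.
typing = TypingState()
typing.on_typing_event(channel_id=10, user_id=7, now=0.0)
typing.on_typing_event(channel_id=10, user_id=8, now=3.0)
```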
The permission system
Discord's permission system has two layers: server-wide role permissions and per-channel permission overrides.
At the server level, a role like Moderator might have "manage messages" and "kick members" enabled. At the channel level, the Moderator role might have "view channel" disabled for a private staff channel.
The data model looks like this:
Server roles table: role_id, server_id, permissions_bitmask
Channel permission overrides table: channel_id, role_id, allow_bitmask, deny_bitmask
Permission evaluation runs: get the server-level permissions for the user's roles, apply channel-level allow overrides, apply channel-level deny overrides. The result is the effective permission set for that user in that channel.
This bitmask approach is how Discord implements it. Each permission flag is a bit position. You can combine permissions with bitwise OR and check them with bitwise AND. Fast to compute, easy to store.
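The evaluation order described above can be sketched with plain integers as bitmasks. The flag names and values are assumptions chosen for the example, not Discord's actual bit positions, and Discord's own documented algorithm has more layers (such as member-specific overwrites) than this simplified version.

```python
# Each permission is one bit position (illustrative values).
VIEW_CHANNEL = 1 << 0
SEND_MESSAGES = 1 << 1
MANAGE_MESSAGES = 1 << 2
KICK_MEMBERS = 1 << 3


def effective_permissions(role_ids, role_permissions, channel_overrides) -> int:
    """Compute a user's permissions in one channel.

    role_permissions:  role_id -> server-level permissions bitmask
    channel_overrides: role_id -> (allow_bitmask, deny_bitmask) for this channel
    """
    perms = 0
    for role_id in role_ids:                  # 1. combine server-level role permissions
        perms |= role_permissions.get(role_id, 0)
    allow = deny = 0
    for role_id in role_ids:                  # collect overrides for the user's roles
        role_allow, role_deny = channel_overrides.get(role_id, (0, 0))
        allow |= role_allow
        deny |= role_deny
    perms |= allow                            # 2. apply channel-level allow overrides
    perms &= ~deny                            # 3. apply channel-level deny overrides
    return perms


def has_permission(perms: int, flag: int) -> bool:
    return perms & flag == flag


# Usage: Moderator can view, manage messages, and kick server-wide,
# but "view channel" is denied for that role in a private staff channel.
roles = {"moderator": VIEW_CHANNEL | SEND_MESSAGES | MANAGE_MESSAGES | KICK_MEMBERS}
staff_channel = {"moderator": (0, VIEW_CHANNEL)}
in_staff = effective_permissions(["moderator"], roles, staff_channel)
in_general = effective_permissions(["moderator"], roles, {})
```

Because deny is applied last in this sketch, a deny on any of a user's roles wins over an allow on another. That is a conservative choice; whichever precedence you pick, write it down and test it, since it decides who can see private channels.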
Tech stack
| Layer | Choice |
|---|---|
| Web frontend | React or Next.js |
| Mobile | React Native |
| Real-time backend | Elixir/Phoenix (high scale) or Node.js + Socket.io |
| WebRTC SFU | LiveKit (managed) or mediasoup (self-hosted) |
| Database | PostgreSQL |
| Presence and pub/sub | Redis |
| Message broker (high scale) | Kafka |
| File storage | AWS S3 or Cloudflare R2 |
| CDN | Cloudflare |
| Push notifications | Firebase Cloud Messaging |
| Hosting | AWS, GCP, or Hetzner (for cost efficiency at scale) |
How much does it cost to build an app like Discord?
| Scope | Timeline | Cost |
|---|---|---|
| Text-only MVP (servers, channels, DMs, roles, invites, notifications) | 14-20 weeks | $80K-$150K |
| Text plus voice rooms (WebRTC audio, drop-in rooms) | 18-26 weeks | $120K-$220K |
| Full platform (video, screen share, thread search, moderation tools) | 6-10 months | $180K-$350K |
Post-launch infrastructure at moderate scale (10,000 daily active users, 500 concurrent voice room participants): $5K-$15K/month. Voice rooms are the largest cost driver -- SFU servers are compute-intensive.
Why community platforms are harder than they look
A generic chat app scales linearly. A community platform does not.
The core challenge is fan-out. When a user sends a message in a channel with 5,000 members and 200 currently online, your system must deliver that message to 200 WebSocket connections within 500ms. At low scale, a naive broadcast works. At scale, you need a fan-out service that efficiently routes messages to the correct gateway servers for delivery.
Message delivery guarantees add another layer. In a text message, "delivered" means the server received it. In a community platform, "delivered" means every online member in the channel received it. Building retry logic, delivery receipts, and catch-up sync for users who reconnect after being offline is a significant engineering project.
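Catch-up sync becomes straightforward once every message carries a per-channel sequence number: the reconnecting client reports the last sequence number it saw and the server returns everything after it. A minimal sketch, assuming an in-memory list where a real system would run an indexed database query; the function name and message shape are illustrative.

```python
def catch_up(channel_log: list, last_seen_seq: int, limit: int = 100):
    """Return (batch, has_more): up to `limit` messages newer than
    last_seen_seq, oldest first, plus a flag telling the client to ask again."""
    missed = sorted(
        (m for m in channel_log if m["seq"] > last_seen_seq),
        key=lambda m: m["seq"],
    )
    return missed[:limit], len(missed) > limit


# Usage: the client last saw seq 2 before going offline and pages through the rest.
log = [{"seq": n, "text": f"message {n}"} for n in range(1, 8)]
batch, has_more = catch_up(log, last_seen_seq=2, limit=3)
next_batch, more_after = catch_up(log, last_seen_seq=batch[-1]["seq"], limit=3)
```

Paging with a limit keeps a user who was offline for a week from pulling an entire channel history in one response.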
Then there is abuse prevention. An open community platform attracts spam, harassment, and coordinated attacks. You need rate limiting per user and per IP, server-level ban lists, and moderation tooling -- none of which a consumer chat app needs.
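Per-user and per-IP rate limiting is commonly done with a token bucket. The sketch below is one possible approach, not a description of how Discord does it; the class names, the capacity of 5, and the refill rate are assumptions picked for the example.

```python
class TokenBucket:
    """Allows short bursts up to `capacity`, then a steady refill rate."""

    def __init__(self, capacity: int, refill_per_second: float, now: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per key, where a key is e.g. 'user:42' or 'ip:203.0.113.9'."""

    def __init__(self, capacity: int = 5, refill_per_second: float = 1.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._buckets = {}

    def allow(self, key: str, now: float) -> bool:
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self._buckets[key] = bucket
        return bucket.allow(now)


# Usage: a burst of 6 messages from one user at the same instant.
limiter = RateLimiter(capacity=5, refill_per_second=1.0)
burst = [limiter.allow("user:42", now=0.0) for _ in range(6)]
```

Checking both a `user:` key and an `ip:` key for each request covers the case where one attacker rotates through many accounts from the same address.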
These are not insurmountable problems. They are just problems that take longer to solve than the chat UI does.
Why founders choose RaftLabs to build community platforms
Most agencies build the chat UI and hand you a codebase without the real-time infrastructure. The app looks right in a demo with 5 users. It falls over with 500.
RaftLabs builds the full system: WebSocket gateway architecture, message sequencing, SFU-based voice rooms, the permission engine, and the admin moderation tools. We have shipped real-time products across gaming, professional networking, and team collaboration -- and we know which corners you can cut in v1 and which ones you cannot.
If you want to understand what a scoped Discord alternative would cost for your specific market, start with a 30-minute call.
Frequently Asked Questions
- A text-only MVP -- servers, channels, DMs, roles, invite links, notifications -- costs $80K-$150K and takes 14-20 weeks. Adding WebRTC voice rooms brings the total to $120K-$220K. A full platform with video stage channels, screen sharing, bot APIs, and server monetization costs $180K-$350K and takes 6-10 months. Post-launch infrastructure runs $5K-$15K/month at moderate scale (about 10,000 daily active users) and rises with concurrent user load and voice room usage.
- Frontend: React or Next.js for web, React Native for mobile. Real-time backend: Elixir/Phoenix (benchmarked at 2M concurrent WebSocket connections on a single node) or Node.js with Socket.io for smaller scale. WebRTC: mediasoup or Janus as SFU for voice/video rooms, or LiveKit as a managed alternative. Database: PostgreSQL for messages and user data, Redis for presence and pub/sub. File storage: S3 or compatible object storage. CDN: Cloudflare for static assets and DDoS protection.
- Three things: (1) Message ordering under concurrent writes -- when multiple users send messages simultaneously, all clients must see them in the same order. Use channel-level sequence numbers, not timestamps. (2) WebRTC SFU setup -- peer-to-peer WebRTC breaks at 4+ participants. You need an SFU to forward media streams without decoding/re-encoding. This adds significant infrastructure complexity. (3) The permission system -- Discord allows role-based permissions at the server level and channel-level overrides per role per channel. This is a non-trivial data model to get right.
- Discord uses a Selective Forwarding Unit (SFU) architecture for voice and video. Each client sends one audio/video stream to the SFU server. The SFU forwards the streams to all other participants without mixing them. This scales to dozens of participants without the quadratic growth in streams that peer-to-peer suffers from. Discord built their own SFU. For a custom platform, mediasoup (open source, Node.js/C++) or LiveKit (managed, open-source core) are the standard choices in 2026.
- Slack is built for professional teams in defined workspaces. Discord is built for open communities that can have thousands of anonymous members across hundreds of channels. The key architectural differences: Discord needs public server discovery and invite links (Slack is invite-only by default), Discord needs much more aggressive rate limiting and abuse prevention (open communities attract spam), Discord's voice rooms are persistent, always-on channels that members drop into (Slack huddles and calls are started ad hoc inside a workspace), and Discord's permission system must handle anonymous and guest users. The real-time scale Discord operates at is significantly higher than a typical Slack deployment.


