Multi-tenant SaaS tenant isolation: Three patterns, three trade-offs
Three multi-tenant SaaS products, three different tenant isolation solutions: schema-per-tenant, row-level security, database-per-tenant. Trade-offs plus a decision matrix.
If you’re building a multi-tenant SaaS and the “data from one tenant must not leak to another” question sits on your architecture table, this post sharpens the call. Tenant isolation is the architectural decision we revisit most often inside multi-tenant SaaS work. The question looks simple — “data from one tenant must not leak to another” — but the answer opens up to three different patterns, and the operational cost, performance ceiling, and audit/compliance posture of each are very different.
Across three of our own products — caveflo, crm2b, and Mediatic AI — tenant isolation is solved three different ways, because the scale, sensitivity, and operational capacity were different in each case. This post walks through the three patterns — schema-per-tenant, row-level security (RLS), database-per-tenant — with examples from our production reality, then a performance comparison, migration strategies, and a decision matrix for which pattern fits which situation. We work in Postgres because that is what the three products run on; many of the concepts carry over to MySQL and SQL Server, but the syntax is different.
Pattern 1: Schema-per-tenant — strict isolation
The first pattern is the classic: a separate Postgres schema for each tenant. The connection pool points at the same database, but the application sets search_path at the start of every request to switch into the active tenant’s schema.
-- During tenant onboarding
CREATE SCHEMA tenant_a23f9;
CREATE TABLE tenant_a23f9.contacts (
id UUID PRIMARY KEY,
email TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
-- At the start of each request
SET search_path TO tenant_a23f9, public;
SELECT * FROM contacts WHERE email = $1;
Upside. Isolation is genuinely strict — to query another tenant’s schema you have to explicitly write tenant_b.contacts. Per-tenant migrations become possible: a single customer can get a feature flag, an extra column, or a custom table inside their own schema. Backup and restore for one tenant is clean: pg_dump -n tenant_a23f9.
Downside. Migration complexity grows linearly with tenant count. For a 50-tenant install, adding a new column means running ALTER TABLE 50 times. Tools like dbmate, sqitch, or Flyway can automate it, but at 500+ tenants the migration window itself becomes an incident. Each schema also carries roughly 5MB of metadata overhead; 1,000 tenants means about 5GB of metadata sitting in Postgres.
The connection pooling problem. PgBouncer in transaction-pooling mode plus schema-per-tenant produces unexpected bugs because SET search_path resets at the end of every transaction. Fix: either move to session pooling (which exhausts connection limits fast) or stop using SET search_path and use fully-qualified names (tenant_a23f9.contacts) in every query. The second is cleaner but requires every ORM call to carry the tenant prefix.
We chose this pattern for caveflo at launch because every customer runs an essentially independent sales operation; per-tenant customization demand was high. At ~120 tenants the migration tooling is sustainable; two engineers maintain the migration runner. We have a migration plan for moving to RLS once we cross 500+ tenants.
Pattern 2: Row-level security (Postgres RLS) — shared schema
The second pattern uses RLS, available since Postgres 9.5. All tenants share the same tables, every table has a tenant_id column, and policies are enforced at the database level.
-- Table definition
CREATE TABLE contacts (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
email TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_contacts_tenant ON contacts (tenant_id);
-- Enable RLS
ALTER TABLE contacts ENABLE ROW LEVEL SECURITY;
ALTER TABLE contacts FORCE ROW LEVEL SECURITY;
-- Policy
CREATE POLICY tenant_isolation_policy ON contacts
USING (tenant_id = current_setting('app.tenant_id')::UUID);
-- At the start of each request
SET app.tenant_id = 'a23f9-...';
SELECT * FROM contacts WHERE email = $1;
-- Postgres filters automatically: WHERE tenant_id = 'a23f9-...' AND email = $1
Upside. Migrations run once and apply to every tenant. Backup is a single database backup. With Postgres 14+ partitioning combined with RLS, performance gets close to schema-per-tenant. We also stack an app-level guard alongside RLS — every query carries WHERE tenant_id = ?, and RLS sits behind it as a safety net. Two layers of defense.
Downside. Policy complexity grows with use cases. If “company group” is part of the model (tenant A is a parent of tenant B), the policy starts looking like tenant_id = current OR tenant_id IN (SELECT child FROM tenant_hierarchy WHERE parent = current) — easy to write, harder to test. Test coverage has to be aggressive because an RLS policy bug is a critical data leak.
Index strategy is critical. The tenant_id column has to lead the primary index on every table. Wrong: INDEX (created_at, tenant_id). Right: INDEX (tenant_id, created_at). On a table with 10K rows per tenant, the wrong index ordering can blow up p95 query latency by 50x. EXPLAIN ANALYZE is non-negotiable.
Connection pooling is easier. RLS works cleanly with transaction pooling — SET app.tenant_id runs at the start of every transaction and survives the PgBouncer transaction-pooling boundary. None of the session-pooling constraints that schema-per-tenant introduces.
We use RLS in crm2b from day one. Currently around 700 tenants; query latency p95 is 12ms (thanks to the tenant-id index). New features ship as a single migration. The reason the operations team picked this pattern: at 500+ tenants, schema-per-tenant becomes operationally exhausting; RLS policy complexity is manageable, schema-per-tenant migration complexity is not.
Pattern 3: Database-per-tenant — for the enterprise tier
The third pattern is the strictest: a separate Postgres database for each tenant. Either multiple databases on the same server or different servers entirely. A “tenant routing” layer in the application directs each request to the right database.
// Pseudocode — connection routing
function getDbConnection(tenantId: string): Connection {
const dbConfig = tenantConnectionMap.get(tenantId);
return connectionPool.connect(dbConfig);
}
// Usage
const db = getDbConnection(req.tenantId);
const contacts = await db.query("SELECT * FROM contacts WHERE email = $1", [email]);
Upside. Maximum isolation: a single tenant’s database failing does not affect others. Audit and compliance benefits are large — financial services, healthcare, government work all carry “data residency” and “physical separation” requirements that this pattern satisfies cleanly. Backup, encryption at rest, and compliance audit happen against a single database; per-tenant SLAs become possible.
Downside. Operational cost is unworkable past 100+ tenants. 1,000 tenants = 1,000 Postgres databases = 1,000 backup pipelines + 1,000 connection pools + 1,000 migration runs. On managed services like AWS RDS, the cost of 1,000 instances is astronomical. On self-managed Postgres, holding 1,000 databases on a single cluster is theoretically possible, but vacuum and analyze costs stack quickly.
The fix: enterprise-only deployment. Use database-per-tenant for the enterprise tier only. The main product runs on RLS (700 tenants); the 5-15 enterprise contracts get their own databases. The hybrid keeps operational cost reasonable and gives enterprise buyers an answer to compliance questions.
We use the hybrid model in Mediatic AI: the main install runs on RLS, and enterprise-tier customers (large agency groups, public-sector buyers) run on dedicated databases. We can answer compliance questions with “yes, your data sits in a physically separate database” — concrete leverage in enterprise sales conversations.
Performance comparison
Typical metrics across our three products (production, 90-day average):
- caveflo (schema-per-tenant, ~120 tenants): p95 query latency 8ms, connection pool overhead 4%, vacuum/analyze ~22 minutes/week.
- crm2b (RLS, ~700 tenants): p95 query latency 12ms, connection pool overhead 2%, vacuum/analyze ~38 minutes/week.
- Mediatic AI (hybrid: main RLS + 9 database-per-tenant): p95 query latency 14ms (RLS), 7ms (database-per-tenant), connection pool overhead 6% (hybrid manager), vacuum/analyze ~45 minutes/week.
Two observations stand out:
- Schema-per-tenant gives the lowest query latency — the Postgres planner picks better index strategies on smaller per-tenant tables. The trade-off is connection pool overhead (every schema needs its own search_path).
- RLS vacuum/analyze cost rises with scale. Vacuuming a single 7M-row table across 700 tenants costs roughly 70% more than vacuuming 700 tables of 10K rows. With Postgres 15+ parallel vacuum, the gap shrinks; with autotuning, 38 minutes/week is acceptable.
Performance alone is not the deciding factor. The gap between 100ms p95 and 12ms p95 matters little for most B2B SaaS; it matters for a consumer SaaS handling 10K req/s. Weigh against your scale and UX sensitivity.
Migration strategies: schema-per-tenant to RLS
Moving between patterns is operationally hard. The most common need is migration from schema-per-tenant to RLS — most SaaS products start on schema-per-tenant (it feels intuitive) and run into migration complexity at 500+ tenants.
The migration playbook we wrote for caveflo (planned for next quarter, not yet executed):
- Preparation: build the shared schema. A new schema defines every table with a
tenant_idcolumn. RLS policies are written and tested against test data. - Dual-write phase. The application writes to both the old schema-per-tenant structure and the new shared schema. 2-4 weeks. During this phase, daily consistency checks compare row counts on both sides.
- Backfill. Historical data moves to the shared schema. Per tenant:
INSERT INTO shared.contacts (tenant_id, ...) SELECT 'tenant-a-uuid', ... FROM tenant_a23f9.contacts. For 100M+ rows, 1-2 weeks of batched migration jobs. - Read switchover. The application moves read traffic to the shared schema. Writes still go to both. One-to-two weeks of observation.
- Write cutover. Writes move fully to the shared schema. Old per-tenant tables stay live for reads for four more weeks (rollback insurance), then archive.
The full process typically runs 6-10 weeks and consumes one full-time engineer plus a part-time DBA. The product team slows new feature work during this window — there is no “downtime,” but feature velocity drops 30-50%. Without that investment up front, the cost of escaping schema-per-tenant grows linearly with each new tenant.
Operational reality: which pattern at which size
The operational reality of the three patterns sorts cleanly along the axes of tenant count and isolation sensitivity.
- <100 tenants + heavy per-tenant customization: Schema-per-tenant wins. Migration complexity is not yet exhausting.
- 100-500 tenants + standard feature set: RLS wins. Single migration, no or minimal per-tenant customization.
- 500+ tenants + standard: RLS, definitively. Schema-per-tenant becomes operationally unworkable.
- Compliance / data residency / enterprise tier: Hybrid (main RLS + enterprise database-per-tenant). Do not apply database-per-tenant to the whole tenant base.
- A handful of “very large” tenants (e.g., 5 customers, each enormous): Database-per-tenant. RLS at this shape produces a single bloated table.
Decision matrix
Three questions usually settle which pattern a new multi-tenant SaaS should start with:
| Question | Answer | Recommendation |
|---|---|---|
| Will tenant count cross 500+? | Yes | Prefer RLS |
| Will tenant count cross 500+? | No | Schema-per-tenant is fine |
| Compliance/data residency required? | Yes | Hybrid (RLS + enterprise DB-per-tenant) |
| Heavy per-tenant customization? | Yes | Schema-per-tenant |
| Ops team has 1 DBA + 2 backend? | Yes | Any pattern works |
| 1 backend, no DBA? | Yes | RLS — lower operational load |
Do not use the matrix in isolation. Look at your roadmap 18 months out. Schema-per-tenant feels comfortable at 50 tenants today; if you do not want a migration trauma at 800 tenants in 24 months, start on RLS.
If you are setting up a multi-tenant architecture, or you see migration looming on an existing system, our enterprise systems practice typically runs a 2-3 week architecture review for exactly these decisions. Reach out — bring your current tenant count, isolation sensitivity, and ops capacity to the discovery call, and we map out which pattern (and which migration path, if needed) fits the next 18 months of growth.