Discover the exact playbook we used to rescue a scaling fintech product from critical database downtime using strategic caching, index optimization, and connection pooling.

It was 10:00 AM on a Friday when our client, a fast-growing digital payment platform, saw its primary database CPU utilization hit 100 percent. It stayed there. Within minutes, connection pools began to saturate, API responses slowed to a crawl, and the mobile checkout screens began throwing timeout errors. For a fintech backend architecture processing thousands of micro-transactions per second, even a few minutes of instability means lost revenue, frustrated users, and regulatory scrutiny.
The client team contacted us because they were trapped in a classic scaling bottleneck. Their platform had achieved product-market fit, but their relational database could no longer keep pace with the write-heavy transactional load and the complex read-heavy reports running simultaneously. They approached our team for a tech partnership and consultation to rescue their platform from critical downtime and build a sustainable database scaling strategy.
In this post, we share the exact playbook we used to stabilize, optimize, and scale this client database. We will walk through how we diagnosed the root causes, restructured the indexes, implemented strategic caching, and refactored inefficient queries. Whether you are building your first fintech MVP or scaling an enterprise financial ledger, these database scaling strategies will help you prevent catastrophic performance failures.
When we first inspected the client database, we saw a classic cascading failure. In high-volume systems, a single slow operation can trigger a chain reaction that brings down the entire application. The core of the platform relied on a PostgreSQL database containing millions of rows of transaction ledgers, user wallets, and audit logs. During normal traffic periods, the system handled the load comfortably, but the morning peak created a perfect storm of lock contention and resource depletion.
The primary issue was how the application handled wallet balance updates. Whenever a user completed a transaction, the backend would start a database transaction, lock the user wallet row, update the balance, insert a ledger record, and commit the transaction. Under heavy load, multiple requests would try to update the same high-frequency merchant wallets simultaneously. This created row-level lock contention, meaning hundreds of database connections were forced to wait for a single lock to be released.
As connections waited, they remained open, consuming valuable slots in the database connection pool. When the pool filled to capacity, incoming requests from the API gateway were rejected, resulting in widespread checkout failures. This reminded us of our work in the anatomy of an API leak incident response and recovery, where a failure in one subsystem quickly starved resources across the rest of the application. The system was not failing because of a lack of hardware, it was failing because the database was spending all its time managing locks rather than executing queries.
To fix a database, you must first understand exactly where it is spending its time. We started our query performance optimization by enabling pg_stat_statements, a built-in PostgreSQL module that records execution statistics for all queries. This allowed us to view the total execution time, call count, and mean runtime for every query run by the application.
The statistics revealed a shocking truth. A relatively simple query used to display a user's transaction history on the home screen was responsible for over 40 percent of the total database load. This query ran every time a user opened the mobile app. Because the table lacked the correct index, PostgreSQL was forced to perform a sequential scan, which means searching every single row in the table from start to finish, for every balance check.
We verified this by running the EXPLAIN ANALYZE command on the offending query. The PostgreSQL EXPLAIN documentation describes how this command runs the query and returns a detailed execution plan, showing how the database engine intends to retrieve the data. The output confirmed that the database was scanning five million rows of transaction data on every home screen load. By identifying these unindexed queries, we found the low-hanging fruit that would yield the most immediate performance improvements.
To understand how the database was behaving, we had to analyze the specific steps the database planner was taking. The execution plan showed that the database was ignoring the primary key because the query was filtering by a non-indexed foreign key, user_id, and sorting by created_at.
Without a proper index, the database engine had to read every data page from the disk into the database memory, sort the records in a temporary disk file, and then return the top ten records. This process took nearly three seconds per query. When multiplied by thousands of active users, this single query was enough to saturate the database storage disk input-output operations.
Indexes are the index cards of your database. Without them, the database engine must read every page on the disk to find a single row. However, in a fintech backend architecture, you cannot simply add indexes to every column. Every index you add slows down write operations because the database must update the index files every time a new row is inserted or updated.
We designed a balanced indexing strategy that optimized read queries without degrading write speeds. First, we created composite indexes, which are indexes that cover multiple columns in a specific order. For the transaction history query, we created an index on both the user ID and the transaction date. This allowed the database to quickly jump to a specific user's transactions and read them in chronological order without sorting them in memory.
Next, we implemented partial indexes. In PostgreSQL, a partial index is an index built over a subset of a table defined by a conditional statement. For example, the client had millions of historical transactions, but users rarely queried transactions older than 30 days. We created a partial index that only indexed transactions created in the last month. This kept the index size small, ensuring it could fit entirely into the database RAM, random access memory, for rapid access, while dramatically speeding up the most common queries.
EXPLAIN to verify if the query planner will actually use your new index.Even with perfect indexes, hitting the database for every single read operation is unsustainable at scale. To relieve the pressure, we introduced Redis, a high-performance in-memory database, as a caching layer. Caching stores frequently accessed data in memory so that subsequent requests do not need to query the primary database.
We chose the cache-aside pattern for this implementation. When the mobile app requested a user's current wallet balance, the backend would first check Redis. If the balance was found in the cache, a cache hit occurred, and the data was returned instantly. If the balance was not found, a cache miss occurred, the backend queried the PostgreSQL database, stored the result in Redis for future requests, and returned it to the user.
The biggest challenge with caching in financial applications is cache invalidation, which is the process of updating or removing cached data when the source database changes. In a fintech app, showing an outdated balance is unacceptable. We solved this by using transaction-bound cache eviction. Whenever a wallet balance was successfully updated in PostgreSQL, the backend would immediately evict the old balance from Redis. This ensured that the next read request would fetch the fresh, accurate balance from the database and rebuild the cache. Implementing this cache-aside model allowed us to bypass the database for over 80 percent of read requests, similar to how we design local-first sync strategies to minimize server load by keeping data close to the user. You can learn more about how Redis manages memory and data structures in the Redis official documentation.
With reads optimized and cached, we turned our attention to the write pipeline. When thousands of users try to make payments simultaneously, the database can easily run out of available connection slots. Each database connection consumes significant server memory and CPU, so opening too many connections actually degrades performance.
To solve this, we implemented PgBouncer, a lightweight connection pooler for PostgreSQL. According to the PgBouncer architecture documentation, a connection pooler acts as a middleman between the application and the database. Instead of the application opening a dedicated database connection for every single user request, PgBouncer maintains a small pool of active connections and shares them among the incoming web requests. This kept our database connection count stable, even when traffic spiked by 500 percent.
For non-instantaneous write operations, such as generating monthly statement PDFs or sending transaction notification emails, we moved them out of the main transaction flow entirely. We introduced an asynchronous message queue using RabbitMQ. When a transaction completed, the backend placed a message on the queue and returned a success response to the user immediately. Background worker processes then consumed these messages and performed the slow, secondary tasks. This kept our web servers free to handle new incoming transactions. This architecture aligns closely with the principles of managed container orchestration, where isolating tasks into specialized containers ensures that high-volume writes do not block critical user actions.
As the platform continued to grow, a single database server, even with caching and connection pooling, became a single point of failure and a scaling bottleneck. To scale further, we implemented database replication, which is the process of copying data from one database server to one or more replica servers in real-time.
We set up a primary-replica architecture. The primary database server handled all write operations, such as creating transactions, updating balances, and modifying user profiles. We then configured two read replicas, which were read-only copies of the primary database. We updated the backend application code to route all read-only queries, such as transaction history lists, user profile displays, and administrative reports, directly to these read replicas.
The main challenge with this approach is replication lag, which is the delay between a write happening on the primary server and that write appearing on the replica servers. In fintech, if a user sends money and is redirected to their transaction history screen, seeing a missing transaction due to replication lag can cause panic. We implemented a smart routing middleware. If a user had performed a write operation in the last five seconds, the application would route their subsequent read queries to the primary database. For all other users, reads were safely routed to the replicas, ensuring high consistency where it mattered most.
Over years of rapid growth, codebases accumulate technical debt, and database queries are no exception. The client database had several legacy SQL queries that had been written when the table had only a few thousand rows. These queries used complex joins, subqueries, and temporary tables that worked fine in development but choked under production volumes.
We undertook a massive refactoring effort to simplify these queries. One major culprit was a query that calculated daily transaction volume by joining the transactions table with the users, merchants, and promotions tables. The query used multiple subqueries inside a loop, causing an N+1 query problem, which is when an application makes one database query to fetch a list of items and then executes another query for every single item in that list.
We refactored this query to use common table expressions, or WITH clauses, which allow you to write auxiliary queries for use within a larger query. We also replaced the nested queries with explicit, index-backed joins. This reduced the query execution time from twelve seconds down to just eighty milliseconds. This refactoring effort showed us how vital it is to address technical debt early, a topic we cover extensively in our guide on refactoring legacy database structures. Simplifying your SQL code is often the cheapest and most effective way to scale your database without paying for expensive hardware upgrades.
One of the most valuable lessons we have learned as an agency is that technical problems cannot always be solved with technical solutions alone. Sometimes, the most effective way to scale a database is to change the product requirements. We worked closely with the client's product management team to align their feature designs with the physical limitations of database hardware.
For instance, the original product design required the mobile app to show a real-time, animated counter of the total platform-wide transaction volume on the landing page. To power this counter, the database had to run a count query across the entire transaction table every few seconds for millions of active users. This single product feature was consuming a massive amount of database resources for very little business value.
We proposed a compromise: instead of running a live database count, we would pre-calculate the transaction volume once every hour and store that single number in the Redis cache. The mobile app would read the cached number and use client-side animation to simulate a real-time count. The users still got an engaging experience, but we eliminated millions of expensive database queries. This is why we value a product-minded engineering mindset; when developers understand the business goals and user experience, they can find creative ways to protect system architecture while still delivering what the users want.
Scaling a database is not a one-time event, it is a continuous process of monitoring, adjusting, and planning. By implementing caching, composite indexing, connection pooling, and read replicas, we rescued our client's fintech backend from critical downtime and prepared their platform for future growth. The database CPU utilization dropped from a pinned 100 percent down to a stable 25 percent, even during peak Friday evening traffic.
To ensure the system remains stable over the long term, we set up continuous database monitoring. We integrated tools like Datadog to track key metrics such as index hit rate, replication lag, connection counts, and slow queries. You can explore how these metrics help maintain database health in the Datadog database monitoring guide. This monitoring acts as an early warning system, alerting the engineering team before a slow query or a full disk causes another outage.
For teams that lack the internal resources to manage this constant vigilance, partnering with a professional team for maintenance and customer support is an effective way to keep your database healthy while your internal developers focus on shipping new features. Having a dedicated team monitoring your system ensures that small performance degradation issues are caught and fixed before they turn into customer-facing incidents.
Our team followed a structured process to complete this rescue operation. This checklist outlines the steps we took to stabilize and scale the system:
Key takeaways
- Solve lock contention first: High CPU utilization during peak traffic is often caused by queries waiting on database locks rather than actual computation.
- Use composite and partial indexes: Target your indexes to match the specific filtering and sorting patterns of your slowest queries.
- Protect your connection pool: Implement a connection pooler like PgBouncer to shield your database from connection spikes.
- Cache with strict invalidation: Use Redis to offload read traffic, but tie cache eviction directly to database write transactions to prevent stale data.
- Align product with engineering: Work with product teams to design features that respect database limits, such as using cached values instead of live counts.
Database scaling is one of the most challenging aspects of building a fintech platform. Relational databases are powerful, but they require careful design, regular maintenance, and a deep understanding of how your queries interact with the underlying hardware. By taking a systematic approach to auditing queries, designing smart indexes, and using in-memory caching, you can build a backend that handles peak traffic without breaking a sweat.
If your growing application is experiencing slow load times, connection timeouts, or database scaling pains, you do not have to solve it alone. As an experienced custom software development partner, we have helped dozens of companies optimize their systems and scale their databases to handle millions of transactions. If you are planning a migration, rewriting legacy code, or preparing for a major traffic event, we are happy to talk it through and help you build a reliable, high-performance architecture.
01 · RelatedDiscover this week's essential technical trends, from local-first architectures and small language models to modular monoliths and server-side WebAssembly.
Read post
02 · RelatedLearn how to transition your mobile app from static request-response APIsto autonomous reasoning agents using modern edge and cloud architectures.
Read post
03 · RelatedA practical, opinionated rundown of architecture, state management, offline-first databases, and security strategies for mobile engineering leaders.
Read postWe will reply in plain English within one business day, NDA on request. Discovery call is free.