How not to use Redis

Given how popular Redis is, and given how many times I have seen people repeat the same mistakes, I thought it would be a good idea to keep a log of the things we should not do with Redis.

Software Engineering · Databases · NoSQL

12/20/2023 · 3 min read

Sharding for Large Databases

Keeping a large database on a single shard or Redis instance leads to slow failover, backup, and recovery times. Keep each shard within the recommended limits: typically around 25GB of data or 25,000 operations per second.

Redis Cloud recommends sharding once a dataset exceeds 25GB, or earlier under heavy load: above 25,000 operations per second, sharding also improves performance. At lower throughput, a single shard can handle up to 50GB of data.
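Those thresholds can be turned into a quick back-of-the-envelope estimate. The function below is only a sketch built from the numbers above, not an official sizing formula; the name and defaults are mine.

```python
import math

def estimate_shards(dataset_gb: float, ops_per_sec: float,
                    max_gb_per_shard: float = 25.0,
                    max_ops_per_shard: float = 25_000.0) -> int:
    """Estimate how many shards are needed so that each shard stays
    under both the size threshold and the throughput threshold."""
    by_size = math.ceil(dataset_gb / max_gb_per_shard)
    by_ops = math.ceil(ops_per_sec / max_ops_per_shard)
    return max(1, by_size, by_ops)

# 120 GB at 60k ops/s: size demands 5 shards, throughput demands 3.
print(estimate_shards(120, 60_000))  # -> 5
```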

Reducing Connection Overhead with Many Clients

With a large number of clients, a reconnect storm against the single-threaded Redis process can overwhelm it and trigger a failover. Use connection pooling or a proxy to reduce the number of open connections to your Redis server.

The Redis Enterprise DMC proxy reduces the number of connections reaching your cache servers. Twemproxy is another fast, lightweight option: it was built to minimize connections to backend caching servers, and it supports protocol pipelining and sharding, letting you scale a distributed caching architecture horizontally.
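As a sketch, a minimal twemproxy (nutcracker) pool definition might look like the following; the pool name, addresses, and thresholds are illustrative placeholders, not recommendations.

```yaml
# twemproxy (nutcracker) pool: clients connect to the proxy on 22121,
# which multiplexes them over a small set of backend connections.
redis_pool:
  listen: 127.0.0.1:22121
  redis: true                  # speak the Redis protocol to backends
  hash: fnv1a_64               # key hashing function
  distribution: ketama         # consistent hashing across servers
  auto_eject_hosts: true       # temporarily drop failed backends
  server_retry_timeout: 30000  # ms before retrying an ejected host
  server_failure_limit: 3      # failures before ejection
  servers:
    - 127.0.0.1:6379:1 redis1
    - 127.0.0.1:6380:1 redis2
```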

Using More Than One Secondary Shard (Redis OSS)

Redis OSS uses a shard-based quorum. It's recommended to use at least three copies of the data (two replica shards per master shard) to protect against split-brain situations. In essence, Redis OSS addresses the quorum challenge by having an odd number of shards (primary + two replicas).

Redis Cloud solves the quorum challenge with an odd number of nodes, avoiding a split-brain situation with only two copies of the data for better cost-efficiency. Additionally, the 'quorum-only node' can be used to bring a cluster up to an odd number of nodes if an extra data node isn't necessary.

Batching Operations with Redis Pipelining

Performing many operations one at a time multiplies connection overhead: each command waits for its reply before the next one is sent. Instead, use Redis pipelining: send a batch of commands without waiting for the individual replies, then read all the replies at once.

Pipelining is entirely client-side and targets response latency in high-latency networks. By minimizing time spent sending commands and waiting on responses, it significantly improves protocol performance: it can speed things up by a factor of five on local connections, and up to a hundred times on slower internet connections.
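To see why batching matters, here is a rough client-side cost model. The function and its default numbers are illustrative assumptions, not measurements of any real deployment:

```python
def total_latency_ms(n_commands: int, rtt_ms: float,
                     server_us_per_cmd: float = 10.0,
                     batch_size: int = 1) -> float:
    """Rough cost model: every batch pays one network round trip,
    and each command pays a small fixed server-side cost."""
    round_trips = -(-n_commands // batch_size)  # ceiling division
    return round_trips * rtt_ms + n_commands * server_us_per_cmd / 1000

# 10,000 commands over a 1 ms link:
serial = total_latency_ms(10_000, rtt_ms=1.0, batch_size=1)       # ~10,100 ms
pipelined = total_latency_ms(10_000, rtt_ms=1.0, batch_size=100)  # ~200 ms
```

Under this model, batching 100 commands per round trip turns a ten-second job into a fifth of a second; the round trips, not the commands themselves, dominate.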

Setting Time-to-Live (TTL) for Cached Keys

Redis functions primarily as a key-value store where you can set timeout values (TTL) on keys. When the timeout expires, the key is automatically deleted. Commands that delete or overwrite the contents of a key also clear its timeout. The TTL command returns a key's remaining time to live, in seconds.

Keys without a TTL accumulate until memory pressure forces evictions (or, under the noeviction policy, writes start failing). Therefore, set a TTL on every caching key.
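As an illustration of the TTL semantics described above (expiry deletes the key, a plain overwrite clears the timeout, TTL reports -1 for no timeout and -2 for a missing key), here is a toy in-memory sketch; it stands in for a real Redis client, and all names are mine:

```python
import time

class TTLCache:
    """Toy illustration of Redis TTL semantics, not a Redis client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}    # key -> value
        self._expiry = {}  # key -> absolute deadline

    def set(self, key, value, ex=None):
        self._data[key] = value
        if ex is None:
            self._expiry.pop(key, None)  # overwrite clears the timeout
        else:
            self._expiry[key] = self._clock() + ex

    def _expired(self, key):
        deadline = self._expiry.get(key)
        return deadline is not None and self._clock() >= deadline

    def get(self, key):
        if self._expired(key):  # lazily delete on access, like Redis
            self._data.pop(key, None)
            self._expiry.pop(key, None)
        return self._data.get(key)

    def ttl(self, key):
        if key not in self._data or self._expired(key):
            return -2   # Redis convention: key does not exist
        if key not in self._expiry:
            return -1   # Redis convention: key has no timeout
        return int(self._expiry[key] - self._clock())
```

With a real client the equivalent calls are SET with the EX option, GET, and TTL.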

Tuning Slave and Client Buffers for Slow Replication

When replicating a large, active database over a slow or saturated link, replication may never finish: the replica cannot catch up with the continuous stream of updates. To accommodate slower replication, tune the slave (replica) and client output buffers.
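The relevant redis.conf directives look like the following; the sizes here are illustrative starting points, not tuned recommendations (older Redis versions use the class name slave instead of replica):

```conf
# Replication backlog kept by the master so a briefly disconnected
# replica can resume with a partial resync instead of a full one.
repl-backlog-size 256mb

# Output buffer for replica connections: disconnect the replica only
# past a 512 MB hard limit, or a sustained 128 MB for 120 seconds.
client-output-buffer-limit replica 512mb 128mb 120
```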

Sharding Hot Keys

Redis can become the core of your application's operational data, holding valuable and frequently accessed information. However, centralizing access to a few constantly accessed pieces of data creates a hot-key problem.

In a Redis cluster, the key determines where the data is stored, based on hashing. When you repeatedly access a single key, you repeatedly access a single node/shard. For instance, if you have a cluster of 99 nodes and a single key gets a million requests per second, all those requests hit that one node; the load is not distributed across the other 98.

Redis provides tooling to find hot keys: set the eviction policy to an LFU variant (so access frequency is tracked) and run redis-cli with the --hotkeys argument.

The best defense, though, is to avoid creating hot keys in the first place. Duplicate the data under several keys that hash to different shards, and spread reads across those copies. In short, shard hot keys using a hashing scheme.
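A minimal sketch of that key-spreading pattern, with a plain dict standing in for a Redis client and all names being my own: writes fan out to several suffixed copies, and each reader picks one at random.

```python
import random

N_COPIES = 8  # number of spread copies; tune to your cluster size

def spread_keys(key: str, n: int = N_COPIES) -> list[str]:
    """Variants of a hot key. The differing suffixes hash to different
    cluster slots, and hence usually land on different shards."""
    return [f"{key}:{i}" for i in range(n)]

def write_hot(store: dict, key: str, value) -> None:
    # Writes fan out to every copy so that reads can pick any of them.
    for k in spread_keys(key):
        store[k] = value  # stands in for a SET call

def read_hot(store: dict, key: str):
    # Each reader picks a random copy, spreading load across shards.
    return store[random.choice(spread_keys(key))]
```

The trade-off is write amplification and the need to keep the copies consistent, which is why this pattern suits read-heavy hot keys.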

Using SCAN Instead of KEYS

The KEYS command in Redis performs exhaustive pattern matching across all stored keys. This is not recommended on instances with a large number of keys, as it can take a long time to complete and stalls the Redis instance while it runs. It is the equivalent of running an unbounded query in the relational world (SELECT ... FROM without a WHERE clause).

Use SCAN instead, which spreads the iteration over many small calls, so no single call blocks the server. Either way, matching by key name is O(N) in the total number of keys; SCAN keeps the server responsive, but it doesn't make the search cheap.
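Here is a simplified illustration of the cursor pattern; real SCAN cursors are opaque tokens rather than plain offsets, and the function names are mine:

```python
import fnmatch

def scan_like(keys: list[str], cursor: int, match: str = "*",
              count: int = 10) -> tuple[int, list[str]]:
    """One SCAN-style step: return the next cursor plus a small batch
    of matching keys, so no single call walks the whole keyspace."""
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + count if cursor + count < len(keys) else 0
    return next_cursor, [k for k in batch if fnmatch.fnmatch(k, match)]

def scan_all(keys: list[str], match: str = "*", count: int = 10):
    """Drive the cursor to completion, as a client loop would;
    a cursor of 0 signals that the iteration is finished."""
    cursor, found = 0, []
    while True:
        cursor, batch = scan_like(keys, cursor, match, count)
        found.extend(batch)
        if cursor == 0:
            return found
```

With redis-py, the equivalent loop is available as scan_iter(match="user:*"), which hides the cursor entirely.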

When you need to find data by its content, it's better to index it with Redis Search and query the index than to iterate through the keyspace.