Real-time decisioning has rapidly gained popularity within the tech zeitgeist, especially with its connection to streaming data.
It seems like everything these days either happens in real time or it doesn’t happen at all.
But companies are starting to run into a major issue with real-time data and real-time decisioning. They’re trying to fit a square peg into a round hole: that is, trying to achieve real-time data processing and decisioning with legacy architectures that just weren’t built for it. In doing so, they create massive technical debt and total cost of ownership (TCO) that slowly sinks their revenues and maybe even the entire company.
Before we look at why this has become such an issue, let’s define our terms.
What is “Real-Time Decisioning”, Anyway?
“Real-time decisioning” is a pretty loaded term, and the “real time” half in particular means very different things to different people.
In one of my first jobs, at a bank, there was a ‘real time’ system that logged transactions to a computer tape and then fed them into batch processing later that night.
So before we can have any kind of rational discussion about this, it’s important to agree on what we mean.
“Decision” is an easier concept to work with. A decision is what happens when you start handling an event without knowing for certain what the outcome will be. Rearranging fields in a record or writing a log message is not a ‘decision’. Deciding whether to connect a phone call, whether to let someone join a game, or what level of difficulty their video game should switch to are all examples of decisions.
How Do We Define Real Time?
There are two variables you need to consider when evaluating how ‘real time’ something is:
1. Average latency
Average latency is often measured in milliseconds, or thousandths of a second. An average human reaction time is between 150 and 300 milliseconds. A traditional ‘spinning rust’ hard drive will take a couple of milliseconds to return data. In an ideal environment and situation, a legacy RDBMS can do it in around 10ms, and Volt Active Data can do it in 1-2ms. Light will travel 186 miles during a millisecond. Below milliseconds you get into microseconds (millionths of a second) and nanoseconds (billionths of a second). Some exotic trading systems work in microseconds, and physicists often work with nanoseconds.
2. Long-tail latency
If you rely on an answer to control something in the real world then long-tail latency matters to you. People often refer to ‘99th-percentile latency’, which is the time within which 99% of events complete. If your average latency is 2ms but your 99th percentile is 1000ms, then roughly one event in 100 will not have finished after a second. On the other hand, if you need events to finish within 1000ms, an average latency of 800ms with a 99th percentile of 950ms will be a lot better, as it means that no more than one in a hundred events takes longer than 950ms, so fewer than 1% of events can be late. The reality is that if you are in a situation where you have a deadline to do something, average latency isn’t really the metric to use: what matters is what proportion of events miss the deadline.
When enough events finish late to disrupt activity then you have a ‘long-tail latency’ problem. A social media feed that responds instantly to 99.9% of requests and takes 20 minutes for the remainder is not really an issue. But an ATM taking 20 minutes to dispense cash would be unacceptable. It’s much more likely to be an issue if you are changing something, and the change still happens, but long after you’ve given up waiting.
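To make the difference between average and long-tail latency concrete, here is a minimal Python sketch. The sample data, the 40ms deadline, and the function name latency_summary are all invented for illustration, and the percentile uses the simple nearest-rank method rather than any particular monitoring tool’s definition.

```python
def latency_summary(latencies_ms, deadline_ms):
    """Return (mean, nearest-rank p99, fraction of events over the deadline)."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    mean = sum(ordered) / n
    # Nearest-rank p99: the smallest sample with at least 99% of samples
    # at or below it. Integer arithmetic computes ceil(0.99 * n).
    rank = (99 * n + 99) // 100
    p99 = ordered[rank - 1]
    # The number that actually matters when there is a deadline.
    missed = sum(1 for x in ordered if x > deadline_ms) / n
    return mean, p99, missed


# Hypothetical sample: 980 fast events at 2 ms and 20 slow ones at 1000 ms.
sample = [2.0] * 980 + [1000.0] * 20
mean, p99, missed = latency_summary(sample, deadline_ms=40.0)
print(f"mean={mean:.2f}ms p99={p99:.0f}ms missed={missed:.1%}")
```

With this made-up sample the mean sits comfortably under the 40ms deadline, yet 2% of events miss it by a wide margin, which is exactly the kind of gap an average hides.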
In addition to latency, in the real-time decisioning world, some decision of value that alters state somewhere usually needs to happen. If so, scaling becomes a problem. If all you’re doing is pulling entries from a cache you can scale by cloning the cache, but if you’re allocating or spending something, you can’t.
Putting all of the above together, we can define “real time” as:
Occurring within a time frame that is short enough to have real-world impacts on business decisions and update key data before it goes stale. This time frame is usually single-digit milliseconds, although the length will vary depending on the environment and use case.
Where, Why, and How Companies Get into Trouble with Real Time
The most common mistake we see is failing to understand the difference between average and long-tail latency when defining real-time data processing requirements. As the above example makes clear, you need a really clear understanding of actual deadlines, not hypothetical management ones, and what will happen if you fail to meet them.
Humans will generally wait a couple of seconds and then try again. A lot of devices will treat a latency SLA failure as if it were a transient network outage, and try again immediately. The retry could then fail because it collides with the original request, which is still in progress, leading to a storm of retries at one-millisecond intervals and related chaos.
The second mistake we see is that people will assemble a stack of ‘best of breed’ components and assume that ‘latency’ will simply be the sum of the execution times of each layer. In practice, this doesn’t happen. If you get twice the expected average latency you’re doing well, and long-tail latency can be all over the place.
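One reason long-tail latency gets worse as layers are stacked can be shown with simple arithmetic. The sketch below assumes, purely for illustration, that each layer is slow for 1% of requests and that layers misbehave independently of each other. Real stacks are messier than that, and the function name is made up.

```python
def chance_of_hitting_a_slow_layer(layers, slow_fraction=0.01):
    """Probability that a request passing through `layers` independent
    layers meets at least one of them on a slow day."""
    return 1.0 - (1.0 - slow_fraction) ** layers


for layers in (1, 3, 5, 10):
    chance = chance_of_hitting_a_slow_layer(layers)
    print(f"{layers:2d} layers: {chance:.1%} of requests hit a slow layer")
```

Under these assumptions a single layer’s one-in-a-hundred problem becomes roughly a one-in-twenty problem by the time a request has crossed five layers, even though no individual component got any worse.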
The third mistake is to undercount the number of actual network trips. A lot of applications will require multiple round trips to solve a business problem. 5ms latency for a 40ms SLA is fine, until you need to do 7 round trips, and you’re looking at 35ms latency for your 40ms SLA. Edge computing can make this a really serious issue.
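The round-trip arithmetic is worth doing explicitly before you commit to a design. This is a small sketch using the numbers from the example above. The function names and the processing_ms parameter are hypothetical, and it treats every trip as taking the same time, which real networks won’t do.

```python
def remaining_budget_ms(sla_ms, round_trips, trip_latency_ms):
    """Time left in the SLA once the network round trips are paid for."""
    return sla_ms - round_trips * trip_latency_ms


def max_round_trips(sla_ms, trip_latency_ms, processing_ms=0.0):
    """How many round trips fit in the SLA after reserving time for
    the actual work (processing_ms is an assumed, illustrative figure)."""
    return int((sla_ms - processing_ms) // trip_latency_ms)


# 7 round trips at 5 ms each against a 40 ms SLA.
headroom = remaining_budget_ms(sla_ms=40, round_trips=7, trip_latency_ms=5)
print(f"headroom left for everything else: {headroom}ms")

# If the decision logic itself needs 10 ms, fewer trips are affordable.
print(max_round_trips(sla_ms=40, trip_latency_ms=5, processing_ms=10))
```

Seven trips leave only 5ms for everything else, and that is before any long-tail effects are taken into account.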
The Tech-Stack Conundrum: You Can’t “Just Add” Real-Time Decisioning
A lot of problems in technology can be solved by using or adding more of something — hardware, bandwidth, etc. But it’s really, really hard to ‘add speed’ by adding another layer.
Caching might outwardly be the exception to this, but caching doesn’t involve decisions. Taking a decision usually means you need 100% up-to-date and accurate data. In any situation where your decision-making logic and data storage are in separate places, you run the risk of being ‘overtaken by events’: you read something, but it changes the moment you stop looking at it. You can solve that by double-checking, but that’s yet another trip.
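The ‘overtaken by events’ problem, and the cost of double-checking, can be sketched with a toy in-memory store. Everything here is hypothetical: the Store class stands in for a remote database, each method call is counted as one network trip, and the version check is one common way of double-checking rather than a description of any particular product.

```python
class Store:
    """Toy stand-in for a remote data store; each call counts as one trip."""

    def __init__(self, balance):
        self.balance = balance
        self.version = 0
        self.trips = 0

    def read(self):
        self.trips += 1
        return self.balance, self.version

    def write_if_unchanged(self, new_balance, expected_version):
        """The double-check: only write if nobody changed the record."""
        self.trips += 1
        if self.version != expected_version:
            return False
        self.balance = new_balance
        self.version += 1
        return True


def spend(store, amount, interfere=None):
    """Approve a spend only if the balance covers it, retrying on conflict."""
    while True:
        balance, version = store.read()
        if interfere is not None:
            interfere()       # another client acts right after our read
            interfere = None  # only interfere once
        if balance < amount:
            return False
        if store.write_if_unchanged(balance - amount, version):
            return True


store = Store(balance=100)


def rival():
    # Simulates a competing client spending 80 between our read and write.
    store.balance -= 80
    store.version += 1


approved = spend(store, 50, interfere=rival)
print(approved, store.balance, store.trips)
```

In this run the first read sees a balance of 100, the rival spends 80, the conditional write is rejected, and a second read correctly refuses the spend. The decision is right, but it took three trips instead of two, and without the check the balance would have been silently overwritten.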
In short, once you’ve built your solution you may find it really, really hard to reduce real-time latency without a rewrite. You can’t make an existing stack faster by adding more stuff, any more than you can make a finished cake bigger by piling more raw ingredients on top.
What Can You Do? Here Are a Few Tips
So how do you reach your goal of consistent low-latency real-time decisions if you can’t do it by adding ‘more’ of something?
There’s no easy answer.
You’ll need to start from the available time budget and architect with that in mind as a hard limit.
One design goal should be to have as few different layers as possible, to avoid time being wasted on internal communications.
You may also find that different parts of the system operate on different timescales, and you might be able to split the system into a ‘fast’ part and a ‘really fast’ part.
Using really fast CPUs might help, but will at best give you a 50% speedup when you could need 500%.
While there is no obvious, one-size-fits-all answer, once you accept that the fundamental constraint in any real-time decisioning system is latency, your choices will become simpler and better.