Recently the Forrester group has started to talk about “translytics” and Translytical Databases – databases that can “Deliver Analytics At The Speed Of Transactions”. Is this more marketing hype? Or is there something to it?
A traditional OLTP database handles lots of rapid-fire transactions, but doesn’t do much more than that. An OLTP database application generally involves:
- “Shared finite resources” – something of value is being measured, allocated or used. Sometimes multiple transactions will try to use the same shared finite resource. In a legacy RDBMS this is handled with row-level locking.
- Large numbers of similar ACID transactions.
- Low latency, generally in the single-digit millisecond range.
The term “Translytics” refers to OLTP with additional features – so OLTP is a subset of Translytics, which adds:
- “Counting” – A translytical database needs to keep accurate running totals that are continually being changed by arbitrary groups of transactions. “How many cars are in this zip code right now?” is an example.
- “Aggregating” involves taking a stream of incoming transactions and generating new data that consolidates many pieces of data from multiple transactions. How much bandwidth your home DSL connection uses each hour is an example of aggregation: we turn an unpredictable number of input records into 24 output records per day per subscriber. In some use cases we can get away with guesswork; in others we may have a legal requirement for 100% accuracy.
- “Telling” is the process of sending data to downstream systems. At some point the rest of the world needs to know about the wonderful OLTP work you are doing, but how does the outside world know what to check? It’s common to see scenarios where enormous amounts of time and energy are spent polling an OLTP system to see what has changed.
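The “aggregating” idea above can be made concrete with a minimal Python sketch (the record layout and function name are illustrative assumptions, not anything from Volt): an unpredictable number of per-event usage records collapses into one row per subscriber per hour, i.e. at most 24 output rows per subscriber per day.

```python
from collections import defaultdict
from datetime import datetime

# Each usage record is (subscriber_id, timestamp, bytes_used) -- a
# hypothetical layout chosen for illustration. Aggregation collapses an
# unpredictable number of these into one row per subscriber per hour.
def aggregate_hourly(records):
    totals = defaultdict(int)
    for subscriber, ts, nbytes in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # bucket key
        totals[(subscriber, hour)] += nbytes
    return dict(totals)

records = [
    ("alice", datetime(2024, 1, 1, 9, 15), 1200),
    ("alice", datetime(2024, 1, 1, 9, 47), 800),
    ("alice", datetime(2024, 1, 1, 10, 5), 500),
]
hourly = aggregate_hourly(records)
# Three input records become two hourly output rows for "alice".
```

The hard part at scale is not this arithmetic but doing it accurately while the input stream never stops, which is where the database architecture matters.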
Now surely all this is easy? At a small scale of a few transactions per second it isn’t an issue. But as workloads grow, this turns into a significant computational problem.
Legacy RDBMS products can’t handle significant translytical workloads.
When you issue a query in a legacy RDBMS, it attempts to give an answer based on what the data looked like at the moment you executed the query. For fast-running queries this isn’t a factor, but the longer a query runs, the harder the server has to work to ‘remember’ what all the rows looked like when the query started, as they may have been changed repeatedly since then. Doing this for one query is manageable, but when hundreds of these queries are being issued every second for “counting” and “aggregating”, the overhead becomes totally unmanageable.
NoSQL products can’t do accurate aggregation and counting in real time at scale.
NoSQL products find “counting” and “aggregating” even harder, as their implementations of ACID are generally limited to a single record or key/value pair. This makes it really hard to count or aggregate more than one ‘thing’ with any degree of accuracy. “Telling” can also be extremely difficult, as some products (such as Cassandra) don’t like the kind of range queries you need to issue to get blocks of records to feed to downstream systems.
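The single-record ACID limitation can be shown with a tiny Python sketch (the in-memory “store” is a stand-in for a generic key/value database, not any specific product’s API). A transfer between two keys is two separate single-key writes, so a reader that sums both keys between the writes gets an inaccurate count:

```python
# Hypothetical key/value store where each single-key write is atomic,
# but there are no multi-key transactions.
store = {"a": 100, "b": 100}

def debit_a(store):
    store["a"] -= 10     # first single-key write: atomic on its own

def credit_b(store):
    store["b"] += 10     # second, separate single-key write

debit_a(store)
observed = store["a"] + store["b"]   # a reader runs between the two writes
credit_b(store)
final = store["a"] + store["b"]
# observed is 190 -- 10 units have temporarily "vanished" -- while the
# eventual total, final, is back to 200.
```

With multi-record ACID transactions the intermediate state is never visible, which is why accurate cross-record counting is so much easier in a transactional system.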
Volt Active Data’s architecture is ideal for Translytics.
Volt Active Data is very good at traditional OLTP workloads, with customer workloads routinely exceeding 100K TPS and sometimes much more than that. But it also has features which are incredibly useful for Translytics:
Volt Active Data’s Materialized Views work differently from those in a traditional RDBMS. In a traditional RDBMS a view is a SQL query disguised as a table. When you query the ‘view’ you actually run the underlying query, possibly with some additional complications based on your WHERE clause. While this implementation of ‘views’ doesn’t slow down inserts, it can be very expensive to read from, leading to the ‘counting’, ‘aggregating’ and ‘telling’ problems we discussed above.
In Volt Active Data a materialized view is actually a real table that is updated every time you update the underlying table(s) it’s based on. It turns out that the incremental cost of doing this in Volt Active Data is tiny – a couple of percent increase in execution time. But when you “SELECT” from a Volt Active Data Materialized View you simply read the answers you want, without having to issue an expensive read-consistent query that touches thousands of volatile rows in the underlying tables.
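The general technique – incremental view maintenance – can be sketched in a few lines of Python. This is a conceptual illustration only, not Volt’s implementation; the class and method names are invented. Every insert into the base table also adjusts the pre-aggregated view row, so answering “how many cars are in this zip code?” is an O(1) lookup rather than a scan:

```python
# Conceptual sketch of incremental view maintenance (hypothetical names;
# not Volt Active Data's actual internals or API).
class CarsByZip:
    def __init__(self):
        self.rows = []    # base table: (car_id, zip_code)
        self.view = {}    # "materialized view": zip_code -> running count

    def insert(self, car_id, zip_code):
        # In a real system both writes happen in the same transaction,
        # so the view can never drift out of sync with the base table.
        self.rows.append((car_id, zip_code))
        self.view[zip_code] = self.view.get(zip_code, 0) + 1

    def cars_in_zip(self, zip_code):
        # Equivalent to reading one pre-computed row from the view --
        # no scan of self.rows, however large it grows.
        return self.view.get(zip_code, 0)

db = CarsByZip()
db.insert(1, "94103")
db.insert(2, "94103")
db.insert(3, "10001")
```

The design trade-off is a small, fixed cost on every write in exchange for reads that no longer depend on the size or volatility of the underlying table.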
A classic problem with OLTP systems is telling other connected systems what’s going on at the aggregate level. If we’re tracking prepaid phone usage, at some point we’ll need to report how many calls were made, how much they cost, and so on. As we’ve mentioned above, polling for this doesn’t scale.
Volt Active Data has a concept of an “Export Stream”. It’s like a table, except that you can only INSERT into it. When you insert a row it’s placed in an ‘At Least Once’ queue that is attached to any valid external destination, be it HDFS, Kafka, HTTP or one you implement yourself. INSERTs have all the ACID semantics that other statements do, so an INSERT either happens as part of a transaction or doesn’t happen at all. Export Streams scale because they exist at the partition level, so if you have 12 partitions and 2 export streams you’ll have 24 independent export queues in total. How fast a record reaches its destination is heavily dependent on the platform you are exporting to – for example if you write to HDFS, records will take longer to become visible than if you write to a CSV file on the database server.
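The partition-level scaling property can be illustrated with a short Python sketch (again, invented names – this is not Volt’s API). Each declared stream owns one queue per partition, so the number of independent queues is streams × partitions, and rows are routed by their partitioning key:

```python
from collections import deque

# Illustrative model of partition-local export queues (hypothetical
# class, not Volt Active Data's actual API).
class ExportStream:
    def __init__(self, name, partitions):
        self.name = name
        # One independent queue per partition -- this is what lets
        # export throughput scale with the partition count.
        self.queues = [deque() for _ in range(partitions)]

    def insert(self, key, row):
        # Route by partitioning key, as the owning transaction's
        # partition would; downstream connectors drain each queue.
        self.queues[hash(key) % len(self.queues)].append(row)

PARTITIONS = 12
streams = [ExportStream("calls", PARTITIONS), ExportStream("topups", PARTITIONS)]
total_queues = sum(len(s.queues) for s in streams)
# 2 streams x 12 partitions -> 24 independent queues

streams[0].insert("subscriber-42", {"duration_seconds": 30})
```

Because no queue is shared across partitions, there is no central bottleneck: adding partitions adds export capacity in the same proportion.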
Export Streams can also be combined with Materialized Views, so you can write information downstream while retaining a useful summary of state, such as the last time a customer was seen.
From a developer’s perspective, export streams allow arbitrary messages to be sent to downstream systems with high performance, minimal hassle and ACID semantics. They also let us perform the crucial “Telling” part of a Translytics workload, as there’s no point in being able to do aggregate calculations in real time if you can’t tell anyone.
“Translytics” isn’t just a fancy name for OLTP. It combines the traditional, well-understood functions of an OLTP system with the modern requirement to produce aggregated and totalled data in real time so that business needs can be met. Both legacy RDBMS and NoSQL products struggle to meet this goal, whereas Volt Active Data has a significant architectural advantage. Don’t take my word for it. Try Volt today.