A lot of companies use Volt Active Data for classic Online Transaction Processing (OLTP), but why would you pick Volt as an OLTP data platform and how would a Volt-based OLTP data platform differ from other data platforms in its handling of OLTP?
First — let’s define our terms.
What is a Transaction?
OLTP data platforms, of course, need to handle transactions. But whis is a transaction? There are multiple definitions, ranging from the simplistic to the highly academic and technical.
For our purposes, a “transaction” is when you need to make one or more changes that meet the definition of ACID:
Atomic: Either everything happens, or nothing happens, even if you change multiple things.
Consistent: The data moves from one consistent condition to another, from the perspective of an outside observer. You can’t see half of someone else’s transaction; you either see all of it or none of it.
Isolation: Transactions act like nothing else is happening, even if other transactions are happening.
Durability: Once a transaction has been committed, it will remain committed even in the case of a system failure. Traditionally, this meant ‘on disk’. In Volt, it can also mean ‘on another running Volt node’.
OLTP Data Platform Types
Now that we’ve defined our most important terms, let’s look at the different types of OLTP data platforms.
There are three potential types of OLTP data platforms: legacy RDBMS plus SQL, NoSQL, and Volt.
The most important thing to understand is that Volt treats transactions differently from other OLTP data platforms. Let’s look at each type and then how is different.
Legacy RDBMS: Lots of network calls, one transaction
Traditional RDBMS products are not easy to use as OLTP data platforms because they have lots of network trips in the form of SQL calls, which collectively make up one transaction. While making life simple for developers, the large number of network trips makes low latency near impossible, and the fact that the transaction can, in theory, take hours causes chaos for the back-end database, especially when working with shared, finite resources.
The good news is that all of your SQL calls are effectively invisible to other users until the transaction finishes. This is important if you are, for example, moving money between account ‘A’ and account ‘B’.
Legacy NoSQL: Lots of network calls, each of which is a transaction, 4X acceleration
NoSQL normally provides one transaction per network trip, which means the server is much simpler and faster. However, developers have to live in a world where they can see other people’s half-completed business events, and undoing something if a failure happens halfway through a business transaction is somewhere between ‘really hard’ and ‘impossible’. Even if you can remember what to undo, other people may have already made decisions based on changes you temporarily made.
The good news is that NoSQL is generally about 3-4x faster than a legacy RDBMS. The bad news is that even allowing for a faster server, things can still end up slow because of the sheer volume of network trips needed.
Volt: One network call, one transaction, 9x acceleration
By allowing you to do one big transaction in one network trip, Volt gives you the best of both worlds as an OLTP data platform. To make this work, you don’t use SQL directly. Instead, you send Volt the name of a Volt procedure and some parameters. Once your call reaches Volt, it passes the parameter to a Java class that matches the Volt Procedure name. The class interweaves SQL statements and Java to implement your business logic where the data is, which is why Vol only needs one network trip.
Volt also runs one dedicated thread for each physical core and executes each Volt Procedure call one at a time, to completion, asynchronously. Unlike almost every other data platform out there, Volt spends zero time trying to do more than one thing at once; each core (or ‘partition’) has a queue of requests and will process 5,000 or more requests on each core per second. This makes Volt 3-4 times faster than many NoSQL platforms, even when breaking up transactions into multiple calls.
Simple OLTP Scenario
Let’s imagine you have to keep track of customers spending money or credit in an online store. The data structures look like this:
- Customer_id number
- Customer_zipcode int
- Customer_balance float
- Customer_id number
- Transaction_id number
- Item_id number
- Qty number
- Item_Price float
- Item_salestax float
- Purchase_value float
- Transaction_date timestamp
- Zip_code int
- Tax_rate_pct float
- To make things slightly more complicated, the back office system needs to be told about purchases as soon as possible after the purchase.
- You also have to make sure that only one purchase can happen at a time and that the customer balance is always correct.
- You need to calculate and charge sales tax based on the zip code.
- You need to keep track of total spend per customer.
- Customers can’t spend money they haven’t got.
How we’d do this in Volt
Here’s how we’d do the same thing in Volt:
1) Determining a partitioning scheme
The first thing to do is to determine what gets partitioned. ‘Partitioning’ is used to break up the workload across CPU cores in Volt, as all requests to read or write data in the same partition will be handled by the same CPU core. Not all data has to be partitioned; data that doesn’t change often can be left to be shared by everyone. In this case, the obvious way to partition things is by customer_id. Since sales taxes rarely change, they can be global and unpartitioned. Unpartitioned things can be updated like anything else, but it’s slower.
2) Defining tables and streams
Below is the first part of the Data Definition Language (‘DDL’) to create our schema. As you can see, it maps closely to the description above. The “PARTITION TABLE” statements are important. In addition to allowing us to evenly distribute data among the CPU cores of the server, they also guarantee that Volt Procedure calls for a given customer_id will always be executed sequentially, which means we don’t need to worry about locking or ACID.
Note that we record purchases to ‘purchase_stream’. In Volt, a stream is like a table, but one you can only insert into. Streams have ACID semantics, so data inserted is only visible if and when your transaction finishes properly. In this case, we pipe purchase_stream into purchase_topic, which can be read using a Kafka client.
This code can be found in create_db.sql.
3) Creating views
We also create two database views. In most SQL databases, a ‘view’ is a SQL statement that gets fired when you SELECT from it. Volt does things differently. In Volt, a ‘view’ is a table that gets updated as the underlying tables change. Volt does this very efficiently (adding a view slows down execution by 1-2%).
We create two views. Purchase_history has one row per customer and contains the total value of all purchases for that customer. It calculates this by watching purchase_stream. So even though you can’t SELECT from purchase_stream, you can still use the view to find the total value of purchases for a customer. We also create purchase_history_by_zip, which allows you to track sales by zip code. When you read from purchase_history_by_zip, a request is sent to all the partitions for ‘their bit’, and the results are consolidated before being sent back to you.
4) Creating Volt Procedures
We create two Volt Procedures. One is linked to Java code, and the other is defined in DDL.
“SpendMoney” is defined in Java. It allows us to implement non-trivial logic to solve our business problem — in this case, a purchase — in one round trip to the database.
QueryCustomer tells us everything we know about a given customer. Even though it has multiple SQL statements, we can define it using DDL. The only downside is that we have to pass in the same customer_id three times.
5) Creating metadata
We then create customers and zip codes so people can buy stuff. Note that customer 4 is in a zip code, 20151, for which we don’t know how to calculate tax.
6) Implementing the logic in Java and SQL
We’ve put our logic into a Volt Procedure called ‘SpendMoney’, but how does it work? We start by defining the SQL statements we want to issue, using ‘?’ to indicate parameters:
We then get to the ‘run’ method. Every Volt Procedure has one. It takes ‘Object…’ as parameters. When we extend the class we define this in more detail, like below. All VoltProcedures return an array of VoltTable. A VoltTable is a ‘pass by value’ data type; unlike a SQL cursor, it has no connection to Volt once you get it back at the client side. If your procedure doesn’t have anything to return, it can send a zero-length array of VoltTable back.
Once we start running, we need to check for errors before we change anything. We do this by seeing if the user is valid and calculating the sales tax for them if we can.
At this point, it becomes important to understand ‘voltQueueSQL’ and ‘voltExecuteSQL()’. We are writing code in Java, which is easy to write but hard to create applications with at a predictable low latency. So we use C++ to store data and execute SQL statements. When you call ‘voltQueueSQL’, you are telling the C++ part of Volt that you want to execute a SQL statement. You can do this any number of times before you call ‘voltExecuteSQL()’, which sends your queued SQL to the C++ core and returns the results. Note that voltExecuteSQL() does not do a ‘commit’, unless you call voltExecuteSQL(true) – an implicit commit happens if you get to the end of the run method and return something.
voltExecuteSQL() returns an array of VoltTable. By calling the boolean method ‘advanceRow()’, we can see if a given table is empty or not. In our case we won’t return more than one row, but you could return thousands.
Having established this is a viable transaction, we update the balance, record the purchase, and write a record to the output stream, and finish.
7) Running some simple tests
We can now run some simple tests. We don’t even need a client for this; we can use the command line interface ‘sqlcmd’. The commands below are found in testrun.sql.
By default, sqlcmd exits if a procedure throws an error and you’ll need to tell it not to do that when you run it. You could also just start sqlcmd and paste the statements in one at a time in you felt like it:
In this simplified walkthrough, we’ve shown how a simple OLTP application works in Volt. In the next one, we’ll create a client to call it programmatically and also explain how Volt does high availability.