I work on the Volt Active Data Scrum team tasked with creating interfaces, both import and export, to many other applications and systems in the broader ecosystem of big and fast data. It’s an always-interesting place to be, since we’re at the center of the very rapidly evolving world of open source and private platforms, applications and solutions.
Today I’m writing about a recent integration, released in Volt Active Data v6.6, that allows system builders to connect Volt Active Data in either a data producer or a data consumer role with Amazon (AWS) Kinesis streaming services.
This is another step in our continuing commitment to build and support connections to a variety of producers and consumers in the big and fast data world. That includes connectors to Hadoop, Kafka, RabbitMQ, and Elasticsearch, as well as to generic interfaces like JDBC to connect to other database systems, HTTP, and file import and streaming export (CSV, TSV).
The goals are consistent – easy to set up and use, high performance, and the no-compromises strong ACID compliance that’s at the core of Volt Active Data.
Streaming to Amazon Kinesis Firehose
For the first case, Volt Active Data is the data source with respect to Amazon Kinesis. Let’s imagine a use case where client applications load high-frequency stock trade records into a Volt Active Data cluster for validation and aggregation. Since Volt Active Data is an in-memory database, we set up time windowing procedures to insert rows into streams (export tables) and delete the corresponding rows from the in-memory tables.
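The windowing idea can be pictured with a small, self-contained Java sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, a sorted map stands in for the in-memory table, and a plain list stands in for the export stream. In a real application this logic would live in stored procedures, where the insert into the stream and the delete from the table happen in one transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a time-windowing procedure: rows that fall out
// of the window are copied to the "stream" and removed from the "table".
final class TradeWindowSketch {
    // In-memory "table", keyed by trade time in milliseconds.
    private final TreeMap<Long, List<String>> table = new TreeMap<>();
    // Stand-in for the export stream (append only).
    private final List<String> exportStream = new ArrayList<>();
    private final long windowMs;

    TradeWindowSketch(long windowMs) {
        this.windowMs = windowMs;
    }

    void insertTrade(long tradeTimeMs, String row) {
        table.computeIfAbsent(tradeTimeMs, k -> new ArrayList<>()).add(row);
    }

    /** Moves every row older than (nowMs - windowMs) to the stream; returns the count. */
    int ageOut(long nowMs) {
        // headMap returns a live view of the keys strictly below the cutoff.
        Map<Long, List<String>> expired = table.headMap(nowMs - windowMs, false);
        int moved = 0;
        for (List<String> rows : expired.values()) {
            exportStream.addAll(rows);
            moved += rows.size();
        }
        expired.clear(); // clearing the view deletes the rows from the table
        return moved;
    }

    int rowsInTable() {
        int n = 0;
        for (List<String> rows : table.values()) {
            n += rows.size();
        }
        return n;
    }

    List<String> exported() {
        return exportStream;
    }

    public static void main(String[] args) {
        TradeWindowSketch window = new TradeWindowSketch(60_000L); // 60-second window (arbitrary)
        window.insertTrade(0L, "A");
        window.insertTrade(30_000L, "B");
        window.insertTrade(90_000L, "C");
        System.out.println("moved: " + window.ageOut(100_000L));
        System.out.println("still in table: " + window.rowsInTable());
    }
}
```

The point of the sketch is the pairing: every row that leaves the table lands in the stream in the same step, so nothing is lost and nothing is exported twice.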
The plumbing to transmit streaming data from Volt Active Data to Amazon is the Volt Active Data Kinesis Firehose Export Conduit, freely available at https://github.com/VoltDB/export-kinesis.
A stream in Volt Active Data can be thought of as a virtual table. This table doesn’t hold state; rather, it defines an outbound schema that can be consumed, e.g. streamed, by an external system. This enables a Volt Active Data application to easily, and transactionally, export data from Volt Active Data to another system – in this example, Kinesis – but also to many other systems such as Kafka, HDFS, relational databases, etc.
Here’s an example of a simple stream definition:
In Kinesis Firehose, we’ve already created the stream “streamdata” and configured its handling in the world of Amazon Services. This involves creating the delivery stream, directing the data flow through Amazon S3, and then specifying that the rows of data are inserted into Amazon Redshift, a Postgres-like database for persistent storage hosted on an EC2 cluster.
For “demo” applications in AWS, the default data transfer rate to Amazon using the Kinesis Firehose service is quite limited: 2,000 transactions per second, 5,000 records per second, and 5 MB of data per second. However, to support our development and testing at typical Volt Active Data transaction rates, the Amazon Kinesis team graciously provisioned much higher transfer rates, and will likewise work with business teams to support a wide range of data transfer requirements.
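To see how those three default limits interact, here is a minimal back-of-the-envelope sketch in Java. It assumes all three limits apply at once, that a “transaction” is one API call that may carry several rows, and that 5 MB means 5 × 1024 × 1024 bytes. The class name, batch sizes and row sizes are made up for illustration and are not part of any Volt Active Data or AWS API.

```java
// Sketch: which default Firehose limit caps row throughput first?
// The three limits come from the text; everything else is an assumption.
final class FirehoseLimits {
    static final long MAX_CALLS_PER_SEC = 2_000;             // "transactions" per second
    static final long MAX_RECORDS_PER_SEC = 5_000;           // records per second
    static final long MAX_BYTES_PER_SEC = 5L * 1024 * 1024;  // 5 MB per second

    /** Highest sustainable rows per second when all three limits hold at once. */
    static long maxRowsPerSecond(int rowsPerCall, int bytesPerRow) {
        long byCalls = MAX_CALLS_PER_SEC * rowsPerCall;
        long byBytes = MAX_BYTES_PER_SEC / bytesPerRow;
        return Math.min(MAX_RECORDS_PER_SEC, Math.min(byCalls, byBytes));
    }

    public static void main(String[] args) {
        // One 200-byte row per call: the call limit binds.
        System.out.println(maxRowsPerSecond(1, 200));     // prints 2000
        // 100 rows per call: the record limit binds.
        System.out.println(maxRowsPerSecond(100, 200));   // prints 5000
        // Larger 2 KB rows: the byte limit binds.
        System.out.println(maxRowsPerSecond(100, 2048));  // prints 2560
    }
}
```

With small rows sent one per call, the call limit is reached first; batching moves the cap to the records-per-second limit, and larger rows move it to the byte limit. Whichever way you slice it, the defaults sit well below typical Volt Active Data transaction rates.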
With provisioning for better performance from Amazon, our export streaming speed exceeded 28,000 rows per second. That’s still slower than Volt Active Data’s transactional capability, but as described below, the database buffers rows to disk and sends them to the Amazon consumer at its maximum consumption rate.
Behind the scenes in Volt Active Data, the streaming service is capable of buffering data on disk to match slower consumers without reducing primary database performance. In a Volt Active Data application, rows are inserted into the stream, just as you would insert data into a relational table. In the background, the Volt Active Data export stream manager delivers the rows to the stream consumer or consumers at their maximum rate. This is handled with Volt Active Data reliability.
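The decoupling described above can be pictured with a small, self-contained Java simulation. This is not Volt Active Data’s implementation: the class and method names are hypothetical, an in-memory queue stands in for the on-disk buffer, and the rates are arbitrary numbers chosen for the demo.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of an export buffer: inserts never wait for the
// consumer, and the backlog drains in order at the consumer's own pace.
final class ExportBufferSketch {
    // Stand-in for the on-disk overflow; a real system would persist this.
    private final Deque<String> backlog = new ArrayDeque<>();
    private final int consumerRowsPerTick;
    private long delivered;

    ExportBufferSketch(int consumerRowsPerTick) {
        this.consumerRowsPerTick = consumerRowsPerTick;
    }

    /** Database side: append the row and return immediately. */
    void insert(String row) {
        backlog.addLast(row);
    }

    /** Export side: deliver up to the consumer's per-tick capacity, oldest first. */
    int drainOnce() {
        int sent = 0;
        while (sent < consumerRowsPerTick && !backlog.isEmpty()) {
            backlog.removeFirst(); // a real conduit would transmit the row here
            sent++;
        }
        delivered += sent;
        return sent;
    }

    int pending() {
        return backlog.size();
    }

    long delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        ExportBufferSketch buffer = new ExportBufferSketch(28); // consumer takes 28 rows per tick
        for (int tick = 0; tick < 3; tick++) {
            for (int i = 0; i < 100; i++) {                      // producer writes 100 rows per tick
                buffer.insert("row-" + tick + "-" + i);
            }
            buffer.drainOnce();
        }
        System.out.println("backlog after burst: " + buffer.pending());
        while (buffer.pending() > 0) {                           // burst is over; consumer catches up
            buffer.drainOnce();
        }
        System.out.println("delivered in total: " + buffer.delivered());
    }
}
```

The producer loop never slows down while the backlog grows, and every row is still delivered once the burst ends. That is the property the export stream manager provides, with the buffer on disk rather than in memory.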
Streaming to Volt Active Data from Amazon Kinesis Streams
In the second case, Volt Active Data is the data consumer, ingesting rows of data from Amazon Kinesis streams and inserting them into database tables transactionally. This data can be ingested using default Volt Active Data insert transactions, or the application can specify custom business logic via a transactional stored procedure, for example to handle validation, data cleansing, aggregation, or routing.
In this example use case, Amazon Kinesis streams consume click data from numerous geographically distributed web servers. Volt Active Data consumes this stream data, validating and “sessionizing” the data using stored procedures and inserting the processed rows into appropriate tables and materialized views for aggregation and real-time analysis. It’s easy to imagine export streaming here as well, since a downstream data warehouse may well be the next stop.
We specify the Volt Active Data consumer connection to the Amazon Kinesis stream in the “import” section of the database deployment file:
Note the “procedure” property in the deployment file. This is the stored procedure that processes the incoming data. Volt Active Data calls the procedure for each incoming row. The procedure can validate the data, query other tables to “sessionize,” and insert the row into a database table. This happens as a transaction, atomic by definition, so if there are problems or errors, all the processing will be rolled back.
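To make the per-row processing concrete, here is a minimal sketch of the validate-then-sessionize step in plain Java. It is not a Volt Active Data stored procedure and uses no Volt Active Data classes: the class name, the 30-minute inactivity gap and the session-id format are all assumptions for illustration, and a hash map stands in for the session table. Validation runs before any state is touched, which mimics the all-or-nothing behavior the transaction gives you.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-row click handler: validate, assign a session, record state.
final class ClickSessionizer {
    // Assumption: 30 minutes without a click ends a session.
    static final long SESSION_GAP_MS = 30 * 60 * 1000L;

    private static final class State {
        long lastSeenMs;
        int sessionNo;
    }

    // Stand-in for the table that tracks each user's latest activity.
    private final Map<String, State> byUser = new HashMap<>();

    /** Returns a session id such as "alice-1"; throws before changing state if the row is invalid. */
    String process(String userId, long timestampMs, String url) {
        if (userId == null || userId.isEmpty()
                || url == null || url.isEmpty()
                || timestampMs < 0) {
            throw new IllegalArgumentException("invalid click row");
        }
        State state = byUser.get(userId);
        if (state == null) {
            state = new State();
            state.sessionNo = 1;               // first click seen for this user
            byUser.put(userId, state);
        } else if (timestampMs - state.lastSeenMs > SESSION_GAP_MS) {
            state.sessionNo++;                 // gap exceeded: start a new session
        }
        // Late-arriving rows must not move the clock backwards.
        state.lastSeenMs = Math.max(state.lastSeenMs, timestampMs);
        return userId + "-" + state.sessionNo;
    }

    public static void main(String[] args) {
        ClickSessionizer sessions = new ClickSessionizer();
        long minute = 60_000L;
        System.out.println(sessions.process("alice", 0L, "/home"));
        System.out.println(sessions.process("alice", 10 * minute, "/cart"));
        System.out.println(sessions.process("alice", 41 * minute, "/home"));
    }
}
```

In a real procedure the same steps would be SQL statements against tables, and a thrown exception would roll back everything the procedure had done for that row.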
Volt Active Data automatically creates “default” procedures, including a simple INSERT. If the goal is to get the row into a table as fast as possible, it can be done without any coding at all.
We’ve seen how Volt Active Data can consume (import) streaming data from Amazon Kinesis, produce (export) streaming data to Amazon Kinesis, or do both if the application design requires it.
It is easy to import data or stream exported data with no application coding at all. If complex processing and aggregation are required, that’s straightforward too, as the example in this post illustrates.
Give it a try today and let us know what you think. Download Volt Active Data here.
Volt Active Data documentation: https://docs.voltdb.com/. In particular, see “Importing and Exporting Live Data”.
Amazon Kinesis, both Streams and Firehose: https://aws.amazon.com/kinesis/