We’ve been hearing a lot about real-time data platforms, streaming data, and the messaging technologies that enable streaming, such as Apache Kafka.
The idea is to turn your streaming data into real-time data by using it while it still has value. This isn’t always easy to achieve. In fact, it’s almost never easy to achieve, and most companies are still struggling to do it day in, day out without sacrificing other things, such as consistency and reliability.
This blog explores why.
Real-time data platforms are rarely 100% real time
The first thing we need to consider is that very few real-time data platforms exist in some kind of mysterious void where they don’t need to speak to any other, slower system. Real-time latency might be a non-negotiable requirement of what you’re building and/or doing, but it rarely works alone. At a minimum, it needs to tell some back office system somewhere what it’s done, so that the company can update its financials.
The same thing is true in reverse: all real-time systems have, at a minimum, an updatable list of products and their prices, or something functionally equivalent. So while upstream feeds might be small in comparison, they’re still critical to the functioning of the system as a whole.
Doing real-time and streaming at the same time is challenging
Historically, sending and receiving data to and from real-time systems was an awkward batch process that tended to get tacked on as an afterthought. Legacy RDBMS products weren’t very good at this, as you tended to get big, expensive SQL operations that had a nasty habit of going wrong at two in the morning. In theory, the widespread adoption of streaming ought to make life a lot simpler, but the reality is sadly different. Real-time and streaming systems see the world very differently from the batch systems they replace. For example, there isn’t really such a thing as ‘complete’ in a streaming world, whereas we used to be able to assume the contents of a file or the results of a big query were, by definition, up to date.
There’s a fairly deep-seated and hard-to-fix ‘impedance mismatch’ that isn’t visible when you first sketch this out on a whiteboard but becomes visible at implementation.
The first issue is that most real-time systems think in terms of 1-10ms, whereas streaming thinks in multiples of 100ms. So a real-time system might have finished with something and moved on to another task long before the streaming subsystem becomes aware the data item even exists. The problem starts a few seconds later, when you want to know an aggregate, like the total number of widgets you sold in the last minute. Because there’s a lag of undefined duration between an event happening and you reliably knowing about it downstream, there’s a window of time in which neither the real-time system nor your back-office system knows for certain what’s going on.
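This mismatch can be made concrete with a small sketch. The numbers below are invented for illustration: a fixed 500ms delivery lag between the real-time system and the streaming layer, and a handful of sale events. The point is that the same aggregate, asked at the same moment, gives two different answers depending on which side of the lag you sit.

```python
# Illustrative sketch (hypothetical numbers): why a per-minute aggregate
# queried downstream can disagree with what the real-time system already knows.

# Events the real-time system processed, as (event_time_ms, widgets_sold).
events = [(0, 1), (150, 2), (400, 1), (900, 3)]

STREAM_LAG_MS = 500  # assumed delivery lag to the streaming layer


def realtime_total(events, query_time_ms):
    """Total the real-time system itself has already counted."""
    return sum(n for t, n in events if t <= query_time_ms)


def downstream_total(events, query_time_ms, lag_ms=STREAM_LAG_MS):
    """Total visible downstream: only events already delivered by query time."""
    return sum(n for t, n in events if t + lag_ms <= query_time_ms)


# At t=1000ms the real-time system has counted every sale...
assert realtime_total(events, 1000) == 7
# ...but downstream only sees events at least 500ms old: the 900ms sale is missing.
assert downstream_total(events, 1000) == 4
```

In a real deployment the lag isn’t a neat constant, which is exactly why the gap is so hard to reason about.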
Scaling is another issue. Popular streaming systems such as Kafka can be very brittle when it comes to rapid changes in volumes, so people tend to tolerate transient backlogs, which makes the time-lag issue we just discussed even worse.
In-house solutions connecting real-time data processing to streaming data are problematic
The obvious approach to this issue is for the developers to build some form of CDC (Change Data Capture) functionality into the base application. This is a lot of work, much more than it first appears, because if you want your downstream numbers to match, it needs to be an ACID-compliant CDC system, with a guaranteed 1:1 relationship between an event happening and a CDC message reaching its destination, even in the case of outages.
Most NoSQL products or DataGrids can’t do this without heroic levels of effort by developers, and you’re often left with an unpleasant choice between slow and complicated or fast and unreliable. The bottom line is that CDC systems are complicated and tricky enough to spawn entire families of products to do the job.
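One common way to get that 1:1 guarantee is the ‘transactional outbox’ pattern, sketched below with SQLite for brevity. Table and column names are invented for illustration. The key idea is that the business write and the outbound CDC message are committed in a single transaction, so either both exist or neither does; a separate relay then drains the outbox.

```python
import sqlite3

# Minimal sketch of the 'transactional outbox' pattern: the business row and
# the CDC message are written in ONE atomic transaction, giving a guaranteed
# 1:1 relationship between an event and an outbound message, even across crashes.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, widget TEXT, qty INTEGER);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0);
""")


def place_order(widget, qty):
    with db:  # single atomic transaction: both inserts commit, or neither does
        db.execute("INSERT INTO orders (widget, qty) VALUES (?, ?)", (widget, qty))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (f'{{"widget": "{widget}", "qty": {qty}}}',))


place_order("sprocket", 3)

# A separate relay process would poll the outbox, publish each row to the
# stream, and mark it sent; after an outage it simply resumes from unsent rows.
unsent = db.execute("SELECT payload FROM outbox WHERE sent = 0").fetchall()
assert len(unsent) == 1
```

This is simple to sketch and genuinely hard to productionize, which is why, as noted above, entire product families exist to do the job.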
What about buying an off-the-shelf streaming CDC solution and adding it to my data platform?
This may or may not help. Assuming the new system actually works as advertised, it will allow you to get data out of your real-time data processing platform reasonably quickly and effectively, but you still need to consider the following caveats and ‘gotchas’:
Most CDC systems were designed with smaller scales in mind.
Modern systems are producing 10 to 100x the output of ones from a few years ago. Any CDC system you integrate needs to keep pace not just with the demands you face today but with those you’ll face 3-5 years from now.
Copying every change to every record may not be what you actually want.
For a lot of modern systems, the sheer amount of activity means that sending every single change downstream may overwhelm back-end systems. In many cases, you’d be better off doing what the telco industry calls ‘aggregation’ — consolidating a stream of hundreds of technical events into a much smaller one consisting of ‘business-relevant’ events. By aggregating before records are sent downstream, you avoid having to build an elaborate system to do the same aggregation on arrival.
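A minimal sketch of that telco-style aggregation, with invented event shapes: many fine-grained technical events (here, per-packet byte counts) collapse into one business-relevant usage record per session before anything crosses to the back end.

```python
from collections import defaultdict

# Hypothetical sketch of telco-style 'aggregation': collapse a stream of
# fine-grained technical events into far fewer business-relevant records
# before sending anything downstream. Event shapes are invented for illustration.
technical_events = [
    {"session": "a", "bytes": 512},
    {"session": "a", "bytes": 2048},
    {"session": "b", "bytes": 128},
    {"session": "a", "bytes": 1024},
]


def aggregate(events):
    """Roll per-packet events up into one usage record per session."""
    totals = defaultdict(int)
    for e in events:
        totals[e["session"]] += e["bytes"]
    return [{"session": s, "total_bytes": n} for s, n in sorted(totals.items())]


records = aggregate(technical_events)
# Four technical events became two business-relevant records.
assert records == [
    {"session": "a", "total_bytes": 3584},
    {"session": "b", "total_bytes": 128},
]
```

The downstream system now receives a record per session rather than a record per packet, which is usually what the business actually wanted to know in the first place.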
Off-the-shelf CDC may have the same problems with latency and lag we discussed earlier.
Without extensive testing, it’s hard to tell whether an off-the-shelf CDC product will work. A fundamental issue is that, ironically, precisely because this is a mature market sector, many of the offerings were designed for yesterday’s scale and will struggle with the volumes we can expect in the future.
How Active(SD) Helps
Volt’s Active(SD) [Streaming Decisions] combines processing, storage, and streaming into a single platform, allowing companies to avoid the issues mentioned above. Active(SD) can store data to make sense of late-arriving events, then compare incoming records with stored ones, and finally stream the data you need when you need it. A key feature is that it can be made to look like a Kafka cluster on your network, so existing applications that send data to Kafka can be re-plumbed so they send it to Volt.
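Because Active(SD) can present itself as a Kafka cluster, the re-plumbing is, in principle, a connection-configuration change rather than a code change. The sketch below illustrates the idea with a generic Kafka producer configuration; the hostnames are invented and do not refer to real endpoints.

```python
# Illustrative only: if the new platform speaks the Kafka protocol, an existing
# producer keeps its code and topics and simply points at a different endpoint.
# Hostnames below are hypothetical.
producer_config = {
    "bootstrap.servers": "kafka-broker-1:9092",  # current Kafka cluster
    "acks": "all",
}

# Re-plumb: same client, same topics, different endpoint.
producer_config["bootstrap.servers"] = "volt-activesd-1:9092"

assert producer_config["bootstrap.servers"] == "volt-activesd-1:9092"
```

In practice you would verify protocol compatibility (client versions, security settings, and so on) against the vendor’s documentation before making the switch.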