OEMs: The 6 Questions to Ask Your Data Platform Vendors

OEMs face incredible challenges today. They’re being asked to do more with less as supply chain and operation overrun issues create a legitimate OEM crisis.

And yet, more often than not, OEMs don’t consider their underlying tech stack until they are way too deep into said crisis. By that point, the amount of money they need to spend to get out of the crisis is almost panic-inducing, and often it’s just too late. No tech investment can save them.

Finding the right data platform BEFORE you get mired in an emergency is key. But choosing the best possible data platform to embed in your solution isn’t easy.

Here are the questions we think OEMs should ask when selecting a data platform for their applications.

1. Where and how can I run your product?

This might sound silly, but it isn’t. Not having a clear understanding of the types of environments (i.e., cloud vs. on-prem) the data platform can operate in can result in insurmountable issues down the line.

Consider the cloud. A lot of data platforms are now only available in a specific hyperscaler’s cloud. While this might make early development a lot simpler, as development finishes, problems can arise around one’s choice of hyperscaler. Will your potential customer be willing to buy your product if it only runs on a certain hyperscaler’s cloud, one they don’t like?

Another issue is geo-distribution. If you pick a data platform that can only be deployed in a set number of geographic locations, you may struggle to meet increasingly stringent latency SLAs, because the limits of physics put a floor on how quickly data can travel between distant sites.
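The floor that physics puts on latency can be estimated with simple arithmetic. The sketch below assumes signals travel through optical fiber at roughly two thirds of the speed of light in a vacuum (about 200,000 km/s) and ignores routing, queuing, and processing delays, so real-world numbers will be worse. The function name and the sample distances are illustrative assumptions.

```python
# Rough lower bound on network round-trip time imposed by physics.
# Assumption: light in optical fiber travels at about 2/3 of c (~200,000 km/s).
SPEED_IN_FIBER_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip in milliseconds over a straight fiber run."""
    one_way_s = distance_km / SPEED_IN_FIBER_KM_PER_S
    return 2 * one_way_s * 1000.0


if __name__ == "__main__":
    # Sample distances are arbitrary; plug in the distance to your nearest deployable region.
    for km in (100, 1_000, 6_000):
        print(f"{km:>6} km -> at least {min_round_trip_ms(km):.1f} ms round trip")
```

If the nearest region a platform can be deployed in is thousands of kilometers from your customer, a large share of a tight latency SLA is consumed before any work has been done.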

The third issue to consider is regulatory compliance. In Europe, a lot of consumers are unhappy with the notion of personal data being processed overseas. Legislation such as the GDPR can make a simple technical decision into a complicated legal one. GDPR is not a one-off, and similar legislation is going to be enacted in many more jurisdictions.

2. How much ongoing ‘care and feeding’ does your platform need for a real-world, large, enterprise deployment?

With legacy databases there was a built-in assumption that you’d need a full-time team of DBAs to support a large deployment, even of something that was, in theory, a product. This, in turn, sometimes led to local DBAs making problems worse, or sometimes conjuring problems out of thin air by (ab)using their mandate to ‘tune’ the system. 

Data platforms are like car engines: the more moving parts and levers and valves you have, the more likely something, at some point, will break, rendering your car undriveable, at least temporarily. Likewise, many newer data platforms have made the same mistake as legacy products: they have far too many knobs and levers and things that can break them if someone tweaks them in the wrong way.

So if you are in the business of making a product that includes an embedded data platform, what you actually need is a data platform with as few valves, levers, and moving parts as possible, so that there are fewer things that can break and a limited number of predictable settings to change as your deployment scales.

3. Does your platform provide SQL and ACID?

There’s a reason many NoSQL data platform vendors are retrofitting SQL and ACID to their products: experience in the field has shown that they’re still sorely needed and will be for some time to come. This is because application complexity inevitably grows over time, and as more and more use cases come into play, the likelihood you’ll need SQL and ACID increases with every new feature you add. This is why we’re seeing a resurgence of both ACID and SQL in the marketplace.

Not only do you need SQL and ACID, you also need to fully understand how they’re being implemented in the data platform you’re considering, and the tradeoffs involved. Early RDBMS products were designed to offer SQL and ACID and not much else. But this still spawned a US$35 billion database industry, because SQL and ACID are valuable: they avoid data loss and make it easier to understand and work with data structures as they get more and more complicated. This is still a very hard problem to solve, especially when you decide to add scale and high availability as features. If it were easy, everybody would have done it years ago.

What this means in practice is that vendor claims of SQL and ACID compliance need to be treated with caution. One recent example the author saw was of a well-known NoSQL database that claims ACID compliance, but buried in the manual is an admission that when recovering from backups, incomplete transactions will be reloaded. So if you were moving money from account ‘A’ to account ‘B’, and needed to recover after an outage, you’d find some transactions where money left ‘A’ and vanished, and others where it materialized in ‘B’ out of thin air.
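To make the account ‘A’ to account ‘B’ example concrete, here is a minimal sketch of what atomicity is supposed to guarantee. It uses Python’s built-in sqlite3 module purely as a stand-in; it is not the NoSQL product mentioned above, and the table name, the transfer helper, and the simulated failure are assumptions for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated outage between the debit and the credit.
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the debit above has already been rolled back


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

transfer(conn, "A", "B", 30, fail_midway=True)
print(balances(conn))  # interrupted transfer: no money vanished or appeared

transfer(conn, "A", "B", 30)
print(balances(conn))  # completed transfer: total is unchanged
```

A useful acceptance test for any candidate platform is the same idea at scale: interrupt transfers at random, restore from backup, and check that the sum of all balances is unchanged.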

There are also performance implications of putting a SQL/ACID layer on top of an existing architectural framework. A blazingly fast NoSQL product might not perform so well when cosplaying ACID and SQL support. Things to watch out for would be transactional behavior at scale, and what happens when multiple people try to change the same record at the same time.
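When probing how a platform handles two writers changing the same record, it helps to have a mental model of one common approach: optimistic concurrency control with a version counter. The sketch below is a generic, in-memory illustration and does not describe any particular vendor’s implementation; the VersionedStore class, its methods, and the seat-booking scenario are hypothetical.

```python
import threading


class VersionedStore:
    """Toy in-memory store; every record carries a version number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, value):
        """Apply the write only if nobody changed the record since it was read."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # conflict: caller must re-read and retry
            self._data[key] = (current_version + 1, value)
            return True


store = VersionedStore()
store.write_if_unchanged("seat-12A", 0, "free")

# Two clients read the same version, then both try to claim the seat.
version_seen_by_alice, _ = store.read("seat-12A")
version_seen_by_bob, _ = store.read("seat-12A")
alice_ok = store.write_if_unchanged("seat-12A", version_seen_by_alice, "alice")
bob_ok = store.write_if_unchanged("seat-12A", version_seen_by_bob, "bob")
print(alice_ok, bob_ok, store.read("seat-12A"))
```

The second writer is rejected rather than silently overwriting the first. The question for a vendor is what their product does in this situation, and how throughput holds up when many such conflicts happen at once.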

4. How does pricing work?

A lot of OEM solutions come in multiple sizes and shapes for different markets. If your data platform vendor has a strict one-size-fits-all policy, that could mean your overall TCO might sound appealing up front but becomes uncompetitive under certain circumstances. Are you able to negotiate a deal that lets everyone come out ahead, or is the best you can hope for a small percentage discount? Pricing is another one of those issues that needs to be confronted early.

5. How much data does your geo-replication process lose?

A decade ago geo-replication was, at best, a “nice to have”. Now it’s a must-have in a lot of markets. But geo-replicated data platforms are far from simple. In a geo-replicated world, independent copies of your data will live in multiple geographic locations, many hundreds of miles apart. Changes made to one location are propagated to the other, and vice-versa. In an ideal world you’d change both at the same time, but the laws of physics mean that doing so can take hundreds of milliseconds or even whole seconds.

To avert a drag on latency, you can make the change locally and send it to the remote site. But what happens if someone makes a conflicting change at the remote site, and the two changes pass each other on the network? Almost every data platform out there will pick a winner, generally by looking at who made the last change. They then make the ‘losing’ change disappear. This is needed to make sure the content in the two geographic locations doesn’t diverge over time, but it also involves permanent loss of completed transactions, which is bad.
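The ‘pick a winner’ behavior described above is often called last-write-wins. A minimal sketch of the idea follows, assuming each change carries a timestamp from reasonably synchronized clocks; the Change type, the site names, and the tie-breaking rule are illustrative assumptions rather than any specific product’s behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    site: str
    key: str
    value: str
    timestamp_ms: int  # assumed to come from reasonably synchronized clocks


def last_write_wins(a: Change, b: Change) -> Change:
    """Pick the change with the later timestamp; break ties by site name."""
    return max(a, b, key=lambda c: (c.timestamp_ms, c.site))


# Two sites update the same record before either sees the other's change.
london = Change("london", "order-42", "status=shipped", 1_000)
tokyo = Change("tokyo", "order-42", "status=cancelled", 1_005)

winner = last_write_wins(london, tokyo)
loser = london if winner is tokyo else tokyo
print("kept:     ", winner)
print("discarded:", loser)
```

Both sites converge on the same value, which is the point of the rule, but the discarded change was a completed transaction from the perspective of the user who made it.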

Your data platform should have the ability to manage this by creating a message queue explaining what it had to do to keep the two, three, or four sites consistent with each other. With this ability, your system can read this queue, see that a transaction ‘lost’, and redo it if needed. Most data platforms don’t have this capability and therefore lose a small but non-negligible amount of completed transactions during normal operation. It’s one thing to tolerate this in an internal system, but another thing entirely to put it in a product you sell to people.
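As a rough sketch of the idea, conflict resolution can record the losing change in a queue instead of dropping it silently, so the application can decide what to do with it. The code below is a simplified, in-memory illustration; the dictionary schema, the resolve function, and the queue format are hypothetical and not a description of how any particular platform exposes this.

```python
from collections import deque


def resolve(local, remote, conflict_queue):
    """Resolve a detected conflict by last-write-wins, logging the losing change.

    Each change is a dict with 'site', 'key', 'value' and 'ts' fields
    (a hypothetical schema used only for this sketch).
    """
    winner, loser = (local, remote) if local["ts"] >= remote["ts"] else (remote, local)
    conflict_queue.append({"key": loser["key"], "lost": loser, "kept": winner})
    return winner


conflicts = deque()
site_a = {"site": "A", "key": "stock:widget", "value": 9, "ts": 1_000}
site_b = {"site": "B", "key": "stock:widget", "value": 7, "ts": 1_002}

kept = resolve(site_a, site_b, conflicts)

# The application drains the queue and decides what to do with each lost change,
# for example re-apply it, merge it, or alert an operator.
while conflicts:
    event = conflicts.popleft()
    print(f"{event['key']}: kept {event['kept']['site']}, "
          f"lost {event['lost']['site']} (value={event['lost']['value']})")
```

The design choice here is that convergence and data retention are handled separately: the sites still agree on one value, while the record of what was overridden stays available to the application.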

6. How does support work?

The last big issue we see is that sometimes OEM vendors don’t think through how a data platform will be supported until late in the development process. OEM support isn’t like normal support. You don’t want your end users going to third parties seeking assistance with your intellectual property. You probably won’t have people on site and you may not even have direct access to the system in question. Vendors that treat support as a fixed-price commodity may not understand the level of urgency from your perspective, or move at the speeds you need. Doing the support yourselves might be an option, but it needs to be costed, budgeted, and staffed. Again, you need to understand this before you make your choice, not after.

Conclusion

While not exhaustive, the questions above are key for any OEM in need of a new data platform. 

A key point to remember: OEMs seeking or needing to embed data platforms into their products have different needs from normal data platform users.

Unlike a single, internal system that’s only deployed once, the TCO of a data platform that’s deployed dozens or hundreds of times can directly and continually impact an OEM’s bottom line, often with the OEM not even knowing it. Relationships with data platform vendors can also be very different, both from a support and a commercial perspective.

Learn why the Volt Active Data Platform was designed for OEMs.

David Rolfe