
Optimizing Index Performance – A CMU Intern’s Work at Volt Active Data

Sep 1, 2014

Key Takeaways

Low-cardinality indices can cause slow row deletion in Volt Active Data due to linear search comparable to a full table scan.
Appending the tuple address to the original key creates a new, unique key type, improving deletion performance to O(log(n)).
The value field in the index entry became unnecessary and was eliminated to save space, with adjustments made to search functions.
A C++ template was used in the index to reduce code complexity, improve readability and maintainability, and boost runtime performance by eliminating type checking.
The new method significantly improves deletion performance, especially with low-cardinality indices, as demonstrated by benchmark results showing reduced deletion times.

My name is Yetian Xia. I worked as a software engineering intern at Volt Active Data during the summer, working with the core team. My favorite project was optimizing the performance of deleting entries from a low-cardinality non-unique tree index.

An index speeds up row lookups and is heavily used in almost all database use cases. Logically, it can be regarded as a map in which the key is a search key generated from some columns of a row and the value is the address of that row. Volt Active Data has two types of indexes: hash indexes and balanced tree indexes. This architecture works well most of the time. In very rare cases, however, Volt Active Data can be slow when deleting rows from tables. These cases all involve low-cardinality indices on some tables, meaning the indices have few unique keys.

A low-cardinality index is often cited as a reason for bad performance. When deleting a row from an index, the Volt Active Data engine needed to generate the key from the row and then iterate over the nodes sharing that key to find the exact entry for the row, because the key contained no information about the row's address. If the index has low cardinality, this linear search through duplicate keys is comparable to a full table scan, leading to a slow deletion process. In short, the worst case for deleting a tuple entry from a low-cardinality tree index is O(n).
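The cost described above can be sketched with the standard library. This is only an illustration, not Volt Active Data's code: `Row`, `OldIndex`, and `eraseOld` are hypothetical names, and `std::multimap` stands in for the engine's own balanced tree.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-ins for illustration; not Volt Active Data's real types.
struct Row { long id; int age; };
using OldIndex = std::multimap<int, const Row*>;  // key -> row address

// Removes the index entry for `row` and reports how many duplicate
// entries had to be inspected. Returns true if the entry was found.
bool eraseOld(OldIndex& index, const Row* row, std::size_t& inspected) {
    assert(row != nullptr);
    inspected = 0;
    // Finding the run of equal keys costs O(log n) ...
    auto range = index.equal_range(row->age);
    // ... but locating the one entry for this row is a linear walk
    // over every entry that shares the key.
    for (auto it = range.first; it != range.second; ++it) {
        ++inspected;
        if (it->second == row) {
            index.erase(it);
            return true;
        }
    }
    return false;
}
```

If every row in the table shares one key, `inspected` can reach the full size of the index, which is the O(n) worst case.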

The method we used to resolve this problem was to create a new key type by appending the address of the tuple to the original key. With the address included, all duplicate keys become unique. Finding a specific tuple is then the same as searching for a node in a balanced tree, which needs only O(log(n)) time.
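A minimal sketch of the composite key idea, assuming a `std::set` in place of the engine's tree. The names `AddressedKey`, `NewIndex`, `makeKey`, `insertNew`, and `eraseNew` are made up for this example, and the address is stored as an integer so that ordering is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical stand-ins for illustration; not Volt Active Data's real types.
struct Row { long id; int age; };

// (original key, tuple address): the address breaks ties between
// rows that share the same original key, so every entry is unique.
using AddressedKey = std::pair<int, std::uintptr_t>;
using NewIndex = std::set<AddressedKey>;

AddressedKey makeKey(const Row* row) {
    assert(row != nullptr);
    return {row->age, reinterpret_cast<std::uintptr_t>(row)};
}

void insertNew(NewIndex& index, const Row* row) {
    index.insert(makeKey(row));
}

// Deleting a specific row is a single tree lookup: O(log n),
// regardless of how many rows share the original key.
bool eraseNew(NewIndex& index, const Row* row) {
    return index.erase(makeKey(row)) == 1;
}
```

Because the entry is a set element rather than a map pair, there is no separate value field in this sketch: the address lives only inside the key.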

Two more items needed extra work. One was that the value field was no longer necessary for an entry, since the address was already kept in the new key. Rather than waste 64 bits for each row, I refactored the tree node implementation to eliminate the field. The other was that several search functions of the index had to be slightly modified. Looking up a tuple by the new (key, address) pair is efficient on deletion, but it is not the right search when you want to find all tuples that share the same key. In these cases, index lookups compare only the key portion.
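One way to do a key-only lookup over composite entries is to seek to the smallest possible (key, address) pair and scan forward while the key portion matches. The sketch below assumes the same hypothetical `std::set` of (key, address) pairs as a stand-in for the real tree; `findAll` is an invented helper, not a Volt Active Data function.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not Volt Active Data's real types.
struct Row { long id; int age; };
using AddressedKey = std::pair<int, std::uintptr_t>;
using NewIndex = std::set<AddressedKey>;

// Returns every row whose original key equals `key`, ignoring the address.
std::vector<const Row*> findAll(const NewIndex& index, int key) {
    std::vector<const Row*> rows;
    // Address 0 sorts before any real address, so lower_bound lands on
    // the first entry with this key portion (or the first larger key).
    auto it = index.lower_bound(AddressedKey{key, 0});
    for (; it != index.end() && it->first == key; ++it) {
        rows.push_back(reinterpret_cast<const Row*>(it->second));
    }
    return rows;
}
```

The seek is O(log n) and the scan touches only matching entries, so key-only searches keep their usual cost.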

Also worth mentioning, we used a C++ template in the index. This reduced the number of lines of code, making the code simpler and easier to read and maintain. It also eliminated type checking in the code, boosting performance at runtime.
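The post does not say exactly what the template parameterizes, so the following is only one plausible reading: the key type is a template parameter, and the comparison is resolved at compile time instead of branching on a type tag at runtime. `AddressedKeyLess` and `TreeIndex` are hypothetical names, and `std::set` again stands in for the engine's tree.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// KeyT is fixed at compile time, so the comparison below needs no
// runtime check of what kind of key it is looking at.
template <typename KeyT>
struct AddressedKeyLess {
    using Entry = std::pair<KeyT, std::uintptr_t>;
    bool operator()(const Entry& a, const Entry& b) const {
        if (a.first < b.first) return true;
        if (b.first < a.first) return false;
        return a.second < b.second;  // tie-break on the tuple address
    }
};

// One definition serves every key type that supports operator<.
template <typename KeyT>
using TreeIndex =
    std::set<std::pair<KeyT, std::uintptr_t>, AddressedKeyLess<KeyT>>;
```

For example, `TreeIndex<int>` and `TreeIndex<long>` share this single definition, and the compiler generates a specialized comparison for each.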

Below is an example of the performance improvement on my own desktop, an 8-core i7 with 12 GB of RAM running Ubuntu 14.04.

The DDL I used was:

    CREATE TABLE P1 (
        ID BIGINT NOT NULL UNIQUE,
        AGE INTEGER,
        NUM INTEGER,
        GENDER TINYINT,
        PRIMARY KEY (ID)
    );
    create index idx_AGE_TREE on P1(AGE);
    create index idx_NUM_TREE on P1(NUM);
    PARTITION TABLE P1 ON COLUMN ID;

AGE was evenly distributed between 5 and 99, and NUM was evenly distributed between 0 and 9. GENDER was a random binary variable. We then executed the SQL query:

    DELETE FROM P1 WHERE AGE=5;

This query deletes around 1% of the rows in table P1. The results are shown in the table below.

Number of rows	Old method (delete 1%)	New method (delete 1%)
1 million	0.77s	0.08s
4 million	12.00s	0.12s
16 million	189.62s	0.23s

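The new method was faster at every table size and scaled far better: each fourfold increase in rows made the old method roughly 16 times slower, while the new method's time less than doubled. If we had defined an index on the binary column GENDER, the performance improvement would have been even greater.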

About Author

Adrian Scholes
