An intern’s experience at Volt Active Data: Partial snapshots, YCSB benchmark

September 07, 2016

Working at Volt Active Data is a fantastic experience. Coming from Brown CS, I worked as a software engineering intern during the summer of 2016. I was part of the Data Replication (DR) team, which builds cross data-center replication and database backup and recovery software.

I started by making improvements to the partial snapshot feature. A database snapshot records the state of the database at a particular point in time. It is used mostly for backup. Since a full backup may take a long time for a large enterprise database, Volt Active Data developed a new feature called ‘partial snapshot’. The user can specify the table(s) in the database to be saved in a snapshot. Partial snapshots can save considerable disk space and backup / recovery time because less data is saved, compared to a full snapshot.

There were some minor issues with this new feature: what if a user specifies a non-existing table? How do you determine a consistent state from several snapshots that contain tables saved at different times? Things get more complicated when multi-threading issues are taken into consideration. I am very pleased that I solved those issues. Now Volt Active Data rejects a snapshot request that names non-existing tables and returns a clear error message. We only allow one snapshot per nonce (the unique name of the snapshot) to ensure consistency, regardless of the number of tables included in the snapshot. Database recovery recognizes partial snapshots and avoids recovering from them. My favorite deliverable was allowing restore from multiple partial snapshots. By removing some constraints stored in the internal ZooKeeper and changing the way the database restores metadata, users can now restore as many partial snapshots as they want. This enables users to re-assemble a database from different pieces (snapshots) of data.
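To make the two checks above concrete, here is a minimal Python sketch of the idea. It is not Volt Active Data's actual code: the function name, the exception class, and the way the catalog and existing nonces are passed in are all assumptions made for illustration.

```python
class SnapshotRequestError(Exception):
    """Raised when a snapshot request is rejected (hypothetical name)."""


def validate_snapshot_request(nonce, tables, catalog_tables, existing_nonces):
    """Return the sorted table names to save, or raise SnapshotRequestError.

    All parameters are illustrative assumptions:
      nonce            unique name of the requested snapshot
      tables           table names the user asked for
      catalog_tables   table names that exist in the database
      existing_nonces  nonces that already have a snapshot
    """
    # One snapshot per nonce, no matter how many tables it contains.
    if nonce in existing_nonces:
        raise SnapshotRequestError(
            f"a snapshot with nonce '{nonce}' already exists"
        )

    # Reject the whole request if any named table does not exist.
    missing = sorted(set(tables) - set(catalog_tables))
    if missing:
        raise SnapshotRequestError(
            "snapshot request names non-existing table(s): " + ", ".join(missing)
        )

    return sorted(set(tables))
```

Rejecting the whole request, rather than silently skipping the unknown tables, means a typo in a table name cannot produce a backup that is quietly missing data.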

Another area I worked on was benchmarking with YCSB (the Yahoo! Cloud Serving Benchmark). YCSB is a well-established benchmark used to compare different databases. In addition to testing Volt Active Data’s performance, it’s necessary to compare Volt Active Data’s YCSB performance with that of other databases, since the benchmark is widely used by vendors and users. I ran experiments with different configurations, varying the size of the cluster, the number of clients and client threads, and the run parameters, to locate throughput and latency bottlenecks. After I modified the YCSB run scripts in Jenkins CI, our YCSB performance test became more stable and meaningful. I also updated Volt Active Data’s YCSB driver to make it compatible with the latest YCSB framework, so we now get a more precise throughput and latency report from the benchmark.
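For readers unfamiliar with these metrics, the sketch below shows one common way to derive throughput and latency figures from per-operation timings. It is an illustration only, not YCSB's or Volt Active Data's reporting code; the function name, the input format, and the choice of nearest-rank percentiles are assumptions.

```python
def summarize_run(latencies_ms, elapsed_s):
    """Summarize a benchmark run (hypothetical helper).

    latencies_ms  one latency per completed operation, in milliseconds
    elapsed_s     wall-clock duration of the run, in seconds
    """
    if not latencies_ms or elapsed_s <= 0:
        raise ValueError("need at least one operation and a positive duration")

    ordered = sorted(latencies_ms)
    count = len(ordered)

    def percentile(p):
        # Nearest-rank percentile for an integer p, using integer
        # arithmetic for the ceiling to avoid floating-point rounding.
        rank = max(1, (p * count + 99) // 100)
        return ordered[rank - 1]

    return {
        "throughput_ops_per_s": count / elapsed_s,
        "avg_ms": sum(ordered) / count,
        "p95_ms": percentile(95),
        "p99_ms": percentile(99),
    }
```

Average latency alone can hide a slow tail, which is why benchmark reports usually include high percentiles alongside throughput.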

“Innovation Week” is an interesting quarterly event at Volt Active Data. During Innovation Week, engineers are encouraged to implement anything they want, from code refactoring to refining and experimenting with new ideas. That is so cool! A lot of new features and improvements are developed during the week. I collaborated with Oluwafemi Olukoya, another intern from Carnegie Mellon University, on the MTA real-time transit project. By using Volt Active Data and real-time data feeds from the MTA, we were able to do ETL and querying extremely fast. With stored procedures, questions like “find trains near me” or “calculate a route from A to B” can be answered. We also designed a framework that can “pull” and “chew” data from arbitrary data sources. Another promising idea we worked on during that project was using plugin APIs from Google Maps, so that visualizations such as planned routes on a map become possible.
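As a rough illustration of what a “find trains near me” query has to compute, here is a small Python sketch. In the project this logic lived in stored procedures inside the database; the code below is not that implementation, and the record layout (`train_id`, `lat`, `lon`) and function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trains_near(positions, lat, lon, radius_km):
    """Return train IDs within radius_km of (lat, lon), nearest first.

    positions is a list of dicts with hypothetical keys
    'train_id', 'lat', and 'lon'.
    """
    hits = [
        (haversine_km(lat, lon, p["lat"], p["lon"]), p["train_id"])
        for p in positions
    ]
    return [train_id for dist, train_id in sorted(hits) if dist <= radius_km]
```

A linear scan like this is fine for a sketch; a real deployment would more likely keep positions in an indexed table and filter by a bounding box before computing exact distances.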

The internship experience at Volt Active Data was fantastic. I learned very quickly with help from my mentor. He helped me resolve my doubts about my approach to solving problems and responded to questions incredibly fast, while also letting me think about problems independently. I really appreciate that “mentoring strategy,” which benefited me both by strengthening my skills and improving my thought processes.

Moreover, people at Volt Active Data are like a family. They care not only about your work but also about your life. We had lots of pleasant conversations during lunchtime and played ping pong a lot.

I appreciate what I got from Volt Active Data: knowledge, skills, ambitions, and tons of memories. Thank you, Volt Active Data!
