Introducing CREATE TASK

Volt Active Data Technical Spotlight blog


January 17, 2020

Second post in a series on features of Volt Active Data Version 9.2.

Version 9.2 of Volt Active Data introduces two new features that are highly relevant to people developing streaming applications: MIGRATE and CREATE TASK. This is part two of a series of three blog posts:

  • Part one: MIGRATE
  • Part two: CREATE TASK (this post)
  • Part three: an example application that uses MIGRATE and CREATE TASK (upcoming)

This post focuses on CREATE TASK.

What is CREATE TASK?

Most things that happen inside Volt Active Data are event driven: we receive a message about something that has happened in the outside world and then make a decision or compute an outcome of some kind. But in any real-world system there are activities that need to take place at regular intervals. Traditionally people use cron or some other mechanism to do this, but Volt Active Data’s new CREATE TASK functionality is better for the following reasons:

  • Scalability
    Volt Active Data handles scalability by partitioning the data. Unlike cron, which neither knows nor cares how the data is partitioned, a Volt Active Data task can be directed to run on all the partitions at roughly the same moment in time. Using the “RUN ON PARTITIONS” clause, a workload can be distributed over all the nodes of a cluster without the user having to know or care which nodes or partitions are involved.
  • Reliability
    A more fundamental problem with scheduling tasks using cron is that cron runs on a single node, whereas Volt Active Data tasks live inside the database cluster. If we rely on cron and the node hosting cron dies, so does the task. One can get round this by creating a second cron job on a different machine that watches the first one, but that’s hardly satisfactory. Volt Active Data makes sure that tasks, once created, continue to run as requested for as long as the database cluster has enough nodes to keep operating.

Using CREATE TASK

The CREATE TASK syntax is:

CREATE TASK task-name
ON SCHEDULE {CRON cron-definition | DELAY time-interval | EVERY time-interval}
PROCEDURE procedure-name [WITH (argument [,...])]
[ON ERROR {LOG | IGNORE | STOP} ]
[RUN ON {DATABASE | PARTITIONS} ]
[AS USER user-name]
[ENABLE | DISABLE]

time-interval: integer {MILLISECONDS | SECONDS | MINUTES | HOURS | DAYS}
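To make the syntax concrete, here is a small Python sketch that assembles a CREATE TASK statement from its parts. The helper names interval and build_create_task, and the procedure names used in the example, are hypothetical; the helper only renders text following the syntax above and does not connect to a cluster or check that the procedure exists.

```python
"""Hypothetical helper that renders a CREATE TASK statement.

It follows the syntax shown above and only builds a DDL string; it does not
connect to Volt Active Data or validate procedure names against a schema.
"""

TIME_UNITS = {"MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS"}
ON_ERROR_CHOICES = {"LOG", "IGNORE", "STOP"}
RUN_ON_CHOICES = {"DATABASE", "PARTITIONS"}


def interval(amount: int, unit: str) -> str:
    """Render a time-interval such as '500 MILLISECONDS'."""
    unit = unit.upper()
    if unit not in TIME_UNITS:
        raise ValueError(f"unknown time unit: {unit}")
    if amount <= 0:
        raise ValueError("interval must be a positive integer")
    return f"{amount} {unit}"


def build_create_task(name, schedule, procedure, args=(), on_error=None,
                      run_on=None, as_user=None, enabled=None):
    """Return a CREATE TASK statement as a string.

    schedule is a (kind, value) pair: ("DELAY", interval(...)),
    ("EVERY", interval(...)) or ("CRON", definition). The cron definition
    is inserted verbatim. Optional clauses are omitted when left as None.
    """
    kind, value = schedule
    kind = kind.upper()
    if kind not in {"CRON", "DELAY", "EVERY"}:
        raise ValueError(f"unknown schedule type: {kind}")
    lines = [f"CREATE TASK {name}", f"ON SCHEDULE {kind} {value}"]

    proc = f"PROCEDURE {procedure}"
    if args:
        # Arguments are passed through as already-rendered SQL literals.
        proc += " WITH (" + ", ".join(str(a) for a in args) + ")"
    lines.append(proc)

    if on_error is not None:
        if on_error.upper() not in ON_ERROR_CHOICES:
            raise ValueError(f"unknown ON ERROR action: {on_error}")
        lines.append(f"ON ERROR {on_error.upper()}")
    if run_on is not None:
        if run_on.upper() not in RUN_ON_CHOICES:
            raise ValueError(f"unknown RUN ON target: {run_on}")
        lines.append(f"RUN ON {run_on.upper()}")
    if as_user is not None:
        lines.append(f"AS USER {as_user}")
    if enabled is not None:
        lines.append("ENABLE" if enabled else "DISABLE")
    return "\n".join(lines) + ";"
```

For example, build_create_task("purge_sessions", ("DELAY", interval(500, "milliseconds")), "PurgeSessions", on_error="log", run_on="partitions") renders the clauses in the order given by the syntax, ending with RUN ON PARTITIONS;. Because the cron definition is passed through untouched, check the Volt Active Data documentation for its exact format.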

The single most important decision you’ll need to make is arguably whether to run a task transactionally against the entire DATABASE or on individual PARTITIONS. Some tasks require a read-consistent view of the entire database to work, such as the total amount of money held in customer balances at a moment in time. Other tasks may not need visibility of data outside a partition, or even need to run at the exact same instant on every partition. In the Volt Active Data universe there are significant performance advantages to running on PARTITIONS as opposed to the entire DATABASE. If you decide to run on PARTITIONS you will need to use a DIRECTED procedure.
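The read-consistency point can be illustrated with a toy Python model. Everything in it is an assumption made for illustration, not Volt Active Data internals: customer balances are spread over four partitions by customer id, a DATABASE-style total reads every partition at one logical instant, and a PARTITIONS-style total reads the partitions one after another, so a transfer that commits between two reads can be counted twice or missed entirely.

```python
NUM_PARTITIONS = 4  # hypothetical cluster with four partitions


def partition_of(customer_id: int) -> int:
    """Map a customer to a partition (a stand-in for Volt's own hashing)."""
    return customer_id % NUM_PARTITIONS


def load(balances: dict) -> list:
    """Spread {customer_id: balance} rows across the partitions."""
    parts = [{} for _ in range(NUM_PARTITIONS)]
    for cid, amount in balances.items():
        parts[partition_of(cid)][cid] = amount
    return parts


def transfer(parts: list, src: int, dst: int, amount: int) -> None:
    """Move money between two customers, possibly on different partitions."""
    parts[partition_of(src)][src] -= amount
    parts[partition_of(dst)][dst] += amount


def total_run_on_database(parts: list) -> int:
    """One invocation that reads every partition at the same logical instant."""
    return sum(sum(rows.values()) for rows in parts)


def total_run_on_partitions(parts: list, between=None) -> int:
    """One invocation per partition, executed one after another.

    between maps a partition index to a callable that commits right after
    that partition has been read, standing in for other traffic that runs
    while the per-partition invocations are still in progress.
    """
    between = between or {}
    subtotal = 0
    for p, rows in enumerate(parts):
        subtotal += sum(rows.values())
        if p in between:
            between[p]()
    return subtotal
```

With eight customers holding 100 each, a 50 transfer from customer 1 to customer 2 that lands between the reads of partitions 1 and 2 makes the per-partition total come out at 850 instead of 800. That is why a tally like this belongs in a RUN ON DATABASE task, while purely local work, such as expiring a partition’s own rows, suits RUN ON PARTITIONS.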

The second thing you need to consider is the time interval. Volt Active Data gives you considerable granularity, from milliseconds to days or even (if you use a cron definition) months. But look at the task you have and, where possible, avoid creating tasks that take more than a couple of seconds to run. Even for a ‘batch job’, you should where possible structure it so that each run spends a few dozen milliseconds nibbling away at the task in hand, instead of disappearing for minutes. For people used to traditional RDBMS products this will seem counterintuitive, but working this way avoids harming the SLAs of the other requests Volt Active Data is handling.
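A minimal Python sketch of the ‘nibbling’ approach, assuming a hypothetical session-purge job in which each call to purge_expired_batch stands in for one scheduled invocation. Every call removes at most batch_size of the oldest expired rows, so its running time is bounded however large the backlog grows; run_until_drained just shows that repeated short runs eventually clear it.

```python
def purge_expired_batch(rows: dict, now: int, ttl: int, batch_size: int) -> int:
    """One scheduled run: delete at most batch_size of the oldest expired rows.

    rows maps a hypothetical session key to its last-seen timestamp (ms).
    A real procedure would use an index rather than scanning every row.
    Returns the number of rows deleted.
    """
    expired = sorted((ts, key) for key, ts in rows.items() if now - ts > ttl)
    victims = expired[:batch_size]
    for _, key in victims:
        del rows[key]
    return len(victims)


def run_until_drained(rows: dict, now: int, ttl: int, batch_size: int) -> int:
    """Repeat the short run, as successive scheduled invocations would.

    Stops once a run finds fewer than batch_size expired rows and returns
    how many runs were needed. In a real deployment the scheduler, not a
    loop, would trigger each run, with a pause (e.g. a DELAY) in between.
    """
    runs = 0
    while True:
        runs += 1
        if purge_expired_batch(rows, now, ttl, batch_size) < batch_size:
            return runs
```

The design choice is that no single run ever does more than batch_size deletions, trading a few extra invocations for consistently short ones.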

Once you’ve defined what your task will do, and how long it will take to do it, you then need to decide how often it repeats. The ‘EVERY’ option queues an invocation at fixed intervals, but it assumes you are certain a run will never take longer than the interval, as overrunning leads to grumpy messages in the log. ‘DELAY’ is more practical: it defines how long the task spends not running between one invocation and the next, and so guarantees that other requests get served. Finally, the ‘CRON’ option schedules the task using a cron-style definition.
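The difference between EVERY and DELAY can be sketched as a timeline in Python. This is a simplified model under stated assumptions rather than a description of Volt Active Data’s scheduler: run durations are known in advance, only one run is in flight at a time, and under EVERY a run whose slot arrives while the previous run is still executing starts as soon as that run finishes.

```python
def every_schedule(durations, interval):
    """(start, end) pairs, in ms, when a run is queued every `interval` ms.

    Assumed model: a run starts at its slot, or when the previous run
    finishes if that is later.
    """
    runs, prev_end = [], 0
    for i, d in enumerate(durations):
        start = max(i * interval, prev_end)
        prev_end = start + d
        runs.append((start, prev_end))
    return runs


def delay_schedule(durations, delay):
    """(start, end) pairs, in ms, when each run starts `delay` ms after the last ends."""
    runs, next_start = [], 0
    for d in durations:
        end = next_start + d
        runs.append((next_start, end))
        next_start = end + delay
    return runs


def idle_gaps(runs):
    """Time between the end of one run and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(runs, runs[1:])]
```

With run durations of 30, 150 and 30 ms and a 100 ms setting, the overrunning second run leaves no idle time at all before the third run under EVERY (gaps of 70 and 0 ms), whereas DELAY always leaves the full 100 ms gap for other requests.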

Conclusion

Learn more about Version 9.2 of Volt Active Data and try Volt for yourself today!
