Table of Contents
How VoltSP Stream Processing Works
Stream data processing has become a critical component of business operations. The exponential growth of available data and the pressure to act on information in real time have made traditional computing approaches obsolete. It is no longer sufficient to gather data and post process it to determine what actions to take. Now businesses need to operate on the data in flight to filter, format, validate, measure, and respond to events in a timely manner.
And for simple operations this works. Many operations, like filtering data based on fixed rules or converting from one format to another, can be performed at speed. However, the Achilles heel of stream data processing is the fact that many operations still require access to up-to-date and entrusted information such as customer accounts, inventory levels, and resource availability. These stateful operations, if performed against a traditional SQL database, incur the same latency to which previous centralized operations were susceptible. Which is where VoltSP comes in.
By integrating stream data processing with Volt Active Data — an ACID database designed to maximize throughput without sacrificing consistency, durability, or availability — VoltSP makes it possible to combine both stateless and stateful processing in flight and at speed.
Figure 3.1. VoltSP Architecture
VoltSP Architecture
The VoltSP architecture consists of three primary parts: sources, sinks, and processors. And the Domain Specific Language you use to define VoltSP pipelines mirror the exact same structure, letting you define the source, one or more processors, and a sink:
stream
source
processor
processor
processor
[ ... ]
sink
Where the processors can be any combination of stateless or stateful operations, with Volt Active Data providing real time access to reference data that can be used to verify, authenticate, authorize, or in other ways validate and enhance the data as it passes.
The advantages the VoltSP architecture offers are:
- Cloud Native — VoltSP pipelines are designed from the ground up to run in the cloud. It is also self contained and does not require any additional infrastructure (such as resource managers, schedulers, or the like). This allows for easy setup, scaling, and management.
- Apache Kafka and Volt Active Data integration — Kafka is supported out of the box as a data source and both Kafka and Volt Active Data are supported as sinks for the pipeline, so that setting up the initial pipeline template is trivial.
- Complex business logic — Partitioned procedures in Volt Active Data can be used to incorporate complex, stateful operations on the data without sacrificing latency.
- Flexibility — The pipelines are designed as templates, using placeholders for key resources such as server addresses and topic names, so that different pipelines can be created from the same template by identifying different resources in the properties at runtime.
- Scalability — The pipelines themselves can be scaled at runtime completely separately from the resources, such as Kafka servers or Volt Active Data cluster nodes allowing you to optimize computing resources to match actual needs.
Circuit breaker
A Circuit breaker is a mechanism to temporarily halt processing with a remote system. This is beneficial when a remote system experiences major problems and sending more request will not help. A circuit breaker will try to retry sending requests again to finish any pending processing.
public interface CircuitBreaker {
State getState();
/**
* Checks state of circuit breaker.
* @return true if circuit breaker is closed
*/
default boolean isClosed() {
return getState().isClosed();
}
default boolean isOpen() {
return !isClosed();
}
/**
* Updates the monitored value and/or opens circuit breaker.
*/
void markFailure();
/**
* Updates the monitored value and/or closes circuit breaker.
*/
void markSuccess();
}
During configuration phase an operator can request its own circuit breaker.
interface StreamExecutionContext {
/**
* Use CircuitBreaker to control the data flow of each StreamStage.
* If a sequence of failures occurs, StreamStage will not execute for a certain period.
*
* @param operator to own a circuit breaker
* @return CircuitBreaker assigned to operator within StreamStage
*/
CircuitBreaker getCircuitBreakerFor(Operator operator);
}
The circuit breaker is counting how many failures or successes where received from remote system this operator connects with. When number of failures pass configured threshold, the circuit breaker will open. With the provided implementation it is enough to receive one successful response to close an opened circuit breaker.
If the remote system fails to return to stable state before configured commitResultTimeout, the system will fail commit operation and VoltSp will schedule another batch processing (possibly with same data)
Global circuit breaker
Voltsp also is equipped with global circuit breaker. The main difference is that global circuit breaker will halt current execution and will block any further processing until circuit breaker is closed again. When pipeline's logic realise that remote system is not able to process requests, pipeline may open global circuit breaker and await until current batch is successfully processed.
interface StreamExecutionContext {
/**
* Gives a way to temporarily pause the event processing because of expected of unexpected remote system unavailability.
* Once a circuit breaker is opened voltsp system will wait maximum of {@link VoltEnvironment#CIRCUIT_BREAKER_TIMEOUT_MS_PROPERTY} for remote systems to be back online.
* After configured time voltsp will crash as it cannot make a progress.
*
* @param batchId value of the current batch
* @param dataToRetry non-empty list of events to be reprocessed by failed operator
* @param initialCause of the failure that caused the circuit breaker to be opened
*/
void openCircuitBreaker(long batchId, List<?> dataToRetry, Throwable initialCause);
/**
* Similar to {@link #openCircuitBreaker(long, List, Throwable)} but it will use all events emitted to main sink.
* @param batchId value of the current batch
* @param initialCause of the failure that caused the circuit breaker to be opened
*/
void openCircuitBreaker(long batchId, Throwable initialCause);
}
Note that after configured time VoltSp will crash as it cannot make a progress. See configuration for CIRCUIT_BREAKER_TIMEOUT_MS - currently a default for timeout is 10 minutes.
Committer
A committer is a service that counts how many requests were sent remotely and how many responses were received. If those numbers are equal a committer will successfully complete commit result.
The committer also creates and tracks a CommitHandle created for given batchId
. A commitHandle is used to distinguish commit phases.
VoltSp creates unique batchId for new processing phase and commit handle is create for that batchId
at each commit phase.
Batch processing can be retried and for each retry VoltSp will create new, unique commit handle.
Note that responses related to some old handle are ignored and will not impact current commit phase.
During configuration phase an operator can request its own committer.
interface StreamExecutionContext {
/**
* Use committer to control batch commit for asynchronous processing.
* @param operator to own a committer
* @return a committer
* @param <T> an event type
*/
<T> Committer<T> getCommitterFor(Operator operator);
}