msum(X, [K1, …, Kn])for the incremental computation of sumsmprod(X, [K1, …, Kn])for the incremental computation of productsmcount([K1])for the incremental computation of countsmmin(X, [K1, …, Kn])for the incremental computation of minimalmmax(X, [K1, …, Kn])for the incremental computation of maximalmunion(X, [K1, …, Kn])for the incremental union of setsmavg(X)for the incremental computation of averagesmmedian(X, "variant")for the incremental computation of medians
mcount, take as first argument the value to be used in the incremental computation of the aggregation.
Some aggregate functions cannot be used inside a recursive rule because the value they return may change in a non-monotonic way when new facts arrive (e.g., the “winner” of an election can be replaced by a later vote). These functions are fully supported in non-recursive queries or as a post-processing step after the recursive evaluation has converged.
These functions are:
maxcount()selects, for each group identified by the keys, the row that occurs most often and the corresponding count.
Basic Example
As an example consider the following program:showLineNumbers
ssum contains the following tuples:
pprod contains the following tuples:
pmin contains the following tuples:
pmax contains the following tuples:
ccount contains the following tuples:
ccount_other contains the following tuples:
aavg contains the following tuples:
mmedian_result contains the following tuples:
General Form
A rule with an assignment using an aggregation operator has the general form:K1, …, Knare zero or more group by argumentsbodyis the rule body,maggris the aggregation function desired (mmin,mmax,msum,mprod),C1, …, Cmare zero or more variables of the body (withCi ≠ Kj, 1 ≤ i ≤ m, 1 ≤ j ≤ n), which we call contributor argumentsxis a constant, a body variable or an expression containing only single-fact operators.
K1, …, Kn, a monotonic increasing (or decreasing) aggregate function maggr maps an input multi-set of vectors G, each of the form gi = (C1, … , Cm, xi) into a set D of values, such that xi is in D if xi is less (greater) than the previously observed value for the sub-vector of contributors (C1, …, Cٖm).
Such aggregation functions are monotonic with respect to the set containment and can be used in Vadalog together with recursive rules to calculate aggregates without resorting to stratification (separation of the program into ordered levels on the basis of the dependencies between rules).
In the execution of a program, for each aggregation, the aggregation memorizes for each vector (C1, …, Cٖm) the current minimum (or maximum) value of x. Then, for each activation of a rule with the monotonic aggregation at hand, an updated value for the group selected by K1, …, Km is calculated by combining all the values in D for the various vectors of contributors.
The kind of combination specifically depends on the semantics of maggr (e.g. minimum, maximum, sum, product, count) and, provided the monotonicity of the aggregates, it is easily calculated by memorizing a current aggregate, which is updated with the new contributions brought by the sequence of rule activations.
The following assumption is made:
if a position pos in a head for predicate p is calculated with an aggregate function, whenever a head for p appears in any other rule, pos must be existentially quantified and calculated with the same aggregate function.This assumption guarantees the homogeneity of the facts with existentially aggregated functions.
Simple Sum Example
Let us start with a basic example.showLineNumbers
msum, we calculate the current aggregate, for each group, denoted by variable Y in f. So for the first group, “a”, we have 1, and than 3=1+2. For the second group, “b”, we have 4 and then 7=4+3.
This example gives the flavour of how monotonic aggregations work. For each activation of the rule, the current result is produced. Since the aggregations are monotonic, we are certain that the maximum (or minimum) value for each group is deterministically the correct aggregate.
On the other hand, the intermediate values for partial aggregates are non-deterministic, since they depend on the specific execution order.
Contributors
Let us now consider contributors through the following example.showLineNumbers
Graph Indegree Example
showLineNumbers
J > 2.
Post-processing with @post
In addition, Vadalog Parallel provides an annotation mechanism that allows to introduce special behaviors, pre-processing and post-processing features. A post-processing annotation can be used to filter only the maximum (or minimum) values for each group as follows (see the section on Post-processing for more details).
showLineNumbers
Using mmin and mmax Without Intermediate Results
The aggregates mmin and mmax have the expected semantics with the difference that they produce no intermediate results.
Consider for example the following program.
showLineNumbers
showLineNumbers
Counting with mcount
The aggregate mcount has the expected semantics and does not produce any intermediate results.
Consider for example the following program.
showLineNumbers
Set Union with munion
To combine some properties into a set, you can simply do the following:
Median Aggregation
The aggregatemmedian computes the median (middle value) of a dataset. Unlike mean/average, median is robust to outliers and provides a better measure of central tendency for skewed distributions.
Vadalog Parallel provides three variants of median computation:
-
mmedian(X, "exact")- Exact median computation- Stores all values and computes precise median
- Best for small to medium datasets (< 10,000 values)
- 100% accuracy
-
mmedian(X, "p2_algorithm")- P² algorithm approximation- Uses only 5 markers regardless of dataset size
- Best for very large datasets (millions+ of values)
- Typical accuracy: 95-99%
- Minimal memory footprint (40 bytes)
-
mmedian(X, "reservoir_sampling")- Reservoir sampling approximation- Maintains a sample of 1000 values
- Best for large datasets (10,000 - 1,000,000+ values)
- High accuracy with constant memory (8KB)
| Variant | Best For | Accuracy | Memory |
|---|---|---|---|
"exact" | Small to medium datasets (< 10,000 values) | 100% | All values |
"p2_algorithm" | Very large datasets (millions+) | 95-99% | 40 bytes |
"reservoir_sampling" | Large datasets (10,000-1,000,000+) | High | 8KB |
Example: Basic Median Computation
Example: Median vs Average with Outliers
Median is particularly useful when dealing with outliers:Choosing the Right Variant
-
Use “exact” when:
- Dataset is small (< 10,000 values)
- 100% accuracy is required
- Memory is not a constraint
-
Use “p2_algorithm” when:
- Dataset is very large (millions+ of values)
- Memory is extremely limited
- ~95-99% accuracy is acceptable
-
Use “reservoir_sampling” when:
- Dataset is large (10,000+ values)
- Good balance between accuracy and performance needed
- Consistent memory usage is important
maxcount
The aggregate maxcount returns the key tuple with the highest frequency

