Why graph analytics inside a reasoning language?
Many real-world knowledge-management tasks (supply-chain optimisation, fraud detection, recommendation, etc.) are both graph-shaped and rule-driven. Vadalog lets you run industrial-strength graph algorithms and logical reasoning in the same declarative program. Execution is automatically parallel and distributed.Using Graph Functions
A graph algorithm appears as a function literal that starts with# and lives in a rule body:
Built-in Graph Function Catalogue
| Function | Problem solved | Input predicate (arity) | Output variables | Notes |
|---|---|---|---|---|
#TC(P) | Transitive Closure – reachability. | P/2 edge(U,V) | (X,Y) | Directed graph. Includes self-loops from cycles. |
#PATHS(P[,options]) | All Paths with configurable options. | P/2 edge(U,V) + optional string | (X,Y,Visited) or (X,Y) | Optional parameters: visited=true/false (default false), self_loops=true/false (default false), all_paths=true/false (default false), max_depth=N (default -1 = unlimited), source=X (only paths from node X), target=Y (only paths to node Y), sources=[a,b,...] (only paths from this set of nodes), targets=[x,y,...] (only paths to this set of nodes), source_target_pairs=[s1:t1,s2:t2,...] (only the listed origin–destination couples). source/sources are mutually exclusive (same for target/targets), and none of them can be combined with source_target_pairs. Format: "visited=true,max_depth=5,source=A,target=D", "sources=[A,B],targets=[D]" or "source_target_pairs=[A:D,B:D]". |
#ASP(P[,options]) | All‑Shortest Paths (weighted). | P/3 edge(U,V,W) + optional string | (X,Y,Z) or (X,Y,Z,Visited) | Parallel multi‑source Dijkstra. Can include self-loops from cycles. Optional parameters: visited=true/false (default false; when true returns all equal-cost shortest paths with the ordered list of visited nodes), source=X (only paths from node X), target=Y (only paths to node Y), sources=[a,b,...] (only paths from this set of nodes), targets=[x,y,...] (only paths to this set of nodes), source_target_pairs=[s1:t1,s2:t2,...] (only the listed origin–destination couples). source/sources are mutually exclusive (same for target/targets), and none of them can be combined with source_target_pairs. Format: "visited=true,source=1,target=4", "sources=[1,2],targets=[4]" or "source_target_pairs=[1:4,2:3]". |
#SSSP(P[,options]) | Single‑Source Shortest Path (weighted). | P/3 edge(U,V,W) + optional string | (X,Y,Z) or (X,Y,Z,Visited) | Same engine as #ASP but a source is required, so results are restricted to shortest paths from one source. Output is (Source, Target, Distance); the trivial self-row is not emitted. Optional parameters: visited=true/false (default false), target=Y (only the path to node Y), targets=[x,y,...] (only paths to this set of nodes), source_target_pairs=[...] (satisfies the required-source rule). The required source may also be supplied as a single-element sources=[X]. Format: "source=1" or "source=1,target=4,visited=true". |
#BFS(P[,options]) | Breadth‑First Levels (unweighted shortest path). | P/2 edge(U,V) (or P/3, weights ignored) + optional string | (X,Y,Level) or (X,Y,Level,Visited) | Level = minimum number of hops between the couple, as an integer. Same engine as #ASP but with unit weights; any edge weight is ignored. Optional parameters: source=X, target=Y, sources=[a,b,...], targets=[x,y,...], source_target_pairs=[...], visited=true/false (default false; adds the ordered fewest-hops path). Source is optional (all-sources by default). Format: "source=1", "sources=[1,5]" or "source=1,visited=true". |
#CC(P[,options]) | Connected Components with optional sorting and ID. | P/2 edge(U,V) + optional string | (X,Comp) or (X,ID,Comp) | Optional parameters: sort_component=true/false (default false), component_id=true/false (default false). Format: "sort_component=true,component_id=true". For undirected graphs (assumes edges are symmetric). |
#WCC(P) | Weakly Connected Components. | P/2 | (X,Comp) | For directed graphs – ignores edge direction. |
#TRI(P) | Triangle Enumeration. | P/2 | (X,Y,Z) | Unordered triangles. |
#DC(P[,options]) | Degree Centrality | P/2 edge(U,V) | (X,Score) | Normalized by (N-1). Optional parameters: type=in/out/total (default total). Format: "type=in". Scores range 0.0–1.0. |
#BC(P[,options]) | Betweenness Centrality | P/2 edge(U,V) | (X,Score) | Optional sampling for large graphs. Normalized by (N-1)(N-2). Optional parameters: sample=K for approximate BC using K random sources. Format: "sample=500". Scores range 0.0–1.0. |
#PR(P[,options]) | PageRank | P/2 edge(U,V) | (X,Score) | Optional parameters: damping=d (default 0.85), tolerance=t (default 0.0001). Format: "damping=0.85,tolerance=0.0001". Scores sum to 1.0. |
Examples
1 Transitive closure
(1,2), (2,3), (3,4), (4,1), (1,3), (2,4), (3,1), (4,2), (1,4), (2,1), (3,2), (4,3), (1,1), (2,2), (3,3), (4,4) – 16 tuples total.
2 PATHS function with configurable options
paths_default Output: All reachable pairs including self-loops from cycles (using defaults): (1,2), (2,3), (3,1), (1,3), (2,1), (3,2), (1,1), (2,2), (3,3) – 9 tuples.
paths_visited Output: Only simple acyclic paths: (1,2,[1,2]), (2,3,[2,3]), (3,1,[3,1]), (1,3,[1,2,3]), (2,1,[2,3,1]), (3,2,[3,1,2]) – 6 tuples. Each includes visited nodes.
paths_with_loops Output: Same as paths_default – 9 tuples (self-loops already included by default).
paths_depth1 Output: Only direct edges: (1,2), (2,3), (3,1) – 3 tuples.
paths_visited_depth2 Output: Paths up to 2 hops with visited tracking: (1,2,[1,2]), (2,3,[2,3]), (3,1,[3,1]), (1,3,[1,2,3]), (2,1,[2,3,1]), (3,2,[3,1,2]) – 6 tuples.
2b PATHS with source and target filtering
Thesource and target parameters restrict path enumeration to specific endpoints. This is particularly useful for large graphs where you only care about paths between two known nodes.
paths_from_a Output: All paths originating at A: (A,B,[A,B]), (A,C,[A,C]), (A,D,[A,B,D]), (A,D,[A,C,D]), (A,E,[A,B,E]), (A,F,[A,B,D,F]), (A,F,[A,C,D,F]) – 7 tuples. All start from A.
paths_to_f Output: All paths ending at F: (A,F,[A,B,D,F]), (A,F,[A,C,D,F]), (B,F,[B,D,F]), (C,F,[C,D,F]), (D,F,[D,F]) – 5 tuples. All end at F.
paths_a_to_f Output: Only paths from A to F: (A,F,[A,B,D,F]), (A,F,[A,C,D,F]) – 2 tuples. Two distinct routes from A to F.
To restrict #PATHS to several specific origin–destination couples at once, use
source_target_pairs. Only the listed sources are explored, which is cheaper
than computing all paths and filtering afterwards:
paths_pairs Output: (1,4), (2,3). Square brackets and per-node quotes are optional; quote a node only when its identifier contains a separator (:, ,, |) or whitespace, e.g. source_target_pairs=['New York':'Boston']. This option cannot be combined with source/target and works with visited=true/all_paths=true too.
When you want several sources or targets (but not a fixed pairing between
them), use the list options sources and targets. They generalise the single
source/target: sources=[a,b] keeps paths starting from any of the
listed nodes, targets=[x,y] keeps paths ending at any of them, and using
both restricts to the cross-product (every listed source to every listed
target):
paths_from_sources Output: (1,2), (1,3), (1,4), (5,6) — every path from
node 1 plus the path from node 5.
paths_to_targets Output: (1,3), (2,3), (1,4), (2,4), (3,4) — every path
ending at 3 or 4.
paths_cross Output: (1,4), (2,4) — only the source/target cross-product.
sources/targets follow the same quoting rules as source_target_pairs
(brackets and per-node quotes optional; quote a node only when its identifier
contains a separator or whitespace, e.g. sources=['New York','k']). source
and sources are mutually exclusive (likewise target/targets), and none of
them can be combined with source_target_pairs.
2c PATHS — every option on one graph (copy‑paste & compare)
This is a single, self-contained script that runs#PATHS with each option on
the same small DAG so you can paste it as-is and diff the outputs side by side.
| Output | Rows | Result |
|---|---|---|
paths_all | 9 | (1,2),(1,3),(1,4),(1,5),(2,4),(2,5),(3,4),(3,5),(4,5) |
paths_visited | 9 | (1,2,[1,2]),(1,3,[1,3]),(1,4,[1,2,4]),(1,5,[1,2,4,5]),(2,4,[2,4]),(2,5,[2,4,5]),(3,4,[3,4]),(3,5,[3,4,5]),(4,5,[4,5]) |
paths_depth2 | 8 | same as paths_all minus (1,5) (3 hops away) |
paths_from_1 | 4 | (1,2,[1,2]),(1,3,[1,3]),(1,4,[1,2,4]),(1,5,[1,2,4,5]) |
paths_1_to_5 | 2 | (1,5,[1,2,4,5]),(1,5,[1,3,4,5]) — both routes |
paths_pairs | 2 | (1,5),(2,4) |
visited=true alone returns one representative
simple path per (source,target) couple, while adding all_paths=true
enumerates every distinct route (compare paths_from_1’s single 1→5 row
with paths_1_to_5’s two rows).
3 All shortest paths
#ASP also accepts optional parameters to restrict the result and to return the
visited nodes along each shortest path:
source_target_pairs, the source and target values may be quoted
when the node identifier is a complex string (e.g. source='New York'); quotes
are optional for numbers and simple strings.
asp_from_1 Output: (1,2,10), (1,3,5), (1,4,11) — shortest distance from 1 to each reachable node (1 → 2 → 4 costs 11).
asp_1_to_4 Output: (1,4,11) — only the requested source/target pair.
asp_visited Output: (1,2,10,[1,2]), (1,3,5,[1,3]), (1,4,11,[1,2,4]) — when more than one shortest path of equal cost exists, each is returned on its own row.
To answer many-to-many “origin–destination” queries efficiently, restrict the
computation to an explicit list of couples with source_target_pairs. Only the
sources that appear in the list are processed, so this is much cheaper than
computing the full all‑pairs result and filtering afterwards:
asp_pairs Output: ("a","b",1.0), ("k","z",3.0).
Quoting rules: single (or double) quotes around each node are optional for
numbers and simple strings — source_target_pairs=[1:4,2:3] and
source_target_pairs=[a:b] are both fine. Quotes are required only when an
identifier is a complex string that itself contains a separator character
(:, ,, |) or whitespace, e.g. source_target_pairs=['New York':'Los Angeles','a:b':'c,d'].
Square brackets are also optional. This option cannot be combined with
source/target.
To restrict the shortest-path computation to a set of sources and/or targets
(without pinning specific couples), use the sources and targets list
options. They generalise the single source/target: combining both keeps the
cross-product (every listed source to every listed target):
asp_from_sources Output: (1,2,10), (1,3,5), (1,4,11), (2,3,2), (2,4,1) —
shortest paths from sources 1 and 2.
asp_to_targets Output: (1,3,5), (2,3,2), (1,4,11), (2,4,1), (3,4,9) —
shortest paths to targets 3 and 4.
asp_cross Output: (1,4,11), (2,4,1) — the source/target cross-product.
sources/targets use the same quoting rules as source_target_pairs.
source/sources are mutually exclusive (likewise target/targets), and none
of them can be combined with source_target_pairs.
3a Single-Source Shortest Path
#SSSP is the same weighted shortest-path engine as #ASP, but the source
parameter is required, so the output is restricted to shortest paths from a
single source node. The output schema is (Source, Target, Distance) and the
trivial self-row is not emitted.
target/targets and visited parameters from #ASP are also
available, e.g. #SSSP(edge, "source=1,target=4") returns only (1, 4, 11), and
#SSSP(edge, "source=1,visited=true") adds the ordered list of visited nodes as
a fourth column. The required source may equivalently be given as a single-element
list sources=[1]; source_target_pairs also satisfies the required-source rule
(e.g. #SSSP(edge, "source_target_pairs=[1:2,1:4]")).
3b Shortest paths — every option on one graph (copy‑paste & compare)
A single self-contained script that exercises#ASP and #SSSP with each option
on the same weighted graph. The graph has two equal-cost routes from 1 to
4 (and on to 5), which is what makes visited=true return more than one row.
| Output | Rows | Result |
|---|---|---|
asp_all | 9 | (1,2,1),(1,3,1),(1,4,2),(1,5,4),(2,4,1),(2,5,3),(3,4,1),(3,5,3),(4,5,2) |
asp_from_1 | 4 | (1,2,1),(1,3,1),(1,4,2),(1,5,4) |
asp_1_to_5 | 1 | (1,5,4) |
asp_visited | 6 | (1,2,1,[1,2]),(1,3,1,[1,3]),(1,4,2,[1,2,4]),(1,4,2,[1,3,4]),(1,5,4,[1,2,4,5]),(1,5,4,[1,3,4,5]) |
asp_pairs | 2 | (1,4,2),(2,5,3) |
sssp_from_1 | 4 | (1,2,1),(1,3,1),(1,4,2),(1,5,4) — identical to asp_from_1 |
#PATHS, visited=true here returns all
equal-cost shortest paths (two rows each for 1→4 and 1→5), and #SSSP is
simply #ASP with a mandatory source, so sssp_from_1 matches asp_from_1.
3c Breadth-First levels (BFS)
#BFS returns the level = the minimum number of hops between nodes. It is an
unweighted shortest path (the same engine as #ASP/#SSSP with every weight set
to 1), and the level is an integer. Any edge weight in the input is ignored.
bfs_from_1 Output: (1,2,1), (1,3,1), (1,4,2), (1,5,3) — hop distance from node 1 to each reachable node.
#BFS accepts the same options as #ASP:
source/target/sources/targets/source_target_pairs restrict the couples,
and visited=true adds the ordered fewest-hops path as a fourth column. The
source is optional — omit it to compute levels between every reachable pair, e.g.
bfs_all(X,Y,L) <- #BFS(edge).
Use the sources/targets list options to restrict to a set of endpoints, e.g.
levels from several start nodes at once:
bfs_from_sources Output: (1,2,1), (1,3,2), (5,6,1) — hop distances from
nodes 1 and 5.
Distance to a target
You can also specify onlytarget (without source) to get the hop-distance
from every node to that target:
#PATHS, #ASP and #BFS (but not #SSSP, which always
requires a source). Note that target-only does not prune the computation —
the engine groups by source, so it computes from all sources and then filters to
the target. If “distance to one target” is a hot path on a large graph, reverse
the edges and query by source instead, which runs a single traversal from the
target:
#PATHS and #ASP.
4 Connected Components
4a. Simple disconnected graph
cc_default Output: Each node mapped to its component array (order may vary): (1, [1,2,3]), (2, [1,2,3]), (3, [1,2,3]), (4, [4,5]), (5, [4,5]).
cc_sorted Output: Same but arrays are sorted: (1, [1,2,3]), (2, [1,2,3]), (3, [1,2,3]), (4, [4,5]), (5, [4,5]).
cc_with_id Output: Includes component ID (minimum node): (1, 1, [1,2,3]), (2, 1, [1,2,3]), (3, 1, [1,2,3]), (4, 4, [4,5]), (5, 4, [4,5]).
cc_full Output: Both sorted and with ID: (1, 1, [1,2,3]), (2, 1, [1,2,3]), (3, 1, [1,2,3]), (4, 4, [4,5]), (5, 4, [4,5]).
4b. Cyclic graph
(1, [1,2,3,4]), (2, [1,2,3,4]), (3, [1,2,3,4]), (4, [1,2,3,4]).
4c. Multiple disconnected components
(1, [1,2]), (3, [3,4]), (5, [5,6]).
4d. Why use component_id=true?
The component ID (minimum node) is crucial for deterministic results in distributed execution:
5 Connected Components: CC vs WCC
{1,2} and {3,4}.
Key difference:
#CC: Expects edges to already be symmetric (undirected). You must explicitly createedge(X,Y)andedge(Y,X)for each undirected edge.#WCC: Designed for directed graphs. Automatically treatsedge(1,2)as ifedge(2,1)also exists (ignores direction).
- Use
#CCwhen your graph is naturally undirected (friendships, co-authorship, etc.) and you’ve made edges symmetric. - Use
#WCCwhen your graph is naturally directed (web links, citations, follows) but you want to find components ignoring direction.
Graph Analytics with Struct Nodes
All graph functions (#TC, #PATHS, #ASP, #CC) automatically support struct nodes, allowing you to attach rich metadata (properties, attributes) to graph nodes while preserving them throughout computation.
Why Struct Nodes?
In real-world graphs, nodes often have multiple properties:- People: ID, name, age, department
- Companies: ID, name, industry, country
- Cities: ID, name, population, coordinates
How It Works
Key requirements:- Both source and destination nodes must be structs (or both simple)
- The first field of the struct is used as the node ID for comparison and deduplication
- The full struct is preserved in the output
Example 1: Transitive Closure with Struct Nodes
Example 2: PATHS with Struct Nodes and Visited Tracking
Example 3: All-Shortest Paths with Weighted Struct Edges
Example 4: Connected Components with Struct Nodes
Benefits of Struct Nodes
- Rich Context: Keep all node metadata throughout graph traversal
- Type Safety: First field must be unique ID; all other fields are arbitrary
- Query Flexibility: Filter/aggregate on any struct field in downstream rules
- Join Elimination: No need to join back with node tables to get properties
- Automatic: Functions auto-detect struct vs simple nodes
Performance Notes
- Lightweight internals: Graph algorithms work with IDs only (first field)
- Full reconstruction: Complete structs are rebuilt for output
- Memory efficient: ~50% savings vs storing full edges in adjacency lists
- Type validation: Both source and destination must be the same type (both structs or both simple)
Centrality Measures
Centrality measures quantify the importance of nodes in a graph. Vadalog provides multiple centrality algorithms, all optimized for distributed computation.6 Degree Centrality
Degree Centrality measures node importance by counting connections, normalized by the maximum possible degree.dc_total:
- Always normalized by (N-1) where N is total nodes
- Three degree types:
in,out,total(default)
7 Betweenness Centrality
Betweenness Centrality measures how often a node lies on shortest paths between other nodes. High betweenness indicates “bridge” nodes.bc_exact (linear chain):
- Normalized by (N-1)(N-2)
- Sampling support: Use
sample=Kfor approximate results on large graphssample=500: Compute from 500 random sources instead of all N- Provides unbiased estimates with significant speedup
- Recommended for graphs >10M edges
8 PageRank
PageRank computes node importance using link structure. Originally used by Google for web search ranking.pr_default (cycle):
- Scores always sum to 1.0 (probability distribution)
- Handles dangling nodes (no outgoing edges) correctly
- Parameters:
damping: Probability of following a link (default 0.85)tolerance: Convergence threshold (default 0.0001)

