Why graph analytics inside a reasoning language?

Many real-world knowledge-management tasks (supply-chain optimisation, fraud detection, recommendation, etc.) are both graph-shaped and rule-driven. Vadalog lets you run industrial-strength graph algorithms and logical reasoning in the same declarative program. Execution is automatically parallel and distributed.

Using Graph Functions

A graph algorithm appears as a function literal that starts with # and lives in a rule body:
tc_function(X,Y)  <-  #TC(edge).           % transitive closure on edge/2
paths_function(X,Y) <- #PATHS(edge).       % all paths with default options
asp_function(X,Y,Z) <-  #ASP(edge).        % all‑shortest paths on weighted edge/3
Variables on the head of the rule with the function receive the algorithm’s results.

Built-in Graph Function Catalogue

FunctionProblem solvedInput predicate (arity)Output variablesNotes
#TC(P)Transitive Closure – reachability.P/2 edge(U,V)(X,Y)Directed graph. Includes self-loops from cycles.
#PATHS(P[,options])All Paths with configurable options.P/2 edge(U,V) + optional string(X,Y,Visited) or (X,Y)Optional parameters: visited=true/false (default false), self_loops=true/false (default false), all_paths=true/false (default false), max_depth=N (default -1 = unlimited), source=X (only paths from node X), target=Y (only paths to node Y), sources=[a,b,...] (only paths from this set of nodes), targets=[x,y,...] (only paths to this set of nodes), source_target_pairs=[s1:t1,s2:t2,...] (only the listed origin–destination couples). source/sources are mutually exclusive (same for target/targets), and none of them can be combined with source_target_pairs. Format: "visited=true,max_depth=5,source=A,target=D", "sources=[A,B],targets=[D]" or "source_target_pairs=[A:D,B:D]".
#ASP(P[,options])All‑Shortest Paths (weighted).P/3 edge(U,V,W) + optional string(X,Y,Z) or (X,Y,Z,Visited)Parallel multi‑source Dijkstra. Can include self-loops from cycles. Optional parameters: visited=true/false (default false; when true returns all equal-cost shortest paths with the ordered list of visited nodes), source=X (only paths from node X), target=Y (only paths to node Y), sources=[a,b,...] (only paths from this set of nodes), targets=[x,y,...] (only paths to this set of nodes), source_target_pairs=[s1:t1,s2:t2,...] (only the listed origin–destination couples). source/sources are mutually exclusive (same for target/targets), and none of them can be combined with source_target_pairs. Format: "visited=true,source=1,target=4", "sources=[1,2],targets=[4]" or "source_target_pairs=[1:4,2:3]".
#SSSP(P[,options])Single‑Source Shortest Path (weighted).P/3 edge(U,V,W) + optional string(X,Y,Z) or (X,Y,Z,Visited)Same engine as #ASP but a source is required, so results are restricted to shortest paths from one source. Output is (Source, Target, Distance); the trivial self-row is not emitted. Optional parameters: visited=true/false (default false), target=Y (only the path to node Y), targets=[x,y,...] (only paths to this set of nodes), source_target_pairs=[...] (satisfies the required-source rule). The required source may also be supplied as a single-element sources=[X]. Format: "source=1" or "source=1,target=4,visited=true".
#BFS(P[,options])Breadth‑First Levels (unweighted shortest path).P/2 edge(U,V) (or P/3, weights ignored) + optional string(X,Y,Level) or (X,Y,Level,Visited)Level = minimum number of hops between the couple, as an integer. Same engine as #ASP but with unit weights; any edge weight is ignored. Optional parameters: source=X, target=Y, sources=[a,b,...], targets=[x,y,...], source_target_pairs=[...], visited=true/false (default false; adds the ordered fewest-hops path). Source is optional (all-sources by default). Format: "source=1", "sources=[1,5]" or "source=1,visited=true".
#CC(P[,options])Connected Components with optional sorting and ID.P/2 edge(U,V) + optional string(X,Comp) or (X,ID,Comp)Optional parameters: sort_component=true/false (default false), component_id=true/false (default false). Format: "sort_component=true,component_id=true". For undirected graphs (assumes edges are symmetric).
#WCC(P)Weakly Connected Components.P/2(X,Comp)For directed graphs – ignores edge direction.
#TRI(P)Triangle Enumeration.P/2(X,Y,Z)Unordered triangles.
#DC(P[,options])Degree CentralityP/2 edge(U,V)(X,Score)Normalized by (N-1). Optional parameters: type=in/out/total (default total). Format: "type=in". Scores range 0.0–1.0.
#BC(P[,options])Betweenness CentralityP/2 edge(U,V)(X,Score)Optional sampling for large graphs. Normalized by (N-1)(N-2). Optional parameters: sample=K for approximate BC using K random sources. Format: "sample=500". Scores range 0.0–1.0.
#PR(P[,options])PageRankP/2 edge(U,V)(X,Score)Optional parameters: damping=d (default 0.85), tolerance=t (default 0.0001). Format: "damping=0.85,tolerance=0.0001". Scores sum to 1.0.

Examples

1 Transitive closure

edge(1,2).
edge(2,3).
edge(3,4).
edge(4,1).

tc_function(X,Y) <- #TC(edge).
@output("tc_function").
Output: All reachable pairs including self-loops from cycles: (1,2), (2,3), (3,4), (4,1), (1,3), (2,4), (3,1), (4,2), (1,4), (2,1), (3,2), (4,3), (1,1), (2,2), (3,3), (4,4) – 16 tuples total.

2 PATHS function with configurable options

edge(1,2).
edge(2,3).
edge(3,1).

% Default behavior (no parameters)
paths_default(X,Y) <- #PATHS(edge).

% Paths with visited nodes tracking (simple paths only)
paths_visited(X,Y,Visited) <- #PATHS(edge, "visited=true").

% Paths with self-loops enabled
paths_with_loops(X,Y) <- #PATHS(edge, "self_loops=true").

% Paths with max depth limit (only 1 hop)
paths_depth1(X,Y) <- #PATHS(edge, "max_depth=1").

% Paths with visited tracking and max depth 2
paths_visited_depth2(X,Y,V) <- #PATHS(edge, "visited=true,max_depth=2").

% All options specified
paths_custom(X,Y,V) <- #PATHS(edge, "visited=true,self_loops=false,all_paths=true,max_depth=10").

@output("paths_default").
@output("paths_visited").
@output("paths_with_loops").
@output("paths_depth1").
@output("paths_visited_depth2").
@output("paths_custom").
paths_default Output: All reachable pairs including self-loops from cycles (using defaults): (1,2), (2,3), (3,1), (1,3), (2,1), (3,2), (1,1), (2,2), (3,3) – 9 tuples. paths_visited Output: Only simple acyclic paths: (1,2,[1,2]), (2,3,[2,3]), (3,1,[3,1]), (1,3,[1,2,3]), (2,1,[2,3,1]), (3,2,[3,1,2]) – 6 tuples. Each includes visited nodes. paths_with_loops Output: Same as paths_default – 9 tuples (self-loops already included by default). paths_depth1 Output: Only direct edges: (1,2), (2,3), (3,1) – 3 tuples. paths_visited_depth2 Output: Paths up to 2 hops with visited tracking: (1,2,[1,2]), (2,3,[2,3]), (3,1,[3,1]), (1,3,[1,2,3]), (2,1,[2,3,1]), (3,2,[3,1,2]) – 6 tuples.

2b PATHS with source and target filtering

The source and target parameters restrict path enumeration to specific endpoints. This is particularly useful for large graphs where you only care about paths between two known nodes.
edge("A","B").
edge("A","C").
edge("B","D").
edge("C","D").
edge("B","E").
edge("D","F").

% Only paths starting from node A
paths_from_a(X,Y,V) <- #PATHS(edge, "visited=true,source=A").

% Only paths ending at node F
paths_to_f(X,Y,V) <- #PATHS(edge, "visited=true,target=F").

% Only paths from A to F (all distinct routes)
paths_a_to_f(X,Y,V) <- #PATHS(edge, "visited=true,all_paths=true,source=A,target=F").

@output("paths_from_a").
@output("paths_to_f").
@output("paths_a_to_f").
paths_from_a Output: All paths originating at A: (A,B,[A,B]), (A,C,[A,C]), (A,D,[A,B,D]), (A,D,[A,C,D]), (A,E,[A,B,E]), (A,F,[A,B,D,F]), (A,F,[A,C,D,F]) – 7 tuples. All start from A. paths_to_f Output: All paths ending at F: (A,F,[A,B,D,F]), (A,F,[A,C,D,F]), (B,F,[B,D,F]), (C,F,[C,D,F]), (D,F,[D,F]) – 5 tuples. All end at F. paths_a_to_f Output: Only paths from A to F: (A,F,[A,B,D,F]), (A,F,[A,C,D,F]) – 2 tuples. Two distinct routes from A to F. To restrict #PATHS to several specific origin–destination couples at once, use source_target_pairs. Only the listed sources are explored, which is cheaper than computing all paths and filtering afterwards:
edge(1,2).
edge(2,3).
edge(3,4).

% Only the couples (1,4) and (2,3)
paths_pairs(X,Y) <- #PATHS(edge, "source_target_pairs=[1:4,2:3]").
@output("paths_pairs").
paths_pairs Output: (1,4), (2,3). Square brackets and per-node quotes are optional; quote a node only when its identifier contains a separator (:, ,, |) or whitespace, e.g. source_target_pairs=['New York':'Boston']. This option cannot be combined with source/target and works with visited=true/all_paths=true too. When you want several sources or targets (but not a fixed pairing between them), use the list options sources and targets. They generalise the single source/target: sources=[a,b] keeps paths starting from any of the listed nodes, targets=[x,y] keeps paths ending at any of them, and using both restricts to the cross-product (every listed source to every listed target):
edge(1,2).
edge(2,3).
edge(3,4).
edge(5,6).

% Paths starting from node 1 or node 5
paths_from_sources(X,Y) <- #PATHS(edge, "sources=[1,5]").

% Paths ending at node 3 or node 4
paths_to_targets(X,Y)   <- #PATHS(edge, "targets=[3,4]").

% Cross-product: from {1,2} to {4}
paths_cross(X,Y)        <- #PATHS(edge, "sources=[1,2],targets=[4]").

@output("paths_from_sources").
@output("paths_to_targets").
@output("paths_cross").
paths_from_sources Output: (1,2), (1,3), (1,4), (5,6) — every path from node 1 plus the path from node 5. paths_to_targets Output: (1,3), (2,3), (1,4), (2,4), (3,4) — every path ending at 3 or 4. paths_cross Output: (1,4), (2,4) — only the source/target cross-product. sources/targets follow the same quoting rules as source_target_pairs (brackets and per-node quotes optional; quote a node only when its identifier contains a separator or whitespace, e.g. sources=['New York','k']). source and sources are mutually exclusive (likewise target/targets), and none of them can be combined with source_target_pairs.

2c PATHS — every option on one graph (copy‑paste & compare)

This is a single, self-contained script that runs #PATHS with each option on the same small DAG so you can paste it as-is and diff the outputs side by side.
% Two routes 1 → 4 (via 2 and via 3), continuing to 5
edge(1,2).
edge(1,3).
edge(2,4).
edge(3,4).
edge(4,5).

% (a) Default: every reachable (source, target) couple
paths_all(X,Y)        <- #PATHS(edge).

% (b) One simple path per couple, with the visited nodes
paths_visited(X,Y,V)  <- #PATHS(edge, "visited=true").

% (c) Limit the traversal to 2 hops
paths_depth2(X,Y)     <- #PATHS(edge, "max_depth=2").

% (d) Only paths starting at node 1
paths_from_1(X,Y,V)   <- #PATHS(edge, "visited=true,source=1").

% (e) Every distinct route from 1 to 5 (note all_paths=true)
paths_1_to_5(X,Y,V)   <- #PATHS(edge, "visited=true,all_paths=true,source=1,target=5").

% (f) A specific set of couples (many-to-many)
paths_pairs(X,Y)      <- #PATHS(edge, "source_target_pairs=[1:5,2:4]").

@output("paths_all").
@output("paths_visited").
@output("paths_depth2").
@output("paths_from_1").
@output("paths_1_to_5").
@output("paths_pairs").
OutputRowsResult
paths_all9(1,2),(1,3),(1,4),(1,5),(2,4),(2,5),(3,4),(3,5),(4,5)
paths_visited9(1,2,[1,2]),(1,3,[1,3]),(1,4,[1,2,4]),(1,5,[1,2,4,5]),(2,4,[2,4]),(2,5,[2,4,5]),(3,4,[3,4]),(3,5,[3,4,5]),(4,5,[4,5])
paths_depth28same as paths_all minus (1,5) (3 hops away)
paths_from_14(1,2,[1,2]),(1,3,[1,3]),(1,4,[1,2,4]),(1,5,[1,2,4,5])
paths_1_to_52(1,5,[1,2,4,5]),(1,5,[1,3,4,5]) — both routes
paths_pairs2(1,5),(2,4)
Key difference to notice: visited=true alone returns one representative simple path per (source,target) couple, while adding all_paths=true enumerates every distinct route (compare paths_from_1’s single 1→5 row with paths_1_to_5’s two rows).

3 All shortest paths

edge(1,2,9).
edge(2,3,5).
edge(3,4,6).
edge(4,1,8).

asp_function(X,Y,Z) <- #ASP(edge).
@output("asp_function").
#ASP also accepts optional parameters to restrict the result and to return the visited nodes along each shortest path:
edge(1,2,10).
edge(1,3,5).
edge(2,3,2).
edge(2,4,1).
edge(3,4,9).

% Only shortest paths from node 1
asp_from_1(X,Y,Z)         <- #ASP(edge, "source=1").

% Only the shortest path(s) from node 1 to node 4
asp_1_to_4(X,Y,Z)         <- #ASP(edge, "source=1,target=4").

% Return all equal-cost shortest paths with the ordered list of visited nodes
asp_visited(X,Y,Z,Visited) <- #ASP(edge, "source=1,visited=true").

@output("asp_from_1").
@output("asp_1_to_4").
@output("asp_visited").
As with source_target_pairs, the source and target values may be quoted when the node identifier is a complex string (e.g. source='New York'); quotes are optional for numbers and simple strings. asp_from_1 Output: (1,2,10), (1,3,5), (1,4,11) — shortest distance from 1 to each reachable node (1 → 2 → 4 costs 11). asp_1_to_4 Output: (1,4,11) — only the requested source/target pair. asp_visited Output: (1,2,10,[1,2]), (1,3,5,[1,3]), (1,4,11,[1,2,4]) — when more than one shortest path of equal cost exists, each is returned on its own row. To answer many-to-many “origin–destination” queries efficiently, restrict the computation to an explicit list of couples with source_target_pairs. Only the sources that appear in the list are processed, so this is much cheaper than computing the full all‑pairs result and filtering afterwards:
edge("a","b",1.0).
edge("a","c",5.0).
edge("b","c",1.0).
edge("k","z",3.0).
edge("k","m",1.0).

% Only the couples a -> b and k -> z
asp_pairs(X,Y,Z) <- #ASP(edge, "source_target_pairs=['a':'b','k':'z']").
@output("asp_pairs").
asp_pairs Output: ("a","b",1.0), ("k","z",3.0). Quoting rules: single (or double) quotes around each node are optional for numbers and simple strings — source_target_pairs=[1:4,2:3] and source_target_pairs=[a:b] are both fine. Quotes are required only when an identifier is a complex string that itself contains a separator character (:, ,, |) or whitespace, e.g. source_target_pairs=['New York':'Los Angeles','a:b':'c,d']. Square brackets are also optional. This option cannot be combined with source/target. To restrict the shortest-path computation to a set of sources and/or targets (without pinning specific couples), use the sources and targets list options. They generalise the single source/target: combining both keeps the cross-product (every listed source to every listed target):
edge(1,2,10).
edge(1,3,5).
edge(2,3,2).
edge(2,4,1).
edge(3,4,9).

% Shortest paths from node 1 or node 2
asp_from_sources(X,Y,Z) <- #ASP(edge, "sources=[1,2]").

% Shortest paths ending at node 3 or node 4
asp_to_targets(X,Y,Z)   <- #ASP(edge, "targets=[3,4]").

% Cross-product: from {1,2} to {4}
asp_cross(X,Y,Z)        <- #ASP(edge, "sources=[1,2],targets=[4]").

@output("asp_from_sources").
@output("asp_to_targets").
@output("asp_cross").
asp_from_sources Output: (1,2,10), (1,3,5), (1,4,11), (2,3,2), (2,4,1) — shortest paths from sources 1 and 2. asp_to_targets Output: (1,3,5), (2,3,2), (1,4,11), (2,4,1), (3,4,9) — shortest paths to targets 3 and 4. asp_cross Output: (1,4,11), (2,4,1) — the source/target cross-product. sources/targets use the same quoting rules as source_target_pairs. source/sources are mutually exclusive (likewise target/targets), and none of them can be combined with source_target_pairs.

3a Single-Source Shortest Path

#SSSP is the same weighted shortest-path engine as #ASP, but the source parameter is required, so the output is restricted to shortest paths from a single source node. The output schema is (Source, Target, Distance) and the trivial self-row is not emitted.
edge(1,2,10).
edge(1,3,5).
edge(2,3,2).
edge(2,4,1).
edge(3,4,9).

% Shortest paths from node 1 to every reachable node
sssp_from_node1(Source, Target, Distance) <- #SSSP(edge, "source=1").
@output("sssp_from_node1").
Output:
sssp_from_node1(1, 2, 10)   % Shortest path 1 → 2
sssp_from_node1(1, 3, 5)    % Shortest path 1 → 3
sssp_from_node1(1, 4, 11)   % Shortest path 1 → 2 → 4
The optional target/targets and visited parameters from #ASP are also available, e.g. #SSSP(edge, "source=1,target=4") returns only (1, 4, 11), and #SSSP(edge, "source=1,visited=true") adds the ordered list of visited nodes as a fourth column. The required source may equivalently be given as a single-element list sources=[1]; source_target_pairs also satisfies the required-source rule (e.g. #SSSP(edge, "source_target_pairs=[1:2,1:4]")).

3b Shortest paths — every option on one graph (copy‑paste & compare)

A single self-contained script that exercises #ASP and #SSSP with each option on the same weighted graph. The graph has two equal-cost routes from 1 to 4 (and on to 5), which is what makes visited=true return more than one row.
% Two equal-cost routes 1 → 4 (via 2 and via 3); 1 → 5 is cheapest through 4
edge(1,2,1.0).
edge(1,3,1.0).
edge(2,4,1.0).
edge(3,4,1.0).
edge(4,5,2.0).
edge(2,5,5.0).

% (a) All-pairs shortest distances
asp_all(X,Y,Z)       <- #ASP(edge).

% (b) From a single source
asp_from_1(X,Y,Z)    <- #ASP(edge, "source=1").

% (c) A single origin → destination pair
asp_1_to_5(X,Y,Z)    <- #ASP(edge, "source=1,target=5").

% (d) All equal-cost shortest paths from 1, with the visited nodes
asp_visited(X,Y,Z,V) <- #ASP(edge, "source=1,visited=true").

% (e) A specific set of couples (many-to-many)
asp_pairs(X,Y,Z)     <- #ASP(edge, "source_target_pairs=[1:4,2:5]").

% (f) Single-Source Shortest Path (source is required for #SSSP)
sssp_from_1(S,T,D)   <- #SSSP(edge, "source=1").

@output("asp_all").
@output("asp_from_1").
@output("asp_1_to_5").
@output("asp_visited").
@output("asp_pairs").
@output("sssp_from_1").
OutputRowsResult
asp_all9(1,2,1),(1,3,1),(1,4,2),(1,5,4),(2,4,1),(2,5,3),(3,4,1),(3,5,3),(4,5,2)
asp_from_14(1,2,1),(1,3,1),(1,4,2),(1,5,4)
asp_1_to_51(1,5,4)
asp_visited6(1,2,1,[1,2]),(1,3,1,[1,3]),(1,4,2,[1,2,4]),(1,4,2,[1,3,4]),(1,5,4,[1,2,4,5]),(1,5,4,[1,3,4,5])
asp_pairs2(1,4,2),(2,5,3)
sssp_from_14(1,2,1),(1,3,1),(1,4,2),(1,5,4) — identical to asp_from_1
Key difference to notice: unlike #PATHS, visited=true here returns all equal-cost shortest paths (two rows each for 1→4 and 1→5), and #SSSP is simply #ASP with a mandatory source, so sssp_from_1 matches asp_from_1.

3c Breadth-First levels (BFS)

#BFS returns the level = the minimum number of hops between nodes. It is an unweighted shortest path (the same engine as #ASP/#SSSP with every weight set to 1), and the level is an integer. Any edge weight in the input is ignored.
edge(1,2).
edge(1,3).
edge(2,4).
edge(3,4).
edge(4,5).

% BFS levels from node 1
bfs_from_1(Source, Target, Level) <- #BFS(edge, "source=1").
@output("bfs_from_1").
bfs_from_1 Output: (1,2,1), (1,3,1), (1,4,2), (1,5,3) — hop distance from node 1 to each reachable node. #BFS accepts the same options as #ASP: source/target/sources/targets/source_target_pairs restrict the couples, and visited=true adds the ordered fewest-hops path as a fourth column. The source is optional — omit it to compute levels between every reachable pair, e.g. bfs_all(X,Y,L) <- #BFS(edge). Use the sources/targets list options to restrict to a set of endpoints, e.g. levels from several start nodes at once:
edge(1,2).
edge(2,3).
edge(5,6).

% Levels from node 1 or node 5
bfs_from_sources(Source, Target, Level) <- #BFS(edge, "sources=[1,5]").
@output("bfs_from_sources").
bfs_from_sources Output: (1,2,1), (1,3,2), (5,6,1) — hop distances from nodes 1 and 5.

Distance to a target

You can also specify only target (without source) to get the hop-distance from every node to that target:
to_target(X, Y, Level) <- #BFS(edge, "target='F'").
This works for #PATHS, #ASP and #BFS (but not #SSSP, which always requires a source). Note that target-only does not prune the computation — the engine groups by source, so it computes from all sources and then filters to the target. If “distance to one target” is a hot path on a large graph, reverse the edges and query by source instead, which runs a single traversal from the target:
redge(Y,X)        <- edge(X,Y).
to_target(X, T, L) <- #BFS(redge, "source='F'").
The same reversed-edge trick applies to #PATHS and #ASP.

4 Connected Components

4a. Simple disconnected graph

% Two separate components: {1,2,3} and {4,5}
edge_raw(1,2).
edge_raw(2,3).
edge_raw(4,5).

% Make edges undirected (symmetric)
edge(X,Y) <- edge_raw(X,Y).
edge(Y,X) <- edge_raw(X,Y).

% Default: no sorting, no component ID
cc_default(X,C) <- #CC(edge).

% With sorted components for determinism
cc_sorted(X,C) <- #CC(edge, "sort_component=true").

% With component ID (minimum node in component)
cc_with_id(X,ID,C) <- #CC(edge, "component_id=true").

% With both sorting and ID for full determinism
cc_full(X,ID,C) <- #CC(edge, "sort_component=true,component_id=true").

@output("cc_default").
@output("cc_sorted").
@output("cc_with_id").
@output("cc_full").
cc_default Output: Each node mapped to its component array (order may vary): (1, [1,2,3]), (2, [1,2,3]), (3, [1,2,3]), (4, [4,5]), (5, [4,5]). cc_sorted Output: Same but arrays are sorted: (1, [1,2,3]), (2, [1,2,3]), (3, [1,2,3]), (4, [4,5]), (5, [4,5]). cc_with_id Output: Includes component ID (minimum node): (1, 1, [1,2,3]), (2, 1, [1,2,3]), (3, 1, [1,2,3]), (4, 4, [4,5]), (5, 4, [4,5]). cc_full Output: Both sorted and with ID: (1, 1, [1,2,3]), (2, 1, [1,2,3]), (3, 1, [1,2,3]), (4, 4, [4,5]), (5, 4, [4,5]).

4b. Cyclic graph

% Single component with cycle: {1,2,3,4}
edge(1,2).
edge(2,3).
edge(3,4).
edge(4,1).

cc(X,C) <- #CC(edge).
@output("cc").
Output: All nodes in one component: (1, [1,2,3,4]), (2, [1,2,3,4]), (3, [1,2,3,4]), (4, [1,2,3,4]).

4c. Multiple disconnected components

% Three separate components
edge(1,2).
edge(3,4).
edge(5,6).

cc(X,C) <- #CC(edge).
@output("cc").
Output: Three components: (1, [1,2]), (3, [3,4]), (5, [5,6]).

4d. Why use component_id=true?

The component ID (minimum node) is crucial for deterministic results in distributed execution:
% Large graph processed across multiple partitions
@bind("large_graph.csv")
edge(X,Y).

% Without ID: each partition might output the same component differently
cc_no_id(X,C) <- #CC(edge).

% With ID: all partitions agree on component identity (minimum node)
cc_with_id(X,ID,C) <- #CC(edge, "component_id=true").

% Deduplicate by component ID
unique_components(ID,C) <- cc_with_id(X,ID,C), X = ID.

@output("unique_components").
Key insight: Since components are disjoint sets (cannot overlap), all partitions discovering the same component will find the same minimum node, guaranteeing deterministic component IDs across the cluster.

5 Connected Components: CC vs WCC

% Directed edges forming two components: {1,2} and {3,4}
directed_edge(1,2).
directed_edge(3,4).

% For CC: make the graph undirected by adding symmetric edges
undirected_edge(X,Y) <- directed_edge(X,Y).
undirected_edge(X,Y) <- directed_edge(Y,X).

% CC expects undirected (symmetric) edges
cc_result(X,C) <- #CC(undirected_edge).

% WCC works directly on directed edges (ignores direction)
wcc_result(X,C) <- #WCC(directed_edge).

@output("cc_result").
@output("wcc_result").
Both outputs produce the same components: {1,2} and {3,4}. Key difference:
  • #CC: Expects edges to already be symmetric (undirected). You must explicitly create edge(X,Y) and edge(Y,X) for each undirected edge.
  • #WCC: Designed for directed graphs. Automatically treats edge(1,2) as if edge(2,1) also exists (ignores direction).
When to use each:
  • Use #CC when your graph is naturally undirected (friendships, co-authorship, etc.) and you’ve made edges symmetric.
  • Use #WCC when your graph is naturally directed (web links, citations, follows) but you want to find components ignoring direction.

Graph Analytics with Struct Nodes

All graph functions (#TC, #PATHS, #ASP, #CC) automatically support struct nodes, allowing you to attach rich metadata (properties, attributes) to graph nodes while preserving them throughout computation.

Why Struct Nodes?

In real-world graphs, nodes often have multiple properties:
  • People: ID, name, age, department
  • Companies: ID, name, industry, country
  • Cities: ID, name, population, coordinates
Struct nodes let you keep this metadata attached during graph traversal instead of just tracking node IDs.

How It Works

Key requirements:
  1. Both source and destination nodes must be structs (or both simple)
  2. The first field of the struct is used as the node ID for comparison and deduplication
  3. The full struct is preserved in the output

Example 1: Transitive Closure with Struct Nodes

% Define node properties
node(1, "Alice", "Engineering").
node(2, "Bob", "Sales").
node(3, "Charlie", "Engineering").

% Define simple edges (just IDs)
edge_simple(1, 2).
edge_simple(2, 3).

% Create struct edges by joining nodes with edges
edge_struct(X, Y) <- 
    edge_simple(Id1, Id2),
    node(Id1, Name1, Dept1),
    node(Id2, Name2, Dept2),
    X = struct("id", Id1, "name", Name1, "dept", Dept1),
    Y = struct("id", Id2, "name", Name2, "dept", Dept2).

% Apply TC on struct edges - properties are preserved!
tc(X, Y) <- #TC(edge_struct).

@output("tc").
Output includes full structs with all properties preserved:
(struct(id=1, name="Alice", dept="Engineering"), 
 struct(id=2, name="Bob", dept="Sales"))

(struct(id=2, name="Bob", dept="Sales"), 
 struct(id=3, name="Charlie", dept="Engineering"))

(struct(id=1, name="Alice", dept="Engineering"), 
 struct(id=3, name="Charlie", dept="Engineering"))

Example 2: PATHS with Struct Nodes and Visited Tracking

% Person nodes with metadata
person(1, "Alice", 28).
person(2, "Bob", 35).
person(3, "Charlie", 42).

% Social connections
knows(1, 2).
knows(2, 3).

% Create struct edges
knows_struct(X, Y) <- 
    knows(Id1, Id2),
    person(Id1, Name1, Age1),
    person(Id2, Name2, Age2),
    X = struct("id", Id1, "name", Name1, "age", Age1),
    Y = struct("id", Id2, "name", Name2, "age", Age2).

% Find all paths with visited nodes (visited nodes are also full structs!)
paths_with_metadata(X, Y, Visited) <- 
    #PATHS(knows_struct, "visited=true").

@output("paths_with_metadata").
Output includes full structs in visited arrays:
(struct(id=1, name="Alice", age=28), 
 struct(id=2, name="Bob", age=35), 
 [struct(id=1, name="Alice", age=28), struct(id=2, name="Bob", age=35)])

(struct(id=1, name="Alice", age=28), 
 struct(id=3, name="Charlie", age=42), 
 [struct(id=1, name="Alice", age=28), struct(id=2, name="Bob", age=35), struct(id=3, name="Charlie", age=42)])

Example 3: All-Shortest Paths with Weighted Struct Edges

% City nodes with coordinates
city(1, "New York", 40.7128, -74.0060).
city(2, "London", 51.5074, -0.1278).
city(3, "Tokyo", 35.6762, 139.6503).

% Weighted flights (with distance)
flight(1, 2, 5570.0).  % NYC to London: 5570 km
flight(2, 3, 9560.0).  % London to Tokyo: 9560 km
flight(1, 3, 10850.0). % NYC to Tokyo (direct): 10850 km

% Create weighted struct edges
flight_struct(X, Y, Dist) <- 
    flight(Id1, Id2, Dist),
    city(Id1, Name1, Lat1, Lon1),
    city(Id2, Name2, Lat2, Lon2),
    X = struct("id", Id1, "name", Name1, "lat", Lat1, "lon", Lon1),
    Y = struct("id", Id2, "name", Name2, "lat", Lat2, "lon", Lon2).

% Find shortest paths with city metadata preserved
asp(X, Y, Distance) <- #ASP(flight_struct).

@output("asp").
Output shows shortest paths with full city information:
(struct(id=1, name="New York", lat=40.7128, lon=-74.0060),
 struct(id=3, name="Tokyo", lat=35.6762, lon=139.6503),
 10850.0)  % Direct flight is shorter than via London

Example 4: Connected Components with Struct Nodes

% Company nodes
company(1, "Acme Inc", "Technology").
company(2, "Global Corp", "Technology").
company(3, "Local Ltd", "Retail").
company(4, "Small Co", "Retail").

% Partnerships (undirected)
partner_raw(1, 2).
partner_raw(3, 4).

% Make undirected and create struct edges
partner(X, Y) <- partner_raw(A, B), company(A, N1, I1), company(B, N2, I2),
                 X = struct("id", A, "name", N1, "industry", I1),
                 Y = struct("id", B, "name", N2, "industry", I2).
partner(X, Y) <- partner_raw(B, A), company(A, N1, I1), company(B, N2, I2),
                 X = struct("id", A, "name", N1, "industry", I1),
                 Y = struct("id", B, "name", N2, "industry", I2).

% Find components with company metadata and IDs
cc(Node, ComponentId, Component) <- 
    #CC(partner, "component_id=true,sort_component=true").

@output("cc").
Output includes full company structs:
(struct(id=1, name="Acme Inc", industry="Technology"),
 struct(id=1, name="Acme Inc", industry="Technology"),  % Component ID
 [struct(id=1, name="Acme Inc", industry="Technology"), 
  struct(id=2, name="Global Corp", industry="Technology")])

(struct(id=3, name="Local Ltd", industry="Retail"),
 struct(id=3, name="Local Ltd", industry="Retail"),  % Component ID
 [struct(id=3, name="Local Ltd", industry="Retail"), 
  struct(id=4, name="Small Co", industry="Retail")])

Benefits of Struct Nodes

  1. Rich Context: Keep all node metadata throughout graph traversal
  2. Type Safety: First field must be unique ID; all other fields are arbitrary
  3. Query Flexibility: Filter/aggregate on any struct field in downstream rules
  4. Join Elimination: No need to join back with node tables to get properties
  5. Automatic: Functions auto-detect struct vs simple nodes

Performance Notes

  • Lightweight internals: Graph algorithms work with IDs only (first field)
  • Full reconstruction: Complete structs are rebuilt for output
  • Memory efficient: ~50% savings vs storing full edges in adjacency lists
  • Type validation: Both source and destination must be the same type (both structs or both simple)

Centrality Measures

Centrality measures quantify the importance of nodes in a graph. Vadalog provides multiple centrality algorithms, all optimized for distributed computation.

6 Degree Centrality

Degree Centrality measures node importance by counting connections, normalized by the maximum possible degree.
edge("a","b").
edge("a","e").
edge("a","f").
edge("b","d").
edge("c","b").
edge("c","e").
edge("d","c").
edge("d","h").
edge("f","g").

% Total degree (in + out, normalized)
dc_total(N, DC) <- #DC(edge).

% In-degree only (incoming edges)
dc_in(N, DC) <- #DC(edge, "type=in").

% Out-degree only (outgoing edges)
dc_out(N, DC) <- #DC(edge, "type=out").

@output("dc_total").
@output("dc_in").
@output("dc_out").
Output: Normalized scores between 0.0 and 1.0. For dc_total:
(a, 0.375)  % 3 connections / 8 nodes = 3/8
(b, 0.375)
(c, 0.375)
(d, 0.375)
(e, 0.25)
(f, 0.25)
(g, 0.125)
(h, 0.125)
Features:
  • Always normalized by (N-1) where N is total nodes
  • Three degree types: in, out, total (default)

7 Betweenness Centrality

Betweenness Centrality measures how often a node lies on shortest paths between other nodes. High betweenness indicates “bridge” nodes.
% Simple chain: 1 -> 2 -> 3 -> 4 -> 5 -> 6
edge(1,2).
edge(2,3).
edge(3,4).
edge(4,5).
edge(5,6).

% Exact betweenness (all sources)
bc_exact(N, BC) <- #BC(edge).

% Approximate betweenness (sample 100 sources for large graphs)
bc_approx(N, BC) <- #BC(edge, "sample=100").

@output("bc_exact").
@output("bc_approx").
Output for bc_exact (linear chain):
(1, 0.0)    % Endpoints have zero betweenness
(2, 0.2)    % Node 2 is on 20% of shortest paths
(3, 0.3)    % Node 3 is on 30% of shortest paths
(4, 0.3)
(5, 0.2)
(6, 0.0)
Features:
  • Normalized by (N-1)(N-2)
  • Sampling support: Use sample=K for approximate results on large graphs
    • sample=500: Compute from 500 random sources instead of all N
    • Provides unbiased estimates with significant speedup
    • Recommended for graphs >10M edges

8 PageRank

PageRank computes node importance using link structure. Originally used by Google for web search ranking.
% Directed cycle: 1 -> 2 -> 3 -> 4 -> 1
edge(1,2).
edge(2,3).
edge(3,4).
edge(4,1).

% Default PageRank (damping=0.85, tolerance=0.0001)
pr_default(N, PR) <- #PR(edge).

% Custom damping factor
pr_custom(N, PR) <- #PR(edge, "damping=0.9").

% High precision
pr_precise(N, PR) <- #PR(edge, "damping=0.85,tolerance=0.00001").

@output("pr_default").
@output("pr_custom").
@output("pr_precise").
Output for pr_default (cycle):
(1, 0.25)  % All nodes equal in symmetric cycle
(2, 0.25)
(3, 0.25)
(4, 0.25)
Features:
  • Scores always sum to 1.0 (probability distribution)
  • Handles dangling nodes (no outgoing edges) correctly
  • Parameters:
    • damping: Probability of following a link (default 0.85)
    • tolerance: Convergence threshold (default 0.0001)

9 Centrality Comparison Example

% Same graph for all three measures
edge(1,2).
edge(2,3).
edge(3,4).
edge(1,3).

% Compute all centralities
dc(N, Score) <- #DC(edge).
bc(N, Score) <- #BC(edge).
pr(N, Score) <- #PR(edge).

@output("dc").
@output("bc").
@output("pr").

Reasoning + Graph Analytics

edge_large(1,2,9).
edge_large(3,4,6).

edge_short(2,3,2).
edge_short(4,1,1).

unweighted_edge(1,5).
unweighted_edge(5,2).

edge(X,Y,Z) <- edge_large(X,Y,Z).
edge(X,Y,Z) <- edge_short(X,Y,Z).
edge(X,Y,1) <- unweighted_edge(X,Y). % We assign a default distance = 1

asp_function(X,Y,Z) <- #ASP(edge).
max_asp(X,Y,M) <- asp_function(X,Y,Z), M = mmax(Z).
@output("max_asp").