Why Cypher in Vadalog?
Prometheux supports native Cypher queries embedded directly within Vadalog rules. A Cypher body is not limited to Neo4j: the same pattern runs whether the predicates it references are bound to CSV / Parquet / Iceberg files, a relational database (PostgreSQL, MariaDB, Snowflake, …), a vector store, in-memory facts, or a Neo4j server.
- Write graph queries against tabular data: pattern-match nodes and relationships over rows in your existing tables and files, no graph database required
- Leverage existing Cypher skills: MATCH, WHERE, RETURN, WITH, OPTIONAL MATCH, and the GDS algorithm family work the same everywhere
- Mix data layers: a single Cypher query can join Iceberg with PostgreSQL, Neo4j, Qdrant, or in-memory facts; each side keeps its own connector-native pushdown
- Push down to the source: for non-Neo4j sources the engine generates a SQL plan that pushes filters, projections, partition pruning and file skipping into the source scan; for Neo4j the Cypher is sent verbatim to the server
- Compose with Vadalog: anything Cypher does not express (recursion, the chase, AI / vector functions, hashing) is added as a plain Vadalog rule on top of the Cypher result
Two Ways to Use Cypher
1. Cypher in Rule Bodies (over any source)
When a rule body starts with a MATCH (or OPTIONAL MATCH, CALL, or WITH) keyword, it is interpreted as an inline Cypher query:
- The Cypher body starts with MATCH / OPTIONAL MATCH / CALL / WITH and ends with . (the rule terminator)
- The head’s arguments are positional projections of the Cypher RETURN list (one head arg per returned column)
- All standard Cypher constructs are supported in the body; see Patterns, Filtering, Projection, Aggregation, WITH pipelines, Graph algorithms
2. Cypher in Neo4j Bindings (server-side pushdown)
When binding a Neo4j source, the bind’s table slot accepts a node label or relationship pattern, and the engine generates the right MATCH to scan it server-side.
Cypher cells in the Prometheux platform
The Prometheux platform authors programs as cells, each with a type: vadalog, cypher, sql, or python. In a Cypher cell you write the query directly, without wrapping it in a Vadalog rule head. The platform binds the resulting columns to a predicate that the next cell can consume, whether that cell is Vadalog, Cypher, SQL, or Python. The pushdown behaviour described above applies in the same way.
The rest of this page documents Cypher used inside a Vadalog cell — i.e. the predicate(...) <- MATCH ... . rule-body form. Pick this when you want to compose a Cypher pattern with other Vadalog rules in the same cell; pick a plain Cypher cell when the query stands on its own.
How Cypher patterns map to your data
For a Cypher pattern to line up with relational tables and files, follow the property-graph ↔ relational convention below:

| Cypher element | Relational layout |
|---|---|
| Node label (:Person) | a table person (lower-cased) with an id column |
| Relationship type [:KNOWS] | a table knows (lower-cased) with src and dst columns referencing node ids |
| Node property p.age | a column age on the node table |
| Relationship property r.weight | a column weight on the relationship table |
Each property reference must resolve to a column of the same name or to an @mapping-aliased column. If there is no corresponding name, the query errors with column not found.
For Neo4j-bound predicates, no layout convention is required: the Cypher is sent verbatim to the server, which already understands the graph schema.
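To make the layout convention concrete, the following Python sketch uses the standard-library sqlite3 module to show which relational join a one-hop pattern corresponds to. The table contents, the sample names, and the hand-written SQL are illustrative assumptions; this is not the plan the engine actually generates.

```python
import sqlite3

# Hypothetical sample data that follows the layout convention:
# node table `person` with an `id` column, relationship table `knows`
# with `src` and `dst` columns referencing node ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE knows  (src INTEGER, dst INTEGER, weight REAL);
    INSERT INTO person VALUES (1, 'Ada', 36), (2, 'Bob', 17), (3, 'Cy', 52);
    INSERT INTO knows  VALUES (1, 2, 0.5), (1, 3, 0.9), (3, 1, 0.2);
""")

# Cypher pattern being modelled:
#   MATCH (a:Person)-[r:KNOWS]->(b:Person)
#   WHERE b.age >= 18
#   RETURN a.name, b.name, r.weight
# Hand-written relational equivalent: the relationship table joins to the
# node table once per endpoint, via src and dst.
rows = conn.execute("""
    SELECT a.name, b.name, r.weight
    FROM knows AS r
    JOIN person AS a ON a.id = r.src
    JOIN person AS b ON b.id = r.dst
    WHERE b.age >= 18
    ORDER BY a.name, b.name
""").fetchall()

print(rows)
```

Node properties (name, age) come from the node table and the relationship property (weight) from the relationship table, which mirrors the last two rows of the convention table.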
Patterns
Single-relation pattern
The most common case: a one-hop traversal. It is pushed down to the source scan with column and predicate pruning.
Joining node properties across a relationship
A pattern that reads node properties across a relationship pulls in the node label table automatically. Each side keeps its own pushdown; the engine joins the projected results.
Multi-hop traversal
Comma-separated patterns and multi-hop chains are recognised; the engine wires up the joins for you.
Variable-length and bounded patterns
Variable-length traversal (a)-[:R*]->(b) (with optional *m..n bounds) is recognised as a reachability query and lowered to a transitive-closure computation by the engine.
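As a rough illustration of what a transitive-closure lowering computes, here is a minimal Python sketch. The function names and the edge data are hypothetical, and the sketch counts walks rather than enforcing Cypher's relationship-uniqueness rule, so treat it as a model of the idea rather than the engine's algorithm.

```python
def transitive_closure(edges):
    """Return every (a, b) pair such that b is reachable from a in one or more hops."""
    closure = set(edges)
    while True:
        # Extend each known pair (a, b) by one more edge (b, c).
        new_pairs = {(a, c) for (a, b) in closure for (b2, c) in edges if b == b2}
        if new_pairs <= closure:
            return closure  # fixed point: nothing new was derived
        closure |= new_pairs


def bounded_reachability(edges, min_hops, max_hops):
    """Pairs connected by a walk of between min_hops and max_hops edges (models *m..n)."""
    result = set()
    paths = set(edges)  # pairs connected by exactly `hops` edges
    for hops in range(1, max_hops + 1):
        if hops >= min_hops:
            result |= paths
        paths = {(a, c) for (a, b) in paths for (b2, c) in edges if b == b2}
    return result


edges = {(1, 2), (2, 3), (3, 4)}           # hypothetical (src, dst) rows
reachable = transitive_closure(edges)      # models (a)-[:R*]->(b)
two_to_three = bounded_reachability(edges, 2, 3)  # models (a)-[:R*2..3]->(b)
print(sorted(reachable))
print(sorted(two_to_three))
```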
shortestPath / allShortestPaths
OPTIONAL MATCH (left-join semantics)
The engine emits the matched branch plus a negated branch, so rows that do not have the optional relationship still appear with null columns.
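The two-branch shape can be sketched in plain Python as the union of a matched branch and a negated branch. The data, the WORKS_AT relationship, and the variable names are made up for illustration; the engine's actual rewrite may differ in detail.

```python
people = [(1, "Ada"), (2, "Bob"), (3, "Cy")]   # hypothetical (id, name) rows
works_at = [(1, "Acme"), (3, "Initech")]        # hypothetical (person id, company) rows

# Query being modelled:
#   MATCH (p:Person) OPTIONAL MATCH (p)-[:WORKS_AT]->(c) RETURN p.name, c.name
employed_ids = {pid for pid, _ in works_at}

# Matched branch: people that have the optional relationship.
matched = [(name, company)
           for pid, name in people
           for wid, company in works_at
           if pid == wid]

# Negated branch: people with no such relationship, padded with None (null).
unmatched = [(name, None) for pid, name in people if pid not in employed_ids]

rows = matched + unmatched
print(rows)
```

Bob has no WORKS_AT edge, so he appears once with a null company, which is the left-join behaviour.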
Filtering in WHERE
Comparison and boolean composition
Arithmetic on properties and literals
String predicates: STARTS WITH, ENDS WITH, CONTAINS
Regex =~
List membership IN [list]
Null tests IS NULL / IS NOT NULL
NOT (…)
Negation is simplified when it sits in front of a single comparison (NOT a > 65 becomes a <= 65); otherwise it wraps the inner condition.
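The simplification rule can be modelled in a few lines of Python. The tuple representation of conditions and the function name are assumptions made for this sketch, not the engine's internal data structures.

```python
# Each comparison operator mapped to its logical complement.
COMPLEMENT = {">": "<=", ">=": "<", "<": ">=", "<=": ">", "=": "<>", "<>": "="}


def simplify_not(condition):
    """Simplify NOT applied to a condition.

    A single comparison is a (left, op, right) tuple; anything else stands
    for a compound condition and is simply wrapped.
    """
    is_comparison = (
        isinstance(condition, tuple)
        and len(condition) == 3
        and isinstance(condition[1], str)
        and condition[1] in COMPLEMENT
    )
    if is_comparison:
        left, op, right = condition
        return (left, COMPLEMENT[op], right)   # NOT a > 65  becomes  a <= 65
    return ("NOT", condition)                   # otherwise wrap the inner condition


single = simplify_not(("a", ">", 65))
compound = simplify_not(("AND", [("a", ">", 65), ("b", "=", 1)]))
print(single)
print(compound)
```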
Projection and value transformation
Aliases, DISTINCT, ordering and paging
Arithmetic in RETURN
coalesce and CASE WHEN
coalesce(...) returns the first non-null argument; CASE is rewritten into nested if(...). Both forms are supported — the searched form (CASE WHEN cond THEN …) and the simple form (CASE expr WHEN value THEN …) — with any number of WHEN branches and a mandatory ELSE.
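The nesting can be pictured with a small Python fold over the WHEN branches. The textual if(condition, then, else) argument order and both helper names are assumptions for this sketch; only the fact that CASE becomes nested if(...) comes from the text above.

```python
def case_to_nested_if(branches, else_value):
    """Fold searched-CASE branches into one nested if(...) expression string.

    `branches` is a list of (condition, result) pairs in WHEN order.
    """
    expr = else_value
    # Fold from the last branch outwards so the first WHEN is tested first.
    for condition, result in reversed(branches):
        expr = f"if({condition}, {result}, {expr})"
    return expr


def simple_to_searched(subject, branches):
    """Turn the simple form (CASE expr WHEN value ...) into searched conditions."""
    return [(f"{subject} = {value}", result) for value, result in branches]


searched = case_to_nested_if(
    [("age < 18", "'minor'"), ("age < 65", "'adult'")],
    "'senior'",
)
simple = case_to_nested_if(
    simple_to_searched("status", [("1", "'on'"), ("0", "'off'")]),
    "'unknown'",
)
print(searched)
print(simple)
```

The mandatory ELSE is what makes the fold well defined: it supplies the innermost fallback value.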
Type conversions
toInteger, toFloat, toString, toBoolean, date(s), datetime(s) become a SQL CAST(... AS ...) on the source side (or a Vadalog as_long / as_double / as_string / … term in a multi-source join).
Math functions
abs, ceil, floor, round, sqrt, sign, exp, log, pow.
String functions
toLower, toUpper, trim, replace, split, substring, size(string).
Map projection (struct construction)
A Cypher map projection in RETURN produces a single struct-typed column.
Working with list-typed columns
Over sources whose columns are array typed (Parquet, JSON, Iceberg, struct-aware databases), list length, slicing (end-exclusive), zero-based indexing, predicate functions (any / all / none / single), and list comprehensions are all supported:
Temporal expressions
Zero-argument date() / datetime() return the current date / timestamp; duration.between(a, b) returns the day difference between two date/timestamp values.
Aggregation, grouping and paging
count, count(*), sum, avg, min, max, collect are supported, with an implicit GROUP BY on the non-aggregated RETURN items. ORDER BY accepts multiple keys and ASC / DESC; LIMIT and SKIP translate to SQL LIMIT / OFFSET.
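The implicit grouping and the paging translation can be checked against a small in-memory table with Python's sqlite3 module. The table, its rows, and the SQL are a hand-written illustration under the layout convention, not the engine's generated plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, city TEXT, age INTEGER);
    INSERT INTO person VALUES
        (1, 'Rome', 30), (2, 'Rome', 40), (3, 'Oslo', 20),
        (4, 'Lima', 50), (5, 'Lima', 60), (6, 'Lima', 70);
""")

# Cypher being modelled:
#   MATCH (p:Person)
#   RETURN p.city, count(*) AS n, avg(p.age) AS mean_age
#   ORDER BY n DESC, p.city ASC SKIP 1 LIMIT 2
rows = conn.execute("""
    SELECT city, COUNT(*) AS n, AVG(age) AS mean_age
    FROM person
    GROUP BY city              -- implicit: the only non-aggregated RETURN item
    ORDER BY n DESC, city ASC
    LIMIT 2 OFFSET 1           -- LIMIT 2 / SKIP 1
""").fetchall()

print(rows)
```

Because p.city is the only non-aggregated item in RETURN, it becomes the grouping key; SKIP 1 drops the largest group (Lima) before LIMIT 2 takes the next two.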
Multi-stage WITH pipelines
A WITH clause carries a projection (or an aggregate) into the next stage; a WHERE after the WITH filters those carried rows — including the canonical “filter on an aggregate” (HAVING-style) shape. Multiple WITH stages chain.
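The "filter on an aggregate" shape corresponds to a SQL HAVING clause, as this sqlite3 sketch shows. The edge rows and the SQL are illustrative assumptions; the sketch reads only the relationship table and skips the joins to the node table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE knows (src INTEGER, dst INTEGER);
    INSERT INTO knows VALUES (1, 2), (1, 3), (1, 4), (2, 3), (3, 1), (3, 4);
""")

# Cypher being modelled:
#   MATCH (a:Person)-[:KNOWS]->(b:Person)
#   WITH a.id AS person, count(b) AS friends
#   WHERE friends >= 2
#   RETURN person, friends
rows = conn.execute("""
    SELECT src AS person, COUNT(dst) AS friends
    FROM knows
    GROUP BY src
    HAVING COUNT(dst) >= 2     -- the WHERE that follows the aggregating WITH
    ORDER BY person
""").fetchall()

print(rows)
```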
Project then filter
Aggregate then HAVING-style filter
Multi-stage chain
Graph algorithms (GDS)
Graph Data Science calls work directly over any edge table. The shape is uniform: project a graph from the relationship table, then call the algorithm. When the node label refers to a @bind’d table, the engine restricts the graph to edges whose endpoints exist in that table: a node-induced edge filter applied before the algorithm runs.
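The node-induced edge filter is easy to state in code. The following Python sketch uses a hypothetical function name and made-up ids; it shows the filtering step only, not the projection or the algorithm call.

```python
def node_induced_edges(edges, node_ids):
    """Keep only the edges whose source and destination both appear in node_ids."""
    allowed = set(node_ids)
    return [(src, dst) for src, dst in edges if src in allowed and dst in allowed]


edges = [(1, 2), (2, 3), (3, 9), (8, 1)]   # hypothetical (src, dst) rows
person_ids = [1, 2, 3]                     # ids present in the node table

filtered = node_induced_edges(edges, person_ids)
print(filtered)
```

Edges (3, 9) and (8, 1) are dropped because one endpoint is missing from the node table, so the algorithm never sees dangling nodes.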
Algorithm parameters (relationshipWeightProperty, sourceNode, dampingFactor) are passed through to the function:
gds.* call:
Examples
Example 1: Cypher over CSV
Example 2: Cypher over a relational database
The same query runs over PostgreSQL with no changes other than the binding: the whole Cypher is pushed down as a single SQL JOIN to the database.
Example 3: Cypher over Iceberg
For Iceberg, the Cypher body benefits from partition pruning, file skipping via min/max stats, and bloom filters. These are the same planning-time optimizations that apply to a SQL body.
Example 4: Cypher pushed down to Neo4j
When every predicate the body references is bound to Neo4j, the whole Cypher is sent verbatim to the server.
Example 5: Mixed-source join (files + database + Neo4j)
A single Cypher body can reference predicates bound to different data layers. Each side keeps its own connector-native pushdown and the engine joins the projected results.
Example 6: Cypher reading from a derived predicate
A Cypher body can also match against a predicate produced upstream by another rule. There is no id column on a derived predicate, so reference columns by their positional alias predicateName_columnIndex.
Example 7: Cypher with GDS over a database edge table
Example 8: Cypher pipeline with WITH over CSV
Beyond the Cypher surface
The engine’s expression library goes considerably beyond what core Cypher offers. When a query needs one of the following, drop into a Vadalog rule on top of the Cypher result:
- Hashing and ids: hash:md5, hash:sha1, hash:sha2, hash:hash, uuid(), monotonically increasing ids.
- AI / vector-search functions: embeddings:vectorize, embeddings:cosine_sim, llm:generate, and the ask(...) function for vector retrieval over Qdrant collections.
- Set algebra on arrays: | (union), & (intersection), and collections:difference/union/intersection.
- Richer date arithmetic: date:add, date:sub, date:diff, date:next_day, date:prev_day, date:spec_day, date:format, date:to_timestamp.
- Boolean and conditional families beyond CASE: xor, nand, nor, xnor, implies, iff, plus nullManagement:ifnull(x, fallback) and nullManagement:coalesce over arbitrary expressions.
- JSON / struct round-tripping: as_json, as_struct, as_list, as_set, as_map, struct:get.
- Recursive reasoning: monotonic aggregations, fixed-point rules, the chase. This is the kind of inference that has no Cypher analog at all.
Write clauses (CREATE, MERGE, DELETE, SET, REMOVE) are also expressed outside the Cypher body: writes go through @bind with the saveMode option.
For instance, an MD5-fingerprinted profile, computed on top of a Cypher map projection:
Best Practices
1. Choose Cypher vs SQL deliberately
Use Cypher in rule bodies when
- The query is naturally a graph pattern (multi-hop, variable-length, shortest path)
- You want to call a GDS algorithm (PageRank, connected components, betweenness, …)
- The graph layer is your mental model: labels, relationships, properties
Use SQL when
- The query is naturally relational (joins, aggregations, window functions, CTEs)
- You need analytical SQL constructs (WITH RECURSIVE, window functions, GROUPING SETS)
Use plain Vadalog rules when
- You need recursion that goes beyond a single GDS call (the chase, monotonic aggregations, fixed-point)
- The logic involves complex rule-based reasoning or inference
- You need the engine’s full expression library: hashing, AI, vectors, set algebra (see Beyond the Cypher surface)
2. Follow the layout convention
For non-Neo4j sources, naming conventions are what let Cypher patterns line up:
- Node label → lower-cased table name with an id column
- Relationship type → lower-cased table name with src and dst columns
- Properties → columns of the same name as the Cypher property
Use @mapping annotations to alias source columns into this layout when the underlying schema does not match (see @mapping).
3. Keep mixed-source queries focused
A single Cypher body that spans Iceberg + PostgreSQL + Neo4j is fully supported, but each side pushes down only the predicates that filter it locally. Put as much filtering as possible into the Cypher WHERE so each per-source projection is small before the engine joins them.
4. Use binds that pre-select with query=
If the underlying source needs a projection or WHERE more specific than the Cypher layout convention provides, lift it into the @bind itself with the query= option:
5. Compose with Vadalog rather than over-stuffing the Cypher
A long WITH … WITH … RETURN chain over multiple sources is often easier to read as a few Vadalog rules, each carrying one Cypher pattern:
Summary
Cypher integration in Vadalog provides a single, source-agnostic graph query surface on top of every data layer Prometheux supports:
✅ Inline Cypher in rule bodies over CSV / Parquet / Iceberg / JSON / relational databases / Neo4j / in-memory facts / derived predicates
✅ Pattern matching: single-relation, multi-hop, variable-length, shortestPath, OPTIONAL MATCH
✅ Full expression surface: arithmetic, string predicates, regex, IN, null tests, coalesce, multi-branch CASE WHEN, type conversions, math and string functions, map projection, list operations, temporal helpers
✅ Aggregations: count, sum, avg, min, max, collect with implicit GROUP BY, ORDER BY, LIMIT, SKIP
✅ Multi-stage pipelines: WITH chains with HAVING-style post-aggregation filters
✅ Graph algorithms: the full GDS catalog, with node-induced edge filters from @bind’d label tables
✅ Mixed-source joins: each side keeps its own connector-native pushdown; the engine joins the projected results
✅ Compose with Vadalog: drop into rules whenever the engine’s full expression library or recursive reasoning is needed

