SQL Integration in Vadalog

Why SQL in Vadalog?

Prometheux supports native SQL queries embedded directly within Vadalog rules. This powerful feature allows you to:

Leverage existing SQL skills – write familiar SQL SELECT statements alongside Vadalog rules
Use SQL's expressive power – complex JOINs, aggregations, window functions, and CTEs
Query across data sources – seamlessly combine PostgreSQL, MariaDB, CSV files, and in-memory facts in a single SQL query
Scale to large datasets – benefit from distributed query execution for large-scale data processing
Simplify data transformation – use SQL for data manipulation while keeping Vadalog for logical reasoning

Prometheux automatically parallelizes and optimizes SQL queries across distributed compute resources while maintaining the declarative semantics of your program.

Two Ways to Use SQL

1. SQL in Rule Bodies (Simple Approach)

You can replace the traditional Vadalog body with a SQL SELECT statement using the <- operator:

result_predicate() <- SELECT column1, column2 FROM table WHERE condition.

Syntax:

Use <- instead of :- to indicate a SQL body
The SQL query starts with SELECT and ends with . (the rule terminator)
No arguments needed in the head – the output schema is defined by the SELECT clause

Simple Example:

% Define some facts
person("Alice", 25).
person("Bob", 30).
person("Charlie", 20).

% SQL rule - empty parentheses in head
adults() <- SELECT person_0 as name, person_1 as age 
            FROM person 
            WHERE person_1 >= 25.

@output("adults").

Output:

adults("Alice", 25)
adults("Bob", 30)

CSV Example:

@bind("employees", "csv useHeaders=true", "data", "employees.csv").

% Filter employees using SQL
high_earners() <- SELECT name, salary 
                  FROM employees 
                  WHERE salary > 100000.

@output("high_earners").

2. SQL in Functions (Advanced)

You can pass SQL queries as string arguments to graph analytics functions. This is useful when you want to filter or transform data before applying graph algorithms.

Syntax:

Wrap the SQL query in double quotes: "SELECT ..."
The SQL query replaces the predicate atom normally passed to the function
Head arguments ARE required – they receive the function's output
All function options still work (e.g., "visited=true", "max_depth=5")

Example:

edge(1, 2).
edge(2, 3).
edge(3, 4).

% Traditional approach - pass predicate atom
tc_traditional(X, Y) :- #TC(edge).

% SQL approach - pass SQL query as string
tc_sql(X, Y) :- #TC("SELECT edge_0, edge_1 FROM edge WHERE edge_0 < 3").

@output("tc_sql").

Head Arguments

SQL in rule bodies: Head arguments are optional (empty parentheses () work)
SQL in functions: Head arguments are required (e.g., (X, Y)) to capture function output

The difference is that functions like #TC compute results (transitive closure) and assign them to variables, while SQL bodies directly define the output.

Column Naming Conventions

Vadalog uses specific column naming conventions depending on the data source and query type.

Column Naming Rules

When all tables in a SQL query are from the same database, you can use actual column names. For queries across different data sources, use predicateName_columnIndex notation.

For In-Memory Facts

Facts use the pattern predicateName_columnIndex (zero-based):

person("Alice", 30, "Engineer").
person("Bob", 25, "Designer").

% Columns: person_0, person_1, person_2
adults() <- SELECT person_0 as name, person_1 as age 
            FROM person 
            WHERE person_1 >= 18.

For CSV/Parquet with Headers

Single-table queries can use actual column names:

@bind("employees", "csv useHeaders=true", "data", "employees.csv").

% Use actual column names (name, age, salary)
high_earners() <- SELECT name, salary 
                  FROM employees 
                  WHERE salary > 100000.

Multi-table queries across different sources must use predicateName_columnIndex:

@bind("employees", "csv useHeaders=true", "data", "employees.csv").
@bind("departments", "csv useHeaders=true", "data", "departments.csv").

% Different CSV files - MUST use predicateName_columnIndex
emp_dept() <- SELECT employees_1, departments_1 
              FROM employees 
              JOIN departments ON employees_2 = departments_0.

For Database Tables (PostgreSQL, MariaDB, etc.)

Single-table queries can use actual database column names:

@bind("employees", "postgresql", "company_db", "employees").

% Use actual database column names
high_earners() <- SELECT name, salary 
                  FROM employees 
                  WHERE salary > 100000.

Multi-table queries from the same database can use actual column names:

@bind("employees", "postgresql", "company_db", "employees").
@bind("departments", "postgresql", "company_db", "departments").

% Both tables are in the same PostgreSQL database
% Can use actual column names
emp_dept() <- SELECT e.name, e.salary, d.dept_name, d.location
              FROM employees e
              JOIN departments d ON e.dept_id = d.dept_id.

Multi-table queries across different databases must use predicateName_columnIndex:

@bind("employees", "postgresql", "company_db", "employees").
@bind("departments", "mariadb", "other_db", "departments").

% Different databases - MUST use predicateName_columnIndex
% employees_0 (id), employees_1 (name), employees_2 (dept_id), employees_3 (salary)
% departments_0 (dept_id), departments_1 (dept_name), departments_2 (location)
emp_dept() <- SELECT employees_1, employees_3, departments_1 
              FROM employees 
              JOIN departments ON employees_2 = departments_0.

Column Naming Rules

Single table → Use actual column names (CSV headers or DB column names)
Multiple tables from same database → Use actual column names
Multiple tables from different sources → Use predicateName_columnIndex notation

Case-Sensitive Identifiers

Some databases (e.g., PostgreSQL, Oracle) are case-sensitive and require quoted identifiers for mixed-case column or table names. Vadalog supports universal backtick notation that automatically converts to database-specific quotes.

Using Backticks for Case-Sensitive Identifiers

@bind("inventory", "postgresql", "warehouse_db", "inventory_case").

% Use backticks for case-sensitive columns
% Backticks are converted to double quotes for PostgreSQL
high_stock() <- SELECT `ItemId`, `ProductName`, `StockLevel`
                FROM `inventory_case`
                WHERE `StockLevel` > 100.

Database-Specific Conversion

Vadalog automatically converts backticks to the appropriate quote character for each database:

Database	Quote Character	Example
PostgreSQL	`"` (double quotes)	`SELECT "ItemId" FROM "inventory"`
Oracle	`"` (double quotes)	`SELECT "ItemId" FROM "inventory"`
Snowflake	`"` (double quotes)	`SELECT "ItemId" FROM "inventory"`
DB2	`"` (double quotes)	`SELECT "ItemId" FROM "inventory"`
SQL Server	`"` (double quotes)	`SELECT "ItemId" FROM "inventory"`
MySQL	` (backticks)	SELECT `ItemId` FROM `inventory`
MariaDB	` (backticks)	SELECT `ItemId` FROM `inventory`
Databricks	` (backticks)	SELECT `ItemId` FROM `inventory`
Neo4j	` (backticks)	SELECT `ItemId` FROM `inventory`

Universal Backtick Notation

Always use backticks (`) for case-sensitive identifiers in your Vadalog code. The system will automatically convert them to the correct syntax for your target database. This ensures your code is portable across different database systems.

Example: Case-Sensitive JOIN

@bind("inventory", "postgresql", "warehouse_db", "inventory_case").
@bind("transactions", "postgresql", "warehouse_db", "transactions_case").

% Both tables in same PostgreSQL database with case-sensitive columns
% Backticks are converted to double quotes
sales_report() <- SELECT i.`ProductName`, t.`Quantity`, t.`TotalAmount`
                  FROM `inventory_case` i
                  JOIN `transactions_case` t ON i.`ItemId` = t.`ItemId`
                  WHERE t.`Quantity` >= 5.

Examples

Example 1: Basic SQL Body with Facts

person("Alice", 25).
person("Bob", 30).
person("Charlie", 20).

% Use default column names for facts
adults() <- SELECT person_0 as name, person_1 as age 
            FROM person 
            WHERE person_1 >= 25.

@output("adults").

Output:

adults("Alice", 25)
adults("Bob", 30)

Example 2: SQL Body with CSV Files

@bind("employees", "csv useHeaders=true", "data", "employees.csv").

% Single table - use actual column names
high_earners() <- SELECT name, salary 
                  FROM employees 
                  WHERE salary > 70000.

@output("high_earners").

Example 3: Same-Database JOIN

@bind("employees", "postgresql", "company_db", "employees").
@bind("departments", "postgresql", "company_db", "departments").

% Both tables in same database - use actual column names
emp_dept() <- SELECT e.name as emp_name,
                     e.salary,
                     d.dept_name,
                     d.location
              FROM employees e
              JOIN departments d ON e.dept_id = d.dept_id
              WHERE e.salary > 50000.

@output("emp_dept").

Example 4: Cross-Source JOIN with CSVs

@bind("employees", "csv useHeaders=true", "data", "employees.csv").
@bind("departments", "csv useHeaders=true", "data", "departments.csv").

% Different CSV files - MUST use predicateName_columnIndex
emp_dept() <- SELECT employees_1 as emp_name, 
                     departments_1 as dept_name, 
                     departments_2 as location
              FROM employees 
              JOIN departments ON employees_2 = departments_0.

@output("emp_dept").

Example 5: Mixing Facts and CSV

@bind("departments", "csv useHeaders=true", "data", "departments.csv").

employees(1, "Alice", 100, 75000).
employees(2, "Bob", 200, 65000).
employees(3, "Charlie", 100, 80000).

% Multi-source - use predicateName_columnIndex
emp_with_dept() <- SELECT employees_1, employees_3, departments_1 
                   FROM employees 
                   JOIN departments ON employees_2 = departments_0.

@output("emp_with_dept").

Example 6: SQL Aggregation with Same-Database Tables

@bind("employees", "mariadb", "company_db", "employees").
@bind("departments", "mariadb", "company_db", "departments").

% Both tables in same MariaDB database - use actual column names
dept_stats() <- SELECT d.dept_name,
                       COUNT(*) as emp_count,
                       SUM(e.salary) as total_salary,
                       AVG(e.salary) as avg_salary
                FROM employees e
                JOIN departments d ON e.dept_id = d.dept_id
                GROUP BY d.dept_name.

@output("dept_stats").

Example 7: SQL with UNION

us_employees("Alice", "USA").
us_employees("Bob", "USA").

uk_employees("Charlie", "UK").
uk_employees("Diana", "UK").

all_employees() <- SELECT us_employees_0 as name, us_employees_1 as country 
                   FROM us_employees 
                   UNION 
                   SELECT uk_employees_0 as name, uk_employees_1 as country 
                   FROM uk_employees.

@output("all_employees").

Example 8: Cross-Database Queries (PostgreSQL + MariaDB)

@bind("employees", "postgresql", "company_db", "employees").
@bind("orders", "mariadb", "sales_db", "orders").

% Different databases - MUST use predicateName_columnIndex
employee_orders() <- SELECT employees_1, orders_1, orders_2 
                     FROM employees 
                     JOIN orders ON employees_1 = orders_3.

@output("employee_orders").

Example 9: Hybrid Queries (PostgreSQL + CSV + Facts)

@bind("pg_departments", "postgresql", "company_db", "departments").
@bind("csv_employees", "csv useHeaders=true", "data", "employees.csv").

budgets(100, 500000).
budgets(200, 300000).

% Three-way join across different sources
dept_budget() <- SELECT pg_departments_1 as dept_name,
                        csv_employees_1 as employee_name,
                        budgets_1 as budget
                 FROM pg_departments 
                 JOIN csv_employees ON pg_departments_0 = csv_employees_2
                 JOIN budgets ON pg_departments_0 = budgets_0.

@output("dept_budget").

SQL in Graph Functions

All graph analytics functions (#TC, #ASP, #PATHS, #CC, etc.) can accept SQL queries instead of predicate atoms.

Example 10: Transitive Closure with SQL

edge(1, 2).
edge(2, 3).
edge(3, 4).

% Compute transitive closure using SQL
tc(X, Y) :- #TC("SELECT edge_0, edge_1 FROM edge").

@output("tc").

Result:
All reachable pairs including transitive paths (1→2, 2→3, 3→4, 1→3, 2→4, 1→4)

Example 11: All-Shortest Paths with SQL

edge(1, 2, 10).
edge(2, 3, 20).
edge(3, 4, 15).

% Compute shortest paths using SQL
asp(X, Y, Dist) :- #ASP("SELECT edge_0, edge_1, edge_2 FROM edge").

@output("asp").

Example 12: PATHS Function with SQL and Options

edge(1, 2).
edge(2, 3).
edge(3, 4).
edge(4, 5).

% Compute paths with max depth using SQL
paths(X, Y, V) :- #PATHS("SELECT edge_0, edge_1 FROM edge", 
                         "visited=true,max_depth=2").

@output("paths").

Example 13: Connected Components with SQL

edge(1, 2).
edge(2, 1).
edge(3, 4).
edge(4, 3).

% Find connected components using SQL
cc(Node, ComponentId, Component) :- #CC("SELECT edge_0, edge_1 FROM edge", 
                                         "component_id=true").

@output("cc").

Example 14: SQL Function with Filtering and JOIN

@bind("employees", "postgresql", "company_db", "employees").

% Transitive closure over filtered PostgreSQL data
tc(X, Y) :- #TC("SELECT name, dept_id 
                 FROM employees 
                 WHERE dept_id IS NOT NULL").

@output("tc").

Example 15: SQL Function from CSV

@bind("ownerships", "csv useHeaders=true", "data", "ownerships.csv").

% Compute transitive closure of ownership relationships
tc(Company1, Company2) :- #TC("SELECT companyfrom, companyto 
                               FROM ownerships 
                               WHERE ownership_pct > 25").

@output("tc").

Advanced Features

Mixing SQL Rules and Vadalog Rules

You can freely mix SQL-based rules with traditional Vadalog rules:

@bind("employees", "csv useHeaders=true", "data", "employees.csv").

% SQL rule
high_earners() <- SELECT name, salary 
                  FROM employees 
                  WHERE salary > 100000.

% Traditional Vadalog rule using SQL rule result
very_high_earner(Name) :- high_earners(Name, Salary), Salary > 150000.

% Another SQL rule referencing Vadalog-derived predicate
top_earners() <- SELECT high_earners_0, high_earners_1 
                 FROM high_earners 
                 ORDER BY high_earners_1 DESC 
                 LIMIT 10.

@output("top_earners").

SQL with Subqueries

@bind("employees", "csv useHeaders=true", "data", "employees.csv").
@bind("departments", "csv useHeaders=true", "data", "departments.csv").

above_avg_earners() <- SELECT employees_1, employees_3, departments_1 
                       FROM employees 
                       JOIN departments ON employees_2 = departments_0 
                       WHERE employees_3 > (SELECT AVG(employees_3) FROM employees).

@output("above_avg_earners").

SQL with Window Functions

@bind("sales", "postgresql", "sales_db", "transactions").

% Rank sales by region using window functions
ranked_sales() <- SELECT sales_0 as region,
                         sales_1 as product,
                         sales_2 as amount,
                         RANK() OVER (PARTITION BY sales_0 ORDER BY sales_2 DESC) as rank
                  FROM sales.

@output("ranked_sales").

Common Table Expressions (CTEs)

@bind("employees", "postgresql", "company_db", "employees").

% Use CTE (WITH clause) for complex queries
dept_summary() <- WITH dept_avg AS (
                    SELECT employees_2 as dept, AVG(employees_3) as avg_sal
                    FROM employees
                    GROUP BY employees_2
                  )
                  SELECT employees_1, employees_3, dept_avg.avg_sal
                  FROM employees
                  JOIN dept_avg ON employees_2 = dept_avg.dept
                  WHERE employees_3 > dept_avg.avg_sal.

@output("dept_summary").

Best Practices

1. Choose the Right Approach

Use SQL bodies when:

You need complex data transformations
You're working with aggregations, window functions, or CTEs
You're joining data from multiple sources
The logic is naturally expressed in SQL

Use traditional Vadalog when:

You need recursion or fixpoint computation
The logic involves complex logical rules
You're doing rule-based reasoning or inference

2. Use Appropriate Column Naming

When querying tables from the same database, use actual column names for cleaner, more readable queries:

@bind("employees", "postgresql", "company_db", "employees").
@bind("departments", "postgresql", "company_db", "departments").

% Good - actual column names (same database)
readable() <- SELECT e.name, d.dept_name
              FROM employees e
              JOIN departments d ON e.dept_id = d.dept_id.

% Required for cross-source queries
cross_source() <- SELECT employees_1, departments_1
                  FROM employees
                  JOIN departments ON employees_2 = departments_0.

3. Column Naming Clarity

Always use aliases to make your output schema clear:

% Good - clear aliases
result() <- SELECT e.name as employee_name,
                   d.dept_name
            FROM employees e
            JOIN departments d ON e.dept_id = d.dept_id.

% Avoid - unclear column names in output
result() <- SELECT employees_1, departments_1
            FROM employees JOIN departments 
            ON employees_2 = departments_0.

4. Use Backticks for Case-Sensitive Identifiers

For databases with case-sensitive columns, use backticks for portability:

@bind("inventory", "postgresql", "warehouse_db", "inventory_case").

% Good - backticks are converted to database-specific quotes
portable() <- SELECT `ItemId`, `ProductName`
              FROM `inventory_case`
              WHERE `StockLevel` > 100.

% Avoid - hard-coding database-specific quotes
not_portable() <- SELECT "ItemId", "ProductName"
                  FROM "inventory_case"
                  WHERE "StockLevel" > 100.

6. Leverage Data Source Bindings

Use @bind annotations to connect to diverse data sources, then query them uniformly with SQL:

@bind("pg_data", "postgresql", "db1", "table1").
@bind("csv_data", "csv useHeaders=true", "data", "file.csv").
@bind("maria_data", "mariadb", "db2", "table2").

% Query all three as if they were local tables
combined() <- SELECT pg_data_0, csv_data_1, maria_data_2
              FROM pg_data
              JOIN csv_data ON pg_data_0 = csv_data_0
              JOIN maria_data ON csv_data_1 = maria_data_1.

7. Filter Data in SQL Queries

You can filter data directly within SQL queries passed to graph functions:

% Filter edges with SQL before computing transitive closure
filtered_tc(X, Y) :- #TC("SELECT edge_0, edge_1 
                          FROM edge 
                          WHERE weight > 100").

8. Validate Table References

Ensure all tables in your SQL queries are either:

Bound via @bind annotations
Defined as facts in the program
Derived from other rules

% This will fail - 'unknown_table' not defined
bad_rule() <- SELECT * FROM unknown_table.

% Correct - table is bound
@bind("employees", "csv", "data", "employees.csv").
good_rule() <- SELECT * FROM employees.

Troubleshooting

Error: "Table not defined"

Cause: SQL query references a table that has no @bind annotation, no facts, and no deriving rules.

Solution: Add a @bind annotation or define the predicate as facts:

@bind("employees", "csv useHeaders=true", "data", "employees.csv").
% OR
employee("Alice", 30).
employee("Bob", 25).

Error: "Column not found"

Cause: Using wrong column naming convention (e.g., actual column names in multi-table query).

Solution: Use predicateName_columnIndex for multi-table queries:

% Wrong
result() <- SELECT name FROM employees JOIN departments ...

% Correct
result() <- SELECT employees_1 FROM employees JOIN departments ...

Error: "Invalid SQL syntax"

Cause: SQL query contains syntax errors.

Solution: Validate your SQL query using standard SQL syntax. Vadalog supports most ANSI SQL features including window functions, CTEs, and complex aggregations.

Summary

SQL integration in Vadalog provides a powerful bridge between declarative logic programming and industrial-strength SQL:

✅ Embed SQL SELECT statements directly in rule bodies
✅ Pass SQL queries to graph analytics functions
✅ Query across data sources – PostgreSQL, MariaDB, CSV, facts, and more
✅ Use full SQL expressiveness – JOINs, aggregations, CTEs, window functions
✅ Flexible column naming – use actual column names for same-database queries, or predicateName_i for cross-source queries
✅ Universal backtick notation – write portable queries with automatic quote conversion
✅ Automatic parallelization and optimization across distributed compute resources
✅ Seamless integration with Vadalog rules and reasoning

This hybrid approach combines the strengths of both paradigms: use SQL when you need its familiar syntax and expressive power, and use Vadalog for recursion, reasoning, and complex logical rules.

Why SQL in Vadalog?​

Two Ways to Use SQL​

1. SQL in Rule Bodies (Simple Approach)​

2. SQL in Functions (Advanced)​

Column Naming Conventions​

For In-Memory Facts​

For CSV/Parquet with Headers​

For Database Tables (PostgreSQL, MariaDB, etc.)​

Case-Sensitive Identifiers​

Using Backticks for Case-Sensitive Identifiers​

Database-Specific Conversion​

Example: Case-Sensitive JOIN​

Examples​

Example 1: Basic SQL Body with Facts​

Example 2: SQL Body with CSV Files​

Example 3: Same-Database JOIN​

Example 4: Cross-Source JOIN with CSVs​

Example 5: Mixing Facts and CSV​

Example 6: SQL Aggregation with Same-Database Tables​

Example 7: SQL with UNION​

Example 8: Cross-Database Queries (PostgreSQL + MariaDB)​

Example 9: Hybrid Queries (PostgreSQL + CSV + Facts)​

SQL in Graph Functions​

Example 10: Transitive Closure with SQL​

Example 11: All-Shortest Paths with SQL​

Example 12: PATHS Function with SQL and Options​

Example 13: Connected Components with SQL​

Example 14: SQL Function with Filtering and JOIN​

Example 15: SQL Function from CSV​

Advanced Features​

Mixing SQL Rules and Vadalog Rules​

SQL with Subqueries​

SQL with Window Functions​

Common Table Expressions (CTEs)​

Best Practices​

1. Choose the Right Approach​

2. Use Appropriate Column Naming​

3. Column Naming Clarity​

4. Use Backticks for Case-Sensitive Identifiers​

6. Leverage Data Source Bindings​

7. Filter Data in SQL Queries​

8. Validate Table References​

Troubleshooting​

Error: "Table not defined"​

Error: "Column not found"​

Error: "Invalid SQL syntax"​

Summary​