Cluster

For teams running Prometheux on self-managed Yarn or Kubernetes clusters.

Warning: This is an advanced deployment path. We recommend contacting the Prometheux team for guidance before proceeding.

Overview

Prometheux can run on your own compute infrastructure using Apache Spark as the execution layer. This is suited to organisations that require full control over their cluster, networking, and storage.

Architecture

Prometheux enables distributed and parallelised processing by converting its primitives (project, select, join) into map, filter, reduce and shuffle transformations executed in parallel on a cluster.
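To make the mapping concrete, here is a toy, single-process simulation in plain Python. It is not Prometheux or Spark code. A relation is modelled as a list of partitions: select becomes a per-partition filter, project a per-partition map, and join a hash shuffle of both sides by key followed by a local join of co-located partitions. A count shows the map, shuffle, and reduce pattern. All function names and the partitioning scheme are illustrative assumptions.

```python
from collections import defaultdict


def select(rel, predicate):
    # select -> filter: applied to each partition independently, no data movement
    return [[row for row in part if predicate(row)] for part in rel]


def project(rel, *cols):
    # project -> map: keep only the requested column positions of each row
    return [[tuple(row[c] for c in cols) for row in part] for part in rel]


def shuffle(rel, key_col, num_partitions):
    # shuffle: route every row to the partition selected by hashing its key column
    out = [[] for _ in range(num_partitions)]
    for part in rel:
        for row in part:
            out[hash(row[key_col]) % num_partitions].append(row)
    return out


def join(left, right, lkey, rkey, num_partitions=4):
    # join -> shuffle both sides on the key, then join co-located partitions locally
    left_parts = shuffle(left, lkey, num_partitions)
    right_parts = shuffle(right, rkey, num_partitions)
    joined = []
    for lp, rp in zip(left_parts, right_parts):
        index = defaultdict(list)
        for rrow in rp:
            index[rrow[rkey]].append(rrow)
        joined.append([lrow + rrow for lrow in lp for rrow in index[lrow[lkey]]])
    return joined


def count_by(rel, key_col, num_partitions=4):
    # map emits (key, 1), shuffle groups equal keys, reduce sums the counts
    pairs = [[(row[key_col], 1) for row in part] for part in rel]
    totals = {}
    for part in shuffle(pairs, 0, num_partitions):
        for key, one in part:
            totals[key] = totals.get(key, 0) + one
    return totals


# Two small relations, each already split into two partitions
employees = [[("alice", "d1"), ("bob", "d2")], [("carol", "d1")]]
departments = [[("d1", "R&D")], [("d2", "Sales")]]
rows = [row for part in join(employees, departments, 1, 0) for row in part]
```

Only select and project run without moving data; join and aggregation need a shuffle, which is why they dominate cost on a real cluster.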

Supported Cluster Managers

Yarn

Prometheux integrates with the Yarn resource manager. Two deployment modes are available:

Client mode: the Vadalog driver runs on the client machine. The client submits the program to the Yarn Resource Manager, which launches an application master that requests executor containers. The driver then communicates with the executors directly and collects the results on the client.
Cluster mode: the driver runs inside the cluster, within the application master process. The client only submits the program; the driver manages executor communication from within the cluster.
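For a standard Spark submission, the two Yarn modes differ only in the `--deploy-mode` flag passed to `spark-submit`. The helper below is a sketch that assembles such an argument list. The JAR name `prometheux-engine.jar` and the main class `com.example.PrometheuxMain` are hypothetical placeholders; the real entry point ships with the Prometheux engine JAR.

```python
def yarn_submit_args(jar, main_class, deploy_mode="client", conf=None, app_args=()):
    """Build a spark-submit argument list for Yarn (sketch, not an official launcher)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")
    # --deploy-mode decides where the driver runs: on this machine (client)
    # or inside the application master process on the cluster (cluster)
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode,
            "--class", main_class]
    for key, value in (conf or {}).items():
        args += ["--conf", f"{key}={value}"]
    # the application JAR comes after all options, followed by the application's own arguments
    return args + [jar, *app_args]


# Hypothetical JAR and entry-point names
cmd = yarn_submit_args(
    "prometheux-engine.jar",
    "com.example.PrometheuxMain",
    deploy_mode="cluster",
    conf={"spark.executor.instances": 2, "spark.executor.memory": "4g"},
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run`. Spark locates the Resource Manager through `HADOOP_CONF_DIR` or `YARN_CONF_DIR`, or through the `spark.hadoop.yarn.resourcemanager.*` properties listed under Spark Configuration.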

Kubernetes

Prometheux can also run on a Kubernetes cluster. The client interacts directly with the Kubernetes API Server on the control plane, which schedules executor pods on worker nodes.

Client mode: the driver runs in the client process, either on a machine outside the cluster or inside a pod. It asks the API Server to schedule executor pods and then communicates with them directly.
Cluster mode: the driver runs inside a pod on a worker node. It requests executor pods through the API Server and coordinates them from within the cluster.
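With Spark on Kubernetes, the master URL is the API Server address prefixed with `k8s://`, and the pod image is set with `spark.kubernetes.container.image`. The sketch below builds the corresponding `spark-submit` arguments. The API Server address, image, namespace, JAR path, and main class are placeholders, not values published by Prometheux.

```python
def k8s_submit_args(api_server, image, jar, main_class,
                    deploy_mode="cluster", namespace="default"):
    """Build spark-submit arguments for a Kubernetes cluster (illustrative sketch)."""
    # Spark's Kubernetes master URL is the API server address prefixed with k8s://
    master = api_server if api_server.startswith("k8s://") else f"k8s://{api_server}"
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        # container image used for the driver and executor pods
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        jar,
    ]


# Placeholder values; local:// refers to a file that already exists inside the image
args = k8s_submit_args(
    "https://10.0.0.1:6443",
    "registry.example.com/prometheux:latest",
    "local:///opt/prometheux/prometheux-engine.jar",
    "com.example.PrometheuxMain",
)
```

In client mode the executors must be able to connect back to the driver. When the driver itself runs in a pod, the Spark documentation suggests exposing it through a headless service and setting `spark.driver.host` and `spark.driver.port` to match.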

Local Mode

In local mode, the driver, master, and executor run in a single JVM on the workstation where the application runs. This is useful for development and testing.
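In local mode, parallelism comes from worker threads inside that single JVM, and the thread count is encoded in the `spark.master` value (default `local[*]`, one thread per logical core). The helper below decodes the local master URL forms that Spark accepts; it is an illustration, not part of Prometheux.

```python
import os
import re


def local_threads(master):
    """Number of worker threads requested by a Spark local master URL."""
    if master == "local":
        return 1  # a single worker thread, i.e. no parallelism
    # local[N], local[*], and the variants with a max-failures suffix: local[N,F], local[*,F]
    match = re.fullmatch(r"local\[(\*|\d+)(?:,\d+)?\]", master)
    if match is None:
        raise ValueError(f"not a local master URL: {master!r}")
    if match.group(1) == "*":
        return os.cpu_count() or 1  # one thread per logical core
    return int(match.group(1))
```

For reproducible tests, a fixed value such as `local[2]` avoids results that depend on the workstation's core count.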

Prerequisites

A running Yarn or Kubernetes cluster (or a local workstation for local mode)
Java 11 or later
Apache Spark (version compatible with your cluster)
The Prometheux engine JAR (provided by the Prometheux team)
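Checking the Java prerequisite is slightly tricky because Java 8 and earlier report versions in the legacy `1.x` form (for example `1.8.0_392`), while Java 9 and later start with the major number (`11.0.21`, `17`). The sketch below normalises both forms; it expects the quoted version string printed by `java -version` and is a hypothetical pre-flight helper.

```python
import re


def java_major_version(version):
    """Extract the major version from strings such as '1.8.0_392', '11.0.21' or '21-ea'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised Java version: {version!r}")
    major = int(match.group(1))
    # Java 8 and earlier use the 1.x scheme, so the real major version is the second number
    if major == 1 and match.group(2):
        return int(match.group(2))
    return major


def meets_java_requirement(version, minimum=11):
    return java_major_version(version) >= minimum
```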

Configuration Reference

Database properties such as credentials apply globally to all instances of a database type (e.g., PostgreSQL, Neo4j). Configurations defined for a specific source via the `@bind` annotation override these global values.
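The precedence rule can be modelled as a dictionary merge in which binding-level values win. This is a hypothetical model of the behaviour described above, not Prometheux's internal code. The `bind_options` dictionary stands in for whatever options a particular `@bind` annotation supplies, and its keys are assumed to match the global property names without the database prefix.

```python
def effective_settings(global_props, db_type, bind_options=None):
    """Resolve the settings for one data source of type db_type (illustrative model)."""
    prefix = f"{db_type}."
    # Start from the global properties for this database type, e.g. postgresql.url -> url
    resolved = {k[len(prefix):]: v for k, v in global_props.items() if k.startswith(prefix)}
    # Options supplied on a specific @bind override the global values
    resolved.update(bind_options or {})
    return resolved


global_props = {
    "postgresql.url": "jdbc:postgresql://localhost:5432/postgres",
    "postgresql.username": "postgres",
    "postgresql.password": "postgres",
    "neo4j.url": "bolt://localhost:7687",
}
settings = effective_settings(global_props, "postgresql", {"username": "analyst"})
```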

Database Properties

Property	Default	Description
`postgresql.url`	`jdbc:postgresql://localhost:5432/postgres`	JDBC URL for PostgreSQL
`postgresql.username`	`postgres`	PostgreSQL username
`postgresql.password`	`postgres`	PostgreSQL password
`sqlite.url`	`jdbc:sqlite:sqlite/testdb.db`	SQLite database URL
`neo4j.url`	`bolt://localhost:7687`	Neo4j Bolt URL
`neo4j.username`	`neo4j`	Neo4j username
`neo4j.password`	`neo4j`	Neo4j password
`neo4j.authentication.type`	`basic`	Auth type: none, basic, kerberos, custom, bearer
`neo4j.relationshipNodesMap`	`false`	Map relationship nodes
`neo4j.chase.url`	`bolt://localhost:7687`	URL for chase data in Neo4j
`neo4j.chase.username`	`neo4j`	Chase storage username
`neo4j.chase.password`	`neo4j`	Chase storage password
`neo4j.chase.authenticationType`	`basic`	Chase storage auth type
`mariadb.platform`	`mariadb`	Platform identifier
`mariadb.url`	`jdbc:mysql://localhost:3306/mariadb`	MariaDB JDBC URL
`mariadb.username`	`mariadb`	MariaDB username
`mariadb.password`	`mariadb`	MariaDB password
`mongodb.url`	`mongodb://localhost:27017`	MongoDB connection URL
`mongodb.username`	`mongo`	MongoDB username
`mongodb.password`	`mongo`	MongoDB password
`csv.withHeader`	`false`	Whether CSV files include a header row
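When pointing Prometheux at a non-local database, typically only the host, port, and database name in these URLs change. The hypothetical helper below composes URLs in the same shape as the documented defaults, which makes it easy to see which parts to edit.

```python
def jdbc_url(kind, host="localhost", port=None, database=None, path=None):
    """Compose connection URLs shaped like the documented defaults (illustrative helper)."""
    if kind == "sqlite":
        # SQLite URLs point at a database file rather than a host
        return f"jdbc:sqlite:{path or 'sqlite/testdb.db'}"
    schemes = {
        "postgresql": ("jdbc:postgresql", 5432, "postgres"),
        "mariadb": ("jdbc:mysql", 3306, "mariadb"),  # scheme as used by the default mariadb.url
    }
    scheme, default_port, default_db = schemes[kind]
    return f"{scheme}://{host}:{port or default_port}/{database or default_db}"
```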

Engine Properties

Property	Default	Description
`decimal_digits`	`3`	Decimal precision
`nullGenerationMode`	`UNIQUE_NULLS`	`UNIQUE_NULLS` or `SAME_NULLS`
`sparkConfFile`	`spark-defaults.conf`	Path to the Spark configuration file
`optimizationStrategy`	`default`	Options: `default`, `snaJoin`, `sna`, `noTermination`
`computeAcceleratorPreference`	`cpu`	`cpu` or `gpu` (GPU-enabled environments only)
`s3aAccessKey`	`myAccess`	AWS S3 access key
`s3aSecretKey`	`mySecret`	AWS S3 secret key
`restService`	`off`	Set to `livy` to submit jobs via the Livy REST service
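Several engine properties accept only a fixed set of values. A client-side pre-flight check such as the sketch below can catch typos before a job is submitted. The allowed sets are copied from the table above; the check itself is hypothetical, and the engine may validate differently or accept values not documented here.

```python
ENGINE_ENUMS = {
    "nullGenerationMode": {"UNIQUE_NULLS", "SAME_NULLS"},
    "optimizationStrategy": {"default", "snaJoin", "sna", "noTermination"},
    "computeAcceleratorPreference": {"cpu", "gpu"},
    "restService": {"off", "livy"},
}


def check_engine_properties(props):
    """Return human-readable problems with enumerated engine settings (client-side sketch)."""
    problems = []
    for key, allowed in ENGINE_ENUMS.items():
        value = props.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}={value!r}: expected one of {sorted(allowed)}")
    digits = props.get("decimal_digits")
    if digits is not None and not str(digits).isdigit():
        problems.append(f"decimal_digits={digits!r}: expected a non-negative integer")
    return problems
```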

Spark Configuration

For full details, see the Apache Spark documentation.

Property	Default	Description
`appName`	`prometheux`	Spark application name
`spark.master`	`local[*]`	Master URL (`local[*]`, `spark://HOST:PORT`, `yarn`, `k8s://HOST:PORT`)
`spark.driver.memory`	`4g`	Driver memory
`spark.driver.maxResultSize`	`4g`	Max driver result size
`spark.executor.memory`	`4g`	Executor memory
`spark.submit.deployMode`	`client`	`client` or `cluster`
`spark.executor.instances`	`1`	Number of executors
`spark.executor.cores`	`4`	Cores per executor
`spark.dynamicAllocation.enabled`	`false`	Dynamic executor allocation
`spark.sql.adaptive.enabled`	`true`	Adaptive query execution
`spark.sql.shuffle.partitions`	`4`	Shuffle partition count
`spark.hadoop.fs.defaultFS`	`hdfs://localhost:9000`	Hadoop default filesystem URL (Hadoop's `fs.defaultFS`)
`spark.yarn.stagingDir`	`hdfs://localhost:9000/user/`	Yarn staging directory
`spark.hadoop.yarn.resourcemanager.hostname`	`localhost`	Yarn RM hostname
`spark.hadoop.yarn.resourcemanager.address`	`localhost:8032`	Yarn RM address
`spark.serializer`	`org.apache.spark.serializer.KryoSerializer`	Serializer class
`spark.local.dir`	`tmp`	Local scratch directory
`spark.checkpoint.compress`	`true`	Compress RDD checkpoints
`spark.shuffle.compress`	`true`	Compress shuffle data
`spark.sql.autoBroadcastJoinThreshold`	`-1`	Maximum table size in bytes for broadcast joins; `-1` disables broadcast joins, so Spark typically uses SortMergeJoin
`spark.hadoop.fs.s3a.impl`	`org.apache.hadoop.fs.s3a.S3AFileSystem`	S3A filesystem implementation class
`spark.hadoop.fs.s3a.path.style.access`	`true`	Use path-style S3 access
`spark.hadoop.fs.s3a.server-side-encryption-algorithm`	`AES256`	S3 server-side encryption algorithm
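The file referenced by `sparkConfFile` uses Spark's `spark-defaults.conf` layout: one key and value per line separated by whitespace, with `#` marking comments. The sketch below parses that layout and derives a starting value for `spark.sql.shuffle.partitions` from the executor settings, following the Spark tuning guide's rule of thumb of roughly 2 to 3 tasks per CPU core. Spark itself loads the file with Java's `Properties` reader, which also accepts `=` and `:` separators; this sketch handles only the whitespace form.

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf style text: 'key value' per line, '#' lines are comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def suggested_shuffle_partitions(conf, tasks_per_core=2):
    """Heuristic shuffle partition count; fallbacks mirror the defaults in the table above."""
    executors = int(conf.get("spark.executor.instances", 1))
    cores = int(conf.get("spark.executor.cores", 4))
    return executors * cores * tasks_per_core


SAMPLE = """
# Run on Yarn with the driver inside the cluster
spark.master                          yarn
spark.submit.deployMode               cluster
spark.executor.instances              4
spark.executor.cores                  4
spark.sql.autoBroadcastJoinThreshold  -1
"""

conf = parse_spark_defaults(SAMPLE)
print(suggested_shuffle_partitions(conf))  # 32
```

The default of 4 shuffle partitions suits local testing; on a cluster, a value scaled to the total executor cores usually keeps every core busy during shuffles.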

GPU Acceleration

Prometheux supports the Spark-RAPIDS plugin for GPU-accelerated processing. See the NVIDIA blog post on Accelerating Neuro-Symbolic AI with RAPIDS and Vadalog Parallel for details.

Property	Default	Description
`spark.plugins`	`com.nvidia.spark.SQLPlugin`	Enables Spark-RAPIDS
`spark.kryo.registrator`	`com.nvidia.spark.rapids.GpuKryoRegistrator`	GPU Kryo serialization
`spark.rapids.sql.enabled`	`true`	Enable GPU for SQL ops
`spark.rapids.sql.concurrentGpuTasks`	`2`	Concurrent GPU tasks
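One way to keep CPU and GPU deployments in a single configuration is to add the RAPIDS properties only when `computeAcceleratorPreference` is `gpu`. The sketch below does that with the values from the table above; the function name is hypothetical, and the Kryo registrator only takes effect when `spark.serializer` is the `KryoSerializer` (the default listed under Spark Configuration).

```python
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.kryo.registrator": "com.nvidia.spark.rapids.GpuKryoRegistrator",
    "spark.rapids.sql.enabled": "true",
    "spark.rapids.sql.concurrentGpuTasks": "2",
}


def apply_accelerator(spark_conf, preference="cpu"):
    """Return a copy of spark_conf, adding the Spark-RAPIDS settings when preference is 'gpu'."""
    if preference not in ("cpu", "gpu"):
        raise ValueError(f"computeAcceleratorPreference must be 'cpu' or 'gpu', got {preference!r}")
    merged = dict(spark_conf)
    if preference == "gpu":
        for key, value in RAPIDS_CONF.items():
            merged.setdefault(key, value)  # values set explicitly by the user take precedence
    return merged


gpu_conf = apply_accelerator({"spark.rapids.sql.concurrentGpuTasks": "1"}, "gpu")
```

On a real cluster the RAPIDS plugin JAR must also be on the classpath, and GPU scheduling is usually configured with properties such as `spark.executor.resource.gpu.amount` and `spark.task.resource.gpu.amount`.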

Livy REST Service

Prometheux jobs can be submitted remotely via Apache Livy, which provides a REST interface for Spark clusters. To enable it, set `restService` to `livy`.

Property	Default	Description
`livy.uri`	`http://localhost:8998`	Livy REST service URI
`livy.hdfs.jar.files`	—	HDFS path for required JARs
`livy.hdfs.jar.path`	`/home/prometheux/livy`	JAR storage path
`livy.java.security.auth.login.config`	`jaas.conf`	Kerberos login config
`livy.java.security.krb5.conf`	`none`	Kerberos config file
`livy.sun.security.krb5.debug`	`true`	Kerberos debug logging
`livy.javax.security.auth.useSubjectCredsOnly`	`true`	Subject credentials only
`livy.session.logSize`	`0`	Session log size
`livy.shutdownContext`	`true`	Shut down Spark context after job completion
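Livy runs batch applications through `POST /batches`, with a JSON body whose fields include `file`, `className`, `args`, and `conf`; the batch state can then be polled at `GET /batches/{batchId}/state`. The sketch below prepares such a request with the standard library to illustrate the kind of call involved. It is not how Prometheux builds its own submissions, and the HDFS path and main class are placeholders.

```python
import json
import urllib.request


def livy_batch_request(livy_uri, jar_on_hdfs, main_class, spark_conf=None, args=()):
    """Prepare (but do not send) a Livy POST /batches request for a Spark application."""
    body = {
        "file": jar_on_hdfs,         # application JAR, reachable by the cluster (e.g. on HDFS)
        "className": main_class,     # entry point of the application
        "args": list(args),
        "conf": dict(spark_conf or {}),
    }
    return urllib.request.Request(
        url=f"{livy_uri.rstrip('/')}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = livy_batch_request(
    "http://localhost:8998",
    "hdfs:///user/prometheux/prometheux-engine.jar",  # hypothetical path
    "com.example.PrometheuxMain",                     # hypothetical entry point
    {"spark.executor.instances": "2"},
)
```

Sending it with `urllib.request.urlopen(req)` returns a JSON description of the new batch, including its `id`. A Kerberos-protected Livy server additionally requires SPNEGO authentication, which this sketch does not handle.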

Next Steps

REST API — Engine REST endpoints
Chat API — AI-powered Vadalog assistant
Python SDK — Programmatic access via prometheux_chain
