Cluster

For teams running Prometheux on self-managed Yarn or Kubernetes clusters.

warning

This is an advanced deployment path. We recommend contacting the Prometheux team for guidance before proceeding.

Overview

Prometheux can run on your own compute infrastructure using Apache Spark as the execution layer. This is suited to organisations that require full control over their cluster, networking, and storage.

Architecture

Prometheux enables distributed, parallelised processing by translating its logical primitives (projection, selection, join) into map, filter, reduce, and shuffle transformations that execute in parallel across the cluster.

Supported Cluster Managers

Yarn

Prometheux integrates with the Yarn resource manager. Two deployment modes are available:

  • Client mode — the Vadalog driver resides on the client machine. The client submits the program to the Yarn Resource Manager, which elects an application master and allocates executors. The driver communicates with executors and returns results to the client.
  • Cluster mode — the driver runs inside the cluster on the application master node. The client submits the program and the driver manages executor communication from within the cluster.
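As a rough sketch, the two modes map onto standard spark-submit invocations as shown below. The JAR and program names (prometheux-engine.jar, program.vada) are placeholders, not actual artifact names; use the JAR provided by the Prometheux team.

```shell
# Client mode: the Vadalog driver stays on the submitting machine
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 4 \
  --executor-memory 4g \
  prometheux-engine.jar program.vada

# Cluster mode: the driver runs on the Yarn application master
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  prometheux-engine.jar program.vada
```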

Kubernetes

Prometheux can also run on a Kubernetes cluster. The client interacts directly with the Kubernetes API Server on the control plane, which schedules executor pods on worker nodes.

  • Client mode — the driver runs outside the cluster (inside or outside a pod). It contacts the API Server to schedule executor pods and communicates with them directly.
  • Cluster mode — the driver runs inside a pod on a worker node and coordinates with executor pods through the API Server.
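A Kubernetes submission might look roughly like the following; the API server address, container image, and JAR path are placeholders for values specific to your cluster:

```shell
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<prometheux-image> \
  --conf spark.executor.instances=4 \
  local:///opt/prometheux/prometheux-engine.jar program.vada
```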

Local Mode

In local mode, the driver, master, and executor run in a single JVM on the workstation where the application runs. This is useful for development and testing.
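For example, a local run can be launched with a single spark-submit (the JAR and program names are placeholders):

```shell
# Driver, master, and executors all run in one JVM on this machine
spark-submit --master "local[*]" prometheux-engine.jar program.vada
```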

Prerequisites

  • A running Yarn or Kubernetes cluster (or a local workstation for local mode)
  • Java 11 or later
  • Apache Spark (version compatible with your cluster)
  • The Prometheux engine JAR (provided by the Prometheux team)

Configuration Reference

When configuring database properties such as credentials, settings are applied globally to all instances of a database type (e.g., PostgreSQL, Neo4j). If specific configurations are defined via the @bind annotation, those override the global values.
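For instance, credentials set globally in the properties file apply to every PostgreSQL source, while a @bind annotation in the Vadalog program can point a specific predicate at its own database and table. The predicate, database, and table names below are illustrative, and the exact annotation arguments should be checked against your Vadalog reference:

```
% Binds the predicate "employee" to a specific PostgreSQL table,
% overriding the globally configured connection settings.
@bind("employee", "postgresql", "hr_db", "employees").
```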

Database Properties

| Property | Default | Description |
| --- | --- | --- |
| postgresql.url | jdbc:postgresql://localhost:5432/postgres | JDBC URL for PostgreSQL |
| postgresql.username | postgres | PostgreSQL username |
| postgresql.password | postgres | PostgreSQL password |
| sqlite.url | jdbc:sqlite:sqlite/testdb.db | SQLite database URL |
| neo4j.url | bolt://localhost:7687 | Neo4j Bolt URL |
| neo4j.username | neo4j | Neo4j username |
| neo4j.password | neo4j | Neo4j password |
| neo4j.authentication.type | basic | Auth type: none, basic, kerberos, custom, bearer |
| neo4j.relationshipNodesMap | false | Map relationship nodes |
| neo4j.chase.url | bolt://localhost:7687 | URL for chase data in Neo4j |
| neo4j.chase.username | neo4j | Chase storage username |
| neo4j.chase.password | neo4j | Chase storage password |
| neo4j.chase.authenticationType | basic | Chase storage auth type |
| mariadb.platform | mariadb | Platform identifier |
| mariadb.url | jdbc:mysql://localhost:3306/mariadb | MariaDB JDBC URL |
| mariadb.username | mariadb | MariaDB username |
| mariadb.password | mariadb | MariaDB password |
| mongodb.url | mongodb://localhost:27017 | MongoDB connection URL |
| mongodb.username | mongo | MongoDB username |
| mongodb.password | mongo | MongoDB password |
| csv.withHeader | false | Whether CSV files include a header row |
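A minimal properties fragment overriding some of these defaults might look like this (hosts and credentials are placeholders):

```
postgresql.url=jdbc:postgresql://db.internal:5432/analytics
postgresql.username=analytics_user
postgresql.password=change-me
neo4j.url=bolt://graph.internal:7687
neo4j.authentication.type=basic
csv.withHeader=true
```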

Engine Properties

| Property | Default | Description |
| --- | --- | --- |
| decimal_digits | 3 | Decimal precision |
| nullGenerationMode | UNIQUE_NULLS | UNIQUE_NULLS or SAME_NULLS |
| sparkConfFile | spark-defaults.conf | Path to the Spark configuration file |
| optimizationStrategy | default | Options: default, snaJoin, sna, noTermination |
| computeAcceleratorPreference | cpu | cpu or gpu (GPU-enabled environments only) |
| s3aAccessKey | myAccess | AWS S3 access key |
| s3aSecretKey | mySecret | AWS S3 secret key |
| restService | off | Set to livy to submit jobs via the Livy REST service |

Spark Configuration

For full details, see the Apache Spark documentation.

| Property | Default | Description |
| --- | --- | --- |
| appName | prometheux | Spark application name |
| spark.master | local[*] | Master URL (local[*], spark://HOST:PORT, yarn) |
| spark.driver.memory | 4g | Driver memory |
| spark.driver.maxResultSize | 4g | Max driver result size |
| spark.executor.memory | 4g | Executor memory |
| spark.submit.deployMode | client | client or cluster |
| spark.executor.instances | 1 | Number of executors |
| spark.executor.cores | 4 | Cores per executor |
| spark.dynamicAllocation.enabled | false | Dynamic executor allocation |
| spark.sql.adaptive.enabled | true | Adaptive query execution |
| spark.sql.shuffle.partitions | 4 | Shuffle partition count |
| spark.hadoop.defaultFS | hdfs://localhost:9000 | Hadoop filesystem URL |
| spark.yarn.stagingDir | hdfs://localhost:9000/user/ | Yarn staging directory |
| spark.hadoop.yarn.resourcemanager.hostname | localhost | Yarn RM hostname |
| spark.hadoop.yarn.resourcemanager.address | localhost:8032 | Yarn RM address |
| spark.serializer | org.apache.spark.serializer.KryoSerializer | Serializer class |
| spark.local.dir | tmp | Local scratch directory |
| spark.checkpoint.compress | true | Compress RDD checkpoints |
| spark.shuffle.compress | true | Compress shuffle data |
| spark.sql.autoBroadcastJoinThreshold | -1 | Set to -1 to force SortMergeJoin |
| spark.hadoop.fs.s3a.impl | org.apache.hadoop.fs.s3a.S3AFileSystem | S3 filesystem impl |
| spark.hadoop.fs.s3a.path.style.access | true | S3 path-style access |
| spark.hadoop.fs.s3a.server-side-encryption-algorithm | AES256 | S3 encryption |
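As a sketch, a spark-defaults.conf for a small Yarn deployment could combine these properties as follows; hostnames and resource sizes are illustrative and should be tuned to your cluster:

```
spark.master                                 yarn
spark.submit.deployMode                      cluster
spark.driver.memory                          4g
spark.executor.instances                     4
spark.executor.memory                        4g
spark.executor.cores                         4
spark.serializer                             org.apache.spark.serializer.KryoSerializer
spark.sql.autoBroadcastJoinThreshold         -1
spark.hadoop.defaultFS                       hdfs://namenode:9000
spark.hadoop.yarn.resourcemanager.hostname   rm-host
spark.hadoop.yarn.resourcemanager.address    rm-host:8032
```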

GPU Acceleration

Prometheux supports the Spark-RAPIDS plugin for GPU-accelerated processing. See the NVIDIA blog post on Accelerating Neuro-Symbolic AI with RAPIDS and Vadalog Parallel for details.

| Property | Default | Description |
| --- | --- | --- |
| spark.plugins | com.nvidia.spark.SQLPlugin | Enables Spark-RAPIDS |
| spark.kryo.registrator | com.nvidia.spark.rapids.GpuKryoRegistrator | GPU Kryo serialization |
| spark.rapids.sql.enabled | true | Enable GPU for SQL ops |
| spark.rapids.sql.concurrentGpuTasks | 2 | Concurrent GPU tasks |
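Putting these together, a GPU-enabled configuration fragment might look like the following; spark.executor.resource.gpu.amount is a standard Spark resource setting (not listed above) that GPU scheduling typically also requires:

```
spark.plugins                        com.nvidia.spark.SQLPlugin
spark.kryo.registrator               com.nvidia.spark.rapids.GpuKryoRegistrator
spark.rapids.sql.enabled             true
spark.rapids.sql.concurrentGpuTasks  2
spark.executor.resource.gpu.amount   1
```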

Livy REST Service

Prometheux jobs can be submitted remotely via Apache Livy, which provides a REST interface for Spark clusters. To enable it, set restService=livy.

| Property | Default | Description |
| --- | --- | --- |
| livy.uri | http://localhost:8998 | Livy REST service URI |
| livy.hdfs.jar.files | — | HDFS path for required JARs |
| livy.hdfs.jar.path | /home/prometheux/livy | JAR storage path |
| livy.java.security.auth.login.config | jaas.conf | Kerberos login config |
| livy.java.security.krb5.conf | none | Kerberos config file |
| livy.sun.security.krb5.debug | true | Kerberos debug logging |
| livy.javax.security.auth.useSubjectCredsOnly | true | Subject credentials only |
| livy.session.logSize | 0 | Session log size |
| livy.shutdownContext | true | Shut down Spark context after job completion |
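A minimal Livy setup might therefore combine the following properties (the host and paths are placeholders):

```
restService=livy
livy.uri=http://livy.internal:8998
livy.hdfs.jar.path=/home/prometheux/livy
livy.shutdownContext=true
```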

Next Steps

  • REST API — Engine REST endpoints
  • Chat API — AI-powered Vadalog assistant
  • Python SDK — Programmatic access via prometheux_chain