Prometheux Databricks App
AI-powered Knowledge Graph platform deployed as a Databricks native application.
Features
- React Frontend: Modern UI for data exploration and visualization
- FastAPI Backend: High-performance Python API
- Java Services: Data Manager and Vadalog Parser
- Single-port Architecture: Unified deployment via Node.js proxy
Prerequisites
- Databricks workspace (AWS, Azure, or GCP)
- Databricks CLI installed and configured (
databricks configure) - AWS credentials for Java services - contact Prometheux to obtain
Quick Start
1. Clone the Repository
git clone https://github.com/prometheuxresearch/prometheux-databricks-app.git
cd prometheux-databricks-app
2. Configure Environment Variables
Copy the example configuration:
cp bundle-vars.example.yml bundle-vars.yml
Edit bundle-vars.yml and fill in the AWS credentials provided by Prometheux:
# Data Manager (provided by Prometheux)
data_manager_s3_bucket: "<provided-by-prometheux>"
data_manager_s3_key: "<provided-by-prometheux>"
data_manager_aws_access_key_id: "<provided-by-prometheux>"
data_manager_aws_secret_access_key: "<provided-by-prometheux>"
# Vadalog Parser (provided by Prometheux)
vadalog_parser_s3_bucket: "<provided-by-prometheux>"
vadalog_parser_s3_key: "<provided-by-prometheux>"
vadalog_parser_aws_access_key_id: "<provided-by-prometheux>"
vadalog_parser_aws_secret_access_key: "<provided-by-prometheux>"
Contact Prometheux to obtain these credentials (required for full functionality).
3. Deploy & Run
./deploy.sh
The script will:
- Check if you have Databricks CLI installed
- Prompt you to set up Lakebase (recommended) or use SQLite
- Deploy the app to your Databricks workspace
- Ask if you want to start the app immediately
If you decline to run the app, you can start it later with:
./deploy.sh run
4. Access Your App
Your app will be available at:
https://prometheux-<workspace-id>.databricksapps.com
Other useful commands:
./deploy.sh run # Start the app (if not started during deploy)
./deploy.sh restart # Restart app (fast - processes auto-cleanup)
./deploy.sh stop # Stop compute (slow - takes minutes to restart)
./deploy.sh status # Check app status
./deploy.sh logs # View app logs
./deploy.sh destroy # Remove the app
Java Services & Prometheux Engine
Data Manager & Vadalog Parser
The app includes Java services that are downloaded automatically at runtime:
- Data Manager: Advanced data processing and integration
- Vadalog Parser: Parses and validates Vadalog logic programs
Setup: Contact Prometheux to obtain AWS credentials, then add them to bundle-vars.yml (see Step 2 above).
Prometheux Engine
The Prometheux Engine is the full reasoning engine for executing Vadalog programs.
The Prometheux Engine is NOT included in this app package.
✅ Provided separately by Prometheux as a Databricks cluster library
✅ Installed directly on your Databricks cluster
✅ Contact Prometheux to request access and installation guidance
Configuration
Database Storage
The app uses Lakebase (Databricks managed PostgreSQL) for persistent data storage.
The ./deploy.sh script automatically handles Lakebase setup:
- Creates Lakebase project
- Configures permissions
- Connects the app
Alternative: If you decline Lakebase, the app uses SQLite (ephemeral - data lost on restart).
Configuration Variables
All configuration is in bundle-vars.yml. See the Quick Start (Step 2) for required values.
Core settings:
run_mode:development(default, no auth) orproductionorganization: Your organization nameusername: Default username (development mode only)- Java services: AWS credentials provided by Prometheux (see Quick Start Step 2)
Architecture
Application Components
┌─────────────────────────────────────────────────────────────────────────┐
│ PROMETHEUX DATABRICKS APP │
│ (Databricks Apps Container) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ User Browser │
│ │ │
│ ├──────────> Node.js Proxy (Port 8000) │
│ │ │
│ ├── /api/* ──> FastAPI Backend │
│ │ │ │
│ │ ├─> Data Manager │
│ │ │ (Java service) │
│ │ │ │
│ │ ├─> Vadalog Parser │
│ │ │ (Java service) │
│ │ │ │
│ │ └─> Lakebase │
│ │ (PostgreSQL) │
│ │ │
│ └── /* ──> React Frontend │
│ (Static files) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
│
│ Authentication:
│ • Bearer Token (from frontend)
│ • OAuth2 M2M
│ • User-to-Machine (U2M)
│ • Personal Access Token (PAT)
│ • Or default Databricks App credentials
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ DATABRICKS WORKSPACE ENVIRONMENT │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ DATABRICKS COMPUTE (Spark Clusters) │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ Prometheux Engine (PX JAR) │ │ │
│ │ │ • Installed via Libraries API │ │ │
│ │ │ • Graph Analytics │ │ │
│ │ │ • ETL Processing │ │ │
│ │ │ • Data Processing │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ ▲ │ │
│ │ │ Jobs Submit API │ │
│ │ │ (Vadalog execution) │ │
│ └──────────────────────────┼─────────────────────────────────────┘ │
│ │ │
│ │ Spark Connect API │
│ │ (Get Results) │
│ │ │
│ ┌──────────────────────────┴─────────────────────────────────────┐ │
│ │ UNITY CATALOG │ │
│ │ • Volumes (Uploaded JARs) │ │
│ │ • Tables (Input/Output) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
Authentication Flow:
- Frontend passes auth token to Backend
- Backend authenticates with Databricks using:
- User's Bearer Token (U2M)
- OAuth2 M2M credentials
- Personal Access Token (PAT)
- Or default App credentials
- Backend submits Vadalog jobs to Prometheux Engine on cluster
- Results retrieved via Spark Connect API
Key Components
Databricks App Container:
- React Frontend: Modern UI for ontology management and data exploration
- FastAPI Backend: Python API handling business logic
- Node.js Proxy: Single entry point, routes requests to frontend/backend
- Data Manager: Advanced data processing (Java service)
- Vadalog Parser: Logic program validation (Java service)
- Lakebase: Persistent PostgreSQL storage for user data
Databricks Workspace:
- Prometheux Engine: Full reasoning engine (separate installation on cluster)
- Unity Catalog: Data and library storage
- Spark Clusters: Compute resources for engine execution
Updating
To update to a new version:
cd prometheux-databricks-app
git pull
databricks bundle deploy --var-file bundle-vars.yml
databricks bundle run prometheux
Your bundle-vars.yml configuration will be preserved.
Troubleshooting
Check App Logs
databricks apps logs prometheux
Check App Status
databricks apps get prometheux
Common Issues
- App won't start: Check that all S3 credentials are correct and buckets are accessible
- 404 errors: Ensure the app has finished starting (can take 2-3 minutes)
- Java services not running: Check that JAR files exist in S3 at the specified paths
Development Mode
The default run_mode: development bypasses JWT authentication for easier testing. For production, set run_mode: production in your bundle-vars.yml.