Connecting to Databases and External Data Sources
Prometheux can seamlessly integrate and migrate data across various platforms. It supports file-based sources (CSV, Parquet, Excel, JSON, COBOL), relational and graph databases, cloud data warehouses, NoSQL stores, REST APIs, and distributed file systems like S3 and HDFS.@bind Options
The bind command configures reading from and writing to databases and data sources:datasource_type values:
| Type | Description |
|---|---|
csv | CSV files |
parquet | Parquet files |
excel | Excel files |
json | JSON files |
cobol | Legacy COBOL / EBCDIC files (with copybook) |
postgresql | PostgreSQL databases |
neo4j | Neo4j graph databases |
db2 | DB2 databases |
mariadb | MariaDB databases |
oracle | Oracle databases |
sqlite | SQLite databases |
mysql | MySQL databases |
sqlserver | SQL Server databases |
h2 | H2 databases |
sybase | Sybase databases |
teradata | Teradata databases |
redshift | Amazon Redshift |
bigquery | Google BigQuery |
hive | Apache Hive |
presto | Presto |
snowflake | Snowflake |
databricks | Databricks |
dynamodb | Amazon DynamoDB |
api | REST APIs |
text | Plain text files |
binary | Binary files (PDF, images, etc.) |
url: Full JDBC URL (e.g.jdbc:postgresql://localhost:5432/prometheux)host: Database hostport: Database portdatabase: Database nameusername: Login usernamepassword: Login password
Configuring Credentials
Credentials can be specified directly in@bind annotations or stored in px.properties for centralized management:
CSV Datasource
Prometheux supports CSV files for both reading and writing. By default, all fields are treated as strings. Values\N are treated as null (labelled nulls).
@bind options
useHeaders:trueorfalse— whether headers are presentdelimiter: Field separator characterrecordSeparator: Record separatorquoteMode:all,minimal,non_numeric, ornonenullString: String to use for null valuesmultiline:trueto handle multi-line fieldscoalesce:trueto produce a single output file (standalone environments only)
Examples
Simple read:Parquet Datasource
Parquet is a columnar storage format optimized for large-scale data lake scenarios.Excel Datasource
Write from CSV to Excel:JSON Datasource
Prometheux reads JSON files and automatically infers nested objects as struct types. Nested fields are accessible via dot notation in SQL or viastruct:get in Vadalog rules.
Simple Read
SQL Queries on Nested Fields
Using struct:get
Query Option in @bind
COBOL Datasource
Prometheux can read legacy COBOL / EBCDIC data files — VSAM dumps, flat record files, variable-length RDW/BDW streams — and join them with other sources using standard Vadalog rules. The connector is powered by AbsaOSS Cobrix. A COBOL binding always needs:- A data file (addressed by
filepath+filename) - A copybook describing the record layout — passed via the
copybookoption as a path to a.cpyfile
The
@bind tokenizer strips \n and \t from the option block and does not support escaped single quotes, so multi-line copybook bodies cannot be embedded inline. Keep the copybook as a file and reference it by path.@bind options
copybook: required — path to the COBOL copybook filecobolPreset: shortcut for common mainframe framings:flat-fixed-ebcdic(default) — fixed-length records, EBCDIC encodingflat-fixed-ascii— fixed-length records, ASCII encodingmainframe-v— IBM variable-length records with 4-byte RDW headermainframe-vb— IBM variable-blocked: BDW-prefixed blocks of RDW-prefixed recordscustom— no preset; supply all Cobrix flags viacobrix.*options
encoding: Override encoding inferred by preset (ebcdic,ascii)record_format:F(fixed),V(variable),VB(variable-blocked)cobolFlattenPolicy:dotted(default) — flatten nested groups into prefixed top-level columnskeepNested— preserve nested structs asmapvalues (accessible viastruct:get)
cobolFileExtensions: Filter files in a directory (e.g.,.dat,.bin)cobrix.<flag>: Passthrough to the Cobrix reader verbatim
Example: Fixed-Length EBCDIC File
Example: Variable-Length Mainframe File
Example: Advanced Cobrix Overrides
Keeping Nested Groups as Structs
Querying COBOL Data with SQL
PIC S9(n)V99 COMP-3(packed decimal) is exposed asdoublein VadalogPIC 9(n) COMPbinary counters are exposed asintorlongOCCURS n TIMESgroups become Vadaloglistcolumns, expandable withcollections:explode- Point
filepathto a directory to read all matching extracts; usecobolFileExtensionsto filter
PostgreSQL Database
Write data from CSV to PostgreSQL:PostgreSQL with Supabase
Connect using the JDBC Transaction Pooler (recommended):After creating the user and granting permissions, it may take a few minutes for the changes to propagate. If you encounter connection issues, wait 5 minutes and try again.
MariaDB Database
Neo4j Database
Write nodes and relationships from CSV:Querying Neo4j with SQL
Prometheux automatically translates SQL to optimized Cypher, pushing it down to the Neo4j server:SQL-to-Neo4j translation:
SELECT * with LIMIT/OFFSET generates Cypher with SKIP/LIMIT; COUNT(*) generates a Cypher count query. No data is loaded into memory unnecessarily.Amazon DynamoDB
@bind Options
region: AWS region (e.g.,us-east-1)username: AWS Access Key IDpassword: AWS Secret Access KeysessionToken: Session Token for temporary credentialsendpointOverride: Custom endpoint URL (for DynamoDB Local)partitionKey: Partition key attribute name for table creationsortKey: Sort key attribute name (optional)billingMode:PAY_PER_REQUEST(default) orPROVISIONEDreadCapacity/writeCapacity: Capacity units for provisioned modewriteBatchSize: Items per batch (1–25, default 25)readPageSize: Page size for read operations (default 100)totalSegments: Segments for parallel scanning (default 8)
S3 Storage
Write to S3:Consuming Data via REST API
The API datasource supports JSON (default), CSV, and XML response formats, with full SQL integration for querying nested structures.Configuration Options
responseFormat:json(default),csv, orxmlauthType:basic,bearer, orapikeyusername/password: For basic authenticationtoken: For bearer authenticationapikey: For API key authenticationheaders: Custom headers askey1:value1,key2:value2delimiter: CSV delimiter (default,)
Quick Start
Querying Nested JSON
Joining Multiple API Sources
When joining multiple API sources, use renamed column format (predicate_name_i):
Prometheus Metrics
CSV Format API
Authentication Summary
| Auth Type | Parameters | Example |
|---|---|---|
| Bearer Token | authType=bearer, token=<token> | GitHub, Kubernetes, Prometheus |
| Basic Auth | authType=basic, username=<user>, password=<pass> | Private APIs |
| API Key | authType=apikey, apikey=<key> | REST APIs |
| Custom Headers | headers=<key1>:<value1>,<key2>:<value2> | APIs with custom headers |
| No Auth | (none) | Public APIs |
Working with Arrays
Best Practices
- JSON is the default
responseFormat— omit it for JSON APIs - Store tokens in environment variables:
token=${API_TOKEN} - Use SQL in rule bodies for filtering and aggregations
- Access nested fields with dot notation:
market_data.current_price.usd - Use
SIZE(array_field)for reliable array length checks - For JOINs between multiple sources, use renamed column references (
predicate_name_i) - Be mindful of rate limits on free-tier APIs
Text Files
Binary Files
Read from a PDF:documentType:
invoice, receipt.retailMeal, idDocument.passport, tax.us.w2, mortgage.us.1003, contract, and many more.
HDFS
Other Database Connectors
Sybase
Teradata
Amazon Redshift
Google BigQuery
Obtain credentials from the Google Cloud Console, then use either a credentials file or an access token:Snowflake
Retrieve your connection info from the Snowflake UI: user icon → “Connect a tool to Snowflake” → Connectors/Drivers → JDBC.password.

