Lightning Catalog

An open source framework for modern query federation and data preparation 

 

Get Started


Overview

Lightning Catalog is a fast, lightweight, and intuitive Spark-based data catalog for preparing data at any scale for ad-hoc analytics, data warehouse, lakehouse, and ML projects.

Unified Access

Move data from source and legacy systems to the target state without interrupting business operations.

Query Federation

A single view of all your data transformed into one unified semantic model and business language.
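As a rough illustration of the idea of query federation, the sketch below joins tables that live in two separate databases through a single SQL statement. It uses only Python's standard `sqlite3` module and does not use Lightning Catalog or Spark; the file names, table names, and sample rows are all hypothetical and chosen only for the example.

```python
import os
import sqlite3
import tempfile


def _create(path, ddl, insert_sql, rows):
    """Create one SQLite file that stands in for an independent source system."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def build_sources(directory):
    """Build two unrelated databases: a hypothetical CRM and a hypothetical sales system."""
    crm_path = os.path.join(directory, "crm.db")
    sales_path = os.path.join(directory, "sales.db")
    _create(
        crm_path,
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    _create(
        sales_path,
        "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )
    return crm_path, sales_path


def federated_totals(crm_path, sales_path):
    """Attach both sources under their own namespaces and join them in one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
        conn.execute("ATTACH DATABASE ? AS sales", (sales_path,))
        return conn.execute(
            """
            SELECT c.name, SUM(o.amount) AS total
            FROM crm.customers AS c
            JOIN sales.orders AS o ON o.customer_id = c.id
            GROUP BY c.name
            ORDER BY c.name
            """
        ).fetchall()
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        crm_path, sales_path = build_sources(tmp)
        return federated_totals(crm_path, sales_path)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only the shape of the query: each source keeps its own namespace (`crm`, `sales`), and the caller sees one SQL surface over both.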

Enable Data Science Workloads

Lightning Catalog can remove the burden of data preparation from ML engineers and help them focus on building models.

Simplified Pipeline Execution Engine

Lightning Catalog simplifies the life cycle of data engineering pipelines (build, test, and deploy) by leveraging Data Flow Tables.

DISCOVER

Discover and register metadata from all your sources

DIVERSIFY

Accelerate your data transformations using basic SQL queries

DISTRIBUTE

Distribute data by connecting upstream to downstream via secure JDBC/ODBC connections

Key Features

Fully managed catalog built on file systems (HDFS, blob storage, and local files), which allows version control.
Supports the Apache Spark plug-in architecture.
Supports running data pipelines at MPP scale by leveraging Apache Spark and optional NVIDIA GPUs.
Supports running ANSI SQL and HiveQL over source systems defined in the catalog.
Supports multiple namespaces.
Supports data quality checks by integrating Amazon Deequ.
Supports data flow tables, a declarative ETL framework that defines and transforms your data.
Supports metadata processing for unstructured data using endpoint declarations.
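
To give a feel for what declarative data quality checks look like, here is a minimal sketch in plain Python. It does not call Deequ or Lightning Catalog; the helper names (`completeness`, `is_unique`, `run_checks`), the sample rows, and the thresholds are assumptions made up for illustration.

```python
def completeness(rows, column):
    """Return the fraction of rows whose value for `column` is not None."""
    if not rows:
        return 1.0
    return sum(1 for row in rows if row.get(column) is not None) / len(rows)


def is_unique(rows, column):
    """Return True when no non-null value of `column` appears more than once."""
    values = [row.get(column) for row in rows if row.get(column) is not None]
    return len(values) == len(set(values))


def run_checks(rows, checks):
    """Evaluate each named check against the rows and collect pass/fail results."""
    return {name: check(rows) for name, check in checks.items()}


# Hypothetical sample data: one missing email and one duplicated id.
SAMPLE_ROWS = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
]

# Checks are declared as data (name -> predicate) and evaluated in one pass.
CHECKS = {
    "id is complete": lambda rows: completeness(rows, "id") == 1.0,
    "id is unique": lambda rows: is_unique(rows, "id"),
    "email is at least 50% complete": lambda rows: completeness(rows, "email") >= 0.5,
}

if __name__ == "__main__":
    for name, passed in run_checks(SAMPLE_ROWS, CHECKS).items():
        print(f"{name}: {'pass' if passed else 'fail'}")
```

Declaring the checks separately from the code that evaluates them is the design choice being illustrated; a real deployment would compute such metrics at scale on Spark rather than over an in-memory list.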

Latest Supported Data Sources

PostgreSQL

Google BigQuery

Snowflake

MySQL

Amazon Redshift

Teradata

Oracle

Greenplum

Vertica

MariaDB

Apache Iceberg

MongoDB

IBM DB2

Delta Lake

Microsoft SQL Server

Apache Parquet

Apache Avro

JSON

Google

CSV

... and many other compatible data sources

 

Spark JDBC (PostgreSQL, MySQL), Spark MongoDB, Spark Azure Blob, BigQuery, Spark Hive/Glue, Spark REST API, Spark XML, Spark Metastore/Catalog, Spark data lake (Iceberg, Delta), Spark access control/authentication, Spark data migration, Spark pipeline

The Lightning Catalog is licensed under the MIT License.