Introduction
Seeknal is a platform that abstracts away the complexity of data transformation and AI/ML engineering. It is a collection of tools that help you transform data, store it, and use it for machine learning and data analytics.
Seeknal lets you:
Define data and feature transformations from raw data sources using Pythonic APIs and YAML.
Register transformations and feature groups by name, then retrieve transformed data and features for use cases such as AI/ML modeling, data engineering, and business-metric calculation.
Share transformations and feature groups across teams and throughout the company.
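The idea of registering a transformation once and retrieving its output by name can be illustrated with a small in-memory registry. This is a minimal sketch under stated assumptions, not Seeknal's API: `TransformationRegistry`, `register`, `run`, and the `daily_spend` transformation are hypothetical names, and rows are plain Python dictionaries rather than any real data-frame type.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]


class TransformationRegistry:
    """Hypothetical in-memory registry that maps names to transformations."""

    def __init__(self) -> None:
        self._transforms: Dict[str, Transform] = {}

    def register(self, name: str, fn: Transform) -> None:
        # Reject duplicates so a shared name always refers to one definition.
        if name in self._transforms:
            raise ValueError(f"transformation {name!r} is already registered")
        self._transforms[name] = fn

    def run(self, name: str, rows: List[Row]) -> List[Row]:
        # Look up the transformation by name and apply it to the input rows.
        return self._transforms[name](rows)


def daily_spend(rows: List[Row]) -> List[Row]:
    """Sum the 'amount' column per (user_id, day) pair."""
    totals: Dict[Tuple[object, object], float] = {}
    for r in rows:
        key = (r["user_id"], r["day"])
        totals[key] = totals.get(key, 0.0) + float(r["amount"])
    return [{"user_id": u, "day": d, "spend": s} for (u, d), s in sorted(totals.items())]


registry = TransformationRegistry()
registry.register("daily_spend", daily_spend)

raw = [
    {"user_id": "u1", "day": "2024-01-01", "amount": 10},
    {"user_id": "u1", "day": "2024-01-01", "amount": 5},
    {"user_id": "u2", "day": "2024-01-01", "amount": 7},
]
print(registry.run("daily_spend", raw))
# → [{'user_id': 'u1', 'day': '2024-01-01', 'spend': 15.0}, {'user_id': 'u2', 'day': '2024-01-01', 'spend': 7.0}]
```

Any consumer that knows the name `daily_spend` gets the same logic, and refusing duplicate registrations keeps that name unambiguous across teams.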
Seeknal is useful for several use cases, including:
AI/ML modeling: Seeknal computes your feature transformations and incorporates them into your training data, using point-in-time joins to prevent data leakage, and supports materializing and deploying your features for online use in production.
Data analytics: build data pipelines that extract features and metrics from raw data for analytics and AI/ML modeling.
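A point-in-time join attaches to each training example only the feature values that were already known at that example's event time, so the model never sees values from the future. The sketch below shows the technique in plain Python. It is not Seeknal's implementation: the function name `point_in_time_join`, the integer timestamps, and the choice to treat a feature recorded exactly at the event time as available are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def point_in_time_join(
    labels: List[Tuple[str, int]],
    features: List[Tuple[str, int, float]],
) -> List[Tuple[str, int, Optional[float]]]:
    """Attach to each (entity, event_ts) label the latest feature value
    observed at or before event_ts, so no future value leaks in."""
    # Group feature observations by entity and sort them by timestamp.
    history: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for entity, ts, value in features:
        history[entity].append((ts, value))
    for obs in history.values():
        obs.sort()
    ts_index = {e: [ts for ts, _ in obs] for e, obs in history.items()}

    joined: List[Tuple[str, int, Optional[float]]] = []
    for entity, event_ts in labels:
        obs = history.get(entity, [])
        # Position of the last observation with ts <= event_ts (-1 if none).
        i = bisect_right(ts_index.get(entity, []), event_ts) - 1
        joined.append((entity, event_ts, obs[i][1] if i >= 0 else None))
    return joined


features = [("u1", 1, 0.2), ("u1", 5, 0.9), ("u2", 3, 0.4)]
labels = [("u1", 4), ("u1", 5), ("u2", 2), ("u3", 9)]
print(point_in_time_join(labels, features))
# → [('u1', 4, 0.2), ('u1', 5, 0.9), ('u2', 2, None), ('u3', 9, None)]
```

The label for `u1` at time 4 receives 0.2, not the later 0.9; joining on "latest value" alone would leak 0.9 into that training row. For DataFrames, `pandas.merge_asof` (whose default direction is `"backward"`) performs the same kind of as-of lookup.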
High Level Design
The Seeknal architecture is designed as an end-to-end data transformation platform that enables teams to build, operate, and maintain large-scale ETL pipelines. Below is a conceptual overview of how Seeknal’s components work together, from data ingestion to feature storage and orchestration.
Overview

Figure: Seeknal transformer architecture diagram.
Data Sources: Seeknal accepts data from diverse sources (databases, APIs, files, etc.) to power its transformation workflows. These sources can be defined and configured within each project’s workspace.
Transformer
Common Artifacts: Contains shared configurations and variables that standardize the way data transformations are defined and executed.
Rule: Houses any validation or transformation rules that can be reused across multiple pipelines.
DXL (Data Transformation Language): Encapsulates transformation logic in YAML, making it easy to collaborate, audit, and modify.
Project and Workspaces: Provide isolation and organization, allowing different teams to manage separate pipelines under a unified system.
Feature Store & Transformed Data: After processing, data can be published to the Feature Store for machine learning applications or stored as transformed data for downstream analytics. This gives both operational and ML-driven use cases a single, consistent place to read from.
Orchestration (Prefect): Seeknal integrates with Prefect to automate and schedule transformations, so daily or real-time pipelines run without manual intervention.
Python SDK: The Python SDK offers a user-friendly interface for defining, executing, and debugging transformations, letting teams incorporate Seeknal’s functionality into existing Python workflows.
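The Transformer separates reusable Rules from declarative transformation logic written in YAML (DXL). The sketch below illustrates that separation in plain Python under stated assumptions: the `RULES` and `SPEC` dictionaries and the `run_spec` function are hypothetical, `SPEC` merely stands in for a parsed YAML file, and none of it reproduces DXL's actual syntax or Seeknal's API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Reusable rules, defined once and referenced by name from any pipeline.
RULES: Dict[str, Callable[[Row], bool]] = {
    "has_user": lambda r: bool(r.get("user_id")),
    "non_negative_amount": lambda r: float(r["amount"]) >= 0,
}

# Stand-in for a parsed YAML pipeline definition (hypothetical format, not DXL syntax).
SPEC: Dict[str, Any] = {
    "name": "clean_transactions",
    "rules": ["has_user", "non_negative_amount"],
    "select": ["user_id", "amount"],
}


def run_spec(spec: Dict[str, Any], rows: List[Row]) -> List[Row]:
    """Keep rows that pass every referenced rule, then project the selected columns."""
    checks = [RULES[name] for name in spec["rules"]]
    kept = [r for r in rows if all(check(r) for check in checks)]
    return [{col: r[col] for col in spec["select"]} for r in kept]


rows = [
    {"user_id": "u1", "amount": 12.5, "channel": "web"},
    {"user_id": "", "amount": 3.0, "channel": "app"},
    {"user_id": "u2", "amount": -1.0, "channel": "web"},
]
print(run_spec(SPEC, rows))
# → [{'user_id': 'u1', 'amount': 12.5}]
```

Because `has_user` is defined once in `RULES`, tightening it changes every pipeline that references it, which is the kind of reuse the Rule component is described as providing; the pipeline definition itself stays a small, reviewable piece of data.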