Tangle: Getting Started

Tangle is a service and web app that allows users to build and run machine learning pipelines using drag and drop without having to set up a development environment.

Jump to Tangle

The experimental new version of the Tangle app is now available.
No registration is required to experiment with building pipelines.

Run your first pipeline in seconds

What does a pipeline system do?

Tangle is a pipeline system that:

Orchestrates distributed execution, scheduling, data passing, and caching
Containerizes tasks for isolation and reproducibility
Uses command-line interfaces as the true interface with user code
Runs programs, not just functions passing shared in-memory objects
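
As a rough illustration of what "command-line interface as the interface" means for user code, here is a minimal sketch of a program a task could wrap. The script, the `count_lines` function, and the `--input-path` and `--output-path` flags are all hypothetical names chosen for this example; they are not part of Tangle. The program only reads and writes the file paths it is given, so any orchestrator can decide where the data lives.

```python
import argparse
from pathlib import Path


def count_lines(input_path: Path, output_path: Path) -> int:
    """Count the lines in the input file and write the count to the output file."""
    with input_path.open() as f:
        n = sum(1 for _ in f)
    # The caller chooses the output location, so create its directory if needed.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(str(n))
    return n


def main(argv=None) -> None:
    # Hypothetical flags: the program's whole contract is its command line.
    parser = argparse.ArgumentParser(description="Count lines in a text file.")
    parser.add_argument("--input-path", type=Path, required=True)
    parser.add_argument("--output-path", type=Path, required=True)
    args = parser.parse_args(argv)
    count_lines(args.input_path, args.output_path)

# When saved as a script, end the file with:
#     if __name__ == "__main__": main()
```

Because nothing is shared in memory, the same program could be written in any language as long as it honors the same command line.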

Why use Tangle

Tangle offers unique advantages for both teams and individual ML engineers:

Visual pipeline editor

Unlike any other pipeline system, Tangle provides a powerful drag-and-drop UI editor
Iterate faster and create complex pipelines with ease
No coding required to build pipelines, which suits non-engineers and rapid prototyping
Jump seamlessly between the visual editor and code when needed

Advanced execution caching

Content-based caching saves significant time and compute costs
When you modify a pipeline, only changed tasks are re-executed
Both upstream and downstream cached results are reused when possible
Can even reuse running executions, not just completed ones
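
The idea behind content-based caching can be sketched in a few lines. This is a toy model under stated assumptions, not Tangle's implementation: the `ToyCache` class, the `cache_key` function, and the spec dictionaries are hypothetical, and the sketch only covers completed results, not reuse of running executions. A task's cache key is derived from what the task does and from the content of its inputs, so a task is re-executed only when one of those changes.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Content hash of a piece of data."""
    return hashlib.sha256(data).hexdigest()


def cache_key(component_spec: dict, input_digests: dict) -> str:
    """Key derived from the task definition and the content of its inputs."""
    payload = json.dumps(
        {"component": component_spec, "inputs": input_digests}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyCache:
    """In-memory stand-in for an execution cache (hypothetical)."""

    def __init__(self):
        self._results = {}
        self.executed = 0  # number of tasks that actually ran

    def run(self, component_spec: dict, inputs: dict, func):
        key = cache_key(component_spec, {k: digest(v) for k, v in inputs.items()})
        if key not in self._results:
            self.executed += 1
            self._results[key] = func(**inputs)
        return self._results[key]


def run_pipeline(cache: ToyCache, text: bytes, suffix: bytes) -> bytes:
    """Two chained tasks: uppercase the text, then append a suffix."""
    upper_spec = {"command": ["upper"]}
    append_spec = {"command": ["append", suffix.decode()]}
    upper = cache.run(upper_spec, {"text": text}, lambda text: text.upper())
    return cache.run(append_spec, {"text": upper}, lambda text: text + suffix)


cache = ToyCache()
run_pipeline(cache, b"hello", b"!")  # both tasks execute
run_pipeline(cache, b"hello", b"!")  # nothing executes, both results are cached
run_pipeline(cache, b"hello", b"?")  # only the modified second task executes
print(cache.executed)  # prints 3
```

Since the key depends only on content, the cache is not tied to a particular pipeline: any pipeline that runs the same task on the same data would hit the same entry.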

Tracking and reproducibility

All pipeline runs are automatically recorded with graphs, logs, and artifact metadata
Intermediate data is immutable and never overwritten
Clone any pipeline run to reproduce exact results
Strict component versioning ensures reproducibility

Component reusability

Build a library of reusable components like Lego pieces
Components are self-contained and language-agnostic (Python, Java, Shell, Ruby, C++, JS/TS)
Compatible with existing Kubeflow and Vertex AI pipeline components
Share components across teams and pipelines without dependency conflicts

Open source and flexible

Run on any cloud provider or locally
Own your data and infrastructure
No vendor lock-in

How Tangle compares

Tangle vs. Kubeflow Pipelines

Kubeflow Pipelines pioneered the component-based approach to ML workflows that Tangle builds upon. In fact, Tangle uses the same ComponentSpec format introduced in KFP v1, meaning components can be reused between systems.
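
To give a feel for the format, here is a sketch of a KFP v1 style component definition, shown as a Python dictionary that mirrors the structure normally written in a YAML file. The component name, image, script path, and flags are hypothetical, and `resolve_command` is an illustrative helper rather than code from Tangle or Kubeflow. It shows how `inputPath` and `outputPath` placeholders in the command could be replaced with concrete file paths before the container starts.

```python
component_spec = {
    "name": "Count lines",
    "inputs": [{"name": "text"}],
    "outputs": [{"name": "count"}],
    "implementation": {
        "container": {
            "image": "python:3.12",  # hypothetical image
            "command": [
                "python3", "/app/count_lines.py",  # hypothetical script path
                "--input-path", {"inputPath": "text"},
                "--output-path", {"outputPath": "count"},
            ],
        }
    },
}


def resolve_command(spec: dict, input_paths: dict, output_paths: dict) -> list:
    """Replace path placeholders in the command with concrete file paths."""
    resolved = []
    for part in spec["implementation"]["container"]["command"]:
        if isinstance(part, dict) and "inputPath" in part:
            resolved.append(input_paths[part["inputPath"]])
        elif isinstance(part, dict) and "outputPath" in part:
            resolved.append(output_paths[part["outputPath"]])
        else:
            resolved.append(part)  # plain string argument, kept as-is
    return resolved


command = resolve_command(
    component_spec,
    input_paths={"text": "/data/in/text"},
    output_paths={"count": "/data/out/count"},
)
```

The component never names a storage location itself. It only declares inputs and outputs, which is what lets the same definition be reused across systems.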

Here's how Kubeflow compares to Tangle:

Visual pipeline editor: While Kubeflow requires writing Python code to define pipelines, Tangle offers a visual drag-and-drop editor that makes pipeline creation accessible to non-engineers.
Execution caching: Our content-based caching is more advanced than Kubeflow's lineage-based approach. We cache globally across all pipelines and can even reuse running executions, whereas Kubeflow only caches successful executions within each pipeline. This means significantly better resource utilization and faster iteration cycles.
Doesn't require Kubernetes: Kubeflow Pipelines requires Kubernetes infrastructure, which can be complex to set up and maintain. Tangle does not require Kubernetes.

Tangle vs. Vertex AI Pipelines

Vertex AI Pipelines is Google Cloud's managed version of pipeline orchestration. It uses a similar component model to Tangle but is proprietary and only available on Google Cloud.

Here's how Vertex AI Pipelines compares to Tangle:

Visual pipeline editor: While Vertex AI Pipelines requires writing code to define pipelines, Tangle offers a visual drag-and-drop editor that makes pipeline creation accessible to non-engineers.
Execution caching: Like Kubeflow, Vertex AI Pipelines uses lineage-based caching that's limited to successful executions within each pipeline. Tangle's content-based caching works globally across all pipelines and can reuse partial results, making experimentation more efficient.
Open source and cloud-agnostic: Because Tangle is open source, you can run it anywhere you want: on your laptop, on-premises, or on any cloud provider.

Tangle vs. Apache Airflow

Apache Airflow was built for data engineering workflows rather than ML pipelines. This shows in fundamental architectural differences that matter for ML use cases.

Here's how Airflow compares to Tangle:

Visual pipeline editor: In Airflow, everything is code. You write Python operators that execute locally or call remote services. Tangle provides a visual interface.
Data passing: Airflow relies on XComs, which are limited to small JSON data and require manual data management for anything substantial. Tangle provides explicit input/output definitions and manages data storage and passing automatically.
Execution caching: Airflow doesn't provide execution caching. Every time you run a workflow in Airflow, all tasks execute from scratch. Tangle uses content-based caching. That means you only pay for computation that actually needs to be redone, which can translate to significant cost savings in ML experimentation where pipeline iterations are common.
Containerization: Airflow operators typically run in a shared Python environment, which can lead to the dependency management challenges familiar to any Python developer. Tangle runs each task in an isolated container with clearly defined interfaces, ensuring reproducibility and eliminating dependency conflicts.
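
The data passing point can be illustrated with a small sketch of the orchestrator's side of the contract. This is an assumption-laden toy, not Tangle's code: the two inline programs, the directory layout, and `run_two_step_pipeline` are hypothetical, and plain subprocesses stand in for containers. The orchestrator picks the file locations and hands them to each program on the command line, so the programs never exchange in-memory objects.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Two stand-in "programs". Each only sees the paths passed on its command line.
PRODUCER = (
    "import sys, pathlib; "
    "pathlib.Path(sys.argv[1]).write_text('1\\n2\\n3\\n')"
)
CONSUMER = (
    "import sys, pathlib; "
    "total = sum(int(x) for x in pathlib.Path(sys.argv[1]).read_text().split()); "
    "pathlib.Path(sys.argv[2]).write_text(str(total))"
)


def run_two_step_pipeline(workdir: Path) -> str:
    """Run producer then consumer, wiring the output of one to the input of the other."""
    # The orchestrator, not the user code, decides where artifacts are stored.
    numbers_path = workdir / "producer" / "numbers"
    total_path = workdir / "consumer" / "total"
    for path in (numbers_path, total_path):
        path.parent.mkdir(parents=True, exist_ok=True)

    subprocess.run(
        [sys.executable, "-c", PRODUCER, str(numbers_path)], check=True
    )
    subprocess.run(
        [sys.executable, "-c", CONSUMER, str(numbers_path), str(total_path)],
        check=True,
    )
    return total_path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    result = run_two_step_pipeline(Path(tmp))
print(result)  # prints 6
```

Because each step writes to its own location and nothing is overwritten, the intermediate file remains available for inspection or reuse by later runs.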

Credits

The Tangle app is based on the Pipeline Editor app created by Alexey Volkov as part of the Cloud Pipelines project.

Please report any bugs you find using GitHub Issues.
