Tangle: Getting Started

Tangle is a service and web app that lets users build and run machine-learning pipelines with a drag-and-drop interface, without having to set up a development environment.

Jump To Tangle

The experimental new version of the Tangle app is now available.
No registration is required to experiment with building pipelines.

What does a pipeline system do?

A pipeline system like Tangle:

  • Orchestrates distributed execution, scheduling, data passing, and caching
  • Containerizes tasks for isolation and reproducibility
  • Uses the command line as the true interface to user code
  • Runs programs (not functions passing shared in-memory objects)
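The last two points can be made concrete with a small sketch. A pipeline task is a standalone program whose only interface is its command line: the orchestrator materializes upstream outputs as files, invokes the program with paths, and collects whatever it writes. The flag names below (`--input-path`, `--output-path`) are illustrative, not Tangle's actual API.

```python
# A hedged sketch of a pipeline task as seen by the orchestrator:
# a program driven entirely through its command-line interface.
import argparse


def normalize_lines(text: str) -> str:
    """The actual user logic: trim and lowercase each line."""
    return "\n".join(line.strip().lower() for line in text.splitlines())


def main(argv=None):
    parser = argparse.ArgumentParser(description="Example containerized task")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-path", required=True)
    args = parser.parse_args(argv)

    # The pipeline system places upstream data at --input-path and
    # picks up whatever this program writes to --output-path.
    with open(args.input_path) as f:
        result = normalize_lines(f.read())
    with open(args.output_path, "w") as f:
        f.write(result)


if __name__ == "__main__":
    main()
```

Because the program shares no in-memory state with other tasks, it can run in its own container, on any machine, and be cached or retried independently.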

Why Use Tangle

Key Benefits

Tangle offers unique advantages for both teams and individual ML engineers:

Visual Pipeline Editor

  • Build pipelines by dragging components onto a canvas and connecting their inputs and outputs
  • No code or local development environment required to get started

Advanced Execution Caching

  • Content-based caching saves significant time and compute costs
  • When you modify a pipeline, only changed tasks are re-executed
  • Both upstream and downstream cached results are reused when possible
  • Can even reuse running executions, not just completed ones

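The core idea behind content-based caching can be sketched in a few lines (this is an illustrative model, not Tangle's implementation): a task's cache key is derived from its full definition plus the content hashes of its inputs. Because the key contains no pipeline or run identifier, identical work is shared globally across all pipelines.

```python
# Minimal sketch of content-based execution caching (illustrative only).
import hashlib
import json

_cache = {}  # cache key -> stored output


def content_hash(obj) -> str:
    """Stable hash of any JSON-serializable value."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def run_task(task_spec: dict, inputs: dict, execute):
    # Key = task definition + content hashes of all inputs.
    key = content_hash({
        "spec": task_spec,
        "inputs": {name: content_hash(v) for name, v in inputs.items()},
    })
    if key in _cache:            # unchanged spec and unchanged inputs:
        return _cache[key]       # reuse the earlier result
    result = execute(inputs)     # otherwise actually run the task
    _cache[key] = result
    return result
```

Modifying one task changes its key (and, via its outputs, the keys of downstream tasks), so only the affected part of the graph is re-executed.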
Tracking and Reproducibility

  • All pipeline runs are automatically recorded with graphs, logs, and artifact metadata
  • Intermediate data is immutable and never overwritten
  • Clone any pipeline run to reproduce exact results
  • Strict component versioning ensures reproducibility
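One way to see why intermediate data can be immutable and never overwritten is content addressing: if an artifact is stored under a name derived from its own bytes, the same content always maps to the same path, and a write can never clobber a different artifact. The layout below is an illustrative sketch, not Tangle's storage format.

```python
# Hedged sketch of an immutable, content-addressed artifact store.
import hashlib
import os
import tempfile

STORE = tempfile.mkdtemp(prefix="artifact-store-")


def put_artifact(data: bytes) -> str:
    """Store bytes under a path named by their SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(STORE, digest)
    if not os.path.exists(path):  # identical content already stored: no-op
        with open(path, "wb") as f:
            f.write(data)
    return path


def get_artifact(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()
```

Immutable artifacts are what make cloning a past run safe: the data it produced can never have been silently replaced.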

Component Reusability

  • Components are self-describing and can be shared and reused across pipelines
  • Compatibility with the KFP v1 ComponentSpec format means existing Kubeflow and Vertex AI components also work in Tangle

Open Source and Flexible

  • Run on any cloud provider or locally
  • Own your data and infrastructure
  • No vendor lock-in

How Tangle Compares

Tangle vs. Kubeflow Pipelines

Kubeflow Pipelines pioneered the component-based approach to ML workflows that Tangle builds upon. In fact, Tangle uses the same ComponentSpec format introduced in KFP v1, meaning components can be reused between systems.
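For readers unfamiliar with the format, a KFP v1 ComponentSpec is a small YAML file that declares a component's inputs, outputs, and container implementation. The component below is a hypothetical example (the image, command, and argument names are illustrative), but the structure and the `inputValue`/`inputPath`/`outputPath` placeholders are the standard KFP v1 fields:

```yaml
name: Train model
description: Trains a model on the given dataset.
inputs:
- {name: training_data, type: CSV}
- {name: learning_rate, type: Float, default: '0.1'}
outputs:
- {name: model, type: Model}
implementation:
  container:
    image: example.com/trainer:latest   # illustrative image
    command: [python3, train.py]
    args:
    - --data
    - {inputPath: training_data}
    - --lr
    - {inputValue: learning_rate}
    - --model-output
    - {outputPath: model}
```

Because both systems read this same format, a component written for Kubeflow Pipelines can be dropped into a Tangle pipeline unchanged, and vice versa.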

Where Tangle differentiates itself is in user experience and efficiency. While Kubeflow requires writing Python code to define pipelines, Tangle offers a visual drag-and-drop editor that makes pipeline creation accessible to non-engineers. Our content-based caching is more advanced than Kubeflow's lineage-based approach: we cache globally across all pipelines and can even reuse running executions, whereas Kubeflow only caches successful executions within each pipeline. This means significantly better resource utilization and faster iteration cycles.

Tangle is also designed to be cloud-agnostic from the ground up, while Kubeflow Pipelines requires Kubernetes infrastructure, which can be complex to set up and maintain.

Tangle vs. Vertex AI Pipelines

Vertex AI Pipelines is Google Cloud's managed version of pipeline orchestration, deeply integrated with the Google Cloud ecosystem. While this integration can be powerful for teams already on Google Cloud, it creates vendor lock-in that many organizations want to avoid.

Vertex Pipelines uses a similar component model but is proprietary and only available on Google Cloud. Tangle, being open source, can run anywhere: on your laptop, on-premises, or on any cloud provider. This flexibility is crucial for teams that need multi-cloud strategies or want to avoid dependency on a single vendor.

Like Kubeflow, Vertex Pipelines uses lineage-based caching that's limited to successful executions within each pipeline. Tangle's content-based caching works globally across all pipelines and can reuse partial results, making experimentation more efficient. Additionally, Vertex Pipelines lacks a visual pipeline editor, requiring all pipeline definitions to be written in code.

Tangle vs. Apache Airflow

Apache Airflow comes from a different tradition: it was built for data engineering workflows rather than ML pipelines. This shows in fundamental architectural differences that matter for ML use cases.

In Airflow, everything is code: you write Python operators that execute locally or call remote services. This can be powerful for teams comfortable with code, but it lacks the visual interface that makes Tangle accessible to a broader audience. More importantly, Airflow wasn't designed for the data passing requirements of ML workflows. While Tangle provides explicit input/output definitions and manages data storage and passing automatically, Airflow relies on XComs, which are limited to small JSON data and require manual data management for anything substantial.

Airflow also lacks execution caching entirely. Every time you run a workflow in Airflow, all tasks execute from scratch. In contrast, Tangle's content-based caching means you only pay for computation that actually needs to be redone, which can translate to significant cost savings in ML experimentation where pipeline iterations are common.

The containerization model also differs significantly. Tangle runs each task in an isolated container with clearly defined interfaces, ensuring reproducibility and eliminating dependency conflicts. Airflow operators typically run in a shared Python environment, which can lead to the dependency management challenges familiar to any Python developer.

Key Features

  • No-code Pipeline Creation: Start building immediately with a drag-and-drop interface
  • Flexible Execution: Run locally or deploy to any cloud
  • Fast Iteration: Clone, modify, and re-run pipelines with two clicks
  • Automatic Caching: Save time and compute with intelligent execution reuse
  • Complete Tracking: All runs preserved with logs, artifacts, and metadata
  • Component Ecosystem: Growing library of pre-built components plus easy custom component creation
  • Cross-Language Support: Components in any language that can run in a container
  • Industry Compatibility: Works with existing Kubeflow and Vertex AI components

Credits

The Tangle app is based on the Pipeline Editor app created by Alexey Volkov as part of the Cloud Pipelines project.

Please report any bugs via GitHub Issues.