Basic Terminology

This guide explains the essential concepts and terminology you'll encounter when working with Pipeline Studio. Understanding these fundamentals will help you build and manage ML pipelines more effectively.

The Building Blocks

Components vs Tasks vs Execution

The relationship between Components and Tasks is fundamental to understanding Pipeline Studio:

| Concept | Definition | Analogy |
| --- | --- | --- |
| Component | A self-contained, reusable unit of functionality defined by a ComponentSpec YAML file | Like a recipe or a function definition |
| Task | An instance of a component specified in the pipeline and visible on the canvas | Like cooking from a recipe or declaring a function call in code |
| Execution | A run of a task, with actual input/output artifacts, execution metadata (start/end time, launcher-specific information), logs, the program exit code, etc. | Like calling a function with actual arguments at runtime |
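To make the distinction concrete, a ComponentSpec YAML file might look roughly like the minimal sketch below. The field names (name, description, implementation, container) are assumptions for illustration; the authoritative schema is TangleML's own.

```yaml
# Hypothetical ComponentSpec: the reusable "recipe" a component is defined by.
# Field names are illustrative assumptions, not the exact TangleML schema.
name: Train model
description: Trains a model on a tabular dataset.
implementation:
  container:
    image: python:3.11
    command: [python, /app/train.py]
```

Every "Train model" task you drop onto the canvas would be an instance of this one spec, and each run of such a task would be a separate execution.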

Pipelines: Orchestrating Components

A pipeline in TangleML is a special kind of component known as a "graph component." Instead of running a single container, its main role is to orchestrate multiple tasks and manage how data flows between them. Pipelines are highly reusable: they can themselves be used as components in other pipelines, enabling flexible, hierarchical workflows. When a pipeline runs, it resolves task dependencies and directs the flow of data, making sure everything happens in the right order.
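As a rough sketch, a graph component could describe its tasks and the connections between them like this. The structure shown (graph, tasks, componentRef, taskOutput) is borrowed from common component-spec conventions and is not necessarily TangleML's exact format:

```yaml
# Hypothetical graph component: the pipeline is itself a component whose
# implementation is a graph of tasks instead of a single container.
name: Train and evaluate
implementation:
  graph:
    tasks:
      preprocess:
        componentRef: {url: components/preprocess.yaml}  # a task is an instance of a component
      train:
        componentRef: {url: components/train.yaml}
        arguments:
          # Wiring one task's output to another task's input creates the
          # data flow and implies the execution order.
          training_data: {taskOutput: {taskId: preprocess, outputName: clean_data}}
```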

warning

Make sure you understand how pipelines are persisted: unless a pipeline is submitted as a run, it is stored only in the browser's storage.

Inputs and Outputs

info

Read more in the Inputs and Outputs and Data Flow sections.

Inputs define what data or parameters a component needs to operate. They serve as the component's interface for receiving information.

Inputs come in two main forms. Value inputs are simple parameters that are passed directly. Path inputs are file or directory locations, used for passing large artifacts.

Outputs define what a component produces after execution. They allow components to pass results to downstream tasks.
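In a ComponentSpec, the two input forms and the outputs might be declared roughly as follows. This is again an illustrative sketch with assumed field and type names, not the exact TangleML schema:

```yaml
# Hypothetical inputs/outputs declaration for a ComponentSpec.
inputs:
  - {name: learning_rate, type: Float}    # value input: a parameter passed directly
  - {name: training_data, type: Dataset}  # path input: a file/directory location for a large artifact
outputs:
  - {name: trained_model, type: Model}    # the system supplies an output path at run time
```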

tip

Outputs from one task become available as inputs to connected downstream tasks, creating the data flow in your pipeline.

Artifacts

info

Learn more in the Artifacts section.

Artifacts are the data produced by components (read: any output), stored in TangleML's artifact storage system. An artifact can be a Blob (a nameless file) or a directory. Artifacts can be accessed on the Pipeline Run page, in the Artifacts tab.

Summary

Understanding these core concepts helps you work effectively with Pipeline Studio:

tip

Quick Reference:

  • Component = Reusable template/blueprint
  • Task = Component instance on canvas
  • Pipeline = Graph component orchestrating tasks
  • Input/Output = Data interface between tasks
  • Artifact = File-based data (datasets, models, etc.)

With these fundamentals in place, you're ready to start building powerful ML pipelines by combining components into tasks, connecting them through inputs and outputs, and managing data flow through artifacts.