# Basic Terminology
This guide explains the essential concepts and terminology you'll encounter when working with Pipeline Studio. Understanding these fundamentals will help you build and manage ML pipelines more effectively.
## The building blocks
### Components, tasks, and execution
The relationship between components and tasks is fundamental to understanding Pipeline Studio:
| Concept | Definition | Analogy |
|---|---|---|
| Component | A self-contained, reusable unit of functionality defined by a ComponentSpec YAML file | Like a recipe or function definition |
| Task | An instance of a component, specified in the pipeline and visible on the canvas | Like cooking from a recipe or writing a function call in code |
| Execution | A run of a task, which produces input/output artifacts, execution metadata (such as start/end times and launcher-specific information), logs, and program exit codes | Like calling a function with actual arguments at runtime |
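To make the recipe analogy concrete, here is a minimal sketch of a ComponentSpec YAML file. The exact schema varies, so treat the field names and the image below as illustrative assumptions rather than the exact TangleML syntax:

```yaml
# Hypothetical ComponentSpec: a reusable "recipe" for one step.
# Field names follow common ComponentSpec conventions and are
# illustrative; check your TangleML schema for the exact spelling.
name: Preprocess data
inputs:
- {name: raw_data, type: Dataset}
outputs:
- {name: clean_data, type: Dataset}
implementation:
  container:
    image: example.com/preprocess:1.0     # placeholder image
    command: [python, preprocess.py,
      --input, {inputPath: raw_data},     # materialized as a local path
      --output, {outputPath: clean_data}] # the program writes its result here
```

A task on the canvas is then an instance of this spec; an execution is one concrete run of that task, with real data substituted for the inputs.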
### Pipelines: Orchestrating components
A pipeline in TangleML is a special kind of component known as a graph component. Instead of running a container of its own, its main role is to orchestrate multiple tasks and manage how data flows between them. Pipelines are highly reusable: they can themselves be used as components in other pipelines, enabling flexible, hierarchical workflows. When a pipeline runs, it resolves task dependencies and directs the flow of data, making sure everything happens in the right order.
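As a hedged sketch of how a graph component might look on disk, assuming a graph-style implementation section; the componentRef/taskOutput fields and all names here are hypothetical:

```yaml
# Hypothetical graph component: a pipeline that orchestrates two tasks.
name: Train pipeline
implementation:
  graph:
    tasks:
      preprocess:
        componentRef: {url: components/preprocess.yaml}  # placeholder reference
      train:
        componentRef: {url: components/train.yaml}
        arguments:
          # Data flow: train consumes preprocess's output, so the
          # pipeline runs preprocess first and passes its artifact along.
          training_data: {taskOutput: {taskId: preprocess, outputName: clean_data}}
```

Because the pipeline is itself a component, another pipeline could reference it in exactly the same way the tasks above reference their components.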
While editing, pipelines are only stored locally in your browser. Once a pipeline is submitted as a run, it is stored in the backend and becomes visible to all users. For more information, see Pipelines Persistence.
### Inputs and outputs
A pipeline links together inputs and outputs:
- Inputs define what data or parameters a component needs to operate. They serve as the component's interface for receiving information. Inputs take two forms (illustrated in the sketch after this list):
  - Value inputs are simple parameters that are passed directly.
  - Path inputs are used to pass large artifacts as files or directories.
- Outputs define what a component produces after execution. They allow components to pass results to downstream tasks.
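The difference shows up in how the component's command line consumes each form. A minimal sketch, assuming the inputValue/inputPath/outputPath placeholder style used by common ComponentSpec dialects:

```yaml
# Hypothetical command wiring inside a ComponentSpec's container section:
command: [python, train.py,
  --learning-rate, {inputValue: learning_rate}, # value input: substituted as a string
  --data, {inputPath: training_data},           # path input: a local file or directory
  --model, {outputPath: model}]                 # output: the program writes here
```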
Outputs from one task become available as inputs to connected downstream tasks, creating the data flow in your pipeline. For more information, see Inputs and Outputs and Data Flow.
### Artifacts
Artifacts are the data produced by components as outputs and stored in TangleML's artifact storage system. An artifact can be a blob (a nameless file) or a directory. To view the artifacts of a run, open the Artifacts tab on the Pipeline Run page. For more information, see Artifacts.