# Basic Terminology
This guide explains the essential concepts and terminology you'll encounter when working with Pipeline Studio. Understanding these fundamentals will help you build and manage ML pipelines more effectively.
## The building blocks
### Components, tasks, and execution
The relationship between components and tasks is fundamental to understanding Pipeline Studio:
| Concept | Definition | Analogy |
|---|---|---|
| Component | A self-contained, reusable unit of functionality defined by a ComponentSpec YAML file | Like a recipe or function definition |
| Task | An instance of a component, specified in the pipeline and visible on the canvas | Like cooking from a recipe or writing a function call in code |
| Execution | A run of a task, which produces input/output artifacts, execution metadata (such as start/end times and launcher-specific information), logs, and program exit codes | Like calling a function with actual arguments at runtime |
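To make the recipe analogy concrete, here is a minimal sketch of a ComponentSpec YAML file. The exact schema varies, so treat the field names and the image below as illustrative assumptions rather than the exact TangleML syntax:

```yaml
# Hypothetical ComponentSpec: a reusable "recipe" for one step.
# Field names follow common ComponentSpec conventions and are
# illustrative; check your TangleML schema for the exact spelling.
name: Preprocess data
inputs:
- {name: raw_data, type: Dataset}
outputs:
- {name: clean_data, type: Dataset}
implementation:
  container:
    image: example.com/preprocess:1.0     # placeholder image
    command: [python, preprocess.py,
      --input, {inputPath: raw_data},     # materialized as a local path
      --output, {outputPath: clean_data}] # the program writes its result here
```

A task on the canvas is then an instance of this spec; an execution is one concrete run of that task, with real data substituted for the inputs.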
### Pipelines: Orchestrating components
A pipeline in TangleML is a special kind of component known as a graph component. Instead of running a container of its own, its main role is to orchestrate multiple tasks and manage how data flows between them. Pipelines are highly reusable: they can themselves be used as components in other pipelines, enabling flexible, hierarchical workflows. When a pipeline runs, it resolves task dependencies and directs the flow of data, making sure everything happens in the right order.
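As a hedged sketch of how a graph component might look on disk, assuming a graph-style implementation section; the componentRef/taskOutput fields and all names here are hypothetical:

```yaml
# Hypothetical graph component: a pipeline that orchestrates two tasks.
name: Train pipeline
implementation:
  graph:
    tasks:
      preprocess:
        componentRef: {url: components/preprocess.yaml}  # placeholder reference
      train:
        componentRef: {url: components/train.yaml}
        arguments:
          # Data flow: train consumes preprocess's output, so the
          # pipeline runs preprocess first and passes its artifact along.
          training_data: {taskOutput: {taskId: preprocess, outputName: clean_data}}
```

Because the pipeline is itself a component, another pipeline could reference it in exactly the same way the tasks above reference their components.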
While editing, pipelines are only stored locally in your browser. Once a pipeline is submitted as a run, it is stored in the backend and becomes visible to all users. For more information, see Pipelines Persistence.
### Inputs and outputs
A pipeline links together inputs and outputs:
- Inputs define what data or parameters a component needs to operate. They serve as the component's interface for receiving information. Inputs take two forms (illustrated in the sketch after this list):
  - Value inputs are simple parameters that are passed directly.
  - Path inputs are used to pass large artifacts as files or directories.
- Outputs define what a component produces after execution. They allow components to pass results to downstream tasks.
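The difference shows up in how the component's command line consumes each form. A minimal sketch, assuming the inputValue/inputPath/outputPath placeholder style used by common ComponentSpec dialects:

```yaml
# Hypothetical command wiring inside a ComponentSpec's container section:
command: [python, train.py,
  --learning-rate, {inputValue: learning_rate}, # value input: substituted as a string
  --data, {inputPath: training_data},           # path input: a local file or directory
  --model, {outputPath: model}]                 # output: the program writes here
```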
Outputs from one task become available as inputs to connected downstream tasks, creating the data flow in your pipeline. For more information, see Inputs and Outputs and Data Flow.
### Artifacts
Artifacts are the data produced by components as outputs and stored in TangleML's artifact storage system. An artifact can be a blob (a nameless file) or a directory. To view the artifacts of a run, open the Artifacts tab on the Pipeline Run page. For more information, see Artifacts.