Basic Terminology
This guide explains the essential concepts and terminology you'll encounter when working with Pipeline Studio. Understanding these fundamentals will help you build and manage ML pipelines more effectively.
The Building Blocks
Components vs Tasks vs Executions
The relationship between components, tasks, and executions is fundamental to understanding Pipeline Studio:
| Concept | Definition | Analogy |
|---|---|---|
| Component | A self-contained, reusable unit of functionality defined by a ComponentSpec YAML file | Like a recipe or function definition |
| Task | An instance of a component, specified in the pipeline and visible on the canvas | Like cooking from a recipe or declaring a function call in code |
| Execution | A run of a task with actual input/output artifacts, execution metadata (start/end time, launcher-specific information), logs, program exit code, etc. | Like calling a function with actual arguments at runtime |
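To make the progression concrete, here is a minimal ComponentSpec sketch. The field names (name, inputs, outputs, implementation) and the container image are illustrative assumptions, not the authoritative TangleML schema:

```yaml
# Hypothetical ComponentSpec sketch -- field names are illustrative
# assumptions, not the authoritative TangleML schema.
name: Preprocess data
inputs:
  - {name: raw_data, type: Directory}
outputs:
  - {name: clean_data, type: Directory}
implementation:
  container:
    image: example.com/preprocess:latest   # hypothetical image
    command: [python, preprocess.py]
```

Loading a spec like this makes the component available; each copy placed on the canvas is a separate task, and every submitted run produces a new execution of that task with its own artifacts, logs, and exit code.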
Pipelines: Orchestrating Components
A pipeline in TangleML is a special kind of component known as a "graph component". Rather than performing work itself, its main role is to orchestrate multiple tasks and manage how data flows between them. Pipelines are highly reusable: they can themselves be used as components in other pipelines, enabling flexible, hierarchical workflows. When a pipeline runs, it resolves task dependencies and directs the flow of data, making sure everything happens in the right order.
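As a sketch of the idea, a graph component might wire two tasks together as shown below. The graph structure and the componentRef, graphInput, and taskOutput placeholders are assumptions for illustration, not the exact TangleML format:

```yaml
# Hypothetical graph component -- structure and placeholder syntax
# are illustrative assumptions.
name: Train pipeline
inputs:
  - {name: raw_data, type: Directory}
outputs:
  # The pipeline re-exports an output of one of its tasks.
  - {name: model, taskOutput: {taskId: train, outputName: model}}
implementation:
  graph:
    tasks:
      preprocess:
        componentRef: {url: https://example.com/components/preprocess.yaml}
        arguments:
          raw_data: {graphInput: raw_data}
      train:
        componentRef: {url: https://example.com/components/train.yaml}
        arguments:
          # Consuming preprocess's output creates the dependency edge,
          # so the runner always executes preprocess before train.
          training_data: {taskOutput: {taskId: preprocess, outputName: clean_data}}
```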
Make sure you understand how pipelines are persisted: unless a pipeline is submitted as a run, it is stored only in the browser's storage.
Inputs and Outputs
Read more in the Inputs and Outputs and Data Flow sections.
Inputs define what data or parameters a component needs to operate. They serve as the component's interface for receiving information.
Inputs come in two main forms: value inputs are simple parameters that are passed directly, while path inputs refer to file or directory locations and are used to pass large artifacts.
Outputs define what a component produces after execution. They allow components to pass results to downstream tasks.
Outputs from one task become available as inputs to connected downstream tasks, creating the data flow in your pipeline.
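Sketching both input forms in a hypothetical ComponentSpec (the inputValue, inputPath, and outputPath placeholders, like the rest of the schema here, are assumptions for illustration):

```yaml
# Hypothetical declarations -- names and placeholder syntax are
# illustrative assumptions.
inputs:
  - {name: learning_rate, type: Float}      # value input: a simple parameter
  - {name: training_data, type: Directory}  # path input: location of a large artifact
outputs:
  - {name: model, type: Directory}
implementation:
  container:
    image: example.com/trainer:latest       # hypothetical image
    command: [python, train.py]
    args:
      - --learning-rate
      - {inputValue: learning_rate}   # the literal value is substituted here
      - --data
      - {inputPath: training_data}    # a local path where the artifact is staged
      - --model-out
      - {outputPath: model}           # a local path where the component writes its output
```

Connecting the model output of this task to a path input of a downstream task is what creates the data-flow edge between them.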
Artifacts
Learn more in the Artifacts section.
Artifacts are the data produced by components (that is, any output), stored in TangleML's artifact storage system. An artifact can be a blob (a nameless file) or a directory. Artifacts can be accessed on the Pipeline Run page, in the Artifacts tab.
Summary
Understanding these core concepts helps you work effectively with Pipeline Studio:
Quick Reference:
- Component = Reusable template/blueprint
- Task = Component instance on canvas
- Pipeline = Graph component orchestrating tasks
- Input/Output = Data interface between tasks
- Artifact = File-based data (datasets, models, etc.)
With these fundamentals in place, you're ready to start building powerful ML pipelines by combining components into tasks, connecting them through inputs and outputs, and managing data flow through artifacts.