Tasks and Executions
Tasks: Components in Action
A Task is what you get when you drag a component onto the pipeline canvas. It's a configured instance of a component with:
- A unique task ID within the pipeline
- Specific argument values
- Connections to other tasks
- Execution options
Technically, a Task is an entry in the tasks section of a Graph component.
Think of the relationship this way: If a Component is like a function definition, then a Task is like calling that function with specific arguments.
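The analogy can be sketched in plain Python. This is only an illustration, not the pipeline system's API: the function `data_preprocessor` and its `scale` parameter are hypothetical stand-ins for a component and one of its inputs.

```python
# Hypothetical illustration of the Component/Task analogy; not a real pipeline API.

def data_preprocessor(input_dataset: list[float], scale: float = 1.0) -> list[float]:
    """Plays the role of a Component: one reusable definition."""
    return [value * scale for value in input_dataset]


# Each call plays the role of a Task: the same definition, used with its own
# specific argument values.
preprocess_train = data_preprocessor([1.0, 2.0, 3.0], scale=2.0)
preprocess_test = data_preprocessor([4.0, 5.0], scale=0.5)
```

The definition is written once, while each use carries its own arguments, just as each task carries its own configuration.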
When you place the same component multiple times on the canvas, you create multiple tasks, each with its own configuration:
# In a pipeline, tasks are instances of components
tasks:
  preprocess_train:  # Task 1 using Data Preprocessor component
    componentRef:
      name: Data Preprocessor
    arguments:
      input_dataset:
        taskOutput:
          outputName: output
          taskId: Task A
  preprocess_test:  # Task 2 using the same component
    componentRef:
      name: Data Preprocessor
    arguments:
      input_dataset:
        taskOutput:
          outputName: output
          taskId: Task B
Execution: Running the Task
An Execution is analogous to a function call executing at runtime.
Execution = Task + the actual input and output artifacts, execution metadata (start/end time, launcher-specific information), logs, the program exit code, etc.
Every execution may have logs, artifacts, and metadata. You can access the execution details in the Pipeline Run page.
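As a rough sketch of what an execution record might hold, the snippet below runs a small command and captures its start and end time, exit code, and logs. The `Execution` class and `execute` function are hypothetical names chosen for illustration, not the system's actual data model, and artifacts are left out to keep the example short.

```python
import subprocess
import sys
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Execution:
    """Hypothetical record of one run of a task."""
    task_id: str
    start_time: datetime
    end_time: datetime
    exit_code: int
    logs: str


def execute(task_id: str, command: list[str]) -> Execution:
    """Run the command and collect the runtime details of that run."""
    start = datetime.now(timezone.utc)
    result = subprocess.run(command, capture_output=True, text=True)
    end = datetime.now(timezone.utc)
    return Execution(
        task_id=task_id,
        start_time=start,
        end_time=end,
        exit_code=result.returncode,
        logs=result.stdout + result.stderr,
    )


# The command here is a stand-in for the program a real task would launch.
execution = execute("preprocess_train", [sys.executable, "-c", "print('rows kept: 3')"])
```

Running `execute` again with the same task ID would produce a second, independent record, which is the sense in which one task can have many executions.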
The 1-to-Many Cascade: One component can be used in many tasks (even multiple times in the same pipeline), and each task can have many executions (e.g., daily runs). The relationship resembles one-to-many foreign keys in a database: each execution points to exactly one task, and each task points to exactly one component.
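To make the foreign-key comparison concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names and the run dates are assumptions made up for this illustration. They do not describe how the system stores its data, and the sketch ignores that task IDs are only unique within a pipeline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE component (name TEXT PRIMARY KEY);
CREATE TABLE task (
    task_id        TEXT PRIMARY KEY,
    component_name TEXT NOT NULL REFERENCES component(name)
);
CREATE TABLE execution (
    execution_id INTEGER PRIMARY KEY,
    task_id      TEXT NOT NULL REFERENCES task(task_id),
    run_date     TEXT NOT NULL
);
""")

# One component, used by two tasks.
conn.execute("INSERT INTO component VALUES ('Data Preprocessor')")
conn.executemany(
    "INSERT INTO task VALUES (?, ?)",
    [("preprocess_train", "Data Preprocessor"),
     ("preprocess_test", "Data Preprocessor")],
)

# Each task can accumulate many executions (illustrative dates).
conn.executemany(
    "INSERT INTO execution (task_id, run_date) VALUES (?, ?)",
    [("preprocess_train", "2024-01-01"),
     ("preprocess_train", "2024-01-02"),
     ("preprocess_test", "2024-01-01")],
)

# Count executions per task.
executions_per_task = conn.execute("""
    SELECT t.task_id, COUNT(e.execution_id)
    FROM task AS t
    LEFT JOIN execution AS e ON e.task_id = t.task_id
    GROUP BY t.task_id
    ORDER BY t.task_id
""").fetchall()
```

With this data, `executions_per_task` is `[("preprocess_test", 1), ("preprocess_train", 2)]`, and both tasks point back to the single "Data Preprocessor" row.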