kedro.runner.ParallelRunner¶
- class kedro.runner.ParallelRunner(max_workers=None, is_async=False, extra_dataset_patterns=None)[source]¶
ParallelRunneris anAbstractRunnerimplementation. It can be used to run thePipelinein parallel groups formed by toposort. Please note that this runner implementation validates dataset using the_validate_catalogmethod, which checks if any of the datasets are single process only using the _SINGLE_PROCESS dataset attribute.Methods
run(pipeline, catalog[, hook_manager, ...])Run the
Pipelineusing the datasets provided bycatalogand save results back to the same objects.run_only_missing(pipeline, catalog, hook_manager)Run only the missing outputs from the
Pipelineusing the datasets provided bycatalog, and save results back to the same objects.- __init__(max_workers=None, is_async=False, extra_dataset_patterns=None)[source]¶
Instantiates the runner by creating a Manager.
- Parameters:
max_workers (int | None) – Number of worker processes to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count. On windows machines, the max_workers value cannot be larger than 61 and will be set to min(61, max_workers).
is_async (bool) – If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False.
extra_dataset_patterns (dict[str, dict[str, Any]] | None) – Extra dataset factory patterns to be added to the catalog during the run. This is used to set the default datasets to SharedMemoryDataset for ParallelRunner.
- Raises:
ValueError – bad parameters passed
- run(pipeline, catalog, hook_manager=None, session_id=None)[source]¶
Run the
Pipelineusing the datasets provided bycatalogand save results back to the same objects.- Parameters:
- Raises:
ValueError – Raised when
Pipelineinputs cannot be satisfied.- Return type:
- Returns:
Any node outputs that cannot be processed by the catalog. These are returned in a dictionary, where the keys are defined by the node outputs.
- run_only_missing(pipeline, catalog, hook_manager)[source]¶
Run only the missing outputs from the
Pipelineusing the datasets provided bycatalog, and save results back to the same objects.- Parameters:
pipeline (
Pipeline) – ThePipelineto run.catalog (
CatalogProtocol) – An implemented instance ofCatalogProtocolfrom which to fetch data.hook_manager (
PluginManager) – ThePluginManagerto activate hooks.
- Raises:
ValueError – Raised when
Pipelineinputs cannot be satisfied.- Return type:
- Returns:
Any node outputs that cannot be processed by the catalog. These are returned in a dictionary, where the keys are defined by the node outputs.