PyTorchSpawnMultiNode
- class lightning.app.components.multi_node.pytorch_spawn.PyTorchSpawnMultiNode(work_cls, cloud_compute, num_nodes, *work_args, **work_kwargs)
Bases: MultiNode
This component enables distributed training across multiple nodes, each with one or more devices.
Example:
import torch
from lightning.app import CloudCompute, LightningApp, LightningWork
from lightning.app.components import MultiNode


class AnyDistributedComponent(LightningWork):
    def run(
        self,
        main_address: str,
        main_port: int,
        node_rank: int,
    ):
        # Replace this print with your own distributed training code.
        print(f"ADD YOUR DISTRIBUTED CODE: {main_address} {main_port} {node_rank}")


# Machine type requested for every node when running in the cloud.
compute = CloudCompute("gpu")
app = LightningApp(
    MultiNode(
        AnyDistributedComponent,
        num_nodes=8,
        cloud_compute=compute,
    )
)
- Parameters
  - work_cls (Type[LightningWork]) – The work to be executed.
  - num_nodes (int) – Number of nodes. Ignored when running locally. Launch the app with --cloud to run on multiple cloud machines.
  - cloud_compute (CloudCompute) – The cloud compute object used in the cloud. Ignored when running locally.
  - work_args (Any) – Positional arguments passed to the work on instantiation.
  - work_kwargs (Any) – Keyword arguments passed to the work on instantiation.
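The example's run method receives main_address, main_port, and node_rank. As a minimal sketch of how those values might translate into per-process settings, the helper below derives the environment variables that torch.distributed's env:// initialization reads. The function name distributed_env and the devices_per_node and local_rank arguments are hypothetical and not part of the Lightning API; the sketch assumes every node runs the same number of processes.

```python
from typing import Dict


def distributed_env(
    main_address: str,
    main_port: int,
    node_rank: int,
    num_nodes: int,
    local_rank: int,
    devices_per_node: int,
) -> Dict[str, str]:
    """Build the rank-related environment for one process (hypothetical helper)."""
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes)")
    if not 0 <= local_rank < devices_per_node:
        raise ValueError("local_rank must be in [0, devices_per_node)")

    # Total number of processes across all nodes.
    world_size = num_nodes * devices_per_node
    # Ranks are numbered node by node, then by device within the node.
    global_rank = node_rank * devices_per_node + local_rank

    return {
        "MASTER_ADDR": main_address,
        "MASTER_PORT": str(main_port),
        "WORLD_SIZE": str(world_size),
        "RANK": str(global_rank),
        "LOCAL_RANK": str(local_rank),
    }


# Process 3 on node 1 of an 8-node job with 4 devices per node.
env = distributed_env("10.0.0.1", 29500, node_rank=1, num_nodes=8,
                      local_rank=3, devices_per_node=4)
print(env["RANK"], env["WORLD_SIZE"])
```

Inside a run method, a dictionary like this could be merged into os.environ before initializing the process group. Whether that step is needed depends on what the component already sets up for the work.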