PyTorchSpawnMultiNode

class lightning.app.components.multi_node.pytorch_spawn.PyTorchSpawnMultiNode(work_cls, cloud_compute, num_nodes, *work_args, **work_kwargs)

Bases: MultiNode

This component runs distributed training across multiple nodes, each of which may have multiple devices.

Example:

import torch

from lightning.app import CloudCompute, LightningApp, LightningWork
from lightning.app.components import MultiNode

class AnyDistributedComponent(LightningWork):
    def run(
        self,
        main_address: str,
        main_port: int,
        node_rank: int,
    ):
        print(f"ADD YOUR DISTRIBUTED CODE: {main_address} {main_port} {node_rank}")


compute = CloudCompute("gpu")
app = LightningApp(
    MultiNode(
        AnyDistributedComponent,
        num_nodes=8,
        cloud_compute=compute,
    )
)
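
The run method above receives main_address, main_port, and node_rank. As a minimal sketch of what "your distributed code" might do with them, the helper below maps those values onto the environment variables that torch.distributed reads for its env:// initialization (MASTER_ADDR, MASTER_PORT, WORLD_SIZE, RANK). The helper name distributed_env and the num_nodes, local_rank, and procs_per_node parameters are hypothetical and not part of the Lightning API; whether PyTorchSpawnMultiNode sets these variables itself is not stated here, so treat this as an illustration only.

```python
from typing import Dict


def distributed_env(
    main_address: str,
    main_port: int,
    node_rank: int,
    num_nodes: int,
    local_rank: int = 0,
    procs_per_node: int = 1,
) -> Dict[str, str]:
    """Build env vars for torch.distributed's env:// init (hypothetical helper).

    num_nodes, local_rank and procs_per_node are assumptions: the example's
    run() only receives main_address, main_port and node_rank.
    """
    if not 0 <= node_rank < num_nodes:
        raise ValueError(f"node_rank {node_rank} is outside [0, {num_nodes})")
    if not 0 <= local_rank < procs_per_node:
        raise ValueError(f"local_rank {local_rank} is outside [0, {procs_per_node})")

    # One process per device: the global rank enumerates nodes first,
    # then the processes within each node.
    world_size = num_nodes * procs_per_node
    global_rank = node_rank * procs_per_node + local_rank

    return {
        "MASTER_ADDR": main_address,
        "MASTER_PORT": str(main_port),
        "WORLD_SIZE": str(world_size),
        "RANK": str(global_rank),
        # LOCAL_RANK follows the torchrun convention; env:// init does not need it.
        "LOCAL_RANK": str(local_rank),
    }


if __name__ == "__main__":
    env = distributed_env(
        "10.0.0.1", 29500, node_rank=1, num_nodes=8, local_rank=2, procs_per_node=4
    )
    for key, value in env.items():
        print(f"{key}={value}")
```

Inside run, one could apply the result with os.environ.update(...) before calling torch.distributed.init_process_group with init_method="env://".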

Parameters

work_cls (Type[LightningWork]) – The work to be executed.
num_nodes (int) – Number of nodes. Ignored when running locally; launch the app with --cloud to run on multiple cloud machines.
cloud_compute (CloudCompute) – The compute configuration used when running in the cloud. Ignored when running locally.
work_args (Any) – Positional arguments passed to the work on instantiation.
work_kwargs (Any) – Keyword arguments passed to the work on instantiation.
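
The parameter descriptions above can be pictured with a plain-Python stand-in that does not use Lightning at all. In this sketch, ToyWork and instantiate_works are hypothetical names, cloud_compute is left out for brevity, and treating "ignored when running locally" as a fallback to a single node is an assumption rather than documented behavior.

```python
from typing import Any, List, Type


class ToyWork:
    """Hypothetical stand-in for a LightningWork subclass."""

    def __init__(self, batch_size: int, lr: float = 1e-3) -> None:
        self.batch_size = batch_size
        self.lr = lr


def instantiate_works(
    work_cls: Type[Any],
    num_nodes: int,
    *work_args: Any,
    running_locally: bool = False,
    **work_kwargs: Any,
) -> List[Any]:
    """Create one work per node, forwarding work_args and work_kwargs."""
    # Assumption: ignoring num_nodes locally means using a single node.
    effective_nodes = 1 if running_locally else num_nodes
    return [work_cls(*work_args, **work_kwargs) for _ in range(effective_nodes)]


if __name__ == "__main__":
    cloud_works = instantiate_works(ToyWork, 8, 32, lr=0.1)
    local_works = instantiate_works(ToyWork, 8, 32, lr=0.1, running_locally=True)
    print(len(cloud_works), len(local_works))
```

Every instance receives the same constructor arguments, which is why per-node values such as node_rank arrive through run rather than through work_args or work_kwargs.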