LightningTrainerMultiNode

class lightning.app.components.multi_node.trainer.LightningTrainerMultiNode(work_cls, cloud_compute, num_nodes, *work_args, **work_kwargs)

Bases: MultiNode

This component enables distributed multi-node, multi-device training.

Example:

import torch

from lightning.app import CloudCompute, LightningApp, LightningWork
from lightning.app.components import MultiNode

class AnyDistributedComponent(LightningWork):
    def run(
        self,
        main_address: str,
        main_port: int,
        node_rank: int,
    ):
        print(f"ADD YOUR DISTRIBUTED CODE: {main_address} {main_port} {node_rank}")


compute = CloudCompute("gpu")
app = LightningApp(
    MultiNode(
        AnyDistributedComponent,
        num_nodes=8,
        cloud_compute=compute,
    )
)
Parameters
  • work_cls (Type[LightningWork]) – The work to be executed

  • num_nodes (int) – Number of nodes. Ignored when running locally. Launch the app with --cloud to run on multiple cloud machines.

  • cloud_compute (CloudCompute) – The cloud compute configuration used to provision each node in the cloud. Ignored when running locally.

  • work_args (Any) – Arguments to be provided to the work on instantiation.

  • work_kwargs (Any) – Keyword arguments to be provided to the work on instantiation.
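
The example above uses the generic MultiNode base class. With LightningTrainerMultiNode, the work typically builds an ordinary PyTorch Lightning Trainer and calls fit, and the component runs that work across the requested nodes. The sketch below is illustrative only: TrainerWork and its run body are hypothetical, BoringModel is merely a placeholder LightningModule from lightning.pytorch.demos, and the short import path lightning.app.components is assumed to re-export the class documented here (its full module path is lightning.app.components.multi_node.trainer).

import lightning.pytorch as pl

from lightning.app import CloudCompute, LightningApp, LightningWork
from lightning.app.components import LightningTrainerMultiNode
from lightning.pytorch.demos.boring_classes import BoringModel


class TrainerWork(LightningWork):
    def run(self):
        # Hypothetical work body: build any LightningModule and a regular
        # Trainer; the multi-node settings are assumed to be handled by the
        # component rather than configured by hand here.
        model = BoringModel()
        trainer = pl.Trainer(max_epochs=1)
        trainer.fit(model)


app = LightningApp(
    LightningTrainerMultiNode(
        TrainerWork,
        num_nodes=2,
        cloud_compute=CloudCompute("gpu"),
    )
)

When run locally, num_nodes is ignored and the work executes on a single machine, matching the parameter description above.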