Moving to the Cloud
Warning: This is a work in progress and not yet fully supported.
In the Quick Start guide, you learned how to implement a simple app that trains an image classifier and serves it once trained.
In this tutorial, you’ll learn how to extend that application so that it works seamlessly both locally and in the cloud.
Step 1: Distributed Application
Distributed Storage
When running your application in a fully-distributed setting, the data available on one machine won’t necessarily be available on another.
To solve this problem, Lightning introduces the Path object, which ensures that your code can run both locally and in the cloud.
The Path object keeps track of the work which creates the path. This enables Lightning to transfer the files correctly in a distributed setting.
Instead of passing a string representing a file or directory, Lightning simply wraps it in a Path object and makes it an attribute of your LightningWork. Unless you do this conscientiously for every single path, your application will fail in the cloud.
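For instance, here’s a minimal sketch of the pattern (the ProducerWork class and checkpoint.txt file name are hypothetical):

from lightning.app import LightningWork
from lightning.app.storage.path import Path


class ProducerWork(LightningWork):
    def __init__(self):
        super().__init__()
        # Wrapping the string in a Path lets Lightning track which work
        # created the file and transfer it between machines.
        self.checkpoint_path = Path("checkpoint.txt")

    def run(self):
        # Path behaves like pathlib.Path, so it works with open().
        with open(self.checkpoint_path, "w") as f:
            f.write("weights")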
In the example below, a file written by SourceFileWork is being transferred by the flow to the DestinationFileAndServeWork work. The Path object is the reference to the file.
import os

from lightning.app import CloudCompute, LightningApp, LightningFlow, LightningWork
from lightning.app.components import TracerPythonScript
from lightning.app.storage.path import Path

FILE_CONTENT = """
Hello there!
This tab is currently an IFrame of the FastAPI Server running in `DestinationFileAndServeWork`.
Also, the content of this file was created in `SourceFileWork` and then transferred to `DestinationFileAndServeWork`.
Are you already 🤯 ? Stick with us, this is only the beginning. Lightning is 🚀.
"""


class SourceFileWork(LightningWork):
    def __init__(self, cloud_compute: CloudCompute = CloudCompute(), **kwargs):
        super().__init__(parallel=True, **kwargs, cloud_compute=cloud_compute)
        self.boring_path = None

    def run(self):
        # This should be used as a REFERENCE to the file.
        self.boring_path = "lit://boring_file.txt"
        with open(self.boring_path, "w", encoding="utf-8") as f:
            f.write(FILE_CONTENT)


class DestinationFileAndServeWork(TracerPythonScript):
    def run(self, path: Path):
        assert path.exists()
        self.script_args += [f"--filepath={path}", f"--host={self.host}", f"--port={self.port}"]
        super().run()


class BoringApp(LightningFlow):
    def __init__(self):
        super().__init__()
        self.source_work = SourceFileWork()
        self.dest_work = DestinationFileAndServeWork(
            script_path=os.path.join(os.path.dirname(__file__), "scripts/serve.py"),
            port=1111,
            parallel=False,  # runs until killed.
            cloud_compute=CloudCompute(),
            raise_exception=True,
        )

    @property
    def ready(self) -> bool:
        return self.dest_work.is_running

    def run(self):
        self.source_work.run()
        if self.source_work.has_succeeded:
            # the flow passes the file from one work to another.
            self.dest_work.run(self.source_work.boring_path)
            self.stop("Boring App End")

    def configure_layout(self):
        return {"name": "Boring Tab", "content": self.dest_work.url + "/file"}


app = LightningApp(BoringApp())
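If this script is saved as app.py (the file name is an assumption), the same app runs locally or in the cloud with the lightning CLI:

lightning run app app.py          # run locally
lightning run app app.py --cloud  # run on the Lightning cloud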
In the scripts/serve.py file, we create a FastAPI service running on port 1111 that returns the content of the file received from SourceFileWork when a GET request is sent to /file.
import argparse
import os

import uvicorn
from fastapi import FastAPI
from fastapi.requests import Request
from fastapi.responses import HTMLResponse

if __name__ == "__main__":
    parser = argparse.ArgumentParser("Server Parser")
    parser.add_argument("--filepath", type=str, help="Where to find the `filepath`")
    parser.add_argument("--host", type=str, default="0.0.0.0", help="Server host")
    parser.add_argument("--port", type=int, default=8888, help="Server port")
    hparams = parser.parse_args()

    fastapi_service = FastAPI()

    if not os.path.exists(str(hparams.filepath)):
        content = ["The file wasn't transferred"]
    else:
        with open(hparams.filepath) as fo:
            content = fo.readlines()  # read the file received from SourceFileWork.

    @fastapi_service.get("/file", response_class=HTMLResponse)
    async def get_file_content(request: Request):
        lines = "\n".join(["<p>" + line + "</p>" for line in content])
        return HTMLResponse(f"<html><head></head><body><ul>{lines}</ul></body></html>")

    uvicorn.run(app=fastapi_service, host=hparams.host, port=hparams.port)
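For reference, the server can also be launched by hand with the same arguments the work passes in (the file path below is hypothetical):

python scripts/serve.py --filepath=boring_file.txt --host=0.0.0.0 --port=1111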
Distributed Frontend
In the above example, the FastAPI service was running on one machine, and the frontend UI on another.
In order to assemble them, you need to do two things:
- Provide the port argument to your work’s __init__ method to expose a single service.
- Expose the service within the configure_layout flow hook (both are shown below).
Here’s how to expose the port:
class BoringApp(LightningFlow):
    def __init__(self):
        super().__init__()
        self.source_work = SourceFileWork()
        self.dest_work = DestinationFileAndServeWork(
            script_path=os.path.join(os.path.dirname(__file__), "scripts/serve.py"),
            port=1111,
            parallel=False,  # runs until killed.
            cloud_compute=CloudCompute(),
            raise_exception=True,
        )
And here’s how to expose your services within the configure_layout flow hook:

    def configure_layout(self):
        return {"name": "Boring Tab", "content": self.dest_work.url + "/file"}
In this example, we’re appending /file to our FastAPI service URL. This means that our Boring Tab triggers get_file_content from the FastAPI service and embeds its content as an IFrame.
@fastapi_service.get("/file", response_class=HTMLResponse)
async def get_file_content(request: Request):
    lines = "\n".join(["<p>" + line + "</p>" for line in content])
    return HTMLResponse(f"<html><head></head><body><ul>{lines}</ul></body></html>")
[Figure: a visualization of the application described above]
Step 2: Scalable Application
The benefit of defining long-running code inside a LightningWork component is that you can run it on different hardware by providing CloudCompute to the __init__ method of your LightningWork.
By adapting the Quick Start example as follows, you can easily run your component on multiple GPUs:
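A minimal sketch of that change, with a hypothetical TrainWork standing in for the Quick Start training component and "gpu-fast-multi" as an assumed machine-type name:

from lightning.app import CloudCompute, LightningFlow, LightningWork


class TrainWork(LightningWork):
    # Hypothetical stand-in for the Quick Start training component.
    def run(self):
        ...


class RootFlow(LightningFlow):
    def __init__(self):
        super().__init__()
        # "gpu-fast-multi" is an assumed machine type; any instance
        # name supported by your Lightning version works here.
        self.train_work = TrainWork(cloud_compute=CloudCompute("gpu-fast-multi"))

    def run(self):
        self.train_work.run()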
Without doing much, you’re now running a script on its own cluster of machines! 🤯
Step 3: Resilient Application
We designed Lightning with a strong emphasis on supporting failure cases. The framework shines when developers embrace our fault-tolerance best practices, enabling them to create ML applications of high complexity with strong support for unhappy cases.
An entire section will be dedicated to this concept.
TODO