How to build an AI Product (6) - Creating a Docker environment for easy experimentation

Docker has revolutionized the way we build, ship, and run applications. It's an indispensable tool for developers and data scientists, allowing for consistent and reproducible environments. But why Docker? And, more specifically, why combine Docker with Anaconda Python? Let's delve into the world of containers and see how this potent combination can supercharge your experimentation process.

What's the Big Deal with Docker?

Docker is a platform that uses containerization technology to wrap up an application with everything it needs to run: code, runtime, libraries, and dependencies. This ensures that the application will always run the same, regardless of the environment it's in. Here are some reasons Docker has become essential:

  • Reproducibility: Ensure that your application runs the same way, every time, everywhere.
  • Isolation: Docker containers ensure that your application doesn't conflict with others, providing a clean environment for each app.
  • Portability: Easily share your application by just sharing a Docker image.
  • Version Control for Environments: Like Git for code, Docker can version your entire application environment.
  • Infrastructure Independence: Write once, run anywhere. Be it AWS, Azure, or your local machine.

Why Anaconda Python in Docker?

Anaconda is a distribution of Python for scientific computing and data science. It's adored by data scientists for several reasons:

  • Extensive Libraries: Anaconda bundles a ton of libraries for data science, machine learning, and scientific computing out of the box.
  • Environment Management: Easily create isolated Python environments with specific library versions, ensuring project consistency and avoiding library conflicts.
  • Popular in Data Science: With built-in support for popular libraries like TensorFlow, PyTorch, and scikit-learn, it's a go-to choice for many.

Marrying Docker and Anaconda Python gives you a containerized environment perfect for data science experimentation. You get the flexibility and vast library support of Anaconda, but with the reproducibility and isolation of Docker.
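For instance, Anaconda's environment management can be used directly inside a Dockerfile, so each image carries its own pinned conda environment. A hypothetical sketch (the environment name `experiment`, the Python version, and the example libraries are illustrative, not part of the setup below):

```dockerfile
# Create an isolated conda environment with a pinned Python version
RUN conda create -n experiment python=3.10 -y

# Install dependencies into that environment only,
# keeping the base environment untouched
RUN conda run -n experiment pip install --no-cache-dir numpy pandas
```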

Crafting the Perfect Dockerfile

Here's a sample Dockerfile that sets up an Anaconda environment with some essential libraries:

# Use the continuumio/anaconda3 image as a base image
FROM continuumio/anaconda3

# Install gcc and other essential build tools
RUN apt-get update && apt-get install -y build-essential

# Install the required libraries
RUN pip install --no-cache-dir \
    PyMuPDF \
    sentence_transformers \
    annoy \
    fastapi \
    uvicorn \
    jinja2 \
    python-multipart

This Dockerfile does a few things:

  1. It uses the continuumio/anaconda3 image, which has Anaconda Python pre-installed.
  2. It installs essential build tools. This is important for libraries that need compilation.
  3. It installs a set of Python libraries like FastAPI, Annoy, and others.
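Note that the library versions above are unpinned, so rebuilding the image later may pull in newer releases. A common variant, sketched here with a hypothetical requirements.txt, pins exact versions for full reproducibility:

```dockerfile
# Copy a pinned requirements file into the image and install from it,
# so every rebuild resolves the exact same library versions
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```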

How to Create the Image from the Dockerfile

Creating a Docker image from a Dockerfile is a foundational step in the containerization process. This image acts as a blueprint for your containers and ensures that every instance runs in an identical environment. Here's a step-by-step guide:

  1. Navigate to Your Dockerfile Directory:

Open your terminal or command prompt and navigate to the directory where your Dockerfile is located.

cd path/to/your/Dockerfile/directory

  2. Build the Docker Image:

The docker build command is used to create a Docker image from a Dockerfile. The -t flag lets you tag your image with a name so that it's easier to reference later.

docker build -t aiproduct:latest .

Here, aiproduct is the name we're giving the image, and latest is the tag. The dot (.) at the end specifies the context (i.e., the set of files) that Docker should use, which is the current directory in this case.

  3. Verify the Image Creation:

To ensure your image has been created and is listed among your local Docker images, run:

docker images

You should see aiproduct in the list of available images with the tag latest.

Running the FastAPI Service with Docker

Once you've built your Docker image, you can run containers based on this image.

Let's break down the command below, which runs the container with the FastAPI service:

docker run -it -v $(pwd):/app -w /app -p 3000:3000 aiproduct:latest uvicorn main:app --host 0.0.0.0 --port 3000

docker run: This command is used to start a new Docker container from an image.

  • -it: This combination of flags allows you to interact with the container. The -i flag stands for "interactive" and -t allocates a pseudo terminal, allowing for an interactive bash shell in the container.
  • -v $(pwd):/app: This flag mounts the current directory ($(pwd)) on your host machine to the /app directory inside the container. This way, the container can access and run files from your current directory.
  • -w /app: This sets the working directory inside the container to /app. Any command the container runs (like the uvicorn command that follows) will be executed in this directory.
  • -p 3000:3000: This maps port 3000 of your host machine to port 3000 inside the container. This is crucial for accessing the FastAPI service from outside the container.
  • aiproduct:latest: This specifies the Docker image to use for the container. In this case, it's the image we created in the previous section.
  • uvicorn main:app --host 0.0.0.0 --port 3000: This is the command the container will run once it starts. It starts the FastAPI application using Uvicorn, listening on all interfaces (0.0.0.0) on port 3000.

Once you run this command, your FastAPI service will start, and you can access it by navigating to http://localhost:3000 in your browser.

In the previous tutorial, we showed you how to query the FastAPI service. You can do the same now; the only difference is that your service runs inside a Docker container.


By setting up a Docker environment with Anaconda Python, you're paving the way for hassle-free experimentation.

No more "but it works on my machine" moments, just pure, consistent coding bliss.

Whether you're training machine learning models, building web apps, or crunching large datasets, this setup ensures that you have a consistent, reproducible, and isolated environment to work in. Happy experimenting!

🤖 Want to Build the Next Big AI Product?

Join our hands-on, real-life bootcamp and transform your ideas into groundbreaking AI solutions.

Sign Up Now