Docker Tutorial for Data Scientists





Python and its suite of data analysis and machine learning libraries, like pandas and scikit-learn, help you develop data science applications with ease. However, dependency management in Python is a challenge. When working on a data science project, you’ll need to spend substantial time installing the various libraries and keeping track of the versions you’re using, among other things.

What if other developers want to run your code and contribute to the project? Well, other developers who want to replicate your data science application must first set up the project environment on their machine before they can go ahead and run the code. Even small differences, such as differing library versions, can introduce breaking changes to the code. Docker to the rescue. Docker simplifies the development process and facilitates seamless collaboration.

This guide will introduce you to the basics of Docker and teach you how to containerize data science applications with Docker.





Docker is a containerization tool that lets you build and share applications as portable artifacts called images.

Apart from source code, your application will have a set of dependencies, required configuration, system tools, and more. For example, in a data science project, you’ll install all the required libraries in your development environment (ideally inside a virtual environment). You’ll also ensure that you’re using an up-to-date version of Python that the libraries support.

However, you may still run into problems when trying to run your application on another machine. These problems often arise from mismatched configuration and library versions between the development environments of the two machines.

With Docker, you can package your application together with its dependencies and configuration. So you can define an isolated, reproducible, and consistent environment for your applications across a range of host machines.



Let’s go over a few concepts/terminologies:


Docker Image


A Docker image is the portable artifact of your application.


Docker Container


When you run an image, you’re essentially getting the application running inside the container environment. So a running instance of an image is a container.


Docker Registry


A Docker registry is a system for storing and distributing Docker images. After containerizing an application into a Docker image, you can make it available to the developer community by pushing it to an image registry. DockerHub is the largest public registry, and all images are pulled from DockerHub by default.



Because containers provide an isolated environment for your applications, other developers now only need to have Docker set up on their machine. They can pull the Docker image and start containers using a single command, without having to worry about complex installations.

When developing an application, it’s also common to build and test multiple versions of the same app. If you use Docker, you can have multiple versions of the same app running inside different containers, without any conflicts, in the same environment.

In addition to simplifying development, Docker also simplifies deployment and helps the development and operations teams collaborate effectively. On the server side, the operations team doesn’t have to spend time resolving complex version and dependency conflicts. They only need to have a Docker runtime set up.



Let’s quickly go over some basic Docker commands, most of which we’ll use in this tutorial. For a more detailed overview, read: 12 Docker Commands Every Data Scientist Should Know.

Command                      Function
docker ps                    Lists all running containers
docker pull image-name       Pulls image-name from DockerHub by default
docker images                Lists all the available images
docker run image-name        Starts a container from an image
docker start container-id    Restarts a stopped container
docker stop container-id     Stops a running container
docker build path            Builds an image at the path using instructions in the Dockerfile


Note: Prefix all the commands with sudo if you haven’t created the docker group and added your user to it.



We’ve learned the basics of Docker, and it’s time to apply what we’ve learned. In this section, we’ll containerize a simple data science application using Docker.


House Price Prediction Model


Let’s take the following linear regression model that predicts the target value, the median house price, based on the input features. The model is built using the California housing dataset:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the California Housing dataset
data = fetch_california_housing(as_frame=True)
X = data.data
y = data.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")


We know that scikit-learn is a required dependency. If you go through the code, we set as_frame equal to True when loading the dataset, so we also need pandas. The requirements.txt file therefore lists both scikit-learn and pandas.
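If you want to pin exact versions in requirements.txt, one possible approach is to read them from your current environment. The following is a minimal sketch, not the method used in the original project: the helper name pinned_requirements is hypothetical, and it assumes the libraries are already installed where you run it.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for the given installed packages.

    Hypothetical helper for illustration. Packages that are not installed
    are reported as comment lines instead of raising an error.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name} is not installed in this environment")
    return lines


if __name__ == "__main__":
    # The two dependencies identified above for this example
    print("\n".join(pinned_requirements(["scikit-learn", "pandas"])))
```

Pinning versions this way keeps the image build reproducible, since pip installs the same releases every time the image is rebuilt.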


Create the Dockerfile


So far, we have the source code file and the requirements.txt file. We should now define how to build an image from our application. The Dockerfile holds this definition: it describes how to build an image from the application’s source code files.

So what is a Dockerfile? It’s a text document that contains step-by-step instructions to build the Docker image.




Here’s the Dockerfile for our example:

# Use the official Python image as the base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements.txt file to the container
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the script file to the container

# Set the command to run your Python script
CMD ["python", ""]


Let’s break down the contents of the Dockerfile:

  • All Dockerfiles start with a FROM instruction specifying the base image. The base image is the image on which your image is based. Here we use an available image for Python 3.9. The FROM instruction tells Docker to build the current image from the specified base image.
  • The WORKDIR instruction sets the working directory for all the following instructions (/app in this example).
  • We then copy the requirements.txt file to the container’s file system.
  • The RUN instruction executes the specified command, in a shell, inside the container. Here we install all the required dependencies using pip.
  • We then copy the source code file, the Python script, to the container’s file system.
  • Finally, CMD specifies the command to be executed when the container starts. Here we need to run the script. The Dockerfile should contain only one CMD instruction.


Build the Image


Now that we’ve defined the Dockerfile, we can build the Docker image by running the docker build command.


The -t option allows us to specify a name and tag for the image in the name:tag format. The default tag is latest.
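To make the name:tag format concrete, here is a minimal Python sketch that assembles a docker build invocation and, optionally, runs it through subprocess. The helper names are hypothetical, and using the current directory (".") as the build path is an assumption; the image name ml-app matches the build output in this tutorial.

```python
import subprocess


def image_reference(name, tag="latest"):
    """Combine an image name and tag into the name:tag format."""
    return f"{name}:{tag}"


def docker_build_command(name, tag="latest", context="."):
    """Return the argument list for `docker build -t name:tag path`.

    The default build path "." (current directory) is an assumption.
    """
    return ["docker", "build", "-t", image_reference(name, tag), context]


def build_image(name, tag="latest", context="."):
    """Run the build; requires Docker to be installed and running."""
    subprocess.run(docker_build_command(name, tag, context), check=True)


# Example usage (not executed here):
# build_image("ml-app")
```

Keeping the command as a list of arguments avoids shell quoting issues when the name or path contains spaces.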

The build process takes a couple of minutes:

Sending build context to Docker daemon  4.608kB
Step 1/6 : FROM python:3.9-slim
3.9-slim: Pulling from library/python
5b5fe70539cd: Pull complete
f4b0e4004dc0: Pull complete
ec1650096fae: Pull complete
2ee3c5a347ae: Pull complete
d854e82593a7: Pull complete
Digest: sha256:0074c6241f2ff175532c72fb0fb37264e8a1ac68f9790f9ee6da7e9fdfb67a0e
Status: Downloaded newer image for python:3.9-slim
 ---> 326a3a036ed2
Step 2/6 : WORKDIR /app
Step 6/6 : CMD ["python", ""]
 ---> Running in 7fcef6a2ab2c
Removing intermediate container 7fcef6a2ab2c
 ---> 2607aa43c61a
Successfully built 2607aa43c61a
Successfully tagged ml-app:latest


After the Docker image has been built, run the docker images command. You should see the ml-app image listed, too.



You can run the Docker image ml-app using the docker run command.




Congratulations! You’ve just dockerized your first data science application. By creating a DockerHub account, you can push the image to it (or to a private repository within your organization).
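Pushing to DockerHub typically means tagging the local image with your account name and then pushing that tag. The sketch below only assembles the two commands as argument lists. The helper name push_commands and the username in the usage example are hypothetical, and it assumes you have already authenticated with docker login.

```python
def push_commands(local_image, username, tag="latest"):
    """Return the `docker tag` and `docker push` argument lists.

    Hypothetical helper for illustration. The remote reference uses the
    username/image:tag form for a DockerHub repository.
    """
    local_ref = f"{local_image}:{tag}"
    remote_ref = f"{username}/{local_image}:{tag}"
    return [
        ["docker", "tag", local_ref, remote_ref],
        ["docker", "push", remote_ref],
    ]


# Example usage with a placeholder username (not executed here):
# import subprocess
# for command in push_commands("ml-app", "your-dockerhub-username"):
#     subprocess.run(command, check=True)
```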



Hope you found this introductory Docker tutorial helpful. You can find the code used in this tutorial in this GitHub repository. As a next step, set up Docker on your machine and try this example. Or dockerize an application of your choice.

The easiest way to install Docker on your machine is using Docker Desktop: you get both the Docker CLI client and a GUI to manage your containers easily. So set up Docker and get coding right away!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more.