Tips & Tricks of Deploying Deep Learning Webapp on Heroku Cloud

Things I learned by deploying a TensorFlow-based image classifier Streamlit app on a Heroku server

Abid Ali Awan
Towards Data Science


Image by author

Heroku is a popular platform among web developers and machine learning enthusiasts. It provides easy ways to deploy and maintain web applications, but if you are not familiar with deploying deep learning applications, you might struggle with storage and dependency issues.

This guide will make your deployment process smoother so that you can focus on creating amazing web applications. We will cover DVC integration, Git and CLI-based deployment, error code H10, tweaking Python packages, and optimizing storage.

Git & CLI-based Deployment

A Streamlit app can be deployed using plain Git, the GitHub integration, or Docker. The Git-based approach is by far the fastest and easiest way to deploy a data app to a Heroku server.

Simple Git-based

The Streamlit app can be deployed using:

git remote add heroku https://heroku:$HEROKU_API_KEY@git.heroku.com/<name of your heroku app>.git

git push -f heroku HEAD:master

For this to work you need:

  • Heroku API Key
  • Heroku app: created either with the CLI or on the website.
  • Git-based project
  • Procfile
  • requirements.txt
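
Before pushing, it can help to confirm that the project folder contains the items from the checklist above. The following is a minimal sketch and not part of the original deployment; the helper name missing_deploy_files and the exact list of checked files are assumptions made for illustration.

```python
from pathlib import Path

# Files the Git-based deployment relies on (assumed list, taken from the checklist).
REQUIRED_FILES = ("Procfile", "requirements.txt")


def missing_deploy_files(project_dir="."):
    """Return the names of required items that are absent from project_dir."""
    root = Path(project_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # A Git-based project has a .git entry at its root.
    if not (root / ".git").exists():
        missing.append(".git")
    return missing


if __name__ == "__main__":
    problems = missing_deploy_files()
    if problems:
        print("Missing before deploy:", ", ".join(problems))
    else:
        print("Project looks ready to push.")
```

The check does not cover the API key or the Heroku app itself, since those live outside the project folder.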

CLI-based

CLI-based deployment is straightforward and easy to learn.

Image by Author
  1. Create a free Heroku account here.
  2. Install Heroku CLI using this link.
  3. Either clone a remote repository or run git init.
  4. Run heroku login and heroku create dagshub-pc-app. This logs you in to Heroku and creates an app on the web server.
  5. Now create a Procfile containing the command to run the app: web: streamlit run --server.port $PORT streamlit_app.py
  6. Finally, commit and push the code to the Heroku server: git push heroku master

PORT

If you run the app with streamlit run app.py, Heroku reports error code H10 (app crashed), which in this case means the $PORT assigned by the server was not used by the Streamlit app.

You need to:

  • Set PORT using the Heroku CLI:
heroku config:set PORT=8080
  • Change your Procfile so that the server port is passed as an argument:
web: streamlit run --server.port $PORT app.py
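
To illustrate why the port argument matters, the sketch below builds the same command as the Procfile line from Python, reading the port from the PORT environment variable. It is only an illustration: the function name streamlit_command and the fallback to 8501 (Streamlit's default local port) are assumptions, not something the original app used.

```python
import os

DEFAULT_LOCAL_PORT = "8501"  # Streamlit's default port when run locally


def streamlit_command(script="app.py", env=None):
    """Build the `streamlit run` argument list using the port from PORT."""
    env = os.environ if env is None else env
    port = env.get("PORT", DEFAULT_LOCAL_PORT)
    if not port.isdigit():
        raise ValueError(f"PORT must be a number, got {port!r}")
    return ["streamlit", "run", "--server.port", port, script]


if __name__ == "__main__":
    # Prints the command that would be run in the current environment.
    print(" ".join(streamlit_command()))
```

Without the --server.port argument, Streamlit listens on its own default port instead of the one the server assigned, which is the mismatch described above.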

Tweaking Python Packages

This part took me two days to debug, as Heroku comes with a 500MB slug size limit and the new TensorFlow package is 489.6MB. To avoid dependency and storage issues, we need to make changes in the requirements.txt file:

  1. Add tensorflow-cpu instead of tensorflow, which will reduce the slug size from 765MB to 400MB.
  2. Add opencv-python-headless instead of opencv-python to avoid installing external dependencies. This will resolve all the cv2 errors.
  3. Remove all other unnecessary packages, keeping numpy, Pillow and streamlit.
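
The first two swaps above can be sketched as a small script that rewrites requirement lines. This is a hedged illustration rather than the author's tooling: the function name slim_requirements is hypothetical, and the pattern only handles simple lines such as tensorflow==2.7.0, not every requirement syntax pip accepts.

```python
import re

# Swaps suggested in the list above: heavier package -> lighter replacement.
REPLACEMENTS = {
    "tensorflow": "tensorflow-cpu",
    "opencv-python": "opencv-python-headless",
}

# Package name at the start of a line, followed by an optional version specifier.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9._-]+)(.*)$")


def slim_requirements(lines):
    """Return requirement lines (without trailing newlines) with heavy packages swapped."""
    result = []
    for line in lines:
        match = _REQUIREMENT.match(line)
        if match and match.group(1).lower() in REPLACEMENTS:
            name, rest = match.groups()
            # Keep the version specifier, replace only the package name.
            line = REPLACEMENTS[name.lower()] + rest
        result.append(line)
    return result
```

Lines that already name tensorflow-cpu or opencv-python-headless are left untouched, as are comments and blank lines. A typical call would pass the result of open("requirements.txt").read().splitlines().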

DVC Integration

Image by Author

There are a few steps required for successfully pulling data from the DVC server.

  1. First, install a buildpack that allows apt packages to be installed, using the Heroku CLI: heroku buildpacks:add --index 1 heroku-community/apt
  2. Create a file named Aptfile and add the link to the latest DVC version: https://github.com/iterative/dvc/releases/download/2.8.3/dvc_2.8.3_amd64.deb
  3. In your app.py file, add extra lines of code to pull data from the remote DVC server:
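import os
import sys

# Only run on Heroku (DYNO is set there) and only while .dvc is still present.
if "DYNO" in os.environ and os.path.isdir(".dvc"):
    # The deployed slug has no Git repository, so tell DVC not to expect one.
    os.system("dvc config core.no_scm true")
    if os.system("dvc pull") != 0:
        sys.exit("dvc pull failed")
    # Free up space once the data has been pulled.
    os.system("rm -r .dvc .apt/usr/lib/dvc")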

After that, commit and push your code to the Heroku server. Upon successful deployment, the app will automatically pull the data from the DVC server.

Optimizing Storage

There are multiple ways to optimize storage, and the most common is to use Docker. With Docker you can bypass the 500MB limit, and you also have the freedom to install any third-party integrations or packages. To learn more about how to use Docker, check out this guide.

For optimizing storage:

  • Only add the Python libraries needed for model inference to requirements.txt.
  • Pull only selected data from DVC by using dvc pull {model} {sample_data1} {sample_data2} ...
  • We only need the model inference file, so add the rest of the files to .slugignore, which works similarly to .gitignore. To learn more, check out Slug Compiler.
  • Remove the .dvc and .apt/usr/lib/dvc directories after successfully pulling the data from the server.
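
The selective pull and the cleanup step from the list above can be combined in one helper. This is a minimal sketch under stated assumptions: the function names dvc_pull_command and pull_and_clean are hypothetical, and the run parameter exists only so the helper can be tried without DVC installed.

```python
import shutil
import subprocess


def dvc_pull_command(targets=()):
    """Build a `dvc pull` command, optionally limited to specific targets."""
    return ["dvc", "pull", *targets]


def pull_and_clean(targets=(), cleanup_dirs=(".dvc", ".apt/usr/lib/dvc"), run=subprocess.run):
    """Pull the selected data, then delete the DVC directories to free space."""
    result = run(dvc_pull_command(targets))
    if result.returncode != 0:
        # Leave the directories in place so the pull can be retried.
        raise RuntimeError("dvc pull failed")
    for path in cleanup_dirs:
        # ignore_errors avoids a crash when a directory is already gone.
        shutil.rmtree(path, ignore_errors=True)
```

A call such as pull_and_clean(["model", "sample_data1"]) would fetch only those two targets; the target names here are placeholders, matching the {model} and {sample_data1} slots in the command above.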

Outcomes

The initial slug size was 850MB, but with storage and package optimizations the final slug size was reduced to 400MB. We solved error code H10 with a simple command and added the opencv-python-headless package to solve dependency issues. This guide was created to overcome some of the common problems faced by beginners on Heroku servers.

Docker-based deployment can solve a lot of storage problems, but it comes with added complexity and a slower deployment process. You can use heroku container:push web, but before that you need to build the Docker image, test it, and resolve all the issues locally. This method is preferred by advanced Heroku users.

The next challenge is to deploy your web application with a webhook. This will allow you to automate the entire machine learning ecosystem from any platform. The automation process will involve creating a simple Flask web server that runs shell commands.

Originally published at KDnuggets on December 24, 2021.
