Unleashing the Power of AI: Creating a Neural Network Powered Application

What are the necessary steps in creating an application powered by a neural network? What are the best practices for using neural networks both locally and in the cloud? Once we are done with the basics, how do we take our solution even further? How does one handle scaling, hosting multiple neural networks, security, privacy, and monitoring?
If you’ve ever asked yourself any of these questions, you are in luck. You are about to take an intriguing journey into the far-out land of Neural Networks (NN). We will show you how to share the power of your NN by making it available to whomever you want, without fuss. During our journey, we will use the FastAPI framework to create a Python API wrapper around our NN and dockerize our environment. We will then go further: we will look at scaling, discuss how to mitigate nasty surprises by applying security best practices, and cover monitoring and maintenance.
Make yourself comfortable, we’re taking off in 3…2…1!
If you know everything and only want to run and test your brand-new sentiment analysis app powered by an NN, follow these steps:
$ docker build -t sentiment_analysis .
$ docker run --rm -it -p 8000:8000 sentiment_analysis
Open http://localhost:8000 in a browser or run a cURL command from a shell:
curl -X 'POST' \
'http://localhost:8000/sentiment_analysis' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"user_review": "You have done a fantastic job running this neural network :)"
}'
{"sentiment_analysis":"5 stars"}
Ease of use matters. Regardless of how well an NN performs, it will gather dust if it is not easy to use. End users should not have to deal with complex arguments in ML libraries, install numerous dependencies, or download huge models and datasets.
In this section, we will standardize access to our NN by creating an API wrapper around it and making it accessible over the network via a REST HTTP endpoint. The complete repository can be found here.
Key points of interest in the repository are:
a) Endpoint for performing sentiment analysis at ./app/router/sentiment
b) Request body definition, input validation, and preprocessing at ./app/schemas/sentiment
c) NN loading and inference at ./app/services/sentiment
(a) Your endpoints should be carefully designed, easily understood, and hard to misinterpret. The best way to achieve this is by following best practices. Refer to this article if unsure what they might be.
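To make this concrete, here is a minimal sketch of what the endpoint in ./app/router/sentiment might look like. The class and function names below are illustrative assumptions, not the exact contents of the repository:

from fastapi import APIRouter

# Hypothetical imports mirroring the repository layout described above.
from app.schemas.sentiment import SentimentRequest, SentimentResponse
from app.services.sentiment import predict_sentiment

router = APIRouter()

@router.post("/sentiment_analysis", response_model=SentimentResponse)
def sentiment_analysis(request: SentimentRequest) -> SentimentResponse:
    # The router only deals with HTTP concerns; inference is delegated to the service layer.
    return SentimentResponse(sentiment_analysis=predict_sentiment(request.user_review))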
(b) Input validation is a must. Even when your endpoints are not exposed to the public, it is a good idea to validate input, since you can never be sure the data you receive is valid; if it isn’t, the request should fail loudly and clearly. A Pydantic class precisely defines the expected field(s) and datatype(s) in a request body. Besides a basic type check, we perform text preprocessing by removing HTML tags and by limiting string length to 10,000 characters. HTML tags do not add valuable information to sentiment analysis, and the string length is limited because we want to protect our system from performance degradation or failure in case input strings become exceptionally large.
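As an illustration, the request schema in ./app/schemas/sentiment could look roughly like this (a sketch using Pydantic v2; the exact preprocessing details, and whether overly long input is truncated or rejected, are assumptions):

import re

from pydantic import BaseModel, field_validator

MAX_REVIEW_LENGTH = 10_000  # protect the service from exceptionally large inputs

class SentimentRequest(BaseModel):
    user_review: str

    @field_validator("user_review")
    @classmethod
    def preprocess(cls, value: str) -> str:
        value = re.sub(r"<[^>]+>", "", value)  # drop HTML tags, they add no sentiment information
        return value[:MAX_REVIEW_LENGTH]       # enforce the length limit

class SentimentResponse(BaseModel):
    sentiment_analysis: str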
(c) Loading the NN and performing inference will depend on the kind of NN you are using (refer to the respective NN documentation for more details). In our case, docs can be found here. Generally, you should set your desired compute hardware as the device, put your model into evaluation mode with model.eval(), and disable gradient calculation with @torch.no_grad(). This approach ensures optimal inference performance by eliminating overhead from features needed only during NN training.
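For instance, a PyTorch-flavored service in ./app/services/sentiment might follow this pattern. This is a sketch only; the model loader, the tokenization step, and the mapping from class index to star rating are assumptions that depend on your specific model:

import torch

# Use a GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = load_sentiment_model()  # hypothetical loader; see your model's documentation
model.to(device)
model.eval()  # switch off training-only behavior such as dropout

@torch.no_grad()  # gradients are never needed for inference
def predict_sentiment(user_review: str) -> str:
    inputs = encode_review(user_review).to(device)  # hypothetical tokenization step
    logits = model(inputs)
    stars = int(logits.argmax(dim=-1).item()) + 1  # assumption: classes map to 1-5 stars
    return f"{stars} stars"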
Again, ease of use matters, for developers as well as users. We’ve all heard variations of the dreaded “It works on my machine” story. Environment isolation comes to the rescue; nobody has to suffer the same dreadful fate again. The goal is to eliminate surprises by always running software in the same environment, in the same way. We achieve this by bundling our code, dependencies, and application execution instructions in one place. A complex project can easily be run by simply executing docker run. Splendid!
The first step in dockerizing our environment is to create a ./Dockerfile. This file contains a set of instructions on what a Docker image (our bundle of code, dependencies, and app execution) should look like. The base image is a Docker image on top of which our project will be built. In this case, we chose python:3.11-slim, a minimal Debian distribution with Python 3.11 preinstalled. Next, we have to define a working directory, install dependencies, copy files, set permissions, etc. Finally, we specify the command that runs our application on container startup.
Next, we need to build an image from ./Dockerfile. Position your shell in the root of the repository and execute $ docker build -t sentiment_analysis . (the trailing dot tells Docker to use the current directory as the build context).
We are all set to run a container from the image. This will create a miniature Linux container that will always run our application in the same way, in the same environment. Start a container with $ docker run --rm -it -p 8000:8000 sentiment_analysis.
If you are unsure what Dockerfiles, images, and containers are, look at this short post.
Deployment to a test/stage/prod environment is out of scope for this blog post since it differs greatly based on the technology used and other project particularities. However, we do strongly recommend automating your CI/CD pipelines. Regarding Git, consider abiding by Trunk Based Development principles to make your workflow as simple as possible.
The real world is rarely a simple affair, and putting software in production is no different. At some point, you are likely to face degraded performance due to high traffic or an outage in a part of your infrastructure. An excess of resources is also an issue; you don’t want to pay a premium for something you do not use. What you need is a flexible way to scale vertically and horizontally. I am happy to say we’ve solved the scaling problem for you! Okay, if not solved, then at least nudged you in the right direction. 🙂
Vertical scaling is no big deal — all you do is add resources to the existing machine or get a more powerful one. Horizontal scaling, however, can be much more challenging. Good thing we are already on the right track.
We already have a stateless Docker container, a perfect candidate for horizontal scaling. All you need to do is spin up X machines at your favorite cloud provider, run this container on each machine, place the machines behind a load balancer, and you are done! You have a distributed, fault-tolerant system. For more information on how to achieve this, look here.
Services launched in ECS (AWS’s own container orchestration service, an alternative to Kubernetes) behind an Application Load Balancer
If your machine(s) are underutilized, you might decide to vertically scale them down. If this is not an option, you can increase machine utilization and add redundancy by running, let’s say, four application processes inside each container instead of one. Achieve this by running a container with the command $ docker run --rm -it -p 8000:8000 --env WORKERS=4 sentiment_analysis. This means there will be 4 application processes (4 APIs wrapped around 4 NNs) running inside a single Docker container. This is possible because we have equipped your container with Gunicorn and Uvicorn. Gunicorn acts as an orchestrator and load balancer inside the container, spawning Uvicorn workers (4 in the above example), each of which in turn spawns a single process of your application. You have scaled horizontally not only inside your cluster, but inside containers as well. Extraordinary!
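Under the hood, this can be wired up with a small Gunicorn configuration file. The sketch below shows the general idea; the file name, defaults, and application module path are assumptions, not necessarily what the repository ships:

# gunicorn_conf.py
import os

# Number of application processes, set at container start via --env WORKERS=...
workers = int(os.getenv("WORKERS", "1"))

# Each worker is a Uvicorn (ASGI) process serving one copy of the FastAPI app.
worker_class = "uvicorn.workers.UvicornWorker"

bind = "0.0.0.0:8000"

# Launched, for example, as: gunicorn -c gunicorn_conf.py app.main:app (hypothetical module path)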
New challenges will present themselves as the number of NNs you host begins to grow. Resource contention, utilization issues, and the overall complexity of the system will start taking their toll. There is no perfect solution, only tradeoffs; yet, the following is the best the ML community has come up with for hosting many NN models.
Create a model server by taking your NNs and placing them on a separate, powerful machine dedicated to performing inference. Make use of existing model server frameworks, like TorchServe and TensorFlow Serving. If you have extensive and regularly changing business logic in your current API wrapper, consider keeping it outside the model server. This will make each component more manageable and allow you a snappy deployment experience with your business-logic APIs, while keeping the slow redeployment of gigantic NNs tied to the model server (see the sketch after the figure below).
Separate NNs from business logic
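To illustrate the split, the business-logic API can call the model server over HTTP. The sketch below assumes a TorchServe instance reachable at a hypothetical model-server host; the request and response payload formats depend on the handler you deploy with your model:

import requests

# Hypothetical TorchServe endpoint; host, port, and model name depend on your deployment.
MODEL_SERVER_URL = "http://model-server:8080/predictions/sentiment"

def predict_sentiment(user_review: str) -> str:
    # The business-logic API stays small and fast to redeploy;
    # the heavyweight NN artifacts live only on the model server.
    response = requests.post(MODEL_SERVER_URL, json={"text": user_review}, timeout=5)
    response.raise_for_status()
    return response.json()["sentiment_analysis"]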
Convenience is at odds with security and privacy. Every closed attack vector adds overhead to maintenance and future development, either for the IT department, developers, or both. This is a simple, yet quite unavoidable fact of life. The trick is to correctly assess the risk your system poses to the greater whole and implement appropriate security measures from there.
The bare minimum every network-accessible API should implement is a simple hardcoded secret which must be present in a header of the incoming request. This will prove effective against unsophisticated attacks from malicious scripts that roam the internet in search of unsecured resources. If the secret is long enough, it will even withstand a brute-force attack. This method is easy to implement but by no means secure: a simple network traffic intercept is enough to expose your secret. But again, hopefully, you are using this in a light-security-needs scenario, not for protecting government or corporate resources. RIGHT???
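In FastAPI, such a shared-secret check can be expressed as a dependency. This is a sketch with an assumed header name, and it reads the secret from an environment variable rather than hardcoding it in source:

import os
import secrets

from fastapi import Depends, HTTPException, status
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key")  # assumed header name

def verify_api_key(api_key: str = Depends(api_key_header)) -> None:
    expected = os.environ["API_KEY"]
    # compare_digest avoids leaking information through timing differences
    if not secrets.compare_digest(api_key, expected):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key")

# Attach it to the whole router: APIRouter(dependencies=[Depends(verify_api_key)])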
A recommended approach to securing an API is to use OAuth2 with JWT for authentication and HTTPS for data-in-transit encryption. If you are in the cloud, save yourself some time and consider using one of the existing OAuth2 offerings before implementing it yourself. Always perform input validation and be mindful of the information you share via error messages. With a combination of these methods, an API can be considered reasonably secure. Use them for any API worth protecting.
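A minimal sketch of the token-checking side, assuming PyJWT and HS256-signed tokens; issuing tokens and managing keys are beyond this example:

import os

import jwt  # PyJWT
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")  # assumed token endpoint

def get_current_user(token: str = Depends(oauth2_scheme)) -> str:
    try:
        payload = jwt.decode(token, os.environ["JWT_SECRET"], algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
    return payload["sub"]  # assumption: the subject claim identifies the caller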
Monitoring is an essential part of maintaining system health. As the number of elements in your system grows, so does the effort you must put in to make sure every element is performing correctly. Doing this by hand becomes unmanageable very quickly. We recommend using Grafana to visually represent data generated by your applications. You can quickly glance at the Grafana dashboards and make sure your infrastructure is fine. Creating Grafana dashboards from scratch can be challenging, but luckily there is an extensive library of dashboards already present. Look them over before trying to make your own.
Grafana dashboard example
No matter what we do, there will always be exceptions that catch us off guard. When such a time inevitably comes, it is paramount that we have a solid error-tracing system in place. Not even the best minds will be able to help you if there is no record of what has happened. For this purpose we recommend using Sentry. It’s a capable little app, and it comes in both free and managed versions.
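Getting started is typically a couple of lines; the DSN below is a placeholder you would replace with the one from your Sentry project:

import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sample_rate=0.1,  # sample a fraction of requests for performance tracing
)
# With Sentry's FastAPI integration enabled, unhandled exceptions are reported automatically.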
Congratulations on finishing this challenging, yet intriguing journey. I hope your horizon has been widened and that you are leaving with some useful ideas on how to create your NN-powered application, as well as on how to integrate it into your system. Happy coding!