Danial Khosravi's Blog

Entrepreneur in the making...

Shipping Deep Learning Models in Web and Mobile Applications


Web Application (running in your browser!)

iOS Application

Source Code

Creating machine learning models is often great fun for us AI/ML/DL enthusiasts, and thanks to Python’s great community and libraries such as TensorFlow and Keras, the process has become very easy. With Keras we can put layers together like pieces of a puzzle, focus on our architecture and performance, and make things work within our beloved Jupyter Notebooks.

In practice, we usually build these models to help our users or clients, so we need to be able to ship them somewhere and integrate them with our applications before anyone can benefit from them.

A common approach is to put an API in front of the model so our back-end can communicate with it and send the results back to the client. However, there are a few issues with this approach:

  • The first issue is latency. Our app probably won’t be as responsive as we want it to be if we have to make calls to our back-end and wait for the result to come back.

  • The second issue is the legal implications of sending a user’s text or images over the network. Imagine you have a translator app that takes a picture of a document and translates it for you. If a user translates a confidential legal document with this app, the data now leaves their device, which complicates the legal position of your application and the kind of security you need to build around it.

  • Lastly, with this approach our users cannot benefit from our app when they are offline.

What if we could bundle our models with our app so all the model scoring could happen on their machine and they don’t need to be online?
Well, that is exactly the solution.

We often make our models very deep to squeeze out as much accuracy as we can. But in reality, a model that is 88% accurate, runs fast on an average mobile device and is very small is much better than a deep model that is 90% accurate but is unresponsive on slow devices and increases our bundle size by a large amount.

By tweaking our architecture and letting go of a few accuracy points, we often can create smaller versions of our models that are shippable in a mobile app or can be served over the web and can bring a lot of value to our users.
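To get a feel for what "bundle size" means here, a rough back-of-the-envelope estimate is the number of parameters times the bytes used to store each weight. The snippet below is only a sketch of that arithmetic: the function name and both parameter counts are made up for illustration and do not come from the models in this post, and a real exported model adds some overhead on top of the raw weights.

```python
def weights_size_mb(num_params: int, bytes_per_weight: int = 4) -> float:
    """Approximate storage for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_weight / 1_000_000


# Hypothetical parameter counts, chosen only to illustrate the arithmetic.
deep_model_params = 25_000_000
small_model_params = 500_000

# The default of 4 bytes per weight assumes 32-bit floats.
deep_mb = weights_size_mb(deep_model_params)
small_mb = weights_size_mb(small_model_params)

print(f"deep model:  ~{deep_mb:.0f} MB of weights")
print(f"small model: ~{small_mb:.0f} MB of weights")
```

Under these assumed numbers, the smaller model is the difference between a download users barely notice and one that dominates the size of the whole app.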

A few rules of thumb for model architecture:

  • Start small and go big/deep

  • Use convolutions instead of LSTMs or GRUs wherever you can. LSTMs and GRUs are great, and they are often our only option when the problem depends not just on remembering the past but also on the order of the information. However, in many cases where the problem is translation invariant, such as sentiment analysis or most computer vision problems, convolutions can match or even beat the accuracy of LSTMs.

  • Remember that in practice, 85% accuracy is often better than not having the feature in our application at all, so don’t worry too much about it and improve incrementally.
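To make the "translation invariant" point in the rules above a bit more concrete, here is a tiny pure-Python sketch (no Keras involved) of a 1D convolution followed by global max pooling. The filter and the two toy sequences are hypothetical values picked for illustration. The idea it tries to show is that the pooled score is the same whether the pattern appears at the start or at the end of the sequence, which is roughly why convolutions can work well when only the presence of a pattern matters and not its position.

```python
from typing import List


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid-mode 1D cross-correlation, which is what DL 'convolution' layers compute."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def global_max_pool(features: List[float]) -> float:
    """Keep only the strongest response, discarding where it occurred."""
    return max(features)


# Hypothetical filter that responds most strongly to the pattern 1, 0, 1.
kernel = [1.0, -1.0, 1.0]

# The same pattern placed at the start and at the end of a toy sequence.
early = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0]

early_score = global_max_pool(conv1d(early, kernel))
late_score = global_max_pool(conv1d(late, kernel))

print(early_score == late_score)  # the pooled score ignores the pattern's position
```

A recurrent layer, by contrast, has to step through the sequence one element at a time, so when position really does not matter the convolutional version is usually the cheaper thing to try first.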

Source Code

To demonstrate the idea, I created a sentiment analysis application that runs:

  • in the browser, using TensorFlow.js in a React application
  • on iOS, using Core ML in a React Native application
  • I didn’t bother with Android after getting iOS and web working, but if anyone sends a PR I’d be glad to merge it

You can find all the code for model training and for the web and mobile applications HERE ON GITHUB