This blog entry is a bit of a deviation from the usual quantitative topics I have been writing about; it is more about the technicalities of presenting dynamic analysis on non-static datasets built from user interactions. To put it simply, we will explore designing a simple Python-based web application using Streamlit, deployed with Heroku and using Google Firestore as its database.
The general use case for this setup is one where your dataset is built on the fly from user inputs, and a predefined set of real-time analyses is carried out in Python and shown in a shareable web application. For the case study, I have used data from the Indian Premier League (IPL), a major cricketing event in India. The premise is that people like to showcase their cricketing brain and intuition by predicting the winners of matches. IPL consists of 56 league matches, which provides multiple opportunities to keep making predictions based on team composition, ground conditions, performance history etc. So, we are going to create an application where users/players can provide their choices for the upcoming match; we store that data and pull the historical data to date to show insights such as which team is the most/least trusted or which player has the best prediction success rate.
Let’s look at storing data in the database first. This is a relatively small dataset, so there are no practical concerns; however, for a production use case you may want to consider cost and how well the database fits your use case. This Streamlit blog post provides a good reference for getting started with setting up Firestore. For this case, the data is structured as follows. We need to define a ‘collection’, called ‘users’ here, which holds the users/players providing inputs. Each player is represented as a document within this collection. For each player we can then define a set of key-value pairs, similar to a Python dictionary. In this case the keys are match numbers and the values are the user’s predictions. You can define the keys independently for each document, so you are not restricted to a fixed schema, although this use case does not need that flexibility.
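To make the layout concrete, here is a minimal sketch that mirrors the ‘users’ collection as nested Python dictionaries (one inner dictionary per player document) and flattens it into rows for analysis. The player names, the `match_<n>` key format and the `to_rows` helper are hypothetical illustrations, not taken from the app.

```python
from typing import Dict, List, Tuple

# In-memory mirror of the Firestore layout (illustrative data only):
#   collection "users" -> one document per player -> {match key: predicted team}
# Player names and the "match_<n>" key format are hypothetical.
users: Dict[str, Dict[str, str]] = {
    "player_a": {"match_1": "CSK", "match_2": "DC"},
    "player_b": {"match_1": "MI"},  # documents need not share the same keys
}


def to_rows(collection: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Flatten {player: {match: team}} into (player, match, team) rows."""
    rows = []
    for player, predictions in collection.items():
        for match, team in predictions.items():
            rows.append((player, match, team))
    return rows


print(to_rows(users))
# -> [('player_a', 'match_1', 'CSK'), ('player_a', 'match_2', 'DC'), ('player_b', 'match_1', 'MI')]
```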
To interact with Firestore, you will need a JSON key to authenticate. This can be generated in the Firebase console from Project Overview -> Project Settings -> Service Accounts -> Generate Private Key, with the Python option selected. For a production use case, it is recommended to store this key securely rather than committing it to your GitHub repository, since anyone who obtains the key can read or alter your data.
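One common way to keep the key out of the repository is to put its JSON text in an environment variable; on Heroku, config vars set in the Settings tab are exposed to the app as environment variables. The sketch below assumes a hypothetical variable name `FIRESTORE_KEY_JSON` and only loads and sanity-checks the key; it is not how the sample app is configured.

```python
import json
import os

# Hypothetical name of the config var / environment variable holding the key's JSON text.
KEY_ENV_VAR = "FIRESTORE_KEY_JSON"

# Fields present in a Google service-account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_info(env_var: str = KEY_ENV_VAR) -> dict:
    """Read the service-account key from an environment variable instead of a file."""
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    info = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"Service-account key is missing fields: {missing}")
    return info
```

The returned dictionary could then be passed to `google.oauth2.service_account.Credentials.from_service_account_info(info)`, and the resulting credentials to `firestore.Client(project=info["project_id"], credentials=creds)`.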
See the sample code below, which authenticates the connection to the Firestore database using the JSON key and then fetches a reference to the document <player> in the collection ‘users’. You can then read and write the data with the get(), set() and update() methods.
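from google.cloud import firestore
# Authenticate with the service-account key generated above
db = firestore.Client.from_service_account_json("firestorekey.json")
player = "player_a"  # hypothetical player name; in the app this comes from user input
# Reference to the document <player> in the 'users' collection
doc_ref = db.collection("users").document(player)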
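Building on that snippet, here is a minimal sketch of how predictions could be written and read back. It assumes a hypothetical `match_<n>` key format, and `save_prediction`, `load_predictions` and `load_all_predictions` are illustrative helpers rather than code from the app. Using `set(..., merge=True)` creates the player’s document if needed and only touches the given field, whereas a plain `set()` replaces the whole document and `update()` fails if the document does not exist yet.

```python
from typing import Any, Dict


def match_key(match_no: int) -> str:
    # Hypothetical key format for a match field inside a player's document.
    return f"match_{match_no}"


def save_prediction(db: Any, player: str, match_no: int, team: str) -> None:
    """Record one prediction without overwriting the player's earlier ones.

    `db` is expected to be a google.cloud.firestore.Client (or anything exposing
    the same collection().document().set() interface).
    """
    doc_ref = db.collection("users").document(player)
    # merge=True writes only this field and creates the document if it is missing
    doc_ref.set({match_key(match_no): team}, merge=True)


def load_predictions(db: Any, player: str) -> Dict[str, str]:
    """Return {match key: team} for one player, or {} if there is no document."""
    snapshot = db.collection("users").document(player).get()
    return snapshot.to_dict() if snapshot.exists else {}


def load_all_predictions(db: Any) -> Dict[str, Dict[str, str]]:
    """Return {player: {match key: team}} for every document in 'users'."""
    return {doc.id: doc.to_dict() for doc in db.collection("users").stream()}
```

With the client created above, `save_prediction(db, player, 5, "CSK")` records a pick and `load_all_predictions(db)` returns the whole collection for analysis. Passing `db` in as a parameter also makes the helpers easy to exercise against a stub without credentials.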
Let’s get into how to set up your code with Streamlit. You will need the files below; you can see the details on my GitHub.
- Python code (streamlit_app.py) – this is your usual code coupled with Streamlit writes, plots, widgets etc.
- Procfile – take this as-is from my repo, but make sure it refers to the correct Python file name, as this is the execution command. Note that this file has no extension (such as .txt).
- setup.sh – take this as-is from my repo.
- requirements.txt – a list of all the packages that your Python code uses, which Heroku installs during deployment.
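A wrong file name in the Procfile or a package missing from requirements.txt usually only shows up after deployment, as a crashed app, so a small pre-deployment check can save a round trip. The sketch below assumes the commonly used Procfile form `web: sh setup.sh && streamlit run streamlit_app.py`, which may differ from the file in my repo; `procfile_entry_point`, `missing_requirements` and `check_repo` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Iterable, List, Optional

# Assumed Procfile shape (a common Streamlit-on-Heroku pattern, not copied from the repo):
#   web: sh setup.sh && streamlit run streamlit_app.py
_ENTRY_POINT = re.compile(r"^web:.*\bstreamlit\s+run\s+(\S+\.py)")


def procfile_entry_point(procfile_text: str) -> Optional[str]:
    """Return the Python file that the Procfile's web process runs with Streamlit."""
    for line in procfile_text.splitlines():
        match = _ENTRY_POINT.match(line.strip())
        if match:
            return match.group(1)
    return None


def _normalize(name: str) -> str:
    # pip treats case, '-', '_' and '.' as equivalent in package names
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_requirements(requirements_text: str, needed: Iterable[str]) -> List[str]:
    """Return the packages in `needed` that are not listed in requirements.txt."""
    listed = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            # keep only the name before any version specifier, extras or marker
            listed.add(_normalize(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]))
    return [pkg for pkg in needed if _normalize(pkg) not in listed]


def check_repo(repo_dir: str = ".") -> None:
    """Raise if the Procfile does not run an existing Python file in the repo."""
    root = Path(repo_dir)
    target = procfile_entry_point((root / "Procfile").read_text())
    if target is None or not (root / target).is_file():
        raise FileNotFoundError(f"Procfile does not run an existing app file: {target}")
```

For example, `missing_requirements(Path("requirements.txt").read_text(), ["streamlit", "google-cloud-firestore"])` returns whichever of those packages are not listed.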
To deploy the web app we use Heroku. After logging in, you need to configure the deployment: just connect to GitHub and choose the repository for the project (see Figure 2 below). There is an option for automatic deployment, i.e. whenever you push a commit to your GitHub repository, Heroku picks up those changes and reflects them in the web app. If you are doing rapid prototyping, it is best to use the manual deploy branch option at the end of the page instead. You also need to add the Python buildpack in the Settings tab (see Figure 3).
Finally, when you have set everything up, deploy the branch on Heroku and hope that it succeeds. You should now be able to open the app; you can visit my sample app at https://ipl2021challenger.herokuapp.com/. Here is a screenshot showing, on the left-hand panel, a way for players to provide their input, which builds our dataset over time, and on the right-hand side an analysis of the patterns so far. For instance, in this dummy data CSK and DC are the preferred choices, which correlates well with their current top standing in the points table; similar inferences can be drawn for the least preferred teams. The second graph shows the prediction success rate of each player involved.
You can add more widgets to the dashboard; however, a word of caution: this platform is still at a nascent stage, so plenty of features are still unavailable or being worked on, and you may well hit its limitations if you try to customize a lot.
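Both charts boil down to simple aggregations. Below is a minimal sketch in plain Python, assuming predictions are loaded as `{player: {match key: team}}` and that actual winners are available as a separate `{match key: winner}` dictionary; where the results are stored is an assumption, since the setup above does not describe it. `team_preferences` and `success_rates` are hypothetical helpers, and only matches that already have a result count towards a player’s success rate.

```python
from collections import Counter
from typing import Dict, List, Tuple

Predictions = Dict[str, Dict[str, str]]  # {player: {match key: predicted team}}
Results = Dict[str, str]                 # {match key: actual winner} (assumed input)


def team_preferences(predictions: Predictions) -> List[Tuple[str, int]]:
    """Count how often each team was picked, most trusted first."""
    counts = Counter(team for picks in predictions.values() for team in picks.values())
    return counts.most_common()


def success_rates(predictions: Predictions, results: Results) -> Dict[str, float]:
    """Share of each player's predictions that matched the winner.

    Matches without a result yet are ignored; a player with no decided
    matches gets 0.0.
    """
    rates = {}
    for player, picks in predictions.items():
        decided = [match for match in picks if match in results]
        correct = sum(picks[match] == results[match] for match in decided)
        rates[player] = correct / len(decided) if decided else 0.0
    return rates
```

The list from `team_preferences` puts the most trusted team first and the least trusted last, and both outputs could then be passed to Streamlit’s charting functions.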