AI-assisted Newsletter Dashboard
Developer Advocate, onetask.ai
Moritz is a Data Engineering Master's student who combines his passion for AI and communities as a developer relations advocate at onetask.ai, a startup that provides companies and interested individuals with a solution for all their NLP labeling needs. Having worked as a crowd labeler in high school, he understands that high-quality labels are a necessity for well-performing machine learning solutions. During his studies he worked as a teaching assistant at the University of Potsdam and in the data management department of Berliner Volksbank, one of the largest cooperative banks in Germany. He loves engaging with a community that shares knowledge and experience, especially within the fast-moving domain of Artificial Intelligence, where the next state of the art is just one conference paper away.
This workshop will take you through a real-world application of NLP. We will build a personalized and extendable newsletter dashboard from a given email collection. Within three hours we will scrape a prepared inbox (you can also apply the techniques to your own), clean the texts, convert them into embedding vectors, and craft a feature matrix that will be used for content-based recommendations. All of this then comes together in a comprehensive Streamlit application. The workshop is also accompanied by theoretical background to help you better understand common NLP techniques and their applications.
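To give a flavor of that pipeline, the sketch below compresses "clean, embed, recommend" into plain Python. It is a deliberately simplified stand-in: it uses a bag-of-words term-frequency embedding and cosine similarity rather than the transformer embeddings covered in the workshop, and all function names and sample texts are illustrative, not taken from the workshop materials.

```python
import math
import re
from collections import Counter

def clean(text: str) -> list[str]:
    """Lowercase, strip HTML-ish tags, and keep only word tokens."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return re.findall(r"[a-z]+", text)

def embed(tokens: list[str], vocab: list[str]) -> list[int]:
    """Map a token list to a term-frequency vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(query: str, newsletters: dict[str, str]) -> str:
    """Return the name of the newsletter most similar to the query."""
    docs = {name: clean(body) for name, body in newsletters.items()}
    vocab = sorted({tok for tokens in docs.values() for tok in tokens})
    q_vec = embed(clean(query), vocab)
    return max(docs, key=lambda name: cosine(q_vec, embed(docs[name], vocab)))

inbox = {
    "ml-weekly": "New transformer models and fine-tuning tricks for NLP.",
    "cooking-digest": "Seasonal recipes, baking tips and kitchen tools.",
}
print(recommend("latest NLP transformer news", inbox))  # → ml-weekly
```

In the workshop, the `embed` step is replaced by pretrained embedding models, but the surrounding recipe (clean the text, vectorize it, rank by similarity) stays the same.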
By the end of the workshop, you will understand some fundamental NLP concepts and be able to apply them to build your own applications using only Python together with popular frameworks like Streamlit and FastAPI.
We divided the use case into several stages; for each, we will first build a theoretical foundation and then dive into the hands-on coding part. We prepare the data for the key steps beforehand, so you can still follow the workshop even if you do not manage to code everything in time. The stages are:
Data export from Gmail
Building training data
Fine-tuning existing models
Building a fast and minimal backend using FastAPI
Finishing up with the Streamlit dashboard
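As a taste of the first stage, messages exported from Gmail (for example as an mbox file via Google Takeout) can be parsed with Python's standard-library `email` tooling. The raw message below is made up for illustration; it is not part of the prepared inbox.

```python
from email import message_from_string
from email.message import Message

def extract_text(msg: Message) -> str:
    """Return the plain-text body of a (possibly multipart) email."""
    if msg.is_multipart():
        parts = [extract_text(part) for part in msg.get_payload()]
        return "\n".join(p for p in parts if p)
    if msg.get_content_type() == "text/plain":
        return msg.get_payload()
    return ""

# A minimal fabricated message in RFC 822 format.
raw = """\
From: news@example.com
To: you@example.com
Subject: Weekly NLP Digest
Content-Type: text/plain

This week: tokenizers, embeddings, and transformer fine-tuning.
"""

msg = message_from_string(raw)
print(msg["Subject"])             # → Weekly NLP Digest
print(extract_text(msg).strip())
```

The subject line and extracted body are exactly the fields the later stages clean and embed, so this parsing step feeds directly into building the training data.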
Since this is a hands-on workshop, you should bring your own laptop or have a friend with whom you can share the session.
The libraries we use are all written in Python, so you should be comfortable writing Python code and know your way around a Jupyter notebook.
We will host the necessary files on a GitHub page that also contains installation instructions for setting up a virtual Python environment. Setting up the environment before the session helps us focus our time on the fun parts!
You do not need prior knowledge of Natural Language Processing, though it certainly helps. Even if you have not heard of "embeddings", "tokenizers", or "transformer models" yet, you should still be able to follow the workshop.
If you can't run Jupyter notebooks on your own device, or your machine is very RAM-restricted, we recommend having a Google account so that you can use Google Colab for free notebook access.