AI-assisted Newsletter Dashboard
Co-Founder and CEO, onetask.ai
Johannes Hötter is a data engineer/scientist and co-founder of onetask.ai, a development environment for AI training data that lets data scientists easily build large-scale, high-quality training data. He is deeply passionate about AI and education and has taught more than 20,000 people the fundamentals of AI through online courses. Before founding onetask, he worked as an AI consultant on projects such as weather forecasting, database chatbots (translating natural language to SQL), and reimbursement prediction in e-commerce systems.
This workshop will take you through a real-world application of NLP. We will build a personalized and extendable newsletter dashboard from a given email collection. Within 3 hours we will scrape a prepared inbox (you can also apply the techniques to your own), clean the texts, convert them into embedding vectors, and craft a feature matrix that will be used for content-based recommendations. All of this will then come together in a comprehensive Streamlit application. The workshop is accompanied by theory segments to help you better understand common NLP techniques and their applications.
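The pipeline described above (clean texts → feature matrix → content-based recommendations) can be sketched in a few lines of scikit-learn. This is a minimal stand-in, not the workshop's actual code: it uses TF-IDF instead of the transformer embeddings covered in the session, and toy snippets in place of the scraped inbox.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy newsletter snippets standing in for the scraped and cleaned inbox
emails = [
    "Weekly machine learning digest: new transformer papers",
    "Your invoice for March is attached",
    "Deep learning newsletter: attention models explained",
]

# Turn the cleaned texts into a feature matrix (TF-IDF here;
# the workshop uses embedding vectors instead)
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(emails)

# Content-based recommendation: rank all emails by similarity to the first one
scores = cosine_similarity(features[0], features).ravel()
ranking = scores.argsort()[::-1]
print(ranking)  # the ML digest ranks the DL newsletter above the invoice
```

Swapping the vectorizer for a sentence-embedding model changes nothing about this ranking logic, which is why the feature-matrix abstraction is useful.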
By the end of the workshop, you will understand some fundamental NLP concepts and be able to apply them to build your own applications using only Python with popular frameworks like Streamlit or FastAPI.
We divided the use case into five categories; for each, we will first build a theoretical foundation and then dive into the hands-on coding part. We prepare the data for the key steps beforehand, so you can still follow the workshop even if you do not manage to code everything in time. The five categories are:
Data export from Gmail
Building training data
Fine-tuning existing models
Building a fast and minimal backend using FastAPI
Finishing up with the Streamlit dashboard
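For the first category, Gmail data typically arrives via Google Takeout as an mbox archive. A minimal sketch of parsing such an export with only the Python standard library follows; the file name and helper names are illustrative, not taken from the workshop materials.

```python
import mailbox
from email.header import decode_header


def clean_subject(raw):
    """Decode an RFC 2047 encoded subject line into plain text."""
    parts = decode_header(raw or "")
    return "".join(
        text.decode(charset or "utf-8", errors="replace")
        if isinstance(text, bytes) else text
        for text, charset in parts
    )


def load_inbox(path):
    """Yield (subject, body) pairs from a Gmail Takeout mbox export."""
    for message in mailbox.mbox(path):
        subject = clean_subject(message["Subject"])
        if message.is_multipart():
            # Walk multipart messages and keep only the plain-text parts
            body = "".join(
                part.get_payload(decode=True).decode("utf-8", errors="replace")
                for part in message.walk()
                if part.get_content_type() == "text/plain"
            )
        else:
            body = message.get_payload(decode=True).decode("utf-8", errors="replace")
        yield subject, body
```

The resulting (subject, body) pairs are exactly the raw texts that the cleaning and embedding steps operate on.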
Since this is a hands-on workshop, you should bring your own laptop or have a friend with whom you can share the session.
The libraries we use are all written in Python, so you should be comfortable writing Python code and know your way around a Jupyter notebook.
We will host the necessary files on a GitHub page that also contains installation instructions for setting up a virtual Python environment. Setting up the environment before the session helps us focus our time on the fun parts!
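The setup boils down to a standard virtual environment. These are generic commands, not the repo's own instructions; follow the README on the GitHub page for the exact steps and package list.

```shell
# Create an isolated Python environment for the workshop
python3 -m venv venv

# Activate it (Linux/macOS; on Windows use venv\Scripts\activate)
. venv/bin/activate

# Then install the dependencies listed in the repo, e.g.:
# pip install -r requirements.txt
```

Doing this before the session means the first notebook runs immediately instead of waiting on downloads.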
You do not need prior knowledge of Natural Language Processing, though it certainly helps. Even if you haven't heard of "embeddings", "tokenizers", or "transformer models" yet, you will still be able to follow the workshop.
If you can't run Jupyter notebooks on your own device or are very RAM-restricted, we recommend having a Google account so that you can use Google Colab for free notebook access.