Why #datalift is hard work
Updated: May 31
Why is an event series on productionizing data analytics and machine learning solutions so valuable?
Deploying models to production typically means building an API for the model and integrating it with existing products or workflows. That way, the model outputs (e.g. predictions, classifications, or recommendations) are actively used by consumers (e.g. customers or internal stakeholders).
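To make "building an API for the model" more concrete, here is a minimal sketch that uses only Python's standard library (`wsgiref`). The scoring rule, the `/predict` route, and the feature names `visits` and `purchases` are hypothetical placeholders for a real trained model. A production service would typically also need authentication, richer input validation, logging, and monitoring.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical stand-in for a trained model: a fixed linear scoring rule.
WEIGHTS = {"visits": 0.3, "purchases": 1.2}
BIAS = -1.0


def predict(features: dict) -> dict:
    """Score one record and turn the score into a classification."""
    score = BIAS + sum(WEIGHTS[name] * float(features.get(name, 0)) for name in WEIGHTS)
    label = "likely_buyer" if score > 0 else "unlikely_buyer"
    return {"score": round(score, 3), "label": label}


def app(environ, start_response):
    """Minimal WSGI endpoint: POST /predict with a JSON object as the body."""
    headers = [("Content-Type", "application/json")]
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']

    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    try:
        result = predict(json.loads(body))
    except (ValueError, TypeError, AttributeError):
        # Malformed JSON, a non-object body, or non-numeric feature values.
        start_response("400 Bad Request", headers)
        return [b'{"error": "invalid input"}']

    payload = json.dumps(result).encode("utf-8")
    start_response("200 OK", headers + [("Content-Length", str(len(payload)))])
    return [payload]


if __name__ == "__main__":
    # Serve the model locally on port 8000 until interrupted.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

With the script running, a request such as `curl -X POST -d '{"visits": 10, "purchases": 1}' http://127.0.0.1:8000/predict` returns a JSON score and label that a product or workflow can consume.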
Watch the recorded live sessions in our #datalift playlist
Deploy! is what #datalift is all about
It is tempting to think a project is done once algorithm development is complete: the model shows promising results, enough to greenlight its integration into the production environment. Yet that integration is the crucial step for businesses to get value from their data.
And it is still hard to achieve, as the following observations show.
2019: An astonishing 87% of data science projects never make it into production (VentureBeat).
2020: A survey reports that 55% of companies had not yet deployed a machine learning model (2020 State of enterprise machine learning).
2021: Times are changing. More than half of all respondents to a new survey have more than 25 models in production (2021 enterprise trends in machine learning).
Siloed infrastructure and legacy systems
Companies often work with legacy and/or closed systems. This means that data and computing resources are not easily or fully accessible to other departments or applications. Without trustworthy integration and standardization, data-driven decisions are slower, of poorer quality, or, in the worst case, impossible.
Opportunities to reuse computing resources are also missed, which increases IT and infrastructure costs and makes delivering business value more difficult.
Leaders looking to drive innovation by leveraging data analytics and ML should shift from silos to centralized data management, built on data and software integration.
The complexity of ML systems
While it's not hard to find job descriptions for full-stack data scientists, most data scientists have no experience managing a model's entire lifecycle; they focus on building algorithms and working with data. Yet model development is only a small part of the entire ML system.
The handcrafted deployment of models with DIY solutions reflects the immaturity of MLOps as a discipline. It also reflects the scarcity of experienced ML software engineers and other more project-focused roles. This happens when data teams focus more on data science tools than on integration and implementation.
Although the barrier to entry for building data solutions is much lower today than it was a few years ago, thanks to open-source tools and frameworks such as TensorFlow and scikit-learn, a high level of MLOps maturity is required to add value with ML.
Investing in MLOps is the way to make the deployment, monitoring, governance, and security of data solutions highly scalable.
Internal processes and ownership
You might be asking now, who should be responsible for model deployment and maintenance if it's not the data scientists?
On traditional SaaS teams, backend or DevOps engineers handle the deployment of code to production (for both practical and audit purposes), as well as the monitoring of production infrastructure and its performance. Should the data or ML engineers be responsible for it, then?
We have seen that approaches vary depending on company and team size. In addition, job titles and scopes are still blurry. This means it matters little whether someone works in a data scientist or an engineering role: both build the ML systems and are held to the same standard of quality.
Need for a cross-functional approach
As with any deep technology, AI projects are more likely to succeed across the organization if C-level executives enable change management and secure commitment from senior leadership.
To help data professionals excel in their roles, leaders need to get alignment across multiple decision-makers and business functions, and different departments need to communicate frequently with one another.
Extra - Confidentiality
A use case that is deployed is interesting for #datalift. But how do we get it on stage?
While companies enjoy talking about their successes in technology, process, and business outcomes, bringing it all on stage comes with its own challenges. We typically perform the following balancing act:
Respecting business interest while seeking as much information as possible on the specific details of the use case
Respecting confidentiality while working to make good and best practice in deployment visible
Creating as much value as possible for the audience, including through interactive features, while also promoting the company's use case.
Occasionally, we have not been able to get a use case on stage. When that happens, we try again, because companies and practitioners advance AI adoption most when they share enough experience and insight to move the data economy forward.
What #datalift achieves
Across more than ten industries, from leading corporations to leading startups and new firms, we have heard from those with use cases in production. With a loyal audience of practitioners, more than half of whom hold senior, lead, director, or CxO roles, the #datalift event series, from No. 1 to No. 5, has become the key forum for exchange on what works and what is best practice.
We are looking to build an ecosystem together with the companies most interested in bridging the gap to deployment.
If you are interested, book a conversation here and we will share more details.