Use case in production
Neural network optimization for inference deployments
Sarosh Quraishi
Machine Learning Specialist, Intel
Sarosh Quraishi is a machine learning specialist at Intel, where he currently works with customers to solve deployment challenges for deep learning models. He holds a Ph.D. from the Indian Institute of Technology and completed a postdoc in applied mathematics at TU Berlin, where he worked on parametric eigenvalue problems.

Session description
Day by day, deep learning models are growing larger and becoming harder to deploy. We will cover the issues developers face when deploying deep learning models and how to address them with popular network compression techniques such as quantization, pruning, and knowledge distillation. These techniques reduce model size and improve performance metrics such as latency and throughput. We will also introduce the Intel® Neural Compressor and show how it works seamlessly with a variety of frameworks.
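To give a feel for the core idea behind quantization, here is a minimal NumPy sketch of symmetric int8 post-training quantization: weights are mapped to 8-bit integers plus a single float scale, cutting storage 4x at the cost of a bounded rounding error. This is an illustrative toy, not Intel Neural Compressor's API; the tool automates calibration, per-channel scales, and accuracy-aware tuning across frameworks.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric int8 quantization: map float32 weights to int8 + one scale."""
    scale = np.abs(weights).max() / 127.0          # symmetric range [-127, 127]
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)                  # int8 uses 1/4 the bytes of float32
print(np.abs(w - w_hat).max() <= scale)     # rounding error is bounded by the scale
```

In practice the error bound (at most half the scale per weight) is why int8 inference typically loses little accuracy, and why calibration, choosing scales from representative data, matters.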