Hugging Face SageMaker Workshop: Accelerate BERT Inference with Knowledge Distillation & AWS Inferentia
This workshop demonstrates how to accelerate BERT inference with optimization techniques such as knowledge distillation, making the model faster while preserving its accuracy. 🏎
In the workshop, you will learn how to apply knowledge distillation to compress a large model into a small one, and then compile the small model into an optimized Neuron model for AWS Inferentia. By the end of this process, the model's latency drops from 100ms+ to 5ms+ - a 20x improvement! 🤯 🏎
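As a rough illustration (not the workshop's exact training code), knowledge distillation trains the small student model to match the teacher's softened output distribution in addition to the ground-truth labels. The temperature and weighting below are assumed, illustrative values.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft-label KL term (illustrative hyperparameters)."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the ground-truth labels
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```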
You will learn how to:
- Apply knowledge distillation with BERT-large as teacher and MiniLM as student
- Compile a Hugging Face Transformer model with AWS Neuron for AWS Inferentia (a compilation sketch follows this list)
- Deploy the distilled & optimized model to Amazon SageMaker for production-grade fast inference
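For the compilation step, a minimal sketch using the torch-neuron package might look like the following; the model name, sequence length, and example sentence are assumptions, and an AWS Neuron SDK environment is required.

```python
import torch
import torch.neuron  # from the AWS Neuron SDK (torch-neuron package for Inferentia/Inf1)
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "my-distilled-minilm"  # hypothetical name for the distilled student model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torchscript=True)

# Neuron compiles against fixed input shapes, so trace with a padded example input
example = tokenizer(
    "a sample sentence",
    max_length=128,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
example_inputs = (example["input_ids"], example["attention_mask"])

# Compile to a Neuron-optimized TorchScript graph that runs on Inferentia
neuron_model = torch.neuron.trace(model, example_inputs)
neuron_model.save("model_neuron.pt")
```

The saved artifact can then be packaged and deployed to a SageMaker endpoint backed by an Inf1 instance type, which is the production-deployment step the workshop walks through.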
This is a hands-on workshop: you will get temporary, free access to AWS accounts so you can participate and accelerate your own model.
Speaker and Presenter Information
Heiko Hotz
AI/ML Solutions Architect @ AWS
Lewis Tunstall
Machine Learning Engineer @ Hugging Face
Philipp Schmid
Tech Lead @ Hugging Face
Relevant Government Agencies
Other Federal Agencies, Federal Government, State & Local Government
Event Type
Webcast
This event has no exhibitor/sponsor opportunities
When
Wed, Apr 13, 2022, 12:00pm - 1:15pm ET
Cost
Complimentary: $0.00
Organizer
Hugging Face | Amazon Web Services (AWS)