Hugging Face SageMaker Workshop:

Accelerate BERT Inference with Knowledge Distillation & AWS Inferentia


This workshop demonstrates how to accelerate BERT inference with optimization techniques such as knowledge distillation, making the model faster while preserving its accuracy. 🏎

 

In the workshop, you will learn how to apply knowledge distillation to compress a large model into a small one, and then compile that small model into an optimized Neuron model for AWS Inferentia. By the end of this process, latency drops from 100ms+ to 5ms+, a roughly 20x improvement! 🤯 🏎
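To give a flavor of the distillation step, here is a minimal sketch of a distillation loss in PyTorch. The checkpoint names, temperature, and loss weight `alpha` are illustrative assumptions, not the exact workshop setup.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification

# Teacher: a (fine-tuned) BERT-large; student: MiniLM.
# Checkpoint names are examples, not the workshop's exact models.
teacher = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased")
student = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/MiniLM-L12-H384-uncased"
)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend cross-entropy on the labels with a KL term that pushes the
    student's softened predictions toward the teacher's.
    temperature and alpha are illustrative values."""
    # Standard cross-entropy against the ground-truth labels
    ce_loss = F.cross_entropy(student_logits, labels)
    # KL divergence between softened student and teacher distributions
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce_loss + (1.0 - alpha) * kd_loss
```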

 

You will learn how to:

  • Apply knowledge distillation with BERT-large as teacher and MiniLM as student
  • Compile a Hugging Face Transformer model with AWS Neuron for AWS Inferentia (see the compilation sketch below)
  • Deploy the distilled & optimized model to Amazon SageMaker for fast, production-grade inference (see the deployment sketch below)
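
A minimal sketch of the compilation step from the second bullet, assuming the `torch-neuron` package is installed; the model name, sequence length, and output path are placeholders.

```python
import torch
import torch_neuron  # AWS Neuron SDK extension for PyTorch (Inferentia / inf1)
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "my-distilled-minilm"  # placeholder for your distilled student model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torchscript=True)

# Neuron compiles a traced graph, so it needs example inputs with a fixed shape
example = tokenizer(
    "Hello, Inferentia!",
    max_length=128,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
example_inputs = (example["input_ids"], example["attention_mask"])

# Trace/compile the model for Inferentia and save the Neuron artifact
model_neuron = torch.neuron.trace(model, example_inputs)
model_neuron.save("model_neuron.pt")
```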

This is a hands-on workshop: you will get temporary, free access to AWS accounts so you can participate and accelerate your own model.
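And a sketch of the deployment step from the last bullet, using the SageMaker Python SDK's Hugging Face support. The S3 path, IAM role, and framework versions are placeholders to replace with your own; serving a compiled Neuron model may also require a custom inference script in the model archive.

```python
from sagemaker.huggingface import HuggingFaceModel

# model_data points to a model.tar.gz containing the compiled Neuron model
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/model_neuron.tar.gz",          # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder
    transformers_version="4.12",
    pytorch_version="1.9",
    py_version="py37",
)

# Deploy to an Inferentia-backed (inf1) endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf1.xlarge",
)

print(predictor.predict({"inputs": "I love this workshop!"}))
```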

Speaker and Presenter Information

Heiko Hotz
AI/ML Solutions Architect @ AWS

 

Lewis Tunstall
Machine Learning Engineer @ Hugging Face

 

Philipp Schmid
Tech Lead @ Hugging Face

Relevant Government Agencies

Other Federal Agencies, Federal Government, State & Local Government


Event Type
Webcast


This event has no exhibitor/sponsor opportunities


When
Wed, Apr 13, 2022, 12:00pm - 1:15pm ET


Cost
Complimentary: $0.00




Organizer
Hugging Face | Amazon Web Services (AWS)




