Use Hugging Face Models

Hugging Face provides pre-trained models, fine-tuning scripts, and development APIs that make it easier to create and discover LLMs. Model Garden can serve Hugging Face models that are supported by Text Embedding Inference, regular PyTorch inference, or Text Generation Inference.

Deployment options for Hugging Face models

You can deploy supported Hugging Face models in Vertex AI or Google Kubernetes Engine (GKE). The deployment option you choose depends on the model you're using and how much control you want over your workloads.

Deploy in Vertex AI

Vertex AI offers a managed platform for building and scaling machine learning projects without in-house MLOps expertise. You can use Vertex AI as the downstream application that serves the Hugging Face models. We recommend using Vertex AI if you want end-to-end MLOps capabilities, value-added ML features, and a serverless experience for streamlined development.

  1. To deploy a supported Hugging Face model in Vertex AI, go to Model Garden.

    Go to Model Garden

  2. Go to the Open source models on Hugging Face section and click Show more.

  3. Find and select a model to deploy.

  4. Optional: For the Deployment environment, select Vertex AI.

  5. Optional: Specify the deployment details.

  6. Click Deploy.

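The console steps above can also be scripted with the Vertex AI SDK for Python. The sketch below uploads a Hugging Face model behind a Text Generation Inference (TGI) serving container and deploys it to a GPU-backed endpoint. The container image URI, the `g2-standard-12` machine shape, and the L4 accelerator choice are assumptions for illustration; substitute the serving container and hardware that fit your model.

```python
def deploy_hf_model(
    project: str,
    location: str,
    hf_model_id: str,
    container_uri: str,
    machine_type: str = "g2-standard-12",
):
    """Upload a Hugging Face model behind a TGI container and deploy it to Vertex AI.

    container_uri is a TGI serving image you have access to (hypothetical here);
    hf_model_id is the Hugging Face model ID, e.g. "google/gemma-2b".
    """
    # Deferred import so the sketch can be read without the SDK installed
    # (pip install google-cloud-aiplatform).
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)

    # TGI reads the model to serve from the MODEL_ID environment variable.
    model = aiplatform.Model.upload(
        display_name=hf_model_id.replace("/", "--"),
        serving_container_image_uri=container_uri,
        serving_container_environment_variables={"MODEL_ID": hf_model_id},
    )

    # Deploy to a GPU endpoint; size the machine and accelerator to the model.
    endpoint = model.deploy(
        machine_type=machine_type,
        accelerator_type="NVIDIA_L4",
        accelerator_count=1,
    )
    return endpoint
```

Deployment can take several minutes; the returned endpoint object can then be used for online predictions.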

Deploy in GKE

Google Kubernetes Engine (GKE) is the Google Cloud solution for managed Kubernetes that provides scalability, security, resilience, and cost effectiveness. We recommend this option if you have existing Kubernetes investments, your organization has in-house MLOps expertise, or if you need granular control over complex AI/ML workloads with unique security, data pipeline, and resource management requirements.

  1. To deploy a supported Hugging Face model in GKE, go to Model Garden.

    Go to Model Garden

  2. Go to the Open source models on Hugging Face section and click Show more.

  3. Find and select a model to deploy.

  4. For the Deployment environment, select GKE.

  5. Follow the deployment instructions.

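Once a model is serving in GKE behind a Text Generation Inference container, clients call its HTTP API. As a minimal sketch, the helpers below build a request body for TGI's `/generate` route and parse its response; the in-cluster service name in the comment is hypothetical.

```python
import json


def build_generate_request(
    prompt: str, max_new_tokens: int = 128, temperature: float = 0.7
) -> str:
    """Serialize a JSON body for TGI's /generate endpoint."""
    body = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }
    return json.dumps(body)


def parse_generate_response(raw: str) -> str:
    """Extract the generated text from a TGI /generate response body."""
    return json.loads(raw)["generated_text"]


# Example: POST this body to the service exposed by your GKE Deployment,
# e.g. http://tgi-service:8080/generate (hypothetical service name).
request_body = build_generate_request("What is Kubernetes?")
```

The same payload shape works whether the service is reached through a ClusterIP Service, an Ingress, or a port-forward during development.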