Deploy Llama-3.1-8B-Instruct


You can launch an inference job with Llama-3.1-8B-Instruct using the quick-launch templates.

info

Templates come with pre-filled configurations optimized for high throughput and low latency. You can modify the configuration to suit your requirements.

To launch the inference job, head over to the Inference tab.

You can select Llama-3.1-8B-Instruct from one of the available templates.

Alternatively, click the + New Inference button and choose Llama, then select meta-llama/Llama-3.1-8B-Instruct.

Enter your HuggingFace token. This token is used to pull the model weights from the HF Llama repository.

note

All of Meta's open Llama models are gated behind a community license agreement. If you have not accepted the agreement before launching the inference job, the deployment will fail.

Head over to the model card on HuggingFace and agree to the T&C to proceed.
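Before deploying, you can optionally check that your token can read the gated repository, which confirms the license has been accepted. This is a minimal sketch using only the Python standard library against the public Hugging Face model API; the token value shown is a placeholder, and the interpretation of a 401/403 response as "license not accepted" is an assumption about the API's behavior for gated repos.

```python
import urllib.error
import urllib.request

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"


def build_check_request(hf_token: str) -> urllib.request.Request:
    """Build an authenticated GET request against the HF model API."""
    return urllib.request.Request(
        f"https://huggingface.co/api/models/{MODEL_ID}",
        headers={"Authorization": f"Bearer {hf_token}"},
    )


def has_access(hf_token: str) -> bool:
    """Return True if the token can read the gated repo."""
    try:
        with urllib.request.urlopen(build_check_request(hf_token)) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # A 401/403 here typically means the token is invalid or the
        # community license has not been accepted yet (assumption).
        return False


if __name__ == "__main__":
    print("access ok" if has_access("<your-hf-token>") else "no access")
```

If the check fails, accept the license on the model card first, then re-run it before launching the job.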

Click the Deploy Model button. The model will be up and running after a few minutes of provisioning.

And just like that, you have your own dedicated, private Llama-3.1-8B-Instruct deployment. Once the deployment is in the Running state, grab the model endpoint and model API key, plug them into the OpenAI SDK, and query the model.