Finetune Guide
ScaleGenAI lets you fine-tune LLMs on your own domain-specific datasets. It is compatible with the HuggingFace Autotrain API; for in-depth fine-tuning guides, refer to the HF Autotrain LLM Finetuning docs.
Refer to the Autotrain Parameters section below for information on the individual fine-tuning parameters.
For the CLI and API guides, refer to the following:
Model Configuration
Choose a model: Specify the HuggingFace model repository for the model that you want to fine-tune.
HuggingFace Access Token: Your HF access token. Refer to this guide to get your HF token.
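Before launching a job, you can verify that your access token works and that the model repository is reachable using the huggingface_hub client. A minimal sketch (the token and model id below are placeholders):

```python
# Minimal sketch: check that your HF access token is valid and that the
# model repository you want to fine-tune is accessible.
from huggingface_hub import login, model_info

login(token="hf_xxx")                     # your HuggingFace access token
info = model_info("your-org/your-model")  # placeholder model repository id
print(info.id)
```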
Data Configuration
Your training dataset can come from a HuggingFace dataset repository, or from your own remote/local data attached as a virtual mount. A quick way to inspect either source is sketched after the list below.
- HuggingFace
Dataset Name: Specify the HF dataset repository, e.g. argilla/distilabel-capybara-dpo-7k-binarized
Train Subset: Specify the name of the training split, e.g. train
Text Column: Specify the column with the training data text.
- Virtual Mount
Choose a virtual mount: Select the virtual mount from the dropdown. More information on how to configure a virtual mount is available here.
File Name: Specify the data file on the virtual mount.
Train Subset: Specify the name of the training split, e.g. train
Text Column: Specify the column with the training data text.
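For either option, it helps to confirm the split name and text column before submitting the job. A minimal sketch with the datasets library (the virtual-mount path below is hypothetical, and the file is assumed to be JSON Lines):

```python
# Minimal sketch: inspect the split and text column of your training data.
from datasets import load_dataset

# Option 1: a HuggingFace dataset repository and its "train" split
hub_ds = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")
print(hub_ds.column_names)   # the text column you specify must appear here

# Option 2: a file on a virtual mount (hypothetical path, JSON Lines assumed)
local_ds = load_dataset("json", data_files="/mnt/my-mount/train.jsonl", split="train")
print(local_ds.column_names)
```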
Storage Configuration
Specify where the model checkpoints and final model files should be stored once the fine-tuning job completes.
Push to HuggingFace Model Hub: When enabled, the fine-tuned model is pushed to a private HF model repository. This also gives you version control over the model.
Checkpoint Store: Choose an artifacts store to write your model to. More information on how to configure an artifacts/checkpoint store is available here.
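If you enable Push to HuggingFace Model Hub, ScaleGenAI handles the upload for you; an equivalent manual push with transformers looks roughly like this (the checkpoint path and repository id are placeholders):

```python
# Minimal sketch: pushing a fine-tuned model to a private HF model repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-checkpoint")

model.push_to_hub("your-org/your-finetuned-model", private=True)
tokenizer.push_to_hub("your-org/your-finetuned-model", private=True)
```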
Experiment Tracking
Choose between Weights & Biases and CometML to log your training results and metrics.
Here are the respective guides to get the API keys.
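Both trackers are initialized with an API key. A minimal sketch (project names and keys are placeholders):

```python
# Minimal sketch: initializing either experiment tracker with its API key.
import os

# Weights & Biases
import wandb
os.environ["WANDB_API_KEY"] = "your-wandb-api-key"
wandb.init(project="scalegen-finetune")  # placeholder project name

# CometML (alternative)
# from comet_ml import Experiment
# experiment = Experiment(api_key="your-comet-api-key", project_name="scalegen-finetune")
```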
GPU Configuration
Choose GPU Type: Choose the preferred GPU type for the fine-tuning job.
No. of GPUs: Select the number of GPUs.
Cloud Regions: You can restrict the fine-tuning job to regions of your choice, based on data-jurisdiction requirements. By default, all regions are selected and ScaleGenAI picks the cheapest GPUs that match your desired configuration.
You'll get approximate price estimates for your chosen configuration. Click the Update button to apply a new configuration.
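For orientation only, the GPU selection roughly corresponds to a resource spec like the sketch below; the field names are hypothetical and not ScaleGenAI's actual schema.

```python
# Hypothetical resource spec for illustration only; not the actual ScaleGenAI schema.
gpu_config = {
    "gpu_type": "A100_80GB",                        # preferred GPU type
    "num_gpus": 4,                                  # number of GPUs for the job
    "allowed_regions": ["us-east-1", "eu-west-1"],  # optional data-jurisdiction restriction
}
```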
Autotrain Parameters
- Epochs (int): Number of training epochs. Default is 1.
- Learning Rate (float): Learning rate for training. Default is 3e-5.
- Batch Size (int): Batch size for training. Default is 2.
- Block Size (Union[int, List[int]]): Size of the blocks for training; can be a single integer or a list of integers. Default is -1.
- Model Max Length (int): Maximum length of the model input.
- Seed (int): Random seed for reproducibility. Default is 42.
- Gradient Accumulation (int): Number of steps to accumulate gradients before updating. Default is 1.
- Mixed Precision (Optional[str]): Type of mixed precision to use (e.g., 'fp16', 'bf16', or None). Default is None.
- Quantization (Optional[str]): Quantization method to use (e.g., 'int4', 'int8', or None). Default is 'nf4'.
- Torch dtype (str): PyTorch data type. Default is 'auto'.
- Use Deepspeed (bool): Whether to use the DeepSpeed distributed backend or not. If disabled, DDP will be used.
- Disable Gradient Checkpointing (bool): Whether to disable gradient checkpointing. Default is False.
- Use FlashAttention2 (bool): Whether to use flash attention version 2. Default is False.
- LoRA (bool): Whether to use PEFT for fine-tuning.
- LoRA R (int): Rank of the LoRA matrices. Default is 16.
- LoRA Alpha (int): Alpha parameter for LoRA. Default is 32.
- LoRA Dropout (float): Dropout rate for LoRA. Default is 0.05.
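Taken together, these settings correspond to a parameter set like the sketch below. The key names simply mirror the labels above for readability and are not the exact Autotrain field names; values shown are the stated defaults unless noted.

```python
# Illustrative parameter set; keys mirror the labels above, not exact Autotrain field names.
finetune_params = {
    "epochs": 1,
    "learning_rate": 3e-5,
    "batch_size": 2,
    "block_size": -1,                        # single int or list of ints
    "model_max_length": 2048,                # example value; no stated default
    "seed": 42,
    "gradient_accumulation": 1,
    "mixed_precision": None,                 # "fp16", "bf16", or None
    "quantization": "nf4",                   # "int4", "int8", or None also accepted
    "torch_dtype": "auto",
    "use_deepspeed": False,                  # DDP is used when disabled
    "disable_gradient_checkpointing": False,
    "use_flash_attention_2": False,
    "lora": True,                            # fine-tune with PEFT/LoRA
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
}
```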