# Inference CLI

The following are the CLI commands for ScaleGenAI Inference.
| Function | Description |
| --- | --- |
| create | Launch an inference job. |
| list | List launched inference jobs. |
| start | Restart an inference job once it has scaled to zero. |
| delete | Delete an inference job. |
## create

Run this command to create an inference job:

scalegen infer create [args]
The command accepts the following arguments:

- `model` [required = true] :: string : HuggingFace model name.
- `max_price_per_hour` [required = true] :: int : Maximum price per hour.
- `allow_spot_instances` [required = false] :: bool : Whether to use spot instances for inference.
- `name` [required = true] :: string : The name of the deployment job.
- `hf_token` [required = false] :: string : HuggingFace token (required when using a private repository model).
- `logs_bucket` [required = true] :: string : Name of the artifacts storage bucket.
### Example
scalegen infer create \
--name "test-inference-job" \
--model "mistralai/Mistral-7B-Instruct-v0.2" \
--max_price_per_hour 20 \
--allow_spot_instances true \
--hf_token "your_huggingface_token" \
--logs_bucket "your_artifacts_storage"
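For scripting, the flags above can be assembled programmatically before invoking the CLI. A minimal Python sketch, assuming `scalegen` is on the PATH; the helper name and its behavior are illustrative, not part of any ScaleGenAI SDK:

```python
import shlex

def build_create_command(name, model, max_price_per_hour, logs_bucket,
                         allow_spot_instances=False, hf_token=None):
    """Assemble the argv for `scalegen infer create` from the documented flags.

    Illustrative helper only; it mirrors the flags listed above.
    """
    argv = [
        "scalegen", "infer", "create",
        "--name", name,
        "--model", model,
        "--max_price_per_hour", str(max_price_per_hour),
        "--logs_bucket", logs_bucket,
    ]
    if allow_spot_instances:
        argv += ["--allow_spot_instances", "true"]
    if hf_token:  # only needed for private HuggingFace repositories
        argv += ["--hf_token", hf_token]
    return argv

cmd = build_create_command(
    name="test-inference-job",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    max_price_per_hour=20,
    logs_bucket="your_artifacts_storage",
    allow_spot_instances=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell-quoting issues with tokens and model names.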
## list
Run this command to list your running inference deployments:

scalegen infer list

To print deployment details, use the `-v` or `--verbose` flag:

scalegen infer list -v
## start

Run this command to start an inference job once it has been scaled to zero:

scalegen infer start <INF_ID>
### Example
scalegen infer start test_job_id
## delete

Run this command to delete the inference deployment:

scalegen infer delete <INF_ID>
### Example
scalegen infer delete test_job_id
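The per-deployment commands above all take the same inference ID, so a deployment's lifecycle can be driven from a script. A hedged sketch that only constructs the documented command lines; the ID itself would come from the output of `create` or `list`, and actually executing these (e.g. via `subprocess.run`) is left to the caller:

```python
def lifecycle_commands(inf_id):
    """Return the documented per-deployment commands for a given inference ID.

    Illustrative helper; it only covers the subcommands shown in this page.
    """
    return {
        "list": ["scalegen", "infer", "list", "-v"],
        "start": ["scalegen", "infer", "start", inf_id],
        "delete": ["scalegen", "infer", "delete", inf_id],
    }

for step, argv in lifecycle_commands("test_job_id").items():
    print(step, "->", " ".join(argv))
```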