
Inference CLI


The following are the CLI commands for ScaleGenAI Inference.

| Function | Description |
| --- | --- |
| create | Launch an inference job. |
| list | List launched inference jobs. |
| start | Restart an inference job once scaled to zero. |
| delete | Delete an inference job. |

create

Run this command to create an inference job.

scalegen infer create [args]

The command accepts the following arguments:

  • name [required] :: string : The name of the deployment job.
  • model [required] :: string : HuggingFace model name.
  • max_price_per_hour [required] :: int : Maximum price per hour.
  • logs_bucket [required] :: string : Name of the artifacts storage bucket.
  • allow_spot_instances [optional] :: bool : Whether to use spot instances for inference.
  • hf_token [optional] :: string : HuggingFace token (required when using a private repository model).

Example

scalegen infer create \
--name "test-inference-job" \
--model "mistralai/Mistral-7B-Instruct-v0.2" \
--max_price_per_hour 20 \
--allow_spot_instances true \
--hf_token "your_huggingface_token" \
--logs_bucket "your_artifacts_storage"
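If you are scripting deployments, the command above can be assembled programmatically. The sketch below is a hypothetical helper, not part of ScaleGenAI: it only builds the argv list from the documented flags, leaving execution (e.g. via `subprocess.run`) to the caller.

```python
# Hypothetical helper that assembles a `scalegen infer create` command line.
# The flags mirror the argument list documented above; build_create_command
# itself is illustrative and not provided by the ScaleGenAI CLI.
import shlex


def build_create_command(name, model, max_price_per_hour, logs_bucket,
                         allow_spot_instances=False, hf_token=None):
    """Return the argv list for `scalegen infer create`."""
    # Required arguments.
    cmd = [
        "scalegen", "infer", "create",
        "--name", name,
        "--model", model,
        "--max_price_per_hour", str(max_price_per_hour),
        "--logs_bucket", logs_bucket,
    ]
    # Optional arguments.
    if allow_spot_instances:
        cmd += ["--allow_spot_instances", "true"]
    if hf_token:  # only needed for private HuggingFace repositories
        cmd += ["--hf_token", hf_token]
    return cmd


print(shlex.join(build_create_command(
    name="test-inference-job",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    max_price_per_hour=20,
    logs_bucket="your_artifacts_storage",
    allow_spot_instances=True,
)))
```

Keeping the command as a list (rather than one shell string) avoids quoting issues when the job name or token contains spaces or special characters.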

list

Run this command to list your running inference deployments.

scalegen infer list

To print deployment details, use the -v or --verbose flag:

scalegen infer list -v

start

Run this command to start an inference job once it has been scaled to zero.

scalegen infer start <INF_ID>

Example

scalegen infer start test_job_id

delete

Run this command to delete an inference deployment.

scalegen infer delete <INF_ID>

Example

scalegen infer delete test_job_id