Inferences

Inference workloads deploy trained models into a production environment to generate predictions from live data. These workloads are prioritized over Trainings and Workspaces during scheduling. NVIDIA Run:ai Inference workloads support auto-scaling to maintain service-level agreements (SLAs) by dynamically adjusting resources as demand changes.

Create an inference.

Create an inference using container-related fields.

Security: bearerAuth
Request
Request Body schema: application/json
name
required
string (WorkloadName) non-empty

The name of the workload.

useGivenNameAsPrefix
boolean
Default: false

When true, the requested name will be treated as a prefix. The final name of the workload will be composed of the name followed by a random set of characters.

projectId
required
string (ProjectId2)

The id of the project.

clusterId
required
string <uuid> (ClusterId)

The id of the cluster.

spec
object or null (InferenceSpecSpec)
Responses
202

Request completed successfully.

400

Bad request.

401

Unauthorized

403

Forbidden

503

Unexpected error

POST /api/v1/workloads/inferences
Request samples
application/json
{
  "name": "my-workload-name",
  "useGivenNameAsPrefix": true,
  "projectId": 1,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "spec": { }
}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "deletedAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": { }
}
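
As a rough illustration of the request described above, the following Python sketch builds (without sending) the POST request using only the standard library. The base URL, the token placeholder, and the helper name `build_create_inference_request` are assumptions made for this example and are not part of the API. The schema lists `projectId` as a string while the request sample shows a bare `1`; the sketch passes a string. The contents of `spec` are collapsed in the samples, so the caller supplies them.

```python
import json
import urllib.request

# Hypothetical values: replace with your own control-plane URL and API token.
BASE_URL = "https://runai.example.com"
TOKEN = "<your-bearer-token>"


def build_create_inference_request(name, project_id, cluster_id, spec,
                                   use_given_name_as_prefix=False):
    """Build (but do not send) a POST request that creates an inference."""
    if not name:
        # The schema marks `name` as a required, non-empty string.
        raise ValueError("name must be a non-empty string")
    body = {
        "name": name,
        "useGivenNameAsPrefix": use_given_name_as_prefix,
        "projectId": project_id,
        "clusterId": cluster_id,
        "spec": spec,  # object or null; its fields are not shown in this section
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/inferences",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


request = build_create_inference_request(
    name="my-workload-name",
    project_id="1",
    cluster_id="71f69d83-ba66-4822-adf5-55ce55efd210",
    spec={},  # placeholder; fill in according to InferenceSpecSpec
    use_given_name_as_prefix=True,
)
```

Passing `request` to `urllib.request.urlopen` would send it; a 202 status indicates the request succeeded, per the response list above.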

Delete an inference.

Delete an inference using a workload id.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Responses
202

Accepted.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

DELETE /api/v1/workloads/inferences/{workloadId}
Response samples
application/json
{
  "code": 202,
  "message": "Request has been accepted."
}
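
A minimal sketch of the delete call, assuming a hypothetical base URL and token placeholder. The helper names are invented for illustration. The status table simply restates the responses listed above, and the UUID check is a client-side convenience rather than a documented requirement.

```python
import urllib.request
import uuid

# Hypothetical values: replace with your own control-plane URL and API token.
BASE_URL = "https://runai.example.com"
TOKEN = "<your-bearer-token>"

# Status codes and descriptions as listed for this endpoint.
DELETE_RESPONSES = {
    202: "Accepted",
    401: "Unauthorized",
    403: "Forbidden",
    404: "The specified resource was not found",
    500: "Unexpected error",
    503: "Unexpected error",
}


def build_delete_inference_request(workload_id):
    """Build (but do not send) a DELETE request for one inference."""
    # uuid.UUID raises ValueError on a malformed id, before any network call.
    canonical_id = str(uuid.UUID(workload_id))
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/inferences/{canonical_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )


def describe_delete_status(status):
    """Translate an HTTP status code into the description documented above."""
    return DELETE_RESPONSES.get(status, f"Undocumented status {status}")


request = build_delete_inference_request("06d16c5d-4728-42fa-b573-3b11820d999f")
```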

Get inference data.

Retrieve inference details using a workload id.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Responses
200

Executed successfully.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

GET /api/v1/workloads/inferences/{workloadId}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "deletedAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": { }
}
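
The sketch below shows one way a client might build the GET request and pull a few fields out of a response shaped like the sample above. The base URL, token placeholder, and both helper names are assumptions for this example; the payload used here is a trimmed copy of the documented sample rather than live output.

```python
import json
import urllib.request

# Hypothetical values: replace with your own control-plane URL and API token.
BASE_URL = "https://runai.example.com"
TOKEN = "<your-bearer-token>"


def build_get_inference_request(workload_id):
    """Build (but do not send) a GET request for one inference."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/inferences/{workload_id}",
        method="GET",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )


def summarize_inference(payload):
    """Pick the identifying fields and the two phase fields out of a response."""
    return {
        "name": payload["name"],
        "workloadId": payload["workloadId"],
        "desiredPhase": payload.get("desiredPhase"),
        "actualPhase": payload.get("actualPhase"),
    }


# Trimmed copy of the response sample above.
sample_body = """
{
  "name": "my-workload-name",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "desiredPhase": "Running",
  "actualPhase": "Creating"
}
"""

request = build_get_inference_request("06d16c5d-4728-42fa-b573-3b11820d999f")
summary = summarize_inference(json.loads(sample_body))
```

Comparing `desiredPhase` with `actualPhase` in the summary is one simple way to see whether the workload has reached the requested state.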

Update inference spec. [Experimental]

Update the specification of an existing inference workload.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Request Body schema: application/json
spec
object or null (CommonFlatFields)
Responses
202

Executed successfully.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

PATCH /api/v1/workloads/inferences/{workloadId}
Request samples
application/json
{
  "spec": { }
}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "deletedAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": { }
}
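
A possible shape for the update call, sketched with the standard library. The base URL, token placeholder, and helper name are assumptions. Because the endpoint is marked experimental and the fields inside `spec` are collapsed in the samples, the sketch only wraps whatever dictionary the caller supplies and does not guess at field names.

```python
import json
import urllib.request

# Hypothetical values: replace with your own control-plane URL and API token.
BASE_URL = "https://runai.example.com"
TOKEN = "<your-bearer-token>"


def build_update_inference_request(workload_id, spec):
    """Build (but do not send) a PATCH request that updates an inference spec."""
    # The schema describes the body field as "object or null".
    if spec is not None and not isinstance(spec, dict):
        raise TypeError("spec must be a dict or None")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/inferences/{workload_id}",
        data=json.dumps({"spec": spec}).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


request = build_update_inference_request(
    "06d16c5d-4728-42fa-b573-3b11820d999f",
    spec={},  # placeholder; the updatable fields are not listed in this section
)
```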

Get inference metrics data.

Retrieve metrics data for an inference by workload id. Supported in control-plane version 2.18 and later.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

query Parameters
metricType
required
Array of strings (InferenceWorkloadMetricType)

Specify which data to request.

Items Enum: "THROUGHPUT" "LATENCY"
start
required
string <date-time>

Start of the time range to fetch data for, as an ISO 8601 timestamp.

Example: start=2023-06-06T12:09:18.211Z
end
required
string <date-time>

End of the time range to fetch data for, as an ISO 8601 timestamp.

Example: end=2023-06-07T12:09:18.211Z
numberOfSamples
integer [ 0 .. 1000 ]
Default: 20

The number of samples to take in the specified time range.

Example: numberOfSamples=20
Responses
200

Executed successfully.

207

Partial success.

400

Bad request.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

GET /api/v1/workloads/inferences/{workloadId}/metrics
Response samples
{
  "measurements": [ ]
}
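
The query parameters above can be assembled as in the following sketch. The base URL and helper names are assumptions. The sketch also assumes that the `metricType` array is sent as a repeated query parameter, which this section does not confirm; a comma-separated form may be what the server expects. The check that `start` precedes `end` is a client-side sanity check, not a documented rule.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical value: replace with your own control-plane URL.
BASE_URL = "https://runai.example.com"

# Enum values listed for metricType.
ALLOWED_METRIC_TYPES = {"THROUGHPUT", "LATENCY"}


def to_iso8601(moment):
    """Format a timezone-aware datetime like 2023-06-06T12:09:18.211Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_metrics_url(workload_id, metric_types, start, end,
                      number_of_samples=20):
    """Build the metrics URL for one inference workload."""
    unknown = set(metric_types) - ALLOWED_METRIC_TYPES
    if unknown:
        raise ValueError(f"unsupported metricType values: {sorted(unknown)}")
    if not 0 <= number_of_samples <= 1000:
        raise ValueError("numberOfSamples must be between 0 and 1000")
    if start >= end:
        raise ValueError("start must be earlier than end")
    query = urlencode(
        {
            # Assumption: arrays are encoded as repeated parameters.
            "metricType": list(metric_types),
            "start": to_iso8601(start),
            "end": to_iso8601(end),
            "numberOfSamples": number_of_samples,
        },
        doseq=True,
    )
    return f"{BASE_URL}/api/v1/workloads/inferences/{workload_id}/metrics?{query}"


url = build_metrics_url(
    "06d16c5d-4728-42fa-b573-3b11820d999f",
    ["THROUGHPUT", "LATENCY"],
    start=datetime(2023, 6, 6, 12, 9, 18, 211000, tzinfo=timezone.utc),
    end=datetime(2023, 6, 7, 12, 9, 18, 211000, tzinfo=timezone.utc),
)
# Decode the query string again to inspect what would be sent.
query = parse_qs(urlparse(url).query)
```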

Get inference pod's metrics data.

Retrieve metrics data for an inference pod by workload id and pod id. Supported in control-plane version 2.18 and later.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

podId
required
string <uuid>

The requested pod id.

query Parameters
metricType
required
Array of strings (InferencePodMetricType)

Specifies metrics data to request. Inference metrics are only available for inference workloads.

Items Enum: "THROUGHPUT" "LATENCY"
start
required
string <date-time>

Start of the time range to fetch data for, as an ISO 8601 timestamp.

Example: start=2023-06-06T12:09:18.211Z
end
required
string <date-time>

End of the time range to fetch data for, as an ISO 8601 timestamp.

Example: end=2023-06-07T12:09:18.211Z
numberOfSamples
integer [ 0 .. 1000 ]
Default: 20

The number of samples to take in the specified time range.

Example: numberOfSamples=20
Responses
200

Executed successfully.

207

Partial success.

400

Bad request.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

GET /api/v1/workloads/inferences/{workloadId}/pods/{podId}/metrics
Response samples
{
  "measurements": [ ]
}
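
Since this endpoint can answer with either 200 or 207, a client may want to tell complete results from partial ones before reading `measurements`. The sketch below is one way to do that. The helper names and the pod id in the usage lines are made up for illustration, and the structure of each measurement is collapsed in the sample above, so the code returns the list without interpreting its items.

```python
import uuid


def pod_metrics_path(workload_id, pod_id):
    """Return the request path for one pod of one inference workload."""
    # Both path parameters are documented as UUID strings; uuid.UUID raises
    # ValueError if either one is malformed.
    workload = uuid.UUID(workload_id)
    pod = uuid.UUID(pod_id)
    return f"/api/v1/workloads/inferences/{workload}/pods/{pod}/metrics"


def classify_metrics_status(status):
    """Group the documented status codes into complete, partial, or error."""
    if status == 200:
        return "complete"
    if status == 207:
        return "partial"
    return "error"


def read_measurements(status, payload):
    """Return (outcome, measurements) for a decoded response body."""
    outcome = classify_metrics_status(status)
    if outcome == "error":
        return outcome, []
    return outcome, payload.get("measurements", [])


path = pod_metrics_path(
    "06d16c5d-4728-42fa-b573-3b11820d999f",
    "3f2b8c1e-9a47-4d06-8e5b-2c7d1a9f0b34",  # hypothetical pod id
)
outcome, measurements = read_measurements(207, {"measurements": []})
```

What a partial response leaves out is not described in this section, so callers should treat a "partial" outcome as possibly incomplete data.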