Distributed Inferences

Distributed inference runs an inference workload across multiple pods, typically to scale model serving beyond a single container or node. This approach is useful when a single instance cannot meet the workload's resource requirements. NVIDIA Run:ai supports this model using Leader Worker Set (LWS). Each pod plays a specific role, either leader or worker, and together the pods form a coordinated service. NVIDIA Run:ai manages the orchestration and configuration of these pods to ensure efficient and scalable inference execution.

Create a distributed inference.

Create a distributed inference using container-related fields.

Security: bearerAuth
Request
Request Body schema: application/json
name
required
string (WorkloadName) non-empty .*

The name of the workload.

useGivenNameAsPrefix
boolean
Default: false

When true, the requested name will be treated as a prefix. The final name of the workload will be composed of the name followed by a random set of characters.

projectId
required
string (ProjectId) .*

The id of the project.

clusterId
required
string <uuid> (ClusterId)

The id of the cluster.

templateId
string or null <uuid>

The unique identifier of the template to use for submitting this workload. The combined values provided in the spec, template and assets will be used to create the workload.

spec
object or null (DistributedInferenceSpecSpec)
Responses
202

Request completed successfully.

400

Bad request.

401

Unauthorized.

403

Forbidden.

503

Unexpected error.

POST /api/v1/workloads/distributed-inferences
Request samples
application/json
{
  "name": "my-workload-name",
  "useGivenNameAsPrefix": true,
  "projectId": 1,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "templateId": "550e8400-e29b-41d4-a716-446655440000",
  "spec": {}
}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "deletedAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": {}
}
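
As an illustration of the create call described above, the following is a minimal Python sketch that builds the POST request without sending it. The base URL, the token placeholder, and the function name build_create_request are assumptions made for this example and are not part of the API reference. The contents of spec are not documented in this section, so the sketch passes an empty object.

```python
import json
import urllib.request

# Hypothetical control-plane URL; replace with your own environment.
BASE_URL = "https://runai.example.com"


def build_create_request(token, name, project_id, cluster_id,
                         spec=None, use_given_name_as_prefix=False):
    """Build (but do not send) a request that creates a distributed inference."""
    body = {
        "name": name,
        "useGivenNameAsPrefix": use_given_name_as_prefix,
        "projectId": project_id,
        "clusterId": cluster_id,
        # The spec fields are not listed here, so default to an empty object.
        "spec": spec if spec is not None else {},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/distributed-inferences",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # bearerAuth security scheme
            "Content-Type": "application/json",
        },
    )


create_req = build_create_request(
    token="<your-token>",
    name="my-workload-name",
    project_id="1",
    cluster_id="71f69d83-ba66-4822-adf5-55ce55efd210",
    use_given_name_as_prefix=True,
)
print(create_req.get_method(), create_req.full_url)
```

To submit the request, pass create_req to urllib.request.urlopen and expect status 202 on success. Because useGivenNameAsPrefix is true in this sketch, the final workload name would be the requested name followed by random characters, so read the name back from the response rather than assuming it.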

Delete a distributed inference.

Delete a distributed inference using a workload id.

Security: bearerAuth
Request
Path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Responses
202

Accepted.

401

Unauthorized.

403

Forbidden.

404

The specified resource was not found.

500

Unexpected error.

503

Unexpected error.

DELETE /api/v1/workloads/distributed-inferences/{workloadId}
Response samples
application/json
{
  "code": 202,
  "message": "Request has been accepted."
}
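
The delete call can be sketched in Python as follows. This is a hedged example, not an official client: the base URL, token placeholder, function names, and the DELETE_OUTCOMES table are assumptions introduced here. The table only restates the status codes listed above. The sketch validates the workload id as a UUID locally before building the request.

```python
import urllib.request
import uuid

# Hypothetical control-plane URL; replace with your own environment.
BASE_URL = "https://runai.example.com"

# Status codes documented for this endpoint, mapped to short labels.
DELETE_OUTCOMES = {
    202: "accepted",
    401: "unauthorized",
    403: "forbidden",
    404: "not found",
    500: "unexpected error",
    503: "unexpected error",
}


def build_delete_request(token, workload_id):
    """Build (but do not send) a DELETE request for one workload."""
    # uuid.UUID raises ValueError when the id is not a well-formed UUID.
    canonical_id = str(uuid.UUID(workload_id))
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/distributed-inferences/{canonical_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


def describe_delete_status(status_code):
    """Translate a response status code into one of the documented outcomes."""
    return DELETE_OUTCOMES.get(status_code, "undocumented status")


delete_req = build_delete_request(
    "<your-token>", "06d16c5d-4728-42fa-b573-3b11820d999f"
)
print(delete_req.get_method(), delete_req.full_url)
print(describe_delete_status(202))
```

A 202 response means the request was accepted, not that the workload is already gone, so a caller that needs confirmation would follow up with a GET on the same workload id.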

Get distributed inference data.

Retrieve the details of a distributed inference using a workload id.

Security: bearerAuth
Request
Path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Responses
200

Executed successfully.

401

Unauthorized.

403

Forbidden.

404

The specified resource was not found.

500

Unexpected error.

503

Unexpected error.

GET /api/v1/workloads/distributed-inferences/{workloadId}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "deletedAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": {}
}
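
The response above reports both a desiredPhase and an actualPhase. A caller may want to compare the two after retrieving a workload. The following Python sketch shows one way to do that on a decoded response body. The base URL, function names, and the "settled" flag are assumptions made for this example. In particular, treating equal phases as "the workload reached its target state" is an interpretation, not something this reference states.

```python
import urllib.request

# Hypothetical control-plane URL; replace with your own environment.
BASE_URL = "https://runai.example.com"


def build_get_request(token, workload_id):
    """Build (but do not send) a GET request for one workload."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/distributed-inferences/{workload_id}",
        method="GET",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )


def phase_summary(workload):
    """Summarize desired vs. actual phase from a decoded response body."""
    desired = workload.get("desiredPhase")
    actual = workload.get("actualPhase")
    return {
        "desired": desired,
        "actual": actual,
        # Assumption: matching phases mean the workload reached its target.
        "settled": desired is not None and desired == actual,
    }


# Subset of the sample response shown above.
sample_response = {
    "name": "my-workload-name",
    "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
    "desiredPhase": "Running",
    "actualPhase": "Creating",
}

summary = phase_summary(sample_response)
print(summary)
```

For the sample response, the desired phase is Running while the actual phase is still Creating, so the sketch reports the workload as not yet settled.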

Update distributed inference spec.

Update the specification of an existing distributed inference workload.

Security: bearerAuth
Request
Path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Request Body schema: application/json
spec
object or null
Responses
202

Executed successfully.

401

Unauthorized.

403

Forbidden.

404

The specified resource was not found.

500

Unexpected error.

503

Unexpected error.

PATCH /api/v1/workloads/distributed-inferences/{workloadId}
Request samples
application/json
{
  "spec": {}
}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "deletedAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": {}
}
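
The update call takes a JSON body with a single spec object. A minimal Python sketch of building that PATCH request follows. The base URL, token placeholder, and the function name build_update_request are assumptions for this example. Since the fields inside spec are not listed in this section, the sketch sends an empty object and leaves the actual contents to the caller.

```python
import json
import urllib.request

# Hypothetical control-plane URL; replace with your own environment.
BASE_URL = "https://runai.example.com"


def build_update_request(token, workload_id, spec):
    """Build (but do not send) a PATCH request that updates a workload spec."""
    # The request body wraps the new spec under the "spec" key.
    body = {"spec": spec}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/distributed-inferences/{workload_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # bearerAuth security scheme
            "Content-Type": "application/json",
        },
    )


update_req = build_update_request(
    token="<your-token>",
    workload_id="06d16c5d-4728-42fa-b573-3b11820d999f",
    spec={},  # placeholder; fill with the spec fields you want to change
)
print(update_req.get_method(), update_req.full_url)
```

On success the endpoint answers with status 202 and the workload object, so the returned spec can be checked against what was sent.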