Distributed Inferences

Distributed inference runs an inference workload across multiple pods, typically to scale model serving beyond a single container or node. This approach is useful when a single instance cannot meet the workload's resource requirements. NVIDIA Run:ai supports this model using Leader Worker Set (LWS). Each pod plays a specific role, either leader or worker, and together the pods form a coordinated service. NVIDIA Run:ai manages the orchestration and configuration of these pods to ensure efficient and scalable inference execution.

Create a distributed inference. [Experimental]

Create a distributed inference using container-related fields.

Security: bearerAuth
Request
Request Body schema: application/json
name
required
string (WorkloadName) non-empty

The name of the workload.

useGivenNameAsPrefix
boolean
Default: false

When true, the requested name will be treated as a prefix. The final name of the workload will be composed of the name followed by a random set of characters.

projectId
required
string (ProjectId2)

The id of the project.

clusterId
required
string <uuid> (ClusterId)

The id of the cluster.

spec
object or null (DistributedInferenceSpecSpec)

Node-related parameters.

Responses
202

Request completed successfully.

400

Bad request.

401

Unauthorized

403

Forbidden

503

Unexpected error.

POST /api/v1/workloads/distributed-inferences
Request samples
application/json
{
  "name": "my-workload-name",
  "useGivenNameAsPrefix": true,
  "projectId": 1,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "spec": { }
}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "deletedAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": { }
}
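
As an illustration of the create call described above, the following is a minimal sketch using only the Python standard library. The base URL `https://runai.example.com`, the token placeholder, and the helper names `build_create_request` and `send` are assumptions made for this example and are not part of the API. The field names, path, and method come from the schema and samples above.

```python
import json
import urllib.request

# Assumptions: placeholder control-plane URL and bearer token; substitute your own.
BASE_URL = "https://runai.example.com"
API_TOKEN = "<bearer-token>"


def build_create_request(name, project_id, cluster_id, spec, use_given_name_as_prefix=False):
    """Build (but do not send) the POST request that creates a distributed inference."""
    body = {
        "name": name,                                     # required, non-empty
        "useGivenNameAsPrefix": use_given_name_as_prefix,  # default false
        "projectId": project_id,                          # required
        "clusterId": cluster_id,                          # required, UUID
        "spec": spec,                                     # object or null
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/distributed-inferences",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send a prepared request and return (status code, parsed JSON body)."""
    with urllib.request.urlopen(request) as response:
        return response.status, json.load(response)
```

Calling `send(build_create_request(...))` should return status 202 on success, per the response list above. `urllib.request.urlopen` raises `urllib.error.HTTPError` for the 4xx and 5xx responses, so callers would typically wrap the call in a `try` block.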

Delete a distributed inference. [Experimental]

Delete a distributed inference using a workload id.

Security: bearerAuth
Request
Path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Responses
202

Accepted.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error.

503

Unexpected error.

DELETE /api/v1/workloads/distributed-inferences/{workloadId}
Response samples
application/json
{
  "code": 202,
  "message": "Request has been accepted."
}
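
The delete call can be sketched the same way. This is an illustrative example, not an official client: the base URL, the token placeholder, and the helper name `build_delete_request` are assumptions. Only the path and the `workloadId` path parameter come from the reference above.

```python
import urllib.request
from urllib.parse import quote

# Assumptions: placeholder control-plane URL and bearer token; substitute your own.
BASE_URL = "https://runai.example.com"
API_TOKEN = "<bearer-token>"


def build_delete_request(workload_id):
    """Build (but do not send) the DELETE request for a workload UUID."""
    # quote() keeps the path well-formed even if the id contains unexpected characters.
    path = f"/api/v1/workloads/distributed-inferences/{quote(workload_id, safe='')}"
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        method="DELETE",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
    )
```

Passing the result to `urllib.request.urlopen` sends the request. A 202 response means the deletion was accepted, which does not necessarily mean the workload is already gone.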

Get distributed inference data. [Experimental]

Retrieve the details of a distributed inference using a workload id.

Security: bearerAuth
Request
Path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Responses
200

Executed successfully.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error.

503

Unexpected error.

GET /api/v1/workloads/distributed-inferences/{workloadId}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "deletedAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": { }
}
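
A hedged sketch of the retrieval call follows, together with a small helper that picks the identity and phase fields out of a parsed response. The base URL, the token placeholder, and the helper names `build_get_request` and `summarize_workload` are assumptions made for illustration. The field names are taken from the response sample above.

```python
import urllib.request
from urllib.parse import quote

# Assumptions: placeholder control-plane URL and bearer token; substitute your own.
BASE_URL = "https://runai.example.com"
API_TOKEN = "<bearer-token>"


def build_get_request(workload_id):
    """Build (but do not send) the GET request for a workload UUID."""
    path = f"/api/v1/workloads/distributed-inferences/{quote(workload_id, safe='')}"
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        method="GET",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
    )


def summarize_workload(workload):
    """Return the name, id, and phase fields from a parsed response body."""
    keys = ("name", "workloadId", "desiredPhase", "actualPhase")
    # .get() yields None for any field the server omitted.
    return {key: workload.get(key) for key in keys}
```

In the response sample, `desiredPhase` is "Running" while `actualPhase` is "Creating", so a client that needs a running workload would presumably repeat the GET until the two values match. That polling strategy is an assumption and is not stated by the reference.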