NVIDIA NIM

The NVIDIA NIM API provides endpoints to create and manage workloads that deploy NVIDIA Inference Microservices (NIM) through the NIM Operator. These workloads package optimized NVIDIA model servers and run as managed services on the NVIDIA Run:ai platform. Each request includes NVIDIA Run:ai scheduling metadata (for example, project, priority, and category) and a NIM service specification that defines the container image, compute resources, environment variables, storage, and networking configuration. Once submitted, NVIDIA Run:ai handles scheduling, orchestration, and lifecycle management of the NIM service to ensure reliable and efficient model serving.

Create an NVIDIA NIM service. [Experimental]

Create an NVIDIA NIM service

Security: bearerAuth
Request
Request Body schema: application/json (required)

metadata (required): object (WorkloadV2MetadataCreateParams)
spec (required): object or null (NimServiceSpec)
Responses
202

Workload creation accepted

400

Bad request

401

Unauthorized

403

Forbidden

409

The specified resource already exists

500

Unexpected error

503

Unexpected error

POST /api/v2/workloads/nim-services
Request samples
application/json
{
  "metadata": { },
  "spec": { }
}
Response samples
application/json
{
  "metadata": { },
  "desiredPhase": "Running",
  "spec": { }
}
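As a sketch of how a create request could be assembled, the snippet below builds the URL and JSON body for this endpoint. The base URL, token, and the individual fields inside `metadata` and `spec` are illustrative assumptions, not values taken from the schema; only the top-level `metadata`/`spec` keys and the path come from this reference.

```python
import json

# Hypothetical values: replace with your NVIDIA Run:ai cluster URL and API token.
BASE_URL = "https://my-runai-cluster.example.com"
TOKEN = "<bearer-token>"

# Top-level body shape per the request sample: `metadata` carries the
# Run:ai scheduling metadata and `spec` the NIM service specification.
# The inner field names here are assumptions for illustration only.
body = {
    "metadata": {
        "name": "llama-nim",       # assumed metadata field
        "projectId": "my-project", # assumed metadata field
    },
    "spec": {
        # assumed spec field: container image of the NIM model server
        "image": "nvcr.io/nim/meta/llama-3.1-8b-instruct:latest",
    },
}

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

# POST /api/v2/workloads/nim-services
url = f"{BASE_URL}/api/v2/workloads/nim-services"
print(url)
print(json.dumps(body, indent=2))
```

On success the API returns 202 (the workload is accepted, not yet running), so a client would typically follow up by polling the get endpoint below for the workload's phase.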

Get an NVIDIA NIM service. [Experimental]

Retrieve details of a specific NVIDIA NIM service, by ID

Security: bearerAuth
Request
path Parameters

WorkloadV2Id (required): string <uuid>

The ID of the workload.

Responses
200

Successfully retrieved the workload

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

GET /api/v2/workloads/nim-services/{WorkloadV2Id}
Response samples
application/json
{
  "metadata": { },
  "desiredPhase": "Running",
  "spec": { }
}
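A retrieval call simply substitutes the workload's UUID into the path. The sketch below builds the URL and headers; the base URL, token, and the UUID are placeholders, and the path comes from this reference.

```python
# Hypothetical values: replace with your cluster URL, token, and workload ID.
BASE_URL = "https://my-runai-cluster.example.com"
TOKEN = "<bearer-token>"
workload_id = "123e4567-e89b-12d3-a456-426614174000"  # example UUID

headers = {"Authorization": f"Bearer {TOKEN}"}

# GET /api/v2/workloads/nim-services/{WorkloadV2Id}
url = f"{BASE_URL}/api/v2/workloads/nim-services/{workload_id}"
print(url)
```

A 200 response returns the workload's metadata, desiredPhase, and spec as in the sample above; 404 means no NIM service exists with that ID.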