Workloads

Workloads are trainings, workspaces, and deployments that are fully controlled by Run:ai. A workload can be a native workload, a third-party integration, or a typical Kubernetes workload type. For more information, see Workloads overview.

List workloads.

Retrieve a list of active workloads with details.

Security: bearerAuth
Request
query Parameters
deleted
boolean

Return only deleted resources when true.

offset
integer <int32>

The offset of the first item returned in the collection.

Example: offset=100
limit
integer <int32> [ 1 .. 500 ]
Default: 50

The maximum number of entries to return.

sortOrder
string
Default: "asc"

Sort results in descending or ascending order.

Enum: "asc" "desc"
sortBy
string

Sort results by a parameter.

Enum: "type" "name" "clusterId" "projectId" "projectName" "departmentId" "departmentName" "createdAt" "deletedAt" "submittedBy" "phase" "completedAt" "nodepool" "allocatedGPU" "distributedFramework"
filterBy
Array of strings

Filter results by a parameter. Use the format field-name operator value. Operators are == Equals, != Not equals, <= Less than or equal, >= Greater than or equal, =@ Contains, !@ Does not contain, =^ Starts with, and =$ Ends with. Dates use ISO 8601 timestamp format and can be used with the operators ==, !=, <=, and >=.

Example: filterBy=name!=some-workload-name,allocatedGPU>=2,createdAt>=2021-01-01T00:00:00Z
Responses
200

Executed successfully.

401

Unauthorized

403

Forbidden

500

Unexpected error

503

Unexpected error

GET /api/v1/workloads
Response samples
application/json
{
  "next": 1,
  "workloads": [ ]
}
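
The offset and limit parameters lend themselves to simple paging. The following is a minimal Python sketch, not an official client. BASE_URL, list_workloads_url, iter_workloads, and the injected fetch_page callable are hypothetical names, and the stopping rule (a page shorter than limit ends the collection) is an assumption, since the meaning of the next field is not described here.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your own environment.
BASE_URL = "https://example.invalid"


def list_workloads_url(offset: int = 0, limit: int = 50,
                       sort_order: str = "asc",
                       sort_by: Optional[str] = None) -> str:
    """Build the GET /api/v1/workloads URL for a single page."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if sort_order not in ("asc", "desc"):
        raise ValueError("sortOrder must be 'asc' or 'desc'")
    params = {"offset": offset, "limit": limit, "sortOrder": sort_order}
    if sort_by is not None:
        params["sortBy"] = sort_by
    return f"{BASE_URL}/api/v1/workloads?{urlencode(params)}"


def iter_workloads(fetch_page: Callable[[str], dict],
                   limit: int = 50) -> Iterator[dict]:
    """Yield workloads page by page.

    fetch_page takes a URL and returns the decoded JSON body.
    Assumption: a page with fewer than `limit` items is the last one.
    """
    offset = 0
    while True:
        page = fetch_page(list_workloads_url(offset=offset, limit=limit))
        workloads = page.get("workloads", [])
        yield from workloads
        if len(workloads) < limit:
            break
        offset += limit
```

Passing the HTTP call in as fetch_page keeps the paging logic independent of any particular HTTP library and easy to exercise with canned responses.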

Get a workload.

Retrieve workload data using a workloadId.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

Unique identifier of the workload.

Responses
200

Executed successfully.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

GET /api/v1/workloads/{workloadId}
Response samples
application/json
{
  "tenantId": 1001,
  "runningPods": 1,
  "phaseUpdatedAt": "2022-06-08T11:28:24.131Z",
  "k8sPhaseUpdatedAt": "2022-06-08T11:28:24.131Z",
  "updatedAt": "2022-06-08T11:28:24.131Z",
  "source": "CLI",
  "deletedAt": "2022-08-12T19:28:24.131Z",
  "type": "runai-job",
  "name": "very-important-job",
  "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
  "priorityClassName": "high-priority",
  "submittedBy": "researcher@run.ai",
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "projectName": "proj-1",
  "projectId": "1",
  "departmentName": "department-1",
  "departmentId": "1",
  "namespace": "runai-proj-1",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "workloadRequestedResources": { },
  "podsRequestedResources": { },
  "allocatedResources": { },
  "actionsSupport": { },
  "phase": "Creating",
  "conditions": [ ],
  "phaseMessage": "Not enough resources in the requested nodepool",
  "k8sPhase": "Pending",
  "requestedPods": { },
  "requestedNodePools": [ ],
  "currentNodePools": [ ],
  "completedAt": "2022-01-01T03:49:52.531Z",
  "images": [ ],
  "urls": [ ],
  "datasources": [ ],
  "environments": [ ],
  "externalConnections": [ ],
  "distributedFramework": "Pytorch",
  "additionalFields": { },
  "preemptible": true
}
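
As an illustration of the request above, here is a minimal Python sketch that prepares (but does not send) a GET request for one workload. BASE_URL and get_workload_request are hypothetical names, and the sketch assumes bearerAuth means the usual Authorization: Bearer header. The workloadId is validated locally as a UUID before any request is built.

```python
import uuid
from urllib.request import Request

# Hypothetical base URL; replace with the address of your own environment.
BASE_URL = "https://example.invalid"


def get_workload_request(workload_id: str, token: str) -> Request:
    """Prepare GET /api/v1/workloads/{workloadId} with a bearer token."""
    # uuid.UUID raises ValueError for malformed input, so a bad ID
    # fails here instead of producing a request the server must reject.
    workload_uuid = uuid.UUID(workload_id)
    return Request(
        f"{BASE_URL}/api/v1/workloads/{workload_uuid}",
        headers={
            # Assumption: bearerAuth uses the standard Bearer scheme.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

The returned object can be passed to urllib.request.urlopen to send it. A 404 response would then indicate that no workload with that ID was found.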

Count workloads.

Retrieve the number of workloads.

Security: bearerAuth
Request
query Parameters
deleted
boolean

Return only deleted resources when true.

filterBy
Array of strings

Filter results by a parameter. Use the format field-name operator value. Operators are == Equals, != Not equals, <= Less than or equal, >= Greater than or equal, =@ Contains, !@ Does not contain, =^ Starts with, and =$ Ends with. Dates use ISO 8601 timestamp format and can be used with the operators ==, !=, <=, and >=.

Example: filterBy=name!=some-workload-name,allocatedGPU>=2,createdAt>=2021-01-01T00:00:00Z
Responses
200

Executed successfully.

401

Unauthorized

403

Forbidden

500

Unexpected error

503

Unexpected error

GET /api/v1/workloads/count
Response samples
application/json
{
  "count": 1
}
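
The filterBy syntax can be assembled programmatically. The sketch below is one possible approach in Python, with hypothetical names (BASE_URL, filter_expr, count_workloads_url). It assumes that multiple filters are joined with commas, as in the filterBy example above, and that the server accepts standard percent-encoding of the operators.

```python
from typing import Iterable
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your own environment.
BASE_URL = "https://example.invalid"

# Operators listed in the filterBy description.
OPERATORS = {"==", "!=", "<=", ">=", "=@", "!@", "=^", "=$"}


def filter_expr(field: str, op: str, value: object) -> str:
    """Return one 'field-name operator value' expression, e.g. name!=x."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"{field}{op}{value}"


def count_workloads_url(filters: Iterable[str] = (),
                        deleted: bool = False) -> str:
    """Build the GET /api/v1/workloads/count URL."""
    params = {}
    if deleted:
        params["deleted"] = "true"
    joined = ",".join(filters)  # assumption: comma-separated, as in the example
    if joined:
        params["filterBy"] = joined
    query = urlencode(params)  # percent-encodes '!', '=', '>', ','
    url = f"{BASE_URL}/api/v1/workloads/count"
    return f"{url}?{query}" if query else url
```

For example, count_workloads_url([filter_expr("allocatedGPU", ">=", 2)]) yields a URL whose filterBy value decodes to allocatedGPU>=2.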

Get the workloads telemetry. [Experimental]

Retrieves workload data by telemetry type.

Security: bearerAuth
Request
query Parameters
clusterId
string <uuid>

Filter using the Universally Unique Identifier (UUID) of the cluster.

Example: clusterId=d73a738f-fab3-430a-8fa3-5241493d7128
nodepoolName
string

Filter using the nodepool.

Example: nodepoolName=default
departmentId
string

Filter using the department ID.

Example: departmentId=1
groupBy
Array of strings <= 2 items

Group workloads by field.

Items Enum: "ClusterId" "DepartmentId" "ProjectId" "Type" "CurrentNodepools" "Phase"
telemetryType
required
string (WorkloadTelemetryType)

Specifies the telemetry type.

Enum: "WORKLOADS_COUNT" "GPU_ALLOCATION"
Responses
200

Executed successfully.

400

Bad request.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

GET /api/v1/workloads/telemetry
Response samples
{
  "type": "ALLOCATION_RATIO",
  "timestamp": "2023-06-06 12:09:18.211",
  "values": [ ]
}
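
The constraints on the telemetry parameters (a required telemetryType from a fixed set, at most two groupBy items) can be checked on the client before sending. This is a hedged Python sketch with hypothetical names (BASE_URL, telemetry_url). How the API expects the groupBy array to be serialized is not stated here, so the repeated-parameter form used below is an assumption.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your own environment.
BASE_URL = "https://example.invalid"

TELEMETRY_TYPES = {"WORKLOADS_COUNT", "GPU_ALLOCATION"}
GROUP_BY_FIELDS = {"ClusterId", "DepartmentId", "ProjectId",
                   "Type", "CurrentNodepools", "Phase"}


def telemetry_url(telemetry_type: str,
                  group_by: Sequence[str] = (),
                  cluster_id: Optional[str] = None,
                  nodepool_name: Optional[str] = None,
                  department_id: Optional[str] = None) -> str:
    """Build the GET /api/v1/workloads/telemetry URL."""
    if telemetry_type not in TELEMETRY_TYPES:
        raise ValueError(f"unknown telemetryType: {telemetry_type!r}")
    if len(group_by) > 2:
        raise ValueError("groupBy accepts at most 2 items")
    unknown = set(group_by) - GROUP_BY_FIELDS
    if unknown:
        raise ValueError(f"unknown groupBy fields: {sorted(unknown)}")

    params = [("telemetryType", telemetry_type)]
    # Assumption: array values are sent as repeated query parameters.
    params += [("groupBy", field) for field in group_by]
    optional = [("clusterId", cluster_id),
                ("nodepoolName", nodepool_name),
                ("departmentId", department_id)]
    params += [(key, value) for key, value in optional if value is not None]
    return f"{BASE_URL}/api/v1/workloads/telemetry?{urlencode(params)}"
```

If the server expects a comma-separated groupBy value instead, only the line that appends the groupBy pairs needs to change.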

Get workload metrics data. [Experimental]

Retrieves metrics data for a workload from the metrics database. Use it in reporting and analysis tools.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

Unique identifier of the workload.

query Parameters
metricType
required
Array of strings (WorkloadMetricType)

Specify which data to request.

Items Enum: "GPU_UTILIZATION" "GPU_MEMORY_USAGE_BYTES" "GPU_MEMORY_REQUEST_BYTES" "CPU_USAGE_CORES" "CPU_REQUEST_CORES" "CPU_LIMIT_CORES" "CPU_MEMORY_USAGE_BYTES" "CPU_MEMORY_REQUEST_BYTES" "CPU_MEMORY_LIMIT_BYTES" "POD_COUNT" "RUNNING_POD_COUNT" "GPU_ALLOCATION"
start
required
string <date-time>

Start of the time range to fetch data for, in ISO 8601 timestamp format.

Example: start=2023-06-06T12:09:18.211Z
end
required
string <date-time>

End of the time range to fetch data for, in ISO 8601 timestamp format.

Example: end=2023-06-07T12:09:18.211Z
numberOfSamples
integer [ 0 .. 1000 ]
Default: 20

The number of samples to take in the specified time range.

Example: numberOfSamples=20
Responses
200

Executed successfully.

207

Partial success.

400

Bad request.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error

503

Unexpected error

GET /api/v1/workloads/{workloadId}/metrics
Response samples
{
  "measurements": [ ]
}
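
A metrics request combines a path parameter, an array of metric types, and a time range. The following Python sketch shows one way to build such a URL. The names BASE_URL, iso8601, and metrics_url are hypothetical. The sketch assumes that metricType is sent as repeated query parameters and that start must precede end; neither is confirmed by the description above.

```python
import uuid
from datetime import datetime, timezone
from typing import Sequence
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your own environment.
BASE_URL = "https://example.invalid"

METRIC_TYPES = {
    "GPU_UTILIZATION", "GPU_MEMORY_USAGE_BYTES", "GPU_MEMORY_REQUEST_BYTES",
    "CPU_USAGE_CORES", "CPU_REQUEST_CORES", "CPU_LIMIT_CORES",
    "CPU_MEMORY_USAGE_BYTES", "CPU_MEMORY_REQUEST_BYTES",
    "CPU_MEMORY_LIMIT_BYTES", "POD_COUNT", "RUNNING_POD_COUNT",
    "GPU_ALLOCATION",
}


def iso8601(ts: datetime) -> str:
    """Format an aware datetime like 2023-06-06T12:09:18.211Z."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def metrics_url(workload_id: str, metric_types: Sequence[str],
                start: datetime, end: datetime,
                number_of_samples: int = 20) -> str:
    """Build the GET /api/v1/workloads/{workloadId}/metrics URL."""
    workload_uuid = uuid.UUID(workload_id)  # ValueError if malformed
    unknown = set(metric_types) - METRIC_TYPES
    if not metric_types or unknown:
        raise ValueError(f"invalid metricType list: {list(metric_types)}")
    if not 0 <= number_of_samples <= 1000:
        raise ValueError("numberOfSamples must be between 0 and 1000")
    if start >= end:  # assumption: an empty or reversed range is an error
        raise ValueError("start must be earlier than end")

    # Assumption: array values are sent as repeated query parameters.
    params = [("metricType", m) for m in metric_types]
    params += [("start", iso8601(start)), ("end", iso8601(end)),
               ("numberOfSamples", number_of_samples)]
    return (f"{BASE_URL}/api/v1/workloads/{workload_uuid}/metrics"
            f"?{urlencode(params)}")
```

Because a 207 response signals partial success, a caller that requests several metric types would presumably need to inspect the measurements in the body rather than rely on the status code alone.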