Inferences

Use these endpoints to create and manage inference workloads and to retrieve metrics for inference workloads and pods.

Get info of a Hugging Face model [Experimental]

Retrieves information about a given Hugging Face model.

Security: bearerAuth
Request
Request Body schema: application/json
modelName
required
string ^[a-zA-Z0-9-_]+(\x2F[a-zA-Z0-9-_.]+)?$

The model name.

hfToken
string or null ^hf_[a-zA-Z0-9]+$

The Hugging Face token. Required only for gated models.

requestMinMemoryRequirements
boolean or null

Whether to request the minimum memory requirements for running inference with this model.

Responses
200

Request completed successfully.

400

Bad submission request.

401

Unauthorized

403

Forbidden

500

Unexpected error.

503

Unexpected error.

POST /api/v1/huggingface-model/info-request
Request samples
application/json
{
  "modelName": "meta-llama/Llama-2-7b-chat-hf",
  "hfToken": "hf_aabbccDDEEFFgghhiiKKLLMMnnooPP1234",
  "requestMinMemoryRequirements": true
}
Response samples
application/json
{
  "modelName": "meta-llama/Llama-2-7b-chat-hf",
  "pipelineTag": "text-generation",
  "architectures": [ ],
  "vllmSupportingVersions": [ ],
  "minMemoryRequirementsMb": 0
}
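The request body schema above can be checked on the client side before the request is sent. The following is a minimal sketch, not an official client: the function name build_model_info_body and its parameter names are hypothetical, and only the field names and patterns are taken from the schema above.

```python
import json
import re

# Patterns copied verbatim from the request body schema above.
MODEL_NAME_RE = re.compile(r"^[a-zA-Z0-9-_]+(\x2F[a-zA-Z0-9-_.]+)?$")
HF_TOKEN_RE = re.compile(r"^hf_[a-zA-Z0-9]+$")


def build_model_info_body(model_name, hf_token=None, request_min_memory=None):
    """Return the JSON body for POST /api/v1/huggingface-model/info-request.

    Hypothetical helper: optional fields are omitted when left as None.
    """
    if not MODEL_NAME_RE.fullmatch(model_name):
        raise ValueError(f"modelName does not match the documented pattern: {model_name!r}")
    body = {"modelName": model_name}
    if hf_token is not None:
        if not HF_TOKEN_RE.fullmatch(hf_token):
            raise ValueError("hfToken does not match the documented pattern")
        body["hfToken"] = hf_token
    if request_min_memory is not None:
        body["requestMinMemoryRequirements"] = bool(request_min_memory)
    return json.dumps(body)
```

For example, build_model_info_body("meta-llama/Llama-2-7b-chat-hf", request_min_memory=True) produces a body without hfToken, which the schema describes as required only for gated models.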

Create an inference. [Experimental]

Create an inference using container-related fields.

Security: bearerAuth
Request
Request Body schema: application/json
name
required
string (WorkloadName) non-empty

The name of the workload.

useGivenNameAsPrefix
boolean
Default: false

When true, the requested name will be treated as a prefix. The final name of the workload will be composed of the name followed by a random set of characters.

projectId
required
string (ProjectId2)

The id of the project.

clusterId
required
string <uuid> (ClusterId)

The id of the cluster.

spec
object or null (CommonFlatFields)

Container overridable fields. In the context of assets, these are environment asset fields that can be overridden in the submit workload request.

Responses
200

Request completed successfully.

400

Bad request.

401

Unauthorized

403

Forbidden

503

Unexpected error.

POST /api/v1/workloads/inferences
Request samples
application/json
{
  "name": "my-workload-name",
  "useGivenNameAsPrefix": true,
  "projectId": 1,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "spec": { }
}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": { }
}
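As an illustration of how the create request above might be assembled, here is a minimal sketch using only the Python standard library. BASE_URL, the token value, and the helper name build_create_inference_request are assumptions for illustration. The request is built but not sent, and the contents of spec are passed through unchanged because they are not detailed in this section.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the URL of your own control plane.
BASE_URL = "https://example.invalid"


def build_create_inference_request(token, name, project_id, cluster_id,
                                   spec=None, use_given_name_as_prefix=False):
    """Build (but do not send) a POST request for /api/v1/workloads/inferences."""
    if not name:
        # The schema marks name as a non-empty string.
        raise ValueError("name must be non-empty")
    payload = {
        "name": name,
        "useGivenNameAsPrefix": use_given_name_as_prefix,
        "projectId": project_id,
        "clusterId": cluster_id,
    }
    if spec is not None:
        payload["spec"] = spec
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/inferences",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # bearerAuth security scheme
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to urllib.request.urlopen once BASE_URL and the token point at a real environment.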

Delete an inference. [Experimental]

Delete an inference using a workload id.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Responses
204

No Content.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error.

503

Unexpected error.

DELETE /api/v1/workloads/inferences/{workloadId}
Response samples
application/json
{
  "code": 401,
  "message": "Issuer is not familiar."
}
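Because a successful delete returns 204 with no body while errors carry a JSON body, a client typically branches on the status code first. The sketch below shows one way to do that. The helper name summarize_delete_response is hypothetical, and it assumes that other error responses share the code/message shape of the 401 sample above, which this section does not confirm.

```python
import json


def summarize_delete_response(status, body=b""):
    """Turn a DELETE response into a short, human-readable outcome."""
    if status == 204:
        # 204 No Content: the workload was deleted and there is no body to parse.
        return "deleted"
    try:
        error = json.loads(body) if body else {}
    except ValueError:
        # The body was not valid JSON; fall back to a generic message.
        error = {}
    if not isinstance(error, dict):
        error = {}
    message = error.get("message", "no message provided")
    return f"error {status}: {message}"
```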

Get inference data. [Experimental]

Retrieve inference details using a workload id.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Responses
200

Executed successfully.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error.

503

Unexpected error.

GET /api/v1/workloads/inferences/{workloadId}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": { }
}
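The response sample above carries both a desired and an actual phase, plus an ISO 8601 creation timestamp. The following sketch shows one way a client might read those fields. The helper name summarize_workload is hypothetical, and treating "actualPhase equals desiredPhase" as settled is an illustrative interpretation, not documented behavior.

```python
from datetime import datetime


def summarize_workload(workload):
    """Extract a few fields from a decoded inference response (a dict)."""
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    created_at = datetime.fromisoformat(workload["createdAt"].replace("Z", "+00:00"))
    return {
        "workloadId": workload["workloadId"],
        "createdAt": created_at,
        # Assumption: the workload is considered settled once both phases match.
        "settled": workload["actualPhase"] == workload["desiredPhase"],
    }
```

With the sample above, the phases are "Running" and "Creating", so the workload would be reported as not yet settled.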

Update inference spec. [Experimental]

Update the specification of an existing inference workload.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

Request Body schema: application/json
spec
object or null

Container overridable fields. In the context of assets, these are environment asset fields that can be overridden in the submit workload request.

Responses
200

Executed successfully.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error.

503

Unexpected error.

PATCH /api/v1/workloads/inferences/{workloadId}
Request samples
application/json
{
  "spec": { }
}
Response samples
application/json
{
  "name": "my-workload-name",
  "requestedName": "string",
  "workloadId": "06d16c5d-4728-42fa-b573-3b11820d999f",
  "projectId": 1,
  "departmentId": 2,
  "clusterId": "71f69d83-ba66-4822-adf5-55ce55efd210",
  "createdBy": "test@lab.com",
  "createdAt": "2022-01-01T03:49:52.531Z",
  "desiredPhase": "Running",
  "actualPhase": "Creating",
  "spec": { }
}
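The update call combines a UUID path parameter with a JSON body that wraps the new spec. A minimal sketch of building such a request follows. BASE_URL and the helper name build_update_inference_request are assumptions for illustration. The spec contents are passed through untouched because their fields are not listed in this section.

```python
import json
import urllib.request
import uuid

# Hypothetical placeholder; replace with the URL of your own control plane.
BASE_URL = "https://example.invalid"


def build_update_inference_request(token, workload_id, spec):
    """Build (but do not send) a PATCH request for an existing inference."""
    # uuid.UUID raises ValueError for a malformed id, catching mistakes early.
    workload_id = str(uuid.UUID(workload_id))
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/workloads/inferences/{workload_id}",
        data=json.dumps({"spec": spec}).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # bearerAuth security scheme
            "Content-Type": "application/json",
        },
    )
```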

Get inference metrics data. [Experimental]

Retrieve metrics data for an inference workload by workload id. Supported in control-plane version 2.18 or later.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

query Parameters
metricType
required
Array of strings (InferenceWorkloadMetricType)

Specify which data to request.

Items Enum: "THROUGHPUT" "LATENCY"
start
required
string <date-time>

Start of the time range to fetch data for, as an ISO 8601 timestamp.

Example: start=2023-06-06T12:09:18.211Z
end
required
string <date-time>

End of the time range to fetch data for, as an ISO 8601 timestamp.

Example: end=2023-06-07T12:09:18.211Z
numberOfSamples
integer [ 0 .. 1000 ]
Default: 20

The number of samples to take in the specified time range.

Example: numberOfSamples=20
Responses
200

Executed successfully.

207

Partial success.

400

Bad request.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error.

503

Unexpected error.

GET /api/v1/workloads/inferences/{workloadId}/metrics
Response samples
{
  "measurements": [ ]
}
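The query parameters above have several constraints (an enum, a numeric range, and ISO 8601 timestamps) that can be checked while building the query string. The sketch below is one possible approach. The helper names are hypothetical, and serializing the metricType array as a repeated parameter is an assumption, since this section does not state how the array should be encoded.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Enum values listed for metricType above.
METRIC_TYPES = {"THROUGHPUT", "LATENCY"}


def iso8601(dt):
    """Format a timezone-aware datetime like 2023-06-06T12:09:18.211Z."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_metrics_query(metric_types, start, end, number_of_samples=20):
    """Return the query string for the metrics endpoint."""
    unknown = set(metric_types) - METRIC_TYPES
    if unknown:
        raise ValueError(f"unsupported metricType values: {sorted(unknown)}")
    if not 0 <= number_of_samples <= 1000:
        raise ValueError("numberOfSamples must be between 0 and 1000")
    # Assumption: array values are sent as repeated metricType parameters.
    params = [("metricType", m) for m in metric_types]
    params += [
        ("start", iso8601(start)),
        ("end", iso8601(end)),
        ("numberOfSamples", number_of_samples),
    ]
    return urlencode(params)
```

The resulting string is appended after a "?" to the endpoint path. Colons in the timestamps are percent-encoded by urlencode.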

Get inference pod's metrics data. [Experimental]

Retrieve an inference pod's metrics data by workload id and pod id. Supported in control-plane version 2.18 or later.

Security: bearerAuth
Request
path Parameters
workloadId
required
string <uuid>

The Universally Unique Identifier (UUID) of the workload.

podId
required
string <uuid>

The requested pod id.

query Parameters
metricType
required
Array of strings (InferencePodMetricType)

Specifies metrics data to request. Inference metrics are only available for inference workloads.

Items Enum: "THROUGHPUT" "LATENCY"
start
required
string <date-time>

Start of the time range to fetch data for, as an ISO 8601 timestamp.

Example: start=2023-06-06T12:09:18.211Z
end
required
string <date-time>

End of the time range to fetch data for, as an ISO 8601 timestamp.

Example: end=2023-06-07T12:09:18.211Z
numberOfSamples
integer [ 0 .. 1000 ]
Default: 20

The number of samples to take in the specified time range.

Example: numberOfSamples=20
Responses
200

Executed successfully.

207

Partial success.

400

Bad request.

401

Unauthorized

403

Forbidden

404

The specified resource was not found

500

Unexpected error.

503

Unexpected error.

GET /api/v1/workloads/inferences/{workloadId}/pods/{podId}/metrics
Response samples
{
  "measurements": [ ]
}
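The pod metrics endpoint takes two UUID path parameters and can answer with either 200 or 207. A short sketch of building the path and distinguishing those two outcomes follows. The helper names and the pod id used in any call are hypothetical, and the labels "complete" and "partial" simply mirror the response descriptions above without assuming what a partial result contains.

```python
import uuid


def pod_metrics_path(workload_id, pod_id):
    """Return the endpoint path for one pod of an inference workload."""
    # uuid.UUID raises ValueError if either id is not a well-formed UUID.
    workload = uuid.UUID(workload_id)
    pod = uuid.UUID(pod_id)
    return f"/api/v1/workloads/inferences/{workload}/pods/{pod}/metrics"


def classify_metrics_status(status):
    """Map the documented status codes to a coarse outcome label."""
    if status == 200:
        return "complete"  # Executed successfully.
    if status == 207:
        return "partial"   # Partial success.
    return "error"         # 400, 401, 403, 404, 500, 503, or anything else
```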