API Tutorial

This page explains how to use the REST API to get tasks into Alegion and results out. In most cases, the data exchanged is metadata about tasks, such as the coordinates of a bounding box. Digital assets such as videos, images, and NER corpus documents remain separate, as explained in the data pipelines doc.

(The lone exception is "compound/general tasks" as described in the annotation types doc.)

This tutorial explains how to use the API during the lifecycle of a labeling project. If you're already familiar with that lifecycle, visit the endpoint documentation for details about each call's parameters, return types, error codes, etc.

Command examples here are for the cURL command line utility. Python users, please check out our API reference client to accelerate learning the API and to find code samples you can reuse.

Getting started

A "workflow" in Alegion terminology contains all the steps and settings used to annotate your source data. It includes task designs, classification lists, and quality control configurations associated with a project. (For more detail, see the ensuring quality section.)

Working closely with you, Alegion customer success creates a workflow to fit your use case and accuracy requirements, then provides you the unique ID to use in API calls.

"Batches" are arbitrary collections of input records, as well as the annotations that get attached. Each batch belongs to one (and only one) workflow. Despite the name, a batch doesn't have a set time limit or number of items. Batches are simply a way to organize your labeling project to fit your needs.

The lifecycle of a project is:

  1. Alegion customer success initializes the workflow.
  2. You create a batch within that workflow.
  3. You upload input records into the batch.
  4. Labelers work the resulting tasks.
  5. You download results, using the same batch ID.

All but the first two steps can go on continuously, in parallel. For instance, results can be retrieved while labeling is ongoing, and new input records can be added at any time.

The first section below covers the common API operations performed in that lifecycle. The second section covers some useful but less frequent operations.

A full round trip

Let's authenticate, create a batch, upload records, and get the results.

Authentication

Authenticate by sending a POST request to the /api/v1/login endpoint. Retrieve the access_token from the response body for use in an Authorization header in all subsequent requests.

Example request

curl -X POST \
  'https://app.alegion.com/api/v1/login' \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
    "username": "username",
    "password": "password"
  }'

Example response

{
    "access_token": "access_token",
    "expires_in": 3600,
    "token_type": "bearer"
}
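
For Python callers, the same login exchange can be sketched with the standard library alone. The helper names build_login_request and bearer_header are illustrative and are not part of any Alegion client. The request is only constructed here, not sent.

```python
import json
import urllib.request

LOGIN_URL = "https://app.alegion.com/api/v1/login"


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the login endpoint."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_header(login_response_body: str) -> dict:
    """Turn the JSON login response body into an Authorization header."""
    token = json.loads(login_response_body)["access_token"]
    return {"Authorization": f"Bearer {token}"}


# Sending the request needs network access and real credentials:
#   with urllib.request.urlopen(build_login_request("username", "password")) as resp:
#       headers = bearer_header(resp.read().decode("utf-8"))
```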

Token expiration

Access tokens have a defined lifespan, and requests made using an expired token will fail with an HTTP 401 error. Simply make the authentication request again and use the new access token in all subsequent requests.

Tip

Reusing a token is optional. You can make an authentication request for every new transaction in order to avoid dealing with expiration.
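
If you do choose to reuse tokens, a small cache can hide the expiration handling. The sketch below is one possible approach, not an Alegion-provided class. It assumes expires_in is expressed in seconds and refreshes one minute early (an arbitrary margin). The login callable is supplied by you and should return the parsed login response body.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(
        self,
        login: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
        safety_margin: float = 60.0,
    ):
        self._login = login            # returns the parsed login response body
        self._clock = clock            # injectable so the cache can be tested
        self._margin = safety_margin   # refresh this many seconds early (assumed value)
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, logging in again if it is missing or about to expire."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            response = self._login()
            self._token = response["access_token"]
            # Assumption: expires_in is a number of seconds.
            self._expires_at = now + response["expires_in"]
        return self._token

    def invalidate(self) -> None:
        """Forget the token, for example after a request fails with HTTP 401."""
        self._token = None
```

Calling invalidate() after an HTTP 401 and then get() again performs a fresh login, which matches the recovery step described above.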

Creating a batch

You can create a batch using the /api/v1/workflows/{workflowId}/batches endpoint.

Example request

curl -X POST \
   https://app.alegion.com/api/v1/workflows/{workflowId}/batches \
   -H 'authorization: Bearer {access_token}' \
   -H 'content-type: application/json' \
   -H 'accept: application/json' \
   -d '{
         "name": "Batch Name",
         "isEnabled": true
       }'

Example response

{
  "id": "739b1978-cb5b-4188-9d1d-37a394d67f5a",
  "name": "Batch Name",
  "workflowId": "dac402db-b594-4f44-a2c7-87a535cf26f9",
  "priority": "normal",
  "isActive": true,
  "isEnabled": true,
  "createdAt": "2019-03-25T19:45:47.275Z",
  "lowWaterMark": 200,
  "highWaterMark": 500
}

Deprecation note

The following properties have been deprecated and can be disregarded:

  • priority
  • isActive
  • lowWaterMark
  • highWaterMark
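
When storing batch responses on your side, it may be convenient to drop the deprecated properties up front. A minimal sketch, with a hypothetical helper name:

```python
DEPRECATED_BATCH_FIELDS = {"priority", "isActive", "lowWaterMark", "highWaterMark"}


def current_batch_fields(batch: dict) -> dict:
    """Return a copy of a batch response without the deprecated properties."""
    return {key: value for key, value in batch.items() if key not in DEPRECATED_BATCH_FIELDS}
```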

Uploading input records

This section shows how to encode an input record using JSON. (Uploading a CSV is also an option, as described below.)

Each input record has this structure:

  • a data member containing
    • a list of key-value pairs where
      • the key is a fieldname determined by the task design
      • the value is displayed to the labeler, or in the case of images, videos, and NER documents, the URL to the digital asset
  • metadata (optional): any string.
    When used, the metadata stays associated with that input record throughout its lifetime. Use it to attach context to a record so you can track it as it moves through the Alegion platform. For instance, you can use it to store the record's corresponding unique ID in your database.

Warning

metadata is a reserved word and must be a sibling of the data field in the input record, not a child of data.

Valid:

[
  {
    "data": {
      "input_fieldname": "this is my value"
    },
    "metadata": "external_ID"
  }
]

Invalid:

[
  {
    "data": {
      "input_fieldname": "this is my value",
      "metadata": "external_ID"
    }
  }
]
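
A client-side check can catch the misplaced-metadata mistake before anything is uploaded. The function below is only a sketch of the structural rules stated above. It does not know your task design, so it cannot verify fieldnames, and its name is hypothetical.

```python
def input_record_problems(record: dict) -> list:
    """Return a list of structural problems found in one input record."""
    problems = []
    data = record.get("data")
    if not isinstance(data, dict):
        problems.append("'data' must be an object of fieldname/value pairs")
    elif "metadata" in data:
        problems.append("'metadata' must be a sibling of 'data', not a child")
    # metadata is optional, but when present it must be a string.
    if "metadata" in record and not isinstance(record["metadata"], str):
        problems.append("'metadata' must be a string")
    return problems
```

An empty list means the record passed these checks.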

Use the /api/v1/batches/{batchId}/records/import endpoint to bulk load JSON input records into your batch. The request body is an array of single input records.

Example request

curl -X POST \
  https://app.alegion.com/api/v1/batches/{batchId}/records/import \
  -H 'authorization: Bearer {access_token}' \
  -H 'content-type: application/json' \
  -d '[
    {
        "data": {
            "input_fieldname_1": "this is my value on record 1",
            "input_fieldname_2": "another field value on record 1"
        },
        "metadata": "sample_metadata"
    },
    {
        "data": {
            "input_fieldname_1": "this is my value on record 2",
            "input_fieldname_2": "another field value on record 2"
        },
        "metadata": "more_sample_metadata"
    }
]'

Example response

[
    {
        "id": "309ec5c5-6589-47fc-bd3d-c5b4edeb5d57",
        "data": {
            "input_fieldname_1": "this is my value on record 1",
            "input_fieldname_2": "another field value on record 1"
        },
        "metadata": "sample_metadata",
        "createdAt": "2019-11-15T14:26:17.661Z",
        "workflowStageId": "string",
        "status": "queued"
    },
    {
        "id": "83224129-b65f-4533-8f06-d25333a21759",
        "data": {
            "input_fieldname_1": "this is my value on record 2",
            "input_fieldname_2": "another field value on record 2"
        },
        "metadata": "more_sample_metadata",
        "createdAt": "2019-11-15T14:26:17.661Z",
        "workflowStageId": "string",
        "status": "queued"
    }
]

If you get a success response as shown above, your tasks are in the batch and are ready to be worked. You do not need to query the batch.

Upload limit

The import endpoints currently limit imports to 1,000 records per request. Attempting to send more than 1,000 records in a single request will result in an HTTP 422 Unprocessable Entity error.
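
To stay under the limit, larger uploads can be split into several requests. A minimal sketch of the chunking step (the helper name is illustrative). Each yielded list would become the body of one import request.

```python
from typing import Iterator, List

MAX_RECORDS_PER_IMPORT = 1000  # limit stated above


def chunk_records(records: List[dict], size: int = MAX_RECORDS_PER_IMPORT) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]
```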

Retrieving results

Once tasks have been worked through the entirety of your workflow, you can retrieve results by querying the batch using the /api/v1/batches/{batchId}/results endpoint. Results will be paginated.

Example request

curl -X GET \
  https://app.alegion.com/api/v1/batches/{batchId}/results \
  -H 'authorization: Bearer {access_token}'

Example response

[
    {
        "id": "046b6c7f-0b8a-43b9-b35d-6489e6daee91",
        "createdAt": "2019-11-14T21:02:36.491Z",
        "resultData": {
            "output_fieldname": "..."
        },
        "inputRecord": {
            "createdAt": "2019-11-14T20:59:28.375Z",
            "metadata": "sample metadata",
            "data": {
                "input_fieldname_1": "this is my value on record 1",
                "input_fieldname_2": "another field value on record 1"
            },
            "workflowStageId": "046b6c7f-0b8a-43b9-b35d-6489e6daee91",
            "id": "046b6c7f-0b8a-43b9-b35d-6489e6daee91",
            "status": "in-progress"
        }
    }
]

Like input records, the schema of resultData is determined by the details of the task and workflow design. For images, videos, and NER tasks, the output follows a set schema (coordinates, classifications, etc.). In the case of compound/general tasks, Alegion customer success will provide the output fieldnames.
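
If you stored your own record ID in metadata at upload time, results can be joined back to your database by that value. The sketch below assumes the response shape shown in the example above and that each metadata value is unique within the batch. If two results share a value, the later one wins.

```python
def results_by_metadata(results: list) -> dict:
    """Map each input record's metadata string to its resultData."""
    indexed = {}
    for result in results:
        key = result["inputRecord"].get("metadata")
        if key is not None:  # metadata is optional on input records
            indexed[key] = result["resultData"]
    return indexed
```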

About pagination

Some API responses are large enough that it makes sense to paginate them. Any paginated response provides a Link header and an X-Total-Count header, and accepts pagination-related parameters in the query string.

Pagination request parameters

  • page: Which page number to return
  • pageSize: How many records per page
  • sort: -createdAt or +createdAt, to sort by createdAt descending or ascending, respectively

Example pagination response headers: Link and X-Total-Count

Link: <https://app.alegion.com/api/v1/batches/{batchId}/records?page=1&pageSize=20&sort=-createdAt>;rel="first",
    <https://app.alegion.com/api/v1/batches/{batchId}/records?page=1&pageSize=20&sort=-createdAt>;rel="prev",
    <https://app.alegion.com/api/v1/batches/{batchId}/records?page=3&pageSize=20&sort=-createdAt>;rel="next",
    <https://app.alegion.com/api/v1/batches/{batchId}/records?page=3&pageSize=20&sort=-createdAt>;rel="last"
X-Total-Count: 42
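
To walk through pages programmatically, a client can follow the rel="next" URL until it is absent. The parser below is a simplified sketch that handles Link headers shaped like the example above. It is not a complete implementation of the Link header grammar.

```python
import re

# Matches one `<url>;rel="name"` entry; the quotes around the rel value are optional.
_LINK_PART = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,\s]+)"?')


def parse_link_header(value: str) -> dict:
    """Return a mapping of rel name (first, prev, next, last) to URL."""
    return {rel: url for url, rel in _LINK_PART.findall(value)}
```

For instance, parse_link_header(header).get("next") returns None on the last page, which is a natural loop exit.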

Checking for doneness

Alegion does not have a "callback" mechanism for notifications when a batch is "ready for pickup". This is for two reasons:

  • Batches don't have a defined end as explained above.
  • Alegion is platform-agnostic with respect to the client calling our API, and any single mechanism (e.g. webhooks) would not in practice work for a substantial portion of our customers.

However, it is trivial to call the results endpoint with pagination parameters of page=1 and pageSize=1, then read the X-Total-Count response header to see how many results are available. Provided your system keeps track of how many records were sent into the batch, you'll know when it's complete.

But: There are two other task states that are considered "final" but are not returned by the results endpoint: exception and canceled. These need to be counted separately. Read on for an explanation of these states and the way to retrieve their counts.

Task exceptions

"Exception" has a special meaning in Alegion, specifically with regard to tasks. (Consider it an overloaded term, unrelated to how programmers normally use the word.)

Labelers report a task exception when for whatever reason a task cannot be worked, for instance when an image fails to render. Task exceptions go into a special queue for review. If the situation was temporary (e.g. permissions that expire on an S3 bucket, "internet weather"), those tasks can be restarted and proceed normally, eventually appearing in the final results.

However, if the situation was not temporary (e.g. a bad filename in the input data), the tasks can be canceled, in which case they will not appear in the final results. (Tasks can be canceled for other administrative reasons as well.)

This is an intentional and constructive task design pattern, and depending on your use case, there are other ways to employ exceptions usefully. (Alegion's customer success team will guide you in this.)

Counting exceptions and canceled tasks

To complete the picture of batch status described above, combine the count from the results endpoint with the count of exception and canceled tasks from the records endpoint, using these specific query parameters:

  • status=exception,canceled
  • page=1
  • pageSize=1
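
Put together, the doneness check might look like the sketch below. It builds the two count URLs and compares the totals against the number of records you submitted. The helper names are hypothetical. It also assumes each input record yields exactly one result, which depends on your workflow design. The two totals come from the X-Total-Count header of each response.

```python
from urllib.parse import urlencode

BASE_URL = "https://app.alegion.com/api/v1"


def count_urls(batch_id: str) -> dict:
    """Build the two URLs whose X-Total-Count headers give the final-state counts."""
    one_item = {"page": 1, "pageSize": 1}
    results_query = urlencode(one_item)
    # safe="," keeps the comma literal, matching the parameter as written above.
    records_query = urlencode({"status": "exception,canceled", **one_item}, safe=",")
    return {
        "results": f"{BASE_URL}/batches/{batch_id}/results?{results_query}",
        "exceptions": f"{BASE_URL}/batches/{batch_id}/records?{records_query}",
    }


def batch_is_complete(submitted: int, results_total: int, exception_total: int) -> bool:
    """True once every submitted record is accounted for by a final state."""
    # Assumption: one result per input record.
    return results_total + exception_total >= submitted
```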

Round trip summary

To sum up the happy-path journey outlined above, we have:

  • Authenticated before each call (optionally reusing the token for repeated calls)
  • Created a batch in your custom workflow, as provided by Alegion's customer success team
  • Uploaded records to that batch, using fieldnames that match the task design
  • Checked the results endpoint to get ground truth out
  • Checked the count of exceptions and canceled tasks

Other calls and concepts

Filtering results

You can filter results using the following querystring parameters.

  • minCreatedAt: to exclude results finalized before this datetime
  • maxCreatedAt: to exclude results finalized after this datetime

Example

Suppose your system knows the last time it requested results from a batch. To query for only new results that became final after that, use minCreatedAt:

curl -X GET \
  'https://app.alegion.com/api/v1/batches/{batchId}/results?minCreatedAt=2020-01-01T10:00:00.000Z' \
  -H 'authorization: Bearer {access_token}'
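
In Python, the timestamp can be produced from a timezone-aware datetime. The sketch below formats it like the example above (UTC, millisecond precision, trailing Z). Whether the API accepts other ISO 8601 variants is not covered here, so the helper sticks to that one shape. The function name is illustrative.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def results_since_url(batch_id: str, last_checked: datetime) -> str:
    """Build a results URL that excludes results finalized before last_checked."""
    if last_checked.tzinfo is None:
        raise ValueError("last_checked must be timezone-aware")
    utc = last_checked.astimezone(timezone.utc)
    # Truncate microseconds to milliseconds and mark the value as UTC with "Z".
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"
    query = urlencode({"minCreatedAt": stamp}, safe=":")
    return f"https://app.alegion.com/api/v1/batches/{batch_id}/results?{query}"
```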

Common errors

Malformed Batch ID

Sending a seriously malformed batch ID will return an HTTP 422 with the following body:

{
    "message": "Failed to convert value of type 'java.lang.String' to required type 'java.util.UUID'; nested exception is java.lang.IllegalArgumentException: Invalid UUID string: {batch_id}"
}

Bad or Incorrect Token

Sending a malformed or incorrect access token will return an HTTP 500 error with a response body like:

{
    "timestamp": 1510865911864,
    "status": 500,
    "error": "Internal Server Error",
    "exception": "org.springframework.security.access.AccessDeniedException",
    "message": "Incorrect token provided",
    "path": "/api/v1/batches/{batchId}/results"
}

An expired access token will return an HTTP 401 error with a response body like:

{
    "timestamp": 1512567181019,
    "status": 401,
    "error": "Unauthorized",
    "message": "Unable to authenticate using the Authorization header",
    "path": "/api/v1/batches/{batchId}/records"
}
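
A client might map these responses to a next step. The sketch below is based only on the example bodies shown in this section. The action labels are made up for illustration, and real responses may differ.

```python
def suggested_action(status: int, body: dict) -> str:
    """Suggest a next step for the error responses described in this section."""
    if status == 401:
        return "reauthenticate"   # expired token: log in again and retry
    if status == 500 and "AccessDeniedException" in body.get("exception", ""):
        return "check_token"      # malformed or incorrect token
    if status == 422:
        return "fix_request"      # e.g. malformed batch ID
    return "unknown"
```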

Listing your workflows

To retrieve an array of all workflows in your account, use the /api/v1/workflows endpoint. Results will be paginated.

Example request

curl -X GET \
  https://app.alegion.com/api/v1/workflows \
  -H 'Authorization: Bearer {access_token}'

Example response

[
  {
    "id": "dac402db-b594-4f44-a2c7-87a535cf26f9",
    "name": "Workflow name",
    "description": "Workflow description",
    "isActive": true,
    "isVisible": true,
    "isArchived": false,
    "createdAt": "2018-09-26T18:35:38.188Z",
    "isLimitWorkerToOncePerInputRecord": false
  }
]

Listing batches within a workflow

To retrieve an array of all batches for a given workflow, use the /api/v1/workflows/{workflowId}/batches endpoint. Results will be paginated.

Example request

curl -X GET \
  https://app.alegion.com/api/v1/workflows/{workflowId}/batches \
  -H 'Authorization: Bearer {access_token}'

Example response

[
    {
        "id": "8760a700-6846-4476-8c36-2f944f17fd72",
        "name": "batch 1",
        "workflowId": "dac402db-b594-4f44-a2c7-87a535cf26f9",
        "priority": "normal",
        "isActive": true,
        "isEnabled": true,
        "createdAt": "2018-09-26T18:41:31.410Z",
        "lowWaterMark": 500,
        "highWaterMark": 500
    }
]

Get metadata about a batch

Use a GET to the /api/v1/batches/{batchId} endpoint to retrieve metadata about a batch. Note that this returns the properties of a batch as a container object, and does not return information about the input records, batch status, or results.

Example request

curl -X GET \
  https://app.alegion.com/api/v1/batches/{batchId} \
  -H 'Authorization: Bearer {access_token}'

Example response

{
    "id": "3c9fa471-882b-4d92-82d4-ece202606ba6",
    "name": "Batch Name",
    "workflowId": "dac402db-b594-4f44-a2c7-87a535cf26f9",
    "priority": "normal",
    "isActive": true,
    "isEnabled": true,
    "createdAt": "2019-03-26T17:47:03.399Z",
    "lowWaterMark": 200,
    "highWaterMark": 500
}

Deprecation note

The following properties have been deprecated and can be disregarded:

  • priority
  • isActive
  • lowWaterMark
  • highWaterMark

Update a batch's metadata

Use a PUT to the /api/v1/batches/{batchId} endpoint to update a batch's metadata. Similar to the GET for this endpoint, this does not affect input records or results.

Example request

curl -X PUT \
  https://app.alegion.com/api/v1/batches/{batchId} \
  -H 'Authorization: Bearer {access_token}' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "{batchId}",
    "name": "Batch Name",
    "workflowId": "{workflowId}",
    "createdAt": "2019-01-01",
    "isEnabled": true,
    "priority": "normal",
    "isActive": true
}'

All the request parameters below are required:

  • id: batch id to update (cannot be changed).
  • name: name that is used to identify the batch.
  • workflowId: workflow id the batch belongs to (cannot be changed).
  • createdAt: any date in the format YYYY-MM-DD (cannot be changed).
  • isEnabled: determines if the batch's tasks are queued (true or false).
  • priority: required, but deprecated (default value: "normal")
  • isActive: required, but deprecated (default value: true)
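
Because every parameter is required, one convenient pattern is to GET the batch first and derive the PUT body from that response. The sketch below assumes the GET response shape shown earlier. It trims createdAt to its first ten characters to obtain the YYYY-MM-DD form, and the helper name is hypothetical.

```python
REQUIRED_UPDATE_FIELDS = (
    "id", "name", "workflowId", "createdAt", "isEnabled", "priority", "isActive",
)


def build_batch_update(current: dict, name=None, is_enabled=None) -> dict:
    """Build a PUT body from a GET response, changing only name and/or isEnabled."""
    body = {field: current[field] for field in REQUIRED_UPDATE_FIELDS}
    # "2019-03-26T17:47:03.399Z" -> "2019-03-26"
    body["createdAt"] = current["createdAt"][:10]
    if name is not None:
        body["name"] = name
    if is_enabled is not None:
        body["isEnabled"] = is_enabled
    return body
```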

Example response

{
    "id": "3c9fa471-882b-4d92-82d4-ece202606ba6",
    "name": "Batch Name",
    "workflowId": "dac402db-b594-4f44-a2c7-87a535cf26f9",
    "priority": "normal",
    "isActive": true,
    "isEnabled": false,
    "createdAt": "2019-03-26T17:47:03.399Z",
    "lowWaterMark": 200,
    "highWaterMark": 500
}

Deprecation note

The following properties have been deprecated and can be disregarded:

  • priority
  • isActive
  • lowWaterMark
  • highWaterMark

List a batch's input records

Use the /api/v1/batches/{batchId}/records endpoint to retrieve input records belonging to a batch. Results will be paginated.

Example request

curl -X GET \
  https://app.alegion.com/api/v1/batches/{batchId}/records \
  -H 'authorization: Bearer {access_token}'

Example response

[
    {
        "id": "046b6c7f-0b8a-43b9-b35d-6489e6daee91",
        "createdAt": "2017-11-14T21:02:36.491Z",
        "metadata": "aeiou",
        "data": "aeiou",
        "workflowStageId": "046b6c7f-0b8a-43b9-b35d-6489e6daee91",
        "status": "in-progress"
    }
]

Upload input records from a CSV

Alegion accepts CSV as a format for the bulk upload of input records. While JSON is clearly superior for encoding complex structures, if your system is based on CSVs, uploading them can be appropriate for cases where the input data is simple. The Alegion customer success team will make a format recommendation based on your task design.

Use the /api/v1/batches/{batchId}/records/import-csv endpoint to load a CSV of input records into your batch. Your batchId and a UTF-8 encoded CSV file are required. This is a synchronous operation.

Example request

curl -X POST \
  https://app.alegion.com/api/v1/batches/{batchId}/records/import-csv \
  -H 'authorization: Bearer {access_token}' \
  -H 'cache-control: no-cache' \
  -H 'Content-Type: multipart/form-data' \
  -F file=@/path/to/records.csv

Example response

[
    {
        "id": "309ec5c5-6589-47fc-bd3d-c5b4edeb5d57",
        "data": {

        },
        "metadata": "sample metadata",
        "createdAt": "2017-11-15T14:26:17.661Z",
        "workflowStageId": "string",
        "status": "queued"
    }
]
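
Before uploading, a quick local check can confirm that the file is UTF-8 and not too large. This sketch makes two assumptions that the text above does not confirm: that the first row of the CSV is a header row, and that the 1,000-record import limit applies to CSV uploads as well as JSON.

```python
import csv
import io

MAX_RECORDS_PER_IMPORT = 1000  # assumed to apply to CSV imports too


def csv_precheck(raw: bytes) -> int:
    """Return the number of data rows, raising ValueError if the file looks unsuitable."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("CSV must be UTF-8 encoded") from exc
    rows = list(csv.reader(io.StringIO(text, newline="")))
    data_rows = max(len(rows) - 1, 0)  # assumes the first row is a header row
    if data_rows > MAX_RECORDS_PER_IMPORT:
        raise ValueError(f"{data_rows} rows exceeds the {MAX_RECORDS_PER_IMPORT}-record limit")
    return data_rows
```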