Ensuring Quality At Scale

The Alegion platform has a wide variety of configurations and accuracy mechanisms to balance quality, speed, cost, and the nuanced business logic demanded by tricky ML applications. This page goes deep on the available options.

Key concepts

Workflow, stage

"Workflow" refers to a sequence of one or more task designs that take raw input data and return labeled ground truth. A "stage" is a combination of the task design, scoring configurations, and routing rules that determine what labelers see and when. Each workflow has at least one stage.

Gold data

Gold data is used to transparently test labelers. It is culled from ground truth and "sprinkled" randomly into the task stream at a configurable percentage. By definition, the platform is creating new ground truth all the time. This can then be mined to make more gold, if needed. (To keep scoring honest, labelers will not see the gold they've created.)
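As a rough illustration of "sprinkling" gold at a configurable percentage, the sketch below inserts a gold task ahead of each raw task with a fixed probability and skips any gold the labeler created. The function name, the `gold_rate` parameter, and the `created_by` field are hypothetical and do not represent the platform's actual API.

```python
import random

def build_task_stream(raw_tasks, gold_tasks, labeler_id, gold_rate=0.1, seed=None):
    """Interleave gold tasks into one labeler's task stream (illustrative only).

    Before each raw task, a gold task is inserted with probability `gold_rate`.
    Gold that this labeler created is excluded so scoring stays honest.
    """
    rng = random.Random(seed)
    # Hypothetical schema: each gold task records who produced it.
    eligible = [g for g in gold_tasks if g.get("created_by") != labeler_id]
    stream = []
    for task in raw_tasks:
        if eligible and rng.random() < gold_rate:
            stream.append({**rng.choice(eligible), "is_gold": True})
        stream.append({**task, "is_gold": False})
    return stream
```

In this sketch `gold_rate` is a per-task insertion probability, so the share of gold in the final stream is slightly lower than `gold_rate` itself.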

Labeler scoring

The platform ensures accuracy by actively tracking labeler proficiency in two ways: performance on gold data tasks, and how often a labeler's judgements are overturned by a higher-skilled worker or admin. Labelers who trend below a configurable accuracy threshold can be removed. Scoring criteria can be simple "pass/fail" measures (e.g., whether the labeler chose the correct classification) or calculations like IoU (intersection over union, used to gauge the geometric accuracy of bounding boxes and polygons). These comparators are set at the field level for maximum granularity.
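To make the two kinds of comparator concrete, here is a minimal sketch of field-level scoring against a gold answer. IoU is computed in the standard way for axis-aligned boxes; the comparator names (`"exact"`, `"iou"`), the `iou_threshold` default, and the dictionary layout are assumptions for illustration, not the platform's configuration format.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def score_judgement(comparators, submitted, gold, iou_threshold=0.5):
    """Return a pass/fail result per field, using that field's comparator."""
    results = {}
    for field, kind in comparators.items():
        if kind == "exact":
            # Simple pass/fail: the submitted value must match gold exactly.
            results[field] = submitted[field] == gold[field]
        elif kind == "iou":
            # Geometric pass/fail: overlap with gold must reach the threshold.
            results[field] = iou(submitted[field], gold[field]) >= iou_threshold
        else:
            raise ValueError(f"unknown comparator: {kind}")
    return results

def below_threshold(pass_flags, threshold=0.9):
    """True when the pass rate over recent scored fields falls under `threshold`."""
    return sum(pass_flags) / len(pass_flags) < threshold
```

For example, `score_judgement({"species": "exact", "box": "iou"}, submitted, gold)` scores a classification field and a bounding-box field independently, which is what field-level granularity buys you.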

Admin review

Admin review is a holding tank for completed assignments, where judgements can be inspected for accuracy. It also provides tools to detect when labelers are getting sloppy, using signals such as overly consistent time on task and labeler score. Judgements can be approved or rejected, singly or in bulk.
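One way to think about the "overly consistent time on task" signal is as unusually low variation in per-task durations. The sketch below flags a labeler whose coefficient of variation (standard deviation divided by mean) falls under a floor. The function name and both thresholds are hypothetical; this is not a description of how the platform's own detection works.

```python
from statistics import mean, pstdev

def flag_uniform_timing(seconds_per_task, min_tasks=20, cv_floor=0.05):
    """Flag suspiciously uniform task durations (illustrative heuristic).

    Returns False when there are too few tasks to judge.
    """
    if len(seconds_per_task) < min_tasks:
        return False
    avg = mean(seconds_per_task)
    if avg <= 0:
        # Zero time on every task is itself a red flag.
        return True
    # Coefficient of variation: spread relative to the typical duration.
    return pstdev(seconds_per_task) / avg < cv_floor
```

A flag like this would only be a prompt for a human to look closer in admin review, since some tasks are naturally uniform in duration.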

Flexibility through "polymerized" workflows

This section describes how "polymerized" workflows can be composed from individual parts to organize a labeling effort around both obvious efficiency measures and hidden costs such as cognitive load.

Alegion's customer success team has unmatched expertise in composing workflows to adapt the platform to tight requirements (such as visual accuracy or difficult ontologies) while getting the highest quality from labelers with varied training and skill levels.

A simple workflow

Diagram of the simplest workflow

The simplest workflow has a single step. Admin review is in full effect.

ML Consensus

ML consensus workflow diagram

This consensus workflow features parallel blind judgements by a human and ML model. If they do not agree, a higher-skilled labeler breaks the tie. Reviewers can see each individual's labels (including the ML) so they have all the context needed to make a final call.
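The routing rule described above can be sketched in a few lines. Here the tiebreaker is modeled as a callable that receives both blind judgements; the function and key names are hypothetical and chosen only for illustration.

```python
def resolve_by_consensus(human_label, model_label, tiebreaker):
    """Accept matching blind judgements; otherwise escalate to a reviewer.

    `tiebreaker` is a callable that sees both judgements and returns the
    final label, mirroring a higher-skilled labeler who has full context.
    """
    if human_label == model_label:
        return {"label": human_label, "escalated": False}
    final = tiebreaker({"human": human_label, "model": model_label})
    return {"label": final, "escalated": True}
```

The `escalated` flag makes it easy to track how often tiebreaks occur as the model improves.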

An advantage of the parallelism: as the machine model improves, tiebreakers become increasingly rare, saving time and cost. While the theoretical maximum for such work deflection is 50%, the actual savings could be higher if the tiebreaker is relatively costly. For instance, if labelers at the tiebreaker level cost twice as much as those in the first round, the theoretical maximum savings is 67%.
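The 50% and 67% figures can be reproduced with a small calculation. This sketch assumes a baseline in which every task receives one first-round judgement plus one higher-skilled review, and it treats model inference as free. That baseline is our reading of the figures above rather than a stated platform formula, and the function name and parameters are hypothetical.

```python
def consensus_savings(agreement_rate, first_round_cost=1.0, tiebreak_cost=1.0):
    """Fraction of per-task cost saved when the model agrees `agreement_rate` of the time.

    Assumed baseline: first-round judgement + higher-skilled review on every task.
    With ML consensus: first-round judgement + review only on disagreement.
    """
    baseline = first_round_cost + tiebreak_cost
    with_model = first_round_cost + (1.0 - agreement_rate) * tiebreak_cost
    return (baseline - with_model) / baseline

print(consensus_savings(1.0))                               # 0.5
print(round(consensus_savings(1.0, tiebreak_cost=2.0), 3))  # 0.667
```

At perfect agreement the savings reduce to `tiebreak_cost / (first_round_cost + tiebreak_cost)`, which is why a costlier tiebreaker raises the ceiling.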

Human Consensus

Human consensus workflow diagram

The same approach can be applied with no ML. Human consensus shares some of the advantages of ML-powered consensus, including work deflection, but does not have the possible cost savings from putting a model in the loop.

However, Alegion's built-in quality mechanisms are engineered to yield the same accuracy as parallel judgements without the duplication of effort, saving both cost and time.

ML pre-labeling

diagram of workflow with ML prelabeling

Partially trained models can be used to pre-label raw data so that the human labeler is more often adjusting machine inferences than creating labels from scratch. Given passable model performance, pre-labeling can be advantageous for object tracking, instance localization, and scene classification tasks, as well as some NLP applications.

Incorporating external APIs

diagram of workflow with external API call

Any web-based API can be incorporated into a workflow. This diagram shows a flow that is identical to the ML pre-labeling example except that it uses an external call. Doing so requires scripting in AWS Lambda rather than configuration alone, but it vastly expands the range of data and logic the platform can draw on in a workflow.

Interestingly, judgements from an external API can optionally be scored like any others, permitting you to apply the same kinds of analysis and quality management methods as used with human labelers.

Multi-stage workflows

Multi-stage cat/dog workflow diagram

Metaphorically, if our stages are "molecules", they can be chained together with conditional logic into "polymers". This composability allows the platform to be tuned to highly domain-specific requirements. Diagrammed above is a fabricated example wherein we want to know whether an image contains a dog or a cat, and if it has either, a breed or variety classification. (Of course, this made-up example implies that a picture can't contain both a dog and a cat, but that's only for the ease of illustration.)

The task first goes to a "fast generalist"—a labeler who can move quickly because they only need to decide whether they see a dog, cat, or neither. Thus, well-trained dog and cat specialists are reserved only for the cases where they're needed.
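The generalist-first routing can be sketched as conditional logic between stages. Labelers are modeled here as plain callables, and all function and key names are hypothetical; real stages are configured on the platform rather than written this way.

```python
def generalist_first(image, generalist, dog_specialist, cat_specialist):
    """Route a task through a fast generalist, then to a specialist only if needed."""
    animal = generalist(image)  # expected to return "dog", "cat", or "neither"
    if animal == "dog":
        return {"animal": "dog", "breed": dog_specialist(image)}
    if animal == "cat":
        return {"animal": "cat", "breed": cat_specialist(image)}
    # No specialist time is spent when neither animal is present.
    return {"animal": "neither", "breed": None}
```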

In some workflows, that setup may yield cost and time savings, but the same task can be accomplished with a different arrangement:

Alternative multi-stage cat/dog workflow diagram

Here, if there's no cat, the feline specialist quickly dismisses the task and it goes to a dog specialist (who will also quickly dismiss the task if there's no dog).
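Which arrangement is cheaper depends on the mix of images and on how quickly a specialist can dismiss a task. The sketch below compares expected per-task cost for the two arrangements under simplified assumptions: one cost for any specialist classification, one cost for any dismissal, and no image containing both animals. All names and cost figures are hypothetical.

```python
def cost_generalist_first(p_cat, p_dog, generalist_cost, specialist_cost):
    """Expected cost when a generalist triages and specialists see only their animal."""
    return generalist_cost + (p_cat + p_dog) * specialist_cost

def cost_specialists_in_series(p_cat, p_dog, specialist_cost, dismiss_cost):
    """Expected cost when the cat specialist goes first and the dog specialist second."""
    p_neither = 1.0 - p_cat - p_dog
    # Cat specialist sees every task: classifies cats, dismisses the rest.
    cat_stage = p_cat * specialist_cost + (1.0 - p_cat) * dismiss_cost
    # Dog specialist sees only non-cat tasks: classifies dogs, dismisses "neither".
    dog_stage = p_dog * specialist_cost + p_neither * dismiss_cost
    return cat_stage + dog_stage
```

With 40% cats, 40% dogs, a generalist cost of 1, and a specialist cost of 5, the series arrangement comes out cheaper when a dismissal costs 1 and more expensive when it costs 2, so the right choice is sensitive to these inputs.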

Alegion's customer success team will work with you to find the optimal structure for your particular needs.