Changelog - Mabyduck

July 2026

Jul 14, 2026

Experiment tags 🏷️

0.11.7

Experiments can now be tagged, which makes it easier to group and organize them.

Custom rater pools now show how many raters they contain.

Finally, we introduced organization-level API keys and a new endpoint for fetching the billing events of organization billing accounts.

Jul 5, 2026

Custom rater pools

0.11.6

We added self-managed custom rater pools, giving you full control over the demographics of a rater pool.

We also made search bars all across the app a lot more powerful.

June 2026

Jun 26, 2026

Billing events API and top-1 sampling strategy

0.11.4

Our API now lets you retrieve a project's billing events.

We also added a new "Top-1" sampling strategy, which automatically focuses resources on identifying the best-performing method.

Jun 23, 2026

Okta 🔑, project spending limits 🧾, and session health ❤️

0.11.3

SSO via Okta is now available for our enterprise customers.

We introduced session health scores, which indicate whether raters displayed any unexpected behavior.

We also added support for project-level spending limits for organizations.

Jun 15, 2026

Faster metadata processing ⚡

0.11.2

We improved the processing speed for large datasets of externally hosted files.

Jun 10, 2026

Arabic

0.11.1

We added support for studies in Arabic.

Jun 8, 2026

Improved handling of 10-bit videos 📺

0.11.0

Device requirements checked at the beginning of a session now take into account whether datasets contain any 10-bit videos.

Jun 3, 2026

Improved workflows API

0.10.2

We updated our workflows API to handle large-scale custom strategies.

Jun 2, 2026

Extended pairwise audio experiments 🎶

0.10.1

Pairwise audio experiments now support evaluation of audio along multiple criteria, matching our pairwise video experiments.

May 2026

May 25, 2026

Improved UI performance for large datasets

0.10.0

This release contained several small improvements:

We improved UI performance for very large datasets.
The iframe size of embedded experiments can now be controlled dynamically.
We added support for sliders in surveys.

May 14, 2026

Golden questions

0.9.14

We added support for explicit golden datasets and golden questions mixed into a session.

May 12, 2026

Allowlist for project secrets 🔐

0.9.13

Project and dataset secrets now support an allowlist to restrict the endpoints with which secrets can be shared.
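The changelog does not say how allowlist entries are matched. As a hypothetical illustration only, here is a sketch of origin-based allowlisting, one common way to restrict where a secret may be sent. The function names, the default-port table, and the exact-origin rule (scheme, host, and port must all match; paths are ignored; subdomains do not match implicitly) are assumptions, not Mabyduck's documented behavior.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

# Assumed default ports for URLs that omit an explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Normalize a URL to (scheme, host, port).

    urlsplit lowercases the scheme, and .hostname is lowercased,
    so comparisons below are case-insensitive for both.
    """
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname or "", port


def may_share_secret(target_url: str, allowlist: Iterable[str]) -> bool:
    """Hypothetical check: allow sharing only if the target's origin
    exactly equals the origin of some allowlisted entry."""
    target = origin(target_url)
    return any(origin(entry) == target for entry in allowlist)
```

Exact-origin matching is deliberately strict: a look-alike host such as `api.example.com.evil.net` or a downgrade from `https` to `http` is rejected, at the cost of having to list every subdomain explicitly.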

May 4, 2026

Performance, video experiments, config files & development mode 🧑‍💻

0.9.12

This release comes with a variety of new features.

We added a new development mode which allows live editing of embedded experiments.
The review mode of sessions now displays meta information.
The Workflows API now supports config files.
We improved the performance of our database to better handle large-scale experiments.
Self-hosted datasets can now also be specified as a CSV file uploaded to our platform.

April 2026

Apr 27, 2026

Sequential video player 🎞️

0.9.11

Pairwise video experiments now also offer sequential video playback as an option.

Apr 21, 2026

Session health scores and AI raters 🤖

0.9.10

We improved the robustness of our AI rater pools.

This release brings updates to our session and rater health scores, which we use to evaluate the performance of raters.

Apr 15, 2026

Project secrets 🔑

0.9.9

You can now set up project-wide secrets that can be used across embedded experiments.

We've also made it possible to filter plots by parameters, and embedded experiments can now store custom metadata on slates.

Apr 8, 2026

Survey question types and metrics

0.9.8

Audio surveys now support a new "highlight transcript" question type, allowing raters to highlight words in a transcript. Survey experiments also gain a new "Percent chosen" metric for radio buttons and checkboxes, with new plots of these metrics on the results pages.

Apr 4, 2026

Active sampling

0.9.7

We improved the calibration of Elo score error bars, which also improves our active sampling strategies.

March 2026

Mar 31, 2026

Survey features and new languages 🇵🇹

0.9.6

This release comes with a variety of improvements.

Survey experiments can now be configured via JSON config files contained in a dataset. This makes it possible to use different questions on different slates.
We added support for Portuguese and Italian.

Mar 26, 2026

Automatic checks of video encoding settings

0.9.5

We added automatic checks of video encoding settings for self-hosted videos, for example, to check whether videos have been fragmented.
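How the platform performs this check is not described here. One common heuristic, shown as a sketch: a fragmented MP4 contains top-level `moof` (movie fragment) boxes, which can be found by walking the ISO base media file format box headers (a 32-bit big-endian size, a four-character type, and an optional 64-bit size when the 32-bit size is 1). The function names are hypothetical, and looking only for `moof` is an assumed simplification.

```python
import struct
from typing import BinaryIO, Iterator, Tuple


def top_level_boxes(f: BinaryIO) -> Iterator[Tuple[str, int]]:
    """Yield (box_type, box_size) for each top-level box of an MP4 stream."""
    while True:
        header = f.read(8)
        if len(header) < 8:
            return  # end of stream (or a truncated header)
        size, raw_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            large = f.read(8)
            if len(large) < 8:
                return
            size = struct.unpack(">Q", large)[0]
            header_len = 16
        name = raw_type.decode("latin-1")
        yield name, size
        if size == 0:
            return  # size 0 means the box extends to the end of the file
        if size < header_len:
            raise ValueError(f"invalid size {size} for box {name!r}")
        # Skip the payload without reading it into memory.
        f.seek(size - header_len, 1)


def is_fragmented_mp4(f: BinaryIO) -> bool:
    """Heuristic: treat the file as fragmented if any top-level 'moof' box exists."""
    return any(name == "moof" for name, _ in top_level_boxes(f))
```

Because it only reads box headers and seeks past payloads, a check like this stays cheap even for large self-hosted files.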

Mar 19, 2026

API keys update

0.9.4

We updated API keys so that they are associated with users.

Mar 13, 2026

Webhooks

0.9.3

You can now set up project webhooks for dataset status changes, job completion, and session completion.

Mar 6, 2026

Hero videos 🎬

0.9.2

Hero videos included in introductions can now be configured to take over the full screen. They are also now configurable directly via the UI, not just via the API.

Mar 6, 2026

Checkboxes & Elo for MUSHRA experiments

0.9.1

Pairwise video experiments can now include arbitrary checkboxes.

We've also added Elo support for MUSHRA experiments.

February 2026

Feb 27, 2026

Hero media in experiment introductions

0.9.0

Experiment introductions can now include a custom hero image, configurable via the API.

Feb 22, 2026

New languages, leaderboards, and a new sampling strategy

0.8.8

This release contains a couple of new features:

We added support for Japanese and Korean.
Leaderboards are now available for ACR-type experiments.
Experiments with a single condition now also support a new uniform sampling strategy.

Feb 22, 2026

Elo scores, refunds & downloads

0.8.7

This release comes with a variety of improvements.

ACR-type experiments now support Elo scores.
Aborted sessions are now automatically refunded, with no manual intervention needed.
The download button now returns JSON consistent with our API, replacing the previous CSV format.

Feb 6, 2026

Elo scores for ACR experiments 📊

0.8.6

We introduced Elo scores for ACR experiments. These scores are based on a Plackett-Luce model, which ignores absolute scores and only considers how each individual rater ranks the conditions.
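To illustrate the idea, here is a minimal sketch, not Mabyduck's implementation. It turns one rater's ACR scores into a best-to-worst ranking and evaluates that ranking's Plackett-Luce log-likelihood under log-strengths `theta`. The function names are hypothetical, and breaking tied scores by condition name is an assumption; the changelog does not say how ties are handled.

```python
import math
from typing import Dict, List, Sequence


def scores_to_ranking(scores: Dict[str, float]) -> List[str]:
    """Order conditions best to worst by one rater's scores.

    Only the order survives; absolute values are discarded.
    Ties are broken by condition name (an assumption).
    """
    return sorted(scores, key=lambda c: (-scores[c], c))


def plackett_luce_log_likelihood(ranking: Sequence[str], theta: Dict[str, float]) -> float:
    """log P(ranking | theta), with each condition's strength equal to exp(theta[c]).

    At each position, the chosen condition competes against every
    condition not yet placed (itself included).
    """
    ll = 0.0
    for i, cond in enumerate(ranking):
        remaining = ranking[i:]
        # Numerically stable log-sum-exp over the remaining conditions.
        m = max(theta[c] for c in remaining)
        log_denom = m + math.log(sum(math.exp(theta[c] - m) for c in remaining))
        ll += theta[cond] - log_denom
    return ll
```

Two raters who score A, B, C as 5/3/1 and 4/3/2 produce the same ranking and therefore contribute the same likelihood. That is the sense in which the model ignores absolute scores. Fitting `theta` means maximizing the summed log-likelihood over all raters. One conventional way to express the fitted strengths on an Elo scale is R = 400·θ/ln 10, so a 400-point gap corresponds to 10:1 odds. Whether Mabyduck uses exactly this scaling is not stated.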

Feb 6, 2026

Pairwise video experiments and custom strategies

0.8.5

We added new response types and support for multiple dimensions in pairwise video experiments.

It is now possible to create datasets, experiments, and jobs in a single API request via our new workflows API. This endpoint also supports specifying a fully custom strategy.

This release also added custom S3 storage support for our enterprise customers.

January 2026

Jan 25, 2026

Scale-to-fit mode

0.8.4

We added a "scale to fit" option for ACR image experiments.

Jan 21, 2026

Self-hosted datasets

0.8.3

It is now possible to use self-hosted datasets by providing a list of URLs, instead of uploading media files directly to us.

We also improved our API: jobs can now be launched via the API, where previously it was only possible to configure drafts.

Jan 13, 2026

Embedded experiments

0.8.1

Our embedded experiments are now widely available. This type of experiment uses JavaScript to include arbitrary content in experiments, and is ideal for running interactive studies.

December 2025

Dec 19, 2025

Survey experiments

0.7.9

We released new types of experiments that let you configure arbitrary surveys below images, audio, or video.

Dec 16, 2025

Improved support for large numbers of conditions

0.7.8

We improved our support for datasets with very large numbers of conditions. This is useful, for example, when you want to collect training labels for a large number of audio clips, images, or videos that are not AI-generated.

Dec 12, 2025

Strategy filters

0.7.7

Selection strategies have received more configuration options. For example, it is now possible to evaluate only a subset of a dataset. It is also possible to always include one method in pairwise comparisons against other methods.

This release also makes it possible to scale (instead of crop) images in pairwise image experiments.

November 2025

Nov 29, 2025

Plots on rubrics

0.7.5

We added the ability to attach configurable plots to rubrics and leaderboards.

It is now also possible to create draft experiments and jobs via our API.

Nov 21, 2025

Public launch 🚀

0.7.4

Today, we are opening up Mabyduck to everyone.

Nov 14, 2025

Design improvements

0.7.3

We made small tweaks to our design and changes to our backend to prepare for the public launch.

Nov 7, 2025

Improved session browser

0.7.2

This release contained several improvements:

We updated the session browser, making it easier to see demographic and other meta information at a glance.
We added the option to filter leaderboards to include or exclude certain conditions.
We now automatically fragment mp4 files uploaded to our platform. Our pairwise video experiments previously required you to upload already fragmented videos.

October 2025

Oct 31, 2025

Confidence regions in line graphs 📈

0.7.1

We added optional confidence regions to line graph visualizations of your results.

This release also adds support for references in ACR audio experiments.

Oct 24, 2025

Additional languages 🇵🇱

0.7.0

This release comes with a variety of improvements.

We refactored the way our rater pools work, paving the way for us to offer you highly customized rater pools.
We updated the UI of the pairwise video experiment to match the recently updated UI of pairwise image experiments.
We added support for four more languages: Spanish, Polish, Chinese, and Vietnamese.

Oct 17, 2025

Improved upload for large datasets

0.6.3

We improved the handling of very large dataset uploads through the browser. If a dataset upload is interrupted for any reason, it is now possible to resume uploads.

This version also adds a new Markdown input field for writing introductions.

Oct 10, 2025

Rater feedback function 💬

0.6.2

We introduced the ability for raters to leave feedback on individual slates and alert us to any potential issues with an experiment.

This version also updated the leaderboards' design.

April 2025

Apr 5, 2025

Internationalization 🇫🇷

0.2.6

We internationalized our experiments. In addition to English, we now support French and German.

Additionally, different experiments can now use different config files. This allows you to upload a single dataset with multiple config files for different experiments.

March 2025

Mar 17, 2025

Added configuration options

0.2.5

Pairwise image experiments now support references. We also introduced new configuration options for the MUSHRA experiment.

Mar 7, 2025

Pairwise image experiments

0.2.2

We introduced a new pairwise image experiment. We also added a way to preview images in datasets.

February 2025

Feb 23, 2025

Pre-screening

0.2.0

We have implemented our own pre-screening protocols. This allows us to provide you with higher-quality raters whose ability and hardware enable them to detect fine differences between stimuli.

Feb 13, 2025

Crowd-sourced raters

0.1.6

It is now possible to launch experiments to crowd-sourced raters through our platform.

Feb 5, 2025

MUSHRA

0.1.5

We added configuration options to change how waveforms are rendered in MUSHRA experiments. In particular, it is now possible to render only the waveform of the reference so that raters cannot draw conclusions based on the waveform.

January 2025

Jan 29, 2025

Audio experiments and API

0.1.4

Release 0.1.4 is packed with new features:

A new API for fetching results programmatically.
A new absolute category rating (ACR) experiment for audio stimuli.
Added configuration options for pairwise audio experiments, such as the option to vote "Tie" or to check whether audio has been played for a given duration.

Jan 27, 2025

MUSHRA 🪲

0.1.3

We addressed some minor bugs in the MUSHRA experiment.

Jan 16, 2025

Added support for config files

0.1.2

Datasets now support config files. These can be used to change the interface for each slate, for example, to display text prompts next to stimuli.

Jan 6, 2025

Private beta 🐣

0.1.0

Today, we are excited to release a private beta version of Mabyduck to our design partners.