Benchmarks

Existing AI benchmarks measure intelligence: reasoning, knowledge retrieval, code generation. We are building something complementary. Taste-Bench measures the qualities a model needs to help with work that runs on judgment, not just correctness. This covers both a model's own sense of taste in a domain (its ability to act as a credible critic or advisor) and its ability to support people in their creative practice without flattening what makes their work theirs.

This is not one benchmark but a series, each targeting a different facet of what taste looks like in practice. Some evals resemble games: the model navigates a structured scenario and we measure the quality of its choices. Others ask the model to produce an output (a critique, a set of questions, a ranked list) and use a separate evaluation layer to judge whether the output is good. Each benchmark develops its own methodology fitted to what it is trying to measure.

We are developing a large number of these tests. We then plan to compare their results against the major existing benchmarks and identify which of our evaluations are orthogonal to measures of raw intelligence, meaning that a model's scores on existing benchmarks do not predict its score on ours. The goal is a small, stable set of benchmarks that capture something genuinely new about what models can and cannot do.
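One way the selection step could work is sketched below, as an illustration only and not a description of the actual Taste-Bench pipeline. It assumes each benchmark yields one score per model, uses Pearson correlation as the measure of overlap, and keeps a candidate evaluation only if its strongest correlation with any existing benchmark stays under a cutoff. The function names, the 0.3 threshold, and all scores are hypothetical placeholders, not results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of per-model scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def max_abs_correlation(candidate_scores, existing):
    """Strongest (absolute) correlation of a candidate with any existing benchmark."""
    return max(abs(pearson(candidate_scores, scores)) for scores in existing.values())


def select_orthogonal(candidates, existing, threshold=0.3):
    """Keep candidates whose scores no existing benchmark predicts well.

    The threshold is an assumed cutoff; in practice no evaluation is
    perfectly uncorrelated, so some tolerance has to be chosen.
    """
    return [
        name
        for name, scores in candidates.items()
        if max_abs_correlation(scores, existing) < threshold
    ]


# Made-up scores for four hypothetical models (same model order in every list).
existing = {
    "reasoning": [40, 50, 60, 70],
    "coding": [35, 55, 65, 85],
}
candidates = {
    "taste_a": [42, 51, 63, 69],  # tracks the existing benchmarks closely
    "taste_b": [60, 40, 40, 60],  # ranks models differently
}

print(select_orthogonal(candidates, existing))
```

With these placeholder numbers only `taste_b` survives, because `taste_a` rises and falls with the existing benchmarks. A rank correlation such as Spearman's could be swapped in if only model orderings matter, and a real comparison would need many more models than four for the correlations to be meaningful.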

The benchmarks we are developing

Discernment

Can a model tell 'good' from 'bad' in complex domains, in line with expert judgment?

Coming soon

Formation

Can a model form coherent preferences after a structured journey of exposures?

Coming soon

Authenticity

Can a model create from an internal sense of what is good rather than only optimizing for engagement?

Coming soon

Critique

Can a model critique work in a way that is fair, illuminating, and useful to others?

Coming soon

Sovereignty

When asked to stay neutral, can a model avoid imposing its taste on the user?

Coming soon

Calibration

Can a model keep independent ratings aligned with its own comparative judgments?

Coming soon

Curiosity

Can a model choose fruitful questions and directions to explore, showing research-style taste for what to do next?

Coming soon

Courage

Can a model take a stand (love some things, hate others) instead of fence-sitting on every aesthetic choice?

Coming soon

Contact

If you work on evaluation, taste, or creative AI and want to collaborate, write to hello@taste-bench.com.