Empirical Research & Brier Scores
Prediction markets are often judged by bold claims about their accuracy. Research offers a more careful way to evaluate those claims.
Instead of asking whether markets are simply "good" or "bad," researchers ask narrower questions:
- Are prices reasonably calibrated?
- Do markets update quickly when new information arrives?
- How does performance change in liquid versus thin markets?
- How do markets compare with polls, models, or expert forecasts in specific settings?
That is where tools like Brier scores and calibration plots become useful.
What Are Brier Scores?
A Brier Score is a mathematical grading system used to measure the accuracy of probabilistic predictions. Unlike a simple binary "right or wrong" grade, a Brier Score penalizes forecasters based on their confidence in a wrong answer.
Brier Scores range from 0.0 to 1.0:
- 0.0: Perfect accuracy (the market predicted a 100% chance of an event happening, and it happened).
- 0.25: Equivalent to guessing randomly (50/50) on every single market.
- 1.0: Total failure (the market predicted a 100% chance of an event happening, and it did not happen).
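Concretely, the Brier score is the mean squared difference between each forecast probability and the binary outcome (1 if the event happened, 0 if it did not). A minimal sketch in Python, using made-up forecasts purely for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities (0 to 1)
    and binary outcomes (1 = event happened, 0 = it did not)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical example: three resolved markets.
forecasts = [0.9, 0.5, 0.2]  # prices read as probabilities of "Yes"
outcomes = [1, 0, 0]
score = brier_score(forecasts, outcomes)  # (0.01 + 0.25 + 0.04) / 3 ≈ 0.10
```

A constant 50% forecast always contributes (0.5)² = 0.25 per market, which is why 0.25 marks the "guessing randomly" benchmark above.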
Brier scores are useful, but they are not a universal verdict. A strong score in one market category does not prove that every market on every platform is equally informative.
How Calibration Works
Brier scores grade the accuracy of individual forecasts; calibration describes whether a platform's prices, taken together, match how often events actually happen.
A prediction market is well calibrated when, over many cases, events priced around 70% happen about 70% of the time, events priced around 30% happen about 30% of the time, and so on.
The Calibration Test:
- Researchers gather the markets that Polymarket priced at roughly 70% (about $0.70 per "Yes" share). In practice, prices are grouped into bins, not matched exactly.
- They wait for all of these markets to resolve.
- If about 70% of those markets resolve "Yes", the platform is well calibrated at the 70% level. Small samples are noisy, so calibration claims need many resolved markets.
If a market consistently prices events at 90%, but those events only happen 60% of the time, the market is overconfident. If 90% events happen 95% of the time, the market is underconfident.
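The steps above can be sketched directly. This is an illustrative sketch with invented data, not a real study: it buckets resolved markets by price and compares each bucket's average price to its empirical "Yes" rate.

```python
def calibration_table(prices, outcomes, n_bins=10):
    """Bucket resolved markets by price and compare each bucket's
    average price to the fraction that resolved 'Yes'."""
    buckets = [[] for _ in range(n_bins)]
    for p, o in zip(prices, outcomes):
        # Epsilon guards against float edge cases (e.g. 0.7 * 10 landing just below 7).
        b = min(int(p * n_bins + 1e-9), n_bins - 1)
        buckets[b].append((p, o))
    rows = []
    for b, items in enumerate(buckets):
        if items:
            avg_price = sum(p for p, _ in items) / len(items)
            hit_rate = sum(o for _, o in items) / len(items)
            rows.append((b / n_bins, (b + 1) / n_bins, len(items), avg_price, hit_rate))
    return rows

# Invented data: ten markets priced at 70%, seven of which resolved "Yes".
rows = calibration_table([0.70] * 10, [1] * 7 + [0] * 3)
# One row for bucket [0.7, 0.8): 10 markets, average price 0.70, hit rate 0.70
```

In this framing, overconfidence shows up as hit rates persistently below a bucket's average price, and underconfidence as hit rates persistently above it.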
Research on prediction markets is broadly supportive, but it also finds important limits. Some papers find that markets are reasonably well calibrated, especially closer to expiration. Others find biases such as favorite-longshot effects or weaker performance in thin markets and long-dated contracts.
Why context matters
The quality of a prediction market depends on conditions, not just theory.
Important variables include:
- liquidity
- spread size
- time to expiration
- clarity of resolution rules
- who is allowed to participate
- whether fees or frictions interfere with arbitrage
That is why broad statements like "markets always beat polls" are too simple. A liquid, high-attention election market close to resolution is not the same thing as a thin niche market that barely trades.
Risks: Thin Liquidity and Noise
One of the clearest lessons from both research and practice is that thin markets are harder to trust.
If a contract has very little trading activity, a small number of participants can move price sharply. That does not make the market useless, but it does mean readers should interpret the price more carefully.
What research supports today
A careful summary of the literature looks like this:
- prediction markets are a serious forecasting tool, not just a novelty
- they often produce useful probability estimates
- they can compare well with other forecasting methods in some settings
- they are not automatically efficient or unbiased in every market
- liquidity and contract design matter a great deal
That is a stronger and more durable conclusion than claiming the debate is completely settled.
FAQ
What is a Brier Skill Score (BSS)?
The Brier Skill Score modifies the standard Brier Score by comparing the market's forecast against a baseline reference (usually a naive 50/50 guess). A positive BSS means the market is actively adding predictive value above random chance.
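In formula terms, BSS = 1 − BS / BS_ref, where BS_ref is the Brier score of the baseline forecast. A short sketch with invented numbers, using a constant 50% forecast as the reference:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecasts and binary outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def brier_skill_score(forecasts, outcomes, reference=0.5):
    """BSS = 1 - BS / BS_ref. Positive: better than the baseline;
    zero: no better; negative: worse."""
    bs = brier_score(forecasts, outcomes)
    bs_ref = brier_score([reference] * len(outcomes), outcomes)
    return 1 - bs / bs_ref

# Invented example: confident, mostly correct forecasts.
bss = brier_skill_score([0.9, 0.2, 0.8], [1, 0, 1])  # BS = 0.03, BS_ref = 0.25 -> 0.88
```

Note that a forecaster who always says 50% scores exactly 0.0 against this baseline, by construction.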
Do prediction markets outperform experts?
Sometimes they do, but not in every setting. The safer conclusion is that prediction markets are often competitive with other forecasting tools and can be especially informative when markets are liquid and incentives are strong.
How do researchers account for platform fees?
Researchers need to consider frictions like fees, spreads, and participation limits because those can affect how closely price tracks forecast quality.