Data Analytics & Infrastructure
What it is
To build useful analytics or automated trading systems, developers usually need two broad types of data:
- Live State Data: The current order book bids/asks and the last traded price.
- Historical Tick Data: Every single order placed, canceled, and executed over the life of a market, used for backtesting strategies and training predictive models.
The exact availability of these datasets differs by platform, and developers should not assume that live trading APIs are the best source for historical research.
Why it matters
You cannot backtest a quantitative trading algorithm without clean, granular historical data.
The hard part is not just getting data. It is getting the right kind of data in the right format for the question you are asking.
For example:
- a market-making bot needs current state and fast updates
- a research notebook needs clean historical exports
- a strategy review may need settlement data, volume history, and rule text
Mixing these use cases leads to messy systems and weak conclusions.
How to think about the data stack
1. Live market data
Live data is what you use for execution, alerts, and real-time dashboards. It often comes from websockets or frequently updated market-data endpoints.
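Here is a minimal Python sketch of keeping a local order book current from a websocket feed. The message schema (`type`, `seq`, `side`, `price`, `size`) and the `LocalBook` class are hypothetical, because real platforms define their own formats. Treat this as an outline of the common snapshot-plus-delta pattern, not as any platform's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical message schema (check your platform's websocket docs for the real one):
#   {"type": "snapshot", "seq": 1, "bids": [[0.41, 100]], "asks": [[0.43, 50]]}
#   {"type": "delta", "seq": 2, "side": "bid", "price": 0.42, "size": 25}
# A delta with size 0 removes that price level.

@dataclass
class LocalBook:
    bids: dict[float, float] = field(default_factory=dict)  # price -> resting size
    asks: dict[float, float] = field(default_factory=dict)
    seq: int | None = None  # sequence number of the last applied message

    def apply(self, msg: dict) -> None:
        if msg["type"] == "snapshot":
            # A snapshot replaces the whole book.
            self.bids = {p: s for p, s in msg["bids"]}
            self.asks = {p: s for p, s in msg["asks"]}
        elif msg["type"] == "delta":
            if self.seq is None or msg["seq"] != self.seq + 1:
                # A missed message means the local book may be wrong: resync.
                raise RuntimeError("sequence gap: request a fresh snapshot")
            levels = self.bids if msg["side"] == "bid" else self.asks
            if msg["size"] == 0:
                levels.pop(msg["price"], None)
            else:
                levels[msg["price"]] = msg["size"]
        else:
            raise ValueError(f"unknown message type: {msg['type']}")
        self.seq = msg["seq"]

    def top_of_book(self) -> tuple[float | None, float | None]:
        """Best bid and best ask, or None for an empty side."""
        best_bid = max(self.bids) if self.bids else None
        best_ask = min(self.asks) if self.asks else None
        return best_bid, best_ask
```

Raising on a sequence gap, rather than silently continuing, is a deliberate choice. A book that has missed an update can look plausible while being wrong, and that matters most for execution and alerts.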
2. Historical data
Historical data is what you use for backtesting, research, and model review. Depending on the platform, it may come from exports, separate datasets, or indexer-style services rather than the main live endpoint.
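Historical exports usually need some cleaning before use. The Python sketch below loads a trades file, normalizes timestamps to UTC and market identifiers to one case, and drops duplicate rows. The column names (`trade_id`, `market_id`, `timestamp`, `price`, `size`) and the mix of timestamp formats are assumptions for illustration, not any platform's actual export schema.

```python
import csv
import io
from datetime import datetime, timezone

def parse_ts(raw: str) -> datetime:
    """Normalize epoch seconds or ISO-8601 text to a timezone-aware UTC datetime."""
    raw = raw.strip()
    try:
        return datetime.fromtimestamp(float(raw), tz=timezone.utc)
    except ValueError:
        pass  # not a number; try ISO-8601 below
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"  # fromisoformat before Python 3.11 rejects "Z"
    dt = datetime.fromisoformat(raw)
    if dt.tzinfo is None:
        # Assumption: naive timestamps in this export are already UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def load_trades(fh) -> list[dict]:
    """Read a (hypothetical) trades CSV into sorted, de-duplicated records."""
    seen: set[str] = set()
    trades = []
    for row in csv.DictReader(fh):
        if row["trade_id"] in seen:
            continue  # paginated or split exports can repeat rows
        seen.add(row["trade_id"])
        trades.append({
            "market_id": row["market_id"].strip().upper(),  # one canonical case
            "ts": parse_ts(row["timestamp"]),
            "price": float(row["price"]),
            "size": float(row["size"]),
        })
    trades.sort(key=lambda t: (t["market_id"], t["ts"]))
    return trades

# Hypothetical sample: a duplicated row and two timestamp formats.
sample = io.StringIO(
    "trade_id,market_id,timestamp,price,size\n"
    "t1,cpi-jun,2024-06-12T12:30:00Z,0.41,10\n"
    "t1,cpi-jun,2024-06-12T12:30:00Z,0.41,10\n"
    "t2,CPI-JUN,1718195460,0.44,5\n"
)
trades = load_trades(sample)
print(len(trades), trades[0]["market_id"])  # 2 CPI-JUN
```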
3. Metadata and resolution context
Raw prices are not enough. You also need market wording, deadlines, settlement rules, and category labels. Without that context, historical analysis can become misleading very quickly.
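The Python sketch below attaches that context to each cleaned trade. `MarketMeta`, its fields, and the `enrich` helper are hypothetical stand-ins for whatever a platform publishes about a market. The idea they illustrate is that every trade carries its market's wording, rules, deadline, category, and outcome before anyone analyzes it, and that trades lacking that context are set aside rather than guessed at.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class MarketMeta:
    """Hypothetical per-market context record."""
    market_id: str
    question: str         # market wording as shown to traders
    rules: str            # full resolution rule text
    category: str
    close_time: datetime  # trading deadline, UTC
    outcome: str | None   # "yes", "no", or None if unresolved

def enrich(trades: list[dict], meta: dict[str, MarketMeta]) -> tuple[list[dict], list[dict]]:
    """Join trades to market context; return (usable, flagged) trades."""
    usable, flagged = [], []
    for t in trades:
        m = meta.get(t["market_id"])
        if m is None or t["ts"] > m.close_time:
            # No context on file, or a print after the deadline: set aside for review.
            flagged.append(t)
            continue
        usable.append({**t, "question": m.question, "rules": m.rules,
                       "category": m.category, "outcome": m.outcome})
    return usable, flagged
```

Keeping flagged trades in a separate list, instead of discarding them, makes it possible to check afterwards whether the exclusions themselves skew the dataset.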
Example: Building a Backtest
Suppose you want to backtest a momentum strategy: "If 'Yes' shares on an inflation contract jump 10% in 5 minutes, buy and hold for 1 hour."
- Data Acquisition: Pull historical trade and market data from the appropriate source, not just the live feed.
- Cleaning: Normalize timestamps, contract identifiers, and settlement outcomes.
- Simulation: Test the rule against the cleaned dataset and include realistic assumptions about fees, spreads, and slippage.
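A minimal Python sketch of the simulation step for a single market. It reads the rule as "the Yes price is at least 10% above the most recent trade from five or more minutes earlier", which is only one interpretation; a 10-cent move would be another. The `fee` and `slippage` defaults are placeholders, not any platform's actual fee schedule. Slippage here stands in for both the spread and market impact, because trade prints alone do not show the quotes you would have hit.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right

def backtest_momentum(
    ticks: list[tuple[int, float]],  # (unix seconds, Yes price 0..1), sorted by time
    settlement: float | None,        # 1.0 if resolved Yes, 0.0 if No, None if unknown
    jump: float = 0.10,              # relative rise that triggers a buy
    lookback_s: int = 5 * 60,
    hold_s: int = 60 * 60,
    fee: float = 0.01,               # placeholder fee per contract per fill
    slippage: float = 0.01,          # placeholder price concession per fill
) -> list[float]:
    """Per-contract P&L of each simulated trade, one position at a time."""
    times = [t for t, _ in ticks]
    pnl: list[float] = []
    i = 0
    while i < len(ticks):
        ts, price = ticks[i]
        # Most recent trade at least lookback_s seconds before this one.
        j = bisect_right(times, ts - lookback_s) - 1
        if j < 0 or price < ticks[j][1] * (1 + jump):
            i += 1
            continue
        entry = min(price + slippage, 1.0)  # pay up to get filled
        # First trade at or after the end of the holding period.
        k = bisect_left(times, ts + hold_s)
        if k < len(ticks):
            exit_price = max(ticks[k][1] - slippage, 0.0)
            pnl.append(exit_price - entry - 2 * fee)
            i = k + 1  # resume after the exit; no overlapping positions
        else:
            # Trading ended before the hold did: settle at resolution if known.
            # (An unresolved open trade is left out here; count such cases separately.)
            if settlement is not None:
                pnl.append(settlement - entry - fee)
            break
    return pnl

ticks = [(0, 0.40), (300, 0.50), (3900, 0.60)]
print(backtest_momentum(ticks, settlement=1.0))  # one trade, about +0.06 per contract
```

Scanning one position at a time keeps the sketch simple; a fuller backtest would also size positions and aggregate results across many markets.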
Risks
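- Survivorship Bias: When assembling historical data, it's easy to choose markets using information you would not have had at the time, for example testing only on markets that are still listed today, or only on markets you already know resolved to "Yes." Either shortcut inflates simulated profits in ways that will not hold up in live trading.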
- Data Cleanliness: Because prediction markets deal in highly qualitative events (unlike a standard stock ticker), the resolution criteria strings ("Resolves Yes if Candidate X files FEC paperwork by Tuesday 5PM EST") are often messy, making automated parsing of historical rulesets very difficult.
- Execution mismatch: A backtest built on clean historical prints can still fail in production if your live execution assumptions are unrealistic.
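One way to guard against survivorship bias, sketched in Python: pick the test universe from a point-in-time catalog (every market that was open on a given date, including ones later voided or delisted), and look at outcomes only afterwards as a sanity check. `ListedMarket`, `universe_at`, and `outcome_mix` are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ListedMarket:
    market_id: str
    listed_at: datetime
    closed_at: datetime | None  # None while still trading
    outcome: str | None         # "yes", "no", "void", or None if unresolved

def universe_at(catalog: list[ListedMarket], as_of: datetime) -> list[str]:
    """Markets open at `as_of`, selected without looking at outcomes.

    The catalog must include delisted and voided markets too; a catalog
    that only lists markets still visible today is itself survivorship-biased.
    """
    return [
        m.market_id
        for m in catalog
        if m.listed_at <= as_of and (m.closed_at is None or m.closed_at > as_of)
    ]

def outcome_mix(catalog: list[ListedMarket], ids: list[str]) -> dict[str, int]:
    """Count outcomes in a test universe; an all-"yes" mix is a red flag."""
    by_id = {m.market_id: m for m in catalog}
    counts: dict[str, int] = {}
    for market_id in ids:
        key = by_id[market_id].outcome or "unresolved"
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Selecting on listing and closing times alone, and inspecting outcomes only after the universe is fixed, keeps resolution knowledge out of the selection step.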
FAQ
Q: Can I get full historical order book data (Level 2 data) for free?
Often not. Many platforms provide historical trade executions for free or via bulk dumps, but those show what traded, not what was resting on the book. Reconstructing the full state of the order book (Level 2) at millisecond resolution is far more data-intensive and often requires specialized, paid institutional data feeds.
Q: Why don't the API docs mention historical data endpoints?
Because live trading documentation and historical research workflows are often not the same product surface. Developers may need separate docs, exports, or data services for historical work.