Objectives
In this section, we will review affordable data vendors (i.e., approximately $150 per month or less) that cover the US equities market. We want to understand what each one provides and at what cost, what its API limitations are, and whether it offers historical and live data coverage. We can’t assess each vendor’s data quality because doing so would require measuring its accuracy against a pristine data vendor such as Bloomberg or FactSet. For our purposes, we will therefore assume that all vendors have similar data quality and that no one vendor is superior. This data is specifically for building our backtesting and risk platform. Although we could develop strategies with this data, we will revisit data sources for trading strategies at a later time.
Summary
Below is a table summarizing our results. If you wish to complete this project, I highly recommend that you continue reading the section covering Data before subscribing to any of these data vendors. That way, you will better understand what we are looking for and can decide which data vendors best fit your needs and budget.
The table’s columns capture the information I found most helpful when searching for data vendors. I recommend that you study each data vendor carefully, as their costs, limits, and coverage might have changed since this post was written.
Cost:
This column gives the price in USD per month to subscribe to the data vendor. An empty cell indicates the data is free, and a question mark, “?”, means we must contact the vendor for a quote. I will assume that if a data vendor does not list a price upfront, it is expensive. I use “$” to indicate less than $75 per month, “$$” for $75 up to $150 per month, and “$$$” for more than $150 per month.
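The notation can be pinned down as a small lookup. This is a hypothetical helper (`cost_tier` is not from any vendor or library), and treating exactly $150 as “$$” is an assumption based on the $150 affordability cutoff.

```python
from typing import Optional


def cost_tier(monthly_usd: Optional[float], quote_only: bool = False) -> str:
    """Map a monthly price in USD to the table's cost notation (hypothetical helper)."""
    if quote_only:
        return "?"  # no public price: contact the vendor, assumed expensive
    if monthly_usd is None or monthly_usd == 0:
        return ""  # empty cell: the data is free
    if monthly_usd < 75:
        return "$"
    if monthly_usd <= 150:  # assumption: exactly $150 still counts as "$$"
        return "$$"
    return "$$$"


print(cost_tier(49.0))  # $
```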
API Limit:
This column gives the number of API calls we can make per minute. This number is important because our universe will be roughly 4000 to 5000 equities on any given day over the past N years. If we are limited to one call per equity, day, and feature, then a single feature would require 5000 (equities) x 250 (trading days per year) x N (years) = 1,250,000 x N calls. So, I prefer this number to be as large as possible, with a minimum of 100 calls per minute.
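To see why the threshold matters, the arithmetic can be turned into a backfill-time estimate. This is a minimal sketch; `backfill_minutes` is a hypothetical helper, and N = 10 years is chosen purely for illustration.

```python
def backfill_minutes(equities: int, days_per_year: int, years: int,
                     features: int, calls_per_minute: int) -> float:
    """Minutes to backfill history at one API call per equity, day, and feature."""
    total_calls = equities * days_per_year * years * features
    return total_calls / calls_per_minute


# One feature, 5000 equities, 250 trading days, N = 10 years (illustrative),
# at the 100 calls/minute threshold.
minutes = backfill_minutes(5000, 250, 10, 1, 100)
print(f"{minutes:,.0f} minutes = {minutes / 60 / 24:.1f} days")
# 125,000 minutes = 86.8 days
```

Even at the minimum threshold, a single feature over ten years takes close to three months of continuous calls, which is why a larger limit is preferred.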
Reference:
This column indicates whether the data vendor has reference data (i.e., symbology). We want to know what the universe is on any given day going back N years. More importantly, we want to be able to map between multiple data vendors using a consistent and unique identifier such as ticker plus exchange, exchange-level FIGI, CUSIP, or an equivalent identifier. This is very important for our infrastructure because it will allow us to manage our universe across vendors and historically.
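The sketch below shows what mapping through a shared identifier looks like: each vendor’s own symbol is grouped under a common key. The vendor names, symbols, and the `FIGI_EXAMPLE_*` keys are hypothetical placeholders, not real FIGIs; the dot-versus-dash ticker is just one illustration of how vendors can format the same symbol differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorRecord:
    vendor: str
    vendor_symbol: str  # the vendor's own ticker or ID
    figi: str           # shared identifier used to join vendors


def map_by_identifier(records: list[VendorRecord]) -> dict[str, dict[str, str]]:
    """Group each vendor's symbol under the shared identifier."""
    mapping: dict[str, dict[str, str]] = {}
    for rec in records:
        mapping.setdefault(rec.figi, {})[rec.vendor] = rec.vendor_symbol
    return mapping


# Hypothetical records: two vendors format the same share class differently.
records = [
    VendorRecord("vendor_a", "BRK.B", "FIGI_EXAMPLE_1"),
    VendorRecord("vendor_b", "BRK-B", "FIGI_EXAMPLE_1"),
    VendorRecord("vendor_a", "XYZ", "FIGI_EXAMPLE_2"),
]
mapping = map_by_identifier(records)
print(mapping["FIGI_EXAMPLE_1"])  # {'vendor_a': 'BRK.B', 'vendor_b': 'BRK-B'}
```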
Corporate Actions:
This column indicates whether the data vendor has corporate action data, including, at minimum, cash dividends and stock splits. Ideally, we are looking for far more, including mergers and acquisitions, delistings, IPOs, stock dividends, rights offerings, and spinoffs. This is very important for our infrastructure because it will allow us to stitch our reference data together throughout our historical period.
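One common use of the minimum split data is back-adjusting historical prices so a series stays continuous across a split. This is a minimal sketch with made-up prices and a hypothetical 2-for-1 split; dividend adjustment is deliberately left out.

```python
from datetime import date


def split_adjust(prices: list[tuple[date, float]],
                 splits: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Back-adjust close prices for stock splits.

    `splits` holds (ex_date, ratio) pairs, e.g. ratio 2.0 for a 2-for-1 split.
    Each price dated before an ex-date is divided by that split's ratio.
    """
    adjusted = []
    for day, price in prices:
        factor = 1.0
        for ex_date, ratio in splits:
            if day < ex_date:
                factor *= ratio
        adjusted.append((day, price / factor))
    return adjusted


# Hypothetical data: a 2-for-1 split with ex-date 2023-01-05.
prices = [(date(2023, 1, 3), 100.0), (date(2023, 1, 4), 102.0), (date(2023, 1, 5), 51.0)]
splits = [(date(2023, 1, 5), 2.0)]
adjusted = split_adjust(prices, splits)
print([p for _, p in adjusted])  # [50.0, 51.0, 51.0]
```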
Price & Volume:
This column indicates whether the data vendor provides minute bars aggregated from the SIP (Securities Information Processor) consolidated feed. A more affordable alternative is to use minute bars aggregated from the IEX exchange alone, which handles roughly 2% to 3% of total US daily exchange volume. We won’t rely on tick-level order book data at this time, given our funding and storage constraints.
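To gauge the storage side of this decision, here is a rough size estimate for one year of regular-session minute bars. It is a back-of-the-envelope sketch: 48 bytes per bar (a timestamp plus open, high, low, close, and volume, each stored as 8 bytes) is an assumption, and it ignores compression, extended hours, and half days.

```python
REGULAR_SESSION_MINUTES = 390  # 9:30 to 16:00 ET


def minute_bar_storage_gb(equities: int, trading_days: int,
                          bytes_per_bar: int = 48) -> float:
    """Rough uncompressed storage for regular-session minute bars, in GB (10^9 bytes).

    bytes_per_bar=48 is an assumption: timestamp + OHLCV, each an 8-byte number.
    """
    bars = equities * trading_days * REGULAR_SESSION_MINUTES
    return bars * bytes_per_bar / 1e9


print(f"{minute_bar_storage_gb(5000, 250):.1f} GB per year")  # 23.4 GB per year
```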
Calendar Events:
This column indicates whether the data vendor has equity-level calendar event information, such as earnings announcements or investor relations days. Additional information might include analysts’ estimates leading up to these calendar events.
Fundamentals:
This column indicates whether the data vendor has company fundamentals, such as balance sheets, cash flow statements, and income statements. Ideally, we would like to standardize this information across vendors and equities.
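A simple first step toward standardizing fundamentals is renaming each vendor’s fields to one common schema. In this sketch every vendor name and field name is hypothetical; real vendors also differ in units, fiscal periods, and definitions, which a rename alone does not fix.

```python
# Hypothetical vendor-specific field names mapped to our own standard names.
FIELD_MAP: dict[str, dict[str, str]] = {
    "vendor_a": {"totalRevenue": "revenue", "netIncome": "net_income", "totalAssets": "total_assets"},
    "vendor_b": {"Revenues": "revenue", "NetIncomeLoss": "net_income", "Assets": "total_assets"},
}


def standardize(vendor: str, raw: dict[str, float]) -> dict[str, float]:
    """Rename a vendor's fundamental fields to standard names, dropping unmapped fields."""
    mapping = FIELD_MAP[vendor]
    return {mapping[key]: value for key, value in raw.items() if key in mapping}


result = standardize("vendor_b", {"Revenues": 1000.0, "NetIncomeLoss": 120.0, "Other": 5.0})
print(result)  # {'revenue': 1000.0, 'net_income': 120.0}
```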
Alternative:
This column indicates whether the data vendor has additional data that might help with strategy development, such as implied volatility, analyst target prices, recommendations, news sentiment, etc. Note that this data isn’t required until we have set up our backtesting and risk management platforms.
We want to prioritize data vendors that provide more than one type of data and offer both historical and live coverage. Live data is updated before or during US market hours and might include the tradeable universe, corporate actions, and real-time price and volume data. Live data is important because it is what we will use to run and monitor our portfolio’s day-to-day operations.
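These criteria can be read as a filter over the summary table’s rows. In this sketch the vendors are hypothetical, and two choices are simplifying assumptions: vendors lacking either historical or live data are dropped outright, and the remainder are ranked by how many data types they cover.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    data_types: set[str]  # e.g. {"reference", "corporate_actions", "price_volume"}
    historical: bool
    live: bool


def prioritize(vendors: list[Vendor]) -> list[Vendor]:
    """Keep vendors with more than one data type plus historical and live data,
    ranked by number of data types covered (most first)."""
    eligible = [v for v in vendors if len(v.data_types) > 1 and v.historical and v.live]
    return sorted(eligible, key=lambda v: len(v.data_types), reverse=True)


vendors = [
    Vendor("vendor_a", {"reference", "corporate_actions", "price_volume"}, True, True),
    Vendor("vendor_b", {"price_volume"}, True, True),
    Vendor("vendor_c", {"reference", "fundamentals"}, True, False),
    Vendor("vendor_d", {"price_volume", "fundamentals"}, True, True),
]
print([v.name for v in prioritize(vendors)])  # ['vendor_a', 'vendor_d']
```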
Updating this table
If you find any inconsistencies in this table or would like to add another data vendor, please feel free to message me.
Coming up next
We will examine each vendor individually and discuss why we did or did not choose to subscribe to it.