Can Machine Learning Predict a Hit or Miss on Estimated Earnings?

At Bloomberg, we encourage our technologists and engineers to explore new technologies and think outside the box to solve complex problems, especially when solving those problems brings more value to our customers. That is why Bloomberg Software Engineer Roberto Martin recently embarked on a project to build models that predict whether a publicly traded company will beat earnings estimates.

The code for this project and a detailed paper reside on GitHub.

“My time spent working on this idea gave me an invaluable learning experience. Given the available programming tools and frameworks, relatively cheap cloud-based systems and readily available data, one can attain significant accomplishment with a little effort,” said Martin.

Here is a summary of his work, in his own words:

Public companies listed and traded on U.S. stock exchanges are required by law to report earnings at the end of each fiscal quarter. Current and prospective shareholders scrutinize this quarterly report to gain insight into how well the company is doing and to decide whether to invest.

Traders are also keenly interested in these reports.  Share prices can move significantly if there are any surprises – such as when actual reported earnings differ greatly from analysts’ estimates.

But what if, by using machine learning, you could estimate the likelihood of such a surprise in advance?

Here’s a step-by-step description of how I approached the problem:

  1. First, I gathered historical daily OHLCV (Open, High, Low, Close, Volume) data for many stocks.
  2. Next, I pulled historical earnings data from freely available sources.
  3. Then, I organized (“munged”) the data in order to feed it into machine learning algorithms. In this case, the data was relatively clean. I just had to make sure prices were adjusted for splits. The challenging aspect at this stage was merging the datasets.
  4. Because earnings are reported four times per fiscal year and I had daily stock prices, I had to split the prices by each company’s fiscal quarter and then select features from within each group. I used Python and Pandas to get this done. (Pandas has a great API and data structures for working with data, financial data in particular. It made it easy to subset, group, aggregate and combine data, and it has powerful idioms for working with time-indexed data.)
  5. The next step was to generate features. The features are really important because they are what we’re suggesting is predictive of the target variable. For instance, one feature was: “Did the price rise on more than 50% of the days during the quarter?” If it did, the feature was given a value of 1; otherwise, 0. These are so-called “engineered” features, since they’re not lifted directly from the data but extracted using some domain knowledge. A “non-engineered” feature would simply be the quarter. Another could be the sector.
  6. Once the features were extracted, I used Apache Spark’s MLlib, through PySpark, to create the models. Spark allows you to start small and easily scale to multiple “nodes,” as the machines are called, without needing to change your code. You can use it for pretty much any step in the machine learning pipeline, such as munging data; however, I only needed MLlib for building models. Since we’re dealing with a classification problem here, I generated models using Logistic Regression, Decision Trees and Random Forests. The data was split into training (60%) and testing (40%) sets, and each model was trained on the former and tested on the latter. To clarify, testing involved using the features from the test set to predict whether a company would beat or miss estimates. Once I had these predictions, I compared them to the actual target values to evaluate how well the models did.
  7. Metrics such as precision and recall were used to evaluate the models. Overall, the Random Forest model performed best, followed by Decision Trees. However, there are a number of improvements that could be made to the model-building process. First, the models weren’t cross-validated. Second, I didn’t attempt to search for the model parameters that would produce the best results. Finally, although I used 14 years of data, aggregating daily prices down to quarters leaves far fewer examples to train on.
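
To make the feature step more concrete, here is a minimal Pandas sketch of the “did the price rise on more than 50% of the days during the quarter?” feature. It is an illustration only, not the project’s code: the function name `up_day_feature`, the column names and the price values are all hypothetical, and it groups by calendar quarter for simplicity, whereas the project grouped by each company’s fiscal quarter.

```python
import pandas as pd

# Hypothetical split-adjusted daily closes for one ticker (made-up values).
dates = pd.bdate_range("2015-01-01", "2015-06-30")
closes = pd.Series(
    [100 + (i % 7) - 0.5 * (i % 3) for i in range(len(dates))],
    index=dates,
    name="close",
)


def up_day_feature(close: pd.Series) -> pd.DataFrame:
    """Return one row per quarter with the share of up days and a 0/1 feature.

    Assumption: calendar quarters stand in for the company's fiscal quarters.
    """
    # Day-over-day change; the first day has no prior close, so drop it.
    change = close.diff().dropna()
    up = change > 0
    # Group the boolean "up day" flags by quarter; the mean is the share of up days.
    frac_up = up.groupby(up.index.to_period("Q")).mean()
    return pd.DataFrame(
        {
            "frac_up_days": frac_up,
            # 1 if the price rose on more than half of the days, else 0.
            "rose_most_days": (frac_up > 0.5).astype(int),
        }
    )


features = up_day_feature(closes)
print(features)
```

Each row of the result corresponds to one quarter for one stock, which is the granularity at which it can be joined to that quarter’s beat-or-miss outcome.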

This demonstrates the steps one can take to build a model that attempts to answer the question: Will company X beat earnings estimates this quarter? The bottom line: It’s a work in progress, but the best model had a recall of 68% and a precision of 61%, which is above the 50% mark that is equivalent to randomly guessing. The models can be improved by including more stocks and gathering data over a longer period of time, and by adding parameter search and cross-validation to the process.
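
For readers less familiar with these two metrics, the sketch below shows how precision and recall are computed from a set of predictions. The helper `precision_recall` and the labels are hypothetical and chosen only for illustration; they are not the project’s data or results. Here 1 means “beat estimates” and 0 means “missed.”

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for the positive class (label 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # predicted beat, did beat
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # predicted beat, missed
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # predicted miss, did beat
    # Precision: of the predicted beats, how many were real beats?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real beats, how many did the model catch?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for eight company-quarters.
actual = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 0, 1, 1, 0, 1, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

In this toy example the model flags four quarters as beats and is right about three of them (precision 0.75), while catching three of the five actual beats (recall 0.60).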

Is this approach foolproof? Nothing is foolproof. But it could be an excellent resource for our customers, who are looking for every possible edge to make the best, smartest decisions.

For more information about the process or the code used for this effort, visit the repository on