The goal of this project is to use machine learning, specifically gradient-boosted decision trees as implemented in XGBoost, to predict the missing forecasts from available data. This approach is promising because forecasts across horizons are likely to be highly correlated: a firm's earnings are generally similar across two consecutive quarters. In addition, a large amount of firm-specific and macroeconomic data relevant to a firm's future earnings is released frequently. Given this wealth of data, it should be possible to predict the missing forecasts accurately and use them to study the degree of mispricing in financial markets.
Measuring how much stock prices are driven by biased beliefs is crucial to assessing the degree of stock market efficiency, which has significant implications for economic growth because the stock market determines how much capital is allocated to specific firms.
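To make the imputation idea concrete, the following is a minimal sketch rather than the project's actual pipeline. It uses a hand-rolled gradient boosting routine with depth-1 trees as a simplified stand-in for XGBoost, and it runs on synthetic data. The two features (a one-quarter-ahead forecast and a macroeconomic indicator), the target (a two-quarter-ahead forecast), and all function names are assumptions made for illustration only.

```python
# Toy gradient boosting with depth-1 trees (stumps) on squared loss.
# Simplified stand-in for XGBoost; all data below are synthetic and hypothetical.
import random


def fit_stump(xs, residuals):
    """Return the single split (feature, threshold, left mean, right mean)
    that minimises the squared error of the residuals."""
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(row[j] for row in xs))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(xs, residuals) if row[j] <= thr]
            right = [r for row, r in zip(xs, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left)
            sse += sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]


def predict_stump(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean


def fit_boosting(xs, ys, n_rounds=40, learning_rate=0.3):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Synthetic firms: the longer-horizon forecast is correlated with the
# shorter-horizon forecast and with a macro indicator (assumed relationship).
rng = random.Random(0)
firms = []
for _ in range(100):
    short_horizon = rng.uniform(0.0, 2.0)
    macro = rng.uniform(-1.0, 1.0)
    long_horizon = 0.9 * short_horizon + 0.2 * macro + rng.gauss(0.0, 0.05)
    firms.append(([short_horizon, macro], long_horizon))

observed = firms[:80]   # firms with both forecasts available
missing = firms[80:]    # pretend the longer-horizon forecast is missing here

train_y = [y for _, y in observed]
model = fit_boosting([x for x, _ in observed], train_y)
imputed = [predict(model, x) for x, _ in missing]

# Compare against a naive baseline that imputes the training mean.
mean_y = sum(train_y) / len(train_y)
mae_model = sum(abs(p - y) for p, (_, y) in zip(imputed, missing)) / len(missing)
mae_baseline = sum(abs(mean_y - y) for _, y in missing) / len(missing)
print(f"MAE boosted stumps: {mae_model:.3f}, MAE mean baseline: {mae_baseline:.3f}")
```

In practice, a library implementation such as XGBoost would replace the hand-written routine, and the held-out comparison against a naive baseline would be run on firms whose forecasts are actually observed.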
In addition, I have started a separate project that investigates how the stock market reacts to and processes different types of information from companies' quarterly and annual reports.
Previous research has shown that even fundamental information such as earnings is processed slowly by the market. However, there is little evidence on which other information from corporate reports drives prices.
This project will address this question by training a large language model (LLM) to predict stock returns over various time periods after a company's disclosure and then analyzing which parts of the documents drive the model's predicted returns for each period.