Can AI Models Replace Human Insight in Financial Analysis?

Exploring the capabilities of AI models in financial analysis, and how they compare to human insight.

AI vs. Human Insight in Financial Analysis

The Challenge

Can today’s best AI models accurately identify the most important message in a company’s earnings call? They can certainly pick up some points, but how do we know whether those are the important ones? Can better prompting help them do a better job? To answer these questions, we look at the takeaways the best journalists in the field reported and see how close AI can get to them.

The Significance of Earnings Calls

Earnings calls are quarterly events where senior management reviews the company’s financial results. Executives discuss the company’s performance, offer commentary, and sometimes preview future plans. They also explain their expectations going forward and the reasons the company met or exceeded past forecasts. These discussions can significantly affect the company’s stock price.

The Power of Automation in Earnings Analysis

According to Statista, just under 4,000 companies are listed on the NASDAQ, and one estimate puts the number of listed companies worldwide at about 58,000. A typical earnings call lasts roughly an hour, so simply listening to every NASDAQ company’s call adds up to about 4,000 hours each quarter: roughly eight people working full-time (40 hours a week for 13 weeks) for the entire quarter. And that excludes the more time-consuming work of analyzing and comparing financial reports.
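
The staffing estimate is simple arithmetic. The sketch below assumes one call per NASDAQ company per quarter, one hour per call, and a 13-week quarter; the working hours per week are an assumption, and the result is sensitive to them. `listeners_needed` is a hypothetical helper for illustration, not part of the original analysis.

```python
import math

WEEKS_PER_QUARTER = 13  # assumption: 52 weeks / 4 quarters


def listeners_needed(companies: int, hours_per_call: float,
                     hours_per_week: float = 40,
                     weeks: int = WEEKS_PER_QUARTER) -> int:
    """Minimum number of full-time people needed just to listen to every call once."""
    total_hours = companies * hours_per_call   # one call per company per quarter
    hours_per_person = hours_per_week * weeks  # capacity of one full-time listener
    # Round up: a fractional listener still means hiring one more person.
    return math.ceil(total_hours / hours_per_person)


if __name__ == "__main__":
    print(listeners_needed(4_000, 1.0))                     # NASDAQ, 40 h/week
    print(listeners_needed(4_000, 1.0, hours_per_week=30))  # NASDAQ, 30 h/week
    print(listeners_needed(58_000, 1.0))                    # global estimate, 40 h/week
```

At 40 hours a week this gives 8 listeners for the NASDAQ alone; at 30 productive hours a week it rises to 11, and the global estimate of about 58,000 companies would need 112 people at 40 hours a week.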

The Process of Testing AI as a Financial Analyst

To test how well today’s best LLMs can do this job, I compared the main takeaways identified by human journalists with those produced by AI. Here are the steps:

  • Pick several companies with recent earnings call transcripts and matching news articles about those calls.
  • Give each LLM the full transcript as context and ask it for the three points most likely to affect the company’s value.
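
The two steps above can be sketched as prompt assembly plus light parsing of the model’s reply. The article does not publish its actual prompt, so `BASE_INSTRUCTION`, `build_prompt`, and `parse_bullets` are hypothetical names with illustrative wording. The optional arguments cover the variant in which the model also sees the previous quarter’s transcript and step-by-step analysis instructions. Sending the prompt to a model is left out, since that depends on the provider’s API.

```python
from typing import Optional

# Hypothetical instruction wording; the original prompt is not published.
BASE_INSTRUCTION = (
    "You are a financial analyst. Read the earnings call transcript below and "
    "list the three points most likely to affect the company's value, "
    "as exactly three bullet points."
)


def build_prompt(transcript: str,
                 previous_transcript: Optional[str] = None,
                 analysis_steps: Optional[list[str]] = None) -> str:
    """Assemble one prompt string for a single company's earnings call."""
    parts = [BASE_INSTRUCTION]
    if analysis_steps:
        # Optional step-by-step guidance (a chain-of-thought style variant).
        numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(analysis_steps, 1))
        parts.append("Work through these steps before answering:\n" + numbered)
    if previous_transcript:
        # Optional extra context: last quarter's call, placed before the current one.
        parts.append("PREVIOUS QUARTER TRANSCRIPT:\n" + previous_transcript)
    parts.append("CURRENT QUARTER TRANSCRIPT:\n" + transcript)
    return "\n\n".join(parts)


def parse_bullets(reply: str, limit: int = 3) -> list[str]:
    """Extract up to `limit` lines starting with '-', '*', or '•' (a simple heuristic)."""
    bullets = []
    for line in reply.splitlines():
        stripped = line.strip()
        if stripped[:1] in {"-", "*", "•"}:
            bullets.append(stripped[1:].strip())
    return bullets[:limit]
```

Capping the parsed list at three keeps the comparison fair when a model ignores the instruction and returns extra points.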

Summary of Results

GPT-4 performs best, at 80%, when it is also given the previous quarter’s transcript and a set of instructions on how to analyze a transcript step by step (chain-of-thought prompting).
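
The article does not spell out its scoring rule. Since the conclusion counts correct answers out of three, one plausible reading is that each company contributes up to three matched takeaways and the percentage is the share matched across all companies. The sketch below assumes that rule; deciding whether a model’s point matches a journalist’s takeaway is treated as an input (for example, a manual judgment), and `score_run` is a hypothetical name.

```python
TAKEAWAYS_PER_COMPANY = 3  # the experiment asks for the top three points


def score_run(matches_per_company: dict[str, int],
              per_company: int = TAKEAWAYS_PER_COMPANY) -> float:
    """Percentage of human takeaways the model matched, pooled across companies."""
    for company, hits in matches_per_company.items():
        # Each company can contribute at most `per_company` matches.
        if not 0 <= hits <= per_company:
            raise ValueError(f"{company}: {hits} matches is outside 0..{per_company}")
    possible = per_company * len(matches_per_company)
    if possible == 0:
        raise ValueError("no companies to score")
    return 100 * sum(matches_per_company.values()) / possible
```

Under this rule, a model that gets two of three takeaways right for every company scores about 67%.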

What do LLMs get right and wrong?

How the Bud Light boycott and Salesforce’s AI plans confused the best AIs.

Experiment Design and Choices

Why earnings call transcripts? Company filings may be the more intuitive choice; however, I find that transcripts present a more natural, less formal discussion of events.

Variability of Results

LLM results can vary between runs, so I ran every experiment three times and report the average.
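
Averaging repeated runs is straightforward; also reporting the gap between the best and worst run makes the run-to-run variability visible. A minimal sketch, with `summarize_runs` as a hypothetical helper and per-run scores assumed to be percentages:

```python
from statistics import mean


def summarize_runs(run_scores: list[float]) -> dict[str, float]:
    """Average repeated runs of one experiment and report their spread."""
    if not run_scores:
        raise ValueError("need at least one run")
    return {
        "mean": mean(run_scores),
        "min": min(run_scores),
        "max": max(run_scores),
        # Spread between best and worst run: a quick check on stability.
        "spread": max(run_scores) - min(run_scores),
    }
```

A large spread relative to the differences between models suggests that three runs may not be enough to rank them reliably.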

Conclusion

Based on these results, the jobs of journalists and research analysts seem safe for now: most LLMs struggle to get more than two of the three takeaways right.

Where to from here?

LLM capabilities keep improving. Will this gap close, and if so, how?