
The Colour of Finance Words

Journal: Journal of Financial Economics

Date: 2023-03-01

Author: Garcia, Diego; Hu, Xiaowen; Rohrer, Maximilian

Abstract:
Our paper relies on stock price reactions to colour words, in order to provide new dictionaries of positive and negative words in a finance context. We extend the machine learning algorithm of Taddy (2013), adding a cross-validation layer to avoid over-fitting. In head-to-head comparisons, our dictionaries outperform the standard bag-of-words approach (Loughran and McDonald, 2011) when predicting stock price movements out-of-sample. By comparing their composition, word-by-word, our method refines and expands the sentiment dictionaries in the literature. The breadth of our dictionaries and their ability to disambiguate words using bigrams both help to colour finance discourse better.


Background and Context

Research Motivation

The study investigates whether machine learning algorithms predict stock price movements better than the traditional human-coded sentiment dictionaries used in finance.

Data Sources

The research analyzes 85,530 earnings call transcripts from 3,229 firms (2006 to 2020), 76,922 10-K filings (1996 to 2018), and 87,198 Wall Street Journal articles (2000 to 2021).

Methodology

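The authors develop new sentiment dictionaries using multinomial inverse regression (MNIR), the machine learning algorithm of Taddy (2013), extended with a cross-validation layer to avoid over-fitting. The algorithm learns from stock price reactions to identify positive and negative words and phrases.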
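As a rough illustration of learning word polarity from price reactions, the Python sketch below uses a deliberately simpler stand-in for MNIR: for each word, it computes the slope of the word's frequency share on the document's return, then keeps only words whose sign is the same in every leave-one-fold-out subsample. This is not the authors' estimator. The function names, the fold scheme, and the toy transcripts and returns are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def word_return_slopes(docs, returns):
    """Slope of each word's frequency share on the document return.

    A simplified stand-in for MNIR: a positive slope means the word is
    relatively more frequent in documents with higher returns.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    totals = [sum(c.values()) for c in counts]
    vocab = set().union(*counts)
    r_bar = mean(returns)
    r_var = sum((r - r_bar) ** 2 for r in returns)
    slopes = {}
    for word in vocab:
        shares = [c[word] / t for c, t in zip(counts, totals)]
        s_bar = mean(shares)
        cov = sum((s - s_bar) * (r - r_bar) for s, r in zip(shares, returns))
        slopes[word] = cov / r_var if r_var else 0.0
    return slopes


def stable_dictionary(docs, returns, k=3):
    """Keep words whose slope has the same sign in all k subsamples.

    Hypothetical cross-validation layer: each subsample drops one fold,
    and a word must appear, with a consistent sign, in every subsample.
    """
    n = len(docs)
    history = {}
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        slopes = word_return_slopes(
            [docs[i] for i in train], [returns[i] for i in train]
        )
        for word, slope in slopes.items():
            history.setdefault(word, []).append(slope)
    positive = {w for w, s in history.items() if len(s) == k and all(b > 0 for b in s)}
    negative = {w for w, s in history.items() if len(s) == k and all(b < 0 for b in s)}
    return positive, negative


# Toy data (invented): six short "transcripts" and their return reactions.
docs = [
    "strong growth and record revenue",
    "weak demand and impairment charge",
    "record revenue and strong margins",
    "impairment and weak outlook",
    "strong record quarter and growth",
    "weak quarter and impairment risk",
]
returns = [0.04, -0.05, 0.03, -0.04, 0.05, -0.03]

positive, negative = stable_dictionary(docs, returns, k=3)
print(sorted(positive))
print(sorted(negative))
```

The sign-stability filter is the part meant to echo the over-fitting concern: a word that looks positive in only one subsample is dropped rather than added to the dictionary.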

Machine Learning vs Traditional Dictionaries: Predictive Power for Stock Returns

Shows how well different methods predict stock price movements during earnings calls
ML methods achieve more than twice the predictive power of traditional dictionaries
Combining ML and traditional methods yields the best results
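Predictive power of this kind is usually summarized by an R-squared. The sketch below shows one common way to compute an out-of-sample R-squared for a single sentiment score: fit a line on a training sample, then measure how much of the variation in held-out returns the fitted line explains. The function names and the benchmark (the held-out mean return) are assumptions, not the paper's exact specification.

```python
def fit_line(x, y):
    """Ordinary least squares with an intercept and one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return y_bar - slope * x_bar, slope


def out_of_sample_r2(train_x, train_y, test_x, test_y):
    """1 - SSE/SST on held-out data, using coefficients from the training data."""
    intercept, slope = fit_line(train_x, train_y)
    y_bar = sum(test_y) / len(test_y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(test_x, test_y))
    sst = sum((b - y_bar) ** 2 for b in test_y)
    if sst == 0:
        raise ValueError("held-out returns have no variation")
    return 1.0 - sse / sst
```

Comparing two dictionaries then amounts to calling `out_of_sample_r2` once per dictionary, with each dictionary's sentiment score as the regressor and the same returns as the outcome.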

Coverage of Different Dictionaries Across Text Sources

Compares how much text is captured by different dictionaries across document types
ML dictionary achieves 3 to 4 times greater coverage than the traditional Loughran-McDonald (LM) dictionary
Coverage advantage holds across all three types of financial documents
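Coverage can be read as the share of a document's tokens that a dictionary recognizes. The sketch below computes that share with whitespace tokenization. The function name, the tokenizer, and the two toy dictionaries are assumptions for illustration and are not entries from the paper's dictionaries.

```python
def coverage(text, dictionary):
    """Fraction of tokens in `text` that appear in `dictionary`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in dictionary for token in tokens) / len(tokens)


# Hypothetical dictionaries: a narrow one and a broader one.
narrow = {"loss", "impairment"}
broad = {"loss", "impairment", "growth", "strong", "headwinds"}

sentence = "strong growth offset the impairment loss despite headwinds"
print(coverage(sentence, narrow))
print(coverage(sentence, broad))
```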

Economic Impact of Sentiment on Stock Returns

Shows how much stock prices move in response to positive and negative sentiment
ML methods identify stronger market reactions than traditional dictionaries
Using word pairs (bigrams) captures the strongest market reactions
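One way bigrams can disambiguate a word is to let a matching word pair override the scores of its individual words. The sketch below implements that rule with a left-to-right scan. The scoring rule, the function name, and the dictionary entries are hypothetical and are not taken from the paper.

```python
def sentiment_score(text, unigrams, bigrams):
    """Net sentiment per token, with bigram matches taking precedence."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    score = 0
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in bigrams:
            score += bigrams[pair]  # the pair decides; skip both words
            i += 2
        else:
            score += unigrams.get(tokens[i], 0)
            i += 1
    return score / len(tokens)


# Hypothetical entries: +1 is positive, -1 is negative.
unigrams = {"growth": 1, "decline": -1}
bigrams = {"slower growth": -1}

print(sentiment_score("we saw slower growth", unigrams, bigrams))  # -0.25
print(sentiment_score("we saw slower growth", unigrams, {}))       # 0.25
```

With unigrams alone, "growth" counts as positive even when the sentence describes a slowdown; the bigram entry flips the sign.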

Dictionary Size Comparison

Compares the number of words in traditional vs ML dictionaries
ML achieves better results with far fewer words
Shows efficiency of machine learning in identifying the most important sentiment words

External Validity: Performance Across Different Text Sources

Shows how well the dictionaries work across different types of financial texts
ML dictionary maintains its advantage across all document types
Performance difference is largest for earnings calls, the text source on which the ML dictionary was trained

Contribution and Implications

The research demonstrates that machine learning can create more effective sentiment dictionaries for analyzing financial texts than traditional human-coded approaches
The new ML dictionaries achieve better predictive power while using fewer words, making them more efficient tools for financial analysis
The methods developed can be applied to other languages and contexts, opening new possibilities for automated financial text analysis

Chart Data Sources

Predictive Power Chart: Based on Table 2 regression results
Coverage Chart: Based on Table 5 dictionary coverage statistics
Economic Impact Chart: Based on Table 2 coefficient estimates
Dictionary Size Chart: Based on Table 5 dictionary statistics
External Validity Chart: Based on Tables 2, 3, and 4 R-squared values