Key Findings
Simple Mean Imputation Works Well
Imputing missing values with cross-sectional means yields machine learning portfolio performance similar to that of more sophisticated imputation methods, with potential equal-weighted returns of about 66% per year.
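As a rough illustration of cross-sectional mean imputation, the sketch below fills each missing predictor value with that predictor's mean across stocks in the same month. The function name, the (months, stocks, predictors) array layout, and the fallback to 0.0 when a predictor is entirely missing in a month are assumptions made for this example, not details taken from the study.

```python
import numpy as np

def impute_cross_sectional_mean(panel):
    """Fill NaNs with the same-month cross-sectional mean of each predictor.

    panel: float array of shape (n_months, n_stocks, n_predictors),
           with NaN marking missing values (hypothetical layout).
    """
    panel = np.asarray(panel, dtype=float)
    observed = ~np.isnan(panel)
    # Count and sum the observed values across stocks (axis=1) per month and predictor.
    counts = observed.sum(axis=1, keepdims=True)
    sums = np.nansum(panel, axis=1, keepdims=True)
    # Assumption: if a predictor has no observations in a month, fall back to 0.0.
    means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    # Keep observed values; replace missing ones with the broadcast means.
    return np.where(observed, panel, means)

# One month, three stocks, two predictors.
example = np.array([[[1.0, np.nan],
                     [3.0, 4.0],
                     [np.nan, 2.0]]])
filled = impute_cross_sectional_mean(example)
```

If predictors are standardized to zero mean within each month, this reduces to replacing missing values with zero.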
Limited Information in Missing Data
The observed predictors provide little information about the missing values, because predictors tend to be missing in blocks and cross-sectional correlations between predictors are low.
Sophisticated Imputation Can Underperform
Complex imputation methods such as EM can introduce estimation noise, which leads to underperformance if the machine learning step is not applied carefully.
Portfolio Returns Across Imputation Methods
- Equal-weighted returns around 65-68% annually across different imputation methods
- Value-weighted returns around 37-43% annually
- Neural network (NN1/NN3) forecasts show consistently strong performance regardless of imputation method
Predictor Correlations Distribution
- Most correlations cluster near zero (-0.25 to +0.25)
- First 10 principal components explain only 40% of total variance
- Low correlations suggest limited information gain from sophisticated imputation
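The two diagnostics above can be sketched as follows: pairwise correlations computed on observations where both predictors are present, and the share of total variance captured by the leading principal components of a correlation matrix. The function names and the pairwise-complete treatment of missing values are assumptions for illustration and are not necessarily how the study computes them.

```python
import numpy as np

def pairwise_correlations(X, min_obs=3):
    """Off-diagonal correlations using pairwise-complete observations.

    X: float array of shape (n_obs, n_predictors) with NaN for missing.
    """
    X = np.asarray(X, dtype=float)
    n_pred = X.shape[1]
    out = []
    for i in range(n_pred):
        for j in range(i + 1, n_pred):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if both.sum() < min_obs:
                continue  # too few overlapping observations
            xi, xj = X[both, i], X[both, j]
            if xi.std() == 0 or xj.std() == 0:
                continue  # correlation undefined for a constant column
            out.append(np.corrcoef(xi, xj)[0, 1])
    return np.array(out)

def explained_variance_share(corr, k):
    """Share of total variance explained by the first k principal components."""
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sort descending
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # guard against tiny negatives
    return eigenvalues[:k].sum() / eigenvalues.sum()

# With fully uncorrelated predictors, 10 of 20 components explain exactly half.
share = explained_variance_share(np.eye(20), 10)
```

When correlations cluster near zero, the eigenvalues are close to equal, so the leading components explain little more than their proportional share of the variance.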
Imputation Error by Market Cap
- Higher imputation errors for small-cap stocks (RMSE > 0.70)
- Error magnitude approaches that of simple mean imputation (RMSE = 1.0) for smallest stocks
- More reliable imputations for large-cap stocks
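A minimal sketch of how imputation error could be broken out by size group: stocks are sorted into equal-count bins by market equity, and the RMSE between imputed and held-out true values is computed within each bin. The function name, the rank-based binning, and the inputs are hypothetical choices for this example. The study's own out-of-sample design may differ.

```python
import numpy as np

def rmse_by_size_bin(market_equity, true_values, imputed_values, n_bins=10):
    """RMSE of imputed vs. true values within market-equity bins.

    Bin 0 holds the smallest stocks, bin n_bins - 1 the largest.
    """
    me = np.asarray(market_equity, dtype=float)
    truth = np.asarray(true_values, dtype=float)
    imputed = np.asarray(imputed_values, dtype=float)
    n = len(me)
    if n < n_bins:
        raise ValueError("need at least one observation per bin")
    # Rank each observation by market equity (0 = smallest), then map ranks to bins.
    ranks = np.argsort(np.argsort(me))
    bins = (ranks * n_bins) // n
    squared_error = (imputed - truth) ** 2
    return np.array([np.sqrt(squared_error[bins == b].mean())
                     for b in range(n_bins)])

# Toy data: the ten smallest stocks have larger imputation errors.
me = np.arange(1, 21)
truth = np.zeros(20)
imputed = np.where(me <= 10, 1.0, 0.5)
rmse = rmse_by_size_bin(me, truth, imputed, n_bins=2)
```

For predictors standardized to unit variance, imputing the mean gives an RMSE of about 1.0, which makes it a natural benchmark for each bin.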
Contribution and Implications
- Provides rigorous justification for the common practice of mean imputation in machine learning asset pricing studies
- Demonstrates that sophisticated imputation methods may not improve performance despite theoretical advantages
- Highlights the importance of careful implementation when using complex imputation methods with machine learning
Data Sources
- Portfolio returns visualization based on Table 4 showing performance across imputation and forecasting methods
- Correlation distribution chart based on Figure 3 showing empirical distribution of predictor correlations
- Imputation error chart based on Figure 4 showing out-of-sample RMSE by market equity decile