Key Findings

Patent Data Bias Issues

Patent and citation counts aggregated at the firm level are significantly biased, and the bias is most severe for newer patents and citations. These biases are systematically related to firm characteristics.

Technology & Regional Variation

Patent and citation biases vary substantially across technology classes and geographical regions, with computer/electronics patents and states such as California showing the largest disparities.

Machine Learning Solutions

Machine learning approaches using firm-level information perform significantly better than traditional adjustment methods in addressing patent and citation biases.

Firm-Level Patent and Citation Bias Correlations

  • Larger firms show greater patent bias (coefficient 0.0548) and citation bias (coefficient 0.121)
  • Higher market-to-book ratios are associated with greater patent bias (0.0464) and citation bias (0.0920)
  • R&D intensity is positively associated with both patent bias (0.0235) and citation bias (0.0417)
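
To make the notion of a regression coefficient concrete, here is a minimal sketch of a one-regressor ordinary least squares fit. This summary does not describe the actual specification behind Table 1, so the single-variable setup, the function name `ols_slope_intercept`, and all numbers below are assumptions for illustration only.

```python
def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, not taken from the paper: a firm-size measure and a bias measure.
firm_size = [1.0, 2.0, 3.0, 4.0]
patent_bias = [0.10, 0.15, 0.20, 0.25]

slope, intercept = ols_slope_intercept(firm_size, patent_bias)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

A positive slope in such a fit means the bias measure rises with the firm characteristic, which is the sense in which the bullets above report positive coefficients.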

Machine Learning Model Performance Comparison

  • Machine learning models achieve higher R² values (0.74-0.81) compared to traditional benchmarks (0.43-0.63)
  • Linear SVR performs best with R² of 0.81 and lowest RMSE of 66.22
  • All ML models outperform conventional adjustment methods
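
The two metrics quoted above, R² and RMSE, can be computed from any set of observed and predicted values. The sketch below shows the standard definitions; the function names and the sample numbers are hypothetical and do not reproduce the paper's data or models.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only; not taken from the paper.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 39.0]   # predictions close to the observed values
model_b = [20.0, 20.0, 30.0, 30.0]   # predictions further from the observed values

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(name, round(r_squared(observed, preds), 3), round(rmse(observed, preds), 3))
```

A model that predicts better has a higher R² and a lower RMSE on the same data, which is the basis for the ranking in the bullets above.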

Regional Patent Bias Distribution

  • California and Massachusetts show a 2.5x increase in patenting between 1990 and 2000
  • Delaware shows only a minimal increase in patenting activity
  • Regional differences persist even after traditional adjustments

Contribution and Implications

  • Demonstrates systematic biases in patent data that affect research inferences in corporate finance
  • Provides an actionable checklist for researchers using patent data
  • Introduces machine learning as a promising solution for addressing patent data biases

Data Sources

  • Firm-level bias correlations based on Table 1 regression coefficients
  • Machine learning performance metrics derived from Table 4 model comparisons
  • Regional patent distribution based on Figure 3 patent application trends