Historical Analysis of National Subjective Wellbeing Using Millions of Digitized Books
Abstract: We present the first attempt to construct a long-run historical measure of subjective wellbeing using implied sentiment (known as valence) from language corpora derived from millions of digitized books. Whereas existing measures of subjective wellbeing go back to the 1970s at most, our method extends the record by at least a further 200 years. We analyse data for six countries (the USA, the UK, Germany, France, Italy and Spain). First, we show that our measure is significantly and positively related to existing measures of subjective wellbeing. We establish this both through conventional regression analysis, which shows a strong positive correlation, and non-parametrically through a p-value histogram, which confirms that the words prominently associated with periods of high life satisfaction are also the words with high valence. We then compare our estimated measure with the two longest-running existing indices of welfare (to the best of our knowledge): GDP and life expectancy. We find that life expectancy has a significant impact on our measure of subjective wellbeing across all specifications and models, and that this result is robust to the inclusion of conflicts, world wars and infant mortality. Infant mortality itself correlates negatively with our estimated wellbeing measure, independently of the effect of life expectancy. The correlation with GDP is weakly positive in some specifications, but becomes insignificant after applying filtering measures appropriate for data of this sort. Our new measure can be used to estimate the impact on subjective wellbeing of major events such as natural disasters, world wars, recessions, booms and epidemics, and may be able to play a role similar to that of long-run historical GDP series in informing policy-makers and inspiring future research.
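To make the idea of a valence-based index concrete, the following is a minimal sketch of one way such a yearly index could be computed: a frequency-weighted mean of word valence ratings over the words that appear in a given year's corpus. The abstract does not state the exact formula, so the weighting scheme, the function name `valence_index`, the 1-to-9 style ratings and the toy word counts are all illustrative assumptions rather than the construction used in the paper.

```python
from typing import Mapping


def valence_index(word_counts: Mapping[str, int],
                  valence: Mapping[str, float]) -> float:
    """Frequency-weighted mean valence over words that have a rating.

    Assumption: words without a valence rating are simply ignored.
    """
    rated_total = sum(c for w, c in word_counts.items() if w in valence)
    if rated_total == 0:
        raise ValueError("no rated words in this corpus slice")
    weighted = sum(valence[w] * c for w, c in word_counts.items() if w in valence)
    return weighted / rated_total


# Hypothetical valence ratings and word counts, for illustration only.
valence = {"happy": 8.0, "war": 2.0, "peace": 7.5}
counts_by_year = {
    1850: {"happy": 10, "war": 30, "the": 500},
    1950: {"happy": 30, "war": 10, "peace": 20, "the": 600},
}

# One index value per year; unrated words such as "the" do not contribute.
index_by_year = {
    year: valence_index(counts, valence)
    for year, counts in counts_by_year.items()
}
```

A series of this kind, computed per country and per year, is the sort of quantity that could then be compared with survey-based life satisfaction, GDP or life expectancy in a regression.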