Census Diving 🤿 #1: Solar Heated Homes ☀️ (variable B25040_008) + an introduction

The first entry in a new series: a look at the distribution of solar heating across the country.

Welcome to Census Diving!

The U.S. Census hosts an almost unimaginable amount of data. Somewhere on the order of trillions of data points, if I had to guess.[1] There are probably a lot of interesting stories in that forest of data that get missed for the trees.

That’s the motivation for this new series. Each post will highlight a new Census variable, chosen completely at random,[2] from the American Community Survey, a survey that covers the demographic, social, economic, and housing characteristics of the U.S. population. I’m hoping to write some interesting stories about the American public, and to learn more about the Census along the way.[3]

Since this is the first post, I thought I would lay down some ground rules:

  1. I will select the variable randomly and go with whatever is chosen. I’m not allowed to keep randomly selecting variables until I get one I think is interesting. I’m aware this is hard to verify — you’ll just have to trust me.
  2. In telling a story about the chosen variable, I’m allowed to look to other variables — with preference for variables in the same group — for assistance. Still, though, the main focus should be the variable that was randomly selected.
  3. I can use any iteration of the American Community Survey (1-year, 3-year, 5-year), and can use data from any year.

With those out of the way, let’s dive in.

Today’s randomly chosen variable is B25040_008. Here are the details:

  • Label: Total → Solar energy
  • Group: House heating fuel
  • Universe: Occupied housing units

This variable details the number of occupied housing units in a specified geography that use solar power for heat.
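For anyone who wants to pull this variable themselves, here is a minimal Python sketch of how a request to the Census API might be put together. It only builds the query URL and does not touch the network. The helper names (`build_acs_url`, `solar_share`) are hypothetical, and the sketch assumes the 2021 5-year ACS endpoint, the usual `E` suffix for estimates, and that `B25040_001E` is the group total used as the denominator.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_acs_url(year, variables, geography="state:*", dataset="acs/acs5"):
    """Build a Census API query URL (hypothetical helper, no network call)."""
    # Keep commas, colons, and asterisks readable instead of percent-encoding them.
    query = urlencode(
        {"get": ",".join(["NAME", *variables]), "for": geography},
        safe=",:*",
    )
    return f"{BASE_URL}/{year}/{dataset}?{query}"


def solar_share(solar_units, total_units):
    """Percentage of occupied housing units that are heated by solar energy."""
    if total_units == 0:
        return 0.0
    return 100.0 * solar_units / total_units


# Assumed variables: B25040_001E (all occupied units), B25040_008E (solar energy).
url = build_acs_url(2021, ["B25040_001E", "B25040_008E"])
print(url)
```

Dividing the solar count by the group total, as `solar_share` does, is what turns raw unit counts into the percentages quoted in the rest of this post.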

Admittedly, I don’t know very much about solar heating, so as a first step I wanted to get the lay of the land. Here is what the distribution of heating fuels looks like today:

Distribution of Heating Energy Sources

Unsurprisingly, gas and electric heating dominate, accounting for almost 90% of occupied housing units. Solar accounts for less than a third of a percent. Astonishingly, this has barely changed over the past decade. Here is what the distribution has looked like every year since 2011. It’s difficult to tell that it’s moved at all!

Distribution of Heating Energy Sources over Time

Next, I turned to geography.

Share of Units Heated by Solar Across States

Hawaii has the highest rate of solar heating in the country (4.4%). The Southwest leads the contiguous U.S., with California (1.08%) and Arizona (1.08%) coming in second and third place, respectively. Hawaii, California, and Arizona are the only three states above 1%. Note: if you’re curious why Hawaii’s rate is more than four times as high as anywhere else, it may be because other forms of energy are unusually expensive on the islands.

In last place is North Dakota, the only state where zero housing units use solar energy for heat. North Dakota is actually one of four states that have fewer solar-heated housing units today than they did in 2011 — the others are Tennessee, South Dakota, and West Virginia.

The counties with the highest rate of solar heating in the country are: Maui, Hawaii (6.04%); Alpine, California (5.07%); and Honolulu, Hawaii (4.07%). 1,715 of America’s 3,143 counties are tied for last place and have zero housing units heated by solar power.

Share of Units Heated by Solar Across Counties

This astounding fact got me thinking about the distribution of solar-heated housing units across the country. As it turns out, the distribution is extremely unequal.

Distribution of Units with Heat Powered by Solar

Just 5% of America’s Census tracts account for nearly three-quarters of the country’s solar-heated units, and 1% of tracts account for almost a third. Meanwhile, 86% of tracts have no solar-heated units at all. Overall, the distribution has a Gini coefficient of 0.932 (0 is perfect equality, 1 is perfect inequality). For comparison, the Gini coefficient for the distribution of American incomes is 0.49! Every other heating fuel except coal is more evenly distributed than solar.
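To make those two statistics concrete, here is a small Python sketch of how a Gini coefficient and a "top X% share" can be computed from per-tract counts. It uses the standard sorted-values formula, which may differ from the exact estimator behind the 0.932 figure, and the tract counts below are invented toy data, not Census numbers.

```python
def gini(values):
    """Gini coefficient of non-negative counts (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # With values sorted ascending and ranks i = 1..n:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    # For a finite sample the maximum is (n - 1) / n rather than exactly 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def top_share(values, fraction):
    """Share of the total held by the top `fraction` of units."""
    xs = sorted(values, reverse=True)
    k = max(1, round(len(xs) * fraction))
    total = sum(xs)
    return sum(xs[:k]) / total if total else 0.0


# Toy data (made up): 100 tracts, 86 of them with no solar-heated units.
toy_tracts = [0] * 86 + [1] * 9 + [5] * 4 + [20]
print(f"Gini: {gini(toy_tracts):.3f}")
print(f"Top 5% share: {top_share(toy_tracts, 0.05):.1%}")
```

Even in this tiny example, a long run of zeros plus a handful of large tracts is enough to push the coefficient above 0.9.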

One possible explanation for this stark inequality is that solar heating usage depends on income.

This hypothesis is, at least in part, supported by the data. Rates of solar-powered heating tend to rise toward the middle of the income distribution, then level out or even dip slightly, before peaking at the upper end of the distribution.

Association Between Solar Powered Heat and Income
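One way a curve like this could be computed, offered as a rough sketch rather than the exact method behind the figure: sort tracts by an income measure, split them into equal-sized bins, and take the pooled solar share within each bin. The function name, the tuple layout, and the sample numbers are all assumptions made for illustration.

```python
def share_by_income_bin(tracts, n_bins=10):
    """Pooled solar-heating share (in %) for each income bin.

    `tracts` is a list of (income, solar_units, occupied_units) tuples.
    Bins are equal-sized groups of tracts ordered from lowest to highest income.
    """
    ranked = sorted(tracts, key=lambda t: t[0])
    n = len(ranked)
    shares = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        solar = sum(t[1] for t in chunk)
        occupied = sum(t[2] for t in chunk)
        # Pool counts within the bin so large tracts are not under-weighted.
        shares.append(100.0 * solar / occupied if occupied else 0.0)
    return shares


# Made-up tracts: (income, solar-heated units, occupied units).
toy = [(30_000, 0, 1_000), (40_000, 1, 1_000), (60_000, 3, 1_000), (120_000, 6, 1_000)]
print(share_by_income_bin(toy, n_bins=2))
```

Pooling counts within each bin, instead of averaging per-tract percentages, keeps the many zero-solar tracts from swamping the result.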

For a more robust investigation of this relationship, it would be helpful to have individual-level data rather than data aggregated at the tract level. It would also be interesting to try to predict solar usage from a battery of demographic indicators. I’m curious whether income is the primary driver or whether other variables can tell us more.

That’s all for Census Diving #1. I hope you enjoyed it! Stay tuned for my next post!

The code to reproduce these figures is on my GitHub.


  1. Take the 2021 American Community Survey 5-Year Detailed Tables as an example. This dataset covers more than 20,000 variables across at least 578,000 unique geographic areas. That comes out to at least 11,560,000,000 data points. In addition to the Detailed Tables, there are Subject Tables (18,000 variables), Data Profiles (1,000 variables), and Comparison Profiles (1,000 variables) for this survey. Also, a 5-Year ACS has been released every year since 2009, excluding 2020. Then, there are also 3- and 1-year estimates released over the same time period. Then, there are thousands of other surveys. It adds up quite quickly.

  2. I’m quite literally randomly sampling a single variable at a time from the list of available variables using R. 

  3. As an ancillary benefit, I’m working on a Python wrapper for the Census API, and this project should give me a great opportunity to workshop some ideas I have.
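The random draw described in footnote 2 is done in R. As a rough illustration only, here is what an equivalent draw might look like in Python. The function name and the sample list are hypothetical, and the regular expression is an assumption about how Detailed Table estimate variables are named (for example `B25040_008E`); a real list of names would come from the survey's variable listing.

```python
import random
import re

# Assumed naming pattern for Detailed Table estimates, e.g. "B25040_008E".
ESTIMATE_PATTERN = re.compile(r"^[BC]\d{5}[A-Z]*_\d{3}E$")


def pick_random_variable(variable_names, seed=None):
    """Pick one estimate variable uniformly at random (hypothetical helper)."""
    # Sort first so that a given seed always maps to the same choice.
    candidates = sorted(
        name for name in variable_names if ESTIMATE_PATTERN.match(name)
    )
    if not candidates:
        raise ValueError("no estimate variables to choose from")
    return random.Random(seed).choice(candidates)


# Tiny made-up sample; a real list would hold tens of thousands of names.
sample_names = ["for", "in", "NAME", "B25040_001E", "B25040_008E",
                "B25040_008M", "B19013_001E"]
print(pick_random_variable(sample_names))
```

Passing a `seed` makes a draw reproducible, which is one way the "you’ll just have to trust me" rule could be made checkable.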