Performing basic geo spatial analysis with US Census data.

Part 1: Geo-spatial analysis with US Census data

The goal of this document is to show basic geo-spatial analysis in R. These plots or similar version of these plots were used to perform a basic corporate diligence looking at the general workforce metrics including the SAHIE insurance data and net migration. Together these can help provide an overall understanding of the average demographics of the population within a 45 mile radius (~1-1.5 hr by car) from the capital. This document is meant for education purposes only.

Document summary:

Performing basic geo-spatial analysis with US Census data.
Written by: Pete Skach and Eric W. Olle
CC-BY-SA license for the r-markdown document.
GPLv3 for the code.
No Warranty Provided
Created: Oct 2022
Edited for blog: October 25, 2022

The GitHub for this project contains the R-markdown and the data used to generate this blog. The data and the markdown document are CC-BY-SA.

Loading the packages required.

One item needed for the used prior to using the US Census API packages (tidycensus or ) is a API key. Please see the Kyle Walkers Tidycensus page for more information.

Once the packages are loaded the next step is to start on the data collection, plots and finally making a full figure.

Looking at net migration on a county level

In this first call the function get_estimates will be used to obtain the county level information on net migration to (+) or from (-) a county.

Once the basic migration data has been obtained from the US census summary tables, this can be summarized to create an insert into the graph. The county level spatial information will be used for other plots. The inclusion of county-line information with census-tract data allow for an easy comparison between the graphs.

sum_df <- net_migration %>%
    select(-geometry) %>%
    data.frame() %>%
    summarize(min = round(min(value),2), 
              max = round(max(value), 2), 
            mean = round(mean(value),2), 
            median = round(median(value),2),
            sd =  round(sd(value),2))

knitr::kable(sum_df)

min	max	mean	median	sd
-24.72	13.66	-5.35	-4.85	6.51

net_migration %>%
    ggplot(aes(fill = value)) + 
    geom_sf(color = "white") +
    theme_void() +
    geom_point(color = "Black",
               aes(x = -81.626500, y = 38.347410, group = NULL)) +
    geom_polygon(data =  charls$circle, aes(lon, lat), inherit.aes=FALSE, color = "black", alpha = 0) +
    scale_fill_viridis_c(option = "turbo")  +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5)) +
    labs(title = "Migration Rate per 100,000",
         fill = "Migration") +
    annotate(geom = "table", x = -83, y  = 36.5, label = list(sum_df), 
             vjust = 0, hjust = 0, size = 2) -> migration

migration

This plot shows the overall migration to (+) and from (-) the different counties in WV. The summary table shows the overall distribution of the county migration.

Overall workforce participation

The next plot will concentration on the number of people between 18-64 that are employed full-time as a ratio to the total population to estimate workforce participation. This is different from unemployment.

## Getting data from the 2016-2020 5-year ACS

## Using FIPS code '54' for state 'WV'

WV_employ  %>%
    ggplot(aes(fill = pct_labor_partic)) + 
    geom_sf(color = NA) +
    geom_sf(data = net_migration, color = "white", fill = NA, inherit.aes = FALSE) + 
    theme_void() +
    scale_fill_viridis_c(option = "turbo") +
    geom_point(color = "black",
               aes(x = -81.626500, y = 38.347410, group = NULL), alpha = .2) +
    geom_polygon(data =  charls$circle, aes(lon, lat), inherit.aes=FALSE, color = "black", alpha = 0) +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5)) +
    labs(title = "Probability in the workforce.",
         fill = "Participation") -> wkfrce_partic

wkfrce_partic

The above plot shows the overall full-time workforce participation. This data looks at eh 18-64 year old demographics and may underestimate total participation in the workforce. In some of the southern WV counties, there is a higher than average participation in the 65 and older group.

Income data

The next graph will look at the median income on a census tract level.

## Getting data from the 2016-2020 5-year ACS

## Using FIPS code '54' for state 'WV'

Once the income data is obtained next is to plot in a format similar to above.

WV_income %>%
  ggplot(aes(fill = estimate)) +
  geom_sf(color = NA) +
  geom_sf(data = net_migration, color = "white", fill = NA, inherit.aes = FALSE) +
  theme_void() +
  geom_point(color = "black",
               aes(x = -81.626500, y = 38.347410, group = NULL), alpha = .2) +
  geom_polygon(data =  charls$circle, aes(lon, lat), inherit.aes=FALSE, color = "black", alpha = 0) +
  scale_fill_viridis_c(option = "turbo") +
  theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5)) +
  labs(title = "Median Income.") -> median_income

median_income

This shows the median income by census tract with county-line shown in white. On average, WV is below the national median.

Ratio of Individual versus Employer provided health insurance

Finally, the final graph before making a aggregated figure is to look at the ratio of individual provided versus employer provided health insurance coverage. This can be an indicator for a large number of part-time jobs that provide enough income to not qualify for medicaid but do not provide health insurance benefits.

WV_healt_ins <- get_acs(
  state = "WV",
  geography = "tract",
  variables = c(Employer = "C24050_001E",
                Ind_purch  = "C27005_001E"),
  geometry = TRUE,
  output = "wide",  
  year = 2020
)

## Getting data from the 2016-2020 5-year ACS

## Using FIPS code '54' for state 'WV'

This time the ggplots ability to calculate the ratio by row will be used.

WV_healt_ins %>%
    ggplot(aes(fill = Ind_purch/Employer)) + 
    geom_sf(color = NA) +
    geom_sf(data = net_migration, color = "white", fill = NA, inherit.aes = FALSE) + 
    theme_void() +
    geom_point(color = "black",
               aes(x = -81.626500, y = 38.347410, group = NULL), alpha = 0.2) +
    geom_polygon(data =  data.frame(charls), aes(circle.lon, circle.lat), inherit.aes=FALSE, color = "black", alpha = 0) +
    scale_fill_viridis_c(option = "turbo")  +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5)) +
        labs(title = "Insurance rate Individual/Employer.",
         fill = "Ind/Emp") -> ins_source

ins_source

ration of ind/emp Health coverage — Ratio of Individual to Employer provided health insurance

The above plot shows that on average there is a higher rate of individual provide health insurance in the southern side of the state.

Putting four plots together.

Finally, the plots will be arrange into a single plot using ggpubr.

 ggarrange(median_income, wkfrce_partic, ins_source, migration,   
           nrow = 2, ncol =2, align = "none") %>%
  annotate_figure(., bottom = text_grob("Data source 2020 5-year ACS. Circle is 45 mi from Charleston"))

4 panel figure — Four panel plot of the data

Conclusion

The tidycensus package combined with ggplot2 can be a useful tool for basic geospatial analysis. In the next blog, how to make a pseudo-3d plot and stack the figures will be shown.

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Pop!_OS 20.04 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.39       RWordPress_0.2-3 ggpmisc_0.5.0    ggpp_0.4.5      
##  [5] magrittr_2.0.3   ggpubr_0.4.0     tigris_1.6.1     forcats_0.5.1   
##  [9] stringr_1.4.0    dplyr_1.0.9      purrr_0.3.4      readr_2.1.2     
## [13] tidyr_1.2.0      tibble_3.1.7     ggplot2_3.3.6    tidyverse_1.3.1 
## [17] tidycensus_1.2.2 geosphere_1.5-14
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7       fs_1.5.2           sf_1.0-7           lubridate_1.8.0   
##  [5] httr_1.4.3         gh_1.3.0           tools_4.2.1        backports_1.4.1   
##  [9] utf8_1.2.2         rgdal_1.5-32       R6_2.5.1           KernSmooth_2.23-20
## [13] DBI_1.1.3          colorspace_2.0-3   withr_2.5.0        sp_1.5-0          
## [17] tidyselect_1.1.2   gridExtra_2.3      curl_4.3.2         compiler_4.2.1    
## [21] cli_3.3.0          rvest_1.0.2        quantreg_5.93      SparseM_1.81      
## [25] xml2_1.3.3         labeling_0.4.2     scales_1.2.0       classInt_0.4-7    
## [29] proxy_0.4-27       rappdirs_0.3.3     digest_0.6.29      foreign_0.8-82    
## [33] pkgconfig_2.0.3    highr_0.9          dbplyr_2.2.1       rlang_1.0.4       
## [37] readxl_1.4.0       XMLRPC_0.3-1       rstudioapi_0.13    farver_2.1.1      
## [41] generics_0.1.3     jsonlite_1.8.0     car_3.1-0          RCurl_1.98-1.8    
## [45] s2_1.0.7           Matrix_1.5-1       Rcpp_1.0.9         munsell_0.5.0     
## [49] fansi_1.0.3        abind_1.4-5        lifecycle_1.0.1    yaml_2.3.5        
## [53] stringi_1.7.8      carData_3.0-5      MASS_7.3-58.1      grid_4.2.1        
## [57] maptools_1.1-4     crayon_1.5.1       lattice_0.20-45    haven_2.5.0       
## [61] cowplot_1.1.1      splines_4.2.1      hms_1.1.1          pillar_1.7.0      
## [65] uuid_1.1-0         markdown_1.1       ggsignif_0.6.3     XML_3.99-0.10     
## [69] wk_0.6.0           reprex_2.0.1       glue_1.6.2         evaluate_0.15     
## [73] modelr_0.1.8       vctrs_0.4.1        tzdb_0.3.0         MatrixModels_0.5-0
## [77] cellranger_1.1.0   gtable_0.3.0       assertthat_0.2.1   xfun_0.31         
## [81] mime_0.12          broom_1.0.0        gitcreds_0.1.2     e1071_1.7-11      
## [85] rstatix_0.7.0      class_7.3-20       survival_3.4-0     viridisLite_0.4.0 
## [89] units_0.8-0        ellipsis_0.3.2

References

Aphalo P (2022). ggpp: Grammar Extensions to ‘ggplot2’. R package version 0.4.5,
https://CRAN.R-project.org/package=ggpp.

Aphalo P (2022). ggpmisc: Miscellaneous Extensions to ‘ggplot2’. R package version 0.5.0,
https://CRAN.R-project.org/package=ggpmisc.

Bache S, Wickham H (2022). magrittr: A Forward-Pipe Operator for R. R package version 2.0.3,
https://CRAN.R-project.org/package=magrittr.

Kassambara A (2020). ggpubr: ‘ggplot2’ Based Publication Ready Plots. R package version
0.4.0, https://CRAN.R-project.org/package=ggpubr.

Hijmans R (2021). geosphere: Spherical Trigonometry. R package version 1.5-14,
https://CRAN.R-project.org/package=geosphere.

Walker K (2022). tigris: Load Census TIGER/Line Shapefiles. R package version 1.6.1,
https://CRAN.R-project.org/package=tigris.

Walker K, Herman M (2022). tidycensus: Load US Census Boundary and Attribute Data as
‘tidyverse’ and ‘sf’-Ready Data Frames. R package version 1.2.2,
https://CRAN.R-project.org/package=tidycensus.

Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686,
https://doi.org/10.21105/joss.01686

Part 1: Geo-spatial analysis with US Census data

Loading the packages required.

Looking at net migration on a county level

Overall workforce participation

Income data

Ratio of Individual versus Employer provided health insurance

Putting four plots together.

Conclusion

References

You Might Also Like

Basic SVM using simulated multiplex cytokine data

Part 2. Using R to discover ASPE oligo’s: xTAG multiplexing

Using python to convert or create composite postscript images of MERS RNA secondary structure

Leave a Reply Cancel reply