name: xaringan-title class: left, middle, inverse background-color: '#BBB' background-size: cover <style type="text/css"> .remark-code-line-highlighted { background-color: rgba(97, 172, 240, .5) !important; } .huge .remark-code { /*Change made here*/ font-size: 125% !important; } .tiny .remark-code { /*Change made here*/ font-size: 50% !important; } </style> # Exploratory Data Visualization for Time-Series and Longitudinal Data <div style='font-size: 25px;'> Christian Testa <br> <a href='mailto:ctesta@hsph.harvard.edu'><svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#F92672;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path></svg> ctesta@hsph.harvard.edu</a> <br> <a style='color: #55acee;' href='https://fediscience.org/@ctesta'><svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#55acee;" xmlns="http://www.w3.org/2000/svg"> <path d="M433 179.11c0-97.2-63.71-125.7-63.71-125.7-62.52-28.7-228.56-28.4-290.48 0 0 0-63.72 28.5-63.72 125.7 0 115.7-6.6 259.4 105.63 289.1 40.51 10.7 75.32 13 103.33 11.4 50.81-2.8 79.32-18.1 79.32-18.1l-1.7-36.9s-36.31 11.4-77.12 10.1c-40.41-1.4-83-4.4-89.63-54a102.54 102.54 0 0 1-.9-13.9c85.63 20.9 158.65 9.1 178.75 6.7 56.12-6.7 105-41.3 111.23-72.9 9.8-49.8 9-121.5 9-121.5zm-75.12 125.2h-46.63v-114.2c0-49.7-64-51.6-64 6.9v62.5h-46.33V197c0-58.5-64-56.6-64-6.9v114.2H90.19c0-122.1-5.2-147.9 18.41-175 25.9-28.9 79.82-30.8 103.83 6.1l11.6 19.5 
11.6-19.5c24.11-37.1 78.12-34.8 103.83-6.1 23.71 27.3 18.4 53 18.4 175z"></path></svg> @ctesta</a> <br> <a style='color: #636e72;' href='https://github.com/ctesta01'><svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#636e72;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> ctesta01</a> <br> <br> October 25th, 2024 </div> <img src="images/osha_covid.png" alt="Research Figure Showing correlations in COVID and OSHA complaints"/> <span style='font-family: Open Sans; font-size: 12px;'>Hanage, W.P., Testa, C., Chen, J.T. et al. 
[COVID-19: US federal accountability for entry, spread, and inequities—lessons for the future.](https://link.springer.com/article/10.1007/s10654-020-00689-2) Eur J Epidemiol 35, 995–1006 (2020). https://doi.org/10.1007/s10654-020-00689-2</span>

---

# Motivations

<span style='font-size: 25px;'>

> "The greatest value of a picture is when it forces us to notice what we never expected to see." –John Tukey

--

> "Visualization is often used for evil - twisting insignificant data changes and making them look meaningful. Don't do that crap if you want to be my friend. Present results clearly and honestly. If something isn't working - those reviewing results need to know." –John Tukey

</span>

--

<img src='images/EleanorLutz.png' height='400px'>

---

# Aims

- Learn how to use data manipulation tools such as `dplyr` and `tidyr`
- Learn how to use `ggplot2`, a powerful, flexible framework for visualizing data in R
- Learn where to find more resources

<img src='images/R-for-Data-Science.jpg'/>

---

# Before we get started

There are some packages you'll want to make sure you have installed.
```r
install.packages("tidyverse")
```

```r
library(tidyverse, quietly = FALSE, warn.conflicts = TRUE)
```

```
## ── Attaching core tidyverse packages ─────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```

---

# Example Data Set

```r
df <- readr::read_csv("example_data/example_dataset_1.csv")
```

```
## Rows: 400 Columns: 6
## ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): strata, gender
## dbl (4): X2005, X2010, X2015, X2020
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

--

Why use `readr::read_csv`?

- Reports the column types it guessed, with options to override them
- Typically loads faster than base R's `read.csv`
- Returns a tibble, a modern take on the data.frame with stricter subsetting and cleaner printing
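---

# Overriding the guessed column types

The override mentioned on the previous slide could look like the sketch below. It assumes the column names and types that `read_csv` reported for the example file (two character columns, four doubles). The object names `col_spec`, `csv_text`, and `demo` are made up for illustration, and the two inline rows are copied from the first rows of the example data so that the snippet runs without the CSV file.

```r
library(readr)

# Assumed specification, matching what read_csv() guessed for the example file:
# strata and gender are character, the four year columns are doubles.
col_spec <- cols(
  strata = col_character(),
  gender = col_character(),
  X2005  = col_double(),
  X2010  = col_double(),
  X2015  = col_double(),
  X2020  = col_double()
)

# Two rows copied from the example data, wrapped in I() so that readr
# treats the string as literal CSV content rather than a file path.
csv_text <- I("strata,gender,X2005,X2010,X2015,X2020
A,M,10.509583,15.796463,14.918578,25.42652
A,F,1.284895,5.251643,8.746856,10.24585
")

demo <- read_csv(csv_text, col_types = col_spec)

# With the real file, the same specification would be passed like this:
# df <- read_csv("example_data/example_dataset_1.csv", col_types = col_spec)
```

Supplying `col_types` also quiets the column specification message, and `problems(demo)` lists any values that did not match the requested types.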
---

# Check out the data

```r
knitr::kable(head(df,4))
```

<table>
<thead>
<tr>
<th style="text-align:left;"> strata </th>
<th style="text-align:left;"> gender </th>
<th style="text-align:right;"> X2005 </th>
<th style="text-align:right;"> X2010 </th>
<th style="text-align:right;"> X2015 </th>
<th style="text-align:right;"> X2020 </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 10.509583 </td>
<td style="text-align:right;"> 15.796463 </td>
<td style="text-align:right;"> 14.918578 </td>
<td style="text-align:right;"> 25.42652 </td>
</tr>
<tr>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 19.413696 </td>
<td style="text-align:right;"> 25.082179 </td>
<td style="text-align:right;"> 25.200332 </td>
<td style="text-align:right;"> 30.43139 </td>
</tr>
<tr>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 12.475157 </td>
<td style="text-align:right;"> 19.262194 </td>
<td style="text-align:right;"> 19.930757 </td>
<td style="text-align:right;"> 20.63809 </td>
</tr>
<tr>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> F </td>
<td style="text-align:right;"> 1.284895 </td>
<td style="text-align:right;"> 5.251643 </td>
<td style="text-align:right;"> 8.746856 </td>
<td style="text-align:right;"> 10.24585 </td>
</tr>
</tbody>
</table>

Note that:

- Data is in a wide format
- Column names need cleaning
- We have groups of participants

---

# Examine the categorical variables

Let's check what those groups are:

```r
unique(df$strata)
```

```
## [1] "A" "B" "C" "D"
```

--

```r
table(df$gender)
```

```
##
##   F   M
## 210 190
```

---

# Summarize quantitative variables

--

```r
df %>%
  select(-c(strata, gender)) %>%
  summary()
```

```
##      X2005            X2010            X2015            X2020
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000
##  1st Qu.: 5.217   1st Qu.: 6.765   1st Qu.: 6.135   1st Qu.: 7.621
##  Median : 9.292   Median :10.592   Median :10.280   Median :12.065
##  Mean   : 9.537   Mean   :11.260   Mean   :10.677   Mean   :12.678
##  3rd Qu.:12.851   3rd Qu.:15.073   3rd Qu.:14.331   3rd Qu.:17.638
##  Max.   :34.977   Max.   :34.707   Max.   :33.236   Max.   :39.052
```

--

... Let's break this one down.

---

<img src="images/MagrittrPipe.png" align='right' width='20%' style='padding: 50px;'><br>

# Intro to `%>%` and dplyr

```r
df %>%
  select(-c(strata, gender)) %>%
  summary()
```

--

`%>%`, the pipe operator, comes from the `magrittr` package and is re-exported by `dplyr`.

--

`x %>% f()` is equivalent to `f(x)` <br>
`x %>% f(y)` is equivalent to `f(x,y)`

--

`df %>% select(-c(strata, gender))` is equivalent to <br>`select(df, -c(strata, gender))`

--

Read `x %>% f(y)` as "`x` gets passed to `f` with additional argument `y`."

--

Using pipes lets us:

1. chain several commands together,
2. avoid unnecessarily nested one-liners, e.g. <br> `summary(select(df, -c(strata, gender)))`

---

# What's the deal with select?

```r
df %>%
  select(-c(strata, gender)) %>%
  summary()
```

`select` is the command for subsetting the columns of a data.frame or tibble.

--

Notice that `strata` and `gender` are not in quotes. This is because `dplyr` and many other tidyverse functions use tidy evaluation, which lets users refer to the columns of a data.frame or tibble as if they were variables.

--

The minus sign says that we want to drop `strata` and `gender`, or equivalently, to select every column except those two.

```
##      X2005            X2010            X2015            X2020
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000
##  1st Qu.: 5.217   1st Qu.: 6.765   1st Qu.: 6.135   1st Qu.: 7.621
##  Median : 9.292   Median :10.592   Median :10.280   Median :12.065
##  Mean   : 9.537   Mean   :11.260   Mean   :10.677   Mean   :12.678
##  3rd Qu.:12.851   3rd Qu.:15.073   3rd Qu.:14.331   3rd Qu.:17.638
##  Max.   :34.977   Max.   :34.707   Max.   :33.236   Max.   :39.052
```

---

# Let's add participant ID numbers

```r
df <- df %>%
  mutate(id = 1:nrow(.)) %>%
  select(id, everything())

knitr::kable(head(df,3))
```

<table>
<thead>
<tr>
<th style="text-align:right;"> id </th>
<th style="text-align:left;"> strata </th>
<th style="text-align:left;"> gender </th>
<th style="text-align:right;"> X2005 </th>
<th style="text-align:right;"> X2010 </th>
<th style="text-align:right;"> X2015 </th>
<th style="text-align:right;"> X2020 </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 10.50958 </td>
<td style="text-align:right;"> 15.79646 </td>
<td style="text-align:right;"> 14.91858 </td>
<td style="text-align:right;"> 25.42652 </td>
</tr>
<tr>
<td style="text-align:right;"> 2 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 19.41370 </td>
<td style="text-align:right;"> 25.08218 </td>
<td style="text-align:right;"> 25.20033 </td>
<td style="text-align:right;"> 30.43139 </td>
</tr>
<tr>
<td style="text-align:right;"> 3 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 12.47516 </td>
<td style="text-align:right;"> 19.26219 </td>
<td style="text-align:right;"> 19.93076 </td>
<td style="text-align:right;"> 20.63809 </td>
</tr>
</tbody>
</table>

--

Note the use of `.` here, which refers to the left-hand side passed in by `%>%`.

--

Equivalently, we could have written `mutate(df, id = 1:nrow(df))`.

--

`everything()` comes from the `tidyselect` package, which supports programmatic column selection and offers other helpers such as `starts_with()` and `contains()`.

---

# A note about `|>` and `%>%`

<small>
`|>` is a pipe built into base R since version 4.1.0.
The main difference is that `%>%` uses `.` as the placeholder for the left-hand side of the pipe, while `|>` uses `_` (available from R 4.2.0, and only as a named argument). There are some subtle differences in what you can do with the placeholder: for example, `%>%` supports `.$var`, while `|>` only gained `_$var` in R 4.3.0. In most respects the two pipes behave alike, and you will see more and more code using `|>` because it is built into the language and adds no function-call overhead.
</small>

.pull-left[
<a href="https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/"><img src="images/pipe_article.png" width='300px' /> </a>
]

.pull-right[
Read more here: <https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/>
]

---

# Let's convert to a tidy format

```r
df <- df %>%
  tidyr::pivot_longer(
    cols = starts_with('X'),
    names_to = 'year',
    values_to = 'rate')

knitr::kable(head(df, 4))
```

<table>
<thead>
<tr>
<th style="text-align:right;"> id </th>
<th style="text-align:left;"> strata </th>
<th style="text-align:left;"> gender </th>
<th style="text-align:left;"> year </th>
<th style="text-align:right;"> rate </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:left;"> X2005 </td>
<td style="text-align:right;"> 10.50958 </td>
</tr>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:left;"> X2010 </td>
<td style="text-align:right;"> 15.79646 </td>
</tr>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:left;"> X2015 </td>
<td style="text-align:right;"> 14.91858 </td>
</tr>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:left;"> X2020 </td>
<td style="text-align:right;"> 25.42652 </td>
</tr>
</tbody>
</table>

---

# Get rid of "X"

```r
df <- df %>%
  mutate(year = stringr::str_remove(year, "X"))

knitr::kable(head(df, 4))
```

<table>
<thead>
<tr>
<th style="text-align:right;"> id </th>
<th style="text-align:left;"> strata </th>
<th style="text-align:left;"> gender </th>
<th style="text-align:left;"> year </th>
<th style="text-align:right;"> rate </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:left;"> 2005 </td>
<td style="text-align:right;"> 10.50958 </td>
</tr>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:left;"> 2010 </td>
<td style="text-align:right;"> 15.79646 </td>
</tr>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:left;"> 2015 </td>
<td style="text-align:right;"> 14.91858 </td>
</tr>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:left;"> 2020 </td>
<td style="text-align:right;"> 25.42652 </td>
</tr>
</tbody>
</table>

---

# Now we can do some plotting

.pull-left[
```r
ggplot(data = df,
       aes(x = year,
           y = rate)) +
  geom_point()
```
]

.pull-right[
<img src="longitudinal_eda_files/figure-html/unnamed-chunk-17-1.png" width="100%" />
]

---

# Now let's try geom_line

.pull-left[
```r
ggplot(data = df,
       aes(x = year,
           y = rate,
*          group = id)) +
  geom_line()
```
]

.pull-right[
<img src="longitudinal_eda_files/figure-html/unnamed-chunk-18-1.png" width="100%" />
]

---

# Adding color

.pull-left[
```r
ggplot(data = df,
       aes(x = year,
           y = rate,
           group = id,
*          color = strata)) +
* geom_line(alpha=0.5)
```
]

.pull-right[
<img src="longitudinal_eda_files/figure-html/unnamed-chunk-19-1.png" width="100%" />
]

---

# Facet Wrapping

.tiny[
```r
ggplot(data = df,
       aes(x = year, y = rate, group = id, color = strata)) +
  geom_line(alpha=0.5) +
* facet_wrap(~strata)
```
]

<img src="longitudinal_eda_files/figure-html/unnamed-chunk-20-1.png" width="576" height="100%" style="display: block; margin: auto;" />

---

# Facet Grid

.tiny[
```r
ggplot(data = df,
       aes(x = year, y = rate, group = id, color = strata)) +
  geom_line(alpha=0.5) +
* facet_grid(gender~strata) +
  ggtitle("Different strata had different trajectories")
```
]

<img src="longitudinal_eda_files/figure-html/unnamed-chunk-21-1.png" width="576" height="100%" style="display: block; margin: auto;" />

---

# Using Stat Summaries

.tiny[
```r
ggplot(data = df,
       aes(x = year, y = rate, group = id, color = gender)) +
  geom_line(alpha=0.5) +
  facet_wrap(~strata) +
  stat_summary(aes(group = interaction(strata, gender)),
               fun = mean, geom='line', color = 'black') +
  stat_summary(aes(group = interaction(strata, gender), shape=gender),
               fun = mean, geom='point', size=2, color = 'black') +
  labs(shape = 'Gender and Strata\nLevel Average', color = 'Gender') +
  ggtitle("Men have higher rates than women")
```
]

<img src="longitudinal_eda_files/figure-html/unnamed-chunk-22-1.png" width="576" height="100%" style="display: block; margin: auto;" />

---

# Another way using boxplots

.tiny[
```r
ggplot(data = df,
       aes(x = year, y = rate, color = gender)) +
  geom_boxplot(alpha=0.5) +
  stat_summary(aes(group = interaction(strata, gender), shape=''),
               position = position_dodge(width=0.75),
               fun = mean, geom='point', color = 'grey10', alpha=0.8) +
  facet_wrap(~strata) +
  labs(color = "Gender", shape = "Gender + Strata\nLevel Average") +
  ggtitle("Boxplots allow us to see the interquartile range clearly")
```
]

<img src="longitudinal_eda_files/figure-html/unnamed-chunk-23-1.png" width="576" height="100%" style="display: block; margin: auto;" />

---

# Using `geom_ribbon`

.tiny[
```r
df %>%
  group_by(strata, gender, year) %>%
  summarize(
    percentile_97.5 = quantile(rate, 0.975),
    percentile_2.5 = quantile(rate, 0.025),
    mean = mean(rate),
    .groups = 'keep') %>%
  ggplot(aes(x = year,
             y = mean,
             ymax = percentile_97.5,
             ymin = percentile_2.5,
             group = gender,
             fill = gender,
             color = gender)) +
  geom_ribbon(alpha=0.5, linewidth = 0) +
  geom_line(aes(linetype='')) +
  facet_wrap(~strata) +
  scale_color_manual(values = c('M' = '#2980b9', 'F' = '#c0392b')) +
  labs(linetype = 'Gender+Strata\nLevel Average',
       fill = 'Gender',
       color = 'Gender',
       y = 'Rate') +
  ggtitle("The difference between men and women was consistent over time")
```
]

<img src="longitudinal_eda_files/figure-html/unnamed-chunk-24-1.png" width="432" height="100%" style="display: block; margin: auto;" />

---

# Using plotly for interactive graphics

.tiny[
```r
suppressMessages(library(plotly))
ggplotly() %>% layout(width = 8, height = 3.5)
```
]

```
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
```
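---

# Setting the widget size in `ggplotly()`

The deprecation warning on the previous slide suggests passing the size to `ggplotly()` directly. Below is a minimal sketch of that approach. It assumes a stand-in scatter plot `p` built from the built-in `mtcars` data (in the talk this would be the `geom_ribbon` figure), and the pixel dimensions of 800 by 350 are illustrative values, not ones taken from the talk.

```r
library(ggplot2)
suppressMessages(library(plotly))

# Stand-in plot so that the example is self-contained; any ggplot object
# could be used here instead.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# width and height are arguments of ggplotly() itself and are given in pixels
# (800 x 350 are assumed, illustrative values).
interactive_plot <- ggplotly(p, width = 800, height = 350)

interactive_plot
```

When no plot is supplied, `ggplotly()` falls back to the most recent ggplot, so `ggplotly(width = 800, height = 350)` right after a plotting call should behave the same way.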
---

# Widening Data for Correlation Analysis

Before we can look at correlations across years, we need to widen the data frame again (similar to its original format).

```r
df_wide <- df %>%
  tidyr::pivot_wider(
    id_cols = c(id, strata, gender),
    names_from = year,
    values_from = rate)

knitr::kable(head(df_wide, 3))
```

<table>
<thead>
<tr>
<th style="text-align:right;"> id </th>
<th style="text-align:left;"> strata </th>
<th style="text-align:left;"> gender </th>
<th style="text-align:right;"> 2005 </th>
<th style="text-align:right;"> 2010 </th>
<th style="text-align:right;"> 2015 </th>
<th style="text-align:right;"> 2020 </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right;"> 1 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 10.50958 </td>
<td style="text-align:right;"> 15.79646 </td>
<td style="text-align:right;"> 14.91858 </td>
<td style="text-align:right;"> 25.42652 </td>
</tr>
<tr>
<td style="text-align:right;"> 2 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 19.41370 </td>
<td style="text-align:right;"> 25.08218 </td>
<td style="text-align:right;"> 25.20033 </td>
<td style="text-align:right;"> 30.43139 </td>
</tr>
<tr>
<td style="text-align:right;"> 3 </td>
<td style="text-align:left;"> A </td>
<td style="text-align:left;"> M </td>
<td style="text-align:right;"> 12.47516 </td>
<td style="text-align:right;"> 19.26219 </td>
<td style="text-align:right;"> 19.93076 </td>
<td style="text-align:right;"> 20.63809 </td>
</tr>
</tbody>
</table>

---

# Correlation Plot

.tiny[
```r
# install.packages("GGally")
library(GGally)

ggpairs(df_wide,
        aes(color = strata, alpha=0.25),
        columns = c('2005', '2010', '2015', '2020'),
        progress=FALSE)
```
]

<img src="longitudinal_eda_files/figure-html/unnamed-chunk-27-1.png" width="576" height="100%" style="display: block; margin: auto;" />

---

# Bivariate Pairs Plots

```r
ggbivariate(df,
            outcome = 'gender',
            explanatory = 'strata') +
  theme(legend.position = 'bottom') +
  ggtitle("Strata were about evenly split across gender")
```

<img src="longitudinal_eda_files/figure-html/unnamed-chunk-28-1.png" width="432" height="100%" style="display: block; margin: auto;" />

---

<img src="images/tt_logo.png" align='right' width='30%' style='padding: 0px;'>

# Where you can learn more

For data manipulation and visualization:

<img src="images/ggplot2-cheatsheet.png" align='right' width='30%' style='padding: 0px;'>

- R for Data Science, by Garrett Grolemund and Hadley Wickham, [r4ds.had.co.nz](https://r4ds.had.co.nz)
- The ggplot2 Website, [ggplot2.tidyverse.org](https://ggplot2.tidyverse.org)
- The [RStudio Cheatsheets](https://rstudio.com/resources/cheatsheets/) (I suggest starting with [dplyr](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf) and [ggplot2](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf))
- Watch the [TidyTuesday tutorials on YouTube](https://www.youtube.com/results?search_query=tidytuesday) or check out [TidyTuesday on GitHub](https://github.com/rfordatascience/tidytuesday/)

--

For longitudinal data analysis:

- [Applied Longitudinal Analysis by Garrett Fitzmaurice, Nan Laird, and James Ware](https://content.sph.harvard.edu/fitzmaur/ala2e/)
- [Marie Davidian's Slides on Modeling and Analysis of Longitudinal Data](https://www4.stat.ncsu.edu/~davidian/enar06_handout.pdf)
- [Patrick Heagerty's notes on Longitudinal Data Analysis](https://faculty.washington.edu/heagerty/Courses/VA-longitudinal/private/LDAchapter.pdf) (fairly technical)
- Longitudinal Data Analysis: Autoregressive Linear Mixed Effects Models, by Ikuko Funatogawa and Takashi Funatogawa (very technical)

---

# How to take your longitudinal analysis further

.pull-left[
<!-- <div style='font-size: 75% !important;'> -->
<ul>
Use models to make inferences about your data.
Models for longitudinal data often include the following features:
<br>
<br>
<li> Multi-level or random-effects structure</li>
<li> Generalized linear models</li>
<li> Autoregressive terms</li>
<li> Treatment of missing data</li>
</ul>
<!-- </div> -->
]

.pull-right[
<img src='images/multilevel-reaction.png'/>

<span style='font-size: 16px;'>
`library(lme4)` <br>
`lmer(Reaction ~ Days + (Days|Subject), sleepstudy)`
</span>
]

---

# Get More Inspiration

.pull-left[
* [Georgios Karamanis](https://karaman.is/)
* [Cedric Scherer](https://cedricscherer.com)
* [Eleanor Lutz](https://eleanorlutz.com/)
* [Flowing Data](https://flowingdata.com/)

<img src="images/Kaashoek.png" alt="A figure showing effectiveness of statewide interventions as an effect modifier" />

<span style='font-size: 12px;'>Kaashoek J, Testa C, Chen JT, Stolerman LM, Krieger N, Hanage WP, et al. [The evolving roles of US political partisanship and social vulnerability in the COVID-19 pandemic from February 2020–February 2021.](https://journals.plos.org/globalpublichealth/article?id=10.1371/journal.pgph.0000557) PLOS Global Public Health. 2022 </span>
]

.pull-right[
<img src="images/EscalatingDrought.jpg" height='500px' alt='A figure on drought from Cedric Scherer' />
]

---

# Join the R User Group!
rug-at-hdsi.org

.pull-left[
<img src="images/rug.png" alt='screenshot of our RUG YouTube'><br>
Join at <https://rug-at-hdsi.org> and check out our [YouTube](https://www.youtube.com/@rusergroupatharvarddatasci7232/videos)
]

<!--
.pull-right[
<img src="images/tidymodels.png" style='max-height: 450px;' alt="poster for our upcoming talk on Tidymodels" />
]
-->

---

# Find this talk on my GitHub

<img src='screenshot.png' height='450px' />
<img src='images/github_qr.png' style = 'vertical-align: top;' height='175px' />

<https://github.com/ctesta01/longitudinal_eda_talk>

---

# Image Credits

<div style='font-size:small'>
<ul>
<li>R for Data Science: https://www.dataoptimal.com/wp-content/uploads/R-for-Data-Science.jpg </li>
<li>Magrittr Logo: https://magrittr.tidyverse.org/logo.png</li>
<li>Tidy Tuesday Logo: https://github.com/rfordatascience/tidytuesday/</li>
<li>Escalating Drought: https://www.scientificamerican.com/article/climate-change-drives-escalating-drought/</li>
</ul>
</div>