# Chapter 8 Part 6: Simple Statistics with `broom`

## 8.1 Learning Objectives

**Learn**about errors and warnings and where to ask for help**Learn**some basic Exploratory Data Analysis techniques**Learn**a basic analysis workflow for statistical modeling**Learn**about*formulas*and how to specify models using them**Learn**about t Tests and how to**apply**them to your dataset**Learn**and**apply**linear regression models**Learn**and**apply**Analysis of Variance (ANOVA)

## 8.2 Getting Help on Errors

### 8.2.1 Understanding the difference between warnings and errors

A *warning* is an indication that the data or arguments isn’t quite what the function expected.

You can usually run the code, but you should be careful about it and verify the output.

An *error* means that the code can’t execute at all given what you have given the function.

Errors can be difficult to understand, which is why

### 8.2.2 Googling is StandaRd pRactice foR eRrors

The first thing I do when I encounter an error is to search for the error. I usually start with Google.

I don’t know everything, and the odds are that I made a mistake in understanding the documentation.

There are some resources that I especially check (in order):

- RStudio Community (for
`tidyverse`

): https://community.rstudio.com/ - Stack Overflow: http://stackoverflow.com/
- Biostars (for Bioinformatics): https://www.biostars.org/
- The package’s github page (especially issues)

### 8.2.3 Where do I ask for help?

I’m trying to be as helpful as I can, but I can’t answer all of your questions.

The following communities are extremely helpful to beginners:

- R for Data Science Community: https://r4ds.slack.com/
- RStudio Community: https://community.rstudio.com/

## 8.3 Caveat about statistics

This is not meant to be a comprehensive course in statistics. We want to show you some basic techniques, but you will need to dig further.

Danielle Navarro’s Learning Statistics with R is excellent and talks much more about statistics: https://learningstatisticswithr.com/

## 8.4 Introducing `tidymodels`

We will be using the `broom`

package from the `tidymodels`

set of packages to make the modeling easier to work with.

`tidymodels`

attempts to unify all of the various modeling packages in a consistent interface.

`broom`

works mostly with the output of models. One of the problems with R is that the many modeling packages are not consistent to work with. It can be just as difficult to get a p-value out of a model as it is to run it on some data! `broom`

simpliflies this a lot.

There are 3 main functions in *broom*:

`tidy()`

- This is where you get most of the output you want, including*coefficients*and*p-values*`glance()`

- additional measures on your model, including R-squared, log likelihood, and AIC/BIC`augment()`

- make predictions with your model using new data

We will mostly use `tidy()`

and `glance()`

for right now.

## 8.5 T-tests

### 8.5.1 The Dataset

A study by Goran et.al (1996) examined the accuracy of some widely used body-composition techniques for children using three different methods:

- dual-energy X-ray absorptiometry (
`dxa`

) technique, - skin-fold thickness (
`st`

), - bioelectric resistance (
`br`

).

Subjects were children between 4 and 10 years old. Data were collected on 98 subjects (49 males and 49 females).

One purpose of the study was to determine whether there was a difference in fat mass measurements using `DXA`

(considered the gold standard method) compared to the skin-fold thickness method.

We also wish to determine if `DXA`

levels are significantly different between males and females.

### 8.5.2 Getting set up

```
body_comp <- read_csv('data/body_composition.csv', na="NA") %>%
clean_names() %>%
mutate(gender = factor(gender, levels=c("1", "0")))
head(body_comp)
```

```
## # A tibble: 6 x 4
## dxa st br gender
## <dbl> <dbl> <dbl> <fct>
## 1 3.65 4.55 4.26 1
## 2 3.92 2.82 6.09 0
## 3 7.53 3.89 5.12 0
## 4 6.24 5.49 8.04 0
## 5 10.6 10.5 14.2 0
## 6 9.58 11.2 12.4 0
```

### 8.5.3 Exploratory Data Analysis

Before we do any statistical tests on our data, we should first visualize it.

Since our ultimate goal is to examine the differences between bodyfat measurement methods, let’s create boxplots that illustrate this difference, if any.

Notice that the `aes()`

for `ggplot()`

only accepts one `x`

value and one `y`

value, but we have three columns we’d like to compare (`dxa`

, `st`

, `br`

). So, we need to convert our data to long format using `pivot_longer()`

.

```
body_comp_long <- body_comp %>%
pivot_longer(cols = c('dxa', 'st', 'br'),
names_to = 'method',
values_to = 'body_fat_percentage')
head(body_comp_long)
```

```
## # A tibble: 6 x 3
## gender method body_fat_percentage
## <fct> <chr> <dbl>
## 1 1 dxa 3.65
## 2 1 st 4.55
## 3 1 br 4.26
## 4 0 dxa 3.92
## 5 0 st 2.82
## 6 0 br 6.09
```

Now that we’ve done that, we can set `x = method`

and `y = body_fat_percentage`

.

```
ggplot(body_comp_long) +
aes(x = method, y = body_fat_percentage, fill = method) +
geom_boxplot() +
geom_jitter(color="grey")
```

It appears that our measurements are close to one another, but there are some noticable differences.

### 8.5.4 t-test

Briefly, a t-test should be used when examining whether the mean **between two groups** are similar This means that the measurements must be **numeric** (there are other tests for categorical data).

The null hypothesis for a t-test is that the two means are equal, and the alternative is that they are not.

One purpose of the study was to determine whether there was a difference in fat mass measurements using

`dxa`

(considered the gold standard method) compared to the skin-fold thickness method (`st`

).

Below, we will use a paired t-test. Paired simply means that each group (`dxa`

and `st`

) each contain measurements for the same subject on corresponding rows. If body fat measurements were collected using `dxa`

for children in Group A and `st`

for a separate set of children in Group B, then we would *not* use a paired t-test.

HYPOTHESIS: There is a difference in mean fat mass measurements between the DXA and skin-fold thickness (ST) methods.

NULL HYPOTHESIS: There is no difference in mean fat mass measurements between the two methods.

We also need to set a significance threshold. We’ll set it at 0.05.

```
body_comp_dxa_st <- body_comp_long %>%
filter(method %in% c("dxa", "st"))
tidy_output2 <- t.test(body_comp$dxa, body_comp$st, paired=TRUE) %>%
tidy()
tidy_output <-
t.test(body_fat_percentage ~ method,
paired=TRUE,
data=body_comp_dxa_st) %>%
tidy()
tidy_output2
```

```
## # A tibble: 1 x 8
## estimate statistic p.value parameter conf.low conf.high method alternative
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 -0.207 -1.78 0.0774 97 -0.438 0.0232 Paired t-~ two.sided
```

We see that `p.value`

is equal to `~0.634`

; this means **we cannot reject the null hypothesis** (i.e., the difference in body fat measurements between `dxa`

and `st`

are not statistically different from one another).

### 8.5.5 Your Turn

Try running `t.test`

, comparing `dxa`

and `br`

using `body_comp_long`

. Hint: You’ll have to filter the `method`

like above.

```
body_comp <- body_comp %>%
tidyr::drop_na()
body_comp_long <- body_comp %>%
pivot_longer(cols = c('dxa', 'st', 'br'),
names_to = 'method',
values_to = 'body_fat_percentage')
body_comp_dxa_sf <- body_comp_long %>%
filter(method %in% c("dxa", "br"))
tidy_output <-
t.test(body_fat_percentage ~ method,
paired=TRUE,
data=body_comp_dxa_sf) %>%
tidy()
tidy_output
```

## 8.7 Let’s build a simple linear model

We’ve established that there is a correlation between `dxa`

and `st`

. Can we use `st`

to predict `dxa`

?

y = ax + b

```
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.275 0.215 1.28 2.05e- 1
## 2 st 0.903 0.0367 24.6 3.17e-43
```

The predicted line through the data is:

dxa = 0.295 + 0.903 * `st`

### 8.7.1 Adding another variable

What if we included `gender`

in our model? Our model can accept `factors`

as inputs.

```
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.0646 0.224 0.288 7.74e- 1
## 2 st 0.888 0.0362 24.5 6.93e-43
## 3 gender0 0.577 0.222 2.60 1.09e- 2
```

In this case, `gender`

is a useful predictor of `dxa`

, since the p-value is less than our threshold of 0.05.

However, if we did use the equation, it would correspond to this equation:

`dxa`

= 0.097 + 0.889 * `st`

+ 0.536 * `gender0`

The factor `gender`

here is recoded as a “dummy” variable, and reading it is a little confusing. Note that it says `gender0`

and not `gender`

. That’s because it’s coding `gender`

= 0 as 1 here, and 0, if gender is 1.

Dummy variables are very confusing. http://www.sthda.com/english/articles/40-regression-analysis/163-regression-with-categorical-variables-dummy-coding-essentials-in-r/

### 8.7.2 Your Turn

Try adding `br`

as a term in the model. How does it change the p-value of `st`

?

```
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.111 0.208 0.532 5.96e- 1
## 2 st 0.699 0.0683 10.2 5.78e-17
## 3 br 0.213 0.0615 3.46 8.20e- 4
```

## 8.8 Analysis of Variance (ANOVA) (Optional)

We’ve determined that there isn’t a statistical difference between `dxa`

and `st`

, but we also meausured bodyfat using bioelectric resistance, `br`

.

Maybe we should see if it measures differently from the other two methods. Because a t-test can only be used to measure the differences in means between two groups, we’d have to use multiple t-tests.

But wait, should we do that right away? No, because we’ll run into the Multiple Comparisons Problem!

So rather than performing multiple t-tests, we first want to examine whether any of the groups is different from the rest of the groups using an ANOVA (`aov()`

).

`aov()`

uses the formula interface. The tilde (`~`

) can be translated to “is a function of”. Below, we are testing whether body fat percentage is a function of the type of body fat measurement method. We pipe the output of `aov()`

to `summary()`

to get a clearer idea of the output of the ANOVA.

```
## # A tibble: 2 x 6
## term df sumsq meansq statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 method 2 40.7 20.3 2.01 0.136
## 2 Residuals 290 2935. 10.1 NA NA
```

The value `p.value`

is what we’re interested in. Because it is greater than 0.05, we can conclude that none of the measurement methods is significantly different from the others, and there is no reason to perform multiple t-tests on our dataset.

### 8.8.1 Post-hoc Tests

Now *if* our F statistic probability had come back below 0.05, then we could perform multiple post-hoc t-tests. However, we would need to account for false positives by using a correction method (e.g., Bonferroni).

```
pairwise.t.test(body_comp_long$body_fat_percentage,
body_comp_long$method,
p.adjust = "bonferroni") %>%
tidy()
```

```
## # A tibble: 3 x 3
## group1 group2 p.value
## <chr> <chr> <dbl>
## 1 dxa br 0.167
## 2 st br 0.432
## 3 st dxa 1
```

### 8.8.2 More about the Multiple Testing Problem

Consider what a p-value of 0.05 actually means: if a test is performed at the 0.05 level and the corresponding null hypothesis is true, there is only a 5% chance of incorrectly rejecting the null hypothesis. This is an okay risk to take given that we are only performing the t-test once. But if we were to perform the t-test 1,000 times on data where all null hypotheses were true, the expected number of incorrect rejections (also known as false positives or Type I errors) would be 50!

### 8.8.3 Acknowledgements

Written by Aaron Coyner and Ted Laderas