-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathheritability.Rmd
162 lines (116 loc) · 3.98 KB
/
heritability.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
---
title: "Heritability"
author: "Filippo"
date: "2023-03-28"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
base_folder = "/home/filippo/Documents/ciampolini/unipisa_2023/bioinformatics_and_biostatistics_training/introduction_to_animal_breeding"
library("sommer")
library("pedigree")
library("tidyverse")
library("data.table")
data(DT_example)
```
## Estimate heritability
#### The `sommer` R package
In this notebook, we use the `sommer` *R* package to estimate variance components. The `sommer::mmer` solves the MME (mixed model equations).
The multivariate (and by extension the univariate) mixed model has the form:
$$
Y = X\beta + ZU + e
$$
- $Y$: vector (matrix) of phenotypes
- $X\beta$: systematic part of the mixed model
- $ZU$: random part of the mixed model (e.g. genetic effects)
- $e$: residuals
Below the related variance structure ($X\beta$ is the mean, $V$ is the variance-covariance matrix):
$$
Y \sim MVN(X\beta, V)
$$
$$
V =
\begin{bmatrix}
Z_1K\sigma_{g1}^2Z'_i + H\sigma_{e1}^2 & \ldots & Z_1K\sigma_{g1,t}Z'_t + H\sigma_{e1,t} \\
\vdots & \ddots & \vdots \\
Z_1K\sigma_{g1,t}Z'_t + H\sigma_{e1,t} & \ldots & Z_tK\sigma_{gt,t}^2Z'_t + H\sigma_{et,t}^2
\end{bmatrix}
$$
where $i = 1 \ldots t$ indicates the number of traits, $K$ is the covariance matrix (kinship) for the $k_{th}$ random effect, and $H=I$ is an identity (or partial identity) matrix for the residual term.
`sommer` provides kernels to estimate additive (A.mat), dominance (D.mat), and epistatic (E.mat) relationship matrices.
**Heritability** is one of the most popular parameters among the breeding and genetics communities because of the insight it provides in the inheritance of the trait and potential selection response. Heritability is usually estimated as narrow sense ($h^2$; only additive variance in the numerator $σ^2_A$), and broad sense ($H^2$; all genetic variance in the numerator $σ^2_G$).
### Illustration
We first use a built-in example dataset: 41 potato lines evaluated in 3 environments in an RCBD design
```{r example}
dt_2013 = DT_example |>
filter(Block == "CA.2013.1")
```
We need to subset the relationship matrix to make it match the data subset we are using (one year):
```{r}
vec <- colnames(A_example) %in% dt_2013$Name
A <- A_example[vec,vec]
dt_2013$Name <- as.character(dt_2013$Name)
```
Now we can run the estimation of variance components:
```{r}
res = mmer(Weight ~ 1,
random = ~vsr(Name, Gu=A),
rcov = ~units,
nIters = 10,
data = dt_2013)
```
The **variance components**:
```{r varcomp}
summary(res)$varcomp
```
And the **estimated heritability** ($h^2$):
```{r}
vpredict(res, h2 ~ (V1)/(V1+V2))
```
### Dog data
We now use real data from a dog population (Braque Français):
```{r}
phenotypes = fread(file.path(base_folder, "data/pheno.dat"))
```
```{r}
ped <- fread(file.path(base_folder, "data/pedigree.txt"))
animals <- rep(TRUE,nrow(ped))
makeA(ped, animals)
A <- read.table("A.txt",header=FALSE)
```
We need to transform the kinship matrix from the long format to a symmetric matrix:
```{r}
triA <- A |>
spread(key = "V2", value = "V3") |>
dplyr::select(-V1)
tA = t(triA)
triA[upper.tri(triA)] <- tA[upper.tri(tA)]
```
Now we ensure correspondence between the phenotype data and the kinship matrix:
```{r example}
vec <- ped$dog_id %in% phenotypes$id
A <- as.matrix(triA[vec,vec])
colnames(A) <- phenotypes$id
rownames(A) <- phenotypes$id
```
And we are ready to go to estimate heritability for one of the traits in the phenotypic dataset (we pick *game*):
```{r}
res = mmer(game ~ 1,
random = ~vsr(id, Gu=A),
rcov = ~units,
nIters = 10,
data = phenotypes)
```
The **variance components**:
```{r varcomp}
summary(res)$varcomp
```
And the estimated heritability:
```{r}
vpredict(res, h2 ~ (V1)/(V1+V2))
```
### Exercise
Now it's your turn to pick another trait from the file of phenotypes and estimate the heritability:
```{r}
## your code here
```