Skip to content

Commit fc917cf

Browse files
committed
Organizing the overhead slides
1 parent 62b7a79 commit fc917cf

File tree

2 files changed

+174
-198
lines changed

2 files changed

+174
-198
lines changed

index.html

+128-163
Large diffs are not rendered by default.

index.qmd

+46-35
Original file line numberDiff line numberDiff line change
@@ -574,17 +574,7 @@ X[1:4, 1:5]
574574
y[1:6]
575575
```
576576

577-
## Ex 4: Overhead cost (cont.)
578577

579-
- Naive approach: run regressions and return the **full** output.
580-
581-
- Problem: The `lm()` function returns a lot of information. Recovering all that information is costly:
582-
583-
```{r}
584-
#| echo: true
585-
#| label: overhead-cost-lm
586-
lm(y ~ X[,1]) |> str()
587-
```
588578

589579
## Ex 4: Overhead cost - Naive
590580

@@ -593,58 +583,79 @@ Let's start with the naive approach: fitting the model and returning the full ou
593583
```{r}
594584
#| label: overhead-cost-run
595585
#| echo: true
586+
#| cache: true
596587
library(parallel)
597-
cost_serial <- system.time(apply(X, 2, function(x, y) lm(y ~ x), y = y))
588+
cost_serial <- system.time(lapply(1:ncol(X), function(i) lm(y ~ X[,i])))
598589
599590
# Running the benchmark
600591
cl <- makePSOCKcluster(4)
601-
cost_pll <- system.time(parApply(cl, X, 2, function(x, y) lm(y ~ x), y = y))
592+
clusterExport(cl, c("X", "y"))
593+
cost_pll <- system.time(parLapply(cl, 1:ncol(X), function(i) lm(y ~ X[,i])))
602594
603595
# Stopping the cluster
604596
stopCluster(cl)
597+
```
605598

606-
c(Serial = cost_serial["elapsed"], Parallel = cost_pll["elapsed"])
599+
```{r}
600+
#| label: overhead-cost-output-table-naive
601+
#| echo: false
602+
data.frame(
603+
Serial = cost_serial["elapsed"],
604+
`Parallel naive` = cost_pll["elapsed"],
605+
row.names = "Elapsed time (s)",
606+
check.names = FALSE
607+
) |> t() |> knitr::kable()
607608
```
608609

610+
The problem: we are returning a lot of information that we may not need:
611+
612+
```{r}
613+
#| label: overhead-cost-lm-output
614+
#| echo: true
615+
# Approximate size of the output of apply/parApply
616+
format(ncol(X) * object.size(lm(y ~ X[,1])), units="GB")
617+
```
609618

610619
---
611620

621+
## Ex 4: Overhead cost - Less receiving
622+
623+
Instead of capturing the full output, we can just return the coefficients.
612624

613625
```{r}
614626
#| label: overhead-cost-coef-only
615627
#| echo: true
616-
617628
cl <- makePSOCKcluster(4)
618-
cost_pll_coef <- system.time(parApply(cl, X, 2, function(x, y) coef(lm(y ~ x)), y = y))
629+
clusterExport(cl, c("X", "y"))
630+
cost_pll_coef <- system.time(
631+
parLapply(cl, 1:ncol(X), function(i) coef(lm(y ~ X[,i])))
632+
)
619633
620634
# Stopping the cluster
621635
stopCluster(cl)
622636
```
623637

624-
---
625-
626-
```{r}
627-
#| label: overhead-cost-fork
628-
#| echo: true
629-
cost_pll_fork <- system.time({
630-
mclapply(1:ncol(X), function(j) coef(lm(y ~ X[,j])), mc.cores = 4)
631-
})
632-
```
633-
634638
```{r}
635-
#| label: overhead-cost-stop
639+
#| label: overhead-cost-output-table-coef
640+
#| echo: false
636641
data.frame(
637-
Type = c(
638-
"Serial", "Parallel", "Parallel (coef only)",
639-
"Parallel fork (coef only)"
640-
),
641-
Elapsed = c(
642-
cost_serial[3], cost_pll[3], cost_pll_coef[3],
643-
cost_pll_fork[3]
644-
)
645-
) |> t() |> knitr::kable()
642+
Serial = cost_serial["elapsed"],
643+
`Parallel naive` = cost_pll["elapsed"],
644+
`Parallel coef` = cost_pll_coef["elapsed"],
645+
row.names = "Elapsed time (s)",
646+
check.names = FALSE
647+
) |> t() |> knitr::kable()
646648
```
647649

650+
The coefficients are much smaller, significantly reducing the overhead cost to about `r format(ncol(X) * object.size(coef(lm(y ~ X[,1]))[1]), units="MB")`.
651+
652+
::: {.callout-tip title="Pro-tip"}
653+
Using a Fork cluster instead of a PSOCK cluster can further reduce the overhead cost. Both `X` and `y` would have been automatically available in the Fork cluster at 0 cost.
654+
:::
655+
656+
657+
658+
648659
## {style="text-align:center!important;"}
649660

650661
```{r thanks, out.width="300px", echo=FALSE}

0 commit comments

Comments
 (0)