This is a study note on using the purrr package for list manipulation, including mapping over several inputs in parallel. For more details on the study material see:
library(purrr) # functional programming tools: map() and friends
library(repurrrsive) # provides examples of lists. We explore them below, to lay the groundwork for other lessons, and to demonstrate list inspection strategies
library(listviewer) # expose list exploration in a rendered .Rmd document
library(jsonlite)
library(dplyr)
library(tibble)
purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. If you’ve never heard of FP before, the best place to start is the family of map() functions, which allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the map() functions is the iteration chapter in R for Data Science. (source: purrr Overview)
listviewer::jsonedit(gh_users, mode = "view")
Lists are built on top of vectors, whereas data frames and tibbles are built on top of lists. Therefore, compared to a list, a vector is lower-level, and data frames and tibbles are higher-level. The following is an example of the data object architecture:
Lists are a step up in complexity from atomic vectors: each element can be any type, not just vectors. You construct lists by using list() instead of c():
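As a minimal sketch of that difference (the object names `atomic` and `mixed` are made up for illustration), compare what c() and list() do with inputs of mixed type:

```r
# c() builds an atomic vector: mixed inputs are coerced to one common type
atomic <- c(1, "a", TRUE)
class(atomic)

# list() keeps each element's own type, and an element may itself be a list
mixed <- list(1, "a", TRUE, list(x = 1:3))
sapply(mixed, class)
```

The atomic vector ends up entirely character, while the list keeps a numeric, a character, a logical and a nested list side by side.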
The following function can be used to explore a list:
str(..., list.len = x, max.level = y)
str(got_chars, list.len = 3, max.level = 1)
## List of 30
## $ :List of 18
## $ :List of 18
## $ :List of 18
## [list output truncated]
str(got_chars, list.len = 3, max.level = 2)
## List of 30
## $ :List of 18
## ..$ url : chr "https://www.anapioficeandfire.com/api/characters/1022"
## ..$ id : int 1022
## ..$ name : chr "Theon Greyjoy"
## .. [list output truncated]
## $ :List of 18
## ..$ url : chr "https://www.anapioficeandfire.com/api/characters/1052"
## ..$ id : int 1052
## ..$ name : chr "Tyrion Lannister"
## .. [list output truncated]
## $ :List of 18
## ..$ url : chr "https://www.anapioficeandfire.com/api/characters/1074"
## ..$ id : int 1074
## ..$ name : chr "Victarion Greyjoy"
## .. [list output truncated]
## [list output truncated]
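To try the two arguments without loading repurrrsive, here is a small sketch on a hand-made nested list (`nested` is a hypothetical object, not part of the study material). In str(), max.level limits how deep the nesting is displayed and list.len limits how many list elements are shown per level:

```r
# A hypothetical two-record list, loosely shaped like got_chars
nested <- list(
  list(id = 1L, name = "a", tags = c("x", "y")),
  list(id = 2L, name = "b", tags = character(0))
)

# Top level only: how many records there are and how long each one is
str(nested, max.level = 1)

# One level deeper, but show at most two fields per record
str(nested, list.len = 2, max.level = 2)
```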
The natural reflex as a programmer may be to loop over all values of the vector and apply the function, but vectorization makes that unnecessary. (source: dummies)
vec <- c(9, 16, 25) # make a vector
# a generic for loop
output <- numeric(length(vec)) # preallocate the result
for (i in seq_along(vec)) {
output[i] <- sqrt(vec[i])
}
output
## [1] 3 4 5
# Vectorized operations
sqrt(vec)
## [1] 3 4 5
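Vectorization covers atomic vectors, but a function such as length() is not applied element by element when the input is a list. That is where the map() family takes over the role of the loop. A small sketch, assuming only that purrr is installed (`vec` is the vector from above, `pieces` is a made-up list):

```r
library(purrr)

vec <- c(9, 16, 25)

# For a vectorized function, map_dbl() gives the same answer as the direct call
identical(sqrt(vec), map_dbl(vec, sqrt))

# length() on a list counts its elements; map_int() asks each element instead
pieces <- list(1:3, letters, NULL)
length(pieces)
lens <- map_int(pieces, length)
lens
```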
purrr::map() is a function for applying a function to each element of a list or atomic vector. The closest base R function is lapply(). A template for basic map() usage:
map(YOUR_LIST, YOUR_FUNCTION)
If you expect map() to return output that can be turned into an atomic vector, it is best to use a type-specific variant. purrr will also alert you to any problems, i.e. if one or more inputs has the wrong type or length.
map_lgl(): returns a logical vector
map_chr(): returns a character vector
map_int(): returns an integer vector
map_dbl(): returns a double vector
Extract information by variable name and index.
map_chr(got_chars[9:12], "name")
## [1] "Daenerys Targaryen" "Davos Seaworth" "Arya Stark"
## [4] "Arys Oakheart"
map_chr(got_chars[13:16], 3)
## [1] "Asha Greyjoy" "Barristan Selmy" "Varamyr" "Brandon Stark"
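The type-specific variants can be tried on a small hand-made list as well. This is only a sketch: `people` and its fields are hypothetical and not part of repurrrsive. The last call shows the kind of alert mentioned above, caught with tryCatch() so the script keeps running:

```r
library(purrr)

# Hypothetical toy records
people <- list(
  list(name = "Ann", age = 31L, height = 1.70, member = TRUE),
  list(name = "Bob", age = 45L, height = 1.82, member = FALSE)
)

map_chr(people, "name")    # character vector, extracted by name
map_int(people, "age")     # integer vector
map_dbl(people, "height")  # double vector
map_lgl(people, "member")  # logical vector
map_chr(people, 1)         # extraction by position: first field of each record

# Asking for integers where the data are character is reported as an error
result <- tryCatch(map_int(people, "name"), error = function(e) "error")
result
```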
Without an atomic output type specified, map() returns a list by default. To extract elements from each component, either 1) give .f a position or a name, or 2) pass `[` as .f along with the positions or names to keep:
map(gh_users, 1)
## [[1]]
## [1] "gaborcsardi"
##
## [[2]]
## [1] "jennybc"
##
## [[3]]
## [1] "jtleek"
##
## [[4]]
## [1] "juliasilge"
##
## [[5]]
## [1] "leeper"
##
## [[6]]
## [1] "masalmon"
map(gh_users, "login")
## [[1]]
## [1] "gaborcsardi"
##
## [[2]]
## [1] "jennybc"
##
## [[3]]
## [1] "jtleek"
##
## [[4]]
## [1] "juliasilge"
##
## [[5]]
## [1] "leeper"
##
## [[6]]
## [1] "masalmon"
x1 <- map(gh_users, `[`, c(18,1,2,21))
listviewer::jsonedit(x1, mode = "view")
x2 <- map(gh_users, `[`, c("name", "login", "id", "location"))
listviewer::jsonedit(x2, mode = "view")
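The two extraction styles return different shapes, which is easier to see without the interactive viewer. A minimal sketch on made-up records (`recs` is hypothetical):

```r
library(purrr)

recs <- list(
  list(login = "u1", id = 1L, name = "User One"),
  list(login = "u2", id = 2L, name = "User Two")
)

# A name (or position) as .f pulls out one field per record, unwrapped
by_name <- map(recs, "login")

# `[` as .f keeps a sub-list per record, with the requested fields and their names
subset <- map(recs, `[`, c("login", "id"))

str(by_name)
str(subset)
```

In other words, the shortcut behaves like `[[` on each record, while passing `[` keeps the list structure, which is what map_dfr() needs in order to build one row per record.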
map_dfr() specifies the output to be a data frame, which is the perfect data structure for a list with multiple variables.
map_dfr(gh_users, `[`, c("name", "login", "id", "location"))
## # A tibble: 6 x 4
## name login id location
## <chr> <chr> <int> <chr>
## 1 Gábor Csárdi gaborcsardi 660288 Chippenham, UK
## 2 Jennifer (Jenny) Bryan jennybc 599454 Vancouver, BC, Canada
## 3 Jeff L. jtleek 1571674 Baltimore,MD
## 4 Julia Silge juliasilge 12505835 Salt Lake City, UT
## 5 Thomas J. Leeper leeper 3505428 London, United Kingdom
## 6 Maëlle Salmon masalmon 8360597 Barcelona, Spain
Notice how the variables have been automatically type converted. It’s a beautiful thing. Until it’s not. When programming, it is safer, but more cumbersome, to explicitly specify type and build your data frame the usual way.
gh_users %>% {
tibble(
login = map_chr(., "login"),
name = map_chr(., "name"),
id = map_int(., "id"),
location = map_chr(., "location")
)
}
## # A tibble: 6 x 4
## login name id location
## <chr> <chr> <int> <chr>
## 1 gaborcsardi Gábor Csárdi 660288 Chippenham, UK
## 2 jennybc Jennifer (Jenny) Bryan 599454 Vancouver, BC, Canada
## 3 jtleek Jeff L. 1571674 Baltimore,MD
## 4 juliasilge Julia Silge 12505835 Salt Lake City, UT
## 5 leeper Thomas J. Leeper 3505428 London, United Kingdom
## 6 masalmon Maëlle Salmon 8360597 Barcelona, Spain
listviewer::jsonedit(gh_repos, mode = "view")
# prepare data
unames <- map_chr(gh_repos, c(1, 4, 1))
udf <- gh_repos %>%
set_names(unames) %>%
enframe("username", "gh_repos")
udf
## # A tibble: 6 x 2
## username gh_repos
## <chr> <list>
## 1 gaborcsardi <list [30]>
## 2 jennybc <list [30]>
## 3 jtleek <list [30]>
## 4 juliasilge <list [26]>
## 5 leeper <list [30]>
## 6 masalmon <list [30]>
This shows that we know how to operate on a list-column inside a tibble:
udf %>% mutate(n_repos = map_int(gh_repos, length))
## # A tibble: 6 x 3
## username gh_repos n_repos
## <chr> <list> <int>
## 1 gaborcsardi <list [30]> 30
## 2 jennybc <list [30]> 30
## 3 jtleek <list [30]> 30
## 4 juliasilge <list [26]> 26
## 5 leeper <list [30]> 30
## 6 masalmon <list [30]> 30
The data frame udf has 6 rows, each holding one user's list of repositories in gh_repos. For a single user, we do the following operation:
one_user <- udf$gh_repos[[1]]
map_df(one_user, `[`, c("name", "fork", "open_issues"))
## # A tibble: 30 x 3
## name fork open_issues
## <chr> <lgl> <int>
## 1 after FALSE 0
## 2 argufy FALSE 6
## 3 ask FALSE 4
## 4 baseimports FALSE 0
## 5 citest TRUE 0
## 6 clisymbols FALSE 0
## 7 cmaker TRUE 0
## 8 cmark TRUE 0
## 9 conditions TRUE 0
## 10 crayon FALSE 7
## # … with 20 more rows
To apply the above one-user operation to every row in the data frame, we use mutate() with a map() inside a map():
The outer map() iterates over the elements of the gh_repos list-column in the data frame udf, one element per user.
The inner map_df() iterates over one user's repositories, extracts name, fork and open_issues from each, and row-binds the results into a tibble.
udf %>%
mutate(repo_info = gh_repos %>%
map(. %>% map_df(`[`, c("name", "fork", "open_issues"))))
## # A tibble: 6 x 3
## username gh_repos repo_info
## <chr> <list> <list>
## 1 gaborcsardi <list [30]> <tibble [30 × 3]>
## 2 jennybc <list [30]> <tibble [30 × 3]>
## 3 jtleek <list [30]> <tibble [30 × 3]>
## 4 juliasilge <list [26]> <tibble [26 × 3]>
## 5 leeper <list [30]> <tibble [30 × 3]>
## 6 masalmon <list [30]> <tibble [30 × 3]>
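The same nested pattern can be sketched on a tiny hand-made tibble, which makes it easier to see what each level of mapping does. Everything here (`toy`, `repos`, `stars`) is hypothetical, and the inner step builds the tibble explicitly with typed map_*() calls rather than with map_df():

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical data: two users, each with a list of repo records
toy <- tibble(
  username = c("u1", "u2"),
  repos = list(
    list(list(name = "a", stars = 3L), list(name = "b", stars = 0L)),
    list(list(name = "c", stars = 7L))
  )
)

toy2 <- toy %>%
  mutate(
    # outer map over users, counting each user's repos
    n_repos = map_int(repos, length),
    # outer map over users; inside, typed maps over that user's repos
    repo_info = map(repos, ~ tibble(
      name  = map_chr(.x, "name"),
      stars = map_int(.x, "stars")
    ))
  )

toy2
toy2$repo_info[[1]]
```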
We demonstrate three more ways to specify a general .f:
map(aliases, paste, collapse = " | ")
map(aliases, function(x) paste(x, collapse = " | "))
map(aliases, ~ paste(.x, collapse = " | "))
# prepare data
aliases <- set_names(map(got_chars, "aliases"), map_chr(got_chars, "name"))
aliases <- aliases[c("Theon Greyjoy", "Asha Greyjoy", "Brienne of Tarth")]; aliases
## $`Theon Greyjoy`
## [1] "Prince of Fools" "Theon Turncloak" "Reek" "Theon Kinslayer"
##
## $`Asha Greyjoy`
## [1] "Esgred" "The Kraken's Daughter"
##
## $`Brienne of Tarth`
## [1] "The Maid of Tarth" "Brienne the Beauty" "Brienne the Blue"
Use a pre-existing function. Or, as here, define one ourselves, which gives a nice way to build in our specification for the collapse argument.
my_fun <- function(x) paste(x, collapse = " | ")
map(aliases, my_fun)
## $`Theon Greyjoy`
## [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
##
## $`Asha Greyjoy`
## [1] "Esgred | The Kraken's Daughter"
##
## $`Brienne of Tarth`
## [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
Define an anonymous function on-the-fly, in the conventional way. Here we put our desired value for the collapse argument into the function definition itself.
map(aliases, function(x) paste(x, collapse = " | "))
## $`Theon Greyjoy`
## [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
##
## $`Asha Greyjoy`
## [1] "Esgred | The Kraken's Daughter"
##
## $`Brienne of Tarth`
## [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
Alternatively you can simply name the function and provide collapse via ...
map(aliases, paste, collapse = " | ")
## $`Theon Greyjoy`
## [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
##
## $`Asha Greyjoy`
## [1] "Esgred | The Kraken's Daughter"
##
## $`Brienne of Tarth`
## [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
purrr provides a very concise way to define an anonymous function: as a formula. This should start with the ~ symbol and then look like a typical top-level expression, as you might write in a script. Use .x to refer to the input, i.e. an individual element of the primary vector or list.
map(aliases, ~ paste(.x, collapse = " | "))
## $`Theon Greyjoy`
## [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
##
## $`Asha Greyjoy`
## [1] "Esgred | The Kraken's Daughter"
##
## $`Brienne of Tarth`
## [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
The tibble::enframe() function takes a named vector and promotes the names to a proper variable.
# using a formula anonymous function
map_chr(aliases, ~ paste(.x, collapse = " | ")) %>%
tibble::enframe(value = "aliases")
## # A tibble: 3 x 2
## name aliases
## <chr> <chr>
## 1 Theon Greyjoy Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer
## 2 Asha Greyjoy Esgred | The Kraken's Daughter
## 3 Brienne of Tarth The Maid of Tarth | Brienne the Beauty | Brienne the Blue
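As a quick check that the ways of specifying .f shown above are interchangeable, the sketch below runs all of them on a small made-up list (`words` and `join` are hypothetical names) and compares the results:

```r
library(purrr)

words <- list(a = c("x", "y"), b = "z")

join <- function(x) paste(x, collapse = " | ")

r1 <- map_chr(words, join)                                    # named function
r2 <- map_chr(words, function(x) paste(x, collapse = " | "))  # anonymous function
r3 <- map_chr(words, paste, collapse = " | ")                 # extra argument via ...
r4 <- map_chr(words, ~ paste(.x, collapse = " | "))           # formula shorthand

identical(r1, r2) && identical(r2, r3) && identical(r3, r4)

# The names of the input list carry through, so enframe() can use them
joined <- tibble::enframe(r4, value = "joined")
joined
```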
map2() and pmap() have all the type-specific friends you would expect: map2_chr(), map2_lgl(), etc.
# prepare data
nms <- got_chars %>%
map_chr("name")
birth <- got_chars %>%
map_chr("born")
map2(.x, .y, .f, ...)
map2(INPUT_ONE, INPUT_TWO, FUNCTION_TO_APPLY, OPTIONAL_OTHER_STUFF)
map2_chr(nms, birth, function(x, y) paste(x, "was born", y)) %>% head()
## [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke"
## [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock"
## [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke"
## [4] "Will was born "
## [5] "Areo Hotah was born In 257 AC or before, at Norvos"
## [6] "Chett was born At Hag's Mire"
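A smaller sketch of map2() on made-up vectors (`who` and `city` are hypothetical). It uses the formula shorthand, where .x and .y refer to the current elements of the first and second input. The last call assumes the usual purrr behaviour that a length-1 input is recycled against the longer one:

```r
library(purrr)

who  <- c("Ann", "Bob")
city <- c("Oslo", "Lima")

# Walk both vectors in parallel: first with first, second with second
lives <- map2_chr(who, city, ~ paste(.x, "lives in", .y))
lives

# A length-1 second input is reused for every element of the first
shifted <- map2_dbl(1:3, 10, `+`)
shifted
```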
pmap(.l, .f, ...)
pmap(LIST_OF_INPUT_LISTS, FUNCTION_TO_APPLY, OPTIONAL_OTHER_STUFF)
df <- got_chars %>% {
tibble::tibble(
name = map_chr(., "name"),
aliases = map(., "aliases"),
allegiances = map(., "allegiances")
)
}
my_fun <- function(name, aliases, allegiances) {
paste(name, "has", length(aliases), "aliases and",
length(allegiances), "allegiances")
}
df %>%
pmap_chr(my_fun) %>%
tail()
## [1] "Kevan Lannister has 1 aliases and 1 allegiances"
## [2] "Melisandre has 5 aliases and 0 allegiances"
## [3] "Merrett Frey has 1 aliases and 1 allegiances"
## [4] "Quentyn Martell has 4 aliases and 1 allegiances"
## [5] "Samwell Tarly has 7 aliases and 1 allegiances"
## [6] "Sansa Stark has 3 aliases and 2 allegiances"
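The example above works because a data frame is a list of columns and pmap() matches the column names to the argument names of the function. A minimal sketch of that matching, on a made-up list of inputs (`args` and `describe` are hypothetical):

```r
library(purrr)

args <- list(
  name = c("Ann", "Bob"),
  n    = c(2L, 3L),
  sep  = c("-", "+")
)

# Argument names match the names of the input list
describe <- function(name, n, sep) paste(rep(name, n), collapse = sep)

in_order  <- pmap_chr(args, describe)
reordered <- pmap_chr(args[c("sep", "name", "n")], describe)

in_order
identical(in_order, reordered)
```

Because the inputs are matched by name, reordering the list elements does not change the result. With unnamed inputs, matching would fall back to position.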