One of the more typical workflows I need to perform is calculating counts within groups. The dplyr group_by() %>% summarise()
pipeline (or just count()
) can easily get aggregated tallies. But occasionally it’s helpful to get count within group along with the proportion overall.
Here’s one way to do that:
library(dplyr)
# use the starwars dataset
# one row per character
# can be grouped by character species
starwars
## # A tibble: 87 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke… 172 77 blond fair blue 19 male
## 2 C-3PO 167 75 <NA> gold yellow 112 <NA>
## 3 R2-D2 96 32 <NA> white, bl… red 33 <NA>
## 4 Dart… 202 136 none white yellow 41.9 male
## 5 Leia… 150 49 brown light brown 19 female
## 6 Owen… 178 120 brown, gr… light blue 52 male
## 7 Beru… 165 75 brown light blue 47 female
## 8 R5-D4 97 32 <NA> white, red red NA <NA>
## 9 Bigg… 183 84 black light brown 24 male
## 10 Obi-… 182 77 auburn, w… fair blue-gray 57 male
## # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
starwars %>%
# get the count in category with count()
count(species, name = "species_n", sort = TRUE) %>%
# add a column that gives you total of all ...
# calculate proportion ...
# format n (%)
# keep in mind you won't be able to sort after formatting as character
mutate(n = sum(species_n),
prop = round(species_n/n * 100, digits = 0),
prop_with_n = paste0(species_n, " (", prop, "%)")) %>%
# return only columns of interest
select(species, prop_with_n)
## # A tibble: 38 x 2
## species prop_with_n
## <chr> <chr>
## 1 Human 35 (40%)
## 2 Droid 5 (6%)
## 3 <NA> 5 (6%)
## 4 Gungan 3 (3%)
## 5 Kaminoan 2 (2%)
## 6 Mirialan 2 (2%)
## 7 Twi'lek 2 (2%)
## 8 Wookiee 2 (2%)
## 9 Zabrak 2 (2%)
## 10 Aleena 1 (1%)
## # … with 28 more rows