[We'd like to thank Phish nerd extraordinaire, Maya Gans (@WindoraBug on .net, @mayacelium on Twitter), for writing this post and sharing the phishr library that she wrote with Sam Levin (@levisc8 on .net, @SamLevin5 on Twitter) with the community - ed.]
When I tell people I meet outside of the scene that I’m a Phish fan, it’s always met with a certain look - you know the one. But this always makes me laugh, because one of the reasons I love Phish so much is that they provide one of the richest data sets to adoring fans. I love when folks who say they hate math or statistics end up rattling off their most-seen songs and the largest song gaps they need to close, or giving feedback on graphs I put up on Twitter.
Phish fans love data, and for that reason Sam Levin and I created the R package phishr. You can request an API key here, and our package has a handful of functions that do the heavy lifting.
# load the libraries
library(phishr)
library(purrr)
library(dplyr)
library(ggplot2)
We can use the setlist function, pn_get_setlist, which takes two arguments: your API key (I’ve saved mine to a string called my_apikey) and the date of the show whose setlist you’d like.
Looking at the most recent setlist:
pn_get_setlist(my_apikey, "2020-02-23")
##    Set    Song                  Segue
##  1 Set 1  Crowd Control         ,
##  2 Set 1  Farmhouse             ,
##  3 Set 1  Breath and Burning    ,
##  4 Set 1  Divided Sky           ,
##  5 Set 1  Meat                  ,
##  6 Set 1  Everything's Right    ,
##  7 Set 1  The Squirming Coil    ,
##  8 Set 1  Wingsuit              ,
##  9 Set 1  David Bowie
## 10 Set 2  Simple                >
## 11 Set 2  Golden Age            >
## 12 Set 2  Fuego                 >
## 13 Set 2  Undermind             ->
## 14 Set 2  Back on the Train     >
## 15 Set 2  Passing Through
## 16 Encore Rise/Come Together    >
## 17 Encore The Horse             >
## 18 Encore Silent in the Morning ,
## 19 Encore Fee                   ,
## 20 Encore Funky Bitch           ,
## 21 Encore More
If you want to select multiple shows, you can use a map function to apply the setlist function to each show date:
First, we’ll create a vector of the show dates we want setlists for. Then we can map the setlist function over each of the show dates. Finally, we need to add a column that corresponds to the Date of each show.
Mexico <- c("2020-02-20", "2020-02-21", "2020-02-22", "2020-02-23")
setlists <- map(Mexico, ~pn_get_setlist(apikey = my_apikey, .x))
Phexico <- map2_dfr(setlists, Mexico, ~mutate(.x, Date = .y))
Phexico
##    Set    Song                          Segue Date
##  1 Set 1  Torn and Frayed               >     2020-02-20
##  2 Set 1  Ghost                         >     2020-02-20
##  3 Set 1  Free                          >     2020-02-20
##  4 Set 1  Shipwreck                     ->    2020-02-20
##  5 Set 1  Free                          ,     2020-02-20
##  6 Set 1  Shake Your Coconuts           ,     2020-02-20
##  7 Set 1  Victim                        ,     2020-02-20
##  8 Set 1  The Moma Dance                >     2020-02-20
##  9 Set 1  Gotta Jibboo                  ,     2020-02-20
## 10 Set 1  Shade                         ,     2020-02-20
## 11 Set 1  The Landlady                  >     2020-02-20
## 12 Set 1  Destiny Unbound               ,     2020-02-20
## 13 Set 1  Steam                         >     2020-02-20
## 14 Set 1  Crosseyed and Painless        >     2020-02-20
## 15 Set 1  Run Like an Antelope          >     2020-02-20
## 16 Set 1  Cavern                        >     2020-02-20
## 17 Set 1  Beneath a Sea of Stars Part 1 >     2020-02-20
## 18 Set 1  Say It To Me S.A.N.T.O.S.           2020-02-20
## 19 Encore You Enjoy Myself                    2020-02-20
## 20 Set 1  Turtle in the Clouds          ,     2020-02-21
## 21 Set 1  Shafty                        ->    2020-02-21
## 22 Set 1  Plasma                        ->    2020-02-21
## 23 Set 1  Shafty                        ->    2020-02-21
## 24 Set 1  Plasma                        >     2020-02-21
## 25 Set 1  The Lizards                   ,     2020-02-21
## 26 Set 1  Bathtub Gin                   ->    2020-02-21
## 27 Set 1  Shafty                        ->    2020-02-21
## 28 Set 1  Bathtub Gin                   >     2020-02-21
## 29 Set 1  Blaze On                      ,     2020-02-21
## 30 Set 1  Sea and Sand                  ,     2020-02-21
## 31 Set 1  Possum                              2020-02-21
## 32 Set 2  Sigma Oasis                   >     2020-02-21
## 33 Set 2  Also Sprach Zarathustra       >     2020-02-21
## 34 Set 2  Drift While You're Sleeping   ,     2020-02-21
## 35 Set 2  Lifeboy                       ,     2020-02-21
## 36 Set 2  I Always Wanted It This Way   ->    2020-02-21
## 37 Set 2  No Men In No Man's Land       ->    2020-02-21
## 38 Set 2  Piper                         >     2020-02-21
## 39 Set 2  Good Times Bad Times                2020-02-21
## 40 Encore Sand                          ->    2020-02-21
## 41 Encore Weekapaug Groove              ->    2020-02-21
## 42 Encore Shafty                              2020-02-21
## 43 Set 1  Willin'                       ,     2020-02-22
## 44 Set 1  Tube                          ,     2020-02-22
## 45 Set 1  Evening Song                  ,     2020-02-22
## 46 Set 1  Set Your Soul Free            ,     2020-02-22
## 47 Set 1  You Sexy Thing                >     2020-02-22
## 48 Set 1  46 Days                       ,     2020-02-22
## 49 Set 1  Waste                         >     2020-02-22
## 50 Set 1  Your Pet Cat                  ,     2020-02-22
## 51 Set 1  Tweezer                       >     2020-02-22
## 52 Set 1  Manteca                       >     2020-02-22
## 53 Set 1  Makisupa Policeman            >     2020-02-22
## 54 Set 1  Twist                               2020-02-22
## 55 Set 2  Energy                        >     2020-02-22
## 56 Set 2  Soul Planet                   >     2020-02-22
## 57 Set 2  Waves                         >     2020-02-22
## 58 Set 2  Carini                        >     2020-02-22
## 59 Set 2  Chalk Dust Torture            ->    2020-02-22
## 60 Set 2  Have Mercy                    >     2020-02-22
## 61 Set 2  A Life Beyond The Dream       ,     2020-02-22
## 62 Set 2  Harry Hood                          2020-02-22
## 63 Encore Sweet Jane                    >     2020-02-22
## 64 Encore Tweezer Reprise                     2020-02-22
## 65 Set 1  Crowd Control                 ,     2020-02-23
## 66 Set 1  Farmhouse                     ,     2020-02-23
## 67 Set 1  Breath and Burning            ,     2020-02-23
## 68 Set 1  Divided Sky                   ,     2020-02-23
## 69 Set 1  Meat                          ,     2020-02-23
## 70 Set 1  Everything's Right            ,     2020-02-23
## 71 Set 1  The Squirming Coil            ,     2020-02-23
## 72 Set 1  Wingsuit                      ,     2020-02-23
## 73 Set 1  David Bowie                         2020-02-23
## 74 Set 2  Simple                        >     2020-02-23
## 75 Set 2  Golden Age                    >     2020-02-23
## 76 Set 2  Fuego                         >     2020-02-23
## 77 Set 2  Undermind                     ->    2020-02-23
## 78 Set 2  Back on the Train             >     2020-02-23
## 79 Set 2  Passing Through                     2020-02-23
## 80 Encore Rise/Come Together            >     2020-02-23
## 81 Encore The Horse                     >     2020-02-23
## 82 Encore Silent in the Morning         ,     2020-02-23
## 83 Encore Fee                           ,     2020-02-23
## 84 Encore Funky Bitch                   ,     2020-02-23
## 85 Encore More                                2020-02-23
And now we have data to look at some numbers! How about the number of songs per set per show?
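(
  song_counts <- Phexico %>%
    # order the sets so they appear as Set 1, Set 2, Encore
    mutate(Set = factor(Set, levels = c("Set 1", "Set 2", "Encore"))) %>%
    # one row per set per show, with the song count in column n
    group_by(Set, Date) %>%
    count()
)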
And we can visualize that:
ggplot(song_counts, aes(x = Set, y = n, fill = Set)) +
  geom_col() +
  facet_wrap(~ Date) +
  theme_bw()
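If you don't have an API key yet, the combine-and-count steps above can still be tried offline. The following is a minimal base R sketch, not phishr code: fake_setlist is a hypothetical stand-in for pn_get_setlist, and the songs it returns are made-up placeholders. It spells out roughly what map, mutate, and the row-binding in map2_dfr are doing.

```r
# Hypothetical stand-in for pn_get_setlist(): returns a made-up setlist
# so the example runs without an API key. Not part of phishr.
fake_setlist <- function(show_date) {
  data.frame(
    Set = c("Set 1", "Set 1", "Set 2", "Encore"),
    Song = paste("Placeholder song", 1:4),
    stringsAsFactors = FALSE
  )
}

show_dates <- c("2020-02-20", "2020-02-21")

# Step 1: one data frame per show date (the job map() does in the post)
setlists <- lapply(show_dates, fake_setlist)

# Step 2: add each show's date as a column (the mutate() inside map2_dfr())
with_dates <- Map(function(setlist, show_date) {
  setlist$Date <- show_date
  setlist
}, setlists, show_dates)

# Step 3: stack the per-show data frames into one (the "_dfr" part)
combined <- do.call(rbind, with_dates)

# Count songs per set per show
song_counts <- as.data.frame(
  table(Set = combined$Set, Date = combined$Date),
  responseName = "n"
)
song_counts
```

Swapping fake_setlist for a real call such as function(d) pn_get_setlist(my_apikey, d) should give the same shape of result, assuming the function returns a data frame as shown in the output earlier in the post.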
We can also use the function pn_get_show_notes for each show and scrape the notes for text data or teases:
notes <- map(Mexico, ~pn_get_show_notes(apikey = my_apikey, .x)) %>% unlist()
names(notes) <- Mexico
# show the first show's note
notes[1]
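One simple way to look for teases in the notes is a keyword search. This is only a sketch under assumptions: example_notes holds made-up placeholder text standing in for whatever pn_get_show_notes returns, and matching on the word "tease" is a rough heuristic rather than a reliable tease detector.

```r
# Made-up placeholder notes; real text would come from pn_get_show_notes().
# The vector is named by show date, like `notes` in the post.
example_notes <- c(
  "2020-02-20" = "Placeholder note. Song A contained a Song B tease.",
  "2020-02-21" = "Placeholder note with no matching keyword."
)

# TRUE for each note that mentions the keyword, ignoring case
has_tease <- grepl("tease", example_notes, ignore.case = TRUE)

# Dates of the shows whose notes mention a tease
tease_dates <- names(example_notes)[has_tease]
tease_dates
```

With the real notes vector, the same grepl call could be applied to notes directly.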
Lastly we can look at ratings:
(
ratings <- map(Mexico, ~pn_get_show_rating(apikey = my_apikey, .x)) %>% unlist()
)
Is the number of songs per show a good predictor of its rating?
# count the number of songs in each show
counts_per_show <- Phexico %>%
group_by(Date) %>%
count()
# add the ratings to the data
counts_per_show$rating <- ratings
counts_per_show
## # A tibble: 4 x 3
## # Groups:   Date [4]
##   Date           n rating
##
## 1 2020-02-20    19   3.46
## 2 2020-02-21    23   3.86
## 3 2020-02-22    22   4.04
## 4 2020-02-23    21   3.55
Run a simple linear model:
model <- lm(rating ~ n, data = counts_per_show)
x <- summary(model)
pf(x$fstatistic[1L], df1 = x$fstatistic[2L], df2 = x$fstatistic[3L], lower.tail = FALSE)
Nope. Not significant at all. BUT! I hope this is a fun introduction to a package that inspires you to look at this rich data set you love, test your hypotheses, and create visualizations! Feel free to reach out to me or Sam on our GitHub account.
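For anyone who wants to re-run the model without an API key, here is a minimal sketch that rebuilds counts_per_show from the four rows printed in the table above. It also reads the slope's p-value from the coefficient table, which for a model with a single predictor matches the overall F-test p-value. The variable names p_f and p_slope are my own, and with only four shows the test has very little power either way.

```r
# Values copied from the counts_per_show table printed in the post
counts_per_show <- data.frame(
  Date = c("2020-02-20", "2020-02-21", "2020-02-22", "2020-02-23"),
  n = c(19, 23, 22, 21),
  rating = c(3.46, 3.86, 4.04, 3.55)
)

model <- lm(rating ~ n, data = counts_per_show)
model_summary <- summary(model)

# p-value of the overall F-test, computed the same way as in the post
f <- model_summary$fstatistic
p_f <- pf(f[["value"]], df1 = f[["numdf"]], df2 = f[["dendf"]],
          lower.tail = FALSE)

# p-value of the slope's t-test, read from the coefficient table
p_slope <- coef(model_summary)["n", "Pr(>|t|)"]

c(f_test = p_f, slope_t_test = p_slope)
```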
If you liked this blog post, one way you could "like" it is to make a donation to The Mockingbird Foundation, the sponsor of Phish.net. Support music education for children, and you just might change the world.
Phish.net is a non-commercial project run by Phish fans and for Phish fans under the auspices of the all-volunteer, non-profit Mockingbird Foundation.
This project serves to compile, preserve, and protect encyclopedic information about Phish and their music.
The Mockingbird Foundation is a non-profit organization founded by Phish fans in 1996 to generate charitable proceeds from the Phish community.
And since we're entirely volunteer – with no office, salaries, or paid staff – administrative costs are less than 2% of revenues! So far, we've distributed over $2 million to support music education for children – hundreds of grants in all 50 states, with more on the way.
Nice job!
I've got some light background on R, but need to get a lot more experienced with it for work and for a predictive analytics actuarial exam I'm taking in June. Since it'll probably get postponed, I've been dragging my feet on studying for it lately, but motivation-wise this is a game-changer!
https://www.zdnet.com/article/pluralsight-makes-entire-library-of-courses-free-for-april/
I've been wanting to play with setlist data in a graph database for a little while now...
The roots of that effort were modeled on previous efforts, a series of "Deadbase" books (useful in the taper/tape-trading era), but with the notion that rather than being a cottage industry of a few Dead fans who sought to profit from their work, we would organize into a "crowd sourced" collective and donate any profits to charity.
There are way too many people to credit for this effort, but the main force behind the coding of the modern (post-reunion) Phish.net site and the API which allows you to access and massage this data is the Foundation's current President, Adam Scheinberg. (A full list of site team credits with many of the people involved currently is accessible in the footer of this site "credits", a more complete list in the acknowledgements of "The Phish Companion, 3rd Ed.").