Friday 10/25/2013 by Lemuria

A PATTERN IN PHISH'S PREDICTABILITY

Of the many elements of the Phish.net feature set, one that often catches my curiosity is Trey's Notebook. It identifies songs most likely to be played at each show, given songs played in the previous year but not the previous three shows.

For upcoming shows, it's an algorithmic prediction ("Here's what you might expect to hear...") that often works remarkably well, such as predicting 68% of the 22 songs played three nights ago in Rochester. But for previous shows, focus on those percentages themselves rather than the list of songs, and Trey's Notebook becomes a measure of the extent to which Phish's setlists are predictable.
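The rule described above can be sketched in a few lines. This is only an illustration of the rule as worded in this post, not the site's actual code: the function names, the `(date, songs)` data layout, and the 365-day reading of "previous year" are all assumptions, and the sketch does not rank or cap the list the way the real Notebook appears to.

```python
from datetime import date, timedelta

def notebook_candidates(shows, upcoming, window_days=365, skip_last=3):
    """Songs played in the window before `upcoming` but not at the last few shows.

    shows: list of (date, [song, ...]) tuples sorted oldest first (assumed layout).
    """
    prior = [(d, songs) for d, songs in shows if d < upcoming]
    last_few = prior[-skip_last:] if skip_last else []
    recent = {song for _, songs in last_few for song in songs}
    cutoff = upcoming - timedelta(days=window_days)
    in_window = {song for d, songs in prior if d >= cutoff for song in songs}
    return in_window - recent

def percent_predicted(candidates, setlist):
    """Share of the songs actually played that were on the candidate list."""
    if not setlist:
        return 0.0
    hits = sum(1 for song in setlist if song in candidates)
    return 100.0 * hits / len(setlist)
```

Scoring a past show is then a matter of building the candidates for that show's date and passing them, with the setlist, to `percent_predicted`.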

That varies widely, as this first chart illustrates. A handful of early shows were completely predicted (100%!), but many were predictive #fails (0%). Shows in 1990-93 were generally less predictable than shows before or since, largely as a function of the repertoire expanding during that period. And there's a general pattern, marked here with a fifth-order polynomial trendline, in maroon, though nothing stark. (Note that this scatterplot replaces an earlier, clunkier lineplot.)

Percent of Songs Correctly Predicted by Trey's Notebook

Since predictability also varies by tour, I tried charting tour averages (depicted on a per-show basis, for comparison of both predictability and tour length) and tour-wise moving averages (for each tour, the first show's percentage predicted, then the first two shows' percentages averaged, then the first three, etc.). However, the lengths of tours (particularly as we define them) vary widely, with up to 121 shows in one "tour." And the percent correctly predicted also varies within tours, generally increasing from start to finish, with an average percentage correct of 23.4% across the first shows of every tour but an average of 35.4% across the last shows of every tour.
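For readers who want to reproduce the tour-wise moving average, here is a minimal sketch of the running mean described above (first show, then first two shows averaged, and so on). The function name and the plain-list input are assumptions for illustration.

```python
def running_tour_average(percentages):
    """Running mean of percent-predicted values within one tour, in show order."""
    averages = []
    total = 0.0
    for count, pct in enumerate(percentages, start=1):
        total += pct
        averages.append(total / count)
    return averages
```

Because each tour restarts the running mean, short tours yield only a few noisy points, which is one reason the tour lengths matter here.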

So, this final chart averages, for each show, the percentage correctly predicted at the previous 30 shows. This "30-show moving average" is telling: Save for a few pronounced dips, Phish setlists have been getting generally more predictable over the past 20 years, such that Trey's Notebook now routinely predicts around 40% or more of what the band plays. But, then, that's the case for the bulk of the past 700 shows - nearly half the band's history!

30-Show Moving Average of Trey's Notebook Percent Correct
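A minimal sketch of that 30-show moving average follows. It assumes the window covers the 30 shows before each show (not the show itself) and that early shows simply average whatever history exists. Both are readings of the description above, not confirmed details, and the function name is hypothetical.

```python
from collections import deque

def trailing_average(percentages, window=30):
    """For each show, the mean percent predicted over the preceding `window` shows.

    Returns None for the first show, which has no history to average.
    """
    recent = deque(maxlen=window)  # oldest value drops off automatically
    averages = []
    for pct in percentages:
        averages.append(sum(recent) / len(recent) if recent else None)
        recent.append(pct)
    return averages
```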

So, the next time some doe-eyed city reporter writes an article calling Phish "unpredictable", well, you can correct them: Maybe not so much as they used to be!

None of this analysis would have been possible without Adam's build of Trey's Notebook, and Stephen's backend querying to collect the data. Thanks to you both for fueling the infoporn!

If you liked this blog post, one way you could "like" it is to make a donation to The Mockingbird Foundation, the sponsor of Phish.net. Support music education for children, and you just might change the world.


Comments

, comment by ForgeTheCoin
ForgeTheCoin Wow! Scientific Phish Nerd-ism at its finest! Interesting!
, comment by JamArchive_Live
JamArchive_Live Geekery at its finest! Great job!
, comment by PeterJenningsLovedTheFish
PeterJenningsLovedTheFish Somehow, this will be spun into a critique of the fact that Phish has no new material and that they aren't what they were in 1.0...
, comment by tasatter
tasatter Nice analysis, Mang! ;)
, comment by Billiam
Billiam Good stuff!
Although the predictability seems to be increasing, it's not rising appreciably. Note in the first figure that the scatter of the data is actually decreasing... there are fewer wildly predictable and totally unpredictable shows. I'd submit that Phish 3.0 are more mature musicians, so perhaps they're finding a better formula that works for them. Maybe they're trying to let certain songs evolve to a greater degree before rotating them out?

It'd be interesting to compare these trends with a proxy for how 'enjoyable' the shows are. Does predictability correlate with awesomeness?
, comment by NoHayBanda
NoHayBanda Trey's Notebook isn't a perfect measure of predictability. Since it completely disregards stuff played in the last 3 shows, it will consider Crowd Control on a gap of 4 more likely than a Moma Dance on a gap of 3. I think this is particularly important, since if a song was in a 3 show rotation, it would count as being a surprise song! But if it moved to a 4 show rotation, which a lot of songs are these days, it becomes OH so predictable. In a way, phish is being punished in your system by moving songs from a 3 show rotation to a 4 show rotation. If you can find a way to adjust for that you might see that line flatten right out.

Also those numbers are inflated from the number of songs in Trey's Notebook. Sure, it can predict 65% of songs played, but they list 36! The bigger that number, the more 'impressive' that percentage gets.
, comment by Lemuria
Lemuria I agree with @Billiam about the reduction in the dispersion of the percentages. There's certainly a great deal more that could be done with the data - from adjusting songs by debut, to adjusting by tour length, to even discounting some shows. (I kept them all in.)

And @NoHayBanda is right, to an extent, both that the actual variability in songs affects the input to Trey's Notebook, and that the precision of the predictions (how many songs are listed) affects the output. But actual variability should impair the predictions, which might actually do a better job if that were considered; and higher or lower precision would alter the percentages but might not appreciably alter changes *in* those percentages, which is what I was after.
, comment by gankmore
gankmore Awesome analysis. Thanks much. Really enjoyed this. Wonder what one could do to improve the prediction pattern other than what's been played in the last year and number of gaps?

, comment by sethadam1
sethadam1 @jimthin9 wrote:

If you can find a way to adjust for that you might see that line flatten right out.
Hey, if anyone wants to provide a better input to improve the algorithm that fuels our predictions, I'm all ears. Email me.
, comment by DistressTube
DistressTube @Lemuria This type of thing is definitely something I enjoy as well. I posted in a Song Frequency Change Thread not too long ago about a spreadsheet I compiled to be able to track song frequencies by year and set appearance. I manually compiled the data from Phish.net, taking out songs that "don't count for Phish stats purposes" as noted on the setlists.

More detail in the other post, but here is a link to that spreadsheet. I updated this through the 2013 Dick's run. Song totals are 98% correct or so, give or take a couple here and there. I really enjoy nerding out to Phish.
, comment by GillyGhost
GillyGhost Cool Read. Thank you.
, comment by FACTSAREUSELESS
FACTSAREUSELESS I don't know how to feel about this information, honestly. But thanks.
, comment by NoHayBanda
NoHayBanda @FACTSAREUSELESS said:
I don't know how to feel about this information, honestly. But thanks.
post/handle ftw!
, comment by NoHayBanda
NoHayBanda so, i dont mean to be a dick, i am just interested in this and would love to see it done correctly. but the more i think about this, the more im sure this is showing the opposite of what is true.

the reason shows were so 'unpredictable' early was due to heavy repeats. just look at any notebook and compare it with a gap chart for that show. i randomly looked at 02.07.1991

http://phish.net/treys-notebook?basedate=1991-02-07
http://phish.net/setlists/gapchart.php?d=1991-02-07
due to the rules of treys notebook, this show was highly unpredictable. when in reality, there were THIRTEEN repeats from the night before. and five more on a 2 show gap.

due to their small song catalog, they were flooded with massive repeats early in their career, trey's notebook just isnt set up to evaluate them properly.
, comment by franksmadanks
franksmadanks Label your axis dude it makes this shit easier
, comment by NoHayBanda
NoHayBanda @sethadam1 said:
@jimthin9 wrote:

If you can find a way to adjust for that you might see that line flatten right out.
Hey, if anyone wants to provide a better input to improve the algorithm that fuels our predictions, I'm all ears. Email me.
i think its generally set up pretty well for phish in their current era of playing, i look at it before every show. but i will email you some ideas to consider! (obviously its an inexact science)
, comment by Lemuria
Lemuria @franksmadanks said:
Label your axis dude it makes this shit easier
The axes are labelled, in the easiest and least shit-encumbered way: The horizontal ("x" ;) axis is the show number. I ran several of them using dates as axes - either as actual dates (which made the spacing suspect and misleading) or just as date-formatted numbers (but, then, you only see a smattering of them). But the shift across shows, whatever their distribution in time, was really my interest - so any labels other than showdate didn't measure what seemed of interest.

(insert "these aren't the labels you're looking for" joke here.)
, comment by TheNonArmenianMan
TheNonArmenianMan Did anybody else lol @ the fact that on the day this article is posted, the show tonight at DCU is a total 0% fail for Trey's Notebook?
, comment by TheNonArmenianMan
TheNonArmenianMan @TheNonArmenianMan said:
Did anybody else lol @ the fact that on the day this article is posted, the show tonight at DCU is a total 0% fail for Trey's Notebook?

My mistake, I was looking at the Notebook for 10/26! total fail on my part
, comment by Lemuria
Lemuria @TheNonArmenianMan said:
Did anybody else lol @ the fact that on the day this article is posted, the show tonight at DCU is a total 0% fail for Trey's Notebook?
It correctly predicted Cities, Wilson, Wolfman's, Rift, Free, Carini, Caspian, BDTNL, Ghost, Cavern, RLA, Suzy, and GTB. The percentage doesn't appear until a programmed threshold but I think that's 13 out of 24 songs, which is 54%, even more predictable (in this sense) than normal.

There are of course other ways to mean or measure predictability. We could create a composite measure combining the Trey's Notebook "anticipated percentage" (which assesses recent commonality) with the show's "last time played" average (which assesses the appearance of rarities, relative to what's common and whatever's in the middle.) And there are probably other things that would be fun to throw in the mix.
, comment by HotPale
HotPale Did you predict that encore tonight? Contact> Suzy> Rocky Top> GTBT nobody saw that comin'! This nonsense is way too nerdy...love me some Phish and makin' calls, but really...this is the stuff that makes me say go get some phresh air! On that note...what are they opening with tomorrow night? Buried Alive!
, comment by Dressed_In_Gray
Dressed_In_Gray What a surprise, the polynomial trendline is a set of Waves.

Nice work here.
, comment by bertoletdown
bertoletdown Ladies and gentlemen: Ellis Godard.

ELLIS GODARD YOU FUCKS
, comment by bertoletdown
bertoletdown @HotPale said:
Did you predict that encore tonight? Contact> Suzy> Rocky Top> GTBT nobody saw that comin'! This nonsense is way too nerdy...love me some Phish and makin' calls, but really...this is the stuff that makes me say go get some phresh air! On that note...what are they opening with tomorrow night? Buried Alive!
Let me be perfectly clear. If you couldn't predict tonight's encore, you are a massive, oxygen hogging noobtater.
, comment by SymphonicDelight
SymphonicDelight BUY, BUY, BUY!
SELL, SELL, SELL!
, comment by PYITIPA
PYITIPA I take issue with the function fitted to "% of songs predicted" graph as used to extrapolate the trend of the band right now. I would prefer to see a function fitted to, at most, 3.0 only to predict the trend of setlist predictability.
, comment by lysergic
lysergic I have two suggestions to improve the prediction algorithm for Trey's Notebook.

(1) The elimination of songs in the last three shows is arbitrary and introduces a sharp discontinuity in eligibility. Instead I would suggest penalties for being played recently, with the penalty decreasing by show separation. To illustrate what I mean, the penalties could be -15 for one show back, -10 for two shows back, -5 for three shows back, -2 for four shows back, and -1 for five shows back.

So if David Bowie has been played 17 times in the past year, but it was played one show ago, DB would get assigned a score of 2. If YEM has been played 13 times in the past year, but it was played three shows ago, it would get a score of 8.

Then whichever songs get the highest scores would be the predicted ones. In essence, this is exactly the scheme being used, except the penalties right now are -1000 for any songs in the past three shows, and 0 for all other songs.

Naturally, there is nothing magical about the penalties I chose above. There would probably have to be some tinkering done. Assuming it's not too tough, you could mess around with the penalties and observe the resulting average prediction accuracy. Then pick whatever penalties maximize average prediction accuracy over the shows so far.

(2) I wonder if the one year cutoff is optimal in terms of establishing the score for each song. Perhaps you could tinker with this. Again if it's not too hard, you could try different cutoff points and see which one maximizes average prediction accuracy.

In the unlikely event that my advice is implemented, I would suggest doing my second suggestion first. The cutoff point is going to make a big difference in terms of which penalties are optimal.
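The graded-penalty idea in suggestion (1) could be sketched roughly as follows. The penalty values are the illustrative ones from this comment. The function names and the `(plays, shows_since)` data layout are made up for the example and are not anything Phish.net is known to use.

```python
# Points subtracted by how many shows ago the song was last played
# (illustrative values from the comment, open to tuning).
DEFAULT_PENALTIES = {1: 15, 2: 10, 3: 5, 4: 2, 5: 1}

def song_score(plays_past_year, shows_since_played, penalties=DEFAULT_PENALTIES):
    """Play count in the past year minus a recency penalty (0 beyond the table)."""
    return plays_past_year - penalties.get(shows_since_played, 0)

def top_predictions(stats, n, penalties=DEFAULT_PENALTIES):
    """stats maps song -> (plays_past_year, shows_since_played); returns the n best."""
    ranked = sorted(
        stats,
        key=lambda song: song_score(*stats[song], penalties),
        reverse=True,
    )
    return ranked[:n]
```

With these numbers, `song_score(17, 1)` gives 2 for David Bowie and `song_score(13, 3)` gives 8 for YEM, matching the worked example. Tuning would mean re-running the predictions over past shows with different penalty tables and keeping whichever table gives the best average accuracy.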
, comment by NoHayBanda
NoHayBanda @lysergic those are good ideas, esp getting rid of the cutoff in favor of gradual bonuses.

as a different kind of prediction, there could be a new section called something like "they are due for a..." where you can take songs with larger gaps, and compare the current gap to them (ie. the sloth has a 3.0 gap average of 18.9, and is currently at 26. this could get a rating of about 1.38, or however you scale it).

it would mostly be made up of songs that don't make the trey's notebook cut, but its always fun to think about what rarities could be due up soon
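One simple way to express that "due for" rating is the ratio of the current gap to the average gap. This is only a sketch of one possible scaling, and the function name is hypothetical. For the Sloth numbers above, 26 / 18.9 is roughly 1.376.

```python
def due_rating(current_gap, average_gap):
    """Ratio of the current gap to the song's average gap; above 1 means overdue."""
    if average_gap <= 0:
        raise ValueError("average_gap must be positive")
    return current_gap / average_gap
```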
, comment by Lemuria
Lemuria I agree that the penalty should be a graded gap, but only if it's been played recently and has a recent gap pattern. If the gap is lifetime/historical, it won't reflect a recent resurgence; but the gap can't be recent if it hasn't been recently played.

If it's been played on the current tour, the penalty could be based on the average gap on the tour. If it's been played in the past 12 months but not the current tour, the penalty could be based on the average gap in the last 12 months, with possibly a penalty or bonus for not appearing yet in the current tour. If it hasn't been played in the last 12 months, the gap could be based on the average historical gap, with possibly an additional penalty or bonus for not having been played in more than a year.
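That fallback order (current tour, then the last 12 months, then lifetime) might look like this in code. It is a sketch only: the function name and the three gap lists are assumed inputs, and the extra penalties or bonuses mentioned above are left out.

```python
def baseline_gap(tour_gaps, year_gaps, lifetime_gaps):
    """Pick the most recent gap history that has data and return (source, average).

    Each argument is a list of gaps (in shows) between plays over that span.
    Returns (None, None) if the song has no gap history at all.
    """
    tiers = (
        ("tour", tour_gaps),
        ("12 months", year_gaps),
        ("lifetime", lifetime_gaps),
    )
    for label, gaps in tiers:
        if gaps:
            return label, sum(gaps) / len(gaps)
    return None, None
```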

And if we're adjusting the "last seen" penalty, the one-year/12-month cutoff is of course also arbitrary. Maybe it should be the last 30 shows, or the last 3 tours, or the number of shows we'd expect to see within the next year as projected from a moving average of the number of shows seen in past years, or...
, comment by Lemuria
Lemuria By the way, every suggestion is potentially important, but let's check the pudding before making any of these "improvements" permanent. After all, we have a model with a moving average that fluctuates, relatively tightly, around 40%. Any change should do better.
, comment by FACTSAREUSELESS
FACTSAREUSELESS This entire exercise proves that they will open with Llama sometime again in the future. Taboot taboot.
, comment by tennesseejac
tennesseejac It doesn't look like Trey's Notebook has a Velvet Underground song for tonight's show. I predict Sweet Jane or at least a Rock and Roll
, comment by unoclay
unoclay this is absolutely awesome. thanks for compiling. I look forward to future updates and improvements.

It seems critical to note that since the band is writing less new songs, this would affect predictability in a big way. Though i dont know that this would 'prove' anything. If we agree they should be writing/adding material, then the lack of this aspect is simply a problem in itself, perhaps associated with--but not directly affecting--calculations of predictability.

just a thought, and certainly not original with me, i expect.
, comment by unoclay
unoclay @HotPale said:
Did you predict that encore tonight? Contact> Suzy> Rocky Top> GTBT nobody saw that comin'! This nonsense is way too nerdy...love me some Phish and makin' calls, but really...this is the stuff that makes me say go get some phresh air! On that note...what are they opening with tomorrow night? Buried Alive!
this article is fish fandom at its finest. we're a unique fanbase and are rightly proud of our devotion. personally, i'd rather have this kind of discussion until the sun goes down than be part of a fanbase that doesnt care enough to mathematically chart the band we love. my .02.
, comment by pineapplegiddyup
pineapplegiddyup So you're saying that humans aren't that great at coming up with random data? shocking.
, comment by MDosque
MDosque That was a great read. I am one of those fans that gets pegged occasionally for being a jaded vet (kind of...I was the total NOOBER 96-99 and even though I loved the band and dove in head first to the material, I never toured and always deferred to the older fans.) But, I digress. In this glorious year of 2013, 2 years before Back to the Future II occurs, I can be called a jaded vet from time to time. I think the Phish of today is VERY predictable. Not song choices, per se, but the almost guaranteed structure and flow to the show - but that's fine, just different from my infant fandom years of roughly 95-98 where you absolutely never had any idea what would happen on a given night. You still kind of don't in 3.0 and they will certainly drop a Tahoe Tweezer from time to time. It's also a lock that the opening 45 of the second set is going to be the improv time. It's the GD concert structure. There have been very few, if any at all first set free flowing extended jams. That's fine. Certainly 88-93 is similar and even into 96, they would not go nuts in the first half. The playing would be incredible and still is with all our favorite tunes getting nailed, but there was, and I would argue currently isn't any feeling that shit could hit the fan at any second. 97-99 had that. I'm not pining for the past, but it's the Phish I grew with and loved. I still love them to this day, but when I say that a show is "predictable", this "jaded vet" means...

First set will open with KDF, Chalkdust type of jam, etc. (it always peaks and ends too soon). Then comes Moma Dance or Back on the Train (not really ever outside the box). You get a Rift, Bluegrass tune, Gin, and they close it out with a rocker. There is always a tasty little diversion in there (see, MPP2 It's Ice, Scent or even MPP 2011 Wolfman's Boogie on).

Second set is the bread and butter for fans of improv - most if not all highlights of 3.0 come from this opening sequence of the second set. Certainly, I've seen some great stuff the past few years in this slot, most recently the Hampton night 2 Ghost-DWD-Steam. Then the set cools down with Caspian or Wading before a Hood or YEM takes us home.

Encore is Loving Cup or another classic rock cover.

I must stress, this is not to say that I don't LIKE this structure because I do and I respect the way they have honed the show into one that fans from all eras and styles of the band will leave feeling happy. I haven't left a 3.0 show unhappy out of the 7 I have seen. It's just different from the 97-99 Phish I absolutely loved. You never knew what set, date, city, or venue they would burn down.

Thanks for the Phish geekdom. I enjoy this talk.

Dosque
, comment by deceasedlavy
deceasedlavy Cool idea. If only you could chart the predictability of the jams themselves! I guess I have that graph in my brain already though. Three cheers for happyrock> lullaby> ambience.
, comment by FACTSAREUSELESS
FACTSAREUSELESS @MDosque said:
That was a great read. I am one of those fans that gets pegged occasionally for being a jaded vet (kind of...I was the total NOOBER 96-99 and even though I loved the band and dove in head first to the material, I never toured and always deferred to the older fans.) But, I digress. In this glorious year of 2013, 2 years before Back to the Future II occurs, I can be called a jaded vet from time to time. I think the Phish of today is VERY predictable. Not song choices, per se, but the almost guaranteed structure and flow to the show - but that's fine, just different from my infant fandom years of roughly 95-98 where you absolutely never had any idea what would happen on a given night. You still kind of don't in 3.0 and they will certainly drop a Tahoe Tweezer from time to time. It's also a lock that the opening 45 of the second set is going to be the improv time. It's the GD concert structure. There have been very few, if any at all first set free flowing extended jams. That's fine. Certainly 88-93 is similar and even into 96, they would not go nuts in the first half. The playing would be incredible and still is with all our favorite tunes getting nailed, but there was, and I would argue currently isn't any feeling that shit could hit the fan at any second. 97-99 had that. I'm not pining for the past, but it's the Phish I grew with and loved. I still love them to this day, but when I say that a show is "predictable", this "jaded vet" means...

First set will open with KDF, Chalkdust type of jam, etc. (it always peaks and ends too soon). Then comes Moma Dance or Back on the Train (not really ever outside the box). You get a Rift, Bluegrass tune, Gin, and they close it out with a rocker. There is always a tasty little diversion in there (see, MPP2 It's Ice, Scent or even MPP 2011 Wolfman's Boogie on).

Second set is the bread and butter for fans of improv - most if not all highlights of 3.0 come from this opening sequence of the second set. Certainly, I've seen some great stuff the past few years in this slot, most recently the Hampton night 2 Ghost-DWD-Steam. Then the set cools down with Caspian or Wading before a Hood or YEM takes us home.

Encore is Loving Cup or another classic rock cover.

I must stress, this is not to say that I don't LIKE this structure because I do and I respect the way they have honed the show into one that fans from all eras and styles of the band will leave feeling happy. I haven't left a 3.0 show unhappy out of the 7 I have seen. It's just different from the 97-99 Phish I absolutely loved. You never knew what set, date, city, or venue they would burn down.

Thanks for the Phish geekdom. I enjoy this talk.

Dosque
Enjoyed reading this. Some good thoughts in there.

