Very cool! One minor nitpick -- the author mentions that this is 'completely uns...

mattmcknight · on Aug 2, 2017

Right, in a unsupervised model (say k-means), it divides the data into 9 similar groups and then it is up to you to label what those 9 groups are.

steaknsteak · on Aug 2, 2017

Correct, the CNN classifier is definitely supervised learning.

rahimnathwani · on Aug 2, 2017

You're correct of course. But it's cool that you can learn a useful embeddings (in this case into a 128-dimensional space) with only relative few (in this case 9) binary labels.

mkorfmann · on Aug 3, 2017

I'd love to see an analysis of what exactly these embeddings represent concerning the musical features of a sound snippet.

computerex · on Aug 2, 2017

Came here to say this.