Very cool! One minor nitpick -- the author mentions that this is 'completely unsupervised'. It's true that the author didn't need to manually classify the data, but someone did.

So, I believe that this is actually supervised learning, as the author is training a classifier on preexisting labels (the genres).

I believe that unsupervised learning would not make use of a target variable at all. If the network architecture terminated at the fully connected layer, and then propagated that layer backwards to reconstruct the input (something like Contrastive Divergence), that would be an unsupervised method.
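To make the "reconstruct the input" idea concrete, here is a minimal sketch. It is not Contrastive Divergence and not the author's network: it is a tiny tied-weight linear autoencoder trained with plain gradient descent, and the toy data, the names (`reconstruct`, `mean_loss`, `train`), and the hyperparameters are all made up for illustration. The point is only that the training signal is the input itself, so no genre labels appear anywhere.

```python
def reconstruct(w, x):
    # Encode x to a single hidden unit, then decode with the same (tied) weights.
    h = sum(wi * xi for wi, xi in zip(w, x))
    return [h * wi for wi in w], h


def mean_loss(w, data):
    # Mean squared reconstruction error; the "target" is the input itself.
    total = 0.0
    for x in data:
        r, _ = reconstruct(w, x)
        total += sum((ri - xi) ** 2 for ri, xi in zip(r, x))
    return total / len(data)


def train(data, w, lr=0.1, steps=200):
    # Full-batch gradient descent on the reconstruction error.
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in data:
            r, h = reconstruct(w, x)
            e = [ri - xi for ri, xi in zip(r, x)]
            ew = sum(ei * wi for ei, wi in zip(e, w))
            for j in range(len(w)):
                # d/dw_j of ||h*w - x||^2 with h = w . x
                grad[j] += 2 * (h * e[j] + ew * x[j])
        w = [wj - lr * gj / len(data) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy inputs lying along one direction; note there are no labels.
data = [(-1.0, -1.0), (-0.5, -0.5), (0.5, 0.5), (1.0, 1.0)]
w0 = [0.5, 0.1]
w = train(data, w0)
loss_before = mean_loss(w0, data)
loss_after = mean_loss(w, data)
```

The hidden value `h` plays the role of the learned representation here; in the setup described above it would be the fully connected layer instead of a single unit.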



Right, in an unsupervised model (say k-means), the algorithm divides the data into 9 groups of similar items, and it is then up to you to label what those 9 groups are.
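For anyone who wants to see that in code, a rough sketch of k-means (Lloyd's algorithm) in plain Python follows. The random 2-D points are a stand-in for whatever feature vectors you would actually extract from the audio, and the function name and parameters are my own, not anything from the article. Notice that the function never sees a genre label; it only returns cluster ids 0 to 8.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical stand-in data: 90 random 2-D "feature vectors", no labels.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random()) for _ in range(90)]
assignment, centroids = kmeans(points, k=9)
```

Deciding that, say, cluster 3 is "jazz" is a manual step that happens after the fact, which is exactly the difference from training a classifier on the genre labels.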


Correct, the CNN classifier is definitely supervised learning.


You're correct, of course. But it's cool that you can learn useful embeddings (in this case into a 128-dimensional space) with only relatively few (in this case 9) binary labels.


I'd love to see an analysis of what exactly these embeddings represent concerning the musical features of a sound snippet.


Came here to say this.



