Deep Learning ‘ahem’ detector (github.com/worldofpiggy)
137 points by sndean on Nov 9, 2016 | 53 comments


Badly needed: something that removes coughing from classical music



Terrific, hilarious page. I'm in the camp that likes Glenn's humming though.


I like his humming, too, but at the risk of making a heretical statement, it would certainly be nice to have both options.


Does it work with Keith Jarrett as well?


The neural network is just like a baby brain. If it is fed Keith Jarrett, then yes, it will learn to deal with that too.


Hah. I work with the owner of that site. Unless this is dave?


Hook this up to a shock collar and train yourself. Even better if it could detect "erm"s :-)


How about just... a mild vibration on the wrist or something? Then again, at least, "AhAAAAHHHH" is more interesting than "Ahem".


Mild vibration wouldn't be enough, it has to be at least annoying, if not painful... I would buy it in a second though, as this approach is probably the easiest way to self-train the "uhm"s & similar out.

Any entrepreneurs around here? ;)


No, but I can buy some dog anti-barking collars, and I'm willing to electrify some people... especially today.


Well then, if I start barking and can't stop, I'll contact you for sure!


Good boy!


Useful for, like, "like".


Random trivia: "ahem" is a cultural, language-based thing. Other languages use different "words" for "ahem". The same goes for grunting sounds, for example the sound people make when they need to push something very hard. It's cultural, not natural: the sounds differ by culture.



Can I get a deep learning model that strips advertising from podcast episodes?


I was able to create an ad-block for my Yamaha stereo receiver:

http://blog.rekawek.eu/2016/02/24/radio-adblock/

No need to use AI, just FFT and cross-correlation. It was easier though, because every commercial block starts and ends with the same jingle pair.
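
For anyone curious what the cross-correlation half of that looks like, here is a minimal sketch, not the code from the linked post. It assumes the jingle is available as a short reference clip, and it uses a direct time-domain normalized cross-correlation instead of an FFT-based one to stay dependency-free. The names `normalized_xcorr` and `find_jingle`, the 0.9 threshold, and the synthetic signals are all made up for illustration.

```python
import math

def normalized_xcorr(signal, template):
    """Slide template over signal; return one score in [-1, 1] per offset."""
    n = len(template)
    t_mean = sum(template) / n
    t_centered = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(t * t for t in t_centered))
    scores = []
    for start in range(len(signal) - n + 1):
        window = signal[start:start + n]
        w_mean = sum(window) / n
        w_centered = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(w * w for w in w_centered))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # a flat window cannot match anything
            continue
        dot = sum(w * t for w, t in zip(w_centered, t_centered))
        scores.append(dot / (t_norm * w_norm))
    return scores

def find_jingle(signal, template, threshold=0.9):
    """Return the best-matching offset, or None if no score reaches threshold."""
    scores = normalized_xcorr(signal, template)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None

# Synthetic demo (hypothetical data): a ramped sine "jingle" buried in a quiet stream.
jingle = [math.sin(2 * math.pi * 5 * i / 40) * (i / 40) for i in range(40)]
stream = [0.01 * math.sin(i) for i in range(200)]
for i, s in enumerate(jingle):
    stream[120 + i] += s

print(find_jingle(stream, jingle))  # offset where the jingle was inserted
```

On real audio the direct loop is far too slow, which is presumably why the original uses an FFT: correlating in the frequency domain gives the same scores at much lower cost.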


I'll get to work on it. It'll cost you $2/month, though.


Is this sarcasm? I'd pay for it in a heartbeat.


Same.


Getting off topic, but ... by advertising do you mean obvious commercials or submarine ones? I ask because my podcast player has buttons to advance 30s or 2m and to rewind 15s. When a commercial-like ad starts, it takes me 2-3 seconds to skip it. In other words, I don't need deep learning to skip ads for me.

Which actually makes me question whether podcasts can make money, as it's so dang easy to skip all the ads.


A lot of podcast consumption is while driving, where operating the phone is dangerous and illegal. I would especially like an option to skip advertising when I'm listening to two episodes from the same podcast and the closing ads from one episode are the same as the opening ads in the next episode.


The app I use will skip 30 seconds ahead and that usually (sometimes I need to skip several minutes ahead) takes care of the ads. However it requires me to pull my phone out of my pocket, unlock it, navigate to the app, click skip. It's not a big deal but it is a hassle. Especially if there is more than one ad segment in the episode.


So this is literally learned from images of the fake-color frequency spectrum?

That's not state of the art in DNN speech recognition, is it?


Pretty much SOA; most end-to-end systems use these spectrograms on short time slices. The alternative is mel-frequency cepstral coefficients, which are used more in GMM-HMM speech recognition than for DNN.


For other illiterates like me:

* SOA = state of the art

* GMM-HMM = Gaussian mixture model - hidden Markov model

* DNN = deep neural networks (more than 1 hidden layer)

* mel-frequency cepstral = MFC ;-)


Could sum1 write a GM script to replace all abbr.s with their ffs? The over use of abbr.s is one of the most XABs on Hacker News.

Could someone write a Greasemonkey script to replace all abbreviations with their full forms? The overuse of abbreviations is one of the most extremely annoying behaviours on Hacker News.

(Thank you https://www.allacronyms.com/aa-search?q=annoying&cx=01082138...)


How does this work when the spectrogram is a finite time slice?!


Lots of overlapping. It's a sliding window function. Ballpark for most algorithms: 10 ms of new audio, 90 ms of old audio.
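
To make the ballpark concrete, here is a minimal sketch of that sliding window, assuming a 100 ms window advanced 10 ms at a time (so each frame is 10 ms of new audio plus 90 ms of old). The 16 kHz sample rate and the `frame_signal` helper are assumptions for illustration, not taken from the project.

```python
def frame_signal(samples, sample_rate, window_ms=100, hop_ms=10):
    """Cut samples into overlapping frames of window_ms, advancing hop_ms each time."""
    window = int(sample_rate * window_ms / 1000)
    hop = max(1, int(sample_rate * hop_ms / 1000))  # never step by zero samples
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        frames.append(samples[start:start + window])
    return frames

sample_rate = 16000               # assumed sample rate
audio = list(range(sample_rate))  # one second of dummy samples
frames = frame_signal(audio, sample_rate)

# Each frame holds 1600 samples; consecutive frames share 1440 of them (90%).
print(len(frames), len(frames[0]), frames[1][0])
```

Each frame then gets its own spectrum, and stacking those spectra side by side gives the spectrogram, so a finite slice is not a problem: the network only ever sees one short window at a time.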


Why not just use time-domain convolutions?


What's the difference between the false-color frequency spectrum image and the frequency spectrum itself? Assuming that there's one horizontal pixel per sample, one vertical pixel per frequency bin, and that the color spectrum encodes the intensity to sufficient resolution, I'm not sure there is any difference.

The bitmap is just a convenient representation of a two-dimensional array, and PNG is a convenient compression library!
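
A small sketch of that point, under stated assumptions: the spectrogram is a 2-D array of magnitudes (time by frequency bin), and the "image" is the same array quantized to pixel values. A naive O(n^2) DFT stands in for a real FFT to keep this dependency-free, 8-bit grayscale stands in for the false-color map, and the window and hop sizes are arbitrary. Rows here are time slices, so the layout is transposed relative to the usual picture.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive DFT; returns magnitudes of the non-negative frequency bins."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

def spectrogram(samples, window=64, hop=16):
    """One row of bin magnitudes per overlapping time slice."""
    return [dft_magnitudes(samples[s:s + window])
            for s in range(0, len(samples) - window + 1, hop)]

def to_pixels(spec):
    """Quantize to 0..255: the same 2-D array, just at lower precision."""
    peak = max(max(row) for row in spec) or 1.0
    return [[round(255 * v / peak) for v in row] for row in spec]

# A pure tone with 8 cycles per 64-sample window lands in frequency bin 8.
tone = [math.sin(2 * math.pi * 8 * i / 64) for i in range(256)]
spec = spectrogram(tone)
pixels = to_pixels(spec)

brightest = [row.index(max(row)) for row in pixels]
print(len(pixels), len(pixels[0]), brightest[0])
```

The only information lost going from `spec` to `pixels` is the quantization to 256 levels, which is the "sufficient resolution" caveat in the comment above.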


CNNs on spectral images can work well, but most SOTA speech recognition uses LSTMs.



That's right. This is not state of the art. LSTMs are good for sequences and sequential data (like audio). But with the approach described here, prediction can be done in parallel and asynchronously. Which is something ;)


I'm serious here: Can we make a "euuhmm" remover for Elon Musk? I love his vision, but can't stand his talks because he can't get a word out without "euhmm"-ing it.


lol... how about forking it on GitHub and training a model on Elon? I hate his euhmm thing so badly!! :D


<peeve>I hope this evolves to eventually become capable of removing "like"s, and maybe "uhm"s too, but certainly "like"s. I would willfully reside in the throat of a titanic bloviating German with influenza before listening to that haggard word uttered in every instance where punctuation or thoughtfulness ought have precedence. "Ahem" I can tolerate, even "uhm" and "you know"; but not all sentences require an analogy.</peeve>


For those apparently disturbed [1] by some attribute of the above, please see:

https://en.wikipedia.org/wiki/Guttural - note that the German language is, by some[2], considered guttural. Happens to be among my favorites too. "Ahem", being a guttural sound, I jovially compared to a hypothetically enlarged and garrulous German with influenza (for purposes of exaggeration), which one might fairly imagine sounding slightly more pharyngeal than usual - which I must add that I would truly not find offensive. As for any perceived assault on the ubiquitous abuse of the word "like", if there is anything I can add to exacerbate it, I'd be delighted to oblige.

1. Euphemism for forbidden reference regarding votes

2. http://www.huffingtonpost.com/2013/08/03/german-harsh-langua... - A little witzelsucht for your aphonogelia.


A friend and I tried to make a similar detector for removing squeaking sounds of whiteboard markers from videos. Although solutions like this do sound like a waste of time to some, I think they can go a long way in removing those tiny annoyances.


That sounds fantastic! I've recently been looking at mechanical keyboards again (Cherry-MX blue switches, I have browns at the moment) and those are quite noisy. An algorithm to detect and filter keyboard noise could benefit millions of people with VoIP, live streams, etc.


interesting


Next thing you need is ummm and uhhhh detection :)


Maybe I'm showing my age here, but I'd actually prefer a "liiiiiike" detector.


I would be interested in an "ahem" remover, with the constraint that it should also work if the "ahem" is superimposed over other sound, e.g. music.


This really seems like a waste of time, or am I missing something?


First thought: helpful to know what words to ignore when trying to parse audio.


This is neat. Looks like you need more data though!


Exactly! With just 5 epochs and some hours on a low budget GPU I got 81% accuracy. Not bad at all, considering that no knowledge of MFC & Co. is required.


Is there a demo available?


The entire project is on GitHub. In the folder data/ there are some samples to "see" it in action. Otherwise you have to train it on your voice and apply it to whatever sound you like.


ahem... hi everybody this is ahem... Piggy ;)


Is it just a coincidence that this comes right at the end of Obama's tenure?



