Deep Learning ‘ahem’ detector

dnautics · on Nov 9, 2016

Badly needed: something that removes coughing from classical music

gaur · on Nov 9, 2016

Relevant: http://www.davegrossman.net/gould/

savanaly · on Nov 9, 2016

Terrific, hilarious page. I'm in the camp that likes Glenn's humming, though.

dnautics · on Nov 9, 2016

I like his humming, too, but at the risk of making a heretical statement, it would certainly be nice to have both options.

Biganon · on Nov 9, 2016

Does it work with Keith Jarrett as well?

frag · on Nov 9, 2016

The neural network is just like a baby brain. If it is fed Keith Jarrett, yes, it will also learn to deal with that.

a_t48 · on Nov 9, 2016

Hah. I work with the owner of that site. Unless this is dave?

petercooper · on Nov 9, 2016

Hook this up to a shock collar and train yourself. Even better if it could detect "erm"s :-)

M_Grey · on Nov 9, 2016

How about just... a mild vibration on the wrist or something? Then again, at least, "AhAAAAHHHH" is more interesting than "Ahem".

annnnd · on Nov 9, 2016

Mild vibration wouldn't be enough, it has to be at least annoying, if not painful... I would buy it in a second though, as this approach is probably the easiest way to self-train the "uhm"s & similar out.

Any entrepreneurs around here? ;)

M_Grey · on Nov 9, 2016

No, but I can buy some dog anti-barking collars, and I'm willing to electrify some people... especially today.

annnnd · on Nov 9, 2016

Well then, if I start barking and can't stop, I'll contact you for sure!

M_Grey · on Nov 9, 2016

Good boy!

mnw21cam · on Nov 9, 2016

Useful for, like, "like".

greggman · on Nov 9, 2016

Random trivia: "ahem" is a cultural / language based thing. Other languages use different "words" for "ahem". The same for grunting sounds. For example the sound people make when they need to push something very hard. It's cultural not natural. The sounds are different by culture.

jval43 · on Nov 9, 2016

And "huh" seems to be universal:

http://www.smithsonianmag.com/science-nature/everybody-almos...

AlwaysRock · on Nov 9, 2016

Can I get a deep learning that cleans podcast episodes from advertising?

t0mek · on Nov 9, 2016

I was able to create an ad-block for my Yamaha stereo receiver:

http://blog.rekawek.eu/2016/02/24/radio-adblock/

No need to use AI, just FFT and cross-correlation. It was easier though, because every commercial block starts and ends with the same jingle pair.
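
A minimal sketch of the jingle-matching idea, not the code from the linked post. To stay dependency-free it uses direct time-domain normalized cross-correlation rather than an FFT-based one (the FFT mainly makes the same computation faster on long recordings). The function name `find_jingle`, the synthetic chirp "jingle", and the sample counts are all assumptions made for illustration.

```python
import math

def find_jingle(stream, jingle):
    """Return (offset, score) of the best match of `jingle` inside `stream`.

    The score is the normalized cross-correlation, in the range [-1, 1].
    """
    n = len(jingle)
    j_mean = sum(jingle) / n
    j = [x - j_mean for x in jingle]
    j_norm = math.sqrt(sum(x * x for x in j))
    best_offset, best_score = -1, -1.0
    for offset in range(len(stream) - n + 1):
        window = stream[offset:offset + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w))
        if w_norm == 0 or j_norm == 0:
            continue  # silent window: correlation is undefined, skip it
        score = sum(a * b for a, b in zip(w, j)) / (w_norm * j_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Hypothetical data: a short chirp stands in for the station's jingle.
jingle = [math.sin(0.002 * i * i) for i in range(200)]
stream = [0.0] * 300 + jingle + [0.0] * 100

offset, score = find_jingle(stream, jingle)
```

Once the opening and closing jingles are located this way, everything between the two offsets can be muted or skipped.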

ryptophan · on Nov 9, 2016

I'll get to work on it. It'll cost you $2/month, though.

regecks · on Nov 9, 2016

Is this sarcasm? I'd pay for it in a heartbeat.

AlwaysRock · on Nov 10, 2016

Same.

greggman · on Nov 9, 2016

Getting off topic but ... by advertising do you mean obvious commercials or submarine ones? I ask because my podcast player has advance-30s, advance-2m, and rewind-15s buttons. When a commercial-like ad starts, it takes me like 2-3 seconds to skip it. In other words, I don't need deep learning to skip ads for me.

Which actually makes me question if podcasts can make money as it's so dang easy to skip all the ads.

qq66 · on Nov 9, 2016

A lot of podcast consumption is while driving, where operating the phone is dangerous and illegal. I would especially like an option to skip advertising when I'm listening to two episodes from the same podcast and the closing ads from one episode are the same as the opening ads in the next episode.

AlwaysRock · on Nov 10, 2016

The app I use will skip 30 seconds ahead and that usually (sometimes I need to skip several minutes ahead) takes care of the ads. However it requires me to pull my phone out of my pocket, unlock it, navigate to the app, click skip. It's not a big deal but it is a hassle. Especially if there is more than one ad segment in the episode.

revelation · on Nov 9, 2016

So this is literally learned from images of the fake-color frequency spectrum?

That's not state of the art in DNN speech recognition, is it?

skoocda · on Nov 9, 2016

Pretty much SOA; most end-to-end systems use these spectrograms on short time slices. The alternative is mel-frequency cepstral coefficients, which are used more in GMM-HMM speech recognition than for DNN.

annnnd · on Nov 9, 2016

For other illiterates like me:

* SOA = state of the art

* GMM-HMM = Gaussian mixture model - hidden Markov model

* DNN = deep neural networks (more than 1 hidden layer)

* mel-frequency cepstral = MFC ;-)

kwhitefoot · on Nov 9, 2016

Could sum1 write a GM script to replace all abbr.s with their ffs? The over use of abbr.s is one of the most XABs on Hacker News.

Could someone write a Greasemonkey script to replace all abbreviations with their full forms? The overuse of abbreviations is one of the most extremely annoying behaviours on Hacker News.

(Thank you https://www.allacronyms.com/aa-search?q=annoying&cx=01082138...)

zump · on Nov 9, 2016

How does this work when the spectrogram is a finite time slice?!

skoocda · on Nov 14, 2016

Lots of overlapping. It's a sliding window function. Ballpark for most algorithms: 10 ms of new audio, 90 ms of old audio.
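
A rough sketch of that sliding window, assuming a 100 ms window that advances 10 ms at a time, so each frame shares 90 ms with the previous one. The 16 kHz sample rate and the function name `sliding_windows` are assumptions for illustration, not taken from the project.

```python
def sliding_windows(samples, sample_rate, window_ms=100, hop_ms=10):
    """Split `samples` into overlapping frames.

    Each frame is `window_ms` long and starts `hop_ms` after the previous one,
    so consecutive frames share (window_ms - hop_ms) of audio.
    """
    window = sample_rate * window_ms // 1000  # frame length in samples
    hop = sample_rate * hop_ms // 1000        # new samples per frame
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, hop)]

# One second of dummy "audio" at an assumed 16 kHz sample rate.
audio = list(range(16000))
frames = sliding_windows(audio, sample_rate=16000)
```

Each frame would then be turned into its own spectrogram slice and classified independently.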

eutectic · on Nov 9, 2016

Why not just use time-domain convolutions?

LeifCarrotson · on Nov 9, 2016

What's the difference between the false-color frequency spectrum image and the frequency spectrum itself? Assuming that there's one horizontal pixel per sample, one vertical pixel per frequency bin, and that the color spectrum encodes the intensity to sufficient resolution, I'm not sure there is any difference.

The bitmap is just a convenient representation of a two-dimensional array, and PNG is a convenient compression library!
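
To make that concrete, here is a small sketch that builds a spectrogram as a plain two-dimensional array and then quantizes it to 8-bit grey levels, which is essentially what gets saved as an image. It uses a naive DFT to avoid dependencies; the window size, hop, and test tone are assumptions for illustration, and real pipelines would use an FFT library and a colormap.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, window, hop):
    """Rows are time slices, columns are frequency bins."""
    return [magnitude_spectrum(samples[s:s + window])
            for s in range(0, len(samples) - window + 1, hop)]

def to_pixels(spec):
    """Scale magnitudes to 0..255, i.e. one grey pixel per (time, bin) cell."""
    peak = max(max(row) for row in spec) or 1.0
    return [[round(255 * v / peak) for v in row] for row in spec]

# A pure tone that lands exactly on bin 8 of a 64-sample window.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
pixels = to_pixels(spectrogram(tone, window=64, hop=32))
```

The only information lost between `spectrogram` and `to_pixels` is the quantization to 256 levels, which is the "sufficient resolution" caveat above.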

dharma1 · on Nov 9, 2016

CNN on spectral images can work well. But most SOTA uses LSTM for speech recognition

annnnd · on Nov 9, 2016

Thank you! LSTM: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

frag · on Nov 9, 2016

That's right. This is not state of the art. LSTM is good for sequences and sequential data (like audio). But with the approach described here, prediction can be done in parallel and asynchronously. Which is something ;)

neals · on Nov 9, 2016

I'm serious here: Can we make a "euuhmm" remover for Elon Musk? I love his vision, but can't stand his talks because he can't get a word out without "euhmm"-ing it.

frag · on Nov 9, 2016

lol... how about forking it on GitHub and training a model on Elon? I hate his euhmm thing so badly!! :D

eth0up · on Nov 9, 2016

<peeve>I hope this evolves to eventually become capable of removing "like"s, and maybe "uhm"s too, but certainly "like"s. I would willfully reside in the throat of a titanic bloviating German with influenza before listening to that haggard word uttered in every instance where punctuation or thoughtfulness ought have precedence. "Ahem" I can tolerate, even "uhm" and "you know"; but not all sentences require an analogy.</peeve>

eth0up · on Nov 9, 2016

For those apparently disturbed [1] by some attribute of the above, please see:

https://en.wikipedia.org/wiki/Guttural - note that the German language is, by some[2], considered guttural. Happens to be among my favorites too. "Ahem", being a guttural sound, I jovially compared to a hypothetically enlarged and garrulous German with influenza (for purposes of exaggeration), which one might fairly imagine sounding slightly more pharyngeal than usual - which I must add that I would truly not find offensive. As for any perceived assault on the ubiquitous abuse of the word "like", if there is anything I can add to exacerbate it, I'd be delighted to oblige.

1. Euphemism for forbidden reference regarding votes

2. http://www.huffingtonpost.com/2013/08/03/german-harsh-langua... - A little witzelsucht for your aphonogelia.

hackpert · on Nov 9, 2016

A friend and I tried to make a similar detector for removing squeaking sounds of whiteboard markers from videos. Although solutions like this do sound like a waste of time to some, I think they can go a long way in removing those tiny annoyances.

Kenji · on Nov 9, 2016

That sounds fantastic! I've recently been looking at mechanical keyboards again (Cherry-MX blue switches, I have browns at the moment) and those are quite noisy. An algorithm to detect and filter keyboard noise could benefit millions of people with VoIP, live streams, etc.

frag · on Nov 9, 2016

interesting

Hydraulix989 · on Nov 9, 2016

Next thing you need is ummm and uhhhh detection :)

sundvor · on Nov 9, 2016

Maybe I'm showing my age here, but I'd actually prefer a "liiiiiike" detector.

amelius · on Nov 9, 2016

I would be interested in an "ahem" remover, with the constraint that it should also work if the "ahem" is superimposed over other sound, e.g. music.

pmyjavec · on Nov 9, 2016

This really seems like a waste of time, or am I missing something?

yalooze · on Nov 9, 2016

First thought: helpful to know what words to ignore when trying to parse audio.

skoocda · on Nov 9, 2016

This is neat. Looks like you need more data though!

frag · on Nov 9, 2016

Exactly! With just 5 epochs and some hours on a low budget GPU I got 81% accuracy. Not bad at all, considering that no knowledge of MFC & Co. is required.

jaflo · on Nov 9, 2016

Is there a demo available?

frag · on Nov 9, 2016

The entire project is on GitHub. In the folder data/ there are some samples to "see" it in action. Otherwise you have to train it on your voice and apply it to whatever sound you like.

frag · on Nov 9, 2016

ahem... hi everybody this is ahem... Piggy ;)

farright · on Nov 9, 2016

Is it just a coincidence that this comes right at the end of Obama's tenure?