Mild vibration wouldn't be enough; it has to be at least annoying, if not painful... I would buy it in a second though, as this approach is probably the easiest way to self-train the "uhm"s and similar out.
Random trivia: "ahem" is a cultural, language-based thing. Other languages use different "words" for "ahem". The same goes for grunting sounds, for example the sound people make when they need to push something very hard. It's cultural, not natural; the sounds differ from culture to culture.
Getting off topic, but... by advertising, do you mean obvious commercials or submarine ones? I ask because my podcast player has advance 30s, advance 2m, and rewind 15s buttons. When a commercial-like ad starts, it takes me about 2-3 seconds to skip it. In other words, I don't need deep learning to skip ads for me.
That actually makes me question whether podcasts can make money, since it's so dang easy to skip all the ads.
A lot of podcast listening happens while driving, where operating the phone is dangerous and, in many places, illegal. I would especially like an option to skip advertising when I'm listening to two episodes from the same podcast and the closing ads from one episode are the same as the opening ads in the next episode.
The app I use can skip 30 seconds ahead, and that usually takes care of the ads (sometimes I need to skip several minutes ahead). However, it requires me to pull my phone out of my pocket, unlock it, navigate to the app, and tap skip. It's not a big deal, but it is a hassle, especially if there is more than one ad segment in the episode.
Pretty much state of the art; most end-to-end systems use these spectrograms on short time slices. The alternative is mel-frequency cepstral coefficients (MFCCs), which are used more in GMM-HMM speech recognition than in DNN-based systems.
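For anyone unfamiliar with the "spectrograms on short time slices" idea, here is a minimal sketch using NumPy. The frame length (256 samples), hop (128 samples) and Hann window are illustrative assumptions, not values taken from any particular system. MFCCs would go further: apply a mel filterbank to each column, take the log, then a DCT.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one column per short, overlapping time slice.

    Assumes len(signal) >= frame_len. Frame/hop sizes are illustrative.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping slices and taper each with the window
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft yields frame_len // 2 + 1 non-negative frequency bins per slice
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return mag.T  # shape: (freq_bins, time_slices)
```

For example, one second of a 1 kHz tone sampled at 16 kHz gives a 129 x 124 array, and each column peaks at bin 16 (16000 / 256 = 62.5 Hz per bin, and 1000 / 62.5 = 16). Many systems feed the log of this magnitude to the network rather than the raw values.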
Could sum1 write a GM script to replace all abbr.s with their ffs? The over use of abbr.s is one of the most XABs on Hacker News.
Could someone write a Greasemonkey script to replace all abbreviations with their full forms? The overuse of abbreviations is one of the most extremely annoying behaviours on Hacker News.
What's the difference between the false-color frequency spectrum image and the frequency spectrum itself? Assuming that there's one horizontal pixel per time slice, one vertical pixel per frequency bin, and that the color scale encodes the intensity to sufficient resolution, I'm not sure there is any difference.
The bitmap is just a convenient representation of a two-dimensional array, and PNG is a convenient lossless compression format!
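The "sufficient resolution" caveat is where image and array can actually differ: PNG itself is lossless, but an 8-bit bitmap quantizes each bin's intensity. A minimal sketch of that round trip, assuming a dB scale relative to the loudest bin and an -80 dB floor (both arbitrary choices for illustration, not taken from the project):

```python
import numpy as np

def to_uint8_image(spec, floor_db=-80.0):
    """Map a magnitude spectrogram to an 8-bit grayscale bitmap on a dB scale."""
    db = 20 * np.log10(np.maximum(spec, 1e-12) / spec.max())
    db = np.clip(db, floor_db, 0.0)  # anything quieter than the floor is lost
    return np.round((db - floor_db) / -floor_db * 255).astype(np.uint8)

def from_uint8_image(img, floor_db=-80.0):
    """Invert the mapping: pixel value back to dB relative to the maximum."""
    return img.astype(np.float64) / 255 * -floor_db + floor_db
```

Above the floor, the round trip is off by at most half a quantization step (80 / 255 / 2, about 0.16 dB), so for a classifier the bitmap is effectively the same data as the array.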
That's right. This is not state of the art. LSTMs are good for sequential data (like audio). But with the approach described here, prediction can be done in parallel and asynchronously, which is something ;)
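Why parallel prediction works: each window is classified on its own, with no hidden state carried over from the previous window as in an LSTM. The sketch below uses a hypothetical `classify_window` (a simple energy threshold standing in for a trained network) and a fixed window size; neither comes from the project.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_window(window):
    """Hypothetical stand-in for a trained model: flags loud windows.
    A real model would score a spectrogram slice instead."""
    energy = sum(x * x for x in window) / len(window)
    return energy > 0.1

def split(samples, size):
    """Non-overlapping fixed-size windows; a trailing partial window is dropped."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def predict_parallel(samples, size=4, workers=4):
    windows = split(samples, size)
    # No window depends on another, so they can be scored concurrently;
    # pool.map still returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_window, windows))
```

Threads keep the sketch simple. For pure-Python scoring they won't give real CPU parallelism because of the GIL; a `ProcessPoolExecutor`, or a model that releases the GIL (NumPy, GPU inference), would.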
I'm serious here: can we make a "euuhmm" remover for Elon Musk? I love his vision, but I can't stand his talks because he can't get a word out without "euhmm"-ing it.
<peeve>I hope this evolves to eventually become capable of removing "like"s, and maybe "uhm"s too, but certainly "like"s. I would willfully reside in the throat of a titanic, bloviating German with influenza before listening to that haggard word uttered in every instance where punctuation or thoughtfulness ought to take precedence. "Ahem" I can tolerate, even "uhm" and "you know"; but not all sentences require an analogy.</peeve>
For those apparently disturbed [1] by some attribute of the above, please see:
https://en.wikipedia.org/wiki/Guttural - note that the German language is, by some [2], considered guttural. It happens to be among my favorites too. "Ahem", being a guttural sound, I jovially compared to a hypothetically enlarged and garrulous German with influenza (for purposes of exaggeration), whom one might fairly imagine sounding slightly more pharyngeal than usual, which, I must add, I would truly not find offensive. As for any perceived assault on the ubiquitous abuse of the word "like", if there is anything I can add to exacerbate it, I'd be delighted to oblige.
1. Euphemism for forbidden reference regarding votes
A friend and I tried to make a similar detector for removing the squeaking of whiteboard markers from videos. Although solutions like this may sound like a waste of time to some, I think they can go a long way toward removing those tiny annoyances.
That sounds fantastic! I've recently been looking at mechanical keyboards again (Cherry MX Blue switches; I have Browns at the moment), and those are quite noisy. An algorithm to detect and filter keyboard noise could benefit millions of people using VoIP, live streams, etc.
Exactly! With just 5 epochs and a few hours on a low-budget GPU, I got 81% accuracy. Not bad at all, considering that no knowledge of MFC & Co. is required.
The entire project is on GitHub. The data/ folder contains some samples to "see" it in action. Otherwise, you have to train it on your voice and apply it to whatever sound you like.
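To give an idea of what "applying it" to a recording could look like once a detector has flagged the "uhm" windows, here is a minimal sketch. It assumes the model outputs one boolean per fixed-size window; the function `remove_flagged` and that output format are hypothetical, not the repository's actual code.

```python
def remove_flagged(samples, flags, window):
    """Drop every fixed-size window whose flag is True and splice the rest.

    Hypothetical interface: flags[i] is the detector's verdict for
    samples[i * window:(i + 1) * window]. Any trailing samples beyond
    the last flagged-or-not window are kept unchanged.
    """
    out = []
    for i, flagged in enumerate(flags):
        if not flagged:
            out.extend(samples[i * window:(i + 1) * window])
    out.extend(samples[len(flags) * window:])
    return out
```

Hard cuts like these can produce audible clicks at the splice points; a short crossfade between the kept segments would smooth them out.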