Hacker News | jharper's comments

What's everybody's background?

I'm currently coding in Ruby on Rails. Studied CS and Math at Appalachian State (graduated in 2004). Working at Enventys in Charlotte (http://www.enventys.com).

We could definitely meet up at the Enventys facilities ... cool place to hang out.


Human dupe-detection would be an excellent extension to this process.


Why not just compare the content at the other end of the link with the contents of existing links?

It wouldn't be that hard. Whenever a link is submitted, YC's server would visit the link, get the response, strip all HTML tags and whitespace from it, then hash whatever is left. It would then store this hash value with the link. Whenever a new story is submitted, it is likewise hashed, and a check is made for an existing link with the same hash value. If one exists, it's a dupe; if not, allow it.

This would be an extra check on top of the existing dupe-URL string match, of course. It still wouldn't catch every single thing, but it should eliminate quite a few easy dupes.

If that turns out to have a low success rate, try hashing the page title or maybe the HTTP headers.
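
A rough sketch of the hashing step, in Python only for illustration. It assumes the page has already been fetched and is passed in as a string; `content_fingerprint` and `DupeIndex` are made-up names, the regex tag stripping is deliberately crude, and SHA-256 is just one reasonable choice of hash. None of this is how YC's server actually works.

```python
import hashlib
import re

# Crude patterns: good enough for a sketch, not a real HTML parser.
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def content_fingerprint(html):
    """Strip HTML tags and all whitespace, then hash whatever is left."""
    text = TAG_RE.sub("", html)
    text = WHITESPACE_RE.sub("", text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DupeIndex:
    """Hypothetical in-memory store mapping content hash -> first URL seen."""

    def __init__(self):
        self._by_hash = {}

    def submit(self, url, html):
        """Return the earlier URL with identical content, or None.

        When no match exists, the new submission is recorded.
        """
        key = content_fingerprint(html)
        existing = self._by_hash.get(key)
        if existing is not None:
            return existing
        self._by_hash[key] = url
        return None
```

Usage would be something like `index.submit(url, fetched_html)` at submission time, run after the existing URL check; a non-`None` result points at the earlier story.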


A single comment or timestamp would change the hash.

Maybe the <title>, or the contents of the first <h1> or something would be a better proxy.
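
Something like the following is what I have in mind, as a sketch only. It uses Python's standard `html.parser`; `headline_fingerprint` is a made-up name, and lowercasing plus whitespace collapsing are my own assumptions about what counts as "the same" title.

```python
import hashlib
from html.parser import HTMLParser


class _HeadlineParser(HTMLParser):
    """Collects the text of the first <title> and the first <h1>."""

    def __init__(self):
        super().__init__()
        self.found = {}       # tag name -> text of its first occurrence
        self._current = None  # tag currently being captured, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") and tag not in self.found and self._current is None:
            self._current = tag
            self._buf = []

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace so layout changes do not matter.
            self.found[tag] = " ".join("".join(self._buf).split())
            self._current = None


def headline_fingerprint(html):
    """Hash the <title>, falling back to the first <h1>; None if neither exists."""
    parser = _HeadlineParser()
    parser.feed(html)
    parser.close()
    headline = parser.found.get("title") or parser.found.get("h1")
    if not headline:
        return None
    return hashlib.sha256(headline.lower().encode("utf-8")).hexdigest()
```

Since only the headline feeds the hash, a new comment or timestamp elsewhere on the page leaves it unchanged. The trade-off is that two different stories with the same title would collide.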


Yeah, that is what I was thinking when I added that last line.

For some reason I initially wasn't thinking about comments... so the title would be a much better proxy.


Are you suggesting that new submissions route through Mechanical Turk?

LOL.

One approach might be that when humans detect dupes, they could be reported. Click the "dupe" link, specify the URL(s) of the dupe(s), and submit. The oldest submission "wins", and the data could be used to train a Bayesian dupe detector. I imagine that you could start with a URL text match (it's the ends of the string that tend to be different), along with a check of the <title> of the supposed dupe page, and maybe the first 128 characters of the story text or something.
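
To make the three signals concrete, here is a minimal Python sketch that turns a pair of submissions into yes/no features. Everything named here (`Submission`, `url_core`, `dupe_features`) is hypothetical, and the normalization rules (drop a leading "www.", the query string, the fragment, and a trailing slash) are just my guess at which "ends" of a URL tend to differ. The Bayesian training itself is not shown; human dupe reports would supply the labels for feature dicts like these.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Submission:
    url: str
    title: str
    text: str


def url_core(url):
    """Keep host and path; drop scheme, 'www.', query, fragment, trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def _norm(s):
    """Lowercase and collapse whitespace before comparing text."""
    return " ".join(s.lower().split())


def dupe_features(a, b):
    """Boolean signals that a classifier could weigh for a candidate pair."""
    return {
        "same_url_core": url_core(a.url) == url_core(b.url),
        "same_title": _norm(a.title) == _norm(b.title),
        # Compare only the first 128 characters of the story text.
        "same_lead": _norm(a.text)[:128] == _norm(b.text)[:128],
    }
```

Exact matching on each signal is the simplest starting point; a fuzzier comparison could replace any of the three without changing the overall shape.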

It actually sounds like a fun project.


I just purchased "TRANCE" on iTunes ... pretty good so far.


Vultures (John Mayer)

Lose Yourself (Eminem)


Synergy of ...

- An Apple interface (streamlined and beautiful).

- Awesome performance/security/organization of Unix.


Is this too old to still be considered valid?

http://www.techworld.com/security/news/index.cfm?newsid=1798


Same + CocoaMySQL + Unfuddle


That's pretty fucking slick guys


BS Applied Math

BS Comp Science (both at Appstate)

Former Java programmer; current language is Ruby, professionally.


25 (B.S. Mathematics, B.S. Comp Sci)


That would be awesome. I wish our schools were run like businesses.

There would probably be fewer porn stars coming to speak for "diversity" week ...

