How does this handle sites that are rendered entirely or nearly entirely by JavaScript client side? That's my only real concern when looking at a new scraping solution; almost everything else can be worked with.
"Online-reviews firm Yelp Inc. alleged that Google is breaking a promise it made as part of a 2012 regulatory settlement to not scrape content from certain third-party sites including Yelp, escalating its yearslong battle against the search giant."
What's not so peculiar is that a 5 year old HN account with only 14 karma, and a profile that starts with
"Google has made our lives incredibly rich with knowledge and insight into our modern world. Google is the nicest monopoly out there and we should all support American corporations. I promise to defend Google from all attacks."
is quite so insistent that web scraping a company that specialises in web scraping is inevitably such a bad thing...
It's quite obvious that at least one side of this discussion is a biased "fanboy" (as you put it)... I guess at least you're transparent about your stake in this matter in your profile...
Reporting illegal activity is not nice? I'm sorry you got caught. Defeating captcha and bypassing IP block from Google servers would fall under CFAA, which they nailed 3taps for.
Yeah, we know you are not a lawyer and neither am I, but writing IANAL doesn't really exempt you from the law... I'm sure it doesn't take a lawyer to realize this.
See 3taps vs Craigslist and the CFAA ruling. Public data is fine; it's just when you break their captcha and bypass their IP ban that it becomes an issue. Which you've pretty much admitted in your reddit thread.
You can try defending yourself in court against the world's best legal team. Any capitulation now is better than losing your entire life savings in the subsequent legal defense fees alone.
Edit: what's the matter? Run out of alt-nicks? LMAO. Dude, you make it way too obvious. Inactive accounts suddenly converging on this particular thread, using the same argument over and over to censor my comments... not even bothering to change writing styles...
IANAL and this does not constitute legal advice. Craigslist v. 3Taps has been overturned. There are more recent rulings.
On another level, EFF fights to ensure computer access to public data stays legal. It's a worthy fight.
We have no relationships with the other commenters. Anyway, I won't reply to this thread anymore. You can reach me via email, julien _at_ serpapi.com, if you want to continue this discussion.
> 3Taps and PadMapper were companies that partnered to provide an alternative user interface for browsing Craigslist's housing ads. In doing so, they scraped Craigslist's site for data, which Craigslist did not approve of. Craigslist sent both companies a cease-and-desist letter and blocked their IP addresses, but this did not stop 3Taps from scraping through other IP addresses. Craigslist then sued, resulting in this case.
I think you may be in the deep end now. Good luck with the sharks.
Your audacity is admirable but it's foolhardy. You based your entire business on an incomplete legal ruling without consulting a lawyer, I presume.
How the fuck would you feel if somebody took your life's work, slapped their logo on it and started selling it? I'm sure you can see the ethical issues of this and the legal ramifications.
about: https://www.youtube.com/watch?v=-hL4lMcIqS4
Google has made our lives incredibly rich with knowledge and insight into our modern world. Google is the nicest monopoly out there and we should all support American corporations. I promise to defend Google from all attacks.
Google is our teacher that just gives and gives without ever asking for money, a nerdy friend that knows everything about you and your friends, it is very well a prototype of Singularity.
Without Google there would be no America. Without Google HN would not exist. Without Google Youtube and silicon valley as we know it would never have happened.
Praise be upon Larry Page and Sergey Brin, for being put on this planet to bring about Google. Praise be upon God for allowing these two smart kids to come together. Praise be upon the risk takers from Sequoia that infused money.
Praise be upon America, our free market, our freedom that reaches sea to sea.
GOD BLESS AMERICA GOD BLESS GOOGLE
In case anyone was wondering who the troll here was...
Why would you care so much as to personally write that comment? What stake do you own in serpAPI? It infringes Google's trademark and violates the Terms of Service.
If it was okay for SerpAPI to do it then everybody would be launching similar APIs on top of existing products. It's illegal and you are profiteering off the work of others.
I can't believe I am being downvoted for this. I am going to have to expedite the process now and reach out directly to an executive I know working at Google.
Lol, nah, I've been logged in for some time, came across your wonderful comment, and found it funny that the internet white knight stereotype exists at this extreme. In this case, someone willingly narcing on a mega corporation's behalf, for no compensation, over little damage to said mega corporation monetarily/competition-wise, and only to serve what they see as just and proportional consequences (in reality overblown) for a fellow HN member.
How would you feel if somebody took the product you spent years on, slapped their own logo on it and started selling it? Intellectual theft is theft regardless... and here you are justifying it. WOW.
Are you seriously defending OP's operation? Even when it's questionable at best ethically? Google doesn't want people using their search results to make money off of it. If they did, they would release it like the Bing API. I'm sure you can see why this would be worthy of Google's attention and why enforcement of such rules is necessary. Otherwise, spamming marketers would easily exploit something like serp api; you are not fooling anybody.
In any case, I saved OP from an even bigger financial ruin. He might just get off with a cease & desist letter. It depends on how much revenue he generated using Google's product without licensing. It's clear OP found out about 3taps vs craigslist, a heavily debated ruling on HN, just today.
Google being a monopoly doesn't justify civil disobedience here, which seems to be what some are arguing for.
Apparently he no longer owns scrape.it. He previously replied to my comment to say he sold it and to threaten legal action. He deleted the comment and sent me the following email:
> If this comment is not removed immediately I am going to have to involve legal. Scrape.it is no longer under my direct control, it has been acquired as of Feb 2019.
> Cheers,
> John Kim
> Scrape.it Founder
So I guess he was the founder but is no longer involved.
It's not doxxing. I found his name through his past HN posts prior to his threatening email. Anyone can figure out who he is with less than 10 min of searching.
e.g.:
- https://github.com/serpapi/test-knowledge-graph-desktop/tree...
- https://travis-ci.org/serpapi/test-knowledge-graph-desktop
- https://github.com/serpapi/test-organic-results-desktop/tree...
- https://travis-ci.org/serpapi/test-organic-results-desktop
- https://github.com/serpapi/test-news-results-desktop/blob/ma...
- https://travis-ci.org/serpapi/test-news-results-desktop
Producing reliable scrapers and parsers is very hard. Testing as much as possible is the only way to go. Smart use of JSON Schema too, as in Spidermon.
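To make the schema point concrete, here is a rough sketch of the kind of check I mean: validate every parsed item against a schema so a layout change shows up as a failing test instead of silently bad data. This is a hand-rolled subset of JSON Schema (only `required` and `type`) using the standard library, not Spidermon's API or the `jsonschema` package, and the field names (`position`, `title`, `link`, `snippet`) are my assumptions, not anything taken from the repos above.

```python
# Hypothetical schema for one organic search result; field names are assumptions.
ORGANIC_RESULT_SCHEMA = {
    "required": ["position", "title", "link"],
    "properties": {
        "position": {"type": "integer"},
        "title": {"type": "string"},
        "link": {"type": "string"},
        "snippet": {"type": "string"},
    },
}

# Maps JSON Schema type names to Python types (only the subset used here).
_TYPES = {"integer": int, "string": str, "number": (int, float), "boolean": bool}


def validate_item(item, schema):
    """Return a list of error strings; an empty list means the item passed."""
    errors = []
    # Every required field must be present in the parsed item.
    for field in schema.get("required", []):
        if field not in item:
            errors.append(f"missing required field: {field}")
    # Fields that are present must have the declared type.
    for field, rules in schema.get("properties", {}).items():
        if field not in item:
            continue
        value = item[field]
        expected = _TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        wrong_type = not isinstance(value, expected) or (
            isinstance(value, bool) and rules["type"] != "boolean"
        )
        if wrong_type:
            errors.append(
                f"{field}: expected {rules['type']}, got {type(value).__name__}"
            )
    return errors
```

In a test suite you would run the parser on saved HTML fixtures and assert that `validate_item` returns an empty list for every result. For anything beyond a toy, a full JSON Schema validator is the better choice.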