
> Is there a better way to surf the web, retrieve the source code of the pages and extract data from them ?

Yes, of course! To get the source code of a website you don't need a browser and all its complexity. It makes me sad how much unnecessary complexity we've accumulated for simple tasks.

If you want to extract data from web pages without requiring hundreds of megabytes for something like Electron, there are lots of scraping libraries out there. There are, for example, at least two good Python libraries: Scrapy[1] and BeautifulSoup[2].

[1]: https://scrapy.org/

[2]: https://www.crummy.com/software/BeautifulSoup/
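To illustrate how little machinery this takes, here is a sketch using only Python's standard-library `html.parser` (the same event-driven idea BeautifulSoup wraps in a friendlier API). The markup and tag choice are hypothetical:

```python
from html.parser import HTMLParser

class TitleCollector(HTMLParser):
    """Grab the text of every <h2> element (hypothetical target markup)."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.titles.append(data.strip())

html = "<div><h2>First post</h2><p>body</p><h2>Second post</h2></div>"
c = TitleCollector()
c.feed(html)
print(c.titles)  # -> ['First post', 'Second post']
```

No browser, no JS engine, just a stream of start-tag/data/end-tag events.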



This sounds nice, but many modern web pages use extensive client-side rendering. Sure, you can work around that without needing a full JS environment, but doing so is ad hoc and you wind up having to write complex code on a per-site basis.

I do a bunch of web scraping for hobby shit, and I'd love to be able to not have to shell out to Chromium for some sites, but unfortunately the modern web basically means you're stuck with it.


Also sites with some kind of 2FA / OAuth happening. This _looks_ like it would require logging in manually and then starting the scrape.


Correct me if I’m wrong, but neither one supports JavaScript-rendered pages?

You’re right about the overhead though; I’d stay miles away from Electron for scraping, but you’ll need more than a cURL wrapper to properly fetch data in all shapes and sizes :) Headless Chromium does the trick in that regard.


With web scraping you typically don’t want the visuals anyway. JS-rendered applications are usually easier to scrape because the data they render is available somewhere in a more raw, canonical format.
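A common case of that "raw format": many single-page apps embed their initial state as a JSON blob in a script tag. A sketch of pulling it out with only the standard library (the `window.__INITIAL_STATE__` variable name and the page snippet are hypothetical):

```python
import json
import re

# Hypothetical page snippet; many SPAs ship their data like this.
page = """
<script>
window.__INITIAL_STATE__ = {"items": [{"id": 1, "title": "Hello"}, {"id": 2, "title": "World"}]};
</script>
"""

# Capture the JSON object assigned to the state variable.
match = re.search(r'window\.__INITIAL_STATE__\s*=\s*(\{.*?\});', page, re.DOTALL)
state = json.loads(match.group(1))
titles = [item["title"] for item in state["items"]]
print(titles)  # -> ['Hello', 'World']
```

You get structured data directly, with no HTML parsing of the rendered output at all.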


Plenty of websites will only render their content, including text content, after some JavaScript runs, so to properly scrape them you do indeed need a browser to process the JS.


JavaScript-rendered pages load JS, which in turn calls some REST API to get data and uses it to render the contents. The web scraper stops scraping the HTML and instead calls and scrapes the REST API endpoint.
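That pattern can be sketched with the standard library alone; the endpoint URL and response shape below are hypothetical, and keeping the parsing separate from the fetch makes it testable offline:

```python
import json
from urllib.request import Request, urlopen

API_URL = "https://example.com/api/v1/products?page=1"  # hypothetical endpoint

def parse_products(payload):
    """Extract (name, price) pairs from the API's JSON response body."""
    data = json.loads(payload)
    return [(p["name"], p["price"]) for p in data["products"]]

def fetch_products(url=API_URL):
    # Set an explicit User-Agent; some servers reject urllib's default one.
    req = Request(url, headers={"User-Agent": "my-scraper/0.1"})
    with urlopen(req) as resp:
        return parse_products(resp.read())

# Offline demonstration with a canned response:
sample = b'{"products": [{"name": "Widget", "price": 9.99}]}'
print(parse_products(sample))  # -> [('Widget', 9.99)]
```

Finding the endpoint is usually just a matter of watching the browser's network tab while the page loads.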


Sure, but I meant building a portable app, for end users who are not coders, with a GUI, and for a dedicated purpose, like for example navigating Facebook.

So I will edit this question to: Is there a better way to code a portable application with a graphical user interface to scrape a given site?

Thanks for your comment.


Look up robotic process automation (RPA) and visual web scraping. Web scraping without having to write code is a well-established field. Just not very popular with the HN crowd, for obvious reasons.

Some examples would be Scrapinghub's Portia system and the Kantu startup. There are also established players like UiPath and Visual Web Ripper.


You can access the HTML of the website and use regular expressions.


> You can access the html of the website and use regular expressions.

Yes, but using regular expressions is the last and least recommended solution; please read: https://stackoverflow.com/questions/3577641/how-do-you-parse...


If you read that link, it’s only not recommended because people don’t know how to use it. Regular expressions are powerful.


Read the link. Just wondering how you managed to interpret this:

> regular expressions is a waste of time when the aforementioned libraries already exist and do a much better job on this.

as this:

> it’s only not recommended because people don’t know how to use it


> https://stackoverflow.com/questions/3577641/how-do-you-parse...

It says "can make regex fail when not properly written" etc.

There are different circumstances where using a premade parsing library versus raw regular expressions makes sense.

The answer is not binary.
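The classic failure mode is that a regex written against one rendering of the markup silently misses equivalent markup. A small sketch of the trade-off, using Python's stdlib parser as the robust alternative (the markup is made up):

```python
import re
from html.parser import HTMLParser

# Two renderings of the same link; real sites mix quote styles and attribute order.
doc1 = '<a href="/home">Home</a>'
doc2 = "<a class='nav' href='/home'>Home</a>"

# A naive regex written against doc1 silently returns nothing for doc2:
naive = re.compile(r'<a href="([^"]+)">')
print(naive.findall(doc1))  # -> ['/home']
print(naive.findall(doc2))  # -> []

# A parser normalizes quoting and attribute order for you:
class Hrefs(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.found += [value for name, value in attrs if name == "href"]

for doc in (doc1, doc2):
    p = Hrefs()
    p.feed(doc)
    print(p.found)  # -> ['/home'] for both documents
```

For a one-off grep of a page you control, the regex is fine; for anything that has to survive markup you don't control, the parser earns its keep.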


I think that was the joke


You're thinking of another post [0]. Not a joke either, really

[0] https://stackoverflow.com/questions/1732348/regex-match-open...


I can't not mention this infamous SO answer: https://stackoverflow.com/questions/1732348/regex-match-open...


> like for exemple navigating on facebook.

What would you want to scrape there which is not against their ToS and a violation of user privacy in general?


I wouldn't mind a personal scraper that pulls down the family updates and pictures I want and puts them somewhere private where I can see them.

Would get rid of the clutter and keep FB from some amount of shenanigans with my browser.


I can’t believe there are people still defending this scummy company.

Facebook has broken both legal and ethical “ToS” countless times and has no plans to stop.

Why do you consider what Facebook is doing OK, but a little web scraping for personal use so bad?


As the saying goes: "two wrongs don't make a right." Facebook's ToS is still a ToS. If you want to scrape the data that they've collected, either risk your account due to it being against the ToS or collect the data yourself.


Good luck with that :) Any modern website requires a JavaScript interpreter on the client side, so unless you provide some sort of JavaScript interpretation (which can be messy), you'll only be able to scrape simple content with Scrapy/BS.


I mean, I guess the point is that it allows you to scrape data after it has been rendered by JS.


You can always use something like ProxyCrawl to scrape JavaScript-rendered pages without using Electron. And it's compatible with Scrapy.


You forgot curl in C/C++, which is the most advanced tool out there.



