
It's not enough.

Privacy Badger (https://www.eff.org/privacybadger) blocks a bunch of stuff on StackOverflow.

Blocks Scripts:

www.google-analytics.com

edge.quantserve.com

b.scorecardresearch.com

Blocks Cookie Only:

ajax.googleapis.com

www.gravatar.com

i.stack.imgur.com

Blocks nothing:

cdn.sstatic.net

Ad content relevancy is one part of the equation, but not allowing third parties to track my actions on your site is the other, more important part (to me at least). Privacy Badger is the only ad blocker I use (besides not letting Flash run automatically), so I will happily view ads that don't involve tracking me.



The state of website statistics is a sad one. The better site analysis engines were bought years ago by Google (becoming Google Analytics) and the like, and the lesser ones died. A few are still around and kicking, including some quite good ones, but most advertisers, sponsors, etc. will only trust third-party analytics in determining advertising rates, sponsorship levels, etc.

Essentially, users who block analytics become a net negative for many sites as they add no positive value to the site operator and are just a drain on resources. There are exceptions, of course, in the case of user-generated content participation where submitting content, making comments, etc may be a draw to revenue-generating visitors. But on many sites, that user blocking ads and analytics is only hurting the site operator. If it becomes more popular - say as the default setting in an ad blocker - I'd wager we'll have some sites start to block those users at some point in the future.


I can't help but wonder...how many real-world businesses could benefit from the same type of invasive tracking that online businesses seem to think they have a right to subject us to?

For some reason, though, we don't allow that sort of privacy violation in our actual lives.

However, most real-world businesses manage to do quite well even with our rude dismissals of their desire to track us.



> For some reason, though, we don't allow that sort of privacy violation in our actual lives.

Not so much. Your phone carrier tracks all your movements. Your credit card company knows every swipe. Your bank sells your information to retailers. The leasing you did on that car got you into a database of new car owners. Right after you married you started getting offers from Home Depot. And with your mortgage you started getting calls from brokers, asking if you want to put it back in the rental market...

The "real world" may not track your anonymous movements yet (NYPD cameras, anyone?), but other than that I don't see much difference. In fact it's the opposite; the non-online world is more opaque, harder to opt out of, and much more invasive, as it usually involves PII.

Another difference to consider: the motivations. In the 'real world', we used to pay for goods. In the past you'd walk to your newsstand and buy the NYT edition for $2.50 (with ads). Nowadays, people feel outraged[1] if the exact same goods aren't given away for free (as in beer), and with no trackers, no registration, no ads, no anti-ad-blockers.

Something has to give.

[1] http://the-digital-reader.com/2015/12/16/95221/


Even with paper magazines, people were fed up and pushing back some time ago. I remember seeing 'this magazine is less than 50% ads' or the like as a point of pride, so at some point salesmen thought that was a good enough differentiator.


Back before the Internet (browsers) a big reason I bought/subscribed to magazines was for the ads. That was how I found out about most new products.

Today I rarely find out about anything from ads. Instead I find out from sites like HN, Slashdot, etc., or from social network connections and groups.


> I can't help but wonder...how many real-world businesses could benefit from the same type of invasive tracking that online businesses seem to think they have a right to subject us to?

> For some reason, though, we don't allow that sort of privacy violation in our actual lives.

> However, most real-world businesses manage to do quite well even with our rude dismissals of their desire to track us.

Are you sure you're not being tracked? There are technologies like Prism[1] that allow stores to track people, where they go, how long they stay there, etc. On the lower-tech side, point cards[2] are also used to track purchasing habits.

[1] https://prism.com/

[2] http://consumerist.com/2012/02/17/target-figures-out-teen-gi...


> For some reason, though, we don't allow that sort of privacy violation in our actual lives.

Nomi is a startup that does exactly this. They use open WiFi networks to harvest MAC addresses of cell phones and track people's locations that way.

The only way to "opt out" is to register your MAC address with them. (Aside from just disabling WiFi on your phone, of course).


This is why iOS came out with a MAC address randomiser when it is passively scanning for networks.


Well, I won't be using "Nomi" or recommending them to my friends anytime soon, so good luck to them and their business model.


Um ... it doesn't work exactly like that. If you leave your phone's WiFi turned on, you and your friends will be using Nomi sometime soon (so exciting, amiright?)! You see, the store/eatery/prison/airline/casino/whatever just puts the Nomi WiFi access point on their premises, and when you come in your phone tries to connect to it, and BLAMO, it harvests your MAC address. Now they have a unique identifier for 'you' (your phone's MAC), and they can keep track of how many times you go into the place, where else you like to go, what sections you like to browse (just a few more of the gadgets scattered in the store), if you've been to their marketing events, etc. etc. etc. Oh hey, since it harvests your MAC address, they can also tell who manufactured your phone. 'You' have an Apple phone? Sweet! We've got ourselves a (probably) high-end customer.

Welcome to the future :(
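
For what it's worth, the manufacturer lookup described above is just a prefix match on the MAC. Here is a minimal Python sketch, assuming a colon- or hyphen-separated address. The vendor table is a made-up placeholder (a real lookup would use the IEEE OUI registry), the function names are mine, and none of this is Nomi's actual code. The second helper checks the locally-administered bit, which randomized addresses like the iOS ones mentioned upthread normally have set.

```python
# Hypothetical vendor table for illustration only; real data would come
# from the IEEE OUI registry.
EXAMPLE_VENDORS = {"00:11:22": "ExampleCo (placeholder)"}


def parse_mac(mac: str) -> bytes:
    """Turn 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' into 6 raw bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def oui(mac: str) -> str:
    """First three octets: the prefix assigned to a manufacturer."""
    return parse_mac(mac)[:3].hex(":").upper()


def is_locally_administered(mac: str) -> bool:
    """True if bit 0x02 of the first octet is set, meaning the address is
    not a globally unique vendor-assigned one (randomized MACs set it)."""
    return bool(parse_mac(mac)[0] & 0x02)


def guess_vendor(mac: str) -> str:
    """Prefix lookup, skipped for addresses that carry no vendor prefix."""
    if is_locally_administered(mac):
        return "unknown (locally administered, possibly randomized)"
    return EXAMPLE_VENDORS.get(oui(mac), "unknown")
```

So a randomized address tells the tracker nothing about the vendor, while a factory address gives the prefix away for free.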


Also, because no one gives a damn, your phone will also shout out the names and MACs of access points it has connected to before. I have no idea why Android/iOS haven't disabled / geofenced that yet.


Phones do this to authenticate faster to known access points.

Instead of waiting for an SSID beacon, they just broadcast their known networks and if one of them is in range, the AP will reply. It's all driven by user demand to connect to their WiFi network faster.

Also, AFAIK, this is required behaviour to join networks that don't broadcast their SSID (e.g. "hidden" networks).

I see two solutions to this problem, but neither is really tenable from a user perspective:

1) run the GPS all the time and geotag known APs[1]

2) leave the WiFi radio on all the time and passively listen for SSID broadcasts[2]

[1] doesn't work indoors, or if the location of an AP is changed (e.g. 4G hotspot). Will also have significant impact on battery runtime, and likely to be abused by ad companies

[2] will have significant impact on battery runtime


It's hard to solve this part of the problem using current WiFi protocols. I think this is closely related to other problems that people have studied, and I'm sure protocol improvements could be made using public-key crypto or HMAC. As an off-the-cuff example, when you join an encrypted network, it could tell you (over the encrypted channel) a shared secret to use when reconnecting. Then you could broadcast HMAC(secret, current time) or (nonce, HMAC(secret, time, nonce)), and if the WiFi network recognizes one of those broadcasts as directed to it, it could reply. An eavesdropper who doesn't know the secret wouldn't be able to determine which base station the mobile device was trying to contact.
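
To make the off-the-cuff idea a bit more concrete, here is a rough Python sketch of the (nonce, HMAC) variant. Everything specific in it is my assumption, not part of any WiFi standard: SHA-256 as the hash, a 16-byte nonce, a 30-second acceptance window, the timestamp sent in the clear next to the nonce, and the function names themselves.

```python
import hashlib
import hmac
import os
import struct
import time

NONCE_LEN = 16  # bytes; assumed size
MAX_SKEW = 30   # seconds of clock drift the AP tolerates; assumed value


def _tag(secret: bytes, timestamp: int, nonce: bytes) -> bytes:
    """HMAC-SHA256 over the big-endian timestamp followed by the nonce."""
    msg = struct.pack(">Q", timestamp) + nonce
    return hmac.new(secret, msg, hashlib.sha256).digest()


def make_probe(secret: bytes, now=None):
    """Client side: build (nonce, timestamp, tag) to broadcast instead of an SSID."""
    timestamp = int(time.time()) if now is None else now
    nonce = os.urandom(NONCE_LEN)
    return nonce, timestamp, _tag(secret, timestamp, nonce)


def ap_recognizes(secret: bytes, nonce: bytes, timestamp: int, tag: bytes, now=None) -> bool:
    """AP side: accept only fresh probes whose tag matches this AP's secret."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_SKEW:
        return False
    return hmac.compare_digest(tag, _tag(secret, timestamp, nonce))
```

An access point holding a different secret computes a different tag and stays silent, and an eavesdropper sees only random-looking bytes. A captured probe can still be replayed inside the time window, so a real design would need more than this.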


From the description above, they probably don't interact with users in a way that the users can see or control.


> For some reason, though, we don't allow that sort of privacy violation in our actual lives.

Actually, people do. Take for example supermarket loyalty cards: every purchase is logged. It compounds with online tracking too: when you then log in to the online site to check your points or view offers, the information about your purchases can then be linked to web analytics and other profiled data.


> how many real-world businesses could benefit from the same type of invasive tracking ... we don't allow that sort of privacy violation in our actual lives.

We do, but apparently many even on HN don't realize it. Brick and mortar stores track customers through their phone signals (providing unique identifiers and location), cameras that can track where people go in the store and even what they look at, and more. Here are a couple of good resources where I've learned about it (though they cover much more than those issues):

* http://www.mediapost.com/

* http://www.iab.com/


> I can't help but wonder...how many real-world businesses could benefit from the same type of invasive tracking that online businesses seem to think they have a right to subject us to?

Most companies that collect data use it to make money. That's not true ONLY on the web.

> For some reason, though, we don't allow that sort of privacy violation in our actual lives.

Really? Do you have any grocery store loyalty cards? Do you use a credit card? That data is shared and sold. Do you do any activities that show up on your credit report, like pay bills to a utility? That data is shared and sold. That's how the credit rating agency got it. Do you have a set-top box? It's probably selling data on what shows you watch. Do you have a cell phone? Location data.

If you actually restricted yourself to purchases which don't involve anybody who shares your data, you'd be living in a cave.


> For some reason, though, we don't allow that sort of privacy violation in our actual lives.

Because shops never monitor footfall in certain aisles. Of course real-world businesses do this. Plus all those people who carry loyalty cards.



> most advertisers, sponsors, etc will only trust third party analytics

Which is ridiculous, because they're about as easy to game.


"but most advertisers, sponsors, etc will only trust third party analytics in determining advertising rates, sponsorship levels, etc."

Then you are losing at the negotiation table and trying to blame "the users". Those advertisers are simply wrong. No 3rd party analytics service can provide more accurate data than server logs. None.

"I'd wager we'll have some sites start to block those users at some point in the future."

And I would wager that it won't help their business increase revenue or decrease expenses.


> No 3rd party analytics service can provide more accurate data than server logs. None

This is very, very wrong. Parsing server logs can give you a lot of data, but collecting the data directly from the client (à la Google Analytics) can give you so, so much more: data about the client itself, demographic information (beyond IP address locations), and information regarding their interests when hooked into a system that's present on many other sites.

Server logs are a great starting point but the amount of relevant data a third party service can give you is so great most are willing to make the tradeoff and use them. I'd rather have all analytics in house but that extra data can be invaluable.


Yes, an individual website can't track you throughout the whole web and compile a demographics and interest dossier. Only advertisers have that capability because their JavaScript is served up by thousands of websites. But that's the whole conundrum! Advertisers want to track, people do not want to be tracked, thus ad blockers.


But it's not just that. Server logs do not give you other information that client tracking can gain, such as regional and client information.

In all honesty, I would be fine losing some of the information that only comes from creating a dossier on a user and bringing the tracking in house. It's just a lot of work, and services like Google Analytics give that to you for free, so it's hard to ignore, especially at a start-up.


> No 3rd party analytics service can provide more accurate data than server logs. None.

Google Analytics provides information that goes far beyond what the server logs can tell you. The logs will give you the most accurate view of the requests to your server but that's all. You can't get insights about bounce rate, behavior, goal completion, conversion rate, etc. from server logs. To rely only on mining server logs would be a huge step backwards for a business.


All those things are calculable from web logs.


How so? I'm willing to accept that I'm wrong if you can show any server log analytics package that is capable of that. It doesn't seem possible.


If each request has a session cookie in the log it's easy to track a user's movements through your site, so you can figure out bounce rate, conversion etc no problem. Things like Piwik have pretty dashboards to generate whatever reports you want from server logs.
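
As a rough illustration of the session-cookie approach (not how Piwik does it internally), here is a short Python sketch. The log format is an assumption for the example: one line per request with whitespace-separated timestamp, session id, and path. A bounce is taken to mean a session with exactly one logged page view, and the goal path is hypothetical.

```python
from collections import defaultdict


def sessions_from_log(lines):
    """Group requested paths by session id, preserving log order.

    Assumed line format: '<timestamp> <session_id> <path>'.
    Lines that do not have exactly three fields are skipped.
    """
    sessions = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        _timestamp, session_id, path = fields
        sessions[session_id].append(path)
    return dict(sessions)


def bounce_rate(sessions):
    """Share of sessions that logged exactly one page view."""
    if not sessions:
        return 0.0
    bounces = sum(1 for paths in sessions.values() if len(paths) == 1)
    return bounces / len(sessions)


def conversion_rate(sessions, goal_path):
    """Share of sessions that requested the goal path at least once."""
    if not sessions:
        return 0.0
    converted = sum(1 for paths in sessions.values() if goal_path in paths)
    return converted / len(sessions)


if __name__ == "__main__":
    log = [
        "2015-12-17T10:00:00 s1 /",
        "2015-12-17T10:00:05 s2 /",
        "2015-12-17T10:00:30 s1 /pricing",
        "2015-12-17T10:01:10 s1 /signup/done",
    ]
    by_session = sessions_from_log(log)
    print(bounce_rate(by_session), conversion_rate(by_session, "/signup/done"))
```

This only sees what reaches the server, so anything the user does inside a page without triggering a request stays invisible to it.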


Piwik looks interesting. I'm playing with the demo right now. It does look like they support goals[1]. I can't know for sure how effective it is but I will concede that log analytics is more powerful than I believed.

[1]: http://piwik.org/docs/tracking-goals-web-analytics/

For anyone who wants to play with the demo: http://demo-log-analytics.piwik.org/index.php?module=CoreHom...


Designing good session identifiers for logging is not trivial. More importantly, with a lot of events happening in JavaScript, a lot of user interactions occur beyond the visibility of your server logs. Tracking JS events is also clearly possible but, again, non-trivial. I've been in more than one situation where we trusted GA more than our own logs.


Some stuff, but if you want to emulate GA events you'll have to also implement some additional frontend code and a backend to log those events. I hope this is the direction that we see analytics moving. The information analytics provides developers is great, but I don't like the fact that this information is usually given to multiple third parties by using their analytics engines.


No, they're not. As an example, bounces are commonly counted on the last page, where the user left without doing anything. How can you tell that from a simple server log line that says the page loaded? You can't see if they clicked on something, how long they read the page, or what other actions they took.

Most sites today also go far beyond basic pageviews and track all kinds of events on the page like scrolling rates, reading time, what other headlines you clicked or hovered over, etc. This is not possible without JS tracking.


This is naive. You can't use raw server logs without filtering out bots.

Also, advertisers don't trust site owners to report what's in their server logs accurately, when there is money on the line. It would be easy to cheat.

So that's why you need an independent third party.
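
To show what the crudest form of bot filtering looks like, here is a minimal Python sketch. It assumes the log has already been parsed into (path, user agent) pairs, and the marker list is an assumption for illustration, not a vetted blocklist. It only catches bots that announce themselves, which is part of why the trust problem remains.

```python
# Assumed substrings that self-identifying crawlers tend to put in the
# User-Agent header; illustrative, not exhaustive.
BOT_MARKERS = ("bot", "crawler", "spider")


def looks_like_bot(user_agent: str) -> bool:
    """Treat empty user agents and ones containing a marker as bots."""
    ua = user_agent.lower()
    return not ua or any(marker in ua for marker in BOT_MARKERS)


def human_hits(hits):
    """Keep only the (path, user_agent) pairs that do not look like bots."""
    return [(path, ua) for path, ua in hits if not looks_like_bot(ua)]
```

A scraper sending a browser-like user agent passes straight through, so real filtering needs more signals than this.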


It isn't so much a technical issue as it is one of trust and efficiency. GA is familiar, and easy to compare across different sites without worrying about how exactly they're mining their logs. At best, they have to figure out how to make efficient comparisons. At worst, they have to worry about sites inflating their numbers--either through fabrication or just optimistic tweaking of how data is presented. With GA, they don't really have to worry about outright fabrication or reliability. And while there's a ton of data to be teased out of server logs--especially if someone tries to correlate it with other user data they already have--for most advertisers, that doesn't really matter. They don't want to invest the time it'd take to make use of it with all potential properties they want to advertise on.

Say what you will about GA and privacy issues, there's a reason why it's become so commonplace. Not just for site owners, but advertisers as well.


> No 3rd party analytics service can provide more accurate data than server logs. None.

For raw pageviews, sure, but try tracking a SPA with server logs, or seeing what the most common screen resolutions are, or seeing what percent of your visitors have Flash installed.


Server logs are easily faked. There's a lot of mistrust in the industry (well-earned, due to many scams in the past). Third-party web analytics providers, implemented via plain-text JavaScript anybody can audit, are fair.

(That ignores of course that server logs are virtually useless for determining a user's behavior on a single page app. It's not the ONLY reason server logs aren't used, but perhaps is the most important reason.)


> No 3rd party analytics service can provide more accurate data than server logs. None.

Only if those servers are personally managed by analytics experts.


> The state of website statistics is a sad one.

How does Piwik fare? I briefly tried it once and loved it. I've never tried GA. Am I just blissfully unaware of a whole world of a difference?


Piwik, certainly when self-hosted, is considered completely fine by most people concerned about these issues. It's still blocked when privacy tools are set to max, which is understandable. But it's viewed as perfectly ethical.


Not if you trigger Piwik views from the server with the session id and never include the client JS.


Self-hosted Piwik is _probably_ privacy compliant for most people. But it's a steaming pile of shite. Try Snowplow Analytics instead and host that yourself.


None of the things you listed would be things most consumers care to block. In fact most are things they either want or could benefit from (analytics.)


most consumers want analytics, profiling, retargeting and tracking?

The only time I've heard "normal folks" talk about these sorts of things, it's about how creepy they are, never once about "how that great ad was exactly what they wanted". That's assuming any awareness on their part at all.

Silent ignorance/indifference isn't consent, approval or desire. People who pretend it is are almost certainly making self-serving arguments.


Come, now. Everyone knows that the most common dying words of people these days are "I wish I'd spent more time engaging with my favorite brands".


I do agree with the cookie blocking on those Google APIs and Gravatar pages.


Taking the parent's statement in combination with yours, it sounds like you're not "most consumers".


"Privacy Badger blocks a bunch of stuff on StackOverflow."

"Here's a list of above-board analytics services and CDNs."


I like the fact that Privacy Badger blocks analytics scripts and CDN cookies.

I don't want to send my StackOverflow activity to multiple different third parties. I am fine with each website operator having my traffic info for that site. I am less fine with them sharing that traffic info with third parties who can easily aggregate that information across most sites that I visit.


> i.stack.imgur.com

what's imgur doing on stackoverflow?


Whenever you post an image on a question or answer, and it's not already hosted somewhere else, i.e. you're uploading from your computer, that image is hosted on imgur. Stack Exchange has a deal with them like that. You can see this on the image upload dialog.


didn't know that, thanks


View-source shows that it's used for images in tags.

  <img src="//i.stack.imgur.com/tKsDb.png" height="16" width="18" alt="" class="sponsor-tag-img">


More significantly, it's the official host for all user-uploaded images on the site.



