Ad content relevancy is one part of the equation, but not allowing third parties to track my actions on your site is the other, more important part of the equation (to me at least). Privacy Badger is the only ad blocker I use (besides having Flash not run automatically), so I will happily view ads that don't involve tracking me.
The state of website statistics is a sad one. The better site analysis engines were bought years ago by Google (becoming Google Analytics) and the like, and the lesser ones died. A few are still around and kicking, including some quite good ones, but most advertisers, sponsors, etc. will only trust third-party analytics in determining advertising rates, sponsorship levels, etc.
Essentially, users who block analytics become a net negative for many sites as they add no positive value to the site operator and are just a drain on resources. There are exceptions, of course, in the case of user-generated content participation where submitting content, making comments, etc may be a draw to revenue-generating visitors. But on many sites, that user blocking ads and analytics is only hurting the site operator. If it becomes more popular - say as the default setting in an ad blocker - I'd wager we'll have some sites start to block those users at some point in the future.
I can't help but wonder... how many real-world businesses could benefit from the same type of invasive tracking that online businesses seem to think they have a right to subject us to?
For some reason, though, we don't allow that sort of privacy violation in our actual lives.
However, most real-world businesses manage to do quite well even with our rude dismissals of their desire to track us.
> For some reason, though, we don't allow that sort of privacy violation in our actual lives.
Not so much. Your phone carrier tracks all your movements. Your credit card company knows every swipe. Your bank sells your information to retailers. The lease you took on that car got you into a database of new car owners. Right after you married you started getting offers from Home Depot. And with your mortgage you started getting calls from brokers, asking if you want to put it back in the rental market...
The "real world" may not track your anonymous movements yet (NYPD cameras, anyone?), but other than that I don't see much difference. In fact it's the opposite; the non-online world is more opaque, harder to opt-out, and much more invasive, as it usually involves PII.
Another difference to consider: the motivations. In the 'real world', we used to pay for goods. In the past you'd walk to your newsstand and buy the NYT edition for $2.50 (with ads). Nowadays, people feel outraged[1] if the exact same goods aren't given away for free (as in beer), and with no trackers, no registration, no ads, no anti-ad-blockers.
Even with paper magazines, people were fed up and pushing back some time ago. I remember seeing 'this magazine is less than 50% ads' or the like as a point of pride, so at some point salesmen thought that was a good enough differentiator.
> I can't help but wonder...how many real-world businesses could benefit from the same type of invasive tracking that online businesses seem to think they have a right to subject us with?
> For some reason, though, we don't allow that sort of privacy violation in our actual lives.
> However, most real-world businesses manage to do quite well even with our rude dismissals of their desire to track us.
Are you sure you're not being tracked? There are technologies like Prism[1] that allow stores to track people, where they go, how long they stay there, etc. On the lower-tech side, point cards[2] are also used to track purchasing habits.
Um ... it doesn't work exactly like that. If you leave your phone's WiFi turned on, you and your friends will be using Nomi sometime soon (so exciting, amiright?)! You see, the store/eatery/prison/airline/casino/whatever just puts the Nomi WiFi access point on their premises, and when you come in your phone tries to connect to it, and BLAMO, it harvests your MAC address. Now they have a unique identifier for 'you' (your phone's MAC), and they can keep track of how many times you go into the place, where else you like to go, what sections you like to browse (just a few more of the gadgets scattered in the store), if you've been to their marketing events, etc. etc. etc. Oh hey, since it harvests your MAC address, they can also tell who manufactured your phone. 'You' have an Apple phone? Sweet! We've got ourselves a (probably) high-end customer.
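To make the "who manufactured your phone" part concrete, here is a rough sketch of how a vendor guess falls out of a MAC address: the first three octets are the manufacturer prefix (OUI). The vendor table below is made up for illustration, and the function names are mine; a real lookup would use the IEEE registry.

```python
from typing import Dict, Optional

# Made-up mapping for illustration only; not real registry data.
EXAMPLE_OUI_VENDORS: Dict[str, str] = {
    "00:11:22": "Example Phone Co. (made up)",
}


def normalize_mac(mac: str) -> str:
    """Return the MAC as upper-case, colon-separated octets."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def oui(mac: str) -> str:
    """First three octets, i.e. the manufacturer prefix."""
    return normalize_mac(mac)[:8]


def is_locally_administered(mac: str) -> bool:
    """True if the 'locally administered' bit of the first octet is set.

    Such addresses are not vendor-assigned, so an OUI lookup tells you nothing.
    """
    first_octet = int(normalize_mac(mac)[:2], 16)
    return bool(first_octet & 0x02)


def guess_vendor(mac: str,
                 table: Dict[str, str] = EXAMPLE_OUI_VENDORS) -> Optional[str]:
    """Look the prefix up in the table; None if unknown or not vendor-assigned."""
    if is_locally_administered(mac):
        return None
    return table.get(oui(mac))
```

Calling `guess_vendor("00-11-22-33-44-55")` returns the made-up vendor from the table, and anything not in the table returns `None`.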
Also, because no one gives a damn, your phone will also shout out the names and MACs of access points it has connected to before. I have no idea why Android/iOS haven't disabled / geofenced that yet.
Phones do this to authenticate faster to known access points.
Instead of waiting for an SSID beacon, they just broadcast their known networks and if one of them is in range, the AP will reply. It's all driven by user demand to connect to their WiFi network faster.
Also, AFAIK, this is required behaviour to join networks that don't broadcast their SSID (e.g. "hidden" networks).
I see two solutions to this problem, but neither is really tenable from a user perspective:
1) run the GPS all the time and geotag known APs[1]
2) leave the WiFi radio on all the time and passively listen for SSID broadcasts[2]
[1] doesn't work indoors, or if the location of an AP is changed (e.g. 4G hotspot). Will also have significant impact on battery runtime, and likely to be abused by ad companies
[2] will have significant impact on battery runtime
It's hard to solve this part of the problem using current WiFi protocols. I think this is closely related to other problems that people have studied and I'm sure protocol improvements could be made using public-key crypto or HMAC. As an off-the-cuff example, when you join an encrypted network, it could tell you (over the encrypted channel) a shared secret to use when reconnecting. Then you could broadcast HMAC(secret, current time) or (nonce, HMAC(secret, time, nonce)) and if the WiFi network recognizes one of those broadcasts as directed to it, it could reply. An eavesdropper who doesn't know the secret wouldn't be able to determine which base station the mobile device was trying to contact.
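A quick sketch of the nonce variant of that idea, simplified so the HMAC covers only the nonce. This is a toy model, not anything in the actual WiFi protocols: the function names, the 16-byte nonce, and SHA-256 are all my assumptions.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple


def make_probe(secret: bytes) -> Tuple[bytes, bytes]:
    """Phone side: broadcast (nonce, HMAC(secret, nonce)) instead of an SSID."""
    nonce = os.urandom(16)  # fresh per probe, so probes look unrelated
    tag = hmac.new(secret, nonce, hashlib.sha256).digest()
    return nonce, tag


def recognize_probe(issued_secrets: Dict[str, bytes],
                    nonce: bytes, tag: bytes) -> Optional[str]:
    """Base station side: check the probe against every secret it handed out.

    Returns the (hypothetical) client label whose secret matches, else None.
    """
    for client, secret in issued_secrets.items():
        expected = hmac.new(secret, nonce, hashlib.sha256).digest()
        if hmac.compare_digest(expected, tag):  # constant-time comparison
            return client
    return None
```

A base station that issued the secret gets a match from `recognize_probe`; one that didn't gets `None` and stays silent. Note that a captured (nonce, tag) pair could be replayed as-is, which is presumably why the timestamp shows up in the other variant.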
> For some reason, though, we don't allow that sort of privacy violation in our actual lives.
Actually, people do. Take for example supermarket loyalty cards: every purchase is logged. It compounds with online tracking too: when you then login to the online site to check your points or view offers, the information about your purchases can then be linked to web analytics and other profiled data.
> how many real-world businesses could benefit from the same type of invasive tracking ... we don't allow that sort of privacy violation in our actual lives.
We do, but apparently many even on HN don't realize it. Brick and mortar stores track customers through their phone signals (providing unique identifiers and location), cameras that can track where people go in the store and even what they look at, and more. Here are a couple of good resources where I've learned about it (though they cover much more than those issues):
> I can't help but wonder...how many real-world businesses could benefit from the same type of invasive tracking that online businesses seem to think they have a right to subject us with?
Most companies that collect data use it to make money. That's not true ONLY on the web.
> For some reason, though, we don't allow that sort of privacy violation in our actual lives.
Really? Do you have any grocery store loyalty cards? Do you use a credit card? That data is shared and sold. Do you do any activities that show up on your credit report, like pay bills to a utility? That data is shared and sold. That's how the credit rating agency got it. Do you have a set-top box? It's probably selling data on what shows you watch. Do you have a cell phone? Location data.
If you actually restricted yourself to purchases which don't involve anybody who shares your data, you'd be living in a cave.
"but most advertisers, sponsors, etc will only trust third party analytics in determining advertising rates, sponsorship levels, etc."
Then you are losing at the negotiation table and trying to blame "the users". Those advertisers are simply wrong. No 3rd party analytics service can provide more accurate data than server logs. None.
"I'd wager we'll have some sites start to block those users at some point in the future."
And I would wager that it won't help their business increase revenue or decrease expenses.
> No 3rd party analytics service can provide more accurate data than server logs. None
This is very, very wrong. Parsing server logs can give you a lot of data, but collecting data directly from the client (à la Google Analytics) can give you so, so much more: data about the client itself, demographic information (beyond IP address locations), and information regarding their interests when hooked into a system that's present on many other sites.
Server logs are a great starting point but the amount of relevant data a third party service can give you is so great most are willing to make the tradeoff and use them. I'd rather have all analytics in house but that extra data can be invaluable.
Yes, an individual website can't track you throughout the whole web and compile a demographics and interest dossier. Only advertisers have that capability because their JavaScript is served up by thousands of websites. But that's the whole conundrum! Advertisers want to track, people do not want to be tracked, thus adblockers.
But it's not just that. Server logs do not give you other information that client tracking can gain, such as regional and client information.
In all honesty, I would be fine losing some of the information that only comes from creating a dossier on a user, and bringing the tracking in house. It's just a lot of work, and services like Google Analytics give that to you for free, so it's hard to just ignore, especially at a start-up.
> No 3rd party analytics service can provide more accurate data than server logs. None.
Google Analytics provides information that goes far beyond what the server logs can tell you. The logs will give you the most accurate view of the requests to your server but that's all. You can't get insights about bounce rate, behavior, goal completion, conversion rate, etc. from server logs. To rely only on mining server logs would be a huge step backwards for a business.
If each request has a session cookie in the log it's easy to track a user's movements through your site, so you can figure out bounce rate, conversion etc no problem. Things like Piwik have pretty dashboards to generate whatever reports you want from server logs.
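For what it's worth, here is roughly what that looks like. This is a sketch under a made-up log format (one request per line: timestamp, session id from the cookie, path), with made-up example data; it is not how Piwik does it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Made-up example data in the assumed "timestamp session_id path" format.
EXAMPLE_LOG = [
    "2015-08-01T10:00:00 s1 /",
    "2015-08-01T10:00:30 s1 /pricing",
    "2015-08-01T10:01:00 s1 /checkout/done",
    "2015-08-01T10:02:00 s2 /",
    "2015-08-01T10:03:00 s3 /blog",
    "2015-08-01T10:03:20 s3 /",
    "2015-08-01T10:04:00 s4 /pricing",
]


def paths_by_session(log_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group requested paths by session id, in log order."""
    sessions: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that don't match the assumed format
        _timestamp, session_id, path = parts
        sessions[session_id].append(path)
    return sessions


def bounce_rate(log_lines: Iterable[str]) -> float:
    """Share of sessions that requested exactly one page."""
    sessions = paths_by_session(log_lines)
    if not sessions:
        return 0.0
    bounces = sum(1 for paths in sessions.values() if len(paths) == 1)
    return bounces / len(sessions)


def conversion_rate(log_lines: Iterable[str], goal_path: str) -> float:
    """Share of sessions that reached the goal page at least once."""
    sessions = paths_by_session(log_lines)
    if not sessions:
        return 0.0
    converted = sum(1 for paths in sessions.values() if goal_path in paths)
    return converted / len(sessions)
```

On the example data, two of the four sessions are single-page visits and one reaches `/checkout/done`. This only sees page requests, so anything that happens purely in the browser is invisible to it.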
Piwik looks interesting. I'm playing with the demo right now. It does look like they support goals[1]. I can't know for sure how effective it is but I will concede that log analytics is more powerful than I believed.
Designing good session identifiers for logging is not trivial. More importantly, with a lot of events happening in JavaScript, a lot of user interactions are occurring beyond the visibility of your server logs. Tracking JS events is also clearly possible, but again, nontrivial. I've been in more than one situation where we trusted GA more than our own logs.
Some stuff, but if you want to emulate GA events you'll have to also implement some additional frontend code and a backend to log those events. I hope this is the direction that we see analytics moving. The information analytics provides developers is great, but I don't like the fact that this information is usually given to multiple third parties by using their analytics engines.
No, they're not. As an example, bounces are commonly the last page where the user left without doing anything. How can you tell that from a simple server log line that says the page loaded? You can't see if they clicked on something or how long they read the page or what other actions they took.
Most sites today also go far beyond basic pageviews and track all kinds of events on the page like scrolling rates, reading time, what other headlines you clicked or hovered over, etc. This is not possible without JS tracking.
This is naive. You can't use raw server logs without filtering out bots.
Also, advertisers don't trust site owners to report what's in their server logs accurately, when there is money on the line. It would be easy to cheat.
So that's why you need an independent third party.
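On the bot filtering point, the crudest version is a user-agent check. This is only a sketch: the marker list and the dict shape of a parsed hit are my assumptions, and it only catches bots that identify themselves, which is part of why raw logs are hard to trust.

```python
from typing import Dict, Iterable, List

# Assumed substrings; self-identifying crawlers usually carry one of these.
BOT_MARKERS = ("bot", "crawler", "spider")


def looks_like_bot(user_agent: str) -> bool:
    """Heuristic: does the user agent contain a known bot marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def human_hits(hits: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop hits whose 'user_agent' field looks like a bot.

    Each hit is assumed to be an already-parsed log entry (a dict).
    """
    return [hit for hit in hits if not looks_like_bot(hit["user_agent"])]
```

A bot that sends a browser's user agent sails straight through this, so it is a floor on filtering, not a solution.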
It isn't so much a technical issue as it is one of trust and efficiency. GA is familiar, and easy to compare across different sites without worrying about how exactly they're mining their logs. At best, they have to figure out how to make efficient comparisons. At worst, they have to worry about sites inflating their numbers--either through fabrication or just optimistic tweaking of how data is presented. With GA, they don't really have to worry about outright fabrication or reliability. And while there's a ton of data to be teased out of server logs--especially if someone tries to correlate it with other user data they already have--for most advertisers, that doesn't really matter. They don't want to invest the time it'd take to make use of it with all potential properties they want to advertise on.
Say what you will about GA and privacy issues, there's a reason why it's become so commonplace. Not just for site owners, but advertisers as well.
> No 3rd party analytics service can provide more accurate data than server logs. None.
For raw pageviews, sure, but try tracking a SPA with server logs, or seeing what the most common screen resolutions are, or seeing what percent of your visitors have Flash installed.
Server logs are easily faked. There's a lot of mistrust in the industry (well-earned, due to many scams in the past). Third-party web analytics providers, implemented via plain-text JavaScript anybody can audit, are fair.
(That ignores of course that server logs are virtually useless for determining a user's behavior on a single page app. It's not the ONLY reason server logs aren't used, but perhaps is the most important reason.)
Piwik, certainly when self-hosted, is considered completely fine by most people concerned about these issues. It's still blocked when privacy tools are set to max, which is understandable. But it's viewed as perfectly ethical.
Self-hosted Piwik is _probably_ privacy compliant for most people. But it's a steaming pile of shite. Try Snowplow Analytics instead and host that yourself.
None of the things you listed would be things most consumers care to block. In fact most are things they either want or could benefit from (analytics.)
Most consumers want analytics, profiling, retargeting and tracking?
The only time I've heard "normal folks" talk about these sorts of things, it's about how creepy they are, never once about "how that great ad was exactly what they wanted". That's assuming any awareness on their part at all.
Silent ignorance/indifference isn't consent, approval or desire. People who pretend it is are almost certainly making self-serving arguments.
I like the fact that Privacy Badger blocks analytics scripts and CDN cookies
I don't want to send my StackOverflow activity to multiple different third parties. I am fine with each website operator having my traffic info for that site. I am less fine with them sharing that traffic info with third parties who can easily aggregate that information across most sites that I visit.
Whenever you post an image on a question or answer, and it's not already hosted somewhere else, i.e. you're uploading from your computer, that image is hosted on imgur. Stack Exchange has a deal with them like that. You can see this on the image upload dialog.
Privacy Badger (https://www.eff.org/privacybadger) blocks a bunch of stuff on StackOverflow.
Blocks Scripts:
www.google-analytics.com
edge.quantserve.com
b.scorecardresearch.com
Blocks Cookie Only:
ajax.googleapis.com
www.gravatar.com
i.stack.imgur.com
Blocks nothing:
cdn.sstatic.net