The problem with most of these statistics is that they don't measure popularity. To me, popularity means "What language is most new development being done in RIGHT NOW?" These numbers are more a measure of "What is the sum total of all development over the past 10 years?" (about the time things started showing up on the Internet, and the minimum threshold for most programming books, etc.).
Things like Internet search results and job listings are always going to be lagging indicators of programming language popularity. Job search results lag because even if most new development is being done in, say, Ruby, it'll take years for the active codebase (which needs to be maintained) to catch up to C or Java. Internet search results are similar: even if 10x more new articles are being written about Ruby, it'll take a very long time to catch up to the massive total that Java has accumulated.
What I'd be more interested in is the deltas between each of these statistics on a monthly basis.
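A minimal sketch of the delta idea, assuming each statistic (search hits, job listings, etc.) has been sampled once a month as a running total. The function names and the numbers below are hypothetical, made up purely for illustration.

```python
from typing import Dict, List

def monthly_deltas(cumulative: List[int]) -> List[int]:
    """Month-over-month change in a cumulative count series (oldest month first)."""
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]

def deltas_by_language(series: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Apply monthly_deltas to every language's series."""
    return {lang: monthly_deltas(counts) for lang, counts in series.items()}

if __name__ == "__main__":
    # Made-up numbers: Java has a huge accumulated total,
    # Ruby a small but fast-growing one.
    series = {
        "Java": [1000, 1010, 1020, 1030],
        "Ruby": [100, 130, 160, 190],
    }
    print(deltas_by_language(series))
    # prints {'Java': [10, 10, 10], 'Ruby': [30, 30, 30]}
```

Ruby's total stays far below Java's, but its monthly delta is three times larger, which is exactly the signal the raw totals hide.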
"'C'-named languages are something of a problem. Queries for 'C' tend to return results for C# and C++ as well. One way of dealing with this would be to run queries like C -C# -C++; however, that unfairly penalizes pages that contain discussions of both C and C++."
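To make the penalty concrete, here is a toy model of three ways to count "C" pages over a few made-up page titles. It is a simplification under stated assumptions: real search engines tokenize "C#" and "C++" in their own ways, and the regexes and the `token_filter` alternative here are hypothetical, not how any engine actually works.

```python
import re
from typing import Callable, List

# Simplified model (an assumption, not real search-engine tokenization):
# "C", "C#" and "C++" are treated as standalone tokens.
ANY_C = re.compile(r"(?<!\w)C(?!\w)")            # matches C, C#, and C++
PLAIN_C = re.compile(r"(?<![\w#+])C(?![\w#+])")  # matches plain C only
C_SHARP = re.compile(r"(?<!\w)C#")
C_PLUS_PLUS = re.compile(r"(?<!\w)C\+\+")

def naive_query(page: str) -> bool:
    """A bare "C" query: C# and C++ pages match too."""
    return bool(ANY_C.search(page))

def exclusion_query(page: str) -> bool:
    """C -C# -C++: drops any page that mentions C# or C++ at all."""
    return (bool(PLAIN_C.search(page))
            and not C_SHARP.search(page)
            and not C_PLUS_PLUS.search(page))

def token_filter(page: str) -> bool:
    """Hypothetical alternative: count the page if plain C appears anywhere."""
    return bool(PLAIN_C.search(page))

# Made-up page titles for illustration only.
SAMPLE_PAGES: List[str] = [
    "Pointers in C",
    "Comparing C and C++ memory models",
    "C# generics explained",
    "C++ templates",
]

def count(match: Callable[[str], bool], pages: List[str]) -> int:
    """Number of pages the given matcher accepts."""
    return sum(1 for p in pages if match(p))

if __name__ == "__main__":
    for name, fn in [("naive", naive_query), ("exclusion", exclusion_query),
                     ("token", token_filter)]:
        print(name, count(fn, SAMPLE_PAGES))
    # prints "naive 4", "exclusion 1", "token 2" on separate lines
```

The second page genuinely discusses C, but because it also mentions C++ the exclusion query throws it away; that lost page is the unfair penalty the quote describes.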
Of course, I'm also learning the language that's at the bottom of the list. Go OCaml!
Nice work. It was especially interesting to see Haskell jump out of the back of the pack to #2 when moving to a more academic crowd.
I'm not sure the data has any meaning at all in the real world. In a similar vein, if you did this same exercise for presidential candidates Ron Paul is likely to look a lot better than his poll numbers show. There is a difference between what people talk about and what people do. You can only take implicit data so far. But it certainly makes for some nice graphs to add to a powerpoint deck, and I'd love to see trend information.
Yeah that sounded bad. Sorry about that. I'm just happy I've got that many nailed. I don't think I'm going to do the other two. Instead, I'm moving down to some of the more interesting ones.
Do you really think most readers have 8 languages under their belt, with real, live, production systems written in them? It would be interesting to put on an AskYC question. I don't know the answer one way or another. For some reason I kind of thought most of the readers, while super hot on the technology, were newer to the party than that.
Perhaps I was mistaken. I took your saying "I know" to mean "I have coded in each of these languages and could do something in each if I had to", instead of "I would be comfortable in a programming environment with any of these languages".
My bad.
btw I actually am trying out SICP right now, despite my C# leanings.
Looks awesome! If I weren't trying to pick up OCaml and F# right now I'd jump on it; one of the problems I'm facing is a lack of good texts. I'd be very interested in your opinion of the book and Lisp, especially coming from a C# background (like me).
Is the list of languages in the graph the complete list of languages searched for? If so, it's not very thorough. For example, Objective-C, which by these same metrics looks at least as popular as Lisp (Freshmeat has 4 times as many Obj-C projects as Lisp projects), is not included.
Very nice. Great work, and very comprehensive. And I agree (please forgive me for repeating this) but seeing trends would be great.
Have you considered other sources, say, Technorati for similar data mining? That may prove to be another way to measure how much people are talking about a language...
Yes, getting stats from people's online journals is a good idea for keeping track of people's discussions. Technorati looks like it has a good API, too, so thanks for the suggestion!
Noticeably missing were numbers for <$language sucks> and <$language rules>. Job ads definitely count as endorsements, but somehow I don't think of Lambda The Ultimate as a bastion of Java love.
On the topic of getting access to AdSense data--you can also get keyword prices from yahoo/overture, which are (I believe still) freely available once you sign up for an advertiser account. SEOs and domainers use Yahoo's keyword price data, which is generally assumed to be pretty similar to Google's.