A few points (disclaimer: I work at Google on projects including App Engine)
1) It's not fair at all to say that Google "continue to ignore the two critical issues of uptime and data store latency". The HR datastore is specifically designed to address the concerns about variable latency of datastore operations, and to prevent both planned and unplanned downtime.
2) In my experience, high CPU cost for datastore operations is generally tied to things like having large numbers of indexes, or doing queries that don't have the right indexes. Each index write involves a Bigtable write; if you are doing hundreds of these per entity, it can become very expensive. That said, we don't yet make it easy enough to know when you are doing something that will be expensive.
3) The bulkloader is definitely painful to use if you aren't familiar with it. In particular, casting schemaless data to a format such as CSV is hard because rows can have unexpected keys. An improved "Bulk Datastore Import and Export tool" is on the 6-month roadmap for the product.
1) Yep. My company moved to an HRD instance this week and our CS reps and customers have both noticed a significant difference in stability. That doesn't imply that it's perfect but to say that Google is ignoring the issue is completely false.
Not sure yet. Even though we are one of the bigger clients on App Engine, we weren't getting charged much at all for our usage anyway. But I believe Google said it would be around 3x for the HRD, which isn't bad at all. Our executives really didn't care, though, as stability and reliability for our customers are worth too much to worry about 3x (GAE's non-HRD stability has not been good recently). Btw, we all use EC2 for some of our other services and that is costing us quite a bit more than GAE is at the moment. We'll see if that continues to hold true as we grow.
I am building everything on AppEngine atm. Initially it was just to prototype, but I am thinking of launching a preview release of one app live.
Can you talk about how big some of the larger customer apps are and how much traffic/users they serve?
It would be easier for ppl in a position like mine to make decisions about AppEngine if there were big examples/case studies to point to in the same way Amazon, Heroku, Rackspace etc. do
Not to disparage your post, but it contains only an inkling of substance. Some people managed to get 1600 QPS on GAE, that tells me absolutely nothing. What was their usage like? How much did they pay? What was their latency like? What did they think of GAE? Etc etc.
Can you elaborate on queries not having "the right indexes"? Since the development server automatically adds indexes for all queries run on local dev machines and the production machines cannot run a query for which the necessary indexes don't exist, it's easy to assume that your indexes must be right.
I'd love to hear more about tweaks to existing indexes that can alleviate these sorts of problems.
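For reference, composite indexes live in index.yaml; the dev server generates these automatically, but it's worth reading the file to see what each query is costing you. A hypothetical Article kind queried by author and sorted by date would need something like:

```yaml
# index.yaml -- Article is a hypothetical kind, for illustration only.
# This composite index serves: Article.all().filter('author =', a).order('-published')
indexes:
- kind: Article
  properties:
  - name: author
  - name: published
    direction: desc
```

Every entry here means extra index rows written on every put of that kind, so pruning indexes you don't actually query can cut CPU cost noticeably.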
The geomodel stuff is what I'm referring to here; it's efficient enough that you can actually execute the queries, but it's not necessarily fast (or cheap) to do so.
One annoying issue with the HR datastore is that the appids have an 's~' prefix, which can lead to strange behaviors when setting up domains via Google Apps, and other jankiness.
I've been a heavy App Engine user since the beta days -- even gave a talk at I/O 2009 about scaling with App Engine. I have clients that run complex GAE sites that handle millions of daily requests.
I somewhat disagree with the author about scalability. There is a very narrow sweet spot of apps for which App Engine is a quite natural solution. If you're in the sweet spot, you'll probably scale well without too much up-front engineering investment.
The sweet spot _is_ narrow, though. For example, as the OP states: geo data doesn't really belong on GAE -- you can do a limited set of bounding box/proximity queries with the third party geomodel library, but wow, it's expensive and dog slow!
I do agree with the author about both the status dashboard (it only sometimes reflects my current experience with the system) and the surprising variation in data store latency. Latency has been much improved of late with 1.4 and beyond.
At Google I/O 2010 they discussed "next gen" queries in depth, which should be able to do geospatial queries easily (or any query for which your search space can be filled by a space-filling curve):
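The space-filling-curve idea can be sketched in a few lines. This Morton-key version is just an illustration of the principle (quantize each coordinate, interleave the bits, then range-scan on the resulting key), not App Engine's actual implementation:

```python
# Sketch of the space-filling-curve idea: interleave the bits of a
# quantized (lat, lng) pair into a single Morton (Z-order) key.
# Nearby points tend to share key prefixes, so a range scan over the
# key approximates a bounding-box query with ordinary index scans.
def morton_key(lat, lng, bits=16):
    # quantize each coordinate into [0, 2**bits)
    x = int((lng + 180.0) / 360.0 * (2 ** bits - 1))
    y = int((lat + 90.0) / 180.0 * (2 ** bits - 1))
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits on even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits on odd positions
    return key
```

The caveat is that a bounding box maps to several disjoint key ranges, not one, so a practical library (like geomodel's geocell scheme) issues multiple range queries and merges the results.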
I must be in it, because I have rarely seen a downside to using AE. Here's the nature of my apps:
1) Extremely simple or very flattened data-model
2) Few writes, TONs of reads (I peak at 150 requests/second every day)
3) The occasional DeadlineExceededError causes me little or no headache. For some people this would be frustrating.
Also AE is awesome as a CDN. The latency is very tolerable for static assets.
I found that the AppEngine 1.4 SDK addressed many of these concerns. Personally, I've managed 50 requests per second on my blog without trouble, and with minimal CPU overhead. That's probably because everything is in memcache, so the database almost never gets hit. The pricing structure seems to actively encourage you to memcache as much as possible, too. Things are probably pretty different in a more write-intensive app, though.
I wrote my own. http://github.com/thurn/ackbar. If an article gets popular, I've paid up to $5, but below 6000 hits, it's all free. Latency could be better, but there's a lot of factors there (Clojure might be one of them?).
I take it that your images are rather static. If so, you should be adding caching headers which will cache them for free in Google's front end servers.
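If you're serving the images via a static handler, something like this in app.yaml (the /static path here is just an example) is enough to get them cached in the front-end layer:

```yaml
# app.yaml fragment -- paths are examples, adjust to your layout
handlers:
- url: /static
  static_dir: static
  expiration: "7d"   # sets Cache-Control/Expires headers on responses
```

For images served from your own handler code you'd set the Cache-Control header on the response yourself instead.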
I don't quite understand some of the commenters: why do you guys talk about caching immediately, as if your app needs to scale from the ground up?
Isn't the point of GAE that it scales (as long as you don't do stupid queries)?
If we have to put everything in memcached, what's the point of using GAE?
I also don't quite understand the push to use memcached for almost everything (especially for young startups). How do you handle data integrity? I'm guessing most data models of young startups are fairly simple and only contain at most 10 models with almost no relationships? Otherwise data integrity is painful.
Your site will scale well - as in it will give exactly the same performance for the millionth user as for the first user.
That doesn't mean you will have a fast site, though.
On GAE the datastore is pretty slow (it's a lot better now, but it used to be terrible), so a common pattern is to use a read-through cache to improve performance.
I think in this reference they are not talking about storing live data in memcache, just copies of objects for faster access. So when you are getting the information about a user you check memcache first, if its not there look it up in datastore and add it to memcache. Then when the user loads a second page you can get the information directly from memcache instead of having to look it up again.
The point in question is: why are you doing this on a new application? The sort of caching you describe is something you put in place after your site starts seeing a lot of traffic, and running queries for every request starts to slow things down to the point where optimizing them doesn't help.
If you're designing your app well and running it on a solid stack, that should be something you need to worry about in year 3, after you're being TechCrunched on a regular basis. In the Rails/Django/AppEngine world, it seems to be the case that you need to resort to that level of caching just to see regular, day to day, 100 request/second traffic.
I guess I can see that argument, but at the same time it's so easy to add memcache that it really seems silly not to add it for your models that are accessed a lot. Here is a simple user model; adding memcache is only a handful of lines. I just can't see the case where you wouldn't add this to start with:
    from google.appengine.api import memcache
    from google.appengine.ext import db

    class User(db.Model):
        email = db.StringProperty()
        password = db.StringProperty()
        sessionKey = db.StringProperty()

        def userKey(self):
            return "User.session=" + self.sessionKey

        def put(self):
            super(User, self).put()
            # refresh the cached copy after every write
            memcache.delete(self.userKey())
            memcache.add(self.userKey(), self, 600)

        @classmethod
        def getCurrent(cls, request):
            sessionKey = request.cookies.get('user')
            user = memcache.get("User.session=" + sessionKey)
            if user is None:
                user = cls.all().filter('sessionKey =', sessionKey).get()
                if user is not None:
                    memcache.add(user.userKey(), user, 600)
            return user
Easy is good. It's a similar level of effort in most frameworks, which is handy.
The thing is, if you turn it on right away you never get a chance to see and fix a bunch of low hanging fruit optimizations that can really help you out. You don't know about them until you start to run up against scaling issues despite all your caching. Once you're there, it's not any fun to try to go back, find and fix those original bottlenecks, so you don't have much option but to start throwing hardware at it.
If, on the other hand, you hold off and handle your first few scaling episodes by making your queries and code faster, only adding caching after you're not seeing returns from that other stuff anymore, you'll be able to go a lot farther before you have to start adding hardware.
C#/ASP.NET/SQL Server, all out of the box, running database and web server on a single machine that would have been a middle of the road dev box 4 years ago.
That box is running about 20 sites, some big, some small, usually no more than one of them getting TechCrunched/Redditted at a time.
Here's a writeup of one particularly heavy day in the life of that box:
"4) The GAE design patterns in python are ugly. I find our current Sinatra-based implementation cleaner and easier to understand. Python + django is verbose, its templating system is obtuse, and its testing framework is, well, I don't know because I've never seen it. This point is a religious one, so I'll leave it be"
Should Python programmers write the unofficial guide to migrating off of Heroku?
A bit off topic: all of my customers but one in the last two years wanted to deploy to Amazon Web Services. I find this odd, being enthusiastic myself about AppEngine (no paid work, but I host some of my projects with it and I have written a few articles on GAE).
Clearly, not every web app is a good candidate for GAE.
I have found objectify-appengine to be nicer to work with than the official Java data store APIs and I think it helps minimize loading request times.
I never used it, but it sure looks interesting if your application doesn't fit GAE anymore or you want to make specific infrastructural adjustments. They support multiple different database/http/.. servers too.
This project looks promising. Could be a good basis for a consulting business, taking apps that have hit some obstacles or design limitations on Google AE, and getting them running on TyphoonAE-based hosting with app-specific infrastructure customizations.
I'm starting to notice that App Engine's memcache is incredibly slow. If serving up a page requires one memcache hit (for, say, caching entire response objects), it works well. If it starts to require, say, a dozen, the response time slows to ~700ms at best, and ~4s at worst.
I have no exact data at the moment. I looked through my application's logs and compared the response times in situations when the front page loads from a full-page cache entry and when it has to build the page from multiple pieces of memcached data. If I find time, I'll write a small benchmark application which tests memcache performance. Empirically however, App Engine's "memcache" is pretty slow.
This guy loses credibility when he says that Django's templates are ugly and that there's no unit testing framework.
Those are defensible positions in the first 10 minute impression of app engine, but building a simple app (and testing it with GAEUnit) should put both to rest immediately.
Yes, the Datastore is unreliable, but a newer, more reliable version has been released.
App engine is still my preferred platform of choice -- just waiting for ssl, naked domains, and per-entity-group selection of which datastore service level to use.
On the other hand, GAE looks like a great fit for a static site built with Jekyll, like a personal blog. In fact I'm planning to migrate my site over today. Almost any personal site will be free to host. Even though I would only be paying a few dollars with S3, I see no reason not to give app engine a shot.
One thing I worry about is the reported downtime. I'm not sure whether or not that will affect a static site.
I disagree, this is only an issue if you have almost no visitors, and then you can pay a nominal fee and keep instances awake. This is definitely worth the development time you will save and the ability to move off GAE easily if you need to.
Can someone provide some enlightenment about why one would migrate away from GAE? I only ask because I'm currently building a web application on GAE and am starting to look down the road, wondering if building on GAE will allow us to grow as a business in the ways we want to, or if there are some arbitrary limitations that will come to bite us later.
(knowing of course that we're more or less tied to Google's infrastructure and the GAE way of doing things)
The original article does point out a few reasons, but the way I see it is this: AE is great if you want to scale and your app can scale in the AE way. There is a sweet spot and the tools work pretty well if you're in that sweet spot, but if you stray outside that spot you don't have a lot of options. Even getting your data out can be difficult, making the "export data and start over" nuclear option difficult.
If you know what you're building and you want to scale it using the sort of tools AE provides -- i.e. the datastore, taskqueue, etc. all fits your app well -- it's quite good. And quite cheap; it's not really fair to compare per-resource pricing to AWS because on AE you'll only pay for what you use instead of paying for idle time on a server instance. But it definitely constrains what you can do, and also locks you to a single hosting provider. If you're still exploring what you want to do, that level of design and vendor lock-in can be a pretty severe liability.
Another way of saying the same thing is that GAE is perfectly fine if you know all the requirements of your app before you build it. In a perfect world, you know exactly what you want to build, everything you want is available with GAE, then you go build it and it scales nicely.
In the real world however, requirements change:
* You come up with a new idea that requires a certain library. Chances are, the library won't work on GAE out of the box.
* You find out that you need to change your schema. It is pretty hard to update to the new schema while keeping everything in sync.
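A migration like that usually ends up as a batched map over all your entities. Here's a rough sketch with plain dicts and lists standing in for entities and datastore batches, so it runs anywhere (on GAE you'd drive each batch from the task queue with query cursors, and the transform shown is a made-up example):

```python
# Sketch of an in-place schema migration, batch by batch. Plain dicts
# stand in for entities; on App Engine each batch would be fetched with
# a query cursor, transformed, written with db.put(), and the next
# batch re-enqueued as a task.
def migrate(entities, batch_size=100):
    for start in range(0, len(entities), batch_size):
        batch = entities[start:start + batch_size]
        for entity in batch:
            # example transform: split a legacy 'name' field in two
            if "name" in entity and "first_name" not in entity:
                parts = entity.pop("name").split(" ", 1)
                entity["first_name"] = parts[0]
                entity["last_name"] = parts[1] if len(parts) > 1 else ""
        # on GAE: db.put(batch), then enqueue a task with the cursor
    return entities
```

The hard part the parent alludes to is doing this while live traffic keeps writing old-schema entities, which is why handler code typically has to tolerate both schemas until the migration finishes.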
Finally, you pay the Google cost. When Google implements a new feature, they spend enormous time making sure that it scales well. They need to do so since they could be looking at millions of users on day 1. Most of us however, are looking to build something as cheap as we can, not knowing whether anyone is going to bother to look at it. However, you have to do the same performance optimizations that Google has to do so that your app scales. Chances are, it will be wasted effort - unless your objective is to just learn. I find it funny that GAE goes completely against the rule that "Premature Optimization is the root of all evil". Yes, you should think about your application's scalability. But your bigger problem should be about finding traction, and being able to react fast, not optimize for millions of views.
If I understand correctly then, it would appear that GAE may be poor for applications that require heavy computation on the server side (say, a facebook style graph, crawling and computing various metrics on it) but great for serving tons of dynamic pages?
Can you illuminate that a little? Among the competitors, GAE appears to be among the lowest cost platforms. I'm sure there's certain types of applications that can cause it to become expensive, but I'm not sure I can predict well enough what those are.
At a high load (50+ requests/sec), we saw database timeouts tens of times per minute, and often more (every request in a 10-60 sec period would fail).
That's not a terribly high load by any modern standard, you can easily service that on one or two fairly commodity machines these days.
Assuming the point of GAE is scalability, this issue didn't make any sense to me. One would assume that Google would simply load balance requests out to replicated instances of your app, and that those apps would in turn access Google's vaunted super-scalable data services. Thousands of requests a second shouldn't even faze it.
I've noticed that response times can be highly variable, but so far (we're pre-launch) I haven't noticed anything just simply not getting returned or the kinds of slowness he's describing. Without more detail, something else must be going on here.