It's a very difficult problem, no doubt. Current implementations are battery-constrained, hence the usefulness of GCM/APNs for efficiently batched polling. I suspect a fully decentralized system could be feasible for use cases that can tolerate higher latency.