Another good read (it probably does not reflect how you want to write C code, the rule about dynammic allocation is probably extreme if you are not writing code to fly spaceships, but I think it is good to read regardless): http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf
The first query seems awfully slow. I have a six node vertica cluster with a 100 column table with 7Bn rows in it and a similar query takes less than 3 seconds.
The OP's cluster is a 16-node hs1.xlarge cluster (has 3 spindles per node). There's actually a more powerful node-type hs1.8xlarge which has 24 spindles on each node. More info: http://aws.amazon.com/redshift/pricing/
So it's not fair to compare Redshift performance to your Vertica cluster unless the hardware is similar.