Building columnar compression in a row-oriented database

georgewfraser · on Oct 31, 2019

Reading this post is frustrating. What they are describing is where column store databases were 20 years ago. Perhaps at some point the folks at TimescaleDB will read Daniel Abadi’s 2008 paper, which describes the key elements of how all modern column stores work:

http://db.csail.mit.edu/pubs/abadi-column-stores.pdf

The key takeaway is that columnar compression only accounts for a small minority of the speed up that you get for scan-oriented workloads; the real big win comes when you implement a block-oriented query processor and pipelined execution. Of course you can’t do this by building inside the Postgres codebase, which is why every good column store is built more or less from scratch.

Anyone considering a “time series database” should first set up a modern commercial column store, partition their tables on the time column, and time their workload. For any scan-oriented workload, it will crush a row store like Timescale.

mfreed · on Nov 1, 2019

Hi @georgewfraser: We understand this point, and you aren't the first to mention it on HN =)

But our users, and the users of most so-called "time series databases", do not typically generate the classic scan-oriented workloads of OLAP systems that motivated C-Store and classic columnar data warehouses.

I talk about this in the post: sure, there are deep and narrow queries, but there are also a /lot/ of shallow and wide queries. Most real-time dashboarding, which is a huge use-case for IT/devops monitoring and IoT sensor monitoring, is about "what happened very recently".

But you do occasionally want to look back, so you want to keep old data sitting around, just in a cost-effective manner. And there is a lot of data: time-series and monitoring applications are typically write (not read) heavy, and ingest high volumes of data continuously.

Enter compression and data tiering, both of which appeared in today's v1.5 release.

As such, the primary motivation behind our work on compression was actually saving storage overhead, not improving query performance. Although we'll never pass up on performance improvements when we get them! =)

(And we are familiar with the citation -- I've known Daniel and Sam for many years, and one of the engineers who led this work at Timescale was actually Daniel's former PhD advisee Josh Lockerman.)

georgewfraser · on Nov 1, 2019

Thank you for responding. The “shallow and wide” query you describe is not a challenge for any recently built column store, which will partition the data into blocks of 10-100 MB. So when you run a query like:

select * from time_series where ingest_time > ?

You will only scan the most recent block, and it will be extremely fast. I have no doubt that there are individual queries where a time-series database will “win”, but I earnestly question whether there’s a real-world use case, with a mix of queries, where a special purpose time-series db is preferable to a good column store with time-based partitioning.

grumpydba · on Oct 31, 2019

> Anyone considering a “time series database” should first set up a modern commercial column store, partition their tables on the time column, and time their workload. For any scan-oriented workload, it will crush a row store like Timescale.

Or you can set up a clickhouse instance. It's a seriously promising and underrated product.

manigandham · on Oct 31, 2019

Clickhouse is a distributed relational columnar database. It competes with MemSQL, Vertica, Actian, Greenplum, and hosted options like Redshift, Bigquery, Snowflake, etc.

IanCal · on Oct 31, 2019

I assume they were pointing out you don't need to go to a commercial offering.

atombender · on Oct 31, 2019

Clickhouse is good, but it's definitely made for a very limited purpose; it's not a general purpose SQL database. Which is fine, but the attraction with something like TimescaleDB is that your time series data can coexist with normal data.

PeterZaitsev · on Oct 31, 2019

Many Time-Series applications do not need super complicated SQL. This is why there are many timeseries focused databases even without SQL support.

There is also PostgreSQL Foreign Data Wrapper for ClickHouse which allows you to run all SQL PostgreSQL support and often with great performance

buremba · on Nov 1, 2019

If you use Postgresql's query engine on the Clickhouse data, you lose all the benefits of the columnar query engine of Clickhouse so that's not correct.

grumpydba · on Nov 1, 2019

No you don't lose them. Fdw supports push down of where clauses, only selects the required columns. You can also create views in clickhouse to make sur the joins are processed there.

buremba · on Nov 1, 2019

You're right but if the syntax that you're using is not supported in Clickhouse, aggregate and predicate pushdowns won't work and this FDW (https://github.com/adjust/clickhouse_fdw) needs to map all the Postgresql functions / produces to Clickhouse in order to take advantage of push-down so the only use-case here is that you may want to join the data in Clickhouse with the data in Postgresql (or other FDW sources).

IanCal · on Oct 31, 2019

I don't know if I'd say very limited, I've used it to do a lot of standard SQL stuff where my workload wasn't time series at all but was aggregations and analytical workloads.

mamcx · on Oct 31, 2019

I think this is fair. The whole selling point is to have time series abilities inside of PG.

It will not be as fast as a tailored engine? Sure. But the advantage of stay inside of PG is very great (not only ecosystem, but have sql, OLTP + OLAP, etc).

Is like how a document store could be better than add JSON with PG...

In practique? I think this hybrid approach is very good.

P.D: And what about the opposite? Which columnar store can I use for OLTP?

GordonS · on Oct 31, 2019

I was going to say something similar.

We have TimescaleDB in production with hundreds of millions of time series data sat alongside "regular" data - having to deploy a single database instead of Postgres and a specialised time series database is a real boon.

For us, TimescaleDB queries run way faster than stock Postgres. They might well run faster in something specialised, but querying/aggregating hundreds of millions of rows in 10s or 100s of milliseconds is plenty fast enough for many use cases.

manigandham · on Oct 31, 2019

True, but this is a major step up from what Timescale offered before. This was always my biggest contention since performance lagged behind many other datastores but they have done well in taking advantage of the Postgres integration and ecosystem.

There are plenty of scenarios that benefit from having automatic time-based partitioning and querying within the same operational PG datastore instead of running a separate analytical system. The improved performance only helps this case.

m0zg · on Nov 1, 2019

Um, no. Columnar compression can be huge if your data can be sorted favorably. You basically get to filter (both compare and skip) and aggregate without fully decompressing. If you have to scan a column (or a few), this can be huge. It's great if you can skip blocks within the column or entire shards, but that's unlikely to happen unless you're doing a highly selective query. OLAP queries aggregate a ton of data. Judiciously chosen column compression scheme coupled with a good sort order can be a huge, huge advantage.

GordonS · on Oct 31, 2019

Of anyone from TimescaleDB is here: I run a (very) small business using TimescaleDB in production, but don't need any enterprise features and so haven't purchased a commercial license. But I love it!

Do you have a Patreon, PayPal account, or any other means to receive money as a donation, gift or token of appreciation?

mfreed · on Oct 31, 2019

No, but we always love user stories! Shoot me email at mike (at) timescale.com

akulkarni · on Oct 31, 2019

Thank you for the <3

And yes as mfreed already mentioned we love hearing user stories :)

dgudkov · on Oct 31, 2019

For anyone interested in columnar compression and columnar databases I highly recommend reading Vertica's Technical Overview White Paper [1]. I don't know how Vertica is doing now, but it's a beautifully designed columnar database with many interesting concepts which influenced a lot the product I'm currently working on.

[1] https://pdfs.semanticscholar.org/0cb4/d21ca5198819a85adae8ea...

cevian · on Oct 31, 2019

Hey Timescale engineer here. Happy to answer any questions.

benwilson-512 · on Oct 31, 2019

Hey! We're looking to evaluate TimescaleDB for a logistics IoT scenario. Some of the data that enters our system comes from connected devices where recorded_at and inserted_at columns are basically the same. Some data however is sourced from dataloggers that may record for months before the data arrives at our system.

With TimescaleDB, would I use the recorded_at or inserted_at column for the hypertable?

Does this change if data for an individual sensor can sometimes arrive out of order? If the sensor malfunctions and the data contains timestamps in the far past or the far future does this cause issues with TimescaleDB?

What we've done in postgres so far is have the tables with data generally structured around the recorded_at column because most analysis wants to look at the data "in order" . to generate reports, graphs, etc. Each data row also contains a "payload_id" relating it to a "payloads" table which helps group data by when it actually hit the system. Data processing has generally been built around the payloads and then query any additional data in recorded_at order on the main data tables if we need to look back or forward in time.

RobAtticus · on Oct 31, 2019

For choosing the column, you'll usually want to think about what your queries will be using. It sounds like `recorded_at` is probably more likely to be useful since that's when the data "occurred," but again it depends on your expected query load.

Out of order data should be handled fine by TimescaleDB -- if you do have data that is far in the future or in the past, you may get stray chunks to hold those, but it's not going to create all the intermediate chunks or anything that might be undesirable. You can later correct those fields by deleting and reinserting the record with a corrected timestamp.

jnordwick · on Oct 31, 2019

Sounds like he's describing a bitemporal database (although not in the canonical form usually associated with them).

I've looked into timescaledb for this, and it doesn't support them.

refset · on Oct 31, 2019

I agree. Bitemporal databases can natively handle late-arriving data in these kinds of upstream timestamp integration scenarios.

However, the intersection of bitemporal indexes and columnar time-series queries seems important and yet I haven't seen anything that looks like it might offer both, possibly asides from kdb+ and SAP HANA.

Disclosure: I work on https://github.com/juxt/crux (which is optimised for bitemporal graph joins and doesn't currently employ columnar indexes)

athenot · on Oct 31, 2019

I generally like the direction y'all are going.

Question: how does the new compression compare to TokuDB, and is it tuneable for performance/size tradeoff?

https://www.percona.com/doc/percona-server/5.7/tokudb/using_...

RobAtticus · on Oct 31, 2019

Haven't done any comparisons against TokuDB, so can't give a deep answer there. [Edit: removed incorrect bit about tuning algorithm]. We do not offer tuning yet, but this is the initial implementation, so we'll definitely be looking at what knobs we can offer in the future.

andrewg · on Oct 31, 2019

Very cool! The effect on page layout sounds like it would be pretty similar to Oracle's hybrid columnar compression[1], but they claim average compression ratio is more like 10:1. Any idea what would make so much of a difference?

[1] https://www.oracle.com/technetwork/database/exadata/ehcc-twp...

mfreed · on Oct 31, 2019

One guess from a super quick scan can be that we use type-specific compression algorithms. So if your table has one column of timestamps, another of floats, another int, another string, the database employs different compression algorithms (typically best-in-class) based on the column type.

Quick scan of the Oracle paper couldn't find specifics, other than something like this:

"Warehouse Compression provides two levels of compression: LOW and HIGH. Warehouse Compression HIGH typically provides a 10x reduction in storage, while Warehouse Compression LOW typically provides a 6x reduction"

That would at least suggest that they aren't doing anything type-specific like we are.

mfreed · on Oct 31, 2019

It may also be that we're operating in a slightly more delayed fashion (partially based on chunk boundaries), so we can organize across a lot larger range. For example, if you choose to segment by a device_id, it might scan 1M rows to assemble blocks/segments of device_ids, with each device_id having 1000 records to compress in a "mini column".

This also leads to significant query performance settings if you common filter by device_id, for example. Which are super common in time-series workloads for IT monitoring / devops / IOT / etc.

pella · on Oct 31, 2019

Congratulations!

* What is a plan for PG12 PLUGGABLE STORAGE[1]? this is for PG12?

* Can you compare with Zedstore[2]?

[1] https://www.postgresql.org/docs/devel/tableam.html

[2] https://github.com/greenplum-db/postgres/tree/zedstore

mfreed · on Oct 31, 2019

Hi @pello, article author here.

One of the interesting things of our technique is that it doesn't require low-level changes to Postgres, and actually then works with any version of PG that TimescaleDB supports (currently PG10, 11...PG12 coming soon).

That said, we're excited by the work Postgres has been doing with pluggable storage, particularly how PG13 will further open up possibilities such as Zedstore, and look to see how we can then marry some of these ideas. In terms of feature-by-feature comparison, haven't yet dug enough into the details.

Aside, one interesting aspects of our approach is discussed in the article: Having a "hybrid" row/column, rather than purely columnar, can actually be beneficial for many time-series workloads that constantly query very recent data (e.g., for dashboarding) as well as to improve ingest rates (although some column stores do build a temporary in-memory row-based cache before batch writing a column).

1996 · on Oct 31, 2019

for dashboarding use something like pipelinedb (RIP)

mfreed · on Oct 31, 2019

PipelineDB was useful primarily if you primarily wanted to maintain a _continuous_ materialization of some aggregate (e.g., an approximation of the distinct items in the database), not necessarily an aggregate per time period.

If you are interested in primarily showing the recent data -- which you see in many monitoring examples in IT/devops or IoT -- you often want the raw data or an aggregate per time period.

As an aside, TimescaleDB introduced continuous aggregations in v1.3. Here's a nice example of using it with Grafana for dashboarding: https://blog.timescale.com/blog/how-to-quickly-build-dashboa...

mamcx · on Oct 31, 2019

I'm building a relational lang (that could feel like a in memory db) and explored the idea of using a columnar backed structures.

How well could be apply the same ideas for in-memory processing?

If I understand correctly, you have something alike:

- Store each column on a array of N=1000 - Store the group of columns in pages, with metadata of ranges of keys to locate rows in the adequate page

ants_a · on Nov 1, 2019

This is commonly done in graphics and physics processing engines. In that domain it's the array-of-structures vs. structure-of-arrays design choice.

wmf · on Oct 31, 2019

https://arrow.apache.org/

jeltz · on Nov 1, 2019

Are there any plans to collaborate with the PostgreSQL project? Especially on Andres' work on speeding up the executor? My apologies if you already are and I have just not seen your names on the mailing list.

fabian2k · on Oct 31, 2019

I really like Postgres as an allround database, and putting monitoring data into Postgres is an attractive idea to me. One thing I'm wondering is how well you can mix a regular Postgres database with timeseries data using TimescaleDB? If I have a regular application using Postgres, can I just add timeseries in there with TimescaleDB in the same database? Or is that either not possible, or simply a bad idea?

RobAtticus · on Oct 31, 2019

You definitely can mix the two, and we actually find it a very useful thing so you can have your metadata available to query alongside the time series data. TimescaleDB will only make a table a hypertable if you tell it too, otherwise regular tables should work as you'd expect. And since we appear as a table in the contexts that it matters (JOINs, querying, inserting, etc), you shouldn't have any issues mixing the two.

GordonS · on Oct 31, 2019

I do this, and it works just fine - no caveats, it's actually one of the selling points of TimescaleDB.

jeltz · on Nov 1, 2019

Yes you can, and I think this is one of the main selling points of TimescaleDB.

dedsm · on Oct 31, 2019

it's perfectly fine to use TimescaleDB alongside any regular tables.

stalller · on Oct 31, 2019

It is encouraged! :)

valyala · on Oct 31, 2019

This is great news! We at VictoriaMetrics [1] are happy that TimescaleDB eventually addressed high disk usage issue and implemented type-aware column-based compression. This should improve its' position and performance in TSBS benchmark [2].

One question: why do you use Gorilla compression for floating-point values? It works well for integer values, but is pretty useless for floating-point values [3].

[1] https://github.com/VictoriaMetrics/VictoriaMetrics

[2] https://medium.com/@valyala/measuring-vertical-scalability-f...

[3] https://medium.com/faun/victoriametrics-achieving-better-com...

mfreed · on Oct 31, 2019

Gorilla is one of the few simple, efficent, _lossless_ algorithms that compress binary data.

Tricks such as renormalizing floats in base-10 (which I believe VictoriaMetrics implements) while great when they work, potentially truncates lower-order digits.

Lossiness is not a decision we're willing to make for our users unilaterally. And, as Gorilla is one the default compression algorithms we employ, it needs to be one that is correct in all cases.

Incidentally, we experimented with the binary equivalent of the renormalization algorithm, and it performed on-par with Gorilla in our tests.

1996 · on Oct 31, 2019

Too bad pipelinedb died. It was a nice complement to timescaledb -- with small oddities that sometimes took a production database down if you had no experience of advanced vacuum / repack features.

Any plan to bring back something like continuous views in timescaledb? (or to integrate pipelinedb work)

mfreed · on Oct 31, 2019

Continuous aggregates were first introduced in v1.3. Not identical to PipelineDB features, but address some of the very same use cases. We have some cool new features for continuous aggs planned too.

Tutorial: https://docs.timescale.com/latest/using-timescaledb/continuo...

Example with Grafana dashboarding: https://blog.timescale.com/blog/how-to-quickly-build-dashboa...

Technical details of correctly handling backfilled data: https://blog.timescale.com/blog/continuous-aggregates-faster...

powturbo · on Nov 1, 2019

- Timestamps are very easy to compress down to 0.01%

- Floats are hard to compress without quantization or lossy compression. The Gorilla algorithm is not helping so much.

You can do your own benchmarks with your data with TurboPFor : https://github.com/powturbo/TurboPFor

or download icapp including more powerfull algorithms.

m0zg · on Nov 1, 2019

Nonrandom time series floats are often significantly helped by delta-compression before applying the actual block compression algorithm. I used to do this quite successfully in late 90s when sending FOREX quotes to clients over analog modem connections.

powturbo · on Nov 1, 2019

Yeah, for benchmarking it is important to use real data, because the compression ratio can significantly differ. You'll have better compression with lossy compression and better precision with user controlled and bounded relative error. see for ex. this benchmark for scientific data: https://github.com/powturbo/TurboTranspose

jnordwick · on Oct 31, 2019

Instead of trying to hack column like features and performance onto a row oriented structure, why not just develop a column storage for hypertables? All this work seems like a dead end performance-wise.

cevian · on Oct 31, 2019

The article covers this in more depth, but the short answer is: The hybrid approach has benefits when your workload has both shallow-and-wide (fetch all data for user X) and deep-and-narrow queries (calculate the average number of logins for all users). Row stores excel on shallow queries, column stores excel on deep queries. For time-series data you want row-oriented structures when the data just comes in and convert it to a column store after it ages.

mfreed · on Oct 31, 2019

It's also the case that, at least for Postgres, building a new low-level storage format isn't feasible until PG13.

And also this approach allows us to leverage alot of the battle-tested stability built into current storage layout with TOASTed pages, without starting from scratch.

vkaku · on Oct 31, 2019

+10 for moar compression!