Apache Age: A Graph Extension for PostgreSQL (apache.org)
215 points by based2 on March 4, 2021 | 45 comments


Though I appreciate all the hard work people put into this and offer so generously for free, it saddens me a little to see yet another property graph database supporting a query language that isn't really standardized. I would really like to see a free, RDF-based triple store with good SPARQL support that can be used for serious production workloads. But all the open source activity seems to be in the property-graph camp, with a new product every couple of months, while the high-quality triple stores are all quite pricey.


Really, the market has spoken. In relational databases, tables with multiple columns aren't strictly necessary, but practically quite useful. In the same way, property graphs are more useful than triple stores, since common usage patterns want a collection of related properties a lot of the time.

Another way to put it is: it is straightforward to map a property graph to a triple store. In most cases, the property graph will have fewer nodes and edges and will operate faster and be easier to maintain.
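To make the comparison concrete, here is a toy sketch (in Python, with made-up data) of the same facts stored both ways: a property-graph node carries all its properties in one record, while a triple store needs one triple per fact.

```python
# Toy illustration only; names and data are made up.

# Property-graph representation: one node record holding all properties.
pg_node = {"id": "alice", "labels": ["Person"],
           "props": {"name": "Alice", "age": 34, "city": "Oslo"}}

# Triple-store representation: one (subject, predicate, object) per fact.
triples = [
    ("alice", "rdf:type", "Person"),
    ("alice", "name", "Alice"),
    ("alice", "age", 34),
    ("alice", "city", "Oslo"),
]

records_pg = 1              # one record in the property graph
records_rdf = len(triples)  # four triples for the same facts
print(records_pg, records_rdf)  # 1 4
```

The mapping in the other direction is just as mechanical, which is the point of the comment above: the property graph is usually the more compact of the two.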


Benchmarks consistently fail to show this performance advantage, at least fair ones. Which makes sense, because property graphs and RDF are very similar and mostly differ in syntax, i.e., stuff that good query planners and indexing schemes compile away.


What does it mean that "tables with multiple columns aren't strictly necessary"? Do you use tables with 1 column?


"One sequential primary key field is all the columns anyone will ever need in a database"

- Bill Gates


128 bits are enough for enumerating everything.

- 2017


2^128 is ~10^38. For frame of reference, there are ~10^22 atoms in a penny and ~10^50 atoms in the entire planet.

Alternatively, 585 years is "only" ~2^64 nanoseconds. 2^128 nanoseconds is on the order of 10^22 years, while the estimated current age of the universe comes in at a mere 10^10 years.

What sort of enumeration could you possibly do in practice on such a scale? (Allocation, on the other hand, is an entirely separate problem.)
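For anyone who wants to check the magnitudes above, a quick sanity check in Python:

```python
import math

# 2^128 in decimal orders of magnitude.
log10_2_128 = math.log10(2**128)
print(log10_2_128)              # ~38.5, so 2^128 is ~10^38

# How long is 2^64 nanoseconds? And 2^128?
ns_per_year = 1e9 * 60 * 60 * 24 * 365.25
years_2_64 = 2**64 / ns_per_year
print(years_2_64)               # ~584.5 years

years_2_128 = 2**128 / ns_per_year
print(math.log10(years_2_128))  # ~22, i.e. on the order of 10^22 years
```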


I can't think of any. But I think the historical lesson is that what I (nay, people much smarter than me) can think of today is insufficient for the greater tomorrow.

The ridiculous historical quotes for computer counts, disk size, RAM were all predicated on people not computing differently. As it turns out, technology progressed, and people began doing entirely new, unexpected things with computing.


For the record, most (not all) of those quotes are taken somewhat out of context. Even in 1950, it was trivial to come up with examples of sets with more than (for example) 2^32 elements.

When it comes to address space allocation (ie IPv6) I agree; I'm skeptical that 128 bits will prove to be sufficient. But as far as simple enumeration goes, 2^128 is unimaginably large and I don't see how changes in computing could possibly affect that assessment.

An example. Partition a 2^128 address space evenly between 2^64 individual computers (it's difficult to imagine humanity ever possessing anywhere near this many devices). Each computer does nothing more than visit each value in its segment of the address space sequentially. No additional computations, nothing, just visits it. At 1 value per nanosecond (ie 1 GHz) this otherwise pointless exercise requires approximately 585 years to complete.
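The same exercise, spelled out as a quick Python check:

```python
# The partition arithmetic from the comment above.
total_space = 2**128
machines = 2**64                       # far more devices than humanity owns
per_machine = total_space // machines  # 2^64 values per computer
rate_hz = 10**9                        # 1 value per nanosecond, i.e. 1 GHz

seconds = per_machine / rate_hz
years = seconds / (60 * 60 * 24 * 365.25)
print(round(years))  # ~585
```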


It means that everything a relational database is capable of (from CS theory) can be done with a single column of values per table. Note that each such table also has a "column" of primary keys; in other words, it's a simple K -> V mapping.

Also note that just because you can do something doesn't mean that it will be efficient, or that it will be enjoyable to work with.


> It means that everything a relational database is capable of (from CS theory) can be done with a single column of values per table. Note that each such table also has a "column" of primary keys

So, in other words, it's a two-column table (a primary key isn't some kind of virtual column, either in concrete databases or in relational theory).

Calling a table with two columns, one of which is the primary key, a one-column table is... just wrong.


I believe that they are referring to the EAV data model, which is maximally flexible in terms of schema modification. But performance, especially for queries (and also for DML), is atrocious.
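For anyone unfamiliar with EAV (entity-attribute-value), a toy Python sketch, with made-up entities and attributes, of why it is maximally flexible but painful to query: the whole database is one three-column "table", so any attribute can be added without a schema change, yet even a simple "row" fetch becomes a scan-and-regroup.

```python
# Toy EAV "database": every fact is an (entity, attribute, value) row.
eav = [
    ("user:1", "name", "Alice"),
    ("user:1", "email", "alice@example.com"),
    ("user:2", "name", "Bob"),
]

# Adding a brand-new attribute needs no ALTER TABLE equivalent:
eav.append(("user:2", "signup_year", 2021))

# But reconstructing one entity's "row" means filtering the entire list:
def get_entity(entity_id):
    return {attr: value for ent, attr, value in eav if ent == entity_id}

print(get_entity("user:2"))  # {'name': 'Bob', 'signup_year': 2021}
```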


This uses openCypher for queries, which is about to become an ISO Standard. In the mid-term the query language will be on equal footing with SQL.

http://www.opencypher.org/articles/2019/09/12/SQL-and-now-GQ...


I don't find SPARQL terrible, and have used it in commercial projects.

But RDF can't claim the "standardization" argument in good faith when RDF/SemWeb overshadowed Datalog/Prolog (based on a true ISO and community standard) for such a long time. SemWeb, like XHTML, SOAP/WS-*, and other W3C stuff, failed on the web and became an enterprise-y thing instead, the W3C being a pay-as-you-go org.


I don’t understand this argument. Several overlapping specs may exist. ISO specs weren’t built to take advantage of Web specs; W3C specs were. Like it or not, the W3C didn’t do any disservice to the ISO or fight unfairly.


Do you know Fuseki?


That’s a bit of a toy from the perspective of enterprise requirements. Good starter system if you adhere to open source religion.


What enterprise requirements does it lack?


This is based on AgensGraph: http://bitnine.net/downloads-2020/

I found this presentation from 2017 about AgensGraph: https://www.slideshare.net/mobile/kisung80/agensgraph-a-mult...



Fantastic that they are using Cypher. Love that language, if one can say that about a query language.


Cypher's pretty much the only thing about Neo4j that I found to be both pleasant to use and... well, any good, really. Love seeing it borrowed by other graph DBs. I'm far from being a SQL hater, but being able to bounce into Cypher to replace (at least some large subset of) recursive CTEs would be a huge developer-experience improvement for PostgreSQL as a multi-model DB.

Example from the n4j Cypher docs, for the curious:

    MATCH (user:User { name: 'Adam' })-[r1:FRIEND]-()-[r2:FRIEND]-(friend_of_a_friend)
    RETURN friend_of_a_friend.name AS fofName
Returns the names of friends-of-friends of the User node whose "name" property is "Adam", where friendships are FRIEND-labeled edges. Names like "friend_of_a_friend" are aliases for the matched nodes, like in SQL. () denotes a node, [] an edge. (It's been a while, so this explanation may be subtly wrong, but it's close.)
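Roughly what that pattern does, sketched over an in-memory adjacency list in Python (all names made up; this approximates Cypher's semantics rather than its real execution model):

```python
# Tiny undirected "FRIEND" graph, stored as an adjacency list.
friends = {
    "Adam": {"Beth", "Carl"},
    "Beth": {"Adam", "Dana"},
    "Carl": {"Adam", "Dana", "Eve"},
    "Dana": {"Beth", "Carl"},
    "Eve": {"Carl"},
}

def friends_of_friends(graph, user):
    """Two FRIEND hops out from `user`, excluding the start node."""
    result = set()
    for friend in graph[user]:
        for fof in graph[friend]:
            if fof != user:  # Cypher's relationship-uniqueness rule has a similar effect
                result.add(fof)
    return result

print(sorted(friends_of_friends(friends, "Adam")))  # ['Dana', 'Eve']
```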


I love Cypher too. And PostgreSQL! I wonder how this project compares to Neo4j.


Take a look at OrientDB’s OSQL.


Agreed, Cypher was the thing I loved most about neo4j.


Ever tried 4GL on IBM? Way worse than most.


It’s always nice to see such efforts around Postgres. I do think it’s very well suited to many needs aside from extreme scales that most won’t deal with.

In terms of graphs, there is also an implementation of TinkerPop, which allows using Gremlin, very different in nature from Cypher.

http://www.sqlg.org/docs/2.0.1/

NB: I believe Cypher can compile to bytecode that runs on the TinkerPop engine, which I found interesting.


How would this compare with something like pgRouting (https://pgrouting.org/)?


I read pgRouting as focusing on geospatial routing to get from point A to point B. Is there more to it than that?

A graph database is about storing the relationships between pieces of data, a social graph being one example. You'd have people and the relationships between them in the database.


Despite the description on its website, at core, there's nothing particularly geospatial about pgRouting. You don't need PostGIS, or even Postgres's built-in geo types (point, line, etc.), in order to use pgRouting.

Rather, pgRouting is a set of general graph- and path-search algorithms, exposed as procedures, that operate upon rowsets (most efficiently, upon indexed tables) of vertices and edges. You can use pgRouting to do SPARQL-like graph queries, or even full-blown network analysis, if you like.

In a previous job, I did just that: I loaded up social-network data into vertex and edge tables, and then I used pgRouting's implementation of Floyd-Warshall and Driving Distance to discover high-value potential social connections within a given relationship-weighted distance of a given user. Not as a one-time data-science thing, but as the backend of our service's matching engine, that ran every time a user refreshed their "candidate matches" page. It was pretty instantaneous.
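For readers who haven't met it, here is a minimal Python sketch of the Floyd-Warshall idea described above, applied to a tiny made-up "relationship-weighted" graph. (pgRouting's actual procedures run inside Postgres over vertex/edge tables; this only illustrates the algorithm.)

```python
INF = float("inf")

def floyd_warshall(nodes, edges):
    """All-pairs shortest path distances over undirected weighted edges."""
    dist = {a: {b: (0 if a == b else INF) for b in nodes} for a in nodes}
    for a, b, w in edges:
        dist[a][b] = min(dist[a][b], w)
        dist[b][a] = min(dist[b][a], w)
    # Relax paths through each intermediate node k.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

nodes = ["ann", "bob", "cat", "dee"]
edges = [("ann", "bob", 1.0), ("bob", "cat", 2.0), ("cat", "dee", 1.0)]
dist = floyd_warshall(nodes, edges)

# Candidates within a weighted "social distance" of 3.0 from ann:
print(sorted(n for n in nodes if n != "ann" and dist["ann"][n] <= 3.0))
# ['bob', 'cat']
```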


I remember wondering about exactly this a couple years ago, but I couldn't figure out the answer (I didn't look all that closely into it). Now I want to come up with an excuse to try it out on something. Thanks for mentioning this!


Wow, this is super neat. I wish I knew about this a couple years ago, it would've been super useful for a recommendation system I was building in production. I'll have to give this a shot!


FWIW, Postgres already has good support for representing and querying graph structures using the LTree extension https://www.postgresql.org/docs/current/ltree.html
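For a sense of what ltree offers, here is a rough Python sketch of materialized-path matching, the idea behind ltree's ancestor/descendant operators (labels borrowed from the ltree docs' example hierarchy; the operator analogy is approximate):

```python
# Each row stores its full path from the root, ltree-style.
paths = [
    "top",
    "top.science",
    "top.science.astronomy",
    "top.science.astronomy.stars",
    "top.hobbies",
]

def descendants_of(prefix):
    # roughly analogous to: SELECT path FROM t WHERE path <@ 'top.science'
    return [p for p in paths if p == prefix or p.startswith(prefix + ".")]

print(descendants_of("top.science"))
# ['top.science', 'top.science.astronomy', 'top.science.astronomy.stars']
```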


That's more for trees, which to be fair are a specific kind of graph I guess. Ltree doesn't provide anywhere near the types of tools someone would expect if you told them it supports "graph structures".




Very ugly code review process in that project's Developer Guidelines.


And braceless if statements are the road to ruin

I also chuckled at "Repeat 4 and 5." written in a `<ul>`


That's a very interesting topic


Cool Project!


I wonder how the inclusion of graph features in Postgres 14 will affect this project.


From where do you get that? I was searching the internets for this purported feature and couldn't find it. Link?


Version 14 adds some features to recursive CTE expressions to do BFS/DFS searches and cycle detection. As always, depesz has a nice write up of it: https://www.depesz.com/2021/02/04/waiting-for-postgresql-14-...

I _think_ it's just syntactic sugar and doesn't let you do anything you couldn't already do, although perhaps it would leave room in the future for the Postgres team to optimize query execution.
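For intuition, a procedural Python sketch of what those clauses express declaratively: breadth-first traversal plus stopping when a node has already been seen. (The actual SQL clauses add ordering and cycle-mark columns to a recursive CTE; this is just the underlying idea, with a made-up graph.)

```python
from collections import deque

# Small directed graph; the d -> a edge closes a cycle.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}

def bfs(start):
    """Visit nodes in breadth-first order, never following a node twice."""
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in edges.get(node, []):
            if nxt in seen:
                continue  # already visited: a cycle (or diamond) was detected
            seen.add(nxt)
            queue.append(nxt)
    return order

print(bfs("a"))  # ['a', 'b', 'c', 'd']
```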


The projects are not related at all, but this and https://github.com/FiloSottile/age have a name conflict.


I like to research meaningless things, so I looked at the first commit in each repository:

- https://github.com/apache/incubator-age/commit/bef50e5d86d45... (Mar 19, 2019)

- https://github.com/FiloSottile/age/commit/06cbe4f91ea9843069... (Oct 6, 2019)

What does this mean? Absolutely nothing.



