One of the things I'm constantly baffled by, given the growth of data science and analytics, is that data modelling isn't treated as a first-class concern. It's absolutely fundamental to doing quantitative work efficiently at an organization with any amount of complexity, yet the majority of people in the field seem to be unaware of the concepts.
This ignorance is especially surprising given that it's essentially a solved problem (Kimball), yet if you talk about data modelling, people usually think regression, not schema.
I agree that data modeling is underestimated, but it can hardly be considered a solved problem. It is very hard because there are numerous alternative understandings and formal definitions of what we mean by data (RM, OO, OR, MD, etc.). In addition, there are several levels of representation (physical, logical, semantic). In real projects, they are all mixed.
I'm guessing that the implication was that Kimball's dimensional modeling techniques (so, in a nutshell, snowflake and star schemata) will get the job done in almost any case.
I'm not sure I would 100% agree with that - e.g., denormalization, while useful for many things, isn't always your best option. But I would say that there are a lot of tools in the box, and that is absolutely one of the critical ones to know.
> denormalization, while useful for many things, isn't always your best option.
From my view, it is generally not a good option for cases it wasn't designed for, an example being non-analytical reporting. If you are running operational support, getting the source data immediately and aggregating/displaying it can be more helpful than modeling for analytics workloads. The line between these is blurred in most orgs. To the OP's point, data modeling seems like a sidenote in most analytical discussions. You can accomplish a lot using the star model, which is essentially:
Prepare things to be fast by sorting them into proper groups (fact/dim/bridge)
Rely on integer (surrogate) keys
Store atomic data
Provide summaries/aggregates
Model around the questions you ask, not the system the data comes from
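The points above can be sketched concretely. Below is a minimal, hypothetical star schema in Python's built-in sqlite3: integer surrogate keys, an atomic fact table, descriptive dimension tables, and a query shaped around a business question rather than the source system. All table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes, keyed by integer surrogate keys.
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT)")
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)")

# Fact table: atomic grain (one row per sale line), all-integer foreign keys plus measures.
cur.execute("""CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL)""")

cur.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01', '2024-01')")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(20240101, 1, 2, 19.98), (20240101, 1, 1, 9.99)])

# The question ("units sold by category and month") drives the query shape:
# join facts to dimensions, then aggregate the atomic rows.
row = cur.execute("""
    SELECT p.category, d.month, SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category, d.month
""").fetchone()
print(row)  # ('Hardware', '2024-01', 3)
```

Summary/aggregate tables (the fourth point) would simply be materialized versions of queries like the one above, maintained alongside the atomic fact table.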
I think Kimball's approach stumbles a bit because it needs bitemporalism to rescue it from its own temporal hackery.
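To make the bitemporal point concrete, here is a minimal sketch (invented data and field names) of what bitemporalism tracks that a single validity interval does not: each row carries both a valid-time interval (when the fact was true in the world) and a transaction-time interval (when the database believed it), so you can ask "what did we think was true, as of some past date?"

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max

# Hypothetical customer-address history; all names are illustrative.
@dataclass
class AddressRow:
    customer_id: int
    address: str
    valid_from: date      # when the address became true in the real world
    valid_to: date        # when it stopped being true
    recorded_from: date   # when the database learned of this row
    recorded_to: date     # when this row was superseded

rows = [
    # Original record, later superseded by a retroactive correction.
    AddressRow(1, "1 Old St", date(2020, 1, 1), FOREVER, date(2020, 1, 5), date(2021, 3, 1)),
    # Correction recorded in March 2021: the move actually happened in February.
    AddressRow(1, "1 Old St", date(2020, 1, 1), date(2021, 2, 1), date(2021, 3, 1), FOREVER),
    AddressRow(1, "2 New Ave", date(2021, 2, 1), FOREVER, date(2021, 3, 1), FOREVER),
]

def address_as_of(customer_id: int, valid_at: date, known_at: date):
    """What address did the database (as of known_at) say was true at valid_at?"""
    for r in rows:
        if (r.customer_id == customer_id
                and r.valid_from <= valid_at < r.valid_to
                and r.recorded_from <= known_at < r.recorded_to):
            return r.address
    return None

# Before the correction, the database still reported the old address...
print(address_as_of(1, date(2021, 2, 15), date(2021, 2, 20)))  # 1 Old St
# ...afterwards, it reports the move retroactively.
print(address_as_of(1, date(2021, 2, 15), date(2021, 4, 1)))   # 2 New Ave
```

A Type 2 slowly changing dimension keeps only one of these intervals, which is why retroactive corrections are where the temporal hackery tends to show up.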
But for a lot of things it works pretty well at what it does. It's useful to have a body of work formed over many years, tested in many demanding scenarios, providing some sort of structured guidance. Twitter and blog posts and random tinkering don't rise to the same level.