One of the things I'm constantly baffled by, given the growth of data science and analytics, is that data modelling isn't treated as a first-class concern. It's absolutely fundamental to doing quantitative work efficiently at an organization with any amount of complexity, yet the majority of people in the field seem to be unaware of the concepts.
This ignorance is especially surprising given that it's essentially a solved problem (Kimball), yet if you talk about data modelling, people usually think regression, not schema.
I agree that data modeling is underestimated, but it can hardly be considered a solved problem. It is very hard because there are numerous alternative understandings and formal definitions of what we mean by data (RM, OO, OR, MD, etc.). In addition, there are several levels of representation (physical, logical, semantic). In real projects, they are all mixed.
I'm guessing that the implication was that Kimball's dimensional modeling techniques (so, in a nutshell, snowflake and star schemata) will get the job done in almost any case.
I'm not sure I would 100% agree with that - e.g., denormalization, while useful for many things, isn't always your best option. But I would say that there are a lot of tools in the box, and that is absolutely one of the critical ones to know.
> denormalization, while useful for many things, isn't always your best option.
From my view, it is generally not a good option for cases it wasn't designed for, an example being non-analytical reporting. If you are running operational support, getting the source data immediately and aggregating/displaying it can be more helpful than modeling for analytics workloads. The line between these is blurred in most orgs. To the OP's point, data modeling seems like a sidenote in most analytical discussions. You can accomplish a lot using the star model, which is essentially:
Prepare things to be fast by sorting them into proper groups (fact/dim/bridge)
Rely on integer (surrogate) keys
Store atomic data
Provide summaries/aggregates
Model around the questions you ask, not the system the data comes from
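The points above can be sketched concretely. Below is a minimal, hypothetical star schema in Python's built-in sqlite3: integer surrogate keys, an atomic fact table, descriptive dimension tables, and a query shaped around a business question rather than the source system. All table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes, keyed by integer surrogate keys.
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT)")
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)")

# Fact table: atomic grain (one row per sale line), all-integer foreign keys plus measures.
cur.execute("""CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL)""")

cur.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01', '2024-01')")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(20240101, 1, 2, 19.98), (20240101, 1, 1, 9.99)])

# The question ("units sold by category and month") drives the query shape:
# join facts to dimensions, then aggregate the atomic rows.
row = cur.execute("""
    SELECT p.category, d.month, SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category, d.month
""").fetchone()
print(row)  # ('Hardware', '2024-01', 3)
```

Summary/aggregate tables (the fourth point) would simply be materialized versions of queries like the one above, maintained alongside the atomic fact table.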
I think Kimball's approach stumbles a bit because it needs bitemporalism to rescue it from its own temporal hackery.
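To make the bitemporal point concrete, here is a minimal sketch (invented data and field names) of what bitemporalism tracks that a single validity interval does not: each row carries both a valid-time interval (when the fact was true in the world) and a transaction-time interval (when the database believed it), so you can ask "what did we think was true, as of some past date?"

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max

# Hypothetical customer-address history; all names are illustrative.
@dataclass
class AddressRow:
    customer_id: int
    address: str
    valid_from: date      # when the address became true in the real world
    valid_to: date        # when it stopped being true
    recorded_from: date   # when the database learned of this row
    recorded_to: date     # when this row was superseded

rows = [
    # Original record, later superseded by a retroactive correction.
    AddressRow(1, "1 Old St", date(2020, 1, 1), FOREVER, date(2020, 1, 5), date(2021, 3, 1)),
    # Correction recorded in March 2021: the move actually happened in February.
    AddressRow(1, "1 Old St", date(2020, 1, 1), date(2021, 2, 1), date(2021, 3, 1), FOREVER),
    AddressRow(1, "2 New Ave", date(2021, 2, 1), FOREVER, date(2021, 3, 1), FOREVER),
]

def address_as_of(customer_id: int, valid_at: date, known_at: date):
    """What address did the database (as of known_at) say was true at valid_at?"""
    for r in rows:
        if (r.customer_id == customer_id
                and r.valid_from <= valid_at < r.valid_to
                and r.recorded_from <= known_at < r.recorded_to):
            return r.address
    return None

# Before the correction, the database still reported the old address...
print(address_as_of(1, date(2021, 2, 15), date(2021, 2, 20)))  # 1 Old St
# ...afterwards, it reports the move retroactively.
print(address_as_of(1, date(2021, 2, 15), date(2021, 4, 1)))   # 2 New Ave
```

A Type 2 slowly changing dimension keeps only one of these intervals, which is why retroactive corrections are where the temporal hackery tends to show up.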
But for a lot of things it works pretty well at what it does. It's useful to have a body of work formed over many years, tested in many demanding scenarios, providing some sort of structured guidance. Twitter and blog posts and random tinkering don't rise to the same level.