Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Under InnoDB, because we’re not talking about file corruption but cases where MySQL would silently and irreversibly discard data rather than throwing an error. For example, inserting a value which is too large for the target data type - it truncates the value so that value is lost unless you have another copy or way to reconstruct it.


People always criticize MySQL, then when asked what is the problem, they always mention the same thing about column length, which is just a default that can easily be changed and it's not even the default anymore for years.


That’s the easiest thing to mention but it’s far from the only one I’ve encountered from using MySQL since the late 90s. The reason things like this keep getting mentioned is that they keep breaking databases: it’s like trying to excuse all of the salmonella cases at a restaurant which serves undercooked chicken because you can remember to order it fully cooked.

In 1998, MySQL was appealing because it was very fast on simple queries and didn’t cost a further. Until the early 2010s it had a better replication story than Postgres. But by now it’s often slower than Postgres on my apps, feature poor, and you don’t notice how many limitations you’ve internalized until you realize it’s been years since you had to repair or kludge around a MySQL quirk.


> For example, inserting a value which is too large for the target data type - it truncates the value

I commented in another thread about this. It's not black or white whether that's a bug or a feature since:

1) not truncating causes the whole row to be rejected in Postgres, or with the relevant MySQL server setting

2) for many SaaS and social media apps, truncating that column type is fine, since typically that's a description field

3) auto-truncation means you don't have to sync your app and database schema after every change.

I prefer and rely on MySQL's behavior actually, even though I work with both MySQL and Postgres.

Source: DBA.


I'm trying to think of a delicate way to put this but I'll be blunt: when you put “Source: DBA” my reaction is not “oh, an expert who is correct when everyone else is wrong” but “wow, a DBA who doesn't care about data integrity? It's like finding a doctor who doesn't wash their hands.”.

As someone who's trained a bunch of DBAs, here's how you should explain that this is a bug:

A database's job is to reliably store and retrieve data: that's why we talk about ACID since the whole concept is built around moving from one correct state to another. If we didn't care about getting back what we stored, we could just write flat files on disk and shrug when it breaks. Silently discarding some of the input breaks that contract.

Similarly, a core part of the SQL standard is about the database ensuring that values match the defined schema constraints. If we're going to allow silent truncation, the same reasoning would allow inserting foreign keys which don't exist, coercing non-numeric values into NULLs or other ways of “handling” non-numeric data types, replacing invalid dates with "0000" (oh, wait…), ignoring CHECKs (which MySQL did until 8.0.16), etc. If we're not doing that, we could just use MongoDB and stop pretending to be more than a blob store.

Rejecting the entire row is exactly what a DBA should want because it means that a) the database only contains the valid data it promised to save and b) the application team is immediately alerted to a mismatch between what their application assumes and how the database schema is actually configured — since there's been a breakdown in the migration process, there are likely other problems as well which might be more subtle. Especially in this century there's no excuse for not using a migration framework which makes the process of keeping the database schema in sync with the application automatic and reliable.

In one of the rare cases where data truncation is okay — and note that I've never encountered one of those in 3 decades where the users saw it as anything other than “This garbage software doesn't let us store more and that's why our department really runs on the Excel spreadsheet one of the admin assistants created” — the correct approach is to implement it at the application level so the user can see a fixed-length input field, warnings if they try to enter more, etc. and the application doesn't produce odd errors when, say, it stores a value and attempts to retrieve the same value only to be told it doesn't exist.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: