Hacker News

I once made an app not using sequential integers as object ids, as you suggest.

It was an absolute nightmare. Maintenance was a nightmare: you're constantly having to generate or replicate these things, which adds an extra layer of complexity to everything, almost always unnecessarily.

It's also extremely bad for db performance: it causes massive page fragmentation, indexes become useless almost immediately after rebuilding them, etc.

For almost everything, sequential int IDs are fine. It's the things you expose to users that you need to be careful with. For those, don't use the primary key to access them; add another unique key instead, but keep the int id in there for the db and for your own use.

My lesson was to go back to always using int ids, and on a few objects have a separate unique key column to expose to users for sensitive stuff.
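A minimal sketch of that layout, using Python's built-in sqlite3 module purely for illustration (the table and column names are hypothetical): the integer primary key stays internal for joins, and a separate unique UUID column is the only identifier handed to users.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoice (
        id        INTEGER PRIMARY KEY,   -- internal, sequential; used for joins
        public_id TEXT NOT NULL UNIQUE,  -- random; the only id exposed to users
        amount    INTEGER NOT NULL
    )
    """
)

def create_invoice(amount: int) -> str:
    """Insert a row and return the public identifier to show the user."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO invoice (public_id, amount) VALUES (?, ?)",
        (public_id, amount),
    )
    return public_id

def find_invoice(public_id: str):
    """Look a row up by its public id; the integer id never leaves the backend."""
    return conn.execute(
        "SELECT id, amount FROM invoice WHERE public_id = ?", (public_id,)
    ).fetchone()

pid = create_invoice(4200)
print(find_invoice(pid))  # (1, 4200)
```

Foreign keys, test fixtures and ad-hoc queries keep using the small integers; only the API layer translates between the two columns.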



I also don't think using UUIDs as a security (by obscurity) strategy is valid. But there are other reasons someone may choose to use UUIDs. For instance, it's convenient to generate identifiers in a decentralized manner. I want to counter your one bad experience with my (equally anecdotal) many good experiences: databases do just fine with UUIDs. That said, we may be working on different kinds of systems and optimizing for different things. I don't frown upon using integers (well, longs) for identifiers, but I personally prefer UUIDs.


A securely generated 128-bit UUID isn't security-by-obscurity, but rather security-by-cryptography. It's still bad not to have authorization checks, because UUIDs can "leak" into logs, browser histories, emails, and things like that. But the security benefit of using crypto-random IDs is neither cosmetic nor superficial.

Most applications don't use UUIDs, many of them are fine, and I definitely wouldn't ding an app for using monotonic IDs, but I'm increasingly thinking that UUIDs deserve more praise.


If a badly built application isn't secured with authentication, then knowing the (integer) identifier gets you access to something you're not supposed to see. If you make the identifier a lot harder to know but still have no security, that smells like the obscurity part. I can absolutely see your point that UUID identifiers are not just a lot harder to guess; they may be impossible to guess. But the security is still bad, and I don't think the impossible-to-guess property of UUIDs should be a substitute for security. I don't think we really disagree, though.


That's not what people usually mean by "security by obscurity" when they critique the concept. Unfortunately the term is overloaded so it's lost its way over time.

To illustrate this for you, let me turn it around a bit. Is it security by obscurity if the only thing stopping someone from logging into your account is knowing your password?

Security by obscurity is when you (for example) roll your own cryptosystem and rely (in whole or part) on the secrecy of your new-fangled algorithm to save you. That is unsafe. But if you're saying high-entropy strings shouldn't be the only barrier to authentication, you're throwing out half a century of complexity-theoretic cryptography.


Yeah, I think the understandable confusion comes from the idea that a UUID "obscures" the sequential identity of the id in the same way a password mask obscures a password, but the obscurity in security through obscurity refers to reliance on an attacker's ignorance of implementation details to secure the system rather than on a mechanism that is provably secure.


Another pretty direct comparison would be to 128-bit secret bearer tokens, on which a huge portion of the industry relies.


I think context matters here. If someone wants to hand out tokens, for instance for e-mail verification, I'm fine with relying on that being a UUID. When you make it harder (impossibly hard) to guess a "record number" by using UUIDs, which is probably what we were talking about, that's great too. (I already yielded that point.) But let's not lead the general population into thinking that UUIDs make everything safer (probably not what you were saying), because something that is "just an identifier" may not be handled as carefully as a secret, and careful handling is exactly what this kind of security relies on. It's the same as how user names were traditionally not treated as secret or confidential. Sometimes UUIDs appear as just identifiers and are not handled with any secrecy, so they just can't always double as a security feature.


> Sometimes UUIDs appear as just identifiers and are not handled with any secrecy, so they just can't always double as a security feature.

I can see your point. If UUIDs are handled in such a way that they are discoverable by anyone, they are not enough to make the references secure.

I think the point tptacek and others are making is that this is an instance of the defence-in-depth principle, though. In scenarios where UUIDs are not simply discoverable, using UUIDs is inherently more secure than using a monotonic ID, simply because a monotonic ID can be easily guessed. Yet UUIDs are still not enough in isolation, and you should additionally use proper access control (because particular UUIDs will eventually leak in emails and such).


But UUIDs do, in fact, make things safer.


On average: yes. Always: no.


Always no? What's a situation in which you'd be better off with monotonic ids?


I never said they were less secure. I said there are situations where they're not really more secure.

If I can see in this HTML page that your reply is /reply?id=12345, then it doesn't matter whether Hacker News uses integers or UUIDs if a bug in /edit?id=12345 lets me edit your reply without the appropriate security check. If we say that UUIDs always make everything inherently more secure, we're doing everyone a disservice.

Now, the original discussion was about (1) discovering for read, and not about (2) escalating a read to a write. But if anyone reading this mistakenly takes from it that UUIDs are the way to solve these problems then they will go on optimizing for (1) at the expense of (2).


"Security by obscurity is no form of security."

That's been bouncing around at least since I first noticed it on /. (Slashdot), which was a couple of decades ago.


Note that some databases generate type 1 (time-based) UUIDs by default rather than randomly generated type 4 UUIDs; MySQL's UUID() function, for example, returns type 1 values. There are tons of security holes out there because people use type 1 UUIDs thinking they can serve as secure tokens.
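To see why type 1 values make poor tokens, here is a small sketch using Python's standard uuid module (chosen only to illustrate the layout; it says nothing about any particular database's defaults). The node value below is made up, and node and clock_seq are passed explicitly just to keep the example reproducible.

```python
import datetime
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000

# node and clock_seq are passed explicitly (a made-up MAC-like node id) so the
# example is reproducible; a bare uuid1() usually derives the node from the host.
a = uuid.uuid1(node=0x0242AC110002, clock_seq=42)
b = uuid.uuid1(node=0x0242AC110002, clock_seq=42)

print(a.version)        # 1
print(hex(a.node))      # 0x242ac110002: the node id is stored in plain sight
print(b.time > a.time)  # True: the next id is just a later timestamp

# The 60-bit timestamp converts straight back to the creation time.
created = datetime.datetime.fromtimestamp(
    (a.time - GREGORIAN_TO_UNIX_100NS) / 1e7, tz=datetime.timezone.utc
)
print(created)

c = uuid.uuid4()
print(c.version)        # 4: random, apart from the version and variant bits
```

Knowing one type 1 UUID tells an attacker the host identifier, the clock sequence and roughly when neighbouring ids were minted, which shrinks the search space for other ids enormously.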


> For instance, it's convenient to generate identifiers in a decentralized manner.

For an elegant solution to this problem, check out Twitter's Snowflake[0].

[0] https://blog.twitter.com/engineering/en_us/a/2010/announcing...
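As a rough sketch of the idea only: Snowflake-style IDs pack a millisecond timestamp, a worker number and a per-millisecond sequence into a single 64-bit integer. The 41/10/12-bit split below follows the commonly described Snowflake layout; the class name, the custom epoch and the error handling are assumptions, not Twitter's code.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

class SnowflakeLikeGenerator:
    """Hypothetical generator: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue ids")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 ids already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )

gen = SnowflakeLikeGenerator(worker_id=7)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids))  # True: ids from one worker increase monotonically
```

Uniqueness across machines depends entirely on every worker getting a distinct worker_id, which is where some external coordination comes in.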


I always wondered why databases haven't implemented a scheme like Microsoft Active Directory's RID master FSMO role. One server is responsible for handing out chunks of IDs to each server, and each server requests a new block whenever a threshold is reached (50% by default, IIRC).
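A toy, single-process sketch of that allocation pattern, loosely modelled on the description above; the class names and block size are hypothetical, and a real allocator would run on its own server and persist its high-water mark. The central allocator hands out contiguous ranges, and each node asks for its next range once half of the current one is used, so it never stalls waiting for the allocator.

```python
import threading
from collections import deque

class CentralAllocator:
    """Hypothetical single authority handing out non-overlapping id blocks."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.next_start = 1
        self.lock = threading.Lock()

    def allocate_block(self) -> range:
        with self.lock:
            start = self.next_start
            self.next_start += self.block_size
            return range(start, start + self.block_size)

class NodeIdPool:
    """Per-node pool (not thread-safe) that requests its next block at `threshold` usage."""

    def __init__(self, allocator: CentralAllocator, threshold: float = 0.5) -> None:
        if not 0 < threshold <= 1:
            raise ValueError("threshold must be in (0, 1]")
        self.allocator = allocator
        self.threshold = threshold
        self.blocks = deque([allocator.allocate_block()])
        self.used = 0            # ids consumed from self.blocks[0]
        self.prefetched = False  # whether the next block has been requested

    def next_id(self) -> int:
        current = self.blocks[0]
        value = current[self.used]
        self.used += 1
        if not self.prefetched and self.used >= len(current) * self.threshold:
            # In a real deployment this request would go over the network,
            # ahead of need, so the node never blocks on the allocator.
            self.blocks.append(self.allocator.allocate_block())
            self.prefetched = True
        if self.used == len(current):
            self.blocks.popleft()
            self.used = 0
            self.prefetched = False
        return value

alloc = CentralAllocator(block_size=4)
a, b = NodeIdPool(alloc), NodeIdPool(alloc)
print([a.next_id() for _ in range(6)])  # [1, 2, 3, 4, 9, 10]
print([b.next_id() for _ in range(3)])  # [5, 6, 7]
```

The trade-off is that ids stay unique and compact but are no longer globally ordered by creation time.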


Some coordination there, courtesy of ZooKeeper.


I don't think it's really fair to call it security by obscurity. The UUIDs have far more entropy than 99.9% of user passwords protecting them.
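A back-of-the-envelope version of that comparison, assuming random type 4 UUIDs (122 of the 128 bits are random; the other six are fixed version and variant bits) and, purely as an example, N = 10^9 valid identifiers and k = 10^12 attacker guesses:

```latex
\Pr[\text{one guess hits a valid id}] = \frac{N}{2^{122}}
  \approx \frac{10^{9}}{5.3 \times 10^{36}} \approx 1.9 \times 10^{-28}

\Pr[\text{any of } k \text{ guesses hits}] \le \frac{kN}{2^{122}}
  \approx 1.9 \times 10^{-16} \quad (k = 10^{12})
```

With sequential integers, by contrast, every guess between 1 and the current maximum id hits. The figure only holds if those 122 bits really come from a cryptographically secure generator, which is not the case for type 1 UUIDs.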


Not if they are type 1 UUIDs, which is the default on MySQL.


From https://littlemaninmyhead.wordpress.com/2015/11/22/cautionar...

> Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example.

HN discussion: https://news.ycombinator.com/item?id=10631806


Sure, security is about taking a layered approach. I don't think anyone would seriously advocate using knowledge of a UUID as enough authorisation on its own. Well, I hope not :)


I find UUIDs very useful for this reason: the IDs can be generated by different parts of a distributed system and be "guaranteed" to be unique.

In this kind of system you can also generate deterministic UUIDs, which are useful for idempotency (e.g. the same event can be recognised as a duplicate).
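One common way to get such deterministic ids in Python is `uuid.uuid5`, which derives a version 5 UUID by SHA-1 hashing a namespace UUID together with a name. A minimal sketch, where the namespace, the event-key format and the in-memory set standing in for a dedup store are all assumptions:

```python
import uuid

# Hypothetical namespace for this system's events; any fixed UUID works,
# as long as every producer uses the same one.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/events")

def event_id(source: str, source_seq: int) -> uuid.UUID:
    """Same (source, sequence) pair -> same UUID, on any machine."""
    return uuid.uuid5(EVENT_NAMESPACE, f"{source}:{source_seq}")

processed: set[uuid.UUID] = set()  # stand-in for a durable dedup table

def handle(source: str, source_seq: int) -> bool:
    """Process an event once; return False if it is a redelivered duplicate."""
    eid = event_id(source, source_seq)
    if eid in processed:
        return False
    processed.add(eid)
    # ... apply the event's side effects here ...
    return True

print(handle("billing", 41))  # True
print(handle("billing", 41))  # False: duplicate recognised by its id
```

Since these ids are derived from guessable inputs, they are handy for deduplication but are not secrets.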


Sure, that's fine. The context of my point about IDs is for user-facing APIs. Note that user-facing really means "publicly accessible", even in the case of private APIs. As I mentioned elsewhere, market research groups will be happy to extrapolate as many metrics as they can from your API's integer object IDs.
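The classic illustration is the German tank problem. From a handful of observed sequential IDs, the standard unbiased estimate of the total count is m + m/k - 1, where m is the largest ID seen and k is the number of distinct IDs observed. The small sketch below assumes IDs are assigned 1, 2, 3, ... without gaps and that the observed ones are a uniform random sample; the function name and the sample values are made up.

```python
def estimate_total(observed_ids: list[int]) -> float:
    """German-tank-style estimate of how many objects exist, from sampled sequential ids."""
    if not observed_ids:
        raise ValueError("need at least one observed id")
    k = len(set(observed_ids))
    m = max(observed_ids)
    return m + m / k - 1

# e.g. ids seen on a few public order confirmation pages (hypothetical numbers)
print(estimate_total([1870, 412, 3005, 2210]))  # 3755.25
```

Sampling again a week later and comparing the two estimates gives a growth rate, which is exactly the kind of metric being warned about here.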

That being said I'm a little surprised to hear about the complexity. Are you able to share which DB/stack you were using? This functionality should be natively supported at two distinct abstractions: your programming language and your database.


In that case: C#/EF/SQL Server is what that app was made in. This was about 6 years ago, admittedly, but it didn't feel as if UUIDs were really treated as first-class citizens. Everything's in ints in example code, you have to fight the auto-code generators a bit, etc. So in my experience it's never anywhere near as seamless as the int support.

But it's not just the tooling support that's the problem. When you're testing and need to switch category, you can't just change a 1 to a 2; you have to go find which random UUID the category was assigned. You can't just go into the DB and add a new row; you have to open a UUID generator. You can't just quickly add a foreign key relationship; you have to look up the UUID. And a ton of other little annoyances.

Actually, categories are an excellent example of something that shouldn't be a UUID; they're actually supposed to be discoverable.

I think my present project has UUIDs on the user, company, invoice and payments tables, but still ints as the primary key. Everything else isn't worth it. There's a merchant table, but again, they're all supposed to be discoverable (and aren't editable by the merchants themselves).

I also generally implement controller level security that checks access to the root object being returned by default, so I can't really make a mistake exposing an unauthorised object. There's an occasional controller where I've made a conscious decision not to implement that level, generally actions that allow both authenticated and unauthenticated users (e.g. viewing merchants or categories).
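A framework-agnostic sketch of that pattern in Python (the commenter's stack is C#/EF, and every name here is hypothetical): each action's returned root object is checked against the current user by default, and skipping the check requires an explicit, easy-to-grep opt-out.

```python
import functools
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class User:
    company_id: Optional[int]  # None for anonymous visitors

@dataclass
class Invoice:
    id: int
    company_id: int

class Forbidden(Exception):
    pass

def can_access(user: User, obj: object) -> bool:
    """Hypothetical rule: an object is visible to users of the company that owns it."""
    owner = getattr(obj, "company_id", None)
    return owner is not None and owner == user.company_id

def controller(public: bool = False) -> Callable:
    """Authorise the returned root object unless the action is explicitly marked public."""
    def decorate(action: Callable) -> Callable:
        @functools.wraps(action)
        def wrapper(user: User, *args, **kwargs):
            result = action(user, *args, **kwargs)
            if not public and not can_access(user, result):
                raise Forbidden(f"{action.__name__}: access denied")
            return result
        return wrapper
    return decorate

INVOICES = {1: Invoice(id=1, company_id=10), 2: Invoice(id=2, company_id=20)}

@controller()  # checked by default
def get_invoice(user: User, invoice_id: int) -> Invoice:
    return INVOICES[invoice_id]

@controller(public=True)  # conscious opt-out: open to anonymous users
def list_categories(user: User) -> list[str]:
    return ["books", "music"]

alice = User(company_id=10)
print(get_invoice(alice, 1))  # Invoice(id=1, company_id=10)
# get_invoice(alice, 2) raises Forbidden, whatever kind of id is passed in.
```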


You can generate uuids that play nicer with database storage / indexing. NEWSEQUENTIALID() in MSSQL, for example.
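NEWSEQUENTIALID() is specific to SQL Server. As a portable illustration of the same idea, the sketch below lays out bits the way RFC 9562 describes version 7 UUIDs: a 48-bit Unix-millisecond timestamp up front, then random bits. It is not what NEWSEQUENTIALID() produces, and it assumes a database that orders UUIDs by their bytes.

```python
import os
import time
import uuid

def time_ordered_uuid() -> uuid.UUID:
    """UUIDv7-style value: 48-bit ms timestamp, version 7, 74 random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (
        (ts_ms & ((1 << 48) - 1)) << 80
        | 0x7 << 76        # version field
        | rand_a << 64
        | 0b10 << 62       # RFC variant field
        | rand_b
    )
    return uuid.UUID(int=value)

a = time_ordered_uuid()
time.sleep(0.002)
b = time_ordered_uuid()
print(a.version, a < b)  # 7 True
```

The timestamp prefix is what makes new rows land near the end of the index instead of splitting random pages. It also reveals creation time, and ids minted within the same millisecond are not ordered relative to each other.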

The keys will be easier to guess again, but if all you have to do is guess a primary key to get access to the underlying data, something else isn't right anyways.


I think this gets to the crux of the issue.

It's not about using hard-to-guess UUIDs[0], but restricting access to the underlying data[1].

[0] https://en.m.wikipedia.org/wiki/Security_through_obscurity

[1] https://en.m.wikipedia.org/wiki/Access_control


It's not really security through obscurity. In this case, as I understand it, the IDs were related to data that the company was making available to users through email links. A cryptographically secure 128-bit UUID is impossible to guess, no more guessable than a cryptographic access token. Now of course, you would probably rather have an authentication scheme on top of that, but that comes at a support cost in terms of customers losing their passwords, locking themselves out of their accounts, etc. And it is not clear you have increased security, as people reuse passwords.

Then of course there is the issue that email is for the most part unencrypted (or encrypted without validating certificates).


It's still an access control issue in that case. The user should never be aware of the UUIDs; only the backend should deal with them. If you have a _public_ API that deals with UUIDs, therein lies the issue.

And a side note: I wouldn't trust that the PRNG behind your UUIDs is cryptographically secure. That's not part of the spec.


I know, but as they're easier to guess, what's the point? Might as well just go back to ints.


Is MSSQL's NEWSEQUENTIALID secure? I didn't think it was.


Is the point of non-monotonic ID schemes to make them -secure- secure?

I thought they were a bit of a hack to raise the bar a touch, in which case the cryptographic security properties of that function aren't interesting; the ergonomics are.


No, the cryptographic security of the identifier matters a lot. A GUID generated from an insecure PRNG can be used to predict other GUIDs. A UUID generated from 16 bytes of /dev/urandom can't be used to get anything but the object to which it refers.
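A minimal Python sketch of the second case: 16 bytes from the OS CSPRNG via `os.urandom` (which draws on the kernel's random source, such as /dev/urandom on Linux), with the version and variant bits stamped on. Passing `version=4` to `uuid.UUID` overwrites those six bits, leaving 122 random ones. CPython's own `uuid.uuid4()` also draws from `os.urandom`, but treat that as an implementation detail rather than a documented guarantee.

```python
import os
import uuid

def random_uuid() -> uuid.UUID:
    # 128 bits from the OS CSPRNG; version=4 overwrites 4 version bits and
    # 2 variant bits, so 122 bits stay unpredictable.
    return uuid.UUID(bytes=os.urandom(16), version=4)

token = random_uuid()
print(token.version, token.variant == uuid.RFC_4122)  # 4 True
```

By contrast, something like `random.getrandbits(128)` comes from Python's Mersenne Twister, whose future output can be reconstructed from enough past output, which is exactly the insecure-PRNG case.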


Yeah, there's nothing wrong with using sequential integer IDs in the database. But objects should be assigned random unique IDs as well, which is how they are referenced by and presented to the outside world. The random ID is what is presented to the frontend/user. I'm not sure what issue you had with generating random integers for primary keys; it seems like that should work fine. Is it because the index has to be rebuilt when a value is inserted into the middle of the ordered sequence?


One path is to use sequential ints internally and encrypt them externally with something like idgen:

https://pypi.org/project/idgen/

That provides IDs that are both opaque and, if you want, user-friendly.

(disclaimer: I wrote it.)
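The snippet below is not idgen's API or algorithm, neither of which is shown here. It is a generic sketch of the same idea, assuming a secret key held by the server: a 4-round Feistel network with HMAC-SHA256 as the round function turns each 64-bit sequential id into an opaque 64-bit value, and the same key reverses it.

```python
import hashlib
import hmac

HALF_BITS = 32
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 4

def _round_fn(key: bytes, round_no: int, half: int) -> int:
    """Keyed pseudo-random function for one round (HMAC-SHA256 truncated to 32 bits)."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")

def encode_id(key: bytes, n: int) -> int:
    """Map a 64-bit sequential id to an opaque 64-bit value (a keyed permutation)."""
    if not 0 <= n < 1 << 64:
        raise ValueError("id must fit in 64 bits")
    left, right = n >> HALF_BITS, n & HALF_MASK
    for i in range(ROUNDS):
        left, right = right, left ^ _round_fn(key, i, right)
    return (left << HALF_BITS) | right

def decode_id(key: bytes, token: int) -> int:
    """Invert encode_id by running the rounds backwards."""
    left, right = token >> HALF_BITS, token & HALF_MASK
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round_fn(key, i, left), left
    return (left << HALF_BITS) | right

KEY = b"hypothetical-secret-key"  # in practice, load a real secret from config
public = encode_id(KEY, 42)
print(f"{public:016x}")        # opaque 16-hex-digit id shown to users
print(decode_id(KEY, public))  # 42
```

Because a Feistel network is a permutation, distinct ids never collide and no lookup table is needed. This hides ordering and counts, but it does not replace authorization checks.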


Something I realised looking at Google+ identifiers (21-digit numerics, 19 of those significant) was that they made brute-searching the user profile space infeasible. There were only 4 billion and change legitimate profiles, so any given random request into the space had a 4 in 100 billion chance of hitting one. And the IDs appeared to be randomly distributed.

And yes, Google also posted a sitemaps file (or rather, 50,000 sitemap files) with all profile IDs. But that was last marked updated in March 2017, for some reason. Being able to validate that would have been nice.

But as a mitigation against blind bulk scrapes, a useful tool. I'd consider that one of G+'s good design elements.


If you want a URL that you can share only with your friends, then you have no choice. If you don't need that, then just use a normal ACL.
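A minimal sketch of such a share link in Python; the host name, path and in-memory store are hypothetical. The URL carries a random token from the `secrets` module (16 bytes, i.e. 128 bits), and only the token's SHA-256 digest is stored, so a leaked database dump doesn't hand out working links.

```python
import hashlib
import secrets
from typing import Optional

# sha256(token) -> album id; a stand-in for a database table (hypothetical).
share_links: dict[str, int] = {}

def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()

def create_share_link(album_id: int) -> str:
    token = secrets.token_urlsafe(16)       # 16 random bytes, URL-safe base64
    share_links[_digest(token)] = album_id  # store only the hash of the token
    return f"https://photos.example.com/s/{token}"

def resolve(token: str) -> Optional[int]:
    """Return the album this token grants access to, or None."""
    return share_links.get(_digest(token))

def revoke(token: str) -> None:
    share_links.pop(_digest(token), None)

url = create_share_link(7)
token = url.rsplit("/", 1)[1]
print(resolve(token))            # 7
print(resolve("guessed-token"))  # None
```

Revoking a link is just deleting its row; anyone who needs ongoing access should go through the normal ACL instead.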



