Idempotency keys for exactly-once processing

AdieuToLogic · 2025-12-06T06:04:27 1765001067

From the article:

  In distributed systems, there’s a common understanding that 
  it is not possible to guarantee exactly-once delivery of 
  messages.

This is not only a common understanding, it is a provably correct axiom. For a detailed discussion regarding the concepts involved, see the "Two Generals' Problem"[0].

To guarantee exactly once processing requires a Single Point of Truth (SPoT) enforcing uniqueness shared by all consumers, such as a transactional persistent store. Any independently derived or generated "idempotency keys" cannot provide the same guarantee.

The author goes on to discuss using the PostgreSQL transaction log to create "idempotency keys", which is a specialization of the aforementioned SPoT approach. A more performant variation of this approach is the "hi/low" algorithm[1], which can reduce SPoT allocation of a unique "hi value" to 1 in 2,147,483,648 times when both are 32-bit signed integers having only positive values.

Still and all, none of the above establishes logical message uniqueness. That is a trait of the problem domain: it determines whether two or more messages having the same content are considered distinct (thus mandating different "idempotency keys") or duplicates (thus mandating identical "idempotency keys").

0 - https://en.wikipedia.org/wiki/Two_Generals'_Problem

1 - https://en.wikipedia.org/wiki/Hi/Lo_algorithm

dragonwriter · 2025-12-06T06:19:28 1765001968

> it is a provably correct axiom.

Pedantically, axioms by definition are assumed/defined without proof and not provable; if it is provable from axioms/definitions, it is a theorem, not an axiom.

AdieuToLogic · 2025-12-06T07:15:58 1765005358

>> it is a provably correct axiom.

(Queen from d8 to a5) check.

> Pedantically, axioms by definition are assumed/defined without proof and not provable ...

(Bishop from c1 to d2).

Quite true in one form of its use:

  In mathematics or logic, an axiom is an unprovable rule or 
  first principle accepted as true because it is self-evident 
  or particularly useful. “Nothing can both be and not be at 
  the same time and in the same respect” is an example of an 
  axiom. The term is often used interchangeably with 
  postulate, though the latter term is sometimes reserved for 
  mathematical applications (such as the postulates of 
  Euclidean geometry). It should be contrasted with a 
  theorem, which requires a rigorous proof.[0]

However, notice the interchangeable use with "postulate", defined thusly:

  a hypothesis advanced as an essential presupposition,
  condition, or premise of a train of reasoning[1]

And, of course, a hypothesis is capable of being proven. Given the original use of "axiom" when responding to a quote referencing "common understanding", not a mathematical context, is it reasonable to interpret this usage as a postulation?

(Queen from a5 to e5) check.

;-)

0 - https://www.merriam-webster.com/dictionary/axiom

1 - https://www.merriam-webster.com/dictionary/postulate

zkldi · 2025-12-06T11:05:07 1765019107

It's just not what the word axiom means nor how anyone uses it. An axiom is unprovable by definition - it is a thing we accept to be true because it is useful to do so (e.g. there exists an empty set).

"Provably Correct Axiom" is nonsense. An axiom is unprovable.

Just "provably correct" would've been fine. This chess stuff is hilariously pretentious.

threatofrain · 2025-12-06T11:57:22 1765022242

Sorry but the entire line of argumentation and all its chess flavor is miles off the mark. This is not the sound of arguing with someone who studied what they talked about.

endofreach · 2025-12-06T09:46:23 1765014383

I just nod and keep playing checkers.

antonvs · 2025-12-06T11:23:12 1765020192

Username checks out

threatofrain · 2025-12-06T12:27:13 1765024033

It should be noted that when you have bad actors in your system almost all guarantees of all kinds go out the window.

gunnarmorling · 2025-12-06T09:44:54 1765014294

> A more performant variation of this approach is the "hi/low" algorithm

I am discussing this approach, just not under that name:

> Gaps in the sequence are fine, hence it is possible to increment the persistent state of the sequence or counter in larger steps, and dispense the actual values from an in-memory copy.

In that model, a database sequence (e.g. fetched in 100 increments) represents the hi value, and local increments to the fetched sequence value are the low value.

However, unlike the log-based approach, this does not ensure monotonicity across multiple concurrent requests.
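A minimal sketch of the hi/lo idea described above, assuming a block size of 100. `FakeSequence` stands in for the database sequence and `HiLoAllocator` is a hypothetical name; neither comes from the article.

```python
import itertools
import threading


class FakeSequence:
    """Stand-in for a database sequence; each call models one round trip (assumption)."""

    def __init__(self):
        self._counter = itertools.count(0)
        self._lock = threading.Lock()
        self.round_trips = 0

    def next_hi(self):
        with self._lock:
            self.round_trips += 1
            return next(self._counter)


class HiLoAllocator:
    """Dispenses ids from memory and visits the sequence only once per block."""

    def __init__(self, sequence, block_size=100):
        self._sequence = sequence
        self._block_size = block_size
        self._hi = None
        self._lo = block_size  # forces a fetch on first use
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._lo >= self._block_size:
                self._hi = self._sequence.next_hi()
                self._lo = 0
            value = self._hi * self._block_size + self._lo
            self._lo += 1
            return value


seq = FakeSequence()
a = HiLoAllocator(seq)
b = HiLoAllocator(seq)
ids_a = [a.next_id() for _ in range(150)]
ids_b = [b.next_id() for _ in range(150)]
```

The ids are unique across both allocators, but the next id issued by `a` is smaller than ids `b` has already handed out, which is the monotonicity caveat.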

imron · 2025-12-05T22:53:56 1764975236

I like to use uuid5 for this. It produces unique keys in a given namespace (defined by a uuid) but also takes an input key and produces identical output ID for the same input key.

This has a number of nice properties:

1. You don’t need to store keys in any special way. Just make them a unique column of your db and the db will detect duplicates for you (and you can provide logic to handle as required, eg ignoring if other input fields are the same, raising an error if a message has the same idempotent key but different fields).

2. You can reliably generate new downstream keys from an incoming key without the need for coordination between consumers, getting an identical output key for a given input key regardless of consumer.

3. In the event of a replayed message it’s fine to republish downstream events because the system is now deterministic for a given input, so you’ll get identical output (including generated messages) for identical input, and generating duplicate outputs is not an issue because this will be detected and ignored by downstream consumers.

4. This parallelises well because consumers are deterministic and don’t require any coordination except by db transaction.
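A minimal sketch of deriving downstream keys with Python's standard `uuid.uuid5`. The namespace value, the `downstream_key` helper and the `step` label are assumptions made for illustration.

```python
import uuid

# Hypothetical namespace for one pipeline stage; any fixed UUID works.
ORDERS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.example.com")


def downstream_key(incoming_key: str, step: str) -> uuid.UUID:
    """Derive a deterministic key for an event emitted while handling incoming_key."""
    return uuid.uuid5(ORDERS_NAMESPACE, f"{step}:{incoming_key}")


first = downstream_key("msg-123", "invoice-created")
replayed = downstream_key("msg-123", "invoice-created")  # same input, any consumer
other = downstream_key("msg-124", "invoice-created")
```

Because `first` and `replayed` are equal, a unique column on the downstream side can reject the replay without the consumers coordinating.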

cortesoft · 2025-12-05T23:35:13 1764977713

How is this different/better than something like using a SHA256 of the input key?

Edit: Just looked it up... looks like this is basically what a uuid5 is, just a hash(salt+string)
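That observation can be checked directly. The sketch below rebuilds a version 5 UUID by hand: SHA-1 over the namespace bytes followed by the name, truncated to 16 bytes, with the version and variant bits set. `manual_uuid5` is a hypothetical helper name.

```python
import hashlib
import uuid


def manual_uuid5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # Keep the first 16 bytes; version=5 overwrites the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)


by_hand = manual_uuid5(uuid.NAMESPACE_URL, "https://example.com/a")
by_library = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/a")
```

The two values are equal, so the practical difference from a raw SHA-256 is mostly the fixed 128-bit UUID format and the standard namespace convention.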

dmurray · 2025-12-06T00:59:44 1764982784

This doesn't sound good at all. It's quite reasonable in many applications to want to send the same message twice: e.g "Customer A buys N units of Product X".

If you try to disambiguate those messages using, say, a timestamp or a unique transaction ID, you're back where you started: how do you avoid collisions of those fields? Better if you used a random UUIDv4 in the first place.

imron · 2025-12-06T03:11:41 1764990701

You don’t generate based on the message contents, rather you use the incoming idempotent id.

Customer A can buy N units of product X as many times as they want.

Each unique purchase you process will have its own globally unique id.

Each duplicated source event you process (due to “at least once” guarantees) will generate the same unique id across the other duplicates - without needing to coordinate between consumers.

bknight1983 · 2025-12-05T23:27:04 1764977224

I recently started using uuidv5 for ID generation based on a composite key. This allows a more diverse key set for partitioning by UUID

bokohut · 2025-12-05T21:49:07 1764971347

This was my exact solution in the late 1990's that I formulated using a uid algorithm I created when confronted with a growing payment processing load issue that centralized hardware at the time could not handle. MsSQL could not process the ever increasing load yet the firehose of real-time payments transaction volume could not be turned off so an interim parallel solution involving microservices to walk everything over to Oracle was devised using this technique. Everything old is new again as the patterns and cycles ebb and flow.

pyrolistical · 2025-12-05T23:52:37 1764978757

This article glosses over the hardest bit and bike sheds too much over keys.

> Critically, these two things must happen atomically, typically by wrapping them in a database transaction. Either the message gets processed and its idempotency key gets persisted. Or, the transaction gets rolled back and no changes are applied at all.

How do you do that when the processing isn’t persisted to the same database? I.e., what if the side effect is outside the transaction?

You can’t atomically rollback the transaction and external side effects.

If you could use a distributed database transaction already, then you don’t need idempotent keys at all. The transaction itself is the guarantee
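For reference, a minimal sketch of the quoted pattern when the effect does live in the same database, using SQLite from the Python standard library. The table names and the `handle_deposit` function are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_keys (idempotency_key TEXT PRIMARY KEY);
    CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL);
    INSERT INTO balances VALUES ('alice', 0);
    """
)


def handle_deposit(key: str, account: str, amount: int) -> bool:
    """Apply the deposit once per key; returns False for a duplicate."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("INSERT INTO processed_keys VALUES (?)", (key,))
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # key already recorded, nothing was applied


results = [handle_deposit("k1", "alice", 50), handle_deposit("k1", "alice", 50)]
balance = conn.execute(
    "SELECT amount FROM balances WHERE account = 'alice'"
).fetchone()[0]
```

This only works because the key and the balance share one transaction; an email sent or a remote API called inside `handle_deposit` would not be rolled back.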

hippo22 · 2025-12-06T00:40:39 1764981639

The external side-effects also need to support idempotency keys, which you propagate. Then you use something like a message queue to drive the process to completion.
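A minimal sketch of propagating the key to an external service. `FakePaymentGateway` is a made-up stand-in, not a real API; it assumes the remote side stores one result per idempotency key.

```python
class FakePaymentGateway:
    """Hypothetical external service that remembers results per idempotency key."""

    def __init__(self):
        self._results = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # replay: return stored result
        self.charges_made += 1
        receipt = f"receipt-{self.charges_made}"
        self._results[idempotency_key] = receipt
        return receipt


def process(message: dict, gateway: FakePaymentGateway) -> str:
    # Reuse the message's key for the external call so retries are harmless.
    return gateway.charge(message["idempotency_key"], message["amount_cents"])


gateway = FakePaymentGateway()
msg = {"idempotency_key": "order-42", "amount_cents": 1999}
first = process(msg, gateway)
retry = process(msg, gateway)  # e.g. redelivered by the queue after a crash
```

The retry returns the same receipt and the card is charged once, so the queue can keep redelivering until the whole process completes.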

gunnarmorling · 2025-12-06T09:47:16 1765014436

Exactly that.

ivanbalepin · 2025-12-06T04:49:21 1764996561

i get what you are saying, but i don't think it's fair to call it bike shedding, getting the keys right is also important, one can easily screw up that part too

roncesvalles · 2025-12-06T00:52:01 1764982321

I'm not sure if TFA implies this (it uses too much of his personal jargon for me to understand everything, and it's Friday) but consider this solution based on his transaction log section: you should use the same database that persists the idempotency key to persist the message, and then consume the messages from the CDC/outbox-style. Meaning, the database simply acts as an intermediate machine that dedupes the flow of messages. Assuming you're allowed to make the producer wait.

jasonwatkinspdx · 2025-12-06T01:19:01 1764983941

The practical answer is you use a combination of queries and compensating actions to resemble idempotency with the external service. Some people additionally constrain things to be a linear sequence of actions/effects, and call this pattern Sagas. It's sort of a bastardized distributed transaction that lets you handle a lot of real world use cases without getting into the complexity of true distributed transactions.

hobs · 2025-12-06T00:12:26 1764979946

If you need "transactions" with microservices the traditional answer is sagas - e.g. multiple transaction boundaries that don't fully commit until their entire set of things is complete by passing a message or event to the next system, and having the ability to rollback each thing in a positive manner, either by appending new correct state again or "not ever adding" the original state.
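A minimal in-process sketch of the saga idea, assuming each step comes with a compensating action. The step names and the `run_saga` helper are hypothetical; a real saga would pass messages between services rather than call functions.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"did {name}")
            done.append(step)
        except Exception:
            log.append(f"failed {name}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"undid {prev_name}")
            break
    return log


def fail():
    raise RuntimeError("shipping service unavailable")


saga_log = run_saga([
    ("reserve stock", lambda: None, lambda: None),
    ("charge card", lambda: None, lambda: None),
    ("book shipping", fail, lambda: None),
])
```

Between the failure and the last compensation the system is visibly inconsistent, which is why sagas give eventual consistency rather than atomicity.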

ekropotin · 2025-12-06T00:33:18 1764981198

The problem with sagas is that they only guarantee eventual consistency, which is not always acceptable.

There is also 2 phase commit, which is not without downsides either.

All in all, I think the author made a wrong point that exactly-once processing is somehow easier to solve than exactly-once delivery, while in fact it’s exactly the same problem just shaped differently. IDs here are secondary.

hobs · 2025-12-06T08:20:48 1765009248

I'd agree with that - two phase commit has a bunch of nasty failure cases as well, so there's no free lunch no matter what you do when you go distributed. So just ... don't, unless you really really have to.

Lethalman · 2025-12-06T09:35:54 1765013754

> To ensure monotonicity, retrieval of the idempotency key and emitting a message with that key must happen atomically, uninterrupted by other worker threads. Otherwise, you may end up in a situation where thread A fetches sequence value 100, thread B fetches sequence value 101, B emits a message with idempotency key 101, and then A emits a message with idempotency key 100. A consumer would then, incorrectly, discard A’s message as a duplicate.

Also check out Lamport clocks and vector clocks. They solve this problem if you have a small, fixed number of producers.
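The failure in the quoted passage can be reproduced with a minimal sketch of a consumer that keeps a single high watermark. `WatermarkConsumer` is a hypothetical name; the keys 100 and 101 are the ones from the quote.

```python
class WatermarkConsumer:
    """Treats any key at or below the highest key seen as a duplicate."""

    def __init__(self):
        self.high_watermark = -1
        self.processed = []

    def receive(self, key: int, payload: str) -> bool:
        if key <= self.high_watermark:
            return False  # assumed duplicate, discarded
        self.high_watermark = key
        self.processed.append(payload)
        return True


consumer = WatermarkConsumer()
# Thread A fetched 100, thread B fetched 101, but B emitted first.
accepted_b = consumer.receive(101, "from B")
accepted_a = consumer.receive(100, "from A")
```

A's message is dropped even though it was never processed, which is why fetching the key and emitting the message have to happen as one atomic step.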

jackfranklyn · 2025-12-05T23:16:00 1764976560

The messier version of this problem: banks themselves don't give stable unique identifiers. Transaction references get reused, amounts change during settlement, descriptions morph between API calls. In practice you end up building composite keys from fuzzy matching, not clean UUIDs. Real payment data is far noisier than these theoretical discussions assume.

crote · 2025-12-06T08:27:04 1765009624

What surprised me the most is that the counterparty field is optional.

You'd think that a transaction means money is going from a source to a destination, but according to some banking APIs sometimes it just magically disappears into the aether.

hinkley · 2025-12-05T21:28:56 1764970136

Failure resistant systems end up having a bespoke implementation of a project management workflow built into them and then treating each task like a project to be managed from start to finish, with milestones along the way.

doctorpangloss · 2025-12-05T22:12:10 1764972730

another POV is that solutions that require no long term "durable workflow" style storage provide exponentially more value. if you are making something that requires durable workflows, you ought to spend a little bit of time in product development so that it does not require durable workflows, instead of a ton of time making something that isn't very useful durable.

for example, you can conceive of a software vendor that does the end-to-end of a real estate transaction: escrow, banking, signature, etc. The IT required to support the model of such a thing would be staggering. Does it make sense to do that kind of product development? That is inventing all of SAP, on top of solving your actual problem. Or making the mistake of adopting temporal, trigger, etc., who think they have a smaller problem than making all of SAP and spend considerable resources convincing you that they do.

The status quo is that everyone focuses on their little part to do it as quickly as possible. The need for durable workflows is BAD. You should look at that problem as, make buying and selling homes much faster and simpler, or even change the order of things so that less durability is required; not re-enact the status quo as an IT driven workflow.

majormajor · 2025-12-05T22:31:42 1764973902

Chesterton's Fence, no?

Why are real-estate transactions complex and full of paperwork? Because there are history books filled with fraud. There are other types of large transactions that also involve a lot of paperwork too, for the same reason.

Why does a company have extensive internal tracing of the progress of their business processes, and those of their customers? Same reason, usually. People want accountability and they want to discourage embezzlement and such things.

leoqa · 2025-12-05T22:40:47 1764974447

Durable workflows are just distributed state machines. The complexity is there because guaranteeing a machine will always be available is impossible.

whattheheckheck · 2025-12-05T22:25:37 1764973537

Interesting thought but how do you sell an idea that sounds like...

"How we've been doing things is wrong and I am going to redesign it in a way that no one else knows about so I don't have to implement the thing that's asked of me"

doctorpangloss · 2025-12-05T22:37:21 1764974241

Haha, another way of describing what you are saying is enterprise sales: “give people exactly what they ask for, not what makes the most sense.”

Businesses that require enterprise sales are probably the worst performing category of seed investing. They encompass all of Ed tech and health tech, which are the two worst industry verticals for VC; and Y Combinator has to focus on an index of B2B services for other programmers because without that constraint, nearly every “do what you are asked for” would fail. Most of the IT projects businesses do internally fail!

In fact I think the idea you are selling is even harder, it is much harder to do B2B enterprise sales than knowing if the thing you are making makes sense and is good.

Groxx · 2025-12-05T23:25:05 1764977105

Why call this "exactly once" when it's very clearly "at most once"?

amarant · 2025-12-06T01:03:17 1764982997

Huh. Interesting solution! I've always thought the only way to make an API idempotent was to not expose "adding" endpoints. That is, instead of exposing an endpoint "addvalue(n)" you would have "setvalue(n)". Any adding that might be needed is then left as an exercise for the client.

Which obviously has its own set of tradeoffs.

zmj · 2025-12-05T22:39:36 1764974376

I like the uuid v7 approach - being able to reject messages that have aged past the idempotency key retention period is a nice safeguard.
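A minimal sketch of that age check. It relies on the UUIDv7 layout, where the top 48 bits hold a Unix timestamp in milliseconds. `make_uuid7` builds such a value by hand so the example does not depend on a particular Python version; the helper names and the one-day retention period are assumptions.

```python
import os
import time
import uuid


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value: 48-bit ms timestamp, then random bits."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    value &= ~(0xF << 76)  # clear the version nibble
    value |= 0x7 << 76     # version 7
    value &= ~(0x3 << 62)  # clear the variant bits
    value |= 0x2 << 62     # RFC variant (binary 10)
    return uuid.UUID(int=value)


def timestamp_ms(key: uuid.UUID) -> int:
    """Read back the millisecond timestamp from the top 48 bits."""
    return key.int >> 80


def is_expired(key: uuid.UUID, now_ms: int, retention_ms: int) -> bool:
    return now_ms - timestamp_ms(key) > retention_ms


RETENTION_MS = 24 * 60 * 60 * 1000  # hypothetical one-day retention
now_ms = int(time.time() * 1000)
fresh = make_uuid7(now_ms)
stale = make_uuid7(now_ms - 2 * RETENTION_MS)
```

A consumer can reject `stale` outright instead of treating it as new once its key has been purged from the store. The timestamp is whatever the producer wrote, so this is a safeguard rather than a guarantee.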

otterley · 2025-12-05T23:23:24 1764977004

This is some useful reading that's in the same vein: https://docs.aws.amazon.com/wellarchitected/latest/reliabili...

eximius · 2025-12-05T22:35:27 1764974127

These strategies only really work for stream processing. You also want idempotent APIs which won't really work with these. You'd probably go for the strategy they pass over which is having it be an arbitrary string key and just writing it down with some TTL.
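A minimal in-memory sketch of the "arbitrary string key with a TTL" strategy. `TTLKeyStore` is a hypothetical name, and a real deployment would use a shared store rather than a dict; the injectable clock is only there to make the example deterministic.

```python
import time


class TTLKeyStore:
    """In-memory stand-in for a store of idempotency keys with expiry."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}

    def first_time(self, key: str) -> bool:
        """Record key; return True only if it was not seen within the TTL."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires_at[key] = now + self._ttl
        return True


fake_now = [0.0]  # controllable clock for the demonstration
store = TTLKeyStore(ttl_seconds=60, clock=lambda: fake_now[0])
seen = [store.first_time("req-abc")]
seen.append(store.first_time("req-abc"))  # retry within the TTL
fake_now[0] = 61.0
seen.append(store.first_time("req-abc"))  # TTL elapsed, key forgotten
```

The last call shows the tradeoff: once the TTL passes, a late retry is treated as a new request.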

ekjhgkejhgk · 2025-12-05T21:31:42 1764970302

Here's what I don't understand about distributed systems: TCP works amazing, so why not use the same ideas? Every message increments a counter, so the receiver can tell the ordering and whether some message is missing. Why is this complicated?
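A minimal sketch of that counter scheme for exactly one sender and one receiver. It discards anything after a gap and asks for a resend from the missing number onward; `SequencedReceiver` is a hypothetical name and this is not how TCP itself is implemented.

```python
class SequencedReceiver:
    """Tracks the next expected sequence number from a single sender."""

    def __init__(self):
        self.next_expected = 1
        self.delivered = []
        self.resend_requests = []

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.next_expected:
            return  # duplicate, already delivered
        if seq > self.next_expected:
            # Gap: drop this message and ask for a resend from the missing one.
            self.resend_requests.append(self.next_expected)
            return
        self.delivered.append(payload)
        self.next_expected += 1


receiver = SequencedReceiver()
receiver.receive(1, "a")
receiver.receive(1, "a")  # duplicate
receiver.receive(3, "c")  # arrives before 2
receiver.receive(2, "b")  # resent
receiver.receive(3, "c")  # resent
```

The single `next_expected` counter is the part that stops working once several producers or several consumers share the stream.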

ewidar · 2025-12-05T22:18:36 1764973116

Not trying to be snarky, but you should read the article and come back to discuss. This specific point is addressed.

exitb · 2025-12-05T21:33:22 1764970402

It needs a single consumer to be that simple.

mkarrmann · 2025-12-05T21:47:00 1764971220

And a single producer! i.e. it breaks down if you add support for fault tolerance

Etheryte · 2025-12-05T21:46:23 1764971183

TCP is a one to one relation, distributed systems are many to many.

ekjhgkejhgk · 2025-12-05T22:43:29 1764974609

You mean like UDP which also works amazing?

sethammons · 2025-12-06T11:45:31 1765021531

I'd tell you a joke about UDP, but you might not get it.

More seriously, you are confident and very incorrect on your understanding of distributed systems. The easiest lift, you can fix being very incorrect (or at least appearing that way) by simply changing your statements to questions.

Personally, I recommend studying. Start with the two generals problem. Read Designing Data Intensive Applications; it is a great intro into real problems and real solutions. Very smart and very experienced people think there is something to distributed systems. They might be on to something.

Etheryte · 2025-12-05T23:04:58 1764975898

UDP gives you practically no guarantees about anything. Forget exactly once processing, UDP doesn't even give you any kind of guarantees about delivery to begin with, whether delivery will happen at all, order of delivery, lack of duplicates, etc, nothing. These things are so far from comparable that this idea makes no sense even after trying real hard to steelman it.

ekjhgkejhgk · 2025-12-06T00:31:16 1764981076

UDP plus increment means that the client can request a snapshot to be re-sent. This mechanism is used in financial exchanges and works amazing.

This illustrates that the webdevs who write articles on "distributed system" don't really understand what is already out there. These are all solved problems.

vouwfietsman · 2025-12-06T12:14:32 1765023272

You are 100% correct. UDP can be used to solve this problem, in fact, UDP can be used to solve any (software) networking problem, because it's kind of what networking is.

The thing that webdevs want to solve is related but different, and whether the forest is missed for the trees is sometimes hard to tell.

What webdevs want to solve is data replication in a distributed system of transactions where availability is guaranteed, performance is evaluated horizontally, change is frequent and easy, barrier to entry is low, tooling is widely available, tech is heterogeneous, and the domain is complex relational objects.

Those requirements give you a different set of tradeoffs vs financial exchanges, which despite having their own enormous challenges, certainly have different goals to the above.

So does that mean this article is a good solution to the problem? I'm not sure, it's hard to tell sometimes whether all the distributed aircastles invented for web-dev really pay out vs just having a tightly integrated low-level solution, but regardless of the hypothetical optimum, it's hard to argue that the proposed solution is probably a good fit for the web dev culture vs UDP, which unfortunately is something very important to take into account if you want to get stuff done.

ekjhgkejhgk · 2025-12-06T13:09:14 1765026554

> in a distributed system of transactions where availability is guaranteed, performance is evaluated horizontally, change is frequent and easy,

Isn't that the situation inside a CPU across its multiple cores? Data is replicated (into caches) in a distributed system of transactions, because each core uses its own L2 cache with which it interacts, and has to be sent back to main memory for consistency. Works amazing.

Another even more complex system: a multi CPU motherboard supporting NUMA access: 2 CPUs coordinate their multiple cores to send over RAM from the other CPU. I have one of these "distributed systems" at home, works amazing.

[1] https://en.wikipedia.org/wiki/Non-uniform_memory_access

vouwfietsman · 2025-12-06T13:28:20 1765027700

Indeed, again you are right. I've gone through the same motions as you trying to understand why the webdev people make this so complicated.

For your specific question here: NUMA & cpu cores don't suffer from the P in CAP: network partitions. If one of your CPU cores randomly stops responding, your system crashes, and that's fine because it never happens. If one of your web servers stops responding, which may happen for very common reasons and so is something you should absolutely design for, your system should keep working because otherwise you cannot build a reliable system out of many disconnected components (and I do mean many).

Also note that there is no way to really check if systems are available, only that you cannot reach them, which is significantly different.

Then we've not even reached the point that the CPU die makes communication extremely fast, whereas in a datacenter you're talking milliseconds, and if you are syncing with a different system across data centers or even with clients, that story becomes wildly different.

jasonwatkinspdx · 2025-12-06T01:21:42 1764984102

Or perhaps you've simply not learned the basics of actual distributed systems literature, and so are ignorant of the limitations of those solutions?

podgietaru · 2025-12-05T22:55:53 1764975353

UDP doesn’t guarantee exactly once processing.

ekjhgkejhgk · 2025-12-06T00:40:35 1764981635

See my response to your sibling.

manoDev · 2025-12-05T21:46:00 1764971160

> The more messages you need to process overall, the more attractive a solution centered around monotonically increasing sequences becomes, as it allows for space-efficient duplicate detection and exclusion, no matter how many messages you have.

It should be the opposite: with more messages you want to scale with independent consumers, and a monotonic counter is a disaster for that.

You also don’t need to worry about dropping old messages if you implement your processing to respect the commutative property.

itishappy · 2025-12-05T22:22:45 1764973365

> It should be the opposite: with more messages you want to scale with independent consumers, and a monotonic counter is a disaster for that.

Is there any method for uniqueness testing that works after fan-out?

> You also don’t need to worry about dropping old messages if you implement your processing to respect the commutative property.

Commutative property protects if messages are received out of order. Duplicates require idempotency.
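A small sketch of that distinction, using addition (commutative but not idempotent) and a running maximum (both). The message values are made up.

```python
from functools import reduce


def add(state: int, msg: int) -> int:
    """Commutative, but applying a message twice changes the result."""
    return state + msg


def keep_max(state: int, msg: int) -> int:
    """Commutative and idempotent: duplicates and reordering are harmless."""
    return max(state, msg)


def apply_all(op, initial, messages):
    return reduce(op, messages, initial)


in_order = [3, 5, 2]
reordered = [2, 3, 5]
with_duplicate = [3, 5, 5, 2]
streams = (in_order, reordered, with_duplicate)

add_results = [apply_all(add, 0, m) for m in streams]
max_results = [apply_all(keep_max, 0, m) for m in streams]
```

Reordering leaves both results unchanged, but the duplicated `5` only corrupts the sum.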

hobs · 2025-12-06T00:15:10 1764980110

Hash the thing you want to do and check whether you did it recently. For ordering, fold the hash of each thing you wanted to do, in order, into a single running hash, so that one value captures all the things you did in the order you did them.

majormajor · 2025-12-05T22:33:51 1764974031

You only need monotonicity per producer here, and even with independent producer and consumer scaling you can make tracking that tractable as long as you can avoid every consumer needing to know about every producer while also having a truly huge cardinality of producers.
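A minimal sketch of per-producer tracking, assuming every message carries a producer id and a sequence number that is monotonic for that producer. `PerProducerDeduplicator` is a hypothetical name.

```python
from collections import defaultdict


class PerProducerDeduplicator:
    """Keeps one high watermark per producer instead of one global counter."""

    def __init__(self):
        self._watermarks = defaultdict(lambda: -1)

    def accept(self, producer_id: str, sequence: int) -> bool:
        if sequence <= self._watermarks[producer_id]:
            return False  # duplicate or stale for this producer
        self._watermarks[producer_id] = sequence
        return True


dedup = PerProducerDeduplicator()
outcomes = [
    dedup.accept("producer-a", 1),
    dedup.accept("producer-b", 1),  # same number, different producer
    dedup.accept("producer-a", 1),  # redelivery
    dedup.accept("producer-a", 2),
]
```

State grows with the number of producers a consumer sees rather than with the number of messages, which is why the cardinality of producers per consumer is the thing to keep bounded.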

attila-lendvai · 2025-12-05T23:31:30 1764977490

does OP mean simply the identity of the message?

idempotency means something else to me.

gunnarmorling · 2025-12-06T09:50:01 1765014601

"Idempotency key" is a widely accepted term [1] for this concept; arguably, you could call it "Deduplication key" instead, but I think this ship has sailed.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...

d4rkn0d3z · 2025-12-06T08:37:45 1765010265

I agree, this whole thread seems to turn the concept of idempotency on its head. As far as I know, an idempotent operation is one that can be repeated without ill-effect rather than the opposite which is a process that will cause errors if executed repeatedly.

The article doesn't propose anything especially different from Lamport clocks. What this article suggests is a way to deal with non-idempotent message handlers.

vouwfietsman · 2025-12-06T12:28:36 1765024116

I'm not sure I follow. Though I agree with your definition of idempotency, I think ensuring idempotency on the receiving side is sometimes impossible without recognizing that you are receiving an already processed message. In other words: you recognize that you have already received the incoming message and don't process it; you can tell that this message is the same as an earlier one; the identity of the message corresponds to the identity of an earlier message.

It's true that idempotency can sometimes be achieved without explicitly having message identity, but in cases where it cannot, a key is usually provided to solve this problem. This key indeed encodes the identity of the message, but is usually called an "idempotency key" to signal its use.

The system then becomes idempotent not by having repeated executions result in identical state on some deeper level, but by detecting and avoiding repeated executions on the surface of the system.