Requests are rejected when an atomic counter of inflight requests hits the limit. It's important to note that the library doesn't keep any kind of queue of requests. That's really not necessary, because every system already has plenty of queues in the form of socket buffers, executor queues, and so on.
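To make the "counter, not queue" idea concrete, here is a minimal sketch. It is not the library's code. `SimpleLimiter`, `try_acquire` and `release` are hypothetical names, a lock stands in for an atomic counter (Python has no atomic int), and the limit is a fixed number for simplicity.

```python
import threading


class SimpleLimiter:
    """Hypothetical sketch: reject once the inflight count reaches the limit; nothing is queued."""

    def __init__(self, limit):
        self._limit = limit
        self._inflight = 0
        self._lock = threading.Lock()  # stands in for an atomic counter

    def try_acquire(self):
        with self._lock:
            if self._inflight >= self._limit:
                return False  # rejected immediately; the caller decides what to do
            self._inflight += 1
            return True

    def release(self):
        # Call when the request completes, whether it succeeded or failed.
        with self._lock:
            self._inflight -= 1

    @property
    def inflight(self):
        return self._inflight
```

With `SimpleLimiter(2)`, three back-to-back `try_acquire()` calls return `True`, `True`, `False`; after one `release()` the next call succeeds again. Any waiting happens in the queues the system already has, not inside the limiter.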

Yes, the basic implementation does reject arbitrary requests. We do have a partitioned limit strategy (currently experimental, which is why it wasn't brought up in the tech blog post). The partitioned limiter lets you guarantee a portion of the limit to certain types of requests. For example, let's say you want to give priority to live over batch traffic. Live gets 90% of the limit and batch gets 10%. If live requests only account for 50% of the limit, then batch can use up to the remaining 50%. But if there's suddenly a sustained increase in live traffic, you're guaranteed that live requests will only be rejected once they exceed 90% of the limit.
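One way to get that guarantee without a queue is sketched below. This is an illustration, not the experimental partitioned limiter itself. `PartitionedLimiter` and its methods are hypothetical, the limit is fixed, and the rule it assumes is: reject a request only when the total limit is reached *and* its partition is already at or above its guaranteed share.

```python
import threading


class PartitionedLimiter:
    """Hypothetical sketch: each partition gets a guaranteed share of a fixed limit
    and may borrow capacity that other partitions aren't using."""

    def __init__(self, limit, shares):
        self._limit = limit
        # Guaranteed slots per partition, e.g. {"live": 0.9, "batch": 0.1} -> {"live": 9, "batch": 1}
        self._guaranteed = {name: round(limit * pct) for name, pct in shares.items()}
        self._busy = {name: 0 for name in shares}
        self._inflight = 0
        self._lock = threading.Lock()

    def try_acquire(self, partition):
        with self._lock:
            within_share = self._busy[partition] < self._guaranteed[partition]
            # A partition below its guaranteed share is always admitted;
            # otherwise it can only borrow while the total limit isn't reached.
            if self._inflight >= self._limit and not within_share:
                return False
            self._busy[partition] += 1
            self._inflight += 1
            return True

    def release(self, partition):
        with self._lock:
            self._busy[partition] -= 1
            self._inflight -= 1
```

With a limit of 10, five live requests leave room for batch to borrow up to five slots, and an eleventh request from batch is rejected. If live then surges, it keeps getting admitted until it holds 9 slots (90%), even though borrowed batch requests are still in flight. The trade-off of this rule is that total inflight can briefly overshoot the limit while those borrowed batch requests drain, because nothing is preempted or queued.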