Perceus: Garbage Free Reference Counting with Reuse [pdf] (microsoft.com)
153 points by Bootvis on Dec 19, 2020 | hide | past | favorite | 44 comments


This paper is worth reading; an implementation detail that may be useful to many is their treatment of reference counts: a negative refcount marks thread-shared objects. Releasing an object takes the fast path, a single branch and a non-atomic decrement, when the refcount is greater than 1. The slow path handles freeing memory, atomic counter mutation, and a sticky object flag encoded as overflow-by-one.
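If I'm reading that right, the scheme sketches out to something like the following in Rust. The names and the exact slow-path protocol here are my guesses, not the paper's code; the point is only the single-branch fast path versus the atomic slow path.

```rust
// Hedged sketch of the refcount scheme described above; field names and
// the slow-path encoding are illustrative, not the paper's implementation.
use std::sync::atomic::{AtomicI32, Ordering};

struct Header {
    // Positive: object is local to one thread. Negative: thread-shared.
    rc: AtomicI32,
}

/// Returns true when the caller should free the object.
fn drop_ref(h: &Header) -> bool {
    let rc = h.rc.load(Ordering::Relaxed);
    if rc > 1 {
        // Fast path: a single branch and a non-atomic decrement in spirit
        // (no other thread can touch a thread-local count).
        h.rc.store(rc - 1, Ordering::Relaxed);
        false
    } else if rc == 1 {
        true // last reference: free
    } else {
        // Slow path: a negative count marks a thread-shared object, so a
        // real atomic read-modify-write is required. Counting up toward
        // zero is one plausible encoding; the paper also mentions a
        // "sticky" flag (overflow-by-one) for counts that saturate.
        h.rc.fetch_add(1, Ordering::AcqRel) == -1
    }
}

fn main() {
    let h = Header { rc: AtomicI32::new(2) };
    assert!(!drop_ref(&h)); // fast path: 2 -> 1, nothing freed
    assert!(drop_ref(&h)); // last reference: free
    let shared = Header { rc: AtomicI32::new(-1) };
    assert!(drop_ref(&shared)); // slow path: last shared reference
}
```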


This does seem very interesting, at first glance. One thought:

> These “scoped life-time” reference counts are used by the C++ shared_ptr⟨T⟩ (calling the destructor at the end of the scope), Rust’s Rc⟨T⟩ (using the Drop trait), and Nim (using a finally block to call destroy)

So Rust's non-lexical lifetimes don't remedy this? Meaning the actual drop of xs needs to occur at the end of the example at the beginning of section 2.2, as opposed to right after the map. I would have thought that ys borrows nothing from xs, so the drop could be inserted right after map? Perhaps it's too early in the morning for thinking for me.


No, Rust's non-lexical lifetimes do not change when things are dropped at all. They are purely a compile-time feature with no run-time effects.


This is impressive. I wonder if any language that used these techniques would also have to use such a rigorous effect system. It seems so.

Their implementation requires annotations on all functions which are effectful (throw exceptions, log to the console, etc). What wasn’t clear to me was whether or not the callers of those functions also need to be annotated, but I presume they do. If so, that’s a pretty tedious limitation. While I like explicitness, changing the signature of a deep, oft-called function (e.g. to add temporary debug logging) requires changing a lot of other code.


I don't think you need an effect system. You just need to take care to correctly fiddle with some reference counts when an exception traverses your stack frame. You already need to do that in other reference counted systems. After all, if an exception traverses your stack frame you need to decrement the reference counts of local variables. Whether or not you have checked exceptions or checked effects seems orthogonal to me. What matters is the implementation of those effects.


I don't think an explicit effect system is required to implement most (maybe all?) of the optimizations. Actually I think these techniques are probably applicable to many strict, mostly-functional languages with few to no cycles or mutable references. In fact, they cite Lean as their inspiration for many of these techniques.


I would think that, if the compiler can check that the annotations are correct, it also can generate them if none are given (one way to do that: ‘just’ generate any possible combination of annotations, check which are correct, and pick the minimal set. Of course, a production-level compiler would use a faster algorithm)

If so, you still might have to annotate some library functions that are implemented outside the language, but that would be it. The annotations are there for the programmer as, without them, it would be hard to keep track of which of your functions are pure, which might throw, etc.


That's certainly not true in general (at least, if you want your compiler to provably halt) and it's really easy to point at counterexamples... for instance, Coq is capable of fully representing Peano arithmetic, and is dependently typed, so it follows that synthesizing its proof terms can require supplying proofs of arbitrary statements in PA, which is undecidable. But typechecking Coq proof terms is very much decidable, or at least if it isn't it's not because it includes PA.

That being said, even if your standard for synthesis is "works in practice" rather than some halting-related criterion, it's still really easy to find type systems that don't qualify. There are also sometimes practical concerns around separate compilation--if you're relying on introspection and not just interfaces for typechecking, and someone just tells you what interface will be satisfied at runtime, you won't be able to repeat this kind of analysis; the same sorts of problems occur when you perform reflection.


I don’t follow that argument. I know that checking a proof is almost always a lot simpler than creating it (and certainly never harder), but when a programmer annotates a function, they don’t provide a proof that the property holds; they just assert that it holds, and the compiler has to check it.

If the compiler can do such a check, and the number of possible annotations is finite, it can check each of them, thus computing the set of annotations that the function adheres to.


Well, the problem is that the number of annotations is not finite, and often there are many valid signatures that would work for a function. Determining which ones make the whole program typecheck is not a decidable problem for most type systems, and even for those for which it is, this often requires sufficient amounts of global knowledge to make the problem totally intractable in practice.

This is a problem you run into very quickly--for example, Hindley-Milner is only decidable because it will only infer "forall a" in prenex form (so its "most general" annotation is only most-general in that context). Type inference if you allow non-prenex quantification--even very mild non-prenex quantification--is undecidable, despite these being just as easy to typecheck as the prenex versions!

Now, in practice programs are made up of finite numbers of terms, so for some of these type systems you could certainly come up with some sort of semidecision procedure by iterating through every possible sequence of type annotations. But since this kind of brute-force algorithm will not terminate if it can't find the right set of annotations, and in practice will not terminate within human lifespans even in most cases where the right set of annotations exists, I don't think people would be very happy with such a system.

This is especially true if the only reason to find the annotations was to perform program optimizations, and not to enforce correctness. People are already wholly unwilling to give compilers unlimited amounts of time to do interprocedural analysis, which is mostly done in LLVM and such by just inlining everything. If having effect annotations were actually critical for being able to apply the optimizations from the paper (which, like I said earlier, I don't know that they are), this would pretty much be another instance of a global program rewrite too expensive for an optimizing compiler to perform in practice.


BTW, since I just realized I was insufficiently clear about the Coq example, I should clarify--in a dependently typed language, synthesizing the right type can actually require synthesizing a well-typed term (this is where the dependency comes in, after all), so it's not generally easier to synthesize types than terms in such a language. You need a sufficiently complex dependently-typed program for this to start separating itself from "ordinary" difficulties with type inference in practice, but when you start playing around with indexed vectors you run into situations like this a lot--cases where you struggle to even be able to write down the thing you're trying to prove, because the type is only well-formed in a particular context and you need to essentially prove to the compiler that you are in such a context for type inference to remain decidable.

For example, to write down (not prove!) an equality between two vectors indexed by their length, at least in Coq (and other type theories without extensional equality, which is in general undecidable), you have to first show that they have the same length--otherwise the expressions are not well-typed. The length can be any well-typed term of type nat, and like I said before Coq is far more than powerful enough to express Peano arithmetic, so you can see already that synthesizing an equality type between two vectors is just as difficult as proving an equality between two natural numbers, so it is very much undecidable in general. In fact, it is quite difficult to write down in practice even in some surprisingly simple cases! The actual proof term, however, is usually something like a pattern match on some other equality followed by the creation of a function that pattern matches on another equality, etc. down the line, until it returns the unhelpful constructor `eq_refl` (which basically just says `x = x`). So all of the interesting work in such cases has transferred to the type system, with the proof term basically just there to perform type casts and coercions.


Could type inference be used to automatically propagate such annotations?

And then potentially be hooked up to a prettier plugin to add explicit annotations if desired.


This is excellent work, and Microsoft Research continues to leave a positive impression on me because of what they're focusing their energy on.


Microsoft Research has a great stable of researchers -- could you imagine just casually having Leslie Lamport being an email away while working on some distributed systems problem/design? Or being able to pull on

This might be one of the only benefits of monopolistic presences -- the amount of great minds in the same place, set free to mingle and create, can produce amazing results when a benevolent sponsor exists and isn't tunnel-visioned on profit.

I'm quite cynical towards Microsoft but I am very grateful that they publish at least some of their papers/insights and their links (to existing papers) generally don't die.

[0]: https://www.microsoft.com/en-us/research/people/


It seems rich monopolies are good at having great research labs. AT&T Bell Labs and Xerox PARC come to mind.


Kodak also had a great research lab: they invented digital photography


Over the long haul, basic research makes a lot of money, as does having a large pile of very smart people you can point at hard problems.


I was surprised, when reading Keynes's texts criticizing capitalism, to see him count the concentration of capital as one of the pros of the system: it allows efforts that no one else could take on.


Indeed, and not only did they develop the Perceus technique for the Koka language, but the immediately practical mimalloc allocator came out of that work as well.

https://github.com/koka-lang/koka

https://github.com/microsoft/mimalloc


Cool to see explicit annotations for how to make RC fast when you have access to the IR. But it doesn’t solve some major issues, like cycle collection:

> In practice, mutable references are the main way to construct cyclic data. Since mutable references are uncommon in our setting, we leave the responsibility to the programmer to break cycles by explicitly clearing a reference cell that may be part of a cycle.


Always a catch with these zero-overhead refcounting schemes. If the programmer has to use some equivalent of weakrefs, then how is this an improvement over existing schemes? The possibility of cyclic garbage accumulating is a very big drawback in many applications.


const is really popular in JS... (i.e. could they surreptitiously be taking on Orinoco?)


I don’t think const or non-const is the main issue: some data structures, such as graphs or circular linked lists, simply require cycles.


I don't disagree, but I also suspect that a very large number of real world JavaScript applications have few to no application-induced cycles. I might even go further and say this extends beyond JavaScript--the most common reason I even define recursive data structures in a language like Rust is for AST-like data that almost invariably forms a tree.
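To make the AST point concrete: recursive-but-acyclic data like the toy expression type below is covered by plain owned boxes, with ownership flowing strictly one way, so no cycle handling ever comes into play.

```rust
// Sketch: AST-like data is recursive but forms a tree, so owned boxes
// suffice and reference cycles never arise.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
    }
}

fn main() {
    // (1 + (2 + 3)) -- strictly a tree
    let e = Expr::Add(
        Box::new(Expr::Num(1)),
        Box::new(Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
    );
    assert_eq!(eval(&e), 6);
}
```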


I suppose you could construct it as dynamic and then make it constant? I wonder why that feature isn't seen in common languages, even though I know it sounds crazy.


Clojure has transient mutable collections that are designed to become persistent once "complete". [1] Some systems use a reference count of one (unique owner) as permission to mutate, something that TFA generalizes as memory reuse.

1: https://clojure.org/reference/transients


That seems like quite the achievement! I'm actively looking for a high-level language to put on top of an enterprise product, but it has to either transpile to C or integrate well with a C standard library (unfortunately). It's not a 100% requirement, as Rust still works because I can override the global allocator.

Right now I'm looking towards Nim, which has some pros and cons. I'm still not good enough to build complex APIs with it, but it's really easy to integrate. Nim feels like Python for systems programming.


D has a similar feel to Nim, and excellent C integration. The GC is optional and it has good meta programming capabilities.


> The GC is optional

A big issue is that while GC is technically optional, a lot of the standard library depends on GC (at least last time I checked) and using D without those parts of the standard library is much less ergonomic.


This is increasingly not the case, but there is also mir, which is effectively a smaller standard library along with a lot of extremely fast numerical code.


+1 for Nim in that use case. The core system and std library are very flexible, so it “ports” easily. It’s been used in a few games too. I’m using it for embedded currently after a few months, and while there are a few edge cases, overall it’s been stable for me. With C interfaces you’ll want to add some code to make it more idiomatic; here’s my try for i2c code on esp-idf: https://github.com/elcritch/nesper/blob/e1d9f7ba13eebe45aea4...


You might want to check out the various Schemes that are designed to work well with C, either by compiling to it, or by being in an embedded interpreter. I'm doing that with S7 for computer music and loving it.

Schemes that compile to C include Gambit, Gerbil, and Chicken; embeddable ones include Guile, S7, and Gambit. (And others I'm forgetting.)


D/Haskell/Nim are great


Very nice, could this be used in Apple's ecosystem? (which is based on reference counting as well?)


Java being faster on nqueens shows an interesting result: GCs can actually make code "faster" by moving some work to another core.

It's possible this could still be done with the described approach, but it looks much more difficult.


Unless I am misreading their graph, Java beats neither C++ nor Koka in time or RSS.


Oops, it was not on nqueens, but on deriv.

> Finally, Java performs best on this benchmark; we can see while running the benchmark that it can run the G1 collector fully concurrent on another core.


I read them the same. Now it is interesting that C++ is so much slower on cfold.


I imagine even C compilers will soon start supporting an automatic ref-counted mode of compilation for old code.


First they would need to finally adopt one of the many available secure string libraries.


While it does depend on the C project, in general it's not the sort of language you can really retrofit pervasive refcounting into automatically. Even if it worked, I think the set of people currently using C who would be willing to switch to such a system, but hadn't already moved to another language or started using the Boehm GC, must be quite small at this point.


Tracing GC involves unpredictable pauses. RC does not (or, at least, has a much lower incidence rate).


Dropping a reference count can lead to an unpredictable amount of freeing. Consider dropping the last reference to the root of a large tree of varying size.


Yes, but (conservative) tracing GC doesn't require invasively updating the contents of all your data structures, in exchange for not actually capturing all pointer relationships. In C, it's often not really possible or even safe to do that, so I don't see how such a method could work automatically except in very restricted settings.



