Dynamic Space Limits for Haskell

Abstract We describe the semantics and implementation of a resource limits system for Haskell, allowing programmers to create resource containers which enforce bounded resident memory usage. Our system is distinguished by a clear allocator-pays semantics drawn from previous experience with profiling in Haskell and an implementation strategy which uses a block-structured heap to organize containers, allowing us to enforce limits with high accuracy. To deal with the problem of deallocating data in a garbage collected heap, we propose a novel taint-based mechanism that unifies the existing practices of revocable pointers and killing threads in order to reclaim memory. Our system is implemented in GHC, a production-strength compiler for Haskell.

Code

Here is the source distribution for GHC 7.8.2 with resource limits:

ghc-7.8.2~rlimits-src.tar.bz2. This provides the compiler and a minimal set of libraries. For more information on building, see the building guide.
ghc-7.8.2~rlimits20140509-testsuite.tar.bz2. If you unpack this tarball on top of the above then you will be able to run the testsuite.
ghc-7.8.2~rlimits20140509-windows-extra-src.tar.bz2. This provides extra build tools which are a requirement for building on Windows. Simply unpack this tarball on top of the regular source distribution. This tarball should be the same as the official distribution.

You will probably also want the rlimits library, which defines some useful libraries for handling resource limits. You will want to cabal install --with-ghc=/path/to/ghc-rlimits/inplace/bin/ghc-stage2

Known bugs:

Does not work on 32-bit architectures
GHCi (and Template Haskell) do not work
Compacting garbage collection does not work

If you are interested in trying out our system with minimal fuss, check out the ready-made VM image we submitted to the PLDI artifact evaluation process. The VM is slightly out of date and has some (now fixed) bugs, but the basic functionality all works.

I am currently only maintaining this patchset for GHC 7.8, and you can see the latest development version on my GitHub ghc-7.8 branch.

FAQ

Why not use an OS mechanism like rlimits?

Mechanisms like rlimits work well in many cases, but they impose conceptual and runtime overhead. The conceptual overhead comes from needing to restructure an application into multiple processes to apply per-process limits: a mechanism that is built into a language is much easier to use. The runtime overhead comes from the overhead of processes and inter-process communication; in a single-runtime system like ours there is no overhead beyond cache effects.

I don't use Haskell. Why should I care?

It's easy to conclude that this paper is Haskell-specific, what with “Haskell” being in the title and all. However, it is worth talking more carefully about what specific features of Haskell our system relies on. In order of increasing specificity to Haskell, we rely on: (1) the ability to create multiple regions in the heap cheaply (a block-structured heap), (2) the ability to safely terminate threads (asynchronous exceptions), and (3) the ability to statically isolate code (restricted monads/Safe Haskell).

The first condition is not at all specific to Haskell, and is really all you need to get half of our resource limits system (tracking of consumed memory). Block-structured heaps are well worth considering the next time you need to write a memory system: the design is very flexible, and in retrospect has been considered one of the "best decisions" to have been made for Haskell's storage manager.

The second condition is not specific to Haskell per se, but Haskell's purity and culture of using bracket makes it far more reasonable to assume that arbitrary code will be able to recover from being killed by an asynchronous exception. This may not be true in languages like Java, where resource limits systems like Luna specifically avoid killing threads. Still, we think most would argue that asynchronous exceptions (which work well) are a desirable feature to have in any language.

The final condition is rather Haskell specific. While in principle this API could be implemented in any language, the inability to enforce its usage in the type system could lead to difficult to diagnose leaks of references. Monads in Haskell help us a lot here, and the ability to run untrusted code using Safe Haskell provides a compelling use-case for resource limits. We freely admit that we are taking advantages of Haskell's circumstances.

It's worth noting that our implementation does not rely on laziness: while handling it is a technical result, without it the implementation can be simplified in the obvious way.

History

We have a previous version of the paper entitled "Resource Limits for Haskell" (BibTeX), which was rejected from ICFP'13, which described an old iteration of the system, which directly utilized GHC's cost-center profiling in order to implement resource limits. It was a very cute implementation, but it was very costly, since it required all code to be compiled with profiling. We've since reimplemented the system to take advantage of GHC's block-structured heap, for a sizeable speed increase.