I am pleased to inform you that the following submission has been accepted to
appear at PLDI 2014:

"Resource Limits for Haskell"

Reviews from the Program Committee and External Review Committee are appended.
Please use the feedback to improve your submission for publication.

Detailed instructions for preparing the final version of your paper will
follow next week.

I look forward to seeing you in Edinburgh.

Keshav Pingali
PLDI'14 Program Chair


*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=
Meaning of Classification:
A: I will champion this paper; it is a definite accept (advocate/accept);
B: I can accept this paper, but I will not champion it (accept, but could reject);
C: This paper should be rejected, though I will not fight strongly against it (reject, but could accept);
D: Serious problems. I will argue to reject this paper (detractor).
*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=

First reviewer's review:

          >>> Classification <<<

C

          >>> Summary of the submission <<<

This paper addresses the challenge of implementing resource limits for Haskell.
 Although resource limits are a well-established idea inter-process (via the
OS), it is often desirable to limit resource consumption intra-process, such as
when a (single) program services requests and wants to limit the resource
consumption of a given request.  This paper develops a resource limit model for
Haskell, describes its semantics and evaluates an implementation.

          >>> Strengths <<<

The question of how to implement resource limits is generally important.  The
implementation presented here is reasonable and it is supported by an
implementation in GHC and a performance evaluation.

          >>> Weaknesses <<<

The paper is steeped in Haskell, limiting is impact to the wider PLDI community.

          >>> Evaluation <<<

I feel conflicted here.  In many respects I like this paper.  The problem is
important, the solution is nice, and the evaluation is plausible.  Your writing
is generally very nice too.

However, I'm a little frustrated because I find its hard to see what someone
from outside the Haskell community is to take away.  How much of the solution
is predicated on the particulars of Haskell and GHC?  How does this relate to
work such as MVM (Usenix 2003) and JSR 121?  Can you say more about how your
system relates to KaffeOS?  JSR 121 was an industrial-strength attempt at
isolation and resource limits; you should have at least passing mention of it. 
Part of the frustration is that you don't appear to make much of an effort to
appeal to the wider PLDI audience, who may be interested in the problem but
have only passing knowledge of Haskell. What is someone not familiar with
Haskell and the GHC implementation supposed to make of a characterization of
overhead as "one extra memory dereference per thunk"? The inference is that
this is low, but for me that's just inference---I would like clarity. You
leverage the block structured heap used by Haskell.  That's a good idea in your
setting, but how would your ideas translate to language implementations that
can't afford the overhead of that aspect of the GHC heap? In short, the paper
looks too much like an ICFP cast-off.  This in itself is not intended as a
criticism, but if it means that it is hard for the PLDI audience to relate to
the solution and or see its relevance outside the confines of GHC, then this is
a problem. The good news is that those concerns are principally about the way
you've pitched it, not concerns with the ideas, so I'm guessing you could
address these issues fairly easily when revising the paper.  Nonetheless, as a
reviewer looking at the existing version of the paper, I'm interested in
hearing answers to some of those; perhaps you can respond in your rebuttal.

Minor nits, comments, questions:

-p5 "which what" -> "which"

- p6 Implementing the nursery over the linked list of blocks as per the block
structured heap would not fly in many high performance VMs because the cost of
indirecting to the block header to identify the generation is too high.  Not a
big deal, but this speaks to my broader question about to what extent your
ideas will play out in other settings.

- p6 "implementation resource containers" -> "implementation of resource
containers"

- p6 "The conventional design for a..." Not true. Many mainstream GCs use
multiple large discontiguous regions, separately acquired from the OS.

- p8 Fig 7 -> Table 1

- p8 Fig 7 "Runtime" -> "time" or "running time" ("runtime" is a noun, refers
to GHC in your case).

*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*

Second reviewer's review:

          >>> Classification <<<

B

          >>> Summary of the submission <<<

This paper extends Haskell and the GHC implementation with a way to
bound the memory consumption of subcomputations (functions, threads,
whatever) via explicit (albeit monadic) resource containers.
Allocation is charged to the allocator and thunk evaluation to the
creator of the thunk. Explicit "with-resource-container" constructs
allow switching the container by the programmer. Exhaustion is
detected by the memory allocator (at the block level, not an
individual allocation) and reified as an asynchronous exception. An
implementation has low overhead (3-5%) and leverages prior work on
space-profiling Haskell programs and separate prior work on a monad
for tracking information flow (so any thread that has access to a
resource container is killed if that container is exhausted and
reclaimed).

          >>> Strengths <<<

* A does-it-all paper on an important problem: thoughtful design,
 formal semantics, real implementation and evaluation

* Particularly nice description of the design and how it complements
 prior work

* Good integration into a real system, with discussion of how GHC's
 block allocator was well-suited for the work

          >>> Weaknesses <<<

* I expected much more discussion about how the approach interacts
 with the optimizer, particularly issues like strictness analysis. I
 would hope for a much more rigorous argument that the
 opaque-built-in approach is always (usually?) sufficient.

* Space limits are in non-programmer-visible terms like "2
 Megabytes", which is practical, but doesn't actually "mean" anything
 in Haskell. This is probably inherent, but not what I expected in
 the early pages of the paper.

          >>> Evaluation <<<

Overall I really like this work and lean toward recommending that PLDI
accept it. Limiting space consumption is an important problem and
this work is a nice complement to prior work. This work also dealt
well with important Haskell issues, most importantly laziness, better
than I would have expected -- it actually works to use resource
containers in lazy languages, which is not obvious!

So in the interest of improving the paper, let me focus on the two
weaknesses listed above and one more substantive comment:

* The optimization issues seem like a "big deal" that deserves more
 discussion. I'm particularly interested in strictness -- we are
 used to being able to make code strict whenever it is meaning
 preserving, but it clearly affects memory accounting. Can we reason
 about when it is legal, e.g., if the resource container of the thunk
 and the caller are the same? These are issues I was already
 wondering on page 2.

* I think the "by limits we mean something like 2MB" issue can just be
 handled with writing. In Section 2, I took the forward pointer to
 Section 4.3 about what a "limit" is to be something different and
 more semantic. I'm also interested in how a computation might
 divide up its container budget to subcomputations (perhaps different
 threads or perhaps just different lazy computations). Are there
 ways to manipulate the current limit algebraically, e.g., "give 1/n
 - k of my limit to each of these other containers I'm creating"?

* The title might be a bit broad since "space" is only one resource
 and much/most of the work in this paper is specific to space.

A few other details:

* page 2: "Since thunks are not values" -- I wasn't sure exactly what
 was meant by this phrase -- it seemed to confuse more than help. In
 some senses, they are values when you write them with the brackets.

* page 3: "If there's not enough space to calculate the value of this
 thunk, it might as well be bottom!" -- I didn't understand the line
 of reasoning here.

* Theorem 2.2: Is this actually true? Do we want it to be true?
 Can't I catch an asynchronous exception and handle it with a
 non-terminating computation that does not allocate?

* page 5: "complex assessment of the entire heap" -- it's a little
 complex, sure, but fairly easy to understand. Is it more complex in
 a lazy language? I'm not sure.

* page 6: The current scalability bottleneck in the collector seems
 fairly easy to fix -- just need a scalable bag of containers needing
 work. Of course, that's easy for a reviewer to say. 

* page 6: The last paragraph of 4.2 is fine, but a clearly-bad idea to
 me, so this paragraph might be removable.

* page 7: 4.4 might be a little too short. How robust/secure is this
 actually? Can you write an adversarial program that exploits this?

And a handful of typos:

* page 2: container." should be container?"

* page 5: "which what resource containers"

* page 5: "Instead we, opt"

* lots of citations need capitalization for proper nouns

*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*

Third reviewer's review:

          >>> Classification <<<

B

          >>> Summary of the submission <<<

The paper presents a language extension for Haskell that allows programmers to
precisely account for memory used during program execution. The accounting is
done by allocating resource counters and attributing (stack and heap) memory
costs to them. Resource counters allow very fine-grain control over resources
inside a single process. There are various challenges in terms of deciding
which resource container should be attributed for each operation that takes
memory, how should overflow exceptions behave, and how to implement resource
counting and resource reclamation with minimal overhead and maximal precision. 
The authors implement their extension withing the GHC compiler and evaluate it
over a benchmark suite, demonstrating its precision and showing its benefits
for detecting memory leakage and fixing bugs by enforcing safe bounds.

          >>> Strengths <<<

* The ability to account for resources is important for debugging and ensuring
robustness. The paper tackles a whole array of challenges (clear semantics,
attribution, reclamation, implementation overhead and precision) and provides
interesting solutions.

* The paper is well-written. I especially like the fact that you specify the
semantics over a minimal functional language, which makes the paper accessible
to readers without background in Haskell.

* The approach is implemented inside an actual compiler and evaluated over real
code and real applications.

          >>> Weaknesses <<<

* The solution is specific to functional languages.

          >>> Evaluation <<<

* How do you set the limits for a resource container? The minimal functional
language doesn't show that. I guess that the actual extension to Haskell has
newRC with a limit argument.

* In Section 2, you rush to consider various alternative designs for the
solution. I think you should spend a couple of paragraph to explain the problem
in more depth - what would you ideally like to achieve and what are the
challenges (e.g., accounting for both stack and heap memory, obtaining precise
estimates for used resources, implementation challenges like integration with
GC)?

* In Figure 2, what is the very last superscript 'j' for?

* Page 5, "Hence, we conservatively track which [what] resource containers..."
* Page 7, "After all, they are only ever [are] evaluated once..."
* Page 9, "...must have a fixed quote[quota]; otherwise,..."
* Page 10, [13] - where is it published?

*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*

Fourth reviewer's review:

          >>> Classification <<<

B

          >>> Summary of the submission <<<

The paper describes adding runtime resource limiting and
reclamation to GHC Haskell. Included are cost attribution semantics
based on the existing profiling semantics.

          >>> Strengths <<<

Clear semantics and discussion of the implementation and
issues that must be addressed in the runtime system. Good explanation
of aspects of the problem unique to lazy evaluation. Solves an
important problem for many interesting applications in a way that is
more efficient and easier to use then existing options.

          >>> Weaknesses <<<

A few aspects of the implementation are unclear including
the interaction with the threaded runtime, accounting for foreign
allocation, and the precise API given to user applications. In addition
some of the writing could be improved.

          >>> Evaluation <<<

In the introduction, a better case should be made for managing resource
limits at the language runtime level. The claim that "processes are too
expensive and coarse-grained" assumes an OS implementation. A much
stronger argument would be more specific about either the particular
costs or a particular OS. Why can managing resources at the language
level avoid these costs? Are these costs inherent to all
implementations or notions of processes? Is the benefit similar to
Singularity OS's software isolated processes (language-level, but also
OS level support)? The examples given in the introduction are not bad,
but it should be clearer why the status quo multi-process application
approach is not viable. For instance, can copy-on-write page sharing
among processes avoid most of the overhead of duplication of memory for
multiple processes? Does the time overhead of this mechanism cost too
much?

The first paragraph of section 2 ("Resource limits") is unclear and
needs to be reworded. Why is "allocator-pays" mentioned? What two
design choices?

In section 8 of Marlow et al.'s paper "Runtime Support for Multicore
Haskell" it mentions that the default strategy for black holes in GHC
allows for an unbounded amount of duplicated work. This is due to
avoiding an atomic operation when forcing a thunk and instead racing
with other threads that may encounter the thunk. The current paper does
not seem to mention (unless I missed it!) whether the reported
implementation is using the threaded runtime system. I can imagine this
black hole issue could interact with accounting for allocation, though I
suspect that it will correctly count the duplicated work as it should.
This does open the remote possibility that a computation that *should*
fit within the limits fails to do so non-deterministically, depending on
the outcome of a data race. The paper needs to make clear whether it is
using the threaded runtime system, and discuss the implications of the
choice. How are costs synchronized? What is the overhead of this? If
only the sequential runtime is being used, the difficulties of
supporting the threaded runtime should be mentioned.

Another point to clarify is how foreign allocations are handled. If a
library uses the foreign function interface to call into a C library
that does allocation, is this memory use accounted for? Could malicious
code exploit "safe" libraries to allocate more then allowed? Similarly,
if I recall correctly, allocations from integer-gmp do not reside
completely in the Haskell heap. Are these subject to over-allocation?
There may be excellent answers to these questions, but they are not
mentioned in the paper.

Section 3.1 mentions restricting effects and the API given in figure 5
and section 3.2 use a `CM` type for a "container monad". Section 4,
however, has examples that use this API as though the types were `IO`
actions. Indeed, the type for `withRC1` is given as:

  withRC1 :: RC -> a -> (a -> IO b) -> IO b

While we would expect from the API given in figure 5 that the type would be:

  withRC1 :: RC -> a -> (a -> CM b) -> CM b

This should all be made consistent. I would expect that there should be
some `runCM :: CM a -> IO a` function that lifts `CM` actions to `IO`
actions.

Section 3 suffers in clarity by not having worked examples. The API is
given, but readers will have difficulty understanding unfamiliar Haskell
syntax and will need some examples to help guide their understanding.

Section 3.2 mentions a `Transfer` function type that comes from a
library of copy functions. Some more details on why different transfer
functions are needed would be helpful. Similarly for the claim that
`rcCopyResult` will copy a "small value". How is it enforced that this
value is "small"? Why is `Transfer` not needed in this case?

Section 4.5's discussion of CAFs needs a more concrete example of why
one would have a program with an infinite data structure. This will be
quite unfamiliar to non-Haskell programmers but would be cleared up
easily with an example such as a lazy prime number table.

Figure 9 is missing thousands separator commas in the first row and
column.

*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*=--=*