Ask HN: Why is the stack so constrained?
11 points by brundolf on April 14, 2021 | 8 comments
You learn fairly early on that the call stack is small and everything on it must have its size known ahead of time by the compiler. Anything large and/or dynamically-sized goes on the heap. This is just the way it is.

But I was thinking about it: it's all just memory; why can't the stack expand to fit the data (assuming that data doesn't need to be returned outside of the current stack frame)? And perhaps even more strange, why can't more memory be requested for dynamically-sized values as needed? Why does everything on the stack need to have a statically-known size?

And then I found this C function that seems to do just this (but is very rarely used): https://linux.die.net/man/3/alloca
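
For instance, here's a minimal sketch of how alloca gets used (the function and strings are mine, just for illustration):

    #include <alloca.h>   /* Linux, per the man page above */
    #include <stdio.h>
    #include <string.h>

    void greet(const char *name) {
        /* Space comes out of the current stack frame and is
           reclaimed automatically when greet() returns. */
        size_t n = strlen(name) + sizeof("Hello, !");
        char *buf = alloca(n);
        snprintf(buf, n, "Hello, %s!", name);
        puts(buf);
    }   /* buf must never be returned or stored past this point */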

The best explanation I could gather from reading different discussions on the internet is:

1) You don't put large amounts of data on the stack because the stack is small

2) The stack is small because memory address space in 32-bit systems was really limited so you didn't want to eat it up reserving stack space for each thread

3) Stack frame sizes are known at compile time because... that's the way C did it? Maybe it allows for some compiler optimizations? Or something?

Am I missing something, or is this a vestigial idea? Wouldn't cache usage be way better if large/dynamic data structures could go on the stack? Has anyone explored this?




While your question is pretty general, your perspective is quite limited to the C language. Some other languages, such as Go, can have dynamically-sized stack allocations. (In fact, in Go the user does not even need to care whether the stack or the heap is used.)

Take Linux, a giant user of the C language, as an example. Allocating a large chunk of memory on the stack is just not useful, even given an unlimited amount of memory. You note the most critical reason yourself in the post: "assuming that data doesn't need to be returned outside of the current stack frame". However, large data structures are almost always for others to use. Just consider sys_clone (where the kernel eventually generates a body for the new thread) or sys_mmap (where the kernel manipulates existing virtual memory area structures from elsewhere). Allocating them on the stack seems pointless.


What I actually had in mind was Rust. Broadly speaking, Rust requires all stack-allocated values to be sized at compile time. Also, thanks to borrow checking, code is encouraged to stay fairly stack-oriented, so at least I personally end up with lots of function-local data structures that get passed by reference down to other logic. So I think "pointless" is overstating it.

However, the bigger question wasn't about the size limit but about being runtime-dynamic. Why can't a function's argument have a dynamic size, with the stack frame just allocating the needed space at call time? Even if we're not talking about large structures, this would be useful for things like ad-hoc union types (as opposed to enums, where all variants have to be known at compile time and are packed into a single bit-width regardless of the value). Maybe you could even get rid of some of the code duplication that happens when generic functions get reified.
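
To put the enum point in C terms (a sketch, with made-up names): a tagged union is sized for its largest variant at compile time, so every value pays for the biggest case:

    #include <stdint.h>

    struct Value {
        enum { TAG_BYTE, TAG_BIG } tag;
        union {
            uint8_t byte;       /* this variant needs 1 byte...    */
            uint8_t big[1024];  /* ...but every Value carries 1024 */
        } as;
    };
    /* sizeof(struct Value) is fixed at compile time (~1028 bytes),
       no matter which variant a given value actually holds. */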

I assume there's a good reason, I just don't see what it is.


Let's look at this question from the compiler's point of view: what assembly sequence are you going to generate for that dynamically-sized-argument behavior at runtime?

For compiled languages this is handled by the calling convention, and most architectures (ARM, PPC, RISC-V, ...) use register-based calling conventions. Only when all the argument registers are consumed does the call pass the remaining arguments on the stack.
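
For example, under the x86-64 System V convention (roughly; details vary by ABI):

    /* x86-64 System V: the first six integer arguments travel in
       registers (rdi, rsi, rdx, rcx, r8, r9); the seventh and
       beyond are pushed on the stack by the caller. */
    long f(long a, long b, long c, long d,
           long e, long g, long h) {
        return a + h;   /* a arrives in rdi, h arrives on the stack */
    }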


What you may be missing is that the stack is a lot faster than the heap.

First, stack allocations and de-allocations are extremely cheap. Allocating a known amount of stack memory requires nothing more than moving a pointer. The same goes for de-allocation.

Second, accessing stack memory is typically a LOT faster than heap memory since it's contiguous and can be loaded in big chunks into the CPU's cache.
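
To make the first point concrete, a fixed-size local typically costs one stack-pointer adjustment (illustrative only; exact output depends on compiler and flags):

    void fixed(void) {
        char buf[64];   /* allocation: roughly "sub rsp, 64" */
        buf[0] = 1;
        /* deallocation on return: roughly "add rsp, 64" --
           compare with malloc/free, which involve bookkeeping,
           locks or thread caches, and possibly a syscall */
    }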


That's exactly why I'm wondering why we can't use it for more things :)


Large stack allocations are discouraged on modern CPUs because call stacks are placed in relatively small stack segments. If one of these overflows, you get a stack overflow. You can use the limit command to see the default size of the stack segment on your machine. On mine it is eight megabytes, which is almost nothing. :)

In "normal" C code, the layout of all call frames (stack frames) is fixed. The location of all local variables relative to the top of the stack is known at compile-time. However, in functions that use alloca, or Variable Length Arrays (VLAs), the layout of the call frame is potentially different in each function call. Therefore, the compiler has to insert code to keep track of where all local variables are and it also makes some compiler optimizations harder.

Look at the machine code generated for the following function using godbolt.org. It is way more complicated than if a and b's sizes had been compile-time constants:

    void fun(int x, int y) {
        int a[x], b[y];   /* VLAs: the frame layout now depends on x and y */
        b[5] = 10;
    }
Dynamically sized call frames are a PITA to implement correctly and efficiently, which I suspect is the reason why alloca/VLAs are poorly supported in many compilers.

There's some good discussion here about VLAs and alloca: https://stackoverflow.com/questions/1018853/why-is-the-use-o... Although be aware that some of the answers are wrong so read the comments too.

Even with good support for VLAs/alloca, large stack allocations violate the assumption that the stack segment is large enough. You may be a tidy coder, so you check whether your mallocs return NULL, but I bet you have never ever checked whether you have enough stack space for a function call. :) Most of the time this works fine because relatively little data is placed on the stack (lots of recursive tree-traversal code, however, breaks badly).

So you have to keep track of how much stack space you have left and grow the stack segment if you are nearing exhaustion. For example, if you only have 100 bytes left but want to create a 200-byte local variable, you would allocate a new stack segment, perhaps twice as large as the old one, copy the old stack to the new segment, and continue from there. Unfortunately, it would be a major performance drag since you would need to check that you have enough stack space for every dynamic stack allocation and for every call frame.
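
In pseudo-C, that per-call check would look something like this (all names invented; a real implementation, like Go's movable stacks, lives in the compiler and runtime):

    #include <stddef.h>

    extern char *stack_limit;              /* hypothetical: lowest usable address */
    extern void  grow_stack(size_t need);  /* hypothetical: allocate a bigger segment,
                                              copy the old stack, fix up pointers */

    void prologue_check(size_t frame_size) {
        char *sp = (char *)__builtin_frame_address(0);  /* GCC/Clang builtin */
        if (sp - frame_size < stack_limit)
            grow_stack(frame_size);   /* rare, expensive slow path */
        /* fast path: the frame fits, proceed as usual */
    }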

Some virtual machines use guard pages to check for stack overflows. But the amount of overflow they can detect is limited by the size of the guard page. For example, if you have 64 kB guard pages, they would be too small to detect overflows caused by 100 kB stack allocations. So you don't get around the need for explicit checks.
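
To sketch why: a single large allocation can step right over the guard page, which is why compilers that rely on guards probe each new frame page by page, roughly like this (illustrative only; real probe sequences are emitted as assembly):

    void probe_pages(char *frame_base, size_t frame_size) {
        const size_t page = 4096;   /* assumed page size */
        /* Touch every page of the new frame in order, so the first
           out-of-bounds access is guaranteed to land in the guard
           page instead of jumping past it. (Stack growth direction
           ignored here for simplicity.) */
        for (size_t off = 0; off < frame_size; off += page)
            frame_base[off] = 0;
    }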


> You can use the limit command to see the default size of the stack segment on your machine. On mine it is eight megabytes, which is almost nothing.

Sure, but this can be configured :) Someone in one forum said they were able to set their system's stack size to "unlimited" and were able to utilize more than 500 MB of stack space in a quick test.

> However, in functions that use alloca or variable-length arrays (VLAs), the layout of the call frame is potentially different in each function call. Therefore, the compiler has to insert code to keep track of where all local variables are, and it also makes some compiler optimizations harder.

> For example, if you only have 100 bytes left but want to create a 200-byte local variable, you would allocate a new stack segment, perhaps twice as large as the old one, copy the old stack to the new segment, and continue from there. Unfortunately, it would be a major performance drag since you would need to check that you have enough stack space for every dynamic stack allocation and for every call frame.

This is the closest I've heard to a real explanation so far. Still, if the downside is just performance, would the cache benefits not potentially outweigh the performance downsides? I'm not a compiler engineer, but if I were, I would definitely do some experiments around this. Maybe people already have and it's known to be a dead end?


> Sure, but this can be configured :) Someone in one forum said they were able to set their system's stack size to "unlimited" and were able to utilize more than 500 MB of stack space in a quick test.

Yeah, but run the same test in a multithreaded program and it is no longer that easy.

> This is the closest I've heard to a real explanation so far. Still, if the downside is just performance, would the cache benefits not potentially outweigh the performance downsides? I'm not a compiler engineer, but if I were, I would definitely do some experiments around this. Maybe people already have and it's known to be a dead end?

The performance downside is that every allocation on the stack could cause an overflow and therefore needs to be checked: every CALL instruction and every time you declare a local variable. Maybe with smart compiler optimizations you could get rid of some of these checks, but I don't think you could get rid of them entirely. The other big problem that I didn't mention in my last comment is that the language needs to support movable pointers. If you move a stack to a new memory segment, all pointers to locations in the stack suddenly become invalid and have to be updated.
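
A tiny illustration of the movable-pointer problem (use() is just a stand-in):

    extern void use(int *p);   /* hypothetical consumer */

    void after_a_move(void) {
        int x = 42;
        int *p = &x;   /* p holds an address inside the current stack */
        /* If the runtime moved the stack to a new segment here, x
           would get a new address, but p would still point into the
           old segment. Every such pointer must be found and rewritten,
           much like what a moving garbage collector does. */
        use(p);
    }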

For small and fixed allocations, allocating on the stack when possible is certainly a good performance boost over heap allocations, but the advantage may not be very big for large dynamic allocations.



