C++ Stack and Heap Memory Models Explained

liuian 2025-01-14 15:20

I. Stack Memory Model

[Figure: the runtime call stack of a C++ program]

How the stack changes during a function call:

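1. When the caller calls a subroutine (the callee):

In the caller, the callee's arguments are passed according to the calling convention (see the calling conventions below). Under the classic 32-bit x86 conventions they are usually pushed onto the stack from right to left; on x86-64 the first few integer arguments are passed in registers (six on System V, four on Windows) and only the rest go on the stack;
Then the address of the next instruction, i.e. the return address, is pushed onto the stack (this is done implicitly by the call instruction);
Then control jumps to the callee's address: call callee;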

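2. The callee executes:

push %rbp: save the caller's rbp on the stack;
mov %rsp, %rbp: point rbp at the current stack top, i.e. start the callee's new frame;
[optional] sub $xxx, %rsp: reserve xxx bytes of temporary space on the stack (the stack grows toward lower addresses, so this moves the stack top down); the compiler chooses the size from the total size of the function's local variables;
[optional] push XXX: save (push) the values of some registers;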

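3. The callee returns:

Store the return value: usually in the eax register (rax on x86-64);
[optional] restore (pop) the saved registers;
mov %rbp, %rsp: release the frame's stack space, so rsp points back at the saved rbp;
pop %rbp: restore the caller's frame base;

The two instructions above can be replaced by the single x86 leave instruction (this holds in both AT&T and Intel syntax), followed by ret:

leave
ret: pop the previously saved return address off the stack and jump to it;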
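
To observe the downward-growing frames from C++ without assembly, the sketch below records the address of one local variable per recursion level. It assumes GCC or Clang (for the noinline attribute) on a target such as x86-64 or AArch64, where the stack grows toward lower addresses. record_frames and stack_grows_down are hypothetical helper names. Comparing addresses this way observes one implementation and is not something the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Address of one local variable per recursion level (hypothetical helper state).
static std::uintptr_t g_local_addr[8];

// GCC/Clang attribute: keep every level as a real call with its own frame.
__attribute__((noinline)) void record_frames(int depth, int max_depth) {
    volatile int local = depth;                                    // lives in this call's frame
    g_local_addr[depth] = reinterpret_cast<std::uintptr_t>(&local);
    if (depth + 1 < max_depth) {
        record_frames(depth + 1, max_depth);
    }
    local = local + 1;   // use local after the call so this frame stays live (no tail call)
}

// True if each deeper frame's local sits at a lower address than its caller's.
bool stack_grows_down(int levels) {
    assert(levels >= 2 && levels <= 8);
    record_frames(0, levels);
    for (int i = 1; i < levels; ++i) {
        if (g_local_addr[i] >= g_local_addr[i - 1]) return false;
    }
    return true;
}
```

On the assumed targets, stack_grows_down(5) is expected to return true.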

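Stack attacks

As the stack layout above shows, the stack is easy to corrupt and attack. A stack buffer overflow attack overwrites the return address in a function's frame with the start address of the attack code; when the function returns, it jumps to the attack code and the attacker gains control of the system. Operating systems and compilers therefore use several common defenses:

ASLR (Address Space Layout Randomization): the operating system randomizes the start address of the call stack (and of other memory regions), which makes function addresses and return addresses harder to find.
Canary

GCC options related to stack overflow detection include -fstack-protector, -fstack-protector-strong, -fstack-protector-all and -fno-stack-protector.

With the canary enabled, a random value is placed between the saved ebp and the local variables when the function starts, and it is checked when the function ends. If the value has changed (i.e. it was overwritten), the __stack_chk_fail function is called and the process is terminated. The GCC option -fno-stack-protector disables this protection.

NX: with NX enabled, the program's stack is not executable. The option -z execstack (passed to the linker through GCC) disables this protection.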
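
The canary check can be modeled in plain, well-defined C++ by simulating a frame as one byte array: an 8-byte local buffer followed by an 8-byte canary. This is only a simulation under that assumed layout; SimFrame, enter_frame, unchecked_copy and canary_intact are hypothetical names. GCC's real protector emits the canary and the __stack_chk_fail call in generated code, not in a struct like this.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simulated frame: bytes[0..7] is a local buffer, bytes[8..15] holds the canary.
// One array keeps the "overflow" inside valid memory, so the simulation itself is well-defined.
struct SimFrame {
    std::array<unsigned char, 16> bytes{};
    std::uint64_t expected_canary = 0;   // reference copy the "epilogue" compares against
};

// "Prologue": plant the canary right after the buffer.
SimFrame enter_frame(std::uint64_t secret) {
    SimFrame f;
    f.expected_canary = secret;
    std::memcpy(f.bytes.data() + 8, &secret, sizeof secret);
    return f;
}

// Buggy copy: no check against the 8-byte buffer size (clamped only to the simulated frame).
void unchecked_copy(SimFrame& f, const unsigned char* src, std::size_t len) {
    if (len > f.bytes.size()) len = f.bytes.size();
    std::memcpy(f.bytes.data(), src, len);
}

// "Epilogue": a real protected function would call __stack_chk_fail when this is false.
bool canary_intact(const SimFrame& f) {
    std::uint64_t current = 0;
    std::memcpy(&current, f.bytes.data() + 8, sizeof current);
    return current == f.expected_canary;
}
```

Copying 8 bytes leaves the canary intact; copying 12 bytes overwrites part of it, so canary_intact returns false, which is the point where the real epilogue would abort.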

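Stack unwinding during exception handling

When a function (or method) throws an exception, the variables in its stack frame are destroyed first (stack unwinding); for class-type objects this means their destructors are called. The exception then moves to the next level up the call stack, where the same thing happens, until a matching catch clause is reached.
A raw pointer is an ordinary variable, not a class object, so unwinding the call stack does not call the destructor of the resource it points to, and that resource leaks.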
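
The difference between class objects and raw pointers during unwinding can be observed with a small counting type. This is a sketch assuming C++14 (for std::make_unique); Tracked, throw_midway and survivors_after_unwinding are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Counts live Tracked objects so we can see which destructors ran during unwinding.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

void throw_midway(Tracked*& raw_out) {
    Tracked on_stack;                          // class object on the stack: destructor runs
    auto owned = std::make_unique<Tracked>();  // RAII owner: its destructor deletes the Tracked
    raw_out = new Tracked;                     // raw pointer: nothing deletes this Tracked
    throw std::runtime_error("boom");
}

// Returns how many Tracked objects survive after the exception has been caught.
int survivors_after_unwinding() {
    Tracked* leaked = nullptr;
    try {
        throw_midway(leaked);
    } catch (const std::runtime_error&) {
        // on_stack and the unique_ptr's object are already destroyed here
    }
    int survivors = Tracked::alive;   // only the object behind the raw pointer
    delete leaked;                    // clean up by hand so the demo itself does not leak
    return survivors;
}
```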

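Questions to think about:

1 How can a recursive function return directly from level 20 to level 17 and still have the program run correctly?

Based on the stack frame structure above, the idea is: when the recursion reaches level 20, replace the saved rbp in the current frame with level 17's rbp. Level 17's rbp is found by following rbp repeatedly, because the slot each rbp points to holds the previous frame's rbp. This trick relies on frame pointers being present (e.g. -fno-omit-frame-pointer) and skips the destructors of the bypassed frames, so it is a non-portable hack rather than standard C++. The core code is: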

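```cpp
#include <iostream>

using std::cout;
using std::endl;
using std::hex;

/* change stack: x86-64 only, GCC/Clang inline asm; requires frame pointers
 * (-fno-omit-frame-pointer). Non-portable: bypassed frames get no cleanup. */
int ret_stack(int layer)
{
    unsigned long rbp = 0;
    unsigned long layer_rbp = 0;
    int depth = 0;

    /* 1. Get the frame base (rbp) of this function, the starting layer */
    __asm__ volatile(
        "movq %%rbp, %0\n\t"
        : "=r"(rbp)
        :
        : "memory");
    layer_rbp = rbp;
    cout << hex << rbp << endl;

    /* 2. Walk up layer by layer: the word at [rbp] is the caller's saved rbp */
    for (; (depth < layer) && (0 != layer_rbp) &&
           (0 != *(unsigned long *)layer_rbp) &&
           (layer_rbp != *(unsigned long *)layer_rbp);
         ++depth) {
        cout << hex << layer_rbp << endl;
        layer_rbp = *(unsigned long *)layer_rbp;
    }
    cout << hex << layer_rbp << endl;

    /* 3. Change the saved rbp of this frame to the target layer's rbp: after
     *    ret_stack returns, rbp holds that value, so the caller's leave/ret
     *    epilogue returns out of the target layer's frame */
    unsigned long *x = (unsigned long *)rbp;
    *x = layer_rbp;
    cout << hex << x << " v:" << *x << endl;
    return depth;
}
```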
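
A portable alternative to rewriting rbp (a different technique from the author's, swapped in here) is to throw an exception at the deepest level and catch it at the level where execution should resume; the intermediate frames are unwound normally and their destructors run. UnwindTo and recurse are hypothetical names in this sketch.

```cpp
#include <cassert>

// Hypothetical exception type carrying the recursion level to resume at.
struct UnwindTo {
    int target_depth;
};

// Recurses down to max_depth, then jumps straight back to target_depth by throwing.
// Returns the depth at which normal execution resumed.
int recurse(int depth, int max_depth, int target_depth) {
    if (depth == max_depth) {
        throw UnwindTo{target_depth};   // leave levels max_depth .. target_depth + 1 at once
    }
    if (depth != target_depth) {
        return recurse(depth + 1, max_depth, target_depth);
    }
    try {
        return recurse(depth + 1, max_depth, target_depth);
    } catch (const UnwindTo&) {
        return depth;                   // execution continues normally at target_depth
    }
}
```

recurse(1, 20, 17) descends to level 20, throws, and resumes at level 17, so it returns 17.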

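2 What calling conventions are there?

The most commonly used conventions are listed below. They matter mainly on 32-bit x86; on x86-64 each platform uses a single standard convention (System V AMD64 on Linux and other Unix-like systems, the Microsoft x64 convention on Windows).

cdecl

The default calling convention for C/C++; the caller cleans the arguments off the stack.

stdcall

The de facto standard for the Microsoft Win32 API; the callback functions we commonly write use this convention. The callee cleans the arguments off the stack.

thiscall

thiscall is the default calling convention for non-static member functions in C++ (with MSVC on 32-bit x86, the this pointer is passed in the ECX register).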

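II. C++ Heap Memory Model

1. Dynamic memory allocation in C++ programs: new/delete

The new/delete operators (built into the C++ language)

The new operator does two things: it allocates memory and calls the constructor to initialize the object. You cannot change this behavior;
The delete operator likewise does two things: it calls the destructor and frees the memory. You cannot change this behavior;

operator new/delete functions

operator new: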

The three versions of operator new referred to as (1), (2) and (3) below are:
(1) throwing: void* operator new(std::size_t size);
(2) nothrow: void* operator new(std::size_t size, const std::nothrow_t& nothrow_value) noexcept;
(3) placement: void* operator new(std::size_t size, void* ptr) noexcept;

The default allocation and deallocation functions are special components of the standard library; They have the following unique properties:

Global: All three versions of operator new are declared in the global namespace, not within the std namespace.
Implicit: The allocating versions ((1) and (2)) are implicitly declared in every translation unit of a C++ program, no matter whether header <new> is included or not.
Replaceable: The allocating versions ((1) and (2)) are also replaceable: A program may provide its own definition that replaces the one provided by default to produce the result described above, or can overload it for specific types.

If set_new_handler has been used to define a new_handler function, this new-handler function is called by the default definitions of the allocating versions ((1) and (2)) if they fail to allocate the requested storage. operator new can be called explicitly as a regular function, but in C++, new is an operator with a very specific behavior: An expression with the new operator, first calls function operator new (i.e., this function) with the size of its type specifier as first argument, and if this is successful, it then automatically initializes or constructs the object (if needed). Finally, the expression evaluates as a pointer to the appropriate type.

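1. It is a function dedicated to allocating memory and is called by the new operator. You can overload operator new with additional parameters, subject to restrictions:

Restriction 1: the first parameter must be of type size_t; restriction 2: the function must return void*;

2. Under GCC + glibc, operator new usually calls malloc to allocate the memory;

3. By default, operator new throws an exception (std::bad_alloc) when allocation fails; passing std::nothrow as an extra argument makes it return a null pointer instead of throwing;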
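
The sketch below shows an operator new overload with an extra parameter that satisfies both restrictions, plus the matching operator delete that a new-expression would use if the constructor threw. CountingTag, g_tagged_allocations, Widget and tagged_new_demo are hypothetical names; calling malloc directly echoes point 2 above but is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical tag type: its only job is to select the overload below.
struct CountingTag {};
std::size_t g_tagged_allocations = 0;

// Overload with an extra parameter: first parameter std::size_t, return type void*.
void* operator new(std::size_t size, CountingTag) {
    ++g_tagged_allocations;
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();                    // keep the throwing contract of operator new
}

// Matching deallocation function: a new-expression calls it if the constructor throws.
void operator delete(void* p, CountingTag) noexcept {
    std::free(p);
}

struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
};

int tagged_new_demo(int v) {
    Widget* w = new (CountingTag{}) Widget(v); // operator new(sizeof(Widget), CountingTag{}), then constructor
    int result = w->value;
    w->~Widget();                              // a plain `delete w` would use the ordinary operator delete,
    operator delete(w, CountingTag{});         // which does not match this malloc-based allocation
    return result;
}
```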

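operator delete:

1. It is a function dedicated to freeing memory and is called by the delete operator. You can overload operator delete with additional parameters, subject to restrictions:

Restriction 1: the first parameter must be of type void*; restriction 2: the function must return void;

2. Under GCC + glibc, operator delete usually calls free to release the memory;

3. operator delete does not throw (since C++11, deallocation functions are implicitly noexcept) and returns nothing; the nothrow overload operator delete(void*, const std::nothrow_t&) exists to pair with the nothrow operator new and is called if a constructor throws during a nothrow new-expression;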

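placement new/delete functions

placement new is simply another overload of new: a special operator new that works on raw memory that has already been allocated but not yet initialized, i.e. it constructs an object (calls the constructor) in a block of memory that already exists;
It is part of the C++ standard library (declared in <new>);
placement delete does nothing; an object created with placement new must be destroyed with an explicit destructor call, and its memory is released by whoever owns the buffer;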
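
A minimal placement new sketch: the buffer is ordinary stack storage aligned for std::string, the constructor runs inside it without allocating storage for the string object itself, and the object is destroyed by an explicit destructor call. placement_new_demo is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Constructs a std::string inside a caller-owned buffer, reads it, then destroys it.
std::size_t placement_new_demo(const char* text) {
    alignas(std::string) unsigned char buffer[sizeof(std::string)];  // raw, uninitialized storage
    std::string* s = new (buffer) std::string(text);  // placement new: constructor only
    std::size_t len = s->size();
    s->~basic_string();   // explicit destructor call; the buffer itself needs no freeing
    return len;
}
```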

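Array allocation: new[]/delete[] expressions

These call the operator new[]/operator delete[] functions respectively;
They call the constructor and the destructor once for each element; memory obtained with new[] must be released with delete[] (using plain delete on it is undefined behavior);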
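
Counting constructor and destructor calls makes the per-element behavior of new[] and delete[] visible. Counted and array_new_delete are hypothetical names in this sketch.

```cpp
#include <cassert>

struct Counted {
    static int constructed;
    static int destroyed;
    Counted() { ++constructed; }
    ~Counted() { ++destroyed; }
};
int Counted::constructed = 0;
int Counted::destroyed = 0;

// Returns true if new[] ran n constructors and delete[] ran n destructors.
bool array_new_delete(int n) {
    const int c0 = Counted::constructed;
    const int d0 = Counted::destroyed;
    Counted* arr = new Counted[n];   // operator new[] once, then n constructor calls
    bool all_built = (Counted::constructed - c0 == n) && (Counted::destroyed == d0);
    delete[] arr;                    // n destructor calls, then operator delete[] once
    return all_built && (Counted::destroyed - d0 == n);
}
```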

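class-specific allocation functions (member functions)

Custom new/delete functions for objects of a particular class;

The implementation usually delegates to the global functions:

::operator new

::operator delete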
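
A class-specific operator new/delete pair that simply delegates to the global ::operator new and ::operator delete, counting allocations along the way. Node and new_node_value are hypothetical names; a real class would usually do something more useful here, such as drawing from a pool.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Node {
public:
    static std::size_t allocations;   // how many times Node::operator new ran
    int value = 0;

    // Class-specific allocation function (always static), used by `new Node`.
    static void* operator new(std::size_t size) {
        ++allocations;
        return ::operator new(size);  // delegate to the global allocator
    }
    // Matching class-specific deallocation function, used by `delete node`.
    static void operator delete(void* p) noexcept {
        ::operator delete(p);
    }
};
std::size_t Node::allocations = 0;

int new_node_value(int v) {
    Node* n = new Node;   // Node::operator new, then Node's constructor
    n->value = v;
    int result = n->value;
    delete n;             // Node's destructor, then Node::operator delete
    return result;
}
```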

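Key points:

To create an object on the heap, use the new operator. It both allocates memory and calls the object's constructor.
If you only want to allocate memory, call the operator new function; it does not call a constructor.
If you want to customize how memory is allocated when heap objects are created, write your own operator new function and keep using the new operator; the new operator will call your custom operator new.
If you want to construct an object in memory you already hold a pointer to, use placement new.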
C++ lets you install your own handler for allocation failures with std::set_new_handler: if a new_handler function has been set, the default definitions of the allocating versions of operator new (the throwing and the nothrow one) call it when they fail to allocate the requested storage.
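
A sketch of installing a handler with std::set_new_handler and reading it back with std::get_new_handler (C++11). It does not trigger a real allocation failure. on_out_of_memory and install_handler are hypothetical names, and the handler's behavior (uninstalling itself so that the next failed attempt throws std::bad_alloc) is only one of a handler's options, alongside freeing memory or throwing.

```cpp
#include <cassert>
#include <new>

// Hypothetical handler: the default operator new calls it and then retries the allocation.
// Uninstalling itself makes the next failed attempt throw std::bad_alloc instead of looping.
void on_out_of_memory() {
    std::set_new_handler(nullptr);
}

bool install_handler() {
    std::new_handler previous = std::set_new_handler(on_out_of_memory);  // returns the old handler
    bool installed = (std::get_new_handler() == on_out_of_memory);
    std::set_new_handler(previous);                                      // restore the old handler
    return installed;
}
```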

If a constructor throws an exception, the new-expression then calls the corresponding operator delete function to free the memory:

Here (1), (2) and (3) are operator delete's ordinary, nothrow and placement versions:

The other signatures ((2) and (3)) are never called by a delete-expression (the delete operator always calls the ordinary version of this function, and exactly once for each of its arguments). These other signatures are only called automatically by a new-expression when their object construction fails (e.g., if the constructor of an object throws while being constructed by a new-expression with nothrow, the matching operator delete function accepting a nothrow argument is called).
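
The class-specific operator new/delete below count their calls, which makes it possible to check that a throwing constructor leads the new-expression to call the matching operator delete. Fragile and construction_failure_frees_memory are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

struct Fragile {
    static int news;
    static int deletes;
    static void* operator new(std::size_t size) { ++news; return ::operator new(size); }
    static void operator delete(void* p) noexcept { ++deletes; ::operator delete(p); }
    Fragile() { throw std::runtime_error("constructor failed"); }
};
int Fragile::news = 0;
int Fragile::deletes = 0;

bool construction_failure_frees_memory() {
    try {
        Fragile* never_set = new Fragile;   // allocation succeeds, then the constructor throws
        delete never_set;                   // not reached
    } catch (const std::runtime_error&) {
        // The new-expression has already called Fragile::operator delete.
    }
    return Fragile::news == 1 && Fragile::deletes == 1;
}
```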

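Questions to think about:

1 How are malloc and free implemented?

2 Does malloc occupy as much physical memory as the size it allocates?

3 Is memory released by free really returned to the OS?

4 Since memory inside the heap cannot be released directly, why not allocate everything with mmap?

5 How can you inspect heap fragmentation?

6 Besides glibc's malloc/free, are there third-party implementations?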

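2. C++11 smart pointers and garbage collection

C++ smart pointers exist to solve memory problems that come with dynamic memory allocation, such as memory leaks, object lifetime management, and dangling or null pointers;
Smart pointers manage object lifetime (dynamic memory) with the RAII idiom and offer an interface much like an ordinary pointer's: the smart pointer takes ownership of the memory when it is constructed and releases it when it goes out of scope, which helps programmers manage dynamic memory;
The old smart pointer auto_ptr had poorly designed semantics that led to many problems: copy construction and assignment (operator=) silently transfer ownership and leave the source null instead of copying, and nothing reports an error. Because it cannot be copied in the normal sense, it cannot safely be stored in containers. It was deprecated in C++11 (and removed in C++17);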

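New smart pointers:

1. shared_ptr

shared_ptr is a reference-counting smart pointer. It contains two members: a pointer to the actual data and a pointer to the reference-count block (ref_count). Based on GCC's implementation, it works roughly as follows:

When the object (data) is shared by copying or assignment, the reference count is incremented; when a shared_ptr is destroyed, the count is decremented; when the count reaches 0, the pointed-to object is destroyed automatically. Updates to the reference count are thread-safe (atomic operations).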
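
The increments and decrements described above can be observed through use_count(). This sketch, with the hypothetical names Counts and shared_ptr_counts, records the count at three points.

```cpp
#include <cassert>
#include <memory>

struct Counts {
    long initial;
    long after_copy;
    long after_scope;
};

Counts shared_ptr_counts() {
    Counts c{};
    auto a = std::make_shared<int>(42);
    c.initial = a.use_count();            // one owner
    {
        std::shared_ptr<int> b = a;       // copy: count incremented
        c.after_copy = a.use_count();     // two owners
    }                                     // b destroyed: count decremented
    c.after_scope = a.use_count();        // one owner again
    return c;
}
```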

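Key points for shared_ptr:

When using shared_ptr, don't call new yourself, so that memory management stays consistent;
Use weak_ptr to break reference cycles;
Create shared_ptr with make_shared: it is more efficient because the object and the reference count come from a single allocation, and it prevents leaks when exceptions are thrown; see https://herbsutter.com/gotw/_102/;
Large numbers of shared_ptr can slow a program down (compared with other pointers); also, with make_shared the memory is only finally freed once all weak references have reached 0 as well;
Use enable_shared_from_this to let a class obtain a shared_ptr to itself;
Do not call shared_from_this() in the object's constructor, because the object is not fully constructed yet and no shared_ptr owns it. The construction order is: first the enable_shared_from_this base-class constructor runs, then the object's own constructor, and finally the shared_ptr constructor initializes enable_shared_from_this's member weak_this_. Only after that can shared_from_this() be used (since C++17, calling it without an owning shared_ptr throws std::bad_weak_ptr);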
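
A minimal enable_shared_from_this sketch: self() is called only after make_shared has created an owning shared_ptr, so shared_from_this() returns a pointer that shares ownership with it. Session and shared_from_this_count are hypothetical names.

```cpp
#include <cassert>
#include <memory>

class Session : public std::enable_shared_from_this<Session> {
public:
    // Valid only once a shared_ptr already owns *this (so never from the constructor).
    std::shared_ptr<Session> self() { return shared_from_this(); }
};

long shared_from_this_count() {
    auto s = std::make_shared<Session>();          // make_shared also sets up the internal weak pointer
    std::shared_ptr<Session> again = s->self();    // shares ownership with s
    return s.use_count();                          // two owners: s and again
}
```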

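2. unique_ptr

An exclusive-ownership pointer: not shared, and cannot be copied or copy-assigned;

Key points for unique_ptr:

1. If an object does not need to be shared, unique_ptr is generally the best choice: it performs well and is safer;

2. Control over the object's lifetime can be transferred with move semantics;

3. Why can a function return a unique_ptr object?

RVO and NRVO: when a function returns an object, in principle a temporary is created, which means constructing a new object and destroying the old one, at some cost in efficiency. C++ allows compilers to optimize this away even when the constructor has side effects; this is called return value optimization (RVO), and when the returned object is named it is called named return value optimization (NRVO). With RVO, instead of creating a temporary on return, the returned object is constructed directly in the storage that receives it (since C++17 this elision is guaranteed when the return expression is a temporary). Without this optimization, would returning a unique_ptr be a problem? No, because the standard specifies what happens when a local object is returned: 1. if it can be moved, the move constructor is used; 2. if it cannot be moved, the copy constructor is used; 3. if it cannot be copied either, compilation fails. unique_ptr has a move constructor, so a unique_ptr object can be returned from a function.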
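
A sketch of both points, assuming C++14 for std::make_unique: a local unique_ptr is returned by value without std::move, and ownership is transferred explicitly with std::move. make_value, consume and unique_ptr_transfer_demo are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returning a local unique_ptr by value: it is moved (or the move is elided), never copied.
std::unique_ptr<int> make_value(int v) {
    auto p = std::make_unique<int>(v);
    return p;                                  // no std::move needed
}

// Takes ownership: the int is freed when p goes out of scope.
int consume(std::unique_ptr<int> p) {
    return *p;
}

bool unique_ptr_transfer_demo() {
    std::unique_ptr<int> a = make_value(5);
    std::unique_ptr<int> b = std::move(a);     // ownership moves from a to b; a becomes null
    // std::unique_ptr<int> c = b;             // would not compile: copying is deleted
    bool moved_out = (a == nullptr);
    int value = consume(std::move(b));         // b is null afterwards as well
    return moved_out && b == nullptr && value == 5;
}
```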

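3. weak_ptr

Refers to an object without increasing the reference count, so it cannot influence the object's lifetime;
Used together with shared_ptr to break shared_ptr reference cycles;
It can affect when the object's memory is finally released: the object is destroyed when the last shared_ptr goes away, but when the control block shares one allocation with the object (as with make_shared), that memory is freed only after the last weak_ptr is gone;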
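
A parent/child pair shows how weak_ptr breaks a cycle: the child holds only a weak_ptr back to its parent, so once the outside owners disappear both objects are destroyed. Parent, Child and objects_alive_after_scope are hypothetical names; if Child::parent were a shared_ptr, the two objects would keep each other alive and both would leak.

```cpp
#include <cassert>
#include <memory>

struct Child;

struct Parent {
    static int alive;
    std::shared_ptr<Child> child;   // strong: Parent owns Child
    Parent() { ++alive; }
    ~Parent() { --alive; }
};

struct Child {
    static int alive;
    std::weak_ptr<Parent> parent;   // weak: does not keep Parent alive, so no cycle
    Child() { ++alive; }
    ~Child() { --alive; }
};

int Parent::alive = 0;
int Child::alive = 0;

// Links parent and child, drops the outside owners, and counts what is left.
int objects_alive_after_scope() {
    {
        auto p = std::make_shared<Parent>();
        auto c = std::make_shared<Child>();
        p->child = c;
        c->parent = p;
        if (!c->parent.lock()) return -1;   // lock() yields a shared_ptr while Parent lives
    }   // Parent's count hits 0, Parent is destroyed, which releases the last owner of Child
    return Parent::alive + Child::alive;
}
```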

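Questions to think about:

1 How does assignment in C++ differ from assignment in Java?

In C++, assignment can copy the object itself (value semantics) or, through pointers, just refer to the same object; in Java, assigning a variable of object type only copies the reference.

2 Which smart-pointer pitfalls can still lead to memory leaks?

2.1. The pointer passed to shared_ptr's constructor must point to dynamically allocated memory that the shared_ptr can manage; otherwise memory may leak or be freed incorrectly;

2.2. shared_ptr requires that allocation and deallocation match consistently (new with delete, new[] with delete[] through a suitable deleter); otherwise memory may leak or the behavior is undefined;

2.3. Like most STL containers, a shared_ptr object itself is not thread-safe; the user has to guarantee thread safety;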
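
For pitfall 2.2, one way to keep allocation and deallocation paired is to give shared_ptr an array deleter when it owns memory from new[] (since C++17, std::shared_ptr<int[]> does this directly). sum_first_three is a hypothetical name.

```cpp
#include <cassert>
#include <memory>

// new[] paired with delete[] through std::default_delete<int[]>.
int sum_first_three() {
    std::shared_ptr<int> arr(new int[3]{1, 2, 3}, std::default_delete<int[]>());
    int* raw = arr.get();
    return raw[0] + raw[1] + raw[2];
}   // last owner gone: the deleter calls delete[] on the array
```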

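3 What are the limitations of unique_ptr?

Ownership can only be transferred by moving (move construction or move assignment), not by copying;
It has no cast helpers like shared_ptr's std::static_pointer_cast and std::dynamic_pointer_cast; only the implicit conversion from unique_ptr<Derived> to unique_ptr<Base> is supported;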

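4 Are smart pointers exception-safe?

Exception safety means that when an exception is thrown, an exception-safe function will:

Leak no resources
Not allow data to be corrupted

Smart pointers use the RAII technique, i.e. they manage resources through objects, to prevent resource leaks.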

Exception Safety: Several functions in these smart pointer classes are specified as having "no effect" or "no effect except such-and-such" if an exception is thrown. This means that when an exception is thrown by an object of one of these classes, the entire program state remains the same as it was prior to the function call which resulted in the exception being thrown. This amounts to a guarantee that there are no detectable side effects. Other functions never throw exceptions. The only exception ever thrown by functions which do throw (assuming T meets the common requirements) is std::bad_alloc, and that is thrown only by functions which are explicitly documented as possibly throwing std::bad_alloc.

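5 Are smart pointers thread-safe?

The reference-count block of a smart pointer is thread-safe. However, shared_ptr has two data members, so reading and writing a shared_ptr object as a whole cannot be done atomically; the object itself is not thread-safe, and the user has to guarantee thread safety.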

Thread Safety: shared_ptr objects offer the same level of thread safety as built-in types. A shared_ptr instance can be "read" (accessed using only const operations) simultaneously by multiple threads. Different shared_ptr instances can be "written to" (accessed using mutable operations such as operator= or reset) simultaneously by multiple threads (even when these instances are copies, and share the same reference count underneath.) Any other simultaneous accesses result in undefined behavior.

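C++ standard garbage collection

C++11 provides minimal garbage collection support:

std::declare_reachable, std::undeclare_reachable, std::declare_no_pointers, std::undeclare_no_pointers, std::pointer_safety, std::get_pointer_safety

Because it is restricted in many scenarios, almost nobody uses it today (these facilities were removed in C++23);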

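Questions to think about:

1 Which techniques let C++ support something like "garbage collection"?

Smart pointers, RAII, move semantics, etc.;

2 What is RAII?

RAII stands for Resource Acquisition Is Initialization. RAII requires that a resource's validity be strictly bound to the lifetime of the object that holds it: the object's constructor acquires (allocates) the resource and its destructor releases it. Under this rule, as long as objects are destroyed correctly, resources cannot leak. RAII is especially useful when a function manages resources through several local variables. Only objects whose construction succeeded (the constructor did not throw) have their destructors called when the function exits, and the destructors run in exactly the reverse order of construction, so multiple resources (objects) are released correctly and the dependencies between them are respected. Because RAII greatly simplifies resource management and helps keep code correct and concise, it is strongly recommended in C++.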
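
A small RAII sketch: a hypothetical Resource class "acquires" in its constructor and "releases" in its destructor, logging both, which makes it possible to check that two local resources are released in reverse order of acquisition. Resource and raii_order are illustrative names, not a real resource type.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class Resource {
public:
    Resource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);   // acquisition happens in the constructor
    }
    ~Resource() { log_.push_back("release " + name_); }   // release happens in the destructor
    Resource(const Resource&) = delete;                   // exactly one owner per resource
    Resource& operator=(const Resource&) = delete;
private:
    std::string name_;
    std::vector<std::string>& log_;
};

std::vector<std::string> raii_order() {
    std::vector<std::string> log;
    {
        Resource db("db", log);
        Resource file("file", log);   // may depend on db, so it must be released first
    }   // destructors run in reverse order: file, then db
    return log;
}
```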