CN106528403B

CN106528403B - Software runtime monitoring method based on binary code implantation technology

Info

Publication number: CN106528403B
Application number: CN201610877174.5A
Authority: CN
Inventors: 马建峰; 帕尔哈提江·斯迪克; 孙聪; 孙召昌; 吴奇烜
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2016-10-08
Filing date: 2016-10-08
Publication date: 2018-11-20
Anticipated expiration: 2036-10-08
Also published as: CN106528403A

Abstract

The invention discloses a software runtime monitoring method based on binary code implantation technology. It comprises the following steps: (1) extracting function call relationships; (2) extracting the control flow information inside functions; (3) constructing a finite state machine; (4) initializing table TABbb; (5) initializing tables TABstart and TABret; (6) initializing the integer variables Cur_F and Cur_B; (7) implanting code; (8) monitoring the running state of the software; (9) ending monitoring. The invention has low runtime overhead and is relatively simple to implement.

Description

Software Runtime Monitoring Method Based on Binary Code Implantation Technology

Technical Field

The invention belongs to the field of computer technology. More specifically, it relates to a software runtime monitoring method based on binary code implantation, within the field of software security. The invention applies to PE-format files under Windows and to ELF-format files under Linux. It uses static binary code implantation to monitor the execution trace of software.

Background Art

With the development of computer technology, software has spread into every sector of the national economy. If critical software is compromised, its users face economic and security threats, so software security is becoming increasingly important. By exploiting security vulnerabilities in specific software, an attacker can run malicious code and gain access to data it is not entitled to. A typical vulnerability is the buffer overflow.

Software runtime monitoring judges the security of software at run time. While the software actually runs, the monitor collects its running state and execution trace and compares them with an expected trace obtained by prior analysis.

Static analysis tools are commonly used to extract the control flow information of software, such as function call graphs and control flow graphs. Depending on the kind of code analyzed, static analysis is either source-code-based or binary-based:

- Source code analysis works directly on the program's expressions and data structures. It can only recover the control flow contained in the source files. It cannot recover the control flow of the static and dynamic libraries the software depends on.
- Binary analysis works at the machine code level, on an intermediate representation of the executable code. It recovers the control flow of the executable itself and also part of the control flow contained in the dynamic libraries the software depends on.

The runtime trace of software can be obtained by code implantation. Code implantation is either source-code-based or binary-based, and binary implantation is further divided into dynamic and static implantation. Static implantation is completed before the program executes. Compared with dynamic implantation, it therefore incurs lower runtime overhead and is simpler to implement.

Beihang University discloses a vulnerability detection method in its patent application "Parallel security vulnerability detection method based on function call graph" (application number: 201110417105.3, publication number: 102567200A). The method uses source-code-based analysis. It generates the function call graph of the modules corresponding to the source files and detects vulnerabilities only in those source files. Its drawback is that it cannot analyze the function call relationships or the intra-function control flow of the static and dynamic libraries that the source code depends on. It therefore cannot detect vulnerabilities in those libraries.

Changzhou Yunbo Software Engineering Technology Co., Ltd. discloses a method in its patent application "A software detection method of a software detector" (application number: 201210054220.3, publication number: 102646068A). The method detects, in real time, the program flow information of application software while it runs. It uses source-code-based code implantation to monitor running software in a computer system in real time. Its drawback is that the newly implanted code must be compiled before it can execute, so large runtime overhead cannot be avoided.

Summary of the Invention

The purpose of the invention is to overcome the above shortcomings of the prior art by providing a software runtime monitoring method based on binary code implantation.

To achieve this purpose, the invention proceeds as follows. A static binary analysis tool extracts the call relationships between functions in the monitored software and the control flow inside each function. This produces a function call graph G(E,F) and control flow graphs G(B,E).

The function call graph G(E,F) is used to construct a finite state machine FSM(Z,S,T,S0,A). Each state of the machine corresponds to one function in G(E,F), and one additional invalid state is added. A transition between two valid states represents a legal call or return between the corresponding functions. Any illegal call or return drives the state machine into the invalid state.

The control flow graphs G(B,E) are used to initialize the tables TABbb. Each function has its own TABbb, with one row per basic block of that function. Each row has three fields:
(1) the index of the basic block;
(2) the offset of the block's entry address relative to the entry address of the function containing it;
(3) the indices of the block's successor blocks.

A static binary code implantation tool loads a dependent library containing the monitoring code into the process address space of the monitored software. It then implants calls to monitoring functions at specific locations in the monitored software. Whenever a function call, a function return, or a control flow change within a function occurs at run time, the corresponding monitoring function in the library is invoked. The monitoring function consults the finite state machine and TABbb to decide whether execution matches the trace predicted by the prior analysis.
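The transition rule described above, together with steps (3e) and (3f) below, can be written compactly as one case definition. This is only a restatement under the following reading:
- Steps (8b) and (8k) indicate that Z₀ = 0 marks a call and Z₀ = 1 marks a return.
- E is the set of call edges of G(E,F).
- s is the current state.

```latex
T\bigl(s,\,(Z_0, Z_1)\bigr) =
\begin{cases}
Z_1 & \text{if } Z_0 = 0 \text{ and } (f_s, f_{Z_1}) \in E \quad \text{(legal call)},\\
Z_1 & \text{if } Z_0 = 1 \text{ and } (f_{Z_1}, f_s) \in E \quad \text{(legal return)},\\
0   & \text{otherwise (invalid state)}.
\end{cases}
```

State 0 corresponds to no function, so every input leaves it in state 0. A single violation is therefore never "forgotten" by later legal-looking events.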

The specific steps for achieving the purpose of the invention are as follows:

(1) Extract function call relationships:

Use a static binary analysis tool to extract the function call relationships in the executable of the monitored software and in the libraries it depends on. Store them in a function call graph data structure G(E,F), where E is the set of directed edges and F is the set of all functions in G(E,F);

(2) Extract the control flow information inside functions:

Use a static binary analysis tool to extract the basic blocks and control flow information of each function in the function call graph G(E,F). Store this basic-block-level control flow information in a control flow graph data structure G(B,E), where E is the set of directed edges and B is the set of basic blocks in G(B,E);

(3) Construct a finite state machine:

(3a) Add state 0 to the non-empty finite set S of states of the finite state machine;

(3b) Number the functions in the function call graph G(E,F); the i-th function receives number i;

(3c) Set the initial state S0 of the finite state machine to state 1, which corresponds to the main(·) function in G(E,F);

(3d) For each function f_i ∈ F, add state i both to the state set S and to the set A of final states of the finite state machine;

(3e) For any f_i ∈ F and f_j ∈ F, if there is a directed edge from function f_i to function f_j, add the following state transitions to the state transition function T:

$Z_0 = 0,\ Z_1 = j \;\rightarrow\; \mathrm{Nextstate}[i] = \text{state } j$

$Z_0 = 1,\ Z_1 = i \;\rightarrow\; \mathrm{Nextstate}[j] = \text{state } i$

where:
- Z₀ and Z₁ are the input letters of the finite state machine;
- Nextstate[i] is the next state of state i, and Nextstate[j] is the next state of state j;
- i and j are the numbers of functions f_i and f_j, respectively.

(3f) For every state whose next state is empty (undefined), set that next state to state 0;

(4) Initialize the tables TABbb:

Each function in the function call graph G(E,F) has a control flow graph G(B,E) and an initialized table TABbb. The i-th row of TABbb corresponds to the i-th basic block in G(B,E). Each row contains three fields:
- index: the index of the i-th basic block;
- offset: the offset of the i-th basic block's entry address relative to the entry address of the function containing it;
- sucs: the indices of the i-th basic block's successor blocks;

(5) Initialize the tables TABstart and TABret:

(5a) After the monitored software is loaded, obtain the entry addresses of all functions in the function call graph G(E,F);

(5b) Add the information of the i-th function to the i-th row of table TABstart. The i-th row of TABstart corresponds to the i-th function in G(E,F) and contains three fields:
- addr: the entry address of the i-th function;
- index: the function number of the i-th function;
- ptr: a pointer to the i-th function's table TABbb;

(5c) Add the values 0, -1, -1 as the first row of table TABret. Each row of TABret corresponds to a function in G(E,F) that is called while the monitored software executes;

(6) Initialize the integer variables Cur_F and Cur_B:

Assign 1, the number of the main(·) function in G(E,F), to the integer variable Cur_F. Assign 1, the number of the first basic block of main(·), to the integer variable Cur_B;

(7) Implant code:

(7a) Use a static binary code implantation tool to load the dynamic library containing the monitoring code into the process address space of the monitored software;

(7b) Look up the monitoring functions in the dynamic library by name, and construct call statements to them;

(7c) Use the code implantation tool to determine the code implantation points in the monitored software from the information in the function call graph G(E,F) and the control flow graph G(B,E);

(7d) Implant the constructed call statements at the corresponding implantation points;

(8) Monitor the running state of the software:

(8a) Determine whether the monitored software has reached the entry of a function in the function call graph G(E,F). If so, go to step (8b); otherwise, go to step (8f);

(8b) Look up the function entry address in table TABstart. If it is present, pass the corresponding function number and the integer 0 to the finite state machine, and go to step (8c). Otherwise, go to step (9);

(8c) Determine whether the function number received by the finite state machine is 1. If so, go to step (8d); otherwise, go to step (8e);

(8d) Set the current state of the finite state machine to the initial state S0, and go to step (8a);

(8e) Determine whether the finite state machine can transition from the current state to the state corresponding to the received number.
- If so, make the transition, assign the function number obtained in step (8b) to Cur_F, and go to step (8a).
- Otherwise, the finite state machine moves to the invalid state 0; go to step (9);

(8f) Determine whether the monitored software has reached a call instruction inside a function in G(E,F). If so, go to step (8g); otherwise, go to step (8h);

(8g) Append a row to TABret. The row holds the address of the instruction following the call instruction, the number of the function containing the call instruction, and the index of the basic block containing it. Go to step (8a);

(8h) Determine whether the monitored software has reached an exit point of a function in G(E,F). If so, go to step (8i); otherwise, go to step (8l);

(8i) Determine whether the address stored in the last row of table TABret is 0. If so, go to step (9); otherwise, go to step (8j);

(8j) Determine whether the return address of the ret instruction equals the address stored in the last row of TABret. If so, go to step (8k); otherwise, go to step (9);

(8k) Pass the function number stored in the last row of TABret and the integer 1 to the finite state machine. Determine whether the machine can transition from the current state to the state corresponding to the received number.
- If so, make the transition. Assign the function number in the last row of TABret to Cur_F and the basic block index in that row to Cur_B. Delete the last row of TABret, and go to step (8a).
- Otherwise, the finite state machine moves to the invalid state 0; go to step (9);

(8l) When the monitored software reaches the entry of a basic block in the control flow graph G(B,E), compute the offset of the block's entry point relative to the entry point of the function containing the block;

(8m) Determine whether the computed offset is 0. If so, go to step (8n); otherwise, go to step (8o);

(8n) Assign the integer 1 to Cur_B, and go to step (8a);

(8o) Determine whether the offset equals the offset of one of the successor blocks of the basic block indexed by Cur_B. If so, assign that successor's index to Cur_B and go to step (8a). Otherwise, go to step (9);

(9) End monitoring.
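Step (8) can be simulated in a few dozen lines of Python. The sketch below is not the patent's monitoring library; it makes the following assumptions:
- The implanted calls report four kinds of events to hypothetical handlers named on_entry, on_call, on_exit and on_block.
- T is stored as a dictionary, and any missing entry means state 0.
- TABstart is modeled as a dictionary from entry address to (function number, TABbb), standing in for the addr/index/ptr rows.
- TABret rows are (return address, function number, block index) tuples.
- The stop-reason strings are added for illustration; the text simply ends monitoring (step (9)).

```python
from typing import NamedTuple


class Row(NamedTuple):                  # one TABbb row
    index: int
    offset: int
    sucs: tuple


class Monitor:
    """Sketch of step (8). transitions: (state, Z0, Z1) -> next state, missing = 0.
    tabstart: entry address -> (function number, TABbb as {index: Row})."""

    def __init__(self, transitions: dict, tabstart: dict):
        self.T = transitions
        self.tabstart = tabstart
        self.entry_of = {num: addr for addr, (num, _) in tabstart.items()}
        self.tabret = [(0, -1, -1)]      # step (5c): sentinel row
        self.cur_f, self.cur_b = 1, 1    # step (6)
        self.state = 1                   # S0, the state of main()
        self.stopped = None              # reason monitoring ended (step (9)), if it has

    def on_entry(self, addr: int) -> None:                   # steps (8b)-(8e)
        if addr not in self.tabstart:
            self.stopped = "unknown function entry"
            return
        num = self.tabstart[addr][0]
        if num == 1:                                         # (8c)/(8d)
            self.state = 1
            return
        self.state = self.T.get((self.state, 0, num), 0)     # (8e)
        if self.state == 0:
            self.stopped = "illegal call"
            return
        self.cur_f = num

    def on_call(self, next_insn_addr: int) -> None:          # step (8g)
        self.tabret.append((next_insn_addr, self.cur_f, self.cur_b))

    def on_exit(self, ret_addr: int) -> None:                # steps (8i)-(8k)
        saved_addr, caller, block = self.tabret[-1]
        if saved_addr == 0:                                  # (8i): main() is returning
            self.stopped = "main returned"
        elif ret_addr != saved_addr:                         # (8j)
            self.stopped = "return address mismatch"
        else:                                                # (8k)
            self.state = self.T.get((self.state, 1, caller), 0)
            if self.state == 0:
                self.stopped = "illegal return"
                return
            self.cur_f, self.cur_b = caller, block
            self.tabret.pop()

    def on_block(self, block_addr: int) -> None:             # steps (8l)-(8o)
        entry = self.entry_of[self.cur_f]
        offset = block_addr - entry
        if offset == 0:                                      # (8m)/(8n)
            self.cur_b = 1
            return
        tabbb = self.tabstart[entry][1]
        for s in tabbb[self.cur_b].sucs:                     # (8o)
            if tabbb[s].offset == offset:
                self.cur_b = s
                return
        self.stopped = "illegal intra-function transfer"


# Toy program: main (1) at 0x1000 calls f2 (2) at 0x2000; the call returns to 0x1010.
transitions = {(1, 0, 2): 2, (2, 1, 1): 1}
tabstart = {
    0x1000: (1, {1: Row(1, 0x00, (2,)), 2: Row(2, 0x10, ())}),
    0x2000: (2, {1: Row(1, 0x00, (2,)), 2: Row(2, 0x08, ())}),
}
trace = [
    ("entry", 0x1000), ("block", 0x1000), ("call", 0x1010),   # main starts, calls f2
    ("entry", 0x2000), ("block", 0x2000), ("block", 0x2008),  # f2 runs its two blocks
    ("exit", 0x1010), ("block", 0x1010),                      # f2 returns into main
    ("exit", 0x7F0000001234),                                 # main returns (not checked)
]
m = Monitor(transitions, tabstart)
for kind, addr in trace:
    getattr(m, "on_" + kind)(addr)
print(m.stopped)   # main returned
```

In this sketch TABret effectively acts as a shadow call stack kept by the monitor. Step (8j) therefore rejects a return whose address was overwritten on the program's own stack, as in a stack buffer overflow, even if the new target is the start of a legal function.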

Compared with the prior art, the invention has the following advantages:

First, the invention uses a static binary analysis tool to extract the function call relationships in the monitored software and in its dependent libraries, and stores them in the function call graph G(E,F). It also extracts basic-block-level control flow information for every function in G(E,F). It can therefore monitor the control flow of the software's executable and also that of the dynamic libraries the software depends on. This overcomes the limitation of the prior art, which can monitor only the control flow of the software's source files, and makes the monitoring more comprehensive.

Second, the invention uses a static binary code implantation tool to load the dynamic library containing the monitoring code into the process address space of the monitored software. The software code is analyzed only statically, to find the implantation points, and the static implantation is completed before the software executes. This overcomes the high runtime overhead of the prior art and gives the invention low runtime overhead and a relatively simple implementation.


Brief Description of the Drawings

Fig. 1 is the overall flowchart of the invention;

Fig. 2 is a flowchart of the step of monitoring the running state of the software.

Detailed Description of the Embodiments

The invention is further described below with reference to the drawings.

The specific steps of the invention are further described with reference to Fig. 1.

Step 1, extract function call relationships.

Use a static binary analysis tool to extract the function call relationships in the executable of the monitored software and in the libraries it depends on. Store them in a function call graph data structure G(E,F), where E is the set of directed edges and F is the set of all functions in G(E,F).

Step 2, extract the control flow information inside functions.

Use a static binary analysis tool to extract the basic blocks and control flow information of each function in the function call graph G(E,F). Store this basic-block-level control flow information in a control flow graph data structure G(B,E), where E is the set of directed edges and B is the set of basic blocks in G(B,E).
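The two graphs produced by steps 1 and 2 can be held in very simple structures. The Python sketch below is an illustration only. The class and method names (CallGraph, ControlFlowGraph, add_call, callees, successors) are hypothetical. A real implementation would fill these structures from the static binary analysis tool's output rather than by hand.

```python
from dataclasses import dataclass, field


@dataclass
class CallGraph:
    """G(E, F): F is the set of functions, E the set of directed call edges."""
    functions: set = field(default_factory=set)   # F
    edges: set = field(default_factory=set)       # E, as (caller, callee) pairs

    def add_call(self, caller: str, callee: str) -> None:
        self.functions.update((caller, callee))
        self.edges.add((caller, callee))

    def callees(self, caller: str) -> set:
        return {dst for src, dst in self.edges if src == caller}


@dataclass
class ControlFlowGraph:
    """G(B, E) of one function: B holds basic-block start addresses, E the edges."""
    entry: int                                    # function entry address
    blocks: list = field(default_factory=list)    # B, entry block first
    edges: set = field(default_factory=set)       # E, as (src_addr, dst_addr) pairs

    def successors(self, block: int) -> list:
        return sorted(dst for src, dst in self.edges if src == block)


# Toy example: main calls parse and log; parse also calls log.
cg = CallGraph()
cg.add_call("main", "parse")
cg.add_call("main", "log")
cg.add_call("parse", "log")

cfg = ControlFlowGraph(entry=0x1000, blocks=[0x1000, 0x1010, 0x1024],
                       edges={(0x1000, 0x1010), (0x1000, 0x1024), (0x1010, 0x1024)})
print(sorted(cg.callees("main")))                   # ['log', 'parse']
print([hex(a) for a in cfg.successors(0x1000)])     # ['0x1010', '0x1024']
```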

Step 3, construct a finite state machine.

Add state 0 to the non-empty finite set S of states of the finite state machine.

Number the functions in the function call graph G(E,F); the i-th function receives number i.

Set the initial state S0 of the finite state machine to state 1, which corresponds to the main(·) function in G(E,F).

For each function f_i ∈ F, add state i both to the state set S and to the set A of final states of the finite state machine.

For any f_i ∈ F and f_j ∈ F, if there is a directed edge from function f_i to function f_j, add the following state transitions to the state transition function T:

$Z_0 = 0,\ Z_1 = j \;\rightarrow\; \mathrm{Nextstate}[i] = \text{state } j$

$Z_0 = 1,\ Z_1 = i \;\rightarrow\; \mathrm{Nextstate}[j] = \text{state } i$

where:
- Z₀ and Z₁ are the input letters of the finite state machine;
- Nextstate[i] is the next state of state i, and Nextstate[j] is the next state of state j;
- i and j are the numbers of functions f_i and f_j, respectively.

For every state whose next state is empty (undefined), set that next state to state 0.

For a function call graph G(E,F) containing N functions, the algorithm constructs a finite state machine FSM(Z,S,T,S0,A) with N+1 states. State 0 is the invalid state. The remaining N states correspond to the N functions in G(E,F). Each legal call or return between two functions maps to a transition between the corresponding states. Any illegal call or return drives the state machine into the invalid state.

The components of FSM(Z,S,T,S0,A) are:
- Z: the input alphabet. An input symbol consists of two letters, Z₀ and Z₁. Z₀ is 0 (call) or 1 (return), and Z₁ is an integer from 1 to N.
- S: the non-empty finite set of states.
- S0: the initial state.
- A: the set of final states.
- T: the state transition function T: S×Z→S.
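The following is a minimal Python sketch of this construction. It assumes the functions are already numbered 1..N, with main(·) as 1, and that the call graph is given as a set of (i, j) number pairs. The names CallReturnFSM, build_fsm and step are hypothetical. T is stored as a dictionary of only the transitions added in step (3e). Every missing entry is treated as a move to state 0, which is how the sketch realizes the "empty next state goes to state 0" rule.

```python
from dataclasses import dataclass

INVALID = 0  # state 0 is the invalid state


@dataclass
class CallReturnFSM:
    """FSM(Z, S, T, S0, A) for a call graph whose functions are numbered 1..N."""
    n: int                                   # N, the number of functions
    transitions: dict                        # T as {(state, Z0, Z1): next state}
    initial: int = 1                         # S0, the state of main()

    @property
    def states(self) -> set:
        return set(range(self.n + 1))        # S = {0, 1, ..., N}

    @property
    def final_states(self) -> set:
        return set(range(1, self.n + 1))     # A = {1, ..., N}

    def step(self, state: int, z0: int, z1: int) -> int:
        # A (state, input) pair with no recorded transition goes to state 0. State 0
        # never appears as a source in `transitions`, so once entered it is never left.
        return self.transitions.get((state, z0, z1), INVALID)


def build_fsm(n: int, call_edges: set) -> CallReturnFSM:
    """call_edges holds (i, j) pairs: function number i calls function number j."""
    transitions = {}
    for i, j in call_edges:
        transitions[(i, 0, j)] = j   # Z0 = 0, Z1 = j: Nextstate[i] = state j (call)
        transitions[(j, 1, i)] = i   # Z0 = 1, Z1 = i: Nextstate[j] = state i (return)
    return CallReturnFSM(n, transitions)


fsm = build_fsm(3, {(1, 2), (1, 3), (2, 3)})   # main (1) calls 2 and 3; 2 calls 3
s = fsm.step(fsm.initial, 0, 2)                # main calls f_2      -> state 2
s = fsm.step(s, 0, 3)                          # f_2 calls f_3       -> state 3
s = fsm.step(s, 1, 2)                          # f_3 returns to f_2  -> state 2
print(s, fsm.step(s, 0, 1))                    # 2 0: f_2 never calls main
```

The state machine alone only checks that a call or return follows some edge of G(E,F). Which caller a function must actually return to is tracked separately, through table TABret (steps (8g) to (8k)).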

Step 4, initialize the tables TABbb.

Each function in the function call graph G(E,F) has a control flow graph G(B,E) and an initialized table TABbb. The i-th row of TABbb corresponds to the i-th basic block in G(B,E). Each row contains three fields:
- index: the index of the i-th basic block;
- offset: the offset of the i-th basic block's entry address relative to the entry address of the function containing it;
- sucs: the indices of the i-th basic block's successor blocks.
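The following Python sketch builds one function's TABbb from its control flow graph. It assumes the analysis provides three inputs:
- the function entry address;
- the block start addresses, ordered so that the entry block comes first (block 1 then has offset 0, which step (8n) relies on);
- the successor edges, as pairs of 1-based block indices.

TabBBRow and build_tabbb are hypothetical names.

```python
from typing import NamedTuple


class TabBBRow(NamedTuple):
    index: int    # 1-based basic-block index; block 1 is the entry block
    offset: int   # block entry address minus function entry address
    sucs: tuple   # indices of the successor blocks


def build_tabbb(func_entry: int, block_starts: list, succ_edges: set) -> list:
    """block_starts[k - 1] is the start address of block k; succ_edges holds
    (src_index, dst_index) pairs taken from the function's G(B, E)."""
    rows = []
    for k, start in enumerate(block_starts, start=1):
        sucs = tuple(sorted(dst for src, dst in succ_edges if src == k))
        rows.append(TabBBRow(k, start - func_entry, sucs))
    return rows


# A diamond-shaped function at 0x401000: block 1 branches to 2 or 3, both reach 4.
tab = build_tabbb(0x401000, [0x401000, 0x401010, 0x401024, 0x401030],
                  {(1, 2), (1, 3), (2, 4), (3, 4)})
for row in tab:
    print(row.index, hex(row.offset), row.sucs)
```

For the diamond example the rows come out as (1, 0x0, (2, 3)), (2, 0x10, (4,)), (3, 0x24, (4,)) and (4, 0x30, ()). Offsets are relative to the function entry rather than absolute addresses, so the table stays valid wherever the module is loaded.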

Step 5, initialize the tables TABstart and TABret.

After the monitored software is loaded, obtain the entry addresses of all functions in the function call graph G(E,F).

Add the information of the i-th function to the i-th row of table TABstart. The i-th row of TABstart corresponds to the i-th function in G(E,F) and contains three fields:
- addr: the entry address of the i-th function;
- index: the function number of the i-th function;
- ptr: a pointer to the i-th function's table TABbb.

Add the values 0, -1, -1 as the first row of table TABret. Each row of TABret corresponds to a function in G(E,F) that is called while the monitored software executes.
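The following is a Python sketch of step 5. It assumes the runtime entry addresses were already collected after loading (step (5a)) and that each function's TABbb is available as a list. The row types and init_tables are hypothetical names, and a Python list reference stands in for the ptr field. The extra dictionary keyed by addr is an illustrative choice for the lookup in step (8b); the text does not prescribe it.

```python
from typing import NamedTuple


class TabStartRow(NamedTuple):
    addr: int      # runtime entry address of function i
    index: int     # function number i (main = 1)
    ptr: list      # stands in for the pointer to function i's TABbb


class TabRetRow(NamedTuple):
    ret_addr: int  # address of the instruction after the call; 0 marks the sentinel
    func: int      # number of the calling function; -1 in the sentinel
    block: int     # index of the calling basic block; -1 in the sentinel


def init_tables(entry_addrs: list, tabbbs: list):
    """entry_addrs[i - 1] and tabbbs[i - 1] belong to function number i."""
    tabstart = [TabStartRow(addr, i, tab)
                for i, (addr, tab) in enumerate(zip(entry_addrs, tabbbs), start=1)]
    by_addr = {row.addr: row for row in tabstart}   # lookup table for step (8b)
    tabret = [TabRetRow(0, -1, -1)]                 # step (5c): sentinel first row
    return tabstart, by_addr, tabret


tabstart, by_addr, tabret = init_tables([0x401000, 0x401200],
                                        [["main TABbb"], ["f2 TABbb"]])
print(by_addr[0x401200].index, tabret[-1])   # 2 TabRetRow(ret_addr=0, func=-1, block=-1)
```

Entry addresses are fixed only once the program and its libraries are loaded, for example under address space layout randomization. This is presumably why step (5a) collects them at load time rather than during the static analysis.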

Step 6, initialize the integer variables Cur_F and Cur_B.

Assign 1, the number of the main(·) function in G(E,F), to the integer variable Cur_F. Assign 1, the number of the first basic block of main(·), to the integer variable Cur_B.

Step 7, implant code.

Use a static binary code implantation tool to load the dynamic library containing the monitoring code into the process address space of the monitored software.

Look up the monitoring functions in the dynamic library by name, and construct call statements to them.

Use the code implantation tool to determine the code implantation points in the monitored software from the information in the function call graph G(E,F) and the control flow graph G(B,E).

The code implantation points are:
- the entry point BPatch_entry of each function in G(E,F);
- the function exit points BPatch_exit;
- the static call instruction points BPatch_subroutine within functions;
- the entry points BPatch_locBasicBlockEntry of the basic blocks in G(B,E).
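The Python sketch below makes this mapping concrete. It only plans where each monitoring call would go; it does not patch a binary. The point names are the ones listed above, and they match instrumentation-point constants of the Dyninst BPatch API. FunctionInfo, plan_implants and the handler names check_call, check_return, store_return_info and check_block are hypothetical. The calls themselves would be inserted with the binary implantation tool.

```python
from typing import NamedTuple


class FunctionInfo(NamedTuple):
    """Hypothetical per-function result of the static analysis in steps 1-2."""
    number: int
    entry: int
    exits: list          # addresses of ret instructions
    call_sites: list     # addresses of static call instructions
    block_starts: list   # entry addresses of the function's basic blocks


# Point names from the text, each mapped to the (hypothetical) monitoring function
# whose call is planted there.
HANDLERS = {
    "BPatch_entry": "check_call",                  # function-call legality check
    "BPatch_exit": "check_return",                 # function-return legality check
    "BPatch_subroutine": "store_return_info",      # push a row onto TABret
    "BPatch_locBasicBlockEntry": "check_block",    # intra-function control flow check
}


def plan_implants(funcs: list) -> list:
    """Return (address, point name, handler) triples, function by function."""
    plan = []
    for f in funcs:
        points = [(f.entry, "BPatch_entry")]
        points += [(a, "BPatch_exit") for a in f.exits]
        points += [(a, "BPatch_subroutine") for a in f.call_sites]
        points += [(a, "BPatch_locBasicBlockEntry") for a in f.block_starts]
        plan += [(addr, kind, HANDLERS[kind]) for addr, kind in points]
    return plan


plan = plan_implants([FunctionInfo(1, 0x1000, [0x1030], [0x100B], [0x1000, 0x1010])])
for addr, kind, handler in plan:
    print(hex(addr), kind, handler)
```

At a function's entry address, a function-entry point and a basic-block-entry point coincide. The sketch lists the function-entry check first, mirroring the order in which step (8a) is tested before step (8l).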

Implant the constructed call statements at the corresponding implantation points.

At the entry point (BPatch_entry) of each function in the function call graph G(E,F), a call to the call-relationship validity check function is implanted. This function looks up table TABstart by the function's entry address. If the address is in TABstart, it passes the corresponding function number to the finite state machine FSM, which checks whether a transition exists from its current state to the state corresponding to the received number. If such a transition exists, the function call is legal.

At the exit point (BPatch_exit) of each function in G(E,F), a call to the return-relationship validity check function is implanted. This function uses the return address of the ret instruction to consult table TABret. If the return address recorded in the last row of TABret is 0, the monitored software has reached the exit point of the main(·) function in G(E,F) and execution ends. Otherwise, the function checks whether the return address recorded in the last row of TABret equals the return address of the ret instruction. If it does, the function number recorded in that row is passed to the FSM, which checks whether a transition exists from its current state to the state corresponding to the received number; if so, the return is legal. In every other case, monitoring ends.

At each call instruction (BPatch_subroutine) of the functions in G(E,F), a call to the return-information storage function is implanted. When the call instruction is executed, this function stores the address of the instruction following the call, together with the values of Cur_F (the number of the function containing the call) and Cur_B, as a new last row of TABret.

At the entry point (BPatch_locBasicBlockEntry) of each basic block in the control flow graph G(B,E), a call to the intra-function control-flow validity check function is implanted. This function computes the offset of the basic block's entry address relative to the entry address of the function containing the block, and then uses Cur_F and Cur_B to consult that function's table TABbb. If the offset equals the relative offset of one of the successor blocks of the block identified by Cur_B, the control-flow transfer is legal and Cur_B is updated; otherwise, monitoring ends.
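The four implantation points described above can be summarized as a plan that pairs every site in G(E,F) and G(B,E) with the check routine implanted there. The following Python sketch is illustrative only: the point-type names are taken from the text (they appear to be Dyninst point types, but here they are plain strings), while the routine names and the `FuncInfo` layout are hypothetical, and no actual binary rewriting is performed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Point types named in the text, mapped to the check routine implanted there.
# The routine names are hypothetical placeholders.
HOOK_AT = {
    "BPatch_entry": "check_call",                 # call-relationship check (TABstart + FSM)
    "BPatch_exit": "check_return",                # return-relationship check (TABret + FSM)
    "BPatch_subroutine": "store_return",          # record return info in TABret
    "BPatch_locBasicBlockEntry": "check_block",   # intra-function check (TABbb)
}

@dataclass
class FuncInfo:
    entry: int                                          # entry address (node of G(E,F))
    exits: List[int] = field(default_factory=list)      # addresses of ret instructions
    calls: List[int] = field(default_factory=list)      # addresses of call instructions
    blocks: List[int] = field(default_factory=list)     # basic-block entries (nodes of G(B,E))

def plan_implants(funcs: Dict[str, FuncInfo]) -> List[Tuple[int, str, str]]:
    """Return (address, point type, routine) triples sorted by address."""
    plan = []
    for f in funcs.values():
        sites = ([(f.entry, "BPatch_entry")]
                 + [(a, "BPatch_exit") for a in f.exits]
                 + [(a, "BPatch_subroutine") for a in f.calls]
                 + [(a, "BPatch_locBasicBlockEntry") for a in f.blocks])
        plan.extend((addr, point, HOOK_AT[point]) for addr, point in sites)
    return sorted(plan)
```

Each function contributes one entry site plus one site per ret instruction, call instruction, and basic block, so the number of implanted calls grows with the size of the control flow graphs.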

Referring to FIG. 2, the flow of the step of monitoring the running state of the software is described in further detail.

Step 8: monitor the running state of the software.

(8a) Determine whether the monitored software has reached the entry of a function in the function call graph G(E,F). If so, go to step (8b); otherwise, go to step (8f).

(8b) Look up table TABstart by the function's entry address. If the address is in TABstart, pass the corresponding function number and the integer 0 to the finite state machine and go to step (8c); otherwise, go to step 9.

(8c) Determine whether the function number received by the finite state machine is 1. If so, go to step (8d); otherwise, go to step (8e).

(8d) Set the current state of the finite state machine to the initial state S0 and go to step (8a).

(8e) Determine whether the finite state machine can transition from its current state to the state corresponding to the received number. If so, the finite state machine moves to that state, the function number obtained in step (8b) is assigned to Cur_F, and execution goes to step (8a). Otherwise, the finite state machine moves to the invalid state 0 and execution goes to step 9.

(8f) Determine whether the monitored software has reached a call instruction inside a function in G(E,F). If so, go to step (8g); otherwise, go to step (8h).

(8g) Add to TABret the address of the instruction immediately following the call instruction, the number of the function containing the call instruction, and the index of the basic block containing it; then go to step (8a).

(8h) Determine whether the monitored software has reached the exit point of a function in G(E,F). If so, go to step (8i); otherwise, go to step (8l).

(8i) Determine whether the address stored in the last row of TABret is 0. If so, go to step 9; otherwise, go to step (8j).

(8j) Determine whether the return address in the ret instruction equals the address stored in the last row of TABret. If so, go to step (8k); otherwise, go to step 9.

(8k) Pass the function number recorded in the last row of TABret and the integer 1 to the finite state machine, and determine whether the finite state machine can transition from its current state to the state corresponding to the received number. If so, the finite state machine moves to that state; the function number in the last row of TABret is assigned to the integer variable Cur_F, the basic-block index stored in that row is assigned to the integer variable Cur_B, the last row is deleted from TABret, and execution goes to step (8a). Otherwise, the finite state machine moves to the invalid state 0 and execution goes to step 9.

(8l) When the monitored software reaches the entry of a basic block in the control flow graph G(B,E), compute the offset of the block's entry point relative to the entry point of the function containing the block.

(8m) Determine whether the computed offset is 0. If so, go to step (8n); otherwise, go to step (8o).

(8n) Assign the integer 1 to the variable Cur_B and go to step (8a).

(8o) Determine whether the offset equals the offset of one of the successor blocks of the basic block identified by Cur_B. If so, assign the index of that successor block to the integer variable Cur_B and go to step (8a); otherwise, go to step 9.

Step 9: end monitoring.
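Steps 8 and 9 can be condensed into four event handlers, one per implantation point. This is a minimal Python sketch under simplifying assumptions: TABstart is modeled as a dictionary from entry address to function number, each TABbb as a dictionary from block index to (offset, successor indices), the finite state machine as the transition dictionary T built from step (3e), and reaching Step 9 is signalled by raising a hypothetical `MonitorStop` exception. The basic-block handler receives both the block address and the containing function's entry address, as the description of the basic-block check suggests.

```python
from typing import Dict, List, Tuple

class MonitorStop(Exception):
    """Raised when control reaches Step 9 (end monitoring); the message gives the reason."""

class Monitor:
    def __init__(self, T: Dict[Tuple[int, int, int], int],
                 tab_start: Dict[int, int],
                 tab_bb: Dict[int, Dict[int, Tuple[int, List[int]]]]):
        self.T = T                      # FSM transitions {(state, Z0, Z1): next state}
        self.tab_start = tab_start      # TABstart as {entry address: function number}
        self.tab_bb = tab_bb            # {function number: {block index: (offset, successor indices)}}
        self.state = 1                  # initial state S0 = state of main()
        self.tab_ret = [(0, -1, -1)]    # TABret seeded with the sentinel row of (5c)
        self.cur_f, self.cur_b = 1, 1   # step (6)

    def _feed(self, z0: int, z1: int) -> None:
        # Undefined transitions lead to the invalid state 0, which ends monitoring.
        self.state = self.T.get((self.state, z0, z1), 0)
        if self.state == 0:
            raise MonitorStop("illegal transition in FSM")

    def on_entry(self, addr: int) -> None:                    # steps (8a)-(8e)
        f = self.tab_start.get(addr)
        if f is None:
            raise MonitorStop("entry address not in TABstart")
        if f == 1:                                            # main(): reset to S0
            self.state = 1
            return
        self._feed(0, f)                                      # Z0 = 0: call into f
        self.cur_f = f

    def on_call(self, next_insn_addr: int) -> None:          # steps (8f)-(8g)
        self.tab_ret.append((next_insn_addr, self.cur_f, self.cur_b))

    def on_exit(self, ret_addr: int) -> None:                 # steps (8h)-(8k)
        addr, f, b = self.tab_ret[-1]
        if addr == 0:
            raise MonitorStop("main() exited normally")
        if ret_addr != addr:
            raise MonitorStop("return address mismatch")
        self._feed(1, f)                                      # Z0 = 1: return to f
        self.cur_f, self.cur_b = f, b
        self.tab_ret.pop()

    def on_block(self, block_addr: int, func_entry: int) -> None:  # steps (8l)-(8o)
        offset = block_addr - func_entry
        if offset == 0:
            self.cur_b = 1
            return
        blocks = self.tab_bb[self.cur_f]
        for s in blocks[self.cur_b][1]:
            if blocks[s][0] == offset:
                self.cur_b = s
                return
        raise MonitorStop("illegal intra-function control transfer")
```

For example, with main() (number 1, entry 0x1000) calling foo (number 2, entry 0x2000) and T containing (1, 0, 2) → 2 and (2, 1, 1) → 1, the sequence on_entry(0x1000), on_call(0x1010), on_entry(0x2000), on_exit(0x1010) passes, whereas calling on_exit with any other return address raises MonitorStop, which is how an overwritten return address would be caught.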

Claims

1. A software runtime monitoring method based on binary code implantation technology, comprising the following steps:

(1) Extract the function call relationships:

Use a static binary analysis tool to extract the function call relationships from the executable file of the monitored software and from the library files it depends on, and store them in the function call graph data structure G(E,F), where E is the set of directed edges and F is the set of all functions in G(E,F);
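As an illustration of the data produced by step (1), the Python sketch below builds G(E,F) from direct call instructions that a disassembler has already recovered. The input format (function entry addresses plus (caller, target address) pairs) is an assumption; only calls whose constant target is a known function entry become edges, so indirect calls are not resolved by this sketch.

```python
from typing import Dict, List, Set, Tuple

def build_call_graph(entries: Dict[str, int],
                     direct_calls: List[Tuple[str, int]]
                     ) -> Tuple[Set[Tuple[str, str]], Set[str]]:
    """Return (E, F): E holds directed edges caller -> callee, F all functions."""
    by_addr = {addr: name for name, addr in entries.items()}   # entry address -> function
    F = set(entries)
    # Keep only calls whose target is the entry of a known function;
    # duplicate call sites collapse into a single edge.
    E = {(caller, by_addr[target])
         for caller, target in direct_calls if target in by_addr}
    return E, F
```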

(2) Extract the control flow information inside each function:

Use the static binary analysis tool to extract the basic blocks and control flow information of each function in G(E,F), and store the block-level control flow information in the control flow graph data structure G(B,E), where E is the set of directed edges and B is the set of basic blocks in G(B,E);
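Step (2) relies on the analysis tool to split each function into basic blocks. One common way to do this is the classic leader-based algorithm, sketched below in Python for a single function. The instruction encoding (`Insn` with a coarse `kind` field) is hypothetical, and treating a call instruction as the end of its block, so that the instruction after the call starts a new block, is an assumption of this sketch.

```python
from typing import Dict, List, NamedTuple, Optional

class Insn(NamedTuple):
    addr: int
    size: int
    kind: str                     # "seq", "jmp", "jcc", "call" or "ret" (hypothetical encoding)
    target: Optional[int] = None  # branch target for "jmp"/"jcc"

BLOCK_ENDERS = {"jmp", "jcc", "call", "ret"}

def build_cfg(insns: List[Insn]) -> Dict[int, List[int]]:
    """Split one function into basic blocks; return {block entry: [successor entries]}."""
    insns = sorted(insns)
    addrs = {i.addr for i in insns}
    # Leaders: the first instruction, in-function branch targets, and the
    # instruction following any block-ending instruction.
    leaders = {insns[0].addr}
    for i in insns:
        if i.kind in ("jmp", "jcc") and i.target in addrs:
            leaders.add(i.target)
        if i.kind in BLOCK_ENDERS and i.addr + i.size in addrs:
            leaders.add(i.addr + i.size)

    cfg: Dict[int, List[int]] = {}
    start = insns[0].addr
    for k, i in enumerate(insns):
        if i.addr in leaders:
            start = i.addr                      # a new block begins here
        nxt = i.addr + i.size
        last_in_block = (i.kind in BLOCK_ENDERS or nxt in leaders
                         or k == len(insns) - 1)
        if not last_in_block:
            continue
        succs = []
        if i.kind in ("jmp", "jcc") and i.target in addrs:
            succs.append(i.target)              # taken branch
        if i.kind in ("seq", "jcc", "call") and nxt in addrs:
            succs.append(nxt)                   # fall-through (after a call: return site)
        cfg[start] = succs
    return cfg
```

The keys of the result are the basic-block entry addresses; subtracting the function's entry address from them gives the offsets that step (4) stores in TABbb.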

(3) Construct a finite state machine:

(3a) Add state 0 to the non-empty finite set S of states of the finite state machine;

(3b) Number each function in the function call graph G(E,F) so that the i-th function has number i, and denote it $f_i$;

(3c) Set the initial state S0 of the finite state machine to state 1, the state corresponding to the main(·) function in G(E,F);

(3d) For each function $f_i \in F$, add state i to the state set S and to the final state set A of the finite state machine;

(3e) For any $f_i \in F$ and $f_j \in F$, if there is a directed edge from function $f_i$ to function $f_j$, add the following state transitions to the state transition function T:

$$Z_0 = 0,\; Z_1 = j \;\rightarrow\; \mathrm{Nextstate}[i] = \text{state } j$$

$$Z_0 = 1,\; Z_1 = i \;\rightarrow\; \mathrm{Nextstate}[j] = \text{state } i$$

where $Z_0$ and $Z_1$ are the input symbols of the finite state machine ($Z_0 = 0$ denotes a call and $Z_0 = 1$ a return, and $Z_1$ is a function number), $\mathrm{Nextstate}[i]$ and $\mathrm{Nextstate}[j]$ are the next states of state i and state j, and i and j are the numbers of functions $f_i$ and $f_j$, respectively;

(3f) For every state and input for which no next state has been defined, set the next state to state 0;
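Steps (3a) to (3f) can be sketched in Python as follows, encoding each input as the pair (Z0, Z1) and storing T as a dictionary keyed by (state, Z0, Z1). Rule (3f) is realized by a lookup that defaults to the invalid state 0 rather than by materializing every missing transition; this representation is an assumption made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Transitions = Dict[Tuple[int, int, int], int]

def build_fsm(n: int, edges: Iterable[Tuple[int, int]]):
    """Functions are numbered 1..n with main() = 1; edges are (i, j) for f_i -> f_j."""
    S: Set[int] = {0} | set(range(1, n + 1))  # (3a), (3d): invalid state 0 plus one state per function
    A: Set[int] = set(range(1, n + 1))        # (3d): every function state is a final state
    S0 = 1                                    # (3c): state of main()
    T: Transitions = {}
    for i, j in edges:                        # (3e)
        T[(i, 0, j)] = j                      # Z0 = 0 (call):   f_i calls f_j
        T[(j, 1, i)] = i                      # Z0 = 1 (return): f_j returns to f_i
    return S, S0, A, T

def next_state(T: Transitions, state: int, z0: int, z1: int) -> int:
    # (3f): any transition that was not added leads to the invalid state 0.
    return T.get((state, z0, z1), 0)
```

For example, with edges 1→2 and 2→3, main() may call $f_2$, but an attempt to enter $f_3$ directly from state 1 yields state 0.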

(4) Initialize the tables TABbb:

Each function in G(E,F) has its own control flow graph G(B,E) and its own table TABbb, which is initialized as follows: the i-th row of TABbb corresponds to the i-th basic block in G(B,E), and each row contains three fields, index, offset, and sucs, where index is the index of the i-th basic block, offset is the offset of the entry address of the i-th basic block relative to the entry address of the function containing it, and sucs lists the indices of the successor blocks of the i-th basic block;
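A possible construction of one function's TABbb, sketched in Python: the input maps each basic-block entry address to the entry addresses of its successors. Numbering the function's entry block first is an assumption of this sketch, chosen so that index 1 is the block at offset 0, as steps (6) and (8n) expect; the remaining blocks are numbered in address order.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, List[int]]   # (index, offset, sucs)

def build_tab_bb(func_entry: int, cfg: Dict[int, List[int]]) -> List[Row]:
    """Build TABbb rows for one function from {block entry: [successor entries]}."""
    # Entry block gets index 1; the remaining blocks follow in address order.
    order = [func_entry] + sorted(a for a in cfg if a != func_entry)
    index_of = {addr: k for k, addr in enumerate(order, start=1)}
    return [(index_of[a], a - func_entry, [index_of[s] for s in cfg[a]])
            for a in order]
```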

(5) Initialize the tables TABstart and TABret:

(5a) After the monitored software is loaded, obtain the entry addresses of all functions in the function call graph G(E,F);

(5b) Add the information of the i-th function to the i-th row of table TABstart, where the i-th row of TABstart corresponds to the i-th function in G(E,F) and contains three fields, addr, index, and ptr: addr is the entry address of the i-th function, index is its function number, and ptr is a pointer to the TABbb table of the i-th function;

(5c) Write the values 0, -1, -1 into the first row of table TABret; each row of TABret corresponds to one call of a function in G(E,F) made while the monitored software executes;
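Step (5) amounts to building a per-function lookup table and seeding TABret with a sentinel row. In this Python sketch the runtime entry addresses obtained in (5a) are taken as given, `ptr` is simply a reference to the function's TABbb list, and the TABret columns are assumed to be (return address, function number, basic-block index), the three values recorded in step (8g); the helper `lookup_start` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def init_tables(entry_addrs: Dict[int, int], tab_bb_of: Dict[int, list]):
    """entry_addrs: {function number: runtime entry address}; tab_bb_of: {function number: TABbb}."""
    # (5b): row i = (addr, index, ptr) for the i-th function.
    tab_start = [(entry_addrs[i], i, tab_bb_of[i]) for i in sorted(entry_addrs)]
    # (5c): sentinel row; a return address of 0 marks the exit of main().
    tab_ret: List[Tuple[int, int, int]] = [(0, -1, -1)]
    return tab_start, tab_ret

def lookup_start(tab_start, addr: int) -> Optional[int]:
    """Step (8b): return the function number for an entry address, or None."""
    for row_addr, index, _ptr in tab_start:
        if row_addr == addr:
            return index
    return None
```

The linear scan keeps the sketch close to the table layout; a dictionary keyed by addr would make the lookup in step (8b) constant-time.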

(6) Initialize the integer variables Cur_F and Cur_B:

Assign the number 1 of the main(·) function in the function call graph G(E,F) to the integer variable Cur_F, and assign the number 1 of the first basic block in the main(·) function to the integer variable Cur_B;

(7) Implant code:

(7a) Use a static binary code implantation tool to load the dynamic library containing the monitoring code into the process address space of the monitored software;

(7b) Look up each monitoring function in the dynamic library by name, and construct a call statement to it;

(7c) Use the code implantation tool to determine the code implantation points in the monitored software from the information in the function call graph G(E,F) and the control flow graph G(B,E);

(7d) Implant each constructed call statement at its corresponding implantation point;

(8) Monitor the running state of the software:

(8a) Determine whether the monitored software has reached the entry of a function in the function call graph G(E,F); if so, go to step (8b), otherwise go to step (8f);

(8b) Look up table TABstart by the function entry address; if the address is in TABstart, pass the corresponding function number and the integer 0 to the finite state machine and go to step (8c), otherwise go to step (9);

(8c) Determine whether the function number received by the finite state machine is 1; if so, go to step (8d), otherwise go to step (8e);

(8d) Set the current state of the finite state machine to the initial state S0 and go to step (8a);

(8e) Determine whether the finite state machine can transition from its current state to the state corresponding to the received number; if so, move the finite state machine to that state, assign the function number obtained in step (8b) to Cur_F, and go to step (8a); otherwise, move the finite state machine to the invalid state 0 and go to step (9);

(8f) Determine whether the monitored software has reached a call instruction inside a function in G(E,F); if so, go to step (8g), otherwise go to step (8h);

(8g) Add to TABret the address of the instruction following the call instruction, the number of the function containing the call instruction, and the index of the basic block containing it, and go to step (8a);

(8h) Determine whether the monitored software has reached the exit point of a function in G(E,F); if so, go to step (8i), otherwise go to step (8l);

(8i) Determine whether the address stored in the last row of TABret is 0; if so, go to step (9), otherwise go to step (8j);

(8j) Determine whether the return address in the ret instruction equals the address stored in the last row of TABret; if so, go to step (8k), otherwise go to step (9);

(8k) Pass the function number recorded in the last row of TABret and the integer 1 to the finite state machine, and determine whether the finite state machine can transition from its current state to the state corresponding to the received number; if so, move the finite state machine to that state, assign the function number in the last row of TABret to the integer variable Cur_F and the basic-block index stored in that row to the integer variable Cur_B, delete the last row of TABret, and go to step (8a); otherwise, move the finite state machine to the invalid state 0 and go to step (9);

(8l) When the monitored software reaches the entry of a basic block in the control flow graph G(B,E), compute the offset of the entry point of the basic block relative to the entry point of the function containing it;

(8m) Determine whether the computed offset is 0; if so, go to step (8n), otherwise go to step (8o);

(8n) Assign the integer 1 to the variable Cur_B and go to step (8a);

(8o) Determine whether the offset equals the offset of one of the successor blocks of the basic block identified by Cur_B; if so, assign the index of that successor block to the integer variable Cur_B and go to step (8a), otherwise go to step (9);

(9) End monitoring.