CN115146272A

CN115146272A - Vulnerability testing method and vulnerability testing device for browser

Info

Publication number: CN115146272A
Application number: CN202110340809.9A
Authority: CN
Inventors: 张道全; 李琪; 唐洪玉; 张鉴; 李存琛
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2022-10-04
Anticipated expiration: 2041-03-30
Also published as: CN115146272B

Abstract

The disclosure provides a vulnerability testing method and device for a browser. The vulnerability testing method comprises: selecting sample grammar data from a vulnerability grammar library and preprocessing it to obtain a training data set; training a deep learning network model on the training data set to obtain an initial test-sample set, the initial test-sample set comprising a plurality of initial first test samples; performing model prediction and assembly on at least some of the first test samples in the initial test-sample set to form a second test sample; inputting the second test sample into the browser so that the browser parses it, and monitoring the running state of the browser; and, when a fault in the running of the browser is detected, determining the second test sample to be an abnormal sample, generating abnormal log data, and saving the abnormal sample and the abnormal log data. The method can improve the efficiency of browser vulnerability detection.

Description

Vulnerability testing method and vulnerability testing device for browsers

Technical Field

The present disclosure relates to the field of information security, and in particular to a vulnerability testing method and a vulnerability testing device for browsers.

Background

With the development of network technology, more and more people learn about the world through the Internet. The browser is the most common tool people use to enter the online world, so browser security is very important. In a traditional browser vulnerability detection process, an HTML (HyperText Markup Language) file containing a syntax template is usually constructed first. A random strategy then generates sample sets by adding, deleting, and modifying content according to the grammar template. This large number of random operations gives the generated samples the following shortcomings in browser vulnerability detection:

Test code redundancy: because generation is based on random seeds and grammar rule templates, random testing tends to produce a large amount of duplicate code, which reduces detection efficiency.

For example, traditional test code draws on a large JavaScript syntax library. Suppose the library contains the following 6 lines of code:

    Var1=var1.add(1+2);
    Var2=var2.sub(obj.err.a-77623);
    Var3=new array(1,2,3,4,5,6,7);
    var var_rand1=var_dataview1[4];
    let var_rand2=var_string1;
    var var_rand3=var_arraybuf4[-1079235270];

Test code is then selected at random from this library. For example, the random selection might produce the following:

    Var1=var1.add(1+2);
    Var1=var1.add(1+2);
    Var1=var1.add(1+2);

However, the randomly selected lines repeat the same statement three times, so the test code is redundant and detection efficiency drops. In addition, most statements in such a library are syntactically correct. Testing the browser with correct code seldom crashes it, so vulnerabilities are hard to find, which further reduces detection efficiency.
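The random-selection strategy criticized above can be sketched in a few lines. This is a minimal illustration, not code from the disclosure: the seeded mulberry32-style generator and the helper names `randomSample` and `countRedundant` are assumptions added so that runs are reproducible. Because lines are drawn with replacement, drawing more lines than the library holds guarantees repeats: 10 draws from the 6-line library above must contain at least 4 duplicates.

```javascript
// The six example lines from the grammar library above.
const GRAMMAR_LIBRARY = [
  "Var1=var1.add(1+2);",
  "Var2=var2.sub(obj.err.a-77623);",
  "Var3=new array(1,2,3,4,5,6,7);",
  "var var_rand1=var_dataview1[4];",
  "let var_rand2=var_string1;",
  "var var_rand3=var_arraybuf4[-1079235270];",
];

// mulberry32-style seeded PRNG (assumption, for reproducibility):
// returns a function producing floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Traditional strategy: draw `count` lines uniformly, with replacement.
function randomSample(library, count, seed) {
  const rand = mulberry32(seed);
  const picked = [];
  for (let i = 0; i < count; i++) {
    picked.push(library[Math.floor(rand() * library.length)]);
  }
  return picked;
}

// Number of picked lines that repeat an earlier pick (wasted test effort).
function countRedundant(lines) {
  return lines.length - new Set(lines).size;
}
```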

Summary

One technical problem solved by the present disclosure is to provide a vulnerability testing method for browsers that improves detection efficiency.

According to one aspect of the present disclosure, a vulnerability testing method for a browser is provided, comprising: selecting sample grammar data from a vulnerability grammar library and preprocessing the sample grammar data to obtain a training data set; training a deep learning network model on the training data set to obtain an initial test-sample set, the initial test-sample set comprising a plurality of initial first test samples; performing model prediction and assembly on at least some of the plurality of first test samples in the initial test-sample set to form a second test sample; inputting the second test sample into the browser so that the browser parses the second test sample, and monitoring the running state of the browser; and, when a fault in the running of the browser is detected, determining the second test sample to be an abnormal sample, generating abnormal log data, and saving the abnormal sample and the abnormal log data.

In some embodiments, preprocessing the sample grammar data comprises: parsing the sample grammar data while retaining its fixed tag format; forming the functions and variables in the sample grammar data, word by word, into a first data set and deduplicating the functions and variables in the first data set; forming the equal signs, parentheses, and punctuation marks in the sample grammar data, character by character, into a second data set and deduplicating them; and merging the deduplicated first data set with the deduplicated second data set and vectorizing the merged data set to generate the training data set.

In some embodiments, deduplicating the variables in the first data set comprises: obtaining, from the first data set, a plurality of variables having the same name; if these variables include variables of the same type, deduplicating the variables of that type; and if they include variables of different types, renaming the variables of different types so that they can be distinguished.

In some embodiments, training the deep learning network model on the training data set comprises: importing the training data set into a gated recurrent unit (GRU) of the deep learning network model; and training on the training data set until the value of the neural network loss function is less than a predetermined value, thereby obtaining the initial test-sample set.

In some embodiments, the predetermined value is in the range 0.05 to 0.1.

In some embodiments, performing model prediction and assembly on at least some of the plurality of first test samples in the initial test-sample set comprises: using a variable of a first test sample as the starting character, i.e. the prediction starting point, to predict a starting statement; starting from that statement, selecting several lines of statements in sequence from the initial test-sample set and adding a try rule module and/or a catch rule module to each line; and treating those lines as one unit and assembling tags around the unit to form the second test sample.

According to another aspect of the present disclosure, a vulnerability testing apparatus for a browser is provided, comprising: a preprocessing unit configured to select sample grammar data from a vulnerability grammar library and preprocess it to obtain a training data set; a training unit configured to train a deep learning network model on the training data set to obtain an initial test-sample set, the initial test-sample set comprising a plurality of initial first test samples; a prediction and assembly unit configured to perform model prediction and assembly on at least some of the plurality of first test samples in the initial test-sample set to form a second test sample; a monitoring unit configured to input the second test sample into the browser so that the browser parses it, and to monitor the running state of the browser; and a determination unit configured to, when a fault in the running of the browser is detected, determine the second test sample to be an abnormal sample, generate abnormal log data, and save the abnormal sample and the abnormal log data.

In some embodiments, the preprocessing unit is configured to: parse the sample grammar data while retaining its fixed tag format; form the functions and variables in the sample grammar data, word by word, into a first data set and deduplicate the functions and variables in it; form the equal signs, parentheses, and punctuation marks in the sample grammar data, character by character, into a second data set and deduplicate them; and merge the deduplicated first and second data sets and vectorize the merged data set to generate the training data set.

In some embodiments, the preprocessing unit is configured to obtain, from the first data set, a plurality of variables having the same name; if these variables include variables of the same type, to deduplicate the variables of that type; and if they include variables of different types, to rename the variables of different types so that they can be distinguished.

In some embodiments, the training unit is configured to import the training data set into a gated recurrent unit (GRU) of the deep learning network model and to train on the training data set until the value of the neural network loss function is less than a predetermined value, thereby obtaining the initial test-sample set.

In some embodiments, the prediction and assembly unit is configured to: use a variable of a first test sample as the starting character, i.e. the prediction starting point, to predict a starting statement; starting from that statement, select several lines of statements in sequence from the initial test-sample set and add a try rule module and/or a catch rule module to each line; and treat those lines as one unit and assemble tags around the unit to form the second test sample.

According to another aspect of the present disclosure, a vulnerability testing apparatus for a browser is provided, comprising a memory and a processor coupled to the memory, the processor being configured to perform the method described above based on instructions stored in the memory.

According to another aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored; when executed by a processor, the instructions implement the method described above.

In the above vulnerability testing method, sample grammar data is selected from a vulnerability grammar library and preprocessed to obtain a training data set; a deep learning network model is trained on the training data set to obtain an initial test-sample set comprising a plurality of initial first test samples; model prediction and assembly are performed on at least some of the first test samples to form a second test sample; the second test sample is input into the browser so that the browser parses it, and the running state of the browser is monitored; and, when a fault in the running of the browser is detected, the second test sample is determined to be an abnormal sample, abnormal log data is generated, and the abnormal sample and the abnormal log data are saved. The method thus performs vulnerability testing on the browser. Because the method draws samples from a vulnerability grammar library and trains on them to obtain new samples that are likely to trigger vulnerabilities, using these samples for browser vulnerability testing can improve the efficiency of browser vulnerability detection.

Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.

Brief Description of the Drawings

The accompanying drawings, which form part of the specification, illustrate embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:

FIG. 1 is a flowchart of a vulnerability testing method for a browser according to some embodiments of the present disclosure;

FIG. 2 is a flowchart of a vulnerability testing method for a browser according to other embodiments of the present disclosure;

FIG. 3 is a schematic structural diagram of a vulnerability testing apparatus for a browser according to some embodiments of the present disclosure;

FIG. 4 is a schematic structural diagram of a vulnerability testing apparatus for a browser according to other embodiments of the present disclosure;

FIG. 5 is a schematic structural diagram of a vulnerability testing apparatus for a browser according to other embodiments of the present disclosure.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. Note that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.

It should also be understood that, for ease of description, the parts shown in the accompanying drawings are not drawn to scale.

The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure, its application, or its uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the specification.

In all examples shown and discussed herein, any specific value should be construed as merely exemplary and not as limiting. Other examples of the exemplary embodiments may therefore have different values.

Note that like reference numerals and letters denote like items in the following figures; once an item is defined in one figure, it need not be discussed further in subsequent figures.

FIG. 1 is a flowchart of a vulnerability testing method for a browser according to some embodiments of the present disclosure. As shown in FIG. 1, the method includes steps S102 to S110.

In step S102, sample grammar data is selected from the vulnerability grammar library and preprocessed to obtain a training data set.

In some embodiments, preprocessing the sample grammar data includes: parsing the sample grammar data while retaining its fixed tag format; forming the functions and variables in the sample grammar data, word by word, into a first data set and deduplicating the functions and variables in it; forming the equal signs, parentheses, and punctuation marks in the sample grammar data, character by character, into a second data set and deduplicating them; and merging the deduplicated first and second data sets and vectorizing the merged data set to generate the training data set. In this embodiment, deduplication can improve accuracy in the subsequent training process.
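A minimal sketch of this preprocessing in JavaScript follows, under stated assumptions: the fixed tags are kept as single tokens, identifiers and numeric literals form the word-level first data set, every other non-space character goes into the character-level second data set, and "vectorization" is modeled as mapping each distinct token to an integer index. The tokenizer is a simplification (it ignores string, comment, and regular-expression literals), and all function names are hypothetical.

```javascript
// Fixed tags retained as atomic tokens (the set listed in the text).
const TAG_RE = /^<\/?(?:html|head|title|script|body)>/i;
// Identifiers/keywords, or numeric literals such as 77623 or 2.3023e-320.
const WORD_RE = /^[A-Za-z_$][\w$]*|^\d+(?:\.\d+)?(?:e[+-]?\d+)?/;

// Split a PoC source into tag, word and single-character tokens.
function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    const space = /^\s+/.exec(rest);
    if (space) {
      rest = rest.slice(space[0].length); // whitespace is dropped
      continue;
    }
    const tag = TAG_RE.exec(rest);
    const word = tag ? null : WORD_RE.exec(rest);
    if (tag || word) {
      const text = (tag || word)[0];
      tokens.push({ text, kind: tag ? "tag" : "word" });
      rest = rest.slice(text.length);
    } else {
      // '=', brackets, punctuation, operators: one token per character.
      tokens.push({ text: rest[0], kind: "char" });
      rest = rest.slice(1);
    }
  }
  return tokens;
}

// Deduplicate words (first data set) and characters (second data set) with
// Sets, merge them with the retained tags, and index each distinct token.
function buildVocabulary(sources) {
  const tags = new Set();
  const words = new Set();
  const chars = new Set();
  for (const src of sources) {
    for (const t of tokenize(src)) {
      if (t.kind === "tag") tags.add(t.text);
      else if (t.kind === "word") words.add(t.text);
      else chars.add(t.text);
    }
  }
  const merged = [...tags, ...words, ...chars];
  return new Map(merged.map((tok, i) => [tok, i]));
}

// "Vectorize" one source as a sequence of vocabulary indices.
// Tokens missing from the vocabulary map to undefined.
function vectorize(source, vocab) {
  return tokenize(source).map((t) => vocab.get(t.text));
}
```

For example, building the vocabulary from `<script>var1=var1.add(1+2);</script>` twice gives the same 12 distinct tokens as building it once, which is the deduplication this step calls for; a neural model would then consume the integer sequences from `vectorize`, typically through an embedding layer or one-hot encoding.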

In some embodiments, the fixed tag format may include the <html> tag and/or the <script> tag. Optionally, it may also include the <title> tag and/or the <head> tag.

In some embodiments, the function items mentioned above may include function names, function methods, function parameters, and the like.

In some embodiments, deduplicating the variables in the first data set includes: obtaining, from the first data set, a plurality of variables having the same name; if these variables include variables of the same type, deduplicating the variables of that type; and if they include variables of different types, renaming each of them to distinguish the different types.

That is, among variables sharing a name, those of the same type are deduplicated so that only one of them is kept, while those of different types are renamed, for example to variable_1, variable_2, and so on, so that the different types remain distinguishable. This achieves deduplication while avoiding the data loss that would result from also merging variables of different types.
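The type-aware deduplication rule can be sketched as follows. It assumes each variable has already been annotated with a type (for example, inferred from its initializer); how types are determined is not specified in the text, and `dedupeVariables` is a hypothetical name. The `_1`, `_2` suffix scheme follows the example above; a production version would also need to check that a generated name such as `var2_1` is not already in use.

```javascript
// Type-aware deduplication of same-named variables.
// Input: [{ name, type }, ...]; the type annotation is assumed to be given.
function dedupeVariables(vars) {
  // name -> (type -> first variable seen with that name and type)
  const byName = new Map();
  for (const v of vars) {
    if (!byName.has(v.name)) byName.set(v.name, new Map());
    const byType = byName.get(v.name);
    // Same name and same type: keep only the first occurrence.
    if (!byType.has(v.type)) byType.set(v.type, v);
  }
  const result = [];
  for (const [name, byType] of byName) {
    if (byType.size === 1) {
      result.push(...byType.values());
    } else {
      // Same name but different types: rename to name_1, name_2, ...
      let n = 1;
      for (const v of byType.values()) {
        result.push({ ...v, name: `${name}_${n}` });
        n += 1;
      }
    }
  }
  return result;
}
```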

In step S104, a deep learning network model is trained on the training data set to obtain an initial test-sample set, which includes a plurality of initial first test samples. A first test sample is an initial test sample.

In some embodiments, step S104 includes: importing the training data set into the gated recurrent unit (GRU) of the deep learning network model; and training on the training data set until the value of the neural network loss function is less than a predetermined value, thereby obtaining the initial test-sample set. For example, the predetermined value may range from 0.05 to 0.1, e.g. 0.1. Those skilled in the art will understand that this value is only exemplary; embodiments of the present disclosure may use predetermined values in other ranges, so the present disclosure is not limited thereto. In this embodiment, the initial test-sample set produced by this training process has high value for testing.
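The stopping rule in step S104 can be expressed as a training loop that ends once the loss drops below the predetermined value (0.1 here, within the 0.05 to 0.1 range given above). This is a generic sketch: `trainEpoch` is a hypothetical callback standing in for one pass of GRU training, and the `maxEpochs` cap is an added safeguard, not part of the disclosure, so that a model that never reaches the threshold still terminates.

```javascript
// Train until the loss falls below the threshold (the stopping rule of S104).
// trainEpoch(epoch) is a hypothetical callback: one pass over the training
// data set, returning that pass's mean loss.
function trainUntilThreshold(trainEpoch, threshold = 0.1, maxEpochs = 1000) {
  const history = [];
  for (let epoch = 1; epoch <= maxEpochs; epoch++) {
    const loss = trainEpoch(epoch);
    history.push(loss);
    if (loss < threshold) {
      return { converged: true, epochs: epoch, history };
    }
  }
  // Safeguard (not from the disclosure): give up after maxEpochs passes.
  return { converged: false, epochs: maxEpochs, history };
}
```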

Here, the deep learning network model may be a recurrent neural network (RNN) model known to those skilled in the art. RNNs have been applied to fields such as text prediction; because test samples are also token sequences, the same technique is well suited to generating samples for browser vulnerability fuzzing. In the above step, the character-stream data set, processed in batches, is imported into the gated recurrent unit and trained with a neural network loss below 0.1 as the training target, which can produce an initial test-sample set of high testing value.
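For readers unfamiliar with the gated recurrent unit, the sketch below computes one GRU time step in plain JavaScript. It is illustrative only: it uses the update rule h' = (1 - z) * h + z * h~ (some libraries swap the roles of z and 1 - z), stores weights as nested arrays, and omits the embedding layer, softmax output layer, and backpropagation-through-time training that a complete character-level model would need. The parameter names `Wz`, `Uz`, `bz`, and so on are assumptions.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Computes W x + U h + b for W: [H][X], U: [H][H], b: [H].
function affine(W, x, U, h, b) {
  return b.map((bi, i) => {
    let s = bi;
    for (let j = 0; j < x.length; j++) s += W[i][j] * x[j];
    for (let j = 0; j < h.length; j++) s += U[i][j] * h[j];
    return s;
  });
}

// One GRU step: input vector x, previous hidden state h -> new hidden state.
function gruStep(p, x, h) {
  const z = affine(p.Wz, x, p.Uz, h, p.bz).map(sigmoid); // update gate
  const r = affine(p.Wr, x, p.Ur, h, p.br).map(sigmoid); // reset gate
  const rh = h.map((hi, i) => r[i] * hi); // reset-gated previous state
  const hCand = affine(p.Wh, x, p.Uh, rh, p.bh).map(Math.tanh); // candidate
  // Interpolate between the previous state and the candidate state.
  return h.map((hi, i) => (1 - z[i]) * hi + z[i] * hCand[i]);
}
```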

In step S106, model prediction and assembly are performed on at least some of the first test samples in the initial test-sample set to form a second test sample. The second test sample is a complete test sample used for browser vulnerability fuzzing.
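The "model prediction" half of step S106 amounts to sampling from the trained model, starting from a seed. The sketch below assumes a hypothetical `nextTokenProbs(tokens)` callback that returns one probability per entry in `vocabList` (in practice, the softmax output of the trained GRU), and stops at the first `;` so that each call yields one statement. Tokens are joined with single spaces, which keeps the result valid JavaScript at the cost of unusual formatting.

```javascript
// Pick an index from a probability vector using a uniform draw in [0, 1).
function sampleIndex(probs, rand) {
  const r = rand();
  let acc = 0;
  for (let i = 0; i < probs.length; i++) {
    acc += probs[i];
    if (r < acc) return i;
  }
  return probs.length - 1; // guard against rounding when sum(probs) < 1
}

// Extend the seed (e.g. a variable taken from a first test sample) token by
// token until a statement terminator is sampled or maxLen is reached.
function predictStatement(seedTokens, nextTokenProbs, vocabList, rand, maxLen = 64) {
  const tokens = [...seedTokens];
  while (tokens.length < maxLen) {
    const tok = vocabList[sampleIndex(nextTokenProbs(tokens), rand)];
    tokens.push(tok);
    if (tok === ";") break; // one statement per call
  }
  return tokens.join(" ");
}
```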

In some embodiments, step S106 includes: using a variable of a first test sample as the starting character, i.e. the prediction starting point, to predict a starting statement; starting from that statement, selecting several lines of statements in sequence from the initial test-sample set and adding a try rule module (a try{} block) and/or a catch rule module (a catch() block) to each line; and treating those lines as one unit and assembling tags around it (e.g. <html>, <head>, <script>, or <body> tags) to form the second test sample. Adding try{}catch() to each line prevents the whole sample from being discarded when one line fails, so the trained test samples can be used fully and efficiently.

For example, consider the following lines:

    Line1. try{var_arraybuf1.length=1;}catch(e){}
    Line2. try{var_dataview2.length=8;}catch(e){}
    Line3. try{var_arraybuf2.__proto__=2.3023e-320;}catch(e){}
    Line4. try{var_int2=var_int3.supportedLocalesOf(var_date1[55073]);}catch(e){}

Without try{}catch(), if Line2 fails (for example, it throws an error because var_dataview2 is undefined), execution stops at Line2, the sample is discarded, and Line3 and Line4 are never run. With try{}catch(), the error raised by Line2 is caught and ignored, and the browser continues with Line3 and Line4, so the sample runs through to its last statement. Adding try{}catch() to each statement of a test sample therefore prevents a single failing line from wasting the whole sample. Note that try{}catch() only handles errors thrown at run time; a line that is not syntactically valid JavaScript still causes the browser to reject the entire script block when parsing it.
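The assembly half of step S106 can be sketched as plain string building. The tag skeleton below is an assumption (the text lists <html>, <head>, <script>, and <body> only as examples), and `wrapStatement` and `assembleSample` are hypothetical names. The sketch does not escape a statement that happens to contain the text `</script>`, which would end the script block early.

```javascript
// Wrap one predicted statement so a runtime error does not stop later lines.
function wrapStatement(stmt) {
  return `try{${stmt}}catch(e){}`;
}

// Treat the statements as one unit and assemble fixed tags around it.
function assembleSample(statements) {
  const body = statements.map(wrapStatement).join("\n");
  return [
    "<html>",
    "<head>",
    "<script>",
    body,
    "</script>",
    "</head>",
    "<body></body>",
    "</html>",
  ].join("\n");
}
```

The effect of the per-statement wrapping can be checked directly in Node.js: running `new Function(...)` on the wrapped lines `log.push(1);`, `notDefinedAnywhere();`, and `log.push(3);` leaves `log` equal to `[1, 3]`, because the ReferenceError thrown by the second line is caught and execution continues.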

In step S108, the second test sample is input into the browser so that the browser parses it, and the running state of the browser is monitored.

For example, a Python script can be used to keep the browser running while monitoring it, and to feed test samples to the browser automatically. The script may be an existing one published on the GitHub platform that keeps the browser in a running state.
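The disclosure uses an existing Python script for this step; the sketch below is a swapped-in Node.js equivalent, not that script. It launches the target on one sample with `child_process.spawnSync` and classifies the outcome: a process killed by a signal (for example SIGSEGV) is treated as a crash, and a browser still running when the timeout expires is treated as having survived the sample. It assumes POSIX signal semantics (on Windows a crash surfaces as an exit code instead). Multi-process browsers may survive a renderer crash without the launched process dying, so a real harness would also need to watch child processes or crash logs.

```javascript
const { spawnSync } = require("child_process");

// Run the target (e.g. a browser binary) on one sample file and classify
// the outcome. POSIX semantics assumed: a crash shows up as a signal.
function runSample(command, args, samplePath, timeoutMs = 10000) {
  const result = spawnSync(command, [...args, samplePath], {
    timeout: timeoutMs, // the child is killed with SIGTERM after this
    stdio: "ignore",
  });
  if (result.error && result.error.code === "ETIMEDOUT") {
    // Still running at the deadline: treat the sample as survived.
    return { outcome: "survived", signal: null, status: null };
  }
  if (result.error) throw result.error; // e.g. ENOENT: binary not found
  if (result.signal) {
    // Killed by a signal such as SIGSEGV or SIGABRT: a likely crash.
    return { outcome: "crashed", signal: result.signal, status: null };
  }
  return { outcome: "exited", signal: null, status: result.status };
}
```

For example, `runSample("/path/to/browser", [], "samples/1.html", 10000)` (a hypothetical path) returns `{ outcome: "crashed", signal: "SIGSEGV", status: null }` if the launched process is terminated by a segmentation fault while handling the sample. `spawnSync` blocks, so running several browsers in parallel would call for the asynchronous `spawn` instead.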

In step S110, when a fault in the running of the browser is detected, the second test sample is determined to be an abnormal sample, abnormal log data is generated, and the abnormal sample and the abnormal log data are saved.

For example, the msec.py script in the Basic Fuzzing Framework (BFF) from CERT/CC (the US Computer Emergency Response Team Coordination Center) can record the browser crash and the state of each CPU (central processing unit) register at the time of the crash; this information can serve as the abnormal log data. The register state is saved, together with the PoC (proof of concept) code that crashed the browser, in a crashed folder. For easy review, files can be named 1.html, 1.log; 2.html, 2.log; and so on, where each .html file holds the JavaScript code (the abnormal sample) and the matching .log file holds the CPU register state at the time of the crash (the abnormal log data).
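Saving the crash artifacts with the naming rule above (1.html/1.log, 2.html/2.log, ...) can be sketched as below. The sketch does not reproduce BFF or msec.py; it assumes the caller already has the crash description and register dump as text (`logText`), and `saveCrash` is a hypothetical helper. The next index is one more than the largest N already used by an N.html file, so earlier results are never overwritten.

```javascript
const fs = require("fs");
const path = require("path");

// Save a crashing sample and its log as N.html / N.log in crashedDir.
function saveCrash(crashedDir, sampleHtml, logText) {
  fs.mkdirSync(crashedDir, { recursive: true });
  // Next index = 1 + the largest N already used by an N.html file.
  let maxIndex = 0;
  for (const name of fs.readdirSync(crashedDir)) {
    const m = /^(\d+)\.html$/.exec(name);
    if (m) maxIndex = Math.max(maxIndex, Number(m[1]));
  }
  const n = maxIndex + 1;
  const htmlPath = path.join(crashedDir, `${n}.html`); // abnormal sample
  const logPath = path.join(crashedDir, `${n}.log`); // abnormal log data
  fs.writeFileSync(htmlPath, sampleHtml);
  fs.writeFileSync(logPath, logText);
  return { htmlPath, logPath };
}
```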

In some embodiments, the above test results may also be returned to a client.

This completes the description of the vulnerability testing method for browsers according to some embodiments of the present disclosure. The method includes: selecting sample grammar data from a vulnerability grammar library and preprocessing it to obtain a training data set; training a deep learning network model on the training data set to obtain an initial test-sample set comprising a plurality of initial first test samples; performing model prediction and assembly on at least some of the first test samples to form a second test sample; inputting the second test sample into the browser so that the browser parses it, and monitoring the running state of the browser; and, when a fault in the running of the browser (for example, a crash) is detected, determining the second test sample to be an abnormal sample, generating abnormal log data, and saving the abnormal sample and the abnormal log data. The method thus performs vulnerability testing on the browser. Because the method draws samples from a vulnerability grammar library and trains on them to obtain new samples that are likely to trigger vulnerabilities, using these samples for browser vulnerability testing can improve the efficiency of browser vulnerability detection.

In some embodiments, a detected browser crash is recorded so that its cause can be analyzed. For example, because the abnormal sample and the abnormal log data are saved, the HTML file in the crashed folder can be reopened in the browser for detailed analysis: determining the cause of the crash, and whether the crash vulnerability leads to a heap overflow, allows code execution, and so on.

The inventors of the present disclosure found that the prior art also suffers from insufficient sample depth, poor aggregation of syntactic context, and poor aggregation of grammar rules. Sample depth depends on the number of grammar entries in the grammar library; if the library contains few entries, the test depth is insufficient. Poor aggregation of syntactic context: in the generated samples, the contextual relationships between statements depend entirely on the grammar rule template, so effective combinations of syntactic relationships cannot form. Poor aggregation of grammar rules: because generation is based on random seeds and grammar rule templates, the variables, attributes, function relationships, parameters, and so on produced by fuzzing will very likely not correspond correctly to one another, which stops the browser's parsing and reduces detection efficiency.

The above method of the embodiments of the present disclosure addresses these problems.

- Prediction with an RNN algorithm can make the generated JavaScript conform more closely to the grammar rules.
- Training uses a large number of samples of known vulnerabilities.
- The training stop condition requires a small loss value, for example a neural network loss function value below 0.1.

As a result, the trained samples are less likely to show redundant test code, insufficient sample depth, poor syntactic-environment aggregation, or poor grammar-rule aggregation. The method can therefore mitigate the above problems of the prior art as far as possible.

Furthermore, training is based on samples from the vulnerability library. Fuzzing browsers with samples predicted by a model trained on vulnerable samples can therefore increase the probability of discovering browser vulnerabilities.

FIG. 2 is a flowchart illustrating a vulnerability testing method for a browser according to other embodiments of the present disclosure.

First, sample preprocessing is performed.

For example, as shown in FIG. 2, PoC (proof-of-concept) sample data from a public vulnerability library is selected as the training template, for instance in groups of 4,000 PoC samples. When a sample is parsed, the fixed tag format of the HTML sample is retained. Examples of retained tags are <html>, <title>, <head>, and <script>.

As shown in FIG. 2, the preprocessing then builds two data sets and deduplicates each one:

- The variables, function names, function methods, function parameters, etc. of the JS (JavaScript) code are split at word granularity to form a first data set.
- Equal signs, parentheses, punctuation marks, etc. are split at character granularity to form a second data set.

Then, as shown in FIG. 2, the first data set and the second data set are merged into one data set and vectorized to obtain the data set to be trained. This data set may also be called a character sequence.
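The word/character split, deduplication, and merge described above can be sketched as follows. This is a minimal illustration under assumptions that the disclosure does not state: the regular expressions, the treatment of integer literals as words, and the helper names extract_scripts, split_tokens, and build_vocabulary are all hypothetical. Only the contents of <script> elements are tokenized. The surrounding HTML tags are kept aside as the fixed template.

```python
from __future__ import annotations

import re

# Hypothetical patterns: identifiers and integer literals count as "words";
# any other non-whitespace character (=, (, ), ;, ...) is a one-character token.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")
WORD_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+")
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.S | re.I)


def extract_scripts(html: str) -> list[str]:
    """Return the bodies of all <script> elements in a PoC page."""
    return SCRIPT_RE.findall(html)


def split_tokens(js: str) -> tuple[list[str], list[str]]:
    """Split JS source into a deduplicated word set and character set."""
    words: list[str] = []
    chars: list[str] = []
    for tok in TOKEN_RE.findall(js):
        (words if WORD_RE.fullmatch(tok) else chars).append(tok)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(words)), list(dict.fromkeys(chars))


def build_vocabulary(html_samples: list[str]) -> list[str]:
    """Merge the first (word) and second (character) data sets of all samples."""
    first: list[str] = []
    second: list[str] = []
    for html in html_samples:
        for script in extract_scripts(html):
            w, c = split_tokens(script)
            first += w
            second += c
    return list(dict.fromkeys(first)) + list(dict.fromkeys(second))
```

In a real corpus, JavaScript keywords such as var and function would also land in the word set. The disclosure does not say whether they should be kept there.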

In some embodiments, one-hot encoding can be used for vectorization. For example:

First data set: array1[] = {var_1, var_2, var_3, var_4},

Second data set: array2[] = {'=', '(', ')'},

Merged: array[] = {var_1, var_2, var_3, var_4, '=', '(', ')'},

Vectorized array = one-hot(array).

Using one-hot encoding here can improve accuracy in the subsequent prediction process.
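A minimal sketch of the one-hot step, using the merged array from the example above. The helper names one_hot_table and encode are hypothetical. Each vocabulary entry gets a vector of length len(array) with a single 1 at its own index.

```python
from __future__ import annotations


def one_hot_table(vocab: list[str]) -> dict[str, list[int]]:
    """Map each (already deduplicated) vocabulary entry to its one-hot vector."""
    size = len(vocab)
    return {tok: [1 if j == i else 0 for j in range(size)] for i, tok in enumerate(vocab)}


def encode(tokens: list[str], vocab: list[str]) -> list[list[int]]:
    """One-hot encode a token sequence; an unknown token raises KeyError."""
    table = one_hot_table(vocab)
    return [table[t] for t in tokens]


array1 = ["var_1", "var_2", "var_3", "var_4"]  # first data set
array2 = ["=", "(", ")"]                         # second data set
array = array1 + array2                          # merged data set

vectors = one_hot_table(array)
print(vectors["="])  # [0, 0, 0, 0, 1, 0, 0]
```

For large vocabularies, many frameworks take integer indices and a learned embedding instead of explicit one-hot vectors. The two are equivalent inputs to the first layer, because multiplying a one-hot vector by a weight matrix simply selects one row.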

Next, as shown in FIG. 2, a known recurrent neural network (RNN) algorithm is used for model training to obtain the initial sample set to be tested. For example, the batched character-stream data set is fed into a GRU (gated recurrent unit). Training is considered complete when the value of the neural network loss function falls below 0.1. The target samples generated in this way (i.e., the initial sample set to be tested) are of high value for testing.
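The stopping rule can be separated from the model itself: train until the loss drops below 0.1, or below a predetermined value in the 0.05 to 0.1 range. The sketch below assumes a hypothetical step callable. It runs one epoch of GRU training and returns the mean loss; it could, for example, be built on a framework layer such as PyTorch's torch.nn.GRU. The max_epochs cap is an added safeguard that the disclosure does not mention.

```python
from __future__ import annotations

from typing import Callable


def train_until_threshold(
    step: Callable[[int], float],
    threshold: float = 0.1,
    max_epochs: int = 1000,
) -> tuple[int, float]:
    """Call step(epoch) until the returned loss is below threshold.

    Returns (epoch, loss) for the first epoch that meets the target.
    Raises RuntimeError if max_epochs is reached first.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = step(epoch)  # one hypothetical GRU training epoch
        if loss < threshold:
            return epoch, loss
    raise RuntimeError(f"loss {loss:.4f} not below {threshold} after {max_epochs} epochs")


if __name__ == "__main__":
    # Toy stand-in for a real training epoch: loss decays as 1/epoch.
    print(train_until_threshold(lambda e: 1.0 / e))        # first epoch below 0.1 is 11
    print(train_until_threshold(lambda e: 1.0 / e, 0.05))  # first epoch below 0.05 is 21
```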

Next, as shown in FIG. 2, model prediction and assembly processing are performed on at least some of the first samples to be tested in the initial sample set. This forms a second sample to be tested.

For example, the assembly can proceed as follows:

1. A variable of a first sample to be tested, such as "var1=", is taken as the starting characters. Other variables can of course be used. A starting statement is predicted from these characters.
2. Beginning at the starting statement, a number of lines (for example, 1,000 lines) are selected in order from the initial sample set to be tested. Each line is wrapped in a try{}catch() block, so that an error in one line does not prevent the browser from executing the others.
3. These lines are treated as one unit and wrapped in <html> tags to form a complete second sample to be tested. <head>, <script>, or <body> tags can of course also be assembled.
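The assembly step can be sketched as below, with several assumptions:

- The statements list stands in for the model's generated lines. The disclosure predicts these lines from the seed; this sketch simply starts at the first line that begins with the seed.
- The HTML template and the helper name assemble_sample are hypothetical.
- Each line is wrapped as try { ... } catch (e) {}. A bare catch() with empty parentheses is not valid JavaScript, so the sketch binds the exception to e.

```python
from __future__ import annotations

# Hypothetical page template; only {body} is substituted.
HTML_TEMPLATE = """<html>
<head><title>sample</title></head>
<body>
<script>
{body}
</script>
</body>
</html>
"""


def assemble_sample(statements: list[str], seed: str = "var1=", max_lines: int = 1000) -> str:
    """Build one HTML test case from generated JS statements.

    Starts at the first statement beginning with `seed`, takes up to
    `max_lines` consecutive statements, and wraps each one in try/catch so
    that a runtime exception in one line does not stop the following lines.
    """
    start = next((i for i, s in enumerate(statements) if s.startswith(seed)), None)
    if start is None:
        raise ValueError(f"no statement starts with {seed!r}")
    chosen = statements[start:start + max_lines]
    body = "\n".join(f"try {{ {s} }} catch (e) {{}}" for s in chosen)
    return HTML_TEMPLATE.format(body=body)
```

Two JavaScript details limit what the wrapper can do:

- try/catch only intercepts errors thrown at run time. A line with a syntax error still prevents the whole <script> element from running.
- Wrapping a let or const declaration in a block makes that variable invisible to later lines. var declarations, by contrast, stay function-scoped and remain visible.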

Next, as shown in FIG. 2, a Python script monitors the browser, keeps it running, and automatically feeds it the second samples to be tested. If the browser handles a sample normally, the sample is passed. If a browser crash is detected, the crash is recorded, its cause is analyzed in a preliminary way, and the abnormal sample is saved.
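A minimal sketch of the Python monitoring loop follows. It assumes the browser can be launched as a command that takes the sample path as its last argument. browser_cmd, run_sample, and run_all are hypothetical names; browser_cmd might, for instance, be a browser binary plus flags.

The sketch classifies each run as follows:

- A run that exits normally, or is still alive at the timeout, counts as having survived.
- On POSIX, subprocess reports a process killed by a signal with a negative return code (e.g. -11 for SIGSEGV). The sketch treats this as a crash and copies the sample plus a log file into a crashed folder.

This only detects crashes of the launched process itself. A multi-process browser whose renderer crashes while the main process stays alive would need a different signal, such as crash-dump monitoring.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def is_crash(returncode: int | None) -> bool:
    """POSIX convention: a negative return code means killed by signal -returncode."""
    return returncode is not None and returncode < 0


def run_sample(browser_cmd: list[str], sample: Path, crashed_dir: Path,
               timeout: float = 30.0) -> str:
    """Feed one sample to the browser; return "crash" or "ok"."""
    sample = Path(sample)
    try:
        proc = subprocess.run(
            [*browser_cmd, str(sample)],
            capture_output=True, text=True, errors="replace", timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Still running at the deadline (subprocess.run has killed it): no crash seen.
        return "ok"
    if not is_crash(proc.returncode):
        return "ok"
    crashed_dir = Path(crashed_dir)
    crashed_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(sample, crashed_dir / sample.name)  # keep the abnormal sample
    (crashed_dir / f"{sample.stem}.log").write_text(  # and its abnormal log data
        f"sample={sample.name}\nreturncode={proc.returncode}\n--- stderr ---\n{proc.stderr}"
    )
    return "crash"


def run_all(browser_cmd: list[str], samples: list[Path], crashed_dir: Path) -> dict[str, str]:
    """Feed samples one by one and collect each sample's status."""
    return {Path(s).name: run_sample(browser_cmd, s, crashed_dir) for s in samples}
```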

So far, vulnerability testing methods for a browser according to other embodiments of the present disclosure have been provided. The method works as follows:

1. A large amount of PoC sample data for known browser vulnerabilities is collected into a data set. This data set serves as the training source and is fed into a deep learning machine.
2. Training with a recurrent neural network algorithm and GRU gating units produces an initial sample set to be tested with complex syntactic functionality.
3. A try{}catch() rule is added to the initial samples to be tested, so that a sample is not discarded when one of its lines fails. This makes full and efficient use of the trained samples.
4. Tags such as <html> are used to assemble a web-page model.
5. A Python background detection module feeds the samples to be tested to the browser.
6. After detection, the abnormal log data and abnormal sample files are retained, a preliminary analysis report is generated, and the result is returned to the client.

The method can mitigate, as far as possible, several problems of prior-art browser vulnerability detection: redundant test code, insufficient sample depth, poor syntactic-environment aggregation, and poor grammar-rule aggregation. It thereby improves browser vulnerability fuzzing. It also addresses the blindness, insufficient detection depth, and low code coverage of prior-art approaches that rely solely on random fuzzing.

FIG. 3 is a schematic structural diagram of a vulnerability testing apparatus for a browser according to some embodiments of the present disclosure. As shown in FIG. 3, the apparatus includes a preprocessing unit 302, a training unit 304, a prediction assembling unit 306, a monitoring unit 308, and a determining unit 310.

The preprocessing unit 302 is configured to select sample grammar data from the vulnerability grammar library and preprocess it to obtain the data set to be trained.

In some embodiments, the preprocessing unit 302 may be configured to:

- parse the sample grammar data while retaining its fixed tag format;
- form a first data set from the functions and variables in the sample grammar data at word granularity, and deduplicate it;
- form a second data set from the equal signs, parentheses, and punctuation marks in the sample grammar data at character granularity, and deduplicate it; and
- merge the deduplicated first and second data sets, and vectorize the merged data set to generate the data set to be trained.

In some embodiments, the preprocessing unit 302 may be configured to obtain, from the first data set, multiple variables that share the same name. Variables among them that are also of the same type are deduplicated. Variables of different types are each renamed so that they can be distinguished.
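The same-name rule can be illustrated as follows. The disclosure does not say how variable types are determined or what the new names look like, so the sketch makes these assumptions:

- Each variable arrives as a (name, type) pair. A type might, for instance, be inferred from an initializer such as new Array().
- A clash is resolved by appending the type to the name.
- The function name dedup_variables is hypothetical.

```python
from __future__ import annotations


def dedup_variables(variables: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Apply the same-name rule to (name, type) pairs.

    - same name, same type: keep a single entry;
    - same name, different types: rename each as name_type (hypothetical scheme).
    """
    types_by_name: dict[str, list[str]] = {}
    for name, vtype in variables:
        seen = types_by_name.setdefault(name, [])
        if vtype not in seen:
            seen.append(vtype)
    result: list[tuple[str, str]] = []
    for name, types in types_by_name.items():
        if len(types) == 1:
            result.append((name, types[0]))
        else:
            result.extend((f"{name}_{t}", t) for t in types)
    return result


print(dedup_variables([("a", "Array"), ("a", "Array"), ("b", "String"), ("a", "Number")]))
# [('a_Array', 'Array'), ('a_Number', 'Number'), ('b', 'String')]
```

A fuller implementation would also need to apply each rename consistently wherever the variable is used. It would also have to check that a generated name such as a_Array does not collide with an existing variable.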

The training unit 304 is configured to train on the data set to be trained using the deep learning network model. This produces the initial sample set to be tested, which includes a plurality of initial first samples to be tested.

In some embodiments, the training unit 304 may be configured to feed the data set to be trained into the gated recurrent unit of the deep learning network model. It trains on the data set until the value of the neural network loss function is less than a predetermined value, thereby obtaining the initial sample set to be tested. For example, the predetermined value ranges from 0.05 to 0.1.

The prediction assembling unit 306 is configured to perform model prediction and assembly processing on at least some of the first samples to be tested in the initial sample set, to form a second sample to be tested.

In some embodiments, the prediction assembling unit 306 may be configured to:

- predict a starting statement, using a variable of a first sample to be tested as the starting characters;
- beginning at the starting statement, select a number of lines in order from the initial sample set to be tested, and add a try rule module and/or a catch rule module (i.e., a try{} and/or catch() block) to each line; and
- treat these lines as one unit and wrap the unit in tags to form the second sample to be tested.

The monitoring unit 308 is configured to input the second sample to be tested into the browser so that the browser parses it, and to monitor the running state of the browser.

The determining unit 310 is configured to act when a fault in the running of the browser is detected. It then determines that the second sample to be tested is an abnormal sample, generates abnormal log data, and saves the abnormal sample and the abnormal log data.

So far, a vulnerability testing apparatus for a browser according to some embodiments of the present disclosure has been provided. The apparatus implements vulnerability testing of a browser. It draws samples from the vulnerability grammar library and trains on them to produce new samples that are likely to trigger vulnerabilities. Using these samples to test the browser can improve the efficiency of browser vulnerability detection.

FIG. 4 is a schematic structural diagram of a vulnerability testing apparatus for a browser according to other embodiments of the present disclosure. The apparatus includes a memory 410 and a processor 420.

The memory 410 may be a magnetic disk, flash memory, or any other non-volatile storage medium. It stores the instructions of the embodiments corresponding to FIG. 1 and/or FIG. 2.

The processor 420 is coupled to the memory 410 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 420 executes the instructions stored in the memory. These instructions draw samples from the vulnerability grammar library and train on them to produce new samples that are likely to trigger vulnerabilities. Using these samples to test the browser can improve the efficiency of browser vulnerability detection.

In some embodiments, as shown in FIG. 5, a vulnerability testing apparatus 500 includes a memory 510 and a processor 520. The processor 520 is coupled to the memory 510 via a bus 530. The apparatus 500 may also connect to an external storage device 550 through a storage interface 540 to access external data. It may further connect to a network or another computer system (not shown) through a network interface 560. These connections are not described in detail here.

In this embodiment, the memory stores data and instructions, and the processor executes them. The processor draws samples from the vulnerability grammar library and trains on them to produce new samples that are likely to trigger vulnerabilities. Using these samples to test the browser can improve the efficiency of browser vulnerability detection.

In some embodiments, the present disclosure further provides a computer-readable storage medium storing computer program instructions. When executed by a processor, the instructions implement the steps of the method in the embodiments corresponding to FIG. 1 and/or FIG. 2.

Those skilled in the art will appreciate that embodiments of the present disclosure may be provided as a method, an apparatus, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media containing computer-usable program code. Such media include, but are not limited to, disk storage, CD-ROM, and optical storage.

The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, as well as combinations of flows and/or blocks therein.

These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes instruction means. The instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device. A series of operational steps is then performed on that device to produce a computer-implemented process. The instructions executed on the device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The present disclosure has now been described in detail. Some details well known in the art are not described, to avoid obscuring the concepts of the present disclosure. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.

Some specific embodiments of the present disclosure have been described in detail by way of example. Those skilled in the art will appreciate that these examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art will also appreciate that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims

1. A vulnerability testing method for a browser, comprising:

selecting sample grammar data from a vulnerability grammar library, and preprocessing the sample grammar data to obtain a data set to be trained;

training the data set to be trained by using a deep learning network model to obtain an initial sample set to be tested, wherein the initial sample set to be tested comprises a plurality of initial first samples to be tested;

performing model prediction and assembly processing on at least one part of the plurality of first samples to be tested in the initial sample set to be tested to form a second sample to be tested;

inputting the second sample to be tested into a browser so that the browser parses the second sample to be tested, and monitoring the running state of the browser; and

when a fault in the operation of the browser is detected, determining that the second sample to be tested is an abnormal sample, generating abnormal log data, and storing the abnormal sample and the abnormal log data.

2. The vulnerability testing method of claim 1, wherein the step of preprocessing the sample grammar data comprises:

parsing the sample grammar data and retaining a fixed tag format of the sample grammar data;

forming a first data set from the functions and variables in the sample grammar data, taking words as units, and deduplicating the functions and variables in the first data set;

forming a second data set from the equal signs, parentheses, and punctuation marks in the sample grammar data, taking characters as units, and deduplicating the equal signs, parentheses, and punctuation marks in the second data set; and

merging the deduplicated first data set and the deduplicated second data set, and vectorizing the merged data set to generate the data set to be trained.

3. The vulnerability testing method of claim 2, wherein the step of deduplicating the variables in the first data set comprises:

obtaining a plurality of variables with the same name from the first data set;

if the plurality of variables comprise variables of the same type, deduplicating the variables of the same type; and

if the plurality of variables comprise variables of different types, renaming the variables of different types respectively to distinguish them from one another.

4. The vulnerability testing method of claim 1, wherein the step of training the data set to be trained using a deep learning network model comprises:

importing the data set to be trained into a gated recurrent unit of the deep learning network model; and

training on the data set to be trained until the value of a neural network loss function is smaller than a predetermined value, to obtain the initial sample set to be tested.

5. The vulnerability testing method of claim 4, wherein,

the predetermined value ranges from 0.05 to 0.1.

6. The vulnerability testing method of claim 1, wherein the step of performing model prediction and assembly processing on at least a portion of the plurality of first samples to be tested in the initial sample set to be tested comprises:

predicting a starting statement by taking a variable of the first sample to be tested as starting characters serving as the prediction starting point;

sequentially selecting a plurality of lines of statements from the initial sample set to be tested, starting from the starting statement, and adding a try rule module and/or a catch rule module to each line of statements; and

taking the plurality of lines of statements as a unit, and assembling tags onto the unit to form the second sample to be tested.

7. A vulnerability testing apparatus for a browser, comprising:

a preprocessing unit, configured to select sample grammar data from a vulnerability grammar library and preprocess the sample grammar data to obtain a data set to be trained;

a training unit, configured to train on the data set to be trained using a deep learning network model to obtain an initial sample set to be tested, the initial sample set to be tested comprising a plurality of initial first samples to be tested;

a prediction assembling unit, configured to perform model prediction and assembly processing on at least a part of the first samples to be tested in the initial sample set to be tested to form a second sample to be tested;

a monitoring unit, configured to input the second sample to be tested into a browser so that the browser parses the second sample to be tested, and to monitor the running state of the browser; and

a determining unit, configured to determine, when a fault in the operation of the browser is detected, that the second sample to be tested is an abnormal sample, generate abnormal log data, and store the abnormal sample and the abnormal log data.

8. The vulnerability testing apparatus of claim 7, wherein,

the preprocessing unit is configured to:
- parse the sample grammar data and retain a fixed tag format of the sample grammar data;
- form a first data set from the functions and variables in the sample grammar data, taking words as units, and deduplicate the functions and variables in the first data set;
- form a second data set from the equal signs, parentheses, and punctuation marks in the sample grammar data, taking characters as units, and deduplicate the equal signs, parentheses, and punctuation marks in the second data set; and
- merge the deduplicated first data set and the deduplicated second data set, and vectorize the merged data set to generate the data set to be trained.

9. The vulnerability testing apparatus of claim 8, wherein,

the preprocessing unit is configured to acquire a plurality of variables having the same name from the first data set, deduplicate variables of the same type if the plurality of variables comprise variables of the same type, and rename variables of different types respectively to distinguish them if the plurality of variables comprise variables of different types.

10. The vulnerability testing apparatus of claim 7, wherein,

the training unit is configured to import the data set to be trained into a gated recurrent unit of the deep learning network model, and to train on the data set to be trained until the value of a neural network loss function is smaller than a predetermined value, to obtain the initial sample set to be tested.

11. The vulnerability testing apparatus of claim 10, wherein,

the predetermined value ranges from 0.05 to 0.1.

12. The vulnerability testing apparatus of claim 7, wherein,

the prediction assembling unit is configured to:
- predict a starting statement by taking a variable of the first sample to be tested as starting characters serving as the prediction starting point;
- sequentially select a plurality of lines of statements from the initial sample set to be tested, starting from the starting statement, and add a try rule module and/or a catch rule module to each line of statements; and
- take the plurality of lines of statements as a unit and assemble tags onto the unit to form the second sample to be tested.

13. A vulnerability testing apparatus for a browser, comprising:

a memory; and

a processor coupled to the memory, the processor configured to perform the method of any of claims 1-6 based on instructions stored in the memory.

14. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of claims 1 to 6.