CN118364476B

CN118364476B - Vulnerability-associated product data processing method, device, equipment and storage medium

Info

Publication number: CN118364476B
Application number: CN202410793203.4A
Authority: CN
Inventors: 王新刚; 顾钊铨; 景晓; 孟令逍; 周琥晨; 余涛; 关华; 袁华平
Original assignee: Peng Cheng Laboratory
Current assignee: Peng Cheng Laboratory
Priority date: 2024-06-19
Filing date: 2024-06-19
Publication date: 2024-08-27
Anticipated expiration: 2044-06-19
Also published as: CN118364476A

Abstract

The application provides a method, apparatus, device and storage medium for processing vulnerability-associated product data. The method comprises: obtaining a first product data set and a first vulnerability data set from a public database, and obtaining a second product data set and a second vulnerability data set from a plurality of devices; obtaining a plurality of product entity data corresponding to a plurality of vulnerability entities according to the first vulnerability data set and the first product data set; based on the association relationship between the vulnerability entities and the product entity data, selecting, from the plurality of product entity data, a plurality of candidate product entity data associated with a target vulnerability entity in the second vulnerability data set; and obtaining target product information corresponding to the target vulnerability entity from the second product data set, and determining target product entity data corresponding to the target vulnerability entity from the plurality of candidate product entity data. Product entity alignment of associated vulnerabilities is thereby realized, and the accuracy and reliability of product entity alignment are effectively improved.

Description

Method, device, equipment and storage medium for processing product data associated with vulnerabilities

Technical Field

The embodiments of the present application relate to the field of data security technology, and in particular to a method, apparatus, device and storage medium for processing product data associated with vulnerabilities.

Background Art

With the spread of the Internet and increasing informatization, network attacks are becoming increasingly complex. To cope with these challenges, enterprises and organizations need to continuously improve their network security defense capabilities. A cyber range, as a virtual platform that simulates a real network environment, can simulate various network attack scenarios based on the product and vulnerability information of the corresponding devices, allowing security personnel to conduct security testing, training and research without risk and to improve defense capabilities by simulating network attacks. The product information is CPE (Common Platform Enumeration), which is used to characterize different product assets.

Because the product and vulnerability scanning technologies of different vendors differ, data formats are inconsistent, so product and vulnerability information must also be identified and aligned in the cyber range. However, current product entity alignment methods generally perform entity association on a single product data source based on string rules or fingerprint information, ignoring the correlation between products and vulnerabilities. Inaccurate information therefore easily leads to mismatches, which reduces the accuracy and reliability of product entity alignment.

Summary of the Invention

The embodiments of the present application provide a method, apparatus, device and storage medium for processing product data associated with vulnerabilities, which can obtain the vulnerability-product association relationship based on a public database and thereby determine the product entity data corresponding to a vulnerability in a target device, realizing product entity alignment of associated vulnerabilities and effectively improving the accuracy and reliability of product entity alignment.

To achieve the above purpose, a first aspect of the embodiments of the present application provides a method for processing product data associated with vulnerabilities, including: obtaining a first product data set and a first vulnerability data set from a public database, and obtaining a second product data set and a second vulnerability data set from multiple devices; constructing multiple product entities corresponding to multiple vulnerability entities according to the first vulnerability data set; performing data supplementation on the multiple product entities based on the first product data set to obtain multiple product entity data corresponding to the multiple vulnerability entities; based on the association relationship between the vulnerability entities and the product entity data, selecting, from the multiple product entity data, multiple candidate product entity data associated with a target vulnerability entity in the second vulnerability data set; obtaining target product information corresponding to the target vulnerability entity from the second product data set; and calculating matching coefficients between the target product information and the multiple candidate product entity data, and determining, based on the matching coefficients, the target product entity data corresponding to the target vulnerability entity from the multiple candidate product entity data.

In some embodiments, calculating the matching coefficients between the target product information and the multiple candidate product entity data, and determining, based on the matching coefficients, the target product entity data corresponding to the target vulnerability entity from the multiple candidate product entity data, includes: calculating the matching coefficients between the target product information and the multiple candidate product entity data based on a preset text dictionary library; and determining the candidate product entity data with the highest matching coefficient as the target product entity data corresponding to the target vulnerability entity.
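As a non-limiting illustration, dictionary-based matching of this kind might be sketched in Python as follows. The dictionary contents, the token-overlap (Jaccard) scoring rule and the names `TEXT_DICTIONARY`, `normalize`, `matching_coefficient` and `pick_best` are assumptions made for illustration; the present application does not fix how the text dictionary library is applied.

```python
import re

# Hypothetical text dictionary: maps scanner spellings to a canonical token.
TEXT_DICTIONARY = {
    "httpd": "http_server",
    "apache2": "http_server",
    "ms": "microsoft",
}

def normalize(text):
    """Lower-case, split on separators, and map tokens through the dictionary."""
    tokens = re.split(r"[^a-z0-9_]+", text.lower())
    return {TEXT_DICTIONARY.get(t, t) for t in tokens if t}

def matching_coefficient(target_info, candidate):
    """Jaccard overlap of the normalized token sets (an assumed scoring rule)."""
    a, b = normalize(target_info), normalize(candidate)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def pick_best(target_info, candidates):
    """Return the candidate with the highest matching coefficient."""
    return max(candidates, key=lambda c: matching_coefficient(target_info, c))
```

For example, `pick_best("Apache httpd 2.4.49", ["apache tomcat 2.4.49", "apache http_server 2.4.49"])` selects the second candidate, because the dictionary maps `httpd` onto the canonical token `http_server`.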

In some embodiments, after the matching coefficients between the target product information and the multiple candidate product entity data are calculated based on the preset text dictionary library, the method further includes: when the highest matching coefficient is greater than a coefficient threshold, determining the candidate product entity data with the highest matching coefficient as the target product entity data corresponding to the target vulnerability entity; and when the highest matching coefficient is less than or equal to the coefficient threshold, calculating a word vector similarity coefficient and an edit distance similarity coefficient between the target product information and each candidate product entity data, and determining the target product entity data from the multiple candidate product entity data based on the matching coefficients, the word vector similarity coefficients and the edit distance similarity coefficients.
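The threshold test and the two fallback similarity measures may be illustrated by the following Python sketch. It rests on stated assumptions: the threshold value 0.8 is arbitrary, the edit distance is the standard Levenshtein distance normalized by the longer string, and the "word vector" similarity is approximated by the cosine of token-count vectors because the present application does not name an embedding model. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def edit_similarity(a, b):
    """Map edit distance into [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def vector_similarity(a, b):
    """Cosine of token-count vectors (a stand-in for real word embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def needs_fallback(match_scores, threshold=0.8):
    """True when the best dictionary score does not exceed the (assumed) threshold."""
    return max(match_scores.values()) <= threshold
```

Only when `needs_fallback` returns `True` are the two additional similarity coefficients computed, which keeps the cheaper dictionary comparison as the common path.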

In some embodiments, determining the target product entity data from the multiple candidate product entity data based on the matching coefficients, the word vector similarity coefficients and the edit distance similarity coefficients includes: performing a weighted calculation on the matching coefficient, the word vector similarity coefficient and the edit distance similarity coefficient based on a preset weight configuration to obtain an aggregated similarity coefficient; and determining the candidate product entity data with the highest aggregated similarity coefficient as the target product entity data.
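A minimal Python sketch of the weighted aggregation is given below, assuming the three coefficients have already been computed for each candidate. The weights (0.5, 0.3, 0.2) and the names `aggregate` and `select_target` are illustrative assumptions; the present application only states that the weight configuration is preset.

```python
def aggregate(match, vector, edit, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three coefficients; the weights are assumed values."""
    w_match, w_vector, w_edit = weights
    return w_match * match + w_vector * vector + w_edit * edit

def select_target(scores, weights=(0.5, 0.3, 0.2)):
    """Pick the candidate with the highest aggregated similarity coefficient.

    scores maps each candidate to a (match, vector, edit) tuple.
    """
    return max(scores, key=lambda c: aggregate(*scores[c], weights=weights))
```

With `scores = {"a": (0.6, 0.2, 0.2), "b": (0.5, 0.9, 0.9)}`, candidate `"a"` has the better dictionary score but aggregates to 0.40, whereas `"b"` aggregates to 0.70 and is selected.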

In some embodiments, constructing the multiple product entities corresponding to the multiple vulnerability entities according to the first vulnerability data set includes: performing information extraction on the first vulnerability data set based on a preset large language model to obtain multiple pieces of product information corresponding to the multiple vulnerability entities; and constructing the multiple product entities according to the multiple pieces of product information.
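The extraction step might look like the following Python sketch. No particular model or client library is assumed: the model is passed in as a plain callable `llm(prompt) -> str`, and the prompt wording, the JSON reply format and the name `extract_products` are hypothetical choices made for illustration.

```python
import json

# Hypothetical prompt; the application does not disclose the actual wording.
PROMPT = (
    "Extract the affected vendor, product and version from this vulnerability "
    "description. Reply with a JSON list of objects with the keys vendor, "
    "product and version.\n\nDescription: {description}"
)

def extract_products(description, llm):
    """Ask the model for product information and keep only well-formed items."""
    reply = llm(PROMPT.format(description=description))
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(items, list):
        return []
    return [i for i in items if isinstance(i, dict) and i.get("product")]
```

Treating the model as an injected callable keeps the sketch testable with a stub and makes the validation of the reply, rather than the model call, the part that is shown.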

In some embodiments, selecting, based on the association relationship between the vulnerability entities and the product entity data, the multiple candidate product entity data associated with the target vulnerability entity in the second vulnerability data set from the multiple product entity data includes: obtaining vulnerability information of the target vulnerability entity in the second vulnerability data set, and determining, from the multiple vulnerability entities, the vulnerability entity corresponding to the vulnerability information; and based on the association relationship between the vulnerability entities and the product entity data, selecting, from the multiple product entity data, the multiple candidate product entity data associated with the vulnerability entity corresponding to the vulnerability information.
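By way of example, and assuming that the vulnerability information reported by a scanner contains a CVE identifier somewhere in its text, candidate selection could be sketched in Python as below. The association table is modelled as a dictionary from CVE identifier to a list of product entity data; the identifiers and product strings used in the usage note are placeholders, and the function names are hypothetical.

```python
import re

# CVE identifiers have the form CVE-<year>-<four or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)

def resolve_vulnerability_entity(vuln_info, association):
    """Find the known vulnerability entity that the scanner text refers to."""
    match = CVE_PATTERN.search(vuln_info)
    if match is None:
        return None
    cve_id = match.group(0).upper()
    return cve_id if cve_id in association else None

def select_candidates(vuln_info, association):
    """Return the product entity data associated with the resolved entity."""
    cve_id = resolve_vulnerability_entity(vuln_info, association)
    return association[cve_id] if cve_id is not None else []
```

Given `association = {"CVE-2099-0001": ["acme web_server 1.2"]}`, the call `select_candidates("found cve-2099-0001 on port 80", association)` returns the single associated candidate, while text naming an unknown identifier yields an empty list.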

In some embodiments, different devices have different image identifiers, and the data in the second product data set and the second vulnerability data set are associated through the image identifiers. Obtaining the target product information corresponding to the target vulnerability entity from the second product data set includes: obtaining the target image identifier corresponding to the target vulnerability entity from the second vulnerability data set; and obtaining the target product information corresponding to the target image identifier from the second product data set.
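The lookup through the image identifier amounts to a keyed join, which the following Python sketch illustrates. The record layout (dictionaries with `image_id`, `product` and `vuln` keys) is an assumption made for illustration, as are the function names.

```python
from collections import defaultdict

def index_by_image(product_records):
    """Group the scanned product information by image identifier."""
    index = defaultdict(list)
    for record in product_records:
        index[record["image_id"]].append(record["product"])
    return index

def target_product_info(target_vuln_record, product_index):
    """Follow the vulnerability record's image identifier to its product info."""
    return product_index.get(target_vuln_record["image_id"], [])
```

For instance, with products `[{"image_id": "img-1", "product": "Acme Web Server 1.2"}]` and a vulnerability record `{"image_id": "img-1", "vuln": "CVE-2099-0001"}`, the lookup returns `["Acme Web Server 1.2"]`.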

In some embodiments, obtaining the first product data set and the first vulnerability data set from the public database includes: obtaining product log data and vulnerability log data from the public database at a preset interval; and performing data cleaning on the product log data and the vulnerability log data, and performing key field extraction on the cleaned product log data and vulnerability log data to obtain the first product data set and the first vulnerability data set.
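A possible shape of the cleaning and key field extraction is sketched below in Python. The cleaning rules (drop records with empty key fields, trim whitespace, remove duplicates) and the field names `cve_id` and `description` are assumptions; the present application does not enumerate them, and the periodic download itself is omitted.

```python
# Assumed key fields for a vulnerability log record.
KEY_FIELDS = ("cve_id", "description")

def clean_and_extract(raw_records, key_fields=KEY_FIELDS):
    """Drop incomplete or duplicate records and keep only the key fields."""
    seen = set()
    cleaned = []
    for record in raw_records:
        values = tuple(str(record.get(f, "")).strip() for f in key_fields)
        if any(not v for v in values):
            continue  # a key field is missing or blank
        if values in seen:
            continue  # exact duplicate after trimming
        seen.add(values)
        cleaned.append(dict(zip(key_fields, values)))
    return cleaned
```

The same function can be reused for the product log data by passing a different `key_fields` tuple.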

In some embodiments, obtaining the second product data set and the second vulnerability data set from the multiple devices includes: obtaining product and vulnerability data of the multiple devices by calling multiple product and vulnerability scanning interfaces; and performing key field extraction on the product and vulnerability data based on a preset format conversion tool to obtain the second product data set and the second vulnerability data set, where the second vulnerability data set includes the vulnerability information of the multiple devices, and the second product data set includes the product information corresponding to the vulnerability information of the multiple devices.

In some embodiments, obtaining the product and vulnerability data of the multiple devices by calling the multiple product and vulnerability scanning interfaces includes: obtaining product data of a target device by calling a product scanning interface provided by a first vendor; and obtaining vulnerability data of the target device by calling a vulnerability scanning interface provided by a second vendor, where the product data and the vulnerability data have the same image identifier.
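To illustrate how outputs from two vendors' scanning interfaces could be brought into one schema keyed by the image identifier, a Python sketch follows. The vendor names, raw field names and the `FIELD_MAPS` table are entirely hypothetical; real scanner outputs would need their own mappings.

```python
# Hypothetical per-vendor mappings: common field name -> vendor field name.
FIELD_MAPS = {
    "vendor_a": {"image_id": "imageId", "product": "software_name"},
    "vendor_b": {"image_id": "image", "vuln": "cve"},
}

def convert(source, record):
    """Extract the key fields of one raw record into the common schema."""
    mapping = FIELD_MAPS[source]
    return {common: record.get(raw) for common, raw in mapping.items()}

def split_data_sets(raw_by_source):
    """Build the second product and vulnerability data sets from raw scans."""
    products, vulns = [], []
    for source, records in raw_by_source.items():
        for record in records:
            row = convert(source, record)
            (products if "product" in row else vulns).append(row)
    return products, vulns
```

Because both converted record types carry `image_id`, the product data from the first vendor and the vulnerability data from the second vendor can later be joined for the same target device.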

In some embodiments, the method further includes: after the target product entity data corresponding to the multiple target vulnerability entities of the multiple devices are determined, constructing a product entity alignment table that characterizes the association relationship among the devices, the target vulnerability entities and the target product entity data.
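One simple realization of such an alignment table is a three-column CSV, as in the Python sketch below. The column names and the choice of CSV are assumptions made for illustration; the present application does not prescribe a storage format.

```python
import csv
import io

def build_alignment_table(alignments):
    """Render (device, vulnerability, product entity) triples as CSV text."""
    rows = sorted(set(alignments))  # de-duplicate and give a stable order
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["device", "vulnerability", "product_entity"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Each row records one aligned result, so the table can be filtered by device to list its vulnerabilities or by product entity to list the affected devices.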

To achieve the above purpose, a second aspect of the embodiments of the present application provides an apparatus for processing product data associated with vulnerabilities, including: a data acquisition module, configured to obtain a first product data set and a first vulnerability data set from a public database, and to obtain a second product data set and a second vulnerability data set from multiple devices; an entity construction module, configured to construct multiple product entities corresponding to multiple vulnerability entities according to the first vulnerability data set; a data supplementation module, configured to perform data supplementation on the multiple product entities based on the first product data set to obtain multiple product entity data corresponding to the multiple vulnerability entities; an entity association module, configured to select, based on the association relationship between the vulnerability entities and the product entity data, multiple candidate product entity data associated with a target vulnerability entity in the second vulnerability data set from the multiple product entity data; an entity determination module, configured to obtain target product information corresponding to the target vulnerability entity from the second product data set; and an entity alignment module, configured to calculate matching coefficients between the target product information and the multiple candidate product entity data, and to determine, based on the matching coefficients, the target product entity data corresponding to the target vulnerability entity from the multiple candidate product entity data.

To achieve the above purpose, a third aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for processing product data associated with vulnerabilities according to any one of the embodiments of the first aspect.

To achieve the above purpose, a fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to execute the method for processing product data associated with vulnerabilities according to any one of the embodiments of the first aspect.

The method, apparatus, device and storage medium for processing vulnerability-associated product data provided by the embodiments of the present application can obtain a first product data set and a first vulnerability data set from a public database while collecting a second product data set and a second vulnerability data set from multiple devices, then construct the corresponding multiple product entities from the information in the first vulnerability data set, and use the first product data set to supplement the constructed product entities so as to form complete product entity data. Further, based on the association relationship between the vulnerability entities and the product entity data, the present application can determine the candidate product entity data corresponding to a target vulnerability entity in the second vulnerability data set, extract the product information related to the target vulnerability entity from the second product data set, calculate the matching coefficients between the target product information and the candidate product entity data, and determine the target product entity data from the candidate product entity data according to the matching coefficients. It can be understood that the present application constructs the vulnerability-product association relationship based on a public database so as to associate products with vulnerabilities. When the second product data set and the second vulnerability data set of multiple devices are obtained through the product and vulnerability scanning technologies of different vendors, the target product entity data corresponding to the target vulnerability entity can therefore be determined more accurately and effectively based on this association, which achieves product entity alignment, reduces mismatches, and improves the accuracy and reliability of data processing. As a result, the alignment results for the target product entity data under the target vulnerability entities can support the simulation of complex network environments and diversified attack scenarios in the cyber range, helping security personnel to understand and respond to potential network threats more comprehensively.

Brief Description of the Drawings

FIG. 1 is a flow chart of the method for processing vulnerability-associated product data provided by an embodiment of the present application;

FIG. 2 is a flow chart of determining target product entity data provided by an embodiment of the present application;

FIG. 3 is another flow chart of determining target product entity data provided by an embodiment of the present application;

FIG. 4 is a flow chart of constructing multiple product entities provided by an embodiment of the present application;

FIG. 5 is a flow chart of obtaining multiple candidate product entity data provided by an embodiment of the present application;

FIG. 6 is an example flow chart of the method for processing vulnerability-associated product data provided by an embodiment of the present application;

FIG. 7 is an example flow chart of obtaining multiple product entity data provided by an embodiment of the present application;

FIG. 8 is an example flow chart of calculating matching coefficients provided by an embodiment of the present application;

FIG. 9 is an example flow chart of determining target product entity data provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.

In some embodiments, although functional modules are divided in the system schematic diagrams and a logical order is shown in the flow charts, in some cases the steps shown or described may be performed with a module division different from that in the system, or in an order different from that in the flow charts. The terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

In addition, unless otherwise clearly specified and limited, the terms "connected" and "linked" should be understood in a broad sense. For example, a connection may be fixed or movable, detachable or non-detachable, or integral; it may be mechanical or electrical, or the parts may be able to communicate with each other; and it may be direct or indirect through an intermediate medium.

In the description of the embodiments of the present application, descriptions referring to the terms "one embodiment/implementation", "another embodiment/implementation", "certain embodiments/implementations", "in the above embodiments/implementations" and the like mean that the specific features, structures, materials or characteristics described in connection with the implementation or example are included in at least two embodiments or implementations disclosed in the present application. In the present disclosure, schematic expressions of the above terms do not necessarily refer to the same embodiment or implementation. It should be noted that although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from that in the flow charts.

With the spread of the Internet and increasing informatization, network attacks are becoming increasingly complex. To cope with these challenges, enterprises and organizations need to continuously improve their network security defense capabilities. A cyber range, as a virtual platform that simulates a real network environment, can simulate various network attack scenarios based on the product and vulnerability information of the corresponding devices, allowing security personnel to conduct security testing, training and research without risk and to improve defense capabilities by simulating network attacks. Because the product and vulnerability scanning technologies of different vendors differ, data formats are inconsistent, so product and vulnerability information must also be identified and aligned in the cyber range. However, current product entity alignment methods generally perform entity association on a single product data source based on string rules or fingerprint information, ignoring the correlation between products and vulnerabilities. Inaccurate information therefore easily leads to mismatches, which reduces the accuracy and reliability of product entity alignment.

Here, the product information is CPE (Common Platform Enumeration), and a product entity is a CPE entity. CPE is a standardized method for describing and identifying the classes of applications, operating systems and hardware devices present among an enterprise's computing assets. It provides a standard machine-readable format that uniquely encodes IT products and platforms so as to characterize different product assets. A vulnerability entity is a CVE (Common Vulnerabilities and Exposures) entry. CVE provides a unique identifier and related information for each known security vulnerability so as to characterize different vulnerabilities.
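For readers unfamiliar with the format, the following Python sketch parses the first attributes of a CPE 2.3 formatted string, which consists of the prefix `cpe:2.3` followed by eleven colon-separated attributes (part, vendor, product, version, update, edition, language, sw_edition, target_sw, target_hw, other). The parser is deliberately simplified: it assumes that no attribute value contains an escaped colon, and the `Cpe` type and `parse_cpe23` name are illustrative.

```python
from typing import NamedTuple

class Cpe(NamedTuple):
    part: str      # "a" application, "o" operating system, "h" hardware
    vendor: str
    product: str
    version: str

def parse_cpe23(cpe):
    """Parse the leading attributes of a CPE 2.3 formatted string.

    Simplification: values containing escaped colons are not supported.
    """
    fields = cpe.split(":")
    if len(fields) != 13 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return Cpe(*fields[2:6])
```

For example, `parse_cpe23("cpe:2.3:a:apache:http_server:2.4.49:*:*:*:*:*:*:*")` yields an application entry with vendor `apache`, product `http_server` and version `2.4.49`.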

Based on this, the present application provides a method, apparatus, device and storage medium for processing product data associated with vulnerabilities. The method includes: obtaining a first product data set and a first vulnerability data set from a public database, and obtaining a second product data set and a second vulnerability data set from multiple devices; obtaining multiple product entity data corresponding to multiple vulnerability entities according to the first vulnerability data set and the first product data set; based on the association relationship between the vulnerability entities and the product entity data, selecting, from the multiple product entity data, multiple candidate product entity data associated with a target vulnerability entity in the second vulnerability data set; and obtaining target product information corresponding to the target vulnerability entity from the second product data set, and determining the target product entity data corresponding to the target vulnerability entity from the multiple candidate product entity data, so as to achieve product entity alignment of associated vulnerabilities and effectively improve the accuracy and reliability of product entity alignment.

The embodiments of the present application are further described below with reference to the accompanying drawings.

Referring to FIG. 1, FIG. 1 is a flow chart of the method for processing vulnerability-associated product data provided by an embodiment of the present application. A first aspect of the embodiments of the present application provides a method for processing vulnerability-associated product data, including but not limited to the following steps:

Step S110: obtaining a first product data set and a first vulnerability data set from a public database, and obtaining a second product data set and a second vulnerability data set from multiple devices; constructing multiple product entities corresponding to multiple vulnerability entities according to the first vulnerability data set;

Step S120: performing data supplementation on the multiple product entities based on the first product data set to obtain multiple product entity data corresponding to the multiple vulnerability entities;

Step S130: based on the association relationship between the vulnerability entities and the product entity data, selecting, from the multiple product entity data, multiple candidate product entity data associated with a target vulnerability entity in the second vulnerability data set;

Step S140: obtaining target product information corresponding to the target vulnerability entity from the second product data set;

Step S150: calculating matching coefficients between the target product information and the multiple candidate product entity data, and determining the target product entity data from the multiple candidate product entity data based on the matching coefficients.
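Steps S110 to S150 can be read as a small pipeline, sketched below in Python under simplifying assumptions. The data sets are modelled as in-memory dictionaries and lists, product entities are plain name strings, and the matching coefficient is replaced by the ratio of `difflib.SequenceMatcher` from the Python standard library rather than the dictionary-based coefficient of the present application. The function name `align` and all sample identifiers are hypothetical.

```python
from difflib import SequenceMatcher

def align(first_vulns, first_products, second_vulns, second_products):
    """Return {(image_id, cve): product entity data} for the scanned devices."""
    # S110/S120: product entities per vulnerability, supplemented with CPE data.
    entity_data = {
        cve: [{"name": name, "cpe": first_products.get(name)} for name in names]
        for cve, names in first_vulns.items()
    }
    results = {}
    for record in second_vulns:
        candidates = entity_data.get(record["cve"], [])   # S130
        info = second_products.get(record["image_id"])    # S140
        if not candidates or info is None:
            continue
        # S150: keep the candidate whose name is most similar to the scan output.
        best = max(
            candidates,
            key=lambda c: SequenceMatcher(None, info.lower(), c["name"].lower()).ratio(),
        )
        results[(record["image_id"], record["cve"])] = best
    return results
```

The candidate set in S130 is restricted to products already associated with the reported vulnerability, so the similarity comparison in S150 only has to discriminate among a handful of plausible products instead of the whole product catalogue.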

In some embodiments, the first product data set may be a CPE data set obtained at a preset interval from a public database such as CNNVD (China National Vulnerability Database of Information Security), and the first vulnerability data set may be a CVE data set likewise obtained from CNNVD. The second product data set may be the product data of specific devices, for example gateway devices and terminal devices that are the targets of the cyber range's vulnerability and product detection scanning tasks; these devices are installed with different operating systems and with software and components for different purposes. The data in the first vulnerability data set needs to be aligned with the first product data set, and the second vulnerability data set is the vulnerability data associated with the second product data set. A vulnerability entity may be a specific security vulnerability defined in the first vulnerability data set. A product entity may be a product instance object under a specific security vulnerability, constructed from the information in the first vulnerability data set, whose information is still incomplete; product entity data, by contrast, may be the detailed information related to the product entity after that information has been supplemented.

The candidate product entity data is the set of product entity data related to the target vulnerability entity, obtained from the association relationship between vulnerability entities and product entity data. The target vulnerability entity is a specific vulnerability of the target device, and the target product information is the product information under that vulnerability. This product information may be incomplete, inconsistently formatted, or inconsistent in its context dependencies, so it needs to be aligned with the closest entry in the candidate product entity data; this is the step of determining the target product entity data. In this way, the present application can derive vulnerability-product association relationships from the public database and use them to determine the product entity data corresponding to a vulnerability in the target device, realizing product entity alignment for associated vulnerabilities and effectively improving the accuracy and efficiency of the alignment.

In some embodiments, corresponding to the above steps S110 to S150, the present application can obtain the first product data set and the first vulnerability data set from a public database, and collect the second product data set and the second vulnerability data set from multiple devices. Multiple product entities are then constructed from the information in the first vulnerability data set, and the first product data set is used to supplement the constructed product entities, forming complete product entity data. Further, based on the association relationship between vulnerability entities and product entity data, the present application can determine the candidate product entity data corresponding to a target vulnerability entity in the second vulnerability data set, extract the product information related to the target vulnerability entity from the second product data set, calculate the matching coefficient between the target product information and each candidate product entity data, and determine the target product entity data from the candidates according to the matching coefficients. This process is the product entity alignment process for associated vulnerabilities.

It can be understood that the present application constructs the vulnerability-product association relationship from a public database so as to associate product assets with vulnerabilities. When the second product data set and the second vulnerability data set of multiple devices are then obtained through the product vulnerability scanning technologies of different manufacturers, the target product entity data corresponding to a target vulnerability entity can be determined more accurately and effectively on the basis of that association. This achieves alignment of the product entities that represent assets, reduces mismatches, and improves the accuracy and reliability of data processing. The alignment results for the target product entity data under each target vulnerability entity can therefore support the complex network environments and diverse attack scenario simulations of a network target range, helping security personnel to understand and respond to potential network threats more comprehensively.

In some embodiments, the above steps S130 to S150 describe the alignment of target product information for a single target vulnerability entity in a single device. The present application can apply the same process, determining target product entity data from the corresponding candidate product entity data, to multiple target vulnerability entities of multiple devices. A single target vulnerability entity can have multiple pieces of target product information, and a single target vulnerability can correspond to multiple target product entity data, so that the target product entity data corresponding to multiple target vulnerability entities of multiple devices can be determined.

In some embodiments, the method further includes: after determining the target product entity data corresponding to the multiple target vulnerability entities of the multiple devices, constructing a product entity alignment table that characterizes the association relationship among devices, target vulnerability entities, and target product entity data. It can be understood that, after a public database is used as the data source to obtain the product entity data corresponding to multiple vulnerability entities and to construct the basic product-vulnerability association data set CVE-CPEs, the target product entity data can be determined so as to align the CPE entity data. Since different devices have different image identifiers (ImageID), the aligned CPE entity data can finally be associated with the ImageID and the CVE to obtain an ImageID-CVE-CPEs data table. The constructed association table is stored in a database for use in subsequent research work in the target range, such as network attack behavior analysis and attack tracing.

Here, the ImageID is the device-specific identifier defined by the target range for its device vulnerability and product detection scanning tasks. It covers diverse devices such as gateway devices and terminal devices, which are configured with different operating systems, functional software, and components. With the ImageID, the present application can pinpoint a specific device and then establish the product information it carries and the details of its potential vulnerabilities. The product vulnerability scanning interfaces provided by manufacturers are designed for the detection needs of different device types. For example, a terminal device running the Linux operating system should be scanned with the product vulnerability scanning interface of manufacturer A, whereas a terminal device running Windows requires the scanning interface provided by manufacturer B. Because the interfaces and data formats of the manufacturers differ, the scan results do not uniformly follow the standard CVE naming convention, so a format conversion tool is needed to convert the scan results of each manufacturer into a unified format for the subsequent CPE entity alignment. For the same device, that is, the device identified by the same ImageID, the product and vulnerability scanning interfaces of a single manufacturer can be used, or the interfaces of multiple manufacturers can be combined. In either case the resulting data can be indexed by ImageID, so that the present application can merge the vulnerability data and product data from different scanning sources and obtain a complete view of the product list (CPE list) and vulnerability information (CVE list) of the device (ImageID).

Specifically, the ImageID-CVE-CPEs data table can be represented simply by the following data structure:

[
ImageID_01:
[cve_01:[cpe_01,cpe_02,...],cve_02:[cpe_03,cpe_04,...],...],
ImageID_02:
[cve_03:[cpe_05,cpe_06,...],cve_04:[cpe_07,cpe_08,...],...],
......
];

It can be seen that the operations are organized per ImageID. Scanning the product and vulnerability information on one device (ImageID) may involve the product vulnerability scanning interfaces of multiple manufacturers, and the resulting product data (cpe) and vulnerability data (cve) are all indexed by ImageID. The target product entity data determined from the multiple candidate product entity data is the CPE data obtained after the entries in the scanned product CPE list have been adjusted and aligned for conformity and accuracy. The aligned CPE data can then be grouped under the CVE to which it belongs, and under the corresponding ImageID.
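As a minimal sketch of the ImageID-CVE-CPEs table, the grouping could be expressed as a nested mapping. The function name `build_alignment_table` and the triple-based input are assumptions made for illustration; the source only fixes the ImageID, CVE, CPE nesting.

```python
from collections import defaultdict

def build_alignment_table(records):
    """Group (image_id, cve_id, cpe) triples into {image_id: {cve_id: [cpe, ...]}}.

    Hypothetical helper: the input shape is an assumption, only the nesting
    follows the ImageID-CVE-CPEs structure described in the text.
    """
    table = defaultdict(lambda: defaultdict(list))
    for image_id, cve_id, cpe in records:
        if cpe not in table[image_id][cve_id]:  # drop repeated scan results
            table[image_id][cve_id].append(cpe)
    # Convert to plain dicts so the result is easy to serialize and store.
    return {image: dict(cves) for image, cves in table.items()}

records = [
    ("ImageID_01", "cve_01", "cpe_01"),
    ("ImageID_01", "cve_01", "cpe_02"),
    ("ImageID_01", "cve_02", "cpe_03"),
    ("ImageID_02", "cve_03", "cpe_05"),
    ("ImageID_01", "cve_01", "cpe_01"),  # duplicate from a second scanner
]
table = build_alignment_table(records)
```

Keying by ImageID first keeps results from several scanners for the same device in one place, which matches the indexing role the text gives to the ImageID.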

In some embodiments, obtaining the first product data set and the first vulnerability data set from the public database includes: obtaining product log data and vulnerability log data from the public database at a preset interval; cleaning the product log data and the vulnerability log data; and extracting key fields from the cleaned product log data and vulnerability log data to obtain the first product data set and the first vulnerability data set. After the CPE data set and the CVE data set are obtained from a public database such as CNNVD at the preset interval, the cleaned log data needs to be converted into a unified format. Specifically, the processed CPE data may contain the following fields: product name (product), vendor information (vendor), and version number (version). The processed CVE data contains the following fields: CVE number (cve_id), vulnerability name (cvename), CNNVD number (cnnvd_id), description (description), vendor (vendor), vulnerability type (type), hazard level (rank), inclusion time (first_t), update time (update_t), and so on.
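The key-field extraction can be sketched as follows. The CVE field names are those listed above; the cleaning rule (collapsing whitespace) and the helper names `clean`, `extract_cpe`, and `extract_cve` are assumptions, since the source does not specify how the cleaning is done.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CpeRecord:
    product: str
    vendor: str
    version: str

# Field names taken from the description of the processed CVE data.
CVE_FIELDS = ("cve_id", "cvename", "cnnvd_id", "description", "vendor",
              "type", "rank", "first_t", "update_t")

def clean(value):
    """Assumed cleaning rule: collapse runs of whitespace, map None to ''."""
    return "" if value is None else " ".join(str(value).split())

def extract_cpe(raw):
    """Keep only the three key CPE fields from a raw log entry (a dict)."""
    return CpeRecord(clean(raw.get("product")),
                     clean(raw.get("vendor")),
                     clean(raw.get("version")))

def extract_cve(raw):
    """Keep only the listed CVE fields; absent fields become empty strings."""
    return {field: clean(raw.get(field)) for field in CVE_FIELDS}
```

Unknown keys in the raw entry are dropped, so records from different crawls end up with the same shape.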

It can be understood that multiple CPE entity objects corresponding to each CVE number are first constructed by a large language model, and the CPE data set is then used to fill in the specific data of those entity objects. By using the large model to extract information from the description text of a CVE, the CPEs affected by that CVE can be obtained. The product name, vendor name, version number, and similar information mentioned here are data fields of the CPE entity, and the CPE definition specification itself includes them. In other words, both the initially generated CPE entity objects and the CPE data set contain the product name, vendor name, version number, and so on; the difference is that the data in the CPE data set is more detailed or more accurate, which is why the supplementing step is needed.

In some embodiments, obtaining the second product data set and the second vulnerability data set from multiple devices includes: obtaining product vulnerability data of the multiple devices by calling multiple product vulnerability scanning interfaces; and extracting key fields from the product vulnerability data with a preset format conversion tool to obtain the second product data set and the second vulnerability data set. The second vulnerability data set includes the vulnerability information of the multiple devices, and the second product data set includes the product information corresponding to that vulnerability information. A unified, automated data format conversion tool converts the product vulnerability data of different manufacturers into a unified data format, which solves the problem of data inconsistency. The second product data set is thus the product vulnerability data, for the different image identifiers (ImageID), obtained by calling the product vulnerability scanning interfaces provided by the manufacturers. After the format conversion tool has converted the product vulnerability data into the unified format, the data belonging to the same image ID can be associated using the ImageID as the key, yielding the CVE list and the CPE list for that image ID. The CPE entity alignment result is then associated with the ImageID and the CVE and output in the form of the ImageID-CVE-CPEs data table. Whereas traditional CPE entity alignment methods often output only the CPE alignment result, the ImageID-CVE-CPEs table form facilitates work in the target range such as network attack behavior analysis and attack tracing.
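The conversion and grouping step might look like the sketch below. Both vendor payload layouts are invented for illustration (the source does not describe any manufacturer's actual format), as are the function names.

```python
def from_vendor_a(payload):
    """Hypothetical vendor A layout: {"image": id, "vulns": [{"id", "software"}]}."""
    return [{"image_id": payload["image"],
             "cve_id": item["id"].upper(),
             "product": item["software"]}
            for item in payload["vulns"]]

def from_vendor_b(rows):
    """Hypothetical vendor B layout: flat rows with ImageID / CVE / Product keys."""
    return [{"image_id": row["ImageID"],
             "cve_id": row["CVE"].upper(),
             "product": row["Product"]}
            for row in rows]

def group_by_image(unified):
    """Collect the CVE list and product list for each image identifier."""
    grouped = {}
    for rec in unified:
        entry = grouped.setdefault(rec["image_id"], {"cves": [], "products": []})
        if rec["cve_id"] not in entry["cves"]:
            entry["cves"].append(rec["cve_id"])
        if rec["product"] not in entry["products"]:
            entry["products"].append(rec["product"])
    return grouped

unified = (from_vendor_a({"image": "ImageID_01",
                          "vulns": [{"id": "cve-0000-0001", "software": "widget 1.0"}]})
           + from_vendor_b([{"ImageID": "ImageID_01", "CVE": "CVE-0000-0002",
                             "Product": "gadget 2.1"}]))
grouped = group_by_image(unified)
```

Each converter maps one source layout onto the same three keys, so supporting a further scanner only requires one more converter.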

In some embodiments, the part field of a CPE entity can take only three values: a denotes application/software, h denotes hardware platform, and o denotes operating system. The vendor field denotes the vendor, and the product field denotes the product name. The version field denotes the release version; its value should be the alphanumeric version string under which the vendor released the product. The update field denotes the update version; its value should be an alphanumeric string identifying a specific update or service pack of the product, such as sp3. The edition field is deprecated in version 2.3 of the specification and should be assigned the logical value ANY unless backward compatibility with version 2.2 of the CPE specification is required. The language field is a valid language tag as defined by [RFC5646] and is used to specify the language supported in the user interface of the described product, such as en, us, and so on. Based on these fields, the data supplement processing performed on the multiple product entities with the first product data set, that is, the process of using the complete CPE information obtained from CNNVD to fill in missing fields of the constructed CPE entities, includes: matching on the vendor, product, and version fields. If all three fields of a constructed product entity are present, the CNNVD CPE information and its other available fields are used to complete the fields of the constructed product entity; if some of the three fields are missing, so that an exact match is impossible, the constructed product entity is discarded.
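A minimal sketch of this supplement rule follows. Two points are assumptions not settled by the text: matching is made case-insensitive, and an entity with all three key fields but no matching reference entry is kept unchanged. The name `supplement_entities` is hypothetical.

```python
KEY_FIELDS = ("vendor", "product", "version")

def supplement_entities(entities, cpe_dataset):
    """Fill missing fields of constructed entities from the reference CPE data.

    Entities lacking any of vendor/product/version are discarded, as the text
    describes. Assumptions: case-insensitive matching; unmatched but complete
    entities are kept as they are.
    """
    index = {tuple(ref[k].lower() for k in KEY_FIELDS): ref for ref in cpe_dataset}
    completed = []
    for entity in entities:
        if not all(entity.get(k) for k in KEY_FIELDS):
            continue  # a key field is missing, exact matching is impossible
        ref = index.get(tuple(entity[k].lower() for k in KEY_FIELDS))
        merged = dict(ref) if ref else {}
        # Non-empty values already on the entity win; gaps come from the reference.
        merged.update({k: v for k, v in entity.items() if v})
        completed.append(merged)
    return completed

reference = [{"part": "a", "vendor": "Acme", "product": "Widget",
              "version": "1.0", "update": "sp3"}]
constructed = [
    {"vendor": "acme", "product": "widget", "version": "1.0", "part": None},
    {"vendor": "acme", "product": "widget", "version": None},
]
result = supplement_entities(constructed, reference)
```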

In some embodiments, it can be understood that, after the multiple product entities corresponding to the multiple vulnerability entities are constructed from the first vulnerability data set and the correspondence between product entities and CVE IDs is determined, the product entity data associated with each CVE ID can be stored in a CVE-CPEs association mapping table, completing the construction of that table. Subsequently, the CVE list can be collided with the CVE-CPEs association mapping table on the basis of the CVE data corresponding to each CVE ID, so as to select, from the multiple product entity data, the candidate product entity data associated with the target vulnerability entity in the second vulnerability data set.

In some embodiments, obtaining the product vulnerability data of the multiple devices by calling multiple product vulnerability scanning interfaces includes: obtaining the product data of a target device by calling a product scanning interface provided by a first manufacturer; and obtaining the vulnerability data of the target device by calling a vulnerability scanning interface provided by a second manufacturer. The product data and the vulnerability data of one device carry the same image identifier, and different devices have different image identifiers, so the data in the second product data set and the second vulnerability data set are associated through the image identifier. Accordingly, the target product information corresponding to a target vulnerability entity can be obtained from the second product data set by first obtaining, from the second vulnerability data set, the target image identifier corresponding to the target vulnerability entity, and then obtaining, from the second product data set, the target product information corresponding to that target image identifier.
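The two-step lookup through the image identifier can be sketched as below. The record layout (dicts with an `image_id` key) and the function name are assumptions made for illustration.

```python
def products_for_vulnerability(target_vuln, second_product_set):
    """Return the product records that share the target vulnerability's image ID."""
    # Step 1: the target image identifier comes from the vulnerability record.
    image_id = target_vuln["image_id"]
    # Step 2: product information is looked up by that same identifier.
    return [p for p in second_product_set if p["image_id"] == image_id]

second_product_set = [
    {"image_id": "ImageID_01", "product": "widget 1.0"},
    {"image_id": "ImageID_02", "product": "gadget 2.1"},
    {"image_id": "ImageID_01", "product": "libfoo 0.9"},
]
target_vuln = {"image_id": "ImageID_01", "cve_id": "CVE-0000-0001"}
found = products_for_vulnerability(target_vuln, second_product_set)
```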

Referring to FIG. 2, FIG. 2 is a flowchart of determining the target product entity data provided by an embodiment of the present application. In some embodiments, calculating the matching coefficient between the target product information and the plurality of candidate product entity data, and determining the target product entity data from the plurality of candidate product entity data based on the matching coefficient, includes but is not limited to:

Step S210: based on a preset text dictionary library, calculating the matching coefficient between the target product information and each candidate product entity data;

Step S220: determining the candidate product entity data with the highest matching coefficient as the target product entity data corresponding to the target vulnerability entity.

In some embodiments, the text dictionary library includes Chinese and English dictionary libraries. These are used to perform a matching query between the target product information and the candidate product entity data, and the candidate product entity data with the highest matching coefficient is determined as the target product entity data corresponding to the target vulnerability entity, thereby realizing a dictionary-based consistency check on the entity data.

Referring to FIG. 3, FIG. 3 is another flowchart of determining the target product entity data provided by an embodiment of the present application. In some embodiments, after the matching coefficient between the target product information and each candidate product entity data is calculated based on the preset text dictionary library, the method further includes but is not limited to:

Step S310: when the highest matching coefficient is greater than a coefficient threshold, determining the candidate product entity data with the highest matching coefficient as the target product entity data corresponding to the target vulnerability entity;

Step S320: when the highest matching coefficient is less than or equal to the coefficient threshold, calculating the word vector similarity coefficient and the edit distance similarity coefficient between the target product information and each candidate product entity data, and determining the target product entity data from the plurality of candidate product entity data based on the matching coefficient, the word vector similarity coefficient, and the edit distance similarity coefficient.
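The branch between steps S310 and S320 can be sketched as follows. The three scoring functions are passed in as callables because their internals are not fixed here; the threshold value 0.8 is an arbitrary placeholder, and the fallback weights 0.2 : 0.4 : 0.4 are the ones this description gives for its score aggregation. The name `select_target` is hypothetical.

```python
def select_target(info, candidates, dict_score, vec_score, edit_score,
                  threshold=0.8, weights=(0.2, 0.4, 0.4)):
    """Pick the target product entity from the candidates.

    dict_score / vec_score / edit_score: callables (info, candidate) -> float.
    threshold is a placeholder value, not taken from the source.
    """
    if not candidates:
        return None
    dict_scores = [dict_score(info, c) for c in candidates]
    best = max(range(len(candidates)), key=dict_scores.__getitem__)
    if dict_scores[best] > threshold:
        # S310: the dictionary match alone decides (weights 1:0:0).
        return candidates[best]
    # S320: fall back to the weighted aggregate of all three coefficients.
    w_dict, w_vec, w_edit = weights
    aggregated = [w_dict * d + w_vec * vec_score(info, c) + w_edit * edit_score(info, c)
                  for d, c in zip(dict_scores, candidates)]
    return candidates[max(range(len(candidates)), key=aggregated.__getitem__)]
```

Computing the two more expensive similarity scores only in the fallback branch reflects the coarse-then-fine ordering: most lookups can stop at the dictionary stage.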

It can be understood that traditional CPE entity alignment methods based on string matching or rule matching are insensitive to CPE variants and prone to mismatches. In addition, traditional methods for associating product vulnerability data match on fingerprint information and cannot take the context of a fingerprint into account, so fingerprints may be matched incorrectly. The embodiments of the present application therefore propose the multi-layer fuzzy matching process corresponding to steps S310 to S320, which calculates the similarity between entities in two levels: a coarse screening (dictionary-based matching) followed by a fine-grained stage (based on edit distance and on word vector space similarity). On this basis, the product vulnerability data obtained from different manufacturers is associated, and the candidate set that is screened out is input into the multi-layer entity alignment model to calculate entity similarity.

In some embodiments, determining the target product entity data from the plurality of candidate product entity data based on the matching coefficient, the word vector similarity coefficient, and the edit distance similarity coefficient includes: performing a weighted calculation on the matching coefficient, the word vector similarity coefficient, and the edit distance similarity coefficient according to a preset weight configuration to obtain an aggregated similarity coefficient; and determining the candidate product entity data with the highest aggregated similarity coefficient as the target product entity data corresponding to the target vulnerability entity. It can be understood that, after the Chinese and English dictionary libraries are used to perform a matching query on the queried CPE entity, if the highest matching coefficient is greater than the coefficient threshold, that is, the query result is consistent with the CPE entity data, the entity alignment is carried out by dictionary matching alone. If the dictionary-based consistency check does not pass, similarity matching is used instead, and the degree of match between entities is evaluated by calculating word vector space distance similarity and edit distance similarity. Specifically, the word vector space distance similarity is calculated by inputting the query CPE entity data and the CPE entity data into a pre-trained BERT-BiLSTM-CRF model to obtain their respective vector representations, calculating the cosine similarity between the two vectors, and selecting the CPE entity data with the highest similarity. The edit distance similarity measures how similar two strings are by the minimum number of operations (modification, insertion, and deletion) required to transform one string into the other. In the present invention, the Smith-Waterman distance algorithm is mainly used to find the CPE string with the smallest distance from the query CPE entity, which is taken as the candidate CPE entity data. Finally, the similarity scores obtained from the above layers of entity alignment are combined by score aggregation; the relevant weights can be as shown in Table (1):

Table (1): Score aggregation weight configuration

When the highest matching coefficient is greater than the coefficient threshold, the score weights of dictionary-based matching, word vector similarity, and edit distance similarity are 1 : 0 : 0. When the highest matching coefficient is less than or equal to the coefficient threshold, the score weights of dictionary-based matching, word vector similarity, and edit distance similarity are 0.2 : 0.4 : 0.4. With this multi-layer fuzzy matching method, the coarse screening by dictionary matching is used first for fast matching, and the edit distance and word vector similarity methods are then used for precise matching, which effectively improves the efficiency of entity alignment.
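The two fine-grained coefficients can be illustrated with the sketch below. It is not the method named in the text: plain Levenshtein distance stands in for the Smith-Waterman algorithm, and the cosine similarity takes ready-made vectors instead of producing them with a BERT-BiLSTM-CRF model. The normalization of the edit distance into a similarity in [0, 1] is likewise an assumption.

```python
import math

def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = cur
    return prev[-1]

def edit_similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer string's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else dot / norm

def aggregate(dict_s, vec_s, edit_s, weights=(0.2, 0.4, 0.4)):
    """Weighted sum using the fallback weights stated in the text."""
    return weights[0] * dict_s + weights[1] * vec_s + weights[2] * edit_s
```

With the 1 : 0 : 0 weights the aggregate reduces to the dictionary score, so a single `aggregate` function can serve both rows of the weight configuration.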

Referring to FIG. 4, FIG. 4 is a flowchart of constructing multiple product entities provided by an embodiment of the present application. In some embodiments, constructing the multiple product entities corresponding to the multiple vulnerability entities according to the first vulnerability data set includes but is not limited to:

Step S410: performing information extraction on the first vulnerability data set based on a preset large language model to obtain multiple pieces of product information corresponding to the multiple vulnerability entities;

Step S420: constructing the multiple product entities according to the multiple pieces of product information.

In some embodiments, the large language model includes the ChatGPT large model. The description field of each CVE in the first vulnerability data set is input into ChatGPT for key information extraction, which extracts product information such as the product name, vendor name, and version number affected by the current CVE, so as to construct multiple CPE product entities. In subsequent steps, the CPE data set obtained and constructed from CNNVD, that is, the first product data set, can then be used to supplement the constructed CPE entity data.
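Steps S410 and S420 can be sketched as below. The model is represented by an injected callable `llm` (text in, text out) so that no particular client library is implied; the prompt wording, the JSON reply format, and the function name are all assumptions made for illustration.

```python
import json

# Hypothetical prompt; the source does not disclose the prompt actually used.
PROMPT = (
    "Extract the affected products from the vulnerability description below. "
    'Reply with a JSON list of objects with keys "vendor", "product", "version".\n\n'
    "Description: {description}"
)

def extract_product_entities(cve_id, description, llm):
    """Build preliminary product entities for one CVE.

    llm: any callable str -> str wrapping a large language model (assumption).
    Returns [] when the reply is not a JSON list.
    """
    reply = llm(PROMPT.format(description=description))
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [{"cve_id": cve_id,
             "vendor": item.get("vendor"),
             "product": item.get("product"),
             "version": item.get("version")}
            for item in items if isinstance(item, dict)]
```

Missing keys stay as `None`, which leaves the incomplete entities visible to the later supplement step instead of hiding the gaps.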

It can be understood that traditional CPE entity alignment methods rely mainly on string matching and are prone to missed matches and false matches. The present application instead adopts a product-vulnerability association approach. It first constructs a product-vulnerability association data set and uses the ChatGPT large model to extract CPE entity information. With its strong language understanding capability, ChatGPT can extract CPE entity information from CVE description text with high accuracy. During alignment, the present application can then collide the scan results with this entity set to narrow the candidate set for entity alignment, which improves the accuracy and efficiency of the alignment and the overall processing performance.

Referring to FIG. 5, FIG. 5 is a flowchart of obtaining multiple candidate product entity data provided by an embodiment of the present application. In some embodiments, selecting, based on the association relationship between vulnerability entities and product entity data, the multiple candidate product entity data associated with the target vulnerability entity in the second vulnerability data set from the multiple product entity data includes but is not limited to:

Step S510: obtaining the vulnerability information of the target vulnerability entity in the second vulnerability data set, and determining, from the multiple vulnerability entities, the vulnerability entity corresponding to the vulnerability information;

Step S520: based on the association relationship between vulnerability entities and product entity data, selecting from the multiple product entity data the multiple candidate product entity data associated with the vulnerability entity corresponding to the vulnerability information.

In some embodiments, the process of obtaining the multiple product entity data corresponding to the multiple vulnerability entities can generate the CVE-CPEs association mapping table. This table can be understood as a full set obtained and constructed through CNNVD, whereas the CVE list obtained by scanning with the manufacturer interfaces contains only the vulnerability information present on the machine identified by the ImageID and is therefore a subset. The purpose of colliding the two is to look up the subset in the full association mapping table, that is, to select from the multiple product entity data the candidate product entity data associated with the vulnerability entity corresponding to the vulnerability information, yielding a CPE candidate set. The CPE entity information in this candidate set is standardized, whereas the CPE obtained by the present application through the manufacturer interfaces is deficient in naming conformity and field completeness. The target product entity data therefore has to be determined from the multiple candidate product entity data in order to standardize the product information on each device and achieve product entity alignment for associated vulnerabilities.

In some embodiments, in obtaining the vulnerability information and determining the vulnerability entity, the system may extract detailed information about a specific vulnerability from the second vulnerability data set, which may include the CVE identifier, the vulnerability description, the scope of impact, and the like, and identify, among all possible vulnerability entities, the entity that matches the target vulnerability information. Once the specific vulnerability entity is determined, the known association relationship between vulnerability entities and product entities is used to identify potentially affected products. The association relationship may be realized by the CVE-CPEs association mapping table constructed from the plurality of product entity data corresponding to the plurality of vulnerability entities described above; the table contains the correspondence between each CVE and the CPEs it may affect. It can be understood that determining the vulnerability entity corresponding to the vulnerability information amounts to matching each CVE in the CVE list corresponding to the second product data set against the CVE-CPEs association mapping table to determine which CPEs may be affected by a specific CVE, thereby narrowing the scope and generating a candidate CPE entity set; in other words, the CVE list is collided with the CVE-CPEs association mapping table. Through this collision, the system generates a narrowed candidate CPE entity set that contains all product entities that may be affected by the target CVE, which reduces the amount of data requiring further analysis and speeds up subsequent entity alignment and vulnerability management.
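The collision of a per-device CVE list with the full-set CVE-CPEs association mapping table can be sketched as a dictionary lookup. This is a minimal illustration, not the implementation of the present application: the function name `select_candidates`, the in-memory dictionary layout, and the identifiers and CPE strings in the sample table are all assumptions made for the example.

```python
from typing import Dict, List


def select_candidates(scanned_cves: List[str],
                      cve_to_cpes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look up each scanned CVE in the full-set mapping table.

    Returns only the CVEs that appear in the table, each with its
    candidate CPE list (the narrowed candidate set).
    """
    candidates: Dict[str, List[str]] = {}
    for cve_id in scanned_cves:
        key = cve_id.strip().upper()  # tolerate casing/whitespace noise in scan output
        if key in cve_to_cpes:
            candidates[key] = list(cve_to_cpes[key])
    return candidates


# Hypothetical full-set table; the IDs and CPE strings are made up for illustration.
CVE_TO_CPES = {
    "CVE-2099-0001": ["cpe:2.3:a:examplecorp:widget:1.0:*:*:*:*:*:*:*",
                      "cpe:2.3:a:examplecorp:widget:1.1:*:*:*:*:*:*:*"],
    "CVE-2099-0002": ["cpe:2.3:o:examplecorp:widget_os:3.2:*:*:*:*:*:*:*"],
}

if __name__ == "__main__":
    print(select_candidates([" cve-2099-0001 ", "CVE-2099-9999"], CVE_TO_CPES))
```

CVEs that are absent from the full-set table are simply dropped here; a real system might instead log them for later review.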

Referring to FIG. 6, FIG. 6 is an example flow chart of the method for processing vulnerability-associated product data provided by an embodiment of the present application. In some embodiments, in order to complete CPE entity alignment of network range product data more efficiently and accurately, and to construct an ImageID-CVE-CPEs data table that can be applied to network attack behavior analysis, attack tracing, and similar work in the range, the present application consists of three modules: associated data set construction, a multi-layer fuzzy matching model, and an alignment module. For the associated data set construction process, the present application may use a large language model, such as the ChatGPT large model, to parse the public CVE entity data obtained from CNNVD at a preset interval and extract the list of CPE entities affected by the current vulnerability; the multi-layer fuzzy matching module then uses the CPE entity database obtained from CNNVD at the preset interval to fill in missing fields of, and calibrate, the parsed CPE entity data, yielding the plurality of product entity data corresponding to the plurality of vulnerability entities, from which the CVE-CPEs association mapping table is constructed. For the entity alignment process, the present application may be based on real network attack scenarios simulated by a network range. A data format conversion module converts the product vulnerability scanning results of different vendors into data with a unified format; product data and vulnerability data are associated through the unified image identifier ImageID and collided with the CVE-CPEs association mapping table, so that a plurality of candidate product entity data associated with the target vulnerability entity in the second vulnerability data set are selected from the plurality of product entity data, which gives the CVE-CPEs entity candidate set corresponding to the current ImageID. Further, the multi-layer fuzzy matching model is used in the entity alignment process to determine the target product entity data from the plurality of candidate product entity data based on the matching coefficient and the similarity coefficients, and the ImageID-CVE-CPEs data table is updated accordingly for use by security analysts in attack detection, security analysis, and similar work. The above process can also be constructed automatically, and the resulting data table can be applied to research such as network attack behavior analysis and attack tracing in the range.

Referring to FIG. 7, FIG. 7 is an example flow chart of obtaining the plurality of product entity data provided by an embodiment of the present application. In some embodiments, corresponding to the associated data set construction process in FIG. 7, and in order to provide data support for the subsequent alignment process, the present application first constructs a basic product-vulnerability associated data set through the steps of data acquisition, key information extraction, and association table construction. Specifically, data acquisition is performed first: public CVE data and CPE data are obtained at a preset interval from the National Information Security Vulnerability Database and the National Information Security Vulnerability Sharing Platform. The CVE data and CPE data are then parsed to extract the key information. Finally, based on the descriptive text of each CVE, the ChatGPT large model is used to extract the list of CPEs affected by the current CVE, and the CPE data set is used to repair and complete that list, which gives the CVE-CPEs association mapping table data that supports the subsequent CPE entity alignment module.
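The construction of the association mapping table can be sketched as two steps: extract a partial product list from each CVE description, then complete each partial record against the reference CPE data set. The sketch below is hedged in several ways. The extraction step is passed in as a plain callable (`extract`) that stands in for the large language model; no particular model API is assumed. The completion step uses exact agreement on the known fields rather than the multi-layer fuzzy matching described in this application. The record layout (`vendor`, `product`, `version`) and all function names are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Optional

CpeRecord = Dict[str, Optional[str]]  # assumed keys: vendor, product, version


def complete_record(partial: CpeRecord, cpe_dataset: Iterable[CpeRecord]) -> CpeRecord:
    """Fill missing (None) fields from the first reference entry that
    agrees with every field the partial record does know."""
    for ref in cpe_dataset:
        if all(value is None or ref.get(key) == value
               for key, value in partial.items()):
            return dict(ref)
    return dict(partial)  # no reference entry agrees; keep the record as extracted


def build_cve_cpes_table(cve_records: Iterable[Dict[str, str]],
                         extract: Callable[[str], List[CpeRecord]],
                         cpe_dataset: List[CpeRecord]) -> Dict[str, List[CpeRecord]]:
    """Map each CVE ID to its list of completed product records."""
    table: Dict[str, List[CpeRecord]] = {}
    for record in cve_records:
        partials = extract(record["description"])  # stand-in for the LLM extraction step
        table[record["id"]] = [complete_record(p, cpe_dataset) for p in partials]
    return table
```

Passing the extractor as a parameter keeps the sketch testable with a stub and leaves the choice of model, prompt, and output parsing open.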

Referring to FIG. 8, FIG. 8 is an example flow chart of calculating the matching coefficient provided by an embodiment of the present application. In some embodiments, corresponding to the multi-layer fuzzy matching process in FIG. 8, the model may apply three methods in sequence, namely dictionary-based value matching, edit-distance-based similarity calculation, and similarity calculation based on distance in the word vector space, to match entity values and compute their similarity. Specifically, entity values may first be matched using a Chinese-English dictionary library and a synonym lexicon. For entity values that cannot be matched in this way, the similarity between entities is computed separately with an edit-distance (Smith-Waterman) similarity function and with a word vector model. Finally, the two similarities are aggregated into a score, and the entity object with the highest similarity score is selected. The word vector model may be the named entity recognition model BERT-Bi-LSTM-CRF. It can be understood that the matching process in FIG. 8 is matching between specific entity data: the reference CPE entity corresponds to the target product information obtained from the second product data set for the target vulnerability entity, and the CPE entity to be queried corresponds to the candidate product entity data of the target vulnerability entity.
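The layered matching described above can be sketched as follows. Several parts are assumptions made for illustration: the synonym lexicon is a plain dictionary, the Smith-Waterman score is normalized by the best possible score of the shorter string, the equal weights are arbitrary, and, most importantly, a character-bigram cosine similarity is swapped in for the word vector model so that the example stays self-contained. It is not the BERT-Bi-LSTM-CRF model named in the text.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def normalize(value: str, synonyms: Dict[str, str]) -> str:
    """Lowercase, trim, and map known synonyms onto a canonical form."""
    key = value.strip().lower()
    return synonyms.get(key, key)


def smith_waterman_similarity(a: str, b: str, match: int = 2,
                              mismatch: int = -1, gap: int = -1) -> float:
    """Best local alignment score, normalized to the range [0, 1]."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best / (match * min(len(a), len(b)))


def bigram_cosine(a: str, b: str) -> float:
    """Cosine similarity of character-bigram counts (stand-in for a word vector model)."""
    va = Counter(a[i:i + 2] for i in range(len(a) - 1))
    vb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def best_match(query: str, candidates: List[str], synonyms: Dict[str, str],
               weights: Tuple[float, float] = (0.5, 0.5)) -> Tuple[str, float]:
    """Layer 1: dictionary match. Layers 2 and 3: aggregated similarity score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    q = normalize(query, synonyms)
    for cand in candidates:
        if normalize(cand, synonyms) == q:
            return cand, 1.0
    w_edit, w_vec = weights
    scored = []
    for cand in candidates:
        c = normalize(cand, synonyms)
        score = w_edit * smith_waterman_similarity(q, c) + w_vec * bigram_cosine(q, c)
        scored.append((score, cand))
    score, cand = max(scored)
    return cand, score


if __name__ == "__main__":
    print(best_match("apache htpd", ["apache httpd", "nginx"], {}))
```

The dictionary layer short-circuits with a score of 1.0, so the more expensive similarity computations run only for values that cannot be matched directly, which mirrors the ordering in the text.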

Referring to FIG. 9, FIG. 9 is an example flow chart of determining the target product entity data provided by an embodiment of the present application. In some embodiments, corresponding to the multi-layer fuzzy matching CPE entity alignment process in FIG. 9, and in order to align the product data scanned by different vendors with the CPE entity data set, the module includes three steps: unified format conversion of the scan data, product-vulnerability association of the scan data, and entity alignment calculation. Specifically, a format conversion tool may first be used to convert the product vulnerability data scanned by different vendors, yielding scanned CVE and CPE data in a unified format. The image identifier ImageID is then used to associate the scanned CVE data with the scanned CPE data, yielding the ImageID-CVE-CPEs data table. For the CVE list of an image ImageID, the CVE ID is used to collide with the CVE data set, yielding a smaller CPE entity candidate set; that is, a plurality of candidate product entity data associated with the target vulnerability entity in the second vulnerability data set are selected from the plurality of product entity data. Finally, the multi-layer fuzzy matching model is used to compute the similarity between the CPE entity candidate set and the query CPE entity data, the entity with the highest similarity score is selected, and the ImageID-CVE-CPEs data table is updated, completing the entity alignment process. The vendor product vulnerability scanning interfaces 1 to n in FIG. 9 correspond to a plurality of different vendors.
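The first two steps, unified format conversion and ImageID-based association, can be sketched as a field renaming followed by a group-by on the image identifier. The vendor names, the raw field names in `FIELD_MAPS`, and the unified schema (`image_id`, `cve_id`, `product`) are hypothetical; real scanner outputs will differ and would need their own mappings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical per-vendor field names mapped onto an assumed unified schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "vendor_a": {"imageId": "image_id", "cve": "cve_id", "software": "product"},
    "vendor_b": {"img": "image_id", "vuln_id": "cve_id", "pkg": "product"},
}


def to_unified(vendor: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename one vendor-specific record's fields to the unified schema."""
    mapping = FIELD_MAPS[vendor]
    return {unified: record[raw] for raw, unified in mapping.items() if raw in record}


def build_image_table(unified_records: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group scanned CVEs and product strings by image identifier."""
    table = defaultdict(lambda: {"cves": set(), "products": set()})
    for rec in unified_records:
        entry = table[rec["image_id"]]
        if "cve_id" in rec:
            entry["cves"].add(rec["cve_id"])
        if "product" in rec:
            entry["products"].add(rec["product"])
    # Sort for a stable, comparable result.
    return {image: {"cves": sorted(e["cves"]), "products": sorted(e["products"])}
            for image, e in table.items()}


if __name__ == "__main__":
    records = [
        to_unified("vendor_a", {"imageId": "img-1", "software": "widget 1.0"}),
        to_unified("vendor_b", {"img": "img-1", "vuln_id": "CVE-2099-0001"}),
    ]
    print(build_image_table(records))
```

In this usage example the product data comes from one vendor's interface and the vulnerability data from another, and the two are joined only because they carry the same image identifier.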

An embodiment of the present application also provides a device for processing vulnerability-associated product data, including: a data acquisition module, configured to acquire a first product data set and a first vulnerability data set from a public database, and to acquire a second product data set and a second vulnerability data set from a plurality of devices; an entity construction module, configured to construct a plurality of product entities corresponding to a plurality of vulnerability entities according to the first vulnerability data set; a data supplementing module, configured to perform data supplementing processing on the plurality of product entities based on the first product data set to obtain a plurality of product entity data corresponding to the plurality of vulnerability entities; an entity association module, configured to select, from the plurality of product entity data, a plurality of candidate product entity data associated with a target vulnerability entity in the second vulnerability data set, based on the association relationship between vulnerability entities and product entity data; an entity determination module, configured to acquire target product information corresponding to the target vulnerability entity from the second product data set; and an entity alignment module, configured to calculate a matching coefficient between the target product information and the plurality of candidate product entity data, and to determine the target product entity data from the plurality of candidate product entity data based on the matching coefficient. The device may be used to execute the method for processing vulnerability-associated product data of any of the above embodiments, and thereby achieves the corresponding effects: the vulnerability-product association relationship is obtained from the public database, the product entity data corresponding to the vulnerabilities in the target device is determined, product entity alignment of the associated vulnerabilities is realized, and the accuracy and reliability of product entity alignment are improved. These effects are not described again here.

Some embodiments of the present application provide an electronic device. FIG. 10 is a schematic structural diagram of the electronic device provided by an embodiment of the present application. Referring to FIG. 10, the electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the method for processing vulnerability-associated product data of any of the above embodiments is implemented, for example, method steps S110 to S150 in FIG. 1, method steps S210 to S220 in FIG. 2, method steps S310 to S320 in FIG. 3, method steps S410 to S420 in FIG. 4, and method steps S510 to S520 in FIG. 5 described above.

The electronic device 1000 of the embodiment of the present application includes one or more processors 1010 and a memory 1020. FIG. 10 takes one processor 1010 and one memory 1020 as an example.

The processor 1010 and the memory 1020 may be connected via a bus or by other means; FIG. 10 takes connection via a bus as an example.

The memory 1020, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory 1020 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1020 may optionally include memory located remotely from the processor 1010, and such remote memory may be connected to the electronic device 1000 via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

In some embodiments, when executing the computer program, the processor executes the method for processing vulnerability-associated product data of any of the above embodiments at a preset interval.

Those skilled in the art will understand that the device structure shown in FIG. 10 does not limit the electronic device 1000, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

In the electronic device 1000 shown in FIG. 10, the processor 1010 may be used to invoke the processing method for vulnerability-associated product data stored in the memory 1020, thereby implementing that method.

Based on the hardware structure of the electronic device 1000 described above, the embodiments of the device for processing vulnerability-associated product data of the present application are proposed. The non-transitory software programs and instructions required to implement the method for processing vulnerability-associated product data of the above embodiments are stored in the memory and, when executed by the processor, perform that method.

An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions for executing the above method for processing vulnerability-associated product data. The instructions may cause the one or more processors described above to execute the method of any of the above embodiments, for example, method steps S110 to S150 in FIG. 1, method steps S210 to S220 in FIG. 2, method steps S310 to S320 in FIG. 3, method steps S410 to S420 in FIG. 4, and method steps S510 to S520 in FIG. 5 described above.

An embodiment of the present application also provides a computer program product, which includes a computer program stored in a computer-readable storage medium. A processor of a computer device reads the computer program from the computer-readable storage medium and executes it, so that the computer device executes the method for processing vulnerability-associated product data of any of the above embodiments, for example, method steps S110 to S150 in FIG. 1, method steps S210 to S220 in FIG. 2, method steps S310 to S320 in FIG. 3, method steps S410 to S420 in FIG. 4, and method steps S510 to S520 in FIG. 5 described above.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated; that is, they may be located in one place or distributed across multiple network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer-readable storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer-readable storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

The above is a specific description of the preferred implementations of the present application, but the present application is not limited to these implementations. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims

1. A method for processing vulnerability-associated product data, comprising:

obtaining a first product data set and a first vulnerability data set from a public database, and obtaining a second product data set and a second vulnerability data set from a plurality of devices;

constructing a plurality of product entities corresponding to a plurality of vulnerability entities according to the first vulnerability data set;

performing data supplementing processing on the plurality of product entities based on the first product data set to obtain a plurality of product entity data corresponding to the plurality of vulnerability entities;

selecting a plurality of candidate product entity data associated with a target vulnerability entity in the second vulnerability data set from the plurality of product entity data based on the association relationship between the vulnerability entity and the product entity data;

acquiring target product information corresponding to the target vulnerability entity from the second product data set;

calculating a matching coefficient between the target product information and the plurality of candidate product entity data, and determining target product entity data corresponding to the target vulnerability entity from the plurality of candidate product entity data based on the matching coefficient;

wherein the obtaining a first product data set and a first vulnerability data set from a public database comprises: obtaining product log data and vulnerability log data from the public database according to a preset interval duration; performing data cleaning processing on the product log data and the vulnerability log data, and performing key field extraction processing on the cleaned product log data and the cleaned vulnerability log data to obtain the first product data set and the first vulnerability data set;

wherein the obtaining a second product data set and a second vulnerability data set from a plurality of devices comprises: acquiring product vulnerability data of the plurality of devices by calling a plurality of product vulnerability scanning interfaces; performing key field extraction processing on the plurality of product vulnerability data based on a preset format conversion tool to obtain the second product data set and the second vulnerability data set, wherein the second vulnerability data set comprises vulnerability information of the plurality of devices, and the second product data set comprises product information corresponding to the vulnerability information of the plurality of devices;

wherein the constructing a plurality of product entities corresponding to a plurality of vulnerability entities according to the first vulnerability data set comprises: performing information extraction processing on the first vulnerability data set based on a preset large language model to obtain a plurality of product information corresponding to the plurality of vulnerability entities; constructing the plurality of product entities according to the plurality of product information;

wherein the selecting, based on the association relationship between the vulnerability entity and the product entity data, a plurality of candidate product entity data associated with the target vulnerability entity in the second vulnerability data set from the plurality of product entity data comprises: obtaining vulnerability information of the target vulnerability entity in the second vulnerability data set, and determining a vulnerability entity corresponding to the vulnerability information from the plurality of vulnerability entities; selecting a plurality of candidate product entity data associated with the vulnerability entity corresponding to the vulnerability information from the plurality of product entity data based on the association relationship between the vulnerability entity and the product entity data;

wherein the data in the second product data set and the data in the second vulnerability data set are associated through image identifiers, and the acquiring target product information corresponding to the target vulnerability entity from the second product data set comprises: acquiring a target image identifier corresponding to the target vulnerability entity from the second vulnerability data set; acquiring target product information corresponding to the target image identifier from the second product data set;

wherein the vulnerability entity is a specific security vulnerability defined in the first vulnerability data set, and the product entity is a product instance object under the specific security vulnerability, constructed according to the information in the first vulnerability data set.

2. The method for processing vulnerability-associated product data according to claim 1, wherein the calculating a matching coefficient between the target product information and the plurality of candidate product entity data, and determining target product entity data corresponding to the target vulnerability entity from the plurality of candidate product entity data based on the matching coefficient, comprises:

calculating, based on a preset text dictionary library, a matching coefficient between the target product information and each candidate product entity data;

and determining the candidate product entity data with the highest matching coefficient as the target product entity data corresponding to the target vulnerability entity.

3. The method for processing vulnerability-associated product data according to claim 2, wherein after calculating the matching coefficients between the target product information and each candidate product entity data based on a preset text dictionary library, the method further comprises:

in the case that the highest matching coefficient is greater than a coefficient threshold, determining the candidate product entity data corresponding to the highest matching coefficient as the target product entity data corresponding to the target vulnerability entity;

and in the case that the highest matching coefficient is less than or equal to the coefficient threshold, respectively calculating a word vector similarity coefficient and an edit distance similarity coefficient between the target product information and each candidate product entity data, and determining the target product entity data from the plurality of candidate product entity data based on the matching coefficient, the word vector similarity coefficient and the edit distance similarity coefficient.

4. The method for processing vulnerability-associated product data according to claim 3, wherein the determining the target product entity data from the plurality of candidate product entity data based on the matching coefficient, the word vector similarity coefficient and the edit distance similarity coefficient comprises:

performing a weighted calculation on the matching coefficient, the word vector similarity coefficient and the edit distance similarity coefficient based on a preset weight configuration to obtain an aggregate similarity coefficient;

and determining the candidate product entity data corresponding to the highest aggregate similarity coefficient as the target product entity data.

5. The method for processing vulnerability-associated product data according to claim 1, wherein the step of obtaining the product vulnerability data of the plurality of devices by calling the plurality of product vulnerability scanning interfaces comprises:

acquiring product data of a target device by calling a product scanning interface provided by a first vendor;

acquiring vulnerability data of the target device by calling a vulnerability scanning interface provided by a second vendor;

wherein the product data and the vulnerability data have the same image identifier.

6. The method of processing vulnerability-associated product data of claim 1, further comprising:

after determining the target product entity data corresponding to the target vulnerability entities of the devices, constructing a product entity alignment table for representing the association relationship among the devices, the target vulnerability entities and the target product entity data.

7. A device for processing vulnerability-associated product data, comprising:

a data acquisition module, configured to acquire a first product data set and a first vulnerability data set from a public database, and to acquire a second product data set and a second vulnerability data set from a plurality of devices;

an entity construction module, configured to construct a plurality of product entities corresponding to a plurality of vulnerability entities according to the first vulnerability data set;

a data supplementing module, configured to perform data supplementing processing on the plurality of product entities based on the first product data set to obtain a plurality of product entity data corresponding to the plurality of vulnerability entities;

an entity association module, configured to select a plurality of candidate product entity data associated with a target vulnerability entity in the second vulnerability data set from the plurality of product entity data based on the association relationship between the vulnerability entity and the product entity data;

an entity determination module, configured to acquire target product information corresponding to the target vulnerability entity from the second product data set;

an entity alignment module, configured to calculate a matching coefficient between the target product information and the plurality of candidate product entity data, and to determine target product entity data corresponding to the target vulnerability entity from the plurality of candidate product entity data based on the matching coefficient.

8. An electronic device, comprising:

at least one processor;

at least one memory for storing at least one program;

wherein the method for processing vulnerability-associated product data according to any one of claims 1 to 6 is implemented when the at least one program is executed by the at least one processor.

9. A computer readable storage medium storing computer executable instructions for performing a method of processing vulnerability-associated product data as claimed in any one of claims 1 to 6.