CN115952208A - A method, device and electronic equipment for calculating the degree of association between features and tags - Google Patents
- Publication number
- CN115952208A (application number CN202310008414.8A)
- Authority
- CN
- China
- Prior art keywords
- verification data
- target
- data set
- value
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Embodiments of the present application provide a method, a device, and electronic equipment for calculating the degree of association between features and labels, applied in the field of information technology. With the method of the present application, the degree of association between a target feature and a target label can be measured on an already trained model to be detected, and the measured degree of association is specific to that model and accurate for it. The verification data set is input into the model to be detected, the values of the target feature in the verification data set are perturbed multiple times, and the value of the target label is predicted. In this way the degree of association between the target feature and the target label in the model to be detected is analyzed automatically, without manual input and calculation. The result of the measurement directly expresses the influence of the target feature on the target label, and the influence of all the other features on the target label does not need to be predicted. Applying the method of the present application can therefore also greatly reduce wasted computing power.
Description
Technical field
The present application relates to the field of information technology, and in particular to a method, device and electronic equipment for calculating the degree of association between features and labels.
Background
As artificial intelligence is applied more and more widely, it has traditionally been held that deep neural networks need no feature screening: the full set of features can be loaded into the deep model and the model learns automatically, giving important features higher weights and unimportant features lower weights. Computing over the full feature set, however, requires a large amount of computing power. A production model is usually retrained in full every day on the latest training data, so performing no feature screening at all wastes a large amount of computing power.
To reduce this waste, the related art commonly either calculates the Pearson correlation coefficient between a feature and a label to obtain their correlation, or deletes a feature, retrains the model, and infers from the resulting performance whether the feature is important.
However, the Pearson correlation coefficient between a feature and a label captures only common mathematical relationships that hold in theory, whereas real environments are more complex. The Pearson-based method is therefore not suitable for real models.
The other method judges the degree of association between a feature and a label by the final performance of the model after features are added or removed. Although this method can be used in real scenarios, when a sample contains multiple features, a prediction must be made with each feature added or removed in turn, and the prediction results must be compared before the weight of the target feature for the target label in the model to be detected can be determined. Its advantage is that the resulting degree of association has been verified in practice and can be applied to real models, for example network models that predict a real-estate quota or video playback traffic. But analyzing the association of all features with the target label usually requires retraining on the data with one feature removed each time: N features require N training runs, and N is usually greater than 50. Judging the association by the model's final performance after adding or removing features therefore still consumes considerable computing power.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method, device and electronic equipment for calculating the degree of association between features and labels, so that the degree of association can be obtained with low consumption of computing power. The specific technical solutions are as follows:
In a first aspect of the present application, a method for calculating the degree of association between features and labels is provided, the method comprising:
receiving detection parameters from a client component, wherein the detection parameters include the parameters required to calculate the degree of association between a target feature and a target label;
calling a model to be detected according to the detection parameters, wherein the model to be detected is used to predict the value of the target label from the input target feature;
obtaining a verification data set through the detection parameters, wherein the verification data set includes values of the target feature and is used as input to the model to be detected;
inputting the verification data set into the model to be detected to obtain a first prediction result;
modifying the values of the target feature in the verification data set according to the detection parameters to obtain a modified verification data set;
inputting the modified verification data set into the model to be detected for prediction to obtain a second prediction result;
calculating the degree of association between the target feature and the target label in the model to be detected according to the second prediction result and the first prediction result.
In a possible implementation, the detection parameters include a perturbation method and a number of perturbations, wherein the perturbation method indicates how the value of the target feature is modified and the number of perturbations indicates how many times it is modified. Modifying the values of the target feature in the verification data set according to the detection parameters to obtain a modified verification data set includes:
modifying, according to the perturbation method, the value of the target feature in each piece of verification data in the verification data set to obtain a modified verification data set;
repeating, according to the number of perturbations, the step of modifying the value of the target feature in each piece of verification data according to the perturbation method, to obtain as many modified verification data sets as the number of perturbations.
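The two steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the row-as-dictionary layout, the function names, and the `jitter` perturbation method are all assumptions introduced here.

```python
import random

def perturb_once(rows, target, method, rng):
    """Return a copy of the data set with `target` modified in every row."""
    return [{**row, target: method(row[target], rng)} for row in rows]

def perturb_repeatedly(rows, target, method, times, seed=0):
    """Produce `times` modified copies, one per perturbation round."""
    rng = random.Random(seed)
    return [perturb_once(rows, target, method, rng) for _ in range(times)]

# Hypothetical perturbation method: scale the value by a random factor in [0.5, 1.5].
def jitter(value, rng):
    return value * rng.uniform(0.5, 1.5)

rows = [{"assets": 100.0, "age": 30}, {"assets": 250.0, "age": 45}]
copies = perturb_repeatedly(rows, "assets", jitter, times=3)
```

Each round works on a copy, so the original verification data set stays available for the first prediction, and only the target feature column differs between copies.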
In a possible implementation, the perturbation method is perturbation according to a normal distribution. Modifying, according to the perturbation method, the value of the target feature in each piece of verification data in the verification data set includes:
calculating the mean and variance of the values of the target feature in the verification data set, treated as normally distributed, to obtain the normal distribution curve of the values of the target feature in the verification data set;
randomly changing the value of the target feature in each piece of verification data along the normal distribution curve.
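One way to read this implementation is sketched below. It assumes that "changing the value along the normal distribution curve" means drawing a replacement value from the fitted normal distribution; the source does not spell this out, and the function names and data layout are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance

def fit_normal(rows, target):
    """Mean and (population) variance of the target feature in the data set."""
    values = [row[target] for row in rows]
    mu = fmean(values)
    return mu, pvariance(values, mu)

def perturb_normal(rows, target, rng):
    """Replace each target value with a draw from the fitted normal distribution."""
    mu, variance = fit_normal(rows, target)
    sigma = math.sqrt(variance)
    return [{**row, target: rng.gauss(mu, sigma)} for row in rows]

rows = [{"assets": v} for v in (10.0, 20.0, 30.0, 40.0)]
mu, variance = fit_normal(rows, "assets")
modified = perturb_normal(rows, "assets", random.Random(0))
```

Drawing from the fitted distribution keeps the perturbed values in the range the model saw for this feature, which is presumably why the source fits the curve on the verification data set itself.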
In a possible implementation, calculating the degree of association between the target feature and the target label in the model to be detected according to the second prediction result and the first prediction result includes:
calculating the average of the differences between the second prediction result and the first prediction result to obtain the degree of association between the target feature and the target label in the model to be detected.
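As a small worked sketch of this averaging step: the source says only "the average of the differences", so taking the absolute value of each difference (so that positive and negative changes do not cancel) is an assumption made here, and the numbers are made up for illustration.

```python
from statistics import fmean

def association_degree(first, second_runs):
    """Mean absolute difference between perturbed and original predictions."""
    diffs = [
        abs(s - f)
        for second in second_runs          # one second prediction result per perturbation
        for s, f in zip(second, first)     # compare row by row with the first result
    ]
    return fmean(diffs)

first = [10.0, 20.0, 30.0]                              # first prediction result
second_runs = [[12.0, 17.0, 30.0], [10.0, 24.0, 29.0]]  # two perturbation rounds
score = association_degree(first, second_runs)          # (2+3+0+0+4+1) / 6
```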
In a possible implementation, the method further includes:
feeding back the second prediction result of each prediction to the client component as an intermediate result.
In a possible implementation, the method further includes:
storing the degree of association and/or the intermediate results in a database;
receiving query information from the client component, the query information including the names of the target feature and the target label to be queried;
looking up in the database the degree of association corresponding to the names of the target feature and the target label, and feeding back a query result containing the degree of association to the client component.
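The storage and query steps could look roughly like the sketch below. The source does not name a database; SQLite, the table layout, the function names, and the stored value are all assumptions chosen only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE association (feature TEXT, label TEXT, degree REAL, "
    "PRIMARY KEY (feature, label))"
)

def store_degree(conn, feature, label, degree):
    """Store (or overwrite) the degree of association for a feature/label pair."""
    conn.execute(
        "INSERT OR REPLACE INTO association VALUES (?, ?, ?)",
        (feature, label, degree),
    )
    conn.commit()

def query_degree(conn, feature, label):
    """Look up the degree by target feature name and target label name."""
    row = conn.execute(
        "SELECT degree FROM association WHERE feature = ? AND label = ?",
        (feature, label),
    ).fetchone()
    return None if row is None else row[0]

store_degree(conn, "total_assets", "loan_amount", 0.82)  # made-up value
result = query_degree(conn, "total_assets", "loan_amount")
missing = query_degree(conn, "id_number", "loan_amount")
```

Keying the table on the feature and label names mirrors the query information described above, so a repeated query can be answered without rerunning the model.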
In a second aspect of the present application, a device for calculating the degree of association between features and labels is also provided, the device comprising:
a detection parameter receiving module, configured to receive detection parameters from a client component, wherein the detection parameters include the parameters required to calculate the degree of association between a target feature and a target label;
a model calling module, configured to call a model to be detected according to the detection parameters, wherein the model to be detected is used to predict the value of the target label from the input target feature;
a verification data set acquisition module, configured to obtain a verification data set through the detection parameters, wherein the verification data set includes values of the target feature and is used as input to the model to be detected;
a first prediction result obtaining module, configured to input the verification data set into the model to be detected to obtain a first prediction result;
a verification data set modification module, configured to modify the values of the target feature in the verification data set according to the detection parameters to obtain a modified verification data set;
a second prediction result obtaining module, configured to input the modified verification data set into the model to be detected for prediction to obtain a second prediction result;
an association degree calculation module, configured to calculate the degree of association between the target feature and the target label in the model to be detected according to the second prediction result and the first prediction result.
In a possible implementation, the detection parameters include a perturbation method and a number of perturbations, wherein the perturbation method indicates how the value of the target feature is modified and the number of perturbations indicates how many times it is modified. The verification data set modification module includes:
a verification data set modification submodule, specifically configured to modify, according to the perturbation method, the value of the target feature in each piece of verification data in the verification data set to obtain a modified verification data set;
a repeated modification submodule, specifically configured to repeat, according to the number of perturbations, the step of modifying the value of the target feature in each piece of verification data according to the perturbation method, to obtain as many modified verification data sets as the number of perturbations.
In a possible implementation, the perturbation method is perturbation according to a normal distribution, and the verification data set modification submodule includes:
a normal distribution curve calculation unit, specifically configured to calculate the mean and variance of the values of the target feature in the verification data set, treated as normally distributed, to obtain the normal distribution curve of the values of the target feature in the verification data set;
a normal distribution modification unit, specifically configured to randomly change the value of the target feature in each piece of verification data along the normal distribution curve.
In a possible implementation, the association degree calculation module includes:
an average calculation submodule, specifically configured to calculate the average of the differences between the second prediction result and the first prediction result to obtain the degree of association between the target feature and the target label in the model to be detected.
In a possible implementation, the device further includes:
an intermediate result feedback module, configured to feed back the second prediction result of each prediction to the client component as an intermediate result.
In a possible implementation, the device further includes:
a feedback result storage module, configured to store the degree of association and/or the intermediate results in a database;
a query information receiving module, configured to receive query information from the client component, the query information including the names of the target feature and the target label to be queried;
a query result feedback module, configured to look up in the database the degree of association corresponding to the names of the target feature and the target label, and to feed back a query result containing the degree of association to the client component.
In a third aspect of the embodiments of the present application, electronic equipment is provided, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is used to store a computer program;
the processor is used to implement, when executing the program stored in the memory, any of the method steps of the first aspect of the embodiments of the present application.
In yet another aspect of the present application, a computer-readable storage medium is also provided, in which a computer program is stored; when executed by a processor, the computer program implements any of the method steps of the first aspect of the embodiments of the present application.
In yet another aspect of the present application, a computer program product containing instructions is also provided which, when run on a computer, causes the computer to execute any of the methods for calculating the degree of association between features and labels described above.
With the method, device and electronic equipment for calculating the degree of association between features and labels provided by the embodiments of the present application, the degree of association between a target feature and a target label can be measured on an already trained model to be detected. The measured degree of association is specific to that model, accurate for it, and can be applied to it. The verification data set is input into the model to be detected, the values of the target feature in the verification data set are perturbed multiple times, and the value of the target label is predicted, so that the degree of association between the target feature and the target label in the model to be detected is analyzed automatically, without manual input and calculation. In addition, measuring the association only requires perturbing the values of the target feature and predicting the value of the target label from them. The result directly expresses the influence of the target feature on the target label; there is no need to predict the influence of every feature on the target label and then derive the influence of the target feature by comparison. Applying the method of the present application can therefore greatly reduce the amount of computation and the waste of computing power.
Description of drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below.
FIG. 1 is a flow chart of a method for calculating the degree of association between features and labels provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the distribution of a target feature in a verification data set provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a device for calculating the degree of association between features and labels provided by an embodiment of the present application;
FIG. 4 is a block diagram of a system for calculating the degree of association between features and labels provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of electronic equipment provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
In the related art, the correlation between a feature and a label can be obtained from a statistical indicator, by calculating the Pearson correlation coefficient between the feature and the label. The target feature is represented as one array and the target label as another, and the relationship between the target label and the target feature is derived from the relationship between the two arrays. The relationships obtained this way, however, are usually limited to standard forms: a first-order (linear) relationship, a quadratic relationship (original value versus its square), a cubic relationship (original value versus its cube), an exponential relationship, a logarithmic relationship, and so on. Real scenarios are complex, so this kind of numerical analysis is hard to make comprehensive and cannot be applied to real models.
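As an illustration of how a purely statistical indicator can miss a dependence, the sketch below computes the Pearson coefficient by hand for two made-up labels: one linear in the feature and one equal to its square. The data and function name are hypothetical; the point is only that the coefficient measures linear dependence.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length arrays."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

feature = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
linear_label = [2.0 * x + 1.0 for x in feature]   # label depends linearly on the feature
squared_label = [x * x for x in feature]          # label is fully determined by the feature

r_linear = pearson(feature, linear_label)    # close to 1
r_squared = pearson(feature, squared_label)  # 0, although the dependence is exact
```

The squared label is completely determined by the feature, yet its coefficient is zero on this symmetric range, which is the kind of gap that motivates measuring the association on the trained model instead.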
The related art also includes a method that judges by the final performance of the model after features are added or removed. Its advantage is that the result has been verified in practice. But analyzing the association of all features with the target label usually requires retraining on the data with one feature removed each time, so M features require M training runs. M is usually greater than 50, so this consumes a large amount of computing resources.
To solve the technical problems of the above methods, in a first aspect of the present application, a method for calculating the degree of association between features and labels is provided. The method includes the steps shown in FIG. 1:
Step S101: Receive detection parameters from the client component.
The detection parameters include the parameters required to calculate the degree of association between the target feature and the target label. For example, the detection parameters may include: information about the model to be detected, information about the verification data set, the name of the target feature, the name of the target label, and perturbation mode information. The model to be detected is an already trained model used to obtain the target label, and the information about the model to be detected indicates how to obtain that model. This information can take several forms. For example, when the user uploads the model to be detected directly as a file, the information is the file address or file name of the model. The user may instead fill in an access address for the model, such as an HDFS/S3 (Hadoop Distributed File System / Simple Storage Service) address, in which case the information is that access address.
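For illustration, the detection parameters listed above could be carried in a structure like the one sketched below. The class name, field names, and addresses are all hypothetical; the source does not define a concrete format for the parameters.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DetectionParams:
    model_location: str              # file name or access address of the model to be detected
    dataset_location: str            # access address of the verification data set
    target_features: Tuple[str, ...] # one or more target feature names
    target_label: str                # name of the target label
    perturbation_method: str         # e.g. "shuffle" or "normal" (assumed identifiers)
    perturbation_times: int          # number of perturbations

    def __post_init__(self):
        if not self.target_features:
            raise ValueError("at least one target feature is required")
        if self.perturbation_times < 1:
            raise ValueError("perturbation_times must be >= 1")

params = DetectionParams(
    model_location="hdfs://example/models/loan_model",       # made-up address
    dataset_location="hdfs://example/data/loan_validation",  # made-up address
    target_features=("total_assets",),
    target_label="loan_amount",
    perturbation_method="shuffle",
    perturbation_times=50,
)
```

Validating the parameters on receipt lets the server reject a malformed request before any model is loaded.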
The verification data set contains multiple pieces of verification data, and each piece of verification data includes a value of the target feature. The information about the verification data set can likewise take several forms; for example, it can be the HDFS/S3 address of the verification data set, or it can be obtained in other ways.
Because multiple features are input into the model to be detected, and the output may also consist of multiple results, the user enters the name of the target feature at the client to indicate which feature is to be treated as the target feature, and enters the name of the target label to indicate which result is to be treated as the target label. The target feature is the feature whose degree of association with the target label the user wants to measure, chosen according to the user's own needs. The user may enter the name of one target feature or the names of several. In the verification data set, the target feature is usually represented as a numerical value. The value of the target label is the output of the model to be detected.
The perturbation mode information is the way, chosen by the user, in which the values of the target feature in the verification data set are perturbed. It may include the perturbation method and the number of perturbations.
In one example, the detection parameters received from the client component include: the access address of an already trained network model that predicts loan amounts; the access address of a verification data set in which each piece of verification data includes the features "lender's age", "lender's consumption record", "lender's total assets" and "lender's ID number"; the target feature name "lender's total assets"; the target label name "loan amount"; and perturbation mode information specifying that the order of the target feature's values in the verification data set is randomly shuffled 50 times. In this example, the values of the feature "lender's total assets" in the verification data set are modified in order to calculate, for the loan amount prediction model, the degree of association between the target feature "lender's total assets" and the target label "loan amount" output by the model.
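The shuffle perturbation in this example can be sketched as follows, under the assumption that "randomly shuffling the order" means permuting the target feature's values across rows while leaving every other feature in place. The field names and the four data rows are made up for illustration.

```python
import random

def shuffle_target(rows, target, rng):
    """Permute the target feature's values across rows; other features stay put."""
    values = [row[target] for row in rows]
    rng.shuffle(values)
    return [{**row, target: v} for row, v in zip(rows, values)]

rng = random.Random(42)
rows = [
    {"age": 31, "total_assets": 120.0},
    {"age": 45, "total_assets": 560.0},
    {"age": 27, "total_assets": 80.0},
    {"age": 52, "total_assets": 910.0},
]
# One modified verification data set per perturbation, 50 in total.
shuffled_sets = [shuffle_target(rows, "total_assets", rng) for _ in range(50)]
```

Shuffling keeps the overall distribution of the target feature unchanged and only breaks its pairing with the rest of each record.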
步骤S102:根据检测参数,调用待检测模型。Step S102: Call the model to be detected according to the detection parameters.
其中,待检测模型用于依据输入的目标特征预测目标标签的值;具体的,可以识别步骤S101中的检测参数中的待检测模型信息,根据待检测模型信息,获取并调用待检测模型。例如,当待检测模型信息是待检测模型的访问地址时,则通过获取到的该访问地址,访问到待检测模型并进行调用。Wherein, the model to be detected is used to predict the value of the target label according to the input target features; specifically, the model information to be detected in the detection parameters in step S101 can be identified, and the model to be detected can be acquired and invoked according to the information of the model to be detected. For example, when the information of the model to be detected is the access address of the model to be detected, the model to be detected is accessed and invoked through the obtained access address.
步骤S103:通过检测参数获取验证数据集。Step S103: Obtain a verification data set by detecting parameters.
其中,验证数据集包括目标特征的取值,验证数据集用于输入进待检测模型中;在获取验证数据集时,可以采用与步骤S102相同的方法,根据验证数据集信息,获取并调用验证数据集。例如,当验证数据集信息是验证数据集的访问地址时,则通过获取到的该访问地址,访问到验证数据集并进行调用。Among them, the verification data set includes the value of the target feature, and the verification data set is used to input into the model to be tested; when obtaining the verification data set, the same method as step S102 can be used to obtain and call the verification data set according to the information of the verification data set. data set. For example, when the verification data set information is the access address of the verification data set, the verification data set is accessed and called through the obtained access address.
Step S104: input the verification data set into the model to be detected to obtain a first prediction result.
Step S105: modify the values of the target feature in the verification data set according to the detection parameters to obtain a modified verification data set.
In practical applications, the perturbation mode information in the detection parameters received in step S101 is identified, and the values of the target feature in the verification data set are modified according to the perturbation method and the number of perturbations specified in that information, yielding the modified verification data set.
Step S106: input the modified verification data set into the model to be detected for prediction to obtain a second prediction result.
Step S107: calculate the degree of association between the target feature and the target label in the model to be detected from the second prediction result and the first prediction result.
In practical applications, if the degree of association between the target feature and the target label is high, that is, the target feature strongly influences the output of the model to be detected, then modifying the values of the target feature in the verification data set produces a correspondingly noticeable change in the target label value output by the model.
For example, suppose the model to be detected predicts a loan limit and the input verification data set contains two features: the lender's total assets and the lender's ID number. The lender's total assets strongly influence the output loan limit, so when "total assets of the lender" is the target feature, the more its value is changed, the more the output loan limit changes. The lender's ID number has little influence on the output loan limit, so when "lender ID number" is the target feature, even a large change to its value does not change the output loan limit much. The degree of association between the target feature and the target label in the model to be detected can therefore be calculated from the first prediction result and the second prediction result.
Specifically, the degree of association between the target feature and the target label in the model to be detected can be calculated in several ways. For example, it can be obtained from the difference between the first prediction result and the second prediction result, or from the ratio of the two.
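The two options mentioned above, difference and ratio, can be illustrated with a minimal Python sketch. The toy model, the feature names, and the sample values are assumptions made purely for illustration; the application does not prescribe them.

```python
from statistics import mean

def toy_loan_model(row):
    # Hypothetical stand-in for the model to be detected: the loan limit
    # depends on total assets and ignores the ID number entirely.
    return 0.5 * row["total_assets"]

def average_prediction(model, dataset):
    # Average the per-record predictions into a single prediction result.
    return mean(model(row) for row in dataset)

def association_by_difference(first, second):
    # Difference between the first and the second prediction result.
    return first - second

def association_by_ratio(first, second):
    # Ratio of the second prediction result to the first (first must be non-zero).
    return second / first

dataset = [
    {"total_assets": 100.0, "id_number": 1},
    {"total_assets": 300.0, "id_number": 2},
]
first = average_prediction(toy_loan_model, dataset)

# Modify one feature at a time (hand-picked changes, for illustration only).
assets_changed = [{**row, "total_assets": row["total_assets"] * 2} for row in dataset]
ids_changed = [{**row, "id_number": row["id_number"] + 1000} for row in dataset]

second_assets = average_prediction(toy_loan_model, assets_changed)
second_ids = average_prediction(toy_loan_model, ids_changed)

print(association_by_difference(first, second_assets))  # non-zero: assets matter
print(association_by_difference(first, second_ids))     # zero: ID number is ignored
print(association_by_ratio(first, second_assets), association_by_ratio(first, second_ids))
```

In this sketch the difference is signed, so it is the magnitude of the result that reflects how strongly the feature influences the prediction; a ratio of 1 plays the same role as a difference of 0.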
By applying the method of this embodiment, the degree of association between the target feature and the target label is detected on a model that has already been trained, so the detected degree of association is specific to that model, reliable, and applicable to it. The verification data set is input into the model to be detected, the values of the target feature in the verification data set are perturbed, and the value of the target label is predicted; the degree of association between the target feature and the target label in the model is thus analysed automatically, without manual input and calculation. In addition, only the values of the target feature need to be perturbed in order to predict the target label, and the result is a degree of association that directly represents the influence of the target feature on the target label. There is no need to predict the influence of every feature on the target label and then derive the influence of the target feature by comparison. The method can therefore greatly reduce the amount of computation and the waste of computing power.
In practical applications, performing the calculation solely on the server leads to a heavy computational load and low efficiency. The method of the present application can therefore be applied on the network side, which includes a server component and a training resource component. The steps for calculating the degree of association described above are executed in the training resource component and managed by the server component, which feeds the degree of association back to the client.
Specifically, the training resource component may include multiple container platforms, each with a corresponding identifier; given a preset container platform identifier, the training resource component finds the matching preset container platform. A user can choose one or more container platforms to execute the method according to their needs. Accordingly, the detection parameters that the user enters on the client may include a preset container platform identifier used to determine the preset container platform. The identifier can be of various types, for example the access address of the preset container platform or its name. The method of this embodiment can also be implemented through the following steps:
Step 1: the server component receives the detection parameters from the client component, identifies the preset container platform identifier in the detection parameters, and sends an invocation instruction to the training resource component.
Step 2: according to the received invocation instruction, the training resource component invokes the container platform corresponding to the preset container platform identifier and the model to be detected corresponding to the model information, executes steps S102 to S107, and feeds the degree of association back to the server component.
Step 3: the server component receives the feedback result from the training resource component, which includes the degree of association between the target feature and the target label, and sends the feedback result to the client component.
The feedback result includes the degree of association between the target feature and the target label, which can be returned as a dictionary of the form {feature: degree of association}. The degree of association represents the influence of the target feature on the target label: the larger it is, the greater the influence of the target feature on the target label, and the smaller it is, the smaller that influence.
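The three steps above can be sketched as a small in-process dispatch. This is only an illustrative outline: the class and function names, the platform identifiers, and the placeholder task are all hypothetical, and a real deployment would involve network calls between separate components.

```python
class ContainerPlatform:
    # Hypothetical container platform: it runs whatever task it is given.
    def __init__(self, identifier):
        self.identifier = identifier

    def run(self, task, params):
        return task(params)

# Training resource component: container platforms keyed by their identifier.
CONTAINER_PLATFORMS = {
    "self-built-k8s": ContainerPlatform("self-built-k8s"),
    "public-cloud": ContainerPlatform("public-cloud"),
}

def training_resource_component(instruction, task):
    # Step 2: select the platform named in the instruction and run the task on it.
    platform = CONTAINER_PLATFORMS[instruction["platform_id"]]
    return platform.run(task, instruction["params"])

def server_component(detection_params, task):
    # Step 1: identify the platform identifier and send the invocation instruction.
    instruction = {
        "platform_id": detection_params["platform_id"],
        "params": detection_params,
    }
    feedback = training_resource_component(instruction, task)
    # Step 3: pass the feedback result on to the client component.
    return feedback

def placeholder_task(params):
    # Stands in for steps S102 to S107; returns {feature: degree of association}.
    return {feature: 0.0 for feature in params["feature_names"]}

result = server_component(
    {"platform_id": "self-built-k8s", "feature_names": ["total_assets", "id_number"]},
    placeholder_task,
)
print(result)
```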
By applying the method of this embodiment, the degree of association between the target feature and the target label in the model to be detected is calculated in the training resource component and fed back to the client component through the server component, which reduces the computing power consumed by the server component and improves the efficiency of calculating the degree of association. Moreover, once the degrees of association are known, features with a low degree of association with the target label can be removed when the model is used to predict the target label, which reduces the amount of computation.
The perturbation mode information in the detection parameters received from the client component in step S101 includes a perturbation method and a number of perturbations. Modifying the values of the target feature in the verification data set according to the detection parameters in step S105, to obtain the modified verification data set, includes the following steps:
Step 1: according to the perturbation method, modify the value of the target feature in each piece of verification data in the verification data set to obtain a modified verification data set;
Step 2: repeat step 1 according to the number of perturbations, obtaining as many modified verification data sets as the number of perturbations.
The perturbation method specifies how the values of the target feature are modified. Several perturbation methods are available, for example perturbation according to a normal distribution, perturbation according to a uniform distribution, perturbation by shuffling the values of the target feature, and perturbation by directly assigning a constant value; the user can choose the method according to their needs. The number of perturbations is the number of times the values of the target feature are modified, and may also be preset according to business requirements. The more perturbations are performed, the more accurate the resulting degree of association.
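Steps 1 and 2 above can be sketched as a loop that applies a pluggable perturbation method a given number of times. The representation of the verification data set as a list of dictionaries, the function names, and the placeholder method are assumptions for illustration only.

```python
import random

def make_perturbed_datasets(dataset, feature, method, times, seed=0):
    # Step 2: repeat step 1 `times` times, always starting from the original values.
    rng = random.Random(seed)
    original_values = [row[feature] for row in dataset]
    modified_sets = []
    for _ in range(times):
        # Step 1: modify the target feature in every piece of verification data.
        new_values = method(original_values, rng)
        modified_sets.append(
            [{**row, feature: value} for row, value in zip(dataset, new_values)]
        )
    return modified_sets

def reverse_values(values, rng):
    # Placeholder perturbation method, used here only to exercise the loop.
    return list(reversed(values))

dataset = [{"x": 1.0, "y": 10}, {"x": 2.0, "y": 20}, {"x": 3.0, "y": 30}]
modified_sets = make_perturbed_datasets(dataset, "x", reverse_values, times=4)
print(len(modified_sets), modified_sets[0])
```

Only the target feature changes in each modified data set; the other features and the original data set are left untouched.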
By applying the method of this embodiment, only the values of the target feature need to be perturbed in order to predict the value of the target label. The degree of association between the target feature and the target label in the model to be detected is obtained from the change in the target label value after the target feature values have been perturbed several times, which reduces the amount of computation on the network side and achieves the technical effect of reducing wasted computing power.
The following takes perturbation according to a normal distribution as an example. In this case, modifying the value of the target feature in each piece of verification data according to the perturbation method in step 1 above includes the following steps:
Step 1: calculate the mean and variance of the values of the target feature in the verification data set, treating them as normally distributed, to obtain the normal distribution curve of the target feature values in the verification data set;
Step 2: randomly change the value of the target feature in each piece of verification data along the normal distribution curve.
In this case, the values of the target feature are randomly perturbed according to a normal distribution, so that different verification data sets are obtained automatically in the normal-distribution manner. As shown in Figure 2, in practical applications the values of the target feature in the verification data set may be scattered irregularly, as in the histogram of Figure 2. The normal distribution computed for them is shown as the curve in Figure 2, and the values of the target feature in the verification data set can then be modified so that they conform to this curve. Applying the method of this embodiment provides one way of perturbing the values of the target feature and adds to the variety of available perturbation methods.
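One possible reading of the two steps above is to fit a normal distribution to the observed values and then draw a fresh value for each piece of verification data from it. The sketch below follows that reading; the use of the population variance and the function name are assumptions, since the application does not specify them.

```python
import random
from statistics import mean, pvariance

def perturb_normal(values, rng):
    # Step 1: mean and variance of the observed target-feature values.
    mu = mean(values)
    sigma = pvariance(values) ** 0.5  # population variance is an assumption
    # Step 2: replace every value with a random draw from N(mu, sigma^2).
    return [rng.gauss(mu, sigma) for _ in values]

values = [12.0, 15.0, 9.0, 30.0, 14.0]
perturbed = perturb_normal(values, random.Random(42))
print(perturbed)
```

Passing a seeded `random.Random` instance makes each perturbation reproducible, which can help when intermediate results need to be checked later.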
The following takes perturbation according to a uniform distribution as an example. In this case, modifying the value of the target feature in each piece of verification data according to the perturbation method in step 1 above includes the following steps:
Step 1: calculate the mean of the target feature values over all pieces of verification data in the verification data set;
Step 2: set the value of the target feature in each piece of verification data to that mean.
In this case, the values of the target feature are perturbed according to the uniform-distribution method, so that modified verification data sets are obtained automatically in the uniform-distribution manner. Applying the method of this embodiment provides one way of perturbing the values of the target feature and adds to the variety of available perturbation methods.
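The two steps above amount to replacing every value of the target feature with the mean of the column. A minimal sketch, with a hypothetical function name:

```python
from statistics import mean

def perturb_to_mean(values):
    # Step 1: mean of the target-feature values over the verification data set.
    average = mean(values)
    # Step 2: every piece of verification data receives that mean.
    return [average for _ in values]

print(perturb_to_mean([1.0, 2.0, 6.0]))
```

Because this operation is deterministic, repeating it on the same verification data set yields the same modified data set each time.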
The following takes perturbation by shuffling the values of the target feature as an example. In this case, modifying the value of the target feature in each piece of verification data according to the perturbation method in step 1 above includes the following step:
Keep the values of the target feature in the verification data set unchanged and shuffle their order.
Applying the method of this embodiment provides one way of perturbing the values of the target feature and adds to the variety of available perturbation methods.
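Shuffling can be sketched as permuting the column of target-feature values across the pieces of verification data, which keeps the set of values intact while breaking their pairing with the other features. The function name and sample values are illustrative assumptions.

```python
import random

def perturb_shuffle(values, rng):
    # Copy first so the original verification data set is left untouched.
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

values = [100.0, 300.0, 250.0, 80.0]
shuffled = perturb_shuffle(values, random.Random(7))
print(shuffled)
```

This is similar in spirit to what is commonly called permutation importance, although the application does not use that term.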
The following takes perturbation by directly assigning a constant value as an example. In this case, modifying the value of the target feature in each piece of verification data according to the selected perturbation method in step 1 above includes the following step:
Set the value of the target feature in every piece of verification data in the verification data set to one constant.
Applying the method of this embodiment provides one way of perturbing the values of the target feature and adds to the variety of available perturbation methods.
In step S107 above, calculating the degree of association between the target feature and the target label in the model to be detected from the second prediction result and the first prediction result includes:
Calculating the average of the differences between the one or more second prediction results and the first prediction result to obtain the degree of association between the target feature and the target label in the model to be detected.
If the degree of association between the target feature and the target label is high, the prediction result changes considerably after each perturbation of the target feature values. Because the verification data set contains multiple pieces of verification data, the prediction results obtained for the individual pieces can be averaged to obtain the first prediction result.
In one example, the verification data set is input into the model to be detected and the first prediction result obtained is avg(R). After the values of the target feature in the verification data set have been perturbed several times, each modified verification data set is input into the model to be detected, yielding multiple second prediction results avg(R_1'), avg(R_2'), avg(R_3'), ..., avg(R_N'), where N is the number of perturbations. The degree of association between the target feature and the target label in the model to be detected can then be calculated by the following formula:
$$D = \frac{\bigl(\mathrm{avg}(R)-\mathrm{avg}(R_1')\bigr)+\bigl(\mathrm{avg}(R)-\mathrm{avg}(R_2')\bigr)+\bigl(\mathrm{avg}(R)-\mathrm{avg}(R_3')\bigr)+\dots+\bigl(\mathrm{avg}(R)-\mathrm{avg}(R_N')\bigr)}{N}$$
Here, D is the degree of association between the target feature and the target label in the model to be detected; R denotes the prediction results obtained for each piece of verification data in the verification data set; R_1' denotes the prediction results obtained for each piece of verification data in the verification data set after the first perturbation; R_2' and R_3' denote the corresponding prediction results after the second and third perturbations; R_N' denotes the prediction results obtained for each piece of verification data in the verification data set after the N-th perturbation; and N is the number of perturbations.
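The formula for D can be written as a short Python function for checking small cases by hand. The function and argument names are illustrative; `first_predictions` corresponds to R and each entry of `perturbed_predictions` to one R_i'.

```python
from statistics import mean

def association_degree(first_predictions, perturbed_predictions):
    # avg(R): average prediction on the unmodified verification data set.
    base = mean(first_predictions)
    # One term avg(R) - avg(R_i') per perturbation.
    differences = [base - mean(predictions) for predictions in perturbed_predictions]
    # Divide the sum of the N terms by N.
    return sum(differences) / len(differences)

R = [10.0, 20.0]
R_perturbed = [[12.0, 22.0], [14.0, 24.0]]  # N = 2 perturbations
print(association_degree(R, R_perturbed))
```

With these numbers, avg(R) is 15, the two perturbed averages are 17 and 19, and D = ((15 - 17) + (15 - 19)) / 2 = -3.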
In one example, when the preset container platform executes the task of calculating the degree of association between the target feature and the target label, the task can be run according to the following pseudocode:
Obtain parameters model_dir, dataset_dir, feature_names, function, epochs  // receive the detection parameters sent by the client
model = load_model(model_dir)  // load the model to be detected from the model information
dataset = load dataset_dir; R = mean(model.predict(dataset))  // load the verification data set from the verification data set information, input it into the model to be detected, and average the predictions
for fn in feature_names:
    analyse the data distribution of fn in dataset  // compute the value distribution of the target feature in the verification data set
    R' = []  // average predictions for the perturbed data sets of fn
    for i in range(epochs):
        modify the values of feature fn in dataset using function, giving new_dataset_i  // perturb the values of the target feature
        R'.append(mean(model.predict(new_dataset_i)))  // record the average prediction for each modified verification data set
    fn_R' = mean(R')  // average prediction for feature fn over all perturbations
    association[fn] = R - fn_R'  // degree of association between the target feature and the target label
return all {fn: association[fn]}  // return the results to the server component
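A runnable Python sketch of the same procedure is given below. To stay self-contained it takes an in-memory model (a plain callable) and an in-memory data set (a list of dictionaries) instead of model_dir and dataset_dir; these representations, the toy model, and the constant perturbation used in the demonstration are assumptions for illustration.

```python
import random
from statistics import mean

def feature_label_association(model, dataset, feature_names, function, epochs, seed=0):
    rng = random.Random(seed)
    # First prediction result: average prediction on the unmodified data set.
    base = mean(model(row) for row in dataset)
    association = {}
    for fn in feature_names:
        values = [row[fn] for row in dataset]
        perturbed_means = []  # one average prediction per perturbation
        for _ in range(epochs):
            new_values = function(values, rng)  # perturb the target feature
            new_dataset = [{**row, fn: v} for row, v in zip(dataset, new_values)]
            perturbed_means.append(mean(model(row) for row in new_dataset))
        # Average of (first result - second result) over all perturbations.
        association[fn] = base - mean(perturbed_means)
    return association

def toy_loan_model(row):
    # Hypothetical model: the loan limit depends only on total assets.
    return 0.5 * row["total_assets"]

def set_to_zero(values, rng):
    # Constant-value perturbation; the constant 0.0 is an arbitrary choice.
    return [0.0 for _ in values]

dataset = [
    {"total_assets": 100.0, "id_number": 1.0},
    {"total_assets": 300.0, "id_number": 2.0},
]
result = feature_label_association(
    toy_loan_model, dataset, ["total_assets", "id_number"], set_to_zero, epochs=3
)
print(result)
```

In this toy case the feature the model depends on receives a non-zero degree of association, while the ignored feature receives zero.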
Applying the method of this embodiment provides a way of quantifying the relationship between the target feature and the target label in the model to be detected. The result is a degree of association that directly represents the influence of the target feature on the target label, without the need to predict the influence of every feature on the target label and derive the influence of the target feature by comparison. The method can therefore reduce the amount of computation and the waste of computing power.
In the above embodiments, when a container platform in the training resource component performs the steps of calculating the degree of association between the target feature and the target label in the model to be detected, the container platforms pre-stored in the training resource component are of two kinds: self-built container platforms and public container platforms. A self-built container platform can be accessed only by designated users, whereas a public container platform can be accessed by all users.
Specifically, a designated user may be an authenticated user, and the authentication may determine from the user's IP address and network information whether the user is a designated user. For example, users connected to a particular local area network can be authenticated as designated users, so that a user can access the self-built container platform only while connected to that network. A self-built container platform is a self-built resource pool or machine room that accepts access only from designated users and therefore offers better security; it can be, for example, a Kubernetes (k8s) cluster or a Mesos cluster. A public container platform can be accessed by multiple users; examples are AWS (Amazon Web Services), Tencent Cloud, Huawei Cloud, and Alibaba Cloud. Using a public container platform requires an interface adapted to that platform.
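The IP-based check described above could, for instance, test whether the client address lies inside the designated local area network. The subnet, the platform kinds, and the function names below are hypothetical; a real system would typically combine this with stronger authentication.

```python
import ipaddress

# Hypothetical subnet of the local area network whose users are designated users.
DESIGNATED_NETWORK = ipaddress.ip_network("192.168.10.0/24")

def is_designated_user(client_ip):
    # A user counts as designated when the address belongs to the subnet.
    return ipaddress.ip_address(client_ip) in DESIGNATED_NETWORK

def can_access(platform_kind, client_ip):
    # Public platforms accept all users; self-built ones only designated users.
    if platform_kind == "public":
        return True
    return is_designated_user(client_ip)

print(can_access("self-built", "192.168.10.7"))
print(can_access("self-built", "10.0.0.1"))
print(can_access("public", "10.0.0.1"))
```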
When the preset container platform executes the task of calculating the degree of association, the values of the target feature in the verification data set must be perturbed several times and predictions must be made for each modified verification data set. This involves a large amount of computation and is time-consuming, so the computing task can be executed on a GPU (Graphics Processing Unit).
By applying the method of this embodiment, the degree of association between the target feature and the target label of the model to be detected is calculated by the preset container platform on the GPU, which reduces the computational load on the CPU (Central Processing Unit) and improves the efficiency of calculating the degree of association.
In this embodiment of the present application, the second prediction result of each prediction can also be fed back to the client component as an intermediate result.
Specifically, the intermediate results may include the perturbed values of the target feature in the verification data set after each perturbation, and the second prediction result that the model to be detected produces for the modified verification data set after each perturbation. After the training resource component feeds the intermediate results back to the server component, the server component can pass them directly to the client.
By applying the method of this embodiment, the intermediate results produced while executing the prediction task are also fed back to the server component, which makes it easier for the user to inspect how the task ran.
In practical applications, the server component also includes a database, in which the calculated degree of association between the target feature and the target label in the model to be detected and/or the intermediate results of the above embodiment can be stored. The method of this embodiment can also be implemented through the following steps:
Step 1: store the degree of association and/or the intermediate results in the database.
Step 2: receive query information from the client component, where the query information includes the names of the target feature and the target label to be queried.
Step 3: look up in the database the degree of association corresponding to the target feature name and target label name, and feed the query result containing the degree of association back to the client component.
Specifically, after the query information from the client component is received, the names of the target feature and target label in the query information are matched against the target feature and target label names in the database to find the corresponding degree of association, and the query result containing that degree of association is fed back to the client component.
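The store-and-query steps above can be sketched with an in-memory SQLite database. The table layout, column names, and function names are assumptions chosen for illustration; the application does not specify a database or schema.

```python
import sqlite3

def open_store():
    # In-memory database standing in for the server component's database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE association ("
        "feature TEXT, label TEXT, degree REAL, PRIMARY KEY (feature, label))"
    )
    return conn

def save_association(conn, feature, label, degree):
    # Step 1: store (or overwrite) the degree of association.
    conn.execute(
        "INSERT OR REPLACE INTO association VALUES (?, ?, ?)",
        (feature, label, degree),
    )
    conn.commit()

def query_association(conn, feature, label):
    # Steps 2 and 3: match feature and label names, return the stored degree.
    row = conn.execute(
        "SELECT degree FROM association WHERE feature = ? AND label = ?",
        (feature, label),
    ).fetchone()
    return None if row is None else row[0]

conn = open_store()
save_association(conn, "total_assets", "loan_limit", 0.42)
print(query_association(conn, "total_assets", "loan_limit"))
print(query_association(conn, "id_number", "loan_limit"))
```

A query for a feature and label pair that has not been stored returns `None` in this sketch, which the server component could translate into an empty query result.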
By applying the method of this embodiment, the user can query the degree of association between the target feature and the target label as well as the results of the run. The user can view the intermediate prediction results, check the calculation by hand, or use the results for diagnosis when the system fails, which makes the system more user-friendly.
Specifically, the query result may include the feedback result containing the degree of association between the target feature and the target label in the model to be detected, the intermediate results fed back by the training resource component, and a review of the degree of association. The review is determined by the magnitude of the degree of association: when it is large, the review states that the target feature has a strong influence on the target label; when it is small, the review states that the target feature has little influence on the target label and recommends removing the feature.
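The review rule described above could be sketched as a simple threshold test. The threshold value and the wording of the messages are assumptions; the application only states that the review depends on whether the degree of association is large or small.

```python
def review_association(feature, degree, threshold=0.05):
    # The magnitude is compared with a hypothetical threshold.
    if abs(degree) >= threshold:
        return f"{feature}: strong influence on the target label"
    return f"{feature}: little influence on the target label; removal recommended"

print(review_association("total_assets", 0.42))
print(review_association("id_number", 0.0))
```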
By applying the method of this embodiment, the received results are stored in the database for the client to query, so that the user can see the results of the task more directly.
A second aspect of the embodiments of the present application further provides a device for calculating the degree of association between features and labels. As shown in Figure 3, the device includes:
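a detection parameter receiving module 301, configured to receive detection parameters from the client component, where the detection parameters include the parameters required to calculate the degree of association between the target feature and the target label;
a model invoking module 302, configured to invoke the model to be detected according to the detection parameters, where the model to be detected is used to predict the value of the target label from the input target feature;
a verification data set obtaining module 303, configured to obtain the verification data set according to the detection parameters, where the verification data set contains values of the target feature and is used as input to the model to be detected;
a first prediction result obtaining module 304, configured to input the verification data set into the model to be detected to obtain the first prediction result;
a verification data set modification module 305, configured to modify the values of the target feature in the verification data set according to the detection parameters to obtain the modified verification data set;
a second prediction result obtaining module 306, configured to input the modified verification data set into the model to be detected for prediction to obtain the second prediction result;
a degree of association calculation module 307, configured to calculate the degree of association between the target feature and the target label in the model to be detected from the second prediction result and the first prediction result.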
By applying the device of this embodiment, the degree of association between the target feature and the target label is detected on a model that has already been trained, so the detected degree of association is specific to that model, reliable, and applicable to it. The verification data set is input into the model to be detected, the values of the target feature in the verification data set are perturbed, and the value of the target label is predicted; the degree of association between the target feature and the target label in the model is thus analysed automatically, without manual input and calculation. In addition, only the values of the target feature need to be perturbed in order to predict the target label, and the result is a degree of association that directly represents the influence of the target feature on the target label. There is no need to predict the influence of every feature on the target label and then derive the influence of the target feature by comparison. The method of the present application can therefore greatly reduce the amount of computation and the waste of computing power.
In a possible implementation, the detection parameters include a perturbation method and a number of perturbations, where the perturbation method specifies how the values of the target feature are modified and the number of perturbations is the number of times they are modified. The verification data set modification module includes:
a verification data set modification sub-module, specifically configured to modify, according to the perturbation method, the value of the target feature in each piece of verification data in the verification data set to obtain a modified verification data set;
a repeated modification sub-module, specifically configured to repeat, according to the number of perturbations, the step of modifying the value of the target feature in each piece of verification data in the verification data set according to the perturbation method, obtaining as many modified verification data sets as the number of perturbations.
By applying the device of this embodiment, only the values of the target feature need to be perturbed in order to predict the value of the target label. The degree of association between the target feature and the target label in the model to be detected is obtained from the change in the target label value after the target feature values have been perturbed several times, which reduces the amount of computation on the network side and achieves the technical effect of reducing wasted computing power.
In a possible implementation, the perturbation method is perturbation according to a normal distribution, and the verification data set modification submodule includes:
a normal distribution curve calculation unit, configured to calculate the mean and variance of the target feature's values in the verification data set, treating them as normally distributed, to obtain the normal distribution curve of the target feature's values in the verification data set;
a normal distribution modification unit, configured to randomly change the value of the target feature in each verification record along the normal distribution curve.
With the device of this embodiment, a method of perturbing the value of the target feature is provided, which adds to the variety of available perturbation methods.
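The normal-distribution perturbation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: it assumes each verification record is a dictionary, and that "randomly changing the value along the normal distribution curve" means drawing a replacement value from the fitted distribution. The names `perturb_normal`, `rows`, `feature`, `times` and `seed` are hypothetical.

```python
import random
import statistics


def perturb_normal(rows, feature, times, seed=0):
    """Return `times` modified copies of `rows` with `feature` resampled.

    Assumption: the replacement value is drawn from a normal distribution
    fitted to the feature's values in the verification data set.
    """
    values = [row[feature] for row in rows]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; the variance is std ** 2
    rng = random.Random(seed)  # seeded only to make the sketch reproducible

    modified_sets = []
    for _ in range(times):
        # Copy every record and replace only the target feature's value.
        modified = [{**row, feature: rng.gauss(mean, std)} for row in rows]
        modified_sets.append(modified)
    return modified_sets
```

The original records are left untouched, so the same verification data set can still be used to produce the first prediction result.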
In a possible implementation, the association degree calculation module includes:
an average value calculation submodule, configured to calculate the average of the differences between the second prediction result and the first prediction result, to obtain the degree of association between the target feature and the target label in the model to be detected.
With the device of this embodiment, a method of quantifying the relationship between the target feature and the target label in the model to be detected is provided. The result is a degree of association that directly expresses the influence of the target feature on the target label, without having to predict the influence of every feature on the target label and then compare them to isolate the influence of the target feature. The method of this application can therefore reduce the amount of computation and the waste of computing power.
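The averaging step can be illustrated with a short sketch. The text only says that the differences between the second and first prediction results are averaged; how the differences are paired and whether their sign is kept are not specified, so the per-record pairing and the optional `absolute` flag below are assumptions, as are the names `association_degree`, `first` and `second_runs`.

```python
from statistics import fmean


def association_degree(first, second_runs, absolute=False):
    """Average the differences between perturbed and baseline predictions.

    first:       baseline predictions, one per verification record
    second_runs: one list of predictions per perturbation round
    absolute:    hypothetical option; signed differences can cancel out
    """
    diffs = []
    for second in second_runs:
        for base, perturbed in zip(first, second):
            diff = perturbed - base
            diffs.append(abs(diff) if absolute else diff)
    return fmean(diffs)
```

With signed differences, perturbations that raise and lower the prediction by equal amounts average to zero, which is why a variant based on absolute differences may be worth considering.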
In a possible implementation, the device further includes:
an intermediate result feedback module, configured to feed back the second prediction result of each prediction to the client component as an intermediate result.
With the device of this embodiment, the intermediate results produced while executing the prediction task can also be fed back to the server component, which makes it easier for the user to query the progress of the task.
In a possible implementation, the device further includes:
a feedback result storage module, configured to store the degree of association and/or the intermediate results in a database;
a query information receiving module, configured to receive query information from the client component, the query information including the names of the target feature and target label to be queried;
a query result feedback module, configured to look up in the database the degree of association corresponding to the target feature and target label names, and to feed back a query result containing the degree of association to the client component.
With the device of this embodiment, the user can query the degree of association between the target feature and the target label as well as the run results. The user can view the intermediate prediction results, check the calculation by hand, or run diagnostics when the system fails, which makes the device more user-friendly.
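The storage and query modules could look roughly like the sketch below. The application does not name a database engine or a schema, so SQLite, the `association` table and the function names `open_store`, `save_degree` and `query_degree` are all assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) result database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS association ("
        "feature TEXT, label TEXT, degree REAL, "
        "PRIMARY KEY (feature, label))"
    )
    return conn


def save_degree(conn, feature, label, degree):
    """Store the degree of association for one feature/label pair."""
    conn.execute(
        "INSERT OR REPLACE INTO association VALUES (?, ?, ?)",
        (feature, label, degree),
    )
    conn.commit()


def query_degree(conn, feature, label):
    """Look up a stored degree of association; None if nothing was stored."""
    row = conn.execute(
        "SELECT degree FROM association WHERE feature = ? AND label = ?",
        (feature, label),
    ).fetchone()
    return None if row is None else row[0]
```

Keying the table on the feature and label names mirrors the query information described above, which carries exactly those two names.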
In practical applications, if the method for calculating the degree of association of the first aspect of the embodiments is applied in a training resource component, the third aspect of this application can also provide a system for calculating the degree of association between features and labels. Specifically, as shown in FIG. 4, the system includes:
a client component 401 (Client), configured to obtain the detection parameters input by a user, send the detection parameters to the server component, and receive and display the feedback result sent by the server component, which includes the degree of association between the target feature and the target label. The detection parameters include information on the model to be detected, information on the verification data set, a preset container platform identifier, the names of the target feature and the target label, and perturbation mode information;
a server component 402 (Server), configured to receive the detection parameters sent by the client, send a call instruction to the training resource component according to the preset container platform identifier, receive the feedback result returned by the training resource component, which includes the degree of association between the target feature and the target label, and send the feedback result to the client component;
a training resource component 403, configured to: call, according to the received call instruction, the container platform corresponding to the preset container platform identifier and the model to be detected corresponding to the model information; obtain the verification data set through the verification data set information; input the verification data set into the model to be detected to obtain a first prediction result; modify the value of the target feature in the verification data set according to the perturbation mode information to obtain a modified verification data set; input the modified verification data set into the model to be detected for prediction to obtain a second prediction result; calculate the degree of association between the target feature and the target label in the model to be detected from the second prediction result and the first prediction result; and feed back the degree of association to the server component.
With the system of this embodiment, the degree of association between a target feature and a target label is detected on an already trained model to be detected, so the detected degree of association is specific to that model, reliable, and applicable to it. The detection parameters are passed to the network side, which inputs the verification data set into the model to be detected through the preset container platform, perturbs the target feature values in the verification data set multiple times, and predicts the target label values. This allows the association between the target feature and the target label in the model to be analyzed automatically, without manual input and calculation. In addition, detecting the association only requires perturbing the target feature values and predicting the target label values. The result is a degree of association that directly expresses the influence of the target feature on the target label, without having to predict the influence of every feature on the target label and then compare them to isolate the influence of the target feature. The method of this application can therefore greatly reduce the amount of computation and the waste of computing power.
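As an illustration of what the client component might send to the server component, the detection parameters listed above can be grouped into a single record. This is only a sketch: the field names, the default values and the JSON encoding are assumptions, since the application lists the kinds of information carried but not a concrete format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DetectionParams:
    """Hypothetical container for the detection parameters."""
    model_info: str                 # identifies the model to be detected
    verification_dataset_info: str  # where to obtain the verification data set
    container_platform_id: str      # preset container platform identifier
    target_feature: str
    target_label: str
    perturbation_method: str = "normal"  # assumed default
    perturbation_count: int = 10         # assumed default

    def to_json(self):
        """Serialize the parameters for sending to the server component."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        """Rebuild the parameters on the receiving side."""
        return cls(**json.loads(text))
```

A frozen dataclass keeps the parameters immutable while they pass through the client, server and training resource components.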
As shown in FIG. 4 above, the client component 401 includes the following modules:
a command line tool module (CLI), configured to obtain the detection parameters that the user enters through the command line tool and to send the detection parameters to the server component;
a web module (Web), configured to receive and display the feedback result sent by the server component, which includes the degree of association between the target feature and the target label.
The web module can also provide a URL for the system. By clicking the URL, the user enters the client interface of the system, where the detection parameters can be entered; the interface can also display the feedback result sent by the server component, which includes the degree of association between the target feature and the target label. The command line tool module provides the user with a command line tool through which the detection parameters can be typed in.
With the system of this embodiment, the user can enter the detection parameters in more than one way, through either the web module or the command line tool module, which gives the user more choice of input method.
As shown in FIG. 4, the client component 401 further includes an interface call input module (RESTful API), configured to establish a connection with a third-party system and so integrate with it.
Specifically, the interface call input module provides advanced users with an interface for connecting a third-party system. By supplying the parameters specific to this interface, an advanced user integrates the third-party system with this system, so that detection parameters from the third-party system are passed to the client through the interface. Here, an advanced user is a user who has a third-party system.
With the system of this embodiment, integration with third-party systems is possible through an API, so that users can interact with the system of this embodiment directly from the third-party system, which broadens the ways in which the system can be used.
In a possible implementation, the command line tool module is further configured to receive query information sent by the user, the query information including the names of the target feature and target label to be queried;
the web module is further configured to receive and display the query result containing the degree of association sent by the server component.
The web module in effect provides a visual interface. Through it the user can see not only the final and intermediate feedback results for the degree of association between the target feature and the target label, but also the detection parameters that were entered, the running state of the analysis task, and resource usage status information.
The running state of an analysis task is one of three states: running, succeeded, or failed. When displaying the running state, the client component can also, on the user's instruction, display the intermediate results of calculating the degree of association between the target feature and the target label in the model to be detected. The resource usage status information includes the calling status of the preset container platform and the GPU memory usage. The calling status of the preset container platform may include the number of preset container platforms called and their operating environment.
With the method of this embodiment, the progress of calculating the degree of association between the target feature and the target label in the model to be detected can be displayed in the client, so that the user can view all of this information directly, which makes the system more intuitive and convenient to use.
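The status information that the client displays can be modelled with a small data structure. The three task states come from the text; everything else in this sketch (the class and field names, the use of megabytes, the `summary` helper) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    """The three running states of an analysis task."""
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class TaskStatus:
    """Hypothetical status record shown in the client interface."""
    state: TaskState = TaskState.RUNNING
    intermediate_results: list = field(default_factory=list)
    container_platforms_called: int = 0
    gpu_memory_used_mb: float = 0.0

    def summary(self):
        """Return a one-line description for display."""
        return (
            f"{self.state.value}: "
            f"{len(self.intermediate_results)} intermediate result(s), "
            f"{self.container_platforms_called} container platform(s), "
            f"{self.gpu_memory_used_mb:.0f} MB GPU memory"
        )
```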
As shown in FIG. 4 above, the server component 402 includes the following modules:
an interface service module (API server), configured to receive the detection parameters sent by the client component and to send a task execution request containing the detection parameters to the manager module;
a manager module (Manager), configured to receive the task execution request sent by the interface service module and, according to that request, send a call instruction to the cloud interface calling module; and to receive the feedback result returned by the cloud interface calling module, which includes the degree of association between the target feature and the target label, and send the feedback result to the client component through the interface service module;
a cloud interface calling module (Cloud Adaptor), configured to receive the call instruction sent by the manager module, call the training resource component according to the call instruction, and receive the feedback result returned by the training resource component.
With the system of this embodiment, the user only needs to pass the detection parameters to the server component, which calls the training resource component to calculate the degree of association between the target feature and the target label in the model to be detected. This avoids calculating the degree of association by hand from multiple prediction results and allows the association to be analyzed automatically.
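The call chain inside the server component can be sketched as three small classes. This is a simplified, in-process illustration: the class and method names are hypothetical, the training resource component is represented by a plain callable, and no real network or container calls are made.

```python
class CloudAdaptor:
    """Calls the training resource component and returns its feedback."""

    def __init__(self, training_resource):
        # Hypothetical stand-in: any callable that accepts detection parameters.
        self._training_resource = training_resource

    def invoke(self, detection_params):
        return self._training_resource(detection_params)


class Manager:
    """Turns a task execution request into a call instruction."""

    def __init__(self, adaptor):
        self._adaptor = adaptor

    def handle(self, task_request):
        return self._adaptor.invoke(task_request["detection_params"])


class ApiServer:
    """Receives detection parameters from the client component."""

    def __init__(self, manager):
        self._manager = manager

    def submit(self, detection_params):
        task_request = {"detection_params": detection_params}
        # The feedback result travels back along the same path to the client.
        return self._manager.handle(task_request)
```

Keeping the cloud adaptor behind the manager means that, in a design like this, only one class would need to change if the training resources moved to a different platform.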
As shown in FIG. 4 above, the server component 402 further includes a data storage module, that is, a database (DB, Database), configured to receive and store the feedback results sent by the manager module.
Specifically, the manager module can write the results it receives into this database. The feedback results sent by the manager module may include the degree of association between the target feature and the target label in the model to be detected, the intermediate results fed back by the training resource component, and the result of an assessment of the degree of association.
With the system of this embodiment, the received results can be stored in the database for the client to query, so that the user can see the results of a task run more directly.
In the server component 402 above, the interface service module is further configured to receive query information from the client component, the query information including the names of the target feature and target label to be queried, and to send a task execution request containing the query information to the manager module.
The manager module is configured to look up in the database, according to the task execution request, the degree of association corresponding to the target feature and target label names, and to feed back a query result containing the degree of association to the client component through the interface service module.
With the system of this embodiment, the user can query the degree of association between the target feature and the target label as well as the run results, which makes the system more user-friendly.
The embodiments of this application also provide an electronic device. As shown in FIG. 5, it includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with one another through the communication bus 504;
the memory 503 is configured to store a computer program;
the processor 501 is configured to implement the following steps when executing the program stored in the memory 503:
receiving detection parameters from the client component, where the detection parameters include the parameters required to calculate the degree of association between the target feature and the target label;
calling the model to be detected according to the detection parameters, where the model to be detected is used to predict the value of the target label from the input target feature;
obtaining the verification data set through the detection parameters, where the verification data set includes values of the target feature and is used as input to the model to be detected;
inputting the verification data set into the model to be detected to obtain a first prediction result;
modifying the value of the target feature in the verification data set according to the detection parameters to obtain a modified verification data set;
inputting the modified verification data set into the model to be detected for prediction to obtain a second prediction result;
calculating the degree of association between the target feature and the target label in the model to be detected from the second prediction result and the first prediction result.
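The prediction and calculation steps above can be strung together in one short sketch. It is an illustration under assumptions rather than the implementation of the application: the model is any callable that maps a record to a numeric prediction, the perturbation draws replacement values from a normal distribution fitted to the verification data, the differences are averaged with their sign, and the name `feature_label_association` and its parameters are hypothetical.

```python
import random
import statistics


def feature_label_association(model, rows, feature, times=5, seed=0):
    """Estimate how strongly `feature` influences the model's prediction."""
    # First prediction result: the unmodified verification data set.
    first = [model(row) for row in rows]

    values = [row[feature] for row in rows]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    rng = random.Random(seed)  # seeded only for reproducibility

    diffs = []
    for _ in range(times):
        # Modified verification data set: only the target feature changes.
        modified = [{**row, feature: rng.gauss(mean, std)} for row in rows]
        # Second prediction result for this perturbation round.
        second = [model(row) for row in modified]
        diffs.extend(s - f for s, f in zip(second, first))

    # Degree of association: average difference between the two results.
    return statistics.fmean(diffs)
```

A feature that the model ignores yields exactly zero, because perturbing it never changes the prediction.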
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration it is drawn as a single thick line in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of this application, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program which, when executed by a processor, causes the server component to perform the method for calculating the degree of association between features and labels of any of the above embodiments.
In yet another embodiment of this application, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program which, when executed by a processor, causes the training resource component to perform the method for calculating the degree of association between features and labels of any of the above embodiments.
In yet another embodiment of this application, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform the method for calculating the degree of association between features and labels of any of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes that element.
The embodiments in this specification are described in a related manner. For parts that are the same or similar across embodiments, the embodiments may be read with reference to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, see the description of the method embodiments.
The above are only preferred embodiments of this application and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of this application falls within its scope of protection.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310008414.8A CN115952208A (en) | 2023-01-04 | 2023-01-04 | A method, device and electronic equipment for calculating the degree of association between features and tags |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115952208A (en) | 2023-04-11 |
Family
ID=87290490
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN202310008414.8A | Pending | CN115952208A (en) | 2023-01-04 | 2023-01-04 | A method, device and electronic equipment for calculating the degree of association between features and tags |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115952208A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101638114B1 (en) * | 2015-11-27 | 2016-07-11 | 중앙대학교 산학협력단 | Apparatus and method for accelerating multi-label feature selection based on low-rank approximation |
| CN109934334A (en) * | 2019-03-04 | 2019-06-25 | 三峡大学 | A Disturbance-Based Sensitivity Analysis Method for Chlorophyll a Content-related Factors |
| CN110909005A (en) * | 2019-11-29 | 2020-03-24 | 广州市百果园信息技术有限公司 | Model feature analysis method, device, equipment and medium |
| CN111143416A (en) * | 2019-12-25 | 2020-05-12 | 深圳广联赛讯有限公司 | Data cache-based query method, terminal and storage medium |
| CN111640510A (en) * | 2020-04-09 | 2020-09-08 | 之江实验室 | Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||