CN116664968A - Data processing method, model training method, device and electronic device - Google Patents

Info

Publication number
CN116664968A
Authority
CN
China
Prior art keywords
target
image
face images
face
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310377068.0A
Other languages
Chinese (zh)
Inventor
陈坤鹏
邓巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Xiaomi Technology Wuhan Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Xiaomi Technology Wuhan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd, Beijing Xiaomi Pinecone Electronic Co Ltd, Xiaomi Technology Wuhan Co Ltd
Priority to CN202310377068.0A
Publication of CN116664968A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a data processing method, a model training method, a device, and an electronic device. The method includes: acquiring a target data set of face images of a target object, wherein the target data set includes at least one image pair and the face images in each image pair are different; determining, according to the similarity between the face images in the at least one image pair, target image pairs whose similarity is less than a similarity threshold; determining, according to the target face images included in the target image pairs, the frequency of occurrence of each target face image; and, in response to the frequency of occurrence of any target face image being greater than a quantity threshold, deleting that target face image from the target data set. By identifying the low-similarity target image pairs, counting how often each target face image occurs among them, and cleaning the target data set based on those counts, the face images remaining in the target data set meet the quality requirement, which improves accuracy and efficiency.

Description

Data processing method, model training method, device and electronic device

Technical Field

The present application relates to the field of computer technology, and in particular to a data processing method, a model training method, a device, and an electronic device.

Background

With the development of machine learning, technologies such as machine-learning-based face recognition have been widely applied in all aspects of social life. Face recognition typically trains a face recognition model on sample face images, and training a deep-learning-based face recognition model usually requires massive amounts of face data.

In the related art, massive data is usually obtained by using crawlers to fetch and save content from the Internet by keyword. Such data is noisy and imbalanced, so it must be cleaned to ensure that the image data in every data set meets the quality requirements. How to improve the accuracy of processing the data in a data set is therefore a technical problem in urgent need of a solution.

Summary of the Invention

The present application aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, the present application proposes a data processing method, a model training method, a device, and an electronic device that improve the accuracy of data processing.

An embodiment of one aspect of the present application proposes a data processing method, including:

acquiring a target data set of face images of a target object, wherein the target data set includes at least one image pair, and the face images in each image pair are different;

determining, according to the similarity between the face images in the at least one image pair, target image pairs whose similarity is less than a similarity threshold;

determining, according to the target face images included in the target image pairs, the frequency of occurrence of each target face image;

in response to the frequency of occurrence of any target face image being greater than a quantity threshold, deleting that target face image from the target data set.

An embodiment of one aspect of the present application proposes a model training method, including:

acquiring a sample target data set, wherein the sample target data set contains sample face images of a target object, and each sample face image is labeled with the identity information of the face it contains;

inputting any of the sample face images into a recognition model for classification, to determine the identity information predicted for that sample face image;

training the recognition model based on the difference between the predicted identity information and the labeled identity information.
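The training step can be illustrated with a minimal sketch. It assumes the "difference" between predicted and labeled identity is measured with softmax cross-entropy and that the model is a bare linear classifier over a toy 2-dimensional feature; the patent text does not fix the loss or the optimizer, and the names `softmax`, `train_step`, and `lr` are hypothetical.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(weights, feature, label, lr=0.1):
    """One SGD step on a linear classifier.

    Returns the cross-entropy loss computed before the update
    (assumed loss; the source only speaks of a "difference").
    """
    logits = [sum(w * x for w, x in zip(row, feature)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    for k, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. logit k is (p_k - onehot_k).
        grad = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad * feature[j]
    return loss

# Toy setup: 2 identities, 2-dimensional features, one labeled sample.
weights = [[0.0, 0.0], [0.0, 0.0]]
feature, label = [1.0, 0.5], 1
losses = [train_step(weights, feature, label) for _ in range(5)]
```

Repeating the step on the same sample lowers the loss each time, which is the sense in which the model is trained on the gap between predicted and labeled identity.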

An embodiment of another aspect of the present application proposes a data processing device, including:

an acquisition module, configured to acquire a target data set of face images of a target object, wherein the target data set includes at least one image pair, and the face images in each image pair are different;

a first determination module, configured to determine, according to the similarity between the face images in the at least one image pair, target image pairs whose similarity is less than a similarity threshold;

a second determination module, configured to determine, according to the target face images included in the target image pairs, the frequency of occurrence of each target face image;

a processing module, configured to delete any target face image from the target data set in response to the frequency of occurrence of that target face image being greater than a quantity threshold.

An embodiment of another aspect of the present application proposes a model training device, including:

an acquisition module, configured to acquire a sample target data set, wherein the sample target data set contains sample face images of a target object, and each sample face image is labeled with the identity information of the face it contains;

a recognition module, configured to input any of the sample face images into a recognition model for classification, to determine the identity information predicted for that sample face image;

a training module, configured to train the recognition model based on the difference between the predicted identity information and the labeled identity information.

An embodiment of another aspect of the present application proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method described in the foregoing embodiments.

An embodiment of another aspect of the present application proposes a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in the foregoing embodiments.

An embodiment of another aspect of the present application proposes a computer program product storing a computer program that, when executed by a processor, implements the method described in the foregoing embodiments.

With the data processing method, model training method, device, and electronic device proposed in the present application, a target data set of face images of a target object is acquired, wherein the target data set includes at least one image pair and the face images in each image pair are different; target image pairs whose similarity is less than a similarity threshold are determined according to the similarity between the face images in the at least one image pair; the frequency of occurrence of each target face image is determined according to the target face images included in the target image pairs; and, in response to the frequency of occurrence of any target face image being greater than a quantity threshold, that target face image is deleted from the target data set. The similarity between face images is used to identify the low-similarity target image pairs; within those pairs, the frequency with which each target face image occurs is counted; and the target data set is cleaned based on those frequencies, so that the face images in the target data set all meet the quality requirements, which improves accuracy and efficiency.

Additional aspects and advantages of the present application will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the application.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of another data processing method provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of another data processing method provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of another model training method provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a model training device provided by an embodiment of the present application;

FIG. 7 is a block diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present application and should not be construed as limiting it.

The data processing method, model training method, device, and electronic device of the embodiments of the present application are described below with reference to the accompanying drawings.

In the related art, the face images in a data set are cleaned to remove abnormal or noisy data by face clustering: the face images in each data set are clustered, the cluster center is determined, and images far from the cluster center are filtered out to remove noise. Such methods are simple and crude, and their accuracy is poor. Moreover, the number of cluster centers for the clustering algorithm cannot be determined in advance; face data in different folders may have different numbers of cluster centers, which easily introduces bias. These methods are also computationally expensive and inefficient.

FIG. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present application.

The data processing method of this embodiment is executed by a data processing device, which may be provided in an electronic device. The electronic device may be a terminal device or a server, which is not limited in this embodiment.

As shown in FIG. 1, the method may include the following steps:

Step 101: acquire a target data set of face images of a target object.

Here, the target object is the target user.

In this embodiment, the face images contained in the target data set are face images of the target object. The face images in the target data set are paired two by two to obtain at least one image pair, and the face images in each image pair are different. As one implementation, the face images in the target data set are numbered, with different face images receiving different numbers, and face images with different numbers are paired two by two to obtain at least one image pair.
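The pairing by number can be sketched as follows. This is an illustration only: it assumes every unordered pair of distinct numbers is formed, which the text suggests but does not state exhaustively, and the helper name `build_image_pairs` is hypothetical.

```python
from itertools import combinations

def build_image_pairs(image_ids):
    """Return every unordered pair of distinct image numbers."""
    # set() drops accidental duplicate numbers; sorted() keeps the output stable.
    return list(combinations(sorted(set(image_ids)), 2))

pairs = build_image_pairs([1, 2, 3, 4])
```

Under this assumption a data set of n images yields n(n-1)/2 pairs, so four images give six pairs.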

Step 102: according to the similarity between the face images in the at least one image pair, determine the target image pairs whose similarity is less than a similarity threshold.

In this embodiment, the similarity between the face images is determined for each image pair. The similarity indicates how alike the two face images in the pair are, and hence how likely they are to be face images of the same object: the higher the similarity, the greater the likelihood. Each similarity is compared with the similarity threshold to determine the target image pairs whose similarity is less than the threshold, that is, to screen out the target image pairs that do not meet the quality requirements.
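A minimal sketch of the threshold comparison follows. The similarity values, the threshold of 0.3, and the function name `select_target_pairs` are all hypothetical; the source does not give concrete values.

```python
def select_target_pairs(pair_similarity, similarity_threshold):
    """Keep only the pairs whose similarity is strictly below the threshold."""
    return [pair for pair, sim in pair_similarity.items()
            if sim < similarity_threshold]

# Hypothetical similarities for the six pairs formed from images 1..4.
pair_similarity = {
    (1, 2): 0.10, (1, 3): 0.05, (1, 4): 0.12,
    (2, 3): 0.28, (2, 4): 0.65, (3, 4): 0.71,
}
target_pairs = select_target_pairs(pair_similarity, similarity_threshold=0.3)
```

With these assumed values, four of the six pairs fall below the threshold and become target image pairs.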

Step 103: according to the target face images included in the target image pairs, determine the frequency of occurrence of each target face image.

In this embodiment, the target face images included in each target image pair are obtained, and the occurrences of each face image are counted to obtain its frequency of occurrence across all target image pairs. The frequency of occurrence of a target face image indicates how often its similarity to the other face images in the target data set is low. The higher the frequency, the larger the gap between that target face image and the other face images in the target data set, and the more likely it is to be an abnormal target face image.

As an example, suppose there are four target image pairs: target image pair 1, target image pair 2, target image pair 3, and target image pair 4. Target image pair 1 contains the target face images numbered 1 and 3, target image pair 2 contains those numbered 2 and 3, target image pair 3 contains those numbered 1 and 4, and target image pair 4 contains those numbered 1 and 2. Counting gives a frequency of 3 for target face image 1, 2 for target face image 2, 2 for target face image 3, and 1 for target face image 4.
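The count in this example can be reproduced with a short sketch. It uses `collections.Counter` from the Python standard library; the function name `count_occurrences` is hypothetical.

```python
from collections import Counter

def count_occurrences(target_pairs):
    """Count how many target image pairs each image number appears in."""
    return Counter(image_id for pair in target_pairs for image_id in pair)

# The four target image pairs from the example above.
target_pairs = [(1, 3), (2, 3), (1, 4), (1, 2)]
frequency = count_occurrences(target_pairs)
# frequency == {1: 3, 2: 2, 3: 2, 4: 1}
```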

Step 104: in response to the frequency of occurrence of any target face image being greater than a quantity threshold, delete that target face image from the target data set.

In this embodiment, the frequency of occurrence of each target face image is compared with the quantity threshold to identify any target face image whose frequency is greater than the threshold; for ease of reference, such an image may be called a first target face image. A first target face image is regarded as noise data or abnormal face data and is deleted from the target data set, which improves the accuracy of face data processing in the target data set. Further, the processed target data set can be used to train a face recognition model; because the face images used as training samples are more accurate, the precision of face recognition model training can be improved.

For example, if the quantity threshold is 2, the target face image numbered 1 occurs 3 times, which is greater than the quantity threshold. The target face image numbered 1 is therefore determined to be abnormal data and is deleted from the target data set, to ensure that the face images of the target object in the target data set all meet the quality requirements.
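The deletion rule can be sketched as a simple filter. The function name `remove_frequent_outliers` is hypothetical, and the data set is represented by image numbers only.

```python
def remove_frequent_outliers(dataset, frequency, count_threshold):
    """Drop images whose frequency is strictly greater than the threshold."""
    return [image_id for image_id in dataset
            if frequency.get(image_id, 0) <= count_threshold]

dataset = [1, 2, 3, 4]
frequency = {1: 3, 2: 2, 3: 2, 4: 1}  # counts from the worked example
cleaned = remove_frequent_outliers(dataset, frequency, count_threshold=2)
# cleaned == [2, 3, 4]
```

Because the comparison is "greater than", images 2 and 3, whose frequency equals the threshold, are kept.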

It should be understood that the data sets of other objects can be processed in the same way using the data processing method of this embodiment to obtain target data sets for those objects, yielding a large number of data sets that meet the quality requirements. The processed data sets can be used to train the recognition model and improve its training precision, and the trained recognition model is used for face recognition.

In the data processing method of this embodiment, a target data set of face images of a target object is acquired, wherein the target data set includes at least one image pair and the face images in each image pair are different; target image pairs whose similarity is less than a similarity threshold are determined according to the similarity between the face images in the at least one image pair; the frequency of occurrence of each target face image is determined according to the target face images included in the target image pairs; and, in response to the frequency of occurrence of any target face image being greater than a quantity threshold, that target face image is deleted from the target data set. The similarity between face images is used to identify the low-similarity target image pairs; within those pairs, the frequency with which each target face image occurs is counted; and the target data set is cleaned based on those frequencies, so that the face images in the target data set all meet the quality requirements, which improves accuracy and efficiency.

Based on the above embodiment, FIG. 2 is a schematic flowchart of another data processing method provided by an embodiment of the present application. As shown in FIG. 2, the method includes the following steps:

Step 201: acquire an initial data set of face images of the target object.

As one implementation, the initial data set of face images of the target object is obtained from the Internet by keyword. Face images obtained from the network contain considerable noise data, also called abnormal data: for example, face images that do not belong to the target object, or face images that belong to the target object but are too poor in quality to use. The initial data set therefore needs to be cleaned to identify the noise data or abnormal data in it.

Step 202: run recognition on the face images in the initial data set.

As one implementation, a face detection model can be applied to the face images in the initial data set to identify whether each face image contains a face and, if so, the size of that face.

Step 203: delete from the initial data set the face images whose recognized face size is smaller than a set threshold, or that contain no face, to obtain the target data set.

As one implementation, for each face image in the initial data set, if the face image contains no face, it is determined to be noise data and is deleted from the initial data set.

As another implementation, for each face image in the initial data set, if the face image contains a face but the size of the face in the image is smaller than a size threshold, or the proportion of the image it occupies is smaller than a proportion threshold, the face image is determined to be abnormal data and is deleted from the initial data set. For example, the size threshold is 30*30 pixels and the proportion threshold is one-tenth.
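The preliminary filter can be sketched as below. Several points are assumptions, not statements from the source: the detector output is modeled by a hypothetical `Detection` record, the 30*30 size threshold is read as "both sides at least 30 pixels", the proportion is read as face area over image area, and failing either criterion removes the image.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Detection:
    image_size: Tuple[int, int]          # (width, height) of the whole image
    face_box: Optional[Tuple[int, int]]  # (width, height) of the face, or None

def keep_image(det, min_side=30, min_ratio=0.1):
    """Return True if the image passes the preliminary cleaning step."""
    if det.face_box is None:
        return False                     # no face detected: noise data
    face_w, face_h = det.face_box
    if face_w < min_side or face_h < min_side:
        return False                     # face smaller than the size threshold
    img_w, img_h = det.image_size
    # Assumed definition of "proportion": face area divided by image area.
    return (face_w * face_h) / (img_w * img_h) >= min_ratio

kept = [
    keep_image(Detection((100, 100), None)),       # no face
    keep_image(Detection((100, 100), (20, 20))),   # face too small
    keep_image(Detection((100, 100), (40, 40))),   # passes both checks
    keep_image(Detection((1000, 1000), (40, 40))), # proportion too small
]
```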

In this embodiment, the acquired initial data set is checked and the images whose face size is too small or that contain no face are deleted to obtain the target data set. This performs a preliminary cleaning of the initial data set that removes obviously abnormal data and improves the effect of subsequent processing.

Step 204: acquire the target data set of face images of the target object.

The explanations of the foregoing embodiment also apply to this embodiment; the principle is the same and is not repeated here.

Step 205: for each image pair, determine the similarity between the face images in the image pair.

As one implementation, the face images in the image pair are input into a recognition model for feature extraction to obtain their image features; the distance between the image features is determined from those features; and the similarity between the face images in the image pair is determined from that distance. The larger the distance between the image features of the two face images in a pair, the less alike the two face images are and the lower the similarity; conversely, the smaller the distance, the more alike the two face images are and the higher the similarity.

The distance between image features may be the cosine distance, the Euclidean distance, or any other distance that can be used to determine the similarity between image features, which is not limited in this embodiment.
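Both distances can be sketched in a few lines of standard-library Python. The feature vectors are assumed to be non-zero lists of floats; cosine distance is taken as 1 minus cosine similarity, which is the usual convention, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    # Smaller distance means higher similarity, matching the text above.
    return 1.0 - cosine_similarity(a, b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

same = cosine_similarity([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated direction
dist = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

For unit-length features the two measures rank pairs identically, since the squared Euclidean distance equals 2 minus twice the cosine similarity.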

Step 206: compare each similarity with the similarity threshold, and determine the target image pairs whose similarity is less than the similarity threshold.

Step 207: according to the target face images included in the target image pairs, determine the frequency of occurrence of each target face image.

Step 208: in response to the frequency of occurrence of any target face image being greater than the quantity threshold, delete that target face image from the target data set.

For steps 206 to 208, refer to the explanations in the foregoing embodiment; the principle is the same and is not repeated here.

In the data processing method of this embodiment, an initial data set of face images of the target object is acquired, and a preliminary screening removes the data that is obviously noisy or abnormal to obtain the target data set. The target data set is then cleaned further: the distance between the face image features in each image pair is calculated to determine the similarity between the face images in that pair, and, from the target image pairs found to have low similarity, the frequency of occurrence of each target face image is counted. The higher the frequency, the greater the difference between that target face image and the other face images in the target data set, and the more likely it is to be noise data or abnormal data. The noise data is then cleaned from the target data set based on these frequencies, so that the face images in the target data set all meet the quality requirements, which improves accuracy.

Based on the above embodiments, Fig. 3 is a schematic flowchart of another data processing method provided by an embodiment of the present application. As shown in Fig. 3, the method includes the following steps:

Step 301: acquire a target data set of face images of a target object.

Step 302: input the face images of an image pair into a recognition model for feature extraction, obtaining the image features of the face images in the image pair.

The recognition model is a neural network model, for example a recognition model trained with the ArcFace face recognition algorithm, or a recognition model trained with the CosFace face recognition algorithm; this embodiment places no limitation on the choice.

In one implementation, the recognition model comprises a backbone network and a fully connected layer in the head network. The backbone is, for example, the lightweight network MobileNet, and outputs the image features of a face image, for example 512-dimensional features. The fully connected layer classifies on the basis of the image features to obtain a classification result. In a face recognition scenario, the classification result is the identity information of the face contained in the face image, determined by recognizing the face image. The identity information may be the name, ID card information, and so on, of the object to which the face belongs.

Step 303: determine the distance between the image features of the face images in the image pair.

Step 304: determine the similarity between the face images in the image pair from the distance between the image features.

In one implementation, the distance is converted into the corresponding similarity according to a mapping relationship between the distance between image features and the similarity.
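The text does not specify the distance metric or the mapping relationship. The following is a minimal sketch under stated assumptions: features are L2-normalized, the distance is Euclidean, and the mapping is the identity that holds for unit vectors, cos = 1 - d²/2. The function names are hypothetical and are not taken from the application.

```python
import math


def l2_normalize(vector):
    """Scale a feature vector to unit length (assumed preprocessing)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def euclidean_distance(a, b):
    """Euclidean distance between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(distance):
    """Assumed mapping: for unit vectors, ||a - b||^2 = 2 - 2*cos(a, b)."""
    return 1.0 - distance * distance / 2.0


# Toy 3-dimensional features stand in for the 512-dimensional ones.
feature_a = l2_normalize([1.0, 0.0, 0.0])
feature_b = l2_normalize([0.0, 2.0, 0.0])
pair_distance = euclidean_distance(feature_a, feature_b)
pair_similarity = distance_to_similarity(pair_distance)
same_image_similarity = distance_to_similarity(
    euclidean_distance(feature_a, feature_a)
)
```

Under this assumed mapping, a larger distance always yields a smaller similarity, which is the only property the later thresholding steps rely on.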

Steps 303 and 304 follow the same principles as explained in the foregoing embodiments and are not described again here.

Step 305: sort the distances and, based on a set selection ratio, determine a target distance from the sorted result.

Step 306: determine the similarity threshold according to the target distance and the set mapping relationship.

In this embodiment, the distances between the target face images in the target image pairs of the target data set are sorted, for example in descending order, to obtain a sorted result. A selection ratio, for example 5%, 7%, or 10%, is set according to the accuracy required of the cleaning of the target data set. The target distance corresponding to the selection ratio is then chosen from the sorted distances, the similarity corresponding to the target distance is determined from the target distance and the set mapping relationship, and that similarity is used as the similarity threshold.

As an example, 20 target image pairs are determined from the target data set, and the distances between the target face images in these 20 pairs are s1, s2, s3, s4, ..., s19, and s20. With a selection ratio of 20%, 4 of the 20 distances are selected: the 20 distances are sorted in descending order, and the four largest are, in order, s4, s7, s15, and s11. s11 is therefore taken as the target distance, and the similarity threshold corresponding to the target distance is determined from the mapping relationship between target distance and similarity, which improves the accuracy with which the similarity threshold is determined.
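The selection in this example can be sketched as follows. This is an illustrative sketch only: the function names, the rounding of the selected count, the numeric distance values, and the mapping `1 - d²/2` are assumptions introduced here, not details given in the application.

```python
def select_target_distance(distances, selection_ratio):
    """Return the distance at the cut-off rank of the descending ordering."""
    if not distances:
        raise ValueError("distances must not be empty")
    if not 0.0 < selection_ratio <= 1.0:
        raise ValueError("selection_ratio must be in (0, 1]")
    ranked = sorted(distances, reverse=True)
    # Assumption: the selected count is rounded and is at least 1.
    count = max(1, round(len(ranked) * selection_ratio))
    return ranked[count - 1]


def similarity_threshold(distances, selection_ratio, mapping):
    """Map the target distance to a similarity with a caller-supplied mapping."""
    return mapping(select_target_distance(distances, selection_ratio))


# 20 pair distances; the values are made up so that s4 > s7 > s15 > s11
# are the four largest, as in the example above.
pair_distances = {f"s{i}": 0.5 for i in range(1, 21)}
pair_distances.update({"s4": 1.4, "s7": 1.3, "s15": 1.2, "s11": 1.1})

values = list(pair_distances.values())
target_distance = select_target_distance(values, 0.20)
threshold = similarity_threshold(values, 0.20, lambda d: 1.0 - d * d / 2.0)
```

With these made-up values the target distance is the value assigned to s11. Note that if pairs are later selected with a strict "less than" comparison against the resulting threshold, the boundary pair itself is not selected.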

Step 307: compare each similarity with the similarity threshold, and determine the target image pairs whose similarity is less than the similarity threshold.

Step 308: determine the occurrence frequency of each target face image, based on the target face images included in the target image pairs.

Step 309: in response to the occurrence frequency of any target face image being greater than the count threshold, delete that target face image from the target data set.

Steps 307 to 309 follow the same principles as explained in the foregoing embodiments and are not described again here.

Step 310: in response to the occurrence frequency of any target face image being less than or equal to the count threshold, refrain from deleting that target face image from the target data set.

In this embodiment, the occurrence frequency of each target face image is compared with the count threshold. If the occurrence frequency of a target face image is less than or equal to the count threshold, that image is a face image of the target object that meets the quality requirements, so it is not deleted from the target data set. Retaining the face images that meet the quality requirements improves the quality of the face images in the target data set, which can in turn improve training accuracy when the recognition model is subsequently trained further on the target data set.
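The frequency counting and the delete-or-retain decision of steps 308 to 310 can be sketched as below. This is a minimal illustration, assuming images are identified by hashable ids and target image pairs are given as 2-tuples of ids; the function name and the sample data are hypothetical.

```python
from collections import Counter


def clean_by_frequency(dataset, target_pairs, count_threshold):
    """Split the data set into kept and removed images by occurrence frequency.

    dataset: list of image ids in the target data set.
    target_pairs: pairs of image ids whose similarity fell below the threshold.
    count_threshold: images occurring more often than this are removed.
    """
    frequency = Counter()
    for first, second in target_pairs:
        frequency[first] += 1
        frequency[second] += 1
    # Counter returns 0 for ids that never appear in a target pair.
    kept = [image for image in dataset if frequency[image] <= count_threshold]
    removed = [image for image in dataset if frequency[image] > count_threshold]
    return kept, removed, frequency


dataset = ["a", "b", "c", "d", "noise"]
target_pairs = [("a", "noise"), ("b", "noise"), ("c", "noise"), ("a", "b")]
kept, removed, frequency = clean_by_frequency(dataset, target_pairs, 2)
```

In this toy data the image "noise" occurs in three low-similarity pairs and is removed, while "a" and "b" occur exactly twice and are retained because the comparison is strict.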

In the data processing method of this embodiment, the distance between the face image features of each image pair in the target data set is computed and used to determine the similarity between the face images in that pair. Among the target image pairs found to have low similarity, the number of times each target face image occurs is counted. The higher this frequency, the more the target face image differs from the other face images in the target data set, and the more likely it is to be noise data or abnormal data. Noise data is then cleaned from the target data set based on the occurrence frequency, so that the face images remaining in the target data set all meet the quality requirements, which improves the accuracy of the target data set.

Based on the above embodiments, an embodiment of the present application provides a model training method. Fig. 4 is a schematic flowchart of another model training method provided by an embodiment of the present application. As shown in Fig. 4, the method includes the following steps:

Step 401: acquire a sample target data set.

The sample target data set contains sample face images of the target object, and each sample face image is labeled with the identity information of the face it contains.

The identity information uniquely identifies an object, that is, a user, and includes a name, an ID card, and the like.

Step 402: input any sample face image into the recognition model for classification, to determine the identity information predicted for that sample face image.

In one implementation of this embodiment, any sample face image is input into the feature extraction layer of the classification model to obtain high-dimensional image features of that sample face image, for example 512-dimensional image features. The image features are then input into the fully connected layer of the classification model, which classifies on the basis of the image features to determine the identity information predicted for that sample face image.

The description of the structure of the recognition model in the foregoing embodiments also applies to this embodiment; the principle is the same and is not described again here.

Step 403: train the recognition model based on the difference between the predicted identity information and the labeled identity information.

In this embodiment, the recognition model is trained based on the difference between the predicted identity information and the labeled identity information. That is, the parameters of the recognition model are adjusted based on the difference, and the adjusted recognition model is trained again on the sample target data set, iterating over multiple samples for multiple rounds until the recognition model converges. In one implementation, the trained recognition model can be tested on a sample target data set reserved for testing; if the difference between the predicted identity information and the labeled identity information is less than a set threshold, training of the recognition model is considered complete.
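The parameter adjustment described above can be illustrated with a deliberately small stand-in. The sketch below trains only a fully connected classification layer on fixed toy features, using a softmax cross-entropy gradient and plain stochastic gradient descent. The loss, the learning rate, the epoch count, the 2-dimensional features, and all function names are assumptions for illustration; the application does not specify them, and the backbone network is not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_logits(weights, feature):
    """One logit per identity class: dot product of weight row and feature."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def train_classifier(features, labels, num_classes, learning_rate=0.5, epochs=200):
    """Adjust the layer weights from the gap between prediction and label."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for feature, label in zip(features, labels):
            probs = softmax(class_logits(weights, feature))
            for c in range(num_classes):
                # Cross-entropy gradient w.r.t. logit c: p_c - 1[c == label].
                gap = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    weights[c][j] -= learning_rate * gap * feature[j]
    return weights


def predict(weights, feature):
    """Return the index of the identity class with the largest logit."""
    logits = class_logits(weights, feature)
    return max(range(len(logits)), key=lambda c: logits[c])


# Toy features for two identities (stand-ins for 512-dimensional features).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
weights = train_classifier(features, labels, num_classes=2)
predictions = [predict(weights, feature) for feature in features]
```

In an actual system the backbone would be trained jointly with the head, typically with a margin-based loss such as those of ArcFace or CosFace mentioned earlier; this sketch only shows how a prediction-versus-label difference drives the parameter update.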

For training the recognition model, optionally, after the parameters of the recognition model have been adjusted continuously over a set number of training rounds, the model is tested on a test set containing multiple sample target data sets. If the face recognition recall rate is greater than a first threshold and the face recognition non-recognition rate is less than a second threshold, training of the recognition model is considered complete. The first threshold is, for example, 90%, and the second threshold is, for example, 1%.
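The completion check described above reduces to two strict comparisons. The sketch below assumes that the recall rate and the non-recognition rate have already been measured on the test set and are passed in as fractions; the function name is hypothetical, and the defaults are the example values of 90% and 1%.

```python
def training_complete(recall, non_recognition_rate,
                      first_threshold=0.90, second_threshold=0.01):
    """Return True when both test-set criteria are met.

    recall must be strictly greater than first_threshold, and
    non_recognition_rate strictly less than second_threshold.
    """
    return recall > first_threshold and non_recognition_rate < second_threshold


passes = training_complete(recall=0.95, non_recognition_rate=0.005)
fails_on_rate = training_complete(recall=0.95, non_recognition_rate=0.02)
fails_on_recall = training_complete(recall=0.90, non_recognition_rate=0.005)
```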

It should be understood that if the sample target data set has not been processed by the data processing method of the embodiments of the present application, for example if it is the target data set of step 101 in the above embodiment, it can be used for preliminary training of the recognition model. The accuracy of the recognition model trained in this way does not meet the final requirements, but the model can be used to recognize the face images in the target data set and obtain their image features.

If the sample target data set has been processed by the data processing method of the embodiments of the present application, training the recognition model on it yields a recognition model of higher accuracy and improves the training result.

In the model training method of this embodiment, the recognition model is trained so that the difference between the identity information it recognizes for a face image and the labeled identity information of that face image is less than a set threshold, that is, the accuracy requirement is met. The face image features output by the recognition model are therefore more accurate, which improves the effect of data cleaning on the target data set.

To implement the above embodiments, an embodiment of the present application further provides a data processing apparatus.

Fig. 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.

As shown in Fig. 5, the apparatus may include:

An acquisition module 51, configured to acquire a target data set of face images of a target object, where the target data set includes at least one image pair and the face images in an image pair are different.

A first determination module 52, configured to determine, according to the similarity between the face images in the at least one image pair, the target image pairs whose similarity is less than a similarity threshold.

A second determination module 53, configured to determine the occurrence frequency of each target face image, based on the target face images included in the target image pairs.

A processing module 54, configured to delete any target face image from the target data set in response to the occurrence frequency of that target face image being greater than a count threshold.

Further, in one implementation of the embodiments of the present application, the first determination module 52 is specifically configured to:

for each image pair, determine the similarity between the face images in the image pair; and

compare each similarity with the similarity threshold, and determine the target image pairs whose similarity is less than the similarity threshold.

In one implementation of the embodiments of the present application, the first determination module 52 is specifically configured to:

input the face images of the image pair into a recognition model for feature extraction, obtaining the image features of the face images in the image pair;

determine the distance between the image features of the face images in the image pair; and

determine the similarity between the face images in the image pair according to the distance.

In one implementation of the embodiments of the present application, the processing module 54 is further configured to:

refrain from deleting any target face image from the target data set in response to the occurrence frequency of that target face image being less than or equal to the count threshold.

In one implementation of the embodiments of the present application, the apparatus further includes:

A recognition module, configured to acquire an initial data set of face images of the target object; perform recognition on the face images in the initial data set; and delete from the initial data set any face image in which the recognized face is smaller than a set threshold, or which contains no face, to obtain the target data set.
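The preliminary screening performed by this module can be sketched as follows. The sketch assumes that a face detector has already run and returned, for each image, a list of face boxes as (width, height) tuples in pixels; the detector itself, the use of the shorter box side as the face size, the function name, and the file names are all hypothetical.

```python
def preliminary_screen(detections, min_face_size):
    """Keep images that contain at least one sufficiently large face.

    detections: dict mapping image id -> list of (width, height) face boxes,
                assumed to come from a separate face detector.
    min_face_size: set threshold on the face size, in pixels.
    """
    kept = []
    for image_id, face_boxes in detections.items():
        if not face_boxes:
            continue  # no face detected: remove from the initial data set
        # Assumption: face size is the shorter side of the largest box.
        largest = max(min(width, height) for width, height in face_boxes)
        if largest < min_face_size:
            continue  # detected face is too small: remove
        kept.append(image_id)
    return kept


detections = {
    "img_001.jpg": [(120, 130)],
    "img_002.jpg": [],
    "img_003.jpg": [(20, 24)],
    "img_004.jpg": [(30, 28), (90, 96)],
}
target_data_set = preliminary_screen(detections, min_face_size=40)
```

Here the image without a face and the image whose only face is 20 pixels wide are dropped, and the remaining two images form the target data set that the pairwise cleaning then operates on.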

In one implementation of the embodiments of the present application, the apparatus further includes:

A third determination module, configured to sort the distances; determine a target distance from the sorted result based on a set selection ratio; and determine the similarity threshold according to the mapping relationship between the target distance and the similarity.

It should be noted that the foregoing explanations of the method embodiments also apply to the apparatus of this embodiment and are not repeated here.

In the data processing apparatus of this embodiment, a target data set of face images of a target object is acquired, where the target data set includes at least one image pair and the face images in an image pair are different. According to the similarity between the face images in the at least one image pair, the target image pairs whose similarity is less than the similarity threshold are determined. Based on the target face images included in the target image pairs, the occurrence frequency of each target face image is determined, and any target face image whose occurrence frequency is greater than the count threshold is deleted from the target data set. The similarity between face images is thus used to identify the target image pairs with low similarity; within those pairs, the occurrence frequency of each target face image is counted, and the target data set is cleaned based on that frequency, so that the face images remaining in the target data set all meet the quality requirements, which improves accuracy.

To implement the above embodiments, an embodiment of the present application further provides a model training apparatus.

Fig. 6 is a schematic structural diagram of a model training apparatus provided by an embodiment of the present application.

As shown in Fig. 6, the apparatus may include:

An acquisition module 61, configured to acquire a sample target data set, where the sample target data set contains sample face images of a target object and each sample face image is labeled with the identity information of the face it contains.

A recognition module 62, configured to input any sample face image into the recognition model for classification, to determine the identity information predicted for that sample face image.

A training module 63, configured to train the recognition model based on the difference between the predicted identity information and the labeled identity information.

It should be noted that the foregoing explanations of the method embodiments also apply to the apparatus of this embodiment and are not repeated here.

In the model training apparatus of this embodiment, the recognition model is trained so that the difference between the identity information it recognizes for a face image and the labeled identity information of that face image is less than a set threshold, that is, the accuracy requirement is met. The face image features output by the recognition model are therefore more accurate, which improves the effect of data cleaning on the target data set.

To implement the above embodiments, the present application further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the method described in the foregoing method embodiments is implemented.

To implement the above embodiments, the present application further provides a non-transitory computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method described in the foregoing method embodiments is implemented.

To implement the above embodiments, the present application further provides a computer program product on which a computer program is stored. When the computer program is executed by a processor, the method described in the foregoing method embodiments is implemented.

Fig. 7 is a block diagram of an electronic device provided by an embodiment of the present application. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.

Referring to Fig. 7, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 that execute instructions to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 806 supplies power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, for example the display and the keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 4G, or 5G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 804 including instructions, where the instructions can be executed by the processor 820 of the electronic device 800 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. Such expressions do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples.

In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance or as implicitly specifying the number of the technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, such as two or three, unless otherwise specifically defined.

Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logical function or process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

The logic and/or steps represented in the flowcharts or otherwise described herein, which may for example be regarded as an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that each part of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the embodiments described above, multiple steps or methods may be implemented by software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

Those of ordinary skill in the art can understand that all or part of the steps of the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist separately as a physical unit, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it can be understood that these embodiments are exemplary and should not be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (11)

1. A data processing method, comprising:
acquiring a target data set of face images of a target object, wherein the target data set includes at least one image pair, and the face images in each image pair are different;
determining, according to the similarity between the face images in the at least one image pair, target image pairs whose similarity is less than a similarity threshold;
determining, according to the target face images included in the target image pairs, the frequency with which each target face image occurs;
in response to the frequency of occurrence of any target face image being greater than a quantity threshold, deleting that target face image from the target data set.

2. The method according to claim 1, wherein determining, according to the similarity between the face images in the at least one image pair, the target image pairs whose similarity is less than the similarity threshold comprises:
for each image pair, determining the similarity between the face images in the image pair;
comparing each similarity with the similarity threshold to determine the target image pairs whose similarity is less than the similarity threshold.

3. The method according to claim 2, wherein, for each image pair, determining the similarity between the face images in the image pair comprises:
inputting the face images in the image pair into a recognition model for feature extraction to obtain image features of the face images in the image pair;
determining the distance between the image features according to the image features of the face images in the image pair;
determining the similarity between the face images in the image pair according to the distance.

4. The method according to claim 3, further comprising:
sorting the distances;
determining a target distance from the sorted distances based on a set selection ratio;
determining the similarity threshold according to the target distance and a set mapping relationship.

5. The method according to any one of claims 1-4, further comprising:
in response to the frequency of occurrence of any target face image being less than or equal to the quantity threshold, prohibiting deletion of that target face image from the target data set.

6. The method according to any one of claims 1-4, further comprising:
acquiring an initial data set of face images of the target object;
performing recognition on the face images in the initial data set;
deleting from the initial data set the face images in which the recognized face size is smaller than a set threshold, or the face images that do not contain a face, to obtain the target data set.

7. A model training method, comprising:
acquiring a sample target data set, wherein the sample target data set contains sample face images of a target object, and each sample face image is annotated with identity information of the face it contains;
inputting any sample face image into a recognition model for classification to determine the identity information predicted for that sample face image;
training the recognition model based on the difference between the predicted identity information and the annotated identity information.

8. A data processing apparatus, comprising:
an acquisition module, configured to acquire a target data set of face images of a target object, wherein the target data set includes at least one image pair, and the face images in each image pair are different;
a first determination module, configured to determine, according to the similarity between the face images in the at least one image pair, target image pairs whose similarity is less than a similarity threshold;
a second determination module, configured to determine, according to the target face images included in the target image pairs, the frequency with which each target face image occurs;
a processing module, configured to delete any target face image from the target data set in response to the frequency of occurrence of that target face image being greater than a quantity threshold.

9. A model training apparatus, comprising:
an acquisition module, configured to acquire a sample target data set, wherein the sample target data set contains sample face images of a target object, and each sample face image is annotated with identity information of the face it contains;
a recognition module, configured to input any sample face image into a recognition model for classification to determine the identity information predicted for that sample face image;
a training module, configured to train the recognition model based on the difference between the predicted identity information and the annotated identity information.

10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1-6 or the method according to claim 7.

11. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6 or the method according to claim 7.
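
The data cleaning flow of claims 1 to 3 and 5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes the feature vectors have already been extracted by some recognition model, uses cosine similarity as one possible way of deriving a similarity from features (the claims only require a distance-based similarity), and takes the similarity threshold as a given input rather than deriving it as in claim 4. The names `cosine_similarity`, `clean_dataset`, `similarity_threshold`, and `count_threshold` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero feature vectors (an assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def clean_dataset(features, similarity_threshold, count_threshold):
    """Drop images that are dissimilar to many other images of the same person.

    features: dict mapping an image id to its feature vector, assumed to be
    precomputed by a recognition model for a single target object.
    Returns (kept, removed): the surviving features and the deleted image ids.
    """
    # Claims 1-2: form every pair of different images and keep the pairs
    # whose similarity is below the threshold (the "target image pairs").
    low_pairs = [
        (i, j)
        for i, j in combinations(sorted(features), 2)
        if cosine_similarity(features[i], features[j]) < similarity_threshold
    ]

    # Claim 1: count how often each image occurs in the target image pairs.
    frequency = Counter(image for pair in low_pairs for image in pair)

    # Claim 1: delete images whose frequency exceeds the quantity threshold.
    # Claim 5: images at or below the threshold are left in the data set.
    removed = {image for image, n in frequency.items() if n > count_threshold}
    kept = {k: v for k, v in features.items() if k not in removed}
    return kept, removed


if __name__ == "__main__":
    # Three mutually similar images and one outlier "x" (toy 2-D features).
    feats = {
        "a": [1.0, 0.0],
        "b": [0.9, 0.1],
        "c": [0.95, 0.05],
        "x": [0.0, 1.0],
    }
    kept, removed = clean_dataset(feats, similarity_threshold=0.5, count_threshold=1)
    print(sorted(kept), sorted(removed))
```

In this toy data the outlier occurs in three low-similarity pairs while each of the other images occurs in only one, so counting occurrences singles out the likely mislabeled image instead of deleting both members of every dissimilar pair.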
CN202310377068.0A 2023-04-10 2023-04-10 Data processing method, model training method, device and electronic device Pending CN116664968A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310377068.0A CN116664968A (en) 2023-04-10 2023-04-10 Data processing method, model training method, device and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310377068.0A CN116664968A (en) 2023-04-10 2023-04-10 Data processing method, model training method, device and electronic device

Publications (1)

Publication Number Publication Date
CN116664968A true CN116664968A (en) 2023-08-29

Family

ID=87724976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310377068.0A Pending CN116664968A (en) 2023-04-10 2023-04-10 Data processing method, model training method, device and electronic device

Country Status (1)

Country Link
CN (1) CN116664968A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119248923A (en) * 2024-08-27 2025-01-03 中国科学院香港创新研究院人工智能与机器人创新中心 Ultrasonic examination method, system, device, storage medium and program product

Similar Documents

Publication Publication Date Title
CN111539443B (en) Image recognition model training method and device and storage medium
CN109389162B (en) Sample image screening technique and device, electronic equipment and storage medium
WO2020088069A1 (en) Hand gesture keypoints detection method and apparatus, electronic device, and storage medium
RU2664003C2 (en) Method and device for determining associate users
CN106503617A (en) Model training method and device
CN110866236B (en) Private picture display method, device, terminal and storage medium
CN104239879B (en) The method and device of separating character
CN109934275A (en) Image processing method and device, electronic device and storage medium
CN109360197A (en) Processing method, device, electronic equipment and the storage medium of image
WO2020019683A1 (en) Input method and apparatus, and electronic device
CN110659690A (en) Neural network construction method and device, electronic equipment and storage medium
CN105608476A (en) Classification method and classification device based on random forest classifier
CN107704190A (en) Gesture identification method, device, terminal and storage medium
CN105100193A (en) Cloud business card recommendation method and device
CN104573642A (en) Face recognition method and device
CN116664968A (en) Data processing method, model training method, device and electronic device
CN105224950A (en) The recognition methods of filter classification and device
CN111274444A (en) Method and device for generating video cover determination model and method and device for determining video cover
CN112884040B (en) Training sample data optimization method, system, storage medium and electronic equipment
CN107423757A (en) clustering processing method and device
CN105653623A (en) Picture collecting method and device
CN111428806B (en) Image tag determining method and device, electronic equipment and storage medium
CN112052710A (en) Face age identification method and device
CN106203042A (en) The method and apparatus determining fingerprint recognition maloperation
CN113590605B (en) Data processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination