
CN110222657B - Single-step face detector optimization system, method and apparatus - Google Patents

Single-step face detector optimization system, method and apparatus

Info

Publication number
CN110222657B
CN110222657B (application CN201910502740.8A)
Authority
CN
China
Prior art keywords
face detector
step face
classification
loss function
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910502740.8A
Other languages
Chinese (zh)
Other versions
CN110222657A (en)
Inventor
Lei Zhen
Zhang Shifeng
Zhang Yongming
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201910502740.8A priority Critical patent/CN110222657B/en
Publication of CN110222657A publication Critical patent/CN110222657A/en
Application granted granted Critical
Publication of CN110222657B publication Critical patent/CN110222657B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract



A single-step face detector optimization system of the present invention includes a training system and a testing system. In the training system, a data enhancement module copies and stitches the detected image and randomly crops image blocks, obtaining training samples through data enhancement; a single-step face detector interface module sends the training samples to the single-step face detector to be trained for binary classification and bounding-box regression, and extracts the sampling features produced during binary classification; a scale-aware margin module computes the scale-aware margin loss of the training samples; a feature supervision module performs binary classification of the training samples through a feature-supervision-based classification network; and a loss function module trains the single-step face detector with a loss function based on L_CLS, L_LOC and L_FSM. The testing system computes the accuracy of the single-step face detector output by the training system and, when a set condition is not met, trains it again through the training system. The invention can improve the classification capability of a face detector without adding any extra overhead.


Description

Single-step face detector optimization system, method and device
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a single-step face detector optimization system, method and device.
Background
Face detection is a technology for determining whether a face exists in any input image and returning the position of each face, and is widely applied to the fields of computer vision and the like, such as face recognition, face tracking, face analysis and the like.
Currently, anchor-box-based single-step detection methods dominate face detection; they detect faces based on anchor boxes of different positions, scales and aspect ratios. With the development of deep neural networks, anchor-based single-step detection has made great progress in academia. In particular, on the very challenging WIDER FACE dataset, performance on the hard subset has improved in recent years from 40% to 90%. How to continuously improve these high-performance face detectors, especially without adding additional overhead, has now become a challenging problem. To address this problem, the error distribution of a high-performance face detector on the WIDER FACE validation set was analyzed, and two error modes were found, namely regression and classification errors, of which classification errors play the major role in detection. If the classification capability of the face detector can be enhanced, more faces can be distinguished from complex backgrounds, thereby reducing false samples and improving detection precision. Therefore, how to improve the classification capability of the face detector is a problem worth further research.
Disclosure of Invention
In order to solve the above-mentioned problems in the prior art, that is, to solve the problem of improving the classification capability of the face detector without adding any additional overhead, a first aspect of the present invention provides a single-step face detector optimization system, which includes a training system and a testing system; the training system comprises a data enhancement module, a scale perception margin module, a feature supervision module, a single-step face detector interface module and a loss function module;
the data enhancement module is configured to obtain a stitched image through copy stitching based on the detected image, randomly crop the stitched image to obtain image blocks, and, after data enhancement of the image blocks, assign anchor boxes to obtain training samples;
the single-step face detector interface module is configured to send the training samples to a single-step face detector to be trained for binary classification and bounding-box regression, and to obtain the sampling features produced while the single-step face detector performs binary classification of the training samples;
the scale-aware margin module is configured to obtain the scale-aware margin loss of the training samples;
the feature supervision module is configured to perform binary classification of the training samples through a feature-supervision-based classification network, based on the sampling features of the training samples;
the loss function module is configured to update parameters of the single-step face detector through a loss function based on L_CLS, L_LOC and L_FSM, where L_CLS is the scale-aware margin loss function for the binary classification of the single-step face detector, L_LOC is the bounding-box regression loss function, and L_FSM is the loss function for binary classification in the feature-supervision-based classification network;
the testing system is configured to perform a face detection task by using the single-step face detector obtained by the training system based on preset testing data to obtain detection accuracy, and when the accuracy is smaller than a preset accuracy threshold, the single-step face detector is optimized by the training system again.
In some preferred embodiments, the detected image is a rectangle, and the data enhancement module "obtains a stitched image by copy stitching based on the detected image" by:
copying the detected image 4 times and stitching the copies in a 2×2 grid to obtain the stitched image; the long side and the short side of the stitched image are respectively 2 times the long side and the short side of the detected image.
In some preferred embodiments, the long side and the short side of the image block in the data enhancement module are respectively A times the long side and the short side of the detected image, A ∈ [1, 2].
In some preferred embodiments, the feature supervision-based classification network comprises a ROI Align layer, four convolutional layers, a global average pooling layer, and a loss function layer.
In some preferred embodiments, the scale-aware margin loss function is constructed based on a margin-based prediction probability function

y = sigmoid(x − m)

m = α / √(w · h)

where y is the predicted probability value, x is the predicted value, m is the margin applied to x, α is a preset hyper-parameter, and w and h are the width and height of the sample, respectively.
In some preferred embodiments, the scale-aware margin loss function is provided in a scale-aware margin network that includes a classification convolutional layer, a scale-aware margin layer, a Sigmoid function layer, and a loss function layer.
In some preferred embodiments, the loss function based on L_CLS, L_LOC and L_FSM is

L = L_CLS + L_LOC + λ·L_FSM

where λ is a preset weight.
In a second aspect of the present invention, a single-step face detector optimization method is provided, where the method includes the following steps:
step S100, obtaining a stitched image through copy stitching based on the detected image; randomly cropping the stitched image to obtain image blocks; and, after data enhancement of the image blocks, assigning anchor boxes to obtain training samples;
step S200, performing binary classification and bounding-box regression of the training samples through a single-step face detector, and acquiring the sampling features of each training sample;
step S300, performing binary classification of the training samples through a feature-supervision-based classification network, based on the sampling features obtained while the single-step face detector performs binary classification of the training samples;
step S400, training the single-step face detector with a loss function based on L_CLS, L_LOC and L_FSM until a preset training end condition is reached, where L_CLS is the scale-aware margin loss function for the binary classification of the single-step face detector, L_LOC is the bounding-box regression loss function, and L_FSM is the loss function for binary classification in the feature-supervision-based classification network;
step S500, performing a face detection task with the trained single-step face detector on preset test data to obtain the detection accuracy; if the accuracy is smaller than a preset accuracy threshold, jumping to step S100 to optimize the single-step face detector again, and otherwise outputting the trained single-step face detector.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned single-step face detector optimization method.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; wherein the program is adapted to be loaded and executed by a processor to implement the single-step face detector optimization method described above.
The invention has the beneficial effects that:
the invention utilizes the classification characteristics with more discriminative property obtained from the single-step face detector, and constructs an integral loss function through the combination of the classification network loss function of characteristic supervision, the scale perception margin loss function and the binary classification loss function of the single-step face detector to train the single-step face detector to be optimized, thereby effectively enhancing the classification capability of the high-performance single-step face detector, and the operation of the characteristic supervision classification network and the scale perception margin is not needed in the test stage, so the operation amount of the single-step face detector is not increased, and the invention can improve the classification capability of the face detector without increasing any additional cost.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a network framework diagram of a single-step face detector optimization system according to one embodiment of the invention;
FIG. 2 is a schematic diagram of the principal structure of a feature supervision module in one embodiment of the invention;
FIG. 3 is a schematic diagram of the main structure of a scale-aware edge distance module in an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a single-step face detector optimization method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
How to continuously improve these high-performance face detectors, especially without adding any additional overhead, has become a challenging problem. Aiming at this problem, the invention provides a single-step face detector optimization system and method.
The single-step face detector optimization system of the present invention, as shown in FIG. 1, comprises a data enhancement module, a scale-aware margin module, a feature supervision module, a single-step face detector interface module and a loss function module;
the data enhancement module is configured to obtain a stitched image through copy stitching based on the detected image, randomly crop the stitched image to obtain image blocks, and acquire training samples through anchor boxes after data enhancement of the image blocks;
the single-step face detector interface module is configured to send the training samples to a single-step face detector to be trained for binary classification and bounding-box regression, and to obtain the sampling features produced while the single-step face detector performs binary classification of the training samples;
the scale-aware margin module is configured to obtain the scale-aware margin loss of the training samples;
the feature supervision module is configured to perform binary classification of the training samples through a feature-supervision-based classification network, based on the sampling features of the training samples;
the loss function module is configured to update parameters of the single-step face detector through a loss function based on L_CLS, L_LOC and L_FSM, where L_CLS is the scale-aware margin loss function for the binary classification of the single-step face detector, L_LOC is the bounding-box regression loss function, and L_FSM is the loss function for binary classification in the feature-supervision-based classification network;
the testing system is configured to perform a face detection task by using the single-step face detector obtained by the training system based on preset testing data to obtain detection accuracy, and when the accuracy is smaller than a preset accuracy threshold, the single-step face detector is optimized by the training system again.
For a clearer explanation of the single-step face detector optimization system of the present invention, the following will discuss the steps of one embodiment of the method of the present invention in detail with reference to the accompanying drawings.
The backbone of the single-step face detector to be trained in the embodiment of the present invention is constructed based on a ResNet network with a 6-level feature pyramid structure. As shown in FIG. 1, the network includes 6 feature layers C2, C3, C4, C5, C6 and C7, and the 6 corresponding detection layers P2, P3, P4, P5, P6 and P7.
On each detection layer, anchor boxes of 2 scales are used: 2S and 2√2·S, where S is the down-sampling rate of that detection layer; in addition, only one aspect ratio of width to height, 1.25, is used. With these 2 anchor boxes, the detection layers together cover a size range of 8 to 362 pixels on the network input image.
The single-step face detector optimization system of an embodiment of the present invention, as shown in fig. 1, includes a training system and a testing system; the training system comprises a data enhancement module, a scale perception margin module, a feature supervision module, a single-step face detector interface module and a loss function module.
1. Training system
(1) Data enhancement module
The module is configured to obtain a stitched image by copy stitching based on the detected image; and randomly cutting the spliced image to obtain an image block, and acquiring a training sample through an anchor point frame after data enhancement is performed on the image block.
The existing data enhancement strategy first adds photometric distortions to a training image and then performs a mean-padding expansion operation; two patches (image blocks) are then cropped and one is randomly selected for training, where one patch has the size of the short side of the image and the other has a size determined by multiplying the short side of the image by a random number in the interval [0.5, 1.0]. Finally, the selected patch is randomly flipped and resized to 1024×1024 to obtain the final training sample. In this strategy, the expansion operation yields more small faces, which improves performance remarkably, especially for small faces; however, in the training phase the expanded canvas contributes nothing outside the region where the original image is placed, so its utilization is low.
To solve the above problem, this embodiment replaces the expansion operation in the original method with a new, efficient data enhancement (EDA) scheme.
Because the largest reduction factor is 2, in this embodiment the detected image is copied 4 times and tiled in a 2×2 grid to generate a canvas as the stitched image; the long side and the short side of the stitched image are respectively 2 times those of the detected image. The stitched image is then randomly cropped to obtain a patch whose long side and short side are respectively A times the long side and the short side of the detected image, A ∈ [1, 2]. Conventional data enhancement is then applied to the cropped patch.
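The copy-stitch and random-crop step can be sketched as follows; this is a simplified illustration of the described EDA idea, not the patent's implementation, and the function name is hypothetical.

```python
import numpy as np

def eda_crop(image, rng):
    """Sketch of the copy-stitch + random-crop step: tile the image in a 2x2
    grid, then crop a patch whose sides are A times the original sides,
    with A drawn from [1, 2]."""
    h, w = image.shape[:2]
    # 2x2 mosaic: the canvas is twice the original image along each side.
    canvas = np.tile(image, (2, 2) + (1,) * (image.ndim - 2))
    a = rng.uniform(1.0, 2.0)
    ch, cw = int(round(a * h)), int(round(a * w))
    y0 = rng.integers(0, 2 * h - ch + 1)
    x0 = rng.integers(0, 2 * w - cw + 1)
    return canvas[y0:y0 + ch, x0:x0 + cw]
```

Conventional augmentation (photometric distortion, flipping, resizing) would then be applied to the returned patch.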
In the training phase, the preset anchor boxes need to be assigned as positive and negative samples: an anchor box is set as a positive sample if its overlap (IoU) with a ground-truth box is greater than 0.5; it is set as a negative sample if its IoU with every ground-truth box is in the interval [0, 0.4); if its best IoU lies in the interval [0.4, 0.5), it is ignored during the training phase.
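The anchor assignment rule above can be sketched in a few lines; the box format (x1, y1, x2, y2) and helper names are hypothetical illustration, not the patent's code.

```python
def iou(box, gt):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box) + area(gt) - inter
    return inter / union if union > 0 else 0.0

def assign_label(anchor, gts):
    """Positive if best IoU > 0.5, negative if best IoU < 0.4, else ignored."""
    best = max((iou(anchor, g) for g in gts), default=0.0)
    if best > 0.5:
        return 1    # positive sample
    if best < 0.4:
        return 0    # negative sample
    return -1       # ignored during training
```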
(2) Single step human face detector interface module
The module is configured to send the training samples to a single-step face detector to be trained for secondary classification and frame regression, and obtain sampling features of the training samples obtained in the process of carrying out the secondary classification on the training samples by the single-step face detector.
In recent years, convolutional neural network-based methods have dominated object detection; they divide into one-step and two-step detection methods. The two-step detection method consumes more time, while the one-step detection method is faster and more practical in many applications. This embodiment applies a one-step detection method.
Face detection is a comparatively simple binary classification task with a large number of small faces, which makes the advantage of the second stage of the two-step detection method less obvious. Since this embodiment applies a one-step detection method, the key to enhancing its classification capability without reducing speed is how to fully exploit the second stage of the two-step method to learn more discriminative features. To solve this problem, this embodiment designs a feature-supervised classification network which, like the second stage of the two-step detection method, lets the backbone network learn more discriminative features in the training stage while keeping the test time of the detector unchanged. With this method, the second stage of the two-step method is fully exploited to learn more discriminative classification features without any additional overhead at test time.
Since the method adds a feature-supervision-based classification network, a Non-Maximum Suppression threshold needs to be set; it is 0.7 in this embodiment. In this embodiment, 512 predicted anchor boxes are selected as training samples and distributed to suitable pyramid layers to extract sampling features; the layer P_k from which features for the subsequent binary classification are sampled is determined by formula (1):
k = ⌊k₀ + log₂(√(w·h) / 16)⌋    (1)

where k₀ is the base layer index (here k₀ = 3), and w and h are the width and height of the training sample, respectively.
Namely:
if the training sample area is less than 16², it is distributed to the P2 layer;
if the training sample area is between 16² and 32², it is distributed to the P3 layer;
if the training sample area is between 32² and 64², it is distributed to the P4 layer;
if the training sample area is between 64² and 128², it is distributed to the P5 layer;
if the training sample area is between 128² and 256², it is distributed to the P6 layer;
if the training sample area is larger than 256², it is distributed to the P7 layer.
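The layer-assignment rule above can be sketched as a small function. Because formula (1) did not survive extraction, the constant k₀ = 3 and the divisor 16 are assumptions reconstructed so that the result reproduces the area bins listed above.

```python
import math

def assign_level(w, h, k_min=2, k_max=7):
    """Map a training sample of width w and height h to pyramid layer P_k.
    Assumed reconstruction of formula (1): k = floor(k0 + log2(sqrt(w*h)/16)),
    with k0 = 3, clamped to the available layers P2..P7."""
    k = math.floor(3 + math.log2(math.sqrt(w * h) / 16))
    return max(k_min, min(k_max, k))
```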
(3) Scale perception edge distance module
The module is configured to obtain the scale-aware margin loss of the training samples.
The main obstacle to better high-performance face detection is classification error, i.e., classification capability that is not robust enough. To enhance the classification capability in detection, a conventional margin-based prediction probability function (formula (2)) is adapted to construct the scale-aware margin loss function; the difference is that the margin value m is changed from a fixed value to a factor related to the width and height of the sample, as shown in formula (3):

y = sigmoid(x − m)    (2)

m = α / √(w · h)    (3)

where y is the predicted probability value, x is the predicted value, m is the margin applied to x, α is a preset hyper-parameter, and w and h are the width and height of the sample, respectively.
To obtain the value of the scale-aware margin loss function, a scale-aware margin network (SAM) is provided. The scale-aware margin network comprises a classification convolution layer, a scale-aware margin layer, a Sigmoid function layer and a loss function layer (Focal Loss layer), as shown in FIG. 1 and FIG. 3: x − m is obtained through the scale-aware margin layer, the y value is obtained through the Sigmoid function layer, and the loss function layer computes the loss L_CLS based on the y value. An image is input into the scale-aware margin network to obtain the scale of each sample; the value m is obtained through formula (3), and the margin-based loss of the image is then obtained through formula (2).
The scale-aware margin network uses smaller margin values for larger faces and larger margin values for smaller faces to enhance classification capability. With the scale-aware margin module, faces can be better distinguished from complex backgrounds, thereby enhancing the classification capability on small faces.
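The margin-adjusted probability can be sketched as follows. The margin form m = α/√(w·h) is an assumption consistent with the stated behavior (smaller margins for larger faces, larger margins for smaller ones), and the α value is hypothetical.

```python
import math

ALPHA = 1.0  # hypothetical value of the hyper-parameter alpha

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def scale_aware_probability(x, w, h, alpha=ALPHA):
    """Margin-adjusted prediction probability y = sigmoid(x - m), with the
    assumed scale-aware margin m = alpha / sqrt(w * h): a small face receives
    a larger margin and must therefore score higher to be called a face."""
    m = alpha / math.sqrt(w * h)
    return sigmoid(x - m)
```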
(4) The feature supervision module
The module is configured to perform binary classification of the training samples through a feature-supervision-based classification network, based on the sampling features of the training samples.
As shown in FIG. 1 and FIG. 2, the feature-supervision-based classification network (FSM) includes an ROI Align layer, four convolution layers (256×128×3×3, 128×64×3×3, 64×32×3×3 and 32×1×3×3 convolutions), a global average pooling layer, and a loss function layer (Focal Loss layer).
(5) Loss function module
The module is configured to update parameters of the single-step face detector through a loss function based on L_CLS, L_LOC and L_FSM, where L_CLS is the scale-aware margin loss function for the binary classification of the single-step face detector, L_LOC is the bounding-box regression loss function, and L_FSM is the loss function for binary classification in the feature-supervision-based classification network.
The loss function based on L_CLS, L_LOC and L_FSM is shown in equation (4):

L = L_CLS + L_LOC + λ·L_FSM    (4)

where λ is a preset weight, set to 0.5 in this embodiment; it balances the binary classification loss of the single-step face detector against the binary classification loss of the feature-supervised classification network during training.
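The overall objective of equation (4) reduces to a simple weighted sum; a minimal sketch with the embodiment's λ = 0.5 as the default:

```python
def total_loss(l_cls, l_loc, l_fsm, lam=0.5):
    """Overall training objective of equation (4):
    L = L_CLS + L_LOC + lambda * L_FSM, with lambda = 0.5 in this embodiment."""
    return l_cls + l_loc + lam * l_fsm
```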
2. Test system
The system is configured to perform a face detection task by using a single-step face detector obtained by the training system based on preset test data to obtain detection accuracy, and when the accuracy is smaller than a preset accuracy threshold, the single-step face detector is optimized by the training system again.
The test system does not need to carry out the operation of the feature supervision and classification network, so the operation amount of the single-step face detector is not increased.
In the testing stage, the confidence threshold is set to 0.05 to filter out some detection results, and the 400 boxes with the highest confidence scores are kept;
then the Non-Maximum Suppression algorithm is applied with a threshold of 0.4, and the 200 detection boxes with the highest confidence scores are kept in each image as the final result.
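The test-time filtering described above (confidence threshold 0.05, top 400 boxes before NMS at threshold 0.4, top 200 after) can be sketched with a minimal greedy NMS; this is an illustration under those stated settings, not the patent's implementation.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.4):
    """Greedy non-maximum suppression; boxes are (x1, y1, x2, y2) rows."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        a_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        a_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (a_i + a_rest - inter)
        order = rest[iou <= iou_thr]  # drop boxes overlapping the kept one
    return keep

def postprocess(boxes, scores, conf_thr=0.05, pre_top=400, post_top=200):
    """Filter by confidence, keep the top 400, run NMS at 0.4, keep 200."""
    m = scores >= conf_thr
    boxes, scores = boxes[m], scores[m]
    order = np.argsort(scores)[::-1][:pre_top]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores)[:post_top]
    return boxes[keep], scores[keep]
```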
It should be noted that the single-step face detector optimization system provided in the foregoing embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiments of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiments may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A single-step face detector optimization method according to a second embodiment of the present invention, as shown in fig. 4, includes the following steps:
step S100, obtaining a stitched image through copy stitching based on the detected image; randomly cropping the stitched image to obtain image blocks; and, after data enhancement of the image blocks, assigning anchor boxes to obtain training samples;
step S200, performing binary classification and bounding-box regression of the training samples through a single-step face detector, and acquiring the sampling features of each training sample;
step S300, performing binary classification of the training samples through a feature-supervision-based classification network, based on the sampling features obtained while the single-step face detector performs binary classification of the training samples;
step S400, training the single-step face detector with a loss function based on L_CLS, L_LOC and L_FSM until a preset training end condition is reached, where L_CLS is the scale-aware margin loss function for the binary classification of the single-step face detector, L_LOC is the bounding-box regression loss function, and L_FSM is the loss function for binary classification in the feature-supervision-based classification network;
step S500, performing a face detection task with the trained single-step face detector on preset test data to obtain the detection accuracy; if the accuracy is smaller than a preset accuracy threshold, jumping to step S100 to optimize the single-step face detector again, and otherwise outputting the trained single-step face detector.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related descriptions of the method described above may refer to the corresponding process in the foregoing system embodiment, and are not described herein again.
A storage device according to a third embodiment of the present invention stores a plurality of programs, and the programs are suitable for being loaded and executed by a processor to realize the above-mentioned single-step face detector optimization method.
A processing apparatus according to a fourth embodiment of the present invention comprises a processor and a storage device; the processor is adapted to execute various programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the single-step face detector optimization method described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may be located in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A single-step face detector optimization system, characterized in that the system comprises a training system and a test system; the training system comprises a data enhancement module, a scale-aware margin module, a feature supervision module, a single-step face detector interface module and a loss function module;

the data enhancement module is configured to obtain a stitched image by copying and tiling the detected image, randomly crop an image block from the stitched image, perform data enhancement on the image block, and then obtain training samples through anchor boxes;

the single-step face detector interface module is configured to send the training samples to the single-step face detector to be trained for binary classification and bounding-box regression, and to obtain the sampling features of each training sample produced during the single-step face detector's binary classification of the training samples;

the scale-aware margin module is configured to obtain the scale-aware margin loss of the training samples;

the feature supervision module is configured to perform binary classification of the training samples through a feature-supervision-based classification network based on the sampling features of each training sample;

the loss function module is configured to update the parameters of the single-step face detector through a loss function based on L_CLS, L_LOC and L_FSM, wherein L_CLS is the scale-aware margin loss function for the binary classification of the single-step face detector, L_LOC is the bounding-box regression loss function, and L_FSM is the loss function for binary classification in the feature-supervision-based classification network;

the test system is configured to perform a face detection task on preset test data with the single-step face detector obtained by the training system to obtain a detection accuracy, and, when the accuracy is less than a preset accuracy threshold, to optimize the single-step face detector through the training system again.

2. The single-step face detector optimization system according to claim 1, characterized in that the detected image is rectangular, and "obtaining a stitched image by copying and tiling the detected image" in the data enhancement module is performed by copying the detected image 4 times and stitching the copies in a matrix layout; the long side of the stitched image is 2 times the long side of the detected image, and the short side of the stitched image is 2 times the short side of the detected image.

3. The single-step face detector optimization system according to claim 2, characterized in that in the data enhancement module the long side of the image block is A times the long side of the detected image and the short side of the image block is A times the short side of the detected image, A ∈ [1, 2].

4. The single-step face detector optimization system according to claim 1, characterized in that the feature-supervision-based classification network comprises an ROI Align layer, four convolutional layers, a global average pooling layer and a loss function layer.

5. The single-step face detector optimization system according to claim 1, characterized in that the scale-aware margin loss function is constructed based on a margin-aware predicted probability function, the margin-aware predicted probability function being

y = sigmoid(x − m)

[equation image FDA0003084590710000021: definition of the margin m in terms of the hyperparameter α and the sample width w and height h]

wherein y is the predicted probability value, x is the predicted value, m is the margin of x, α is a preset hyperparameter, and w and h are the width and the height of the sample, respectively.

6. The single-step face detector optimization system according to claim 5, characterized in that the scale-aware margin loss function is set in a scale-aware margin network, and the scale-aware margin network comprises a classification convolutional layer, a scale-aware margin layer, a Sigmoid function layer and a loss function layer.

7. The single-step face detector optimization system according to any one of claims 1-6, characterized in that the loss function based on L_CLS, L_LOC and L_FSM is

L = L_CLS + L_LOC + λ·L_FSM

wherein λ is a preset weight.

8. A single-step face detector optimization method, characterized in that the method comprises the following steps:

Step S100, obtaining a stitched image by copying and tiling the detected image; randomly cropping an image block from the stitched image, performing data enhancement on the image block, and then obtaining training samples by dividing anchor boxes;

Step S200, performing binary classification and bounding-box regression on the training samples through a single-step face detector, and obtaining the sampling features of each training sample;

Step S300, based on the sampling features of each training sample obtained during the single-step face detector's binary classification of the training samples, performing binary classification of the training samples through a feature-supervision-based classification network;

Step S400, training the single-step face detector with a loss function based on L_CLS, L_LOC and L_FSM until a preset training end condition is reached; wherein L_CLS is the scale-aware margin loss function for the binary classification of the single-step face detector, L_LOC is the bounding-box regression loss function, and L_FSM is the loss function for binary classification in the feature-supervision-based classification network;

Step S500, performing a face detection task on preset test data with the single-step face detector trained in step S400 to obtain a detection accuracy; if the accuracy is less than a preset accuracy threshold, jumping back to step S100 to optimize the single-step face detector again; otherwise, outputting the trained single-step face detector.

9. A storage device storing a plurality of programs, characterized in that the programs are adapted to be loaded and executed by a processor to implement the single-step face detector optimization method of claim 8.

10. A processing device, comprising a processor and a storage device, the processor being adapted to execute various programs and the storage device being adapted to store a plurality of programs, characterized in that the programs are adapted to be loaded and executed by the processor to implement the single-step face detector optimization method of claim 8.
CN201910502740.8A 2019-06-11 2019-06-11 Single-step face detector optimization system, method and apparatus Active CN110222657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910502740.8A CN110222657B (en) 2019-06-11 2019-06-11 Single-step face detector optimization system, method and apparatus


Publications (2)

Publication Number Publication Date
CN110222657A CN110222657A (en) 2019-09-10
CN110222657B true CN110222657B (en) 2021-07-20

Family

ID=67816377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910502740.8A Active CN110222657B (en) 2019-06-11 2019-06-11 Single-step face detector optimization system, method and apparatus

Country Status (1)

Country Link
CN (1) CN110222657B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633065A (en) * 2020-11-19 2021-04-09 特斯联科技集团有限公司 Face detection method, system, storage medium and terminal based on data enhancement

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220618A (en) * 2017-05-25 2017-09-29 中国科学院自动化研究所 Method for detecting human face and device, computer-readable recording medium, equipment
CN108460382A (en) * 2018-03-26 2018-08-28 西安电子科技大学 Remote sensing image Ship Detection based on deep learning single step detector
CN108898047A (en) * 2018-04-27 2018-11-27 中国科学院自动化研究所 The pedestrian detection method and system of perception are blocked based on piecemeal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7236615B2 (en) * 2004-04-21 2007-06-26 Nec Laboratories America, Inc. Synergistic face detection and pose estimation with energy-based models

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220618A (en) * 2017-05-25 2017-09-29 中国科学院自动化研究所 Method for detecting human face and device, computer-readable recording medium, equipment
CN108460382A (en) * 2018-03-26 2018-08-28 西安电子科技大学 Remote sensing image Ship Detection based on deep learning single step detector
CN108898047A (en) * 2018-04-27 2018-11-27 中国科学院自动化研究所 The pedestrian detection method and system of perception are blocked based on piecemeal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Single-Shot Scale-Aware Network for Real-Time Face Detection; Shifeng Zhang et al.; International Journal of Computer Vision (2019); 20190219; full text *
Research on Face Recognition Algorithms Based on Deep Learning; Zhao Xuebin; China Master's Theses Full-text Database, Information Science and Technology; 20181115; full text *

Also Published As

Publication number Publication date
CN110222657A (en) 2019-09-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant