
CN112580496B - Face Relative Pose Estimation Method Combined with Face Keypoint Detection - Google Patents


Info

Publication number
CN112580496B
Authority
CN
China
Prior art keywords
point
face
target
source
points
Prior art date
Legal status
Expired - Fee Related
Application number
CN202011489338.XA
Other languages
Chinese (zh)
Other versions
CN112580496A (en)
Inventor
于慧敏
刘柏邑
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202011489338.XA
Publication of CN112580496A
Application granted
Publication of CN112580496B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for estimating the relative pose of a human face by combining face keypoints with RGB-D data. The method detects face keypoints on the RGB images and computes an initial pose angle difference, then combines the depth data and matches the RGB-D point clouds to obtain a high-precision estimate of the relative pose between faces captured at different angles. Unlike previous recognition methods, this method uses RGB-D data and places low demands on dataset size, so high-precision relative pose estimation can be achieved on small-sample data. It also incorporates face keypoint information, which strengthens the robustness of the pose estimation.

Description

Face Relative Pose Estimation Method Combined with Face Keypoint Detection

Technical Field

The invention belongs to the fields of image recognition, face keypoint detection, and face pose estimation, and in particular relates to a method for estimating the relative pose of a human face based on RGB-D data and face keypoints.

Background Art

Relative face pose estimation has considerable research value. Its goal is to accurately estimate the relative pose between a pair of face RGB-D captures taken from different angles. Relative face pose estimation has the following main application scenarios:

(1) Attention detection: a person's attention can be judged from the head pose. For example, the system can check whether a long-haul driver is looking at the road ahead; if the driver has not looked ahead for a long time, an alert can be issued in advance to ensure safety and reduce accidents. Another example is monitoring whether students are concentrating in class.

(2) Behavior analysis. Similar to (1), video surveillance analysis combined with other algorithms can judge whether a person is behaving improperly, enabling early warning and preventing problems before they happen.

(3) Human-computer interaction. Head movements can carry meaning and convey information: to most people, shaking the head signals denial, nodding signals agreement, and keeping the head lowered for a long time suggests that the person may be thinking over a problem. If a robot can understand such behavior, the quality and effectiveness of human-robot interaction will improve.

(4) Gaze tracking, also known as eye tracking. Accurate head pose estimation improves the precision of gaze tracking. Gaze tracking can be used in games, for example letting the eyes control a character's movement and taking motion-sensing control a step further.

Methods for relative face pose estimation fall roughly into two schools. The first is the feature-based approach: features on the face are analyzed with traditional techniques and compared against the features of a standard frontal face, and the relative pose is estimated by analyzing the differences between the two sets of features.

The other school is based mainly on deep learning. Deep networks with various architectures extract features of the current face and regress the angles, yielding the pose of the current face. With the growth of computing resources such methods have become popular, but they are strongly constrained by model size and accuracy requirements.

A point cloud reflects the true size and shape of an object's surface fairly accurately in three-dimensional space and is a relatively raw three-dimensional representation of the object. As such data has become easier to acquire, point clouds have gradually become an important data structure in computer vision.

Summary of the Invention

The object of the present invention is to provide a method for relative face pose estimation based on RGB-D data combined with face keypoint detection. By using keypoint detection and an iterative closest point procedure that introduces point pair features, the method estimates the relative pose between a pair of faces captured at different angles. In the present invention, estimating from the point cloud is more accurate than estimating from RGB images alone, and fusing the relative pose estimation into the point cloud matching process maximizes the accuracy of the result.

To achieve the above object, the technical solution of the present invention is a face relative pose estimation method combined with face keypoint detection. The method works as follows: first, face keypoint detection extracts 68 keypoints of the face, and a first estimate of the relative pose is obtained by matching the three-dimensional coordinates of the two groups of keypoints. Building on this first estimate, the second relative pose estimation starts from a good initial value; the face is also cropped so that the preprocessing before the iterative closest point algorithm is optimal. Finally, point pair features built from six-dimensional coordinates are matched iteratively, and the relative pose is estimated during the matching process.

Specifically, the method of the invention comprises the following steps:

Step 1: Given the RGB-D data, apply the Dlib face keypoint detection CNN model to the two RGB images I_source and I_target captured at the corresponding angles, detecting 68 face keypoints that cover the facial contour, mouth, eyes, nose, and other parts. From these, select several pairs of corresponding RGB-D three-dimensional coordinate points {KeyPoint_snum} and {KeyPoint_tnum}, and match the two groups of three-dimensional points to obtain the initial rotation matrix R_start of the relative face pose estimate.

Step 2: Using the two-dimensional keypoint coordinates from step 1, crop regions of different sizes from the RGB images I_source and I_target for matching. Since the aim of the invention is to compute the pose of I_target relative to I_source, the cropped region I'_source of I_source is larger than the cropped region I'_target of I_target.

Step 3: Down-sample the face regions I'_source and I'_target of the two poses obtained in step 2 to obtain the corresponding face regions I''_source and I''_target. From I''_target select a point set t ⊆ I''_target, and from I''_source select a point set s ⊆ I''_source. For each point P_i ∈ t, compute the point pair feature FP_ij with every other point P_j ∈ t, i ≠ j, yielding a matrix FP_mat of size N_P × N_P, where N_P is the number of points in the set. In the same way, compute the matrix FQ_mat of point pair features FQ_ij for Q_i ∈ s, of size N_Q × N_Q. Hash the features FP_ij and FQ_ij of the feature matrices FP_mat and FQ_mat into a hash table H; the best-matching point pair features are assigned the same hash value, so corresponding point pairs can be found in the two point sets t and s, forming the point sets PP_t and PP_s. Then compute the rotation matrix R and spatial translation T between PP_t and PP_s by singular value decomposition. Iterate until ||PP_s - PP_t|| falls below a threshold ε or the number of iterations exceeds the limit Iter_max.

Step 4: Multiply the rotation matrix R_start obtained in step 1 with the rotation matrices R and spatial translations T obtained from the iterations of step 3 to obtain the rotation matrix R_final. Using the transformation between a rotation matrix and Euler angles, obtain the relative pose estimate {angle_x, angle_y, angle_z}.
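A minimal Python sketch of this step is given below. It assumes the per-iteration rotations have been collected in a list; the x-y-z Euler-angle convention used to read off {angle_x, angle_y, angle_z}, and the left-multiplication order of the composition, are assumptions, since the text does not fix them explicitly.

    import numpy as np

    def compose_and_to_euler(R_start, R_list):
        """Compose R_start with the per-iteration rotations and return Euler angles in degrees."""
        R_final = R_start.copy()
        for R in R_list:                      # rotations from successive iterations
            R_final = R @ R_final
        angle_x = np.degrees(np.arctan2(R_final[2, 1], R_final[2, 2]))
        angle_y = np.degrees(np.arcsin(-np.clip(R_final[2, 0], -1.0, 1.0)))
        angle_z = np.degrees(np.arctan2(R_final[1, 0], R_final[0, 0]))
        return angle_x, angle_y, angle_z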

Further, the Dlib keypoint detection in step 1 uses the dlib.get_frontal_face_detector() function of the Python third-party library dlib for face detection, and uses the model shape_predictor_68_landmarks.dat for keypoint detection.
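A minimal sketch of this detection step with the dlib API is shown below; the model file name shape_predictor_68_face_landmarks.dat is the publicly distributed dlib landmark model and is assumed here as the file the text refers to in shortened form.

    import dlib
    import cv2

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def detect_68_landmarks(bgr_image):
        """Return a list of (x, y) pixel coordinates for the 68 face landmarks, or None."""
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        faces = detector(gray, 1)          # upsample once to help with small faces
        if not faces:
            return None
        shape = predictor(gray, faces[0])  # landmarks of the first detected face
        return [(shape.part(i).x, shape.part(i).y) for i in range(68)]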

Further, in step 1, five keypoints that are not coplanar, are relatively stable, and are not easily occluded are selected to form the two matched groups {KeyPoint_snum} and {KeyPoint_tnum}. The matched point numbers are fixed as snum, tnum = 31, 37, 46, 49, 55, corresponding to the two outer eye corners, the two outer mouth corners, and the tip of the nose.

Further, the matching of the two groups of keypoints {KeyPoint_snum} and {KeyPoint_tnum} in step 1 uses the affine transformation function estimateAffine3D() from OpenCV.
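The following sketch illustrates this initial registration with OpenCV's estimateAffine3D(). It assumes the five landmarks have already been lifted to 3D coordinates (x, y, depth); the lifting itself depends on the camera, which the text does not specify, so it is not shown.

    import cv2
    import numpy as np

    STABLE_IDX = [30, 36, 45, 48, 54]  # 0-based indices of landmark numbers 31, 37, 46, 49, 55

    def initial_rotation(pts3d_source, pts3d_target):
        """pts3d_*: (68, 3) arrays of landmark coordinates (x, y, depth)."""
        src = np.float32(pts3d_source[STABLE_IDX])
        dst = np.float32(pts3d_target[STABLE_IDX])
        ok, affine, inliers = cv2.estimateAffine3D(src, dst)
        if not ok:
            raise RuntimeError("estimateAffine3D failed on the selected keypoints")
        R_start = affine[:, :3]   # 3x3 rotation-like part of the 3x4 affine matrix
        T_start = affine[:, 3]    # translation part
        return R_start, T_start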

Further, the cropping of the face region in step 2 is based on the two-dimensional coordinates of the face keypoints, as follows:

The cropped face region is a rectangle whose four sides are determined by the maxima and minima of the keypoint coordinates {X_snum, Y_snum} and {X_tnum, Y_tnum} in the two dimensions: Rec_Left = min{X_snum}, Rec_Right = max{X_snum}, Rec_Up = min{Y_snum}, Rec_Bottom = max{Y_snum}. The boundary of the cropped region I'_source of I_source is expanded outward by 5 pixels beyond these four extreme values, giving {Rec_Left - 5, Rec_Right + 5, Rec_Up - 5, Rec_Bottom + 5}.

The boundary of the cropped region I'_target of I_target is shrunk inward by 5 pixels from the four extreme values, giving {Rec_Left + 5, Rec_Right - 5, Rec_Up + 5, Rec_Bottom - 5}.
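A short sketch of the crop-rectangle computation, following the Rec_Left / Rec_Right / Rec_Up / Rec_Bottom notation above; the helper name and return format are illustrative only.

    import numpy as np

    def crop_regions(landmarks_xy, margin=5):
        """landmarks_xy: (68, 2) array of (x, y) landmark pixel coordinates."""
        xs, ys = landmarks_xy[:, 0], landmarks_xy[:, 1]
        left, right = int(xs.min()), int(xs.max())
        up, bottom = int(ys.min()), int(ys.max())
        # source region: expand outward by `margin` pixels
        source_rect = (left - margin, right + margin, up - margin, bottom + margin)
        # target region: shrink inward by `margin` pixels
        target_rect = (left + margin, right - margin, up + margin, bottom - margin)
        return source_rect, target_rect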

Further, in step 3 the point pair features of the point sets can be computed either from six-dimensional point cloud data {x, y, depth, R, G, B}, giving a ten-dimensional feature descriptor, or from three-dimensional point cloud data {x, y, depth}, giving a four-dimensional feature descriptor that reduces both the time needed to generate the features and the space they occupy.

In detail, the point pair feature is FP_ij = CPPF(PPF(p_i, p_j, n_i, n_j), c_i, c_j), where c_i = (R_i, G_i, B_i) denotes the RGB vector and PPF(p_i, p_j, n_i, n_j) = (||d||_2, ∠(n_i, d), ∠(n_j, d), ∠(n_i, n_j)), with d = p_i - p_j. Here p_i, p_j denote the coordinates of the points selected on the object surface, n_i, n_j denote the normal vectors at the selected points, ∠ denotes the angle between vectors, and CPPF denotes the PPF with color.
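A sketch of these descriptors in Python is given below; normal estimation is assumed to have been done beforehand, and the helper names are illustrative only. The CPPF descriptor concatenates the four geometric values with the two RGB vectors, giving ten dimensions as stated above.

    import numpy as np

    def angle_between(u, v):
        """Angle between two vectors, guarded against rounding outside [-1, 1]."""
        cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
        return np.arccos(np.clip(cosang, -1.0, 1.0))

    def ppf(p_i, p_j, n_i, n_j):
        """Four-dimensional geometric point pair feature."""
        d = p_i - p_j
        return np.array([np.linalg.norm(d),
                         angle_between(n_i, d),
                         angle_between(n_j, d),
                         angle_between(n_i, n_j)])

    def cppf(p_i, p_j, n_i, n_j, c_i, c_j):
        """Ten-dimensional color point pair feature: 4 geometric values + RGB of both points."""
        return np.concatenate([ppf(p_i, p_j, n_i, n_j), c_i, c_j])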

Further, before the iteration the data are preprocessed by random down-sampling, using the sample() function from the Python module random. Different down-sampling ratios trade iteration speed against iteration accuracy; typical ratios are b = 0.1, 0.3, 0.5, 0.8.
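For illustration, the down-sampling step might look like the following sketch, where b is the sampling ratio mentioned above and the point records can be any per-point tuples:

    import random

    def downsample(points, b=0.3):
        """points: list of point records; keep a random fraction b of them."""
        k = max(1, int(len(points) * b))
        return random.sample(points, k)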

Further, the iteration uses a nearest-point search, implemented with the Python third-party module sklearn via sklearn.neighbors.NearestNeighbors().
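A minimal sketch of the nearest-point query with scikit-learn; the function name and argument layout here are illustrative.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def nearest_points(source_pts, target_pts):
        """For each target point, return the distance to and index of its nearest source point."""
        nn = NearestNeighbors(n_neighbors=1).fit(source_pts)
        distances, indices = nn.kneighbors(target_pts)
        return distances.ravel(), indices.ravel()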

Further, the iteration computes the rotation matrix and spatial translation by singular value decomposition, using the function numpy.linalg.svd() from the Python extension library numpy.
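The SVD-based solution for R and T between the matched point sets PP_t and PP_s can be sketched as follows; the reflection guard is standard practice for this decomposition and is an addition beyond what the text states.

    import numpy as np

    def rigid_transform(PP_t, PP_s):
        """Return R, T such that R @ p + T maps points of PP_t onto PP_s (both (N, 3) arrays)."""
        mu_t, mu_s = PP_t.mean(axis=0), PP_s.mean(axis=0)
        H = (PP_t - mu_t).T @ (PP_s - mu_s)       # 3x3 cross-covariance matrix
        U, S, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                  # guard against a reflection solution
            Vt[2, :] *= -1
            R = Vt.T @ U.T
        T = mu_s - R @ mu_t
        return R, T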

Further, during the iteration, after the rotation matrix R and spatial translation T are computed, the data {x, y, depth} of I''_target must be transformed so that its position is updated before the next iteration and new nearest points can be found.
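The per-iteration update of the target cloud can be sketched as:

    import numpy as np

    def apply_transform(points_xyz, R, T):
        """points_xyz: (N, 3) array of (x, y, depth); returns the moved cloud."""
        return (R @ points_xyz.T).T + T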

The beneficial effects of the present invention are:

(1) By performing point cloud matching and fusing the relative pose estimation into the matching process, the accuracy is improved considerably over simply extracting features and estimating from them.

(2) Detecting the keypoints and matching them first provides a good initial value for the subsequent iterative closest point process, avoiding the situation in which running iterative closest point directly makes the algorithm iterate in the wrong direction or fail to converge.

(3) Cropping the face regions so that the target point cloud has relatively few points and the source point cloud relatively many strengthens the robustness of the matching process and reduces the chance that it fails to converge.

(4) Introducing point pair features mitigates the drawback of the original ICP algorithm, which relies on Euclidean distance to find corresponding points; with point pair features, corresponding points can be found better and more accurately for point cloud matching.

Brief Description of the Drawings

Fig. 1 is a flow chart of the steps of the face relative pose estimation method combined with face keypoint detection according to an embodiment of the present invention;

Fig. 2 shows the loss statistics of the relative pose estimation according to an embodiment of the present invention.

Detailed Description of the Embodiments

To make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

On the contrary, the invention covers any alternatives, modifications, equivalent methods, and schemes within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the present invention, some specific details are described at length in the detailed description below; those skilled in the art can fully understand the present invention even without these details.

Referring to Fig. 1, which is a flow chart of the steps of the face relative pose estimation method combined with face keypoint detection according to an embodiment of the present invention.

The source data Data_source = {source image Image_source, source depth data Depth_source} and the target data Data_target = {target image Image_target, target depth data Depth_target} are processed through the following steps:

1. Detect the face keypoints. Specifically:

(1.1) Import the third-party library Dlib and load the face keypoint detection model shape_predictor_68_landmarks.dat.

(1.2) Feed Image_source and Image_target into the detection model to identify the keypoints, obtaining the keypoint coordinates {X_snum, Y_snum} and {X_tnum, Y_tnum} of the faces in the images, where snum, tnum = 1, ..., 68.

2. Select the more stable keypoints for a preliminary estimate of the relative pose. Specifically:

(2.1) Select the five more stable keypoints, namely the two outer eye corners, the two outer mouth corners, and the tip of the nose, whose keypoint numbers are snum, tnum = 31, 37, 46, 49, 55, giving the two groups of keypoints {KeyPoint_snum} and {KeyPoint_tnum}.

(2.2) After the two groups of keypoints {KeyPoint_snum} and {KeyPoint_tnum} are obtained, import the third-party library OpenCV and use its estimateAffine3D() function to register the two groups of keypoints, obtaining the initial rotation matrix R_start.

3. Crop the face regions. A larger region is cropped from the face in the source image and a smaller region from the face in the target image. Specifically:

(3.1) The cropped face region is a rectangle, and the cropped data contain not only RGB but also depth data. The four sides of the rectangle are determined by four values, the maxima and minima of the keypoint coordinates {X_snum, Y_snum} and {X_tnum, Y_tnum} in the two dimensions: Rec_Left = min{X_snum}, Rec_Right = max{X_snum}, Rec_Up = min{Y_snum}, Rec_Bottom = max{Y_snum}.

(3.2) I_source is cropped to a relatively large region, so the boundary of the cropped region I'_source is expanded outward by 5 pixels beyond the above four extreme values, giving {Rec_Left - 5, Rec_Right + 5, Rec_Up - 5, Rec_Bottom + 5}.

(3.3) I_target is cropped to a relatively small region, so the boundary of the cropped region I'_target is shrunk inward by 5 pixels from the above four extreme values, giving {Rec_Left + 5, Rec_Right - 5, Rec_Up + 5, Rec_Bottom - 5}.

4. Match the point cloud data of the cropped regions. Specifically:

(4.1) The cropped data contain {x, y, depth, R, G, B}. The matching uses the iterative closest point method. Before iterating, the data are preprocessed by random down-sampling, using the sample() function from the Python module random. Different down-sampling ratios trade iteration speed against iteration accuracy; typical ratios are b = 0.1, 0.3, 0.5, 0.8.

(4.2) After down-sampling, the point clouds are matched by iterative closest point to obtain the final result. Given the corresponding face regions I''_source and I''_target, first select, from the points of I''_target, a point set t ⊆ I''_target, and likewise select a point set s ⊆ I''_source from I''_source. Then, for each point P_i ∈ t, compute the point pair feature FP_ij with every other point P_j ∈ t, i ≠ j, obtaining a matrix FP_mat of size N_P × N_P, where N_P is the number of points in the set. In the same way, compute the matrix FQ_mat of point pair features FQ_ij for Q_i ∈ s, of size N_Q × N_Q, where N_Q is the number of points in that set.

(4.3) Hash the features FP_ij and FQ_ij of the feature matrices FP_mat and FQ_mat into a hash table H; sufficiently similar point pair features are assigned the same hash value, which is used to find the corresponding point pairs in the two point sets t and s, forming PP_t and PP_s. Then compute the rotation matrix R and spatial translation T between PP_t and PP_s by singular value decomposition. Iterate until ||PP_s - PP_t|| falls below a threshold ε or the number of iterations exceeds the limit Iter_max.
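A sketch of the feature-hashing idea in (4.3) is given below: each descriptor is quantized so that similar point pair features fall into the same bucket, and points whose features share a bucket are paired. The quantization step sizes and helper names are assumptions made only for illustration, since the text does not specify how similar features are bucketed.

    import numpy as np
    from collections import defaultdict

    def hash_feature(f, dist_step=0.005, angle_step=np.radians(10), color_step=0.1):
        """Quantize a CPPF descriptor (distance, 3 angles, color values) into a hashable key."""
        q = (int(f[0] / dist_step),) \
            + tuple(int(a / angle_step) for a in f[1:4]) \
            + tuple(int(c / color_step) for c in f[4:])
        return hash(q)

    def match_by_hash(FP_list, FQ_list):
        """FP_list / FQ_list: lists of (point_index, descriptor); return matched index pairs."""
        table = defaultdict(list)
        for idx, f in FQ_list:
            table[hash_feature(f)].append(idx)
        pairs = []
        for idx, f in FP_list:
            bucket = table.get(hash_feature(f))
            if bucket:
                pairs.append((idx, bucket[0]))   # take any source point sharing the bucket
        return pairs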

5. Combine the results of the two registrations. Specifically:

The initial rotation matrix R_start is obtained in step (2.2), and a series of rotation matrices R_i, i = 1, ..., N, is obtained during the iterations of step (4.3). Multiplying these matrices together gives the final matrix R_final. With R_final available, the transformation between a rotation matrix and Euler angles finally gives the relative pose estimate {angle_x, angle_y, angle_z}.

Fig. 2(a) shows the loss distribution of the rotation angle pitch for relative pose estimation between pairs of faces with different poses. The horizontal axis is the loss in degrees, with a tick spacing of 0.1°; the vertical axis is the number of samples, out of a total of about 200.

Fig. 2(b) shows the loss distribution of the rotation angle yaw for relative pose estimation between pairs of faces with different poses, with the same axes: loss in degrees (0.1° ticks) on the horizontal axis and number of samples (about 200 in total) on the vertical axis.

Fig. 2(c) shows the loss distribution of the rotation angle roll for relative pose estimation between pairs of faces with different poses, again with loss in degrees (0.1° ticks) on the horizontal axis and number of samples (about 200 in total) on the vertical axis.

These results show that the pose estimation obtained with the method of the present invention is highly accurate.

The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A face relative pose estimation method combined with face keypoint detection, characterized by comprising the following steps:

Step 1: perform face keypoint detection on two RGB images I_source and I_target of corresponding angles, select from the detected keypoints a plurality of corresponding RGB-D three-dimensional coordinate points {KeyPoint_snum} and {KeyPoint_tnum}, and match the two groups of three-dimensional coordinate points to obtain an initial rotation matrix R_start of the relative face pose estimate;

Step 2: according to the two-dimensional keypoint coordinates of step 1, crop regions of different sizes from the RGB images I_source and I_target for matching, wherein the cropped region I'_source of I_source is larger than the cropped region I'_target of I_target;

Step 3: down-sample the face regions I'_source and I'_target of the two poses obtained in step 2 to obtain the corresponding face regions I''_source and I''_target; select a point set t ⊆ I''_target from I''_target and a point set s ⊆ I''_source from I''_source; compute, for each point P_i ∈ t, the point pair feature FP_ij with every other point P_j ∈ t, i ≠ j, obtaining a matrix FP_mat of size N_P × N_P, where N_P is the number of points in the set; compute in the same manner the matrix FQ_mat of point pair features FQ_ij for Q_i ∈ s, of size N_Q × N_Q; hash the features FP_ij and FQ_ij of the feature matrices FP_mat and FQ_mat into a hash table H, assigning the same hash value to the best-matching point pair features, thereby finding corresponding point pairs in the two point sets t and s and forming the point sets PP_t and PP_s; compute the rotation matrix R and spatial translation T between PP_t and PP_s by singular value decomposition; iterate until ||PP_s - PP_t|| is less than a threshold ε or the number of iterations reaches a set limit;

Step 4: multiply the rotation matrix R_start obtained in step 1 with the rotation matrices R and spatial translations T obtained from the iterations of step 3 to obtain the rotation matrix R_final; using the transformation between a rotation matrix and Euler angles, obtain the relative pose estimate {angle_x, angle_y, angle_z}.

2. The method according to claim 1, wherein in step 1 the two groups of keypoints {KeyPoint_snum} and {KeyPoint_tnum} are matched with the matched point numbers fixed as snum, tnum = 31, 37, 46, 49, 55, corresponding to the 2 outer eye corners, the 2 outer mouth corners, and the tip of the nose.

3. The method according to claim 1, wherein the cropping of the face region in step 2 is based on the two-dimensional coordinates of the face keypoints, specifically: the cropped face region is a rectangle whose four sides are determined by the maxima and minima of the keypoint coordinates {X_snum, Y_snum} and {X_tnum, Y_tnum} in the two dimensions; the boundary of the cropped region I'_source of I_source is expanded outward by 5 pixels beyond the 4 extreme values, and the boundary of the cropped region I'_target of I_target is shrunk inward by 5 pixels from the 4 extreme values.

4. The method according to claim 1, wherein in step 3 the point pair features of the point sets are computed either from six-dimensional point cloud data {x, y, depth, R, G, B}, giving a ten-dimensional feature descriptor FP_ij = CPPF(PPF(p_i, p_j, n_i, n_j), c_i, c_j), or from three-dimensional point cloud data {x, y, depth}, giving a four-dimensional feature descriptor FP_ij = PPF(p_i, p_j, n_i, n_j); where c_i = (R_i, G_i, B_i) denotes the RGB vector, PPF(p_i, p_j, n_i, n_j) = (||d||_2, ∠(n_i, d), ∠(n_j, d), ∠(n_i, n_j)) with d = p_i - p_j, p_i and p_j denote the coordinates of points selected on the object surface, n_i and n_j denote the normal vectors at the selected points, ∠ denotes the angle between vectors, and CPPF denotes the PPF with color.

5. The method according to claim 1, wherein in step 3, during the iteration, after each computation of the rotation matrix R and spatial translation T, the data of I''_target must be transformed so that its position is updated before the next iteration to form new point pair features.
CN202011489338.XA 2020-12-16 2020-12-16 Face Relative Pose Estimation Method Combined with Face Keypoint Detection Expired - Fee Related CN112580496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011489338.XA CN112580496B (en) 2020-12-16 2020-12-16 Face Relative Pose Estimation Method Combined with Face Keypoint Detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011489338.XA CN112580496B (en) 2020-12-16 2020-12-16 Face Relative Pose Estimation Method Combined with Face Keypoint Detection

Publications (2)

Publication Number Publication Date
CN112580496A CN112580496A (en) 2021-03-30
CN112580496B 2023-01-10

Family

ID=75135563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011489338.XA Expired - Fee Related CN112580496B (en) 2020-12-16 2020-12-16 Face Relative Pose Estimation Method Combined with Face Keypoint Detection

Country Status (1)

Country Link
CN (1) CN112580496B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645170A (en) * 2009-09-03 2010-02-10 北京信息科技大学 Precise registration method of multilook point cloud
CN106570460A (en) * 2016-10-20 2017-04-19 三明学院 Single-image human face posture estimation method based on depth value
CN109492589A (en) * 2018-11-13 2019-03-19 重庆工程职业技术学院 The recognition of face working method and intelligent chip merged by binary features with joint stepped construction
CN111191599A (en) * 2019-12-27 2020-05-22 平安国际智慧城市科技股份有限公司 Gesture recognition method, device, equipment and storage medium
CN111414798A (en) * 2019-02-03 2020-07-14 沈阳工业大学 Head posture detection method and system based on RGB-D image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015006224A1 (en) * 2013-07-08 2015-01-15 Vangogh Imaging, Inc. Real-time 3d computer vision processing engine for object recognition, reconstruction, and analysis

Also Published As

Publication number Publication date
CN112580496A (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20230110