
CN112580496B - Face Relative Pose Estimation Method Combined with Face Keypoint Detection - Google Patents


Info

Publication number
CN112580496B
Authority
CN
China
Prior art keywords
point
face
target
source
points
Prior art date
Legal status
Expired - Fee Related
Application number
CN202011489338.XA
Other languages
Chinese (zh)
Other versions
CN112580496A (en)
Inventor
于慧敏
刘柏邑
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202011489338.XA
Publication of CN112580496A
Application granted
Publication of CN112580496B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for estimating the relative pose of a human face by combining face keypoints with RGB-D data. The method detects face keypoints on the RGB images and computes an initial pose angle difference, then combines the depth data and matches the RGB-D point clouds to obtain a high-precision estimate of the relative pose between faces captured at different angles. Unlike previous recognition methods, this method uses RGB-D data and places low demands on dataset size, so high-precision relative pose estimation can be achieved on small-sample data. It also incorporates face keypoint information, which strengthens the robustness of the pose estimation.

Description

Face Relative Pose Estimation Method Combined with Face Keypoint Detection

Technical Field

The invention belongs to the fields of image recognition, face keypoint detection, and face pose estimation, and in particular relates to a method for estimating the relative pose of a human face based on RGB-D data and face keypoints.

Background Art

Relative face pose estimation has considerable research value. Its goal is to accurately estimate the relative pose between a pair of face RGB-D captures taken from different angles. Relative face pose estimation has the following main application scenarios:

(1) Attention detection: a person's attention can be judged from the head pose. For example, the system can check whether a long-haul driver is looking at the road ahead; if the driver has not looked ahead for a long time, an alert can be issued in advance to ensure safety and reduce accidents. Another example is monitoring whether students are concentrating in class.

(2) Behavior analysis. Similar to (1), video surveillance analysis combined with other algorithms can judge whether a person is behaving improperly, enabling early warning and preventing problems before they happen.

(3) Human-computer interaction. Head movements can carry meaning and convey information: to most people, shaking the head signals denial, nodding signals agreement, and keeping the head lowered for a long time suggests that the person may be thinking over a problem. If a robot can understand such behavior, the quality and effectiveness of human-robot interaction will improve.

(4) Gaze tracking, also known as eye tracking. Accurate head pose estimation improves the precision of gaze tracking. Gaze tracking can be used in games, for example letting the eyes control a character's movement and taking motion-sensing control a step further.

Methods for relative face pose estimation fall roughly into two schools. The first is the feature-based approach: features on the face are analyzed with traditional techniques and compared against the features of a standard frontal face, and the relative pose is estimated by analyzing the differences between the two sets of features.

The other school is based mainly on deep learning. Deep networks with various architectures extract features of the current face and regress the angles, yielding the pose of the current face. With the growth of computing resources such methods have become popular, but they are strongly constrained by model size and accuracy requirements.

A point cloud reflects the true size and shape of an object's surface fairly accurately in three-dimensional space and is a relatively raw three-dimensional representation of the object. As such data has become easier to acquire, point clouds have gradually become an important data structure in computer vision.

Summary of the Invention

The object of the present invention is to provide a method for relative face pose estimation based on RGB-D data combined with face keypoint detection. By using keypoint detection and an iterative closest point procedure that introduces point pair features, the method estimates the relative pose between a pair of faces captured at different angles. In the present invention, estimating from the point cloud is more accurate than estimating from RGB images alone, and fusing the relative pose estimation into the point cloud matching process maximizes the accuracy of the result.

To achieve the above object, the technical solution of the present invention is a face relative pose estimation method combined with face keypoint detection. The method works as follows: first, face keypoint detection extracts 68 keypoints of the face, and a first estimate of the relative pose is obtained by matching the three-dimensional coordinates of the two groups of keypoints. Building on this first estimate, the second relative pose estimation starts from a good initial value; the face is also cropped so that the preprocessing before the iterative closest point algorithm is optimal. Finally, point pair features built from six-dimensional coordinates are matched iteratively, and the relative pose is estimated during the matching process.

Specifically, the method of the invention comprises the following steps:

Step 1: Given the RGB-D data, apply the Dlib face keypoint detection CNN model to the two RGB images I_source and I_target captured at the corresponding angles, detecting 68 face keypoints that cover the facial contour, mouth, eyes, nose, and other parts. From these, select several pairs of corresponding RGB-D three-dimensional coordinate points {KeyPoint_snum} and {KeyPoint_tnum}, and match the two groups of three-dimensional points to obtain the initial rotation matrix R_start of the relative face pose estimate.

Step 2: Using the two-dimensional keypoint coordinates from step 1, crop regions of different sizes from the RGB images I_source and I_target for matching. Since the aim of the invention is to compute the pose of I_target relative to I_source, the cropped region I'_source of I_source is larger than the cropped region I'_target of I_target.

Step 3: Down-sample the face regions I'_source and I'_target of the two poses obtained in step 2 to obtain the corresponding face regions I''_source and I''_target. From I''_target select a point set t ⊆ I''_target, and from I''_source select a point set s ⊆ I''_source. For each point P_i ∈ t, compute the point pair feature FP_ij with every other point P_j ∈ t, i ≠ j, yielding a matrix FP_mat of size N_P × N_P, where N_P is the number of points in the set. In the same way, compute the matrix FQ_mat of point pair features FQ_ij for Q_i ∈ s, of size N_Q × N_Q. Hash the features FP_ij and FQ_ij of the feature matrices FP_mat and FQ_mat into a hash table H; the best-matching point pair features are assigned the same hash value, so corresponding point pairs can be found in the two point sets t and s, forming the point sets PP_t and PP_s. Then compute the rotation matrix R and spatial translation T between PP_t and PP_s by singular value decomposition. Iterate until ||PP_s - PP_t|| falls below a threshold ε or the number of iterations exceeds the limit Iter_max.

Step 4: Multiply the rotation matrix R_start obtained in step 1 with the rotation matrices R and spatial translations T obtained from the iterations of step 3 to obtain the rotation matrix R_final. Using the transformation between a rotation matrix and Euler angles, obtain the relative pose estimate {angle_x, angle_y, angle_z}.
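A minimal Python sketch of this step is given below. It assumes the per-iteration rotations have been collected in a list; the x-y-z Euler-angle convention used to read off {angle_x, angle_y, angle_z}, and the left-multiplication order of the composition, are assumptions, since the text does not fix them explicitly.

    import numpy as np

    def compose_and_to_euler(R_start, R_list):
        """Compose R_start with the per-iteration rotations and return Euler angles in degrees."""
        R_final = R_start.copy()
        for R in R_list:                      # rotations from successive iterations
            R_final = R @ R_final
        angle_x = np.degrees(np.arctan2(R_final[2, 1], R_final[2, 2]))
        angle_y = np.degrees(np.arcsin(-np.clip(R_final[2, 0], -1.0, 1.0)))
        angle_z = np.degrees(np.arctan2(R_final[1, 0], R_final[0, 0]))
        return angle_x, angle_y, angle_z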

Further, the Dlib keypoint detection in step 1 uses the dlib.get_frontal_face_detector() function of the Python third-party library dlib for face detection, and uses the model shape_predictor_68_landmarks.dat for keypoint detection.
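A minimal sketch of this detection step with the dlib API is shown below; the model file name shape_predictor_68_face_landmarks.dat is the publicly distributed dlib landmark model and is assumed here as the file the text refers to in shortened form.

    import dlib
    import cv2

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def detect_68_landmarks(bgr_image):
        """Return a list of (x, y) pixel coordinates for the 68 face landmarks, or None."""
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        faces = detector(gray, 1)          # upsample once to help with small faces
        if not faces:
            return None
        shape = predictor(gray, faces[0])  # landmarks of the first detected face
        return [(shape.part(i).x, shape.part(i).y) for i in range(68)]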

Further, in step 1, five keypoints that are not coplanar, are relatively stable, and are not easily occluded are selected to form the two matched groups {KeyPoint_snum} and {KeyPoint_tnum}. The matched point numbers are fixed as snum, tnum = 31, 37, 46, 49, 55, corresponding to the two outer eye corners, the two outer mouth corners, and the tip of the nose.

Further, the matching of the two groups of keypoints {KeyPoint_snum} and {KeyPoint_tnum} in step 1 uses the affine transformation function estimateAffine3D() from OpenCV.
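The following sketch illustrates this initial registration with OpenCV's estimateAffine3D(). It assumes the five landmarks have already been lifted to 3D coordinates (x, y, depth); the lifting itself depends on the camera, which the text does not specify, so it is not shown.

    import cv2
    import numpy as np

    STABLE_IDX = [30, 36, 45, 48, 54]  # 0-based indices of landmark numbers 31, 37, 46, 49, 55

    def initial_rotation(pts3d_source, pts3d_target):
        """pts3d_*: (68, 3) arrays of landmark coordinates (x, y, depth)."""
        src = np.float32(pts3d_source[STABLE_IDX])
        dst = np.float32(pts3d_target[STABLE_IDX])
        ok, affine, inliers = cv2.estimateAffine3D(src, dst)
        if not ok:
            raise RuntimeError("estimateAffine3D failed on the selected keypoints")
        R_start = affine[:, :3]   # 3x3 rotation-like part of the 3x4 affine matrix
        T_start = affine[:, 3]    # translation part
        return R_start, T_start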

Further, the cropping of the face region in step 2 is based on the two-dimensional coordinates of the face keypoints, as follows:

The cropped face region is a rectangle whose four sides are determined by the maxima and minima of the keypoint coordinates {X_snum, Y_snum} and {X_tnum, Y_tnum} in the two dimensions: Rec_Left = min{X_snum}, Rec_Right = max{X_snum}, Rec_Up = min{Y_snum}, Rec_Bottom = max{Y_snum}. The boundary of the cropped region I'_source of I_source is expanded outward by 5 pixels beyond these four extreme values, giving {Rec_Left - 5, Rec_Right + 5, Rec_Up - 5, Rec_Bottom + 5}.

The boundary of the cropped region I'_target of I_target is shrunk inward by 5 pixels from the four extreme values, giving {Rec_Left + 5, Rec_Right - 5, Rec_Up + 5, Rec_Bottom - 5}.
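A short sketch of the crop-rectangle computation, following the Rec_Left / Rec_Right / Rec_Up / Rec_Bottom notation above; the helper name and return format are illustrative only.

    import numpy as np

    def crop_regions(landmarks_xy, margin=5):
        """landmarks_xy: (68, 2) array of (x, y) landmark pixel coordinates."""
        xs, ys = landmarks_xy[:, 0], landmarks_xy[:, 1]
        left, right = int(xs.min()), int(xs.max())
        up, bottom = int(ys.min()), int(ys.max())
        # source region: expand outward by `margin` pixels
        source_rect = (left - margin, right + margin, up - margin, bottom + margin)
        # target region: shrink inward by `margin` pixels
        target_rect = (left + margin, right - margin, up + margin, bottom - margin)
        return source_rect, target_rect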

Further, in step 3 the point pair features of the point sets can be computed either from six-dimensional point cloud data {x, y, depth, R, G, B}, giving a ten-dimensional feature descriptor, or from three-dimensional point cloud data {x, y, depth}, giving a four-dimensional feature descriptor that reduces both the time needed to generate the features and the space they occupy.

In detail, the point pair feature is FP_ij = CPPF(PPF(p_i, p_j, n_i, n_j), c_i, c_j), where c_i = (R_i, G_i, B_i) denotes the RGB vector and PPF(p_i, p_j, n_i, n_j) = (||d||_2, ∠(n_i, d), ∠(n_j, d), ∠(n_i, n_j)), with d = p_i - p_j. Here p_i, p_j denote the coordinates of the points selected on the object surface, n_i, n_j denote the normal vectors at the selected points, ∠ denotes the angle between vectors, and CPPF denotes the PPF with color.
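A sketch of these descriptors in Python is given below; normal estimation is assumed to have been done beforehand, and the helper names are illustrative only. The CPPF descriptor concatenates the four geometric values with the two RGB vectors, giving ten dimensions as stated above.

    import numpy as np

    def angle_between(u, v):
        """Angle between two vectors, guarded against rounding outside [-1, 1]."""
        cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
        return np.arccos(np.clip(cosang, -1.0, 1.0))

    def ppf(p_i, p_j, n_i, n_j):
        """Four-dimensional geometric point pair feature."""
        d = p_i - p_j
        return np.array([np.linalg.norm(d),
                         angle_between(n_i, d),
                         angle_between(n_j, d),
                         angle_between(n_i, n_j)])

    def cppf(p_i, p_j, n_i, n_j, c_i, c_j):
        """Ten-dimensional color point pair feature: 4 geometric values + RGB of both points."""
        return np.concatenate([ppf(p_i, p_j, n_i, n_j), c_i, c_j])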

Further, before the iteration the data are preprocessed by random down-sampling, using the sample() function from the Python module random. Different down-sampling ratios trade iteration speed against iteration accuracy; typical ratios are b = 0.1, 0.3, 0.5, 0.8.
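For illustration, the down-sampling step might look like the following sketch, where b is the sampling ratio mentioned above and the point records can be any per-point tuples:

    import random

    def downsample(points, b=0.3):
        """points: list of point records; keep a random fraction b of them."""
        k = max(1, int(len(points) * b))
        return random.sample(points, k)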

Further, the iteration uses a nearest-point search, implemented with the Python third-party module sklearn via sklearn.neighbors.NearestNeighbors().
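A minimal sketch of the nearest-point query with scikit-learn; the function name and argument layout here are illustrative.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def nearest_points(source_pts, target_pts):
        """For each target point, return the distance to and index of its nearest source point."""
        nn = NearestNeighbors(n_neighbors=1).fit(source_pts)
        distances, indices = nn.kneighbors(target_pts)
        return distances.ravel(), indices.ravel()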

Further, the iteration computes the rotation matrix and spatial translation by singular value decomposition, using the function numpy.linalg.svd() from the Python extension library numpy.
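The SVD-based solution for R and T between the matched point sets PP_t and PP_s can be sketched as follows; the reflection guard is standard practice for this decomposition and is an addition beyond what the text states.

    import numpy as np

    def rigid_transform(PP_t, PP_s):
        """Return R, T such that R @ p + T maps points of PP_t onto PP_s (both (N, 3) arrays)."""
        mu_t, mu_s = PP_t.mean(axis=0), PP_s.mean(axis=0)
        H = (PP_t - mu_t).T @ (PP_s - mu_s)       # 3x3 cross-covariance matrix
        U, S, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                  # guard against a reflection solution
            Vt[2, :] *= -1
            R = Vt.T @ U.T
        T = mu_s - R @ mu_t
        return R, T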

Further, during the iteration, after the rotation matrix R and spatial translation T are computed, the data {x, y, depth} of I''_target must be transformed so that its position is updated before the next iteration and new nearest points can be found.
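The per-iteration update of the target cloud can be sketched as:

    import numpy as np

    def apply_transform(points_xyz, R, T):
        """points_xyz: (N, 3) array of (x, y, depth); returns the moved cloud."""
        return (R @ points_xyz.T).T + T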

The beneficial effects of the present invention are:

(1) By performing point cloud matching and fusing the relative pose estimation into the matching process, the accuracy is improved considerably over simply extracting features and estimating from them.

(2) Detecting the keypoints and matching them first provides a good initial value for the subsequent iterative closest point process, avoiding the situation in which running iterative closest point directly makes the algorithm iterate in the wrong direction or fail to converge.

(3) Cropping the face regions so that the target point cloud has relatively few points and the source point cloud relatively many strengthens the robustness of the matching process and reduces the chance that it fails to converge.

(4) Introducing point pair features mitigates the drawback of the original ICP algorithm, which relies on Euclidean distance to find corresponding points; with point pair features, corresponding points can be found better and more accurately for point cloud matching.

Brief Description of the Drawings

Fig. 1 is a flow chart of the steps of the face relative pose estimation method combined with face keypoint detection according to an embodiment of the present invention;

Fig. 2 shows the loss statistics of the relative pose estimation according to an embodiment of the present invention.

Detailed Description of the Embodiments

To make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

On the contrary, the invention covers any alternatives, modifications, equivalent methods, and schemes within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the present invention, some specific details are described at length in the detailed description below; those skilled in the art can fully understand the present invention even without these details.

Referring to Fig. 1, which is a flow chart of the steps of the face relative pose estimation method combined with face keypoint detection according to an embodiment of the present invention.

The source data Data_source = {source image Image_source, source depth data Depth_source} and the target data Data_target = {target image Image_target, target depth data Depth_target} are processed through the following steps:

1. Detect the face keypoints. Specifically:

(1.1) Import the third-party library Dlib and load the face keypoint detection model shape_predictor_68_landmarks.dat.

(1.2) Feed Image_source and Image_target into the detection model to identify the keypoints, obtaining the keypoint coordinates {X_snum, Y_snum} and {X_tnum, Y_tnum} of the faces in the images, where snum, tnum = 1, ..., 68.

2. Select the more stable keypoints for a preliminary estimate of the relative pose. Specifically:

(2.1) Select the five more stable keypoints, namely the two outer eye corners, the two outer mouth corners, and the tip of the nose, whose keypoint numbers are snum, tnum = 31, 37, 46, 49, 55, giving the two groups of keypoints {KeyPoint_snum} and {KeyPoint_tnum}.

(2.2) After the two groups of keypoints {KeyPoint_snum} and {KeyPoint_tnum} are obtained, import the third-party library OpenCV and use its estimateAffine3D() function to register the two groups of keypoints, obtaining the initial rotation matrix R_start.

3. Crop the face regions. A larger region is cropped from the face in the source image and a smaller region from the face in the target image. Specifically:

(3.1) The cropped face region is a rectangle, and the cropped data contain not only RGB but also depth data. The four sides of the rectangle are determined by four values, the maxima and minima of the keypoint coordinates {X_snum, Y_snum} and {X_tnum, Y_tnum} in the two dimensions: Rec_Left = min{X_snum}, Rec_Right = max{X_snum}, Rec_Up = min{Y_snum}, Rec_Bottom = max{Y_snum}.

(3.2) I_source is cropped to a relatively large region, so the boundary of the cropped region I'_source is expanded outward by 5 pixels beyond the above four extreme values, giving {Rec_Left - 5, Rec_Right + 5, Rec_Up - 5, Rec_Bottom + 5}.

(3.3) I_target is cropped to a relatively small region, so the boundary of the cropped region I'_target is shrunk inward by 5 pixels from the above four extreme values, giving {Rec_Left + 5, Rec_Right - 5, Rec_Up + 5, Rec_Bottom - 5}.

4. Match the point cloud data of the cropped regions. Specifically:

(4.1) The cropped data contain {x, y, depth, R, G, B}. The matching uses the iterative closest point method. Before iterating, the data are preprocessed by random down-sampling, using the sample() function from the Python module random. Different down-sampling ratios trade iteration speed against iteration accuracy; typical ratios are b = 0.1, 0.3, 0.5, 0.8.

(4.2) After down-sampling, the point clouds are matched by iterative closest point to obtain the final result. Given the corresponding face regions I''_source and I''_target, first select, from the points of I''_target, a point set t ⊆ I''_target, and likewise select a point set s ⊆ I''_source from I''_source. Then, for each point P_i ∈ t, compute the point pair feature FP_ij with every other point P_j ∈ t, i ≠ j, obtaining a matrix FP_mat of size N_P × N_P, where N_P is the number of points in the set. In the same way, compute the matrix FQ_mat of point pair features FQ_ij for Q_i ∈ s, of size N_Q × N_Q, where N_Q is the number of points in that set.

(4.3) Hash the features FP_ij and FQ_ij of the feature matrices FP_mat and FQ_mat into a hash table H; sufficiently similar point pair features are assigned the same hash value, which is used to find the corresponding point pairs in the two point sets t and s, forming PP_t and PP_s. Then compute the rotation matrix R and spatial translation T between PP_t and PP_s by singular value decomposition. Iterate until ||PP_s - PP_t|| falls below a threshold ε or the number of iterations exceeds the limit Iter_max.
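A sketch of the feature-hashing idea in (4.3) is given below: each descriptor is quantized so that similar point pair features fall into the same bucket, and points whose features share a bucket are paired. The quantization step sizes and helper names are assumptions made only for illustration, since the text does not specify how similar features are bucketed.

    import numpy as np
    from collections import defaultdict

    def hash_feature(f, dist_step=0.005, angle_step=np.radians(10), color_step=0.1):
        """Quantize a CPPF descriptor (distance, 3 angles, color values) into a hashable key."""
        q = (int(f[0] / dist_step),) \
            + tuple(int(a / angle_step) for a in f[1:4]) \
            + tuple(int(c / color_step) for c in f[4:])
        return hash(q)

    def match_by_hash(FP_list, FQ_list):
        """FP_list / FQ_list: lists of (point_index, descriptor); return matched index pairs."""
        table = defaultdict(list)
        for idx, f in FQ_list:
            table[hash_feature(f)].append(idx)
        pairs = []
        for idx, f in FP_list:
            bucket = table.get(hash_feature(f))
            if bucket:
                pairs.append((idx, bucket[0]))   # take any source point sharing the bucket
        return pairs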

5. Combine the results of the two registrations. Specifically:

The initial rotation matrix R_start is obtained in step (2.2), and a series of rotation matrices R_i, i = 1, ..., N, is obtained during the iterations of step (4.3). Multiplying these matrices together gives the final matrix R_final. With R_final available, the transformation between a rotation matrix and Euler angles finally gives the relative pose estimate {angle_x, angle_y, angle_z}.

Fig. 2(a) shows the loss distribution of the rotation angle pitch for relative pose estimation between pairs of faces with different poses. The horizontal axis is the loss in degrees, with a tick spacing of 0.1°; the vertical axis is the number of samples, out of a total of about 200.

Fig. 2(b) shows the loss distribution of the rotation angle yaw for relative pose estimation between pairs of faces with different poses, with the same axes: loss in degrees (0.1° ticks) on the horizontal axis and number of samples (about 200 in total) on the vertical axis.

Fig. 2(c) shows the loss distribution of the rotation angle roll for relative pose estimation between pairs of faces with different poses, again with loss in degrees (0.1° ticks) on the horizontal axis and number of samples (about 200 in total) on the vertical axis.

These results show that the pose estimation obtained with the method of the present invention is highly accurate.

The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A face relative pose estimation method combined with face keypoint detection, characterized by comprising the following steps:

Step 1: perform face keypoint detection on two RGB images I_source and I_target of corresponding angles, select from the detected keypoints a plurality of corresponding RGB-D three-dimensional coordinate points {KeyPoint_snum} and {KeyPoint_tnum}, and match the two groups of three-dimensional coordinate points to obtain an initial rotation matrix R_start of the relative face pose estimate;

Step 2: according to the two-dimensional keypoint coordinates of step 1, crop regions of different sizes from the RGB images I_source and I_target for matching, wherein the cropped region I'_source of I_source is larger than the cropped region I'_target of I_target;

Step 3: down-sample the face regions I'_source and I'_target of the two poses obtained in step 2 to obtain the corresponding face regions I''_source and I''_target; select a point set t ⊆ I''_target from I''_target and a point set s ⊆ I''_source from I''_source; compute, for each point P_i ∈ t, the point pair feature FP_ij with every other point P_j ∈ t, i ≠ j, obtaining a matrix FP_mat of size N_P × N_P, where N_P is the number of points in the set; compute in the same manner the matrix FQ_mat of point pair features FQ_ij for Q_i ∈ s, of size N_Q × N_Q; hash the features FP_ij and FQ_ij of the feature matrices FP_mat and FQ_mat into a hash table H, assigning the same hash value to the best-matching point pair features, thereby finding corresponding point pairs in the two point sets t and s and forming the point sets PP_t and PP_s; compute the rotation matrix R and spatial translation T between PP_t and PP_s by singular value decomposition; iterate until ||PP_s - PP_t|| is less than a threshold ε or the number of iterations reaches a set limit;

Step 4: multiply the rotation matrix R_start obtained in step 1 with the rotation matrices R and spatial translations T obtained from the iterations of step 3 to obtain the rotation matrix R_final; using the transformation between a rotation matrix and Euler angles, obtain the relative pose estimate {angle_x, angle_y, angle_z}.

2. The method according to claim 1, wherein in step 1 the two groups of keypoints {KeyPoint_snum} and {KeyPoint_tnum} are matched with the matched point numbers fixed as snum, tnum = 31, 37, 46, 49, 55, corresponding to the 2 outer eye corners, the 2 outer mouth corners, and the tip of the nose.

3. The method according to claim 1, wherein the cropping of the face region in step 2 is based on the two-dimensional coordinates of the face keypoints, specifically: the cropped face region is a rectangle whose four sides are determined by the maxima and minima of the keypoint coordinates {X_snum, Y_snum} and {X_tnum, Y_tnum} in the two dimensions; the boundary of the cropped region I'_source of I_source is expanded outward by 5 pixels beyond the 4 extreme values, and the boundary of the cropped region I'_target of I_target is shrunk inward by 5 pixels from the 4 extreme values.

4. The method according to claim 1, wherein in step 3 the point pair features of the point sets are computed either from six-dimensional point cloud data {x, y, depth, R, G, B}, giving a ten-dimensional feature descriptor FP_ij = CPPF(PPF(p_i, p_j, n_i, n_j), c_i, c_j), or from three-dimensional point cloud data {x, y, depth}, giving a four-dimensional feature descriptor FP_ij = PPF(p_i, p_j, n_i, n_j); where c_i = (R_i, G_i, B_i) denotes the RGB vector, PPF(p_i, p_j, n_i, n_j) = (||d||_2, ∠(n_i, d), ∠(n_j, d), ∠(n_i, n_j)) with d = p_i - p_j, p_i and p_j denote the coordinates of points selected on the object surface, n_i and n_j denote the normal vectors at the selected points, ∠ denotes the angle between vectors, and CPPF denotes the PPF with color.

5. The method according to claim 1, wherein in step 3, during the iteration, after each computation of the rotation matrix R and spatial translation T, the data of I''_target must be transformed so that its position is updated before the next iteration to form new point pair features.
CN202011489338.XA 2020-12-16 2020-12-16 Face Relative Pose Estimation Method Combined with Face Keypoint Detection Expired - Fee Related CN112580496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011489338.XA CN112580496B (en) 2020-12-16 2020-12-16 Face Relative Pose Estimation Method Combined with Face Keypoint Detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011489338.XA CN112580496B (en) 2020-12-16 2020-12-16 Face Relative Pose Estimation Method Combined with Face Keypoint Detection

Publications (2)

Publication Number Publication Date
CN112580496A CN112580496A (en) 2021-03-30
CN112580496B 2023-01-10

Family

ID=75135563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011489338.XA Expired - Fee Related CN112580496B (en) 2020-12-16 2020-12-16 Face Relative Pose Estimation Method Combined with Face Keypoint Detection

Country Status (1)

Country Link
CN (1) CN112580496B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645170A (en) * 2009-09-03 2010-02-10 北京信息科技大学 Precise registration method of multilook point cloud
CN106570460A (en) * 2016-10-20 2017-04-19 三明学院 Single-image human face posture estimation method based on depth value
CN109492589A (en) * 2018-11-13 2019-03-19 重庆工程职业技术学院 The recognition of face working method and intelligent chip merged by binary features with joint stepped construction
CN111191599A (en) * 2019-12-27 2020-05-22 平安国际智慧城市科技股份有限公司 Gesture recognition method, device, equipment and storage medium
CN111414798A (en) * 2019-02-03 2020-07-14 沈阳工业大学 Head posture detection method and system based on RGB-D image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015006224A1 (en) * 2013-07-08 2015-01-15 Vangogh Imaging, Inc. Real-time 3d computer vision processing engine for object recognition, reconstruction, and analysis

Also Published As

Publication number Publication date
CN112580496A (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20230110