
CN117421384A - Visual-inertial SLAM system sliding-window optimization method based on common-view projection matching - Google Patents

Visual-inertial SLAM system sliding-window optimization method based on common-view projection matching

Info

Publication number
CN117421384A
Authority
CN
China
Prior art keywords: sliding window, frame image, constraint, frame, visual
Legal status: Granted
Application number
CN202311384144.7A
Other languages: Chinese (zh)
Other versions: CN117421384B (en)
Inventor
班喜程
尤波
栾添添
孙明晓
胥静
吕奉坤
马继瑞
王鑫源
Current Assignee: Harbin University of Science and Technology
Original Assignee: Harbin University of Science and Technology
Application filed by Harbin University of Science and Technology
Priority to CN202311384144.7A
Publication of CN117421384A
Application granted
Publication of CN117421384B
Status: Active


Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval of structured data, e.g. relational data
              • G06F16/29 Geographical information databases
          • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
            • G06F17/10 Complex mathematical operations
              • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T7/00 Image analysis
            • G06T7/70 Determining position or orientation of objects or cameras
          • G06T2207/00 Indexing scheme for image analysis or image enhancement
            • G06T2207/10 Image acquisition modality
              • G06T2207/10016 Video; Image sequence
            • G06T2207/30 Subject of image; Context of image processing
              • G06T2207/30244 Camera pose


Abstract

A sliding-window optimization method for a visual-inertial SLAM system based on common-view projection matching, relating to the technical field of SLAM. The invention aims to solve the problem of low precision in the back-end sliding-window optimization of existing visual-inertial SLAM systems. Without compromising the speed of sliding-window optimization, the invention improves the method through the common-view projection matching relationship: "shadow" map points are eliminated, improving map-point precision; the number of map points successfully matched to key frames is increased, enlarging the observation field of the key frames and adding new visual observation constraints to the SLAM system. This improves the precision of the pose estimated by sliding-window optimization and, in turn, the positioning precision of the mobile robot.

Description

Sliding-window optimization method for a visual-inertial SLAM system based on common-view projection matching

Technical field

The invention belongs to the technical field of SLAM.

Background

SLAM (Simultaneous Localization and Mapping) technology enables a mobile robot to perform autonomous navigation, positioning and planning. Using one or more on-board sensors, the robot perceives environmental information and measures its own attitude, thereby tracking its own pose, locating itself relative to the surrounding environment, and simultaneously building a map of that environment. A SLAM system can be divided into a front end and a back end: the front end implements functions such as environment perception, pose tracking and loop closure detection, while the back end implements functions such as pose optimization, trajectory correction and map construction. The front end mainly performs data-processing work so that SLAM can display and update information in real time; the back end mainly performs the computation-heavy, time-consuming optimization work such as eliminating accumulated error. Back-end optimization methods for SLAM fall into filtering methods and nonlinear optimization methods. Filtering methods, based on Bayesian theory, estimate the posterior probability of the robot pose from prediction and measurement information and thereby update the pose; however, the covariance matrix grows quadratically with the number of state variables, so the data volume becomes excessive and the approach is unsuitable for large-scale scenes. Nonlinear optimization methods instead incorporate the robot's pose information, the measurement constraints, and even the positions of map points into a factor graph model and update the state estimate as a whole through nonlinear optimization.

Currently, nonlinear optimization is the mainstream approach for SLAM back-end processing. To correct and update the robot's pose state quickly, the SLAM back end usually performs local optimization, i.e., it selects a subset of key frames for local pose optimization, which speeds up the back end's response. The sliding-window method is a classic way to select such a local region: a window holding a fixed number of key frames is maintained, a model of the constraints between visual and inertial measurements is built within that region, and the model is then optimized. Sliding windows are widely used for local-region selection in the back-end optimization of visual-inertial SLAM; typical examples are "VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator" by Shen Shaojie's team at the Hong Kong University of Science and Technology, and "Keyframe-based Visual-Inertial SLAM using Nonlinear Optimization" by Leutenegger et al. The sliding-window method updates the key frames in the window through a Schur-complement operation, marginalizing frames that slide in or out while retaining the constraints between key frames, so that the number of key frames in the window stays constant. This allows the SLAM system to complete the optimization computation quickly, but the accuracy of the optimization is not guaranteed; it is determined entirely by the quality of the key frames and the error-constraint relationships. The back-end sliding-window optimization of traditional visual-inertial SLAM systems therefore needs further improvement to raise its accuracy.

Summary of the invention

To solve the problem of low accuracy in the sliding-window optimization used at the back end of existing visual-inertial SLAM systems, the present invention provides a sliding-window optimization method for a visual-inertial SLAM system based on common-view projection matching.

The sliding-window optimization method of the present invention comprises the following steps:

Step 1: When the back-end module of the visual-inertial SLAM system subscribes to the latest robot frame image published by the front-end module, determine whether the sliding window is full. If so, go to Step 2; otherwise go to Step 4.

Step 2: Determine whether the second-newest frame image is a key frame. If so, marginalize the oldest frame image in the sliding window and go to Step 3; otherwise marginalize the second-newest frame image and go to Step 3.

Step 3: Reconstruct the constraint relationships of the sliding window according to the Schur complement, then go to Step 4.

Step 4: Determine whether the latest frame image is a key frame from its parallax with respect to the newest key-frame image currently in the sliding window. If so, project the map points in the sliding window onto the latest frame image, where these map points are the ones observed by key frames in the window that share covisibility with the latest frame image, and go to Step 5. Otherwise, update the IMU constraints corresponding to the latest frame image and return to Step 1.

Step 5: Match the projection points on the latest frame image with its feature points and delete any projection point that matches no feature point. For each feature point that has a matching projection point, determine whether it has an original map point, i.e., a map point in the sliding window that already corresponded to the feature point before matching. If so, go to Step 6; otherwise go to Step 7.

Step 6: Compare the observation counts of the map point corresponding to the projection point and of the original map point corresponding to the feature point, and keep only the more frequently observed one, so that the visual observation constraints of each key frame in the window are updated. Then go to Step 8.

Step 7: Add a new map point so that an observation constraint is established between the latest frame image and the newly added map point, then go to Step 8.

Step 8: Determine whether the number of key frames in the sliding window is full. If so, go to Step 9; otherwise return to Step 1.

Step 9: Construct the visual constraint factor, the IMU pre-integration constraint factor and the marginalization prior constraint factor, and add the three factors to the sliding-window factor-graph optimization model to construct the objective function.

Step 10: Optimize the objective function with a nonlinear optimization algorithm and update the robot state variables within the sliding window, completing the sliding-window optimization.
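To make the control flow of Steps 1 through 10 concrete, the following is a minimal C++ sketch of the back-end loop. All type and function names (SlidingWindow, Frame, parallaxIsKeyframe, and so on) are hypothetical placeholders introduced for this illustration, not the actual implementation; the placeholder functions are left as empty stubs.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-ins for the real back-end data structures.
struct Frame { bool keyframe = false; /* features, pose, IMU preintegration, ... */ };

struct SlidingWindow {
    std::deque<Frame> frames;    // frames kept for optimization
    std::size_t capacity = 10;   // N, fixed window size
    bool full() const { return frames.size() >= capacity; }
};

// Placeholder operations; each corresponds to a step in the text.
bool parallaxIsKeyframe(const SlidingWindow&, const Frame&) { return true; } // step 4 test
void marginalizeOldest(SlidingWindow&) {}        // step 2: keep visual constraints
void marginalizeSecondNewest(SlidingWindow&) {}  // step 2: keep IMU constraints
void rebuildConstraintsBySchur(SlidingWindow&) {}                  // step 3
void projectCovisibleMapPoints(SlidingWindow&, Frame&) {}          // step 4
void matchProjectionsAndUpdateMapPoints(SlidingWindow&, Frame&) {} // steps 5-7
void updateImuConstraint(SlidingWindow&, const Frame&) {}          // non-key-frame path
void buildFactorsAndOptimize(SlidingWindow&) {}                    // steps 9-10

void onNewFrame(SlidingWindow& window, Frame frame) {
    if (window.full()) {                                           // step 1
        // Step 2: which frame to marginalize depends on whether the
        // second-newest frame in the window is a key frame.
        bool secondNewestIsKey = window.frames.size() >= 2 &&
                                 window.frames[window.frames.size() - 2].keyframe;
        if (secondNewestIsKey) marginalizeOldest(window);
        else                   marginalizeSecondNewest(window);
        rebuildConstraintsBySchur(window);                         // step 3
    }
    if (parallaxIsKeyframe(window, frame)) {                       // step 4: parallax test
        frame.keyframe = true;
        projectCovisibleMapPoints(window, frame);                  // step 4: projection
        matchProjectionsAndUpdateMapPoints(window, frame);         // steps 5-7
        window.frames.push_back(frame);
        if (window.full()) buildFactorsAndOptimize(window);        // steps 8-10
    } else {
        updateImuConstraint(window, frame);                        // step 4, non-key branch
    }
}
```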

Further, marginalizing the oldest frame image in the sliding window in Step 2 comprises:

deleting the oldest frame image and its corresponding IMU constraints from the sliding window while retaining the visual constraints of the oldest frame image;

and marginalizing the second-newest frame image in Step 2 comprises:

deleting the second-newest frame image and its visual constraints while retaining the IMU constraints corresponding to the second-newest frame image.

Further, reconstructing the constraint relationships of the sliding window according to the Schur complement in Step 3 comprises:

expressing the state variables χ of all image frames in the sliding window as the incremental equation

$$\begin{bmatrix} H_a & H_b \\ H_b^{\top} & H_c \end{bmatrix}\begin{bmatrix}\delta\chi_d \\ \delta\chi_s\end{bmatrix}=\begin{bmatrix}b_d \\ b_s\end{bmatrix},$$

where δχ_d is the state vector to be marginalized, δχ_s is the retained state vector, H_a is the covariance matrix of δχ_d, H_c is the covariance matrix of δχ_s, H_b is the covariance matrix between δχ_d and δχ_s (H_b^⊤ denoting the transpose of H_b), and b_d and b_s are the constant vectors of δχ_d and δχ_s respectively;

applying Gaussian elimination to the incremental equation of the state variables χ according to the Schur complement, which yields

$$\left(H_c - H_b^{\top} H_a^{-1} H_b\right)\delta\chi_s = b_s - H_b^{\top} H_a^{-1} b_d;$$

and obtaining the retained state vector δχ_s from this equation, completing the reconstruction of the sliding-window constraint relationships.
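As a small numerical illustration of this elimination (not taken from the patent), the sketch below uses the Eigen library to form the Schur complement of a block system; the block sizes and values are arbitrary.

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    // Block system  [Ha Hb; Hb^T Hc] [dxd; dxs] = [bd; bs]
    // with a 3-dimensional marginalized block and a 2-dimensional retained block.
    const int d = 3, s = 2;
    Eigen::MatrixXd H = Eigen::MatrixXd::Random(d + s, d + s);
    H = H * H.transpose() + (d + s) * Eigen::MatrixXd::Identity(d + s, d + s); // make SPD
    Eigen::VectorXd b = Eigen::VectorXd::Random(d + s);

    Eigen::MatrixXd Ha = H.topLeftCorner(d, d);
    Eigen::MatrixXd Hb = H.topRightCorner(d, s);
    Eigen::MatrixXd Hc = H.bottomRightCorner(s, s);
    Eigen::VectorXd bd = b.head(d), bs = b.tail(s);

    // Schur complement: eliminate dxd without ever solving for it.
    Eigen::MatrixXd Hp = Hc - Hb.transpose() * Ha.ldlt().solve(Hb);  // prior matrix
    Eigen::VectorXd bp = bs - Hb.transpose() * Ha.ldlt().solve(bd);  // prior residual

    Eigen::VectorXd dxs = Hp.ldlt().solve(bp);  // retained increment delta_chi_s
    std::cout << "delta_chi_s = " << dxs.transpose() << std::endl;
}
```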

Further, in Step 5, a projection point on the latest frame image is matched with a feature point if the descriptors of the projection point and the feature point match.

Further, in Step 9, the state vector χ_c to be optimized by the visual constraint factor is

$$\chi_c=\left[\,p^{w}_{b_i},\ p^{w}_{b_j},\ q^{w}_{b_i},\ q^{w}_{b_j},\ p^{b}_{c},\ q^{b}_{c},\ \lambda\,\right],$$

where p^w_{b_i} and p^w_{b_j} are the camera positions at frames i and j, q^w_{b_i} and q^w_{b_j} are the camera attitudes at frames i and j (frames i and j being two adjacent key frames), p^b_c and q^b_c are the position and rotation of the camera relative to the IMU, and λ is the inverse depth of the map point.

Further, in Step 9, the variables χ_imu to be optimized by the IMU pre-integration constraint factor are

$$\chi_{imu}=\left[\,v^{w}_{b_i},\ v^{w}_{b_j},\ b_{a_i},\ b_{a_j},\ b_{\omega_i},\ b_{\omega_j}\,\right],$$

where v^w_{b_i} and v^w_{b_j} are the IMU velocities at times t_i and t_j, b_{a_i} and b_{a_j} are the IMU accelerometer biases at times t_i and t_j, and b_{ω_i} and b_{ω_j} are the IMU gyroscope biases at times t_i and t_j; t_i and t_j are the times at which the camera captured the i-th and j-th frame images.

Further, in Step 9, the objective function is

$$\min_{\chi}\left\{\left\|e_M\right\|^{2}+\sum_{(i,j)\in\mathcal{B}}\left\|e_B(x_i,x_j)\right\|^{2}+\sum_{(i,j)\in\mathcal{C}}\left\|e_C(x_i,x_j)\right\|^{2}\right\},$$

where the sums run over the set 𝓑 of IMU pre-integration constraints and the set 𝓒 of visual observations in the window; χ = [x_0, x_1, …, x_N, p^b_c, q^b_c, λ_0, λ_1, …, λ_m] collects the state variables of all image frames in the sliding window, with k ∈ [0, N]; λ_m is the inverse depth of the m-th map point; m is the total number of map points observed by the key-frame images; N is the total number of frame images in the sliding window. The state vector of the k-th frame is x_k = [p^w_{b_k}, v^w_{b_k}, q^w_{b_k}, b_a, b_ω], where p^w_{b_k}, v^w_{b_k} and q^w_{b_k} are the position, velocity and attitude of the camera at frame k, and b_a and b_ω are the accelerometer and gyroscope biases of the IMU; p^b_c and q^b_c are the position and rotation of the camera relative to the IMU; e_C(x_i,x_j), e_B(x_i,x_j) and e_M denote the visual constraint residual, the IMU pre-integration constraint residual and the marginalization prior constraint residual respectively; and x_i and x_j are the state vectors of the i-th and j-th frame images.

Further, the visual constraint residual e_C(x_i,x_j) is

$$e_C(x_i,x_j)=\hat{P}^{c_j}_l-\bar{P}^{c_j}_l,$$

where \(\hat{P}^{c_j}_l\) is the projected coordinate of map point P (with index l) on the normalized plane of the j-th frame image, and \(\bar{P}^{c_j}_l\) is the observed coordinate of map point P when it is first observed in the j-th frame image.

Further, the IMU pre-integration constraint residual e_B(x_i,x_j) is

$$e_B(x_i,x_j)=\left[\,r_p^{\top},\ r_q^{\top},\ r_v^{\top},\ r_{b_a}^{\top},\ r_{b_\omega}^{\top}\,\right]^{\top},$$

where r_p, r_q, r_v, r_{b_a} and r_{b_ω} are the position, attitude, velocity, accelerometer and gyroscope residuals of the robot from time t_i to time t_j, and t_i and t_j are the times at which the camera captured the i-th and j-th frame images.

Further, the marginalization prior constraint residual e_M is

e_M = b_p − H_p δχ_{1:N},

where δχ_{1:N} is the state increment of all key frames in the sliding window, and

$$H_p = H_c - H_b^{\top} H_a^{-1} H_b,\qquad b_p = b_s - H_b^{\top} H_a^{-1} b_d,$$

in which H_a is the covariance matrix of the state vector δχ_d to be marginalized, H_c is the covariance matrix of the retained state vector δχ_s, H_b is the covariance matrix between δχ_d and δχ_s (H_b^⊤ its transpose), and b_d and b_s are the constant vectors of δχ_d and δχ_s respectively.

The beneficial effects of the invention are as follows:

By improving the sliding-window optimization method at the SLAM back end, the present invention proposes a sliding-window optimization method for a visual-inertial SLAM system based on common-view projection matching.

Through projection matching of the covisible map points within the sliding window, the method solves the ghosting problem in which the same map point may be created multiple times at different stages of SLAM operation or in different threads: these "shadow" map points are fused into a single map point, improving the observation accuracy of the map points and hence the accuracy with which the SLAM system estimates the pose of the mobile robot.

Furthermore, by screening the covisibility-matched projection points, the invention increases the number of map points successfully matched by the newest key frame, enlarges the observation field of the key frames, and adds new visual observation constraints to the SLAM system. This raises the accuracy of the pose estimated by sliding-window optimization and, in turn, the accuracy with which the SLAM system localizes the mobile robot.

Furthermore, the invention does not change the size of the sliding-window optimization region: the number of key frames the window can hold remains fixed, which preserves the speed of the optimization computation and thus of the visual-inertial SLAM system as a whole, while at the same time improving the accuracy of SLAM localization and pose estimation for the mobile robot.

Description of the drawings

Figure 1 is a flow chart of the visual-inertial SLAM system;

Figure 2 is a flow chart of the sliding-window optimization method based on common-view projection matching described in the detailed embodiment;

Figure 3 is a schematic diagram of the sliding-window update process when the second-newest frame image is a key frame;

Figure 4 is a schematic diagram of the sliding-window update process when the second-newest frame image is not a key frame;

Figure 5 is a schematic diagram of the covisibility relationship between the current key frame and its covisible key frames;

Figure 6 is a schematic diagram of the projection matching relationship between the newest key frame in the sliding window and its covisible key frames;

Figure 7 is a schematic diagram of the visual-inertial sliding-window factor-graph optimization model;

Figure 8 shows the UAV flight trajectory estimated by our-VINS-Mono on the Machine Hall 01 dataset;

Figure 9 shows the displacement of the trajectory of Figure 8 along the x, y and z axes as a function of time;

Figure 10 shows the absolute pose error (APE) curve over time for the trajectory of Figure 8, together with the root-mean-square error (rmse), median, mean and standard deviation (std) of the global trajectory error;

Figure 11 compares the APE box plots of the flight trajectories estimated by VINS-Mono and our-VINS-Mono on the Machine Hall 01 dataset;

Figure 12 compares the APE of the flight trajectories estimated by VINS-Mono and our-VINS-Mono on the Machine Hall 01 dataset under different error metrics;

Figure 13 compares, for the flight trajectories estimated by VINS-Mono and our-VINS-Mono on the Machine Hall 01 dataset, the density distribution of the trajectory errors against the APE.

Detailed embodiments

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative work fall within the scope of protection of the invention. It should be noted that, provided there is no conflict, the embodiments of the invention and the features within them may be combined with one another.

This embodiment is described in detail with reference to Figures 1 to 13. The sliding-window optimization method for a visual-inertial SLAM system based on common-view projection matching described in this embodiment comprises the following steps:

Step 1: Set the size of the sliding window so that it can hold N key frames to be optimized.

Step 2: The back-end module of the visual-inertial SLAM system subscribes to the pose state information of the mobile robot published by the front-end module. When the back-end module receives a new image frame, it determines whether the sliding window is full. If full, the window must be updated: go to Step 3. If not full, go to Step 4.

Step 3: Determine whether the parallax between the latest frame image and the newest key-frame image currently in the sliding window reaches a threshold.

If the threshold is reached, the second-newest frame image is a key frame and the oldest frame in the window must be marginalized: the oldest frame image and its corresponding IMU constraints are deleted from the window while its visual constraints are retained, i.e., the map-point constraints observed by the oldest frame are deposited on the key frames in the window that share covisibility with it. The constraint relationships of the sliding window are then reconstructed according to the Schur complement, and Step 5 is executed.

If the threshold is not reached, the second-newest frame image is not a key frame and is itself marginalized: the second-newest frame image and its visual constraints are deleted from the window while the IMU constraints corresponding to it are retained. The constraint relationships of the sliding window are then reconstructed according to the Schur complement, and Step 5 is executed.

Step 3 implements the update of the key frames within the sliding window. Depending on whether the second-newest frame is a key frame, the window is updated by sliding out or by sliding in: if the second-newest frame is a key frame, the oldest frame in the window is marginalized, a slide-out update, as illustrated in Figure 3; if the second-newest frame is not a key frame, it is itself marginalized, a slide-in update, as illustrated in Figure 4.

The purpose of updating the sliding window is to add and delete key frames within the window region while retaining the key-frame constraints. The choice between slide-in and slide-out updates essentially maximizes the field of view of the key frames in the window, avoiding key frames whose views are narrow and overlapping. Removing a key frame from the window while still retaining its constraints is called marginalizing the key frame out of the window, and is usually implemented with the Schur complement.

A least-squares expression is built over the state variables χ of all image frames in the sliding window and simplified to the form H δχ = b, where H is the system covariance matrix, δχ is the increment of the state variables, and b is a constant vector. The state variables χ are divided into two groups: the states δχ_d to be marginalized and the states δχ_s to be retained. The system covariance matrix H is split into four blocks H_a, H_b, H_b^⊤ and H_c, where H_a is the covariance matrix of δχ_d (the marginalized variables), H_c is the covariance matrix of δχ_s (the retained variables), and H_b is the covariance matrix between δχ_d and δχ_s, with H_b^⊤ its transpose.

The state variables χ are expressed as the incremental equation

$$\begin{bmatrix} H_a & H_b \\ H_b^{\top} & H_c \end{bmatrix}\begin{bmatrix}\delta\chi_d \\ \delta\chi_s\end{bmatrix}=\begin{bmatrix}b_d \\ b_s\end{bmatrix},$$

where b_d and b_s are the constant vectors of δχ_d and δχ_s respectively.

According to the Schur complement, the retained states δχ_s can be solved for without computing the marginalized states δχ_d. Gaussian elimination gives

$$\left(H_c - H_b^{\top} H_a^{-1} H_b\right)\delta\chi_s = b_s - H_b^{\top} H_a^{-1} b_d,$$

where H_c − H_b^⊤ H_a^{-1} H_b is the Schur complement of H_a in H, from which δχ_s is derived as

$$\delta\chi_s = \left(H_c - H_b^{\top} H_a^{-1} H_b\right)^{-1}\left(b_s - H_b^{\top} H_a^{-1} b_d\right).$$

The only state quantity in this expression is δχ_s, so the constraint information of the marginalized key frames is retained and their states are removed without loss of accuracy. During marginalization the state variables keep being updated, but a consistency-preserving strategy fixes the linearization point: when the marginalization Jacobian is computed, the values of the variables being differentiated are held fixed rather than re-evaluated from the updated state at every iteration, which preserves the unobservable dimensions of the system and prevents the null space of the covariance matrix from changing.

Step 4: Determine whether the latest frame image is a key frame.

If it is a key frame, it is called the newest key frame. Project the map points in the sliding window onto the latest frame image, where these map points are the ones observed by key frames in the window that share covisibility with the latest frame image. Then go to Step 5.

If it is not a key frame, only update the IMU constraints corresponding to the latest frame image and return to Step 2.

Step 5: For each projection point on the latest frame image, determine whether it can be matched with a feature point of the latest frame image (two-dimensional feature points in the image plane, extracted with ORB or SIFT), i.e., whether the descriptors of the projection point and the feature point match. If a projection point finds no match among the feature points of the newest key frame, the projection point is discarded. If a projection point does find a matching feature point, determine whether the matched feature point has an original map point, i.e., a map point in the sliding window that already corresponded to the feature point before matching. If so, go to Step 6; otherwise go to Step 7.
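The following is a hedged C++ sketch of this matching step using OpenCV ORB descriptors: for each projected map point, candidate feature points are sought inside a circular neighborhood of the projection and compared by Hamming distance. The radius and distance threshold are illustrative values, not taken from the patent.

```cpp
#include <opencv2/core.hpp>
#include <utility>
#include <vector>

// Matches each projected map point to an ORB feature of the newest key frame.
// projections: projected pixel positions; projDesc/frameDesc: one CV_8U
// descriptor row per point. Returns (projection index, feature index) pairs;
// projections with no acceptable candidate are simply dropped (Step 5).
std::vector<std::pair<int, int>> matchByProjection(
        const std::vector<cv::Point2f>& projections, const cv::Mat& projDesc,
        const std::vector<cv::KeyPoint>& frameKps, const cv::Mat& frameDesc,
        float radius = 15.0f, double maxHamming = 50.0) {
    std::vector<std::pair<int, int>> matches;
    for (int i = 0; i < static_cast<int>(projections.size()); ++i) {
        int best = -1;
        double bestDist = maxHamming;
        for (int j = 0; j < static_cast<int>(frameKps.size()); ++j) {
            const float dx = frameKps[j].pt.x - projections[i].x;
            const float dy = frameKps[j].pt.y - projections[i].y;
            if (dx * dx + dy * dy > radius * radius) continue;  // outside neighborhood
            const double dist =
                cv::norm(projDesc.row(i), frameDesc.row(j), cv::NORM_HAMMING);
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        if (best >= 0) matches.emplace_back(i, best);  // descriptor match found
    }
    return matches;
}
```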

Step 6: Compare the observation counts of the map point corresponding to the projection point and of the original map point corresponding to the feature point; keep the more frequently observed map point, update the visual observation constraints of each key frame in the window, and delete the map point with fewer observations. This is map-point replacement. Then go to Step 8.

Step 7: Add a new map point so that an observation constraint is established between the latest frame image and the newly added map point. This is map-point addition. Then go to Step 8.

In Steps 5, 6 and 7, all map points observable by all key frames covisible with the newest key frame are projected onto the newest key frame, and the further processing of each map point is decided from the relationship between the projection point and the feature point and from whether the feature point already has a corresponding map point. Figure 5 shows the covisibility relationship between the current key frame and its covisible key frames. The same map point may be observed at different stages of the robot's motion, or may be created in different threads of the SLAM system, giving rise to the "shadow" map-point phenomenon. Through projection matching of map points, the replacement step screens out "shadow" map points and improves map-point accuracy, while the addition step enlarges the field of view over which key frames observe map points and increases the number of map points successfully matched to key frames, adding visual observation constraints.
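A minimal sketch of the replace-or-add decision of Steps 6 and 7, assuming a hypothetical MapPoint record that only counts how many key frames observe it; the real bookkeeping (descriptors, observing-frame lists, constraint-graph updates) is omitted.

```cpp
#include <memory>

// Hypothetical map-point record; only the observation count matters here.
struct MapPoint {
    int observations = 0;  // number of key frames observing this point
    // position, descriptor, observing key frames, ... omitted
};

// Called for a (projection, feature) pair that matched on the newest key frame.
// projected: the map point behind the projection; existing: the map point
// already bound to the feature, or nullptr if the feature had none.
std::shared_ptr<MapPoint> fuseOrCreateObservation(
        const std::shared_ptr<MapPoint>& projected,
        const std::shared_ptr<MapPoint>& existing) {
    if (!existing) {
        // Step 7 (map-point addition): a new observation constraint is
        // established between the newest key frame and this map point.
        projected->observations += 1;
        return projected;
    }
    // Step 6 (map-point replacement): two candidates describe the same
    // physical point; keep the one observed more often, discard the "shadow".
    auto kept = (projected->observations >= existing->observations) ? projected
                                                                    : existing;
    kept->observations += 1;  // the newest key frame now observes it as well
    return kept;
}
```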

Map points are projection-matched according to the perspective imaging relationship of the camera; Figure 6 shows the projection matching relationship between the newest key frame in the sliding window and its covisible key frames. In the projection computation the key-frame poses come from the SLAM tracking module; even if these poses are not highly accurate it does not matter, because candidate matching feature points are sought inside a circular neighborhood of the projection point, and the poses are continuously optimized and corrected later by the SLAM back end. The perspective projection of a three-dimensional map point to a two-dimensional image under the pinhole camera model is

$$Z\begin{bmatrix}u\\ v\\ 1\end{bmatrix} = K\,[R\mid t]\begin{bmatrix}x_w\\ y_w\\ z_w\\ 1\end{bmatrix},\qquad K=\begin{bmatrix}f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1\end{bmatrix},$$

where [u v 1]^⊤ are the homogeneous coordinates of the projection point in the pixel coordinate system of the imaging plane, with u and v its horizontal and vertical pixel coordinates; f_x and f_y are the camera focal lengths in the horizontal and vertical directions; c_x and c_y are the horizontal and vertical pixel offsets between the image-center pixel coordinates and the image-origin pixel coordinates; Z is the depth along the optical axis in the transformation from the camera coordinate system to the normalized coordinate system; R is the rotation matrix and t the translation vector of the mobile robot; [x_w y_w z_w 1]^⊤ are the homogeneous world coordinates of the map point, with x_w, y_w, z_w its coordinates along the three world axes; K is the camera intrinsic matrix; and T = [R | t] is the transformation matrix of the camera pose.
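A short C++ sketch of this projection using Eigen; it implements the pinhole model above directly and reports points behind the camera as unprojectable.

```cpp
#include <Eigen/Dense>

// Projects a world-frame map point into pixel coordinates with the pinhole
// model  Z [u v 1]^T = K (R X_w + t)  described above. Returns false when the
// point lies behind the camera. Intrinsics fx, fy, cx, cy as defined in the text.
bool projectToPixel(const Eigen::Matrix3d& R, const Eigen::Vector3d& t,
                    double fx, double fy, double cx, double cy,
                    const Eigen::Vector3d& Xw, Eigen::Vector2d& uv) {
    const Eigen::Vector3d Xc = R * Xw + t;  // world -> camera frame
    if (Xc.z() <= 0.0) return false;        // behind the image plane
    uv.x() = fx * Xc.x() / Xc.z() + cx;     // u = fx * X/Z + cx
    uv.y() = fy * Xc.y() / Xc.z() + cy;     // v = fy * Y/Z + cy
    return true;
}
```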

Step 8: Determine whether the number of key frames in the sliding window is full. If not, return to Step 2; if full, go to Step 9.

Step 9: Construct the visual constraint factor between the pose of the mobile robot and the map points from the visual reprojection perspective relationship. Construct the IMU pre-integration constraint factor from IMU (inertial measurement unit) pre-integration theory. Construct the prior constraint factor from the marginalization result of the sliding-window update. Add the visual constraint factor, the IMU pre-integration constraint factor and the marginalization prior constraint factor to the sliding-window factor-graph optimization model to construct the objective function. Figure 7 shows the sliding-window factor-graph optimization model built from these three constraint factors.

From the sliding-window factor-graph optimization model, a tightly coupled bundle-adjustment objective function is established; minimizing the sum of the visual constraint residuals, the IMU pre-integration constraint residuals and the marginalization prior constraint residual yields the maximum a posteriori estimate of the state:

$$\min_{\chi}\left\{\left\|e_M\right\|^{2}+\sum_{(i,j)\in\mathcal{B}}\left\|e_B(x_i,x_j)\right\|^{2}+\sum_{(i,j)\in\mathcal{C}}\left\|e_C(x_i,x_j)\right\|^{2}\right\},$$

where the sums run over the set 𝓑 of IMU pre-integration constraints and the set 𝓒 of visual observations in the window; χ = [x_0, x_1, …, x_N, p^b_c, q^b_c, λ_0, λ_1, …, λ_m] collects the state variables of all frame images in the sliding window, k ∈ [0, N], N is the total number of frame images in the window, λ_m is the inverse depth of the m-th map point, and m is the total number of map points observed by the key-frame images. The state vector of the k-th frame is x_k = [p^w_{b_k}, v^w_{b_k}, q^w_{b_k}, b_a, b_ω], where p^w_{b_k}, v^w_{b_k} and q^w_{b_k} are the position, velocity and attitude of the camera at frame k, and b_a and b_ω are the accelerometer and gyroscope biases of the IMU; p^b_c and q^b_c are the position and rotation of the camera relative to the IMU; e_C(x_i,x_j), e_B(x_i,x_j) and e_M denote the visual constraint residual, the IMU pre-integration constraint residual and the marginalization prior constraint residual; and x_i and x_j are the state vectors of the i-th and j-th frame images.

The visual constraint factor is built from the reprojection error: a map point in space is projected onto every key frame that can observe it, and the coordinates of the projection should coincide with the observed coordinates of the map point; the reprojection error between the two sets of coordinates is the residual of the visual constraint factor. In the visual-inertial SLAM system, the state vector χ_c to be optimized by the visual factor is

$$\chi_c=\left[\,p^{w}_{b_i},\ p^{w}_{b_j},\ q^{w}_{b_i},\ q^{w}_{b_j},\ p^{b}_{c},\ q^{b}_{c},\ \lambda\,\right],$$

where p^w_{b_i} and p^w_{b_j} are the camera positions at frames i and j, q^w_{b_i} and q^w_{b_j} are the camera attitudes at frames i and j (frames i and j being two adjacent key frames), p^b_c and q^b_c are the position and rotation of the camera relative to the IMU, and λ is the inverse depth of the map point.

Suppose a map point P in space has index l, projected coordinate \(\hat{P}^{c_j}_l\) on the normalized plane of the j-th frame and observed coordinate \(\bar{P}^{c_j}_l\) in the j-th frame. The visual constraint residual e_C(x_i,x_j) is then

$$e_C(x_i,x_j)=\hat{P}^{c_j}_l-\bar{P}^{c_j}_l,$$

where \(\hat{P}^{c_j}_l\) is obtained from the reprojection estimation model

$$\hat{P}^{c_j}_l = R^{c}_{b}\left(R^{b_j}_{w}\left(R^{w}_{b_i}\left(R^{b}_{c}\,\frac{\bar{P}^{c_i}_l}{\lambda_l}+p^{b}_{c}\right)+p^{w}_{b_i}-p^{w}_{b_j}\right)-p^{b}_{c}\right),$$

in which \(\bar{P}^{c_i}_l\) is the observation of map point P on the normalized plane of the i-th frame, where it is first observed; R^c_b is the rotation from the IMU coordinate system to the camera coordinate system; R^{b_j}_w is the rotation from the world coordinate system to the IMU coordinate system of frame j; R^w_{b_i} is the rotation from the IMU coordinate system of frame i to the world coordinate system; and R^b_c is the rotation from the camera coordinate system to the IMU coordinate system.

The IMU pre-integration constraint factor describes the relative pose change of the IMU measurements over the time interval between two key frames. Aligning the IMU sampling times t_i and t_j with the k-th and (k+1)-th image frames, the variables to be optimized by the IMU pre-integration factor are

$$\chi_{imu}=\left[\,v^{w}_{b_i},\ v^{w}_{b_j},\ b_{a_i},\ b_{a_j},\ b_{\omega_i},\ b_{\omega_j}\,\right],$$

where v^w_{b_i} and v^w_{b_j} are the IMU velocities at times t_i and t_j, b_{a_i} and b_{a_j} are the IMU accelerometer biases at times t_i and t_j, and b_{ω_i} and b_{ω_j} are the IMU gyroscope biases at times t_i and t_j; t_i and t_j are the times at which the camera captured the i-th and j-th frames.

The IMU position p^w_{b_j}, velocity v^w_{b_j} and attitude q^w_{b_j} at time t_j are obtained from

$$p^{w}_{b_j}=p^{w}_{b_i}+v^{w}_{b_i}\Delta t+\iint_{t\in[t_i,t_j]}\left(R^{w}_{t}\left(\hat{a}_t-b_{a_t}-n_a\right)-g^{w}\right)\,dt^{2},$$

$$v^{w}_{b_j}=v^{w}_{b_i}+\int_{t\in[t_i,t_j]}\left(R^{w}_{t}\left(\hat{a}_t-b_{a_t}-n_a\right)-g^{w}\right)\,dt,$$

$$q^{w}_{b_j}=q^{w}_{b_i}\otimes\int_{t\in[t_i,t_j]}\frac{1}{2}\,\Omega\!\left(\hat{\omega}_t-b_{\omega_t}-n_{\omega}\right)q^{b_i}_{t}\,dt,$$

where Δt is the time interval between the i-th and j-th frames, t_i ≤ t ≤ t_j; R^w_t is the rotation matrix from the IMU coordinate system to the world coordinate system at time t; g^w is the gravity vector in the world coordinate system; δt is the time interval between two adjacent IMU measurements; q^{b_i}_t is the rotation of the carrier at time t relative to time t_i, expressed in the IMU coordinate system with the pose at frame i as reference; b_{a_t} and b_{ω_t} are the accelerometer and gyroscope biases of the IMU at time t; \(\hat{a}_t\) and \(\hat{\omega}_t\) are the acceleration and angular velocity output by the IMU at time t; and n_a and n_ω are the acceleration and angular-velocity noise of the IMU measurements at time t, both assumed to be white noise following a Gaussian distribution. Transforming the reference frame of the position, velocity and rotation quantities from the world coordinate system to the body coordinate system of frame i gives

$$R^{b_i}_{w}\,p^{w}_{b_j}=R^{b_i}_{w}\left(p^{w}_{b_i}+v^{w}_{b_i}\Delta t-\frac{1}{2}g^{w}\Delta t^{2}\right)+\alpha^{b_i}_{b_j},$$

$$R^{b_i}_{w}\,v^{w}_{b_j}=R^{b_i}_{w}\left(v^{w}_{b_i}-g^{w}\Delta t\right)+\beta^{b_i}_{b_j},$$

$$q^{b_i}_{w}\otimes q^{w}_{b_j}=\gamma^{b_i}_{b_j}.$$

Here α^{b_i}_{b_j}, β^{b_i}_{b_j} and γ^{b_i}_{b_j} are the IMU pre-integration variables for position, velocity and attitude:

$$\alpha^{b_i}_{b_j}=\iint_{t\in[t_i,t_j]} R^{b_i}_{t}\left(\hat{a}_t-b_{a_t}-n_a\right)\,dt^{2},$$

$$\beta^{b_i}_{b_j}=\int_{t\in[t_i,t_j]} R^{b_i}_{t}\left(\hat{a}_t-b_{a_t}-n_a\right)\,dt,$$

$$\gamma^{b_i}_{b_j}=\int_{t\in[t_i,t_j]}\frac{1}{2}\,\Omega\!\left(\hat{\omega}_t-b_{\omega_t}-n_{\omega}\right)\gamma^{b_i}_{t}\,dt,$$

where γ^{b_i}_t denotes the IMU pre-integration variable for the robot attitude at time t relative to time t_i.
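For illustration, the sketch below propagates the three pre-integration terms over one IMU sample using simple Euler integration; this is a simplification (VINS-Mono itself uses midpoint integration and also propagates the covariance and bias Jacobians, which are omitted here).

```cpp
#include <Eigen/Dense>
#include <Eigen/Geometry>

// One Euler-integration step of the IMU pre-integration terms alpha (position),
// beta (velocity) and gamma (rotation) between two IMU samples separated by dt.
// a and w are the raw accelerometer/gyroscope readings; ba and bg the biases.
struct Preintegration {
    Eigen::Vector3d alpha = Eigen::Vector3d::Zero();
    Eigen::Vector3d beta  = Eigen::Vector3d::Zero();
    Eigen::Quaterniond gamma = Eigen::Quaterniond::Identity();

    void step(const Eigen::Vector3d& a, const Eigen::Vector3d& w,
              const Eigen::Vector3d& ba, const Eigen::Vector3d& bg, double dt) {
        const Eigen::Vector3d acc = gamma * (a - ba);      // rotate into frame b_i
        alpha += beta * dt + 0.5 * acc * dt * dt;          // d(alpha) = beta dt + 1/2 acc dt^2
        beta  += acc * dt;                                 // d(beta)  = acc dt
        const Eigen::Vector3d half = 0.5 * (w - bg) * dt;  // small-angle quaternion increment
        gamma = gamma * Eigen::Quaterniond(1.0, half.x(), half.y(), half.z());
        gamma.normalize();
    }
};
```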

It can be seen that the state at frame j depends only on the IMU measurements and is not affected by the state at frame i, which keeps the measurements independent of the states and avoids repeated integration. The IMU pre-integration constraint residual e_B(x_i,x_j) is then

$$e_B(x_i,x_j)=\begin{bmatrix} r_p\\ r_q\\ r_v\\ r_{b_a}\\ r_{b_\omega} \end{bmatrix}=\begin{bmatrix} R^{b_i}_{w}\left(p^{w}_{b_j}-p^{w}_{b_i}-v^{w}_{b_i}\Delta t+\frac{1}{2}g^{w}\Delta t^{2}\right)-\hat{\alpha}^{b_i}_{b_j}\\[4pt] 2\left[\left(\hat{\gamma}^{b_i}_{b_j}\right)^{-1}\otimes\left(q^{w}_{b_i}\right)^{-1}\otimes q^{w}_{b_j}\right]_{xyz}\\[4pt] R^{b_i}_{w}\left(v^{w}_{b_j}-v^{w}_{b_i}+g^{w}\Delta t\right)-\hat{\beta}^{b_i}_{b_j}\\[4pt] b_{a_j}-b_{a_i}\\[4pt] b_{\omega_j}-b_{\omega_i} \end{bmatrix},$$

where r_p, r_q, r_v, r_{b_a} and r_{b_ω} are the position, attitude, velocity, accelerometer and gyroscope residuals of the robot from time t_i to time t_j; t_i and t_j are the times at which the camera captured the i-th and j-th frame images; ⊗ is the quaternion multiplication symbol; [·]_{xyz} takes the imaginary part of a quaternion to form a three-dimensional vector; q^{b_i}_w (quaternion form) is the rotation from the world coordinate system to the IMU coordinate system at time t_i; and \(\hat{\gamma}^{b_i}_{b_j}\) (quaternion form) is the rotation from the IMU coordinate system at time t_i to the IMU coordinate system at time t_j.
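As an illustrative sketch following the standard VINS-Mono formulation, with names taken from the text, the residual above can be assembled as follows; the information-matrix weighting and the first-order bias correction of α̂, β̂, γ̂ are omitted.

```cpp
#include <Eigen/Dense>
#include <Eigen/Geometry>

// Assembles the 15-dimensional IMU pre-integration residual e_B(x_i, x_j) from
// the states at frames i and j and the pre-integrated terms (alpha, beta,
// gamma). gw is the world-frame gravity vector; dt is the interval t_j - t_i.
Eigen::Matrix<double, 15, 1> imuResidual(
        const Eigen::Vector3d& pi, const Eigen::Vector3d& vi,
        const Eigen::Quaterniond& qi, const Eigen::Vector3d& bai,
        const Eigen::Vector3d& bgi,
        const Eigen::Vector3d& pj, const Eigen::Vector3d& vj,
        const Eigen::Quaterniond& qj, const Eigen::Vector3d& baj,
        const Eigen::Vector3d& bgj,
        const Eigen::Vector3d& alpha, const Eigen::Vector3d& beta,
        const Eigen::Quaterniond& gamma,
        const Eigen::Vector3d& gw, double dt) {
    Eigen::Matrix<double, 15, 1> r;
    const Eigen::Matrix3d Riw = qi.toRotationMatrix().transpose();  // world -> b_i
    r.segment<3>(0)  = Riw * (pj - pi - vi * dt + 0.5 * gw * dt * dt) - alpha; // r_p
    r.segment<3>(3)  = 2.0 * (gamma.inverse() * qi.inverse() * qj).vec();      // r_q
    r.segment<3>(6)  = Riw * (vj - vi + gw * dt) - beta;                       // r_v
    r.segment<3>(9)  = baj - bai;                                              // r_{b_a}
    r.segment<3>(12) = bgj - bgi;                                              // r_{b_omega}
    return r;
}
```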

The prior constraint factor is the constraint relationship retained after marginalized key frames are removed during a sliding-window update. A least-squares expression is built with the factor-graph optimization model, the state vectors χ being the nodes; the constraints between the pose information in the marginalized state vector χ_d and the map points are added, in the form of prior information, to the optimization of the retained state vector χ_s, so that the constraint information corresponding to key frames marginalized by a window update is not lost. Marginalization yields the prior matrix H_p and the corresponding prior residual b_p, consistent with the marginalization expressions of Step 3, and the marginalization prior constraint residual e_M is

e_M = b_p − H_p δχ_{1:N},

where δχ_{1:N} is the state increment of all key frames in the sliding window, and

$$H_p = H_c - H_b^{\top} H_a^{-1} H_b,\qquad b_p = b_s - H_b^{\top} H_a^{-1} b_d,$$

in which H_a is the covariance matrix of the state vector δχ_d to be marginalized, H_c is the covariance matrix of the retained state vector δχ_s, H_b is the covariance matrix between δχ_d and δχ_s (H_b^⊤ its transpose), and b_d and b_s are the constant vectors of δχ_d and δχ_s respectively.

Step 10: Optimize the objective function with a nonlinear optimization algorithm, for example the Gauss-Newton method or the Levenberg-Marquardt method. Taking the Levenberg-Marquardt method as an example: by setting a trust region for the state increment, it avoids the failure of the local approximation caused by overly large increments and avoids singular or ill-conditioned coefficient matrices in the linear system, and therefore provides more stable and more accurate state increments. The objective function is put into the form of a state-increment equation:

(H_h + μI)Δχ = g,

where H_h is the Hessian matrix and μ is the Lagrange multiplier; the size of μ is adjusted to reflect how accurate the quadratic approximation of the nonlinear model is, and the optimum is obtained iteratively.
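A minimal sketch of such a damped iteration, assuming hypothetical functions computeHessianAndGradient() and evalCost() that linearize the factor graph and evaluate the total cost; the μ update follows a common gain-ratio heuristic and is not taken from the patent.

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>

// Assumed to be provided by the factor-graph implementation (hypothetical).
void computeHessianAndGradient(const Eigen::VectorXd& x,
                               Eigen::MatrixXd& H, Eigen::VectorXd& g);
double evalCost(const Eigen::VectorXd& x);

// Levenberg-Marquardt on the damped normal equations (H + mu*I) dx = g.
Eigen::VectorXd levenbergMarquardt(Eigen::VectorXd x, int maxIters = 20) {
    double mu = 1e-4;
    for (int it = 0; it < maxIters; ++it) {
        Eigen::MatrixXd H;
        Eigen::VectorXd g;
        computeHessianAndGradient(x, H, g);                  // linearize at x
        const Eigen::MatrixXd I = Eigen::MatrixXd::Identity(H.rows(), H.cols());
        const Eigen::VectorXd dx = (H + mu * I).ldlt().solve(g);
        const double predicted = 0.5 * dx.dot(mu * dx + g);  // model decrease
        const double actual = evalCost(x) - evalCost(x + dx);
        const double rho = actual / predicted;               // gain ratio
        if (rho > 0.0) {                                     // step accepted
            x += dx;
            mu *= std::max(1.0 / 3.0, 1.0 - std::pow(2.0 * rho - 1.0, 3.0));
        } else {
            mu *= 2.0;                                       // shrink trust region
        }
        if (dx.norm() < 1e-8) break;
    }
    return x;
}
```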

The poses of all key frames in the sliding window are updated, completing one sliding-window optimization. If the optimization thread of the SLAM back end has not been stopped, return to Step 2; if it has been stopped, finish.

Without changing the speed of sliding-window optimization, this embodiment improves the sliding-window optimization method through the common-view projection matching relationship: "shadow" map points are eliminated, improving map-point accuracy; the number of map points successfully matched by key frames is increased, enlarging the observation field of the key frames and adding visual observation constraints to the SLAM system; this improves the accuracy of the pose estimated by sliding-window optimization and hence the positioning accuracy of the mobile robot.

Implementation and verification of the invention:

The visual-inertial SLAM method proposed by the present invention, which improves sliding-window optimization through common-view projection matching, was implemented in C++. Its benefit is verified by evaluating the accuracy of the mobile-robot trajectory estimated by the SLAM system. Taking the visual-inertial SLAM framework VINS-Mono as the baseline, the variant of VINS-Mono that uses the improved sliding-window optimization method proposed here is named our-VINS-Mono. The EuRoC flight dataset published by ETH Zurich is used for validation; it was collected by a micro rotorcraft carrying a visible-light camera and an inertial measurement unit to sense the surrounding environment and measure the vehicle's own motion and attitude.

Taking the Machine Hall 01 subset of the EuRoC flight dataset as an example, VINS-Mono and our-VINS-Mono each estimate the flight trajectory of the rotorcraft on Machine Hall 01, and the accuracy of the two trajectories is then evaluated under different metrics. Figure 8 shows the UAV flight trajectory estimated by our-VINS-Mono on Machine Hall 01; Figure 9 shows the displacement of that trajectory along the x, y and z axes over time. Figure 10 shows the absolute pose error (APE) curve over time for the trajectory of Figure 8, together with the root-mean-square error (rmse), median, mean and standard deviation (std) of the global trajectory error. Figure 11 compares the APE box plots of the trajectories estimated by VINS-Mono and our-VINS-Mono on Machine Hall 01: the median APE of our-VINS-Mono is clearly lower than that of VINS-Mono, and the whole error distribution shifts downward. Figure 12 compares the APE of the two estimated trajectories under different error metrics: the APE of our-VINS-Mono is smaller than that of VINS-Mono under the max (maximum error), std, median, mean and rmse metrics, while under the min (minimum error) metric it is larger, indicating that the errors cluster near the median and that outliers have little effect on the overall error. Figure 13 compares the density distribution of the trajectory errors against the APE: the errors of our-VINS-Mono are relatively concentrated, consistent with the concentration around the median seen in the box plot.

In summary, the present invention proposes a visual-inertial SLAM method that improves sliding-window optimization through common-view projection matching. By projection-matching the covisible map points of the key frames within the sliding window, it optimizes the positional accuracy of the map points, eliminates ghost map points, adds observation constraints between key frames and map points, and improves the pose-estimation accuracy of the visual-inertial SLAM system, thereby improving the positioning accuracy of the mobile robot. Within the sliding window, all map points observable by all key frames that share covisibility with the newest key frame are projected onto the newest key frame, and the match between each projection point and the feature points is examined; map-point replacement eliminates "shadow" map points and avoids redundant storage, while map-point addition increases the number of successful matches between key frames and map points and enlarges the observation field of the key frames. In addition, the invention does not change the size of the sliding window: the window still holds a fixed number of key frames, which keeps the computation fast.

Although the present invention is described herein with reference to specific embodiments, it should be understood that these embodiments are merely examples of its principles and applications. Many modifications may therefore be made to the exemplary embodiments, and other arrangements may be devised, without departing from the spirit and scope of the invention as defined by the appended claims. It should also be understood that features recited in different dependent claims, and features described herein, may be combined in ways other than those described in the original claims, and that features described in connection with one embodiment may be used in other described embodiments.

Claims (10)

1. A visual-inertial SLAM system sliding window optimization method based on common-view projection matching, characterized by comprising the following steps:
step one: when the back-end module of the visual-inertial SLAM system subscribes to the latest frame image of the robot published by the front-end module, judging whether the sliding window is full; if so, executing step two; otherwise, executing step four;
step two: judging whether the next new (second newest) frame image is a key frame; if so, marginalizing the oldest frame image in the sliding window and then executing step three; otherwise, marginalizing the next new frame image and then executing step three;
step three: reconstructing the constraint relations of the sliding window according to the Schur complement theory, and then executing step four;
step four: judging whether the latest frame image is a key frame according to the parallax between the latest frame image and the current latest key frame image in the sliding window; if so, projecting map points in the sliding window onto the latest frame image, the map points being those observed by the key frames in the sliding window that share a common view with the latest frame image, and executing step five; otherwise, updating the IMU constraint relation corresponding to the latest frame image and returning to step one;
step five: matching the projection points on the latest frame image with its feature points and deleting the projection points that match no feature point; for each feature point with a matched projection point, judging whether the feature point already has an original map point, the original map point being the map point associated with the feature point in the sliding window before matching; if so, executing step six; otherwise, executing step seven;
step six: comparing the number of observations of the map point corresponding to the projection point with that of the original map point corresponding to the feature point, retaining only the map point observed more often, updating the visual observation constraints of each key frame in the sliding window, and then executing step eight;
step seven: adding new map points, establishing observation constraints between the latest frame image and the newly added map points, and executing step eight;
step eight: judging whether the number of key frames in the sliding window is full; if so, executing step nine; otherwise, returning to step one;
step nine: constructing a visual constraint factor, an IMU pre-integration constraint factor and a marginalized prior constraint factor, respectively, and adding them to a sliding-window factor-graph optimization model to construct an objective function;
step ten: optimizing the objective function with a nonlinear optimization algorithm and updating the state quantities of the robot in the sliding window, thereby completing the sliding-window optimization.
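Read together, steps one to ten form an event-driven loop over incoming frames. The following skeleton paraphrases that control flow; every helper name is hypothetical and merely stands in for the operation the corresponding step describes.

```python
def on_new_frame(frame, window):
    """Control-flow sketch of claim 1 (steps one to ten); all helpers are hypothetical."""
    if window.is_full():                                    # step one
        if window.second_newest_is_keyframe():              # step two
            window.marginalize_oldest_frame()
        else:
            window.marginalize_second_newest_frame()
        window.rebuild_constraints_via_schur()              # step three
    if is_keyframe(frame, window.newest_keyframe()):        # step four: parallax test
        project_match_and_update_map_points(frame, window)  # steps five to seven
    else:
        window.update_imu_constraint(frame)                 # non-keyframe: IMU only
        return
    if window.keyframe_count_full():                        # step eight
        problem = build_factor_graph(window)                # step nine: three factor types
        optimize_and_update_states(problem, window)         # step ten
```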
2. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 1, wherein in step two, marginalizing the oldest frame image in the sliding window comprises:
deleting the oldest frame image and its corresponding IMU constraint relation from the sliding window, while retaining the visual constraint relation of the oldest frame image;
and marginalizing the next new frame image comprises:
deleting the next new frame image and its visual constraint relation, while retaining the IMU constraint relation corresponding to the next new frame image.
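The asymmetry in claim 2 — drop the IMU link but keep the visual link when the oldest frame leaves, and the reverse when the next new frame leaves — can be summarized schematically as follows (all window methods are hypothetical stand-ins):

```python
def marginalize(window, drop_oldest: bool):
    """Schematic of the asymmetric marginalization in claim 2."""
    if drop_oldest:
        f = window.pop_oldest()
        window.drop_imu_constraint(f)      # IMU link to the dropped frame is removed
        window.keep_visual_constraint(f)   # its visual observations feed the prior
    else:
        f = window.pop_second_newest()
        window.drop_visual_constraint(f)   # its visual links are discarded
        window.keep_imu_constraint(f)      # the IMU chain is preserved
    return f
```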
3. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 1, wherein reconstructing the constraint relations of the sliding window according to the Schur complement theory in step three comprises:
the state variable χ of all image frames within the sliding window is expressed in terms of an incremental equation as follows:

$$\begin{bmatrix} H_a & H_b \\ H_b^\top & H_c \end{bmatrix} \begin{bmatrix} \delta\chi_d \\ \delta\chi_s \end{bmatrix} = \begin{bmatrix} b_d \\ b_s \end{bmatrix}$$

wherein $\delta\chi_d$ is the state vector to be marginalized, $\delta\chi_s$ is the preserved state vector, $H_a$ is the covariance matrix of $\delta\chi_d$, $H_c$ is the covariance matrix of $\delta\chi_s$, $H_b^\top$ denotes the transpose of $H_b$, $H_b$ is the covariance matrix between $\delta\chi_d$ and $\delta\chi_s$, and $b_d$ and $b_s$ are the constant vectors of $\delta\chi_d$ and $\delta\chi_s$, respectively;

performing Gaussian elimination on the incremental equation of the state variable χ according to the Schur complement theory yields:

$$\left(H_c - H_b^\top H_a^{-1} H_b\right)\delta\chi_s = b_s - H_b^\top H_a^{-1} b_d$$

from which the preserved state vector $\delta\chi_s$ is further obtained, realizing the reconstruction of the constraint relations of the sliding window.
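Numerically, this elimination is a handful of matrix operations. A minimal NumPy sketch, assuming dense blocks with $H_a$ invertible (a robust implementation would use a pseudo-inverse or a factorization instead):

```python
import numpy as np

def schur_marginalize(H_a, H_b, H_c, b_d, b_s):
    """Eliminate delta_chi_d, returning the prior (H_p, b_p) on delta_chi_s."""
    H_a_inv = np.linalg.inv(H_a)         # robust code would use a pseudo-inverse
    H_p = H_c - H_b.T @ H_a_inv @ H_b    # Schur complement of H_a
    b_p = b_s - H_b.T @ H_a_inv @ b_d    # matching reduced right-hand side
    return H_p, b_p
```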
4. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 1, wherein in step five, a projection point on the latest frame image is considered to match a feature point if their descriptors match.
5. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 1, wherein in step nine, the state quantity $\chi_c$ to be optimized of the visual constraint factor is:

$$\chi_c = \left[\; p^w_{c_i},\; q^w_{c_i},\; p^w_{c_j},\; q^w_{c_j},\; p^b_c,\; q^b_c,\; \lambda \;\right]$$

wherein $p^w_{c_i}$ and $p^w_{c_j}$ are the positions of the camera at the i-th frame and the j-th frame, respectively, $q^w_{c_i}$ and $q^w_{c_j}$ are the poses of the camera at the i-th frame and the j-th frame, respectively, the i-th frame and the j-th frame being two adjacent key frames, $p^b_c$ and $q^b_c$ are the position and rotation of the camera relative to the IMU, respectively, and λ is the inverse depth of the map point.
6. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 5, wherein in step nine, the variable $\chi_{imu}$ to be optimized of the IMU pre-integration constraint factor is:

$$\chi_{imu} = \left[\; v^w_{b_i},\; v^w_{b_j},\; b_{a_i},\; b_{a_j},\; b_{\omega_i},\; b_{\omega_j} \;\right]$$

wherein $v^w_{b_i}$ and $v^w_{b_j}$ are the velocities of the IMU at times $t_i$ and $t_j$, respectively, $b_{a_i}$ and $b_{a_j}$ are the accelerometer biases of the IMU at times $t_i$ and $t_j$, respectively, $b_{\omega_i}$ and $b_{\omega_j}$ are the gyroscope biases of the IMU at times $t_i$ and $t_j$, respectively, and $t_i$ and $t_j$ are the times at which the camera captures the i-th frame image and the j-th frame image, respectively.
7. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 1, wherein in step nine, the objective function is expressed as:

$$\min_{\chi}\left\{\; \left\| e_M \right\|^2 + \sum_{(i,j)} \left\| e_B(x_i, x_j) \right\|^2 + \sum_{(i,j)} \left\| e_C(x_i, x_j) \right\|^2 \;\right\}$$

wherein χ is the state variable of all image frames in the sliding window, with

$$\chi = \left[\; x_0, x_1, \ldots, x_N,\; p^b_c,\; q^b_c,\; \lambda_0, \lambda_1, \ldots, \lambda_m \;\right], \qquad k \in [0, N],$$

$\lambda_m$ is the inverse depth of the m-th map point, m is the total number of map points observed by the key frame images, N is the total number of frame images in the sliding window, and the state vector of the k-th frame is

$$x_k = \left[\; p^w_{b_k},\; v^w_{b_k},\; q^w_{b_k},\; b_a,\; b_\omega \;\right],$$

where $p^w_{b_k}$, $v^w_{b_k}$ and $q^w_{b_k}$ are the position, velocity and attitude of the camera at the k-th frame, respectively, $b_a$ and $b_\omega$ are the accelerometer bias and gyroscope bias of the IMU, and $p^b_c$ and $q^b_c$ are the position and rotation of the camera relative to the IMU; $e_C(x_i, x_j)$, $e_B(x_i, x_j)$ and $e_M$ denote the visual constraint residual, the IMU pre-integration constraint residual and the marginalized prior constraint residual, respectively, and $x_i$ and $x_j$ are the state vectors of the i-th frame image and the j-th frame image, respectively.
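Assembled, the objective is a plain sum of squared residuals over the three factor types. The sketch below assumes residual callables with the signatures implied by claim 7 and identity weighting; a practical solver (e.g., Ceres) would instead add these as weighted factors with robust loss functions.

```python
import numpy as np

def total_cost(x, vis_pairs, imu_pairs, e_M, e_B, e_C):
    """Objective of claim 7 as a sum of squared residuals (identity weighting)."""
    cost = float(np.sum(e_M ** 2))                      # marginalized prior term
    for i, j in imu_pairs:
        cost += float(np.sum(e_B(x[i], x[j]) ** 2))     # IMU pre-integration terms
    for i, j in vis_pairs:
        cost += float(np.sum(e_C(x[i], x[j]) ** 2))     # visual reprojection terms
    return cost
```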
8. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 7, wherein the visual constraint residual $e_C(x_i, x_j)$ is expressed as:

$$e_C(x_i, x_j) = \hat{P}^{c_j} - \tilde{P}^{c_j}$$

wherein $\hat{P}^{c_j}$ is the projection coordinate of a map point P in space on the normalized plane of the j-th frame image, and $\tilde{P}^{c_j}$ is the observed coordinate of the map point P when it is first observed in the j-th frame image.
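On the normalized image plane this residual is simply a two-dimensional difference between projection and observation. A minimal sketch, assuming the world-to-camera rotation and translation of frame j are given:

```python
import numpy as np

def visual_residual(p_world, R_cj, t_cj, obs_norm):
    """e_C: projected minus observed coordinates on frame j's normalized plane."""
    p_cj = R_cj @ p_world + t_cj      # map point expressed in camera j's frame
    proj = p_cj[:2] / p_cj[2]         # projection onto the normalized plane
    return proj - obs_norm            # 2-D reprojection residual
```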
9. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 7, wherein the IMU pre-integration constraint residual $e_B(x_i, x_j)$ is expressed as:

$$e_B(x_i, x_j) = \left[\; r_p^\top,\; r_q^\top,\; r_v^\top,\; r_{b_a}^\top,\; r_{b_\omega}^\top \;\right]^\top$$

wherein $r_p$, $r_q$, $r_v$, $r_{b_a}$ and $r_{b_\omega}$ are the position residual, attitude residual, velocity residual, accelerometer-bias residual and gyroscope-bias residual of the robot from time $t_i$ to time $t_j$, respectively, and $t_i$ and $t_j$ are the times at which the camera captures the i-th frame image and the j-th frame image, respectively.
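In practice the five sub-residuals are stacked into one vector (typically 15-dimensional) and weighted jointly; the information-matrix weighting below is an assumption, since claim 7 writes plain squared norms:

```python
import numpy as np

def imu_residual_cost(r_p, r_q, r_v, r_ba, r_bg, info):
    """Stack the five sub-residuals into e_B and evaluate its weighted norm."""
    e_B = np.concatenate([r_p, r_q, r_v, r_ba, r_bg])  # typically 15-dimensional
    return float(e_B @ info @ e_B)                     # ||e_B||^2 under info matrix
```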
10. The visual-inertial SLAM system sliding window optimization method based on common-view projection matching according to claim 7, wherein the marginalized prior constraint residual $e_M$ is expressed as:

$$e_M = b_p - H_p\,\delta\chi_{1:N}$$

wherein $\delta\chi_{1:N}$ is the state increment of all key frames within the sliding window,

$$H_p = H_c - H_b^\top H_a^{-1} H_b, \qquad b_p = b_s - H_b^\top H_a^{-1} b_d,$$

$H_a$ is the covariance matrix of the state vector $\delta\chi_d$ to be marginalized, $H_c$ is the covariance matrix of the preserved state vector $\delta\chi_s$, $H_b^\top$ denotes the transpose of $H_b$, $H_b$ is the covariance matrix between $\delta\chi_d$ and $\delta\chi_s$, and $b_d$ and $b_s$ are the constant vectors of $\delta\chi_d$ and $\delta\chi_s$, respectively.
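Claim 10 closes the loop with claim 3: $H_p$ and $b_p$ are exactly the Schur-complement outputs sketched after claim 3, so evaluating the prior residual reduces to one matrix-vector product:

```python
import numpy as np

def marginal_prior_residual(b_p: np.ndarray, H_p: np.ndarray, delta_chi: np.ndarray) -> np.ndarray:
    """e_M = b_p - H_p @ delta_chi_{1:N}, as in claim 10."""
    return b_p - H_p @ delta_chi
```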
CN202311384144.7A 2023-10-24 2023-10-24 Visual inertia SLAM system sliding window optimization method based on common view projection matching Active CN117421384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311384144.7A CN117421384B (en) 2023-10-24 2023-10-24 Visual inertia SLAM system sliding window optimization method based on common view projection matching

Publications (2)

Publication Number Publication Date
CN117421384A true CN117421384A (en) 2024-01-19
CN117421384B CN117421384B (en) 2024-09-17

Family

ID=89527915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311384144.7A Active CN117421384B (en) 2023-10-24 2023-10-24 Visual inertia SLAM system sliding window optimization method based on common view projection matching

Country Status (1)

Country Link
CN (1) CN117421384B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160327395A1 (en) * 2014-07-11 2016-11-10 Regents Of The University Of Minnesota Inverse sliding-window filters for vision-aided inertial navigation systems
US20220066544A1 (en) * 2020-09-01 2022-03-03 Georgia Tech Research Corporation Method and system for automatic extraction of virtual on-body inertial measurement units
CN114529576A (en) * 2022-01-04 2022-05-24 重庆邮电大学 RGBD and IMU hybrid tracking registration method based on sliding window optimization
CN115371665A (en) * 2022-09-13 2022-11-22 哈尔滨工业大学 Mobile robot positioning method based on depth camera and inertia fusion
CN115479602A (en) * 2022-10-14 2022-12-16 北京航空航天大学 A Visual-Inertial Odometry Method Fused with Event and Distance
CN116205947A (en) * 2023-01-03 2023-06-02 哈尔滨工业大学 Binocular-inertial fusion pose estimation method, electronic equipment and storage medium based on camera motion state

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KONSTANTINOS BOIKOS et al.: "A Scalable FPGA-Based Architecture for Depth Estimation in SLAM", Applied Reconfigurable Computing, 29 March 2019 (2019-03-29), pages 181-196, XP047506724, DOI: 10.1007/978-3-030-17227-5_14 *
翟向伟 (ZHAI Xiangwei): "Research on vision-based autonomous pose measurement of space targets", China Master's Theses Full-text Database, Information Science and Technology, no. 03, 15 March 2021 (2021-03-15), pages 138-672 *
郭俊阳 (GUO Junyang): "Research on SLAM optimization methods for mobile robots based on visual-inertial fusion", China Master's Theses Full-text Database, Information Science and Technology, no. 02, 15 February 2023 (2023-02-15), pages 138-1800 *
韦登峰 (WEI Dengfeng): "Research on SLAM algorithms for mobile robots based on binocular camera and IMU fusion", China Master's Theses Full-text Database, Information Science and Technology, no. 01, 15 January 2021 (2021-01-15), pages 140-879 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876489A (en) * 2024-01-22 2024-04-12 哈尔滨理工大学 A sliding window optimization method for visual-inertial point cloud fusion SLAM system
CN117876489B (en) * 2024-01-22 2025-04-15 哈尔滨理工大学 A sliding window optimization method for visual-inertial point cloud fusion SLAM system
CN119359756A (en) * 2024-09-27 2025-01-24 华南理工大学 A visual UAV trajectory estimation method with common view constraint
CN119820858A (en) * 2025-01-22 2025-04-15 南京师范大学 Unmanned aerial vehicle navigation 3D printing method based on indoor scene positioning

Also Published As

Publication number Publication date
CN117421384B (en) 2024-09-17

Similar Documents

Publication Publication Date Title
CN112902953B (en) An autonomous pose measurement method based on SLAM technology
CN112304307B (en) Positioning method and device based on multi-sensor fusion and storage medium
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN111024066B (en) Unmanned aerial vehicle vision-inertia fusion indoor positioning method
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN115407357B (en) Low-beam LiDAR-IMU-RTK positioning and mapping algorithm based on large scenes
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN108615246B (en) Method for improving robustness of visual odometer system and reducing calculation consumption of algorithm
CN113763548B (en) Vision-laser radar coupling-based lean texture tunnel modeling method and system
CN110726406A (en) An Improved Nonlinear Optimization Method for Monocular Inertial Navigation SLAM
CN108682027A (en) VSLAM realization method and systems based on point, line Fusion Features
CN109974693A (en) UAV positioning method, device, computer equipment and storage medium
CN117421384A (en) Visual inertia SLAM system sliding window optimization method based on common view projection matching
WO2018129715A1 (en) Simultaneous positioning and dense three-dimensional reconstruction method
CN112556719A (en) Visual inertial odometer implementation method based on CNN-EKF
Zhao et al. RTSfM: real-time structure from motion for mosaicing and DSM mapping of sequential aerial images with low overlap
CN110490933A (en) Non-linear state space Central Difference Filter method based on single point R ANSAC
CN114612525A (en) Robot RGB-D SLAM method based on grid segmentation and double-map coupling
CN115218906A (en) Indoor SLAM-oriented visual inertial fusion positioning method and system
CN117115271A (en) Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process
CN112945233A (en) Global drift-free autonomous robot simultaneous positioning and map building method
CN117073720A (en) Method and equipment for quick visual inertia calibration and initialization under weak environment and weak action control
CN115482282B (en) Dynamic SLAM method with multi-target tracking capability in autonomous driving scenarios
CN114485648B (en) Navigation positioning method based on bionic compound eye inertial system
CN113077515B (en) Tight coupling initialization method for underwater vision inertial navigation pressure positioning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant