WO2018133119A1 - Method and system for three-dimensional reconstruction of a complete indoor scene based on a depth camera - Google Patents
Method and system for three-dimensional reconstruction of a complete indoor scene based on a depth camera
- Publication number
- WO2018133119A1 WO2018133119A1 PCT/CN2017/072257 CN2017072257W WO2018133119A1 WO 2018133119 A1 WO2018133119 A1 WO 2018133119A1 CN 2017072257 W CN2017072257 W CN 2017072257W WO 2018133119 A1 WO2018133119 A1 WO 2018133119A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- depth image
- depth
- frame
- segments
- fusion
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
Definitions
- the present invention relates to the field of computer vision technology, and in particular, to a method and system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera.
- High-precision 3D reconstruction of indoor scenes is one of the challenging research topics in computer vision, involving theories and techniques in computer vision, computer graphics, pattern recognition, optimization and many other fields.
- The traditional approach uses laser or radar ranging sensors or structured-light technology to acquire the structural information of the scene or object surface for 3D reconstruction.
- However, these instruments are mostly expensive and difficult to carry, so their application is limited.
- Researchers therefore began to study purely vision-based methods for 3D reconstruction, which has produced a great deal of useful research work.
- The KinectFusion algorithm proposed by Newcombe et al. uses a Kinect to obtain the depth of each point in the image, aligns the 3D points in the current frame's camera coordinate system with the global model using the iterative closest point (ICP) algorithm to estimate the pose of the current frame's camera, and performs volumetric data fusion through iterative updates of a Truncated Signed Distance Function (TSDF) to obtain a dense three-dimensional model.
- Whelan et al. proposed the Kintinuous algorithm, a further extension of KinectFusion.
- The algorithm uses a Shifting TSDF Volume to address the memory consumption of the mesh model during large-scale scene reconstruction, uses DBoW to find matching key frames for closed-loop detection, and finally optimizes the poses and the model to obtain a large-scale 3D model.
- Choi et al. proposed the Elastic Fragment idea: the RGB-D data stream is first split into segments of 50 frames each, visual odometry is estimated separately for each segment, and the FPFH geometric descriptor is extracted from the point cloud data of every pair of segments to find matches between them.
- a method and system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera are provided.
- A method for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera may include:
- acquiring a depth image; performing adaptive bilateral filtering on the depth image; performing visual content-based block fusion and registration processing on the filtered depth image; and performing weighted volume data fusion according to the processing result, so as to reconstruct a three-dimensional model of the complete indoor scene.
- the performing adaptive bilateral filtering on the depth image specifically includes:
- Adaptive bilateral filtering is performed according to the following formula:
- $\hat{Z}(u) = \frac{1}{W}\sum_{u_k \in \Omega} w_s\!\left(\lVert u - u_k\rVert\right)\, w_c\!\left(\lvert Z(u) - Z(u_k)\rvert\right) Z(u_k)$
- where u and u_k respectively denote any pixel on the depth image and a pixel in its neighbourhood Ω; Z(u) and Z(u_k) respectively denote the depth values of u and u_k; $\hat{Z}(u)$ denotes the corresponding depth value after filtering; W denotes the normalization factor over the neighbourhood Ω; and w_s and w_c denote the Gaussian kernel functions of the spatial-domain and range-domain filtering, respectively.
- The Gaussian kernel functions of the spatial-domain and range-domain filtering are determined according to the following formulas:
- $w_s(x) = \exp\!\left(-\frac{x^2}{2\sigma_s^2}\right), \qquad w_c(x) = \exp\!\left(-\frac{x^2}{2\sigma_c^2}\right)$
- where σ_s and σ_c are the variances of the spatial-domain and range-domain Gaussian kernel functions, respectively, and are adapted to the depth value using the focal length f of the depth camera and the constants K_s and K_c.
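- As an illustration of how such depth-adaptive filtering can be implemented, the following NumPy sketch assumes the kernel widths grow with the depth value as σ_s = K_s·Z(u)/f and σ_c = K_c·Z(u)²; these specific forms, the constants K_s and K_c, and the fixed window radius are assumptions chosen to match the qualitative behaviour described here, not the patented formula itself.

```python
import numpy as np

def adaptive_bilateral_filter(Z, f, Ks=600.0, Kc=0.004, radius=3):
    """Depth-adaptive bilateral filter (sketch, not the patented formula).

    Z  : HxW depth image in metres (float), 0 marks invalid pixels.
    f  : focal length of the depth camera in pixels.
    Ks, Kc : hypothetical constants; the kernel widths below are *assumed*
             to grow with depth as sigma_s ~ Ks*Z/f and sigma_c ~ Kc*Z^2.
    """
    H, W = Z.shape
    out = np.zeros_like(Z)
    for v in range(H):
        for u in range(W):
            z = Z[v, u]
            if z <= 0:                                   # leave invalid pixels untouched
                continue
            sigma_s = Ks * z / f                         # spatial kernel width (pixels), assumed form
            sigma_c = Kc * z * z                         # range kernel width (metres), assumed form
            v0, v1 = max(0, v - radius), min(H, v + radius + 1)
            u0, u1 = max(0, u - radius), min(W, u + radius + 1)
            patch = Z[v0:v1, u0:u1]
            yy, xx = np.mgrid[v0:v1, u0:u1]
            w_s = np.exp(-((yy - v) ** 2 + (xx - u) ** 2) / (2.0 * sigma_s ** 2))
            w_c = np.exp(-((patch - z) ** 2) / (2.0 * sigma_c ** 2))
            w = w_s * w_c * (patch > 0)                  # ignore invalid neighbours
            if w.sum() > 0:
                out[v, u] = (w * patch).sum() / w.sum()  # division by the normalisation factor W
    return out
```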
- The performing of visual content-based block fusion and registration processing on the filtered depth image specifically includes: segmenting the depth image sequence based on the visual content, block-fusing each segment, performing closed-loop detection between the segments, and globally optimizing the result of the closed-loop detection.
- The segmenting of the depth image sequence based on the visual content, the block fusion of each segment, the closed-loop detection between segments, and the global optimization of the closed-loop detection result specifically include:
- the depth image sequence is segmented according to the automatic segmentation method based on visual content detection, similar depth image content is grouped into one segment, and each segment is subjected to block fusion to determine the transformation relationship between the depth images; closed-loop detection is then performed between segments according to the transformation relationship to achieve global optimization.
- The automatic segmentation method based on visual content detection segments the depth image sequence, groups similar depth image content into one segment, and performs block fusion on each segment to determine the transformation relationship between the depth images.
- The graph is constructed and optimized using the G2O framework to obtain optimized camera trajectory information, thereby implementing the global optimization.
- Step 1 calculating a similarity between the depth image of each frame and the depth image of the first frame
- Step 2 determining whether the similarity is lower than a similarity threshold
- Step 3 If yes, segment the depth image sequence
- Step 4 The next frame depth image is taken as the starting frame depth image of the next segment, and steps 1 and 2 are repeatedly performed until all frame depth images are processed.
- the step 1 specifically includes:
- the u_p is any pixel on the depth image;
- the Z(u_p) and the p respectively represent the depth value corresponding to u_p and the first spatial three-dimensional point;
- the first spatial three-dimensional point is transformed into the world coordinate system by a rotation and translation according to the following formula to obtain the second spatial three-dimensional point:
- $q = T_i \, p$
- where T_i represents the rotation-translation matrix from the spatial 3D points of the i-th frame depth map to the world coordinate system; p represents the first spatial three-dimensional point; q represents the second spatial three-dimensional point; and i is a positive integer.
- the second spatial three-dimensional point is back-projected onto the two-dimensional image plane according to the following formula to obtain the projected depth image:
- $u_q = \left( f_x \frac{x_q}{z_q} + c_x,\; f_y \frac{y_q}{z_q} + c_y \right)^{T}$
- where u_q is the pixel on the projected depth image corresponding to q; f_x, f_y, c_x and c_y represent the internal parameters of the depth camera; x_q, y_q, z_q represent the coordinates of q; and T denotes the transpose of the matrix.
- The numbers of effective pixels on the starting-frame depth image and on the projected depth image of any frame are counted separately, and their ratio is taken as the similarity.
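- A minimal sketch of this similarity measure is given below, assuming metric depth images, a pinhole intrinsic model (f_x, f_y, c_x, c_y), a 4×4 rotation-translation matrix T_i from the visual odometry, and that an "effective pixel" is one whose projection carries a positive depth and falls inside the image bounds (this last point is an assumption).

```python
import numpy as np

def backproject(Z, fx, fy, cx, cy):
    """Lift every valid pixel of the depth image Z to a homogeneous 3-D point (4xN)."""
    v, u = np.nonzero(Z > 0)
    z = Z[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z, np.ones_like(z)], axis=0)

def view_similarity(Z0, Zi, Ti, fx, fy, cx, cy):
    """Similarity between frame i and the segment's starting frame (sketch).

    Assumes the starting frame defines the world coordinate system, so that
    q = Ti @ p already lies in the starting frame's camera coordinates.
    """
    H, W = Z0.shape
    q = Ti @ backproject(Zi, fx, fy, cx, cy)           # second spatial 3-D points
    x, y, z = q[0], q[1], q[2]
    valid = z > 0
    u = fx * x[valid] / z[valid] + cx                  # u_q = (fx*xq/zq + cx, fy*yq/zq + cy)^T
    v = fy * y[valid] / z[valid] + cy
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    n_i = int(inside.sum())                            # effective pixels of the projected depth image
    n_0 = int((Z0 > 0).sum())                          # effective pixels of the starting frame
    return n_i / max(n_0, 1)                           # similarity rho = n_i / n_0
```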
- The performing of weighted volume data fusion according to the processing result to reconstruct the indoor complete-scene three-dimensional model specifically includes: fusing each frame of the depth image using the truncated signed distance function mesh model according to the processing result, and representing the three-dimensional space with a voxel grid, thereby obtaining a three-dimensional model of the complete indoor scene.
- The fusing of each frame of the depth image using the truncated signed distance function mesh model and the use of the voxel grid to represent the three-dimensional space, thereby obtaining a three-dimensional model of the complete indoor scene, specifically include:
- the weighted fusion of the truncated signed distance function data is performed using the Volumetric method framework, based on the noise characteristics and the region of interest;
- the Mesh model extraction is performed using the Marching cubes algorithm to obtain a three-dimensional model of the complete indoor scene.
- the truncated signed distance function is determined according to the following formula:
- f_i(v) represents the truncated signed distance function, that is, the distance from the grid to the surface of the object model; its sign indicates whether the grid lies on the occluded side or on the visible side, and the zero crossing is a point on the surface;
- the K represents the internal parameter matrix of the camera;
- the u represents a pixel;
- the z_i(u) represents the depth value corresponding to the pixel u;
- the v_i represents a voxel.
- the data weighted fusion is performed according to the following formulas:
- $F(v) = \frac{\sum_{i=1}^{n} w_i(v)\, f_i(v)}{\sum_{i=1}^{n} w_i(v)}, \qquad W(v) = \sum_{i=1}^{n} w_i(v)$
- where v represents a voxel; f_i(v) and w_i(v) respectively represent the truncated signed distance function corresponding to the voxel v and its weight function; n is a positive integer; F(v) represents the fused truncated signed distance function value corresponding to the voxel v; and W(v) represents the weight of the fused truncated signed distance function value corresponding to the voxel v.
- weight function can be determined according to the following formula:
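- The fused value F(v) and weight W(v) above can be maintained incrementally as a running weighted average over the voxel grid; the following is a minimal NumPy sketch, assuming that the per-frame values f_i(v) and weights w_i(v) are computed elsewhere and that voxels not observed in frame i carry zero weight.

```python
import numpy as np

def fuse_frame(F, W, f_i, w_i):
    """One weighted TSDF fusion step over the whole voxel grid (sketch).

    F, W     : fused TSDF values and accumulated weights so far (same-shape arrays).
    f_i, w_i : per-voxel TSDF values and weights contributed by frame i;
               voxels not observed in frame i should carry w_i = 0.
    Implements F <- (W*F + w_i*f_i) / (W + w_i) and W <- W + w_i, which is the
    incremental form of the weighted sums above.
    """
    W_new = W + w_i
    with np.errstate(invalid="ignore", divide="ignore"):
        F_new = np.where(W_new > 0, (W * F + w_i * f_i) / W_new, F)
    return F_new, W_new
```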
- a system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera comprising:
- a filtering module configured to perform adaptive bilateral filtering on the depth image
- a block fusion and registration module for performing visual content-based block fusion and registration processing on the filtered depth image
- the volume data fusion module is configured to perform weighted volume data fusion according to the processing result, thereby reconstructing a three-dimensional model of the complete scene in the room.
- the filtering module is specifically configured to:
- Adaptive bilateral filtering is performed according to the following formula:
- $\hat{Z}(u) = \frac{1}{W}\sum_{u_k \in \Omega} w_s\!\left(\lVert u - u_k\rVert\right)\, w_c\!\left(\lvert Z(u) - Z(u_k)\rvert\right) Z(u_k)$
- where u and u_k respectively denote any pixel on the depth image and a pixel in its neighbourhood Ω; Z(u) and Z(u_k) respectively denote the depth values of u and u_k; $\hat{Z}(u)$ denotes the corresponding depth value after filtering; W denotes the normalization factor over the neighbourhood Ω; and w_s and w_c denote the Gaussian kernel functions of the spatial-domain and range-domain filtering, respectively.
- The block fusion and registration module is specifically configured to: segment the depth image sequence based on the visual content, perform block fusion on each segment, perform closed-loop detection between the segments, and globally optimize the results of the closed-loop detection.
- the block fusion and registration module is further specifically configured to:
- The depth image sequence is segmented according to the automatic segmentation method based on visual content detection, similar depth image content is grouped into one segment, and each segment is subjected to block fusion to determine the transformation relationship between the depth images; closed-loop detection is then performed between segments according to the transformation relationship to achieve global optimization.
- the block fusion and registration module specifically includes:
- the camera pose information acquisition unit is configured to perform visual odometry estimation using the Kintinuous framework to obtain the camera pose information for each frame of the depth image;
- a segmentation unit is configured to back-project the point cloud data corresponding to each frame of the depth image into the initial coordinate system according to the camera pose information, compare the similarity between the depth image obtained by the projection and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initialize the camera pose and start a new segment;
- a registration unit is configured to extract the FPFH geometric descriptor from each segment's point cloud data, perform coarse registration between every two segments, and perform fine registration using the GICP algorithm to obtain the matching relationship between the segments;
- an optimization unit is configured to construct a graph using the pose information of each segment and the matching relationships between segments, and to perform graph optimization using the G2O framework to obtain optimized camera trajectory information, thereby implementing the global optimization.
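- For illustration, the pose graph that such an optimization unit could hand to a G2O-style solver can be sketched as follows: nodes are per-segment poses, odometry edges link consecutive segments, loop-closure edges come from the FPFH/GICP matching, and each loop-closure edge carries a line-process switch variable so that wrong matches can be down-weighted or removed. The data layout is a hypothetical illustration, not the G2O API.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class PoseGraph:
    """Toy pose graph over per-segment poses (4x4 camera-to-world matrices)."""
    nodes: list = field(default_factory=list)   # segment poses T_k
    edges: list = field(default_factory=list)   # (i, j, T_ij, is_loop_closure, switch)

    def add_segment(self, T):
        self.nodes.append(np.asarray(T, dtype=float))
        return len(self.nodes) - 1

    def add_odometry_edge(self, i, j, T_ij):
        # constraint between consecutive segments from block fusion / visual odometry
        self.edges.append((i, j, np.asarray(T_ij, dtype=float), False, 1.0))

    def add_loop_closure(self, i, j, T_ij):
        # inter-segment match from FPFH coarse + GICP fine registration; the trailing
        # 1.0 is a line-process switch that an optimizer may drive towards 0 in order
        # to reject a wrong closed-loop match
        self.edges.append((i, j, np.asarray(T_ij, dtype=float), True, 1.0))

# usage sketch: one node per segment, then hand nodes/edges to a G2O-style optimizer
graph = PoseGraph()
a = graph.add_segment(np.eye(4))
b = graph.add_segment(np.eye(4))
graph.add_odometry_edge(a, b, np.eye(4))
```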
- the segmentation unit specifically includes:
- a calculating unit configured to calculate a similarity between the depth image of each frame and the depth image of the first frame
- a determining unit configured to determine whether the similarity is lower than a similarity threshold
- a segmentation subunit configured to segment the depth image sequence when the similarity is lower than a similarity threshold
- a processing unit configured to use the next frame depth image as the starting frame depth image of the next segment, and repeatedly execute the calculating unit and the determining unit until all the frame depth images are processed.
- The volume data fusion module is specifically configured to: according to the processing result, fuse each frame of the depth image using the truncated signed distance function mesh model, and represent the three-dimensional space with a voxel grid, thereby obtaining a three-dimensional model of the complete indoor scene.
- the volume data fusion module specifically includes:
- a weighted fusion unit configured to perform weighted fusion of the truncated signed distance function data using the Volumetric method framework, based on the noise characteristics and the region of interest;
- An extracting unit is configured to perform Mesh model extraction by using a Marching cubes algorithm to obtain a three-dimensional model of the indoor complete scene.
- Embodiments of the present invention provide a method and system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera.
- The method includes: acquiring a depth image; performing adaptive bilateral filtering on the depth image; performing visual content-based block fusion and registration processing on the filtered depth image; and performing weighted volume data fusion according to the processing result, thereby reconstructing a complete three-dimensional model of the indoor scene.
- By performing visual content-based block fusion and registration on the depth images, the embodiments of the invention can effectively reduce the cumulative error in visual odometry estimation and improve the registration precision; by further adopting a weighted volume data fusion algorithm, the geometric details of the object surface can be effectively preserved. This addresses the technical problem of improving the accuracy of three-dimensional reconstruction of indoor scenes, so that a complete, accurate and refined indoor scene model can be obtained.
- FIG. 1 is a flow chart of a method for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera according to an embodiment of the present invention;
- FIG. 2a is a color image corresponding to a depth image according to an embodiment of the present invention;
- FIG. 2b is a schematic diagram of a point cloud obtained from a depth image according to an embodiment of the present invention;
- FIG. 2c is a schematic diagram of a point cloud obtained by bilateral filtering of a depth image according to an embodiment of the present invention;
- FIG. 2d is a schematic diagram of a point cloud obtained by adaptive bilateral filtering of a depth image according to an embodiment of the present invention;
- FIG. 3 is a schematic flow chart of segmentation, fusion and registration based on visual content according to an embodiment of the present invention;
- FIG. 4 is a schematic diagram of a weighted volume data fusion process according to an embodiment of the present invention;
- FIG. 5a is a schematic diagram of a three-dimensional reconstruction result obtained with an unweighted volume data fusion algorithm;
- FIG. 5b is a partial detail view of the three-dimensional model of FIG. 5a;
- FIG. 5c is a schematic diagram of a three-dimensional reconstruction result obtained with the weighted volume data fusion algorithm according to an embodiment of the present invention;
- FIG. 5d is a partial detail view of the three-dimensional model of FIG. 5c;
- FIG. 6 is a schematic diagram of the effect of performing three-dimensional reconstruction on the 3D Scene Data data set using the method proposed by the embodiment of the present invention;
- FIG. 7 is a schematic diagram of the effect of performing three-dimensional reconstruction on the Augmented ICL-NUIM Dataset data set using the method proposed by the embodiment of the present invention;
- FIG. 8 is a schematic diagram of the effect of three-dimensional reconstruction of indoor scene data collected by Microsoft Kinect for Windows according to an embodiment of the present invention;
- FIG. 9 is a schematic structural diagram of a system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera according to an embodiment of the present invention.
- Embodiments of the present invention provide a method for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera. As shown in Figure 1, the method includes:
- This step may include: acquiring depth images using a consumer-level depth camera based on the structured-light principle.
- the consumer-level depth camera (Microsoft Kinect for Windows and Xtion, referred to as the depth camera) based on the structured light principle acquires the depth data of the depth image by transmitting the structured light and receiving the reflection information.
- real indoor scene data can be acquired using the handheld consumer depth camera Microsoft Kinect for Windows.
- The depth data can be calculated according to the following formula:
- $Z = \frac{f \cdot B}{D}$
- where Z represents the depth value, f represents the focal length of the consumer-grade depth camera, B represents the baseline, and D represents the disparity.
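- For example, with purely illustrative values of f = 580 px, B = 0.075 m and a measured disparity D = 29 px (these numbers are hypothetical, not taken from the patent), the relation gives $Z = \frac{580 \times 0.075}{29} = 1.5\ \text{m}$; because D is quantized into integer pixel steps, the resulting depth error grows with distance, which is the noise behaviour discussed below.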
- S110 Perform adaptive bilateral filtering on the depth image.
- The acquired depth image is adaptively bilaterally filtered according to the noise characteristics of the consumer-level depth camera based on the structured-light principle.
- The adaptive bilateral filtering algorithm filters in both the spatial domain and the range domain of the depth image.
- the parameters of the adaptive bilateral filtering algorithm can be set according to the noise characteristics of the depth camera and its internal parameters, which can effectively remove noise and preserve edge information.
- the noise of the depth data is mainly generated in the quantization process. It can be seen from the above equation that the variance of the depth noise is proportional to the square of the depth value, that is, the larger the depth value, the larger the noise.
- embodiments of the present invention define a filtering algorithm based on this noise characteristic.
- The above adaptive bilateral filtering can be performed according to the following formula:
- $\hat{Z}(u) = \frac{1}{W}\sum_{u_k \in \Omega} w_s\!\left(\lVert u - u_k\rVert\right)\, w_c\!\left(\lvert Z(u) - Z(u_k)\rvert\right) Z(u_k)$
- where u and u_k respectively denote any pixel on the depth image and a pixel in its neighbourhood Ω; Z(u) and Z(u_k) respectively denote the depth values corresponding to u and u_k; $\hat{Z}(u)$ denotes the corresponding depth value after filtering; W denotes the normalization factor over the neighbourhood Ω; and w_s and w_c denote the Gaussian kernel functions of the spatial-domain and range-domain filtering, respectively.
- w_s and w_c can be determined according to the following formulas:
- $w_s(x) = \exp\!\left(-\frac{x^2}{2\sigma_s^2}\right), \qquad w_c(x) = \exp\!\left(-\frac{x^2}{2\sigma_c^2}\right)$
- where σ_s and σ_c are the variances of the spatial-domain and range-domain Gaussian kernel functions, respectively.
- ⁇ s and ⁇ c are related to the magnitude of the depth value, which is not fixed.
- ⁇ s and ⁇ c can be determined according to the following formula:
- K_s and K_c represent constants whose specific values are related to the parameters of the depth camera.
- FIG. 2a shows a color image corresponding to the depth image.
- Figure 2b shows a point cloud derived from a depth image.
- Figure 2c shows a point cloud resulting from bilateral filtering of the depth image.
- Figure 2d shows a point cloud obtained by adaptive bilateral filtering of depth images.
- the embodiment of the present invention can implement edge preservation and denoising of the depth map by adopting an adaptive bilateral filtering method.
- S120 Perform visual content-based block fusion and registration processing on the depth image.
- the depth image sequence is segmented based on the visual content, and each segment is block-fused, and closed-loop detection is performed between segments, and the result of the closed-loop detection is globally optimized.
- the depth image sequence is a depth image data stream.
- This step may include: segmenting the depth image sequence according to the automatic segmentation method based on visual content, grouping similar depth image content into one segment, performing block fusion on each segment, determining the transformation relationship between the depth images, performing closed-loop detection between segments according to the transformation relationship, and achieving global optimization.
- this step may include:
- S122 Back-projecting the point cloud data corresponding to each frame of the depth image into the initial coordinate system according to the camera pose information, comparing the similarity between the depth image obtained by the projection and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initializing the camera pose and starting a new segment.
- This step performs closed-loop detection between segments.
- S124 Using the pose information of each segment and the matching relationship between segments and segments, constructing a graph and performing graph optimization using a G2O framework to obtain optimized camera trajectory information, thereby achieving global optimization.
- This step applies the Simultaneous Localization and Calibration (SLAC) approach to correct non-rigid distortion, and introduces line-process constraints to remove incorrect closed-loop matches.
- the foregoing step S122 may further include:
- S1221 Calculate the similarity between the depth image of each frame and the depth image of the first frame.
- This step segments the depth image sequence based on the visual content. In this way, the cumulative error caused by visual odometry estimation can be effectively alleviated, and similar content can be fused together, thereby improving the registration accuracy.
- The next frame depth image is then taken as the starting frame depth image of the next segment, and steps S1221 and S1222 are repeatedly performed until all frame depth images are processed.
- the step of calculating the similarity between the depth image of each frame and the depth image of the first frame may specifically include:
- S12211 Calculate the first spatial three-dimensional point corresponding to each pixel on the depth image according to the projection relationship and the depth value of any frame depth image: $p = Z(u_p)\,\pi^{-1}(u_p)$
- where u_p is any pixel on the depth image; Z(u_p) and p respectively represent the depth value corresponding to u_p and the first spatial three-dimensional point; and π represents the projection relationship, that is, the 2D-3D projective transformation by which the point cloud data corresponding to each depth image is back-projected into the initial coordinate system.
- S12212 Transform the first spatial three-dimensional point into the world coordinate system by a rotation and translation to obtain the second spatial three-dimensional point: $q = T_i\, p$
- where T_i represents the rotation-translation matrix from the spatial 3D points of the i-th frame depth map to the world coordinate system, which can be estimated by visual odometry; i is a positive integer; p represents the first spatial three-dimensional point and q represents the second spatial three-dimensional point, with coordinates $p = (x_p, y_p, z_p)^T$ and $q = (x_q, y_q, z_q)^T$.
- S12213 Back-project the second spatial three-dimensional point onto the two-dimensional image plane to obtain the projected depth image: $u_q = \left( f_x \frac{x_q}{z_q} + c_x,\; f_y \frac{y_q}{z_q} + c_y \right)^{T}$
- where u_q is the pixel on the projected depth image corresponding to q; f_x, f_y, c_x and c_y represent the internal parameters of the depth camera; x_q, y_q, z_q represent the coordinates of q; and T denotes the transpose of the matrix.
- S12214 Count the numbers of effective pixels on the starting frame depth image and on the projected depth image of any frame, and take their ratio as the similarity: $\rho = \frac{n_i}{n_0}$
- where n_0 and n_i respectively represent the numbers of effective pixels on the starting frame depth image and on the projected depth image of any frame, and ρ represents the similarity.
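- Combining the above steps, the content-based segmentation loop can be sketched as follows; `view_similarity` refers to the similarity sketch given earlier in this document, and the threshold value is an illustrative assumption.

```python
def segment_depth_stream(depth_frames, poses, intrinsics, threshold=0.55):
    """Split a depth stream into visually coherent segments (sketch).

    depth_frames : list of HxW depth images.
    poses        : per-frame rotation-translation matrices T_i from visual odometry,
                   expressed relative to the current segment's starting frame.
    intrinsics   : (fx, fy, cx, cy) of the depth camera.
    Returns a list of (start_index, end_index) pairs, one per segment.
    """
    fx, fy, cx, cy = intrinsics
    segments, start = [], 0
    for i in range(1, len(depth_frames)):
        rho = view_similarity(depth_frames[start], depth_frames[i],
                              poses[i], fx, fy, cx, cy)
        if rho < threshold:                  # the visual content has drifted too far
            segments.append((start, i - 1))  # close the current segment
            start = i                        # the next frame starts a new segment;
                                             # the camera pose is re-initialised here
    segments.append((start, len(depth_frames) - 1))
    return segments
```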
- FIG. 3 exemplarily shows a flow diagram of segmentation fusion and registration based on visual content.
- The embodiment of the invention adopts an automatic segmentation algorithm based on visual content, which can effectively reduce the cumulative error in visual odometry estimation and improve the registration accuracy.
- This step may include: according to the result of the visual content-based block fusion and registration processing, fusing each frame of the depth image using a truncated signed distance function (TSDF) mesh model, and representing the three-dimensional space with a voxel grid, so as to obtain a three-dimensional model of the complete indoor scene.
- This step may further include:
- The TSDF mesh model can be used to fuse the depth images of each frame, representing the three-dimensional space with a voxel grid of resolution m, that is, the three-dimensional space is divided into m blocks.
- Each voxel v of the grid stores two values: the truncated signed distance function value f_i(v) and its weight w_i(v).
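- In code, such a grid reduces to two dense arrays; a minimal sketch follows, in which the grid resolution, covered volume and truncation band are illustrative values rather than parameters taken from the patent.

```python
import numpy as np

class TSDFVolume:
    """Voxel grid storing, per voxel, a truncated signed distance and a weight (sketch)."""

    def __init__(self, m=256, size=4.0, truncation=0.04):
        self.m = m                                        # voxels per axis (resolution m)
        self.voxel_size = size / m                        # grid covers a cube of `size` metres
        self.truncation = truncation                      # truncation band in metres
        self.F = np.ones((m, m, m), dtype=np.float32)     # fused TSDF values F(v)
        self.W = np.zeros((m, m, m), dtype=np.float32)    # accumulated weights W(v)

vol = TSDFVolume()   # a 256^3 grid needs about 134 MB for the two float32 arrays
```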
- The truncated signed distance function can be determined according to the following formula:
- f_i(v) represents the truncated signed distance function, that is, the distance from the grid to the surface of the object model; its sign indicates whether the grid lies on the occluded side or on the visible side, and the zero crossing is a point on the surface;
- K represents the internal parameter matrix of the camera;
- u represents the pixel;
- z_i(u) represents the depth value corresponding to the pixel u;
- v_i represents the voxel.
- The camera here refers to the depth camera.
- The data weighted fusion can be performed according to the following formulas:
- $F(v) = \frac{\sum_{i=1}^{n} w_i(v)\, f_i(v)}{\sum_{i=1}^{n} w_i(v)}, \qquad W(v) = \sum_{i=1}^{n} w_i(v)$
- where f_i(v) and w_i(v) respectively represent the truncated signed distance function (TSDF) value corresponding to the voxel v and its weight function; n is a positive integer; F(v) represents the fused TSDF value corresponding to the voxel v; and W(v) represents the weight of the fused TSDF value corresponding to the voxel v.
- The weight function may be determined according to the noise characteristics of the depth data and the region of interest, so its value is not fixed. In order to preserve the geometric details of the object surface, the weight of regions with little noise and of regions of interest is set large, while the weight of regions with strong noise or of no interest is set small.
- the weight function can be determined according to the following formula:
- d_i represents the radius of the region of interest; the smaller the radius, the higher the interest and the greater the weight;
- ⁇ s is the noise variance in the depth data, and its value is consistent with the variance of the spatial domain kernel function of the adaptive bilateral filtering algorithm
- w is a constant, preferably it may take a value of 1 or 0.
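- A minimal sketch of such a weight is given below; the exact functional form is an assumption chosen only to reproduce the qualitative rule stated above (lower weight where the depth noise is large, higher weight close to the region of interest), not the formula of the patent.

```python
import numpy as np

def fusion_weight(d_roi, sigma_s, w_const=1.0):
    """Illustrative TSDF fusion weight (sketch, not the patented formula).

    d_roi   : radius/distance to the region of interest; smaller means more interest.
    sigma_s : depth-noise variance, consistent with the spatial-domain kernel of the
              adaptive bilateral filter (it grows with the depth value).
    w_const : the constant w of the description (e.g. 1 or 0).
    """
    # assumed form: the weight shrinks as the noise grows and as the voxel moves
    # away from the region of interest, matching the qualitative rule stated above
    return w_const * np.exp(-d_roi) / (1.0 + sigma_s)
```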
- FIG. 4 exemplarily shows a schematic diagram of a weighted volume data fusion process.
- the weighted volume data fusion algorithm in the embodiment of the invention can effectively maintain the geometric details of the surface of the object, and can obtain a complete, accurate and refined indoor scene model, which has good robustness and expandability.
- FIG. 5a exemplarily shows a three-dimensional reconstruction result obtained with an unweighted volume data fusion algorithm;
- FIG. 5b exemplarily shows a partial detail of the three-dimensional model in FIG. 5a;
- FIG. 5c exemplarily shows the three-dimensional reconstruction result obtained with the weighted volume data fusion algorithm proposed by the embodiment of the invention;
- FIG. 5d exemplarily shows the local details of the three-dimensional model in FIG. 5c.
- FIG. 6 exemplarily shows the effect of performing three-dimensional reconstruction on the 3D Scene Data data set using the method proposed by the embodiment of the present invention;
- FIG. 7 exemplarily shows the effect of performing three-dimensional reconstruction on the Augmented ICL-NUIM Dataset data set using the method proposed by the embodiment of the present invention.
- FIG. 8 exemplarily shows the effect of three-dimensional reconstruction of the indoor scene data collected by Microsoft Kinect for Windows.
- the embodiment of the present invention further provides a system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera.
- The system 90 includes: an obtaining module 92, a filtering module 94, a block fusion and registration module 96, and a volume data fusion module 98.
- the obtaining module 92 is configured to acquire a depth image.
- the filtering module 94 is configured to perform adaptive bilateral filtering on the depth image.
- the block fusion and registration module 96 is configured to perform visual content based block fusion and registration processing on the filtered depth image.
- the volume data fusion module 98 is configured to perform weighted volume data fusion according to the processing result, thereby reconstructing a three-dimensional model of the indoor complete scene.
- The embodiment of the invention can effectively reduce the accumulated error in visual odometry estimation, improve the registration precision, effectively preserve the geometric details of the object surface, and thus obtain a complete, accurate and refined indoor scene model.
- the filtering module is specifically configured to: perform adaptive bilateral filtering according to the following formula:
- $\hat{Z}(u) = \frac{1}{W}\sum_{u_k \in \Omega} w_s\!\left(\lVert u - u_k\rVert\right)\, w_c\!\left(\lvert Z(u) - Z(u_k)\rvert\right) Z(u_k)$
- where u and u_k respectively represent any pixel on the depth image and a pixel in its neighbourhood Ω; Z(u) and Z(u_k) respectively represent the depth values corresponding to u and u_k; $\hat{Z}(u)$ denotes the corresponding depth value after filtering; W denotes the normalization factor over the neighbourhood Ω; and w_s and w_c represent the Gaussian kernel functions of the spatial-domain and range-domain filtering, respectively.
- The block fusion and registration module may be specifically configured to: segment the depth image sequence based on the visual content, perform block fusion on each segment, perform closed-loop detection between segments, and globally optimize the results of the closed-loop detection.
- The block fusion and registration module is further specifically configured to: segment the depth image sequence based on the visual content detection automatic segmentation method, group similar depth image content into one segment, perform block fusion on each segment, determine the transformation relationship between the depth images, perform closed-loop detection between segments according to the transformation relationship, and realize global optimization.
- the block fusion and registration module may specifically include: a camera pose information acquisition unit, a segmentation unit, a registration unit, and an optimization unit.
- The camera pose information acquisition unit is configured to perform visual odometry estimation using the Kintinuous framework to obtain the camera pose information for each frame of the depth image.
- The segmentation unit is configured to back-project the point cloud data corresponding to each frame of the depth image into the initial coordinate system according to the camera pose information, compare the similarity between the depth image obtained by the projection and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initialize the camera pose and start a new segment.
- The registration unit is configured to extract the FPFH geometric descriptor from each segment's point cloud data, perform coarse registration between every two segments, and perform fine registration using the GICP algorithm to obtain the matching relationship between the segments.
- The optimization unit is configured to construct a graph using the pose information of each segment and the matching relationships between segments, and to optimize the graph using the G2O framework to obtain optimized camera trajectory information, thereby achieving global optimization.
- the segmentation unit may specifically include: a calculation unit, a determination unit, a segmentation subunit, and a processing unit.
- the calculation unit is configured to calculate the similarity between the depth image of each frame and the depth image of the first frame.
- the judging unit is configured to judge whether the similarity is lower than the similarity threshold.
- the segmentation subunit is configured to segment the depth image sequence when the similarity is below the similarity threshold.
- the processing unit is configured to use the next frame depth image as the start frame depth image of the next segment, and repeatedly execute the calculation unit and the determination unit until all frame depth images are processed.
- The volume data fusion module can be specifically configured to: according to the processing result, fuse each frame of the depth image using the truncated signed distance function mesh model, and represent the three-dimensional space with a voxel grid, thereby obtaining a three-dimensional model of the complete indoor scene.
- the volume data fusion module may specifically include a weighted fusion unit and an extraction unit.
- The weighted fusion unit is configured to perform weighted fusion of the truncated signed distance function data using the Volumetric method framework, based on the noise features and the region of interest.
- the extraction unit is used to extract the Mesh model by using the Marching cubes algorithm, thereby obtaining a three-dimensional model of the indoor complete scene.
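- For illustration, the zero level set of a fused TSDF volume can be extracted with an off-the-shelf Marching cubes implementation such as the one in scikit-image; in the sketch below the volume is a synthetic sphere standing in for a real fused grid.

```python
import numpy as np
from skimage import measure

# placeholder volume: the signed distance field of a sphere stands in for a fused TSDF grid
grid = np.linspace(-1.0, 1.0, 64)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
tsdf = np.sqrt(x**2 + y**2 + z**2) - 0.5        # zero crossing on a sphere of radius 0.5

# extract the Mesh model at the zero crossing of the TSDF
verts, faces, normals, values = measure.marching_cubes(tsdf, level=0.0)
print(verts.shape, faces.shape)                 # vertex and triangle counts of the extracted mesh
```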
- The system for performing three-dimensional reconstruction of complete indoor scenes based on the consumer-level depth camera includes an acquisition module, a filtering module, a block fusion and registration module, and a volume data fusion module, wherein:
- the acquisition module is used for depth image acquisition of indoor scenes using a depth camera.
- the filtering module is configured to perform adaptive bilateral filtering processing on the acquired depth image.
- The acquisition module here is an equivalent replacement of the obtaining module described above.
- real indoor scene data can be acquired using the handheld consumer depth camera Microsoft Kinect for Windows.
- Adaptive bilateral filtering is performed on the acquired depth image, and the parameters of the adaptive bilateral filtering method are set automatically according to the noise characteristics of the depth camera and its internal parameters. Therefore, the embodiment of the present invention can effectively remove noise while preserving edge information.
- the block fusion and registration module is used to automatically segment the data stream based on the visual content, each segment performs block fusion, and the closed-loop detection is performed between segments, and the result of the closed-loop detection is globally optimized.
- the block fusion and registration module performs automatic block fusion and registration based on visual content.
- the block fusion and registration module specifically includes: a pose information acquisition module, a segmentation module, a coarse registration module, a fine registration module, and an optimization module.
- The pose information acquisition module is configured to perform visual odometry estimation using the Kintinuous framework to obtain the camera pose information for each frame of the depth image.
- The segmentation module is configured to back-project the point cloud data corresponding to each frame of the depth image into the initial coordinate system according to the camera pose information, and compare the similarity between the projected depth image and the depth image of the initial frame; if the similarity is lower than the similarity threshold, the camera pose is initialized and a new segment is started.
- The coarse registration module is used to extract the FPFH geometric descriptor from each segment's point cloud data and perform coarse registration between every two segments;
- the fine registration module is used for fine registration using the GICP algorithm to obtain the matching relationship between segments.
- The optimization module is used to construct a graph from the pose information of each segment and the matching relationships between segments, and to perform graph optimization using the G2O framework.
- The optimization module is further used to apply the Simultaneous Localization and Calibration (SLAC) approach to correct non-rigid distortion, and to use line-process constraints to remove incorrect closed-loop matches.
- The above block fusion and registration module segments the RGB-D data stream based on the visual content, which can effectively alleviate the cumulative error caused by visual odometry estimation and fuse similar content together, thereby improving the registration precision.
- The volume data fusion module is configured to perform weighted volume data fusion according to the optimized camera trajectory information to obtain a three-dimensional model of the scene.
- The volume data fusion module defines the weight function of the truncated signed distance function according to the noise characteristics of the depth camera and the region of interest, so as to preserve the geometric details of the object surface.
- The system embodiment for performing three-dimensional reconstruction of complete indoor scenes based on a consumer-level depth camera may be used to implement the method embodiment described above; their technical principles, the technical problems solved, and the technical effects produced are similar, and reference may be made to each other. For convenience and brevity of description, the parts common to the various embodiments are not repeated.
- In the system and method for performing three-dimensional reconstruction of complete indoor scenes based on a consumer-level depth camera, the division into the functional modules, units or steps described above is merely illustrative.
- For example, the obtaining module in the foregoing may also be referred to as an acquisition module.
- In practical applications, the functions may be distributed among different functional modules, units or steps as required; that is, the modules, units or steps in the embodiment of the present invention may be decomposed or combined. For example, the acquisition module and the filtering module may be combined into a data preprocessing module.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention relates to a method and system for three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera. The method comprises the steps of: acquiring a depth image and performing adaptive bilateral filtering; performing visual odometry estimation using the filtered depth image, automatically segmenting the image sequence based on the visual content, performing closed-loop detection between segments, and implementing global optimization; and performing weighted volume data fusion according to the optimized camera trajectory information, so as to reconstruct a three-dimensional model of a complete indoor scene. The method and system achieve edge preservation and denoising of the depth image by means of an adaptive bilateral filtering algorithm. An automatic segmentation algorithm based on visual content can significantly reduce the cumulative error of the visual odometry estimation process and improve the registration accuracy. Use of the weighted volume data fusion algorithm can effectively preserve the geometric details of the surface of an object. The technical problem of improving the accuracy of three-dimensional reconstruction in an indoor scene is thus solved, so that a complete, accurate and detailed indoor scene model can be obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/072257 WO2018133119A1 (fr) | 2017-01-23 | 2017-01-23 | Procédé et système de reconstruction tridimensionnelle d'une scène d'intérieur complète sur la base d'une caméra de profondeur |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/072257 WO2018133119A1 (fr) | 2017-01-23 | 2017-01-23 | Procédé et système de reconstruction tridimensionnelle d'une scène d'intérieur complète sur la base d'une caméra de profondeur |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018133119A1 true WO2018133119A1 (fr) | 2018-07-26 |
Family
ID=62907634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/072257 WO2018133119A1 (fr) | 2017-01-23 | 2017-01-23 | Procédé et système de reconstruction tridimensionnelle d'une scène d'intérieur complète sur la base d'une caméra de profondeur |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018133119A1 (fr) |
-
2017
- 2017-01-23 WO PCT/CN2017/072257 patent/WO2018133119A1/fr active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101860667A (zh) * | 2010-05-06 | 2010-10-13 | 中国科学院西安光学精密机械研究所 | 一种快速去除图像中混合噪声的方法 |
CN102800127A (zh) * | 2012-07-18 | 2012-11-28 | 清华大学 | 一种基于光流优化的三维重建方法及装置 |
CN104599314A (zh) * | 2014-06-12 | 2015-05-06 | 深圳奥比中光科技有限公司 | 三维模型重建方法与系统 |
CN104732492A (zh) * | 2015-03-09 | 2015-06-24 | 北京工业大学 | 一种深度图像的去噪方法 |
CN106169179A (zh) * | 2016-06-30 | 2016-11-30 | 北京大学 | 图像降噪方法以及图像降噪装置 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109166171A (zh) * | 2018-08-09 | 2019-01-08 | 西北工业大学 | 基于全局式和增量式估计的运动恢复结构三维重建方法 |
CN109166171B (zh) * | 2018-08-09 | 2022-05-13 | 西北工业大学 | 基于全局式和增量式估计的运动恢复结构三维重建方法 |
CN110375765A (zh) * | 2019-06-28 | 2019-10-25 | 上海交通大学 | 基于直接法的视觉里程计方法、系统及存储介质 |
CN110375765B (zh) * | 2019-06-28 | 2021-04-13 | 上海交通大学 | 基于直接法的视觉里程计方法、系统及存储介质 |
CN110807789A (zh) * | 2019-08-23 | 2020-02-18 | 腾讯科技(深圳)有限公司 | 图像处理方法、模型、装置、电子设备及可读存储介质 |
CN111260709A (zh) * | 2020-01-15 | 2020-06-09 | 浙江大学 | 一种面向动态环境的地面辅助的视觉里程计方法 |
CN111260709B (zh) * | 2020-01-15 | 2022-04-19 | 浙江大学 | 一种面向动态环境的地面辅助的视觉里程计方法 |
CN111524075A (zh) * | 2020-03-26 | 2020-08-11 | 北京迈格威科技有限公司 | 深度图像滤波方法、图像合成方法、装置、设备及介质 |
CN111524075B (zh) * | 2020-03-26 | 2023-08-22 | 北京迈格威科技有限公司 | 深度图像滤波方法、图像合成方法、装置、设备及介质 |
CN115346002A (zh) * | 2022-10-14 | 2022-11-15 | 佛山科学技术学院 | 一种虚拟场景构建方法及其康复训练应用 |
CN115346002B (zh) * | 2022-10-14 | 2023-01-17 | 佛山科学技术学院 | 一种虚拟场景构建方法及其康复训练应用 |
CN118967795A (zh) * | 2024-10-16 | 2024-11-15 | 泉州装备制造研究所 | 基于四目全景相机的视觉惯导紧耦合slam方法 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17892520; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 17892520; Country of ref document: EP; Kind code of ref document: A1