
CN118429528A - Method, device and equipment for three-dimensional reconstruction of scene


Info

Publication number
CN118429528A
Authority
CN
China
Prior art keywords: gaussian, preset, coordinate system, objects, scene
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202410494260.2A
Other languages: Chinese (zh)
Inventors: 林安成, 李俊, 苏天晴, 项羽升, 茆胜, 刘鹏, 吴向东, 杨怀, 黄少雄, 李晓明, 刘小飞, 沈安琪
Current Assignee: Suzhou Shangliwei Technology Co Ltd; Poly Changda Engineering Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Suzhou Shangliwei Technology Co Ltd; Poly Changda Engineering Co Ltd
Application filed by Suzhou Shangliwei Technology Co Ltd and Poly Changda Engineering Co Ltd
Priority: CN202410494260.2A
Publication: CN118429528A


Classifications

    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]


Abstract

The application belongs to the technical field of three-dimensional reconstruction and provides a method, a device and equipment for three-dimensional reconstruction of a scene. The method comprises the following steps: acquiring a plurality of observation images, each observation image containing a target scene; generating a target surface mesh in the signed distance function (SDF) grid corresponding to the target scene by means of differentiable surface extraction; determining a plurality of Gaussian objects for each triangular patch according to a Gaussian appearance model, preset centroid coordinates and the vertex coordinates of each triangular patch, and binding these Gaussian objects to the target surface mesh to obtain a hybrid representation of the surface mesh with bound Gaussian objects; and rendering the Gaussian objects according to the camera view angle corresponding to each observation image to generate a rendered image for each observation image. By adding differentiable surface extraction to the traditional Gaussian splatting model, the hybrid representation of Gaussian objects and surface mesh is obtained directly, so this method of three-dimensional scene reconstruction is more efficient than the traditional approach.

Description

Method, device and equipment for three-dimensional reconstruction of scene
Technical Field
The application belongs to the technical field of three-dimensional reconstruction, and particularly relates to a method, a device and equipment for three-dimensional reconstruction of a scene.
Background
Currently, the output representation of a three-dimensional scene reconstruction may be a surface mesh (Mesh), a neural radiance field (NeRF) or a Gaussian splatting model (3D Gaussian Splatting, 3DGS). On image rendering tasks, images rendered from a Mesh are not of high enough quality and NeRF rendering is slow, while the 3DGS model offers both rendering speed and rendering quality. However, because the 3DGS model is composed of unstructured "Gaussian" objects, it is far less controllable than a structured Mesh and is difficult to apply to tasks such as scene editing, virtual reality and physical simulation. Higher performance and controllability can therefore be achieved through a hybrid representation that binds Gaussian objects to a Mesh within the 3DGS model.
However, when the prior art performs three-dimensional reconstruction of a scene with a Gaussian splatting model, extracting the 3DGS and the Mesh from an image sequence uses a two-stage method: first, the Gaussian objects are trained with the original 3DGS image loss plus a regularization loss; the trained Gaussian objects are then treated as a point cloud, from which a Mesh is obtained by Poisson reconstruction while the Gaussian objects are discarded; new Gaussian objects are then bound to the Mesh, the image loss is recomputed and the new Gaussian objects are retrained; finally, the hybrid representation of the Mesh with bound Gaussian objects is obtained. Clearly, this two-stage extraction makes the three-dimensional reconstruction process significantly inefficient.
Disclosure of Invention
The embodiments of the application provide a method, a device and equipment for three-dimensional reconstruction of a scene, which can solve the inefficiency caused by the two-stage extraction scheme of the traditional Gaussian splatting model, in which Gaussian objects are first trained to obtain a surface mesh and new Gaussian objects are then trained on that mesh.
In a first aspect, an embodiment of the present application provides a method for three-dimensional reconstruction of a scene, where the method includes:
acquiring a plurality of observation images, wherein each of the observation images includes a target scene;
generating a target surface mesh in the signed distance function (SDF) grid corresponding to the target scene by means of differentiable surface extraction, wherein the SDF grid is a data structure storing geometric information of the target scene, and the target surface mesh comprises a plurality of triangular patches and is used to indicate the surface shape of the target scene;
determining a plurality of Gaussian objects corresponding to each triangular patch according to a Gaussian appearance model, preset centroid coordinates and the vertex coordinates of each triangular patch;
and rendering the plurality of Gaussian objects according to the camera view angle corresponding to each observation image to generate a rendered image corresponding to each observation image.
In a possible implementation of the first aspect, generating the target surface mesh in the signed distance function (SDF) grid corresponding to the target scene by differentiable surface extraction includes:
determining, according to the plurality of observation images, the SDF values corresponding to a plurality of voxels in the SDF grid corresponding to the target scene;
generating a plurality of triangular patches in each voxel by differentiable surface extraction according to preset generation conditions and the SDF value of each voxel;
and merging all the triangular patches to generate the corresponding target surface mesh.
In a possible implementation of the first aspect, determining the plurality of Gaussian objects corresponding to each triangular patch according to the Gaussian appearance model, the preset centroid coordinates and the vertex coordinates of each triangular patch includes:
determining the center positions of the plurality of Gaussian objects corresponding to each triangular patch according to the preset centroid coordinates and the vertex coordinates of each triangular patch in the target surface mesh;
determining the covariance of the plurality of Gaussian objects corresponding to each triangular patch according to a preset linear transformation matrix and the vertex coordinates of each triangular patch in the target surface mesh;
determining the colors of the plurality of Gaussian objects corresponding to each triangular patch according to the Gaussian appearance model and the center positions of the Gaussian objects;
and forming the plurality of Gaussian objects corresponding to each triangular patch based on a preset opacity, the center positions, the covariances and the colors.
In a possible implementation of the first aspect, determining the center positions of the plurality of Gaussian objects corresponding to each triangular patch according to the preset centroid coordinates and the vertex coordinates of each triangular patch in the target surface mesh includes:
determining the center positions of the plurality of Gaussian objects corresponding to each triangular patch based on the preset centroid coordinates, the vertex coordinates of each triangular patch and a preset center-position calculation formula;
the preset center-position calculation formula being:
μ_i = bc_i [v_1, v_2, v_3]^T,  i = 1, 2, 3, ..., K
wherein μ_i is the center position of the i-th Gaussian object, bc_i is the i-th preset centroid coordinate (a row vector of barycentric weights), {v_1, v_2, v_3} are the vertex coordinates of the triangular patch (v_1, v_2 and v_3 being its first, second and third vertices), i indexes the Gaussian objects, and K is the number of Gaussian objects bound to each patch.
In a possible implementation of the first aspect, determining the covariance of the plurality of Gaussian objects corresponding to each triangular patch according to a preset linear transformation matrix and the vertex coordinates of each triangular patch in the target surface mesh includes:
determining, according to a preset local coordinate system and the vertex coordinates of each triangular patch, the preset linear transformation matrix and the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system;
determining the covariance of the plurality of Gaussian objects based on the coordinate-system rotation matrix, a preset covariance matrix of the Gaussian objects in the local coordinate system, the preset linear transformation matrix and a preset covariance calculation formula;
the preset covariance calculation formula being:
Σ = R_t2w M Σ_e M^T R_t2w^T
wherein Σ is the covariance of the plurality of Gaussian objects, R_t2w is the coordinate-system rotation matrix, M is the preset linear transformation matrix, and Σ_e is the preset covariance matrix of the Gaussian objects in the local coordinate system.
In a possible implementation of the first aspect, determining the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system according to a preset local coordinate system and the vertex coordinates of each triangular patch includes:
determining the first, second and third axis directions of the preset local coordinate system according to a vertex coordinate of the triangular patch, the preset coordinate origin of the preset local coordinate system and the preset direction-determination formula of the preset local coordinate system;
determining the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system according to the first, second and third axis directions of the preset local coordinate system;
the preset direction-determination formula of the preset local coordinate system being:
t_1 := Norm{(v_2 - v_1) × (v_3 - v_1)}
t_2 := Norm{(v_2 - v_1)}
t_3 := Norm{(t_1 × t_2)}
and the coordinate-system rotation matrix being:
R_t2w = [t_1, t_2, t_3]
wherein v_1 is the preset coordinate origin, t_1 is the first axis direction, t_2 is the second axis direction, t_3 is the third axis direction, Norm denotes vector normalization, and × denotes the vector cross product.
In a possible implementation of the first aspect, determining the preset linear transformation matrix according to a preset local coordinate system and the vertex coordinates of each triangular patch includes:
taking the edge formed by the first and second vertices of each triangular patch in the target surface mesh as one side of an equilateral triangle;
determining the preset linear transformation matrix according to the side length of the equilateral triangle, the coordinates of the vertices of each triangular patch in the local coordinate system and the preset covariance matrix of the Gaussian objects in the local coordinate system;
the preset covariance matrix of the Gaussian objects in the local coordinate system being:
Σ_e = diag(ε, r², r²)
and the preset linear transformation matrix being:
M = [[1, 0, 0],
     [0, 1, (2(v'_3)_2 - l) / (√3 l)],
     [0, 0, 2(v'_3)_3 / (√3 l)]]
wherein Σ_e is the preset covariance matrix of the Gaussian objects in the local coordinate system, ε is the variance of the Gaussian objects along the first axis direction, r is the variance of the Gaussian objects along the second and third axis directions, M is the preset linear transformation matrix, l is the side length of the equilateral triangle with l = |v_2 - v_1|, ⟨·,·⟩ denotes the vector inner product, v'_3 = [0, ⟨v_3, t_2⟩, ⟨v_3, t_3⟩]^T is the coordinate of v_3 in the local coordinate system, and (v'_3)_i is the i-th element of v'_3.
In a possible implementation of the first aspect, after generating the rendered image corresponding to each observation image, the method further includes:
calculating the loss between each observation image and its corresponding rendered image based on a preset loss function;
and optimizing the SDF values in the SDF grid and the neural-network weights in the Gaussian appearance model by a preset gradient-descent scheme according to the loss, obtaining the optimized SDF grid and the optimized Gaussian appearance model;
the preset loss function being:
L(I_pred, I_gt) = (1 - λ) L_1(I_pred, I_gt) + λ L_D-SSIM(I_pred, I_gt)
wherein I_pred is the rendered image, I_gt is the observation image, λ is a weight hyperparameter, L_1 is the standard L1 loss function, L_D-SSIM is the standard D-SSIM loss function, and L is the loss between the observation image and the rendered image.
In a second aspect, an embodiment of the present application provides a device for three-dimensional reconstruction of a scene, including:
an observation image acquisition module, configured to acquire a plurality of observation images, wherein each of the observation images includes a target scene;
a surface mesh generation module, configured to generate a target surface mesh in the signed distance function (SDF) grid corresponding to the target scene by differentiable surface extraction, wherein the SDF grid is a data structure storing geometric information, and the target surface mesh comprises a plurality of triangular patches and is used to indicate the surface shape of the target scene;
a Gaussian object determination module, configured to determine a plurality of Gaussian objects corresponding to each triangular patch according to a Gaussian appearance model, preset centroid coordinates and the vertex coordinates of each triangular patch;
and a rendered image generation module, configured to render the plurality of Gaussian objects according to the camera view angle corresponding to each observation image and generate a rendered image corresponding to each observation image.
In a third aspect, an embodiment of the present application provides equipment for three-dimensional reconstruction of a scene, including a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method of three-dimensional reconstruction of a scene described in any one of the above when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of three-dimensional reconstruction of a scene described in any one of the above.
In a fifth aspect, embodiments of the present application provide a computer program product which, when run on a terminal device, causes the terminal device to perform the method of three-dimensional reconstruction of a scene described in any one of the above.
Compared with the prior art, the embodiments of the application have the following beneficial effects. A plurality of observation images are acquired, each including a target scene; a target surface mesh is generated in the signed distance function (SDF) grid corresponding to the target scene by differentiable surface extraction; a plurality of Gaussian objects corresponding to each triangular patch are then determined according to the Gaussian appearance model, preset centroid coordinates and the vertex coordinates of each triangular patch, and these Gaussian objects are bound to the target surface mesh to obtain a hybrid representation of the surface mesh with bound Gaussian objects; finally, the Gaussian objects are rendered according to the camera view angle corresponding to each observation image to generate the corresponding rendered images. By adding differentiable surface extraction to the traditional Gaussian splatting model, the hybrid representation of Gaussian objects and surface mesh is obtained directly; compared with the traditional two-stage extraction scheme, this three-dimensional scene reconstruction is more efficient.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method for three-dimensional reconstruction of a scene according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a preset local coordinate system on a triangular patch according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a preset local coordinate system on a triangular patch according to an embodiment of the present application after adaptive transformation;
FIG. 4 is a flow chart of a method for three-dimensional reconstruction of a scene according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for three-dimensional reconstruction of a scene according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus for three-dimensional reconstruction of a scene according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for three-dimensional reconstruction of a scene according to an embodiment of the application. The method comprises the following steps:
s11, acquiring a plurality of observation images; wherein each observation image includes a target scene;
S12, generating a target surface grid in a symbol distance function SDF grid corresponding to the target scene in a differentiable surface extraction mode; wherein the symbol distance function SDF grid is a data structure for storing geometric information of the target scene; the target surface grid comprises a plurality of triangular patches, and is used for indicating the surface shape of a target scene;
S13, determining a plurality of Gaussian objects corresponding to each triangular patch according to the Gaussian appearance model, a preset centroid coordinate and a vertex coordinate of each triangular patch;
and S14, rendering a plurality of Gaussian objects according to the camera view angle corresponding to each observation image, and generating a rendering image corresponding to each observation image.
It should be noted that the method is mainly used to realize three-dimensional reconstruction of a static scene from multiple image observations, where the result of the reconstruction includes both a Gaussian splatting model (3DGS) and a surface mesh (Mesh).
The Gaussian splatting model (3D Gaussian Splatting, 3DGS) describes the visual information of a scene using a number of objects called "Gaussians", where each Gaussian comprises a Gaussian density function and visual information. Drawing the iso-surface of the Gaussian density function at a certain density value, a Gaussian can be regarded as an ellipsoid in space with opacity and color. Given a camera view angle, each Gaussian is projected to 2D and composited using the splatting technique, rendering the Gaussians into an image. The Gaussian splatting model models the scene explicitly (i.e., the scene is expressed as ellipsoids) and, together with a rendering program highly optimized for the GPU, achieves both rendering quality and rendering speed.
The Gaussian density function is defined as:
G(x) = exp(-(1/2) (x - μ)^T Σ^{-1} (x - μ))   (1)
where x ∈ R³ is a query point in three-dimensional space, μ ∈ R³ is the center position of the Gaussian, and Σ ∈ R^{3×3} is the covariance matrix of the Gaussian. The visual information of a Gaussian object is quantized into an opacity α ∈ R and a color c ∈ R^D, where the color is described by spherical harmonic coefficients and the vector dimension D is determined by the precision requirement.
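By way of illustration, a minimal NumPy sketch of evaluating such a Gaussian density at a query point (the numeric values are arbitrary examples, not taken from the patent):

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Unnormalized 3D Gaussian: G(x) = exp(-0.5 (x - mu)^T Sigma^-1 (x - mu))."""
    d = x - mu
    return float(np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d))

mu = np.zeros(3)                       # Gaussian center
sigma = np.diag([1e-4, 0.04, 0.04])    # covariance: flat along the first axis
print(gaussian_density(np.array([0.0, 0.1, 0.0]), mu, sigma))  # ~0.88
```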
A surface mesh (Mesh) is a polygonal mesh structure consisting of vertices, edges and faces, used to represent and process three-dimensional shapes or objects. In a surface mesh, the vertices are points in space that define the basic positions of the mesh; each edge connects two vertices and defines the topology of the mesh; each face is composed of several vertices and edges, typically a triangle or quadrilateral, and together the faces form the surface of the object.
When the output representation of the three-dimensional reconstruction of a scene is a hybrid representation of Gaussian objects and a surface mesh, higher rendering quality, speed and controllability can be achieved.
In step S11, the observation images may be images of the target scene acquired by a single camera from different view angles, or images of the target scene acquired by multiple cameras at one or more times. Each observation image includes the target scene, i.e., the object that needs to be reconstructed in three dimensions.
In step S12, it should be noted that differentiable surface extraction is a method for extracting a discrete surface mesh from continuous geometric information (such as an implicit function or voxel data). It can handle continuous changes of the surface shape and provides differential information about the surface (such as normals and curvature), which benefits subsequent rendering, physical simulation and shape optimization. By analyzing the geometric characteristics of the input data, the specific position of the surface is determined according to preset conditions. In this embodiment, the differentiable surface extraction method may be FlexiCubes or DMTet; the specific method is not limited.
The signed distance function (SDF) grid is a data structure that stores geometric information for computing the signed distance from any point in space to the shape boundary. The SDF grid contains N³ nodes, each carrying a real value, the SDF value, representing the signed distance from the node location to the nearest surface: negative when the node is inside the object, positive when it is outside, and zero on the object boundary. The SDF grid contains a plurality of voxels, each formed by eight nodes storing the corresponding SDF values, thereby providing detailed geometric information about the scene boundary. The SDF grid in this embodiment may be a coarse model of the target scene or a randomly initialized model.
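As a concrete sketch of this data structure (illustrative, not the patent's layout), an SDF grid can be held as an N³ array of node values, with each voxel reading the eight values at its corners:

```python
import numpy as np

N = 64
# SDF values at the grid nodes: negative inside, positive outside, zero on the
# surface. An analytic sphere stands in here for a coarse scene model.
idx = np.stack(np.meshgrid(*[np.arange(N)] * 3, indexing="ij"), axis=-1)
sdf = np.linalg.norm(idx - np.array([N / 2] * 3), axis=-1) - N / 4

def voxel_corner_values(i, j, k):
    """Return the eight node SDF values of voxel (i, j, k)."""
    return sdf[i:i + 2, j:j + 2, k:k + 2].reshape(8)

print(voxel_corner_values(0, 0, 0))   # all positive: this corner voxel is outside
```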
Triangular patches are the basic elements of a surface mesh: two-dimensional geometric figures formed by three vertices and three edges. Each triangular patch defines a local planar area, and combining a large number of triangular patches forms a complex surface mesh. The target surface mesh is the surface mesh generated in the SDF grid corresponding to the target scene and indicates the surface shape of the target scene.
Specifically, the SDF grid storing the geometric information of the target scene is analyzed by the differentiable surface extraction method, thereby generating a surface mesh, i.e., the target surface mesh. This target surface mesh indicates the surface shape of the target scene and contains a plurality of triangular patches. This process combines the accuracy of differentiable surface extraction with the rich geometric information of the SDF grid.
In step S13, note that in three-dimensional graphics the centroid generally refers to the center of mass of a shape or the average position of discrete points. The preset centroid coordinates are centroid coordinates set when binding the plurality of Gaussian objects and are used to determine specific positions on a patch. The Gaussian appearance model is a probabilistic model describing the texture or color variation of an object surface, capturing the statistical distribution of properties such as color and brightness. In this embodiment, the Gaussian appearance model is used to maintain the color information of the scene.
Specifically, the center positions and covariances of the plurality of Gaussian objects corresponding to each triangular patch are calculated from the preset centroid coordinates and the vertex coordinates of the patch; the colors of the Gaussian objects are obtained by querying the Gaussian appearance model; and the opacity of the Gaussian objects is set to a fixed value. This determines all parameters of the Gaussian objects, namely center position, covariance, color and opacity; that is, a plurality of Gaussian objects are bound to each triangular patch.
In step S14, the camera view angle is the angle and manner in which the whole scene is viewed from the position of the camera or observer; different view angles give different visual results. Each observation image corresponds to one camera view angle.
Specifically, the camera view angle corresponding to each observation image is determined, each Gaussian object is transformed according to that view angle, and the transformed Gaussian objects are rendered through a standard graphics rendering pipeline to obtain the rendered image corresponding to each observation image.
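The patent leaves the splatting step to the standard graphics pipeline; in the usual 3DGS formulation, the per-view transformation projects each Gaussian's covariance into image space roughly as sketched below, where W is the world-to-camera rotation and J is the local affine Jacobian of the pinhole projection (this is the standard technique, not text from the patent):

```python
import numpy as np

def perspective_jacobian(t, fx, fy):
    """Local affine approximation of the pinhole projection at camera-space point t."""
    tx, ty, tz = t
    return np.array([[fx / tz, 0.0, -fx * tx / tz ** 2],
                     [0.0, fy / tz, -fy * ty / tz ** 2],
                     [0.0, 0.0, 0.0]])

def project_covariance(sigma_world, W, t, fx, fy):
    """2D splat covariance: Sigma' = J W Sigma W^T J^T (keep the 2x2 image block)."""
    J = perspective_jacobian(t, fx, fy)
    return (J @ W @ sigma_world @ W.T @ J.T)[:2, :2]
```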
It will be appreciated that a plurality of observation images are acquired, each including a target scene; a target surface mesh is generated in the signed distance function (SDF) grid corresponding to the target scene by differentiable surface extraction; a plurality of Gaussian objects corresponding to each triangular patch are determined according to the Gaussian appearance model, preset centroid coordinates and the vertex coordinates of each triangular patch, and bound to the target surface mesh to obtain a hybrid representation of the surface mesh with bound Gaussian objects; the Gaussian objects are then rendered according to the camera view angle corresponding to each observation image to generate the corresponding rendered images. By adding differentiable surface extraction to the traditional Gaussian splatting model, the hybrid representation of Gaussian objects and surface mesh is obtained; compared with the traditional two-stage extraction scheme, this three-dimensional scene reconstruction is more efficient.
In one possible implementation, generating the target surface mesh in the signed distance function (SDF) grid corresponding to the target scene by differentiable surface extraction includes:
determining, according to the plurality of observation images, the SDF values corresponding to a plurality of voxels in the SDF grid corresponding to the target scene;
generating a plurality of triangular patches in each voxel by differentiable surface extraction according to preset generation conditions and the SDF value of each voxel;
and merging all triangular patches to generate the corresponding target surface mesh.
A voxel is the smallest unit of digital data in a three-dimensional space division. The SDF grid contains a plurality of voxels, each formed by eight nodes storing the corresponding SDF values, thereby providing detailed geometric information about the scene boundary. The preset generation conditions are the conditions and parameters set for generating triangular patches from the SDF values of a voxel. A typical preset generation condition is that an edge of the voxel contains an intersection point when the two SDF values at its endpoints have opposite signs (one positive, one negative); the exact position of the zero crossing, i.e., the intersection point, is inferred under the preset generation conditions, and all intersection points are connected to generate the triangular patches.
Specifically, the SDF values of the voxels in the SDF grid corresponding to the target scene are determined from the observation images; that is, the SDF grid stores the geometric information of the target scene, and each voxel in the grid contains the SDF values of its corresponding nodes. Then, the SDF value of each voxel is analyzed by the differentiable surface extraction method, and a certain number of triangular patches are generated in each voxel according to the preset generation conditions. Finally, all triangular patches are merged to form the complete target surface mesh. In this embodiment, 0 to 4 triangular patches may be generated per voxel; the number of triangular patches is not limited.
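A minimal sketch of the sign-change test and the linear interpolation that locates the zero crossing on one voxel edge, the Marching-Cubes-style core that differentiable extractors such as FlexiCubes and DMTet build on (variable names are illustrative):

```python
import numpy as np

def edge_zero_crossing(p0, p1, s0, s1):
    """If the SDF changes sign along edge (p0, p1), return the interpolated
    surface point; linear interpolation keeps the position differentiable
    with respect to the node SDF values s0 and s1."""
    if s0 * s1 >= 0:          # same sign: no surface crossing on this edge
        return None
    t = s0 / (s0 - s1)        # fraction along the edge where the SDF is zero
    return p0 + t * (p1 - p0)

p = edge_zero_crossing(np.zeros(3), np.array([1.0, 0.0, 0.0]), -0.2, 0.6)
print(p)   # -> [0.25 0.   0.  ]
```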
In one possible implementation, determining the plurality of Gaussian objects corresponding to each triangular patch according to the Gaussian appearance model, the preset centroid coordinates and the vertex coordinates of each triangular patch includes:
determining the center positions of the plurality of Gaussian objects corresponding to each triangular patch according to the preset centroid coordinates and the vertex coordinates of each triangular patch in the target surface mesh;
determining the covariance of the plurality of Gaussian objects corresponding to each triangular patch according to a preset linear transformation matrix and the vertex coordinates of each triangular patch in the target surface mesh;
determining the colors of the plurality of Gaussian objects corresponding to each triangular patch according to the Gaussian appearance model and the center positions of the Gaussian objects;
and forming the plurality of Gaussian objects corresponding to each triangular patch based on a preset opacity, the center positions, the covariances and the colors.
It should be noted that a Gaussian object comprises the key attributes of center position, covariance, color and opacity, which together define its position and appearance in three-dimensional space. The center position is the geometric center (centroid) of the Gaussian object and represents its position in the three-dimensional scene. In this embodiment, the center positions of the Gaussian objects are determined from the preset centroid coordinates and the vertex coordinates of each triangular patch in the target surface mesh.
The covariance of a Gaussian object describes its shape and orientation in the three-dimensional scene. During three-dimensional reconstruction, the covariance matrix conveys the extent of the Gaussian object in each direction and the correlations between those directions. In this embodiment, the covariances of the Gaussian objects are determined from the preset linear transformation matrix and the vertex coordinates of each triangular patch in the target surface mesh; the preset linear transformation matrix is a linear transformation that adapts the covariance matrix of the Gaussian object in the local coordinate system.
The appearance information of a Gaussian object comprises color and opacity. Gaussian objects are given specific color information to simulate the colors of the scene; in this embodiment, the color of each Gaussian object is determined by the Gaussian appearance model. The opacity of a Gaussian object determines how visible it is when rendered; in three-dimensional reconstruction, adjusting the opacity controls the occlusion relationships and visual effects between objects in the scene. In this embodiment, the opacity is set to a fixed value, the preset opacity α = 1.
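Collecting the four attributes, one bound Gaussian object can be represented by a simple container such as the following sketch (field names and shapes are assumptions for illustration):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class BoundGaussian:
    mu: np.ndarray          # (3,) center position on the triangular patch
    sigma: np.ndarray       # (3, 3) covariance in the global coordinate system
    sh_coeffs: np.ndarray   # (D,) spherical harmonic color coefficients
    alpha: float = 1.0      # opacity fixed to the preset value alpha = 1
```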
Specifically, in this embodiment a Gaussian appearance model A is used to maintain the color information of the scene. For a Gaussian object with center position μ, the color c is set to:
c = A(μ) = NN(Enc(μ))   (2)
where Enc is an encoder mapping the three-dimensional position into a high-dimensional space (HashGrid or sinusoidal encodings may be used), and NN is a neural network that takes the encoded center position as input and outputs the spherical harmonic coefficients, i.e., the output color. The network structure of NN may use a 2-layer MLP.
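A minimal PyTorch sketch of such an appearance model with a sinusoidal encoder and a 2-layer MLP; the layer width, number of frequencies and spherical-harmonic dimension are illustrative choices, not values from the patent:

```python
import torch
import torch.nn as nn

class SinusoidalEncoding(nn.Module):
    """Enc: maps a 3D position to [sin(2^k pi x), cos(2^k pi x)] features."""
    def __init__(self, n_freqs: int = 6):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs) * torch.pi)

    def forward(self, x):                       # x: (B, 3)
        xf = x[..., None] * self.freqs          # (B, 3, n_freqs)
        return torch.cat([xf.sin(), xf.cos()], dim=-1).flatten(-2)

class GaussianAppearanceModel(nn.Module):
    """c = A(mu) = NN(Enc(mu)): center position in, SH color coefficients out."""
    def __init__(self, n_freqs: int = 6, hidden: int = 64, sh_dim: int = 48):
        super().__init__()                      # sh_dim: 16 SH bases x 3 channels
        self.enc = SinusoidalEncoding(n_freqs)
        self.nn = nn.Sequential(
            nn.Linear(3 * 2 * n_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, sh_dim),
        )

    def forward(self, mu):                      # mu: (B, 3) Gaussian centers
        return self.nn(self.enc(mu))

model = GaussianAppearanceModel()
colors = model(torch.rand(8, 3))                # (8, 48) SH coefficients
```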
In one possible implementation, determining the center positions of the plurality of Gaussian objects corresponding to each triangular patch according to the preset centroid coordinates and the vertex coordinates of each triangular patch in the target surface mesh includes:
determining the center positions of the plurality of Gaussian objects corresponding to each triangular patch based on the preset centroid coordinates, the vertex coordinates of each triangular patch and a preset center-position calculation formula;
the preset center-position calculation formula being:
μ_i = bc_i [v_1, v_2, v_3]^T,  i = 1, 2, 3, ..., K
wherein μ_i is the center position of the i-th Gaussian object, bc_i is the i-th preset centroid coordinate, {v_1, v_2, v_3} are the vertex coordinates of the triangular patch (v_1, v_2 and v_3 being its first, second and third vertices), i indexes the Gaussian objects, and K is the number of Gaussian objects per patch.
In particular, the target surface mesh may be represented as (V, F), comprising vertices V = {v_1, v_2, ..., v_V} and triangular patches F = {f_1, f_2, ..., f_F}, where each triangular patch f is a triple of vertex indices taking values in [1, ..., V]. Each triangular patch f can bind K Gaussian objects, whose center positions are calculated from the vertex coordinates {v_1, v_2, v_3} of the patch and K preset centroid coordinates, v_1, v_2 and v_3 being the first, second and third vertices of the patch. The preset center-position calculation formula is:
μ_i = bc_i [v_1, v_2, v_3]^T,  i = 1, 2, 3, ..., K   (3)
where μ_i is the center position of the i-th Gaussian object, bc_i is the i-th preset centroid coordinate, i indexes the Gaussian objects, and K is the number of Gaussian objects per patch.
Taking K = 3 as an example, the preset centroid coordinates may be set as shown in formula (4). In this embodiment, the selection of the preset centroid coordinates is not limited and may depend on the scene.
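A NumPy sketch of the center computation for K = 3; since the embodiment leaves the choice of formula (4) open, the barycentric values in bc below are purely illustrative:

```python
import numpy as np

# Vertex coordinates {v1, v2, v3} of one triangular patch, stacked as rows.
V = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

# K = 3 preset centroid coordinates bc_i (each row sums to 1); example values.
bc = np.array([[0.50, 0.25, 0.25],
               [0.25, 0.50, 0.25],
               [0.25, 0.25, 0.50]])

mu = bc @ V        # mu_i = bc_i [v1, v2, v3]^T: one Gaussian center per row
print(mu)
```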
In one possible implementation, determining the covariance of the plurality of Gaussian objects corresponding to each triangular patch according to the preset linear transformation matrix and the vertex coordinates of each triangular patch in the target surface mesh includes:
determining, according to a preset local coordinate system and the vertex coordinates of each triangular patch, the preset linear transformation matrix and the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system;
determining the covariance of the plurality of Gaussian objects based on the coordinate-system rotation matrix, a preset covariance matrix of the Gaussian objects in the local coordinate system, the preset linear transformation matrix and a preset covariance calculation formula;
the preset covariance calculation formula being:
Σ = R_t2w M Σ_e M^T R_t2w^T
wherein Σ is the covariance of the plurality of Gaussian objects, R_t2w is the coordinate-system rotation matrix, M is the preset linear transformation matrix, and Σ_e is the preset covariance matrix of the Gaussian objects in the local coordinate system.
When calculating the covariance of the Gaussian objects, the preset linear transformation matrix and the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system are first determined from the preset local coordinate system and the vertex coordinates of each triangular patch; the covariance of the Gaussian objects is then determined from the rotation matrix, the preset covariance matrix of the Gaussian objects in the local coordinate system and the preset linear transformation matrix.
Specifically, the preset local coordinate system is a local coordinate system defined on the triangular patch; a local coordinate system is a virtual coordinate system mainly used to describe the local environment. The global coordinate system, also called the world coordinate system, is the reference for describing the positions and orientations of all objects in the scene and provides the absolute position of an object. The preset linear transformation matrix is a linear transformation that adapts the covariance matrix of the Gaussian objects in the local coordinate system to the shape of the patch; the coordinate-system rotation matrix converts the local coordinate system into the global coordinate system; and the preset covariance matrix is the covariance matrix of a Gaussian object preset in the local coordinate system. Through the preset linear transformation matrix, the covariance matrix of the Gaussian object in the global coordinate system can be obtained, so that the Gaussian object adapts to the patch. The covariance matrix of the Gaussian objects in the global coordinate system is obtained from the preset covariance calculation formula (5):
Σ = R_t2w M Σ_e M^T R_t2w^T   (5)
where Σ is the covariance of the plurality of Gaussian objects, R_t2w is the coordinate-system rotation matrix, M is the preset linear transformation matrix, and Σ_e is the preset covariance matrix of the Gaussian objects in the local coordinate system.
In one possible implementation, determining the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system according to the preset local coordinate system and the vertex coordinates of each triangular patch includes:
determining the first, second and third axis directions of the preset local coordinate system according to a vertex coordinate of the triangular patch, the preset coordinate origin of the preset local coordinate system and the preset direction-determination formula of the preset local coordinate system;
determining the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system according to the first, second and third axis directions of the preset local coordinate system;
the preset direction-determination formula of the preset local coordinate system being:
t_1 := Norm{(v_2 - v_1) × (v_3 - v_1)}
t_2 := Norm{(v_2 - v_1)}
t_3 := Norm{(t_1 × t_2)}
and the coordinate-system rotation matrix being:
R_t2w = [t_1, t_2, t_3]
wherein v_1 is the preset coordinate origin, t_1 is the first axis direction, t_2 is the second axis direction, t_3 is the third axis direction, Norm denotes vector normalization, and × denotes the vector cross product.
Specifically, the preset local coordinate system is a local coordinate system defined on the triangular patch, as shown in fig. 2, which is a schematic diagram of the preset local coordinate system on a triangular patch according to an embodiment of the application. In fig. 2 the vertex coordinates of the triangular patch are {v_1, v_2, v_3}; v_1 is taken as the preset coordinate origin of the preset local coordinate system, and the first, second and third axis directions are determined by the direction-determination formula (6) of the preset local coordinate system. The first axis direction is the normal direction, i.e., the first axis is perpendicular to the triangular patch; the second and third axes are each perpendicular to the first axis:
t_1 := Norm{(v_2 - v_1) × (v_3 - v_1)}
t_2 := Norm{(v_2 - v_1)}
t_3 := Norm{(t_1 × t_2)}   (6)
where v_1 is the preset coordinate origin, t_1, t_2 and t_3 are the first, second and third axis directions, Norm denotes vector normalization, and × denotes the vector cross product.
Then, from the three axis directions of the preset local coordinate system, the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system is determined; through this rotation matrix, a Gaussian object can be converted from the local coordinate system to the global coordinate system. The coordinate-system rotation matrix is given by formula (7):
R_t2w = [t_1, t_2, t_3]   (7)
where R_t2w is the coordinate-system rotation matrix converting the local coordinate system into the global coordinate system.
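A NumPy sketch transcribing formulas (6) and (7) for one triangular patch:

```python
import numpy as np

def norm(v):
    return v / np.linalg.norm(v)

def local_frame(v1, v2, v3):
    """Rotation R_t2w = [t1, t2, t3] taking the patch's local frame to world."""
    t1 = norm(np.cross(v2 - v1, v3 - v1))   # patch normal
    t2 = norm(v2 - v1)                      # along the first edge
    t3 = norm(np.cross(t1, t2))             # completes the orthonormal frame
    return np.stack([t1, t2, t3], axis=1)   # columns are the axis directions

R_t2w = local_frame(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                    np.array([0.2, 0.8, 0.0]))
print(R_t2w @ R_t2w.T)   # ~ identity: the frame is orthonormal
```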
In one possible implementation, determining the preset linear transformation matrix according to the preset local coordinate system and the vertex coordinates of each triangular patch includes:
taking the edge formed by the first and second vertices of each triangular patch in the target surface mesh as one side of an equilateral triangle;
determining the preset linear transformation matrix according to the side length of the equilateral triangle, the coordinates of the vertices of each triangular patch in the local coordinate system and the preset covariance matrix of the Gaussian objects in the local coordinate system;
the preset covariance matrix of the Gaussian objects in the local coordinate system being:
Σ_e = diag(ε, r², r²)
and the preset linear transformation matrix being:
M = [[1, 0, 0],
     [0, 1, (2(v'_3)_2 - l) / (√3 l)],
     [0, 0, 2(v'_3)_3 / (√3 l)]]
wherein Σ_e is the preset covariance matrix of the Gaussian objects in the local coordinate system, ε is the variance of the Gaussian objects along the first axis direction, r is the variance of the Gaussian objects along the second and third axis directions, M is the preset linear transformation matrix, l is the side length of the equilateral triangle with l = |v_2 - v_1|, ⟨·,·⟩ denotes the vector inner product, v'_3 = [0, ⟨v_3, t_2⟩, ⟨v_3, t_3⟩]^T is the coordinate of v_3 in the local coordinate system, and (v'_3)_i is the i-th element of v'_3.
Specifically, for an equilateral triangle the covariance matrix of a Gaussian object in the local coordinate system can naturally be set to Σ_e = diag(ε, r², r²), where ε, the variance along the first axis direction, is a small number, meaning the Gaussian objects are flat along the normal direction, and r, the variance along the second and third axis directions, is a hyperparameter.
For irregular triangular patches, adaptation is achieved by applying the preset linear transformation matrix M to the preset covariance matrix Σ_e of the Gaussian object in the local coordinate system, as shown in fig. 3, which is a schematic diagram of the preset local coordinate system on a triangular patch after adaptive transformation according to an embodiment of the application. The preset linear transformation matrix M transforms the equilateral triangle containing the vertices v_1 and v_2 into the target triangle:
M = [[1, 0, 0],
     [0, 1, (2(v'_3)_2 - l) / (√3 l)],
     [0, 0, 2(v'_3)_3 / (√3 l)]]
where M is the preset linear transformation matrix, l = |v_2 - v_1| is the side length of the equilateral triangle, ⟨·,·⟩ denotes the vector inner product, v'_3 = [0, ⟨v_3, t_2⟩, ⟨v_3, t_3⟩]^T is the coordinate of v_3 in the local coordinate system, and (v'_3)_i is the i-th element of v'_3.
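Putting the pieces together, the following sketch builds M and the global covariance Σ = R_t2w M Σ_e M^T R_t2w^T for one patch. The closed form of M above is a reconstruction consistent with the surrounding definitions (the original formula image is not preserved in this text), so treat it as an assumption:

```python
import numpy as np

def norm(v):
    return v / np.linalg.norm(v)

def gaussian_covariance(v1, v2, v3, r=0.3, eps=1e-6):
    """Sigma = R_t2w M Sigma_e M^T R_t2w^T for one triangular patch (sketch)."""
    t1 = norm(np.cross(v2 - v1, v3 - v1))
    t2 = norm(v2 - v1)
    t3 = norm(np.cross(t1, t2))
    R = np.stack([t1, t2, t3], axis=1)               # R_t2w
    l = np.linalg.norm(v2 - v1)                      # shared edge length
    # v3 in local coordinates, taking v1 as the local origin.
    v3p = np.array([0.0, np.dot(v3 - v1, t2), np.dot(v3 - v1, t3)])
    # Assumed form of M: fixes the normal axis and the edge (v1, v2), and maps
    # the third vertex of the equilateral triangle onto v3 in local coordinates.
    M = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, (2.0 * v3p[1] - l) / (np.sqrt(3.0) * l)],
                  [0.0, 0.0, 2.0 * v3p[2] / (np.sqrt(3.0) * l)]])
    sigma_e = np.diag([eps, r ** 2, r ** 2])         # flat along the normal
    return R @ M @ sigma_e @ M.T @ R.T
```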
In one possible implementation, after generating the rendered image corresponding to each observation image, the method for three-dimensional reconstruction of a scene further includes:
calculating the loss between each observation image and its corresponding rendered image based on a preset loss function;
and optimizing the SDF values in the SDF grid and the neural-network weights in the Gaussian appearance model by a preset gradient-descent scheme according to the loss, obtaining the optimized SDF grid and the optimized Gaussian appearance model;
the preset loss function being:
L(I_pred, I_gt) = (1 - λ) L_1(I_pred, I_gt) + λ L_D-SSIM(I_pred, I_gt)
wherein I_pred is the rendered image, I_gt is the observation image, λ is a weight hyperparameter, L_1 is the standard L1 loss function, L_D-SSIM is the standard D-SSIM loss function, and L is the loss between the observation image and the rendered image.
The loss function quantifies the difference between the observation image and the rendered image; the preset loss function is selected in advance according to the reconstruction task or scene. In this embodiment, the standard L1 loss function and the standard D-SSIM loss function are used.
The preset gradient-descent scheme is a preset algorithm for optimizing the parameters of the SDF grid and of the Gaussian appearance model. Gradient descent has many variants, such as standard gradient descent and stochastic gradient descent; this embodiment does not limit the specific type.
Specifically, as shown in fig. 4, which is a flow chart of a method for three-dimensional reconstruction of a scene according to another embodiment of the application, after a rendered image is generated it is compared with the observation image to evaluate the accuracy and reliability of the Gaussian appearance model and the SDF grid. The loss between each observation image and its corresponding rendered image is computed with the preset loss function, and the SDF values in the SDF grid and the neural-network weights in the Gaussian appearance model are optimized by gradient descent so as to reduce the loss, thereby optimizing the SDF grid and the Gaussian appearance model.
After multiple optimization iterations of the SDF values and the appearance-network weights, once the loss has converged, the target surface mesh extracted in the final iteration and the dynamically generated Gaussian objects and their colors are retained, yielding the hybrid representation of surface mesh and Gaussian objects.
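A PyTorch sketch of one optimization step with this loss; the ssim helper is assumed to come from a third-party package such as pytorch-msssim, and render() is a placeholder for the differentiable splatting pipeline:

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim   # assumed third-party SSIM implementation

lam = 0.2   # weight hyperparameter lambda (illustrative value)

def hybrid_loss(I_pred, I_gt):
    """L = (1 - lambda) * L1 + lambda * D-SSIM, taking D-SSIM = 1 - SSIM."""
    l1 = F.l1_loss(I_pred, I_gt)
    d_ssim = 1.0 - ssim(I_pred, I_gt, data_range=1.0)
    return (1.0 - lam) * l1 + lam * d_ssim

# Optimizable parameters: SDF values on the grid and appearance-network weights.
sdf_values = torch.zeros(64, 64, 64, requires_grad=True)
appearance_net = torch.nn.Linear(36, 48)      # stand-in for the MLP above
opt = torch.optim.Adam([sdf_values, *appearance_net.parameters()], lr=1e-3)

# Inside the training loop (render() is a placeholder for the full pipeline):
#   I_pred = render(sdf_values, appearance_net, view)
#   loss = hybrid_loss(I_pred, I_gt)
#   opt.zero_grad(); loss.backward(); opt.step()
```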
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application.
Fig. 5 is a schematic structural diagram of a device for three-dimensional reconstruction of a scene according to an embodiment of the present application, and for convenience of explanation, only the portions related to the embodiment of the present application are shown.
Referring to fig. 5, the apparatus 2 for three-dimensional reconstruction of a scene of this embodiment includes:
an observation image acquisition module 21, configured to acquire a plurality of observation images, wherein each of the observation images includes a target scene;
a surface mesh generation module 22, configured to generate a target surface mesh in the signed distance function (SDF) grid corresponding to the target scene by differentiable surface extraction, wherein the SDF grid is a data structure storing geometric information, and the target surface mesh comprises a plurality of triangular patches and is used to indicate the surface shape of the target scene;
a Gaussian object determination module 23, configured to determine a plurality of Gaussian objects corresponding to each triangular patch according to a Gaussian appearance model, preset centroid coordinates and the vertex coordinates of each triangular patch;
and a rendered image generation module 24, configured to render the plurality of Gaussian objects according to the camera view angle corresponding to each observation image and generate a rendered image corresponding to each observation image.
It can be understood that in this embodiment, by adding a differentiable surface extraction algorithm to the traditional Gaussian splatting model, the surface mesh and Gaussian object representations can be obtained at the same time; compared with the traditional two-stage extraction method, the three-dimensional scene reconstruction of the present application is more efficient.
It should be noted that, because the information exchange and execution processes between the modules of the device 2 for three-dimensional reconstruction of a scene are based on the same conception as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.
The embodiment of the application also provides equipment for three-dimensional reconstruction of a scene, as shown in fig. 6, which is a schematic structural diagram of the equipment for three-dimensional reconstruction of a scene. Referring to fig. 6, the equipment 3 for three-dimensional reconstruction of a scene of this embodiment includes a memory 31, a processor 32 and a computer program stored in the memory 31 and executable on the processor 32; when executing the computer program, the processor 32 implements the steps of any of the above method embodiments for three-dimensional reconstruction of a scene.
The embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of each of the method embodiments described above.
Embodiments of the present application further provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform the steps of the method embodiments described above.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing relevant hardware through a computer program; the computer program may be stored in a computer readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, or a software distribution medium, for example a USB flash drive, a removable hard disk, or a magnetic or optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of modules or elements is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them; although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for three-dimensional reconstruction of a scene, comprising:
acquiring a plurality of observation images; wherein each of the observation images includes a target scene;
Generating a target surface mesh in a signed distance function (SDF) grid corresponding to the target scene by means of differentiable surface extraction; wherein the signed distance function (SDF) grid is a data structure storing geometric information of the target scene; the target surface mesh comprises a plurality of triangular patches and is used for indicating the surface shape of the target scene;
determining a plurality of Gaussian objects corresponding to each triangular patch according to a Gaussian appearance model, a preset centroid coordinate and a vertex coordinate of each triangular patch;
And rendering a plurality of Gaussian objects according to the camera view angle corresponding to each observation image to generate a rendering image corresponding to each observation image.
2. The method of three-dimensional reconstruction of a scene according to claim 1, wherein the generating of a target surface mesh in the signed distance function (SDF) grid corresponding to the target scene by means of differentiable surface extraction comprises:
Determining SDF values corresponding to a plurality of voxels in the signed distance function (SDF) grid corresponding to the target scene according to a plurality of the observed images;
Generating a plurality of triangular patches in each voxel by the differentiable surface extraction mode according to preset generation conditions and the SDF value of each voxel;
And merging all the triangular patches to generate the corresponding target surface grids.
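As an illustration of the voxel-to-patch step in claim 2, the sketch below extracts triangular patches from an SDF volume and merges them into one mesh. It uses scikit-image's marching cubes, which is non-differentiable, purely as a stand-in for the differentiable surface extraction of the claim; the grid resolution and the synthetic sphere SDF are assumptions for the example.

```python
import numpy as np
from skimage import measure

# Synthetic SDF grid standing in for the values estimated from observed images.
res = 64
xs = np.linspace(-1.0, 1.0, res)
x, y, z = np.meshgrid(xs, xs, xs, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5   # signed distance to a sphere of radius 0.5

# Marching cubes generates triangular patches in each voxel crossing the zero
# level set and merges them into one surface mesh (vertices + faces). The
# disclosure uses a *differentiable* extraction instead, so that gradients
# can flow back to the SDF values during training.
verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)
print(verts.shape, faces.shape)  # (N, 3) vertices, (M, 3) triangle vertex indices
```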
3. The method of three-dimensional reconstruction of a scene according to claim 1, wherein determining a plurality of gaussian objects corresponding to each of the triangular patches according to a gaussian appearance model, a preset centroid coordinate and a vertex coordinate of each of the triangular patches comprises:
Determining the center positions of a plurality of Gaussian objects corresponding to each triangular patch according to the preset centroid coordinates and the vertex coordinates of each triangular patch in the target surface grid;
According to a preset linear transformation matrix and vertex coordinates of each triangular patch in the target surface grid, determining covariance of a plurality of Gaussian objects corresponding to each triangular patch;
determining colors of a plurality of Gaussian objects corresponding to each triangular patch according to the Gaussian appearance model and the central positions of the Gaussian objects;
And forming a plurality of Gaussian objects corresponding to each triangular patch based on preset opacity, the center position, the covariance and the color.
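To make the composition in claim 3 concrete, a Gaussian object can be held as a simple record of opacity, center position, covariance and color; the field layout below is an illustrative assumption, not a structure from the disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianObject:
    """Hypothetical container for one Gaussian object of claim 3."""
    center: np.ndarray       # mu_i, from the preset centroid coordinates (claim 4)
    covariance: np.ndarray   # Sigma, from R_t2w, M and Sigma_e (claim 5)
    color: np.ndarray        # predicted by the Gaussian appearance model
    opacity: float = 1.0     # preset opacity
```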
4. The method for three-dimensional reconstruction of a scene according to claim 3, wherein determining the center positions of the plurality of gaussian objects corresponding to each triangular patch according to the preset centroid coordinates and the vertex coordinates of each triangular patch in the target surface mesh comprises:
determining the center positions of a plurality of Gaussian objects corresponding to each triangular patch based on the preset centroid coordinates, the vertex coordinates of each triangular patch and a preset center position calculation formula;
The preset center position calculation formula is as follows:
μ_i = bc_i [v_1, v_2, v_3]^T, i = 1, 2, ..., K
wherein μ_i is the center position of the i-th Gaussian object, bc_i is the i-th preset centroid (barycentric) coordinate vector, {v_1, v_2, v_3} are the vertex coordinates of the triangular patch, v_1 being its first vertex, v_2 its second vertex and v_3 its third vertex, i is the index of the Gaussian object, and K is the number of Gaussian objects, a natural number.
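A minimal NumPy sketch of the center-position formula in claim 4; each row of bc plays the role of one bc_i, and the specific preset weights below are illustrative assumptions.

```python
import numpy as np

def gaussian_centers(v1, v2, v3, bc):
    """mu_i = bc_i [v1, v2, v3]^T for i = 1..K.

    v1, v2, v3 : (3,) triangle vertex coordinates
    bc         : (K, 3) preset barycentric weights, each row summing to 1
    """
    V = np.stack([v1, v2, v3])   # (3, 3), one vertex per row
    return bc @ V                # (K, 3) Gaussian centers lying on the patch

# Illustrative preset weights: the centroid plus three mid-edge points.
bc = np.array([
    [1/3, 1/3, 1/3],
    [1/2, 1/2, 0.0],
    [1/2, 0.0, 1/2],
    [0.0, 1/2, 1/2],
])
mu = gaussian_centers(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                      np.array([0.0, 1.0, 0.0]), bc)
```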
5. The method of three-dimensional reconstruction of a scene as set forth in claim 4, wherein said determining covariance of a plurality of said gaussian objects corresponding to each of said triangular patches based on a preset linear transformation matrix and vertex coordinates of each triangular patch in said target surface mesh comprises:
determining, according to a preset local coordinate system and the vertex coordinates of each triangular patch, the preset linear transformation matrix and a coordinate system rotation matrix for converting the local coordinate system into the global coordinate system;
Determining covariance of a plurality of Gaussian objects based on the coordinate system rotation matrix, a preset covariance matrix of the Gaussian objects in a local coordinate system, the preset linear transformation matrix and a preset covariance calculation formula;
the preset covariance calculation formula is as follows:
Σ = R_t2w M Σ_e M^T R_t2w^T
wherein Σ is the covariance of the plurality of Gaussian objects, R_t2w is the coordinate system rotation matrix, M is the preset linear transformation matrix, and Σ_e is the preset covariance matrix of the Gaussian objects in the local coordinate system.
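A minimal sketch of the covariance composition of claim 5; the sandwich form is inferred from how the symbols are defined in the claim, and the 3×3 shapes are illustrative.

```python
import numpy as np

def world_covariance(R_t2w, M, Sigma_e):
    """Sigma = R_t2w M Sigma_e M^T R_t2w^T (form inferred from claim 5).

    R_t2w   : (3, 3) local-to-global rotation matrix (claim 6)
    M       : (3, 3) preset linear transformation matrix (claim 7)
    Sigma_e : (3, 3) preset local covariance, diag(eps, r**2, r**2)
    """
    A = R_t2w @ M
    return A @ Sigma_e @ A.T   # symmetric positive semi-definite by construction
```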
6. The method of three-dimensional reconstruction of a scene as set forth in claim 5, wherein the determining of the coordinate system rotation matrix for converting the local coordinate system into the global coordinate system according to the preset local coordinate system and the vertex coordinates of each of the triangular patches comprises:
Determining a first axis direction, a second axis direction and a third axis direction of the preset local coordinate system according to the vertex coordinates of the triangular patch, the preset coordinate origin of the preset local coordinate system and a preset direction determination formula of the preset local coordinate system; wherein the first vertex of the triangular patch serves as the preset coordinate origin of the preset local coordinate system;
Determining the coordinate system rotation matrix for converting the local coordinate system into the global coordinate system according to the first axis direction, the second axis direction and the third axis direction of the preset local coordinate system;
The preset coordinate system direction determining formula of the preset local coordinate system is as follows:
t_1 := Norm{(v_2 - v_1) × (v_3 - v_1)}
t_2 := Norm{(v_2 - v_1)}
t_3 := Norm{(t_1 × t_2)}
The coordinate system rotation matrix is:
R_t2w = [t_1, t_2, t_3]
wherein v_1 is the preset coordinate origin, t_1 is the first axis direction, t_2 is the second axis direction, t_3 is the third axis direction, Norm{·} denotes vector normalization, and × denotes the vector cross product.
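The frame construction in claim 6, sketched in NumPy: t_1 is the patch normal, t_2 follows the edge from v_1 to v_2, and t_3 completes the orthonormal frame; stacking the three axes as columns gives R_t2w.

```python
import numpy as np

def norm(v):
    return v / np.linalg.norm(v)

def rotation_t2w(v1, v2, v3):
    """R_t2w = [t1, t2, t3]; columns are the local axes in world coordinates."""
    t1 = norm(np.cross(v2 - v1, v3 - v1))  # first axis: patch normal
    t2 = norm(v2 - v1)                     # second axis: along the first edge
    t3 = norm(np.cross(t1, t2))            # third axis: completes the frame
    return np.column_stack([t1, t2, t3])
```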
7. The method of three-dimensional reconstruction of a scene as set forth in claim 6, wherein the determining of the preset linear transformation matrix according to the preset local coordinate system and the vertex coordinates of each of the triangular patches comprises:
Taking the length of the edge formed by the first vertex and the second vertex of each triangular patch in the target surface mesh as the side length of an equilateral triangle;
Determining the preset linear transformation matrix according to the side length of the equilateral triangle, the vertex coordinates of each triangular patch in the local coordinate system, and the preset covariance matrix of the Gaussian object in the local coordinate system;
the preset covariance matrix of the Gaussian object in the local coordinate system is:
Σ_e = diag(ε, r², r²)
The preset linear transformation matrix is as follows:
wherein Σ_e is the preset covariance matrix of the Gaussian object in the local coordinate system, ε is the variance of the plurality of Gaussian objects in the first axis direction, and r² is their variance in the second axis direction and the third axis direction; M is the preset linear transformation matrix, l is the side length of the equilateral triangle, l = |v_2 - v_1|, ⟨·,·⟩ denotes the vector inner product, v'_3 = [0, ⟨v_3, t_2⟩, ⟨v_3, t_3⟩]^T is the coordinate of v_3 in the local coordinate system, and v'_3,i is the i-th element of v'_3.
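The explicit form of M is not reproduced on this page. Under the natural reading of claim 7 (M maps a reference equilateral triangle of side length l onto the actual patch expressed in the local frame, leaving the normal axis untouched), one consistent candidate can be sketched as below; this construction, and the treatment of v'_3 as relative to v_1, are assumptions, not the claimed formula.

```python
import numpy as np

def linear_transform(v1, v2, v3, t2, t3):
    """Candidate M (an assumption; the claimed formula is not shown here).

    Maps the reference equilateral triangle with local vertices
    (0,0,0), (0,l,0), (0, l/2, sqrt(3)*l/2) onto the actual patch,
    leaving the first (normal) axis unchanged.
    """
    l = np.linalg.norm(v2 - v1)
    p3 = v3 - v1                             # v3 relative to the origin v1 (assumed)
    v3p = np.array([0.0, p3 @ t2, p3 @ t3])  # v'_3 in the local frame
    M = np.eye(3)
    # Solve the in-plane block so that the reference third vertex maps to v'_3;
    # the reference second vertex (0, l, 0) is already mapped to itself.
    M[1, 2] = (v3p[1] - l / 2.0) / (np.sqrt(3.0) * l / 2.0)
    M[2, 2] = v3p[2] / (np.sqrt(3.0) * l / 2.0)
    return M

# Local covariance of claim 7; eps and r are illustrative preset variances.
eps, r = 1e-4, 0.3
Sigma_e = np.diag([eps, r**2, r**2])
```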
8. The method of three-dimensional reconstruction of a scene according to claim 2, wherein after said generating a rendered image corresponding to each of said observed images, said method further comprises:
Calculating the loss between each observed image and the corresponding rendered image based on a preset loss function;
Optimizing the SDF value in the SDF grid and the neural network weight in the Gaussian appearance model in a preset gradient descent mode according to the loss to obtain the optimized SDF grid and the optimized Gaussian appearance model;
Wherein, the preset loss function is:
L(I_pred, I_gt) = (1 - λ)L_1(I_pred, I_gt) + λL_D-SSIM(I_pred, I_gt)
wherein I_pred is the rendered image, I_gt is the observed image, λ is a weight hyperparameter, L_1 is the standard L1 loss function, L_D-SSIM is the standard D-SSIM loss function, and L is the loss between the observed image and the rendered image.
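A PyTorch sketch of the loss in claim 8. The D-SSIM term below uses global image moments for brevity; standard Gaussian splatting pipelines use a windowed (e.g. 11×11 Gaussian) SSIM instead, so treat it as a simplified stand-in. The value λ = 0.2 is an illustrative choice of the weight hyperparameter.

```python
import torch
import torch.nn.functional as F

def d_ssim(pred, gt, c1=0.01**2, c2=0.03**2):
    """Simplified global D-SSIM = (1 - SSIM) / 2 (windowed SSIM in practice)."""
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(), gt.var()
    cov = ((pred - mu_p) * (gt - mu_g)).mean()
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p**2 + mu_g**2 + c1) * (var_p + var_g + c2))
    return (1.0 - ssim) / 2.0

def loss_fn(pred, gt, lam=0.2):
    """L = (1 - lambda) * L1 + lambda * L_D-SSIM, as in claim 8."""
    return (1 - lam) * F.l1_loss(pred, gt) + lam * d_ssim(pred, gt)
```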
9. An apparatus for three-dimensional reconstruction of a scene, comprising:
an observation image acquisition module for acquiring a plurality of observation images, wherein each of the observation images includes a target scene;
the surface mesh generating module is used for generating a target surface mesh in the signed distance function (SDF) grid corresponding to the target scene by means of differentiable surface extraction; wherein the signed distance function (SDF) grid is a data structure storing geometric information of the target scene; the target surface mesh comprises a plurality of triangular patches and is used for indicating the surface shape of the target scene;
the Gaussian object determining module is used for determining a plurality of Gaussian objects corresponding to each triangular patch according to the Gaussian appearance model, a preset centroid coordinate and a vertex coordinate of each triangular patch;
and the rendering image generation module is used for rendering the Gaussian objects according to the camera view angles corresponding to the observation images and generating rendering images corresponding to the observation images.
10. An apparatus for three-dimensional reconstruction of a scene, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 8 when executing the computer program.
CN202410494260.2A 2024-04-23 2024-04-23 Method, device and equipment for three-dimensional reconstruction of scene Pending CN118429528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410494260.2A CN118429528A (en) 2024-04-23 2024-04-23 Method, device and equipment for three-dimensional reconstruction of scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410494260.2A CN118429528A (en) 2024-04-23 2024-04-23 Method, device and equipment for three-dimensional reconstruction of scene

Publications (1)

Publication Number Publication Date
CN118429528A true CN118429528A (en) 2024-08-02

Family

ID=92315433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410494260.2A Pending CN118429528A (en) 2024-04-23 2024-04-23 Method, device and equipment for three-dimensional reconstruction of scene

Country Status (1)

Country Link
CN (1) CN118429528A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118864738A (en) * 2024-09-27 2024-10-29 国网山东省电力公司营销服务中心(计量中心) A method and system for generating a high-precision three-dimensional model of a single image of an energy storage device
CN119832158A (en) * 2024-12-24 2025-04-15 北京源络科技有限公司 Three-dimensional reconstruction method and device for hinged object, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN118429528A (en) Method, device and equipment for three-dimensional reconstruction of scene
CN114863038B (en) Real-time dynamic free visual angle synthesis method and device based on explicit geometric deformation
Denninger et al. 3d scene reconstruction from a single viewport
CN113593033B (en) A 3D model feature extraction method based on mesh subdivision structure
CN116385619B (en) Object model rendering method, device, computer equipment and storage medium
CN115115805A (en) Three-dimensional reconstruction model training method, device, equipment and storage medium
CN117333637B (en) Modeling and rendering method, device and equipment for three-dimensional scene
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
CN118736092A (en) A method and system for rendering virtual human at any viewing angle based on three-dimensional Gaussian splashing
CN120182507B (en) NeRF and 3DGS mixed representation-based large-scene lightweight three-dimensional reconstruction method
CN117541755A (en) RGB-D three-dimensional reconstruction-based rigid object virtual-real shielding method
Sharma et al. Volumetric rendering with baked quadrature fields
CN120298613A (en) A method for 3D reconstruction of global terrain of asteroids under illumination variation
Chen et al. Mesh2nerf: Direct mesh supervision for neural radiance field representation and generation
Vyatkin Method of binary search for image elements of functionally defined objects using graphics processing units
US20250225713A1 (en) Electronic device and method for restoring scene image of target view
CN119006741A (en) Three-dimensional reconstruction method, system, equipment and medium based on compressed symbol distance field
CN119600199A (en) A hierarchical object modeling and fast reconstruction method based on NeRF
Moorhead et al. Signal processing aspects of scientific visualization
CN118196281A (en) A triangular mesh extraction method based on segmentable neural radiation field
Saval-Calvo et al. Evaluation of sampling method effects in 3D non-rigid registration
RU2749749C1 (en) Method of synthesis of a two-dimensional image of a scene viewed from a required view point and electronic computing apparatus for implementation thereof
Zeng et al. Frequency-Aware Density Control via Reparameterization for High-Quality Rendering of 3D Gaussian Splatting
Zhang et al. Neural Implicit Representations for Multi-View Surface Reconstruction: A Survey
Armangeon et al. IRIS-VIS: A New Dataset for Visibility Estimation in an Industrial Environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination