
CN116805349A - Indoor scene reconstruction method, device, electronic equipment and media

Info

Publication number
CN116805349A
CN116805349A
Authority
CN
China
Prior art keywords
training
point
point cloud
scene
points
Legal status: Pending
Application number
CN202310544904.XA
Other languages
Chinese (zh)
Inventor
齐越
曲延松
王君义
段宛彤
王宇泽
Current Assignee
Beihang University
Original Assignee
Beihang University
Application filed by Beihang University
Priority to CN202310544904.XA
Publication of CN116805349A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G06T15/50 Lighting effects
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Graphics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)

Abstract

This application provides an indoor scene reconstruction method, device, electronic equipment and media. The method includes: obtaining training viewing-angle parameters from the training images and building a scene point cloud; on each ray that starts from the training viewpoint origin and passes through the position corresponding to a pixel of the training image, selecting multiple positions as training sampling points; determining the point cloud points of the scene point cloud within a predetermined range around the training sampling points as a training point set, and inputting the current input information of the training point set into the current scene reconstruction model to obtain the color and light field intensity of the training point set; after interpolating the color and light field intensity of the training sampling points, rendering the image under the training viewing-angle parameters; and training the current input information and the scene reconstruction model according to the loss between the rendered image and the training image under the same viewing-angle parameters. The method of this application solves the problem of poor-quality rendered images of indoor scenes when the number of training images is small.

Description

Indoor scene reconstruction method, device, electronic equipment and media

Technical Field

The present application relates to the field of virtual reality technology, and in particular to an indoor scene reconstruction method, device, electronic equipment and media.

Background

Virtual reality (VR) technology is centered on computer technology. It draws on and integrates the latest advances in three-dimensional graphics, multimedia, simulation, display, servo and other high technologies, and uses computers and related equipment to generate a virtual world with realistic three-dimensional visual, tactile, olfactory and other sensory experiences, giving people in that virtual world an immersive feeling. With the continuous development of social productivity and of science and technology, the demand for VR technology in all walks of life keeps growing; VR technology has made great progress and has gradually become a new field of science and technology. Novel view synthesis of indoor scenes plays an important role in the interaction between virtual reality and people, so this task has great potential in many VR applications, such as virtual tours of indoor scenes or visitor navigation.

In the existing technology, novel view synthesis of indoor scenes relies heavily on neural radiance fields, which use continuous multi-layer perceptrons to encode the radiance and density of the three-dimensional scene and reconstruct the light field through ray tracing.

In the above scheme, reconstructing a room-sized scene requires at least several hundred RGB images densely captured around the scene, which entails large labor and time costs and places high demands on shooting quality. If the number of input images is insufficient, the generated rendered images contain many holes and rendering errors; hence the quality of rendered indoor scene images is poor when the number of training images is small.

Summary of the Invention

This application provides an indoor scene reconstruction method, device, electronic equipment and media, to solve the problem of poor quality of rendered indoor scene images when the number of training images is small.

In one aspect, this application provides an indoor scene reconstruction method, including:

performing image collection indoors to obtain training images; and, based on structure-from-motion technology, obtaining the training viewing-angle parameters corresponding to each training image from the training images and building a scene point cloud, where the scene point cloud is a point cloud representing the structural information of the indoor scene;

on each ray that starts from the training viewpoint origin and passes through the position in the scene point cloud corresponding to each pixel of the training image under the training viewing-angle parameters, selecting multiple such positions as training sampling points; and, based on semantic prediction technology, determining the point cloud points of the scene point cloud located within a predetermined range around the training sampling points as a training point set, and obtaining the current input information of the training point set, where the input information includes spatial position, neural features and ray direction;

inputting the current input information of the training point set into the current scene reconstruction model to obtain the color and light field intensity of the training point set output by the scene reconstruction model; obtaining the color and light field intensity of the training sampling points by interpolation from the color and light field intensity of the training point set; and rendering the image under the training viewing-angle parameters from the color and light field intensity of the training sampling points;

training and correcting the current input information and the current scene reconstruction model according to the loss between the rendered image under the training viewing-angle parameters and the training image under the same parameters, until the trained input information and the trained scene reconstruction model are obtained.

Optionally, determining the point cloud points of the scene point cloud located within a predetermined range around the training sampling points as a training point set based on semantic prediction technology includes:

based on semantic prediction technology, obtaining the semantic information of each pixel of the training image under the training viewing-angle parameters and the semantic information of each point cloud point of the scene point cloud, where the semantic information of a pixel includes an object category, and the semantic information of a point cloud point includes an object category and a corresponding probability value;

for each training sampling point, obtaining all point cloud points of the scene point cloud located within a predetermined range around that training sampling point, computing the selection probability of each point cloud point within that range, and selecting a preset number of point cloud points in descending order of selection probability as the training point set, where computing the selection probability includes: if the object category of a point cloud point is the same as the object category of the pixel corresponding to the training sampling point, the selection probability of that point cloud point is one; if the object category of a point cloud point differs from the object category of the pixel corresponding to the training sampling point, the selection probability of that point cloud point is one minus the probability value corresponding to the object category of the pixel corresponding to that point cloud point.

Optionally, the neural features include a first neural feature and a second neural feature, and obtaining the current input information of the training point set includes:

inputting all the training images and the scene point cloud into a semantic prediction network to obtain the first neural feature of each point cloud point of the scene point cloud;

obtaining the training image corresponding to the training viewpoint origin closest to a point cloud point, and inputting that training image into a convolutional neural network to obtain the second neural feature of that point cloud point.

Optionally, the method further includes:

obtaining the target viewing-angle parameters of the scene to be reconstructed; and determining virtual pixels according to the target viewing-angle parameters;

on each ray that starts from the target viewpoint origin and passes through the position in the scene point cloud corresponding to each virtual pixel, selecting multiple such positions as target sampling points; and determining the point cloud points of the scene point cloud located within a predetermined range around the target sampling points as a target point set, and obtaining the input information of the target point set;

inputting the input information of the target point set into the scene reconstruction model to obtain the color and light field intensity of the target point set output by the scene reconstruction model; obtaining the color and light field intensity of the target sampling points by interpolation from the color and light field intensity of the target point set; and rendering the image under the target viewing-angle parameters from the color and light field intensity of the target sampling points.

Optionally, determining the point cloud points of the scene point cloud located within a predetermined range around the target sampling points as a target point set includes:

obtaining all point cloud points of the scene point cloud located within a predetermined range around the target sampling points, and selecting, for each sampling point, a predetermined number of the closest point cloud points as the target point set.

Optionally, the training images satisfy the following conditions: the training images contain the room surfaces; and the overlap between the training images under any two adjacent training viewing-angle parameters is not less than thirty percent of the entire image size.

In another aspect, this application provides an indoor scene reconstruction device, including:

an acquisition module, configured to collect images indoors to obtain training images, and, based on structure-from-motion technology, obtain the training viewing-angle parameters corresponding to each training image from the training images and build a scene point cloud, where the scene point cloud is a point cloud representing the structural information of the indoor scene;

a sampling module, configured to select, on each ray that starts from the training viewpoint origin and passes through the position in the scene point cloud corresponding to each pixel of the training image under the training viewing-angle parameters, multiple such positions as training sampling points; and, based on semantic prediction technology, determine the point cloud points of the scene point cloud located within a predetermined range around the training sampling points as a training point set and obtain the current input information of the training point set, where the input information includes spatial position, neural features and ray direction;

a rendering module, configured to input the current input information of the training point set into the current scene reconstruction model to obtain the color and light field intensity of the training point set output by the scene reconstruction model; obtain the color and light field intensity of the training sampling points by interpolation from the color and light field intensity of the training point set; and render the image under the training viewing-angle parameters from the color and light field intensity of the training sampling points;

a correction module, configured to train and correct the current input information and the current scene reconstruction model according to the loss between the rendered image under the training viewing-angle parameters and the training image under the same parameters, until the trained input information and the trained scene reconstruction model are obtained.

Optionally, the sampling module is specifically configured to:

based on semantic prediction technology, obtain the semantic information of each pixel of the training image under the training viewing-angle parameters and the semantic information of each point cloud point of the scene point cloud, where the semantic information of a pixel includes an object category, and the semantic information of a point cloud point includes an object category and a corresponding probability value;

for each training sampling point, obtain all point cloud points of the scene point cloud located within a predetermined range around that training sampling point, compute the selection probability of each point cloud point within that range, and select a preset number of point cloud points in descending order of selection probability as the training point set, where computing the selection probability includes: if the object category of a point cloud point is the same as the object category of the pixel corresponding to the training sampling point, the selection probability of that point cloud point is one; if the object category of a point cloud point differs from the object category of the pixel corresponding to the training sampling point, the selection probability of that point cloud point is one minus the probability value corresponding to the object category of the pixel corresponding to that point cloud point.

Optionally, the neural features include a first neural feature and a second neural feature, and the sampling module is further specifically configured to:

input all the training images and the scene point cloud into a semantic prediction network to obtain the first neural feature of each point cloud point of the scene point cloud;

obtain the training image corresponding to the training viewpoint origin closest to a point cloud point, and input that training image into a convolutional neural network to obtain the second neural feature of that point cloud point.

Optionally, the device further includes a reconstruction module, configured to:

obtain the target viewing-angle parameters of the scene to be reconstructed; and determine virtual pixels according to the target viewing-angle parameters;

on each ray that starts from the target viewpoint origin and passes through the position in the scene point cloud corresponding to each virtual pixel, select multiple such positions as target sampling points; and determine the point cloud points of the scene point cloud located within a predetermined range around the target sampling points as a target point set, and obtain the input information of the target point set;

input the input information of the target point set into the scene reconstruction model to obtain the color and light field intensity of the target point set output by the scene reconstruction model; obtain the color and light field intensity of the target sampling points by interpolation from the color and light field intensity of the target point set; and render the image under the target viewing-angle parameters from the color and light field intensity of the target sampling points.

Optionally, the reconstruction module is specifically configured to:

obtain all point cloud points of the scene point cloud located within a predetermined range around the target sampling points, and select, for each sampling point, a predetermined number of the closest point cloud points as the target point set.

Optionally, the training images under the known viewing-angle parameters satisfy the following conditions: the training images contain the room surfaces; and the overlap between the training images under any two adjacent known viewing-angle parameters is not less than thirty percent of the entire image size.

In yet another aspect, this application provides an electronic device, including a processor and a memory communicatively connected to the processor; the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the method described above.

In yet another aspect, this application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method described above.

In the indoor scene reconstruction method, device, electronic equipment and media provided by this application, images are collected indoors to obtain training images; based on structure-from-motion technology, the training viewing-angle parameters corresponding to each training image are obtained from the training images and a scene point cloud representing the structural information of the indoor scene is built; on each ray that starts from the training viewpoint origin and passes through the position in the scene point cloud corresponding to each pixel of the training image under the training viewing-angle parameters, multiple such positions are selected as training sampling points; based on semantic prediction technology, the point cloud points of the scene point cloud located within a predetermined range around the training sampling points are determined as a training point set, and the current input information of the training point set, including spatial position, neural features and ray direction, is obtained; the current input information of the training point set is input into the current scene reconstruction model to obtain the color and light field intensity of the training point set; the color and light field intensity of the training sampling points are obtained by interpolation, and the rendered image under the training viewing-angle parameters is obtained by rendering; and the current input information and the current scene reconstruction model are trained and corrected according to the loss between the rendered image and the training image under the same viewing-angle parameters, until the trained input information and the trained scene reconstruction model are obtained. By selecting the training point set around the training sampling points based on semantic prediction, this scheme raises the probability that the object category of a point cloud point matches that of the corresponding training sampling point, so the color and light field intensity of the training sampling points interpolated from the point cloud points are more accurate. This improves the quality of the generated rendered images, allows the input information of the point cloud points and the scene reconstruction model to be better trained and corrected from the training images and the corresponding rendered images, and lowers the requirement on the number of training images, thus solving the problem of poor-quality rendered indoor scene images when the number of training images is small.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

Figure 1 is a schematic flowchart of the indoor scene reconstruction method provided by Embodiment 1 of this application;

Figure 2 is an example diagram of generating a scene point cloud provided by Embodiment 1 of this application;

Figure 3 is an example diagram of a ray tracing scene provided by Embodiment 1 of this application;

Figure 4 is an example diagram of a semantic prediction scene provided by Embodiment 1 of this application;

Figure 5 is a schematic flowchart of obtaining the neural features of point cloud points provided by Embodiment 1 of this application;

Figure 6 is an example diagram of a dynamic-resolution real-time rendering scene provided by Embodiment 1 of this application;

Figure 7 is a schematic structural diagram of the indoor scene reconstruction device provided by Embodiment 2 of this application;

Figure 8 is a schematic structural diagram of the indoor scene reconstruction electronic device provided by Embodiment 3 of this application.

The above drawings show specific embodiments of this application, which are described in more detail below. These drawings and text descriptions are not intended to limit the scope of the concepts of this application in any way, but to illustrate those concepts for those skilled in the art by reference to specific embodiments.

Detailed Description of the Embodiments

Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of devices and methods consistent with some aspects of this application as detailed in the appended claims.

Virtual reality (VR) technology is centered on computer technology. It draws on and integrates the latest advances in three-dimensional graphics, multimedia, simulation, display, servo and other high technologies, and uses computers and related equipment to generate a virtual world with realistic three-dimensional visual, tactile, olfactory and other sensory experiences, giving people in that virtual world an immersive feeling. With the continuous development of social productivity and of science and technology, the demand for VR technology in all walks of life keeps growing; VR technology has made great progress and has gradually become a new field of science and technology. Novel view synthesis of indoor scenes plays an important role in the interaction between virtual reality and people, so this task has great potential in many VR applications, such as virtual tours of indoor scenes or visitor navigation.

In the existing technology for novel view synthesis of indoor scenes, reconstructing a room-sized scene requires at least several hundred RGB images densely captured around the scene, which entails large labor and time costs and places high demands on shooting quality. If the number of input images is insufficient, the generated rendered images contain many holes and rendering errors; hence the quality of rendered indoor scene images is poor when the number of training images is small.

The technical solution of this application is illustrated below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.

Embodiment 1

Figure 1 is a schematic flowchart of the indoor scene reconstruction method provided by an embodiment of this application. As shown in Figure 1, the indoor scene reconstruction method provided by this embodiment may include:

S101: performing image collection indoors to obtain training images; and, based on structure-from-motion technology, obtaining the training viewing-angle parameters corresponding to each training image from the training images and building a scene point cloud, where the scene point cloud is a point cloud representing the structural information of the indoor scene;

S102: on each ray that starts from the training viewpoint origin and passes through the position in the scene point cloud corresponding to each pixel of the training image under the training viewing-angle parameters, selecting multiple such positions as training sampling points; and, based on semantic prediction technology, determining the point cloud points of the scene point cloud located within a predetermined range around the training sampling points as a training point set and obtaining the current input information of the training point set, where the input information includes spatial position, neural features and ray direction;

S103: inputting the current input information of the training point set into the current scene reconstruction model to obtain the color and light field intensity of the training point set output by the scene reconstruction model; obtaining the color and light field intensity of the training sampling points by interpolation from the color and light field intensity of the training point set; and rendering the image under the training viewing-angle parameters from the color and light field intensity of the training sampling points;

S104: training and correcting the current input information and the current scene reconstruction model according to the loss between the rendered image under the training viewing-angle parameters and the training image under the same parameters, until the trained input information and the trained scene reconstruction model are obtained.

In practical applications, the execution subject of this embodiment may be an indoor scene reconstruction device, which may be implemented by a computer program, for example application software; or as a medium storing the relevant computer program, for example a USB flash drive or a cloud drive; or by a physical device integrating or installed with the relevant computer program, for example a chip or a server.

Specifically, a camera is used to shoot inside the room to obtain RGB training images; based on structure-from-motion (SfM) technology such as COLMAP, the training viewing-angle parameters corresponding to each training image are obtained from the training images, and a 3D scene point cloud representing the structure of the indoor scene is built.
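
The patent names COLMAP only as one possible SfM tool. As a minimal sketch, assuming the COLMAP command-line binaries are installed and the captured RGB images sit in images/, this pose-and-point-cloud step could be scripted as:

```python
import os
import subprocess

def run_sfm(image_dir: str = "images", workspace: str = "sfm") -> None:
    """Recover per-image camera parameters and a sparse scene point cloud.

    Runs the standard COLMAP pipeline: feature extraction, exhaustive
    matching, then incremental mapping into `workspace/sparse`.
    """
    os.makedirs(os.path.join(workspace, "sparse"), exist_ok=True)
    db = os.path.join(workspace, "database.db")
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir], check=True)
    subprocess.run(["colmap", "exhaustive_matcher",
                    "--database_path", db], check=True)
    subprocess.run(["colmap", "mapper", "--database_path", db,
                    "--image_path", image_dir,
                    "--output_path", os.path.join(workspace, "sparse")], check=True)
```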

In practical applications, the viewing-angle parameters may be camera parameters, which include camera intrinsics and extrinsics; the intrinsics include the intrinsic matrix, and the extrinsics represent the camera pose, including a rotation matrix and a translation vector. Given a specific set of viewing-angle parameters, the spatial position in the world coordinate system corresponding to each pixel of the image formed by the corresponding camera under those parameters can be uniquely determined.
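
To make this pixel-to-world relationship concrete, the following sketch (a hypothetical helper, not taken from the patent) casts the ray through a pixel given an intrinsic matrix K and a camera-to-world pose (R, t):

```python
import numpy as np

def pixel_to_ray(u, v, K, R, t):
    """Return the origin and unit direction of the ray through pixel (u, v).

    K: 3x3 intrinsic matrix; R, t: camera-to-world rotation and translation,
    so the camera optical center (viewpoint origin) is t itself.
    """
    pixel_h = np.array([u, v, 1.0])          # homogeneous pixel coordinate
    dir_cam = np.linalg.inv(K) @ pixel_h     # viewing direction in camera frame
    dir_world = R @ dir_cam                  # rotate into the world frame
    dir_world /= np.linalg.norm(dir_world)   # normalize to a unit direction
    return t, dir_world
```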

Starting from the training viewpoint origin, a ray is cast toward the position corresponding to each pixel of the training image under the training viewing-angle parameters, and multiple training sampling points are selected on each ray; the training viewpoint origin, the positions corresponding to the pixels and the scene point cloud all lie in the same coordinate system, and every ray passes through the scene point cloud. In practical applications, the training viewpoint origin may be the optical center of the camera, whose spatial position can be determined from the training viewing-angle parameters. Then, based on semantic prediction technology, multiple point cloud points are selected within a predetermined range around each training sampling point as the training point set, and the current input information of the training point set is obtained, where the input information includes spatial position, neural features and ray direction.

The current input information of the training point set is input into the current scene reconstruction model to obtain the color and light field intensity of the training point set output by the model. In practical applications, the scene reconstruction model may include a light field intensity decoder and a color decoder, whose specific parameters can be chosen according to actual production needs. The color and light field intensity of the training sampling points are obtained by interpolation from the color and light field intensity of the training point set; and from the color and light field intensity of the training sampling points, the color of each pixel of the rendered image under the training viewing-angle parameters is obtained by rendering, generating the rendered image.

According to the training image and the rendered image under the training viewing-angle parameters, the input information of the point cloud points and the scene reconstruction model are corrected by computing the loss, until the trained input information of the point cloud points and the trained scene reconstruction model are obtained.

For example, in a room of about 10 square meters, 40 training images under different training viewing-angle parameters are collected for indoor scene reconstruction.

Based on SfM technology, the training viewing-angle parameters corresponding to the training images are obtained from the training images, and the scene point cloud is generated. Figure 2 is an example diagram of generating a scene point cloud provided by an embodiment of this application; as shown in Figure 2, a 3D scene point cloud representing the structure of the indoor scene can be obtained from the training images containing the indoor scene based on SfM technology.

Figure 3 is an example diagram of a ray tracing scene provided by an embodiment of this application. As shown in Figure 3, the training viewpoint origin and the positions of the pixels of the training image in the coordinate system of the scene point cloud are determined from the training viewing-angle parameters; starting from the training viewpoint origin, a ray is drawn through the position of each pixel, N sampling points q1, q2, ..., qN are uniformly sampled within a set region on each ray, and, based on semantic prediction technology, a predetermined number of point cloud points are selected within a sphere of radius r around each sampling point as the training point set. In one possible implementation, 128 training sampling points are selected on each ray, and 8 point cloud points are selected around each training sampling point as the training point set.
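
A minimal sketch of this sampling step, assuming a ray with origin o and unit direction d, near/far bounds for the sampled region, and a KD-tree over the scene point cloud (all names hypothetical):

```python
import numpy as np
from scipy.spatial import cKDTree

def sample_ray(o, d, near, far, n_samples=128):
    """Uniformly place sampling points q_1..q_N on the ray o + s*d."""
    s = np.linspace(near, far, n_samples)
    return o[None, :] + s[:, None] * d[None, :]        # (N, 3)

def neighbors_in_sphere(samples, cloud_xyz, r=0.1):
    """For each sampling point, indices of cloud points within radius r.

    These are only the candidates; the final 8 points per sample are
    chosen by the semantic selection probability described below.
    """
    tree = cKDTree(cloud_xyz)
    return [tree.query_ball_point(q, r) for q in samples]
```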

The current spatial positions, neural features and ray directions of the training point set are input into the decoders to obtain the color and light field intensity of each point cloud point of the training point set output by the decoders. The decoders are divided into two groups, a light field intensity (σ) decoder and a color (c) decoder, which generate the light field intensity and the color respectively. Each decoder consists of six fully connected hidden layers plus input and output layers; the numbers of neurons of the light field intensity decoder are 128-256-256-96-32-16, and those of the color decoder are 128-256-256-128-64-8.
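
A sketch of the two decoders with the layer widths quoted above (PyTorch; the input width and the ReLU activation are assumptions, as the patent does not state them):

```python
import torch
import torch.nn as nn

def mlp(in_dim, hidden, out_dim):
    """Fully connected decoder: six hidden layers with ReLU, linear output."""
    layers, prev = [], in_dim
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

# Assumed input: position (3) + first/second neural features + ray direction (3).
in_dim = 3 + 32 + 32 + 3
sigma_decoder = mlp(in_dim, [128, 256, 256, 96, 32, 16], 1)   # light field intensity σ
color_decoder = mlp(in_dim, [128, 256, 256, 128, 64, 8], 3)   # RGB color c
```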

From the color and light field intensity of the training point set, the color and light field intensity of the training sampling points are obtained by inverse-distance interpolation. Volume rendering integration is then performed on the colors and light field intensities of the training sampling points to obtain the color of the corresponding pixel.
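
A sketch of the inverse-distance interpolation from one training point set to its sampling point (hypothetical names; eps guards against division by zero for coincident points):

```python
import numpy as np

def inverse_distance_interp(q, pts, values, eps=1e-8):
    """Interpolate per-point values (e.g. color or σ) at sampling point q.

    q: (3,) sampling point; pts: (K, 3) neighboring cloud points;
    values: (K, C) decoder outputs for those points.
    """
    d = np.linalg.norm(pts - q[None, :], axis=1)   # distances to q
    w = 1.0 / (d + eps)                            # inverse-distance weights
    w /= w.sum()                                   # normalize weights
    return w @ values                              # (C,) interpolated value
```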

For a pixel, the volume rendering integral for its color c is computed as follows:

c = \sum_{j=1}^{N} \tau_j \left(1 - \exp(-\sigma_j \delta_j)\right) r_j, \qquad \tau_j = \exp\left(-\sum_{k=1}^{j-1} \sigma_k \delta_k\right)

where N is the number of sampling points selected on the ray from the camera origin through the pixel, \sigma_j is the light field intensity of the j-th sampling point, r_j is the color of the j-th sampling point, \delta_j is the distance between adjacent sampling points, and \tau_j is the transmittance accumulated before the j-th sampling point.
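
A sketch of this compositing sum for one ray (hypothetical names; sigma, colors and deltas are per-sample arrays):

```python
import numpy as np

def volume_render(sigma, colors, deltas):
    """Composite per-sample colors along one ray into a single pixel color.

    sigma: (N,) light field intensities; colors: (N, 3); deltas: (N,) spacings.
    """
    alpha = 1.0 - np.exp(-sigma * deltas)            # per-sample opacity
    # transmittance τ_j = exp(-Σ_{k<j} σ_k δ_k), with τ_1 = 1
    trans = np.exp(-np.cumsum(np.concatenate([[0.0], sigma[:-1] * deltas[:-1]])))
    weights = trans * alpha                          # τ_j (1 - e^{-σ_j δ_j})
    return weights @ colors                          # (3,) pixel color
```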

After the color of each pixel of the rendered image is obtained, the rendered image can be generated. A loss function is computed between the rendered image and the real RGB values of the training view:

\mathcal{L} = \left\| \hat{I} - I \right\|_2^2

where \hat{I} is the rendered image and I is the corresponding real training image. The input information of the point cloud points and the scene reconstruction model are corrected according to \mathcal{L}.
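
A sketch of one training-correction step under this loss, assuming the decoders above and a differentiable render_image function (hypothetical) that maps the current point-cloud input information to a rendered view:

```python
import torch

params = list(sigma_decoder.parameters()) + list(color_decoder.parameters())
# Learnable per-point input features would be appended to `params` likewise.
optimizer = torch.optim.Adam(params, lr=1e-3)

def train_step(render_image, view_params, training_image):
    """One correction of the model (and point features) from one training view."""
    optimizer.zero_grad()
    rendered = render_image(view_params)              # differentiable rendering
    loss = torch.mean((rendered - training_image) ** 2)
    loss.backward()                                   # gradients flow back through
    optimizer.step()                                  #   the volume rendering step
    return loss.item()
```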

In practical applications, the collection of training images needs to meet certain requirements. In one example, the training images under the training viewing-angle parameters satisfy the following conditions: the training images contain the room surfaces, and the overlap between the training images under any two adjacent training viewing-angle parameters is not less than thirty percent of the entire image size.

Specifically, every room surface feature is contained in at least one training image, and two consecutively captured training images under adjacent training viewing-angle parameters overlap, with the overlapping region not smaller than 30% of the entire image size, which ensures that the collected training images achieve a good training effect.

For example, for a room of 10 square meters, 35-40 RGB images are captured with a monocular camera, such that every room surface is captured by at least one RGB image and the overlap between images from two consecutive camera views is not less than 30% of the image size.

There are multiple ways to determine the training point set. In one example, determining the point cloud points of the scene point cloud located within a predetermined range around the training sampling points as a training point set based on semantic prediction technology may include:

based on semantic prediction technology, obtaining the semantic information of each pixel of the training image under the training viewing-angle parameters and the semantic information of each point cloud point of the scene point cloud, where the semantic information of a pixel includes an object category, and the semantic information of a point cloud point includes an object category and a corresponding probability value;

for each training sampling point, obtaining all point cloud points of the scene point cloud located within a predetermined range around that training sampling point, computing the selection probability of each point cloud point within that range, and selecting a preset number of point cloud points in descending order of selection probability as the training point set, where computing the selection probability includes: if the object category of a point cloud point is the same as the object category of the pixel corresponding to the training sampling point, the selection probability of that point cloud point is one; if the object category of a point cloud point differs from the object category of the pixel corresponding to the training sampling point, the selection probability of that point cloud point is one minus the probability value corresponding to the object category of the pixel corresponding to that point cloud point.

Specifically, the training images and the scene point cloud are input into a semantic prediction network to obtain the semantic information of each pixel of the training image, including the predicted object category of the pixel, and the semantic information of each point cloud point of the scene point cloud, including the predicted object category of the point cloud point and the corresponding probability value.

For each training sampling point, all point cloud points within a predetermined range around it are obtained, the selection probability of each point cloud point is computed, and a predetermined number of point cloud points are selected in descending order of selection probability as the training point set.

For a point cloud point, its selection probability p_chosen is computed as follows:

p_{\text{chosen}} = \begin{cases} 1, & \hat{c} = \hat{c}_{\text{ray}} \\ 1 - \hat{p}_{\text{ray}}, & \hat{c} \neq \hat{c}_{\text{ray}} \end{cases}

where \hat{c} is the predicted object category of the point cloud point output by the semantic prediction network; the predicted object category \hat{c}_{\text{ray}} of the ray on which the current training sampling point lies is taken to be the predicted object category of the pixel corresponding to the position the ray passes through, and the probability value \hat{p}_{\text{ray}} of the ray is taken to be the probability value corresponding to the predicted object category of that pixel.
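
A sketch of this selection rule applied to the candidate neighbors of one sampling point (hypothetical names; categories are integer labels from the semantic prediction network):

```python
import numpy as np

def select_training_points(cand_cats, cand_idx, ray_cat, ray_prob, k=8):
    """Pick the k candidates with the highest selection probability.

    cand_cats: (M,) predicted categories of candidate cloud points;
    cand_idx: (M,) their indices in the scene point cloud (numpy array);
    ray_cat, ray_prob: predicted category and probability of the ray's pixel.
    """
    # p = 1 on a category match, otherwise 1 - probability of the pixel's category
    p_chosen = np.where(cand_cats == ray_cat, 1.0, 1.0 - ray_prob)
    order = np.argsort(-p_chosen)          # descending selection probability
    return cand_idx[order[:k]]
```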

For example, Figure 4 is an example diagram of a semantic prediction scene provided by an embodiment of this application. As shown in Figure 4, the training images and the scene point cloud are input into the semantic prediction module, which outputs the semantic information of each pixel of the training images and of each point cloud point of the scene point cloud; different grayscales of the pixels and point cloud points represent different predicted object categories, and the probability values corresponding to the predicted object categories of the point cloud points are not reflected in the figure, which is only illustrative.

Following the earlier example, all point cloud points within a sphere of radius r around each training sampling point are obtained, their selection probabilities are computed, and the 8 point cloud points with the highest selection probabilities are chosen as the training point set.

By making the probability that the predicted object category of a point cloud point matches the predicted object category of the ray positively correlated with the point's selection probability, and taking the point cloud points with high selection probabilities as the training point set, the probability that the training point cloud points share the object category of the corresponding training sampling point is increased, so the color and light field intensity of the training sampling points obtained from the parameters of the training point cloud points are more accurate.

In one example, the neural features include a first neural feature and a second neural feature, and obtaining the current input information of the training point set may include:

inputting all the training images and the scene point cloud into a semantic prediction network to obtain the first neural feature of each point cloud point of the scene point cloud;

obtaining the training image corresponding to the training viewpoint origin closest to a point cloud point, and inputting that training image into a convolutional neural network to obtain the second neural feature of that point cloud point.

Figure 5 is a schematic flowchart of obtaining the neural features of point cloud points provided by an embodiment of this application. As shown in Figure 5, the neural features of a point cloud point may include a first neural feature and a second neural feature. Obtaining the first neural feature includes: inputting all the training images and the scene point cloud into the semantic prediction network to obtain the first neural feature of each point cloud point of the scene point cloud. Obtaining the second neural feature includes: obtaining the training image corresponding to the training viewpoint origin closest to the point cloud point, and inputting that training image into a convolutional neural network to obtain the second neural feature of the point cloud point. The semantic prediction network here is the same one described above; in practical applications, inputting the training images and the scene point cloud into the semantic prediction network yields, for each point cloud point of the scene point cloud, its first neural feature, its predicted object category and the corresponding probability value.

For example, assume the resolution of the training images is 640×480. The training images and the scene point cloud are input into the semantic prediction network to obtain the first neural feature s of each point cloud point of the scene point cloud, where the semantic prediction network is a pre-trained BP-Net. The training image corresponding to the training viewpoint origin closest to a point cloud point is input into the convolutional neural network to obtain the 32-dimensional second neural feature n, where the convolutional neural network may be a pre-trained convolutional network for image feature extraction consisting of six convolution modules with channel numbers 64-128-128-256-128-32, each convolution module consisting of a convolution layer, a linear rectification unit and an instance normalization unit.
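
A sketch of such a feature-extraction network with the channel widths quoted above (PyTorch; kernel size, stride and padding are assumptions, as the patent does not state them):

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """One module: convolution -> linear rectification (ReLU) -> instance norm."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.InstanceNorm2d(c_out))

channels = [3, 64, 128, 128, 256, 128, 32]
feature_cnn = nn.Sequential(*[
    conv_block(channels[i], channels[i + 1]) for i in range(6)
])

# A 640x480 RGB training image yields a 32-channel feature map, from which
# the per-point 32-dimensional second neural features n can be sampled.
x = torch.randn(1, 3, 480, 640)
print(feature_cnn(x).shape)   # torch.Size([1, 32, 480, 640])
```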

By using as the neural features of a point cloud point both the features obtained by convolving the training image corresponding to the closest training viewpoint origin and the semantic information of the point cloud point obtained by semantic prediction, structural and semantic information of the point cloud point beyond its spatial position is captured, which enriches the input information fed into the scene reconstruction model and thus yields more accurate colors and light field intensities for the point cloud points.

Based on the corrected input information and scene reconstruction model, novel-view reconstruction of the indoor scene can be performed. In one example, the method may further include:

obtaining target perspective parameters of the scene to be reconstructed; and determining virtual pixels according to the target perspective parameters;

on rays starting from the target perspective origin and passing through the position point, in the scene point cloud, corresponding to each of the virtual pixels, selecting a plurality of the position points as target sampling points; and determining point cloud points of the scene point cloud located within a predetermined range around the plurality of target sampling points as a target point set, and obtaining the input information of the target point set;

inputting the input information of the target point set into the scene reconstruction model to obtain the color and light field intensity of the target point set output by the scene reconstruction model; obtaining the color and light field intensity of the target sampling points by interpolation from the color and light field intensity of the target point set; and obtaining a rendered image under the target perspective parameters by rendering from the color and light field intensity of the target sampling points.

Specifically, after the target perspective of the scene to be reconstructed is determined, the target perspective parameters are obtained. The target perspective origin and the position points corresponding to the virtual pixels in the coordinate system of the scene point cloud are determined from the target perspective parameters. A ray is cast from the target perspective origin through the scene point cloud and through the position point corresponding to each virtual pixel; target sampling points are selected within a predetermined region of the ray, and point cloud points of the scene point cloud are selected within a predetermined range around each target sampling point to obtain the target point set.

The trained input information of the target point set is input into the trained scene reconstruction model to obtain the color and light field intensity of the target point set; the color and light field intensity of the target point set are interpolated to obtain the color and light field intensity of the corresponding target sampling points; and a rendered image of the new perspective is obtained by rendering from the color and light field intensity of the target sampling points.

Rendering novel-view images of the indoor scene from the trained input information and the trained scene reconstruction model produces well-rendered images, thereby improving the quality of novel-view renderings of indoor scenes generated when the number of training images is small.

For example, suppose a novel-view image is to be generated: training images of the indoor scene under the training perspective parameters are collected, and the trained input information of the point cloud points and the trained scene reconstruction model are obtained from those training images.

After the target perspective parameters of the new perspective are determined, the spatial positions of the target perspective origin and of the position points corresponding to the virtual pixels are determined, in the coordinate system of the scene point cloud, from the target perspective parameters. A ray is cast from the target perspective origin through the position point of each virtual pixel; 128 target sampling points are selected uniformly within a predetermined region of the ray, and point cloud points of the scene point cloud are selected within a ball of radius r around each target sampling point to obtain the target point set.
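A minimal sketch of this step is given below, assuming a pinhole camera with intrinsic matrix K and world-to-camera extrinsics (R, t); the near/far bounds of the predetermined region are illustrative placeholders, and only the uniform selection of 128 samples per ray reflects this example.

import torch

def sample_along_rays(K, R, t, width, height, near=0.1, far=8.0, n_samples=128):
    # Build one ray per virtual pixel of the target view.
    u, v = torch.meshgrid(torch.arange(width, dtype=torch.float32),
                          torch.arange(height, dtype=torch.float32), indexing="xy")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)      # (H, W, 3) homogeneous pixels
    dirs_cam = pix @ torch.inverse(K).T                        # back-project to camera-frame rays
    dirs_world = dirs_cam @ R                                  # apply R^T to row vectors: world frame
    dirs_world = dirs_world / dirs_world.norm(dim=-1, keepdim=True)
    origin = -R.T @ t                                          # target perspective origin (camera center)
    depths = torch.linspace(near, far, n_samples)              # uniform samples along each ray
    samples = origin + depths[None, None, :, None] * dirs_world[..., None, :]
    return origin, dirs_world, samples                         # samples: (H, W, 128, 3)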

The trained input information of the target point set is input into the trained scene reconstruction model to obtain the color and light field intensity of the target point set; inverse-distance interpolation is applied to the color and light field intensity of the target point set to obtain the color and light field intensity of the corresponding target sampling points; and volume-rendering integration is performed from the color and light field intensity of the target sampling points to obtain a rendered image of the new perspective.
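The two steps just named can be sketched as follows: inverse-distance interpolation of the neighbor values onto a sampling point, followed by a standard emission-absorption volume-rendering integral along the ray. The exact weighting and compositing used by this embodiment are not spelled out, so the formulas below are a common choice rather than the prescribed one.

import torch

def inverse_distance_interp(values, dists, eps=1e-8):
    # values: (..., K, C) colors or intensities of the K neighbors in the target point set
    # dists:  (..., K)    distances from the target sampling point to those neighbors
    w = 1.0 / (dists + eps)
    w = w / w.sum(dim=-1, keepdim=True)
    return (w[..., None] * values).sum(dim=-2)

def volume_render(colors, sigmas, deltas):
    # colors: (R, S, 3); sigmas: (R, S) light field intensity; deltas: (R, S) step sizes
    alpha = 1.0 - torch.exp(-sigmas * deltas)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                           # contribution of each sample
    return (weights[..., None] * colors).sum(dim=-2)  # (R, 3) pixel colors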

Furthermore, this solution can provide an efficient indoor scene capture, reconstruction, and roaming system: the user supplies sparsely collected RGB images of an indoor scene, and after training, real-time roaming of the scene and high-quality novel-view rendering are available. The system can render new perspectives, including moving the viewpoint, changing the viewing direction, and zooming the visible range in and out.

The system framework is implemented in Python, and the picture corresponding to the current perspective is visualized through software equipped with an interactive control panel. The neural network model is implemented in PyTorch and trained on a single NVIDIA Quadro P6000 GPU with 24 GB of memory. By migrating the code to the GPU via PyCUDA for acceleration, real-time rendering can be achieved.

Figure 6 is an example of a scene rendered in real time at dynamic resolution according to an embodiment of the present application. As shown in Figure 6, the system lets the user browse the scene in real time: when the user turns on the real-time rendering button, the system can dynamically adjust the resolution of the rendered image according to the rendering time, for example choosing a low resolution when the rendering time exceeds a preset value. Based on the resolution of the rendered image and the target perspective parameters, the position point of each pixel of the rendered image is determined and the rendered image is generated, ensuring that the user browses the indoor scene in real time.
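A sketch of this dynamic-resolution policy follows. The resolution ladder, the time budget, and the step-up rule are assumptions added for illustration; the embodiment only states that a low resolution is chosen when the rendering time exceeds a preset value.

import time

RESOLUTIONS = [(160, 120), (320, 240), (640, 480)]   # hypothetical ladder, low to high

def browse(render_fn, budget_s=0.05, max_frames=1000):
    level = len(RESOLUTIONS) - 1
    for _ in range(max_frames):
        start = time.perf_counter()
        render_fn(*RESOLUTIONS[level])               # render one frame at the current resolution
        elapsed = time.perf_counter() - start
        if elapsed > budget_s and level > 0:
            level -= 1                               # too slow: drop to a lower resolution
        elif elapsed < 0.5 * budget_s and level < len(RESOLUTIONS) - 1:
            level += 1                               # plenty of headroom: raise it again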

There are multiple ways to select the target point set. In one example, determining the target point set contained within a predetermined range around the plurality of target sampling points may include:

obtaining all point cloud points within a predetermined range around the plurality of target sampling points, and selecting a predetermined number of point cloud points closest to each sampling point as the target point set.

Specifically, for each target sampling point, all point cloud points within a predetermined range around it are obtained, and a predetermined number of point cloud points are selected in order of distance from nearest to farthest as the target point set.

For example, all point cloud points within a ball of radius r around the target sampling point are obtained, and the 8 point cloud points closest to the target sampling point are selected as the target point set.
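A brute-force sketch of this selection is given below; for large clouds a KD-tree or voxel grid would be the practical choice, and the radius value is a placeholder.

import torch

def nearest_in_ball(samples, cloud, r=0.2, k=8):
    # samples: (S, 3) target sampling points; cloud: (P, 3) scene point cloud
    d = torch.cdist(samples, cloud)                              # (S, P) pairwise distances
    d = torch.where(d <= r, d, torch.full_like(d, float("inf"))) # keep only points inside the ball
    dists, idx = torch.topk(d, k, dim=-1, largest=False)         # k nearest inside the ball
    return idx, dists                                            # inf distance marks fewer than k in ball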

In the indoor scene reconstruction method provided by this application, image collection is performed indoors to obtain training images; based on structure-from-motion technology, the training perspective parameters corresponding to each training image are obtained from the training images and a scene point cloud is built, the scene point cloud being a point cloud that represents the structural information of the indoor scene. On rays starting from the training perspective origin and passing through the position points, in the scene point cloud, corresponding to the pixels of the training image under the training perspective parameters, a plurality of the position points are selected as training sampling points; based on semantic prediction technology, point cloud points of the scene point cloud located within a predetermined range around the training sampling points are determined as a training point set, and the current input information of the training point set is obtained, the input information including spatial position, neural features, and ray direction. The current input information of the training point set is input into the current scene reconstruction model to obtain the color and light field intensity of the training point set output by the model; the color and light field intensity of the training sampling points are obtained by interpolation from those of the training point set; and the rendered image under the training perspective parameters is obtained by rendering from the color and light field intensity of the training sampling points. According to the loss between the rendered image under the training perspective parameters and the training image under the training perspective parameters, the current input information and the current scene reconstruction model are trained and corrected until the trained input information and scene reconstruction model are obtained. By selecting the training point set around the training sampling points based on semantic prediction technology, this solution raises the likelihood that a point cloud point has the same object category as the corresponding training sampling point, so the colors and light field intensities of the training sampling points interpolated from the point cloud points' colors and light field intensities are more accurate, which improves the quality of the generated rendered images; moreover, the input information of the point cloud points and the scene reconstruction model can be better trained and corrected from the training images and the corresponding rendered images, which lowers the requirement on the number of training images and thus solves the problem of poor-quality indoor scene renderings when the number of training images is small.

Embodiment 2

Figure 7 is a schematic structural diagram of an indoor scene reconstruction device according to an embodiment of the present application. As shown in Figure 7, the indoor scene reconstruction device 70 provided in this embodiment may include:

a collection module 71, configured to perform image collection indoors to obtain training images; and, based on structure-from-motion technology, obtain, from the training images, the training perspective parameters corresponding to each training image and build a scene point cloud, wherein the scene point cloud is a point cloud representing structural information of the indoor scene;

a sampling module 72, configured to select, on rays starting from the training perspective origin and passing through the position points, in the scene point cloud, corresponding to the pixels of the training image under the training perspective parameters, a plurality of the position points as training sampling points; and, based on semantic prediction technology, determine point cloud points of the scene point cloud located within a predetermined range around the plurality of training sampling points as a training point set, and obtain the current input information of the training point set, wherein the input information includes spatial position, neural features, and ray direction;

a rendering module 73, configured to input the current input information of the training point set into the current scene reconstruction model to obtain the color and light field intensity of the training point set output by the scene reconstruction model; obtain the color and light field intensity of the training sampling points by interpolation from the color and light field intensity of the training point set; and obtain a rendered image under the training perspective parameters by rendering from the color and light field intensity of the training sampling points;

a correction module 74, configured to train and correct the current input information and the current scene reconstruction model according to the loss between the rendered image under the training perspective parameters and the training image under the training perspective parameters, until the trained input information and scene reconstruction model are obtained.

In practical applications, the indoor scene reconstruction device may be implemented as a computer program, for example application software; or as a medium storing the relevant computer program, for example a USB drive or cloud drive; or as a physical device integrating or installed with the relevant computer program, for example a chip or a server.

Specifically, a camera is used to shoot in the room to obtain RGB training images. Based on a structure-from-motion (SfM) technique such as COLMAP, the training perspective parameters corresponding to each training image are obtained from the training images, and a 3D scene point cloud representing the structure of the indoor scene is built.
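As a hedged sketch, the COLMAP command-line pipeline below is one common way to recover per-image camera parameters and a sparse scene point cloud from the RGB training images; the paths are placeholders, and this embodiment does not prescribe these exact commands.

import os
import subprocess

def run_colmap(image_dir="images", workspace="colmap_out"):
    os.makedirs(workspace, exist_ok=True)
    db = os.path.join(workspace, "database.db")
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir], check=True)
    subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)
    subprocess.run(["colmap", "mapper", "--database_path", db,
                    "--image_path", image_dir, "--output_path", workspace], check=True)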

In practical applications, the perspective parameters may be camera parameters, which include camera intrinsics and camera extrinsics: the intrinsics include the intrinsic matrix, and the extrinsics represent the camera pose, including the rotation matrix and translation vector. Given a fixed set of perspective parameters, the spatial position, in the world coordinate system, corresponding to each pixel of the image formed by the corresponding camera under those parameters can be uniquely determined.
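For illustration, the sketch below shows the companion mapping: projecting a world-space point into pixel coordinates under the usual pinhole convention x_cam = R·x_world + t, which is an assumption; this is also how a point cloud point can be located inside a training image.

import torch

def project_to_pixel(x_world, K, R, t):
    # x_world: (N, 3) points in the scene point cloud's coordinate system
    # K: (3, 3) intrinsic matrix; R: (3, 3), t: (3,) world-to-camera extrinsics
    x_cam = x_world @ R.T + t            # world -> camera coordinates
    uvw = x_cam @ K.T                    # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]      # (N, 2) pixel positions (u, v)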

Starting from the training perspective origin, a ray is cast as a light ray toward the position point corresponding to each pixel of the training image under the training perspective parameters, and multiple training sampling points are selected on each light ray. The training perspective origin, the position points corresponding to the pixels, and the scene point cloud all lie in the same coordinate system, and every light ray passes through the scene point cloud; in practical applications, the training perspective origin may be the camera's optical center, whose spatial position can be determined from the training perspective parameters. Then, based on semantic prediction technology, multiple point cloud points are selected within a predetermined range around each training sampling point as the training point set, and the current input information of the training point set is obtained, the input information including spatial position, neural features, and ray direction.

The current input information of the training point set is input into the current scene reconstruction model to obtain the color and light field intensity of the training point set output by the model. In practical applications, the scene reconstruction model may include a light field intensity decoder and a color decoder, whose specific parameters can be chosen according to actual production needs. The color and light field intensity of the training sampling points are obtained by interpolation from those of the training point set; and from the color and light field intensity of the training sampling points, the color of each pixel of the rendered image under the training perspective parameters is obtained by rendering, generating the rendered image.
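A minimal sketch of a scene reconstruction model with the two decoders named above follows. The MLP widths, the shared backbone, and conditioning the color decoder on the ray direction are assumptions; as stated, the specific parameters are left to the implementation.

import torch
import torch.nn as nn

class SceneReconstructionModel(nn.Module):
    def __init__(self, in_dim, dir_dim=3, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
        self.intensity_head = nn.Linear(hidden, 1)    # light field intensity decoder
        self.color_head = nn.Sequential(              # color decoder, view dependent
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, point_inputs, ray_dirs):
        # point_inputs: (N, in_dim) spatial position plus neural features per point
        # ray_dirs:     (N, 3)      ray direction per point
        h = self.backbone(point_inputs)
        sigma = torch.relu(self.intensity_head(h))               # non-negative intensity
        rgb = self.color_head(torch.cat([h, ray_dirs], dim=-1))  # color in [0, 1]
        return rgb, sigma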

From the training image and the rendered image under the training perspective parameters, a loss is computed and used to correct the input information of the point cloud points and the scene reconstruction model, until the trained input information of the point cloud points and the trained scene reconstruction model are obtained.

In practical applications, the collection of training images must meet certain requirements. In one example, the training images under the training perspective parameters satisfy the following conditions: the training images contain the room surfaces; and the overlap between the training images under any two adjacent training perspective parameters is no less than thirty percent of the entire image size.

Specifically, every room surface feature is contained in at least one training image, and two consecutively captured training images under adjacent training perspective parameters overlap, with the overlapping region no smaller than 30% of the entire image size, thereby ensuring that the collected training images achieve a good training effect.

There are multiple ways to determine the training point set. In one example, the sampling module 72 may be configured to:

based on semantic prediction technology, obtain the semantic information of each pixel of the training image under the training perspective parameters and the semantic information of each point cloud point of the scene point cloud, where the semantic information of a pixel includes an object category, and the semantic information of a point cloud point includes an object category and a corresponding probability value;

for each training sampling point, obtain all point cloud points of the scene point cloud located within a predetermined range around the training sampling point; compute the selection probability of each point cloud point within that range, and select a preset number of point cloud points in descending order of selection probability as the training point set. Computing the selection probability includes: if the object category of the point cloud point is the same as the object category of the pixel corresponding to the training sampling point, the selection probability of the point cloud point is one; if the object category of the point cloud point differs from the object category of the pixel corresponding to the training sampling point, the selection probability of the point cloud point is one minus the probability value corresponding to the object category of the pixel corresponding to the point cloud point.

Specifically, the training images and the scene point cloud are input into the semantic prediction network, which outputs the semantic information of each pixel of the training images, including the pixel's predicted object category, and the semantic information of each point cloud point of the scene point cloud, including the point cloud point's predicted object category and corresponding probability value.

For each training sampling point, all point cloud points within a predetermined range around it are obtained, the selection probability of each point cloud point is computed, and a predetermined number of point cloud points are selected in descending order of selection probability as the training point set.

For a point cloud point, its selection probability $p_{chosen}$ is computed as follows:

$$p_{chosen} = \begin{cases} 1, & \hat{c}_{point} = \hat{c}_{ray} \\ 1 - \hat{p}_{ray}, & \hat{c}_{point} \neq \hat{c}_{ray} \end{cases}$$

where $\hat{c}_{point}$ is the predicted object category of the point cloud point output by the semantic prediction network; the predicted object category $\hat{c}_{ray}$ of the ray on which the current training sampling point lies is set equal to the predicted object category of the pixel corresponding to the position point the ray passes through; and the probability value $\hat{p}_{ray}$ associated with $\hat{c}_{ray}$ is set equal to the probability value associated with that pixel's predicted object category.

By making a point cloud point's selection probability positively correlated with the probability that its predicted object category matches the predicted object category of the ray, and taking the point cloud points with high selection probability as the training point set, the probability that the training point cloud points share the object category of the corresponding training sampling point is raised, so the color and light field intensity of the training sampling points obtained from the training point cloud point parameters are more accurate.
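A sketch of this rule follows; the tensor shapes and the top-k gathering are illustrative assumptions, while the probability itself follows the formula above.

import torch

def choose_training_points(point_cls, ray_cls, ray_prob, k=8):
    # point_cls: (N,) predicted object category of each candidate cloud point
    # ray_cls:   predicted object category of the pixel the ray passes through
    # ray_prob:  probability value attached to that pixel's predicted category
    p_chosen = torch.where(point_cls == ray_cls,
                           torch.ones_like(point_cls, dtype=torch.float32),
                           torch.full_like(point_cls, 1.0 - ray_prob, dtype=torch.float32))
    k = min(k, p_chosen.numel())
    scores, idx = torch.topk(p_chosen, k)  # descending selection probability
    return idx, scores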

In one example, the neural features include a first neural feature and a second neural feature, and the sampling module 72 may further be configured to:

input all the training images and the scene point cloud into the semantic prediction network to obtain the first neural feature of each point cloud point of the scene point cloud;

obtain the training image corresponding to the training perspective origin closest to a point cloud point, and input that training image into the convolutional neural network to obtain the second neural feature of the point cloud point.

Specifically, the neural features of a point cloud point may include a first neural feature and a second neural feature. Obtaining the first neural feature includes: inputting all training images and the scene point cloud into the semantic prediction network to obtain the first neural feature of each point cloud point in the scene point cloud. Obtaining the second neural feature includes: obtaining the training image corresponding to the training perspective origin closest to the point cloud point, and inputting that training image into the convolutional neural network to obtain the second neural feature of the point cloud point. The semantic prediction network here is the same as the semantic prediction network described above; in practical applications, inputting the training images and the scene point cloud into it yields, for each point cloud point of the scene point cloud, the first neural feature, the predicted object category, and the corresponding probability value output by the network.

By taking, as the neural features of a point cloud point, the features obtained by convolving the training image corresponding to the closest training perspective origin together with the semantic information obtained by semantic prediction, structural and semantic information of the point cloud point beyond its spatial position is captured. This enriches the input information fed into the scene reconstruction model, yielding more accurate point cloud point colors and light field intensities.

Based on the corrected input information and scene reconstruction model, novel-view reconstruction of the indoor scene can be performed. In one example, the device further includes a reconstruction module, which may be configured to:

obtain target perspective parameters of the scene to be reconstructed; and determine virtual pixels according to the target perspective parameters;

select, on rays starting from the target perspective origin and passing through the position point, in the scene point cloud, corresponding to each of the virtual pixels, a plurality of the position points as target sampling points; and determine point cloud points of the scene point cloud located within a predetermined range around the plurality of target sampling points as a target point set, and obtain the input information of the target point set;

input the input information of the target point set into the scene reconstruction model to obtain the color and light field intensity of the target point set output by the scene reconstruction model; obtain the color and light field intensity of the target sampling points by interpolation from the color and light field intensity of the target point set; and obtain a rendered image under the target perspective parameters by rendering from the color and light field intensity of the target sampling points.

Specifically, after the target perspective of the scene to be reconstructed is determined, the target perspective parameters are obtained. The target perspective origin and the position points corresponding to the virtual pixels in the coordinate system of the scene point cloud are determined from the target perspective parameters. A ray is cast from the target perspective origin through the scene point cloud and through the position point corresponding to each virtual pixel; target sampling points are selected within a predetermined region of the ray, and point cloud points of the scene point cloud are selected within a predetermined range around each target sampling point to obtain the target point set.

The trained input information of the target point set is input into the trained scene reconstruction model to obtain the color and light field intensity of the target point set; the color and light field intensity of the target point set are interpolated to obtain the color and light field intensity of the corresponding target sampling points; and a rendered image of the new perspective is obtained by rendering from the color and light field intensity of the target sampling points.

Rendering novel-view images of the indoor scene from the trained input information and the trained scene reconstruction model produces well-rendered images.

There are multiple ways to select the target point set. In one example, the reconstruction module is specifically configured to:

obtain all point cloud points within a predetermined range around the plurality of target sampling points, and select a predetermined number of point cloud points closest to each sampling point as the target point set.

Specifically, for each target sampling point, all point cloud points within a predetermined range around it are obtained, and a predetermined number of point cloud points are selected in order of distance from nearest to farthest as the target point set.

In the indoor scene reconstruction device provided by this application, image collection is performed indoors to obtain training images; based on structure-from-motion technology, the training perspective parameters corresponding to each training image are obtained from the training images and a scene point cloud is built, the scene point cloud being a point cloud that represents the structural information of the indoor scene. On rays starting from the training perspective origin and passing through the position points, in the scene point cloud, corresponding to the pixels of the training image under the training perspective parameters, a plurality of the position points are selected as training sampling points; based on semantic prediction technology, point cloud points of the scene point cloud located within a predetermined range around the training sampling points are determined as a training point set, and the current input information of the training point set is obtained, the input information including spatial position, neural features, and ray direction. The current input information of the training point set is input into the current scene reconstruction model to obtain the color and light field intensity of the training point set output by the model; the color and light field intensity of the training sampling points are obtained by interpolation from those of the training point set; and the rendered image under the training perspective parameters is obtained by rendering from the color and light field intensity of the training sampling points. According to the loss between the rendered image under the training perspective parameters and the training image under the training perspective parameters, the current input information and the current scene reconstruction model are trained and corrected until the trained input information and scene reconstruction model are obtained. By selecting the training point set around the training sampling points based on semantic prediction technology, this solution raises the likelihood that a point cloud point has the same object category as the corresponding training sampling point, so the colors and light field intensities of the training sampling points interpolated from the point cloud points' colors and light field intensities are more accurate, which improves the quality of the generated rendered images; moreover, the input information of the point cloud points and the scene reconstruction model can be better trained and corrected from the training images and the corresponding rendered images, which lowers the requirement on the number of training images and thus solves the problem of poor-quality indoor scene renderings when the number of training images is small.

Embodiment 3

Figure 8 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. As shown in Figure 8, the electronic device includes:

a processor 291; the electronic device further includes a memory 292, and may further include a communication interface 293 and a bus 294. The processor 291, the memory 292, and the communication interface 293 can communicate with one another through the bus 294. The communication interface 293 may be used for information transmission. The processor 291 can call logical instructions in the memory 292 to execute the methods of the above embodiments.

In addition, the above logical instructions in the memory 292 can be implemented in the form of software functional units and, when sold or used as an independent product, can be stored in a computer-readable storage medium.

As a computer-readable storage medium, the memory 292 can be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 291 runs the software programs, instructions, and modules stored in the memory 292 to execute functional applications and data processing, that is, to implement the methods in the above method embodiments.

The memory 292 may include a program storage area and a data storage area, where the program storage area may store the operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal device, and so on. In addition, the memory 292 may include high-speed random access memory and may also include non-volatile memory.

Embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the methods described in the foregoing embodiments.

Embodiment 4

Embodiments of the present disclosure provide a computer program product including a computer program which, when executed by a processor, implements the method provided by any of the above embodiments of the present disclosure.

Other embodiments of the present application will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed by this application. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the application indicated by the following claims.

It should be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (10)

1. An indoor scene reconstruction method, characterized by comprising:
performing image collection indoors to obtain training images; and, based on structure-from-motion technology, obtaining, from the training images, the training perspective parameters corresponding to each of the training images and building a scene point cloud, wherein the scene point cloud is a point cloud representing structural information of the indoor scene;
on rays starting from the training perspective origin and passing through the position points, in the scene point cloud, corresponding to the pixels of the training image under the training perspective parameters, selecting a plurality of the position points as training sampling points; and, based on semantic prediction technology, determining point cloud points of the scene point cloud located within a predetermined range around the plurality of training sampling points as a training point set, and obtaining current input information of the training point set, wherein the input information comprises spatial position, neural features, and ray direction;
inputting the current input information of the training point set into a current scene reconstruction model to obtain the color and light field intensity of the training point set output by the scene reconstruction model; obtaining the color and light field intensity of the training sampling points by interpolation from the color and light field intensity of the training point set; and obtaining a rendered image under the training perspective parameters by rendering from the color and light field intensity of the training sampling points; and
according to the loss between the rendered image under the training perspective parameters and the training image under the training perspective parameters, training and correcting the current input information and the current scene reconstruction model until the trained input information and scene reconstruction model are obtained.
2. The method according to claim 1, characterized in that determining, based on semantic prediction technology, point cloud points of the scene point cloud located within a predetermined range around the plurality of training sampling points as a training point set comprises:
based on semantic prediction technology, obtaining the semantic information of each pixel of the training image under the training perspective parameters and the semantic information of each point cloud point of the scene point cloud, wherein the semantic information of a pixel comprises an object category, and the semantic information of a point cloud point comprises an object category and a corresponding probability value; and
for each training sampling point, obtaining all point cloud points of the scene point cloud located within a predetermined range around the training sampling point, computing the selection probability of each point cloud point within that range, and selecting a preset number of point cloud points in descending order of selection probability as the training point set, wherein computing the selection probability comprises: if the object category of the point cloud point is the same as the object category of the pixel corresponding to the training sampling point, the selection probability of the point cloud point is one; if the object category of the point cloud point differs from the object category of the pixel corresponding to the training sampling point, the selection probability of the point cloud point is one minus the probability value corresponding to the object category of the pixel corresponding to the point cloud point.
3. The method according to claim 1, characterized in that the neural features comprise a first neural feature and a second neural feature, and obtaining the current input information of the training point set comprises:
inputting all the training images and the scene point cloud into a semantic prediction network to obtain the first neural feature of each point cloud point of the scene point cloud; and
obtaining the training image corresponding to the training perspective origin closest to a point cloud point, and inputting that training image into a convolutional neural network to obtain the second neural feature of the point cloud point.
4. The method according to claim 1, characterized in that the method further comprises:
obtaining target perspective parameters of a scene to be reconstructed; and determining virtual pixels according to the target perspective parameters;
on rays starting from the target perspective origin and passing through the position point, in the scene point cloud, corresponding to each of the virtual pixels, selecting a plurality of the position points as target sampling points; and determining point cloud points of the scene point cloud located within a predetermined range around the plurality of target sampling points as a target point set, and obtaining the input information of the target point set; and
inputting the input information of the target point set into the scene reconstruction model to obtain the color and light field intensity of the target point set output by the scene reconstruction model; obtaining the color and light field intensity of the target sampling points by interpolation from the color and light field intensity of the target point set; and obtaining a rendered image under the target perspective parameters by rendering from the color and light field intensity of the target sampling points.
5. The method according to claim 4, characterized in that determining point cloud points of the scene point cloud located within a predetermined range around the plurality of target sampling points as a target point set comprises:
obtaining all point cloud points of the scene point cloud located within a predetermined range around the plurality of target sampling points, and selecting a predetermined number of the point cloud points closest to each sampling point as the target point set.
6. The method according to any one of claims 1-5, characterized in that the training images satisfy the following conditions: the training images contain the room surfaces; and the overlap between the training images under any two adjacent training perspective parameters is no less than thirty percent of the entire image size.
7. An indoor scene reconstruction device, characterized by comprising:
a collection module, configured to perform image collection indoors to obtain training images; and, based on structure-from-motion technology, obtain, from the training images, the training perspective parameters corresponding to each of the training images and build a scene point cloud, wherein the scene point cloud is a point cloud representing structural information of the indoor scene;
a sampling module, configured to select, on rays starting from the training perspective origin and passing through the position points, in the scene point cloud, corresponding to the pixels of the training image under the training perspective parameters, a plurality of the position points as training sampling points; and, based on semantic prediction technology, determine point cloud points of the scene point cloud located within a predetermined range around the plurality of training sampling points as a training point set, and obtain current input information of the training point set, wherein the input information comprises spatial position, neural features, and ray direction;
a rendering module, configured to input the current input information of the training point set into a current scene reconstruction model to obtain the color and light field intensity of the training point set output by the scene reconstruction model; obtain the color and light field intensity of the training sampling points by interpolation from the color and light field intensity of the training point set; and obtain a rendered image under the training perspective parameters by rendering from the color and light field intensity of the training sampling points; and
a correction module, configured to train and correct the current input information and the current scene reconstruction model according to the loss between the rendered image under the training perspective parameters and the training image under the training perspective parameters, until the trained input information and scene reconstruction model are obtained.
8. The device according to claim 7, characterized in that the sampling module is specifically configured to:
based on semantic prediction technology, obtain the semantic information of each pixel of the training image under the training perspective parameters and the semantic information of each point cloud point of the scene point cloud, wherein the semantic information of a pixel comprises an object category, and the semantic information of a point cloud point comprises an object category and a corresponding probability value; and
for each training sampling point, obtain all point cloud points of the scene point cloud located within a predetermined range around the training sampling point, compute the selection probability of each point cloud point within that range, and select a preset number of point cloud points in descending order of selection probability as the training point set, wherein computing the selection probability comprises: if the object category of the point cloud point is the same as the object category of the pixel corresponding to the training sampling point, the selection probability of the point cloud point is one; if the object category of the point cloud point differs from the object category of the pixel corresponding to the training sampling point, the selection probability of the point cloud point is one minus the probability value corresponding to the object category of the pixel corresponding to the point cloud point.
9. An electronic device, characterized by comprising: a processor, and a memory communicatively connected to the processor; wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of claims 1-6.
10. A computer-readable storage medium, characterized in that computer-executable instructions are stored in the computer-readable storage medium, and when executed by a processor, the computer-executable instructions are used to implement the method according to any one of claims 1-6.
CN202310544904.XA 2023-05-15 2023-05-15 Indoor scene reconstruction method, device, electronic equipment and media Pending CN116805349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310544904.XA CN116805349A (en) 2023-05-15 2023-05-15 Indoor scene reconstruction method, device, electronic equipment and media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310544904.XA CN116805349A (en) 2023-05-15 2023-05-15 Indoor scene reconstruction method, device, electronic equipment and media

Publications (1)

Publication Number Publication Date
CN116805349A true CN116805349A (en) 2023-09-26

Family

ID=88079196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310544904.XA Pending CN116805349A (en) 2023-05-15 2023-05-15 Indoor scene reconstruction method, device, electronic equipment and media

Country Status (1)

Country Link
CN (1) CN116805349A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691539A (en) * 2020-12-30 2022-07-01 大唐移动通信设备有限公司 Electronic equipment starting method and electronic equipment
CN117765171A (en) * 2023-12-12 2024-03-26 之江实验室 Three-dimensional model reconstruction method and device, storage medium and electronic equipment
CN119444955A (en) * 2025-01-09 2025-02-14 科大讯飞股份有限公司 Image rendering method, device, equipment and storage medium
CN119444955B (en) * 2025-01-09 2025-07-18 科大讯飞股份有限公司 Image rendering method, device, equipment and storage medium
CN119445006A (en) * 2025-01-13 2025-02-14 浪潮电子信息产业股份有限公司 Three-dimensional digital content generation method, device, system, equipment, medium and product
CN119445006B (en) * 2025-01-13 2025-04-25 浪潮电子信息产业股份有限公司 Three-dimensional digital content generation method, device, system, equipment, medium and product

Similar Documents

Publication Publication Date Title
CN116805349A (en) Indoor scene reconstruction method, device, electronic equipment and media
JP2022524891A (en) Image processing methods and equipment, electronic devices and computer programs
CN115115805B (en) Training method, device, equipment and storage medium for three-dimensional reconstruction model
CN113220251B (en) Object display method, device, electronic equipment and storage medium
US20240112394A1 (en) AI Methods for Transforming a Text Prompt into an Immersive Volumetric Photo or Video
US11361477B2 (en) Method for improved handling of texture data for texturing and other image processing tasks
CN116977522A (en) Rendering method and device of three-dimensional model, computer equipment and storage medium
CN115222917A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN117456128A (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN113763231A (en) Model generation method, image perspective determination method, device, device and medium
CN115082609B (en) Image rendering method, device, storage medium and electronic device
CN108043027B (en) Storage medium, electronic device, game screen display method and device
CN116228986A (en) Indoor scene illumination estimation method based on local-global completion strategy
CN115272575B (en) Image generation method and device, storage medium and electronic equipment
CN116664770A (en) Image processing method, storage medium and system for shooting entity
CN115526976A (en) Virtual scene rendering method and device, storage medium and electronic equipment
US20250037354A1 (en) Generalizable novel view synthesis guided by local attention mechanism
CN113658318A (en) Data processing method and system, training data generation method and electronic device
CN117834839A (en) Multi-view 3D intelligent imaging measurement system based on mobile terminal
CN117237573A (en) Method, device, equipment, storage medium and program product for generating object map
CN117274296A (en) Training method, device, equipment and storage medium of target following model
CN112257653A (en) Method and device for determining space decoration effect graph, storage medium and electronic equipment
Fei et al. PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models
CN116991296B (en) Object editing method and device, electronic equipment and storage medium
CN116681818B (en) New view angle reconstruction method, training method and device of new view angle reconstruction network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination