CN117596373A - Method and electronic device for information display based on dynamic digital human image
- Publication number
- CN117596373A (application CN202410069436.XA)
- Authority
- CN
- China
- Prior art keywords
- video
- real
- view
- angles
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/106—Processing image signals
- H04N13/243—Image signal generators using stereoscopic image cameras using three or more 2D image sensors
- H04N13/296—Synchronisation thereof; Control thereof
- H04N13/30—Image reproducers
Abstract
Description
Technical field
This application relates to the field of information processing technology, and in particular to a method and an electronic device for displaying information based on a dynamic digital human image.
Background
With the rapid development of technology, and in particular the maturing of computer vision, augmented reality (AR) and virtual reality (VR), technology that fuses the virtual and the real is attracting growing attention. By integrating real-world elements with computer-generated virtual elements, it offers users an immersive experience and opens up new application areas and opportunities for a range of industries. In entertainment, advertising and e-commerce especially, the rise of these technologies not only brings consumers new experiences but also opens new markets and marketing opportunities for businesses. For example, an e-commerce platform can offer the apparel industry functions such as a "model catwalk" or a "fitting room" that give consumers an immersive visual experience.
Realizing such functions, however, raises new technical and design challenges: for example, in the "model catwalk" scenario, how to provide a more realistic virtual environment and a more realistic "digital human". For the "digital human", the prior art typically uses one of the following approaches:
Approach 1, traditional 3D portrait modeling: a real model is scanned with high-precision 3D scanning equipment, and texture mapping and detail sculpting are then performed manually or semi-automatically to generate a human body model. This approach, however, usually requires expensive scanning equipment and highly skilled experts to process the scan data; going from scan to finished model takes a long time; and it is mainly suited to static modeling, with dynamic expressions and motion capture requiring extra work.
Approach 2, AI-based portrait synthesis: neural networks and large amounts of training data are used to directly synthesize a portrait or convert its viewpoint and pose. This approach, however, requires a large amount of annotated data for training; in some complex scenes and at some angles the generated results may not be realistic enough; and real-time use may demand high-performance hardware and heavy computation.
Approach 3, depth-image-based stereoscopic rendering: a color image and its corresponding depth image are used to generate images at new viewpoints. With the depth information, the method can estimate the 3D structure of the scene and render it from a new viewpoint. The output quality, however, depends heavily on the quality of the depth image, and low-quality or inaccurate depth information can introduce artifacts or distortion into the rendered image; moreover, acquiring depth images is relatively expensive, requiring dedicated hardware such as LiDAR, which adds cost and complexity.
Therefore, how to give users a more realistic visual experience at lower cost when displaying information through digital humans has become a technical problem to be solved by those skilled in the art.
Summary of the invention
This application provides a method and an electronic device for displaying information based on a dynamic digital human image, which can give users a more realistic visual experience at lower cost.
This application provides the following solutions:
A method for displaying information based on a dynamic digital human image, including:
obtaining video content from multiple real viewing angles, the video content being obtained by multiple camera devices synchronously shooting, from different viewing angles, the process of a real person displaying target clothing while wearing the target clothing;
extracting the person images contained in each of multiple video frames of the video content;
using image-based rendering technology, and according to pixel offset relationship information between the person images of adjacent real viewing angles at the same time point, generating person images at corresponding time points for multiple intermediate viewing angles between the adjacent real viewing angles;
generating a multi-view video from the person images of the multiple real viewing angles and the multiple intermediate viewing angles at multiple time points, so that a client can match the multi-view video into a preset virtual 3D space scene model, thereby presenting a dynamic digital human image displaying the target clothing in the virtual 3D space scene and providing an interactive effect that simulates continuous viewing-angle switching.
Wherein the video content of the multiple real viewing angles is obtained by multiple camera devices synchronously shooting, from different viewing angles, the process of a real person, wearing the target clothing, walking through a target space to display the target clothing.
Wherein generating person images at corresponding time points for multiple intermediate viewing angles between the adjacent real viewing angles includes:
using image-based rendering technology, and according to pixel offset relationship information between the person images of adjacent real viewing angles at the same time point, estimating the pixel positions of the multiple intermediate viewing angles between the adjacent real viewing angles at the corresponding time points, so as to generate person images of the intermediate viewing angles at multiple time points.
Wherein estimating the pixel positions of the multiple intermediate viewing angles between the adjacent real viewing angles at the corresponding time points includes:
taking the person images of adjacent real viewing angles at the same time point as input, fitting a dense optical flow field between the person images of the adjacent real viewing angles with a deep learning model, and using the dense optical flow field to estimate the person-image pixel positions of the multiple intermediate viewing angles between the adjacent real viewing angles at the corresponding time points.
Wherein the method further includes:
compressing the multi-view video for transmission to the client.
Wherein compressing the multi-view video includes:
splicing the video frames of the multiple viewing angles in units of time points to obtain a frame sequence formed of multiple combined frames, wherein the multiple video frames of the multiple viewing angles at a time point are divided into multiple sets, the video frames in each set are spliced into one combined frame, and the video frames of adjacent viewing angles are located at the same position in adjacent combined frames;
encoding the frame sequence formed of the multiple combined frames with a general-purpose video encoder, and applying inter-frame compression to the multiple combined frames.
Wherein the resolution of each combined frame is lower than the maximum resolution the terminal device can support.
Wherein the method further includes:
after the frame sequence formed of the multiple combined frames has been encoded and inter-frame compressed, slicing the frame sequence so that it is transmitted in units of the resulting segments and the receiving end can decode and play each segment independently.
Wherein encoding the frame sequence formed of the multiple combined frames further includes:
controlling the keyframe interval of the inter-frame encoding according to the number of combined frames included in each slice, so as to reduce the number of frames within the same slice that are encoded as keyframes.
Wherein encoding the frame sequence formed of the multiple combined frames further includes:
for combined frames other than keyframes, lowering the decision threshold for bidirectional reference frames so as to increase the number of frames within the same slice that are encoded as bidirectional reference frames.
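As an illustration of how these encoding controls might map onto a general-purpose encoder, the sketch below drives ffmpeg with libx265 from Python: the keyframe interval is pinned to the slice length so each segment contains at most one keyframe, and the stream is sliced into independently decodable segments. All file names and numeric parameters are assumptions, and the encoder's internal B-frame decision threshold is not directly exposed at this level, so raising the `bframes` cap only approximates the threshold adjustment described above.

```python
import subprocess

FPS = 25
FRAMES_PER_SLICE = 50  # hypothetical slice length: 2 seconds at 25 fps

subprocess.run([
    "ffmpeg", "-framerate", str(FPS), "-i", "combined_%05d.png",
    "-c:v", "libx265",
    # keyint == slice length: at most one keyframe per 2-second slice;
    # bframes raises the cap on consecutive bidirectional frames
    "-x265-params",
    f"keyint={FRAMES_PER_SLICE}:min-keyint={FRAMES_PER_SLICE}:bframes=8",
    # slice the stream into independently decodable 2-second segments (HLS)
    "-f", "hls", "-hls_time", "2", "-hls_playlist_type", "vod",
    "multiview.m3u8",
], check=True)
```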
A method for displaying information based on a dynamic digital human image, including:
in response to a viewing request initiated by a user, obtaining a multi-view video, the multi-view video being generated as follows: multiple camera devices synchronously shoot, from different viewing angles, the process of a real person displaying target clothing while wearing the target clothing, to obtain video content from multiple real viewing angles; the person images contained in the multiple video frames of that video content are extracted; image-based rendering technology is used to generate person images for multiple intermediate viewing angles between adjacent real viewing angles; and the multi-view video is generated from the person images of the multiple real viewing angles and the multiple intermediate viewing angles at multiple time points;
decoding the multi-view video;
matching the multi-view video into a preset virtual 3D space scene model to present a dynamic digital human image displaying the target clothing in the virtual 3D space scene;
in response to an interactive operation of continuous viewing-angle switching, switching to the person-image video content of other viewing angles to provide an interactive effect that simulates continuous viewing-angle switching.
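As a sketch of how a client might realize the continuous-switching interaction just described, the snippet below maps a horizontal swipe to a viewing-angle index. Everything here (the pixels-per-view constant, the 120-angle count, the function name) is a hypothetical illustration rather than part of the claimed method; the detailed description later mentions roughly 120 views at 3-degree spacing.

```python
def view_for_drag(current_view: int, drag_dx_px: float,
                  px_per_view: float = 12.0, num_views: int = 120) -> int:
    """Translate a horizontal drag distance into a viewing-angle index.

    With views spaced a few degrees apart (real plus synthesized intermediate
    angles), stepping through adjacent indices as the finger moves simulates
    continuous rotation around the person."""
    step = int(round(drag_dx_px / px_per_view))
    return (current_view + step) % num_views

# e.g. a 60 px swipe to the right from view 10 selects view 15
assert view_for_drag(10, 60.0) == 15
```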
A method for generating a dynamic digital human image, including:
obtaining video content from multiple real viewing angles, the video content being obtained by multiple camera devices synchronously shooting, from different viewing angles, the process of a real person performing a target action;
extracting the person images contained in each of multiple video frames of the video content;
using image-based rendering technology, and according to pixel offset relationship information between the person images of adjacent real viewing angles at the same time point, generating person images at corresponding time points for multiple intermediate viewing angles between the adjacent real viewing angles;
generating a multi-view video from the person images of the multiple real viewing angles and the multiple intermediate viewing angles at multiple time points, so that the corresponding dynamic digital human image can be displayed through the multi-view video.
A method for displaying clothing information based on a dynamic digital human image, including:
in response to a request to display target clothing through a dynamic digital human, obtaining a virtual 3D space scene model and a dynamic digital human image expressed in the form of a multi-view video, the multi-view video showing, from multiple viewing angles, the process of a target person displaying the target clothing while wearing the target clothing;
matching the multi-view video into the virtual 3D space scene model to present a dynamic digital human image displaying the target clothing in the virtual 3D space scene, and providing an interactive effect that simulates continuous viewing-angle switching.
An apparatus for displaying information based on a dynamic digital human image, including:
a video content obtaining unit, configured to obtain video content from multiple real viewing angles, the video content being obtained by multiple camera devices synchronously shooting, from different viewing angles, the process of a real person displaying target clothing while wearing the target clothing;
a person image extraction unit, configured to extract the person images contained in each of multiple video frames of the video content;
a viewing-angle synthesis unit, configured to use image-based rendering technology and, according to pixel offset relationship information between the person images of adjacent real viewing angles at the same time point, generate person images at corresponding time points for multiple intermediate viewing angles between the adjacent real viewing angles;
a multi-view video generation unit, configured to generate a multi-view video from the person images of the multiple real viewing angles and the multiple intermediate viewing angles at multiple time points, so that a client can match the multi-view video into a preset virtual 3D space scene model, thereby presenting a dynamic digital human image displaying the target clothing in the virtual 3D space scene and providing an interactive effect that simulates continuous viewing-angle switching.
An apparatus for displaying information based on a dynamic digital human image, including:
a multi-view video acquisition unit, configured to obtain a multi-view video in response to a viewing request initiated by a user, the multi-view video being generated as follows: multiple camera devices synchronously shoot, from different viewing angles, the process of a real person displaying target clothing while wearing the target clothing, to obtain video content from multiple real viewing angles; the person images contained in the multiple video frames of that video content are extracted; image-based rendering technology is used to generate person images for multiple intermediate viewing angles between adjacent real viewing angles; and the multi-view video is generated from the person images of the multiple real viewing angles and the multiple intermediate viewing angles at multiple time points;
a decoding unit, configured to decode the multi-view video;
an adding unit, configured to match the multi-view video into a preset virtual 3D space scene model to present a dynamic digital human image displaying the target clothing in the virtual 3D space scene;
a viewing-angle switching interaction unit, configured to respond to an interactive operation of continuous viewing-angle switching by switching to the person-image video content of other viewing angles, providing an interactive effect that simulates continuous viewing-angle switching.
An apparatus for generating a dynamic digital human image, including:
a video content obtaining unit, configured to obtain video content from multiple real viewing angles, the video content being obtained by multiple camera devices synchronously shooting, from different viewing angles, the process of a real person performing a target action;
a person image extraction unit, configured to extract the person images contained in each of multiple video frames of the video content;
a viewing-angle synthesis unit, configured to use image-based rendering technology and, according to pixel offset relationship information between the person images of adjacent real viewing angles at the same time point, generate person images at corresponding time points for multiple intermediate viewing angles between the adjacent real viewing angles;
a dynamic digital human asset generation unit, configured to generate a multi-view video from the person images of the multiple real viewing angles and the multiple intermediate viewing angles at multiple time points, so that the corresponding dynamic digital human image can be displayed through the multi-view video.
An apparatus for displaying clothing information based on a dynamic digital human image, including:
a request receiving unit, configured to, in response to a request to display target clothing through a dynamic digital human, obtain a virtual 3D space scene model and a dynamic digital human image expressed in the form of a multi-view video, the multi-view video showing, from multiple viewing angles, the process of a target person displaying the target clothing while wearing the target clothing;
a display unit, configured to match the multi-view video into the virtual 3D space scene model to present a dynamic digital human image displaying the target clothing in the virtual 3D space scene, and to provide an interactive effect that simulates continuous viewing-angle switching.
A computer-readable storage medium on which a computer program is stored, the program implementing the steps of any one of the foregoing methods when executed by a processor.
An electronic device, including:
one or more processors; and
a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the steps of any one of the foregoing methods.
According to the specific embodiments provided in this application, the application discloses the following technical effects:
With the embodiments of this application, when a function such as a "model show" needs to be provided to users, multiple camera devices can synchronously shoot, from different viewing angles, the process of a real person displaying target clothing while wearing it, yielding video content from multiple real viewing angles. The person images contained in the multiple video frames of that video content can then be extracted, and image-based rendering technology can be used to generate, according to pixel offset relationship information between the person images of adjacent real viewing angles at the same time point, person images at corresponding time points for multiple intermediate viewing angles between those adjacent real viewing angles. A multi-view video can then be generated from the person images of the multiple real viewing angles and the multiple intermediate viewing angles at multiple time points, so that the client can add the person-image video content of one viewing angle to a preset virtual 3D space scene model for rendering and display, presenting a dynamic digital human displaying the target clothing in the virtual 3D space scene and providing an interactive effect that simulates continuous viewing-angle switching. In this way there is no need to explicitly model the model character: a 3D-like, real-person digital human effect can be achieved from multi-view footage alone, avoiding complex modeling pipelines and high modeling costs and yielding a low-cost, high-fidelity, real-time interactive dynamic digital human. Moreover, because the real-person digital human asset created in the embodiments of this application is an ordinary video in a format such as HEVC, it can be parsed by most mobile devices, avoiding complex rendering pipelines and heavy computation.
In addition, for the multiple pieces of video content corresponding to the multiple viewing angles, the video frames of the multiple viewing angles can be spliced in units of time points during video encoding, yielding a frame sequence formed of multiple combined frames: the multiple video frames of the multiple viewing angles at a time point are divided into multiple sets, the video frames in each set are spliced into one combined frame, and the video frames of adjacent viewing angles are located at the same position in adjacent combined frames. A general-purpose video encoder can then encode the frame sequence formed of the multiple combined frames, and inter-frame compression of the combined frames eliminates or reduces the redundant information between video frames of adjacent viewing angles. Because the video frames of the multiple viewing angles are grouped before splicing, the resolution of each spliced combined frame is not excessively high, which makes real-time decoding feasible on most terminal devices. Furthermore, because the grouping and arrangement are controlled so that the video frames of adjacent viewing angles sit at the same position in adjacent combined frames (that is, in different but adjacent combined frames, at identical positions within them), and frames of adjacent viewing angles are highly similar, the adjacent combined frames spliced this way are themselves highly similar. A general-purpose inter-frame compression algorithm can therefore achieve a high compression rate by eliminating or reducing the redundancy between video frames of adjacent viewing angles. In other words, in the embodiments of this application an ideal compression rate can be obtained with a general-purpose video encoder, and correspondingly the decoding end can decode with a general-purpose decoder, so the scheme can be supported on more terminal devices.
Of course, a product implementing this application does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the embodiments of this application or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic diagram of the system architecture provided by an embodiment of this application;
Figure 2 is a flow chart of the first method provided by an embodiment of this application;
Figure 3 is a schematic diagram of the frame rearrangement scheme provided by an embodiment of this application;
Figure 4 is a flow chart of the second method provided by an embodiment of this application;
Figure 5 is a flow chart of the third method provided by an embodiment of this application;
Figure 6 is a flow chart of the fourth method provided by an embodiment of this application;
Figure 7 is a schematic diagram of the electronic device provided by an embodiment of this application.
Detailed description of embodiments
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application fall within the scope of protection of this application.
First, it should be noted that the embodiments of this application mainly target scenarios such as a "model show" provided to users in application systems such as product information services. The effect to be achieved in such a scenario is typically that the user can watch, through a client, a "digital human" wearing a given piece of clothing walk through a spatial scene or perform display actions, usually with an interactive effect of continuous viewing-angle switching. Here a digital human (Digital Human / Meta Human) is a digitized character, close to a human likeness, created with digital technology. In other words, while viewing such a "model show" interface on a terminal device such as a mobile phone, the user can swipe on the screen to switch to an "arbitrary" viewing angle, as if attending the catwalk in person and walking to different positions to watch the models.
In the prior art, some product information service systems also offer "model show" products, but they mainly generate a "3D digital human" through modeling and then fit a 3D clothing model onto it to simulate a real person walking a catwalk. Interactive effects such as multi-angle viewing can be provided, but because the "3D digital human" is generated by 3D modeling it tends to look noticeably "cartoonish" and insufficiently realistic, and its walking and posture can appear unnatural; in addition, a 3D clothing model struggles to fully reproduce the characteristics of the clothing, and so on.
In the embodiments of this application, in order to give users a more realistic visual experience in scenarios such as the "model show" and "fitting room" of a product information service system, so that users see a realistic human figure rather than a cartoon image that lacks comparative realism, an implementation is provided that uses a "real-person digital human" to deliver functions such as the "model show".
Specifically, multi-view video can replace the 3D modeling of the prior art: videos from multiple viewing angles can be obtained by multi-camera shooting and then used to provide the "model show" function, making both the characters and the state of the clothing in the show more realistic. And because the asset actually produced exists as multi-view video rather than a model, it can be rendered and displayed on more terminal devices.
Implementing the "model show" with such multi-view video still leaves the following problem. As described above, the "model show" function usually needs to give users an interactive effect of continuous viewing-angle switching, for example letting the user switch angles continuously by swiping the screen. In the embodiments of this application, however, the "digital human" is represented by multi-view video whose viewing angles are discrete, so during display on the client a viewing-angle switch can only jump between those fixed angles, for example from angle 1 to angle 2; if the angular distance between them is large, the user may perceive an obvious jump. In theory, with enough densely placed camera devices capturing video from more viewing angles, playback could to some extent simulate continuous switching even between discrete angles, but more camera devices means higher cost.
Therefore, to simulate continuous viewing-angle switching at lower cost, the embodiments of this application also adopt image-based rendering technology, which uses the images actually captured at two adjacent viewing angles to estimate pixel positions at multiple intermediate viewing angles and generate the intermediate-angle images. That is, rather than actually increasing the number of camera devices, image-based rendering supplements the images at the intermediate angles where no camera is deployed, producing image content for more viewing angles and thus better simulating continuous switching.
Of course, a concrete implementation may also need to offer a variety of scenes for the "model show" function, for example indoor, outdoor, or technology-themed scenes. Building these scenes directly in real space could still be expensive, and the same model would have to repeat the catwalk and be re-shot in each scene, so the time cost would also be high.
For this reason, in the embodiments of this application the generation of the real-person digital human and the creation of the scene can proceed independently. After multi-view video content is captured by the camera devices, the model character can be "matted out" of the video, and subsequent steps such as intermediate-angle image synthesis can operate on the matted person images. A variety of 3D virtual space scene models can also be provided, and at display time the client can place the multi-view person-image video into a 3D virtual space scene model. In this way, the same multi-view person-image video can be shown in different 3D virtual space scene models.
In addition, because the embodiments of this application involve multi-view video, and because simulating continuous viewing-angle switching may require a very large number of viewing angles (including real angles and algorithmically supplemented intermediate angles; for example, one angle every 3 or 5 degrees gives 120 or 72 angles in total), the compression and transmission of such multi-view video also need to be considered. The embodiments of this application provide a corresponding solution to this problem, described in detail later.
From a system architecture perspective, referring to Figure 1, the embodiments of this application can involve a capture end, a server end, and a client. At the capture end, multiple camera devices shoot processes such as a model's catwalk to obtain video content from multiple real viewing angles. At the server end, portrait matting, intermediate-angle image synthesis, multi-view video compression and other processing can be performed. The compressed multi-view video can be transmitted to the client, with slicing applied during transmission to shorten the client's waiting latency. The client can download the multi-view video, render the 3D space scene model, render the "digital human" (the person-image video content of a default viewing angle) from the multi-view video, and add the "digital human" to the 3D space scene model; in this process it can also apply light-and-shadow fusion, for example making shadows consistent across viewing angles, to improve the display. During display, the client can respond to the user's viewing-angle switching operations by switching to the video content of adjacent angles, simulating the effect of continuous switching, and can also respond to zoom operations performed by the user, and so on.
The specific implementations provided by the embodiments of this application are described in detail below.
Embodiment 1
First, from the perspective of the server, Embodiment 1 provides a method for displaying information based on a dynamic digital human image. Referring to Figure 2, the method may specifically include:
S201: obtaining video content from multiple real viewing angles, the video content being obtained by multiple camera devices synchronously shooting, from different viewing angles, the process of a real person displaying target clothing while wearing the target clothing.
A real viewing angle is an angle at which a camera device is actually deployed for shooting. In a concrete implementation of the "model show" function, multiple camera devices can first be arranged 180/360 degrees around the real person (the real model) in a real space such as a studio, ensuring that every angle is captured clearly; with the real model wearing the clothing to be displayed, all camera devices shoot synchronously so that the person's movements and postures are captured at the same moments. The model can then walk through the space or perform display actions while wearing the clothing, and the multiple camera devices shoot the process synchronously, yielding video content from multiple viewing angles. In the embodiments of this application the number of camera devices need not be large, for example on the order of a dozen or so.
S202: extracting the person images contained in each of the multiple video frames of the video content.
After the multi-view video content is obtained, a "digital human" needs to be generated so that it can be fused with multiple different scene models, so the person images contained in the video frames of the multiple viewing angles are first extracted. Specifically, each viewing angle corresponds to one piece of video content, each piece of video content includes multiple video frames, and each video frame can contain a person image; "matting" technology can be used to cut the person image out. In this way, multiple person images of the same person can be extracted for each viewing angle, where "multiple person images" means the person images cut out of that angle's video frames at multiple different time points. Because video frames carry time information, the matted person images also carry corresponding time-point information, so they can be arranged by time point to form a person-image sequence, one sequence per viewing angle.
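As a sketch of this matting step, the snippet below cuts the person out of every frame of one viewing angle's video and saves a time-ordered image sequence with a transparent background. The use of OpenCV for decoding and the open-source rembg library for matting is an assumption for illustration; the patent does not name specific tools.

```python
from pathlib import Path

import cv2
from PIL import Image
from rembg import remove  # one open-source matting tool; an assumed choice

def extract_person_sequence(video_path: str, out_dir: str) -> None:
    """Cut the person out of every frame of one view's video, preserving
    the frames' time ordering as a PNG sequence with an alpha channel."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        person_rgba = remove(Image.fromarray(frame_rgb))  # transparent background
        person_rgba.save(f"{out_dir}/person_{idx:05d}.png")
        idx += 1
    cap.release()
```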
S203: using image-based rendering technology, and according to pixel offset relationship information between the person images of adjacent real viewing angles at the same time point, generating person images at corresponding time points for multiple intermediate viewing angles between the adjacent real viewing angles.
After the person-image sequence for each viewing angle is obtained, person-image sequences at more viewing angles can be synthesized by interpolation, to better simulate the effect of continuous viewing-angle switching. Specifically, the view synthesis can be performed between every two adjacent real viewing angles. For example, with 10 real viewing angles in total and angle 1 adjacent to angle 2, person images can be synthesized for multiple intermediate angles between angle 1 and angle 2; that is, estimating what the captured person image would look like if a camera device were placed at the intermediate angle. Image synthesis can likewise be performed for the intermediate angles between angle 2 and angle 3, between angle 3 and angle 4, and so on, so that images at multiple intermediate angles can be estimated between every two adjacent real viewing angles.
The spacing of the intermediate viewing angles can be chosen according to actual needs. For example, testing shows that with one viewing angle every 3 degrees, the user will not perceive an obvious jump when switching between angles, so the intermediate-angle spacing can be set to 3 degrees; of course, the spacing can be made smaller to simulate continuous switching even better, or larger to avoid an excessive transmission bit rate, and so on.
After the intermediate-angle spacing is determined, the pixel positions of each intermediate angle can be estimated from the pixel positions of the person images of the two adjacent viewing angles together with the pixel offset relationship information, and the person image at that intermediate angle can then be synthesized. Because each viewing angle has multiple person images corresponding to different time points, the intermediate-angle image synthesis can also proceed in units of time points; that is, for each intermediate angle, person images corresponding to multiple time points are generated, so that each intermediate angle also corresponds to a person-image sequence.
In a concrete implementation, there are several ways to estimate the pixel positions of an intermediate angle from the person images of the two adjacent viewing angles. For example, in one approach, the person images of adjacent real viewing angles at the same time point can be taken as input, a deep learning model can fit the dense optical flow field between those person images, and that dense optical flow field can then be used to estimate the pixel positions of the multiple intermediate viewing angles at the corresponding time points. Here, optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observed imaging plane; the optical flow method uses the changes of pixels in an image sequence over time and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby compute the motion information of objects between adjacent frames. In space, motion can be described by a motion field; on an image plane, the motion of objects is reflected in the differing gray-level distributions of successive images, so the motion field in space, transferred onto the image, appears as an optical flow field. The optical flow field is a two-dimensional vector field that reflects the gray-level trend at every point of the image, and can be seen as the instantaneous velocity field produced by gray-valued pixels moving over the image plane; the information it carries is the instantaneous motion velocity vector of each image point. Dense optical flow is an image registration method that matches an image, or a designated region of it, point by point: it computes the offset of every point on the image to form a dense optical flow field, through which pixel-level image registration can be performed. Therefore, in the embodiments of this application, this optical flow field information can be estimated first; then, based on the optical flow field, the person-image pixel positions of each intermediate viewing angle at the corresponding time points can be estimated, allowing the corresponding person images to be generated for the intermediate angles.
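The following sketch illustrates the warping idea: given the person images of two adjacent real viewing angles at the same time point, a dense flow field is computed and an intermediate-angle image is synthesized by displacing pixels a fraction of the way along the flow. The patent fits the dense flow with a deep learning model; here OpenCV's classical Farneback flow stands in purely for illustration, and the simple one-sided warp omits refinements (occlusion handling, bidirectional blending) a production system would need.

```python
import cv2
import numpy as np

def synthesize_intermediate(img_a: np.ndarray, img_b: np.ndarray,
                            t: float) -> np.ndarray:
    """Warp img_a toward img_b by fraction t of the dense flow field."""
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    # Dense per-pixel flow from view A to view B (classical stand-in for
    # the deep-learning flow model described in the text)
    flow = cv2.calcOpticalFlowFarneback(
        gray_a, gray_b, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Sample img_a at positions displaced by a fraction t of the flow
    map_x = (grid_x + t * flow[..., 0]).astype(np.float32)
    map_y = (grid_y + t * flow[..., 1]).astype(np.float32)
    return cv2.remap(img_a, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# e.g. two intermediate views at 1/3 and 2/3 between adjacent real views:
# mid_1 = synthesize_intermediate(view1_img, view2_img, 1 / 3)
# mid_2 = synthesize_intermediate(view1_img, view2_img, 2 / 3)
```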
S204: generating a multi-view video from the person images of the multiple real viewing angles and the multiple intermediate viewing angles at multiple time points, so that a client can match the multi-view video into a preset virtual 3D space scene model for rendering and display, presenting a dynamic digital human image displaying the target clothing in the virtual 3D space scene and providing an interactive effect that simulates continuous viewing-angle switching.
After the person-image sequences of the multiple intermediate viewing angles are obtained, the sequences of person images at multiple time points for the multiple real viewing angles and the multiple intermediate viewing angles can each be organized into video format, yielding the multi-view video. That is, in the embodiments of this application the multi-view video can include videos corresponding to multiple real viewing angles as well as videos corresponding to multiple intermediate viewing angles. Such a multi-view video can express a "digital human" that is generated by shooting a real person, matting, and intermediate-angle synthesis rather than by 3D modeling; therefore, while retaining the 3D effect, it makes the "digital human" look more real and less cartoonish, and achieves a more realistic and natural display of goods such as clothing.
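A minimal sketch of packing one viewing angle's time-ordered person images into a video track follows. The codec, frame rate, and handling of transparency (which in practice might be carried as a separately packed alpha channel) are assumptions for illustration.

```python
import cv2

def write_view_video(frames, out_path: str, fps: int = 25) -> None:
    """Write one viewing angle's time-ordered person images as a video file."""
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    for frame in frames:  # BGR frames, all the same size
        writer.write(frame)
    writer.release()
```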
After the multi-view video is obtained in the above way, it can be compression-encoded and provided to the client for display. At display time the client decodes the multi-view video; because matting has already been applied, the video can have a transparent background and similar properties, so it can be placed into a pre-generated 3D space scene model, producing the visual effect of the "digital human" standing inside that 3D space scene model.
When performing the compression encoding, the embodiments of this application can apply special handling because multi-view video is involved. Compared with ordinary video, multi-view video typically has the following characteristics. Resolution: a multi-view video may include video data for dozens or even hundreds of viewing angles; if every angle is high-definition, the overall video data become enormous. With 120 viewing angles, the overall resolution would exceed 32K or more, a heavy load for most devices. Bit rate: high resolution means a higher video bit rate, making real-time transmission and smooth playback harder; an ordinary 720P video may run at 2-5 Mbps, but a multi-view video's bit rate can be tens or even hundreds of times higher. Size: a multi-view video contains data for many viewing angles, so the file size grows rapidly; one hour of multi-view video may require tens or even hundreds of GB of storage. These technical challenges make the storage, compression and transmission of multi-view video especially difficult.
In the prior art there are some schemes for compressing and transmitting multi-view video. For example:
Approach 1: simple splicing of the multi-view video, that is, stitching the images of all viewing angles at the same time point into a single frame before compression and transmission. However, this makes the resolution of the spliced video too high to decode and play in real time, and hard to transmit.
Approach 2: transmission via streaming media. On the one hand the compression efficiency is not ideal; on the other hand, to let the client switch viewing angles, the video of each angle must be sliced separately before streaming. When the user needs to switch from angle A to angle B at some moment, the data of angle B for the corresponding time slice can be pulled and played; however, the stream-switching latency of slicing is fairly large, and whether switching is smooth also depends on the slice size, because the previous slice must finish playing before the next slice of the next angle can be played.
Approach 3: video coding with a codec designed specifically for multi-view video. This offers better compression performance but increases the complexity of encoding and decoding and demands more computing power. Moreover, because MV-HEVC is a relatively new standard that requires a dedicated decoder, not all devices support this format; ordinary terminal devices such as phones or computers usually cannot, and even when they can, opening and playing the video may stutter.
In view of the above, embodiments of this application also provide solutions for the storage and compression of multi-view video. Specifically, before encoding a given multi-view video, the video content of the multiple views is first stitched together, i.e., the content of different views at the same time point is combined. However, rather than simply tiling the content of all views into one frame, the video frames of the multiple views at a given time point are grouped into multiple sets, and the frames within one set are stitched into a single frame (for ease of distinction, a frame obtained this way may be called a "combined frame"). In other words, a single time point can correspond to several different combined frames, which keeps the resolution of each combined frame from becoming too high.
When grouping the per-view video frames, the number of groups and the number of views per group can be determined from information such as the number of views, the resolution of a single frame per view, and the maximum resolution the terminal device can support, so that the resolution of each combined frame stays below that maximum. For example, suppose there are 72 views in total and each view's frames are 720P; most terminal devices currently on the market can decode 4K resolution in real time, so the 72 views can be divided into 12 groups. Each combined frame then holds the frames of 6 views at the same time point, giving a total pixel count of six 720P tiles, close to that of a typical 4K image, which most terminal devices can decode in real time.
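A minimal sketch of this sizing step follows. The 4K budget of 3840×2160 pixels and the 2-wide-by-3-tall tile grid (matching the three-rows-by-two-columns layout described below) are assumptions for illustration; the function and variable names are ours:

```python
import math

def plan_groups(num_views, tile_w, tile_h, max_w, max_h, cols, rows):
    """Pick the number of groups so each combined frame fits the decoder budget."""
    per_frame = cols * rows  # views tiled into one combined frame
    assert cols * tile_w <= max_w and rows * tile_h <= max_h, "tile grid too big"
    num_groups = math.ceil(num_views / per_frame)
    return num_groups, per_frame

# 72 views of 1280x720 tiles against a 4K (3840x2160) decode budget
groups, per_frame = plan_groups(72, 1280, 720, 3840, 2160, cols=2, rows=3)
print(groups, per_frame)   # -> 12 groups, 6 views per combined frame
```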
After the number of groups is determined, to achieve higher compression efficiency during encoding, the way views are grouped and the way they are arranged within a combined frame can also be decided. Many grouping and arrangement schemes are possible. In the simplest scheme, views 1 to 6 form one group, views 7 to 12 the next, and so on; a combined frame can be divided into 3×2 blocks (three rows by two columns), and the views placed into these blocks in numerical order. However, video frames captured at the same time point from different views tend to have highly similar content, and adjacent views are the most similar of all. From an information-coding perspective, the highly similar content of adjacent views carries a large amount of redundant information, which is exactly the kind of content that can be compressed away during encoding. In other words, the presence of redundancy helps compression efficiency, so if the encoding process can fully exploit this redundancy, compression efficiency improves substantially.
In video encoding, compression techniques fall into two categories: intra-frame and inter-frame. Intra-frame compression works in the spatial domain (the spatial XY plane) and mainly exploits similarity within a single frame's data; inter-frame compression exploits redundancy between different frames of a video sequence, such as the similarity between consecutive frames, reducing the data volume through prediction. Inter-frame compression can generally achieve higher compression ratios than intra-frame compression.
However, with the simple grouping and arrangement described in the example above, the frames of adjacent views, which carry the highest redundancy, end up inside the same combined frame. When that combined frame is compressed, this redundancy can only be used by intra-frame compression and cannot be fully exploited by inter-frame coding.
For this reason, embodiments of this application provide a better grouping and arrangement: the frames of adjacent views are placed at the same position of adjacent combined frames. That is, adjacent views' frames are assigned to different but adjacent groups and occupy the same tile position in the corresponding adjacent combined frames.
For example, suppose there are 36 views in total (the number is reduced for ease of description), divided into 6 groups with 6 views' frames per group; that is, every 6 views' frames are stitched into one combined frame. As shown in Figure 3, assume each combined frame contains 3×2 blocks, each block holding one view's frame, with block positions numbered 0, 1, 2, 3, 4, 5; denote the 36 views A1, A2, A3 ... A36. Figure 3 shows that views A1, A2, A3, A4, A5, A6 occupy position 0 of combined frames 1 to 6 respectively, views A7, A8, A9, A10, A11, A12 occupy position 1 of combined frames 1 to 6, and so on. In other words, views A1, A7, A13, A19, A25, A31 form the first group, stitched into combined frame 1; A2, A8, A14, A20, A26, A32 form the second group, stitched into combined frame 2; and so on. Within each combined frame, the view numbers form an arithmetic sequence whose common difference equals the number of groups, 6 in this example. This way, at the same position of adjacent combined frames the views are also adjacent, so the image content at the same position of adjacent combined frames is highly similar; during inter-frame compression coding, the redundancy arising from the high content similarity of adjacent views can be fully exploited, which helps achieve a higher compression rate.
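A minimal sketch of this assignment rule, with 1-indexed view and combined-frame numbering as in the example above (the function and variable names are ours):

```python
def tile_of_view(view, num_groups):
    """Map a 1-indexed view number to (combined-frame index, tile position).

    Adjacent views (view, view+1) land at the *same* tile position of
    *adjacent* combined frames, so inter-frame coding sees highly similar
    content at each position.
    """
    frame_idx = (view - 1) % num_groups + 1  # which combined frame (1..num_groups)
    position = (view - 1) // num_groups      # tile slot 0..views_per_frame-1
    return frame_idx, position

# 36 views, 6 groups: A1 -> frame 1 slot 0, A7 -> frame 1 slot 1, A2 -> frame 2 slot 0
for v in (1, 2, 7, 13, 36):
    print(v, tile_of_view(v, num_groups=6))
```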
Of course, the above example shows the stitching of the views' frames at only one time point; the frames at other time points are grouped and arranged in the same way, so 6 combined frames are stitched at every time point. After stitching is completed at each time point in this manner, the resulting combined frames are arranged into a frame sequence. For example, writing each combined frame as "combined frame mn", where m is the time-point number and n is the number of the combined frame within that time point, the frame sequence can be: (combined frame 11, combined frame 12, combined frame 13, combined frame 14, combined frame 15, combined frame 16, combined frame 21, combined frame 22, combined frame 23, combined frame 24, combined frame 25, combined frame 26, combined frame 31, combined frame 32 ...).
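The encoder-input ordering can be sketched as follows; the time-major, group-minor nesting mirrors the sequence just listed:

```python
def combined_frame_sequence(num_time_points, num_groups):
    """Yield (time_point, frame_idx) pairs in the order fed to the encoder."""
    for t in range(1, num_time_points + 1):   # time-major ...
        for n in range(1, num_groups + 1):    # ... group-minor
            yield (t, n)

# (1,1) (1,2) ... (1,6) (2,1) (2,2) ... i.e. combined frames 11, 12, ..., 16, 21, 22, ...
print(list(combined_frame_sequence(num_time_points=3, num_groups=6)))
```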
Through the above grouping and arrangement, the frames of adjacent views are dispersed into different but adjacent combined frames and sit at the same position of those adjacent frames. Because the frames of adjacent views are usually highly similar, adjacent combined frames end up highly similar, at least between the frames at each position; that is, they carry a large amount of redundant information, which is precisely what inter-frame coding can compress and optimize away. Therefore, when encoding the combined frames, inter-frame coding can fully exploit the redundancy between adjacent views' frames to obtain higher compression efficiency. Moreover, because this high compression rate is achieved through inter-frame coding, a capability that general-purpose video encoders already have, encoding can be done with a general-purpose encoder without depending on an encoder dedicated to multi-view coding; correspondingly, a general-purpose video decoder suffices on the playback side, which further enables decoding and playback on more terminal devices.
In other words, embodiments of this application group and stitch the multi-view video frames; compared with stitching all views directly into one frame, this reduces the resolution of each combined frame and lowers the decoding load on terminal devices. In addition, the arrangement of views across combined frames is specially designed so that the content similarity between different combined frames is relatively high; inter-frame compression over the combined frames can then eliminate or reduce the redundancy between adjacent views' frames and achieve a high compression rate, so that a general-purpose encoder, for example HEVC (High Efficiency Video Coding), can complete the encoding. Correspondingly, a general-purpose decoder can be used for decoding, so the video can be decoded and played on most terminal devices, avoiding complex rendering pipelines, heavy computational overhead, and dependence on dedicated codecs.
After the frame sequence formed by the combined frames has been encoded with inter-frame compression, it can be transmitted to the client for decoding and display. In an optional implementation, the frame sequence can additionally be sliced before transmission, so that the segments obtained by slicing are transmitted as units and the receiving end can decode and play each segment independently. The receiving end then only needs the first segment to begin decoding and playback, rather than waiting for the entire frame sequence to arrive, which shortens the waiting delay.
The slice duration can be chosen according to actual needs; smaller slices mean lower latency at the receiving end. For example, each slice may last 1 s, or 0.5 s, and so on. In embodiments of this application, because the views' frames have been grouped and stitched, once the slice duration is determined, the number of combined frames per slice can be derived from the playback frame rate. Continuing with the example of 72 views divided into 12 groups, each time point corresponds to 12 combined frames; assume further that the playback frame rate is 30 frames/s and the slice duration is 1 s. Each segment must then contain 30×72/6 = 360 combined frames. That is, the combined frames in each segment must cover the number of frames the player needs within 1 s: during playback, the player decodes the combined frames and selects the frames of one particular view to play, and the 30 frames played within 1 s are normally 30 frames of the same view, but any view might be the one being played. So, with 1-second segments, every view must have 30 frames inside the same segment. With 72 views that is 30×72 view frames, and since these frames are grouped and stitched into combined frames of 6 views each, the number of combined frames is 30×72/6 = 360. Under the same assumptions, changing the slice duration to 0.5 s would give 180 combined frames per segment, and so on.
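The same bookkeeping as a small sketch, using the values of the running example:

```python
def combined_frames_per_slice(fps, num_views, views_per_frame, slice_seconds):
    """Every view must have fps*slice_seconds frames inside one segment."""
    view_frames = fps * slice_seconds * num_views
    return int(view_frames / views_per_frame)

print(combined_frames_per_slice(30, 72, 6, 1.0))   # 360
print(combined_frames_per_slice(30, 72, 6, 0.5))   # 180
```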
In addition, when such segmented transmission is used, the compression rate can be further improved during inter-frame coding by controlling the number of key frames and bidirectional reference frames. Specifically, the encoder encodes multiple images into successive GOPs (Groups of Pictures); during playback, the decoder reads GOP after GOP, decodes them, reads the pictures, and renders them for display. A GOP is a group of consecutive pictures consisting of one I frame and several B/P frames; it is the basic access unit for video encoders and decoders, and its arrangement repeats until the end of the video. An I frame is an intra-coded frame (also called a key frame), a P frame is a forward-predicted frame (forward reference frame), and a B frame is a bidirectionally interpolated frame (bidirectional reference frame). An I frame is usually a complete picture, whereas P and B frames record changes relative to it: P and B frames contain no complete picture data, a P frame holds only the differences from the previous frame, and a B frame records the differences between the current frame and both its neighbors. A B frame needs to record comparatively little information and therefore usually achieves the highest compression. The fewer I frames and the more B frames a GOP contains, the higher the overall compression rate.
In practice, which frames are encoded as I, P, or B frames is normally decided by the encoder according to its algorithm. In embodiments of this application, to further control the video's compression rate, the encoder can be intervened upon to reduce the number of I frames and increase the number of B frames. In a specific implementation, the key-frame interval of the inter-frame encoding process can be controlled according to the number of combined frames per slice, so that fewer frames in the same slice are encoded as key frames. For example, if each slice contains 360 combined frames, the key-frame interval can be set to 360 or 180 frames, so that only one or two frames per slice are encoded as I frames. In addition, for combined frames other than key frames, the decision threshold for bidirectional reference frames can be lowered to increase the number of frames in the same slice that are encoded as bidirectional reference frames. That is, for B frames, the encoder normally computes the similarity between the current frame and its neighbors and compares it against a threshold to decide whether the current frame can be encoded as a B frame; in embodiments of this application, this threshold can be lowered so that more frames are encoded as B frames, improving the compression rate.
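One hedged illustration of such encoder intervention follows. It assumes an ffmpeg/x265 toolchain, which this application does not prescribe: `-g` sets the key-frame interval, and the x265 `bframes`/`b-adapt` parameters govern how aggressively B frames are used; the file names and parameter values are placeholders:

```python
import subprocess

def encode_slice(src, dst, keyint=360, max_bframes=8):
    """Encode one slice of combined frames with a long key-frame interval
    and generous B-frame usage (illustrative parameter choices)."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx265",
        "-g", str(keyint),  # 360-frame slices -> at most one I frame per slice
        "-x265-params", f"bframes={max_bframes}:b-adapt=2",
        dst,
    ], check=True)

encode_slice("slice_000.mp4", "slice_000_hevc.mp4")
```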
It should be noted that decoding P and B frames depends on the I frame, and decoding a B frame further depends on its preceding and following frames. In theory, then, fewer I frames and more B frames give better compression but may affect image quality at decode time. In embodiments of this application, however, because the multi-view frames are grouped and stitched with adjacent views placed at the same position of adjacent combined frames, every two adjacent combined frames are highly similar; under that premise, controlling the numbers of I and B frames in the above manner normally does not affect the image quality at the decoding end. Testing shows that, compared with a scheme that encodes after simple stitching, the solution provided by embodiments of this application clearly reduces both resolution and bit rate, while PSNR (Peak Signal-to-Noise Ratio, the ratio of a signal's maximum possible power to the power of the corrupting noise that affects its representation fidelity, one of the indicators of image quality) actually improves, as shown in Table 1:
Table 1
Of course, in practical applications, if higher picture quality is required, the number of I frames can be increased appropriately and the number of B frames reduced; for example, each slice may contain 2 or more I frames, and so on.
The compression and transmission of multi-view video have been introduced in detail above. In practical applications, suppose a user initiates access to the "model show" through the client: the specific multi-view video can be delivered to the client, which downloads it and decodes it with a general-purpose video decoder. Afterwards, the frames of one of the views (one view can usually be set as the default) are added to the preset 3D spatial scene model. The 3D spatial scene model can be generated in advance through 3D modeling; that is, the client renders the 3D spatial scene model and the multi-view video separately and adds the video content of one view into the 3D spatial scene model for display.
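A minimal client-side sketch of selecting one view out of a decoded combined frame follows. The 2-column-by-3-row tile layout carries over from the earlier sizing sketch, `tile_of_view` is reused from the assignment sketch above, and `decoded_frames` is an assumed mapping from combined-frame index to a decoded height×width×4 RGBA array:

```python
import numpy as np

TILE_W, TILE_H = 1280, 720   # per-view resolution
COLS = 2                     # tile grid inside one combined frame: 2 wide, 3 tall

def crop_view(decoded: np.ndarray, position: int) -> np.ndarray:
    """Cut the tile at `position` (0..5, row-major) out of a decoded combined frame."""
    row, col = divmod(position, COLS)
    y, x = row * TILE_H, col * TILE_W
    return decoded[y:y + TILE_H, x:x + TILE_W]

# view A13 with 6 groups -> combined frame 1, tile position 2 (see tile_of_view above)
frame_idx, position = tile_of_view(13, num_groups=6)
tile = crop_view(decoded_frames[frame_idx], position)  # RGBA frame of view A13
```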
In a specific implementation, to achieve a better mutual-fusion effect between the video content and the 3D spatial scene model, the scene view and the character view can be rendered synchronously and switched synchronously when the view changes; in addition, light-and-shadow fusion between the character and the scene can be implemented, and so on. For example, lighting information can be added to the 3D spatial scene, the character's shadow position, light positions, and so forth computed from that lighting information, and the shadow and light information then added into the 3D spatial scene, so that the character blends into the scene better.
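One common way to realize such a shadow, sketched here under the assumptions of a single directional light and a flat ground plane (this application does not fix the technique; the planar projection and all names below are ours):

```python
import numpy as np

def project_to_ground(point: np.ndarray, light_dir: np.ndarray) -> np.ndarray:
    """Project a 3D point onto the ground plane y=0 along the light direction,
    giving the position of its shadow under a directional light."""
    t = point[1] / light_dir[1]   # parameter at which the light ray reaches y=0
    return point - t * light_dir

# top of the digital human's billboard at (0, 1.8, 0), light slanting down
shadow = project_to_ground(np.array([0.0, 1.8, 0.0]),
                           np.array([0.3, -1.0, 0.2]))
print(shadow)   # shadow lands at roughly (0.54, 0.0, 0.36)
```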
In short, through embodiments of this application, when functions such as a "model show" need to be provided to users, multiple camera devices can synchronously shoot, from different views, the process of a real person displaying target clothing while wearing it, obtaining video content for multiple real views. The character images contained in the video frames of that content are then extracted, and image-based rendering is used to generate, from the pixel-offset relationship information between the character images of adjacent real views at the same time point, character images at the corresponding time points for multiple intermediate views between those adjacent real views. A multi-view video can then be generated from the character images of the multiple real views and the multiple intermediate views at multiple time points, so that on the client the character-image video content of one view is added into a preset virtual 3D spatial scene model for rendering and display, providing content in which a dynamic digital human displays the target clothing in the virtual 3D spatial scene, together with an interactive effect that simulates continuous view switching. In this way, there is no need to model the model character explicitly; a 3D-like, real-person digital-human effect can be achieved using only images shot from multiple views, avoiding complex modeling pipelines and high modeling costs, and yielding a low-cost, high-fidelity, real-time-interactive dynamic real-person digital human. Moreover, because the real-person digital-human assets created in embodiments of this application are ordinary videos in formats such as HEVC, they can be parsed by most mobile devices, avoiding complex rendering pipelines and heavy computational overhead.
In addition, for the multiple pieces of video content corresponding to the multiple views, during video encoding the frames of the multiple views can be stitched per time point to obtain a frame sequence formed of multiple combined frames: the frames of the views at a given time point are divided into multiple sets, the frames in each set are stitched into one combined frame, and adjacent views' frames are placed at the same position of adjacent combined frames. A general-purpose video encoder can then encode the frame sequence formed by the combined frames, and inter-frame compression over the combined frames eliminates or reduces the redundancy between the frames of adjacent views. Because the views' frames are grouped before stitching, the resolution of a stitched combined frame is not excessive and most terminal devices can decode it in real time; and because the grouping and arrangement place adjacent views' frames at the same position of different but adjacent combined frames, and adjacent views' frames are highly similar, the adjacent combined frames stitched this way are highly similar to one another, so a general-purpose inter-frame compression algorithm can achieve a high compression rate by eliminating or reducing that redundancy. In other words, in embodiments of this application, an ideal compression rate is obtained with a general-purpose video encoder; correspondingly, decoding is completed with a general-purpose decoder, so the scheme can be supported on more terminal devices.
Embodiment 2
Embodiment 2 corresponds to Embodiment 1 and provides, from the client's perspective, a method for displaying information based on a dynamic digital human image. Referring to Figure 4, the method may include the following steps (a client-side sketch follows the list):
S401: in response to a viewing request initiated by a user, obtain a multi-view video generated as follows: multiple camera devices synchronously shoot, from different views, the process of a real person displaying target clothing while wearing it, obtaining video content of multiple real views; the character images contained in the video frames of that content are extracted; image-based rendering is used to generate character images for multiple intermediate views between adjacent real views; and the multi-view video is generated from the character images of the multiple real views and the multiple intermediate views at multiple time points;
S402: decode the multi-view video;
S403: match the multi-view video into a preset virtual 3D spatial scene model, to provide content in which a dynamic digital human image displays the target clothing in the virtual 3D spatial scene;
S404: in response to an interactive operation of continuous view switching, provide an interactive effect simulating continuous view switching by switching to the character-image video content of other views.
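A compact sketch of how S401-S404 might compose on the client, reusing `tile_of_view` and `crop_view` from the sketches above; `fetch_and_decode` and the scene handle are hypothetical stand-ins for whatever downloader, decoder, and 3D engine the client actually uses:

```python
class MultiViewPlayer:
    def __init__(self, url, scene, num_groups=12, default_view=1):
        self.frames = fetch_and_decode(url)  # S401/S402: segments -> decoded combined frames
        self.scene = scene                   # preset virtual 3D spatial scene model
        self.num_groups = num_groups
        self.view = default_view

    def frame_for(self, time_point):
        """S403: pick the current view's tile out of the right combined frame."""
        frame_idx, position = tile_of_view(self.view, self.num_groups)
        return crop_view(self.frames[time_point][frame_idx], position)

    def switch_view(self, new_view):
        """S404: continuous switching is just reading a different tile --
        no new stream has to be pulled, unlike the per-view slicing approach."""
        self.view = new_view
```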
Embodiment 3
Embodiments 1 and 2 above introduced the specific implementation mainly using the example of providing functions such as a "model show" in a commodity information service system. In practical applications, the way of producing dynamic real-person digital humans provided by embodiments of this application can also be used in other application scenarios. To this end, Embodiment 3 further provides a method for generating a dynamic digital human image. Referring to Figure 5, the method may include:
S501: obtain video content of multiple real views, the video content of the multiple real views being obtained by multiple camera devices synchronously shooting, from different views, the process of a real person performing a target action.
The specific target action can be determined by the needs of the actual scenario; for example, it may be performing a certain dance movement, and so on.
S502: extract the character images contained in the video frames of the video content.
S503: using image-based rendering, generate character images at corresponding time points for multiple intermediate views between adjacent real views, according to the pixel-offset relationship information between the character images of the adjacent real views at the same time point.
S504: generate a multi-view video from the character images of the multiple real views and the multiple intermediate views at multiple time points, so that the corresponding dynamic digital human image can be displayed through the multi-view video.
The data of a dynamic digital human image produced in the above manner can exist in the form of a multi-view video, which consists of ordinary-format video files corresponding to the multiple views. Such a real-person digital human is therefore terminal-friendly and can be displayed in a variety of scenarios, including after fusion with some 3D spatial scene model, presenting the visual effect of the specific digital human performing the corresponding action in the specific 3D spatial scene; moreover, the user can perform continuous view switching, view zooming, and so on.
Embodiment 4
Embodiment 4 mainly protects the way a multi-view video is matched into a virtual 3D spatial scene model. Specifically, Embodiment 4 provides a method for displaying clothing information based on a dynamic digital human image. Referring to Figure 6, the method may include:
S601: in response to a request to display target clothing through a dynamic digital human, obtain a virtual 3D spatial scene model and a dynamic digital human image expressed in the form of a multi-view video, the multi-view video being used to display, from multiple views, the process of a target person displaying the target clothing while wearing it;
S602: match the multi-view video into the virtual 3D spatial scene model, to provide content in which the dynamic digital human image displays the target clothing in the virtual 3D spatial scene, and to provide an interactive effect for simulating continuous view switching.
In Embodiment 4, the specific multi-view video may be generated in the way provided in the foregoing embodiments, or in other ways; for example, it may be generated directly by densely deploying more cameras, and so on. The matching of the multi-view video into the virtual 3D spatial scene model can be implemented through a 3D rendering engine; of course, in a specific implementation, some functionality can be customized on top of a general-purpose 3D rendering engine, for example achieving shadow consistency across multiple different views, and so on.
For the parts of Embodiments 2 to 4 not described in detail, refer to Embodiment 1 and the other parts of this specification, which are not repeated here.
It should be noted that embodiments of this application may involve the use of user data. In practical applications, user-specific personal data may be used in the solutions described herein, within the scope permitted by applicable laws and regulations, in compliance with the requirements of the applicable laws and regulations of the country concerned (for example, with the user's explicit consent, with effective notice to the user, etc.).
Corresponding to Embodiment 1, an embodiment of this application further provides an apparatus for displaying information based on a dynamic digital human image, which may include:
a video content obtaining unit, configured to obtain video content of multiple real views, the video content of the multiple real views being obtained by multiple camera devices synchronously shooting, from different views, the process of a real person displaying target clothing while wearing it;
a character image extraction unit, configured to extract the character images contained in the video frames of the video content;
a view synthesis unit, configured to use image-based rendering to generate character images at corresponding time points for multiple intermediate views between adjacent real views, according to the pixel-offset relationship information between the character images of the adjacent real views at the same time point;
a multi-view video generation unit, configured to generate a multi-view video from the character images of the multiple real views and the multiple intermediate views at multiple time points, so that on the client the multi-view video is matched into a preset virtual 3D spatial scene model, to provide content in which a dynamic digital human image displays the target clothing in the virtual 3D spatial scene and an interactive effect for simulating continuous view switching.
The video content of the multiple real views is obtained by multiple camera devices synchronously shooting, from different views, the process of a real person, wearing the target clothing, walking in a target spatial venue to display the target clothing.
Specifically, the view synthesis unit may be configured to:
use image-based rendering to estimate, from the pixel-offset relationship information between the character images of adjacent real views at the same time point, the pixel positions at corresponding time points for multiple intermediate views between the adjacent real views, so as to generate the character images of the intermediate views at multiple time points.
More specifically, the view synthesis unit may be configured to:
take the character images of adjacent real views at the same time point as input, fit the dense optical flow field between the character images of the adjacent real views through a deep learning model, and use the dense optical flow field to estimate the pixel positions of the character images at corresponding time points for the multiple intermediate views between the adjacent real views.
In addition, the apparatus may further include:
a video compression unit, configured to compress the multi-view video for transmission to the client.
The video compression unit may specifically include:
a stitching subunit, configured to stitch the video frames of the multiple views per time point to obtain a frame sequence formed of multiple combined frames, where, for a given time point, the frames of the multiple views at that time point are divided into multiple sets, the frames in each set are stitched into one combined frame, and adjacent views' frames are placed at the same position of adjacent combined frames;
an inter-frame compression subunit, configured to encode the frame sequence formed by the multiple combined frames with a general-purpose video encoder and perform inter-frame compression on the multiple combined frames.
The resolution of each combined frame is lower than the maximum resolution the terminal device can support.
In addition, the apparatus may further include:
a slicing unit, configured to, after the frame sequence formed by the multiple combined frames has been encoded and inter-frame compressed, further slice the frame sequence, so that the segments obtained by slicing are transmitted as units and the receiving end decodes and plays each segment independently.
Furthermore, the apparatus may further include:
a key-frame quantity control unit, configured to control the key-frame interval of the inter-frame encoding process according to the number of combined frames per slice, so as to reduce the number of frames in the same slice that are encoded as key frames;
a bidirectional-reference-frame quantity control unit, configured to, for combined frames other than key frames, lower the decision threshold for bidirectional reference frames to increase the number of frames in the same slice that are encoded as bidirectional reference frames.
Corresponding to Embodiment 2, an embodiment of this application further provides an apparatus for displaying information based on a dynamic digital human image, which may include:
a multi-view video obtaining unit, configured to, in response to a viewing request initiated by a user, obtain a multi-view video generated as follows: multiple camera devices synchronously shoot, from different views, the process of a real person displaying target clothing while wearing it, obtaining video content of multiple real views; the character images contained in the video frames of that content are extracted; image-based rendering is used to generate character images for multiple intermediate views between adjacent real views; and the multi-view video is generated from the character images of the multiple real views and the multiple intermediate views at multiple time points;
a decoding unit, configured to decode the multi-view video;
an adding unit, configured to match the multi-view video into a preset virtual 3D spatial scene model, to provide content in which a dynamic digital human image displays the target clothing in the virtual 3D spatial scene;
a view-switching interaction unit, configured to, in response to an interactive operation of continuous view switching, provide an interactive effect simulating continuous view switching by switching to the character-image video content of other views.
Corresponding to Embodiment 3, an embodiment of this application further provides an apparatus for generating a dynamic digital human image, which may include:
a video content obtaining unit, configured to obtain video content of multiple real views, the video content of the multiple real views being obtained by multiple camera devices synchronously shooting, from different views, the process of a real person performing a target action;
a character image extraction unit, configured to extract the character images contained in the video frames of the video content;
a view synthesis unit, configured to use image-based rendering to generate character images at corresponding time points for multiple intermediate views between adjacent real views, according to the pixel-offset relationship information between the character images of the adjacent real views at the same time point;
a dynamic digital human asset generation unit, configured to generate a multi-view video from the character images of the multiple real views and the multiple intermediate views at multiple time points, so that the corresponding dynamic digital human image can be displayed through the multi-view video.
Corresponding to Embodiment 4, an embodiment of this application further provides an apparatus for displaying clothing information based on a dynamic digital human image, which may include:
a request receiving unit, configured to, in response to a request to display target clothing through a dynamic digital human, obtain a virtual 3D spatial scene model and a dynamic digital human image expressed in the form of a multi-view video, the multi-view video being used to display, from multiple views, the process of a target person displaying the target clothing while wearing it;
a display unit, configured to match the multi-view video into the virtual 3D spatial scene model, to provide content in which the dynamic digital human image displays the target clothing in the virtual 3D spatial scene, and to provide an interactive effect for simulating continuous view switching.
In addition, an embodiment of this application further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method described in any of the foregoing method embodiments are implemented.
Also provided is an electronic device, including:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the steps of the method described in any of the foregoing method embodiments.
Figure 7 exemplarily shows the architecture of the electronic device, which may specifically include a processor 710, a video display adapter 711, a disk drive 712, an input/output interface 713, a network interface 714, and a memory 720. The processor 710, video display adapter 711, disk drive 712, input/output interface 713, and network interface 714 may be communicatively connected with the memory 720 through a communication bus 730.
The processor 710 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided in this application.
The memory 720 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and so on. The memory 720 may store an operating system 721 for controlling the operation of the electronic device 700 and a basic input/output system (BIOS) for controlling low-level operations of the electronic device 700. It may also store a web browser 723, a data storage management system 724, an information display processing system 725, and so on. The information display processing system 725 may be the application that concretely implements the operations of the foregoing steps in embodiments of this application. In short, when the technical solutions provided in this application are implemented through software or firmware, the relevant program code is stored in the memory 720 and invoked and executed by the processor 710.
The input/output interface 713 is used to connect input/output modules to realize information input and output. The input/output modules may be configured in the device as components (not shown in the figure) or externally connected to the device to provide corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, indicator lights, and so on.
The network interface 714 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module may communicate by wired means (e.g., USB, network cable) or wirelessly (e.g., mobile network, WiFi, Bluetooth).
The bus 730 includes a path that transfers information between the components of the device (e.g., the processor 710, video display adapter 711, disk drive 712, input/output interface 713, network interface 714, and memory 720).
It should be noted that although the above device shows only the processor 710, video display adapter 711, disk drive 712, input/output interface 713, network interface 714, memory 720, bus 730, and so on, in a specific implementation the device may also include other components necessary for normal operation. Moreover, those skilled in the art will understand that the above device may also contain only the components necessary to implement the solutions of this application, without necessarily containing all the components shown in the figure.
From the description of the above embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solutions of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of this application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the system and system embodiments are described relatively simply because they are basically similar to the method embodiments; for relevant details, refer to the partial description of the method embodiments. The system and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of a given embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The method and electronic device for displaying information based on a dynamic digital human image provided by this application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is intended only to help understand the method of this application and its core ideas. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementations and scope of application based on the ideas of this application. In summary, the contents of this specification should not be construed as limiting this application.
Claims (13)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410069436.XA CN117596373B (en) | 2024-01-17 | 2024-01-17 | Method and electronic device for displaying information based on dynamic digital human image |
| PCT/CN2024/130678 WO2025152573A1 (en) | 2024-01-17 | 2024-11-08 | Information display method based on dynamic digital human avatar, and electronic device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410069436.XA CN117596373B (en) | 2024-01-17 | 2024-01-17 | Method and electronic device for displaying information based on dynamic digital human image |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117596373A | 2024-02-23 |
| CN117596373B | 2024-04-12 |
Family
ID=89920446
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410069436.XA Active CN117596373B (en) | 2024-01-17 | 2024-01-17 | Method and electronic device for displaying information based on dynamic digital human image |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN117596373B (en) |
| WO (1) | WO2025152573A1 (en) |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10735826B2 (en) * | 2017-12-20 | 2020-08-04 | Intel Corporation | Free dimension format and codec |
| CN117221627A (en) * | 2023-08-24 | 2023-12-12 | 山东省计算中心(国家超级计算济南中心) | Multi-view synchronization method and free view system |
| CN117596373B (en) * | 2024-01-17 | 2024-04-12 | 淘宝(中国)软件有限公司 | Method and electronic device for displaying information based on dynamic digital human image |
2024
- 2024-01-17: CN application CN202410069436.XA filed (granted as CN117596373B, status: Active)
- 2024-11-08: PCT application PCT/CN2024/130678 filed (published as WO2025152573A1, status: Pending)
Patent Citations (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070211796A1 (en) * | 2006-03-09 | 2007-09-13 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding and decoding multi-view video to provide uniform picture quality |
| CN106303289A (en) * | 2015-06-05 | 2017-01-04 | 福建凯米网络科技有限公司 | Method, apparatus and system for the fused display of a real object and a virtual scene |
| KR20220047882A (en) * | 2017-03-17 | 2022-04-19 | Magic Leap, Inc. | Technique for recording augmented reality data |
| US20190099678A1 (en) * | 2017-09-29 | 2019-04-04 | Sony Interactive Entertainment America Llc | Virtual Reality Presentation of Real World Space |
| WO2022111554A1 (en) * | 2020-11-30 | 2022-06-02 | 华为技术有限公司 | View switching method and apparatus |
| CN115115959A (en) * | 2021-03-19 | 2022-09-27 | 海信集团控股股份有限公司 | Image processing method and device |
| CN113949873A (en) * | 2021-10-15 | 2022-01-18 | 北京奇艺世纪科技有限公司 | Video coding method, apparatus and electronic device |
| CN114302128A (en) * | 2021-12-31 | 2022-04-08 | 视伴科技(北京)有限公司 | Video generation method, device, electronic device and storage medium |
| CN114663633A (en) * | 2022-03-24 | 2022-06-24 | 航天宏图信息技术股份有限公司 | AR virtual live broadcast method and system |
| CN114897681A (en) * | 2022-04-20 | 2022-08-12 | 上海交通大学 | Multi-user free visual angle video method and system based on real-time virtual visual angle interpolation |
| CN115567661A (en) * | 2022-09-23 | 2023-01-03 | 上海微创医疗机器人(集团)股份有限公司 | Video data processing method, system, computer device and storage medium |
| CN116170624A (en) * | 2023-01-13 | 2023-05-26 | 北京达佳互联信息技术有限公司 | Object display method and device, electronic equipment and storage medium |
| CN116208812A (en) * | 2023-02-15 | 2023-06-02 | 武汉大学 | A video frame interpolation method and system based on stereo event and intensity camera |
| CN116993432A (en) * | 2023-07-12 | 2023-11-03 | 浙江天猫技术有限公司 | Virtual clothes information display method and electronic equipment |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025152573A1 (en) * | 2024-01-17 | 2025-07-24 | 淘宝(中国)软件有限公司 | Information display method based on dynamic digital human avatar, and electronic device |
| CN118972669A (en) * | 2024-07-03 | 2024-11-15 | 阿里巴巴达摩院(杭州)科技有限公司 | Object generation method, 4D object generation method and video generation method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117596373B (en) | 2024-04-12 |
| WO2025152573A1 (en) | 2025-07-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11381739B2 (en) | Panoramic virtual reality framework providing a dynamic user experience | |
| US11087549B2 (en) | Methods and apparatuses for dynamic navigable 360 degree environments | |
| CN112738010B (en) | Data interaction method and system, interaction terminal and readable storage medium | |
| CN110351592B (en) | Animation presentation method and device, computer equipment and storage medium | |
| US11941748B2 (en) | Lightweight view dependent rendering system for mobile devices | |
| US20200388068A1 (en) | System and apparatus for user controlled virtual camera for volumetric video | |
| CN117596373A (en) | Method and electronic device for information display based on dynamic digital human image | |
| TWI817273B (en) | Real-time multiview video conversion method and system | |
| CN113963094B (en) | Depth map and video processing, reconstruction method, device, equipment and storage medium | |
| WO2019229293A1 (en) | An apparatus, a method and a computer program for volumetric video | |
| Shi et al. | Real-time remote rendering of 3D video for mobile devices | |
| CN113243112A (en) | Streaming volumetric and non-volumetric video | |
| JP7320146B2 (en) | Support for multi-view video motion with disocclusion atlas | |
| WO2017185761A1 (en) | Method and device for playing back 2d video | |
| US20250037356A1 (en) | Augmenting a view of a real-world environment with a view of a volumetric video object | |
| CN116075860A (en) | Information processing device, information processing method, video distribution method, and information processing system | |
| EP3223524A1 (en) | Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices | |
| TWI778749B (en) | Transmission method, processing device, and generating system of video for virtual reality | |
| CN107580228B (en) | A monitoring video processing method, device and equipment | |
| US20240275832A1 (en) | Method and apparatus for providing performance content | |
| WO2019122504A1 (en) | Method for encoding and decoding volumetric video data | |
| WO2019077199A1 (en) | An apparatus, a method and a computer program for volumetric video | |
| US20240221300A1 (en) | System and Method for Unsupervised and Autonomous 4D Dynamic Scene and Objects Interpretation, Segmentation, 3D Reconstruction, and Streaming | |
| CN111726598A (en) | Image processing method and device | |
| EP3742404A1 (en) | Content coding system and method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |