CN107613239B

CN107613239B - Video communication background display method and device

Info

Publication number: CN107613239B
Application number: CN201710811931.3A
Authority: CN
Inventors: 张学勇
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2020-09-11
Anticipated expiration: 2037-09-11
Also published as: CN107613239A

Abstract

The invention provides a video communication background display method and device. The method comprises: acquiring a scene image of a current user; acquiring a depth image of the current user; processing the scene image and the depth image to extract the person region of the current user from the scene image, obtaining a person region image; locking a preset first virtual background image to generate a second virtual background image, fusing the person region image with the second virtual background image to obtain a combined image, and displaying the combined image to a target user who is in video communication with the current user; determining the familiarity between the current user and the target user; and acquiring corresponding image elements from the first virtual background image according to the familiarity and displaying those image elements to the target user in the second virtual background image. In this way, as the current user and the target user become more familiar, the current user's virtual background image is gradually revealed to the target user, which protects the user's privacy and makes communication more secure.

Description

Video communication background display method and device

Technical Field

The present invention relates to the technical field of image processing, and in particular to a video communication background display method and device.

Background

With the development of Internet technology, more and more communication functions have been developed and put into use. Among them, video communication has become widely used because it enables visual communication between users in different locations.

However, in the related art, a user may set a virtual background in advance according to personal preference before a video chat. Such a virtual background reflects the user's personal preferences and may even reflect the real environment in which the current user is located. Displaying it directly to the other party therefore fails to protect the current user's private information.

Summary of the Invention

The present invention provides a video communication background display method and device to solve the technical problem in the prior art that, during a video chat, displaying the virtual background image directly to the target user exposes the user's personal privacy.

An embodiment of the present invention provides a video communication background display method for an electronic device, including: acquiring a scene image of a current user; acquiring a depth image of the current user; processing the scene image and the depth image to extract the person region of the current user in the scene image, obtaining a person region image; locking a preset first virtual background image to generate a second virtual background image, fusing the person region image with the second virtual background image to obtain a combined image, and displaying the combined image to a target user who is in video communication with the current user; determining the familiarity between the current user and the target user; and acquiring corresponding image elements from the first virtual background image according to the familiarity, and displaying the image elements to the target user in the second virtual background image.

Another embodiment of the present invention provides a video communication background display device for an electronic device, including: a visible light camera configured to acquire a scene image of a current user; a depth image acquisition component configured to acquire a depth image of the current user; and a processor configured to: process the scene image and the depth image to extract the person region of the current user in the scene image, obtaining a person region image; lock a preset first virtual background image to generate a second virtual background image; fuse the person region image with the second virtual background image to obtain a combined image and display it to a target user who is in video communication with the current user; determine the familiarity between the current user and the target user; and acquire corresponding image elements from the first virtual background image according to the familiarity and display the image elements to the target user in the second virtual background image.

Yet another embodiment of the present invention provides an electronic device, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the video communication background display method described in the above embodiment.

A further embodiment of the present invention provides a computer-readable storage medium including a computer program used in conjunction with an electronic device capable of capturing images; the computer program can be executed by a processor to implement the video communication background display method described in the above embodiment.

The technical solutions provided by the embodiments of the present invention may have the following beneficial effects:

A scene image and a depth image of the current user are acquired and processed to extract the current user's person region from the scene image, yielding a person region image. A preset first virtual background image is locked to generate a second virtual background image, and the person region image is fused with the second virtual background image to obtain a combined image, which is displayed to the target user in video communication with the current user. The familiarity between the current user and the target user is then determined, corresponding image elements are acquired from the first virtual background image according to that familiarity, and the image elements are displayed to the target user in the second virtual background image. Thus, during video communication, the current user's virtual background image is gradually revealed to the target user as the two users become more familiar, which protects the user's privacy and makes communication more secure.

Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of a video communication background display method according to some embodiments of the present invention;

FIG. 2 is a schematic block diagram of a video communication background display device according to some embodiments of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device according to some embodiments of the present invention;

FIG. 4 is a schematic flowchart of a video communication background display method according to some embodiments of the present invention;

FIG. 5 is a schematic flowchart of a video communication background display method according to some embodiments of the present invention;

FIGS. 6(a) to 6(e) are schematic diagrams of structured light measurement scenes according to an embodiment of the present invention;

FIGS. 7(a) and 7(b) are schematic diagrams of structured light measurement scenes according to an embodiment of the present invention;

FIG. 8 is a schematic flowchart of a video communication background display method according to some embodiments of the present invention;

FIG. 9 is a schematic flowchart of a video communication background display method according to some embodiments of the present invention;

FIG. 10 is a schematic flowchart of a video communication background display method according to some embodiments of the present invention;

FIG. 11 is a schematic flowchart of a video communication background display method according to some embodiments of the present invention;

FIG. 12 is a schematic flowchart of a video communication background display method according to some embodiments of the present invention;

FIG. 13 is a schematic block diagram of an electronic device according to some embodiments of the present invention; and

FIG. 14 is a schematic block diagram of an electronic device according to some embodiments of the present invention.

Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

The video communication background display method and device according to the embodiments of the present invention are described below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a video communication background display method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 101: Acquire a scene image of the current user.

Step 102: Acquire a depth image of the current user.

Step 103: Process the scene image and the depth image to extract the person region of the current user in the scene image, obtaining a person region image.

Referring to FIG. 2 and FIG. 3, the video communication background display method of the embodiments of the present invention can be implemented by the video communication background display device 100 of the embodiments of the present invention. The video communication background display device 100 is used in an electronic device 1000. As shown in FIG. 3, the video communication background display device 100 includes a visible light camera 11, a depth image acquisition component 12 and a processor 20. Step 101 may be implemented by the visible light camera 11, step 102 by the depth image acquisition component 12, and steps 103 and 104 by the processor 20.

In other words, the visible light camera 11 can be used to acquire the scene image of the current user; the depth image acquisition component 12 can be used to acquire the depth image of the current user; and the processor 20 can be used to process the scene image and the depth image to extract the person region of the current user in the scene image, obtaining the person region image, and to fuse the person region image with the second virtual background image to obtain the combined image.
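The fusion step can be pictured as a per-pixel selection between two images. The following is a minimal Python/NumPy sketch, assuming the person region is available as a boolean mask aligned pixel-for-pixel with the scene image and that the second virtual background image has already been resized to the same resolution; the function name `fuse_person_with_background` is hypothetical, and the patent does not specify how the fusion is implemented (a practical version would usually also blend along the mask boundary).

```python
import numpy as np


def fuse_person_with_background(scene, person_mask, background):
    """Composite the extracted person region onto a virtual background.

    scene, background: (H, W, 3) arrays of the same shape and dtype
        (the background is assumed to be pre-resized to the scene size).
    person_mask: (H, W) boolean array, True inside the person region.
    """
    if scene.shape != background.shape:
        raise ValueError("scene and background must have the same shape")
    if person_mask.shape != scene.shape[:2]:
        raise ValueError("mask must match the image height and width")
    # Add a channel axis so the mask broadcasts over the color channels.
    mask3 = person_mask[..., np.newaxis]
    # Take scene pixels inside the person region, background pixels elsewhere.
    return np.where(mask3, scene, background)
```

A hard mask like this keeps the sketch short; a soft (alpha) mask would instead compute `alpha * scene + (1 - alpha) * background`.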

The scene image may be a grayscale image or a color image, and the depth image represents the depth information of each person or object in the scene containing the current user. The scene range of the scene image is the same as that of the depth image, and for every pixel in the scene image the corresponding depth information can be found in the depth image.

The video communication background display device 100 of the embodiments of the present invention can be applied to the electronic device 1000 of the embodiments of the present invention. In other words, the electronic device 1000 includes the video communication background display device 100.

In some embodiments, the electronic device 1000 may be a mobile phone, a tablet computer, a laptop computer, a smart bracelet, a smart watch, a smart helmet, smart glasses, or the like.

Existing methods for separating a person from the background mainly rely on the similarity and discontinuity of pixel values between adjacent pixels, but such segmentation is easily affected by environmental factors such as ambient lighting. The video communication background display method, the video communication background display device 100 and the electronic device 1000 of the embodiments of the present invention instead extract the person region from the scene image by acquiring a depth image of the current user. Because depth acquisition is not easily affected by factors such as lighting or the color distribution in the scene, the person region extracted using the depth image is more accurate, and in particular its boundary can be located precisely. A more accurate person region image in turn yields a better combined image after fusion with the second virtual background image.

Referring to FIG. 4, as a possible implementation, acquiring the depth image of the current user in step 102 includes:

Step 201: Project structured light onto the current user.

Step 202: Capture a structured light image modulated by the current user.

Step 203: Demodulate the phase information corresponding to each pixel of the structured light image to obtain the depth image.

In this example, referring again to FIG. 3, the depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122. Step 201 may be implemented by the structured light projector 121, and steps 202 and 203 by the structured light camera 122.

In other words, the structured light projector 121 can be used to project structured light onto the current user, and the structured light camera 122 can be used to capture the structured light image modulated by the current user and to demodulate the phase information corresponding to each pixel of the structured light image to obtain the depth image.

Specifically, after the structured light projector 121 projects structured light with a given pattern onto the face and body of the current user, a structured light image modulated by the current user is formed on the surface of the user's face and body. The structured light camera 122 captures the modulated structured light image and then demodulates it to obtain the depth image. The structured light pattern may be laser stripes, Gray code, sinusoidal fringes, non-uniform speckle, or the like.

Referring to FIG. 5, in some embodiments, demodulating the phase information corresponding to each pixel of the structured light image to obtain the depth image in step 203 includes:

Step 301: Demodulate the phase information corresponding to each pixel in the structured light image.

Step 302: Convert the phase information into depth information.

Step 303: Generate the depth image according to the depth information.

Referring again to FIG. 2, in some embodiments, steps 301, 302 and 303 may all be implemented by the structured light camera 122.

In other words, the structured light camera 122 can further be used to demodulate the phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate the depth image according to the depth information.

Specifically, compared with the unmodulated structured light, the phase of the modulated structured light has changed: the structured light appearing in the structured light image is distorted, and this change in phase characterizes the depth of the object. The structured light camera 122 therefore first demodulates the phase information corresponding to each pixel in the structured light image and then calculates depth information from the phase, thereby obtaining the final depth image.

To help those skilled in the art understand more clearly how a depth image of the current user's face and body is collected using structured light, the principle is explained below taking a widely used grating projection technique (fringe projection) as an example. Grating projection belongs, in a broad sense, to area (surface) structured light.

As shown in FIG. 6(a), when area structured light projection is used, sinusoidal fringes are first generated by computer programming and projected onto the measured object through the structured light projector 121. The structured light camera 122 then captures the fringes as bent by the object, the bent fringes are demodulated to obtain the phase, and the phase is converted into depth information to obtain the depth image. To avoid errors or error coupling, the depth image acquisition component 12 must be calibrated before structured light is used to collect depth information. The calibration covers the geometric parameters (for example, the relative position between the structured light camera 122 and the structured light projector 121), the internal parameters of the structured light camera 122, and the internal parameters of the structured light projector 121.

Specifically, in the first step, sinusoidal fringes are generated by computer programming. Because the phase is later recovered from the distorted fringes, for example using the four-step phase-shifting method, four fringe patterns with a phase difference of $\pi/2$ between successive patterns are generated here. The structured light projector 121 then projects the four fringe patterns onto the measured object (the mask shown in FIG. 6(a)) in a time-division manner, and the structured light camera 122 captures the images shown on the left of FIG. 6(b); at the same time, the fringes on the reference plane, shown on the right of FIG. 6(b), are read.
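As an illustration of the first step, the sketch below generates four phase-shifted sinusoidal fringe patterns in Python/NumPy. It assumes an intensity model $I_k(x) = 0.5 + 0.5\cos(2\pi x/p + k\pi/2)$, $k = 0,\dots,3$, normalized to $[0, 1]$ with fringes varying along the image columns; the function name and the period parameter `period_px` are illustrative and are not taken from the patent.

```python
import numpy as np


def four_step_fringes(width, height, period_px):
    """Four sinusoidal fringe patterns, each shifted by pi/2 from the previous one.

    Assumed intensity model (illustrative):
        I_k(x) = 0.5 + 0.5*cos(2*pi*x/period_px + k*pi/2), k = 0..3,
    with values in [0, 1] and fringes varying along the columns.
    Returns an array of shape (4, height, width).
    """
    x = np.arange(width)
    base_phase = 2 * np.pi * x / period_px
    patterns = []
    for k in range(4):
        row = 0.5 + 0.5 * np.cos(base_phase + k * np.pi / 2)
        patterns.append(np.tile(row, (height, 1)))  # same row repeated down the image
    return np.stack(patterns)
```

Because patterns 0 and 2 (and 1 and 3) are exactly half a period apart, each pair sums to a constant, which is what lets the later phase computation cancel the unknown background brightness.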

In the second step, phase recovery is performed. The structured light camera 122 computes the modulated phase from the four captured modulated fringe images (that is, the structured light images); the phase map obtained at this point is a wrapped (truncated) phase map. Because the four-step phase-shifting result is computed with the arctangent function, the modulated phase is restricted to $[-\pi, \pi]$: whenever the true phase goes beyond this interval, the computed value wraps around and starts over. The resulting principal phase values are shown in FIG. 6(c).
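The arctangent step can be made concrete. Assuming the captured images follow $I_k = A + B\cos(\varphi + k\pi/2)$ (the same model as the fringe sketch above, with unknown background $A$ and modulation $B$), one gets $I_3 - I_1 = 2B\sin\varphi$ and $I_0 - I_2 = 2B\cos\varphi$, so a two-argument arctangent recovers the wrapped phase. A minimal NumPy sketch; the function name is hypothetical:

```python
import numpy as np


def wrapped_phase(i0, i1, i2, i3):
    """Wrapped phase from four fringe images shifted by 0, pi/2, pi, 3*pi/2.

    Assumes I_k = A + B*cos(phi + k*pi/2), hence
    I3 - I1 = 2B*sin(phi) and I0 - I2 = 2B*cos(phi).
    The background A and modulation B cancel out; arctan2 returns the
    wrapped phase in the interval [-pi, pi].
    """
    return np.arctan2(i3 - i1, i0 - i2)
```

Using `arctan2` rather than `arctan` of a ratio keeps the correct quadrant and avoids division by zero when $\cos\varphi = 0$.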

During phase recovery, phase-jump removal (unwrapping) is required, that is, the wrapped phase must be restored to a continuous phase. As shown in FIG. 6(d), the left side is the modulated continuous phase map and the right side is the reference continuous phase map.
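The patent does not say which unwrapping algorithm is used. As one simple possibility, the sketch below applies the classic one-dimensional (Itoh-style) method along each image row: every step between neighboring samples is wrapped back into $[-\pi, \pi)$ and the steps are re-accumulated, so any jump larger than $\pi$ is treated as a wrap-around. It behaves like `numpy.unwrap` along the last axis; noisy or shadowed fringe images usually need a more robust two-dimensional method.

```python
import numpy as np


def unwrap_rows(wrapped):
    """Remove 2*pi jumps along the last axis (Itoh-style 1-D unwrapping).

    Each step between neighbouring samples is wrapped back into [-pi, pi),
    then the steps are re-accumulated from the first sample, so jumps larger
    than pi are treated as wrap-arounds.
    """
    wrapped = np.asarray(wrapped, dtype=float)
    steps = np.diff(wrapped, axis=-1)
    steps = (steps + np.pi) % (2 * np.pi) - np.pi
    first = wrapped[..., :1]
    return np.concatenate([first, first + np.cumsum(steps, axis=-1)], axis=-1)
```

The method only works when the true phase changes by less than $\pi$ between neighboring pixels, which is why the fringe period must be coarse enough relative to the pixel pitch.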

In the third step, the reference continuous phase is subtracted from the modulated continuous phase to obtain the phase difference (that is, the phase information), which characterizes the depth of the measured object relative to the reference plane. Substituting the phase difference into the phase-to-depth conversion formula (whose parameters have been calibrated) yields the three-dimensional model of the measured object shown in FIG. 6(e).
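The conversion formula itself is not given in the text. For illustration only, the sketch below assumes a commonly used triangulation model for fringe projection with crossed optical axes, $h = \dfrac{l_0\,\Delta\varphi}{\Delta\varphi - 2\pi f_0 d}$, where $l_0$ is the distance from the projector/camera baseline to the reference plane, $d$ the projector-camera distance and $f_0$ the fringe frequency on the reference plane; all three would come from the calibration described above, and the sign convention of $\Delta\varphi$ differs between setups.

```python
import numpy as np


def phase_to_height(delta_phi, l0, d, f0):
    """Height of the object surface above the reference plane.

    Assumed model (crossed optical axes):
        h = l0 * dphi / (dphi - 2*pi*f0*d)
    l0: distance from the projector/camera baseline to the reference plane
    d:  distance between the projector and the camera
    f0: fringe frequency on the reference plane (cycles per unit length)
    All lengths share one unit; the values come from calibration.
    """
    delta_phi = np.asarray(delta_phi, dtype=float)
    return l0 * delta_phi / (delta_phi - 2 * np.pi * f0 * d)
```

For small phase differences the model is approximately linear in $\Delta\varphi$, which is why a calibrated per-pixel scale factor is sometimes used instead.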

It should be understood that, in practical applications and depending on the specific scenario, the structured light used in the embodiments of the present invention may have any other pattern besides the grating described above.

As a possible implementation, the present invention may also use speckle structured light to collect the depth information of the current user.

Specifically, speckle structured light obtains depth information using a substantially flat diffractive element. The diffractive element has a relief diffraction structure with a specific phase distribution, and its cross section is a stepped relief structure with two or more levels of protrusions and recesses. The substrate of the diffractive element is about 1 micron thick, the steps have non-uniform heights, and the heights may range from 0.7 microns to 0.9 microns. FIG. 7(a) shows the local diffraction structure of the collimating beam-splitting element of this embodiment, and FIG. 7(b) is a cross-sectional side view along section A-A, with both the abscissa and the ordinate in microns. The speckle pattern generated by speckle structured light is highly random and changes with distance. Therefore, before speckle structured light is used to obtain depth information, the speckle patterns in space must first be calibrated. For example, if a reference plane is taken every 1 cm within 0 to 4 m from the structured light camera 122, 400 speckle images are saved after calibration; the smaller the calibration interval, the more accurate the acquired depth information.
Then, the structured light projector 121 projects the speckle structured light onto the measured object (that is, the current user), and the height differences of the object's surface change the speckle pattern projected onto it. After the structured light camera 122 captures the speckle pattern projected onto the object (that is, the structured light image), the speckle pattern is cross-correlated one by one with the 400 speckle images saved during calibration, producing 400 correlation images. The position of the measured object in space appears as a peak in the correlation images, and superimposing these peaks and interpolating yields the depth information of the measured object.
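The matching against the calibrated reference planes can be sketched for a single image window. The example below is a simplification under stated assumptions: it compares co-located windows only (real systems also search over lateral offsets, because speckles shift with depth), scores each reference plane with zero-normalized cross-correlation, and refines the best plane by fitting a parabola through the correlation peak and its two neighbors. The function names and the evenly spaced `plane_depths` array (e.g. 400 planes, 1 cm apart) are hypothetical.

```python
import numpy as np


def zncc(a, b):
    """Zero-normalised cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def depth_from_speckle(patch, reference_patches, plane_depths):
    """Estimate the depth of one window of the captured speckle image.

    patch: (h, w) window of the captured speckle image.
    reference_patches: (n_planes, h, w) windows at the same location taken
        from the calibrated reference images.
    plane_depths: (n_planes,) evenly spaced depth of each reference plane.
    """
    scores = np.array([zncc(patch, ref) for ref in reference_patches])
    k = int(np.argmax(scores))
    offset = 0.0
    if 0 < k < len(scores) - 1:
        s0, s1, s2 = scores[k - 1], scores[k], scores[k + 1]
        curvature = s0 - 2 * s1 + s2
        if curvature != 0:
            # Vertex of the parabola through (-1, s0), (0, s1), (1, s2).
            offset = 0.5 * (s0 - s2) / curvature
    step = plane_depths[1] - plane_depths[0]
    return float(plane_depths[k] + offset * step)
```

The parabolic refinement is one simple form of the interpolation mentioned above; it lets the estimate fall between calibrated planes rather than snapping to the 1 cm grid.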

An ordinary diffractive element splits a light beam into multiple diffracted beams, but their intensities differ greatly, which also poses a greater risk of harm to the human eye. Even if the diffracted light is diffracted a second time, the resulting beams remain poorly uniform, so beams diffracted by an ordinary diffractive element give a poor projection onto the measured object. This embodiment uses a collimating beam-splitting element, which both collimates non-collimated light and splits it: the non-collimated light reflected by the mirror passes through the collimating beam-splitting element and exits as multiple collimated beams at different angles, with approximately equal cross-sectional areas and approximately equal energy flux, so the scattered-spot light diffracted from these beams gives a better projection. At the same time, the laser output is spread across the individual beams, further reducing the risk to the human eye. In addition, compared with other, uniformly arranged structured light, speckle structured light consumes less power to achieve the same acquisition effect.

Referring to FIG. 8, as a possible implementation, processing the scene image and the depth image in step 103 to extract the person region of the current user in the scene image and obtain the person region image includes:

Step 401: Identify the face region in the scene image.

Step 402: Acquire the depth information corresponding to the face region from the depth image.

Step 403: Determine the depth range of the person region according to the depth information of the face region.

Step 404: According to the depth range of the person region, determine the person region that is connected to the face region and falls within the depth range, obtaining the person region image.

Referring again to FIG. 2, in some embodiments, steps 401, 402, 403 and 404 may all be implemented by the processor 20.

In other words, the processor 20 can further be used to identify the face region in the scene image, acquire the depth information corresponding to the face region from the depth image, determine the depth range of the person region according to the depth information of the face region, and determine, according to that depth range, the person region that is connected to the face region and falls within the depth range, obtaining the person region image.

Specifically, a trained deep learning model can first be used to identify the face region in the scene image, and the depth information of the face region can then be determined from the correspondence between the scene image and the depth image. Because the face region contains features such as the nose, eyes, ears and lips, these features have different depth values in the depth image. For example, when the face is facing the depth image acquisition component 12, the nose may have a smaller depth value in the captured depth image while the ears may have a larger one. The depth information of the face region may therefore be a single value or a range of values. When it is a single value, it can be obtained by averaging the depth data of the face region, or by taking their median.
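Reducing the face region to a single depth value by mean or median is straightforward. A minimal NumPy sketch, assuming the face detector returns an axis-aligned box `(top, left, bottom, right)` with exclusive bottom/right bounds (a hypothetical format; the patent only says a trained deep learning model is used) and that a depth of 0 marks invalid pixels (also an assumption):

```python
import numpy as np


def face_depth(depth_image, face_box, use_median=True):
    """Summarise the depth of a detected face as one number.

    depth_image: (H, W) depth map aligned pixel-for-pixel with the scene image.
    face_box: (top, left, bottom, right), exclusive bottom/right (hypothetical format).
    Pixels with depth 0 are treated as invalid (an assumption).
    """
    top, left, bottom, right = face_box
    values = depth_image[top:bottom, left:right].ravel()
    values = values[values > 0]
    if values.size == 0:
        raise ValueError("no valid depth samples in the face region")
    return float(np.median(values) if use_median else values.mean())
```

The median is less sensitive than the mean to stray background pixels inside the face box, which is one reason to prefer it.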

由于人物区域包含人脸区域，也即是说，人物区域与人脸区域同处于某一个深度范围内，因此，处理器20确定出人脸区域的深度信息后，可以根据人脸区域的深度信息设定人物区域的深度范围，再根据人物区域的深度范围提取落入该深度范围内且与人脸区域相连接的人物区域以获得人物区域图像。Since the person region contains the face region, the person region and the face region lie within the same depth range. Therefore, after determining the depth information of the face region, the processor 20 can set the depth range of the person region according to that depth information, and then extract the person region that falls within this depth range and is connected to the face region, so as to obtain the person region image.
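The extraction described above amounts to a region-growing (flood-fill) pass seeded at the face. The sketch below is a hedged illustration, not the patent's implementation: it assumes 4-connectivity, and it derives the person depth range from the face depth with hypothetical `front_margin` and `back_margin` values, since the source does not say how the range is set.

```python
from collections import deque


def person_depth_range(face_depth, front_margin=200, back_margin=400):
    """Hypothetical rule: the body may sit a little in front of or behind the face."""
    return face_depth - front_margin, face_depth + back_margin


def extract_person_mask(depth_image, face_pixels, depth_lo, depth_hi):
    """Mark pixels that lie in [depth_lo, depth_hi] and are 4-connected to the face."""
    rows, cols = len(depth_image), len(depth_image[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r, c in face_pixels:  # seed the search with in-range face pixels
        if depth_lo <= depth_image[r][c] <= depth_hi and not mask[r][c]:
            mask[r][c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]
                    and depth_lo <= depth_image[nr][nc] <= depth_hi):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


depth = [
    [600, 610, 2000, 605],
    [605, 620, 2000, 600],
    [2000, 2000, 2000, 2000],
]
lo, hi = person_depth_range(600)  # (400, 1000)
mask = extract_person_mask(depth, [(0, 0)], lo, hi)
print([[int(v) for v in row] for row in mask])
# [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
```

The pixels in the rightmost column fall inside the depth range but are excluded because they are not connected to the face, which is the role of the connectivity condition in the text.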

如此，即可根据深度信息从场景图像中提取出人物区域图像。由于深度信息的获取不受环境中光照、色温等因素的影响，因此，提取出的人物区域图像更加准确。In this way, the person region image can be extracted from the scene image according to the depth information. Since the acquisition of depth information is not affected by factors such as ambient illumination and color temperature, the extracted person region image is more accurate.

请参阅图9，在某些实施方式中，视频通信背景显示方法还包括以下步骤：Referring to FIG. 9, in some embodiments, the video communication background display method further includes the following steps:

步骤501，处理场景图像以得到场景图像的全场边缘图像。Step 501: Process the scene image to obtain a full-field edge image of the scene image.

步骤502，根据所述全场边缘图像修正所述人物区域图像。Step 502, correcting the image of the person area according to the full-field edge image.

请再参阅图2，在某些实施方式中，步骤501和步骤502均可以由处理器20实现。Referring to FIG. 2 again, in some embodiments, both step 501 and step 502 may be implemented by the processor 20.

也即是说，处理器20还可用于处理场景图像以得到场景图像的全场边缘图像，以及根据全场边缘图像修正人物区域图像。That is to say, the processor 20 can also be used to process the scene image to obtain the full-field edge image of the scene image, and to correct the person area image according to the full-field edge image.

处理器20首先对场景图像进行边缘提取以得到全场边缘图像，其中，全场边缘图像中的边缘线条包括当前用户以及当前用户所处场景中背景物体的边缘线条。具体地，可通过Canny算子对场景图像进行边缘提取。Canny算子进行边缘提取的算法的核心主要包括以下几步：首先，用2D高斯滤波模板对场景图像进行卷积以消除噪声；随后，利用微分算子得到各个像素的灰度的梯度值，并根据梯度值计算各个像素的灰度的梯度方向，通过梯度方向可以找到对应像素沿梯度方向的邻接像素；随后，遍历每一个像素，若某个像素的灰度值与其梯度方向上前后两个相邻像素的灰度值相比不是最大的，那么认为这个像素不是边缘点。如此，即可确定场景图像中处于边缘位置的像素点，从而获得边缘提取后的全场边缘图像。The processor 20 first performs edge extraction on the scene image to obtain a full-field edge image, whose edge lines include those of the current user and those of background objects in the scene where the current user is located. Specifically, edges may be extracted from the scene image with the Canny operator. The core of the Canny edge-extraction algorithm comprises the following steps: first, the scene image is convolved with a 2D Gaussian filter template to suppress noise; next, a differential operator is applied to obtain the gray-level gradient of each pixel, and the gradient direction of each pixel is computed from it, which identifies the pixel's neighbors along that direction; then every pixel is traversed, and if a pixel's gradient magnitude is not the largest compared with those of its two neighbors before and after it along the gradient direction, the pixel is considered not to be an edge point. In this way, the pixels located at edges in the scene image are determined, yielding the full-field edge image.
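The three steps listed above (Gaussian smoothing, gradient computation, and non-maximum suppression along the gradient direction) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: it uses a 3x3 Gaussian kernel and Sobel operators for the differential step, and it stops after non-maximum suppression. A complete Canny detector, such as OpenCV's `cv2.Canny`, additionally applies double-threshold hysteresis, which the text does not describe.

```python
import math


def _px(img, r, c):
    """Read a pixel, clamping coordinates to the image border."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def gaussian_3x3(img):
    """Step 1: smooth with the kernel [[1,2,1],[2,4,2],[1,2,1]] / 16."""
    k = ((1, 2, 1), (2, 4, 2), (1, 2, 1))
    h, w = len(img), len(img[0])
    return [[sum(k[i][j] * _px(img, r + i - 1, c + j - 1)
                 for i in range(3) for j in range(3)) / 16.0
             for c in range(w)] for r in range(h)]


def sobel(img):
    """Step 2: gradient magnitude and direction (degrees in [0, 180))."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = (_px(img, r - 1, c + 1) + 2 * _px(img, r, c + 1) + _px(img, r + 1, c + 1)
                  - _px(img, r - 1, c - 1) - 2 * _px(img, r, c - 1) - _px(img, r + 1, c - 1))
            gy = (_px(img, r + 1, c - 1) + 2 * _px(img, r + 1, c) + _px(img, r + 1, c + 1)
                  - _px(img, r - 1, c - 1) - 2 * _px(img, r - 1, c) - _px(img, r - 1, c + 1))
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang


def non_max_suppression(mag, ang, eps=1e-9):
    """Step 3: keep a pixel only if its magnitude is a local maximum
    along its (quantized) gradient direction."""
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            m = mag[r][c]
            if m <= eps:
                continue  # flat area: no gradient, not an edge
            a = ang[r][c]
            if a < 22.5 or a >= 157.5:
                dr, dc = 0, 1    # gradient ~horizontal: compare left/right
            elif a < 67.5:
                dr, dc = 1, 1    # ~45 degrees (rows grow downward)
            elif a < 112.5:
                dr, dc = 1, 0    # gradient ~vertical: compare up/down
            else:
                dr, dc = 1, -1   # ~135 degrees
            # Strict on one side so a two-pixel plateau yields a single edge.
            if m > _px(mag, r - dr, c - dc) and m >= _px(mag, r + dr, c + dc):
                edges[r][c] = True
    return edges


def full_field_edges(gray):
    return non_max_suppression(*sobel(gaussian_3x3(gray)))


step = [[0, 0, 0, 10, 10, 10] for _ in range(4)]
edges = full_field_edges(step)
print([c for c, v in enumerate(edges[0]) if v])  # [2]
```

On the vertical step image the smoothed gradient peaks over two columns, and the one-sided strict comparison keeps exactly one of them, giving the thin edge line that non-maximum suppression is meant to produce.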

处理器20获取全场边缘图像后，再根据全场边缘图像对人物区域图像进行修正。可以理解，人物区域图像是将场景图像中与人脸区域连接并落入设定的深度范围的所有像素进行归并后得到的，在某些场景下，可能存在一些与人脸区域连接且落入深度范围内的物体。因此，为使得提取的人物区域图像更为准确，可使用全场边缘图对人物区域图像进行修正。After acquiring the full-field edge image, the processor 20 corrects the person region image according to it. It can be understood that the person region image is obtained by merging all pixels in the scene image that are connected to the face region and fall within the set depth range. In some scenes, however, there may be objects that are also connected to the face region and fall within the depth range. Therefore, to make the extracted person region image more accurate, the full-field edge image can be used to correct it.
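The source does not specify how the full-field edge image corrects the person region. One plausible reading, shown here purely as a hypothetical sketch, is to repeat the region growing over the depth-based mask but refuse to step onto edge pixels, so that an object touching the person at the same depth is cut off where an edge line separates them. The function name `refine_with_edges` and the barrier rule are illustrative assumptions.

```python
from collections import deque


def refine_with_edges(mask, edges, seeds):
    """Keep only mask pixels reachable from the seeds without crossing an edge.

    mask:  2D list of bools from the depth-based extraction.
    edges: 2D list of bools, the full-field edge image.
    seeds: face-region pixels (row, col) used as starting points.
    Edge pixels act as barriers and are themselves left out; a later
    dilation pass can restore the boundary (an assumption of this sketch).
    """
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r, c in seeds:
        if mask[r][c] and not edges[r][c]:
            out[r][c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not out[nr][nc]
                    and mask[nr][nc] and not edges[nr][nc]):
                out[nr][nc] = True
                queue.append((nr, nc))
    return out


# A person (columns 0-1) touching a chair (columns 3-4) at the same depth,
# separated by an edge line in column 2.
mask = [[True] * 5 for _ in range(3)]
edges = [[c == 2 for c in range(5)] for _ in range(3)]
refined = refine_with_edges(mask, edges, [(1, 0)])
print([[int(v) for v in row] for row in refined])
# [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [1, 1, 0, 0, 0]]
```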

进一步地，处理器20还可对修正后的人物区域图像进行二次修正，例如，可对修正后的人物区域图像进行膨胀处理，扩大人物区域图像以保留人物区域图像的边缘细节。Further, the processor 20 may perform a secondary correction on the corrected person region image, for example by applying dilation to slightly enlarge the person region image so that its edge details are preserved.
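Dilation of a binary mask can be sketched as below, assuming a 3x3 square structuring element and a mask stored as a 2D list of booleans; with OpenCV the same operation would typically be performed by `cv2.dilate`. The element size and iteration count are illustrative choices, not values from the source.

```python
def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element.

    A pixel becomes True if any pixel in its 3x3 neighbourhood
    (clipped at the image border) is True.
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for _ in range(iterations):
        src = out
        out = [[any(src[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, h))
                    for cc in range(max(c - 1, 0), min(c + 2, w)))
                for c in range(w)]
               for r in range(h)]
    return out


seed = [[False] * 5 for _ in range(5)]
seed[2][2] = True
print(sum(map(sum, dilate(seed))))                # 9
print(sum(map(sum, dilate(seed, iterations=2))))  # 25
```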

步骤104，对预设的第一虚拟背景图像进行加锁处理生成第二虚拟背景图像，并将人物区域图像与第二虚拟背景图像融合以得到合并图像，显示给与当前用户进行视频通信的目标用户。Step 104: Perform locking processing on the preset first virtual background image to generate a second virtual background image, fuse the person region image with the second virtual background image to obtain a merged image, and display the merged image to the target user who is in video communication with the current user.

其中，作为一种示例，预设的第一虚拟背景图像可能是根据用户所在真实场景信息建模生成的，该示例中的第一虚拟背景图像反映了用户所在的真实场景信息；作为另一种示例：预设的第一虚拟背景图像可以是用户根据个人喜好自行搭建的，该示例中的第一虚拟背景图像反映了用户的偏好。无论第一虚拟背景图像是基于何种元素建立的，都可以在某些程度上暴露用户的个人隐私。As one example, the preset first virtual background image may be generated by modeling the real scene in which the user is located, in which case it reflects that real scene. As another example, the preset first virtual background image may be built by the user according to personal preference, in which case it reflects the user's tastes. Whatever elements the first virtual background image is built from, it may expose the user's personal privacy to some extent.

处理器20得到人物区域图像后，对预设的第一虚拟背景图像进行加锁处理生成第二虚拟背景图像，并将人物区域图像与第二虚拟背景图像融合以得到合并图像。在某些实施方式中预设的第一虚拟背景图像可以是由处理器20随机选取，或者由当前用户自行选定。融合后的合并图像可在电子装置1000的显示屏上进行显示，也可通过与电子装置1000连接的打印机进行打印。After obtaining the person region image, the processor 20 locks the preset first virtual background image to generate a second virtual background image and fuses the person region image with the second virtual background image to obtain a merged image. In some embodiments, the preset first virtual background image may be randomly selected by the processor 20 or chosen by the current user. The merged image can be displayed on the display screen of the electronic device 1000 or printed by a printer connected to the electronic device 1000.
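Fusing the person region image with the second virtual background image can be done with per-pixel alpha compositing. The sketch assumes RGB tuples stored in 2D lists and an alpha map in [0, 1] (1 inside the person region, 0 outside, fractional values for a soft border); the source does not state which blending rule is used.

```python
def composite(person_rgb, alpha, background_rgb):
    """Per-pixel blend: out = alpha * person + (1 - alpha) * background."""
    h, w = len(alpha), len(alpha[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            a = alpha[r][c]
            p, b = person_rgb[r][c], background_rgb[r][c]
            row.append(tuple(int(round(a * pc + (1 - a) * bc))
                             for pc, bc in zip(p, b)))
        out.append(row)
    return out


person = [[(200, 0, 0), (200, 0, 0), (200, 0, 0)]]
background = [[(0, 0, 100), (0, 0, 100), (0, 0, 100)]]
alpha = [[1.0, 0.5, 0.0]]
print(composite(person, alpha, background))
# [[(200, 0, 0), (100, 0, 50), (0, 0, 100)]]
```

A hard 0/1 mask reproduces simple cut-and-paste, while fractional alpha along the border avoids a visible seam between the person and the locked background.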

在本发明的一个实施例中，当前用户与他人进行视频过程中希望隐藏第一虚拟背景，此时，即可使用本发明实施方式的视频通信背景显示方法将当前用户对应的人物区域图像与预设的第一虚拟背景图像进行加锁处理生成第二虚拟背景图像，并将所述人物区域图像与所述第二虚拟背景图像融合以得到合并图像，再向目标用户显示融合后的合并图像。由于当前用户正与对方视频通话，因此，可见光摄像头11需实时拍摄当前用户的场景图像，深度图像采集组件12也需要实时采集当前用户对应的深度图像，并由处理器20及时对实时采集的场景图像和深度图像进行处理以使得对方能够看到流畅的由多帧合并图像组合而成的视频画面。In one embodiment of the present invention, the current user wishes to hide the first virtual background during a video call with others. In this case, the video communication background display method of the embodiments can be used to lock the preset first virtual background image to generate a second virtual background image, fuse the person region image of the current user with the second virtual background image to obtain a merged image, and display the merged image to the target user. Because the current user is in a live video call, the visible light camera 11 must capture scene images of the current user in real time, the depth image acquisition component 12 must likewise capture the corresponding depth images in real time, and the processor 20 must process the captured scene and depth images promptly so that the other party sees a smooth video composed of successive merged frames.

需要说明的是，根据应用场景的不同，可采用不同的实现方式对预设的第一虚拟背景图像进行加锁处理生成第二虚拟背景图像，作为一种可能的实现方式，对预设的第一虚拟背景图像进行高斯模糊处理生成第二虚拟背景图像，作为另一种可能的实现方式，对预设的第一虚拟背景图像进行马赛克处理生成第二虚拟背景图像。It should be noted that, depending on the application scenario, different implementations may be used to lock the preset first virtual background image and generate the second virtual background image. In one possible implementation, Gaussian blurring is applied to the preset first virtual background image to generate the second virtual background image; in another, mosaic (pixelation) processing is applied instead.
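The two locking options named above can be sketched on a grayscale image stored as a 2D list of numbers. The block size, `sigma`, the 3-sigma kernel radius and the border clamping are illustrative choices rather than values from the source; a color image would be processed channel by channel.

```python
import math


def mosaic(img, block):
    """Replace each block x block tile with the mean of its pixels."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for top in range(0, h, block):
        for left in range(0, w, block):
            cells = [(r, c)
                     for r in range(top, min(top + block, h))
                     for c in range(left, min(left + block, w))]
            avg = sum(img[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = avg
    return out


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights covering +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [v / total for v in weights]


def gaussian_blur(img, sigma):
    """Separable Gaussian blur: filter rows, then columns (borders clamped)."""
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    h, w = len(img), len(img[0])

    def clamp(v, n):
        return min(max(v, 0), n - 1)

    rows = [[sum(k[i] * img[r][clamp(c + i - radius, w)] for i in range(len(k)))
             for c in range(w)] for r in range(h)]
    return [[sum(k[i] * rows[clamp(r + i - radius, h)][c] for i in range(len(k)))
             for c in range(w)] for r in range(h)]


background = [[0, 10, 20, 30],
              [40, 50, 60, 70]]
print(mosaic(background, 2))  # [[25.0, 25.0, 45.0, 45.0], [25.0, 25.0, 45.0, 45.0]]
```

Larger blocks or a larger `sigma` hide more detail, so either parameter could serve as the strength of the lock.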

步骤105，确定当前用户与目标用户之间的熟悉度。Step 105: Determine the familiarity between the current user and the target user.

步骤106，根据熟悉度从第一虚拟背景图像中获取相应的图像元素，并在第二虚拟背景图像中向目标用户显示图像元素。Step 106: Acquire corresponding image elements from the first virtual background image according to the familiarity, and display the image elements to the target user in the second virtual background image.

可以理解，在一些应用场景下，视频通信的用户越是熟悉，则越是可能将暴露其隐私信息的第一虚拟背景图像展示给对方，此时，确定当前用户与目标用户之间的熟悉度，进而，若判断获知熟悉度大于预设阈值，则将合并图像切换成当前用户的场景图像向目标用户显示。It can be understood that, in some application scenarios, the more familiar the parties to a video communication are with each other, the more willing they are to show each other the first virtual background image, which may expose private information. In this case, the familiarity between the current user and the target user is determined, and if the familiarity is judged to be greater than a preset threshold, the merged image is switched to the current user's scene image for display to the target user.

需要说明的是，根据具体应用场景的不同，可采用多种不同的方式确定当前用户与目标用户之间的熟悉度：It should be noted that, according to different application scenarios, the familiarity between the current user and the target user can be determined in a variety of ways:

作为一种可能的实现方式，如图10所示，步骤105包括：As a possible implementation manner, as shown in Figure 10, step 105 includes:

步骤601，根据预设的匹配指标检测当前用户和目标用户的视频交互信息。Step 601: Detect video interaction information between a current user and a target user according to a preset matching index.

步骤602，若检测到视频交互信息满足预设的匹配信息，则查询预设的匹配信息与熟悉度的对应关系，确定当前用户与目标用户之间的熟悉度。Step 602 , if it is detected that the video interaction information satisfies the preset matching information, query the corresponding relationship between the preset matching information and the familiarity, and determine the familiarity between the current user and the target user.

在本示例中，应当理解的是，用户之间越是熟悉，其谈论的话题或者使用的话语越是随意，或者，其谈论的信息量越大，用户之间也越是熟悉，从而，可根据预设的匹配指标检测语音信息和文本信息的内容关键词，其中，该关键词是根据大量实验数据标定的，可以是界定用户之间熟悉度的词，比如包含用户之间关系称呼词(“妈妈”、“伙计”、“爸爸”)，又比如口语化的词“你去死吧”、“鬼才相信你”等，和/或，语音信息和文本信息的信息量。In this example, it should be understood that the more familiar users are with each other, the more casual their topics and wording, and the larger the amount of information they exchange. Accordingly, the content keywords of voice and text messages, and/or the amount of voice and text information, can be detected according to the preset matching index. The keywords are calibrated from a large amount of experimental data and may be words that indicate the degree of familiarity between users, such as forms of address that reveal a relationship ("Mom", "Dude", "Dad"), or colloquial expressions such as "Drop dead" or "Like hell I believe you".

进而，可以理解，预先设置匹配信息与熟悉度的对应关系，比如“妈妈”对应的熟悉度较高，“李教授”对应的熟悉度较低，语音信息量10条对应的熟悉度较低，语音信息量1000条对应的熟悉度较高等，从而，若检测到视频交互信息满足预设的匹配信息，则查询预设的匹配信息与熟悉度的对应关系，确定当前用户与目标用户之间的熟悉度。Further, a correspondence between matching information and familiarity is preset: for example, "Mom" corresponds to a relatively high familiarity and "Professor Li" to a relatively low one, while 10 voice messages correspond to a relatively low familiarity and 1000 voice messages to a relatively high one. Thus, if the video interaction information is detected to satisfy the preset matching information, this correspondence is queried to determine the familiarity between the current user and the target user.
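A keyword-and-volume lookup of this kind might look like the sketch below. The keyword table, the message-count thresholds, the three-level scale and the rule of taking the higher of the two scores are all hypothetical stand-ins for the calibrated matching index the text mentions; a real system would also need speech-to-text and would work in the users' own language.

```python
import re

# Hypothetical calibration: higher number = more familiar.
KEYWORD_LEVELS = {"mom": 3, "dad": 3, "dude": 2, "professor li": 1}
# (minimum message count, level), checked from the top down.
VOLUME_LEVELS = [(1000, 3), (100, 2), (0, 1)]


def keyword_level(messages):
    """Highest level among whole-word keyword hits (0 if none)."""
    text = " ".join(messages).lower()
    hits = [level for word, level in KEYWORD_LEVELS.items()
            if re.search(r"\b" + re.escape(word) + r"\b", text)]
    return max(hits, default=0)


def volume_level(messages):
    """Level implied by the number of messages exchanged."""
    count = len(messages)
    return next(level for minimum, level in VOLUME_LEVELS if count >= minimum)


def familiarity_from_interaction(messages):
    return max(keyword_level(messages), volume_level(messages))


print(familiarity_from_interaction(["Hi Mom, see you soon"]))        # 3
print(familiarity_from_interaction(["Good morning, Professor Li"]))  # 1
print(familiarity_from_interaction(["ok"] * 150))                    # 2
```

Whole-word matching (`\b`) keeps "mom" from firing on words such as "moment".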

作为另一种可能的实现方式，如图11所示，步骤105包括：As another possible implementation manner, as shown in FIG. 11 , step 105 includes:

步骤701，向目标用户发送与不同熟悉度对应的验证请求。Step 701: Send verification requests corresponding to different familiarity levels to the target user.

步骤702，根据目标用户反馈的请求响应与预设的标准信息进行验证，根据验证结果确定当前用户与目标用户之间的熟悉度。Step 702: Verify the request response fed back by the target user against preset standard information, and determine the familiarity between the current user and the target user according to the verification result.

在本示例中，用户预先设置针对不同熟悉度对应的验证请求，并针对每一个验证请求设置对应的标准信息，比如针对高熟悉度设置的验证请求是“我们家有几口人”，针对该请求设置的标准信息可能为“五口”，针对低熟悉度设置的验证请求是“我是男是女”，针对该请求设置的标准信息可能为“女人”等，从而，在视频聊天时，向目标用户发送与不同熟悉度对应的验证请求。In this example, the user presets verification requests for different familiarity levels and sets corresponding standard information (the expected answer) for each request. For example, the verification request for high familiarity may be "How many people are in our family?" with the standard information "five", and the verification request for low familiarity may be "Am I male or female?" with the standard information "female". During a video chat, verification requests corresponding to different familiarity levels are then sent to the target user.

此时，目标用户根据该验证请求反馈请求响应，此时只有较为熟悉的目标用户才能反馈出与较高的熟悉度对应的验证请求对应的标准信息，较为陌生的目标用户仅能反馈出与较低的熟悉度对应的验证请求对应的标准信息，因而，根据目标用户反馈的请求响应与预设的标准信息进行验证，根据验证结果确定当前用户与目标用户之间的熟悉度。The target user then feeds back a response to each verification request. Only a relatively familiar target user can give the standard information for a request associated with higher familiarity, whereas a less familiar target user can only give the standard information for requests associated with lower familiarity. Therefore, the responses fed back by the target user are verified against the preset standard information, and the familiarity between the current user and the target user is determined from the verification result.
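The challenge-and-response scheme can be sketched as a table of (familiarity level, question, expected answer) entries, with the familiarity taken as the highest level whose question the target user answers correctly. The entries reuse the examples in the text, while the level numbers, the answer normalization and the dictionary-of-answers input format are assumptions.

```python
# (familiarity level, verification request, standard information)
CHALLENGES = [
    (1, "Am I male or female?", "female"),
    (3, "How many people are in our family?", "five"),
]


def normalize(answer):
    """Case- and whitespace-insensitive comparison form."""
    return " ".join(answer.strip().lower().split())


def familiarity_from_answers(answers):
    """answers maps each verification request to the target user's reply."""
    level = 0
    for required_level, question, expected in CHALLENGES:
        if normalize(answers.get(question, "")) == normalize(expected):
            level = max(level, required_level)
    return level


print(familiarity_from_answers({"Am I male or female?": " Female "}))  # 1
print(familiarity_from_answers({
    "Am I male or female?": "female",
    "How many people are in our family?": "five",
}))  # 3
print(familiarity_from_answers({"Am I male or female?": "male"}))  # 0
```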

作为又一种可能的实现方式，如图12所示，步骤105包括：As another possible implementation manner, as shown in Figure 12, step 105 includes:

步骤801，获取目标用户的用户图像，提取用户图像的面部特征信息。Step 801: Acquire a user image of the target user and extract facial feature information from it.

步骤802，根据面部特征信息查询预设的图像信息库获取目标用户的身份信息。Step 802: Query a preset image information database according to the facial feature information to obtain the identity information of the target user.

步骤803，查询预设的身份信息与熟悉度的对应关系，确定当前用户与目标用户之间的熟悉度。Step 803: Query the preset correspondence between identity information and familiarity to determine the familiarity between the current user and the target user.

在本示例中，预先建立其他用户的身份信息与熟悉度的对应关系，比如，建立家人的身份信息与较高熟悉度的对应关系，建立朋友的身份信息与中等熟悉度的对应关系，建立陌生人与较低熟悉度的对应关系等。In this example, a correspondence between the identity information of other users and familiarity is established in advance: for example, family members are mapped to a relatively high familiarity, friends to a medium familiarity, and strangers to a relatively low familiarity.

进而，获取目标用户的用户图像，提取用户图像的面部特征信息，该用户图像可以是截取视频通话中目标用户的面部截图提取的，进而，根据面部特征信息查询预设的图像信息库获取目标用户的身份信息，查询预设的身份信息与熟悉度的对应关系，确定当前用户与目标用户之间的熟悉度。Then, a user image of the target user is acquired and its facial feature information is extracted; the user image may, for example, be a screenshot of the target user's face captured during the video call. The preset image information database is queried with the facial feature information to obtain the identity information of the target user, and the preset correspondence between identity information and familiarity is queried to determine the familiarity between the current user and the target user.
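Steps 801 to 803 can be sketched as a nearest-neighbour search over stored face feature vectors followed by two table lookups. The three-dimensional feature vectors, the gallery names, the distance threshold and the relation table are hypothetical; a real system would use embeddings from a face-recognition model and a calibrated threshold. `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical image information database: identity -> face feature vector.
GALLERY = {"alice": (0.1, 0.9, 0.3), "bob": (0.8, 0.2, 0.5)}
# Hypothetical identity -> relation, and relation -> familiarity level.
RELATIONS = {"alice": "family", "bob": "friend"}
RELATION_LEVELS = {"family": 3, "friend": 2, "stranger": 1}


def identify(feature, max_distance=0.3):
    """Return the closest known identity, or None if nothing is close enough."""
    best_name, best_dist = None, math.inf
    for name, reference in GALLERY.items():
        dist = math.dist(feature, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


def familiarity_from_face(feature):
    identity = identify(feature)
    relation = RELATIONS.get(identity, "stranger")  # unknown faces are strangers
    return RELATION_LEVELS[relation]


print(familiarity_from_face((0.12, 0.88, 0.31)))  # 3 (matches alice)
print(familiarity_from_face((0.8, 0.2, 0.5)))     # 2 (matches bob)
print(familiarity_from_face((0.5, 0.5, 0.5)))     # 1 (no match: stranger)
```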

进一步地，根据熟悉度从第一虚拟背景图像中获取相应的图像元素，并在第二虚拟背景图像中向目标用户显示图像元素，其中，该图像元素包括第一虚拟背景图像中的物品、颜色等，其中，熟悉度越高，在第二虚拟背景图像中向目标用户显示图像元素越多，作为一种可能的实现方式，根据与熟悉度对应的物品数量，从第一虚拟背景图像中获取相应数量的图像元素，作为另一种可能的实现方式，根据与熟悉度对应的物品类型(比如办公类型、生活用品类型等)，从第一虚拟背景图像中获取相应类型的图像元素。Further, corresponding image elements are acquired from the first virtual background image according to the familiarity and displayed to the target user in the second virtual background image. The image elements include items, colors and the like in the first virtual background image; the higher the familiarity, the more image elements are displayed to the target user in the second virtual background image. In one possible implementation, as many image elements as the item count associated with the familiarity are taken from the first virtual background image. In another possible implementation, image elements of the item types associated with the familiarity (for example, office items or daily necessities) are taken from the first virtual background image.
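Both selection strategies can be sketched over a list of background elements. The `Element` type, the category names, the per-level counts and allowed types, and the assumption that elements are listed from least to most private are all illustrative; the source only states that higher familiarity reveals more elements, chosen either by count or by item type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    category: str  # e.g. "office", "daily", "personal"


# Hypothetical policy tables keyed by familiarity level (None = everything).
COUNT_BY_LEVEL = {0: 0, 1: 1, 2: 3, 3: None}
TYPES_BY_LEVEL = {
    0: set(),
    1: {"office"},
    2: {"office", "daily"},
    3: {"office", "daily", "personal"},
}


def select_by_count(elements, level):
    """Reveal the first N elements (list assumed ordered least private first)."""
    n = COUNT_BY_LEVEL.get(level, 0)
    return list(elements) if n is None else list(elements[:n])


def select_by_type(elements, level):
    """Reveal only the element categories allowed at this level."""
    allowed = TYPES_BY_LEVEL.get(level, set())
    return [e for e in elements if e.category in allowed]


scene = [Element("desk", "office"), Element("lamp", "office"),
         Element("mug", "daily"), Element("family photo", "personal")]
print([e.name for e in select_by_count(scene, 2)])  # ['desk', 'lamp', 'mug']
print([e.name for e in select_by_type(scene, 1)])   # ['desk', 'lamp']
```

Unknown levels fall back to revealing nothing, which errs on the side of privacy.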

综上所述，本发明实施例的视频通信背景显示方法，获取当前用户的场景图像，获取当前用户的深度图像，处理场景图像和深度图像，以提取当前用户在场景图像中的人物区域而获得人物区域图像，对预设的第一虚拟背景图像进行加锁处理生成第二虚拟背景图像，并将人物区域图像与第二虚拟背景图像融合以得到合并图像，显示给与当前用户进行视频通信的目标用户，确定当前用户与目标用户之间的熟悉度，根据熟悉度从第一虚拟背景图像中获取相应的图像元素，并在第二虚拟背景图像中向目标用户显示图像元素。由此，在视频通信时，随着用户和目标用户之间的熟悉，将当前用户的虚拟背景图像逐渐开放并显示给目标用户，保护了用户的隐私，实现了通信安全。To sum up, the video communication background display method of the embodiments of the present invention acquires a scene image and a depth image of the current user; processes them to extract the current user's person region from the scene image, obtaining a person region image; locks the preset first virtual background image to generate a second virtual background image; fuses the person region image with the second virtual background image to obtain a merged image, which is displayed to the target user in video communication with the current user; determines the familiarity between the current user and the target user; and, according to the familiarity, acquires corresponding image elements from the first virtual background image and displays them to the target user in the second virtual background image. Thus, during video communication, the current user's virtual background image is gradually revealed to the target user as their familiarity grows, which protects the user's privacy and achieves communication security.

请一并参阅图3和图13，本发明实施方式还提出一种电子装置1000。电子装置1000包括视频通信背景显示装置100。视频通信背景显示装置100可以利用硬件和/或软件实现。视频通信背景显示装置100包括成像设备10和处理器20。Referring to FIG. 3 and FIG. 13 together, an embodiment of the present invention further provides an electronic device 1000. The electronic device 1000 includes a video communication background display apparatus 100, which may be implemented in hardware and/or software. The video communication background display apparatus 100 includes an imaging device 10 and a processor 20.

成像设备10包括可见光摄像头11和深度图像采集组件12。The imaging device 10 includes a visible light camera 11 and a depth image acquisition component 12 .

具体地，可见光摄像头11包括图像传感器111和透镜112，可见光摄像头11可用于捕捉当前用户的彩色信息以获得场景图像，其中，图像传感器111包括彩色滤镜阵列(如Bayer滤镜阵列)，透镜112的个数可为一个或多个。可见光摄像头11在获取场景图像过程中，图像传感器111中的每一个成像像素感应来自拍摄场景中的光强度和波长信息，生成一组原始图像数据；图像传感器111将该组原始图像数据发送至处理器20中，处理器20对原始图像数据进行去噪、插值等运算后即得到彩色的场景图像。处理器20可按多种格式对原始图像数据中的每个图像像素逐一处理，例如，每个图像像素可具有8、10、12或14比特的位深度，处理器20可按相同或不同的位深度对每一个图像像素进行处理。Specifically, the visible light camera 11 includes an image sensor 111 and a lens 112 and can be used to capture the color information of the current user to obtain a scene image. The image sensor 111 includes a color filter array (such as a Bayer filter array), and there may be one or more lenses 112. While the visible light camera 11 acquires a scene image, each imaging pixel of the image sensor 111 senses the light intensity and wavelength information from the captured scene and generates a set of raw image data; the image sensor 111 sends this raw image data to the processor 20, which obtains a color scene image after operations such as denoising and interpolation. The processor 20 may process each image pixel of the raw image data in a variety of formats; for example, each image pixel may have a bit depth of 8, 10, 12 or 14 bits, and the processor 20 may process the image pixels at the same or different bit depths.
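Because raw pixels may arrive at 8, 10, 12 or 14 bits, one simple way to bring them to a common scale is shown below, assuming unsigned integer samples: rescaling to 8 bits by dropping the low-order bits. This step is not described in the source and ignores demosaicing and denoising; it only illustrates handling mixed bit depths.

```python
def to_8bit(sample, bit_depth):
    """Rescale an unsigned raw sample of the given bit depth to 0..255."""
    if bit_depth not in (8, 10, 12, 14):
        raise ValueError("unsupported bit depth")
    if not 0 <= sample < (1 << bit_depth):
        raise ValueError("sample out of range for bit depth")
    return sample >> (bit_depth - 8)


print([to_8bit(1023, 10), to_8bit(4095, 12), to_8bit(8192, 14), to_8bit(200, 8)])
# [255, 255, 128, 200]
```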

深度图像采集组件12包括结构光投射器121和结构光摄像头122，深度图像采集组件12可用于捕捉当前用户的深度信息以得到深度图像。结构光投射器121用于将结构光投射至当前用户，其中，结构光图案可以是激光条纹、格雷码、正弦条纹或者随机排列的散斑图案等。结构光摄像头122包括图像传感器1221和透镜1222，透镜1222的个数可为一个或多个。图像传感器1221用于捕捉结构光投射器121投射至当前用户上的结构光图像。结构光图像可由深度图像采集组件12发送至处理器20进行解调、相位恢复、相位信息计算等处理以获取当前用户的深度信息。The depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122 and can be used to capture depth information of the current user to obtain a depth image. The structured light projector 121 projects structured light onto the current user; the structured light pattern may be laser stripes, Gray code, sinusoidal fringes, a randomly arranged speckle pattern, or the like. The structured light camera 122 includes an image sensor 1221 and a lens 1222, and there may be one or more lenses 1222. The image sensor 1221 captures the structured light image projected by the structured light projector 121 onto the current user. The depth image acquisition component 12 can send the structured light image to the processor 20 for demodulation, phase recovery, phase information calculation and other processing to obtain the depth information of the current user.

在某些实施方式中，可见光摄像头11与结构光摄像头122的功能可由一个摄像头实现，也即是说，成像设备10仅包括一个摄像头和一个结构光投射器121，上述摄像头不仅可以拍摄场景图像，还可拍摄结构光图像。In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 can be realized by one camera, that is to say, the imaging device 10 only includes one camera and one structured light projector 121, and the above cameras can not only capture scene images, Structured light images can also be taken.

除了采用结构光获取深度图像外，还可通过双目视觉方法、基于飞行时间差(Time of Flight，TOF)等深度像获取方法来获取当前用户的深度图像。In addition to structured light, the depth image of the current user may also be acquired by other depth acquisition methods such as binocular (stereo) vision or time of flight (TOF).
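For reference, the standard relations behind the two alternative depth-acquisition methods are given below; they are textbook formulas, not details from the source. Here $c$ is the speed of light and $\Delta t$ the round-trip time for pulsed time of flight; $\Delta\varphi$ is the measured phase shift and $f_{\mathrm{mod}}$ the modulation frequency for continuous-wave time of flight; and for rectified binocular stereo, $f$ is the focal length in pixels, $B$ the baseline and $\delta$ the disparity in pixels.

```latex
d_{\mathrm{TOF}} = \frac{c\,\Delta t}{2}, \qquad
d_{\mathrm{CW}} = \frac{c\,\Delta\varphi}{4\pi f_{\mathrm{mod}}}, \qquad
Z = \frac{f\,B}{\delta}
```

The factor of 2 (and the $4\pi$) accounts for light travelling to the subject and back.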

处理器20进一步用于将从场景图像和深度图像中提取的人物区域图像与第二虚拟背景图像融合，将合并图像显示给与当前用户进行视频通信的目标用户，确定当前用户与目标用户之间的熟悉度，在判断获知熟悉度大于预设阈值时，将合并图像切换成当前用户的场景图像向目标用户显示。在提取人物区域图像时，处理器20可以结合深度图像中的深度信息从场景图像中提取出二维的人物区域图像，也可以根据深度图像中的深度信息建立人物区域的三维图，再结合场景图像中的色彩信息对三维的人物区域进行颜色填补以得到三维的彩色的人物区域图像。因此，融合处理人物区域图像和第二虚拟背景图像时可以是将二维的人物区域图像与第二虚拟背景图像进行融合以得到合并图像，也可以是将三维的彩色的人物区域图像与第二虚拟背景图像进行融合以得到合并图像。The processor 20 is further configured to fuse the person region image extracted from the scene image and the depth image with the second virtual background image, display the merged image to the target user in video communication with the current user, determine the familiarity between the current user and the target user, and, when the familiarity is judged to be greater than a preset threshold, switch the merged image to the current user's scene image for display to the target user. When extracting the person region image, the processor 20 may use the depth information in the depth image to extract a two-dimensional person region image from the scene image, or it may build a three-dimensional model of the person region from the depth information and then fill it with the color information of the scene image to obtain a three-dimensional color person region image. Accordingly, the fusion may combine either the two-dimensional person region image or the three-dimensional color person region image with the second virtual background image to obtain the merged image.

此外，视频通信背景显示装置100还包括图像存储器30。图像存储器30可内嵌在电子装置1000中，也可以是独立于电子装置1000外的存储器，并可包括直接存储器存取(Direct Memory Access，DMA)特征。可见光摄像头11采集的原始图像数据或深度图像采集组件12采集的结构光图像相关数据均可传送至图像存储器30中进行存储或缓存。处理器20可从图像存储器30中读取原始图像数据以进行处理得到场景图像，也可从图像存储器30中读取结构光图像相关数据以进行处理得到深度图像。另外，场景图像和深度图像还可存储在图像存储器30中，以供处理器20随时调用处理，例如，处理器20调用场景图像和深度图像进行人物区域提取，并将提取后得到的人物区域图像与第二虚拟背景图像进行融合处理以得到合并图像。其中，第二虚拟背景图像和合并图像也可存储在图像存储器30中。In addition, the video communication background display apparatus 100 further includes an image memory 30. The image memory 30 may be embedded in the electronic device 1000 or be a memory independent of it, and may include a Direct Memory Access (DMA) feature. Both the raw image data collected by the visible light camera 11 and the structured-light-image data collected by the depth image acquisition component 12 can be transferred to the image memory 30 for storage or buffering. The processor 20 can read the raw image data from the image memory 30 and process it into a scene image, and can likewise read the structured-light-image data and process it into a depth image. The scene image and the depth image can also be stored in the image memory 30 for the processor 20 to retrieve at any time; for example, the processor 20 retrieves them to extract the person region and then fuses the extracted person region image with the second virtual background image to obtain a merged image. The second virtual background image and the merged image may also be stored in the image memory 30.

视频通信背景显示装置100还可包括显示器50。显示器50可直接从处理器20中获取合并图像，还可从图像存储器30中获取合并图像。显示器50显示合并图像以供目标用户观看，或者由图形引擎或图形处理器(Graphics Processing Unit，GPU)进行进一步的处理。视频通信背景显示装置100还包括编码器/解码器60，编码器/解码器60可编解码场景图像、深度图像及合并图像等的图像数据，编码的图像数据可被保存在图像存储器30中，并可以在图像显示在显示器50上之前由解码器解压缩以进行显示。编码器/解码器60可由中央处理器(Central Processing Unit，CPU)、GPU或协处理器实现。换言之，编码器/解码器60可以是中央处理器(Central Processing Unit，CPU)、GPU、及协处理器中的任意一种或多种。The video communication background display apparatus 100 may also include a display 50. The display 50 can obtain the merged image either directly from the processor 20 or from the image memory 30, and displays it for the target user to view; alternatively, the merged image may be further processed by a graphics engine or a graphics processing unit (GPU). The video communication background display apparatus 100 further includes an encoder/decoder 60, which can encode and decode image data such as scene images, depth images and merged images. The encoded image data can be stored in the image memory 30 and decompressed by the decoder before the image is shown on the display 50. The encoder/decoder 60 may be implemented by a central processing unit (CPU), a GPU or a coprocessor; in other words, it may be any one or more of a CPU, a GPU and a coprocessor.

视频通信背景显示装置100还包括控制逻辑器40。成像设备10在成像时，处理器20会根据成像设备获取的数据进行分析以确定成像设备10的一个或多个控制参数(例如，曝光时间等)的图像统计信息。处理器20将图像统计信息发送至控制逻辑器40，控制逻辑器40控制成像设备10以确定好的控制参数进行成像。控制逻辑器40可包括执行一个或多个例程(如固件)的处理器和/或微控制器。一个或多个例程可根据接收的图像统计信息确定成像设备10的控制参数。The video communication background display apparatus 100 further includes control logic 40. While the imaging device 10 is imaging, the processor 20 analyzes the data it acquires to determine image statistics used for one or more control parameters of the imaging device 10 (for example, exposure time). The processor 20 sends the image statistics to the control logic 40, which controls the imaging device 10 to image with the control parameters so determined. The control logic 40 may include a processor and/or microcontroller executing one or more routines (such as firmware), which may determine the control parameters of the imaging device 10 from the received image statistics.
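As a hedged example of a routine that turns image statistics into a control parameter, the sketch below computes the mean brightness of an 8-bit frame and scales the exposure time proportionally toward a target level. The target value, the exposure limits and the proportional update rule are assumptions; the source only says that routines in the control logic 40 derive control parameters such as exposure time from the statistics.

```python
def mean_brightness(frame):
    """Image statistic: mean of an 8-bit grayscale frame (2D list)."""
    total = sum(sum(row) for row in frame)
    return total / (len(frame) * len(frame[0]))


def next_exposure_us(current_us, brightness, target=118.0,
                     min_us=100.0, max_us=33000.0):
    """Scale exposure proportionally toward the target brightness, clamped."""
    if brightness <= 0:
        return max_us  # completely dark frame: use the longest exposure
    proposed = current_us * target / brightness
    return min(max(proposed, min_us), max_us)


frame = [[59, 59], [59, 59]]
print(next_exposure_us(10000.0, mean_brightness(frame)))  # 20000.0
```

The clamp keeps the exposure within limits a video stream can tolerate; 33 ms roughly corresponds to the frame period at 30 fps (an illustrative choice).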

请参阅图14，本发明实施方式的电子装置1000包括一个或多个处理器200、存储器300和一个或多个程序310。其中一个或多个程序310被存储在存储器300中，并且被配置成由一个或多个处理器200执行。程序310包括用于执行上述任意一项实施方式的视频通信背景显示方法的指令。Referring to FIG. 14 , an electronic device 1000 according to an embodiment of the present invention includes one or more processors 200 , a memory 300 and one or more programs 310 . One or more of the programs 310 are stored in the memory 300 and configured to be executed by the one or more processors 200 . Program 310 includes instructions for performing the video communication background display method of any one of the above embodiments.

例如，程序310包括用于执行以下步骤所述的视频通信背景显示方法的指令：For example, program 310 includes instructions for performing the video communication background display method described in the following steps:

步骤01，获取当前用户的场景图像。Step 01: Acquire a scene image of the current user.

步骤02，获取当前用户的深度图像。Step 02: Obtain a depth image of the current user.

步骤03，处理场景图像和深度图像，以提取当前用户在场景图像中的人物区域而获得人物区域图像。Step 03: Process the scene image and the depth image to extract the person region of the current user in the scene image, thereby obtaining a person region image.

步骤04，对预设的第一虚拟背景图像进行加锁处理生成第二虚拟背景图像，并将人物区域图像与第二虚拟背景图像融合以得到合并图像，显示给与当前用户进行视频通信的目标用户。Step 04: Perform locking processing on the preset first virtual background image to generate a second virtual background image, fuse the person region image with the second virtual background image to obtain a merged image, and display the merged image to the target user who is in video communication with the current user.

步骤05，确定当前用户与目标用户之间的熟悉度。Step 05: Determine the familiarity between the current user and the target user.

步骤06，根据熟悉度从第一虚拟背景图像中获取相应的图像元素，并在第二虚拟背景图像中向目标用户显示图像元素。Step 06: Acquire corresponding image elements from the first virtual background image according to the familiarity, and display the image elements to the target user in the second virtual background image.

再例如，程序310还包括用于执行以下步骤所述的视频通信背景显示方法的指令：For another example, the program 310 also includes instructions for executing the video communication background display method described in the following steps:

0331：解调结构光图像中各个像素对应的相位信息；0331: demodulate the phase information corresponding to each pixel in the structured light image;

0332：将相位信息转化为深度信息；和0332: Convert phase information to depth information; and

0333：根据深度信息生成深度图像。0333: Generate a depth image based on the depth information.
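Steps 0331 to 0333 can be illustrated with the widely used four-step phase-shifting method. This assumes the projector shows four sinusoidal fringe patterns shifted by pi/2, so that I_k = A + B*cos(phi + k*pi/2). The source does not name the fringe scheme, and its speckle-pattern option would be decoded differently. Phase unwrapping is omitted, and the linear phase-to-depth mapping relative to a reference plane (`z_ref`, `mm_per_rad`) is a hypothetical stand-in for the system's real triangulation calibration.

```python
import math


def demodulate_phase(i0, i1, i2, i3):
    """Step 0331: wrapped phase from four samples I_k = A + B*cos(phi + k*pi/2)."""
    # i3 - i1 = 2B*sin(phi) and i0 - i2 = 2B*cos(phi)
    return math.atan2(i3 - i1, i0 - i2)


def phase_to_depth(phase, ref_phase, z_ref, mm_per_rad):
    """Step 0332: hypothetical linear calibration relative to a reference plane."""
    return z_ref + mm_per_rad * (phase - ref_phase)


def depth_image(patterns, ref_phase_map, z_ref, mm_per_rad):
    """Step 0333: apply steps 0331 and 0332 per pixel and assemble the image."""
    i0, i1, i2, i3 = patterns
    h, w = len(i0), len(i0[0])
    return [[phase_to_depth(
                demodulate_phase(i0[r][c], i1[r][c], i2[r][c], i3[r][c]),
                ref_phase_map[r][c], z_ref, mm_per_rad)
             for c in range(w)] for r in range(h)]


def synth(phases, a=100.0, b=50.0):
    """Build the four fringe images for known per-pixel phases (for testing)."""
    return [[[a + b * math.cos(p + k * math.pi / 2) for p in row] for row in phases]
            for k in range(4)]


true_phase = [[0.5, 1.0]]
depth = depth_image(synth(true_phase), [[0.5, 0.5]], z_ref=1000.0, mm_per_rad=100.0)
print([round(v, 6) for v in depth[0]])  # [1000.0, 1050.0]
```

Using four shifted patterns cancels the unknown background intensity A and fringe contrast B, which is why only the differences of the samples enter `atan2`.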

本发明实施方式的计算机可读存储介质包括与能够摄像的电子装置1000结合使用的计算机程序。计算机程序可被处理器200执行以完成上述任意一项实施方式的视频通信背景显示方法。The computer-readable storage medium of the embodiment of the present invention includes a computer program used in conjunction with the electronic device 1000 capable of imaging. The computer program can be executed by the processor 200 to implement the video communication background display method of any one of the above embodiments.

例如，计算机程序可被处理器200执行以完成以下步骤所述的视频通信背景显示方法：For example, a computer program can be executed by the processor 200 to perform the video communication background display method described in the following steps:

再例如，计算机程序还可被处理器200执行以完成以下步骤所述的视频通信背景显示方法：For another example, the computer program can also be executed by the processor 200 to complete the video communication background display method described in the following steps:

在本说明书的描述中，参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本发明的至少一个实施例或示例中。在本说明书中，对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且，描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外，在不相互矛盾的情况下，本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the different embodiments or examples described in this specification, as well as their features.

In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless otherwise expressly and specifically defined.

Any process or method description in a flowchart, or otherwise described herein, may be understood to represent a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logical function or process. The scope of the preferred embodiments of the present invention includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present invention belong.

The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport the program for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If, as in another embodiment, they are implemented in hardware, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.

Those of ordinary skill in the art can understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, may each exist physically on their own, or two or more of them may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and shall not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims

1. A video communication background display method for an electronic device, comprising:

acquiring a scene image of a current user;

acquiring a depth image of the current user;

processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image;

locking a preset first virtual background image to generate a second virtual background image, fusing the person region image and the second virtual background image to obtain a combined image, and displaying the combined image to a target user performing video communication with the current user;

determining familiarity between the current user and the target user, which comprises: sending verification requests corresponding to different degrees of familiarity to the target user; performing verification according to the request responses fed back by the target user and preset standard information; and determining the familiarity between the current user and the target user according to a verification result;

and acquiring corresponding image elements from the first virtual background image according to the familiarity, and displaying the image elements to the target user in the second virtual background image.
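For illustration only, two steps of claim 1 can be sketched in Python: fusing the person region image with the locked second virtual background through a per-pixel mask, and deriving familiarity from verification responses. This is a minimal sketch, not the patented implementation. It assumes grayscale images held as nested lists, a boolean person mask of the same size, and a hypothetical scheme in which level n is granted only when the responses for levels 1 through n all match the preset standard answers.

```python
def fuse(scene, person_mask, background):
    """Combine images pixel by pixel: keep the scene pixel inside the
    person region, otherwise take the second virtual background pixel."""
    return [
        [s if m else b for s, m, b in zip(s_row, m_row, b_row)]
        for s_row, m_row, b_row in zip(scene, person_mask, background)
    ]


def familiarity_by_verification(responses, standard_answers):
    """Hypothetical scheme: standard_answers[i] is the preset answer for
    familiarity level i + 1. Levels are granted in order, and the first
    mismatching response stops the check."""
    level = 0
    for given, expected in zip(responses, standard_answers):
        if given.strip().casefold() != expected.strip().casefold():
            break
        level += 1
    return level


scene = [[10, 20], [30, 40]]
mask = [[True, False], [False, True]]
locked_background = [[0, 0], [0, 0]]
print(fuse(scene, mask, locked_background))  # [[10, 0], [0, 40]]
print(familiarity_by_verification(["Blue", "paris", "x"], ["blue", "Paris", "rex"]))  # 2
```

A hard mask gives a visible seam at the person boundary; a real system would more likely blend with a soft alpha matte.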

2. The method of claim 1, wherein the acquiring the depth image of the current user comprises:

projecting structured light towards the current user;

capturing a structured light image modulated by the current user; and

demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.

3. The method of claim 2, wherein demodulating phase information corresponding to each pixel of the structured-light image to obtain the depth image comprises:

demodulating phase information corresponding to each pixel in the structured light image;

converting the phase information into depth information; and

generating the depth image according to the depth information.
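Claims 2 and 3 do not specify the demodulation algorithm. A common choice for phase-based structured light is four-step phase shifting, sketched below under these assumptions: four fringe images with phase shifts of 0, π/2, π and 3π/2, so that I_n = A + B·cos(φ + nπ/2) and φ = atan2(I3 − I1, I0 − I2); and a hypothetical linear calibration depth = z_ref + k·Δφ relative to a reference plane, standing in for whatever triangulation model the device actually uses. Phase unwrapping is omitted.

```python
import math


def wrapped_phase(i0, i1, i2, i3):
    """Four-step phase shifting for one pixel. With I_n = A + B*cos(phi + n*pi/2):
    I3 - I1 = 2B*sin(phi) and I0 - I2 = 2B*cos(phi), so atan2 recovers phi."""
    return math.atan2(i3 - i1, i0 - i2)


def phase_to_depth(phase, ref_phase, ref_depth, k):
    """Hypothetical linear calibration against a reference plane:
    depth = ref_depth + k * (phase difference wrapped to [-pi, pi])."""
    diff = phase - ref_phase
    diff = math.atan2(math.sin(diff), math.cos(diff))
    return ref_depth + k * diff


def depth_image(frames, ref_phase_map, ref_depth, k):
    """frames: four equally sized 2-D lists of intensities (shifts 0, pi/2, pi, 3pi/2)."""
    f0, f1, f2, f3 = frames
    return [
        [
            phase_to_depth(
                wrapped_phase(f0[y][x], f1[y][x], f2[y][x], f3[y][x]),
                ref_phase_map[y][x], ref_depth, k,
            )
            for x in range(len(f0[0]))
        ]
        for y in range(len(f0))
    ]
```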

4. The method of claim 1, wherein the processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image comprises:

identifying a face region in the scene image;

acquiring depth information corresponding to the face region from the depth image;

determining a depth range of the person region according to the depth information of the face region; and

determining, according to the depth range of the person region, a person region that is connected with the face region and falls within the depth range, to obtain the person region image.
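The depth-range segmentation of claim 4 can be sketched as a flood fill seeded from the face region. Here the depth range is taken as the face region's minimum and maximum depth widened by a margin, and only pixels that are connected to the face (4-connectivity) and fall within that range are kept. The face box format, the margin value, and the choice of 4-connectivity are assumptions for illustration. The face box would come from a face detector, which is not shown.

```python
from collections import deque


def extract_person_mask(depth, face_box, margin):
    """depth: 2-D list of depth values; face_box: (top, left, bottom, right)
    with exclusive bottom/right (assumed format); margin: assumed tolerance."""
    top, left, bottom, right = face_box
    face_depths = [depth[y][x] for y in range(top, bottom) for x in range(left, right)]
    lo, hi = min(face_depths) - margin, max(face_depths) + margin

    h, w = len(depth), len(depth[0])
    mask = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed with every face pixel (all are within [lo, hi] by construction).
    for y in range(top, bottom):
        for x in range(left, right):
            mask[y][x] = True
            queue.append((y, x))
    # Grow the region through 4-connected neighbours that fall in the depth range.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx] and lo <= depth[ny][nx] <= hi:
                mask[ny][nx] = True
                queue.append((ny, nx))
    return mask
```

The connectivity requirement is what excludes background objects that happen to sit at the same distance as the user.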

5. The method of claim 4, further comprising:

processing the scene image to obtain a full-field edge image of the scene image; and

correcting the person region image according to the full-field edge image.
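Claim 5 does not name an edge detector. As an assumed stand-in, the full-field edge image can be approximated with a Sobel gradient-magnitude threshold, as sketched below. The threshold is a hypothetical parameter, and border pixels are simply marked as non-edges. The subsequent correction of the person region image (for example, snapping the mask boundary to nearby edges) is not shown.

```python
import math


def sobel_edges(img, threshold):
    """Binary full-field edge map of a grayscale image (2-D list) using the
    3x3 Sobel operator; border pixels are left as non-edges."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1])
            edges[y][x] = math.hypot(gx, gy) > threshold
    return edges
```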

6. The method according to claim 1, wherein the locking the preset first virtual background image to generate a second virtual background image comprises:

performing Gaussian blur processing on a preset first virtual background image to generate a second virtual background image;

or,

performing mosaic processing on the preset first virtual background image to generate a second virtual background image.
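The two locking options of claim 6 can be sketched on a grayscale image stored as nested lists. The first is a separable Gaussian blur with clamped borders. The second is a mosaic that replaces each block with its mean value. The sigma, radius, and block size are assumed parameters, and a production implementation would more likely use an image library.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur; out-of-range coordinates are clamped to the border."""
    k = gaussian_kernel(sigma, radius)
    h, w = len(img), len(img[0])

    def clamp(v, hi):
        return max(0, min(hi, v))

    # Horizontal pass, then vertical pass over the intermediate result.
    tmp = [[sum(k[i + radius] * img[y][clamp(x + i, w - 1)] for i in range(-radius, radius + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[i + radius] * tmp[clamp(y + i, h - 1)][x] for i in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]


def mosaic(img, block):
    """Replace every block x block tile (edge tiles may be smaller) with its mean."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x) for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            mean = sum(img[y][x] for y, x in cells) / len(cells)
            for y, x in cells:
                out[y][x] = mean
    return out
```

Either function yields a second virtual background in which individual objects are no longer recognizable, while the overall colour layout of the first virtual background is preserved.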

7. The method of claim 1, wherein the determining familiarity between the current user and the target user further comprises:

detecting video interaction information of the current user and the target user according to a preset matching index;

if the video interaction information is detected to match the preset matching information, querying a preset correspondence between the matching information and familiarity to determine the familiarity between the current user and the target user.

8. The method according to claim 7, wherein the detecting video interaction information of the current user and the target user according to a preset matching index comprises:

detecting content keywords of voice information and text information according to the preset matching index, and/or detecting the information content of the voice information and the text information.
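Keyword detection under claim 8 can be sketched as looking up the words of transcribed voice or text messages in a matching index. The index contents, the keyword-to-familiarity mapping, and the rule of taking the highest matched level are all hypothetical. Speech-to-text transcription is assumed to happen elsewhere.

```python
import re

# Hypothetical matching index: keyword -> familiarity level (higher = more familiar).
MATCHING_INDEX = {"hello": 1, "colleague": 2, "meeting": 2, "mom": 3, "darling": 3}


def familiarity_from_text(transcript, index=MATCHING_INDEX, default=0):
    """Return the highest familiarity level whose keyword appears in the
    transcribed voice or text message, or `default` if none matches."""
    words = re.findall(r"[a-z']+", transcript.lower())
    levels = [index[w] for w in words if w in index]
    return max(levels, default=default)


print(familiarity_from_text("Hello, colleague!"))  # 2
```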

9. The method of claim 1, wherein the determining familiarity between the current user and the target user further comprises:

acquiring a user image of the target user, and extracting facial feature information of the user image;

querying a preset image information base according to the facial feature information to acquire identity information of the target user; and

querying a preset correspondence between identity information and familiarity to determine the familiarity between the current user and the target user.
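The identity lookup of claim 9 can be sketched as a nearest-match search over stored facial feature vectors. The feature extractor is not shown. Cosine similarity, the 0.9 acceptance threshold, and the identity-to-familiarity table are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(features, library, threshold=0.9):
    """Return the identity in the (hypothetical) image information base whose
    stored features are most similar, or None if no score reaches the threshold."""
    best_id, best_score = None, threshold
    for identity, ref in library.items():
        score = cosine(features, ref)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id


def familiarity_from_face(features, library, identity_to_familiarity, default=0):
    """Map the recognized identity to a preset familiarity level."""
    return identity_to_familiarity.get(identify(features, library), default)
```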

10. The method of claim 1, wherein the obtaining the corresponding image element from the first virtual background image according to the familiarity comprises:

acquiring a corresponding number of image elements from the first virtual background image according to the number of items corresponding to the familiarity; and/or

acquiring corresponding types of image elements from the first virtual background image according to the types of items corresponding to the familiarity.
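Claim 10 ties familiarity to the number and/or types of items that are revealed. The following is a minimal sketch that assumes a hypothetical reveal policy mapping each familiarity level to a maximum element count and a set of allowed element kinds. Elements whose kind is allowed are kept in their original order and then truncated to the count limit. Unknown levels reveal nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageElement:
    name: str
    kind: str  # hypothetical categories such as "furniture", "decoration", "photo"


# Hypothetical correspondence: familiarity -> (max number of elements, allowed kinds).
REVEAL_POLICY = {
    0: (0, set()),
    1: (2, {"furniture"}),
    2: (4, {"furniture", "decoration"}),
    3: (None, {"furniture", "decoration", "photo"}),  # None means no count limit
}


def elements_to_reveal(elements, familiarity, policy=REVEAL_POLICY):
    """Select which elements of the first virtual background to show in the
    second virtual background at the given familiarity level."""
    max_count, kinds = policy.get(familiarity, (0, set()))
    allowed = [e for e in elements if e.kind in kinds]
    return allowed if max_count is None else allowed[:max_count]
```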

11. A video communication background display apparatus for an electronic apparatus, comprising:

a visible light camera configured to acquire a scene image of a current user;

a depth image acquisition assembly configured to acquire a depth image of the current user;

a processor, configured to process the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image;

12. The apparatus of claim 11, wherein the depth image acquisition assembly comprises a structured light projector and a structured light camera, the structured light projector being configured to project structured light toward the current user;

the structured light camera is configured to:

capture a structured light image modulated by the current user; and

13. An electronic device, comprising:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the video communication background display method of any of claims 1-10.

14. A computer-readable storage medium comprising a computer program for use in conjunction with an electronic device capable of capturing images, the computer program being executable by a processor to perform the video communication background display method of any of claims 1-10.