
CN101651772A - Method for extracting a video region of interest based on visual attention

Method for extracting a video region of interest based on visual attention

Info

Publication number
CN101651772A
CN101651772A (application number CN200910152520A)
Authority
CN
China
Prior art keywords: video frame, depth, pixel, current, visual attention
Prior art date
Legal status
Granted
Application number
CN200910152520A
Other languages
Chinese (zh)
Other versions
CN101651772B (en)
Inventor
张云 (Zhang Yun)
蒋刚毅 (Jiang Gangyi)
郁梅 (Yu Mei)
Current Assignee
Shanghai Spparks Technology Co ltd
Original Assignee
Ningbo University
Priority date
Filing date
Publication date
Application filed by Ningbo University
Priority to CN2009101525203A
Publication of CN101651772A
Application granted
Publication of CN101651772B
Status: Expired - Fee Related

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for extracting a video region of interest based on visual attention. The region of interest extracted by the method fuses static-image-domain visual attention, motion visual attention and depth visual attention, which effectively suppresses the inherent one-sidedness and inaccuracy of each individual attention cue. The method overcomes the noise caused by complex backgrounds in static-image-domain visual attention and the inability of motion visual attention to capture regions of interest with local or small-amplitude motion, thereby improving computational accuracy, enhancing the stability of the algorithm, and allowing regions of interest to be extracted from backgrounds with complex texture and from moving scenes. In addition, the region of interest obtained by the method not only conforms to the human eye's visual interest in static texture video frames and in moving objects, but also conforms to the depth-perception characteristic of stereoscopic vision, in which objects with a strong sense of depth or at a short distance attract interest, and thus matches the semantic characteristics of human stereoscopic vision.

Description

A Visual Attention-Based Method for Extracting Regions of Interest in Video

Technical Field

The invention relates to a method for processing video signals, and in particular to a method for extracting video regions of interest based on visual attention.

Background Art

Stereoscopic television, also known as 3DTV (Three-Dimensional Television), provides the leap from flat to stereoscopic viewing and gives viewers a distinctive sense of depth and realism, and it has therefore received great attention from research institutions and industry at home and abroad. In 2002 the ATTEST (Advanced Three-Dimensional Television System Technologies) project was launched within the IST programme supported by the European Commission, with the goal of building a complete, backward-compatible 3D digital television broadcast chain. ATTEST proposed a new concept for the 3DTV broadcast chain that remains compatible with existing two-dimensional broadcasting and broadly supports many different forms of 2D and 3D display. Its main design idea is to add a depth map as enhancement-layer information on top of conventional two-dimensional video transmission, i.e. the "2D color video plus depth" data representation: the display terminal decodes and reconstructs the 3D video from the 2D color video plus depth, and some advanced naked-eye autostereoscopic display terminals in the industry already support this display mode.

In the human visual reception and processing system, because brain resources are limited and external information differs in importance, the brain does not treat all incoming information equally but exhibits selectivity, i.e. different degrees of interest. The extraction of video regions of interest has long been one of the core and difficult problems of content-based video processing in fields such as video compression and communication, video retrieval, and pattern recognition. Studies in visual psychology show that this selectivity, or difference in degree of interest, of the human eye towards external visual input is closely tied to the characteristics of human visual attention. Research on visual attention cues is currently divided into two branches: top-down (also called concept-driven) cues and bottom-up (also called stimulus-driven) cues. Top-down cues arise mainly from complex psychological processes and direct attention towards particular objects in the scene, including object shape, motion, pattern and other identifying features; they are affected by personal knowledge, interests, the subconscious and similar factors, and therefore vary from person to person. Bottom-up cues come mainly from the direct stimulation of the visual cortex by the visual features of the video scene, chiefly color, luminance and orientation; they are instinctive and automatic, broadly applicable, relatively stable, and largely unaffected by conscious factors such as personal knowledge and preferences. For this reason bottom-up attention cues are one of the hot topics in research on automatic region-of-interest extraction.

However, current automatic region-of-interest extraction methods fall mainly into three categories. 1) Methods that use the internal information of a single-viewpoint image, including luminance, color, texture or orientation stimuli, to extract the regions of the current video frame that interest the human eye; such methods mainly extract regions with large contrast in luminance, color and texture as regions of interest, which makes them hard to apply to region-of-interest extraction in complex background environments. 2) Methods based on the visual principle that the human eye is attracted to moving regions, which use inter-frame motion information as the main cue; these methods have difficulty extracting slowly moving or locally moving objects accurately, and are also hard to apply to region-of-interest extraction under global motion. 3) Methods that combine static texture and motion information; because the redundancy and correlation between static texture and motion information are weak, such methods cannot effectively suppress the extraction errors and noise of either cue, so the extraction accuracy remains limited. Because of the limited amount of information they can exploit, these three categories of traditional methods yield regions of interest that are insufficiently accurate and stable. Moreover, traditional methods do not consider the stereoscopic-vision characteristic of being interested in objects with a strong sense of depth or close to the viewer, so they cannot properly reflect the true degree of interest of the human eye with stereoscopic vision, and are ill-suited to extracting regions of interest that conform to stereoscopic semantic features in new-generation stereoscopic (three-dimensional)/multi-view video.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a visual-attention-based method for extracting video regions of interest that yields regions of interest with higher accuracy and better stability, and whose extracted regions conform to the semantic characteristics of human stereoscopic vision.

The technical solution adopted by the present invention to solve the above technical problem is a visual-attention-based method for extracting video regions of interest, comprising the following steps:

① Define the two-dimensional color video as the texture video, and define the size of the texture video frame at every moment of the texture video as W×H, where W is the width and H the height of each texture video frame. Denote the texture video frame at time t as F_t and define it as the current texture video frame. Detect the static-image-domain visual attention of the current texture video frame with a known static-image visual attention detection method, obtaining the static-image-domain visual attention map of the current texture video frame, denoted S_I; S_I has size W×H and is a grayscale map represented with a bit depth of Z_S.

② Detect the motion visual attention of the current texture video frame with a motion visual attention detection method, obtaining the motion visual attention map of the current texture video frame, denoted S_M; S_M has size W×H and is a grayscale map represented with a bit depth of Z_S.

③ Define each depth video frame of the depth video corresponding to the texture video as a grayscale map represented with a bit depth of Z_D, and set the size of every depth video frame to W×H, where W is the width and H the height of each depth video frame. Denote the depth video frame at time t as D_t and define it as the current depth video frame. Detect the depth visual attention of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame with a depth visual attention detection method, obtaining the depth visual attention map of the three-dimensional video image, denoted S_D; S_D has size W×H and is a grayscale map represented with a bit depth of Z_S.

④ Fuse the static-image-domain visual attention map S_I of the current texture video frame, the motion visual attention map S_M of the current texture video frame, the current depth video frame, and the depth visual attention map S_D of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame, using a depth-perception-based visual attention fusion method, so as to extract the three-dimensional visual attention map that conforms to human stereoscopic perception, denoted S; S has size W×H and is a grayscale map represented with a bit depth of Z_S.

⑤ Apply thresholding and macro-block post-processing to the three-dimensional visual attention map S to obtain the final region of interest of the current texture video frame that conforms to human stereoscopic perception.

⑥ Repeat steps ① to ⑤ until all texture video frames of the texture video have been processed, obtaining the video regions of interest of the texture video.

The specific procedure of the motion visual attention detection method in step ② is as follows:

②-1. Denote the texture video frame at time t+j, temporally adjacent to the current texture video frame, as F_{t+j}, and the texture video frame at time t−j as F_{t−j}, where j ∈ (0, N_F/2] and N_F is a positive integer smaller than 10;

②-2. Compute with a known optical flow method the horizontal and vertical motion-vector images between the current texture video frame and the texture video frame F_{t+j} at time t+j, and between the current texture video frame and the texture video frame F_{t−j} at time t−j. Denote the horizontal and vertical motion-vector images between the current frame and F_{t+j} as V_{t+j}^H and V_{t+j}^V, and those between the current frame and F_{t−j} as V_{t−j}^H and V_{t−j}^V; V_{t+j}^H, V_{t+j}^V, V_{t−j}^H and V_{t−j}^V all have width W and height H;

②-3. Superimpose the absolute values of V_{t+j}^H and V_{t+j}^V to obtain the motion-magnitude image between the current texture video frame and F_{t+j}, denoted M_{t+j}, with M_{t+j} = |V_{t+j}^H| + |V_{t+j}^V|; denote the motion-magnitude value of the pixel at coordinates (x, y) in M_{t+j} as m_{t+j}(x, y). Likewise superimpose the absolute values of V_{t−j}^H and V_{t−j}^V to obtain the motion-magnitude image between the current texture video frame and F_{t−j}, denoted M_{t−j}, with M_{t−j} = |V_{t−j}^H| + |V_{t−j}^V|; denote the motion-magnitude value of the pixel at coordinates (x, y) in M_{t−j} as m_{t−j}(x, y);
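For illustration only, the following sketch computes one motion-magnitude image for steps ②-2 and ②-3. Farneback's dense optical flow from OpenCV is used here as one possible choice of the "known optical flow method", and all parameter values are illustrative rather than values specified by the invention.

```python
import cv2
import numpy as np

def motion_magnitude(frame_ref, frame_other):
    """Motion-magnitude image M = |V^H| + |V^V| between two grayscale uint8 frames."""
    # Farneback parameters: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    flow = cv2.calcOpticalFlowFarneback(frame_ref, frame_other, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    v_h, v_v = flow[..., 0], flow[..., 1]   # horizontal / vertical motion-vector images
    return np.abs(v_h) + np.abs(v_v)

# usage, with grayscale frames at times t, t+j and t-j:
# m_fwd = motion_magnitude(gray_t, gray_t_plus_j)   # M_{t+j}
# m_bwd = motion_magnitude(gray_t, gray_t_minus_j)  # M_{t-j}
```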

②-4. Using the current texture video frame, the texture video frame F_{t+j} at time t+j and the texture video frame F_{t−j} at time t−j, extract the joint motion map, denoted M_j^Δ, as follows: for each pixel, check whether the smaller of the two motion-magnitude values at the same coordinates in M_{t+j} and M_{t−j} is greater than the set first threshold T_1; if it is, the pixel value at those coordinates in M_j^Δ is the average of the two motion-magnitude values, otherwise it is 0. That is, for the pixel at (x, y) in M_{t+j} and the pixel at (x, y) in M_{t−j}, if min(m_{t+j}(x, y), m_{t−j}(x, y)) > T_1, then the pixel value of the pixel at (x, y) in M_j^Δ is m_j^Δ(x, y) = (m_{t+j}(x, y) + m_{t−j}(x, y)) / 2; otherwise m_j^Δ(x, y) = 0, where min() is the minimum function;
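A minimal NumPy sketch of the joint motion map of step ②-4, assuming the two motion-magnitude images of step ②-3 are available as floating-point arrays:

```python
import numpy as np

def joint_motion_map(m_fwd, m_bwd, t1=1.0):
    """M_j^delta: average of forward/backward magnitudes where both exceed T_1, else 0."""
    keep = np.minimum(m_fwd, m_bwd) > t1              # min(m_{t+j}, m_{t-j}) > T_1
    return np.where(keep, (m_fwd + m_bwd) / 2.0, 0.0)
```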

②-5. Weight and superimpose the joint motion maps of all time offsets from 1 to N_F/2 relative to time t to obtain the weighted joint motion map of the current texture video frame, denoted M; the pixel value of the pixel at (x, y) in M is m(x, y) = Σ_{j=1}^{N_F/2} ζ_j · m_j^Δ(x, y), where m_j^Δ(x, y) is the pixel value of the pixel at (x, y) in the joint motion map M_j^Δ at offset j from time t, and ζ_j are weighting coefficients satisfying Σ_{j=1}^{N_F/2} ζ_j = 1;

②-6. Decompose the weighted joint motion map M of the current texture video frame into an n_L-level Gaussian pyramid; denote the i-th level obtained from the Gaussian pyramid decomposition of M as M(i), whose width and height are W/2^i and H/2^i respectively, where n_L is a positive integer smaller than 20, i ∈ [0, n_L−1], W is the width of the current texture video frame and H its height;
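Step ②-6 (and likewise steps ③-1 and ③-3 below) relies on Gaussian pyramid decomposition. A brief sketch using OpenCV's pyrDown, which low-pass filters and halves the resolution at each level, is given below; n_L = 9 follows the embodiment described later, and using pyrDown is an implementation choice rather than something mandated by the text.

```python
import cv2

def gaussian_pyramid(img, n_levels=9):
    """Return [level 0, ..., level n_levels-1]; level i has size (W / 2^i) x (H / 2^i)."""
    levels = [img.astype('float32')]
    for _ in range(1, n_levels):
        levels.append(cv2.pyrDown(levels[-1]))   # Gaussian blur followed by 2x downsampling
    return levels
```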

②-7. Using the n_L pyramid levels of the weighted joint motion map M of the current texture video frame, extract the motion visual attention map S_M of the current texture video frame; the pixel value of the pixel at (x, y) in S_M is s_m(x, y), and S_M = F̄_M, where F̄_M accumulates the normalized absolute cross-level differences over all valid level pairs, i.e. F̄_M = N(⊕_c ⊕_{s=c+δ} N(|M(c) ⊖ M(s)|)), with s, c ∈ [0, n_L−1], s = c+δ, δ = {−3, −2, −1, 1, 2, 3}. N(·) is a normalization function that maps its input to a fixed grayscale interval, the symbol "| |" denotes absolute value, M(c) is the c-th level and M(s) the s-th level of the weighted joint motion pyramid. The symbol ⊖ is the cross-level difference operator between M(c) and M(s): if c < s, M(s) is up-sampled to an image with the same resolution as M(c) and each pixel of M(c) is differenced with the corresponding pixel of the up-sampled M(s); if c > s, M(c) is up-sampled to an image with the same resolution as M(s) and each pixel of M(s) is differenced with the corresponding pixel of the up-sampled M(c). The symbol ⊕ is the cross-level addition operator between M(c) and M(s): if c < s, M(s) is up-sampled to an image with the same resolution as M(c) and each pixel of M(c) is summed with the corresponding pixel of the up-sampled M(s); if c > s, M(c) is up-sampled to an image with the same resolution as M(s) and each pixel of M(s) is summed with the corresponding pixel of the up-sampled M(c).
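The cross-level accumulation of step ②-7 can be sketched as follows. The sketch assumes that F̄_M sums the normalized absolute cross-level differences over all pairs s = c + δ and that N(·) rescales a map to [0, 255]; both are assumptions, since the exact formula and normalization interval appear only as drawings in the original.

```python
import cv2
import numpy as np

def normalize(img, out_max=255.0):
    """N(.): rescale a map to [0, out_max] (assumed normalization interval)."""
    lo, hi = float(img.min()), float(img.max())
    return (img - lo) * (out_max / (hi - lo)) if hi > lo else np.zeros_like(img)

def cross_level_attention(pyr, deltas=(-3, -2, -1, 1, 2, 3)):
    """Accumulate N(|P(c) (-) P(s)|) over all valid pairs s = c + delta at base resolution."""
    base_h, base_w = pyr[0].shape[:2]
    acc = np.zeros((base_h, base_w), np.float32)
    for c in range(len(pyr)):
        for d in deltas:
            s = c + d
            if 0 <= s < len(pyr):
                fine = pyr[min(c, s)]                                   # finer of the two levels
                coarse_up = cv2.resize(pyr[max(c, s)],
                                       (fine.shape[1], fine.shape[0]))  # up-sample coarser level
                diff = normalize(np.abs(fine - coarse_up))              # N(|P(c) (-) P(s)|)
                acc += cv2.resize(diff, (base_w, base_h))               # cross-level addition
    return normalize(acc)
```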

The first threshold set in step ②-4 is T_1 = 1.

The specific procedure of the depth visual attention detection method in step ③ is as follows:

③-1. Decompose the current depth video frame into an n_L-level Gaussian pyramid; denote the i-th level obtained from the decomposition as D(i), whose width and height are W/2^i and H/2^i respectively, where n_L is a positive integer smaller than 20, i ∈ [0, n_L−1], W is the width of the current depth video frame and H its height;

③-2. Using the n_L pyramid levels of the current depth video frame, extract the depth feature map of the current depth video frame, denoted F_D, as the accumulation of normalized absolute cross-level differences, i.e. F_D = N(⊕_c ⊕_{s=c+δ} N(|D(c) ⊖ D(s)|)), where s, c ∈ [0, n_L−1], s = c+δ, δ = {−3, −2, −1, 1, 2, 3}, N(·) is a normalization function that maps its input to a fixed grayscale interval, the symbol "| |" denotes absolute value, D(c) is the c-th level and D(s) the s-th level of the depth pyramid. The symbol ⊖ is the cross-level difference operator between D(c) and D(s): if c < s, D(s) is up-sampled to an image with the same resolution as D(c) and each pixel of D(c) is differenced with the corresponding pixel of the up-sampled D(s); if c > s, D(c) is up-sampled to an image with the same resolution as D(s) and each pixel of D(s) is differenced with the corresponding pixel of the up-sampled D(c). The symbol ⊕ is the cross-level addition operator between D(c) and D(s): if c < s, D(s) is up-sampled to an image with the same resolution as D(c) and corresponding pixels of D(c) and the up-sampled D(s) are summed; if c > s, D(c) is up-sampled to an image with the same resolution as D(s) and corresponding pixels of D(s) and the up-sampled D(c) are summed;

③-3. Convolve the current depth video frame with known Gabor filters in the 0, π/4, π/2 and 3π/4 directions to extract its four orientation components, obtaining four orientation component maps of the current depth video frame, denoted O_0^D, O_{π/4}^D, O_{π/2}^D and O_{3π/4}^D. Decompose each of the four orientation component maps of the current depth video frame into an n_L-level Gaussian pyramid; denote the i-th level obtained from the decomposition of the orientation component map in direction θ as O_θ^D(i), whose width and height are W/2^i and H/2^i respectively, where θ ∈ {0, π/4, π/2, 3π/4}, i ∈ [0, n_L−1], W is the width of the current depth video frame and H its height;
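A sketch of the four-direction Gabor filtering of step ③-3 using OpenCV; the kernel size and the remaining Gabor parameters (sigma, wavelength, aspect ratio, phase) are illustrative choices not specified in the text.

```python
import cv2
import numpy as np

def orientation_maps(depth_frame):
    """Convolve the depth frame with Gabor filters at theta = 0, pi/4, pi/2, 3pi/4."""
    img = depth_frame.astype(np.float32)
    maps = {}
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        # getGaborKernel(ksize, sigma, theta, lambd, gamma, psi)
        kern = cv2.getGaborKernel((9, 9), 2.5, theta, 5.0, 1.0, 0.0)
        maps[theta] = cv2.filter2D(img, -1, kern)   # orientation component map O_theta^D
    return maps
```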

③-4. Using the n_L pyramid levels of each orientation component map of the current depth video frame, extract the preliminary depth orientation feature map of the current depth video frame, denoted F′_DO, as F′_DO = (1/4) Σ_{θ∈{0, π/4, π/2, 3π/4}} F̄_{O_θ}, where F̄_{O_θ} accumulates the normalized absolute cross-level differences of the pyramid of direction θ, i.e. F̄_{O_θ} = N(⊕_c ⊕_{s=c+δ} N(|O_θ^D(c) ⊖ O_θ^D(s)|)), with s, c ∈ [0, n_L−1], s = c+δ, δ = {−3, −2, −1, 1, 2, 3}. N(·) is a normalization function that maps its input to a fixed grayscale interval, the symbol "| |" denotes absolute value, O_θ^D(c) is the c-th level and O_θ^D(s) the s-th level of the pyramid of the orientation component map in direction θ. The symbol ⊖ is the cross-level difference operator between O_θ^D(c) and O_θ^D(s): if c < s, O_θ^D(s) is up-sampled to an image with the same resolution as O_θ^D(c) and each pixel of O_θ^D(c) is differenced with the corresponding pixel of the up-sampled O_θ^D(s); if c > s, O_θ^D(c) is up-sampled to an image with the same resolution as O_θ^D(s) and each pixel of O_θ^D(s) is differenced with the corresponding pixel of the up-sampled O_θ^D(c). The symbol ⊕ is the cross-level addition operator between O_θ^D(c) and O_θ^D(s): if c < s, O_θ^D(s) is up-sampled to an image with the same resolution as O_θ^D(c) and corresponding pixels of O_θ^D(c) and the up-sampled O_θ^D(s) are summed; if c > s, O_θ^D(c) is up-sampled to an image with the same resolution as O_θ^D(s) and corresponding pixels of O_θ^D(s) and the up-sampled O_θ^D(c) are summed;

③-5. Apply a known morphological dilation algorithm, using a block of size w_1×h_1 as the basic dilation unit, to perform n_1 dilation operations on the preliminary depth orientation feature map F′_DO of the current depth video frame, obtaining the depth orientation feature map of the current depth video frame, denoted F_DO;
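A sketch of the dilation of step ③-5, using the values w_1 = 8, h_1 = 8 and n_1 = 2 given later in the embodiment; the rectangular structuring element is an assumption consistent with "a block of size w_1×h_1 as the basic dilation unit".

```python
import cv2
import numpy as np

def dilate_feature_map(f_do_prime, w1=8, h1=8, n1=2):
    """Morphological dilation of the preliminary depth orientation feature map F'_DO."""
    element = cv2.getStructuringElement(cv2.MORPH_RECT, (w1, h1))
    return cv2.dilate(f_do_prime.astype(np.float32), element, iterations=n1)
```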

③-6. Using the depth feature map F_D and the depth orientation feature map F_DO of the current depth video frame, obtain the preliminary depth visual attention map of the current depth video frame, denoted S′_D, by combining F_D and F_DO and normalizing the result with N(·); denote the pixel value of the pixel at (x, y) in S′_D as s′_d(x, y), where N(·) is a normalization function that maps its input to a fixed grayscale interval;

③-7. Using the preliminary depth visual attention map S′_D of the current depth video frame, obtain the depth visual attention map S_D of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame; the pixel value of the pixel at (x, y) in S_D is s_d(x, y) = s′_d(x, y) · g(x, y), where
g(x, y) = 0.2 if x < b or y < b or x > W−b or y > H−b;
g(x, y) = 0.4 else if x < 2b or y < 2b or x > W−2b or y > H−2b;
g(x, y) = 0.6 else if x < 3b or y < 3b or x > W−3b or y > H−3b;
g(x, y) = 1 otherwise,
where W is the width of the current depth video frame, H its height, b is the set second threshold, and "or" is the logical OR (written "||" in the original).
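A sketch of the border-attenuation weight g(x, y) of step ③-7 and of its application to S′_D; b = 16 follows the embodiment given below.

```python
import numpy as np

def border_weight(w, h, b=16):
    """g(x, y): 0.2, 0.4, 0.6 in the three outermost bands of width b, 2b, 3b; 1 elsewhere."""
    x = np.arange(w)[None, :]   # column coordinates, broadcast along rows
    y = np.arange(h)[:, None]   # row coordinates, broadcast along columns
    g = np.ones((h, w), np.float32)
    for k, val in ((3, 0.6), (2, 0.4), (1, 0.2)):   # inner band first so the outermost value wins
        band = (x < k * b) | (y < k * b) | (x > w - k * b) | (y > h - k * b)
        g[band] = val
    return g

# depth visual attention map: s_d = s_d_prime * border_weight(W, H, b=16)
```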

In step ③-5, w_1 = 8, h_1 = 8 and n_1 = 2; the second threshold b set in step ③-7 is 16.

The specific procedure of the depth-perception-based visual attention fusion method in step ④ is as follows:

④-1. Scale-transform the current depth video frame by Q(d(x, y)) = d(x, y) + γ, where γ is a coefficient taking a value within a set range, d(x, y) is the pixel value of the pixel at (x, y) in the current depth video frame, and Q(d(x, y)) is the pixel value of the pixel at (x, y) in the scale-transformed current depth video frame;

④-2. Using the scale-transformed current depth video frame, the current depth video frame, the depth visual attention map S_D of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame, the motion visual attention map S_M of the current texture video frame, and the static-image-domain visual attention map S_I of the current texture video frame, obtain the three-dimensional visual attention map S. Denote the pixel value of the pixel at (x, y) in S as s(x, y); it is computed by combining the weighted attention values of the three maps with the pairwise correlation terms C_ab·Θ_ab(x, y) and normalizing with N(·). Here K_D, K_M and K_I are the weighting coefficients of S_D, S_M and S_I respectively, satisfying Σ_{a∈{D,M,I}} K_a = 1 and 0 ≤ K_a ≤ 1; N(·) is a normalization function that maps its input to a fixed grayscale interval; s_D(x, y), s_M(x, y) and s_I(x, y) are the pixel values of the pixels at (x, y) in S_D, S_M and S_I respectively; Θ_ab(x, y) is the visual attention correlation value, Θ_ab(x, y) = min(s_a(x, y), s_b(x, y)), where min() is the minimum function; C_ab are correlation coefficients satisfying Σ_{a,b∈{D,M,I}, a≠b} C_ab = 1 and 0 ≤ C_ab < 1, where the correlation coefficient C_DM expresses the correlation between S_D and S_M, C_DI the correlation between S_D and S_I, and C_IM the correlation between S_I and S_M, with a, b ∈ {D, M, I} and a ≠ b.
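The exact fusion formula of step ④-2 appears only as a drawing in the original, so the sketch below shows one plausible combination that respects the stated constraints (weights K_a summing to 1, correlation terms C_ab·Θ_ab with Θ_ab = min(s_a, s_b), normalization N(·), and modulation by the scale-transformed depth Q(d)). It is an assumption for illustration, not the patented formula, and the weight values are likewise illustrative.

```python
import numpy as np

def normalize(img, out_max=255.0):
    lo, hi = float(img.min()), float(img.max())
    return (img - lo) * (out_max / (hi - lo)) if hi > lo else np.zeros_like(img)

def fuse_attention(s_i, s_m, s_d, depth, gamma=0.5,
                   k=(0.4, 0.3, 0.3),    # K_D, K_M, K_I (illustrative, sum to 1)
                   c=(0.4, 0.3, 0.3)):   # C_DM, C_DI, C_IM (illustrative, sum to 1)
    """One plausible depth-perception-based fusion consistent with the stated constraints."""
    k_d, k_m, k_i = k
    c_dm, c_di, c_im = c
    q = depth.astype(np.float32) + gamma            # Q(d) = d + gamma (scale-transformed depth)
    weighted = k_d * s_d + k_m * s_m + k_i * s_i    # weighted attention maps
    correl = (c_dm * np.minimum(s_d, s_m) +         # C_ab * Theta_ab, Theta_ab = min(s_a, s_b)
              c_di * np.minimum(s_d, s_i) +
              c_im * np.minimum(s_i, s_m))
    return normalize(q * (weighted + correl))       # assumed combination, then N(.)
```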

The specific procedure of the thresholding and macro-block post-processing applied to the three-dimensional visual attention map S in step ⑤ is as follows:

⑤-1. Denote the pixel value of the pixel at (x, y) in the three-dimensional visual attention map S as s(x, y), and define the third threshold T_S = k_T · Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} s(x, y) / (W×H), where W is the width of S, H its height, and k_T ∈ (0, 3). Create a new preliminary binary mask image and check, for each pixel, whether s(x, y) ≥ T_S holds; if it does, mark the pixel at (x, y) in the preliminary binary mask image as a pixel of interest, otherwise mark it as a pixel of non-interest;
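A sketch of the thresholding of step ⑤-1; the value of k_T is illustrative within the stated range (0, 3).

```python
import numpy as np

def preliminary_mask(s, k_t=1.5):
    """True where s(x, y) >= T_S, with T_S = k_T * mean of S (i.e. k_T * sum(S) / (W*H))."""
    t_s = k_t * float(s.mean())
    return s >= t_s   # True = pixel of interest, False = pixel of non-interest
```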

⑤-2. Partition the preliminary binary mask image into (W/w_2)×(H/h_2) non-overlapping blocks of size w_2×h_2; denote the block with horizontal index u and vertical index v as B_{u,v}, where u ∈ [0, W/w_2−1] and v ∈ [0, H/h_2−1]. Determine from each block of the preliminary binary mask image whether the pixels of the corresponding block of the current texture video frame are pixels of interest or of non-interest: for block B_{u,v}, check whether the number of pixels marked as pixels of interest in B_{u,v} is greater than the set fourth threshold T_b, where 0 ≤ T_b ≤ w_2×h_2; if it is, mark all pixels of the block of the current texture video frame corresponding to B_{u,v} as pixels of interest and take that block as a region-of-interest block, otherwise mark all pixels of that block as pixels of non-interest and take it as a non-region-of-interest block. This yields the preliminary region-of-interest mask image of the current texture video frame, composed of region-of-interest blocks and non-region-of-interest blocks;
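A sketch of the block-level decision of step ⑤-2, using w_2 = h_2 = 16 and T_b = 50 from the embodiment; it assumes the frame width and height are multiples of the block size.

```python
import numpy as np

def block_roi_mask(mask, w2=16, h2=16, t_b=50):
    """Mark each w2 x h2 block as region of interest if it holds more than T_b pixels of interest."""
    h, w = mask.shape
    counts = mask.reshape(h // h2, h2, w // w2, w2).sum(axis=(1, 3))   # interest pixels per B_{u,v}
    block_is_roi = counts > t_b
    # expand the per-block decision back to pixel resolution
    return np.repeat(np.repeat(block_is_roi, h2, axis=0), w2, axis=1)
```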

⑤-3. Mark all pixels of the non-region-of-interest blocks that are immediately adjacent to region-of-interest blocks in the preliminary region-of-interest mask image as the N_R-th-level transition region of interest, and update the preliminary region-of-interest mask image; then mark all pixels of the non-region-of-interest blocks immediately adjacent to the N_R-th-level transition region of interest in the updated preliminary mask image as the (N_R−1)-th-level transition region of interest, and update the mask image recursively; repeat this recursion until the first-level transition region of interest has been marked. This finally yields the final region-of-interest mask image of the current texture video frame, composed of region-of-interest blocks, N_R levels of transition regions of interest, and non-region-of-interest blocks;

⑤-4. Denote the pixel value of the pixel at (x, y) in the final region-of-interest mask image as r(x, y). Set the pixel value of every pixel in the non-region-of-interest blocks of the final mask image to r(x, y) = 255, set the pixel value of every pixel in the e-th-level transition region of interest of the final mask image to r(x, y) = e/(N_R+1) × f(x, y), and set the pixel value of every pixel in the region-of-interest blocks of the final mask image to r(x, y) = f(x, y), obtaining the region of interest of the current texture video frame, where e is the level index of the transition region of interest, e ∈ [1, N_R], and f(x, y) is the pixel value of the pixel at (x, y) in the current texture video frame.

In step ⑤-2, w_2 = 16 and h_2 = 16, and the fourth threshold is set to T_b = 50.

Compared with the prior art, the advantage of the present invention is that it jointly exploits temporally synchronized texture video frames and the depth video frames corresponding to them. First, the static-image-domain visual attention of the texture video frame is extracted to obtain its static-image-domain visual attention map; motion visual attention is extracted from temporally consecutive texture video frames to obtain the motion visual attention map of the texture video frame; and depth visual attention is extracted from the depth video frame to obtain the depth visual attention map of the three-dimensional video image jointly presented by the depth video frame and the texture video frame. Then, using the obtained static-image-domain, motion and depth visual attention maps together with the depth information, a depth-perception-based fusion method yields a three-dimensional (stereoscopic) visual attention map that matches the characteristics of human stereoscopic vision, and thresholding and macro-block post-processing finally produce the video region of interest conforming to human stereoscopic perception together with the corresponding mask image of region-of-interest and non-region-of-interest areas. The region of interest extracted by this method fuses static-image-domain visual attention, motion visual attention and depth visual attention, effectively suppresses the inherent one-sidedness and inaccuracy of each individual attention cue, overcomes the noise caused by complex backgrounds in static-image-domain visual attention and the inability of motion visual attention to capture regions of interest with local or small-amplitude motion, and therefore improves the computational accuracy, enhances the stability of the algorithm, and can extract regions of interest from backgrounds with complex texture and from moving scenes. In addition, besides matching the human eye's visual interest in static texture video frames and in moving objects, the region of interest obtained by this method also matches the depth-perception characteristic of stereoscopic vision, in which objects with a strong sense of depth or at a short distance attract interest, and thus conforms to the semantic characteristics of human stereoscopic vision.

Brief Description of the Drawings

Figure 1a is the color video frame at time t of the two-dimensional color video of the test sequence "Ballet";

Figure 1b is the color video frame at time t of the two-dimensional color video of the test sequence "Door Flower";

Figure 2a is the depth video frame at time t of the depth video corresponding to the two-dimensional color video of the test sequence "Ballet";

Figure 2b is the depth video frame at time t of the depth video corresponding to the two-dimensional color video of the test sequence "Door Flower";

Figure 3 is the overall flow chart of the method of the present invention;

Figure 4 is the flow chart of detecting the static-image-domain visual attention of the current texture video frame with the known static-image visual attention detection method;

Figure 5 is the flow chart of the motion visual attention detection method;

Figure 6 is the flow chart of the depth visual attention detection method;

Figure 7a is the luminance feature map of the color video frame at time t of the two-dimensional color video of the test sequence "Ballet";

Figure 7b is the chrominance feature map of the color video frame at time t of the two-dimensional color video of the test sequence "Ballet";

Figure 7c is the orientation feature map of the color video frame at time t of the two-dimensional color video of the test sequence "Ballet";

Figure 8a is the static-image-domain visual attention map of the color video frame at time t of the two-dimensional color video of the test sequence "Ballet";

Figure 8b is the motion visual attention map of the color video frame at time t of the two-dimensional color video of the test sequence "Ballet";

Figure 8c is the depth visual attention map of the three-dimensional video image jointly presented by the color video frame at time t of the two-dimensional color video of the test sequence "Ballet" and the corresponding depth video frame;

Figure 9 is the three-dimensional visual attention map obtained after the color video frame at time t of the two-dimensional color video of the test sequence "Ballet" and the corresponding depth video frame are processed by the present invention;

Figure 10a is the final region-of-interest mask image of the texture video frame at time t of the test sequence "Ballet", extracted by the present invention;

Figure 10b is the region of interest of the texture video frame at time t of the test sequence "Ballet", extracted by the present invention;

Figure 11a is the region of interest of the texture video frame at time t of the test sequence "Ballet", extracted by the traditional extraction method that relies only on static-image-domain visual attention cues;

Figure 11b is the region of interest of the texture video frame at time t of the test sequence "Ballet", extracted by the traditional extraction method that relies only on motion visual attention cues;

Figure 11c is the region of interest of the texture video frame at time t of the test sequence "Ballet", extracted by the traditional joint static-image-domain and motion visual attention extraction method;

Figure 12a is the luminance feature map of the color video frame at time t of the two-dimensional color video of the test sequence "Door Flower";

Figure 12b is the chrominance feature map of the color video frame at time t of the two-dimensional color video of the test sequence "Door Flower";

Figure 12c is the orientation feature map of the color video frame at time t of the two-dimensional color video of the test sequence "Door Flower";

Figure 13a is the static-image-domain visual attention map of the color video frame at time t of the two-dimensional color video of the test sequence "Door Flower";

Figure 13b is the motion visual attention map of the color video frame at time t of the two-dimensional color video of the test sequence "Door Flower";

Figure 13c is the depth visual attention map of the three-dimensional video image jointly presented by the color video frame at time t of the two-dimensional color video of the test sequence "Door Flower" and the corresponding depth video frame;

Figure 14 is the three-dimensional visual attention map obtained after the color video frame at time t of the two-dimensional color video of the test sequence "Door Flower" and the corresponding depth video frame are processed by the present invention;

Figure 15a is the final region-of-interest mask image of the texture video frame at time t of the test sequence "Door Flower", extracted by the present invention;

Figure 15b is the region of interest of the texture video frame at time t of the test sequence "Door Flower", extracted by the present invention;

Figure 16a is the region of interest of the texture video frame at time t of the test sequence "Door Flower", extracted by the traditional extraction method that relies only on static-image-domain visual attention cues;

Figure 16b is the region of interest of the texture video frame at time t of the test sequence "Door Flower", extracted by the traditional extraction method that relies only on motion visual attention cues;

Figure 16c is the region of interest of the texture video frame at time t of the test sequence "Door Flower", extracted by the joint static-image-domain and motion visual attention extraction method.

Detailed Description of the Embodiments

The present invention is further described in detail below with reference to the accompanying drawings and embodiments.

The visual-attention-based method for extracting video regions of interest of the present invention mainly extracts the video region of interest by jointly exploiting the information of a temporally synchronized texture video and of its depth video. In this embodiment the texture video is a two-dimensional color video, taking the "Ballet" and "Door Flower" two-dimensional color test sequences as examples. Figure 1a shows the color video frame at time t of the "Ballet" two-dimensional color video and Figure 1b the color video frame at time t of the "Door Flower" two-dimensional color video; Figure 2a shows the depth video frame at time t of the depth video corresponding to the "Ballet" two-dimensional color video and Figure 2b the depth video frame at time t of the depth video corresponding to the "Door Flower" two-dimensional color video. Each depth video frame of the depth video corresponding to the two-dimensional color video is a grayscale map represented with a bit depth of Z_D, whose gray values express the relative distance from the object represented by each pixel of the depth video frame to the camera. The size of each texture video frame of the texture video is defined as W×H; for the depth video frames of the corresponding depth video, if the size of a depth video frame differs from that of the texture video frame, existing scaling and interpolation methods are generally used to set the depth video frame to the same size as the texture video frame, i.e. also W×H, where W is the width of each texture video frame or depth video frame and H their height. Setting the depth video frames to the same size as the texture video frames makes it more convenient to extract the video region of interest.

The overall flow of the method of the present invention is shown in the block diagram of Figure 3 and comprises the following steps:

① Define the two-dimensional color video as the texture video, and define the size of the texture video frame at every moment of the texture video as W×H, where W is the width and H the height of each texture video frame. Denote the texture video frame at time t as F_t and define it as the current texture video frame. Detect the static-image-domain visual attention of the current texture video frame with a known static-image visual attention detection method, obtaining the static-image-domain visual attention map of the current texture video frame, denoted S_I; S_I has size W×H and is a grayscale map represented with a bit depth of Z_S. In this grayscale map, a larger pixel value indicates a higher relative degree of attention of the human eye to the corresponding pixel of the current texture video frame, and a smaller pixel value indicates a lower relative degree of attention.

在此具体实施例中,采用公知的静态图像视觉注意检测方法检测当前纹理视频帧的静态图像域视觉注意的流程框图如图4所示,在图4中每个矩形表示一种数据处理过程,每个菱形分别示意一幅图像,不同尺寸的菱形表示不同分辨率的图像,是相应操作的输入和输出数据;当前纹理视频帧为RGB格式的图像,图像中的每个像素由R、G和B三个颜色通道表示,首先将当前纹理视频帧的每个像素的各颜色通道分量线性变换,分解为一个亮度分量图和两个色度分量图即红绿分量图和蓝黄分量图,亮度分量图、红绿分量图及蓝黄分量图分别记为I、RG及BY,亮度分量图I在(x,y)坐标的像素值表示为Ix,y=(rx,y+gx,y+bx,y)/3,其中,Ix,y表示亮度分量在(x,y)坐标的像素值,rx,y、gx,y、bx,y分别为当前纹理视频帧在(x,y)坐标的RGB三个颜色通道的像素的像素值,红绿分量图RG、蓝黄分量图BY两个色度分量图分别在(x,y)坐标的像素值分别表示为:In this specific embodiment, adopt known static image visual attention detection method to detect the flow chart of the still image domain visual attention of current texture video frame as shown in Figure 4, in Figure 4 each rectangle represents a kind of data processing process, Each rhombus represents an image, and diamonds of different sizes represent images of different resolutions, which are the input and output data of the corresponding operation; the current texture video frame is an image in RGB format, and each pixel in the image is composed of R, G and B three color channel representations, first linearly transform each color channel component of each pixel of the current texture video frame, and decompose it into a luminance component map and two chrominance component maps, namely the red-green component map and the blue-yellow component map, the brightness Component diagram, red-green component diagram and blue-yellow component diagram are denoted as I, RG and BY respectively, and the pixel value of luminance component diagram I in (x, y) coordinate is represented as I x, y =(r x, y +g x , y +b x, y )/3, wherein, I x, y represent the pixel value of the brightness component at (x, y) coordinates, r x, y , g x, y , b x, y are respectively the current texture video The pixel values of the pixels of the RGB three color channels of the frame at the (x, y) coordinates, and the pixel values of the two chromaticity component maps of the red-green component map RG and the blue-yellow component map BY respectively at the (x, y) coordinates represent for:

$$R_{x,y} = r_{x,y} - (g_{x,y} + b_{x,y})/2, \qquad G_{x,y} = g_{x,y} - (r_{x,y} + b_{x,y})/2,$$
$$B_{x,y} = b_{x,y} - (r_{x,y} + g_{x,y})/2, \qquad Y_{x,y} = r_{x,y} + g_{x,y} - 2\big(|r_{x,y} - g_{x,y}| + b_{x,y}\big),$$
$$RG_{x,y} = R_{x,y} - G_{x,y}, \qquad BY_{x,y} = B_{x,y} - Y_{x,y},$$
where RG_{x,y} is the pixel value of the red-green component map RG at (x, y) and BY_{x,y} is the pixel value of the blue-yellow component map BY at (x, y). Known Gabor filters are then used to extract four orientation component maps of the luminance component map in the 0, 45, 90 and 135 degree directions, denoted O_θ^T with θ ∈ {0, π/4, π/2, 3π/4}. The one luminance component, two chrominance components and four orientation components are each decomposed with a Gaussian pyramid into n_L layers, where n_L is a positive integer smaller than 20; each component map is uniformly denoted l, and the i-th layer obtained from the Gaussian pyramid decomposition of l is denoted l(i), with i ∈ [0, n_L−1] and l ∈ {I} ∪ {RG, BY} ∪ {O_0^T, O_{π/4}^T, O_{π/2}^T, O_{3π/4}^T}. In total 7 component maps are produced, each decomposed into n_L layer component maps, giving 7×n_L layer component maps; in this embodiment n_L = 9. From the extracted layer component maps, the feature map of each component (chrominance, luminance, orientation) is computed as
$$F_l = N\Big( \bigoplus_{c} \bigoplus_{s} N\big( \left| l(c) \ominus l(s) \right| \big) \Big), \qquad \forall\, l \in \{I\} \cup \{RG, BY\} \cup \{O_0^T, O_{\pi/4}^T, O_{\pi/2}^T, O_{3\pi/4}^T\},$$
where s, c ∈ [0, n_L−1], s = c+δ, δ = {−3, −2, −1, 1, 2, 3}, and N(·) is the normalization function that maps values onto the interval [0, 2^{Z_S}−1], with 2^{Z_S}−1 indicating the most attention-grabbing value and 0 the least attention-grabbing value. l(c) denotes the c-th layer component map of l and l(s) the s-th layer component map of l. The symbol ⊖ is the cross-level difference operator between l(c) and l(s): if c < s, l(s) is upsampled to an image with the same resolution as l(c) and the corresponding pixels of l(c) and the upsampled l(s) are subtracted; if c > s, l(c) is upsampled to the resolution of l(s) and the corresponding pixels of l(s) and the upsampled l(c) are subtracted. The symbol ⊕ is the cross-level addition operator between l(c) and l(s): if c < s, l(s) is upsampled to the resolution of l(c) and the corresponding pixels are summed; if c > s, l(c) is upsampled to the resolution of l(s) and the corresponding pixels are summed. Finally, the feature maps of all components are linearly fused and normalized to obtain the static image domain visual attention distribution map S_I of the current texture video frame.

The images of the test sequences "Ballet" and "Door Flower" each have size 1024×768. For the color video frame at time t of the "Ballet" two-dimensional color video, the luminance, chrominance and orientation feature maps are shown in Figures 7a, 7b and 7c respectively; for the color video frame at time t of the "Door Flower" two-dimensional color video they are shown in Figures 12a, 12b and 12c. In this embodiment Z_S = 8, i.e. each pixel of the static image domain visual attention distribution map S_I is represented with 8 bits of depth. The static image domain visual attention distribution map of the color video frame at time t of "Ballet" is shown in Figure 8a, and that of "Door Flower" in Figure 13a. Other known visual attention detection methods may also be used as the static-image visual attention detection method here.
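As a concrete illustration of the decomposition described above, the following Python sketch (assuming NumPy and OpenCV are available, and an RGB channel order; all function and variable names are illustrative, not taken from the patent) computes the luminance and the two opponent chrominance component maps of a texture video frame and builds an n_L-level Gaussian pyramid for one component map.

```python
import cv2
import numpy as np

def color_components(frame_rgb):
    """Split an RGB frame into luminance I and opponent maps RG, BY (step 1)."""
    r, g, b = [frame_rgb[:, :, i].astype(np.float32) for i in range(3)]
    I = (r + g + b) / 3.0
    R = r - (g + b) / 2.0
    G = g - (r + b) / 2.0
    B = b - (r + g) / 2.0
    Y = r + g - 2.0 * (np.abs(r - g) + b)
    return I, R - G, B - Y          # I, RG, BY

def gaussian_pyramid(component, n_levels=9):
    """Decompose one component map into an n_L-level Gaussian pyramid (layer 0 = full resolution)."""
    pyramid = [component.astype(np.float32)]
    for _ in range(n_levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid
```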

② A motion visual attention detection method is used to detect the motion visual attention of the current texture video frame, yielding the motion visual attention distribution map of the current texture video frame, denoted S_M. The map S_M has size W×H and is a grayscale map represented with Z_S bits of depth; a larger pixel value means the human eye pays a higher degree of relative motion attention to the corresponding pixel of the current texture video frame, and a smaller pixel value means a lower degree of relative motion attention.

In this embodiment, the flow of the motion visual attention detection method is shown in Figure 5; its specific process is as follows:

②-1. Denote the texture video frame at time t+j, temporally continuous with the current texture video frame, as F_{t+j}, and the texture video frame at time t−j as F_{t−j}, where j ∈ (0, N_F/2] and N_F is a positive integer smaller than 10. In this embodiment N_F = 4, i.e. the current texture video frame together with its two preceding and two following frames are used jointly to extract the motion regions of the texture video.

②-2. A known optical flow method is used to compute the horizontal and vertical motion vector images between the current texture video frame and the texture video frame F_{t+j} at time t+j, and between the current texture video frame and the texture video frame F_{t−j} at time t−j. Denote the horizontal and vertical motion vector images between the current frame and F_{t+j} as V_{t+j}^H and V_{t+j}^V, and those between the current frame and F_{t−j} as V_{t−j}^H and V_{t−j}^V; all of V_{t+j}^H, V_{t+j}^V, V_{t−j}^H and V_{t−j}^V have width W and height H.

②-3. The absolute value of V_{t+j}^H and the absolute value of V_{t+j}^V are added to obtain the motion amplitude image between the current texture video frame and F_{t+j}, denoted M_{t+j}, i.e. $M_{t+j} = |V_{t+j}^H| + |V_{t+j}^V|$; the motion amplitude value of the pixel at (x, y) in M_{t+j} is m_{t+j}(x, y). Likewise, $M_{t-j} = |V_{t-j}^H| + |V_{t-j}^V|$ is the motion amplitude image between the current texture video frame and F_{t−j}, and the motion amplitude value of the pixel at (x, y) in M_{t−j} is m_{t−j}(x, y).
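A minimal sketch of steps ②-2 and ②-3, assuming OpenCV's Farneback dense optical flow as an instance of the "known optical flow method" (the patent does not prescribe a particular optical flow algorithm) and BGR input frames; names and parameter values are illustrative.

```python
import cv2
import numpy as np

def motion_amplitude(frame_t, frame_ref):
    """Return M = |V^H| + |V^V| between the current frame and a reference frame."""
    g0 = cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_ref, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    v_h, v_v = flow[:, :, 0], flow[:, :, 1]   # horizontal / vertical motion vector images
    return np.abs(v_h) + np.abs(v_v)
```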

②-4. Using the current texture video frame together with F_{t+j} and F_{t−j}, a joint motion map M_j^Δ is extracted as follows: for each coordinate, it is judged whether the minimum of the motion amplitude values at that coordinate in M_{t+j} and M_{t−j} is larger than a set first threshold T_1; if so, the pixel value of M_j^Δ at that coordinate is the average of the sum of the two motion amplitude values, otherwise it is 0. That is, for the pixel at (x, y) in M_{t+j} and the pixel at (x, y) in M_{t−j}, if min(m_{t+j}(x, y), m_{t−j}(x, y)) > T_1 then
$$m_j^{\Delta}(x, y) = \big(m_{t+j}(x, y) + m_{t-j}(x, y)\big)/2,$$
otherwise m_j^Δ(x, y) = 0, where min() is the minimum-value function. Here the first threshold T_1 = 1, which filters out small noise points caused by very slight camera jitter.

②-5. The joint motion maps of all times at distances 1 to N_F/2 from time t are weighted and superposed to obtain the weighted joint motion map of the current texture video frame, denoted M. The pixel value of M at (x, y) is
$$m(x, y) = \sum_{j=1}^{N_F/2} \zeta_j\, m_j^{\Delta}(x, y),$$
where m_j^Δ(x, y) is the pixel value at (x, y) of the joint motion map M_j^Δ at temporal distance j from time t, and ζ_j are weighting coefficients satisfying $\sum_{j=1}^{N_F/2} \zeta_j = 1$.
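The joint motion map of step ②-4 and the weighted superposition of step ②-5 can be sketched as follows, with T_1 = 1 as in this embodiment and equal weights ζ_j as a default assumption (the patent only requires the ζ_j to sum to 1).

```python
import numpy as np

def joint_motion_map(m_fwd, m_bwd, t1=1.0):
    """M_j^Δ: average of forward/backward amplitudes where both exceed T_1, else 0."""
    keep = np.minimum(m_fwd, m_bwd) > t1
    return np.where(keep, (m_fwd + m_bwd) / 2.0, 0.0)

def weighted_joint_motion_map(joint_maps, weights=None):
    """M = sum_j zeta_j * M_j^Δ, with the zeta_j summing to 1."""
    if weights is None:                          # equal weights as an illustrative default
        weights = [1.0 / len(joint_maps)] * len(joint_maps)
    return sum(w * m for w, m in zip(weights, joint_maps))
```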

In a video, moving objects are the main regions of interest; however, the degree of attention differs with the type of motion. Video motion can be divided into two main cases. In the first case, shot with a static camera, the background is still and the moving object is the main object of interest. In the second case, shot with a moving camera, the background moves globally while the moving object stays relatively still with respect to the camera or moves inconsistently with the background; the moving object is still the object of interest. From this analysis, the region of motion attention mainly arises where the motion properties of an object differ from those of the background environment, i.e. regions of large motion contrast, so the following steps can be used to obtain the motion visual attention.

②-6. The weighted joint motion map M of the current texture video frame is decomposed with a Gaussian pyramid into n_L layers of weighted joint motion maps. The i-th layer obtained from the decomposition is denoted M(i), with width W/2^i and height H/2^i, where n_L is a positive integer smaller than 20, i ∈ [0, n_L−1], layer 0 is the bottom layer and layer n_L−1 the top layer, W is the width of the current texture video frame and H its height; in this embodiment n_L = 9.

②-7. Using the n_L layers of the weighted joint motion map M of the current texture video frame, the motion visual attention distribution map S_M of the current texture video frame is extracted. Denote the pixel value of S_M at (x, y) as s_m(x, y), with S_M = F_M, where
$$F_M = N\Big( \bigoplus_{c} \bigoplus_{s} N\big( \left| M(c) \ominus M(s) \right| \big) \Big),$$
s, c ∈ [0, n_L−1], s = c+δ, δ = {−3, −2, −1, 1, 2, 3}, N(·) is the normalization function that maps values onto the interval [0, 2^{Z_S}−1], and "| |" denotes the absolute value operation. M(c) is the c-th layer weighted joint motion map and M(s) the s-th layer weighted joint motion map. The symbol ⊖ is the cross-level difference operator between M(c) and M(s): if c < s, M(s) is upsampled to the resolution of M(c) and the corresponding pixels of M(c) and the upsampled M(s) are subtracted; if c > s, M(c) is upsampled to the resolution of M(s) and the corresponding pixels of M(s) and the upsampled M(c) are subtracted. The symbol ⊕ is the cross-level addition operator between M(c) and M(s), defined analogously with pixel-wise summation instead of subtraction.
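The cross-level operators and the normalization N(·) used in steps ①, ②-7, ③-2 and ③-4 can be sketched as below. Bilinear resizing to the finer level and accumulation of all center-surround maps at the finest pyramid level are assumptions of this sketch; the patent specifies only upsampling, pixel-wise difference/summation and normalization onto [0, 2^{Z_S}−1].

```python
import cv2
import numpy as np

def normalize(img, z_s=8):
    """N(.): map an arbitrary-range map onto [0, 2^Z_S - 1]."""
    img = img.astype(np.float32)
    lo, hi = float(img.min()), float(img.max())
    if hi - lo < 1e-12:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo) * (2 ** z_s - 1)

def cross_level_diff(level_c, level_s):
    """c ⊖ s: upsample the coarser map to the finer resolution, then subtract pixel-wise."""
    a, b = level_c.astype(np.float32), level_s.astype(np.float32)
    if a.shape[0] >= b.shape[0]:                     # c is the finer (or equal) level
        up = cv2.resize(b, (a.shape[1], a.shape[0]))
        return a - up
    up = cv2.resize(a, (b.shape[1], b.shape[0]))
    return b - up

def pyramid_attention(pyramid, z_s=8):
    """F = N( accumulation over (c, s=c+δ) of N(|pyr(c) ⊖ pyr(s)|) ), δ in {-3..-1, 1..3}."""
    acc = np.zeros(pyramid[0].shape, dtype=np.float32)
    for c in range(len(pyramid)):
        for d in (-3, -2, -1, 1, 2, 3):
            s = c + d
            if 0 <= s < len(pyramid):
                cs = normalize(np.abs(cross_level_diff(pyramid[c], pyramid[s])), z_s)
                acc += cv2.resize(cs, (acc.shape[1], acc.shape[0]))
    return normalize(acc, z_s)
```

Applied to the Gaussian pyramid of the weighted joint motion map M, `pyramid_attention` yields a map playing the role of S_M; applied to the pyramids of the other components it yields the corresponding feature maps.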

The motion visual attention distribution map obtained by this step for the color video frame at time t of the "Ballet" two-dimensional color video is shown in Figure 8b; that for the color video frame at time t of the "Door Flower" two-dimensional color video is shown in Figure 13b.

③ The depth video frame at each moment of the depth video corresponding to the texture video is defined as a grayscale map represented with Z_D bits of depth, whose gray values in the range 0 to 2^{Z_D}−1 indicate the relative distance from the captured object represented by each pixel of the depth video frame to the shooting camera; gray value 0 corresponds to the maximum depth and gray value 2^{Z_D}−1 to the minimum depth. The size of the depth video frame at each moment is set to W×H, where W is its width and H its height. Denote the depth video frame at time t as D_t and take it as the current depth video frame. A depth visual attention detection method is used to detect the depth visual attention of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame, yielding the depth visual attention distribution map of the three-dimensional video image, denoted S_D. The map S_D has size W×H and is a grayscale map represented with Z_S bits of depth; a larger pixel value means the human eye pays a higher degree of relative depth attention to the corresponding pixel of the current texture video frame, and a smaller pixel value means a lower degree of relative depth attention. In this embodiment, each pixel of the depth video frame is represented with Z_D = 8 bits of depth and each pixel of the visual attention distribution maps with Z_S = 8 bits of depth.

The characteristic sense of depth is the main feature that distinguishes stereoscopic video from traditional single-channel video. Regarding the visual attention of stereoscopic video, the sense of depth affects the user's visual attention in two main ways: on the one hand, the user is generally more interested in scenery (or objects) close to the shooting camera array than in scenery (or objects) far from it; on the other hand, regions of depth discontinuity provide the user with strong depth contrast. In this embodiment, the flow of the depth visual attention detection method is shown in Figure 6; its specific process is as follows:

③-1. The current depth video frame is decomposed with a Gaussian pyramid into n_L layers of depth video frames. The i-th layer obtained from the decomposition is denoted D(i), with width W/2^i and height H/2^i, where n_L is a positive integer smaller than 20, i ∈ [0, n_L−1], layer 0 is the bottom layer with the highest resolution, D(0) = D_t, layer n_L−1 is the top layer with the lowest resolution, W is the width of the current depth video frame and H its height.

③-2. Using the n_L layers of the current depth video frame, the depth feature map of the current depth video frame is extracted, denoted F_D:
$$F_D = N\Big( \bigoplus_{c} \bigoplus_{s} N\big( \left| D(c) \ominus D(s) \right| \big) \Big),$$
where s, c ∈ [0, n_L−1], s = c+δ, δ = {−3, −2, −1, 1, 2, 3}, N(·) is the normalization function that maps values onto the interval [0, 2^{Z_S}−1], "| |" denotes the absolute value operation, D(c) is the c-th layer depth video frame and D(s) the s-th layer depth video frame. The symbol ⊖ is the cross-level difference operator between D(c) and D(s): if c < s, D(s) is upsampled to the resolution of D(c) and the corresponding pixels of D(c) and the upsampled D(s) are subtracted; if c > s, D(c) is upsampled to the resolution of D(s) and the corresponding pixels of D(s) and the upsampled D(c) are subtracted. The symbol ⊕ is the cross-level addition operator between D(c) and D(s), defined analogously with pixel-wise summation instead of subtraction.

③-3. Depth edge regions with large depth differences give the user a stronger sense of depth, so regions of strong depth edges in the current depth video frame are another important region of interest for depth visual attention. Therefore known Gabor filters in the 0, π/4, π/2 and 3π/4 directions are convolved with the current depth video frame to extract the four orientation components in these directions, yielding four orientation component maps of the current depth video frame, denoted O_0^D, O_{π/4}^D, O_{π/2}^D and O_{3π/4}^D. Each of these orientation component maps is decomposed with a Gaussian pyramid into n_L layers of orientation component maps. The i-th layer obtained from the decomposition of the orientation component map in the θ direction is denoted O_θ^D(i), with width W/2^i and height H/2^i, where θ ∈ {0, π/4, π/2, 3π/4}, i ∈ [0, n_L−1], layer 0 is the bottom layer, O_θ^D(0) = O_θ^D, layer n_L−1 is the top layer, W is the width of the current depth video frame and H its height.
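One possible way to obtain the four orientation component maps of step ③-3 with OpenCV's Gabor kernels is sketched below; the kernel size and bandwidth parameters are illustrative assumptions, since the patent specifies only the four filter orientations.

```python
import cv2
import numpy as np

def orientation_maps(depth_frame):
    """Convolve the depth frame with Gabor kernels at 0, π/4, π/2 and 3π/4."""
    d = depth_frame.astype(np.float32)
    maps = {}
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        kernel = cv2.getGaborKernel((9, 9), 2.0, theta, 8.0, 0.5, 0.0)
        maps[theta] = cv2.filter2D(d, -1, kernel)   # O_theta^D
    return maps
```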

③-4. Using the n_L layers of each orientation component map of the current depth video frame, the preliminary depth orientation feature map of the current depth video frame is extracted, denoted F′_DO:
$$F'_{DO} = \frac{1}{4} \sum_{\theta \in \{0, \pi/4, \pi/2, 3\pi/4\}} F_{O_\theta}, \qquad F_{O_\theta} = N\Big( \bigoplus_{c} \bigoplus_{s} N\big( \left| O_\theta^D(c) \ominus O_\theta^D(s) \right| \big) \Big),$$
where s, c ∈ [0, n_L−1], s = c+δ, δ = {−3, −2, −1, 1, 2, 3}, N(·) is the normalization function that maps values onto the interval [0, 2^{Z_S}−1], "| |" denotes the absolute value operation, O_θ^D(c) is the c-th layer and O_θ^D(s) the s-th layer of the orientation component map in the θ direction. The symbol ⊖ is the cross-level difference operator between O_θ^D(c) and O_θ^D(s): if c < s, O_θ^D(s) is upsampled to the resolution of O_θ^D(c) and the corresponding pixels of O_θ^D(c) and the upsampled O_θ^D(s) are subtracted; if c > s, O_θ^D(c) is upsampled to the resolution of O_θ^D(s) and the corresponding pixels of O_θ^D(s) and the upsampled O_θ^D(c) are subtracted. The symbol ⊕ is the cross-level addition operator between O_θ^D(c) and O_θ^D(s), defined analogously with pixel-wise summation instead of subtraction.

③-5. A known morphological dilation algorithm, with a block of size w_1×h_1 as the basic dilation element, is applied n_1 times to the preliminary depth orientation feature map F′_DO of the current depth video frame, yielding the depth orientation feature map of the current depth video frame, denoted F_DO. In this embodiment, for the "Ballet" and "Door Flower" test sequences, whose images have size 1024×768, the basic dilation element is an 8×8 block, i.e. w_1×h_1 = 8×8, and the number of dilations is n_1 = 2.
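Step ③-5 can be written as below, under the assumption that OpenCV's standard grayscale dilation is an acceptable instance of the "known morphological dilation algorithm"; names are illustrative.

```python
import cv2
import numpy as np

def dilate_orientation_feature(f_do_prime, block=(8, 8), iterations=2):
    """Dilate the preliminary depth orientation feature map with a w1×h1 block, n1 times."""
    kernel = np.ones(block, dtype=np.uint8)
    return cv2.dilate(f_do_prime.astype(np.float32), kernel, iterations=iterations)
```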

③-6. Using the depth feature map F_D and the depth orientation feature map F_DO of the current depth video frame, the preliminary depth visual attention distribution map of the current depth video frame, denoted S′_D, is obtained by fusing F_D and F_DO and normalizing the result with the normalization function N(·) onto the interval [0, 2^{Z_S}−1]; the pixel value of S′_D at (x, y) is denoted s′_d(x, y).

③-7. For the left and right border regions of the image, the left border of the left-view image has no corresponding region in the right view and therefore cannot form a stereoscopic impression in the human brain; likewise, it is difficult to form a stereoscopic impression at the right border of the right-view image. In stereoscopic video the left and right border regions of the image thus provide a weak stereoscopic sense or none at all and are non-stereoscopic visual attention regions, so the present invention suppresses the border regions of the preliminary depth visual attention distribution map S′_D of the current depth video frame. Using S′_D, the depth visual attention distribution map S_D of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame is obtained; the pixel value of S_D at (x, y) is s_d(x, y) = s′_d(x, y)·g(x, y), where
$$g(x, y) = \begin{cases} 0.2 & \text{if } x < b \;||\; y < b \;||\; x > W-b \;||\; y > H-b \\ 0.4 & \text{else if } x < 2b \;||\; y < 2b \;||\; x > W-2b \;||\; y > H-2b \\ 0.6 & \text{else if } x < 3b \;||\; y < 3b \;||\; x > W-3b \;||\; y > H-3b \\ 1 & \text{else} \end{cases}$$
W is the width of the current depth video frame, H its height, b is a set second threshold, and the symbol "||" is the "or" operator. Here the second threshold b = 16. The function g(x, y) may also be another two-dimensional function that suppresses the image border regions, such as a two-dimensional Gaussian function whose template size equals the texture video frame size.
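The border suppression of step ③-7, with the second threshold b = 16 of this embodiment, can be sketched as a gain map multiplied into S′_D; names are illustrative.

```python
import numpy as np

def border_gain(width, height, b=16):
    """g(x, y): 0.2 / 0.4 / 0.6 within the first three b-wide border rings, 1 elsewhere."""
    g = np.ones((height, width), dtype=np.float32)
    x = np.arange(width)[None, :]
    y = np.arange(height)[:, None]
    ring3 = (x < 3 * b) | (y < 3 * b) | (x > width - 3 * b) | (y > height - 3 * b)
    ring2 = (x < 2 * b) | (y < 2 * b) | (x > width - 2 * b) | (y > height - 2 * b)
    ring1 = (x < b) | (y < b) | (x > width - b) | (y > height - b)
    g[ring3] = 0.6          # outermost condition applied first, then overridden inward
    g[ring2] = 0.4
    g[ring1] = 0.2
    return g

def depth_attention(s_d_prime):
    """S_D(x, y) = S'_D(x, y) · g(x, y)."""
    h, w = s_d_prime.shape
    return s_d_prime * border_gain(w, h)
```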

Figure 8c shows the depth visual attention distribution map of the three-dimensional video image jointly presented by the color video frame at time t of the "Ballet" two-dimensional color video and the corresponding depth video frame; Figure 13c shows that of the "Door Flower" sequence.

④ A depth-perception-based visual attention fusion method is used to fuse the static image domain visual attention distribution map S_I of the current texture video frame, the motion visual attention distribution map S_M of the current texture video frame, the current depth video frame, and the depth visual attention distribution map S_D of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame, so as to extract a three-dimensional visual attention distribution map conforming to human stereoscopic perception, denoted S. The map S has size W×H and is a grayscale map represented with Z_S bits of depth; a larger pixel value means the human eye pays a higher degree of relative attention to the corresponding pixel of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame, and a smaller pixel value means a lower degree of relative attention.

In traditional single-channel video, moving objects attract the viewer's attention more easily than static objects; among static objects, brightly colored regions, regions with large color or luminance contrast, and regions with large differences in texture orientation attract attention more easily. In stereoscopic video, besides motion visual attention and static image domain visual attention, the distribution of human visual attention is also affected by the characteristic sense of depth that stereoscopic video provides to the user. This sense of depth mainly comes from the small positional difference, called parallax, between the scenes seen by the left and right eyes: the two eyes are about 6 cm apart, so the images of an object projected onto the two retinas differ slightly, and the brain automatically fuses them into a stereoscopic image with depth, forming stereoscopic vision. The relative distance information conveyed by this sense of depth is another important factor directly affecting attention selection. In stereoscopic video, objects in regions of depth discontinuity or large depth contrast give the user a stronger depth difference and a stronger stereoscopic or depth sense, and are among the regions the user is interested in; on the other hand, the viewer is more interested in foreground regions close to the shooting camera (or the video viewer) than in regions far from it, so foreground regions are usually important potential regions of interest for stereoscopic video viewers. Based on this analysis, the factors affecting human three-dimensional visual attention are determined to be static image domain visual attention, motion visual attention, depth visual attention and depth. Accordingly, the specific process of the depth-perception-based visual attention fusion method in this embodiment is as follows:

④-1. The current depth video frame is scale-transformed by Q(d(x, y)) = d(x, y) + γ, where γ is a coefficient taken within a preset value range, d(x, y) is the pixel value of the current depth video frame at (x, y), and Q(d(x, y)) is the pixel value of the scale-transformed current depth video frame at (x, y).

④-2. Using the scale-transformed current depth video frame, the current depth video frame, the depth visual attention distribution map S_D of the three-dimensional video image jointly presented by the current depth video frame and the current texture video frame, the motion visual attention distribution map S_M of the current texture video frame, and the static image domain visual attention distribution map S_I of the current texture video frame, the three-dimensional visual attention distribution map S is obtained. The pixel value of S at (x, y), denoted s(x, y), is a normalized fusion of the three attention values s_D(x, y), s_M(x, y) and s_I(x, y) weighted by K_D, K_M and K_I, of the pairwise visual attention correlation values Θ_ab(x, y) weighted by the correlation coefficients C_ab, and of the scale-transformed depth value Q(d(x, y)). Here K_D, K_M and K_I are the weighting coefficients of S_D, S_M and S_I, satisfying $\sum_{a \in \{D, M, I\}} K_a = 1$ and 0 ≤ K_a ≤ 1; N(·) is the normalization function that maps values onto the interval [0, 2^{Z_S}−1]; s_D(x, y), s_M(x, y) and s_I(x, y) are the pixel values of S_D, S_M and S_I at (x, y); Θ_ab(x, y) = min(s_a(x, y), s_b(x, y)) is the visual attention correlation value, with min() the minimum-value function; and C_ab are correlation coefficients satisfying $\sum_{a, b \in \{D, M, I\},\, a \neq b} C_{ab} = 1$ and 0 ≤ C_ab < 1, where C_DM is the correlation between S_D and S_M, C_DI the correlation between S_D and S_I, and C_IM the correlation between S_I and S_M, with a, b ∈ {D, M, I} and a ≠ b.
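Since step ④-2 combines the three attention maps through the weights K_a, the pairwise correlation terms C_ab·Θ_ab and the scale-transformed depth, one plausible reading is sketched below. The exact combining expression of the patent's formula is not reproduced here, so the arithmetic in this sketch (in particular the depth modulation and the final normalization) is an assumption, not the claimed expression; the default coefficient values are the ones given for this embodiment.

```python
import numpy as np

def fuse_attention(s_d, s_m, s_i, depth, gamma=50.0, z_s=8,
                   k=(0.15, 0.4, 0.35), c=(0.2, 0.2, 0.6)):
    """Assumed fusion of S_D, S_M, S_I with depth modulation Q(d) = d + gamma.
    k = (K_D, K_M, K_I), c = (C_DM, C_DI, C_IM); each set sums to 1."""
    q = depth.astype(np.float32) + gamma                 # scale-transformed depth
    theta_dm = np.minimum(s_d, s_m)                      # Θ_DM
    theta_di = np.minimum(s_d, s_i)                      # Θ_DI
    theta_im = np.minimum(s_i, s_m)                      # Θ_IM
    mix = (k[0] * s_d + k[1] * s_m + k[2] * s_i
           + c[0] * theta_dm + c[1] * theta_di + c[2] * theta_im)
    s = q / q.max() * mix                                # depth-weighted combination (assumed)
    s = (s - s.min()) / max(s.max() - s.min(), 1e-12) * (2 ** z_s - 1)
    return s
```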

Motion visual attention, static image domain visual attention and depth visual attention all play important roles in human visual attention; however, motion visual attention is the most important component of video visual attention, followed by the static image domain visual attention arising from luminance, color and orientation in the image domain, and then by depth visual attention. In this embodiment each visual attention distribution map is represented with Z_S = 8 bits of depth, and K_D = 0.15, K_M = 0.4 and K_I = 0.35. The correlation between depth visual attention and motion visual attention is small, as is the correlation between depth visual attention and static image domain visual attention, while the correlation between static image domain visual attention and motion visual attention is larger, so the correlation coefficients C_DM, C_DI and C_IM are set to 0.2, 0.2 and 0.6 respectively. The scale-transformation coefficient γ characterizes the scene depth range of the texture video scene: the smaller γ is, the larger the scene depth range and the stronger the sense of depth given to the viewer; conversely, the larger γ is, the smaller the scene depth range and the weaker the sense of depth. For the "Ballet" and "Door Flower" test sequences the scene depth of field is small, so the scale-transformation coefficient γ is set to 50. The three-dimensional visual attention distribution map extracted from the texture video frame and corresponding depth video frame at time t of the "Ballet" test sequence is shown in Figure 9, and that of the "Door Flower" test sequence in Figure 14.

⑤ Thresholding and macroblock post-processing are applied to the three-dimensional visual attention distribution map S to obtain the final region of interest of the current texture video frame that conforms to human stereoscopic perception.

In this embodiment, the specific process of the thresholding and macroblock post-processing of the three-dimensional visual attention distribution map S is as follows:

⑤-1. Denote the pixel value of the three-dimensional visual attention distribution map S at (x, y) as s(x, y) and define a third threshold
$$T_S = k_T \cdot \sum_{y=0}^{H-1} \sum_{x=0}^{W-1} s(x, y) \,/\, (W \times H),$$
where W is the width of S, H its height and k_T ∈ (0, 3); in the application of this embodiment k_T = 1.5. A preliminary binary mask image is created: if s(x, y) ≥ T_S holds, the pixel at (x, y) of the preliminary binary mask image is marked as a pixel of interest; otherwise it is marked as a pixel of no interest.
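Step ⑤-1's adaptive threshold and preliminary binary mask, with k_T = 1.5 as in this embodiment, are a straightforward sketch; names are illustrative.

```python
import numpy as np

def preliminary_mask(s, k_t=1.5):
    """Mark pixels with s(x, y) >= T_S = k_T · mean(S) as pixels of interest."""
    t_s = k_t * s.mean()
    return (s >= t_s).astype(np.uint8)      # 1 = pixel of interest, 0 = pixel of no interest
```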

⑤-2. The preliminary binary mask image is divided into (W/w_2)×(H/h_2) non-overlapping blocks of size w_2×h_2; the block with horizontal index u and vertical index v is denoted B_{u,v}, where u ∈ [0, W/w_2−1] and v ∈ [0, H/h_2−1]. According to each block of the preliminary binary mask image, the pixels of the corresponding block of the current texture video frame are determined to be pixels of interest or of no interest: for block B_{u,v}, if the number of pixels marked as pixels of interest in B_{u,v} is larger than a set fourth threshold T_b, where 0 ≤ T_b ≤ w_2×h_2, then all pixels of the block of the current texture video frame corresponding to B_{u,v} are marked as pixels of interest and that block is taken as a region-of-interest block; otherwise all its pixels are marked as pixels of no interest and the block is taken as a non-region-of-interest block. This yields the preliminary region-of-interest mask image of the current texture video frame, composed of region-of-interest blocks and non-region-of-interest blocks.
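Step ⑤-2's block-wise decision over non-overlapping w_2×h_2 blocks (16×16 and T_b = 50 in this embodiment) can be sketched as below; names are illustrative.

```python
import numpy as np

def block_roi_mask(binary_mask, block=(16, 16), t_b=50):
    """Mark a whole block as region of interest if it holds more than T_b pixels of interest."""
    h, w = binary_mask.shape
    bw, bh = block
    roi = np.zeros((h, w), dtype=np.uint8)
    for v in range(0, h, bh):
        for u in range(0, w, bw):
            if binary_mask[v:v + bh, u:u + bw].sum() > t_b:
                roi[v:v + bh, u:u + bw] = 1
    return roi
```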

In this embodiment, the images of the test sequences "Ballet" and "Door Flower" have size 1024×768, so the block size w_2×h_2 of B_{u,v} can be set to 16×16. A region containing very few pixels of interest usually does not attract the viewer's interest, so the fourth threshold T_b is set to 50 here.

⑤-3. Since the change between a region of interest and a region of no interest is usually not abrupt but gradual, with a transition zone, the present invention sets N_R levels of transition regions of interest between them. All pixels of the non-region-of-interest blocks most adjacent to the region-of-interest blocks in the preliminary region-of-interest mask image are marked as the N_R-th level transition region of interest, and the preliminary region-of-interest mask image is updated; then all pixels of the non-region-of-interest blocks most adjacent to the N_R-th level transition region of interest in the updated mask image are marked as the (N_R−1)-th level transition region of interest, and the mask image is updated recursively; this process is repeated until the 1st level transition region of interest has been marked. The result is the final region-of-interest mask image of the current texture video frame, composed of region-of-interest blocks, N_R levels of transition regions of interest and non-region-of-interest blocks. In this embodiment N_R = 2, i.e. two levels of transition regions of interest are set.

Figure 10a shows the final region-of-interest mask image of the texture video frame at time t of the test sequence "Ballet"; Figure 15a shows that of the test sequence "Door Flower". In Figures 10a and 15a, the black regions denote the regions of interest, the gray regions the transition regions of interest, and the white regions the regions of no interest.

⑤-4. Denote the pixel value of the final region-of-interest mask image at (x, y) as r(x, y). The pixel values of all pixels in the non-region-of-interest blocks of the final mask image are set to r(x, y) = 255; the pixel values of all pixels in the N_R levels of transition regions of interest are set to
$$r(x, y) = \frac{e}{N_R + 1} \times f(x, y),$$
and the pixel values of all pixels in the region-of-interest blocks are set to r(x, y) = f(x, y), yielding the region of interest of the current texture video frame, where e ∈ [1, N_R] denotes the level of the transition region of interest and f(x, y) is the pixel value of the current texture video frame at (x, y).
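Steps ⑤-3 and ⑤-4 with two transition levels, as in this embodiment, can be sketched as follows. Block-level growth over a 4-connected neighborhood is used here to find the blocks "most adjacent" to the current region of interest, which is one assumed way of realising the recursive marking described above; names are illustrative.

```python
import numpy as np

def roi_with_transitions(block_roi, frame, block=(16, 16), n_r=2):
    """Final ROI image: ROI keeps f(x,y), level-e transition gets e/(N_R+1)·f(x,y), rest 255."""
    bw, bh = block
    blocks = block_roi[::bh, ::bw].astype(np.uint8)       # one value per w2×h2 block
    level = np.where(blocks > 0, n_r + 1, 0)               # ROI blocks get the highest label
    for e in range(n_r, 0, -1):                            # mark transition levels N_R ... 1
        grown = np.zeros_like(level)
        grown[1:, :] = np.maximum(grown[1:, :], level[:-1, :] > e)
        grown[:-1, :] = np.maximum(grown[:-1, :], level[1:, :] > e)
        grown[:, 1:] = np.maximum(grown[:, 1:], level[:, :-1] > e)
        grown[:, :-1] = np.maximum(grown[:, :-1], level[:, 1:] > e)
        level = np.where((level == 0) & (grown > 0), e, level)
    level_full = np.kron(level, np.ones((bh, bw), dtype=level.dtype))
    if frame.ndim == 3:                                    # broadcast over color channels
        level_full = level_full[:, :, None]
    f = frame.astype(np.float32)
    out = np.full_like(f, 255.0)                           # non-ROI: r(x, y) = 255
    for e in range(1, n_r + 1):
        out = np.where(level_full == e, e / (n_r + 1.0) * f, out)
    return np.where(level_full == n_r + 1, f, out)         # ROI: r(x, y) = f(x, y)
```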

Figure 10b shows the region of interest of the texture video frame at time t of the test sequence "Ballet"; Figure 15b shows that of the test sequence "Door Flower". The regions of interest in Figures 10b and 15b keep the pixel values of the texture video frame at time t and display the color texture content, the transition regions of interest are displayed as dark gray regions with reduced brightness, and the smooth white regions are the regions of no interest corresponding to the white regions of the mask image. For comparison of the extraction effect, Figures 11a and 16a show the regions of interest of the texture video frames at time t of the test sequences "Ballet" and "Door Flower" extracted in the traditional way from static image domain visual attention cues only, which fail to remove the noise regions of the texture-rich background. Figures 11b and 16b show the regions of interest extracted in the traditional way from motion visual attention cues only: for the "Ballet" sequence this method cannot completely extract the man who moves very slowly, and the background noise caused by moving shadows is severe; for the "Door Flower" sequence it extracts only the motion regions and ignores texture complexity and the sense of depth provided by stereoscopic vision. Figures 11c and 16c show the regions of interest extracted by combining static image domain visual attention and motion visual attention cues; although this method combines static and motion visual information, the texture regions and motion noise in the background environment are still not suppressed effectively.

The comparative experiments between Figures 10a, 10b and Figures 11a, 11b, 11c, and between Figures 15a, 15b and Figures 16a, 16b, 16c, show that the region of interest extracted by the present invention fuses static image domain visual attention, motion visual attention and depth visual attention, effectively suppressing the inherent singleness and inaccuracy of each individual visual attention extraction. It solves the noise problem caused by complex backgrounds in static image domain visual attention and the inability of motion visual attention to extract regions of interest with local motion or small motion amplitude, thereby improving the computational accuracy and the stability of the algorithm, and it can extract regions of interest from backgrounds with complex texture and from moving environments. In addition, the region of interest obtained by the present invention conforms not only to the human eye's visual interest in static texture video frames and in moving objects, but also to the depth-perception characteristic of being interested in objects with a strong sense of depth or at a close distance in stereoscopic vision, matching the semantic characteristics of human stereoscopic vision.

⑥ Steps ① to ⑤ are repeated until all texture video frames in the texture video have been processed, yielding the video region of interest of the texture video.

In this embodiment, the static image domain visual attention distribution map S_I of the current texture video frame, the motion visual attention distribution map S_M of the current texture video frame, the depth visual attention distribution map S_D of the three-dimensional video image and the three-dimensional visual attention distribution map S are all grayscale maps represented with Z_S bits of depth, and the depth video frames of the depth video corresponding to the texture video are grayscale maps represented with Z_D bits of depth. Here 256-level grayscale maps with 8 bits of depth are used, i.e. Z_S = 8 and Z_D = 8; of course, other bit depths, such as 16 bits, may also be used to represent the grayscale maps in practical applications, in which case the representation precision would be higher.

Claims (8)

1. A method for extracting a video region of interest based on visual attention, characterized by comprising the following steps:
① firstly, defining a two-dimensional color video as a texture video, defining the size of the texture video frame at each moment in the texture video to be W × H, W being the width and H being the height of the texture video frames at each moment in the texture video, and recording the texture video frame at time t in the texture video as F_t; defining the texture video frame F_t at time t in the texture video as the current texture video frame; detecting the static image domain visual attention of the current texture video frame with a well-known static image visual attention detection method to obtain a distribution map of the static image domain visual attention of the current texture video frame, denoted S_I; the distribution map S_I of the static image domain visual attention of the current texture video frame has a size of W × H and is a grayscale map represented with Z_S bit depth;
② secondly, detecting the motion visual attention of the current texture video frame with a motion visual attention detection method to obtain a distribution map of the motion visual attention of the current texture video frame, denoted S_M; the distribution map S_M of the motion visual attention of the current texture video frame has a size of W × H and is a grayscale map represented with Z_S bit depth;
③ thirdly, defining the depth video frame at each moment in the depth video corresponding to the texture video as a grayscale map represented with Z_D bit depth, setting the size of the depth video frame at each moment in the depth video to be W × H, W being the width and H being the height of the depth video frames at each moment in the depth video, and recording the depth video frame at time t in the depth video as D_t; defining the depth video frame D_t at time t in the depth video as the current depth video frame; detecting the depth visual attention of the three-dimensional video image jointly displayed by the current depth video frame and the current texture video frame with a depth visual attention detection method to obtain a distribution map of the depth visual attention of the three-dimensional video image, denoted S_D; the distribution map S_D of the depth visual attention of the three-dimensional video image has a size of W × H and is a grayscale map represented with Z_S bit depth;
④ fourthly, fusing the distribution map S_I of the static image domain visual attention of the current texture video frame, the distribution map S_M of the motion visual attention of the current texture video frame and the distribution map S_D of the depth visual attention of the three-dimensional video image jointly displayed by the current depth video frame and the current texture video frame with a visual attention fusion method based on depth perception, so as to extract a distribution map of the three-dimensional visual attention corresponding to human stereoscopic perception, denoted S; S has a size of W × H and is a grayscale map represented with Z_S bit depth;
⑤ fifthly, performing thresholding and macroblock post-processing on the distribution map S of the three-dimensional visual attention to obtain the final region of interest of the current texture video frame that accords with human stereoscopic perception;
⑥ sixthly, repeating steps ① to ⑤ until all texture video frames in the texture video have been processed, thereby obtaining the video region of interest of the texture video.
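For orientation, the six steps of claim 1 can be read as a per-frame processing loop; a minimal sketch follows (Python; static_attention, motion_attention, depth_attention, fuse and threshold_and_postprocess are placeholder callables standing in for the methods detailed in claims 2, 4, 6 and 7, not existing library functions):

```python
import numpy as np

def extract_video_roi(texture_frames, depth_frames, static_attention,
                      motion_attention, depth_attention, fuse,
                      threshold_and_postprocess):
    """Per-frame loop of claim 1: compute S_I, S_M and S_D, fuse them into S,
    then threshold and block-postprocess S into the frame's region of interest."""
    rois = []
    for t, (F_t, D_t) in enumerate(zip(texture_frames, depth_frames)):
        S_I = static_attention(F_t)                      # step 1: static image domain attention
        S_M = motion_attention(texture_frames, t)        # step 2: motion attention (uses neighbouring frames)
        S_D = depth_attention(D_t)                       # step 3: depth attention
        S   = fuse(S_I, S_M, S_D, D_t)                   # step 4: depth-perception-based fusion
        rois.append(threshold_and_postprocess(S, F_t))   # step 5: ROI mask of frame t
    return rois                                          # step 6: repeated for all frames
```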
2. The method for extracting a video region of interest based on visual attention according to claim 1, wherein the specific process of the motion visual attention detection method in step ② is as follows:
②-1, recording the texture video frame at time t+j, consecutive with the current texture video frame in the texture video, as F_{t+j}, and recording the texture video frame at time t-j, consecutive with the current texture video frame in the texture video, as F_{t-j}, wherein j ∈ (0, N_F/2] and N_F is a positive integer less than 10;
②-2, calculating, with a known optical flow method, the motion vector image in the horizontal direction and the motion vector image in the vertical direction between the current texture video frame and the texture video frame F_{t+j} at time t+j, and the motion vector image in the horizontal direction and the motion vector image in the vertical direction between the current texture video frame and the texture video frame F_{t-j} at time t-j; recording the motion vector image in the horizontal direction between the current texture video frame and F_{t+j} as V^H_{t+j} and the motion vector image in the vertical direction as V^V_{t+j}; recording the motion vector image in the horizontal direction between the current texture video frame and F_{t-j} as V^H_{t-j} and the motion vector image in the vertical direction as V^V_{t-j}; V^H_{t+j}, V^V_{t+j}, V^H_{t-j} and V^V_{t-j} all have width W and height H;
②-3, superposing the absolute value of V^H_{t+j} and the absolute value of V^V_{t+j} to obtain the motion amplitude image between the current texture video frame and the texture video frame F_{t+j} at time t+j, denoted M_{t+j}, M_{t+j} = |V^H_{t+j}| + |V^V_{t+j}|, and denoting the motion amplitude value of the pixel with coordinate (x, y) in M_{t+j} as m_{t+j}(x, y); superposing the absolute value of V^H_{t-j} and the absolute value of V^V_{t-j} to obtain the motion amplitude image between the current texture video frame and the texture video frame F_{t-j} at time t-j, denoted M_{t-j}, M_{t-j} = |V^H_{t-j}| + |V^V_{t-j}|, and denoting the motion amplitude value of the pixel with coordinate (x, y) in M_{t-j} as m_{t-j}(x, y);
②-4, extracting a joint motion map, denoted M_j^Δ, using the motion amplitude images of the current texture video frame with respect to the texture video frame F_{t+j} at time t+j and the texture video frame F_{t-j} at time t-j; the specific process of extracting the joint motion map M_j^Δ is as follows: judging, for the pixel with coordinate (x, y) in the motion amplitude image M_{t+j} and the pixel with coordinate (x, y) in the motion amplitude image M_{t-j}, whether the minimum of their motion amplitude values, min(m_{t+j}(x, y), m_{t-j}(x, y)), is larger than a set first threshold T_1, where min() is the minimum function; if so, the pixel value of the pixel with coordinate (x, y) in the joint motion map M_j^Δ is m_j^Δ(x, y) = (m_{t+j}(x, y) + m_{t-j}(x, y)) / 2, i.e. the average of the motion amplitude values of the pixels at the corresponding coordinate in M_{t+j} and M_{t-j}; otherwise, the pixel value of the pixel with coordinate (x, y) in the joint motion map M_j^Δ is 0;
②-5, performing weighted superposition of the joint motion maps at the moments 1 to N_F/2 away from time t to obtain the weighted joint motion map of the current texture video frame, denoted M; the pixel value of the pixel with coordinate (x, y) in the weighted joint motion map M of the current texture video frame is denoted M(x, y), M(x, y) = Σ_{j=1}^{N_F/2} ζ_j · m_j^Δ(x, y), wherein m_j^Δ(x, y) represents the pixel value of the pixel with coordinate (x, y) in the joint motion map M_j^Δ at a distance of j moments from time t, ζ_j is a weighting coefficient, and the weighting coefficients satisfy Σ_{j=1}^{N_F/2} ζ_j = 1;
②-6, performing Gaussian pyramid decomposition on the weighted joint motion map M of the current texture video frame to decompose it into n_L layers of weighted joint motion maps, and denoting the i-th layer weighted joint motion map obtained after the Gaussian pyramid decomposition of the weighted joint motion map M of the current texture video frame as M(i), the width and the height of the i-th layer weighted joint motion map M(i) being W/2^i and H/2^i respectively, wherein n_L is a positive integer less than 20, i ∈ [0, n_L - 1], W is the width of the current texture video frame and H is the height of the current texture video frame;
②-7, extracting the distribution map S_M of the motion visual attention of the current texture video frame from the n_L layers of weighted joint motion maps of the current texture video frame; denote the pixel value of the pixel with coordinate (x, y) in S_M as s_m(x, y); S_M = F_M, where F_M = N( ⊕_c ⊕_s N( |M(c) ⊖ M(s)| ) ), s, c ∈ [0, n_L - 1], s = c + δ, δ = {-3, -2, -1, 1, 2, 3}, N(·) is the normalization function that normalizes to the interval [0, 2^{Z_S} - 1], the symbol "| |" denotes the absolute value operation, M(c) is the c-th layer weighted joint motion map and M(s) is the s-th layer weighted joint motion map; the symbol "⊖" denotes the cross-level difference operation on M(c) and M(s): if c < s, M(s) is up-sampled to an image with the same resolution as M(c) and then each pixel of M(c) is differenced with the corresponding pixel of the up-sampled M(s); if c > s, M(c) is up-sampled to an image with the same resolution as M(s) and then each pixel of M(s) is differenced with the corresponding pixel of the up-sampled M(c); the symbol "⊕" denotes the cross-level addition operation on M(c) and M(s): if c < s, M(s) is up-sampled to an image with the same resolution as M(c) and then each pixel of M(c) is summed with the corresponding pixel of the up-sampled M(s); if c > s, M(c) is up-sampled to an image with the same resolution as M(s) and then each pixel of M(s) is summed with the corresponding pixel of the up-sampled M(c).
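A minimal sketch of steps ②-2 to ②-5 is given below (Python with OpenCV and NumPy). It assumes Farneback optical flow as a stand-in for "a known optical flow method", takes T_1 = 1 as in claim 3, and uses equal weighting coefficients ζ_j purely for illustration:

```python
import cv2
import numpy as np

def weighted_joint_motion_map(frames, t, n_f=4, t1=1.0):
    """Steps 2-2 to 2-5 of claim 2: for j = 1..n_f/2, estimate optical flow between
    frame t and frames t+j / t-j, form motion amplitude images |V_H| + |V_V|,
    gate their per-pixel minimum with threshold T_1 to build joint motion maps,
    and combine the joint motion maps with weights summing to 1.
    Assumes BGR frames and that n_f/2 neighbours exist on both sides of t."""
    cur = cv2.cvtColor(frames[t], cv2.COLOR_BGR2GRAY)
    js = range(1, n_f // 2 + 1)
    zeta = np.full(len(js), 1.0 / len(js))          # equal weights (illustrative choice)
    M = np.zeros(cur.shape, np.float64)
    for w, j in zip(zeta, js):
        amp = []
        for other in (frames[t + j], frames[t - j]):
            g = cv2.cvtColor(other, cv2.COLOR_BGR2GRAY)
            flow = cv2.calcOpticalFlowFarneback(cur, g, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            amp.append(np.abs(flow[..., 0]) + np.abs(flow[..., 1]))   # |V_H| + |V_V|
        m_fwd, m_bwd = amp
        gate = np.minimum(m_fwd, m_bwd) > t1        # min(m_{t+j}, m_{t-j}) > T_1
        M += w * np.where(gate, 0.5 * (m_fwd + m_bwd), 0.0)
    return M
```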
3. The method for extracting a video region of interest based on visual attention according to claim 2, wherein the first threshold set in step ②-4 is T_1 = 1.
4. The method for extracting a video region of interest based on visual attention according to claim 1 or 2, wherein the specific process of the depth visual attention detection method in step ③ is as follows:
③-1, performing Gaussian pyramid decomposition on the current depth video frame to obtain n_L layers of depth video frames, and denoting the i-th layer depth video frame obtained after the Gaussian pyramid decomposition of the current depth video frame as D(i), the width and the height of the i-th layer depth video frame D(i) being W/2^i and H/2^i respectively, wherein n_L is a positive integer less than 20, i ∈ [0, n_L - 1], W is the width of the current depth video frame and H is the height of the current depth video frame;
③-2, extracting the depth feature map of the current depth video frame from the n_L layers of depth video frames, denoted F_D, where F_D = N( ⊕_c ⊕_s N( |D(c) ⊖ D(s)| ) ), s, c ∈ [0, n_L - 1], s = c + δ, δ = {-3, -2, -1, 1, 2, 3}, N(·) is the normalization function that normalizes to the interval [0, 2^{Z_S} - 1], the symbol "| |" denotes the absolute value operation, D(c) is the c-th layer depth video frame and D(s) is the s-th layer depth video frame; the symbol "⊖" denotes the cross-level difference operation on D(c) and D(s): if c < s, D(s) is up-sampled to an image with the same resolution as D(c) and then each pixel of D(c) is differenced with the corresponding pixel of the up-sampled D(s); if c > s, D(c) is up-sampled to an image with the same resolution as D(s) and then each pixel of D(s) is differenced with the corresponding pixel of the up-sampled D(c); the symbol "⊕" denotes the cross-level addition operation on D(c) and D(s): if c < s, D(s) is up-sampled to an image with the same resolution as D(c) and then each pixel of D(c) is summed with the corresponding pixel of the up-sampled D(s); if c > s, D(c) is up-sampled to an image with the same resolution as D(s) and then each pixel of D(s) is summed with the corresponding pixel of the up-sampled D(c);
③-3, performing convolution operations on the current depth video frame with known Gabor filters in the 0, π/4, π/2 and 3π/4 directions to extract the four direction components in the 0, π/4, π/2 and 3π/4 directions, obtaining four direction component maps of the current depth video frame, denoted O^D_0, O^D_{π/4}, O^D_{π/2} and O^D_{3π/4} respectively; performing Gaussian pyramid decomposition on each of the direction component maps O^D_0, O^D_{π/4}, O^D_{π/2} and O^D_{3π/4} of the current depth video frame to decompose each into n_L layers of direction component maps, and denoting the i-th layer direction component map obtained by Gaussian pyramid decomposition of the direction component map in the θ direction as O^D_θ(i), the width and the height of O^D_θ(i) being W/2^i and H/2^i respectively, wherein θ ∈ {0, π/4, π/2, 3π/4}, i ∈ [0, n_L - 1], W is the width of the current depth video frame and H is the height of the current depth video frame;
③-4, extracting a preliminary depth direction feature map of the current depth video frame, denoted F'_DO, from the n_L layers of direction component maps of each direction of the current depth video frame: F'_DO = (1/4) Σ_{θ ∈ {0, π/4, π/2, 3π/4}} F_{O_θ}, where F_{O_θ} = N( ⊕_c ⊕_s N( |O^D_θ(c) ⊖ O^D_θ(s)| ) ), s, c ∈ [0, n_L - 1], s = c + δ, δ = {-3, -2, -1, 1, 2, 3}, N(·) is the normalization function that normalizes to the interval [0, 2^{Z_S} - 1], the symbol "| |" denotes the absolute value operation, O^D_θ(c) is the c-th layer direction component map of the θ direction and O^D_θ(s) is the s-th layer direction component map of the θ direction; the symbol "⊖" denotes the cross-level difference operation on O^D_θ(c) and O^D_θ(s) and the symbol "⊕" denotes the cross-level addition operation on O^D_θ(c) and O^D_θ(s), both performed as in step ③-2: if c < s, O^D_θ(s) is up-sampled to an image with the same resolution as O^D_θ(c) and then each pixel of O^D_θ(c) is differenced (respectively summed) with the corresponding pixel of the up-sampled O^D_θ(s); if c > s, O^D_θ(c) is up-sampled to an image with the same resolution as O^D_θ(s) and then each pixel of O^D_θ(s) is differenced (respectively summed) with the corresponding pixel of the up-sampled O^D_θ(c);
③-5, performing n_1 dilation operations on the preliminary depth direction feature map F'_DO of the current depth video frame with a known morphological dilation algorithm whose basic dilation element has a size of w_1 × h_1, so as to obtain the depth direction feature map of the current depth video frame, denoted F_DO;
③-6, obtaining a distribution map of the preliminary depth visual attention of the current depth video frame, denoted S'_D, by combining the depth feature map F_D and the depth direction feature map F_DO of the current depth video frame and normalizing the result with N(·), the normalization function that normalizes to the interval [0, 2^{Z_S} - 1]; denote the pixel value of the pixel with coordinate (x, y) in S'_D as s'_d(x, y);
③-7, obtaining the distribution map S_D of the depth visual attention of the three-dimensional video image jointly displayed by the current depth video frame and the current texture video frame from the distribution map S'_D of the preliminary depth visual attention of the current depth video frame; denote the pixel value of the pixel with coordinate (x, y) in S_D as s_d(x, y), s_d(x, y) = s'_d(x, y) · g(x, y), where
g(x, y) = 0.2, if x < b or y < b or x > W - b or y > H - b;
g(x, y) = 0.4, else if x < 2b or y < 2b or x > W - 2b or y > H - 2b;
g(x, y) = 0.6, else if x < 3b or y < 3b or x > W - 3b or y > H - 3b;
g(x, y) = 1, otherwise;
W is the width of the current depth video frame, H is the height of the current depth video frame, and b is a set second threshold.
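Steps ③-1 and ③-2 (and, with the same machinery, steps ②-6/②-7 and ③-4) rest on Gaussian pyramid decomposition and cross-level difference/addition. The sketch below (Python with OpenCV and NumPy) assumes that the across-scale combination accumulates the normalized cross-level difference maps at the finest resolution; the exact nesting of the ⊕ and N(·) operations in the original formula images may differ:

```python
import cv2
import numpy as np

def gaussian_pyramid(img, n_levels=6):
    """Step 3-1: n_L-level Gaussian pyramid; level i has size (W/2^i) x (H/2^i)."""
    pyr = [img.astype(np.float64)]
    for _ in range(1, n_levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def normalize(m, z_bits=8):
    """N(.): rescale a map to the interval [0, 2**Z_S - 1]."""
    lo, hi = m.min(), m.max()
    span = (hi - lo) if hi > lo else 1.0
    return (m - lo) / span * ((1 << z_bits) - 1)

def center_surround_feature(pyr, deltas=(-3, -2, -1, 1, 2, 3)):
    """Cross-level |P(c) (-) P(s)| differences with s = c + delta, each normalized,
    then accumulated at the finest resolution (an assumed reading of the (+) operator)."""
    n_levels = len(pyr)
    h, w = pyr[0].shape[:2]
    acc = np.zeros((h, w), np.float64)
    for c in range(n_levels):
        for d in deltas:
            s = c + d
            if 0 <= s < n_levels:
                coarse, fine = max(c, s), min(c, s)     # up-sample the coarser level
                up = cv2.resize(pyr[coarse], pyr[fine].shape[1::-1],
                                interpolation=cv2.INTER_LINEAR)
                diff = np.abs(pyr[fine] - up)           # cross-level difference map
                acc += cv2.resize(normalize(diff), (w, h),
                                  interpolation=cv2.INTER_LINEAR)
    return normalize(acc)

# Depth feature map F_D of step 3-2 (the same machinery can be reused for M(i) and O_theta(i)):
# F_D = center_surround_feature(gaussian_pyramid(depth_frame))
```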
5. The method for extracting a video region of interest based on visual attention according to claim 4, wherein in step ③-5 the basic dilation element size is w_1 = 8 and h_1 = 8 and n_1 takes a set value, and the second threshold b set in step ③-7 is 16.
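With the claim-5 element size, step ③-5 reduces to repeated dilation with an 8 × 8 structuring element; a short sketch (Python with OpenCV) follows, where the iteration count n_1 = 2 is only an illustrative value, not taken from the claim text:

```python
import cv2
import numpy as np

# Step 3-5: dilate the preliminary depth direction feature map with an
# 8 x 8 rectangular structuring element (w_1 x h_1 = 8 x 8 per claim 5).
element = cv2.getStructuringElement(cv2.MORPH_RECT, (8, 8))
F_do_prelim = np.random.rand(288, 352).astype(np.float32)   # stand-in for F'_DO
F_do = cv2.dilate(F_do_prelim, element, iterations=2)       # n_1 = 2 dilation passes (illustrative)
```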
6. The method for extracting a video region of interest based on visual attention according to claim 1, wherein the specific process of the visual attention fusion method based on depth perception in step ④ is as follows:
④-1, performing a scale transformation on the current depth video frame through Q(d(x, y)) = d(x, y) + γ, where γ is a coefficient within a set range, d(x, y) represents the pixel value of the pixel with coordinate (x, y) in the current depth video frame, and Q(d(x, y)) represents the pixel value of the pixel with coordinate (x, y) in the scale-transformed current depth video frame;
④-2, acquiring the distribution map S of the three-dimensional visual attention by jointly using the scale-transformed current depth video frame, the distribution map S_D of the depth visual attention of the three-dimensional video image jointly displayed by the current depth video frame and the current texture video frame, the distribution map S_M of the motion visual attention of the current texture video frame and the distribution map S_I of the static image domain visual attention of the current texture video frame; the pixel value of the pixel with coordinate (x, y) in the distribution map S of the three-dimensional visual attention is s(x, y), wherein K_D, K_M and K_I are respectively the weighting coefficients of S_D, S_M and S_I, the weighting coefficients satisfying the condition Σ_{a ∈ {D, M, I}} K_a = 1 and 0 ≤ K_a ≤ 1, N(·) is the normalization function that normalizes to the interval [0, 2^{Z_S} - 1], s_D(x, y), s_M(x, y) and s_I(x, y) respectively represent the pixel values of the pixels with coordinate (x, y) in S_D, S_M and S_I, Θ_ab(x, y) is the visual attention correlation value, Θ_ab(x, y) = min(s_a(x, y), s_b(x, y)), min() is the minimum function, C_ab is a correlation coefficient, the correlation coefficients satisfying the condition Σ_{a, b ∈ {D, M, I}, a ≠ b} C_ab = 1 and 0 ≤ C_ab < 1, the correlation coefficient C_DM denotes the degree of correlation between S_D and S_M, the correlation coefficient C_DI denotes the degree of correlation between S_D and S_I, the correlation coefficient C_IM denotes the degree of correlation between S_I and S_M, and a, b ∈ {D, M, I} with a ≠ b.
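The exact fusion expression of step ④-2 is given as a formula image in the original document; the sketch below (Python with NumPy) therefore shows only one plausible reading, consistent with the stated constraints on K_a, C_ab and Θ_ab, with illustrative weight values, and should not be taken as the patented formula:

```python
import numpy as np

def fuse_attention(S_D, S_M, S_I, K=None, C=None, z_bits=8):
    """One plausible reading of step 4-2: weighted sum of the three attention maps
    plus pairwise correlation terms Theta_ab = min(s_a, s_b), renormalized to
    [0, 2**Z_S - 1]. The weights below satisfy sum(K) = 1 and sum(C) = 1 as
    required by claim 6; their particular values are illustrative assumptions."""
    maps = {"D": S_D.astype(np.float64), "M": S_M.astype(np.float64),
            "I": S_I.astype(np.float64)}
    K = K or {"D": 0.4, "M": 0.3, "I": 0.3}
    C = C or {("D", "M"): 0.4, ("D", "I"): 0.3, ("I", "M"): 0.3}
    S = sum(K[a] * maps[a] for a in maps)                      # weighted attention terms
    for (a, b), c_ab in C.items():
        S += c_ab * np.minimum(maps[a], maps[b])               # Theta_ab correlation terms
    lo, hi = S.min(), S.max()
    span = (hi - lo) if hi > lo else 1.0
    return (S - lo) / span * ((1 << z_bits) - 1)               # N(.) to [0, 2**Z_S - 1]
```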
7. The method for extracting a video region of interest based on visual attention according to claim 6, wherein the specific process of performing thresholding and macroblock post-processing on the distribution map S of the three-dimensional visual attention in step ⑤ is as follows:
⑤-1, denoting the pixel value of the pixel with coordinate (x, y) in the distribution map S of the three-dimensional visual attention as s(x, y), and defining a third threshold T_S, T_S = k_T · Σ_{y=0}^{H-1} Σ_{x=0}^{W-1} s(x, y) / (W × H), wherein W is the width of the distribution map S, H is the height of the distribution map S, and k_T ∈ (0, 3); creating a new preliminary binary mask image, and judging whether s(x, y) ≥ T_S; if so, marking the pixel with coordinate (x, y) in the preliminary binary mask image as a pixel of interest, otherwise marking the pixel with coordinate (x, y) in the preliminary binary mask image as a pixel of non-interest;
⑤-2, dividing the preliminary binary mask image into (W/w_2) × (H/h_2) non-overlapping blocks of size w_2 × h_2, and denoting the block with block abscissa u and block ordinate v as B_{u,v}, wherein u ∈ [0, W/w_2 - 1] and v ∈ [0, H/h_2 - 1]; determining, according to each block in the preliminary binary mask image, whether the pixels in the corresponding block of the current texture video frame are pixels of interest or pixels of non-interest: for block B_{u,v}, judging whether the number of pixels marked as pixels of interest in block B_{u,v} is larger than a set fourth threshold T_b, wherein 0 ≤ T_b ≤ w_2 × h_2; if so, marking all pixels in the block of the current texture video frame corresponding to block B_{u,v} as pixels of interest and taking the block corresponding to block B_{u,v} as a region-of-interest block; otherwise, marking all pixels in the block of the current texture video frame corresponding to block B_{u,v} as pixels of non-interest and taking the block corresponding to block B_{u,v} as a non-region-of-interest block; thereby obtaining a preliminary region-of-interest mask image of the current texture video frame, the preliminary region-of-interest mask image consisting of region-of-interest blocks and non-region-of-interest blocks;
⑤-3, marking all pixels in the non-region-of-interest blocks B_{u,v} most adjacent to the region-of-interest blocks in the preliminary region-of-interest mask image as the N_R-th level transition region of interest and updating the preliminary region-of-interest mask image; then marking all pixels in the non-region-of-interest blocks B_{u,v} most adjacent to the N_R-th level transition region of interest in the updated preliminary region-of-interest mask image as the (N_R - 1)-th level transition region of interest, and recursively updating the preliminary region-of-interest mask image; repeating this recursive process until the 1st level transition region of interest has been marked; finally obtaining the final region-of-interest mask image of the current texture video frame, the final region-of-interest mask image consisting of region-of-interest blocks, N_R levels of transition regions of interest and non-region-of-interest blocks;
⑤-4, denoting the pixel value of the pixel with coordinate (x, y) in the final region-of-interest mask image as r(x, y); setting the pixel values of all pixels in the non-region-of-interest blocks of the final region-of-interest mask image to r(x, y) = 255; setting the pixel values of all pixels in the N_R levels of transition regions of interest of the final region-of-interest mask image to r(x, y) = (e / (N_R + 1)) × f(x, y); setting the pixel values of all pixels in the region-of-interest blocks of the final region-of-interest mask image to r(x, y) = f(x, y); thereby obtaining the region of interest of the current texture video frame, wherein e represents the level of the transition region of interest, e ∈ [1, N_R], and f(x, y) represents the pixel value of the pixel with coordinate (x, y) in the current texture video frame.
8. The method for extracting a video region of interest based on visual attention according to claim 7, wherein in step ⑤-2, w_2 = 16, h_2 = 16, and the set fourth threshold is T_b = 50.
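Steps ⑤-1 and ⑤-2 of claim 7, with the claim-8 parameters w_2 = h_2 = 16 and T_b = 50, can be sketched as follows (Python with NumPy; k_T = 1.5 is an illustrative value within the stated range (0, 3), and the transition-region recursion of steps ⑤-3 and ⑤-4 is omitted):

```python
import numpy as np

def roi_block_mask(S, k_t=1.5, block=16, t_b=50):
    """Steps 5-1 and 5-2 of claim 7 with the claim-8 parameters (w_2 = h_2 = 16,
    T_b = 50): threshold the fused attention map S at T_S = k_T * mean(S), then
    mark each 16 x 16 block as a region-of-interest block if it holds more than
    T_b pixels of interest."""
    H, W = S.shape
    T_S = k_t * S.mean()                                  # third threshold T_S
    interest = S >= T_S                                   # preliminary binary mask
    mask = np.zeros_like(interest)
    for v in range(H // block):
        for u in range(W // block):
            blk = interest[v * block:(v + 1) * block, u * block:(u + 1) * block]
            if blk.sum() > t_b:                           # block B_{u,v} is of interest
                mask[v * block:(v + 1) * block, u * block:(u + 1) * block] = True
    return mask
```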
CN2009101525203A 2009-09-11 2009-09-11 Method for extracting video interested region based on visual attention Expired - Fee Related CN101651772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101525203A CN101651772B (en) 2009-09-11 2009-09-11 Method for extracting video interested region based on visual attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009101525203A CN101651772B (en) 2009-09-11 2009-09-11 Method for extracting video interested region based on visual attention

Publications (2)

Publication Number Publication Date
CN101651772A true CN101651772A (en) 2010-02-17
CN101651772B CN101651772B (en) 2011-03-16

Family

ID=41673862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101525203A Expired - Fee Related CN101651772B (en) 2009-09-11 2009-09-11 Method for extracting video interested region based on visual attention

Country Status (1)

Country Link
CN (1) CN101651772B (en)
